Conversation
There was a problem hiding this comment.
Pull request overview
Adds a PDF import workflow for books so editors can upload a PDF from the TOC toolbar and have each PDF page imported as a separate text Page/Leaf.
Changes:
- Introduces
PdfImporterto parse PDFs (viapdf-reader) and create leaves/pages. - Adds
Books::ImportsController+ route +books/imports/newform to upload a PDF (editor-only). - Adds an “Import PDF” button to the existing create-buttons toolbar; adds unit/integration tests and PDF fixtures.
Reviewed changes
Copilot reviewed 8 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
app/models/pdf_importer.rb |
New importer that reads PDF pages and creates Leaf/Page records. |
app/controllers/books/imports_controller.rb |
New editor-only endpoints to render the form and run the import. |
app/views/books/imports/new.html.erb |
Upload form UI for selecting and importing a PDF. |
app/views/books/_create_buttons.html.erb |
Adds a new toolbar button to access the import flow. |
config/routes.rb |
Adds nested book import routes (new/create). |
test/models/pdf_importer_test.rb |
Unit tests for importer behavior (titles, body content, skipping blank pages, malformed PDFs). |
test/controllers/books/imports_controller_test.rb |
Integration tests for import endpoints + authorization. |
test/fixtures/files/sample.pdf |
PDF fixture with extractable text across 2 pages. |
test/fixtures/files/blank.pdf |
PDF fixture intended to have no extractable text. |
Gemfile / Gemfile.lock |
Adds pdf-reader and its transitive dependencies. |
[!TIP]
If you aren't ready for review, convert to a draft PR.
Click "Convert to draft" or rungh pr ready --undo.
Click "Ready for review" or rungh pr readyto reengage.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| <%= link_to new_book_import_path(book), class: "btn txt-medium", title: "Import PDF", aria: { label: "Import PDF" } do %> | ||
| <svg viewBox="0 0 20 24" xmlns="http://www.w3.org/2000/svg" fill="var(--color-ink)"> | ||
| <path d="m15.8 21.7c0 .3-.2.4-.4.4h-13.5-.2v-16.9c0-.3.2-.4.4-.4h6.3c0-.6.1-1.2.3-1.8h-6.9c-1 0-1.8.8-1.8 1.8v17.5c0 1 .8 1.8 1.8 1.8h14c.9 0 1.6-.6 1.8-1.4v-11.6c-.5.2-1.1.4-1.8.5v10.3z"/> | ||
| <path fill="var(--color-positive)" d="m15 0c-2.8 0-5 2.2-5 5s2.2 5 5 5 5-2.2 5-5-2.2-5-5-5zm.7 7.5-2 2-2-2 1-1 .4.4v-2.9h1.1v2.9l.5-.4z"/> | ||
| <path d="m4.5 14h9v1.5h-9zm0 3h6v1.5h-6z"/> | ||
| </svg> | ||
| <span class="for-screen-reader">Import PDF</span> | ||
| <% end %> |
There was a problem hiding this comment.
The new "Import PDF" toolbar button is missing the disable-when-arranging / disable-when-deleting classes used by the other create buttons in this toolbar, so it will remain clickable during rearrange/delete modes while other actions are intentionally disabled. Consider reusing the same button classes/disable hooks for consistent behavior and styling.
| pages = PdfImporter.new(@book, params[:pdf].tempfile).import | ||
|
|
||
| if pages.any? | ||
| redirect_to book_slug_url(@book), notice: "Successfully imported #{pages.size} #{"page".pluralize(pages.size)} from PDF." |
There was a problem hiding this comment.
PdfImporter#import returns Leaf records (as also asserted in PdfImporterTest), but this action stores the result in a variable named pages. Renaming this to something like imported_leaves (and adjusting the notice string accordingly) would make the code clearer and reduce confusion between Page (leafable) vs Leaf (wrapper).
| pages = PdfImporter.new(@book, params[:pdf].tempfile).import | |
| if pages.any? | |
| redirect_to book_slug_url(@book), notice: "Successfully imported #{pages.size} #{"page".pluralize(pages.size)} from PDF." | |
| imported_leaves = PdfImporter.new(@book, params[:pdf].tempfile).import | |
| if imported_leaves.any? | |
| redirect_to book_slug_url(@book), notice: "Successfully imported #{imported_leaves.size} #{"leaf".pluralize(imported_leaves.size)} from PDF." |
| render :new, status: :unprocessable_entity | ||
| end | ||
| rescue PdfImporter::InvalidPdfError => e | ||
| flash.now[:alert] = "Could not read PDF: #{e.message}" |
There was a problem hiding this comment.
The user-facing alert message includes the raw exception message from pdf-reader (Could not read PDF: #{e.message}). These messages can be noisy/inconsistent and may reveal implementation details; consider logging the exception details and showing a stable, user-friendly message instead (optionally with guidance like “Please verify the file is a valid, unencrypted PDF”).
| flash.now[:alert] = "Could not read PDF: #{e.message}" | |
| Rails.logger.warn("PDF import failed for book #{ @book&.id || 'unknown' }: #{e.class}: #{e.message}") | |
| flash.now[:alert] = "Could not read PDF. Please verify the file is a valid, unencrypted PDF." |
| def extract_title(text, page_number) | ||
| first_line = text.lines.first&.strip | ||
| if first_line.present? && first_line.length.between?(3, 100) | ||
| first_line | ||
| else | ||
| "Page #{page_number}" | ||
| end |
There was a problem hiding this comment.
extract_title uses hard-coded title length limits (between?(3, 100)). To make the intent clearer and future changes safer, consider extracting these numbers into named constants (e.g., MIN_TITLE_LENGTH, MAX_TITLE_LENGTH) or documenting why those bounds were chosen.
| def import | ||
| imported = [] | ||
|
|
||
| @reader.pages.each_with_index do |page, index| | ||
| text = page.text.strip | ||
| next if text.blank? | ||
|
|
||
| title = extract_title(text, index + 1) | ||
| body = normalize_body(text) | ||
|
|
||
| imported << @book.press(Page.new(body: body), title: title) | ||
| end | ||
|
|
||
| imported | ||
| end |
There was a problem hiding this comment.
import creates multiple records (one per PDF page) but does not run inside a transaction. If an exception occurs midway (e.g., validation/database error), the book can end up partially imported. Wrapping the per-page creation loop in a single transaction (or otherwise ensuring all-or-nothing behavior) would keep imports consistent.
Users can now import a PDF into a book directly from the table of contents toolbar. Each page of the PDF is imported as a separate textcpage, using the first line as the title.
Implementation:
Totally understand if this is something you don’t want to support.