
refactor: Migrate document reindexing to UploadDocumentAdapter with unified indexing pipeline #842

Open
AnishSarkar22 wants to merge 4 commits into MODSetter:dev from AnishSarkar22:refactor/upload-document-adapter-class

Conversation

@AnishSarkar22 (Contributor) commented Feb 27, 2026

Description

  • Converted index_uploaded_file function into UploadDocumentAdapter class with index() and reindex() public methods, both routing through IndexingPipelineService.
  • Replaced legacy manual reindex logic in document_reindex_tasks.py with a single adapter.reindex() call.
  • Added dynamic character budget for document formatting based on model context window, with token-limit overflow handling in llm_router_service.
  • Expanded integration test suite from 4 to 11 tests covering reindex content/hash updates, chunk replacement, empty-markdown guard, and error message matching.
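The dynamic character budget from the third bullet can be illustrated with a small sketch. This is not the code from this PR: the function names, the 4-characters-per-token ratio, the reserved-token default, and the minimum floor are all assumptions chosen for illustration.

```python
def char_budget(
    context_window_tokens: int,
    reserved_tokens: int = 1024,   # assumed headroom for prompt and response
    chars_per_token: float = 4.0,  # assumed rough average, not a measured value
    min_chars: int = 1000,         # assumed floor so small models still get some text
) -> int:
    """Estimate how many characters of document text fit in a model's context window."""
    available_tokens = max(context_window_tokens - reserved_tokens, 0)
    return max(int(available_tokens * chars_per_token), min_chars)


def truncate_to_budget(text: str, budget: int) -> str:
    """Cut the formatted document down to the budget; shorter text passes through unchanged."""
    if len(text) <= budget:
        return text
    return text[:budget]


if __name__ == "__main__":
    budget = char_budget(context_window_tokens=8192)
    print(budget)
    print(len(truncate_to_budget("x" * 50_000, budget)))
```

Deriving the budget from the context window, rather than hard-coding one limit, lets larger models receive more of the document while smaller ones stay under their token limit.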

Motivation and Context

FIX #

Screenshots

API Changes

  • This PR includes API changes

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring
  • Documentation
  • Dependency/Build system
  • Breaking change
  • Other (specify):

Testing Performed

  • Tested locally
  • Manual/QA verification

Checklist

  • Follows project coding standards and conventions
  • Documentation updated as needed
  • Dependencies updated as needed
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR refactors the document indexing logic by converting the standalone index_uploaded_file function into an UploadDocumentAdapter class with separate index() and reindex() methods. The new reindex() method replaces the legacy manual reindexing logic in the Celery task handler, so both workflows now route through the unified IndexingPipelineService. The integration test suite also grows from 4 to 11 tests, covering reindex content and hash updates, chunk replacement, the empty-markdown guard, and error message matching.
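The shape of the refactor described above can be sketched as follows. This is a simplified, synchronous illustration and not the PR's implementation: `Document`, `InMemoryPipeline`, `UploadDocumentAdapterSketch`, their method signatures, and the error message are hypothetical stand-ins, and the real adapter and IndexingPipelineService may differ (for example, by being async and database-backed).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for the stored document record."""
    title: str
    content: str = ""
    content_hash: str = ""
    chunks: list = field(default_factory=list)


class InMemoryPipeline:
    """Hypothetical stand-in for IndexingPipelineService."""

    def run(self, doc: Document, markdown: str) -> Document:
        # Update content and hash, then replace (not append to) the chunks.
        doc.content = markdown
        doc.content_hash = hashlib.sha256(markdown.encode("utf-8")).hexdigest()
        doc.chunks = [part for part in markdown.split("\n\n") if part.strip()]
        return doc


class UploadDocumentAdapterSketch:
    """Both public methods delegate to the same pipeline."""

    def __init__(self, pipeline: InMemoryPipeline) -> None:
        self._pipeline = pipeline

    def index(self, title: str, markdown: str) -> Document:
        return self._pipeline.run(Document(title=title), markdown)

    def reindex(self, doc: Document, markdown: str) -> Document:
        # Assumed guard: refuse to wipe a document with empty markdown.
        if not markdown.strip():
            raise ValueError("Document has no content to reindex")
        return self._pipeline.run(doc, markdown)
```

With this shape, a task handler only needs a single `adapter.reindex(doc, markdown)` call, and any change to hashing or chunking lives in one place.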

⏱️ Estimated Review Time: 15-30 minutes

💡 Review Order Suggestion
Order File Path
1 surfsense_backend/app/indexing_pipeline/adapters/file_upload_adapter.py
2 surfsense_backend/tests/integration/indexing_pipeline/adapters/test_file_upload_adapter.py
3 surfsense_backend/app/tasks/celery_tasks/document_reindex_tasks.py
4 surfsense_backend/app/tasks/document_processors/file_processors.py


vercel bot commented Feb 27, 2026

@AnishSarkar22 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

@recurseml recurseml bot left a comment

Review by RecurseML

🔍 Review performed on 1e4b8d3..ce82807

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (4)

surfsense_backend/app/indexing_pipeline/adapters/file_upload_adapter.py
surfsense_backend/app/tasks/celery_tasks/document_reindex_tasks.py
surfsense_backend/app/tasks/document_processors/file_processors.py
surfsense_backend/tests/integration/indexing_pipeline/adapters/test_file_upload_adapter.py

@AnishSarkar22 AnishSarkar22 marked this pull request as ready for review February 27, 2026 21:37
@AnishSarkar22 AnishSarkar22 changed the title refactor: Migrate document reindexing to UploadDocumentAdapter with unified indexing pipeline refactor: Migrate document reindexing to UploadDocumentAdapter with unified indexing pipeline Feb 27, 2026
@recurseml recurseml bot left a comment

Review by RecurseML

🔍 Review performed on ce82807..b2bf00e

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (1)

surfsense_backend/tests/integration/indexing_pipeline/adapters/test_file_upload_adapter.py
