feat:Enhance RAG module with document processing and retrieval improvements #395

Merged: Dallas98 merged 17 commits into main from feat/rag on Mar 2, 2026

Dallas98 (Collaborator) commented on Mar 2, 2026

This pull request introduces major improvements and refactoring to the knowledge base (RAG) management module, including backend infrastructure, ORM models, error codes, and frontend API integration. The changes standardize API parameter mapping, add new configuration options, expand error handling, and lay the foundation for robust document processing and file management. The most important changes are grouped below:

Backend: Knowledge Base (RAG) ORM Models & Infrastructure

  • Introduced new SQLAlchemy ORM models KnowledgeBase and RagFile in knowledge_gen.py to align with Java entity definitions, including enums for RAG types and file status, and detailed schema for both knowledge bases and files.
  • Registered the new models in the database models module for migration and usage. [1] [2]
  • Added new configuration options for Milvus vector database and file storage path in backend settings.
  • Refactored and documented the RAG infrastructure layer, introducing unified document processing, chunking, and loading interfaces (ingest_file_to_chunks, load_and_split, etc.) for future extensibility and code clarity. [1] [2] [3] [4] [5] [6]
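The unified ingest path described above can be pictured with a minimal, standard-library sketch. The names `ingest_file_to_chunks` and `load_and_split` come from this description, but their signatures, the `Chunk` record, the `split_text` helper, and the fixed-size overlapping splitter are all assumptions made for illustration; the PR's actual loaders and chunking strategy are not shown here.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record; the PR's real chunk type is not shown."""
    source: str
    index: int
    text: str


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return pieces


def load_and_split(path: Path, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Assumed signature: read a UTF-8 text file and split it into pieces."""
    return split_text(path.read_text(encoding="utf-8"), chunk_size, overlap)


def ingest_file_to_chunks(path, chunk_size: int = 200, overlap: int = 20) -> list[Chunk]:
    """Assumed signature: load a file, split it, and tag each piece with its origin."""
    path = Path(path)
    pieces = load_and_split(path, chunk_size, overlap)
    return [Chunk(str(path), i, piece) for i, piece in enumerate(pieces)]
```

Keeping loading, splitting, and chunk tagging in separate functions is one way to get the extensibility the description mentions: a different loader or splitter can be swapped in without changing what callers receive.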

Backend: Error Handling

  • Expanded and refined error codes for the RAG module to cover more scenarios such as file not found, file processing/parsing failures, Milvus errors, and embedding failures, making error reporting more granular and actionable.
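As a rough illustration of what granular error codes for these scenarios might look like, here is a small sketch. The scenarios (file not found, processing and parsing failures, Milvus errors, embedding failures) are taken from the description; the enum name `RagErrorCode`, the code strings, the messages, and the `RagError` exception are hypothetical and do not reflect the identifiers used in the PR.

```python
from enum import Enum


class RagErrorCode(Enum):
    """Hypothetical error codes; the code strings and messages are placeholders."""
    FILE_NOT_FOUND = ("RAG_FILE_NOT_FOUND", "File not found")
    FILE_PROCESSING_FAILED = ("RAG_FILE_PROCESSING_FAILED", "File processing failed")
    FILE_PARSE_FAILED = ("RAG_FILE_PARSE_FAILED", "File parsing failed")
    MILVUS_ERROR = ("RAG_MILVUS_ERROR", "Milvus operation failed")
    EMBEDDING_FAILED = ("RAG_EMBEDDING_FAILED", "Embedding generation failed")

    def __init__(self, code: str, message: str):
        # Each member's tuple value is unpacked into named attributes.
        self.code = code
        self.message = message


class RagError(Exception):
    """Hypothetical exception that carries a RagErrorCode plus optional detail."""

    def __init__(self, error_code: RagErrorCode, detail: str = ""):
        self.error_code = error_code
        self.detail = detail
        suffix = f": {detail}" if detail else ""
        super().__init__(f"[{error_code.code}] {error_code.message}{suffix}")

    def to_response(self) -> dict:
        """Build a JSON-serializable payload for an API error response."""
        return {
            "code": self.error_code.code,
            "message": self.error_code.message,
            "detail": self.detail,
        }
```

With one code per failure mode, a caller can tell a missing file apart from a vector-store failure without parsing message text.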

Frontend: API Integration & Parameter Mapping

  • Standardized parameter mapping between frontend and backend for knowledge base and file listing APIs, ensuring size is mapped to page_size and page starts from 1 as expected by the Python backend. [1] [2] [3]
  • Updated frontend data fetching hooks to match new API expectations, including explicit polling intervals, disabling auto-polling, and correct page offset handling. [1] [2]
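The parameter mapping can be sketched as follows. The actual change lives in the frontend, so this Python version only illustrates the logic: `size` becomes `page_size`, and the page number sent to the backend starts at 1. The function names and the assumption that the caller's page index may be zero-based are hypothetical.

```python
def to_backend_params(page: int, size: int, zero_based: bool = True) -> dict:
    """Map caller-side paging (page, size) to backend-style (page, page_size).

    Assumption: the caller's page index may be zero-based, while the
    backend expects pages to start from 1.
    """
    backend_page = page + 1 if zero_based else page
    if backend_page < 1:
        raise ValueError("page must resolve to 1 or greater")
    if size < 1:
        raise ValueError("size must be 1 or greater")
    return {"page": backend_page, "page_size": size}


def page_to_offset(page: int, page_size: int) -> int:
    """Convert a 1-based page number into a row offset for a query."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    return (page - 1) * page_size
```

For example, `to_backend_params(0, 10)` yields `{"page": 1, "page_size": 10}`, and `page_to_offset(3, 20)` yields `40`.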

Frontend: Development Proxy Configuration

  • Refactored Vite proxy configuration to support both Python and Java backend services, with path-based routing and consistent header/cookie handling for local development.

These changes collectively modernize the knowledge base management subsystem, improve maintainability, and prepare the codebase for advanced RAG features and integrations.

References:
[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17]

Dallas98 changed the title from "Enhance RAG module with document processing and retrieval improvements" to "feat:Enhance RAG module with document processing and retrieval improvements" on Mar 2, 2026
Dallas98 merged commit 8a9a072 into main on Mar 2, 2026
14 checks passed
Dallas98 deleted the feat/rag branch on March 2, 2026 at 12:18