feat: Enhance File Parsing Pipeline with Chunk-Level Source Tracking & Unified Multi-Modal Parsing #645

Merged: fridayL merged 14 commits into MemTensor:dev from CaralHsi:feat/evaluation_doc_qa on Dec 8, 2025


Conversation

CaralHsi (Collaborator) commented on Dec 8, 2025

Description

Summary:
This PR focuses on strengthening the multi-modal file-parsing pipeline by establishing a clear, traceable mapping from file → chunk → memory item, while improving consistency and robustness across both fine and fast modes.

Key improvements include:
This update significantly enhances the structure, traceability, and reliability of the file-ingestion pipeline, while unifying multi-modal behavior and preparing the system for upcoming evaluation and doc-QA features.
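
To make the file → chunk → memory item mapping concrete, here is a minimal sketch in Python. It is not the code from this PR: `ChunkSource`, `MemoryItem`, `split_into_chunks`, and `build_memory_items` are hypothetical names, and the fixed-size character split stands in for whatever chunker the parsers actually use. The only idea it illustrates is that each memory item carries a record of the file and chunk it came from.

```python
from dataclasses import dataclass


@dataclass
class ChunkSource:
    """Hypothetical provenance record: the file and chunk a memory came from."""
    file_id: str
    chunk_index: int
    chunk_total: int


@dataclass
class MemoryItem:
    """Hypothetical memory item that carries its chunk-level source."""
    memory: str
    source: ChunkSource


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into fixed-size character chunks (a stand-in for a real chunker)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def build_memory_items(file_id: str, text: str, chunk_size: int = 20) -> list[MemoryItem]:
    """Create one memory item per chunk, each tagged with its file and chunk index."""
    chunks = split_into_chunks(text, chunk_size)
    return [
        MemoryItem(memory=chunk, source=ChunkSource(file_id, index, len(chunks)))
        for index, chunk in enumerate(chunks)
    ]
```

With a record like this on every item, a retrieved memory can be traced back to its originating chunk and file, which is the kind of traceability an evaluation or doc-QA workflow would rely on.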

Fix: #590

Docs Issue/PR: (docs-issue-or-pr-link)

Reviewer: @(reviewer)

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have created related documentation issue/PR in MemOS-Docs (if applicable)
  • I have linked the issue to this PR (if applicable)
  • I have mentioned the person who will review this PR

CaralHsi changed the title from "Feat/evaluation doc qa" to "feat: Enhance File Parsing Pipeline with Chunk-Level Source Tracking & Unified Multi-Modal Parsing" on Dec 8, 2025
CaralHsi requested a review from fridayL on December 8, 2025 04:20
CaralHsi marked this pull request as ready for review on December 8, 2025 04:23
fridayL merged commit c8500ec into MemTensor:dev on Dec 8, 2025
20 checks passed
tianxing02 pushed a commit to tianxing02/MemOS that referenced this pull request Feb 24, 2026
…& Unified Multi-Modal Parsing (MemTensor#645)

* fix: doc fine mode bug

* fix: doc fine mode bug

* feat: init longbench_v2

* feat: more strict embedder truncation

* feat: parallel processing fine mode in multi-modal-fine

* feat: update parsers; add chunk info into source; remove origin_part

* feat: modify chunk_content in file-fine-parser
2 participants