Conversation
@QuantumByte-01 please review.
Interesting direction: extracting neuroscience terms from uploaded images via OCR and Gemini is genuinely useful for researchers uploading paper figures. Three things are needed before merging:

1. pytesseract requires a system-level Tesseract installation, which is not reflected in the Dockerfile or the setup docs; add it to both.
2. The frontend file-upload UI needs styling and proper error states.
3. OCR accuracy on scientific figures without preprocessing will be poor; consider adding image preprocessing (contrast adjustment, binarization) before OCR.

Please address these and reopen.
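A minimal sketch of the preprocessing the review suggests: contrast stretching followed by binarization. Pure-Python pixel lists are used here only to keep the example self-contained; an actual implementation would apply these steps with OpenCV or Pillow to the decoded image before handing it to pytesseract.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values to span the full 0-255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image: nothing to stretch
        return [0] * len(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

def binarize(pixels, threshold=128):
    """Map each pixel to pure black (0) or white (255)."""
    return [255 if p >= threshold else 0 for p in pixels]

# A low-contrast strip of pixels, as often seen in figure screenshots.
faded = [100, 110, 120, 130, 140]
stretched = stretch_contrast(faded)  # -> [0, 64, 128, 191, 255]
binary = binarize(stretched)         # -> [0, 0, 255, 255, 255]
```

In practice a global fixed threshold is a rough choice for scientific figures; adaptive thresholding (e.g. OpenCV's `cv2.adaptiveThreshold`) usually handles uneven backgrounds better.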
Hi @QuantumByte-01, thanks for looking into this.
These are perfectly valid points; I will work on them.
Fixes issue #16
Currently, the KnowledgeSpace AI agent relies only on text-based queries, while a large amount of
neuroscience metadata remains locked inside non-editable formats such as figures, tables, and
presentation screenshots. This PR introduces a multimodal pipeline that allows users to upload
images and automatically convert them into refined, high-signal search queries.
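The refinement step described above could look roughly like the following. This is an illustrative sketch only: the function name, the stop-word list, and the token heuristics are hypothetical, not the PR's actual implementation, which delegates refinement to Gemini.

```python
import re

# Illustrative filler list; a real system would use a proper stop-word set
# or a language model to decide which tokens carry search signal.
FILLER = {"please", "thanks", "figure", "shows", "the", "a", "an", "of", "in"}

def refine_query(ocr_text: str, max_terms: int = 8) -> str:
    """Keep the most informative tokens from raw OCR output."""
    # Tokens must start with a letter, so stray digits from axis labels drop out.
    tokens = re.findall(r"[A-Za-z][\w-]+", ocr_text.lower())
    terms = [t for t in tokens if t not in FILLER]
    # Deduplicate while preserving order, then cap the query length.
    seen, query = set(), []
    for t in terms:
        if t not in seen:
            seen.add(t)
            query.append(t)
    return " ".join(query[:max_terms])

print(refine_query("Figure 3 shows hippocampal CA1 pyramidal neurons"))
# -> "hippocampal ca1 pyramidal neurons"
```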
The feature includes:
- preventing conversational noise from polluting the search engine.

Impact: