🎉 MAKAR: A Multi-Agent Framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition
MAKAR is a multi-agent framework with knowledge-augmented reasoning for Grounded Multimodal Named Entity Recognition (GMNER), the task of extracting textual entities, their types, and their corresponding visual regions from image-text pairs. For details, see our EMNLP 2025 paper: MAKAR: A Multi-Agent Framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition.
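To make the task concrete, a GMNER system maps an image-text pair to (entity, type, region) triples, where the region is a bounding box or None for entities not visible in the image. The sketch below is purely illustrative; the field names and example values are hypothetical and do not reflect the exact schema of the Twitter-GMNER/FMNERG datasets.

```python
# Illustrative GMNER input/output structure (field names are hypothetical,
# not the exact schema of the Twitter-GMNER/FMNERG datasets).

sample = {
    "text": "Lionel Messi joins Inter Miami",
    "image": "tweet_12345.jpg",
}

# For each named entity, a GMNER system predicts its text span, its entity
# type, and the grounded visual region as (x1, y1, x2, y2), or None when
# the entity is not groundable in the image.
prediction = [
    {"entity": "Lionel Messi", "type": "PER", "region": (45, 30, 210, 330)},
    {"entity": "Inter Miami", "type": "ORG", "region": None},  # not groundable
]

for p in prediction:
    grounded = "grounded" if p["region"] is not None else "ungrounded"
    print(f'{p["entity"]} ({p["type"]}): {grounded}')
```

FMNERG follows the same structure but uses a fine-grained entity type inventory instead of coarse types such as PER/ORG.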
🎉 [August 2025] We are thrilled to announce that our paper,
"MAKAR: A Multi-Agent Framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition",
has been accepted by EMNLP 2025! 🎉
We are currently finalizing the camera-ready version and meticulously organizing our experimental code.
Code and datasets will be released publicly very soon!
Stay tuned for updates!
Follow the instructions below to set up and train the MAKAR model components.
MAKAR is built on AdaSeq, which requires Python >= 3.7 and PyTorch >= 1.8.
Step 1: Installation
git clone https://github.com/modelscope/adaseq.git
cd adaseq
pip install -r requirements.txt -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

Step 2: Copy the MAKAR folder into .../adaseq/examples/
cd MNER_code/AdaSeq
-adaseq
---|examples
-----|MAKAR
-------|twitter-10000-FMNERG.yaml
-------|twitter-10000-GMNER.yaml

Step 3: Replace the original adaseq folder with our adaseq folder
-adaseq
---|.git
---|.github
---|adaseq <-- (replace this folder with our adaseq folder)
---|docs
---|examples
---|scripts
---|tests
---|tools

Step 4: Train the model
For GMNER:
python -m scripts.train -c examples/MAKAR/twitter-10000-GMNER.yaml

For FMNERG:
python -m scripts.train -c examples/MAKAR/twitter-10000-FMNERG.yaml
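If you want to drive both benchmarks from one script, the training commands can be built programmatically. This is a hypothetical convenience wrapper, not part of the MAKAR codebase; it only assembles the argv list, which you would pass to subprocess.run from the adaseq directory to actually launch training.

```python
# Hypothetical helper: build the AdaSeq training command for each benchmark.
# Pass the returned argv to subprocess.run(...) from the adaseq directory
# to actually launch training.

CONFIGS = {
    "GMNER": "examples/MAKAR/twitter-10000-GMNER.yaml",
    "FMNERG": "examples/MAKAR/twitter-10000-FMNERG.yaml",
}

def train_command(task: str) -> list:
    """Return the argv for `python -m scripts.train` on the given task."""
    return ["python", "-m", "scripts.train", "-c", CONFIGS[task]]

print(train_command("GMNER"))
```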
⚠️ Note: Bing Search has been discontinued.
As a temporary workaround, we are using GLM-Search and ChromeDriver-based web scraping for knowledge retrieval.
A more robust long-term solution is under active investigation.
cd Search

GLM-Search (via ZhipuAI):
python zhipu_search.py
Web Scraping (Entity Names):
python web_newsearch_name.py
Web Scraping (Text Queries):
python web_newsearch_text.py
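The scraping scripts retrieve search-result pages and then extract candidate knowledge snippets from the HTML. The sketch below shows that post-processing step in isolation using only the standard library; the sample markup and the "result-title" class name are hypothetical and not tied to any specific search engine or to our scripts' actual selectors.

```python
# Minimal offline sketch of extracting result titles from scraped HTML.
# The markup and the "result-title" class are hypothetical examples.
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect text inside elements whose class attribute is 'result-title'."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if dict(attrs).get("class") == "result-title":
            self.in_title = True

    def handle_endtag(self, tag):
        self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

page = '<div class="result-title">Lionel Messi - Wikipedia</div><p>snippet</p>'
parser = TitleExtractor()
parser.feed(page)
print(parser.titles)  # ['Lionel Messi - Wikipedia']
```

In practice the real scripts drive ChromeDriver to fetch live pages before a step like this; the parsing logic stays the same once the HTML is in hand.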
Navigate to the KEA directory:
cd LLaMA-Factory
Install dependencies:
pip install -e ".[torch,metrics]"
pip install "deepspeed>=0.10.0,<=0.16.9"
Train the model:
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/easy_qwen25vl_full_sft_3k.yaml
Navigate to the ERGA directory and install in development mode:
cd EasyR1
pip install -e .
Install or upgrade required packages:
pip install -U transformers
pip install --upgrade tqdm ray
pip install transformers==4.51.3
Launch training:
bash examples/3k_qwen2_5_vl_7b_gmner_sft_grpo_easy2hard.sh
Merge model checkpoints (optional):
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/3k_qwen2_5_vl_7b_sft_grpo_GMNER_easy2hard/global_step_60/actor
💡 Tip: Ensure your environment satisfies all dependency requirements before running any scripts.
🚀 GPU support is strongly recommended for efficient training and inference.
- MAKAR-3B: Lightweight version optimized for resource-constrained environments
- MAKAR-7B: Full-capacity version with enhanced reasoning capabilities
Our implementation builds upon the open-source frameworks RIVEG and PGIM.
We sincerely thank the authors for their outstanding contributions to the community!
Additionally, our multi-stage training framework is built on top of AdaSeq, LLaMA-Factory, and EasyR1, which are powerful and flexible toolkits that greatly accelerated our development and experimentation.
📬 Contact: For questions or collaboration, please reach out via GitHub Issues or email (linxinkui@iie.ac.cn).