🎉 MAKAR: A Multi-Agent Framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition
MAKAR is a multi-agent framework with knowledge-augmented reasoning for Grounded Multimodal Named Entity Recognition (GMNER), the task of extracting textual entities, their types, and their corresponding visual regions from image-text pairs. For details, see our EMNLP 2025 paper: MAKAR: A Multi-Agent Framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition.
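To make the task concrete, a GMNER system maps an image-text pair to (entity, type, region) triples, where the region is a bounding box or None for entities not visible in the image. The sketch below is purely illustrative; the field names and example values are hypothetical and do not reflect the exact schema of the Twitter-GMNER/FMNERG datasets.

```python
# Illustrative GMNER input/output structure (field names are hypothetical,
# not the exact schema of the Twitter-GMNER/FMNERG datasets).

sample = {
    "text": "Lionel Messi joins Inter Miami",
    "image": "tweet_12345.jpg",
}

# For each named entity, a GMNER system predicts its text span, its entity
# type, and the grounded visual region as (x1, y1, x2, y2), or None when
# the entity is not groundable in the image.
prediction = [
    {"entity": "Lionel Messi", "type": "PER", "region": (45, 30, 210, 330)},
    {"entity": "Inter Miami", "type": "ORG", "region": None},  # not groundable
]

for p in prediction:
    grounded = "grounded" if p["region"] is not None else "ungrounded"
    print(f'{p["entity"]} ({p["type"]}): {grounded}')
```

FMNERG follows the same structure but uses a fine-grained entity type inventory instead of coarse types such as PER/ORG.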
🎉 [August 2025] We are thrilled to announce that our paper,
"MAKAR: A Multi-Agent Framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition",
has been accepted by EMNLP 2025! 🎉
We are currently finalizing the camera-ready version and meticulously organizing our experimental code.
Code and datasets will be released publicly very soon!
Stay tuned for updates!
Follow the instructions below to set up and train the MAKAR model components.
MAKAR is built on AdaSeq, which requires Python >= 3.7 and PyTorch >= 1.8.
Step 1: Installation
git clone https://github.com/modelscope/adaseq.git
cd adaseq
pip install -r requirements.txt -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

Step 2: Copy the MAKAR folder into .../adaseq/examples/
cd MNER_code/AdaSeq
-adaseq
---|examples
-----|MAKAR
-------|twitter-10000-FMNERG.yaml
-------|twitter-10000-GMNER.yaml

Step 3: Replace the original adaseq folder with our adaseq folder
-adaseq
---|.git
---|.github
---|adaseq <-- (replace this folder with our adaseq folder)
---|docs
---|examples
---|scripts
---|tests
---|tools

Step 4: Train the model
For GMNER:
python -m scripts.train -c examples/MAKAR/twitter-10000-GMNER.yaml

For FMNERG:
python -m scripts.train -c examples/MAKAR/twitter-10000-FMNERG.yaml
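If you want to drive both benchmarks from one script, the training commands can be built programmatically. This is a hypothetical convenience wrapper, not part of the MAKAR codebase; it only assembles the argv list, which you would pass to subprocess.run from the adaseq directory to actually launch training.

```python
# Hypothetical helper: build the AdaSeq training command for each benchmark.
# Pass the returned argv to subprocess.run(...) from the adaseq directory
# to actually launch training.

CONFIGS = {
    "GMNER": "examples/MAKAR/twitter-10000-GMNER.yaml",
    "FMNERG": "examples/MAKAR/twitter-10000-FMNERG.yaml",
}

def train_command(task: str) -> list:
    """Return the argv for `python -m scripts.train` on the given task."""
    return ["python", "-m", "scripts.train", "-c", CONFIGS[task]]

print(train_command("GMNER"))
```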
⚠️ Note: Bing Search has been discontinued.
As a temporary workaround, we are using GLM-Search and ChromeDriver-based web scraping for knowledge retrieval.
A more robust long-term solution is under active investigation.
cd Search

GLM-Search (via ZhipuAI):
python zhipu_search.py
Web Scraping (Entity Names):
python web_newsearch_name.py
Web Scraping (Text Queries):
python web_newsearch_text.py
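The scraping scripts retrieve search-result pages and then extract candidate knowledge snippets from the HTML. The sketch below shows that post-processing step in isolation using only the standard library; the sample markup and the "result-title" class name are hypothetical and not tied to any specific search engine or to our scripts' actual selectors.

```python
# Minimal offline sketch of extracting result titles from scraped HTML.
# The markup and the "result-title" class are hypothetical examples.
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect text inside elements whose class attribute is 'result-title'."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if dict(attrs).get("class") == "result-title":
            self.in_title = True

    def handle_endtag(self, tag):
        self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

page = '<div class="result-title">Lionel Messi - Wikipedia</div><p>snippet</p>'
parser = TitleExtractor()
parser.feed(page)
print(parser.titles)  # ['Lionel Messi - Wikipedia']
```

In practice the real scripts drive ChromeDriver to fetch live pages before a step like this; the parsing logic stays the same once the HTML is in hand.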
Navigate to the KEA directory:
cd LLaMA-Factory
Install dependencies:
pip install -e ".[torch,metrics]"
pip install "deepspeed>=0.10.0,<=0.16.9"
Train the model:
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/easy_qwen25vl_full_sft_3k.yaml
Navigate to the ERGA directory and install in development mode:
cd EasyR1
pip install -e .
Install or upgrade required packages:
pip install -U transformers
pip install --upgrade tqdm ray
pip install transformers==4.51.3
Launch training:
bash examples/3k_qwen2_5_vl_7b_gmner_sft_grpo_easy2hard.sh
Merge model checkpoints (optional):
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/3k_qwen2_5_vl_7b_sft_grpo_GMNER_easy2hard/global_step_60/actor
💡 Tip: Ensure your environment satisfies all dependency requirements before running any scripts.
🚀 GPU support is strongly recommended for efficient training and inference.
- MAKAR-3B: Lightweight version optimized for resource-constrained environments
- MAKAR-7B: Full-capacity version with enhanced reasoning capabilities
Our implementation builds upon the open-source frameworks RIVEG and PGIM.
We sincerely thank the authors for their outstanding contributions to the community!
Additionally, our multi-stage training framework is built on top of AdaSeq, LLaMA-Factory, and EasyR1, which are powerful and flexible toolkits that greatly accelerated our development and experimentation.
📬 Contact: For questions or collaboration, please reach out via GitHub Issues or email (linxinkui@iie.ac.cn).