
🌐 MAKAR: a Multi-Agent framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition

MAKAR is a multi-agent framework for Grounded Multimodal Named Entity Recognition (GMNER), the task of extracting textual entities, their types, and the corresponding visual regions from image-text pairs. For details, see our EMNLP 2025 paper: MAKAR: A Multi-Agent Framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition.
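To make the task concrete, here is a small, purely illustrative input-output example (the field names below are our own, not the repository's actual data schema):

    # Hypothetical GMNER sample: each textual entity gets a type and, when the
    # entity is groundable, a bounding box [x1, y1, x2, y2] in the image.
    sample = {
        "text": "Kobe Bryant at Staples Center",
        "image": "tweet_12345.jpg",
        "entities": [
            {"span": "Kobe Bryant", "type": "PER", "region": [120, 40, 380, 460]},
            {"span": "Staples Center", "type": "LOC", "region": None},  # ungroundable
        ],
    }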


📰 News 🔥

🎉 [August 2025] We are thrilled to announce that our paper,
"MAKAR: A Multi-Agent Framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition",
has been accepted by EMNLP 2025! 🎉

We are currently finalizing the camera-ready version and meticulously organizing our experimental code.
✅ Code and datasets will be released publicly very soon!
🔔 Stay tuned for updates!


🛠️ Training the MAKAR Model

Follow the instructions below to set up and train the MAKAR model components.


🧠 Knowledge Enhancement Agent (KEA)

MAKAR's KEA is built on AdaSeq, which requires Python >= 3.7 and PyTorch >= 1.8.

Step 1: Installation

git clone https://github.com/modelscope/adaseq.git
cd adaseq
pip install -r requirements.txt -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
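Before training, you can sanity-check the environment against these version requirements (a minimal sketch; nothing here is MAKAR-specific):

    # Verify Python >= 3.7 and PyTorch >= 1.8, as required by AdaSeq.
    import sys
    import torch

    assert sys.version_info >= (3, 7), f"Python {sys.version.split()[0]} is too old"
    torch_major, torch_minor = (int(x) for x in torch.__version__.split(".")[:2])
    assert (torch_major, torch_minor) >= (1, 8), f"PyTorch {torch.__version__} is too old"
    print("Environment OK: Python", sys.version.split()[0], "| PyTorch", torch.__version__)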

Step 2: Copy the MAKAR folder into .../adaseq/examples/

The MAKAR configs ship with this repository:

cd MNER_code/AdaSeq

After copying, the examples directory should look like:

-adaseq
---|examples
-----|MAKAR
-------|twitter-10000-FMNERG.yaml
-------|twitter-10000-GMNER.yaml

Step 3: Replace the original adaseq folder with our adaseq folder

-adaseq
---|.git
---|.github
---|adaseq   <-- (replace this folder with our adaseq)
---|docs
---|examples
---|scripts
---|tests
---|tools

Step 4: Train the model

  • For GMNER:

    python -m scripts.train -c examples/MAKAR/twitter-10000-GMNER.yaml
  • For FMNERG:

    python -m scripts.train -c examples/MAKAR/twitter-10000-FMNERG.yaml
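Both runs can also be launched from a single helper script, e.g. (a trivial wrapper, not part of the repository; run from the adaseq root):

    # Run both KEA trainings sequentially.
    import subprocess

    for cfg in ("examples/MAKAR/twitter-10000-GMNER.yaml",
                "examples/MAKAR/twitter-10000-FMNERG.yaml"):
        subprocess.run(["python", "-m", "scripts.train", "-c", cfg], check=True)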

πŸ” Entity Correction Agent (ECA)

⚠️ Note: The Bing Search API has been discontinued.
As a temporary workaround, we are using GLM-Search and ChromeDriver-based web scraping for knowledge retrieval.
A more robust long-term solution is under active investigation.

cd Search
  • GLM-Search (via ZhipuAI):

    python zhipu_search.py
  • Web Scraping (Entity Names):

    python web_newsearch_name.py
  • Web Scraping (Text Queries):

    python web_newsearch_text.py
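For reference, the ChromeDriver-based scraping follows the usual Selenium pattern sketched below; the search URL and CSS selector are placeholders, not the actual values used in web_newsearch_name.py:

    # Minimal Selenium/ChromeDriver scraping sketch (placeholder URL and selector).
    from urllib.parse import quote
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://example.com/search?q=" + quote("Kobe Bryant"))
        # Collect result snippets; the selector below is purely illustrative.
        snippets = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".result")]
        print(snippets[:3])
    finally:
        driver.quit()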

🤖 Entity Reasoning Grounding Agent (ERGA)

SFT

  1. Navigate to the LLaMA-Factory directory:

    cd LLaMA-Factory
  2. Install dependencies:

     pip install -e ".[torch,metrics]"
    
     pip install "deepspeed>=0.10.0,<=0.16.9"
  3. Train the model:

     FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/easy_qwen25vl_full_sft_3k.yaml
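FORCE_TORCHRUN=1 is just an environment variable passed to llamafactory-cli (it forces a torchrun-based distributed launch); the equivalent call from Python would look like this sketch:

    # Launch LLaMA-Factory SFT with FORCE_TORCHRUN set, mirroring the shell command.
    import os
    import subprocess

    env = {**os.environ, "FORCE_TORCHRUN": "1"}
    subprocess.run(
        ["llamafactory-cli", "train", "examples/train_full/easy_qwen25vl_full_sft_3k.yaml"],
        env=env,
        check=True,
    )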

GRPO

  1. Navigate to the ERGA directory and install in development mode:

    cd EasyR1
    pip install -e .
  2. Install or upgrade required packages (transformers is pinned to 4.51.3, which supersedes a plain upgrade):

    pip install --upgrade tqdm ray
    pip install transformers==4.51.3
  3. Launch training:

    bash examples/3k_qwen2_5_vl_7b_gmner_sft_grpo_easy2hard.sh
  4. Merge model checkpoints (optional):

    python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/3k_qwen2_5_vl_7b_sft_grpo_GMNER_easy2hard/global_step_60/actor
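After merging, the checkpoint should load as a standard Hugging Face model. A quick load check (a sketch; the output directory below is an assumption about where model_merger.py writes, so point it at your actual merged path):

    # Sanity-check that the merged GRPO checkpoint loads with transformers.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    MERGED_DIR = "checkpoints/easy_r1/3k_qwen2_5_vl_7b_sft_grpo_GMNER_easy2hard/global_step_60/actor/huggingface"  # assumed

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MERGED_DIR, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MERGED_DIR)
    print(model.config.model_type)  # expect "qwen2_5_vl"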

💡 Tip: Ensure your environment satisfies all dependency requirements before running any scripts.
🚀 GPU support is strongly recommended for efficient training and inference.
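A quick check to confirm GPU visibility before launching anything (sketch):

    # Check CUDA availability and the first visible device.
    import torch

    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))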

🔗 Pre-trained Models

  • MAKAR-3B
    Lightweight version optimized for resource-constrained environments
  • MAKAR-7B
    Full-capacity version with enhanced reasoning capabilities

πŸ™ Acknowledgments

Our implementation builds upon the open-source frameworks RIVEG and PGIM.
We sincerely thank the authors for their outstanding contributions to the community!

Additionally, our multi-stage training framework is built on top of AdaSeq, LLaMA-Factory, and EasyR1, which are powerful and flexible toolkits that greatly accelerated our development and experimentation.


📬 Contact: For questions or collaboration, please reach out via GitHub Issues or email (linxinkui@iie.ac.cn).
