@yurekami

Summary

  • Integrate Xiaomi's MiMo-7B reasoning model as an alternative to Llama for self-explanation tasks
  • MiMo-7B scores 95.8% on MATH-500 and 68.2% on AIME 2024, rivaling OpenAI o1-mini
  • Add a ContinuousMiMo adapter class based on the Qwen2 architecture, with multi-token prediction (MTP) support

Changes

  • model/continuous_mimo.py - New MiMo adapter with continuous token support (a rough sketch follows this list)
  • model/utils.py - Register MiMo in MODEL_TYPE_TO_VANILLA_MODEL_MAPPING
  • model/__init__.py - Export ContinuousMiMo
  • Config files for feature descriptions, activation patching, and input ablation
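
The adapter code itself isn't included in this description, so here is a minimal sketch of how the two pieces could fit together. Only the names ContinuousMiMo and MODEL_TYPE_TO_VANILLA_MODEL_MAPPING and the Qwen2 lineage come from this PR; the Qwen2ForCausalLM base class, the continuous_embeds/continuous_positions keywords, and the mapping contents are assumptions for illustration, not the actual implementation.

```python
# Sketch only -- not the PR's code. Assumes "continuous token support" means
# splicing precomputed embedding vectors into the input sequence at marked
# positions; the keyword arguments below are hypothetical.
from transformers import Qwen2ForCausalLM  # MiMo-7B reuses the Qwen2 block


class ContinuousMiMo(Qwen2ForCausalLM):
    def forward(self, input_ids=None, inputs_embeds=None,
                continuous_embeds=None, continuous_positions=None, **kwargs):
        if continuous_embeds is not None:
            # Embed the discrete tokens first, then overwrite the placeholder
            # positions with the supplied continuous vectors.
            if inputs_embeds is None:
                inputs_embeds = self.get_input_embeddings()(input_ids)
                input_ids = None
            inputs_embeds = inputs_embeds.clone()
            inputs_embeds[:, continuous_positions] = continuous_embeds.to(inputs_embeds.dtype)
        return super().forward(input_ids=input_ids, inputs_embeds=inputs_embeds, **kwargs)


# model/utils.py -- registration; the mapping name is from the PR, the key
# and the surrounding entries are illustrative.
MODEL_TYPE_TO_VANILLA_MODEL_MAPPING = {
    # ... existing entries, e.g. "llama": LlamaForCausalLM ...
    "mimo": Qwen2ForCausalLM,
}
```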

Supported Models

  • XiaomiMiMo/MiMo-7B-Base
  • XiaomiMiMo/MiMo-7B-RL
  • XiaomiMiMo/MiMo-7B-RL-0530 (latest)
  • XiaomiMiMo/MiMo-7B-SFT
  • XiaomiMiMo/MiMo-7B-RL-Zero

Test plan

  • Syntax verification passed for all new/modified files
  • Run python train.py --config config/feature_descriptions/mimo_131k.yaml --debug
  • Verify the model loads with trust_remote_code=True (see the sketch below)
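
For the load check, something along these lines should work; trust_remote_code=True is needed because the MiMo checkpoints ship custom modeling code (e.g. for the MTP layers). The model id shown is just one of the variants listed above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"  # any variant from the list above
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
print(type(model).__name__)  # should print the MiMo model class without errors
```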

🤖 Generated with Claude Code
