Add Matryoshka Representation Learning (MRL) support and tests by bbkx226 · Pull Request #39 · codefuse-ai/CodeFuse-Embeddings

bbkx226 · 2025-12-14T07:18:08Z

Resolves #8

This pull request introduces Matryoshka Representation Learning (MRL) to the codebase, enabling models to produce multi-granular embeddings that can be flexibly truncated to various dimensions for downstream tasks. It adds configuration options, utility functions, and a comprehensive test suite for MRL, and integrates the MRL loss into both training and validation workflows.

Matryoshka Representation Learning (MRL) Integration:

Added support for MRL in the training pipeline, allowing a single model to serve multiple embedding dimensions by adding an auxiliary MRL loss over truncated embeddings. This is configurable via the use_mrl, mrl_dimensions, and mrl_temperature parameters in the config and arguments.
Implemented the matryoshka_loss function in utils.py to compute the MRL loss, which encourages high-quality embeddings at multiple dimensions.
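
To illustrate the idea of a loss over truncated embeddings, here is a minimal sketch in plain Python. It is not the matryoshka_loss from utils.py (which presumably operates on tensors); the name matryoshka_loss_sketch, the in-batch contrastive formulation, the equal weighting across dimensions, and the default temperature are all assumptions made for illustration.

```python
import math


def _normalize(vec):
    """L2-normalize a vector given as a list of floats."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _info_nce(queries, docs, temperature):
    """In-batch contrastive loss where docs[i] is the positive for queries[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, d)) / temperature for d in docs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)


def matryoshka_loss_sketch(queries, docs, mrl_dimensions, mrl_temperature=0.05):
    """Average the contrastive loss over each truncated prefix length.

    Assumption: every dimension contributes equally; the PR may weight
    or combine the per-dimension losses differently.
    """
    losses = []
    for dim in mrl_dimensions:
        q_trunc = [_normalize(q[:dim]) for q in queries]
        d_trunc = [_normalize(d[:dim]) for d in docs]
        losses.append(_info_nce(q_trunc, d_trunc, mrl_temperature))
    return sum(losses) / len(losses)


# Toy batch: each query matches the document at the same index.
queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
docs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
aligned = matryoshka_loss_sketch(queries, docs, mrl_dimensions=[2, 4])
swapped = matryoshka_loss_sketch(queries, docs[::-1], mrl_dimensions=[2, 4])
```

In this toy batch the aligned pairing gives a loss close to zero and the swapped pairing gives a much larger one, which is the signal that pushes every prefix length toward being a usable embedding on its own.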

Configuration and Argument Enhancements:

Added MRL-specific parameters (use_mrl, mrl_dimensions, mrl_temperature) to the Args class and provided an example configuration file, config_mrl.json.
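
As a rough picture of how these three parameters might be declared and loaded, the sketch below uses a standalone dataclass. MRLArgsSketch is a hypothetical stand-in for the relevant part of the Args class, and the default values, validation rules, and JSON fragment are illustrative assumptions; the actual contents of config_mrl.json are not reproduced here.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class MRLArgsSketch:
    """Hypothetical stand-in for the MRL fields on the Args class."""

    use_mrl: bool = False
    # Illustrative defaults only; the PR's real defaults are not shown.
    mrl_dimensions: List[int] = field(default_factory=lambda: [64, 128, 256])
    mrl_temperature: float = 0.05

    def __post_init__(self):
        # Validation is an assumption of this sketch, applied only when MRL is on.
        if self.use_mrl:
            if not self.mrl_dimensions or any(d <= 0 for d in self.mrl_dimensions):
                raise ValueError("mrl_dimensions must be non-empty positive integers")
            if self.mrl_temperature <= 0:
                raise ValueError("mrl_temperature must be positive")


# Illustrative JSON fragment using the three parameter names from the PR.
example_json = '{"use_mrl": true, "mrl_dimensions": [128, 256, 512], "mrl_temperature": 0.05}'
args = MRLArgsSketch(**json.loads(example_json))
```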

Training and Validation Pipeline Updates:

Integrated the MRL loss into the inbatch_loss and hard_loss functions, and updated the training (accelerate_train) and validation (validate) routines to use the new MRL parameters when enabled.
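
The gating described above can be sketched as follows. The helper total_loss_sketch is hypothetical, and the unweighted sum of the base loss and the auxiliary term is an assumption; the PR may combine them differently inside inbatch_loss and hard_loss.

```python
from typing import Callable, Optional


def total_loss_sketch(
    base_loss: float,
    use_mrl: bool,
    mrl_loss_fn: Optional[Callable[[], float]] = None,
) -> float:
    """Return the base loss, plus an auxiliary MRL term when use_mrl is set.

    mrl_loss_fn is called lazily so the auxiliary loss is only computed
    when MRL is enabled.
    """
    if not use_mrl:
        return base_loss
    if mrl_loss_fn is None:
        raise ValueError("use_mrl is set but no MRL loss function was provided")
    # Assumption: a plain, unweighted sum of the two terms.
    return base_loss + mrl_loss_fn()


without_mrl = total_loss_sketch(0.8, use_mrl=False)
with_mrl = total_loss_sketch(0.8, use_mrl=True, mrl_loss_fn=lambda: 0.3)
```

With use_mrl disabled the existing loss passes through unchanged, so existing configurations would behave as before under this sketch.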

Testing and Documentation:

Added a comprehensive test suite in test_mrl.py to validate MRL loss computation, integration with existing losses, and embedding truncation behavior.
Updated the README.md with an explanation of MRL, configuration steps, and quick validation instructions.
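
For a sense of what a check on embedding truncation behavior could look like, here is a small dependency-free sketch. It is not taken from test_mrl.py; truncate_and_normalize and test_truncation_sketch are hypothetical names, and re-normalizing after truncation is an assumption about how truncated embeddings are meant to be used.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and the embedding length")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return list(prefix)  # leave an all-zero prefix untouched
    return [x / norm for x in prefix]


def test_truncation_sketch():
    full = [3.0, 4.0, 12.0, 0.5]
    for dim in (1, 2, 3, 4):
        out = truncate_and_normalize(full, dim)
        # The truncated embedding has the requested size and unit norm.
        assert len(out) == dim
        assert math.isclose(math.sqrt(sum(x * x for x in out)), 1.0, rel_tol=1e-9)
    # The direction of the prefix is preserved: [3, 4] scales to [0.6, 0.8].
    two = truncate_and_normalize(full, 2)
    assert math.isclose(two[0], 0.6) and math.isclose(two[1], 0.8)


test_truncation_sketch()
```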

bbkx226 · 2025-12-14T07:18:22Z

#8

Add Matryoshka Representation Learning (MRL) support and tests

ab3103d

bbkx226 wants to merge 1 commit into codefuse-ai:main from bbkx226:matryoshka_support
