
Remove DPO (Direct Preference Optimization) feature#3064

Open
ecnal-cienet wants to merge 1 commit into main from feat/Remove-DPO-features

Conversation

@ecnal-cienet
Collaborator

Description

Summary

  • Remove all DPO (Direct Preference Optimization) related features from the codebase
  • This includes the DPO loss implementation, configuration parameters, data pipeline support, metrics, and tests

Changes

Deleted Files

  • src/maxtext/trainers/post_train/dpo/dpo_utils.py - core DPO loss implementation (sketched after this list)
  • src/MaxText/configs/dpo.yml - DPO configuration file
  • tests/end_to_end/tpu/test_dpo.sh - DPO end-to-end test
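
For context, the deleted dpo_utils.py held the DPO objective itself. Below is a minimal, self-contained sketch of the standard DPO loss with the beta and label-smoothing knobs the config exposed; the function and argument names are illustrative, not the exact MaxText code that was removed.

```python
# Illustrative sketch of the standard DPO objective, not the deleted code;
# all function and argument names here are hypothetical.
import jax.numpy as jnp
from jax.nn import log_sigmoid


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.1, label_smoothing=0.0):
  """DPO loss over per-example summed log-probabilities.

  Each argument is a [batch] array of sequence log-probs under either the
  policy being trained or the frozen reference model.
  """
  # beta scales the policy-vs-reference log-ratio (the "implicit reward").
  chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
  rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
  logits = chosen_rewards - rejected_rewards
  # Label smoothing interpolates between preferring chosen and rejected.
  losses = (-(1.0 - label_smoothing) * log_sigmoid(logits)
            - label_smoothing * log_sigmoid(-logits))
  return jnp.mean(losses)
```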

Modified Files

Training Pipeline:

  • src/MaxText/train.py - removed DPO loss integration, reference param handling, and DPO metrics (see the sketch after this list)
  • src/maxtext/utils/train_utils.py - removed DPO state restoration logic
  • src/MaxText/gradient_accumulation.py - removed extra_dpo_args parameter
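
The plumbing that came out of the training pipeline looks roughly like the hypothetical sketch below; names and signatures are illustrative and do not match MaxText's actual API. The DPO branch needed a second forward pass under frozen reference parameters, which is what extra_dpo_args threaded through gradient accumulation; the surviving path is plain next-token cross-entropy.

```python
# Hypothetical sketch of the removed train-step branching; names and
# signatures are illustrative, not MaxText's actual code.
import jax
import jax.numpy as jnp


def cross_entropy(logits, targets):
  log_probs = jax.nn.log_softmax(logits)
  nll = -jnp.take_along_axis(log_probs, targets[..., None], axis=-1)
  return jnp.mean(nll)


def loss_fn(params, batch, apply_fn, ref_params=None, use_dpo=False):
  logits = apply_fn(params, batch["inputs"])
  if use_dpo:
    # Deleted branch: a second forward pass under frozen reference params
    # (jax.lax.stop_gradient around apply_fn(ref_params, ...)), feeding a
    # pairwise DPO loss over chosen/rejected pairs plus reward metrics.
    raise NotImplementedError("DPO path removed by this PR")
  # Surviving path: plain next-token cross-entropy.
  return cross_entropy(logits, batch["targets"])
```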

Data Pipelines:

  • src/MaxText/input_pipeline/_grain_data_processing.py - removed dpo_preprocessing_pipeline and DPO branches (see the sketch after this list)
  • src/MaxText/input_pipeline/_tfds_data_processing.py - removed use_dpo parameter
  • src/MaxText/input_pipeline/_hf_data_processing.py - removed use_dpo parameter
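
The removed dpo_preprocessing_pipeline and the use_dpo flags existed to turn each example into a prompt plus a chosen/rejected response pair. A hypothetical sketch of that kind of packing (field names and padding scheme are illustrative, not the removed code):

```python
# Hypothetical sketch of preference-pair packing; field names are illustrative.
import numpy as np


def to_dpo_example(prompt_ids, chosen_ids, rejected_ids, max_len, pad_id=0):
  """Pack one preference pair into fixed-length chosen/rejected sequences."""
  def pad(ids):
    ids = (prompt_ids + ids)[:max_len]
    return np.array(ids + [pad_id] * (max_len - len(ids)), dtype=np.int32)

  return {
      "chosen": pad(chosen_ids),
      "rejected": pad(rejected_ids),
      # Mask out the prompt so the loss is computed on response tokens only.
      "response_mask": np.array(
          [0] * min(len(prompt_ids), max_len)
          + [1] * max(0, max_len - len(prompt_ids)), dtype=np.int32),
  }


example = to_dpo_example([1, 2, 3], [4, 5], [6], max_len=8)
```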

Configuration:

  • src/MaxText/configs/base.yml - removed use_dpo, dpo_label_smoothing, dpo_beta parameters
  • src/MaxText/configs/types.py - removed DPO fields from FineTuning class (see the sketch after this list)
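
For the record, the three options that disappear from the config surface behave roughly as below; this is a hypothetical sketch of the FineTuning fields, with defaults and comments that are illustrative rather than copied from types.py.

```python
# Hypothetical sketch of the dropped config fields; defaults are illustrative.
from dataclasses import dataclass


@dataclass
class FineTuning:
  # ... unrelated fine-tuning fields ...
  use_dpo: bool = False             # removed: enabled the DPO loss and data path
  dpo_beta: float = 0.1             # removed: scale on the policy/reference log-ratio
  dpo_label_smoothing: float = 0.0  # removed: smoothing between chosen and rejected labels
```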

Utilities:

  • src/maxtext/common/metric_logger.py - removed DPO reward accuracy metrics (see the sketch after this list)
  • src/maxtext/utils/maxtext_utils.py - removed DPO FLOPs calculation
  • src/MaxText/__init__.py - removed dpo_utils export
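
The reward-accuracy metric dropped from metric_logger.py is the usual pairwise ranking accuracy: the fraction of preference pairs where the policy's implicit reward favors the chosen response. A hypothetical sketch (names are illustrative):

```python
# Hypothetical sketch of the dropped reward-accuracy metric.
import jax.numpy as jnp


def dpo_reward_accuracy(chosen_rewards, rejected_rewards):
  """Fraction of preference pairs ranked correctly (both args are [batch])."""
  return jnp.mean((chosen_rewards > rejected_rewards).astype(jnp.float32))
```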

Tests:

  • tests/unit/configs_test.py - removed dpo.yml from config validation tests
  • tests/unit/sft_data_processing_test.py - removed use_dpo argument

Other:

  • src/MaxText/experimental/rl/grpo_trainer.py - removed errant use_dpo check

Tests

Verified synthetic data training runs successfully with:

python3 src/MaxText/train.py src/MaxText/configs/base.yml run_name=nnx-train-test base_output_directory=gs://wanglance-maxtext/dpo-removal-test/after/gemma2-2b model_name=gemma2-2b dataset_type=synthetic steps=5

Log: view (no DPO-related params at all)

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@ecnal-cienet force-pushed the feat/Remove-DPO-features branch from f13c153 to 67c601e on February 2, 2026 17:41
@RissyRan
Collaborator

RissyRan commented Feb 2, 2026

Is this removal indicating the integration with Tunix DPO?

@ecnal-cienet
Collaborator Author

Hi Ranran,
No, this removal is not related to Tunix DPO integration. Alex identified that the DPO feature is no longer actively used in MaxText, so Xibin suggested removing it to simplify the codebase.

@RissyRan
Collaborator

RissyRan commented Feb 3, 2026

Thanks!

Hi @shralex, I am wondering whether we should keep this DPO feature in MaxText or integrate with the Tunix version afterwards. @gagika and I have been conducting repro work for Olmo3, and DPO is one step in its post-training.

https://screenshot.googleplex.com/7WY6SAFFXgT3Wvz
