
Remove DPO (Direct Preference Optimization) feature#3064

Open
ecnal-cienet wants to merge 1 commit into main from feat/Remove-DPO-features

Conversation

@ecnal-cienet
Collaborator

Description

Summary

  • Remove all DPO (Direct Preference Optimization) related features from the codebase
  • This includes the DPO loss implementation, configuration parameters, data pipeline support, metrics, and tests

Changes

Deleted Files

  • src/maxtext/trainers/post_train/dpo/dpo_utils.py - core DPO loss implementation (sketched after this list)
  • src/MaxText/configs/dpo.yml - DPO configuration file
  • tests/end_to_end/tpu/test_dpo.sh - DPO end-to-end test
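
For context, the deleted dpo_utils.py held the DPO objective itself. Below is a minimal, self-contained sketch of the standard DPO loss with the beta and label-smoothing knobs the config exposed; the function and argument names are illustrative, not the exact MaxText code that was removed.

```python
# Illustrative sketch of the standard DPO objective, not the deleted code;
# all function and argument names here are hypothetical.
import jax.numpy as jnp
from jax.nn import log_sigmoid


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.1, label_smoothing=0.0):
  """DPO loss over per-example summed log-probabilities.

  Each argument is a [batch] array of sequence log-probs under either the
  policy being trained or the frozen reference model.
  """
  # beta scales the policy-vs-reference log-ratio (the "implicit reward").
  chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
  rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
  logits = chosen_rewards - rejected_rewards
  # Label smoothing interpolates between preferring chosen and rejected.
  losses = (-(1.0 - label_smoothing) * log_sigmoid(logits)
            - label_smoothing * log_sigmoid(-logits))
  return jnp.mean(losses)
```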

Modified Files

Training Pipeline:

  • src/MaxText/train.py - removed DPO loss integration, reference param handling, and DPO metrics (see the sketch after this list)
  • src/maxtext/utils/train_utils.py - removed DPO state restoration logic
  • src/MaxText/gradient_accumulation.py - removed extra_dpo_args parameter
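
The plumbing that came out of the training pipeline looks roughly like the hypothetical sketch below; names and signatures are illustrative and do not match MaxText's actual API. The DPO branch needed a second forward pass under frozen reference parameters, which is what extra_dpo_args threaded through gradient accumulation; the surviving path is plain next-token cross-entropy.

```python
# Hypothetical sketch of the removed train-step branching; names and
# signatures are illustrative, not MaxText's actual code.
import jax
import jax.numpy as jnp


def cross_entropy(logits, targets):
  log_probs = jax.nn.log_softmax(logits)
  nll = -jnp.take_along_axis(log_probs, targets[..., None], axis=-1)
  return jnp.mean(nll)


def loss_fn(params, batch, apply_fn, ref_params=None, use_dpo=False):
  logits = apply_fn(params, batch["inputs"])
  if use_dpo:
    # Deleted branch: a second forward pass under frozen reference params
    # (jax.lax.stop_gradient around apply_fn(ref_params, ...)), feeding a
    # pairwise DPO loss over chosen/rejected pairs plus reward metrics.
    raise NotImplementedError("DPO path removed by this PR")
  # Surviving path: plain next-token cross-entropy.
  return cross_entropy(logits, batch["targets"])
```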

Data Pipelines:

  • src/MaxText/input_pipeline/_grain_data_processing.py - removed dpo_preprocessing_pipeline and DPO branches (see the sketch after this list)
  • src/MaxText/input_pipeline/_tfds_data_processing.py - removed use_dpo parameter
  • src/MaxText/input_pipeline/_hf_data_processing.py - removed use_dpo parameter
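
The removed dpo_preprocessing_pipeline and the use_dpo flags existed to turn each example into a prompt plus a chosen/rejected response pair. A hypothetical sketch of that kind of packing (field names and padding scheme are illustrative, not the removed code):

```python
# Hypothetical sketch of preference-pair packing; field names are illustrative.
import numpy as np


def to_dpo_example(prompt_ids, chosen_ids, rejected_ids, max_len, pad_id=0):
  """Pack one preference pair into fixed-length chosen/rejected sequences."""
  def pad(ids):
    ids = (prompt_ids + ids)[:max_len]
    return np.array(ids + [pad_id] * (max_len - len(ids)), dtype=np.int32)

  return {
      "chosen": pad(chosen_ids),
      "rejected": pad(rejected_ids),
      # Mask out the prompt so the loss is computed on response tokens only.
      "response_mask": np.array(
          [0] * min(len(prompt_ids), max_len)
          + [1] * max(0, max_len - len(prompt_ids)), dtype=np.int32),
  }


example = to_dpo_example([1, 2, 3], [4, 5], [6], max_len=8)
```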

Configuration:

  • src/MaxText/configs/base.yml - removed use_dpo, dpo_label_smoothing, dpo_beta parameters
  • src/MaxText/configs/types.py - removed DPO fields from FineTuning class (see the sketch after this list)
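
For the record, the three options that disappear from the config surface behave roughly as below; this is a hypothetical sketch of the FineTuning fields, with defaults and comments that are illustrative rather than copied from types.py.

```python
# Hypothetical sketch of the dropped config fields; defaults are illustrative.
from dataclasses import dataclass


@dataclass
class FineTuning:
  # ... unrelated fine-tuning fields ...
  use_dpo: bool = False             # removed: enabled the DPO loss and data path
  dpo_beta: float = 0.1             # removed: scale on the policy/reference log-ratio
  dpo_label_smoothing: float = 0.0  # removed: smoothing between chosen and rejected labels
```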

Utilities:

  • src/maxtext/common/metric_logger.py - removed DPO reward accuracy metrics (see the sketch after this list)
  • src/maxtext/utils/maxtext_utils.py - removed DPO FLOPs calculation
  • src/MaxText/__init__.py - removed dpo_utils export
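
The reward-accuracy metric dropped from metric_logger.py is the usual pairwise ranking accuracy: the fraction of preference pairs where the policy's implicit reward favors the chosen response. A hypothetical sketch (names are illustrative):

```python
# Hypothetical sketch of the dropped reward-accuracy metric.
import jax.numpy as jnp


def dpo_reward_accuracy(chosen_rewards, rejected_rewards):
  """Fraction of preference pairs ranked correctly (both args are [batch])."""
  return jnp.mean((chosen_rewards > rejected_rewards).astype(jnp.float32))
```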

Tests:

  • tests/unit/configs_test.py - removed dpo.yml from config validation tests
  • tests/unit/sft_data_processing_test.py - removed use_dpo argument

Other:

  • src/MaxText/experimental/rl/grpo_trainer.py - removed errant use_dpo check

Tests

Verified synthetic data training runs successfully with:

python3 src/MaxText/train.py src/MaxText/configs/base.yml run_name=nnx-train-test base_output_directory=gs://wanglance-maxtext/dpo-removal-test/after/gemma2-2b model_name=gemma2-2b dataset_type=synthetic steps=5

Log: view (no DPO-related params at all)

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@ecnal-cienet force-pushed the feat/Remove-DPO-features branch from f13c153 to 67c601e on February 2, 2026 17:41
@RissyRan
Collaborator

RissyRan commented Feb 2, 2026

Is this removal indicating the integration with Tunix DPO?

@ecnal-cienet
Collaborator Author

Hi Ranran,
No, this removal is not related to Tunix DPO integration. Alex identified that the DPO feature is no longer actively used in MaxText, so Xibin suggested removing it to simplify the codebase.

@RissyRan
Collaborator

RissyRan commented Feb 3, 2026

Thanks!

Hi @shralex, I am wondering whether we should keep this DPO feature in MaxText or integrate with the Tunix version afterwards. @gagika and I have been conducting repro work for Olmo3, and DPO is one step in its post-training.

https://screenshot.googleplex.com/7WY6SAFFXgT3Wvz
