[Do Not Merge] Optimizations on Qwen3-Next GatedDeltaNet w/ Kernel & XProf Agent by Rohan-Bierneni · Pull Request #3077 · AI-Hypercomputer/maxtext

Rohan-Bierneni · 2026-02-04T18:08:22Z

Description

Using suggestions from the Kernel & XProf Agent, this PR tries to improve the Gated Delta Net (GDN) implementation in qwen3.py. We test the changes with the benchmarking script added in this PR. The script compares the baseline GDN implementation against the optimized version in qwen3.py on four measures: forward pass, backward pass, overall train step, and memory consumption. This let us iterate on changes quickly.
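A comparison of this kind can be sketched as below. This is a minimal illustration, not the script from this PR: `baseline_fn` and `optimized_fn` are hypothetical stand-ins for the two GDN implementations, and the shapes are arbitrary. It times the jitted forward pass and the jitted gradient, blocking on each result so that JAX's asynchronous dispatch does not hide the real cost.

```python
import time

import jax
import jax.numpy as jnp


def baseline_fn(x, w):
  # Hypothetical stand-in for the baseline layer (not the real GDN).
  return jnp.tanh(x @ w).sum(axis=-1)


def optimized_fn(x, w):
  # Hypothetical stand-in for the optimized layer; same math.
  return jnp.sum(jnp.tanh(jnp.dot(x, w)), axis=-1)


def time_fn(fn, *args, warmup=2, iters=5):
  """Returns mean seconds per call of `fn`, excluding compilation."""
  for _ in range(warmup):
    jax.block_until_ready(fn(*args))  # first call triggers compilation
  start = time.perf_counter()
  for _ in range(iters):
    jax.block_until_ready(fn(*args))
  return (time.perf_counter() - start) / iters


def benchmark(fn, x, w):
  fwd = jax.jit(fn)
  # Reduce to a scalar loss so jax.grad applies; differentiate both inputs.
  bwd = jax.jit(jax.grad(lambda x, w: fn(x, w).mean(), argnums=(0, 1)))
  return {"forward_s": time_fn(fwd, x, w), "backward_s": time_fn(bwd, x, w)}


key_x, key_w = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(key_x, (8, 256, 128))
w = jax.random.normal(key_w, (128, 128))

base = benchmark(baseline_fn, x, w)
opt = benchmark(optimized_fn, x, w)

# Check numerical agreement before trusting any timing comparison.
outputs_match = bool(
    jnp.allclose(baseline_fn(x, w), optimized_fn(x, w), atol=1e-3)
)
print(f"outputs match: {outputs_match}")
print(f"forward speedup: {base['forward_s'] / opt['forward_s']:.2f}x")
print(f"backward speedup: {base['backward_s'] / opt['backward_s']:.2f}x")
```

Warm-up calls are kept out of the timed loop so that one-time compilation is not counted toward either implementation.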

To run the benchmark script, use the command:

python3 /src/maxtext/scratch_code/benchmark_gdn_optimization.py

Note: run this script on a TPU or GPU VM; on CPU it will take a long time.
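Before launching the benchmark, the active backend can be checked with a few lines like the following. This is a small sketch and not part of the PR's script; it only reports what JAX sees on the current machine.

```python
import jax

# Platform name of the default backend, e.g. "cpu", "gpu", or "tpu".
backend = jax.default_backend()
if backend == "cpu":
  print("Warning: running on CPU; expect the benchmark to be slow.")
else:
  print(f"Running on {backend} with {jax.device_count()} device(s).")
```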

So far, the total improvements on the Gated Delta Rule, using Qwen3-Next configs and a 4k sequence length, are:

https://paste.googleplex.com/5438820566827008
Forward pass speedup: 2.27x
Train step speedup: 3.75x
Memory reduction: 76.01%
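The PR does not state how these figures are computed. Assuming the conventional definitions (speedup as baseline time over optimized time, memory reduction as the fraction of baseline memory saved), the arithmetic would look like the sketch below. The function names are hypothetical, and the inputs in the example are made-up round numbers, not measurements from this PR.

```python
def speedup(baseline_s: float, optimized_s: float) -> float:
  """Assumed definition: how many times faster the optimized version is."""
  return baseline_s / optimized_s


def memory_reduction_pct(baseline_bytes: float, optimized_bytes: float) -> float:
  """Assumed definition: percent of baseline memory that was saved."""
  return 100.0 * (baseline_bytes - optimized_bytes) / baseline_bytes


# Illustrative inputs only (not taken from the benchmark).
example_speedup = speedup(baseline_s=2.0, optimized_s=1.0)
example_reduction = memory_reduction_pct(baseline_bytes=400, optimized_bytes=100)
print(f"{example_speedup:.2f}x, {example_reduction:.2f}%")
```

Under these definitions, a 76.01% memory reduction would mean the optimized version uses roughly 24% of the baseline's memory.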


Tests

Tested the changes using the benchmarking script and the PR unit tests (the train_compile test for Qwen3-Next).

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov · 2026-02-04T18:18:51Z

Codecov Report

❌ Patch coverage is 85.52632% with 11 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/MaxText/layers/qwen3.py	84.93%	8 Missing and 3 partials ⚠️


Script to test gdn changes

5872358

Add backward pass checks & memory checks
Add backward pass & memory consumption checks
Update memory calcs
Optimizations made to GDN impl in qwen3.py (3x speedup)

Rohan-Bierneni requested review from A9isha, NicoGrande, NuojCheng, RissyRan, SurbhiJainUSC, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, jesselu-google, jiangjy1982, khatwanimohit, parambole, richjames0, shralex, shuningjin, suexu1025 and vipannalla as code owners February 4, 2026 18:08

Rohan-Bierneni added 2 commits February 4, 2026 19:08

Update dummy configs to align with q3-next

121678b

Update tflops calc to align with WY-optimized GDN

09f85a0

Rohan-Bierneni wants to merge 3 commits into main from rbierneni-test-kernelagent
