Fix GPT3 attention missing KV cache initialization and handling by shralex · Pull Request #3927 · AI-Hypercomputer/maxtext

shralex · 2026-05-17T06:42:28Z

This pull request resolves an issue where Gpt3MultiHeadAttention called AttentionOp without passing cached_values, causing decoding to fail with an AssertionError: assert prefill_kv_cache.

FIXES: b/452778717
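As a general illustration of why the decode path needs cached keys and values, here is a minimal NumPy sketch of single-head causal attention. It is not MaxText's `AttentionOp` or the code changed in this PR; the function names, shapes, and the dict-based cache are all assumptions made for the example. It shows that attending over a cache populated during prefill reproduces the rows of full causal attention.

```python
import numpy as np


def causal_attention(q, k, v):
    """Full causal self-attention over a whole sequence; inputs are [seq, dim]."""
    seq, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)
    # Position t may only attend to positions 0..t.
    mask = np.tril(np.ones((seq, seq), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def decode_step(q_t, k_t, v_t, cache):
    """One autoregressive step: append the new key/value, then attend over the cache."""
    cache["k"] = np.concatenate([cache["k"], k_t[None, :]], axis=0)
    cache["v"] = np.concatenate([cache["v"], v_t[None, :]], axis=0)
    scores = cache["k"] @ q_t / np.sqrt(q_t.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache["v"]


rng = np.random.default_rng(0)
seq, dim, prefill_len = 6, 4, 3  # hypothetical toy sizes
q, k, v = (rng.standard_normal((seq, dim)) for _ in range(3))

# Reference: attention computed over the whole sequence at once.
full = causal_attention(q, k, v)

# Prefill: the cache starts with the keys/values of the first prefill_len positions.
cache = {"k": k[:prefill_len].copy(), "v": v[:prefill_len].copy()}

# Decode: each later position sees only its own query plus the cache.
decoded = np.stack(
    [decode_step(q[t], k[t], v[t], cache) for t in range(prefill_len, seq)]
)

print(np.allclose(decoded, full[prefill_len:]))
```

Without the cached values, a decode step would see only the current token's key and value, which is why a decode path that never receives the cache cannot proceed.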

Tests

Updated GPT-3 tests. Verified that b/452778717 is fixed on a TPU VM.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-05-17T06:47:12Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

NuojCheng · 2026-05-18T15:23:43Z

    self.rng = jax.random.PRNGKey(1234)

-    devices_array = maxtext_utils.create_device_mesh(self.cfg)
+    devices_array = maxtext_utils.create_device_mesh(self.cfg, devices=[jax.devices()[0]])


Why do we use only one device for testing? Is no sharding involved?

This is for the KV cache test. Running everything on a single device eliminates communication and ensures the test is fully deterministic.
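A minimal sketch of what restricting a mesh to one device looks like in plain JAX, independent of `maxtext_utils.create_device_mesh`. The axis names `data` and `model` are hypothetical and chosen only for the example; the test in this PR builds its mesh from the MaxText config instead.

```python
import jax
import numpy as np
from jax.sharding import Mesh

# Take only the first visible device, whatever the host actually has.
first_device = jax.devices()[0]

# Hypothetical axis names; a 1x1 grid means no axis is ever split across devices.
device_grid = np.array([first_device]).reshape(1, 1)
mesh = Mesh(device_grid, axis_names=("data", "model"))

# Every axis has size 1, so arrays placed on this mesh live on a single device.
print(mesh.shape)
```

With every mesh axis of size 1, no cross-device collectives are needed, which matches the determinism argument in the reply above.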

NuojCheng · 2026-05-18T15:24:14Z

        enable_checkpointing=False,
        model_name="gpt3-52k",
        dtype="float32",
+        per_device_batch_size=1.0 / jax.device_count(),


What is the purpose?

It's a way to set the global batch size to 1 regardless of device count.
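The arithmetic behind that reply can be sketched as follows, assuming the global batch size is the per-device batch size multiplied by the device count. The helper `global_batch_size` is hypothetical and is not a MaxText function.

```python
def global_batch_size(per_device_batch_size: float, device_count: int) -> int:
    """Hypothetical helper: global batch = per-device batch * number of devices."""
    # round() absorbs floating-point error such as (1.0 / 3) * 3 != 1.0 exactly.
    return round(per_device_batch_size * device_count)


# Mirrors per_device_batch_size=1.0 / jax.device_count() for several device counts.
for device_count in (1, 4, 8, 256):
    per_device = 1.0 / device_count
    print(device_count, per_device, global_batch_size(per_device, device_count))
```

Each iteration yields a global batch of 1, so the same test configuration behaves identically on hosts with different numbers of devices.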

shralex force-pushed the shralex_test_5 branch 2 times, most recently from ddaccf2 to 12bc21b Compare May 17, 2026 15:58

igorts-git approved these changes May 18, 2026

NuojCheng reviewed May 18, 2026

NuojCheng approved these changes May 19, 2026

github-actions Bot added the pull ready label May 19, 2026

shralex added pull ready and removed pull ready labels May 19, 2026

Fix GPT3 attention missing KV cache initialization and handling

17efd39

shralex force-pushed the shralex_test_5 branch from 12bc21b to 17efd39 Compare May 19, 2026 04:38

shralex added pull ready and removed pull ready labels May 19, 2026

copybara-service Bot merged commit 4ebab2c into main May 19, 2026
35 checks passed

copybara-service Bot deleted the shralex_test_5 branch May 19, 2026 16:11
