Add Gemma 4 E2B / E4B (text) support to MaxText by gagika · Pull Request #3904 · AI-Hypercomputer/maxtext

gagika · 2026-05-14T06:01:01Z

Description

Adds Gemma 4 small variants — E2B and E4B (text-only) — to MaxText.

These are the smaller members of the Gemma 4 family. They share the
broader Gemma 4 attention / norm structure but introduce two new features
that drive their parameter efficiency:

- Per-Layer Embeddings (PLE). Each decoder layer consumes a per-layer
  slice of an extra embedding tensor injected by a new Gemma4SmallPLE
  block. Controlled by hidden_size_per_layer_input /
  vocab_size_per_layer_input.
- KV sharing. The last num_kv_shared_layers decoder layers reuse
  K / V from the most recent non-shared layer of the same attention type
  (sliding↔sliding, full↔full). E2B additionally widens the MLP on those
  shared layers (use_double_wide_mlp: true) to compensate for the
  missing parameters.
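
The KV-sharing rule above can be sketched as a small donor-mapping function. This is an illustrative sketch only, not the code in gemma4_small.py: the function name kv_donor_map, the layer-type strings, and the 8-layer pattern are hypothetical, and the real E2B / E4B layer patterns are not reproduced here.

```python
from typing import Optional


def kv_donor_map(layer_types: list[str], num_kv_shared_layers: int) -> list[Optional[int]]:
    """Return, per layer, the index of the layer whose K/V it reuses.

    None means the layer computes its own K/V (a non-shared layer).
    Hypothetical helper; names are not taken from the PR.
    """
    first_shared = len(layer_types) - num_kv_shared_layers
    last_non_shared: dict[str, int] = {}  # attention type -> most recent non-shared layer
    donors: list[Optional[int]] = []
    for idx, kind in enumerate(layer_types):
        if idx < first_shared:
            # Non-shared layer: computes its own K/V and becomes a candidate donor.
            last_non_shared[kind] = idx
            donors.append(None)
        else:
            # Shared layer: reuse K/V from the latest non-shared layer of the same type.
            if kind not in last_non_shared:
                raise ValueError(f"no non-shared '{kind}' layer precedes layer {idx}")
            donors.append(last_non_shared[kind])
    return donors


# Hypothetical 8-layer pattern: three sliding layers, then one full layer, repeated.
layer_types = ["sliding", "sliding", "sliding", "full"] * 2
donor_map = kv_donor_map(layer_types, num_kv_shared_layers=4)
print(donor_map)
```

With this toy pattern, layers 4 to 6 (sliding) map to layer 2 and layer 7 (full) maps to layer 3, which is the sliding↔sliding, full↔full pairing described above.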

Both features carry per-layer state that is not expressible inside
nn.scan, so a new GEMMA4_SMALL DecoderBlockType is added with its
own non-scanned execution path (Decoder._apply_gemma4_small_layers).
The model validator enforces scan_layers=False for these variants.
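
To illustrate why a plain per-layer loop is a natural fit here, the sketch below runs a toy decoder stack in which each layer reads its own PLE slice and shared layers read K / V written by an earlier layer. It is a pure-Python stand-in under stated assumptions, not Decoder._apply_gemma4_small_layers: run_unscanned, the toy arithmetic, and the 6-layer pattern are all hypothetical.

```python
def run_unscanned(x, ple_slices, layer_types, num_kv_shared_layers):
    """Toy non-scanned layer loop (hypothetical; real layers use attention and an MLP).

    x:           hidden state as a list of floats
    ple_slices:  one per-layer embedding slice per layer, same length as x
    Returns the final hidden state and, per layer, which layer supplied its K/V.
    """
    first_shared = len(layer_types) - num_kv_shared_layers
    donor_kv = {}   # attention type -> (layer index, K, V) of the latest non-shared layer
    kv_source = []
    for i, kind in enumerate(layer_types):
        # Per-layer input: layer i consumes only its own slice.
        x = [xi + pi for xi, pi in zip(x, ple_slices[i])]
        if i < first_shared:
            # Non-shared layer: compute K/V (toy scalings) and publish them.
            donor_kv[kind] = (i, [0.5 * xi for xi in x], [2.0 * xi for xi in x])
        # Shared layers skip the branch above and read the donor's K/V instead.
        src, _k, v = donor_kv[kind]
        x = [xi + 0.1 * sum(v) / len(v) for xi in x]  # toy stand-in for attention
        kv_source.append(src)
    return x, kv_source


layer_types = ["sliding", "sliding", "full", "sliding", "sliding", "full"]
ple_slices = [[0.01 * (i + 1)] * 4 for i in range(len(layer_types))]
out, kv_source = run_unscanned([1.0, 2.0, 3.0, 4.0], ple_slices, layer_types, num_kv_shared_layers=3)
print(kv_source)
```

The loop body differs between early and late layers and depends on state keyed by attention type, which is the kind of per-layer state the description refers to.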

What's included

- New model file src/maxtext/models/gemma4_small.py (PLE + attention
  with optional KV sharing + decoder layer).
- New configs configs/models/gemma4-e2b.yml and gemma4-e4b.yml.
- HF round-trip: hf_model_configs.py, hf_shape.py,
  param_mapping.py updated to handle PLE params, KV-shared layers, and
  the (optional) double-wide MLP.
- TFLOP/MFU accounting: calculate_gemma4_small_tflops_training_per_device.
- Config plumbing: DecoderBlockType.GEMMA4_SMALL, four new
  Attention fields in configs/types.py, base.yml defaults, and
  validation that rejects scan_layers=true / use_multimodal=true for
  E2B / E4B.

Out of scope

- Multimodal. E2B / E4B ship a vision tower in their HF configs, but
  MaxText support for the gemma4-small vision encoder (clipped linears
  in particular) is not in this PR. use_multimodal=true is rejected by
  the validator with a clear error.
- Scanned layers. Per-layer KV sharing isn't expressible under
  nn.scan; rejected by the validator.
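
The two rejections can be pictured with a minimal validation sketch. The field names scan_layers and use_multimodal come from this description; ToyConfig, validate_gemma4_small, the decoder_block string, and the error messages are hypothetical and do not reproduce the actual validator in MaxText.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for the relevant config fields."""
    decoder_block: str = "gemma4_small"
    scan_layers: bool = False
    use_multimodal: bool = False


def validate_gemma4_small(cfg: ToyConfig) -> None:
    """Reject settings that the E2B / E4B text-only path does not support."""
    if cfg.decoder_block != "gemma4_small":
        return  # other model families are out of scope for this check
    if cfg.scan_layers:
        raise ValueError(
            "gemma4_small requires scan_layers=False: per-layer KV sharing "
            "is not expressible under nn.scan."
        )
    if cfg.use_multimodal:
        raise ValueError(
            "gemma4_small is text-only in this PR: use_multimodal=True is not supported."
        )


validate_gemma4_small(ToyConfig())  # passes silently
try:
    validate_gemma4_small(ToyConfig(scan_layers=True))
except ValueError as err:
    print(f"rejected: {err}")
```

Failing at config-validation time gives the user a clear error before any model code is traced.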

Tests

New unit tests:
- tests/unit/gemma4_small_test.py — attention-pattern dispatch,
  layer-type tuples, KV donor/shared-layer mapping for both variants.
- tests/unit/flop_calculation_test.py::test_calculate_gemma4_small_tflops_* —
  closed-form TFLOP accounting matching the layer/donor structure.
- tests/unit/configs_test.py — E2B / E4B yml configs are loaded by
  the existing config-instantiation sweep.
End-to-end forward-pass logit checks:
- tests/end_to_end/tpu/gemma4/e2b/{convert_gemma4,convert_gemma4_pt}.sh
- tests/end_to_end/tpu/gemma4/e4b/{convert_gemma4,convert_gemma4_pt}.sh
- Each converts the HF checkpoint with to_maxtext, then runs
  forward_pass_logit_checker against the HF model with
  --max_kl_div=0.03. This is the recommended smoke test after
  touching the model code, param map, or either YAML.
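
For readers unfamiliar with a KL-divergence threshold on logits, here is a minimal pure-Python sketch of the idea. It assumes the check is the maximum per-position KL(reference || test) over softmax distributions; that direction and reduction are assumptions for illustration, and the helper names and toy logits are hypothetical rather than taken from forward_pass_logit_checker.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def kl_div(ref_logits, test_logits):
    """KL(ref || test) for a single token position."""
    ref_lp = log_softmax(ref_logits)
    test_lp = log_softmax(test_logits)
    return sum(math.exp(r) * (r - t) for r, t in zip(ref_lp, test_lp))


def max_kl_over_positions(ref, test):
    """Worst-case KL across all token positions."""
    return max(kl_div(r, t) for r, t in zip(ref, test))


MAX_KL_DIV = 0.03  # threshold quoted in the test description above

# Toy logits: two positions, vocabulary of three tokens.
ref = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
close = [[2.01, 0.49, -1.0], [0.1, 0.21, 0.3]]   # small numerical drift
wrong = [[-1.0, 0.5, 2.0], [0.1, 0.2, 0.3]]      # first position reversed

print(max_kl_over_positions(ref, close) < MAX_KL_DIV)
print(max_kl_over_positions(ref, wrong) < MAX_KL_DIV)
```

Small numerical drift between the two implementations stays well under the threshold, while a mapping error that scrambles the distribution exceeds it by a wide margin.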

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

github-actions · 2026-05-14T06:01:40Z

🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

codecov · 2026-05-14T06:05:22Z

Codecov Report

❌ Patch coverage is 27.11864% with 344 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/models/gemma4_small.py	35.29%	108 Missing and 2 partials ⚠️
...xtext/checkpoint_conversion/utils/param_mapping.py	1.96%	100 Missing ⚠️
...rc/maxtext/checkpoint_conversion/utils/hf_shape.py	0.00%	59 Missing ⚠️
src/maxtext/layers/decoders.py	3.03%	30 Missing and 2 partials ⚠️
src/maxtext/layers/attentions.py	28.57%	22 Missing and 8 partials ⚠️
src/maxtext/multimodal/processor.py	0.00%	6 Missing and 1 partial ⚠️
src/maxtext/utils/maxtext_utils.py	93.87%	1 Missing and 2 partials ⚠️
...xt/checkpoint_conversion/utils/hf_model_configs.py	75.00%	2 Missing ⚠️
src/maxtext/layers/encoders.py	0.00%	1 Missing ⚠️

github-actions · 2026-05-14T06:06:55Z

🤖 I'm sorry @gagika, but I was unable to process your request. Please see the logs for more details.

aireenmei

Thanks for the rapid implementation! Do you have test results from forward_pass_logit_checker? Also, did you add unit tests that compare against torch for the new modules such as Gemma4SmallPLE, Gemma4SmallAttention, and Gemma4SmallDecoderLayer? See https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/unit/gemma4_layers_test.py, which was added recently.

shuningjin

Thanks! I agree with @aireenmei that it would be good if we could reuse rope/attention, along with component-wise unit tests (potentially as follow-up). Some minor comments.

github-actions · 2026-05-19T00:01:57Z

🤖 Hi @shuningjin, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-05-19T00:06:34Z

🤖 I'm sorry @shuningjin, but I was unable to process your request. Please see the logs for more details.

gagika · 2026-05-19T03:36:29Z

> Thanks for the rapid implementation! Do you have test results from forward_pass_logit_checker? Also, did you add unit tests that compare against torch for the new modules such as Gemma4SmallPLE, Gemma4SmallAttention, and Gemma4SmallDecoderLayer? See https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/unit/gemma4_layers_test.py, which was added recently.

Added those unit tests, PTAL.

For the forward-pass logit test: yes, I ran it for both models,
and I reran it after addressing the PR feedback:

https://paste.googleplex.com/5790253452492800
https://paste.googleplex.com/6349200758538240

github-actions · 2026-05-19T03:47:42Z

🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-05-19T03:50:36Z

🤖 I'm sorry @gagika, but I was unable to process your request. Please see the logs for more details.

gagika added the gemini-review label May 14, 2026

gagika force-pushed the agagik-gemma branch from 0f2e5fc to 00765cc Compare May 14, 2026 06:06

gagika force-pushed the agagik-gemma branch from 00765cc to 98011c1 Compare May 17, 2026 20:41

gagika marked this pull request as ready for review May 17, 2026 20:46

aireenmei reviewed May 18, 2026

Comment thread src/maxtext/models/gemma4_small.py Outdated

Comment thread src/maxtext/models/gemma4_small.py Outdated

shuningjin reviewed May 18, 2026

shuningjin added gemini-review and removed gemini-review labels May 19, 2026

gagika force-pushed the agagik-gemma branch from 98011c1 to 0e0c798 Compare May 19, 2026 03:44

gagika added gemini-review and removed gemini-review labels May 19, 2026

gagika force-pushed the agagik-gemma branch from 0e0c798 to 63f82b8 Compare May 19, 2026 03:54

Small Gemma draft

07c0ff0

gagika force-pushed the agagik-gemma branch from 63f82b8 to 07c0ff0 Compare May 19, 2026 03:57

3 participants
