
Dump activation shardings #3080

Draft

charlesli640 wants to merge 7 commits into AI-Hypercomputer:main from CIeNET-International:charlesli/input_sharding

Conversation

@charlesli640
Collaborator

@charlesli640 charlesli640 commented Feb 4, 2026

Description

This PR dumps activation shardings to a golden file for later comparison. The dump can be included in a unit test so that future code changes that touch activation shardings are caught. This is the initial submission for draft review; the change is based on PR 3034.
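
As a rough illustration of the intended unit-test usage, below is a minimal sketch of a golden-file comparison, assuming the dump is serialized as JSON in the format shown under Output; the helper name and golden-file path are placeholders, not the PR's actual API.

import json

def assert_matches_golden(current, golden_path):
  """Compares a freshly produced sharding dump against the checked-in golden file.

  Args:
    current: list of single-key dicts, one per activation, as in the Output example.
    golden_path: path to the golden JSON file (placeholder name).
  """
  with open(golden_path) as f:
    golden = json.load(f)["Activation Sharding Dump"]
  assert len(current) == len(golden), (
      f"activation count changed: {len(current)} vs {len(golden)}")
  # Compare entry by entry so a failure points at the offending activation.
  for got, want in zip(current, golden):
    assert got == want, f"sharding changed:\n got:  {got}\n want: {want}"

In a test, current would come from running a forward pass with the dump enabled and collecting the per-activation entries.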

Output

The output format is readable and comparable by both humans and machines. For example, the activation dump for llama3.1-70b/v6e-16/slice_1 looks as follows:

"Activation Sharding Dump": [
    {
      "llama2/inputs: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/lnx: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "attention/inputs_q: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_attn_length_no_exp', 'activation_attn_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "attention/input_kv: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_attn_length_no_exp', 'activation_attn_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "attention/query: bfloat16[192,2048,64,128]": {
        "logic_axes": "('activation_kv_batch', 'activation_attn_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention/key: bfloat16[192,2048,8,128]": {
        "logic_axes": "('activation_kv_batch', 'activation_attn_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention/value: bfloat16[192,2048,8,128]": {
        "logic_axes": "('activation_kv_batch', 'activation_attn_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "Unknown: bfloat16[192,64,2048,128]": {
        "logic_axes": "Unknown",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "Unknown: bfloat16[192,8,2048,128]": {
        "logic_axes": "Unknown",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention/out: bfloat16[192,2048,64,128]": {
        "logic_axes": "('activation_batch', 'activation_attn_length_no_exp', 'activation_heads', 'activation_kv')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "llama2/attention_lnx: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/hidden_states: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "linears/x: bfloat16[192,2048,28672]": {
        "logic_axes": "('activation_batch', 'activation_length_no_exp', 'activation_mlp')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/mlp_lnx: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/mlp_lnx: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/layer_output: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    }
]
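
For reference, here is a minimal sketch (not the PR's actual code) of how one such entry can be derived with Flax: the logical axis names of an activation are resolved to a PartitionSpec under the logical-to-mesh sharding rules, and both are recorded in a JSON-friendly dict. The rule set below is an illustrative subset, not MaxText's full configuration.

import json
from flax import linen as nn

# Illustrative subset of logical-to-mesh sharding rules (assumption, not the full config).
LOGICAL_AXIS_RULES = (
    ("activation_batch", "fsdp"),
    ("activation_norm_length", None),
    ("activation_embed", None),
)

def sharding_dump_entry(name, dtype, shape, logical_axes):
  """Builds one dump entry shaped like the example above."""
  # Resolve logical axis names to mesh axes,
  # e.g. ('activation_batch', ...) -> PartitionSpec('fsdp', None, None).
  # The dump above abbreviates PartitionSpec(...) as P(...).
  spec = nn.logical_to_mesh_axes(logical_axes, rules=LOGICAL_AXIS_RULES)
  key = f"{name}: {dtype}[{','.join(str(d) for d in shape)}]"
  return {key: {"logic_axes": str(logical_axes), "PartitionSpec": str(spec)}}

entry = sharding_dump_entry(
    "llama2/inputs",
    "bfloat16",
    (192, 2048, 8192),
    ("activation_batch", "activation_norm_length", "activation_embed"),
)
print(json.dumps(entry, indent=2))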

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov bot commented Feb 4, 2026

@gobbleturk
Collaborator

I think this LGTM although there are a lot of names to review! How did you generate these names?

@charlesli640 charlesli640 marked this pull request as draft February 5, 2026 01:01
@charlesli640 charlesli640 force-pushed the charlesli/input_sharding branch from 4b17fdb to 511be4b on February 5, 2026 17:59

Labels

draft Draft PR
