
Dump activation shardings #3080

Draft

charlesli640 wants to merge 7 commits into AI-Hypercomputer:main from CIeNET-International:charlesli/input_sharding

Conversation

@charlesli640
Collaborator

@charlesli640 charlesli640 commented Feb 4, 2026

Description

This PR dumps activation shardings to a golden file for later comparison. The dump can be included in a unit test so that future code changes that touch activation shardings are caught. This is the initial submission for draft review; the change is based on PR 3034.
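
As a rough illustration of the intended unit-test usage, below is a minimal sketch of a golden-file comparison, assuming the dump is serialized as JSON in the format shown under Output; the helper name and golden-file path are placeholders, not the PR's actual API.

import json

def assert_matches_golden(current, golden_path):
  """Compares a freshly produced sharding dump against the checked-in golden file.

  Args:
    current: list of single-key dicts, one per activation, as in the Output example.
    golden_path: path to the golden JSON file (placeholder name).
  """
  with open(golden_path) as f:
    golden = json.load(f)["Activation Sharding Dump"]
  assert len(current) == len(golden), (
      f"activation count changed: {len(current)} vs {len(golden)}")
  # Compare entry by entry so a failure points at the offending activation.
  for got, want in zip(current, golden):
    assert got == want, f"sharding changed:\n got:  {got}\n want: {want}"

In a test, current would come from running a forward pass with the dump enabled and collecting the per-activation entries.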

Output

The output format is readable and comparable by both humans and machines. For example, the activation dump for llama3.1-70b/v6e-16/slice_1 looks as follows:

"Activation Sharding Dump": [
    {
      "llama2/inputs: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/lnx: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "attention/inputs_q: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_attn_length_no_exp', 'activation_attn_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "attention/input_kv: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_attn_length_no_exp', 'activation_attn_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "attention/query: bfloat16[192,2048,64,128]": {
        "logic_axes": "('activation_kv_batch', 'activation_attn_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention/key: bfloat16[192,2048,8,128]": {
        "logic_axes": "('activation_kv_batch', 'activation_attn_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention/value: bfloat16[192,2048,8,128]": {
        "logic_axes": "('activation_kv_batch', 'activation_attn_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "Unknown: bfloat16[192,64,2048,128]": {
        "logic_axes": "Unknown",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "Unknown: bfloat16[192,8,2048,128]": {
        "logic_axes": "Unknown",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention/out: bfloat16[192,2048,64,128]": {
        "logic_axes": "('activation_batch', 'activation_attn_length_no_exp', 'activation_heads', 'activation_kv')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "llama2/attention_lnx: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/hidden_states: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "linears/x: bfloat16[192,2048,28672]": {
        "logic_axes": "('activation_batch', 'activation_length_no_exp', 'activation_mlp')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/mlp_lnx: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/mlp_lnx: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "llama2/layer_output: bfloat16[192,2048,8192]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    }
]
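
For reference, here is a minimal sketch (not the PR's actual code) of how one such entry can be derived with Flax: the logical axis names of an activation are resolved to a PartitionSpec under the logical-to-mesh sharding rules, and both are recorded in a JSON-friendly dict. The rule set below is an illustrative subset, not MaxText's full configuration.

import json
from flax import linen as nn

# Illustrative subset of logical-to-mesh sharding rules (assumption, not the full config).
LOGICAL_AXIS_RULES = (
    ("activation_batch", "fsdp"),
    ("activation_norm_length", None),
    ("activation_embed", None),
)

def sharding_dump_entry(name, dtype, shape, logical_axes):
  """Builds one dump entry shaped like the example above."""
  # Resolve logical axis names to mesh axes,
  # e.g. ('activation_batch', ...) -> PartitionSpec('fsdp', None, None).
  # The dump above abbreviates PartitionSpec(...) as P(...).
  spec = nn.logical_to_mesh_axes(logical_axes, rules=LOGICAL_AXIS_RULES)
  key = f"{name}: {dtype}[{','.join(str(d) for d in shape)}]"
  return {key: {"logic_axes": str(logical_axes), "PartitionSpec": str(spec)}}

entry = sharding_dump_entry(
    "llama2/inputs",
    "bfloat16",
    (192, 2048, 8192),
    ("activation_batch", "activation_norm_length", "activation_embed"),
)
print(json.dumps(entry, indent=2))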

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov bot commented Feb 4, 2026

@gobbleturk
Collaborator

I think this LGTM although there are a lot of names to review! How did you generate these names?

@charlesli640 charlesli640 marked this pull request as draft February 5, 2026 01:01
@charlesli640 charlesli640 force-pushed the charlesli/input_sharding branch from 4b17fdb to 511be4b on February 5, 2026 17:59

Labels

draft Draft PR
