Fix CLT normalization and distributed attribution #9

Open
JitUikey28 wants to merge 1 commit into LLM-Interp:master from JitUikey28:jitesh/clt-normalization-distributed-attribution

Task Writeup

This PR addresses two CLT-Forge engineering tasks:

  1. Fix activation normalization so estimated_norm_scaling_factor_in/out are computed and applied on the correct device, ideally on GPU, avoiding CPU/GPU mismatch and unnecessary transfers.

  2. Extend attribution graph computation beyond single-GPU execution by making the attribution runner distributed-aware for DDP/FSDP/feature-sharding style setups and batched prompt computation.
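For the first task, the idea can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, and the estimator (sqrt(d) divided by the mean L2 norm) is an assumed convention rather than something confirmed for CLT-Forge. The point it shows is that the factor stays a tensor on the activation's device and is cast to the activation's device/dtype right before use.

```python
import math

import torch


def estimate_norm_scaling_factor(acts: torch.Tensor) -> torch.Tensor:
    """Hypothetical estimator (assumed formula): sqrt(d) / mean L2 norm.

    Everything is computed on acts.device; the result is a 0-dim tensor,
    so there is no .item() or .cpu() round trip.
    """
    d = acts.shape[-1]
    mean_norm = acts.float().norm(dim=-1).mean()
    return math.sqrt(d) / mean_norm


def apply_norm_scaling(acts: torch.Tensor, factor: torch.Tensor) -> torch.Tensor:
    # Match the activation's device and dtype before multiplying.
    factor = factor.to(device=acts.device, dtype=acts.dtype)
    return acts * factor


def remove_norm_scaling(acts: torch.Tensor, factor: torch.Tensor) -> torch.Tensor:
    # Same matching step for the inverse operation.
    factor = factor.to(device=acts.device, dtype=acts.dtype)
    return acts / factor


# Usage: the factor is deliberately moved to CPU to mimic a mismatch;
# apply/remove still work because they re-align it with the activations.
device = "cuda" if torch.cuda.is_available() else "cpu"
acts = torch.randn(8, 16, device=device)
factor = estimate_norm_scaling_factor(acts).cpu()
restored = remove_norm_scaling(apply_norm_scaling(acts, factor), factor)
```

Casting inside the apply/remove helpers, rather than once at estimation time, means a factor loaded from a checkpoint on a different device still works without the caller having to remember to move it.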

Summary

  • Keeps norm scaling factor estimation on-device.
  • Ensures scaling factors are moved to match activation tensor device/dtype before apply/remove operations.
  • Adds distributed configuration tracking for attribution runs.
  • Adds rank/world-size handling and rank-aware logging.
  • Adds synchronization and graph/result gathering helpers.
  • Distributes batched prompts across ranks.
  • Ensures only rank 0 saves final attribution results.
  • Adds focused tests for normalization and distributed attribution behavior.
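The prompt distribution and result gathering listed above can be illustrated with a small, framework-free sketch. It assumes round-robin sharding, which may differ from the scheme the PR actually uses, and the helper names `shard_for_rank` and `merge_shards` are hypothetical. In a real run the per-rank lists would typically be exchanged with something like `torch.distributed.all_gather_object`, and the save step would be guarded by a rank-0 check.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_rank(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Return the items this rank should process (round-robin, assumed scheme)."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world_size {world_size}")
    return list(items[rank::world_size])


def merge_shards(shards: Sequence[Sequence[T]]) -> List[T]:
    """Interleave per-rank results back into the original prompt order."""
    world_size = len(shards)
    total = sum(len(s) for s in shards)
    return [shards[i % world_size][i // world_size] for i in range(total)]


# Usage: simulate three ranks in one process.
prompts = ["p0", "p1", "p2", "p3", "p4"]
world_size = 3
per_rank = [shard_for_rank(prompts, r, world_size) for r in range(world_size)]
merged = merge_shards(per_rank)  # what rank 0 would hold before saving
```

Round-robin keeps the shard sizes within one item of each other even when the number of prompts is not divisible by the world size, and the merge step restores the input order so saved results line up with the original prompt list.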

Tests

Ran locally:

  • python -m pytest tests/test_norm_fix.py -v
    Result: 10 passed

  • python -m pytest tests/test_attribution_ddp.py -v -k "TestShardingMath or TestModelUnwrapping or TestInterventionPerFeatureDistributed"
    Result: 28 passed

Notes

I also attempted a broader test-suite run. The tests that fail locally are unrelated to this change, stale, or GPU-specific: a missing pipeline_new import, tests containing a literal assert False, and distributed/GPU tests that require a proper CUDA/torchrun setup.

Full end-to-end multi-GPU validation has not been run here; it should be done in the lab environment with real CLT checkpoints.
