
Added additional bridge analysis tools #1237

Merged
jlarson4 merged 1 commit into dev-3.x-canary from feature/bridge-analysis-methods on Apr 7, 2026

jlarson4 (Collaborator) commented Apr 7, 2026

Description

  • W_E identity: the same tensor object as embed.W_E (verified via the is operator)
  • W_U/b_U equality: the same values as unembed.W_U/unembed.b_U (a view, so checked with torch.equal rather than is)
  • W_U matches HookedTransformer (HT): max diff < 1e-4
  • tokens_to_residual_directions: result equals W_U[:, token] exactly (torch.equal) and matches HT
  • accumulated_bias: all zeros at layer 0, non-zero from layer 1 onward; matches HT for all layers, including with the mlp_input flag
  • all_composition_scores: correct shape, upper-triangular masking enforced, non-zero above the diagonal, all 3 modes work, and an invalid mode raises
  • all_head_labels: correct count and format; matches HT exactly
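
For readers unfamiliar with these checks, here is a minimal plain-Python sketch of three of the behaviours listed above. It is not the bridge implementation: the function names, the nested-list bias layout, and the demo values are hypothetical, and it assumes the L{layer}H{head} label format, a strict "earlier layer feeds later layer" mask, and a bias sum over attention output and MLP output biases, as in HookedTransformer.

```python
from typing import List

# Hypothetical stand-ins for the tensor-based methods; plain lists are used
# so the sketch runs without torch or a loaded model.


def head_labels(n_layers: int, n_heads: int) -> List[str]:
    """Layer-major labels, assuming the 'L{layer}H{head}' format."""
    return [f"L{l}H{h}" for l in range(n_layers) for h in range(n_heads)]


def layer_pair_mask(n_layers: int) -> List[List[bool]]:
    """mask[l1][l2] is True only when layer l1 comes strictly before l2.

    This is the upper-triangular pattern: a head can only compose with
    heads in later layers, so the diagonal and everything below is masked.
    """
    return [[l1 < l2 for l2 in range(n_layers)] for l1 in range(n_layers)]


def accumulated_bias(
    attn_b_O: List[List[float]],
    mlp_b_out: List[List[float]],
    layer: int,
    mlp_input: bool = False,
) -> List[float]:
    """Sum the biases written to the residual stream before `layer`.

    With mlp_input=True the attention bias of `layer` itself is included,
    since the MLP of that layer reads the stream after attention.
    """
    d_model = len(attn_b_O[0])
    total = [0.0] * d_model
    for i in range(layer):
        for j in range(d_model):
            total[j] += attn_b_O[i][j] + mlp_b_out[i][j]
    if mlp_input:
        for j in range(d_model):
            total[j] += attn_b_O[layer][j]
    return total


# Made-up biases for a 2-layer model with d_model = 2 (illustration only).
DEMO_ATTN_B_O = [[0.5, -1.0], [0.25, 2.0]]
DEMO_MLP_B_OUT = [[1.0, 2.0], [0.5, -0.5]]
```

Under these assumptions, accumulated_bias(DEMO_ATTN_B_O, DEMO_MLP_B_OUT, 0) is all zeros because no earlier layer has written to the stream, and layer_pair_mask(n) is False everywhere on and below the diagonal, which is the shape of the masking check described above.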

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jlarson4 jlarson4 merged commit b0ef355 into dev-3.x-canary Apr 7, 2026
21 checks passed