Architecture Benchmarks – Review & Extension #1176
- Re-ran all benchmarks against a collection of models (gpt2, neo, pythia, OPT, Qwen2, Bloom, OpenELM) and resolved the new discrepancies caused by the update to transformers v5 and all of our other latest changes.
- Added a new Text Quality benchmark, which runs a text generation and scores it with GPT-2 to ensure that we are generating valid, human-readable text (a sketch of the scoring idea follows this item).
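The PR describes the Text Quality benchmark only at a high level. As a rough illustration, "scoring with GPT-2" typically means computing perplexity over the generated text; the helper names and the pass threshold below are hypothetical, not the benchmark's actual code.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def gpt2_perplexity(text: str) -> float:
    """Score text with GPT-2; lower perplexity suggests more human-readable output."""
    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=input_ids yields the mean next-token negative log-likelihood
        loss = lm(ids, labels=ids).loss
    return torch.exp(loss).item()

# Hypothetical threshold; the benchmark's real pass criterion is not shown in the PR text.
MAX_PERPLEXITY = 100.0

def assert_text_quality(generated: str) -> None:
    assert gpt2_perplexity(generated) < MAX_PERPLEXITY, "generation looks degenerate"
```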
- Stabilized float types: some benchmark comparisons were converting back and forth between float16 and float32 depending on the source dtype of the model. These updates pin those comparisons to a single dtype, allowing more accurate testing of a model's accuracy when it is loaded via TransformerBridge (see the dtype sketch below this list).
- Resolved bugs discovered in bloom_attention.
- Resolved deprecation issues caused by transformers v5 in T5.
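A minimal sketch of the dtype-stabilization idea, assuming the comparisons go through torch.testing.assert_close; the wrapper name is hypothetical and the actual change in this PR may differ.

```python
import torch

def assert_close_stable(actual: torch.Tensor, expected: torch.Tensor, **kwargs) -> None:
    # Up-cast both sides to float32 before comparing, so a float16 source model
    # and a float32 reference are checked in a single dtype instead of bouncing
    # between types and inflating the apparent error.
    torch.testing.assert_close(actual.float(), expected.float(), **kwargs)
```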
- Updated the architecture adapter for the models that were failing to generate properly or to pass the other benchmarks, so that they now pass the base benchmarks and the new generate benchmark (a generic smoke test of that kind is sketched below).
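The generate benchmark itself is not shown in this description; the sketch below is a generic smoke test of that kind, written against plain transformers rather than TransformerBridge (whose exact loading API is not assumed here).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_smoke_test(model_name: str, prompt: str = "The quick brown fox") -> str:
    # Greedy-decode a short continuation and verify that new tokens were produced.
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    assert out.shape[-1] > inputs.input_ids.shape[-1], "no new tokens generated"
    return tok.decode(out[0], skip_special_tokens=True)
```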
- Cleaned up duplicate code and unused benchmark functions that have been superseded by newer, better tests.
Type of change:

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
Checklist: