feat(test-benchmark): implement opcode count verification #1869

LouisTsai-Csie · 2025-12-09T07:55:48Z

🗒️ Description

There are several enhancements in this PR:

Implement opcode-count verification for --fixed-opcode-count mode, using the opcount field that evmone provides when generating tests and allowing a 5% deviation.
Re-label the worst-case scenarios for the repricing marker based on the Grafana dashboard data and this script, which sorts the cases by their MGas result.
- The repricing markers under tests/benchmark/compute/instruction/ are updated.
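
The 5% deviation check described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `verify_opcode_count`, its parameters, and the choice to measure the deviation relative to the expected count are all assumptions.

```python
def verify_opcode_count(
    expected: int, actual: int, tolerance: float = 0.05
) -> bool:
    """Return True if `actual` deviates from `expected` by at most `tolerance`."""
    if expected <= 0:
        raise ValueError("expected opcode count must be positive")
    # Relative deviation of the measured count from the requested count.
    deviation = abs(actual - expected) / expected
    return deviation <= tolerance


# Hypothetical numbers: a test that targets 1000 executions of one opcode.
print(verify_opcode_count(1000, 1040))  # 4% off, inside the tolerance
print(verify_opcode_count(1000, 1100))  # 10% off, outside the tolerance
```

A relative tolerance keeps the check meaningful across small and large target counts; an absolute threshold would be a different design choice.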

Based on our discussion during the sync-up call about the verification mechanism, specifically whether to remove the target_opcode field: I would suggest not removing it for now. This feature depends on it, and it is very helpful for showing coverage to others without manually updating the Notion page.

🔗 Related Issues or PRs

Issue #1835

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx tox -e static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

codecov · 2025-12-09T09:23:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (forks/amsterdam@8b089c9). Learn more about missing BASE report.
⚠️ Report is 6 commits behind head on forks/amsterdam.

Additional details and impacted files

@@                Coverage Diff                 @@
##             forks/amsterdam    #1869   +/-   ##
==================================================
  Coverage                   ?   86.33%           
==================================================
  Files                      ?      538           
  Lines                      ?    34557           
  Branches                   ?     3222           
==================================================
  Hits                       ?    29835           
  Misses                     ?     4148           
  Partials                   ?      574

Flag	Coverage Δ
unittests	`86.33% <ø> (?)`

spencer-tb

Awesome work on implementing this, nice solution to a tricky problem!

Added some questions below. I think we should rebase once #1790 is merged, as there will be some conflicts. :D

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py

packages/testing/src/execution_testing/specs/benchmark.py

spencer-tb

Approved from my end (as going ooo)! :D

LouisTsai-Csie · 2025-12-15T14:15:18Z

packages/testing/src/execution_testing/cli/pytest_commands/plugins/execute/pre_alloc.py

+                # Handle both Storage objects and plain dicts
+                storage_dict = (
+                    storage.root if isinstance(storage, Storage) else storage
+                )
                logger.debug(
-                    f"Deploying storage contract for EOA {eoa} with {len(storage.root)} storage slots"
+                    f"Deploying storage contract for EOA {eoa} with {len(storage_dict)} storage slots"


This fixes a benchmark scenario that was failing due to a typing issue.

I wonder if instead of manual coercion / handling here, we can make use of pydantic more directly. It wasn't immediately obvious (and I couldn't find) what test was giving issues here but it would be really nice if this worked:

@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def fund_eoa(
    self,
    ...
) -> EOA:
    ...

This validates all the inputs against the models they are declared as, with the same type coercion that pydantic applies on model instantiation. I wouldn't spend a ton of time on it, but it would be nice to use pydantic for all of its bells and whistles if we can.

Do you know what test was giving issues here?
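
A minimal sketch of the `validate_call` coercion suggested above. It assumes `Storage` is a pydantic `RootModel` (the diff only shows that it has a `.root` attribute); the `Storage` stand-in and the `count_slots` function are hypothetical and exist only for this illustration.

```python
from typing import Dict

from pydantic import ConfigDict, RootModel, validate_call


class Storage(RootModel[Dict[int, int]]):
    """Stand-in for the real Storage type (assumed to be a RootModel)."""


@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def count_slots(storage: Storage) -> int:
    # By the time the body runs, pydantic has validated the argument,
    # so a plain dict has already been turned into a Storage instance.
    return len(storage.root)


print(count_slots(Storage({1: 2, 3: 4})))  # Storage instance is accepted as-is
print(count_slots({1: 2}))  # plain dict is coerced to Storage first
```

With this approach the function body never needs an `isinstance` branch, at the cost of running validation on every call.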

I really like this idea, although this would basically touch every single test created, so perhaps for a follow up PR.

What I would fix in this PR though is to make it explicit that this can receive a Dict in that argument.
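
One way to make the accepted argument types explicit, sketched with the standard library only. The `Storage` class here is a simplified stand-in and `slot_count` is a hypothetical helper; neither is taken from the PR.

```python
from typing import Dict, Union


class Storage:
    """Simplified stand-in for the real Storage type; only `.root` is modelled."""

    def __init__(self, root: Dict[int, int]) -> None:
        self.root = root


def slot_count(storage: Union[Storage, Dict[int, int]]) -> int:
    # The annotation documents that both forms are accepted; the body
    # normalises them before use, mirroring the isinstance check in the diff.
    storage_dict = storage.root if isinstance(storage, Storage) else storage
    return len(storage_dict)


print(slot_count(Storage({1: 2})))
print(slot_count({1: 2, 3: 4}))
```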

This breaks the test_ext_account_query_warm benchmark: its behavior differs between fill and execute remote. It now passes during fill but fails under execute remote, so our CI cannot catch the bug!

packages/testing/src/execution_testing/specs/benchmark.py

fselmo

From an implementation standpoint, this looks good on my end. I tested the tolerance and the counts, CALL vs STATICCALL, and it looks like it works as intended. The review looks good from an outsider's perspective; I just left some things to think about and some things for me to understand a bit better, as I'm not so used to this flow yet.

I will take a look again tomorrow with fresh eyes. Really nice work 👌🏼

fselmo · 2025-12-16T03:47:10Z

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py

tests/benchmark/compute/instruction/test_arithmetic.py

packages/testing/src/execution_testing/specs/benchmark.py

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py

marioevz · 2025-12-17T20:41:16Z

Rebased to latest forks/amsterdam to make review easier.

Co-authored-by: felipe <fselmo2@gmail.com>

marioevz · 2025-12-18T00:19:09Z

Created a PR on top of this to use pydantic: LouisTsai-Csie#1

Other than that it looks good to me! @LouisTsai-Csie

Opcode count verification

danceratopz

This is looking good. A few very minor nits below!

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py

packages/testing/src/execution_testing/cli/benchmark_parser.py

danceratopz

Thanks @LouisTsai-Csie! One more minor nit!

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py

danceratopz

Thanks @LouisTsai-Csie! LGTM!

LouisTsai-Csie self-assigned this Dec 9, 2025

LouisTsai-Csie added the A-test-benchmark Area: execution_testing.benchmark and tests/benchmark label Dec 9, 2025

LouisTsai-Csie marked this pull request as ready for review December 9, 2025 08:15

LouisTsai-Csie added C-feat Category: an improvement or new feature P-high labels Dec 9, 2025

LouisTsai-Csie requested a review from spencer-tb December 9, 2025 08:51

spencer-tb reviewed Dec 9, 2025

LouisTsai-Csie mentioned this pull request Dec 9, 2025

Gas Lighting Committee #8, Dec 9, 2025 ethpandaops/gas-lighting-tracker#21

Open

LouisTsai-Csie force-pushed the opcode-count-verification branch from 4aef34b to 81ee8d7 Compare December 10, 2025 15:18

LouisTsai-Csie mentioned this pull request Dec 11, 2025

feat(testing-cli): implement benchmark coverage script #1898

Open

8 tasks

spencer-tb approved these changes Dec 11, 2025

LouisTsai-Csie commented Dec 15, 2025

packages/testing/src/execution_testing/specs/benchmark.py (resolved)

LouisTsai-Csie requested review from fselmo and marioevz December 15, 2025 14:31

LouisTsai-Csie mentioned this pull request Dec 16, 2025

feat(benchmarks): fix benchmark for SELFDESTRUCT of created accounts for Osaka #1906

Merged

3 tasks

fselmo reviewed Dec 16, 2025

SamWilsn changed the base branch from forks/osaka to forks/amsterdam December 16, 2025 21:35

LouisTsai-Csie added 11 commits December 17, 2025 20:38

feat: implement opcode verification

a993a35

feat: label targeted opcode

3614cae

chore: ignore unsupported tests

4156532

refactor: update worst scenario for repricing marker

0dd11f0

chore: remove slow fixed opcount label

61e45f2

refactor: remove non benchmark test wrapper fixed opcode count feature

1cfccfc

refactor: update filter logic with detailed comment

274c30e

tests: fixed opcode count filtered logic

ec2212a

feat(tests): add target opcode for transient storage

f85d829

fix: incorrect data type

a722fdd

refactor: replace staticcall for state changing operations

caa15e9

marioevz force-pushed the opcode-count-verification branch from adaeee9 to caa15e9 Compare December 17, 2025 20:40

marioevz and others added 5 commits December 17, 2025 22:14

Apply suggestions from code review

0cb137d

Co-authored-by: felipe <fselmo2@gmail.com>

fix(test-types): Implement len for Storage

3de23ac

fix(test-execute): Improve typing

d04194e

fix: coerce type

fdd0f81

feat(benchmark): Use pydantic

6b04255

LouisTsai-Csie and others added 2 commits December 22, 2025 11:36

fix: typing issue

641079f

Merge pull request #1 from marioevz/opcode-count-verification

6c9f48d

Opcode count verification

danceratopz reviewed Dec 29, 2025

refactor: update structure

e872e00

danceratopz reviewed Dec 29, 2025

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py (outdated, resolved)

packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py (outdated, resolved)

refactor: update user warning

aeae308

danceratopz approved these changes Dec 29, 2025

danceratopz merged commit 81276a0 into ethereum:forks/amsterdam Dec 29, 2025
14 checks passed

fselmo added a commit to fselmo/execution-specs that referenced this pull request Jan 5, 2026

feat(test-benchmark): implement opcode count verification (ethereum#1869)

18b9d9f

Co-authored-by: Mario Vega <marioevz@gmail.com>
Co-authored-by: felipe <fselmo2@gmail.com>
