Skip to content

Conversation

@LouisTsai-Csie
Copy link
Collaborator

@LouisTsai-Csie LouisTsai-Csie commented Dec 9, 2025

🗒️ Description

There are several enhancements in this PR:

  • Implement opcode-count verification for --fixed-opcode-count mode using the opcount field provided when generating tests with evmone with 5% deviation.

  • Re-label the worst-case scenarios for the repricing marker based on the Grafana dashboard data and this script that sort the cases based on the MGas result.

    • The repricing marker under tests/benchmark/compute/instruction/ are udpated.

Based on our discussion about the verification mechanism during sync up call, specifically whether to remove the target_opcode field. I would suggest not removing it for now, since this feature depends on it, which is very helpful for me to show coverage to others, without manually update the Notion page.

🔗 Related Issues or PRs

Issue #1835

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

Gemini_Generated_Image_d360w8d360w8d360

@LouisTsai-Csie LouisTsai-Csie self-assigned this Dec 9, 2025
@LouisTsai-Csie LouisTsai-Csie added the A-test-benchmark Area: execution_testing.benchmark and tests/benchmark label Dec 9, 2025
@LouisTsai-Csie LouisTsai-Csie marked this pull request as ready for review December 9, 2025 08:15
@LouisTsai-Csie LouisTsai-Csie added C-feat Category: an improvement or new feature P-high labels Dec 9, 2025
@codecov
Copy link

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (forks/amsterdam@8b089c9). Learn more about missing BASE report.
⚠️ Report is 6 commits behind head on forks/amsterdam.

Additional details and impacted files
@@                Coverage Diff                 @@
##             forks/amsterdam    #1869   +/-   ##
==================================================
  Coverage                   ?   86.33%           
==================================================
  Files                      ?      538           
  Lines                      ?    34557           
  Branches                   ?     3222           
==================================================
  Hits                       ?    29835           
  Misses                     ?     4148           
  Partials                   ?      574           
Flag Coverage Δ
unittests 86.33% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@spencer-tb spencer-tb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work on implementing this, nice solution to a tricky problem!

Added some questions below. I think we should rebase once #1790 is merged, as there will be some conflicts. :D

Copy link
Contributor

@spencer-tb spencer-tb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved from my end (as going ooo)! :D

Comment on lines 435 to 440
# Handle both Storage objects and plain dicts
storage_dict = (
storage.root if isinstance(storage, Storage) else storage
)
logger.debug(
f"Deploying storage contract for EOA {eoa} with {len(storage.root)} storage slots"
f"Deploying storage contract for EOA {eoa} with {len(storage_dict)} storage slots"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solving failing benchmark scenario due to typing issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if instead of manual coercion / handling here, we can make use of pydantic more directly. It wasn't immediately obvious (and I couldn't find) what test was giving issues here but it would be really nice if this worked:

@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def fund_eoa(
    self,
    ...
) -> EOA:
    ...

What this does is it will validate all the inputs as the models that they are defined, with the same type coercion that pydantic gives to model instantiation. I wouldn't spend a ton of time on it but it would be nice to use pydantic for all of it's bells and whistles if we can.

Do you know what test was giving issues here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this idea, although this would basically touch every single test created, so perhaps for a follow up PR.

What I would fix in this PR though is to make it explicit that this can receive a Dict in that argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the test_ext_account_query_warm benchmark, its behavior is different under fill and execute remote. Now it passes during fill but fails when using execute remote, so now our CI cannot catch the bug!

Copy link
Contributor

@fselmo fselmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an implementation standpoint, this looks good on my end. I tested the tolerance and the counts, CALL vs STATICCALL, looks like it works as intended. The review looks good from an outsider's perspective, I just left some things to think about and some things for me to understand a bit better as I'm not so used to this flow yet.

I will take a look again tomorrow with fresh eyes. Really nice work 👌🏼

Comment on lines 435 to 440
# Handle both Storage objects and plain dicts
storage_dict = (
storage.root if isinstance(storage, Storage) else storage
)
logger.debug(
f"Deploying storage contract for EOA {eoa} with {len(storage.root)} storage slots"
f"Deploying storage contract for EOA {eoa} with {len(storage_dict)} storage slots"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if instead of manual coercion / handling here, we can make use of pydantic more directly. It wasn't immediately obvious (and I couldn't find) what test was giving issues here but it would be really nice if this worked:

@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def fund_eoa(
    self,
    ...
) -> EOA:
    ...

What this does is it will validate all the inputs as the models that they are defined, with the same type coercion that pydantic gives to model instantiation. I wouldn't spend a ton of time on it but it would be nice to use pydantic for all of it's bells and whistles if we can.

Do you know what test was giving issues here?

@SamWilsn SamWilsn changed the base branch from forks/osaka to forks/amsterdam December 16, 2025 21:35
@marioevz marioevz force-pushed the opcode-count-verification branch from adaeee9 to caa15e9 Compare December 17, 2025 20:40
@marioevz
Copy link
Member

Rebased to latest forks/amsterdam to make review easier.

@marioevz
Copy link
Member

Created a PR on top of this to use pydantic: LouisTsai-Csie#1

Other than that it looks good to me! @LouisTsai-Csie

Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. A few very minor nits below!

Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LouisTsai-Csie! One more minor nit!

Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LouisTsai-Csie! LGTM!

@danceratopz danceratopz merged commit 81276a0 into ethereum:forks/amsterdam Dec 29, 2025
14 checks passed
fselmo added a commit to fselmo/execution-specs that referenced this pull request Jan 5, 2026
)

Co-authored-by: Mario Vega <marioevz@gmail.com>
Co-authored-by: felipe <fselmo2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-test-benchmark Area: execution_testing.benchmark and tests/benchmark C-feat Category: an improvement or new feature P-high

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants