feat(benchmarks): fix benchmark for SELFDESTRUCT of created accounts for Osaka by jsign · Pull Request #1906 · ethereum/execution-specs

jsign · 2025-12-11T22:34:37Z

🗒️ Description

This PR fixes the test_selfdestruct_created benchmark to work in Osaka.

Filling in this PR for Osaka and Prague for 45M target:

Osaka:
- test_system.py::test_selfdestruct_created[fork_Osaka-blockchain_test-value_bearing_False] 36M cycles
- test_system.py::test_selfdestruct_created[fork_Osaka-blockchain_test-value_bearing_True] 36M cycles
Prague:
- test_system.py::test_selfdestruct_created[fork_Prague-blockchain_test-value_bearing_False] 37M cycles
- test_system.py::test_selfdestruct_created[fork_Prague-blockchain_test-value_bearing_True] 36M cycles

Although each fork is coherent with the other, felt quite low number of cycles to me. So I debugged in Geth further and got convinced all is good, since saw all the SELFDESTRUCT instructions being executed.

Then remembered two things:

We already have a post-state check of storage slot 0 having the value 42, to check that the attack loop ended fine without unexpected EVM errors. So that is already quite good.
Found the original PR that created this benchmark, and indeed the used gas was ~35M cycles. So that's good too. (This is why I filled with 45M above to make them comparable).

🔗 Related Issues or PRs

#1695

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx tox -e static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Set appropriate labels for the changes (only maintainers can apply labels).

jochem-brouwer · 2025-12-11T22:39:06Z

Maybe idea to help quickly debug these (also for sanity check, for instance here check if SELFDESTRUCT is ran a lot of times), use the opcode counter report? #1353

jochem-brouwer

Some comments 😄 👍

jsign · 2025-12-11T23:47:25Z

Maybe idea to help quickly debug these (also for sanity check, for instance here check if SELFDESTRUCT is ran a lot of times), use the opcode counter report? #1353

@jochem-brouwer , Oh, interesting.

Trying this this way:

uv run fill --from=Osaka --until=Osaka --evm-bin=/data/code-data/go-ethereum/evm -m "benchmark and blockchain_test" --block-gas-limit 45000000 -k "test_selfdestruct_created" --evm-dump-dir=dumpy (I'm filling with Geth).
cat /data/code-data/execution-specs/dumpy/benchmark__compute__instruction__system__test_selfdestruct_created/fork_Osaka_blockchain_test_value_bearing_False/0/stdin.txt | uv run ethereum-spec-evm t8n \ --input.alloc=stdin --input.txs=stdin --input.env=stdin \ --output.result=stdout --output.alloc=stdout --output.body=stdout \ --state.fork=Osaka --state.chainid=1 --state.reward=0 \ --opcode.count stdout

But I see:

"opcodeCount": {
        "CALLER": 1,
        "PUSH20": 1,
        "EQ": 4,
        "PUSH1": 11,
        "JUMPI": 6,
        "JUMPDEST": 8,
        "SLOAD": 4,
        "DUP1": 4,
        "DUP3": 3,
        "SUB": 1,
        "GT": 2,
        "PUSH0": 8,
        "DUP2": 2,
        "PUSH2": 5,
        "SWAP2": 1,
        "ADD": 2,
        "SWAP3": 1,
        "SWAP1": 1,
        "POP": 3,
        "SSTORE": 4,
        "PUSH32": 1,
        "ISZERO": 1,
        "JUMP": 1,
        "MUL": 1,
        "RETURN": 1
    }

This doesn't make much sense to me, since there is no SELFDESTRUCT there.
But if I debug with t8n in Geth with the same stdin.txt, I definitely see breakpoints in the SELFDESTRUCT instruction handler. (And clearly the post-state check is passing).

Feels I'm misusing the opcode counter or might have a bug?

jochem-brouwer

Some more comments 😄 👍

jochem-brouwer · 2025-12-12T00:30:12Z

RE: opcode counter.

This seems to report the wrong block? It's just guessing here, I'd assume Geth is reporting on the wrong block. One of my hunches could be the system contracts (beacon root storing / blockhash storing), but CALLER is only applied once so that does not seem to be the case 🤔

codecov · 2025-12-12T01:03:57Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.26%. Comparing base (9e161b6) to head (58e2c0d).
⚠️ Report is 13 commits behind head on forks/amsterdam.

Additional details and impacted files

@@                 Coverage Diff                 @@
##           forks/amsterdam    #1906      +/-   ##
===================================================
+ Coverage            82.61%   86.26%   +3.64%     
===================================================
  Files                  417      538     +121     
  Lines                26862    34557    +7695     
  Branches              2492     3222     +730     
===================================================
+ Hits                 22193    29809    +7616     
+ Misses                4230     4161      -69     
- Partials               439      587     +148

Flag	Coverage Δ
unittests	`86.26% <ø> (+3.64%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

jochem-brouwer

LGTM

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

LouisTsai-Csie

Thanks @jsign and @jochem-brouwer , i add some comments for the root cause of the failing benchmark CI, there is some gas mismatch. After applying the suggested changes it should work as expected for both fill and execute!

LouisTsai-Csie · 2025-12-16T04:08:31Z

@jsign @jochem-brouwer

Regarding the opcode count feature, currently, when filling the tests with evmone, the opcode counts are recorded in the fixture files, please check the _info field in the corresponding files.

Given this, do you think we still need the opcode-count sanity check here? Personally, I think:

The gas-usage check is sufficient for this scenario.
Most of benchmark cases also cover the fixed-opcode-count tests, where the sanity check could be performed instead (Please check PR feat(test-benchmark): implement opcode count verification #1869 )

That said, if you feel additional checks are still necessary, I’m happy to explore further options here.

Co-authored-by: 蔡佳誠 Louis Tsai <72684086+LouisTsai-Csie@users.noreply.github.com>

jsign · 2025-12-16T12:06:38Z

@jsign @jochem-brouwer

Regarding the opcode count feature, currently, when filling the tests with evmone, the opcode counts are recorded in the fixture files, please check the _info field in the corresponding files.

Given this, do you think we still need the opcode-count sanity check here? Personally, I think:

The gas-usage check is sufficient for this scenario.

Most of benchmark cases also cover the fixed-opcode-count tests, where the sanity check could be performed instead (Please check PR feat(test-benchmark): implement opcode count verification #1869 )

That said, if you feel additional checks are still necessary, I’m happy to explore further options here.

I think as is it should be okay -- eventually we can keep adding checks if we feel they can provide extra safety checks.

jsign · 2025-12-16T12:08:18Z

@LouisTsai-Csie, applied some extra changes reg the last review -- maybe you want to do a final check before merging!

LouisTsai-Csie

One more minor suggestion to fix the failing CI!

jsign · 2025-12-17T11:33:09Z

One more minor suggestion to fix the failing CI!

Ah, I think the comment might get lost?
Let me check.

Edit: found the problem, pushed the fix.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

LouisTsai-Csie

Sorry, i indeed missed to submit the suggested changes...

LGTM, thanks a lot

…for Osaka (ethereum#1906)

jsign changed the base branch from forks/amsterdam to forks/osaka December 11, 2025 22:34

jsign commented Dec 11, 2025

View reviewed changes

Comment thread tests/benchmark/compute/instruction/test_system.py

Comment thread tests/benchmark/compute/instruction/test_system.py Outdated

jsign requested a review from LouisTsai-Csie December 11, 2025 22:38

jsign marked this pull request as ready for review December 11, 2025 22:38

jochem-brouwer reviewed Dec 11, 2025

View reviewed changes

Comment thread tests/benchmark/compute/instruction/test_system.py

Comment thread tests/benchmark/compute/instruction/test_system.py Outdated

jsign requested a review from jochem-brouwer December 11, 2025 23:44

jochem-brouwer reviewed Dec 11, 2025

View reviewed changes

Comment thread tests/benchmark/compute/instruction/test_system.py

Comment thread tests/benchmark/compute/instruction/test_system.py Outdated

Comment thread tests/benchmark/compute/instruction/test_system.py

jochem-brouwer approved these changes Dec 12, 2025

View reviewed changes

jsign mentioned this pull request Dec 12, 2025

feat(benchmarks): fix benchmark for SELFDESTRUCT of initcode contracts for Osaka #1914

Merged

3 tasks

LouisTsai-Csie mentioned this pull request Dec 15, 2025

Refactor benchmark test for Osaka support #1695

Closed

16 tasks

jsign added 9 commits December 15, 2025 16:40

bench: fix test_selfdestruct_created for osaka

cb75e1a

revert

0fcb5b7

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

feedback

8eae7b0

feedback: better post-state check

cb4c8c1

lints

b964707

cleanup

777f594

lint

d02c2da

fix suffix cost

19d0cfb

use tx_gas_limit

78c214b

jsign force-pushed the jsign-fix-selfdestruct-created-osaka branch from 7c1e35b to 78c214b Compare December 15, 2025 19:41

jsign changed the base branch from forks/osaka to forks/amsterdam December 15, 2025 19:41

LouisTsai-Csie requested changes Dec 16, 2025

View reviewed changes

Comment thread tests/benchmark/compute/instruction/test_system.py

Comment thread tests/benchmark/compute/instruction/test_system.py

Comment thread tests/benchmark/compute/instruction/test_system.py

Comment thread tests/benchmark/compute/instruction/test_system.py Outdated

jsign and others added 2 commits December 16, 2025 08:45

Apply suggestions from code review

4661f7f

Co-authored-by: 蔡佳誠 Louis Tsai <72684086+LouisTsai-Csie@users.noreply.github.com>

adjustments and lints

ea37794

jsign requested a review from LouisTsai-Csie December 16, 2025 12:06

LouisTsai-Csie requested changes Dec 17, 2025

View reviewed changes

fix

58e2c0d

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

jsign requested a review from LouisTsai-Csie December 18, 2025 18:08

LouisTsai-Csie approved these changes Dec 19, 2025

View reviewed changes

LouisTsai-Csie merged commit dca4de0 into ethereum:forks/amsterdam Dec 19, 2025
11 checks passed

jsign deleted the jsign-fix-selfdestruct-created-osaka branch December 19, 2025 11:46

fselmo pushed a commit to fselmo/execution-specs that referenced this pull request Jan 5, 2026

feat(benchmarks): fix benchmark for SELFDESTRUCT of created accounts …

e4c65a0

…for Osaka (ethereum#1906)

CPerezz pushed a commit to CPerezz/execution-specs that referenced this pull request Feb 27, 2026

feat(benchmarks): fix benchmark for SELFDESTRUCT of created accounts …

1da44ab

…for Osaka (ethereum#1906)

Conversation

jsign commented Dec 11, 2025

🗒️ Description

🔗 Related Issues or PRs

✅ Checklist

Uh oh!

Uh oh!

Uh oh!

jochem-brouwer commented Dec 11, 2025

Uh oh!

jochem-brouwer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jsign commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jochem-brouwer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jochem-brouwer commented Dec 12, 2025

Uh oh!

codecov Bot commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jochem-brouwer left a comment

Choose a reason for hiding this comment

Uh oh!

LouisTsai-Csie left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

LouisTsai-Csie commented Dec 16, 2025

Uh oh!

jsign commented Dec 16, 2025

Uh oh!

jsign commented Dec 16, 2025

Uh oh!

LouisTsai-Csie left a comment

Choose a reason for hiding this comment

Uh oh!

jsign commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LouisTsai-Csie left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jsign commented Dec 11, 2025 •

edited

Loading

codecov Bot commented Dec 12, 2025 •

edited

Loading

jsign commented Dec 17, 2025 •

edited

Loading