-
Notifications
You must be signed in to change notification settings - Fork 404
feat(test-benchmark): implement opcode count verification #1869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(test-benchmark): implement opcode count verification #1869
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #1869 +/- ##
==================================================
Coverage ? 86.33%
==================================================
Files ? 538
Lines ? 34557
Branches ? 3222
==================================================
Hits ? 29835
Misses ? 4148
Partials ? 574
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
spencer-tb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work on implementing this, nice solution to a tricky problem!
Added some questions below. I think we should rebase once #1790 is merged, as there will be some conflicts. :D
packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py
Outdated
Show resolved
Hide resolved
packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py
Show resolved
Hide resolved
4aef34b to
81ee8d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved from my end (as going ooo)! :D
| # Handle both Storage objects and plain dicts | ||
| storage_dict = ( | ||
| storage.root if isinstance(storage, Storage) else storage | ||
| ) | ||
| logger.debug( | ||
| f"Deploying storage contract for EOA {eoa} with {len(storage.root)} storage slots" | ||
| f"Deploying storage contract for EOA {eoa} with {len(storage_dict)} storage slots" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solving failing benchmark scenario due to typing issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if instead of manual coercion / handling here, we can make use of pydantic more directly. It wasn't immediately obvious (and I couldn't find) what test was giving issues here but it would be really nice if this worked:
@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def fund_eoa(
self,
...
) -> EOA:
...What this does is it will validate all the inputs as the models that they are defined, with the same type coercion that pydantic gives to model instantiation. I wouldn't spend a ton of time on it but it would be nice to use pydantic for all of it's bells and whistles if we can.
Do you know what test was giving issues here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this idea, although this would basically touch every single test created, so perhaps for a follow up PR.
What I would fix in this PR though is to make it explicit that this can receive a Dict in that argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks the test_ext_account_query_warm benchmark, its behavior is different under fill and execute remote. Now it passes during fill but fails when using execute remote, so now our CI cannot catch the bug!
fselmo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an implementation standpoint, this looks good on my end. I tested the tolerance and the counts, CALL vs STATICCALL, looks like it works as intended. The review looks good from an outsider's perspective, I just left some things to think about and some things for me to understand a bit better as I'm not so used to this flow yet.
I will take a look again tomorrow with fresh eyes. Really nice work 👌🏼
| # Handle both Storage objects and plain dicts | ||
| storage_dict = ( | ||
| storage.root if isinstance(storage, Storage) else storage | ||
| ) | ||
| logger.debug( | ||
| f"Deploying storage contract for EOA {eoa} with {len(storage.root)} storage slots" | ||
| f"Deploying storage contract for EOA {eoa} with {len(storage_dict)} storage slots" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if instead of manual coercion / handling here, we can make use of pydantic more directly. It wasn't immediately obvious (and I couldn't find) what test was giving issues here but it would be really nice if this worked:
@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def fund_eoa(
self,
...
) -> EOA:
...What this does is it will validate all the inputs as the models that they are defined, with the same type coercion that pydantic gives to model instantiation. I wouldn't spend a ton of time on it but it would be nice to use pydantic for all of it's bells and whistles if we can.
Do you know what test was giving issues here?
packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py
Outdated
Show resolved
Hide resolved
packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py
Show resolved
Hide resolved
adaeee9 to
caa15e9
Compare
|
Rebased to latest |
Co-authored-by: felipe <fselmo2@gmail.com>
|
Created a PR on top of this to use pydantic: LouisTsai-Csie#1 Other than that it looks good to me! @LouisTsai-Csie |
Opcode count verification
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. A few very minor nits below!
packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py
Outdated
Show resolved
Hide resolved
packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py
Outdated
Show resolved
Hide resolved
packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py
Show resolved
Hide resolved
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @LouisTsai-Csie! One more minor nit!
packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py
Outdated
Show resolved
Hide resolved
packages/testing/src/execution_testing/cli/pytest_commands/plugins/shared/benchmarking.py
Outdated
Show resolved
Hide resolved
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @LouisTsai-Csie! LGTM!
🗒️ Description
There are several enhancements in this PR:
Implement opcode-count verification for
--fixed-opcode-countmode using the opcount field provided when generating tests withevmonewith 5% deviation.Re-label the worst-case scenarios for the repricing marker based on the Grafana dashboard data and this script that sort the cases based on the MGas result.
tests/benchmark/compute/instruction/are udpated.Based on our discussion about the verification mechanism during sync up call, specifically whether to remove the
target_opcodefield. I would suggest not removing it for now, since this feature depends on it, which is very helpful for me to show coverage to others, without manually update the Notion page.🔗 Related Issues or PRs
Issue #1835
✅ Checklist
toxchecks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx tox -e statictype(scope):.mkdocs servelocally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.@ported_frommarker.Cute Animal Picture