
Conversation

@elle-j (Contributor) commented Nov 26, 2025

Description

Adds a workflow executing the benchmarks in resolc and revive-yul.

  • The workflow can be manually triggered via the Actions UI.
  • User input:
    • Branch, tag, or SHA before changes (e.g. main) (required)
    • Branch, tag, or SHA after changes (e.g. lj/ci-benchmarks) (required)
    • PR number (optional)
      • If provided, the benchmarking result will be posted as a PR comment
        • The comment is updated if it exists, otherwise it is created
      • If omitted, the result is uploaded as an artifact

This PR originally implemented the workflow to run automatically on each PR once its tests passed. Since we likely do not need to run this workflow for every PR, it has been rewritten to be triggered manually.

Providing user input also allows comparisons between any two commits, such as releases.
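
For illustration, a workflow_dispatch trigger along these lines would expose the inputs described above (a sketch only; the input names are hypothetical and may not match the actual workflow file):

on:
  workflow_dispatch:
    inputs:
      # Hypothetical input names; the workflow in this PR may name them differently.
      ref-before:
        description: "Branch, tag, or SHA before changes (e.g. main)"
        required: true
      ref-after:
        description: "Branch, tag, or SHA after changes"
        required: true
      pr-number:
        description: "PR number to post the result to (uploads an artifact if omitted)"
        required: false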

Notes

File must exist on the default branch

  • This workflow is triggered by a workflow_dispatch event only if the workflow file exists on the default branch. Therefore, the workflow cannot be run for this PR. (See section below on how it has still been tested.)
  • Once this PR is merged, changes to this workflow in subsequent PRs will be testable before merge (by choosing to run the workflow from the PR branch).

Testing the Workflow

I've tested a sufficiently similar workflow in a personal repo, where I could push it to main and trigger it. I will verify again once this PR is merged.

  • Runs when provided with branch names
  • Runs when provided with SHAs
  • A PR comment is created if a PR number is provided and the comment does not exist
  • A PR comment is updated if a PR number is provided and the comment already exists
  • An artifact is created if a PR number is omitted
  • An in-progress run is cancelled if another one is triggered comparing the same refs

Example PR Comment

Expand to see an example PR comment

Benchmarks

  • Ref Before: main
  • Ref After: lj/ci-benchmarks

Compile E2E

Results
group                                            resolc_before                            resolc_changes
-----                                            -------------                            --------------
Dependency/resolc                                1.00    144.0±6.94ms        ? ?/sec      1.00    144.6±5.97ms        ? ?/sec
Dependency/solc                                  1.04    65.9±10.73ms        ? ?/sec      1.00    63.5±10.10ms        ? ?/sec
Empty/resolc                                     1.03     68.8±4.40ms        ? ?/sec      1.00     66.6±4.61ms        ? ?/sec
Empty/solc                                       1.13     11.2±2.01ms        ? ?/sec      1.00     10.0±2.28ms        ? ?/sec
LargeDivRem/resolc                               1.00    119.3±5.62ms        ? ?/sec      1.02   121.3±17.03ms        ? ?/sec
LargeDivRem/solc                                 1.15     24.7±3.15ms        ? ?/sec      1.00     21.5±1.03ms        ? ?/sec
Memset (`--yul`)/resolc                          1.02     62.7±3.92ms        ? ?/sec      1.00     61.7±3.76ms        ? ?/sec
Memset (`--yul`)/solc                            1.00      9.0±0.84ms        ? ?/sec      1.12     10.1±2.23ms        ? ?/sec
Multiple Contracts (`--standard-json`)/resolc    1.01  1559.2±68.95ms        ? ?/sec      1.00  1539.3±64.92ms        ? ?/sec
Multiple Contracts (`--standard-json`)/solc      1.01   652.0±19.38ms        ? ?/sec      1.00   648.4±31.83ms        ? ?/sec
Return (`--yul`)/resolc                          1.00    60.1±17.02ms        ? ?/sec      1.02    61.2±14.50ms        ? ?/sec
Return (`--yul`)/solc                            1.09      8.8±2.37ms        ? ?/sec      1.00      8.0±3.08ms        ? ?/sec

Parse Yul

Results
group             yul_parse_before                         yul_parse_changes
-----             ----------------                         -----------------
Baseline/parse    1.07      8.9±3.64µs        ? ?/sec      1.00      8.4±0.09µs        ? ?/sec
ERC20/parse       1.00   162.5±34.25µs        ? ?/sec      1.07  173.3±123.25µs        ? ?/sec
SHA1/parse        1.01    80.8±33.04µs        ? ?/sec      1.00    80.3±52.82µs        ? ?/sec
Storage/parse     1.00     16.9±8.15µs        ? ?/sec      1.01     17.2±7.54µs        ? ?/sec
Transfer/parse    1.00     18.5±0.23µs        ? ?/sec      1.23    22.7±18.50µs        ? ?/sec

Lower Yul

Results
group             yul_lower_before                         yul_lower_changes
-----             ----------------                         -----------------
Baseline/lower    1.00    122.1±5.95µs        ? ?/sec      1.09   133.6±35.12µs        ? ?/sec
ERC20/lower       1.00   641.7±13.24µs        ? ?/sec      1.15  735.6±665.92µs        ? ?/sec
SHA1/lower        1.00   361.4±78.47µs        ? ?/sec      1.09  392.3±114.48µs        ? ?/sec
Storage/lower     1.00   167.2±50.34µs        ? ?/sec      1.01   168.2±32.45µs        ? ?/sec
Transfer/lower    1.00    167.5±8.70µs        ? ?/sec      1.03   173.1±16.32µs        ? ?/sec

Resolved Issues

Closes #405

@elle-j changed the title from "[WIP] ci: Add GHA workflow executing benchmarks" to "ci: Add GHA workflow executing benchmarks" on Nov 27, 2025
@elle-j elle-j marked this pull request as ready for review November 27, 2025 17:36
@elle-j elle-j requested review from kvpanch and xermicus November 27, 2025 17:36
cargo bench --package resolc --bench compile -- --save-baseline main_resolc
cargo bench --package revive-yul --bench parse -- --save-baseline main_yul_parse
cargo bench --package revive-yul --bench lower -- --save-baseline main_yul_lower
timeout-minutes: 20
@elle-j (PR author) commented:

We could also remove the timeout-minutes and add it only if we notice unusually long runs. (This took about 6-7min on M4 Pro.)

Member commented:

I think we should narrow down the workflow trigger. Thinking about it, this will generate a lot of CI hours, and I don't think it's necessary to run this workflow on every PR. Is there a simple way to configure it to be manually triggered by the PR author?

@elle-j (PR author) commented Jan 6, 2026:

I agree; I had the same thought after signing off. I've rewritten the workflow to be manually triggered via the Actions UI instead. (See the updated PR description.)

@kvpanch (Contributor) left a comment:

Is it known whether the CI machine is suitable for this kind of perf testing? For example, can the CI machine run several tasks in parallel? If so, the perf results might not be stable, and it might be necessary to run such perf testing on a dedicated machine.

Comment on lines 46 to 47
git fetch origin main
git checkout -f main
Contributor commented:

IMO, it makes sense to parameterize this by allowing a branch or tag to test against to be specified. That will help to nicely measure improvements/regressions over a given release.
Obviously, that can be done later, not in this PR.

@elle-j (PR author) commented Dec 2, 2025:

That would be a neat and valuable feature, thanks. I can add it in a follow-up 👍

Regarding your other comment and concurrency: jobs and workflows (not steps within a job) can run concurrently by default. Thanks for the reminder; it prompted me to add a concurrency group so that only one run of this workflow executes at a time for a given PR branch (and any in-progress run is cancelled if the workflow is triggered again while it's running).

I'd have to look into whether another CI machine is still more optimal for perf testing, though.
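
For reference, a concurrency group of that kind generally takes the following shape (a sketch with a generic group key; the actual key in this workflow is presumably derived from the PR branch or the compared refs):

concurrency:
  # One run at a time per group; a newly triggered run cancels any in-progress one.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true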

@elle-j (PR author) commented Jan 6, 2026:

Update: Regarding parameterization, I've incorporated this into the rewrite of the workflow, which is now triggered manually on demand rather than automatically on each PR (which may not be necessary).

@xermicus (Member) left a comment:

lgtm overall


jobs:
  benchmark:
    runs-on: ubuntu-24.04
Member commented:

I'd like to use the same host we benchmark the Polkadot SDK with.

@elle-j (PR author) commented:

It seems like different runners are being used for benches in the Polkadot SDK, but two common ones are e.g. parity-large and parity-benchmark (both enabled in our revive repo). The differences can be seen here.

@elle-j (PR author) commented:

Update: Opted for parity-large in our case.
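
Relative to the snippet above, that change amounts to swapping the runner label (sketch):

jobs:
  benchmark:
    # Self-hosted Parity runner, replacing the hosted ubuntu-24.04 runner.
    runs-on: parity-large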

## Parse Yul

<details>
<summary>Results</summary>
Member commented:

Hiding the results is definitely what we want here in order to not make it too noisy. Can we also have a summary or something on top that quickly tells PR authors whether the benchmarks have regressed, improved, or show no significant change?

@elle-j (PR author) commented:

I will make a note of this for a follow-up 👍 It will likely require some parsing of critcmp's output, but it definitely improves readability.
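
For context, comparing two saved Criterion baselines with critcmp looks roughly like the following step (a sketch; the step name, baseline names, and output file are illustrative):

- name: Compare compile benchmarks
  run: |
    cargo install critcmp
    # Compares two baselines previously saved with `cargo bench -- --save-baseline <name>`.
    critcmp resolc_before resolc_changes | tee compile-results.txt

A summary step could then parse the ratio column of that output to report regressed, improved, or unchanged benchmarks at the top of the PR comment.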

elle-j added 4 commits January 6, 2026 17:47
Rather than having the long-running workflow be automatically triggered
on each PR and PR update, we now manually trigger the workflow via the
Actions UI. This also allows the user to compare benchmarks between any
branches, tags, or SHAs by providing these values as input.
It seems as though parity-large and parity-benchmark are among the runners used in the Polkadot SDK. parity-large is chosen here based on the differences described at:
https://github.com/paritytech/ci_cd/wiki/GitHub#paritytech-self-hosted-runners
@elle-j (PR author) commented Jan 6, 2026:

I've slightly rewritten the workflow to be manually triggered instead of running on each PR and on each of its updates. We can now also compare any branch name, SHA, or tag. (See the updated PR description.)

@elle-j elle-j requested review from kvpanch and xermicus January 6, 2026 18:52