
Conversation

@elle-j (Contributor) commented Nov 26, 2025

Description

Adds a workflow executing the benchmarks in resolc and revive-yul.

  • The workflow can be manually triggered via the Actions UI.
  • User input:
    • Branch, tag, or SHA before changes (e.g. main) (required)
    • Branch, tag, or SHA after changes (e.g. lj/ci-benchmarks) (required)
    • PR number (optional)
      • If provided, the benchmarking result will be posted as a PR comment
        • The comment is updated if it exists, otherwise it is created
      • If omitted, the result is uploaded as an artifact

This PR originally implemented the workflow to run automatically on each PR once its tests passed. Since we likely do not need to run this workflow for every PR, it has been rewritten to be triggered manually.

Providing user input also allows comparisons between any two commits, such as releases.
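
For illustration, a workflow_dispatch trigger along these lines would expose the inputs described above (a sketch only; the input names are hypothetical and may not match the actual workflow file):

on:
  workflow_dispatch:
    inputs:
      # Hypothetical input names; the workflow in this PR may name them differently.
      ref-before:
        description: "Branch, tag, or SHA before changes (e.g. main)"
        required: true
      ref-after:
        description: "Branch, tag, or SHA after changes"
        required: true
      pr-number:
        description: "PR number to post the result to (uploads an artifact if omitted)"
        required: false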

Notes

File must exist on the default branch

  • This workflow is triggered by a workflow_dispatch event only if the workflow file exists on the default branch. Therefore, the workflow cannot be run for this PR. (See section below on how it has still been tested.)
  • Once this PR is merged, changes to this workflow in subsequent PRs will be testable before merge (by choosing to run the workflow from the PR branch).

Testing the Workflow

I've tested a sufficiently similar workflow in a personal repo, where I could push it to main and trigger it. I will verify again once this PR is merged.

  • Runs when provided with branch names
  • Runs when provided with SHAs
  • A PR comment is created if a PR number is provided and the comment does not exist
  • A PR comment is updated if a PR number is provided and the comment already exists
  • An artifact is created if a PR number is omitted
  • An in-progress run is cancelled if another one is triggered comparing the same refs

Example PR Comment

Expand to see an example PR comment

Benchmarks

  • Ref Before: main
  • Ref After: lj/ci-benchmarks

Compile E2E

Results
group                                            resolc_before                            resolc_changes
-----                                            -------------                            --------------
Dependency/resolc                                1.00    144.0±6.94ms        ? ?/sec      1.00    144.6±5.97ms        ? ?/sec
Dependency/solc                                  1.04    65.9±10.73ms        ? ?/sec      1.00    63.5±10.10ms        ? ?/sec
Empty/resolc                                     1.03     68.8±4.40ms        ? ?/sec      1.00     66.6±4.61ms        ? ?/sec
Empty/solc                                       1.13     11.2±2.01ms        ? ?/sec      1.00     10.0±2.28ms        ? ?/sec
LargeDivRem/resolc                               1.00    119.3±5.62ms        ? ?/sec      1.02   121.3±17.03ms        ? ?/sec
LargeDivRem/solc                                 1.15     24.7±3.15ms        ? ?/sec      1.00     21.5±1.03ms        ? ?/sec
Memset (`--yul`)/resolc                          1.02     62.7±3.92ms        ? ?/sec      1.00     61.7±3.76ms        ? ?/sec
Memset (`--yul`)/solc                            1.00      9.0±0.84ms        ? ?/sec      1.12     10.1±2.23ms        ? ?/sec
Multiple Contracts (`--standard-json`)/resolc    1.01  1559.2±68.95ms        ? ?/sec      1.00  1539.3±64.92ms        ? ?/sec
Multiple Contracts (`--standard-json`)/solc      1.01   652.0±19.38ms        ? ?/sec      1.00   648.4±31.83ms        ? ?/sec
Return (`--yul`)/resolc                          1.00    60.1±17.02ms        ? ?/sec      1.02    61.2±14.50ms        ? ?/sec
Return (`--yul`)/solc                            1.09      8.8±2.37ms        ? ?/sec      1.00      8.0±3.08ms        ? ?/sec

Parse Yul

Results
group             yul_parse_before                         yul_parse_changes
-----             ----------------                         -----------------
Baseline/parse    1.07      8.9±3.64µs        ? ?/sec      1.00      8.4±0.09µs        ? ?/sec
ERC20/parse       1.00   162.5±34.25µs        ? ?/sec      1.07  173.3±123.25µs        ? ?/sec
SHA1/parse        1.01    80.8±33.04µs        ? ?/sec      1.00    80.3±52.82µs        ? ?/sec
Storage/parse     1.00     16.9±8.15µs        ? ?/sec      1.01     17.2±7.54µs        ? ?/sec
Transfer/parse    1.00     18.5±0.23µs        ? ?/sec      1.23    22.7±18.50µs        ? ?/sec

Lower Yul

Results
group             yul_lower_before                         yul_lower_changes
-----             ----------------                         -----------------
Baseline/lower    1.00    122.1±5.95µs        ? ?/sec      1.09   133.6±35.12µs        ? ?/sec
ERC20/lower       1.00   641.7±13.24µs        ? ?/sec      1.15  735.6±665.92µs        ? ?/sec
SHA1/lower        1.00   361.4±78.47µs        ? ?/sec      1.09  392.3±114.48µs        ? ?/sec
Storage/lower     1.00   167.2±50.34µs        ? ?/sec      1.01   168.2±32.45µs        ? ?/sec
Transfer/lower    1.00    167.5±8.70µs        ? ?/sec      1.03   173.1±16.32µs        ? ?/sec

Resolved Issues

Closes #405

@elle-j changed the title from "[WIP] ci: Add GHA workflow executing benchmarks" to "ci: Add GHA workflow executing benchmarks" on Nov 27, 2025
@elle-j elle-j marked this pull request as ready for review November 27, 2025 17:36
@elle-j elle-j requested review from kvpanch and xermicus November 27, 2025 17:36
cargo bench --package resolc --bench compile -- --save-baseline main_resolc
cargo bench --package revive-yul --bench parse -- --save-baseline main_yul_parse
cargo bench --package revive-yul --bench lower -- --save-baseline main_yul_lower
timeout-minutes: 20
@elle-j (PR author) commented:

We could also remove the timeout-minutes and add it only if we notice unusually long runs. (This took about 6-7min on M4 Pro.)

Member commented:

I think we should narrow down the workflow trigger. Thinking about it, this will generate a lot of CI hours, and I don't think it's necessary to run this workflow on every PR. Is there a simple way to configure it to be manually triggered by the PR author?

@elle-j (PR author) commented Jan 6, 2026:

I agree; I had the same thought after signing off. I've rewritten the workflow to be manually triggered via the Actions UI instead. (See the updated PR description.)

@kvpanch (Contributor) left a comment:

Is it known whether the CI machine is suitable for this kind of perf testing? For example, can the CI machine run several tasks in parallel? If so, the perf results might not be stable, and it might be necessary to run such perf testing on a dedicated machine.

Comment on lines 46 to 47
git fetch origin main
git checkout -f main
Contributor commented:

IMO, it makes sense to parameterize this by allowing a branch or tag to test against to be specified. That will help to nicely measure improvements/regressions over a given release.
Obviously, that can be done later, not in this PR.

@elle-j (PR author) commented Dec 2, 2025:

That would be a neat and valuable feature, thanks. I can add it in a follow-up 👍

Regarding your other comment and concurrency: jobs and workflows (not steps within a job) can run concurrently by default. Thanks for the reminder; it prompted me to add a concurrency group so that only one run of this workflow executes at a time for a given PR branch (and any in-progress run is cancelled if the workflow is triggered again while it's running).

I'd have to look into whether another CI machine is still more optimal for perf testing, though.
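
For reference, a concurrency group of that kind generally takes the following shape (a sketch with a generic group key; the actual key in this workflow is presumably derived from the PR branch or the compared refs):

concurrency:
  # One run at a time per group; a newly triggered run cancels any in-progress one.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true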

@elle-j (PR author) commented Jan 6, 2026:

Update: Regarding parameterization, I've incorporated this into the rewrite of the workflow, which is now triggered manually on demand rather than automatically on each PR (which may not be necessary).

@xermicus (Member) left a comment:

lgtm overall


jobs:
  benchmark:
    runs-on: ubuntu-24.04
Member commented:

I'd like to use the same host we benchmark the Polkadot SDK with.

@elle-j (PR author) commented:

It seems like different runners are being used for benches in the Polkadot SDK, but two common ones are e.g. parity-large and parity-benchmark (both enabled in our revive repo). The differences can be seen here.

@elle-j (PR author) commented:

Update: Opted for parity-large in our case.
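
Relative to the snippet above, that change amounts to swapping the runner label (sketch):

jobs:
  benchmark:
    # Self-hosted Parity runner, replacing the hosted ubuntu-24.04 runner.
    runs-on: parity-large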

## Parse Yul

<details>
<summary>Results</summary>
Member commented:

Hiding the results is definitely what we want here in order to not make it too noisy. Can we also have a summary or something on top that quickly tells PR authors whether the benchmarks have regressed, improved, or show no significant change?

@elle-j (PR author) commented:

I will make a note of this for a follow-up 👍 It will likely require some parsing of critcmp's output, but it definitely improves readability.
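
For context, comparing two saved Criterion baselines with critcmp looks roughly like the following step (a sketch; the step name, baseline names, and output file are illustrative):

- name: Compare compile benchmarks
  run: |
    cargo install critcmp
    # Compares two baselines previously saved with `cargo bench -- --save-baseline <name>`.
    critcmp resolc_before resolc_changes | tee compile-results.txt

A summary step could then parse the ratio column of that output to report regressed, improved, or unchanged benchmarks at the top of the PR comment.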

elle-j added 4 commits January 6, 2026 17:47
Rather than having the long-running workflow be automatically triggered
on each PR and PR update, we now manually trigger the workflow via the
Actions UI. This also allows the user to compare benchmarks between any
branches, tags, or SHAs by providing these values as input.
It seems as though parity-large and parity-benchmark are among the runners used in the Polkadot SDK. parity-large is chosen here based on the differences described at:
https://github.com/paritytech/ci_cd/wiki/GitHub#paritytech-self-hosted-runners
@elle-j (PR author) commented Jan 6, 2026:

I've slightly rewritten the workflow to be manually triggered instead of running on each PR and on each of its updates. We can now also compare any branch name, SHA, or tag. (See the updated PR description.)

@elle-j elle-j requested review from kvpanch and xermicus January 6, 2026 18:52