Motivation
As the MIR semantics grows in complexity and test coverage expands (see #964), there is no systematic way to detect performance regressions or identify bottlenecks in symbolic execution. Issue #655 shows that performance problems have already surfaced (slow `use` functions), but we lack the tooling to investigate them systematically or to catch regressions automatically.
Proposed Feature
1. Benchmark Suite
Define a curated set of `prove-rs` test cases as a benchmark suite — a subset of the existing `integration/data/prove-rs/` programs chosen to cover representative workloads (arithmetic, branching, enums, closures, iterators, etc.).
Each benchmark would record:
- Wall-clock time for `kmir prove-rs`
- Number of proof steps / rewrite rule applications (via K's `--statistics` output)
- Peak memory usage

Results are stored as a JSON artifact (e.g., `bench-results.json`) committed or uploaded as a workflow artifact for comparison.
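To make the measurements concrete, here is a minimal sketch of a benchmark harness that records wall-clock time and peak child memory, then writes the JSON artifact. It is Unix-only (it uses the `resource` module), and the function names, the JSON layout, and the stand-in command are illustrative assumptions — the real harness would invoke `kmir prove-rs` on each suite program:

```python
import json
import resource
import subprocess
import sys
import time

# Hypothetical harness sketch; the real benchmark commands would run
# `kmir prove-rs <program>` for each entry in the curated suite.
def run_benchmark(name: str, cmd: list) -> dict:
    """Run one benchmark command, recording wall-clock time and peak child RSS."""
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    # ru_maxrss is reported in KiB on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "name": name,
        "wall_clock_s": round(elapsed, 3),
        "returncode": proc.returncode,
        "peak_rss": peak,
    }

def write_results(results: list, path: str = "bench-results.json") -> None:
    """Store all benchmark records as a single JSON artifact."""
    with open(path, "w") as f:
        json.dump({"results": results}, f, indent=2)

# Example with a trivial command standing in for `kmir prove-rs <file>`:
record = run_benchmark("noop", [sys.executable, "-c", "pass"])
write_results([record])
```

Step counts from `--statistics` would be parsed out of the captured stdout and merged into the same record, so every metric for a run lives in one JSON object.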
2. Python-level Profiling via `pyk.testing.Profiler`
`pyk` already ships a cProfile-based `Profiler` class (`pyk/testing/_profiler.py`). We can integrate it into the pytest integration-test runner to generate `.prof` files for selected tests, enabling per-function timing analysis of the Python-side code (`kmir.py`, `smir.py`).
A `--profile` pytest option (or a dedicated `make profile-integration` target) would enable this on demand.
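The mechanics are plain cProfile under the hood. The sketch below shows the pattern with illustrative helper names — these are not the real `Profiler` API from `pyk/testing/_profiler.py`, which wraps cProfile in its own interface:

```python
import cProfile
import pstats

# Illustrative helpers only: pyk's Profiler class wraps cProfile similarly,
# but these function names are not its actual API.
def profile_to_file(func, prof_path, *args, **kwargs):
    """Run `func` under cProfile and dump a .prof file for later inspection."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
        profiler.dump_stats(prof_path)
    return result

def print_hotspots(prof_path, limit=10):
    """Show the most expensive functions by cumulative time."""
    pstats.Stats(prof_path).sort_stats("cumulative").print_stats(limit)

total = profile_to_file(sum, "example.prof", range(100))  # → 4950
print_hotspots("example.prof", limit=5)
```

The resulting `.prof` files can also be opened in `snakeviz` or `pstats` interactively, which is what makes per-test artifacts worth uploading.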
3. GitHub Actions Workflow: `benchmark.yml`
A new workflow triggered on:
- `push` to `master` — baseline tracking
- `workflow_dispatch` — on-demand profiling for PRs under investigation
- Optionally: `pull_request` with a label like `perf` to gate on demand
Steps:
- Build `stable-mir-json` + `kmir` (reuse the Docker setup from `test.yml`)
- Run the benchmark suite, capturing timing/step counts
- Upload `bench-results.json` as a workflow artifact
- On `master` push: compare against the previous baseline stored in a GitHub Actions cache and post a summary to the job summary (`$GITHUB_STEP_SUMMARY`)
- Optionally: use `benchmark-action/github-action-benchmark` to track trends over time and comment on PRs when a regression threshold is exceeded
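The steps above could be sketched roughly as follows; the job layout, cache key, and the `scripts/compare_bench.py` helper are assumptions for illustration, not the repository's actual setup:

```yaml
name: Benchmark
on:
  push:
    branches: [master]
  workflow_dispatch:

jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build stable-mir-json + kmir here, reusing the Docker setup from test.yml.
      - name: Restore previous baseline
        uses: actions/cache@v4
        with:
          path: baseline/bench-results.json
          key: bench-baseline-${{ github.ref_name }}
      - name: Run benchmark suite
        run: make benchmark  # writes bench-results.json
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: bench-results
          path: bench-results.json
      - name: Post summary
        if: github.event_name == 'push'
        run: |
          python scripts/compare_bench.py baseline/bench-results.json \
            bench-results.json >> "$GITHUB_STEP_SUMMARY"
```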
4. `make benchmark` Target
A local `Makefile` target for developers to run the benchmark suite locally and view a summary, mirroring CI behaviour.
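A possible shape for the target, assuming a hypothetical `benchmarks/run.py` entry point (the runner module and its flags are placeholders, not existing code):

```make
# Sketch only: benchmarks/run.py and its flags are assumptions.
.PHONY: benchmark
benchmark:
	python -m benchmarks.run --output bench-results.json --summary
```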
Acceptance Criteria
- `make benchmark` target runs locally and outputs a timing/step-count table
- `--profile` mode generates `.prof` files for pytest integration tests using `pyk.testing.Profiler`
- `benchmark.yml` GitHub Actions workflow uploads benchmark artifacts on every `master` push
- Documentation under `docs/dev/` on how to run benchmarks and interpret results

Related
- #655 — slow `use` functions investigation
- `pyk/testing/_profiler.py` — existing profiler infrastructure available in the `pyk` dependency