Motivation
As the MIR semantics grows in complexity and test coverage expands (see #964), there is no systematic way to detect performance regressions or identify bottlenecks in symbolic execution. Issue #655 shows that performance problems have already surfaced (slow `use` functions), but we lack the tooling to investigate them systematically or to catch regressions automatically.
Proposed Feature
1. Benchmark Suite
Define a curated set of `prove-rs` test cases as a benchmark suite — a subset of the existing `integration/data/prove-rs/` programs chosen to cover representative workloads (arithmetic, branching, enums, closures, iterators, etc.).
Each benchmark would record:
- Wall-clock time for `kmir prove-rs`
- Number of proof steps / rewrite rule applications (via K's `--statistics` output)
- Peak memory usage

Results are stored as a JSON artifact (e.g., `bench-results.json`) committed or uploaded as a workflow artifact for comparison.
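To make the measurements concrete, here is a minimal sketch of a benchmark harness that records wall-clock time and peak child memory, then writes the JSON artifact. It is Unix-only (it uses the `resource` module), and the function names, the JSON layout, and the stand-in command are illustrative assumptions — the real harness would invoke `kmir prove-rs` on each suite program:

```python
import json
import resource
import subprocess
import sys
import time

# Hypothetical harness sketch; the real benchmark commands would run
# `kmir prove-rs <program>` for each entry in the curated suite.
def run_benchmark(name: str, cmd: list) -> dict:
    """Run one benchmark command, recording wall-clock time and peak child RSS."""
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    # ru_maxrss is reported in KiB on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "name": name,
        "wall_clock_s": round(elapsed, 3),
        "returncode": proc.returncode,
        "peak_rss": peak,
    }

def write_results(results: list, path: str = "bench-results.json") -> None:
    """Store all benchmark records as a single JSON artifact."""
    with open(path, "w") as f:
        json.dump({"results": results}, f, indent=2)

# Example with a trivial command standing in for `kmir prove-rs <file>`:
record = run_benchmark("noop", [sys.executable, "-c", "pass"])
write_results([record])
```

Step counts from `--statistics` would be parsed out of the captured stdout and merged into the same record, so every metric for a run lives in one JSON object.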
2. Python-level Profiling via `pyk.testing.Profiler`
`pyk` already ships a cProfile-based `Profiler` class (`pyk/testing/_profiler.py`). We can integrate it into the pytest integration-test runner to generate `.prof` files for selected tests, enabling per-function timing analysis of the Python-side code (`kmir.py`, `smir.py`).
A `--profile` pytest option (or a dedicated `make profile-integration` target) would enable this on demand.
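The mechanics are plain cProfile under the hood. The sketch below shows the pattern with illustrative helper names — these are not the real `Profiler` API from `pyk/testing/_profiler.py`, which wraps cProfile in its own interface:

```python
import cProfile
import pstats

# Illustrative helpers only: pyk's Profiler class wraps cProfile similarly,
# but these function names are not its actual API.
def profile_to_file(func, prof_path, *args, **kwargs):
    """Run `func` under cProfile and dump a .prof file for later inspection."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
        profiler.dump_stats(prof_path)
    return result

def print_hotspots(prof_path, limit=10):
    """Show the most expensive functions by cumulative time."""
    pstats.Stats(prof_path).sort_stats("cumulative").print_stats(limit)

total = profile_to_file(sum, "example.prof", range(100))  # → 4950
print_hotspots("example.prof", limit=5)
```

The resulting `.prof` files can also be opened in `snakeviz` or `pstats` interactively, which is what makes per-test artifacts worth uploading.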
3. GitHub Actions Workflow: `benchmark.yml`
A new workflow triggered on:
- `push` to `master` — baseline tracking
- `workflow_dispatch` — on-demand profiling for PRs under investigation
- Optionally: `pull_request` with a label like `perf` to gate on demand
Steps:
- Build `stable-mir-json` + `kmir` (reuse the Docker setup from `test.yml`)
- Run the benchmark suite, capturing timing/step counts
- Upload `bench-results.json` as a workflow artifact
- On `master` push: compare against the previous baseline stored in a GitHub Actions cache and post a summary to the job summary (`$GITHUB_STEP_SUMMARY`)
- Optionally: use `benchmark-action/github-action-benchmark` to track trends over time and comment on PRs when a regression threshold is exceeded
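The steps above could be sketched roughly as follows; the job layout, cache key, and the `scripts/compare_bench.py` helper are assumptions for illustration, not the repository's actual setup:

```yaml
name: Benchmark
on:
  push:
    branches: [master]
  workflow_dispatch:

jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build stable-mir-json + kmir here, reusing the Docker setup from test.yml.
      - name: Restore previous baseline
        uses: actions/cache@v4
        with:
          path: baseline/bench-results.json
          key: bench-baseline-${{ github.ref_name }}
      - name: Run benchmark suite
        run: make benchmark  # writes bench-results.json
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: bench-results
          path: bench-results.json
      - name: Post summary
        if: github.event_name == 'push'
        run: |
          python scripts/compare_bench.py baseline/bench-results.json \
            bench-results.json >> "$GITHUB_STEP_SUMMARY"
```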
4. `make benchmark` Target
A local `Makefile` target for developers to run the benchmark suite locally and view a summary, mirroring CI behaviour.
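A possible shape for the target, assuming a hypothetical `benchmarks/run.py` entry point (the runner module and its flags are placeholders, not existing code):

```make
# Sketch only: benchmarks/run.py and its flags are assumptions.
.PHONY: benchmark
benchmark:
	python -m benchmarks.run --output bench-results.json --summary
```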
Acceptance Criteria
- `make benchmark` target runs locally and outputs a timing/step-count table
- `--profile` mode generates `.prof` files for pytest integration tests using `pyk.testing.Profiler`
- `benchmark.yml` GitHub Actions workflow uploads benchmark artifacts on every `master` push
- Documentation under `docs/dev/` on how to run benchmarks and interpret results

Related
- #655 — slow `use` functions investigation
- `pyk/testing/_profiler.py` — existing profiler infrastructure available in the `pyk` dependency