feat: add cairo-metrics benchmark harness + CI tracking #9401

Summary
cairo-metrics is a small CLI that currently checks wall-clock regressions against main. Results are stored in `repo_root/results.db` (git-ignored), currently SQLite, keyed by run id (defaults to the current git SHA). This enables caching and tracking results over time locally: check out builds starting from this commit, and the tool populates the db. The tool then allows comparisons based on the db results. Future work can easily extend this to a stateful DB machine (like RDS), since the DB is behind a trait.
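A minimal sketch of that trait boundary, assuming hypothetical names (`MetricsStore`, `RunId`, and the method signatures are illustrative, not this PR's actual API):

```rust
use std::collections::HashMap;
use std::error::Error;

/// Illustrative run identifier; the tool defaults it to the current git SHA.
pub type RunId = String;

/// Hypothetical storage trait: keeping the db behind a trait means the local
/// SQLite file can later be swapped for a stateful backend (e.g. RDS)
/// without touching callers.
pub trait MetricsStore {
    /// Record wall-clock results (benchmark name -> seconds) under a run id.
    fn record(
        &mut self,
        run: &RunId,
        results: &HashMap<String, f64>,
    ) -> Result<(), Box<dyn Error>>;

    /// Load previously recorded results for a run id, if any.
    fn load(&self, run: &RunId) -> Result<Option<HashMap<String, f64>>, Box<dyn Error>>;
}
```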
In CI, the results db is cached via artifacts (saves CI runtime for multiple PR runs over the same base branch), and the workflow posts a “Benchmark Comparison” PR comment. The check is not blocking; the reviewer decides.
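For illustration, the comparison behind that comment could look roughly like this (a sketch only: the function, the `threshold` parameter, and the report format are assumptions, not this PR's actual output):

```rust
use std::collections::HashMap;

/// Sketch of a regression check: compare wall-clock times of the current run
/// against the cached base-branch run, flagging benchmarks that slowed down
/// by more than `threshold` (e.g. 0.05 = 5%).
fn compare(
    base: &HashMap<String, f64>,
    current: &HashMap<String, f64>,
    threshold: f64,
) -> Vec<String> {
    let mut lines = Vec::new();
    for (name, &cur) in current {
        if let Some(&old) = base.get(name) {
            let delta = (cur - old) / old;
            let marker = if delta > threshold { "REGRESSION" } else { "ok" };
            lines.push(format!(
                "{name}: {old:.3}s -> {cur:.3}s ({:+.1}%) {marker}",
                delta * 100.0
            ));
        }
    }
    lines.sort();
    lines
}
```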
is bundled as a vendored release (not a submodule, for simplicity).
Timing runs either via the builtin engine or via hyperfine, which uses the binary. hyperfine is used by default if available (it needs to be installed with apt), otherwise the builtin. This is useful locally, both to debug the builtin engine itself (results are similar, since hyperfine cancels out shell overhead) and because hyperfine outputs useful statistical anomaly messages. But if hyperfine is a pain to maintain, it can be removed.
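A rough sketch of the fallback, assuming a hypothetical `measure` helper (`--runs` and `--version` are real hyperfine flags; everything else here is illustrative):

```rust
use std::process::Command;
use std::time::Instant;

/// Hypothetical engine selection: prefer hyperfine when installed,
/// otherwise fall back to a builtin wall-clock loop.
fn measure(cmd: &str, runs: u32) -> std::io::Result<()> {
    // A `hyperfine --version` call that succeeds is a cheap availability probe.
    let have_hyperfine = Command::new("hyperfine")
        .arg("--version")
        .output()
        .map(|o| o.status.success())
        .unwrap_or(false);

    if have_hyperfine {
        // hyperfine runs the command itself and prints statistical
        // outlier warnings, one reason to keep it as the default.
        Command::new("hyperfine")
            .arg("--runs")
            .arg(runs.to_string())
            .arg(cmd)
            .status()?;
    } else {
        // Builtin engine: naive wall-clock timing of the same command.
        for _ in 0..runs {
            let start = Instant::now();
            Command::new("sh").args(["-c", cmd]).status()?;
            println!("{:.3}s", start.elapsed().as_secs_f64());
        }
    }
    Ok(())
}
```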
Type of change
Why is this change needed?
Implements a benchmarking harness for detecting and tracking performance regressions.
What was the behavior or documentation before?
What is the behavior or documentation after?
Related issue or discussion (if any)
Additional context