Conversation

@acgetchell (Owner) commented Dec 15, 2025

Adds a comprehensive benchmark suite comparing la-stack against nalgebra for various linear algebra operations across different dimensions. Includes tooling to automatically generate plots and update the README with benchmark results. This facilitates ongoing performance monitoring and optimization.

Summary by CodeRabbit

  • New Features

    • Added Python tooling (lint, type, security, test targets), a CLI to generate benchmark CSVs and SVG plots, and new helper targets for benchmarking/plotting.
  • Documentation

    • Added benchmark visuals/tables, an "Anti‑goals" section, and usage docs for the new scripts.
  • Benchmarks

    • Expanded suite to compare multiple libraries across many matrix dimensions.
  • Tests

    • Added unit tests for the plotting/README-update tool and CI checks validating Python security config.
  • Chores

    • Updated CI workflows, packaging metadata, ignore rules, and project tooling configs for Python support.

@coderabbitai bot commented Dec 15, 2025

Walkthrough

Adds Python tooling, packaging, scripts, tests, GitHub/Codacy checks, and Just targets; expands benchmarks to multi-dimension comparisons including faer; updates docs, ignore rules, and small doctest edits in Rust source examples.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Codacy & CI workflows: .codacy.yml, .github/workflows/codacy.yml, .github/workflows/ci.yml | Add ruff and bandit engines and Python include/exclude paths to Codacy; add a CI step to install uv; add a workflow step that validates .codacy.yml contains bandit. |
| Python project & Just targets: pyproject.toml, .python-version, justfile | Add a pyproject for scripts (console script criterion-dim-plot), Ruff/Mypy config, and dev deps; pin Python 3.11; add Just targets for python-sync, python-lint, test-python, and bench/plot targets, and integrate them into the existing task graph. |
| Scripts, tests & docs: scripts/criterion_dim_plot.py, scripts/README.md, scripts/tests/__init__.py, scripts/tests/test_criterion_dim_plot.py | Add a CLI to aggregate Criterion estimates, produce CSV, a Markdown table, an optional README update, and SVG via gnuplot; add tests and a scripts README. |
| Benchmarks & README assets: benches/vs_nalgebra.rs, benches/vs_linalg.rs, README.md, docs/assets/bench/.gitkeep, WARP.md | Replace the single-dimension nalgebra bench with macro-driven multi-dimension benchmarks comparing la-stack, nalgebra, and faer; add an assets placeholder; update README benchmark sections and developer guidance. |
| Repo ignores & spell-check: .gitignore, cspell.json | Add Python-related ignore patterns (venv, caches, pyc, pycache, uv.lock) and expand the cspell wordlist and ignore patterns (e.g., SVG). |
| Cargo metadata & doctests: Cargo.toml, src/lu.rs, src/matrix.rs, src/vector.rs | Add a package documentation URL and dev-dependency (faer) in Cargo.toml; remove doctest allow(unused_imports) directives from example blocks (docs-only edits). |
| Miscellaneous repo files: docs/assets/bench/.gitkeep, .python-version | Add a directory placeholder and .python-version (3.11). |

Sequence Diagram(s)

sequenceDiagram
    participant User as CLI user
    participant Script as criterion_dim_plot.py
    participant FS as File system (criterion dirs / README / CSV / SVG)
    participant Gnuplot as gnuplot

    User->>Script: run CLI (metric/stat/sample/flags)
    Script->>FS: discover criterion group dirs, open estimates.json files
    FS-->>Script: estimates (or missing indicators)
    Script->>Script: compute rows, CIs, CSV content, Markdown table
    Script->>FS: write CSV
    Script->>FS: update README between markers (optional)
    alt plotting enabled
      Script->>Gnuplot: invoke with generated data and parameters
      Gnuplot-->>FS: emit SVG
    end
    FS-->>User: CSV (and README/SVG if requested)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • scripts/criterion_dim_plot.py: JSON parsing, README marker detection/ordering, CSV formatting, gnuplot command construction and error handling, CLI argument validation.
    • benches/vs_linalg.rs and benches/vs_nalgebra.rs: macro generation correctness and parity of operations across la-stack, nalgebra, faer.
    • pyproject.toml / justfile / CI workflows: console script mapping, uv install step, and integration of new Just targets into CI.

Poem

🐇
I hopped through JSON, row by row,
Counted dims where timings grow,
Plotted lines and wrote a chart,
CSV, README — done with heart.
Thump, whisker, code — a rabbit's art.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.91%, which is below the required threshold of 80.00%. | Run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Added: Benchmarks and plotting for performance analysis' directly aligns with the PR's main objective: adding comprehensive benchmarks and plotting tooling for performance monitoring. |

@codecov bot commented Dec 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.95%. Comparing base (0ad77f6) to head (cd6873f).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main       #4   +/-   ##
=======================================
  Coverage   94.95%   94.95%           
=======================================
  Files           3        3           
  Lines         119      119           
=======================================
  Hits          113      113           
  Misses          6        6           

| Flag | Coverage Δ |
| --- | --- |
| unittests | 94.95% <ø> (ø) |

Flags with carried forward coverage won't be shown.


@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
.github/workflows/ci.yml (1)

58-62: Consider pinning uv to a specific version for reproducibility.

Using "latest" may cause CI inconsistencies if a breaking uv release occurs. Other tools in this workflow (Rust, Node.js) use explicit versions for stability.

       - name: Install uv (for Python scripts and pytest)
         if: matrix.os != 'windows-latest'
         uses: astral-sh/setup-uv@ed21f2f24f8dd64503750218de024bcf64c7250a  # v7.1.5
         with:
-          version: "latest"
+          version: "0.5"  # or specific version like "0.5.11"
.gitignore (1)

8-16: Minor redundancy and missing pattern.

Line 9 (__pycache__/) is redundant since line 10 (**/__pycache__/) already covers all directories including the root. Also consider adding venv/ alongside .venv/ as both naming conventions are common.

 # Python / uv
-__pycache__/
 **/__pycache__/
 *.egg-info/
 .venv/
+venv/
 .ruff_cache/
 .pytest_cache/
 .mypy_cache/
 uv.lock
.github/workflows/codacy.yml (1)

64-75: Fragile regex for YAML validation.

The regex '^  bandit:' assumes exactly two-space indentation. If the YAML formatting changes (e.g., four spaces or tabs), the check will fail even though bandit is still configured. Consider a more flexible pattern:

-          if ! grep -qE '^  bandit:' "$config"; then
+          if ! grep -qE '^\s+bandit:' "$config"; then

Alternatively, use a YAML parser like yq for robust validation, though that adds a dependency.
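In the same spirit, a structure-aware check can be written in Python with PyYAML instead of yq. This is a sketch only: it assumes PyYAML is available in the workflow and makes no assumption about where in .codacy.yml the bandit key lives, searching the whole document instead.

import sys

import yaml

def contains_key(node: object, key: str) -> bool:
    # Recursively search nested mappings/sequences for a mapping key.
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(item, key) for item in node)
    return False

with open(".codacy.yml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

if not contains_key(config, "bandit"):
    print("bandit is not configured in .codacy.yml", file=sys.stderr)
    sys.exit(1)

This tolerates any indentation or key ordering, at the cost of a Python dependency in the workflow.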

scripts/criterion_dim_plot.py (1)

118-118: Static analysis note: consider custom exception for clearer error handling.

Ruff flags the inline long message (TRY003). For a utility script, the current approach is acceptable, but you could optionally extract to a custom exception class if this pattern grows.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ad77f6 and d7dd9b2.

⛔ Files ignored due to path filters (2)
  • docs/assets/bench/vs_nalgebra_lu_solve_median.csv is excluded by !**/*.csv
  • docs/assets/bench/vs_nalgebra_lu_solve_median.svg is excluded by !**/*.svg
📒 Files selected for processing (19)
  • .codacy.yml (6 hunks)
  • .github/workflows/ci.yml (1 hunks)
  • .github/workflows/codacy.yml (1 hunks)
  • .gitignore (1 hunks)
  • .python-version (1 hunks)
  • Cargo.toml (1 hunks)
  • README.md (2 hunks)
  • benches/vs_nalgebra.rs (1 hunks)
  • cspell.json (2 hunks)
  • docs/assets/bench/.gitkeep (1 hunks)
  • justfile (4 hunks)
  • pyproject.toml (1 hunks)
  • scripts/README.md (1 hunks)
  • scripts/criterion_dim_plot.py (1 hunks)
  • scripts/tests/__init__.py (1 hunks)
  • scripts/tests/test_criterion_dim_plot.py (1 hunks)
  • src/lu.rs (0 hunks)
  • src/matrix.rs (0 hunks)
  • src/vector.rs (0 hunks)
💤 Files with no reviewable changes (3)
  • src/lu.rs
  • src/matrix.rs
  • src/vector.rs
🧰 Additional context used
🧬 Code graph analysis (2)
benches/vs_nalgebra.rs (3)
src/matrix.rs (5)
  • from_rows (24-26)
  • lu (158-160)
  • det (178-180)
  • inf_norm (122-133)
  • default (185-187)
src/vector.rs (4)
  • new (21-23)
  • dot (82-90)
  • norm2_sq (103-105)
  • default (110-112)
src/lu.rs (2)
  • det (167-173)
  • solve_vec (108-148)
scripts/tests/test_criterion_dim_plot.py (1)
scripts/criterion_dim_plot.py (4)
  • _readme_table_markers (151-153)
  • _markdown_table (138-148)
  • _update_readme_table (156-182)
  • main (376-424)
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py

118-118: Avoid specifying long messages outside the exception class

(TRY003)


163-163: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: Codacy Security Scan
🔇 Additional comments (23)
.python-version (1)

1-1: No issues identified. The .python-version file correctly specifies Python 3.11, which is consistently declared in pyproject.toml (requires-python = ">=3.11" and python_version = "3.11").

docs/assets/bench/.gitkeep (1)

1-1: Standard .gitkeep placeholder.

This correctly preserves the docs/assets/bench/ directory for future benchmark artifacts.

scripts/tests/__init__.py (1)

1-1: LGTM! This follows standard Python conventions for package initialization.

Empty __init__.py files are the conventional marker for Python packages. This enables the scripts.tests module to be properly recognized and allows relative imports within the test suite.

Cargo.toml (1)

9-9: LGTM!

The documentation field correctly points to docs.rs, which will automatically host the generated documentation once the crate is published.

cspell.json (1)

7-69: LGTM!

The new dictionary words appropriately cover Python tooling (mypy, pytest, venv), plotting terminology (gnuplot, xlabel, ylabel, logscale), and linear algebra libraries (blas, lapack, openblas). The SVG ignore pattern is sensible for generated plot outputs.

scripts/README.md (1)

1-120: Well-documented Python tooling setup.

The README provides clear, comprehensive documentation covering setup, usage workflows, CLI examples, and gnuplot requirements. The structure follows a logical progression from setup to advanced usage.

pyproject.toml (2)

1-101: Well-structured project configuration.

The pyproject.toml is comprehensive with:

  • Clear metadata and classifiers
  • Well-documented Ruff ignores with rationale for each
  • Appropriate per-file ignores for tests
  • Sensible MyPy settings for CLI scripts

103-108: No issues found. The ruff version constraint ruff>=0.12.11 is valid and reasonable. As of December 2025, ruff has progressed to version 0.14.9, confirming that 0.12.11 exists within a normal release progression and will not cause installation failures.

justfile (3)

9-25: Clean Python tooling integration.

The _ensure-uv helper pattern with informative error messages and the cascading dependencies (python-lint: python-sync) provide a good developer experience. The lint target properly chains ruff check, format, and mypy.


49-89: Well-designed benchmark and plotting targets.

The benchmark targets provide good flexibility:

  • Optional filter parameter for selective benchmarking
  • Quick variant for rapid iteration
  • Plot targets with configurable metric/stat/sample and optional log-y scale

The boolean string comparison for log_y is the correct approach in just/bash.


176-176: Good integration of Python checks into existing workflows.

Adding python-lint to lint-code and test-python to test-all ensures the Python tooling is validated alongside Rust in CI and commit checks.

Also applies to: 222-223

.codacy.yml (2)

36-48: Appropriate Bandit configuration for CLI scripts.

The Bandit skips are reasonable for this context:

  • B101 (assert) - common in tests
  • B404, B603, B607 - subprocess-related, acceptable for dev tooling

Setting severity: high and confidence: high focuses on high-impact issues while reducing noise.


49-75: Comprehensive Python analysis integration.

The Codacy configuration properly extends existing analysis to cover Python:

  • Lizard for complexity metrics
  • Semgrep for security patterns
  • Appropriate exclusions for Python artifacts (cache, venv, etc.)
  • Language extension declarations

Also applies to: 104-143

README.md (2)

35-39: Clear positioning with anti-goals.

The anti-goals section effectively communicates the library's scope, directing users to appropriate alternatives (nalgebra for comprehensive features, BLAS/LAPACK for performance-critical workloads).


115-135: Well-structured benchmark documentation.

The benchmark section provides:

  • Visual comparison via SVG plot
  • Raw CSV data for reproducibility
  • Summary table with automated update markers (BENCH_TABLE)

The results transparently show the performance crossover where nalgebra (likely with SIMD optimizations) overtakes la-stack's pure Rust implementation at larger dimensions (D≥16), which is valuable information for users choosing between libraries.

scripts/tests/test_criterion_dim_plot.py (1)

1-174: Comprehensive test coverage for the plotting module.

The test suite covers marker generation, markdown formatting, README update scenarios (including error cases), and an end-to-end path. Good use of tmp_path fixtures to isolate file operations.

benches/vs_nalgebra.rs (3)

14-24: Well-designed matrix generation for numerical stability.

The diagonal dominance approach ensures non-singular matrices for LU factorization benchmarks. The small off-diagonal values prevent ill-conditioning while keeping the benchmark representative.


84-249: Macro-driven benchmarks enable comprehensive dimension coverage.

The macro cleanly generates paired benchmarks for both libraries with consistent inputs. Pre-computing LU factorizations for the "from_lu" benchmarks isolates the operation being measured.


62-82: Manual infinity norm is necessary for fair comparison.

nalgebra does not provide a built-in matrix infinity norm method. The custom nalgebra_inf_norm implementation correctly computes the maximum absolute row sum, matching the algorithm in la_stack::Matrix::inf_norm for accurate benchmarking.

scripts/criterion_dim_plot.py (4)

28-43: Clean dataclass definitions with appropriate immutability.

Using frozen=True and slots=True provides immutability guarantees and memory efficiency for these configuration objects.
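As a quick illustration of what those flags buy (the field names here are illustrative, not the script's actual definitions):

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class PlotRequest:
    csv_path: str
    svg_path: str

req = PlotRequest("bench.csv", "bench.svg")
# req.csv_path = "other.csv"  # raises dataclasses.FrozenInstanceError
# slots=True also drops the per-instance __dict__, saving memory.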


185-230: Gnuplot invocation is properly secured.

The executable is resolved via shutil.which, and all file paths are quoted through _gp_quote. The # noqa: S603 suppression is justified given the controlled inputs.
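The overall shape of that pattern looks roughly like this (a sketch based on the review's description; the helper name is illustrative, not the script's API):

import shutil
import subprocess

def render_with_gnuplot(script_text: str) -> None:
    # Resolve the executable from PATH rather than trusting a user-supplied path.
    exe = shutil.which("gnuplot")
    if exe is None:
        raise FileNotFoundError("gnuplot not found on PATH")
    # check=True turns a non-zero exit into CalledProcessError.
    subprocess.run([exe], input=script_text, text=True, check=True)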


156-182: Robust README update logic with proper validation.

The function correctly validates marker uniqueness and order, and is idempotent (returns False when content unchanged). The detailed error messages aid debugging.
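A minimal sketch of that validate-replace-and-report pattern (simplified; the script's actual validation and error messages are stricter):

from pathlib import Path

def update_between_markers(path: Path, begin: str, end: str, body: str) -> bool:
    text = path.read_text(encoding="utf-8")
    if text.count(begin) != 1 or text.count(end) != 1:
        raise ValueError("markers missing or not unique")
    if text.index(end) < text.index(begin):
        raise ValueError("markers out of order")
    head, rest = text.split(begin, 1)
    _, tail = rest.split(end, 1)
    new_text = f"{head}{begin}\n{body}\n{end}{tail}"
    if new_text == text:
        return False  # idempotent: no write when content is unchanged
    path.write_text(new_text, encoding="utf-8")
    return True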


376-424: Well-orchestrated main function with clear error handling.

The function provides helpful error messages (e.g., suggesting cargo bench --bench vs_nalgebra) and uses appropriate exit codes. The flow handles missing data gracefully.

Corrects how zero values for nalgebra time are displayed in
the benchmark table: "n/a" is shown instead, preventing a
potential division-by-zero crash. Also pins the uv version
for reproducible CI and relaxes the whitespace check in the
Codacy config.
@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
scripts/criterion_dim_plot.py (1)

138-153: Already addressed: Division by zero handling.

The concern raised in the previous review about potential ZeroDivisionError has been resolved. The current implementation (lines 145-149) checks if na == 0.0 before performing division and displays "n/a" as a safe fallback.

🧹 Nitpick comments (1)
scripts/criterion_dim_plot.py (1)

190-235: Consider handling subprocess failures more gracefully.

While the gnuplot executable availability is checked (lines 196-199), if gnuplot exists but fails to execute (e.g., due to script errors), subprocess.run() with check=True will raise CalledProcessError. The caller at lines 364-369 only catches FileNotFoundError, so subprocess failures would propagate uncaught.

Consider catching CalledProcessError in _maybe_render_plot():

     try:
         _render_svg_with_gnuplot(req)
-    except FileNotFoundError as e:
+    except (FileNotFoundError, subprocess.CalledProcessError) as e:
         print(str(e), file=sys.stderr)
         print(f"Wrote CSV instead: {req.csv_path}", file=sys.stderr)
         return 1

Add the subprocess import at the top if needed for the exception type.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d7dd9b2 and 71908f2.

📒 Files selected for processing (5)
  • .github/workflows/ci.yml (1 hunks)
  • .github/workflows/codacy.yml (1 hunks)
  • .gitignore (1 hunks)
  • scripts/criterion_dim_plot.py (1 hunks)
  • scripts/tests/test_criterion_dim_plot.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • .github/workflows/codacy.yml
  • .github/workflows/ci.yml
  • scripts/tests/test_criterion_dim_plot.py
🧰 Additional context used
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py

118-118: Avoid specifying long messages outside the exception class

(TRY003)


168-168: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build (macos-latest)
  • GitHub Check: Codacy Security Scan
🔇 Additional comments (10)
.gitignore (1)

7-16: LGTM!

The Python/uv ignore patterns are comprehensive and follow standard conventions for Python projects.

scripts/criterion_dim_plot.py (9)

1-26: LGTM!

The module structure is clean with a clear docstring and appropriate imports for benchmark aggregation and plotting functionality.


28-43: LGTM!

The dataclasses use frozen=True and slots=True appropriately for immutable data holders with reduced memory overhead.


45-87: LGTM!

The METRICS mapping comprehensively covers all benchmark operations with clear naming and helpful comments about cross-crate naming differences.


90-110: LGTM!

The dimension discovery logic correctly parses directory names with regex and handles missing or invalid directories gracefully.


113-135: LGTM!

The estimate reading function handles missing confidence intervals gracefully, and CSV writing includes proper directory creation with clear column headers.


156-187: LGTM!

The README update logic includes robust marker validation (exactly one of each, correct order) and is idempotent, returning False when no changes are needed.


238-295: LGTM!

The CLI interface is comprehensive with constrained choices, sensible defaults, and clear help messages for all options.


298-335: LGTM!

The path resolution logic correctly handles both absolute and relative paths, and data collection gracefully tracks skipped dimensions when estimate files are missing.


338-429: LGTM!

The main orchestration follows a clear flow with appropriate early exits, informative error messages, and separated concerns for optional operations (README updates, plotting). The error handling returns meaningful exit codes for different failure scenarios.

Renames and updates the benchmarking suite to compare against
multiple linear algebra crates, including faer, in addition to
nalgebra, providing a broader performance context.
Adds faer implementations and benchmarks.
Updates data and plots.
@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/criterion_dim_plot.py (1)

313-313: Type alias defined after referencing functions, but valid.

The Row type alias is defined at line 313, after functions like _write_csv (line 139) and _markdown_table (line 155) that reference it in their signatures. This works correctly because from __future__ import annotations enables PEP 563 postponed evaluation of annotations.

For improved readability, consider moving this type alias near the top of the file (after the dataclass definitions around line 44), which is a more conventional placement.
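A tiny illustration of why the late definition is harmless under the future import: annotations are stored as strings and never evaluated at function-definition time, so the name only needs to exist for type checkers.

from __future__ import annotations

def first_dim(rows: list[Row]) -> int:  # "Row" is not defined yet: fine
    return rows[0][0]

Row = tuple[int, float]  # defined afterwards; used only by type checkers

print(first_dim([(4, 1.5)]))  # prints 4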

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71908f2 and 8224d86.

⛔ Files ignored due to path filters (3)
  • Cargo.lock is excluded by !**/*.lock
  • docs/assets/bench/vs_linalg_lu_solve_median.csv is excluded by !**/*.csv
  • docs/assets/bench/vs_linalg_lu_solve_median.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • Cargo.toml (2 hunks)
  • README.md (2 hunks)
  • WARP.md (1 hunks)
  • benches/vs_linalg.rs (1 hunks)
  • cspell.json (1 hunks)
  • justfile (4 hunks)
  • scripts/README.md (1 hunks)
  • scripts/criterion_dim_plot.py (1 hunks)
  • scripts/tests/test_criterion_dim_plot.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • WARP.md
  • scripts/tests/test_criterion_dim_plot.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • Cargo.toml
🧰 Additional context used
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py

127-127: Avoid specifying long messages outside the exception class

(TRY003)


181-181: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Code Coverage
  • GitHub Check: Codacy Security Scan
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: build (ubuntu-latest)
🔇 Additional comments (17)
cspell.json (1)

1-78: LGTM!

The spell check configuration appropriately adds domain-specific vocabulary for the new benchmarking and Python tooling (gnuplot, pytest, faer, etc.), and the SVG exclusion pattern correctly ignores generated plot assets.

benches/vs_linalg.rs (6)

16-44: Well-implemented permutation sign calculation.

The cycle-counting algorithm correctly determines permutation parity. The formula sign = (-1)^(n - cycles) is mathematically sound for computing the determinant sign of a permutation matrix.
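For readers unfamiliar with the technique, here is the idea in a Python sketch (the bench itself is Rust; this is not its code): each cycle of length k costs k - 1 transpositions, so the overall parity is that of n - cycles.

def permutation_sign(perm: list[int]) -> int:
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if seen[start]:
            continue
        cycles += 1  # found a new cycle; walk it to the end
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
    return -1 if (n - cycles) % 2 else 1

assert permutation_sign([0, 1, 2]) == 1   # identity: 3 cycles, even
assert permutation_sign([1, 0, 2]) == -1  # one transposition: odd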


46-54: LGTM!

Correct determinant computation from PA = LU factorization. The product of U's diagonal elements times the permutation sign gives the correct result.


56-83: LGTM!

The matrix generation ensures strict diagonal dominance for numerical stability during LU factorization, making it appropriate for benchmarking comparisons.


126-377: LGTM!

The benchmark macro is well-structured with consistent methodology across all three libraries. Precomputing LU for solve-only benchmarks ensures fair timing comparisons.


379-393: LGTM!

Standard Criterion harness setup with good dimension coverage spanning small (2-5) to larger (8-64) matrices.


39-43: The is_multiple_of method is stable as of Rust 1.87.0 (released May 2025) and requires no changes. The code is correct and appropriate to use.

Likely an incorrect or invalid review comment.

justfile (4)

9-25: LGTM!

The Python tooling integration is well-structured with proper dependency chaining (python-sync ensures uv is available, and python-lint/test-python depend on python-sync).


49-69: LGTM!

The benchmark targets correctly handle optional filtering with proper shell quoting and conditionals.


72-89: LGTM!

The plotting targets are well-designed with sensible defaults and proper parameter handling for the log-scale option.


176-176: LGTM!

Good integration of Python linting and testing into the existing workflow structure.

Also applies to: 222-223

README.md (2)

35-40: LGTM!

The "Anti-goals" section clearly communicates the library's scope and helpfully directs users to appropriate alternatives for different use cases.


116-135: LGTM!

Excellent transparency in the benchmark presentation—honestly showing where la-stack outperforms alternatives (small dimensions) and where it doesn't (larger dimensions). The BENCH_TABLE markers enable automated updates via the plotting script.

scripts/README.md (1)

1-120: LGTM!

Comprehensive and well-organized documentation covering setup, usage workflows, and dependencies. The examples progress nicely from basic to advanced use cases.

scripts/criterion_dim_plot.py (3)

147-152: Division by zero issue properly addressed.

The extracted _pct_reduction function correctly guards against division by zero by returning "n/a" when baseline == 0.0. This addresses the previously flagged concern about potential ZeroDivisionError in the markdown table generation.
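The guard looks roughly like this (a sketch; the script's exact signature and format string are assumptions):

def pct_reduction(la_time: float, baseline: float) -> str:
    if baseline == 0.0:
        return "n/a"  # avoid ZeroDivisionError on a zero baseline
    return f"{(baseline - la_time) / baseline * 100.0:+.1f}%"

print(pct_reduction(50.0, 100.0))  # +50.0% (la-stack faster)
print(pct_reduction(50.0, 0.0))    # n/a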


208-250: Good security practices in subprocess usage.

The gnuplot rendering function follows good practices:

  • Resolves the executable via shutil.which rather than passing a user-controlled path
  • Quotes all file paths in the generated script using _gp_quote
  • Uses subprocess.run with check=True for proper error propagation
  • The noqa: S603 comment is appropriate since this is a controlled script execution

398-446: Well-structured main function with appropriate error handling.

The main function has a clear control flow:

  • Validates inputs and reports actionable error messages
  • Uses distinct exit codes (0 for success, 1 for partial success with gnuplot failure, 2 for data/validation errors)
  • Handles the --no-plot and --update-readme flags cleanly

Fixes an issue where backslashes were not properly escaped when
generating strings for gnuplot, leading to incorrect rendering.

The gnuplot quoting function now escapes both backslashes and
single quotes to ensure correct string interpretation by gnuplot.

Refs: docs/benchmark
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
scripts/criterion_dim_plot.py (2)

46-46: Consider using a dataclass for Row instead of a type alias.

The 10-tuple for Row is functional but less self-documenting than a dataclass. A dataclass would make field names explicit (e.g., dim, la_time, la_lo, etc.) and improve code readability.

Example refactor:

-Row = tuple[int, float, float, float, float, float, float, float, float, float]
+@dataclass(frozen=True, slots=True)
+class Row:
+    dim: int
+    la_time: float
+    la_lo: float
+    la_hi: float
+    na_time: float
+    na_lo: float
+    na_hi: float
+    fa_time: float
+    fa_lo: float
+    fa_hi: float

Then update unpacking throughout (e.g., line 146: for row in rows: instead of tuple unpacking).


177-190: Consider using custom exception classes for better error handling.

The static analysis tool suggests avoiding long error messages directly in raise statements (TRY003). While the current messages are informative and work well, you could optionally define custom exceptions if this becomes a pattern.

Example refactor:

+class ReadmeMarkerError(ValueError):
+    """Raised when README markers are missing, duplicated, or misordered."""
+    pass
+
 def _update_readme_table(readme_path: Path, marker_begin: str, marker_end: str, table_md: str) -> bool:
     lines = readme_path.read_text(encoding="utf-8").splitlines(keepends=True)
     
     begin_indices = [i for i, line in enumerate(lines) if line.strip() == marker_begin]
     end_indices = [i for i, line in enumerate(lines) if line.strip() == marker_end]
     
     if len(begin_indices) != 1 or len(end_indices) != 1:
-        raise ValueError(f"README markers not found or not unique. Expected exactly one of each:\n  {marker_begin}\n  {marker_end}\n")
+        raise ReadmeMarkerError(f"README markers not found or not unique. Expected exactly one of each:\n  {marker_begin}\n  {marker_end}\n")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8224d86 and 732480a.

📒 Files selected for processing (2)
  • scripts/criterion_dim_plot.py (1 hunks)
  • scripts/tests/test_criterion_dim_plot.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
scripts/tests/test_criterion_dim_plot.py (1)
scripts/criterion_dim_plot.py (7)
  • _readme_table_markers (172-174)
  • _markdown_table (158-169)
  • _gp_quote (206-208)
  • PlotRequest (37-43)
  • _maybe_render_plot (376-395)
  • _update_readme_table (177-203)
  • main (398-446)
scripts/criterion_dim_plot.py (3)
src/matrix.rs (2)
  • get (80-86)
  • default (185-187)
src/vector.rs (1)
  • default (110-112)
benches/vs_linalg.rs (1)
  • main (379-393)
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py

130-130: Avoid specifying long messages outside the exception class

(TRY003)


184-184: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Code Coverage
  • GitHub Check: Codacy Security Scan
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (19)
scripts/criterion_dim_plot.py (13)

49-99: LGTM!

The METRICS dictionary is comprehensive and well-organized, covering all major linear algebra operations. The note about different naming conventions between crates (line 86) is helpful context.


102-122: LGTM!

The discovery functions correctly identify the repository root and extract dimensions from Criterion's directory naming convention (d{N}). Returning sorted dimensions ensures consistent output ordering.


125-139: LGTM!

The function robustly handles Criterion's JSON format with appropriate fallbacks for missing confidence intervals. The error message clearly identifies which file and stat caused the issue.


142-147: LGTM!

Straightforward CSV writing with appropriate parent directory creation and clear column headers.


150-155: LGTM!

The function correctly handles the zero baseline case (returning "n/a") and calculates percentage reductions accurately. The signed format string makes it clear whether la-stack is faster or slower than the baseline.


158-169: LGTM!

The markdown table generation is well-formatted with clear column headers, thousands separators, and dual comparisons against both nalgebra and faer.


172-203: LGTM!

The README update logic is robust with comprehensive error handling for edge cases (missing markers, duplicates, wrong order). The idempotent return value is a nice touch.


206-208: LGTM!

The quote escaping now correctly handles both backslashes and single quotes in the proper order, resolving the past review comment. This ensures gnuplot strings are properly escaped even with edge cases like paths containing \'.
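The order dependence is easy to demonstrate in isolation (illustrative only; gnuplot's exact quoting rules are not reproduced here):

def quote(s: str) -> str:
    # Escape the escape character first, then the delimiter.
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

def quote_wrong_order(s: str) -> str:
    # Escaping quotes first lets the second pass double the backslash we added.
    return "'" + s.replace("'", "\\'").replace("\\", "\\\\") + "'"

print(quote("it's"))              # 'it\'s'
print(quote_wrong_order("it's"))  # 'it\\'s'  (the quote ends up unescaped)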


211-253: LGTM!

The gnuplot integration is well-implemented with proper path quoting, clear error messages, and appropriate security considerations. The script generation is readable and the yerrorlines plotting correctly displays confidence intervals.


256-313: LGTM!

The CLI argument parser is comprehensive and user-friendly with sensible defaults, clear help text, and proper validation through choices. The dynamic metric choices from the METRICS dictionary is a good design.


316-352: LGTM!

The path resolution and data collection functions are well-structured. Tracking skipped dimensions and providing clear feedback about missing benchmark files enhances usability.


355-395: LGTM!

The conditional execution functions provide clear separation of concerns with appropriate error handling and informative user feedback. The exit code convention (0/1/2) is consistent with Unix conventions.


398-450: LGTM!

The main orchestration function has clear control flow with informative error messages that guide users toward resolution. The early validation prevents wasted processing when data is missing.

scripts/tests/test_criterion_dim_plot.py (6)

1-17: LGTM!

The test setup is clean and the first test appropriately validates that marker generation is deterministic, which is important for README updates.


20-42: LGTM!

The markdown table tests provide good coverage of both typical formatting (thousands separators, percentages) and edge cases (zero baseline). The assertion messages are clear and specific.


45-49: LGTM!

The quote escaping test comprehensively validates all edge cases, including the previously problematic backslash-quote combination. This confirms the fix is working correctly.


52-73: LGTM!

The test appropriately simulates gnuplot failures and validates error handling. The use of type() to create a minimal Args object is acceptable for testing purposes.


76-153: LGTM!

The README update tests are comprehensive, covering both the happy path (including idempotency checks) and all error scenarios (missing, misordered, and duplicate markers). The idempotency test at line 109 is particularly valuable.


156-216: LGTM!

The end-to-end test provides excellent coverage of the full workflow, from directory structure creation through CSV and README validation. The test data is realistic and the assertions verify both structural correctness and computed values.

Refactors the criterion plotting script to improve readability and
maintainability. It replaces the `Row` tuple with a `Row` dataclass
for structured data handling and introduces `ReadmeMarkerError` for
better error context when handling README markers. The script now
uses named attributes, enhancing code clarity. (internal)
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
scripts/criterion_dim_plot.py (1)

187-268: LGTM! README updates and gnuplot rendering are solid.

The quote escaping in _gp_quote (line 223) correctly handles both backslashes and quotes per past review feedback. The subprocess call (line 268) is safe since gnuplot is resolved via PATH and all inputs are properly quoted.

Optional: Address static analysis hints about exception messages.

Lines 199 and 205 raise ReadmeMarkerError with long messages. While functional, Ruff suggests defining specific exception subclasses for clearer error types:

class MarkerNotFoundError(ReadmeMarkerError):
    """Raised when README markers are missing or not unique."""

class MarkerOrderError(ReadmeMarkerError):
    """Raised when README markers are out of order."""

Then raise these specific exceptions with shorter messages. This is a style improvement, not a functional issue.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 732480a and be24d03.

📒 Files selected for processing (2)
  • scripts/criterion_dim_plot.py (1 hunks)
  • scripts/tests/test_criterion_dim_plot.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/tests/test_criterion_dim_plot.py
🧰 Additional context used
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py

145-145: Avoid specifying long messages outside the exception class

(TRY003)


199-199: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Codacy Security Scan
  • GitHub Check: Code Coverage
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (5)
scripts/criterion_dim_plot.py (5)

1-26: LGTM! Clear module documentation and appropriate imports.

The docstring effectively describes the script's purpose and I/O paths, and all imports are standard library modules suitable for the task.


28-62: LGTM! Well-designed immutable data models.

The use of frozen=True and slots=True ensures immutability and memory efficiency. The Row dataclass clearly captures all benchmark data (time + CI bounds) for three libraries.


64-114: LGTM! Comprehensive benchmark metric definitions.

The METRICS dictionary covers a good range of linear algebra operations and correctly handles naming differences between libraries (e.g., nalgebra_norm_squared vs *_norm2_sq).


117-184: LGTM! Robust data processing helpers.

The division-by-zero safeguard in _pct_reduction (lines 167-168) properly handles edge cases. The CSV writing and markdown table generation are clear and correct.


271-478: LGTM! Well-orchestrated main flow with comprehensive error handling.

The argument parsing is thorough, path resolution handles both absolute and relative paths correctly, and the main function validates data availability at multiple stages. Exit codes are appropriate (0=success, 1=plot failure, 2=data errors).

The handling of argv in line 427 correctly supports both CLI and programmatic invocation.
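That pattern typically has the following shape (a sketch, not the script's exact code): argparse falls back to sys.argv[1:] when argv is None, so tests can pass an explicit argument list while the CLI path stays unchanged.

import argparse

def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--metric", default="lu_solve")
    args = parser.parse_args(argv)  # None -> sys.argv[1:]
    print(f"metric={args.metric}")
    return 0

if __name__ == "__main__":
    raise SystemExit(main())  # exit code comes from main's return value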

Refactors README table update error handling for better clarity
and specificity. Introduces dedicated exception classes for
missing/non-unique markers and out-of-order markers, providing more
descriptive error messages.
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
scripts/criterion_dim_plot.py (1)

125-163: Logic is correct. Consider extracting the error message.

The repository discovery, dimension parsing, and estimate reading functions are well-implemented and handle edge cases appropriately (e.g., missing confidence intervals return safe defaults).

The static analysis tool flags line 153 for specifying a long message directly in the raise statement. While not critical, extracting complex messages to constants or custom exception constructors can improve maintainability.

Optional refactor:

+_STAT_NOT_FOUND_MSG = "stat '{stat}' not found in {path}"
+
 def _read_estimate(estimates_json: Path, stat: str) -> tuple[float, float, float]:
     data = json.loads(estimates_json.read_text(encoding="utf-8"))
 
     stat_obj = data.get(stat)
     if not isinstance(stat_obj, dict):
-        raise KeyError(f"stat '{stat}' not found in {estimates_json}")
+        raise KeyError(_STAT_NOT_FOUND_MSG.format(stat=stat, path=estimates_json))

As per static analysis hints, this aligns with TRY003 guidance.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be24d03 and cd6873f.

📒 Files selected for processing (1)
  • scripts/criterion_dim_plot.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py

153-153: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: Codacy Security Scan
  • GitHub Check: Code Coverage
🔇 Additional comments (11)
scripts/criterion_dim_plot.py (11)

1-26: LGTM! Clean module structure and standard library imports.

The docstring clearly documents the script's purpose and input/output paths, and all imports are from the Python standard library.


28-70: LGTM! Well-designed data structures.

The use of frozen=True and slots=True on all dataclasses ensures immutability and memory efficiency. The exception hierarchy provides clear, specific error types for README marker issues.


72-122: LGTM! Comprehensive metric definitions.

The METRICS constant provides a clear mapping between metric keys and benchmark names across all three crates. The inline comment at line 109 helpfully documents naming inconsistencies.


165-192: LGTM! Past division-by-zero fix confirmed.

The CSV writing and markdown table generation are well-implemented. The _pct_reduction function now correctly handles zero baselines by returning "n/a", addressing the issue flagged in the previous review.


195-227: LGTM! Robust README marker handling.

The marker validation is thorough:

  • Ensures exactly one begin and one end marker
  • Validates correct marker ordering
  • Avoids unnecessary writes when content is unchanged (line 223)
  • Uses specific exception types for clear error reporting

230-277: LGTM! Past quote-escaping fix confirmed and subprocess use is safe.

The _gp_quote function now correctly escapes backslashes before single quotes (line 232), addressing the issue from the previous review.

The subprocess call at line 277 is appropriately justified: gnuplot is resolved via PATH, the input script is fully generated (no user injection), and all file paths are properly quoted.


280-337: LGTM! Comprehensive CLI with sensible defaults.

The argument parser provides clear help strings and reasonable defaults for all options. The choices constraint on --metric prevents invalid metric selection.


340-354: LGTM! Clear path resolution logic.

The functions correctly handle both absolute and relative paths, with sensible default output locations under docs/assets/bench/.


357-389: LGTM! Robust data collection with graceful degradation.

The function gracefully handles missing benchmark results by skipping dimensions rather than failing, and tracks what was skipped for user feedback.


392-432: LGTM! Clean separation of concerns with appropriate error handling.

Both functions follow a clear pattern:

  • Check whether to perform the action
  • Execute with proper exception handling
  • Return meaningful exit codes (0, 1, or 2)
  • Provide user feedback via stdout/stderr

435-487: LGTM! Well-structured main function with excellent error messages.

The main function orchestrates the workflow cleanly:

  1. Parse arguments and discover dimensions
  2. Validate that benchmark data exists
  3. Collect rows and write CSV
  4. Optionally update README
  5. Optionally render plot

Error messages provide actionable guidance (e.g., line 445 suggests running cargo bench), and the use of raise SystemExit(main()) is the proper pattern for entry points.

@acgetchell merged commit 417c103 into main Dec 16, 2025
9 checks passed
@acgetchell deleted the docs/benchmark branch December 16, 2025 02:52