Added: Benchmarks and plotting for performance analysis #4
Conversation
Adds a comprehensive benchmark suite comparing `la-stack` against `nalgebra` for various linear algebra operations across different dimensions. Includes tooling to automatically generate plots and update the README with benchmark results. This facilitates ongoing performance monitoring and optimization.
Walkthrough

Adds Python tooling, packaging, scripts, tests, GitHub/Codacy checks, and Just targets; expands benchmarks to multi-dimension comparisons including faer; updates docs and ignore rules, and makes small doctest edits in Rust source examples.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User as CLI user
participant Script as criterion_dim_plot.py
participant FS as File system (criterion dirs / README / CSV / SVG)
participant Gnuplot as gnuplot
User->>Script: run CLI (metric/stat/sample/flags)
Script->>FS: discover criterion group dirs, open estimates.json files
FS-->>Script: estimates (or missing indicators)
Script->>Script: compute rows, CIs, CSV content, Markdown table
Script->>FS: write CSV
Script->>FS: update README between markers (optional)
alt plotting enabled
Script->>Gnuplot: invoke with generated data and parameters
Gnuplot-->>FS: emit SVG
end
FS-->>User: CSV (and README/SVG if requested)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main       #4   +/-   ##
=======================================
  Coverage   94.95%   94.95%
=======================================
  Files           3        3
  Lines         119      119
=======================================
  Hits          113      113
  Misses          6        6
```

Flags with carried forward coverage won't be shown.
Actionable comments posted: 1
🧹 Nitpick comments (4)
.github/workflows/ci.yml (1)
58-62: Consider pinning uv to a specific version for reproducibility.

Using `"latest"` may cause CI inconsistencies if a breaking uv release occurs. Other tools in this workflow (Rust, Node.js) use explicit versions for stability.

```diff
 - name: Install uv (for Python scripts and pytest)
   if: matrix.os != 'windows-latest'
   uses: astral-sh/setup-uv@ed21f2f24f8dd64503750218de024bcf64c7250a # v7.1.5
   with:
-    version: "latest"
+    version: "0.5" # or specific version like "0.5.11"
```

.gitignore (1)
8-16: Minor redundancy and missing pattern.

Line 9 (`__pycache__/`) is redundant since line 10 (`**/__pycache__/`) already covers all directories including the root. Also consider adding `venv/` alongside `.venv/` as both naming conventions are common.

```diff
 # Python / uv
-__pycache__/
 **/__pycache__/
 *.egg-info/
 .venv/
+venv/
 .ruff_cache/
 .pytest_cache/
 .mypy_cache/
 uv.lock
```

.github/workflows/codacy.yml (1)
64-75: Fragile regex for YAML validation.

The regex `^  bandit:` assumes exactly 2-space indentation. If the YAML formatting changes (e.g., 4 spaces, tabs), this check will fail incorrectly. Consider a more flexible pattern:

```diff
-if ! grep -qE '^  bandit:' "$config"; then
+if ! grep -qE '^\s+bandit:' "$config"; then
```

Alternatively, use a YAML parser like `yq` for robust validation, though that adds a dependency.

scripts/criterion_dim_plot.py (1)
118-118: Static analysis note: consider custom exception for clearer error handling. Ruff flags the inline long message (TRY003). For a utility script, the current approach is acceptable, but you could optionally extract to a custom exception class if this pattern grows.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `docs/assets/bench/vs_nalgebra_lu_solve_median.csv` is excluded by `!**/*.csv`
- `docs/assets/bench/vs_nalgebra_lu_solve_median.svg` is excluded by `!**/*.svg`
📒 Files selected for processing (19)
- `.codacy.yml` (6 hunks)
- `.github/workflows/ci.yml` (1 hunks)
- `.github/workflows/codacy.yml` (1 hunks)
- `.gitignore` (1 hunks)
- `.python-version` (1 hunks)
- `Cargo.toml` (1 hunks)
- `README.md` (2 hunks)
- `benches/vs_nalgebra.rs` (1 hunks)
- `cspell.json` (2 hunks)
- `docs/assets/bench/.gitkeep` (1 hunks)
- `justfile` (4 hunks)
- `pyproject.toml` (1 hunks)
- `scripts/README.md` (1 hunks)
- `scripts/criterion_dim_plot.py` (1 hunks)
- `scripts/tests/__init__.py` (1 hunks)
- `scripts/tests/test_criterion_dim_plot.py` (1 hunks)
- `src/lu.rs` (0 hunks)
- `src/matrix.rs` (0 hunks)
- `src/vector.rs` (0 hunks)
💤 Files with no reviewable changes (3)
- src/lu.rs
- src/matrix.rs
- src/vector.rs
🧰 Additional context used
🧬 Code graph analysis (2)
benches/vs_nalgebra.rs (3)
src/matrix.rs (5)
- `from_rows` (24-26)
- `lu` (158-160)
- `det` (178-180)
- `inf_norm` (122-133)
- `default` (185-187)

src/vector.rs (4)
- `new` (21-23)
- `dot` (82-90)
- `norm2_sq` (103-105)
- `default` (110-112)

src/lu.rs (2)
- `det` (167-173)
- `solve_vec` (108-148)
scripts/tests/test_criterion_dim_plot.py (1)
scripts/criterion_dim_plot.py (4)
- `_readme_table_markers` (151-153)
- `_markdown_table` (138-148)
- `_update_readme_table` (156-182)
- `main` (376-424)
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py
118-118: Avoid specifying long messages outside the exception class
(TRY003)
163-163: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (windows-latest)
- GitHub Check: build (macos-latest)
- GitHub Check: Codacy Security Scan
🔇 Additional comments (23)
.python-version (1)
1-1: No issues identified. The `.python-version` file correctly specifies Python 3.11, which is consistently declared in `pyproject.toml` (`requires-python = ">=3.11"` and `python_version = "3.11"`).

docs/assets/bench/.gitkeep (1)
1-1: Standard `.gitkeep` placeholder. This correctly preserves the `docs/assets/bench/` directory for future benchmark artifacts.

scripts/tests/__init__.py (1)
1-1: LGTM! This follows standard Python conventions for package initialization. Empty `__init__.py` files are the conventional marker for Python packages. This enables the `scripts.tests` module to be properly recognized and allows relative imports within the test suite.

Cargo.toml (1)
9-9: LGTM! The documentation field correctly points to docs.rs, which will automatically host the generated documentation once the crate is published.
cspell.json (1)
7-69: LGTM! The new dictionary words appropriately cover Python tooling (`mypy`, `pytest`, `venv`), plotting terminology (`gnuplot`, `xlabel`, `ylabel`, `logscale`), and linear algebra libraries (`blas`, `lapack`, `openblas`). The SVG ignore pattern is sensible for generated plot outputs.

scripts/README.md (1)
1-120: Well-documented Python tooling setup. The README provides clear, comprehensive documentation covering setup, usage workflows, CLI examples, and gnuplot requirements. The structure follows a logical progression from setup to advanced usage.
pyproject.toml (2)
1-101: Well-structured project configuration. The pyproject.toml is comprehensive with:
- Clear metadata and classifiers
- Well-documented Ruff ignores with rationale for each
- Appropriate per-file ignores for tests
- Sensible MyPy settings for CLI scripts
103-108: No issues found. The ruff version constraint `ruff>=0.12.11` is valid and reasonable. As of December 2025, ruff has progressed to version 0.14.9, confirming that 0.12.11 exists within a normal release progression and will not cause installation failures.

justfile (3)
9-25: Clean Python tooling integration. The `_ensure-uv` helper pattern with informative error messages and the cascading dependencies (`python-lint: python-sync`) provide a good developer experience. The lint target properly chains ruff check, format, and mypy.
49-89: Well-designed benchmark and plotting targets. The benchmark targets provide good flexibility:
- Optional filter parameter for selective benchmarking
- Quick variant for rapid iteration
- Plot targets with configurable metric/stat/sample and optional log-y scale
The boolean string comparison for `log_y` is the correct approach in just/bash.
176-176: Good integration of Python checks into existing workflows. Adding `python-lint` to `lint-code` and `test-python` to `test-all` ensures the Python tooling is validated alongside Rust in CI and commit checks.

Also applies to: 222-223
.codacy.yml (2)
36-48: Appropriate Bandit configuration for CLI scripts. The Bandit skips are reasonable for this context:
- B101 (assert) - common in tests
- B404, B603, B607 - subprocess-related, acceptable for dev tooling
Setting `severity: high` and `confidence: high` focuses on high-impact issues while reducing noise.
49-75: Comprehensive Python analysis integration. The Codacy configuration properly extends existing analysis to cover Python:
- Lizard for complexity metrics
- Semgrep for security patterns
- Appropriate exclusions for Python artifacts (cache, venv, etc.)
- Language extension declarations
Also applies to: 104-143
README.md (2)
35-39: Clear positioning with anti-goals. The anti-goals section effectively communicates the library's scope, directing users to appropriate alternatives (nalgebra for comprehensive features, BLAS/LAPACK for performance-critical workloads).
115-135: Well-structured benchmark documentation. The benchmark section provides:
- Visual comparison via SVG plot
- Raw CSV data for reproducibility
- Summary table with automated update markers (`BENCH_TABLE`)

The results transparently show the performance crossover where nalgebra (likely with SIMD optimizations) overtakes la-stack's pure Rust implementation at larger dimensions (D≥16), which is valuable information for users choosing between libraries.
scripts/tests/test_criterion_dim_plot.py (1)
1-174: Comprehensive test coverage for the plotting module. The test suite covers marker generation, markdown formatting, README update scenarios (including error cases), and an end-to-end path. Good use of `tmp_path` fixtures to isolate file operations.

benches/vs_nalgebra.rs (3)
14-24: Well-designed matrix generation for numerical stability. The diagonal dominance approach ensures non-singular matrices for LU factorization benchmarks. The small off-diagonal values prevent ill-conditioning while keeping the benchmark representative.
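As an illustration of the idea only (a minimal Python sketch, not the benchmark's Rust code; the function name and the `0.01` off-diagonal constant are assumptions): a strictly diagonally dominant matrix is guaranteed non-singular.

```python
def diagonally_dominant(n: int, off: float = 0.01) -> list[list[float]]:
    """Build an n x n matrix whose diagonal strictly dominates each row."""
    rows = []
    for i in range(n):
        row = [off] * n  # small identical off-diagonal entries
        # The diagonal exceeds the sum of the row's off-diagonal magnitudes
        # (off * (n - 1)), so the matrix is strictly diagonally dominant.
        row[i] = 1.0 + off * n
        rows.append(row)
    return rows
```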
84-249: Macro-driven benchmarks enable comprehensive dimension coverage. The macro cleanly generates paired benchmarks for both libraries with consistent inputs. Pre-computing LU factorizations for the "from_lu" benchmarks isolates the operation being measured.
62-82: Manual infinity norm is necessary for fair comparison. nalgebra does not provide a built-in matrix infinity norm method. The custom `nalgebra_inf_norm` implementation correctly computes the maximum absolute row sum, matching the algorithm in `la_stack::Matrix::inf_norm` for accurate benchmarking.
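For reference, the infinity norm described here is the maximum absolute row sum; a minimal Python sketch of the same algorithm (the benchmarked implementations are Rust):

```python
def inf_norm(matrix: list[list[float]]) -> float:
    # ||A||_inf = max over rows i of sum over columns j of |a_ij|.
    return max(sum(abs(x) for x in row) for row in matrix)
```

For example, `inf_norm([[1.0, -2.0], [3.0, 0.5]])` returns `3.5`, the second row's absolute sum.

scripts/criterion_dim_plot.py (4)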
28-43: Clean dataclass definitions with appropriate immutability. Using `frozen=True` and `slots=True` provides immutability guarantees and memory efficiency for these configuration objects.
185-230: Gnuplot invocation is properly secured. The executable is resolved via `shutil.which`, and all file paths are quoted through `_gp_quote`. The `# noqa: S603` suppression is justified given the controlled inputs.
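A minimal sketch of that invocation pattern (names and signatures here are illustrative, not the script's actual code):

```python
import shutil
import subprocess

def run_gnuplot(script: str) -> None:
    # Resolve the binary from PATH rather than trusting a configurable path.
    gnuplot = shutil.which("gnuplot")
    if gnuplot is None:
        raise FileNotFoundError("gnuplot not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run([gnuplot], input=script, text=True, check=True)
```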
156-182: Robust README update logic with proper validation. The function correctly validates marker uniqueness and order, and is idempotent (returns `False` when content unchanged). The detailed error messages aid debugging.
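A hedged sketch of the marker-based replacement idea (the real function also reports distinct error cases; names here are illustrative):

```python
from pathlib import Path

def replace_between_markers(path: Path, begin: str, end: str, body: str) -> bool:
    """Replace the lines between two marker lines; return False if unchanged."""
    text = path.read_text(encoding="utf-8")
    lines = text.splitlines(keepends=True)
    begins = [i for i, ln in enumerate(lines) if ln.strip() == begin]
    ends = [i for i, ln in enumerate(lines) if ln.strip() == end]
    # Exactly one of each marker, and begin must precede end.
    if len(begins) != 1 or len(ends) != 1 or begins[0] >= ends[0]:
        raise ValueError("markers missing, duplicated, or out of order")
    new_text = "".join(lines[: begins[0] + 1] + [body + "\n"] + lines[ends[0] :])
    if new_text == text:
        return False  # idempotent: nothing to write
    path.write_text(new_text, encoding="utf-8")
    return True
```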
376-424: Well-orchestrated main function with clear error handling. The function provides helpful error messages (e.g., suggesting `cargo bench --bench vs_nalgebra`) and uses appropriate exit codes. The flow handles missing data gracefully.
Corrects how zero values for nalgebra time are displayed in the benchmark table, preventing potential crashes due to division by zero and displaying "n/a" instead. Also pins uv version for reproducible CI and relaxes whitespace check in codacy config.
Actionable comments posted: 0
♻️ Duplicate comments (1)
scripts/criterion_dim_plot.py (1)
138-153: Already addressed: Division by zero handling. The concern raised in the previous review about potential `ZeroDivisionError` has been resolved. The current implementation (lines 145-149) checks if `na == 0.0` before performing division and displays "n/a" as a safe fallback.
🧹 Nitpick comments (1)
scripts/criterion_dim_plot.py (1)
190-235: Consider handling subprocess failures more gracefully. While the gnuplot executable availability is checked (lines 196-199), if gnuplot exists but fails to execute (e.g., due to script errors), `subprocess.run()` with `check=True` will raise `CalledProcessError`. The caller at lines 364-369 only catches `FileNotFoundError`, so subprocess failures would propagate uncaught. Consider catching `CalledProcessError` in `_maybe_render_plot()`:

```diff
 try:
     _render_svg_with_gnuplot(req)
-except FileNotFoundError as e:
+except (FileNotFoundError, subprocess.CalledProcessError) as e:
     print(str(e), file=sys.stderr)
     print(f"Wrote CSV instead: {req.csv_path}", file=sys.stderr)
     return 1
```

Add the `subprocess` import at the top if needed for the exception type.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- `.github/workflows/ci.yml` (1 hunks)
- `.github/workflows/codacy.yml` (1 hunks)
- `.gitignore` (1 hunks)
- `scripts/criterion_dim_plot.py` (1 hunks)
- `scripts/tests/test_criterion_dim_plot.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- .github/workflows/codacy.yml
- .github/workflows/ci.yml
- scripts/tests/test_criterion_dim_plot.py
🧰 Additional context used
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py
118-118: Avoid specifying long messages outside the exception class
(TRY003)
168-168: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build (macos-latest)
- GitHub Check: Codacy Security Scan
🔇 Additional comments (10)
.gitignore (1)
7-16: LGTM! The Python/uv ignore patterns are comprehensive and follow standard conventions for Python projects.
scripts/criterion_dim_plot.py (9)
1-26: LGTM! The module structure is clean with a clear docstring and appropriate imports for benchmark aggregation and plotting functionality.
28-43: LGTM! The dataclasses use `frozen=True` and `slots=True` appropriately for immutable data holders with reduced memory overhead.
45-87: LGTM! The METRICS mapping comprehensively covers all benchmark operations with clear naming and helpful comments about cross-crate naming differences.
90-110: LGTM! The dimension discovery logic correctly parses directory names with regex and handles missing or invalid directories gracefully.
113-135: LGTM! The estimate reading function handles missing confidence intervals gracefully, and CSV writing includes proper directory creation with clear column headers.
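For context, each Criterion stat entry in `estimates.json` carries a `point_estimate` plus a `confidence_interval` with lower/upper bounds; a hedged sketch of reading one (the fallback behavior is an assumption about the script, not a quote of it):

```python
import json
from pathlib import Path

def read_estimate(path: Path, stat: str = "median") -> tuple[float, float, float]:
    data = json.loads(path.read_text(encoding="utf-8"))
    entry = data[stat]  # e.g. data["median"] or data["mean"]
    point = float(entry["point_estimate"])
    ci = entry.get("confidence_interval") or {}
    # Fall back to the point estimate when a bound is missing.
    lo = float(ci.get("lower_bound", point))
    hi = float(ci.get("upper_bound", point))
    return point, lo, hi
```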
156-187: LGTM! The README update logic includes robust marker validation (exactly one of each, correct order) and is idempotent, returning `False` when no changes are needed.
238-295: LGTM! The CLI interface is comprehensive with constrained choices, sensible defaults, and clear help messages for all options. The dynamic metric choices from the METRICS dictionary is a good design.
298-335: LGTM! The path resolution logic correctly handles both absolute and relative paths, and data collection gracefully tracks skipped dimensions when estimate files are missing.
338-429: LGTM! The main orchestration follows a clear flow with appropriate early exits, informative error messages, and separated concerns for optional operations (README updates, plotting). The error handling returns meaningful exit codes for different failure scenarios.
Renames and updates the benchmarking suite to compare against multiple linear algebra crates, including faer, in addition to nalgebra, providing a broader performance context. Adds faer implementations and benchmarks. Updates data and plots.
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/criterion_dim_plot.py (1)
313-313: Type alias defined after referencing functions, but valid. The `Row` type alias is defined at line 313, after functions like `_write_csv` (line 139) and `_markdown_table` (line 155) that reference it in their signatures. This works correctly because `from __future__ import annotations` enables PEP 563 postponed evaluation of annotations.

For improved readability, consider moving this type alias near the top of the file (after the dataclass definitions around line 44), which is a more conventional placement.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `Cargo.lock` is excluded by `!**/*.lock`
- `docs/assets/bench/vs_linalg_lu_solve_median.csv` is excluded by `!**/*.csv`
- `docs/assets/bench/vs_linalg_lu_solve_median.svg` is excluded by `!**/*.svg`
📒 Files selected for processing (9)
- `Cargo.toml` (2 hunks)
- `README.md` (2 hunks)
- `WARP.md` (1 hunks)
- `benches/vs_linalg.rs` (1 hunks)
- `cspell.json` (1 hunks)
- `justfile` (4 hunks)
- `scripts/README.md` (1 hunks)
- `scripts/criterion_dim_plot.py` (1 hunks)
- `scripts/tests/test_criterion_dim_plot.py` (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- WARP.md
- scripts/tests/test_criterion_dim_plot.py
🚧 Files skipped from review as they are similar to previous changes (1)
- Cargo.toml
🧰 Additional context used
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py
127-127: Avoid specifying long messages outside the exception class
(TRY003)
181-181: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Code Coverage
- GitHub Check: Codacy Security Scan
- GitHub Check: build (windows-latest)
- GitHub Check: build (macos-latest)
- GitHub Check: build (ubuntu-latest)
🔇 Additional comments (17)
cspell.json (1)
1-78: LGTM! The spell check configuration appropriately adds domain-specific vocabulary for the new benchmarking and Python tooling (gnuplot, pytest, faer, etc.), and the SVG exclusion pattern correctly ignores generated plot assets.
benches/vs_linalg.rs (6)
16-44: Well-implemented permutation sign calculation. The cycle-counting algorithm correctly determines permutation parity. The formula `sign = (-1)^(n - cycles)` is mathematically sound for computing the determinant sign of a permutation matrix.
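A hedged Python sketch of the cycle-counting approach (the benchmark's implementation is Rust; this only illustrates the parity argument):

```python
def permutation_sign(perm: list[int]) -> int:
    """Sign of a permutation given as a 0-based index mapping."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if seen[start]:
            continue
        cycles += 1  # found the start of a new cycle
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
    # Each cycle of length k decomposes into k - 1 transpositions,
    # so the overall parity is (-1)^(n - cycles).
    return 1 if (n - cycles) % 2 == 0 else -1
```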
46-54: LGTM! Correct determinant computation from PA = LU factorization. The product of U's diagonal elements times the permutation sign gives the correct result.
56-83: LGTM! The matrix generation ensures strict diagonal dominance for numerical stability during LU factorization, making it appropriate for benchmarking comparisons.
126-377: LGTM! The benchmark macro is well-structured with consistent methodology across all three libraries. Precomputing LU for solve-only benchmarks ensures fair timing comparisons.
379-393: LGTM! Standard Criterion harness setup with good dimension coverage spanning small (2-5) to larger (8-64) matrices.
39-43: The `is_multiple_of` method is stable as of Rust 1.87.0 (released May 2025) and requires no changes. The code is correct and appropriate to use.

Likely an incorrect or invalid review comment.
justfile (4)
9-25: LGTM! The Python tooling integration is well-structured with proper dependency chaining (`python-sync` ensures `uv` is available, and `python-lint`/`test-python` depend on `python-sync`).
49-69: LGTM! The benchmark targets correctly handle optional filtering with proper shell quoting and conditionals.
72-89: LGTM! The plotting targets are well-designed with sensible defaults and proper parameter handling for the log-scale option.
176-176: LGTM! Good integration of Python linting and testing into the existing workflow structure.
Also applies to: 222-223
README.md (2)
35-40: LGTM! The "Anti-goals" section clearly communicates the library's scope and helpfully directs users to appropriate alternatives for different use cases.
116-135: LGTM! Excellent transparency in the benchmark presentation: it honestly shows where `la-stack` outperforms alternatives (small dimensions) and where it doesn't (larger dimensions). The `BENCH_TABLE` markers enable automated updates via the plotting script.

scripts/README.md (1)
1-120: LGTM! Comprehensive and well-organized documentation covering setup, usage workflows, and dependencies. The examples progress nicely from basic to advanced use cases.
scripts/criterion_dim_plot.py (3)
147-152: Division by zero issue properly addressed. The extracted `_pct_reduction` function correctly guards against division by zero by returning `"n/a"` when `baseline == 0.0`. This addresses the previously flagged concern about potential `ZeroDivisionError` in the markdown table generation.
208-250: Good security practices in subprocess usage. The gnuplot rendering function follows good practices:
- Resolves the executable via `shutil.which` rather than passing a user-controlled path
- Quotes all file paths in the generated script using `_gp_quote`
- Uses `subprocess.run` with `check=True` for proper error propagation
- The `noqa: S603` comment is appropriate since this is a controlled script execution
398-446: Well-structured main function with appropriate error handling. The `main` function has a clear control flow:
- Validates inputs and reports actionable error messages
- Uses distinct exit codes (0 for success, 1 for partial success with gnuplot failure, 2 for data/validation errors)
- Handles the `--no-plot` and `--update-readme` flags cleanly
Fixes an issue where backslashes were not properly escaped when generating strings for gnuplot, leading to incorrect rendering. The gnuplot quoting function now escapes both backslashes and single quotes to ensure correct string interpretation by gnuplot. Refs: docs/benchmark
Actionable comments posted: 0
🧹 Nitpick comments (2)
scripts/criterion_dim_plot.py (2)
46-46: Consider using a dataclass for Row instead of a type alias. The 10-tuple for `Row` is functional but less self-documenting than a dataclass. A dataclass would make field names explicit (e.g., `dim`, `la_time`, `la_lo`, etc.) and improve code readability.

Example refactor:

```diff
-Row = tuple[int, float, float, float, float, float, float, float, float, float]
+@dataclass(frozen=True, slots=True)
+class Row:
+    dim: int
+    la_time: float
+    la_lo: float
+    la_hi: float
+    na_time: float
+    na_lo: float
+    na_hi: float
+    fa_time: float
+    fa_lo: float
+    fa_hi: float
```

Then update unpacking throughout (e.g., line 146: `for row in rows:` instead of tuple unpacking).
177-190: Consider using custom exception classes for better error handling. The static analysis tool suggests avoiding long error messages directly in raise statements (TRY003). While the current messages are informative and work well, you could optionally define custom exceptions if this becomes a pattern.

Example refactor:

```diff
+class ReadmeMarkerError(ValueError):
+    """Raised when README markers are missing, duplicated, or misordered."""
+
 def _update_readme_table(readme_path: Path, marker_begin: str, marker_end: str, table_md: str) -> bool:
     lines = readme_path.read_text(encoding="utf-8").splitlines(keepends=True)
     begin_indices = [i for i, line in enumerate(lines) if line.strip() == marker_begin]
     end_indices = [i for i, line in enumerate(lines) if line.strip() == marker_end]
     if len(begin_indices) != 1 or len(end_indices) != 1:
-        raise ValueError(f"README markers not found or not unique. Expected exactly one of each:\n  {marker_begin}\n  {marker_end}\n")
+        raise ReadmeMarkerError(f"README markers not found or not unique. Expected exactly one of each:\n  {marker_begin}\n  {marker_end}\n")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `scripts/criterion_dim_plot.py` (1 hunks)
- `scripts/tests/test_criterion_dim_plot.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
scripts/tests/test_criterion_dim_plot.py (1)
scripts/criterion_dim_plot.py (7)
- `_readme_table_markers` (172-174)
- `_markdown_table` (158-169)
- `_gp_quote` (206-208)
- `PlotRequest` (37-43)
- `_maybe_render_plot` (376-395)
- `_update_readme_table` (177-203)
- `main` (398-446)
scripts/criterion_dim_plot.py (3)
src/matrix.rs (2)
- `get` (80-86)
- `default` (185-187)

src/vector.rs (1)
- `default` (110-112)

benches/vs_linalg.rs (1)
- `main` (379-393)
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py
130-130: Avoid specifying long messages outside the exception class
(TRY003)
184-184: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Code Coverage
- GitHub Check: Codacy Security Scan
- GitHub Check: build (windows-latest)
- GitHub Check: build (ubuntu-latest)
- GitHub Check: build (macos-latest)
🔇 Additional comments (19)
scripts/criterion_dim_plot.py (13)
49-99: LGTM! The METRICS dictionary is comprehensive and well-organized, covering all major linear algebra operations. The note about different naming conventions between crates (line 86) is helpful context.
102-122: LGTM! The discovery functions correctly identify the repository root and extract dimensions from Criterion's directory naming convention (`d{N}`). Returning sorted dimensions ensures consistent output ordering.
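A hedged sketch of that discovery step (the exact regex and directory layout in the script are not reproduced here):

```python
import re
from pathlib import Path

_DIM_RE = re.compile(r"^d(\d+)$")

def discover_dims(group_dir: Path) -> list[int]:
    """Collect the N values from Criterion's d<N> result directories, sorted."""
    if not group_dir.is_dir():
        return []
    dims = []
    for child in group_dir.iterdir():
        m = _DIM_RE.match(child.name)
        if m and child.is_dir():
            dims.append(int(m.group(1)))
    return sorted(dims)
```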
125-139: LGTM! The function robustly handles Criterion's JSON format with appropriate fallbacks for missing confidence intervals. The error message clearly identifies which file and stat caused the issue.
142-147: LGTM! Straightforward CSV writing with appropriate parent directory creation and clear column headers.
150-155: LGTM! The function correctly handles the zero baseline case (returning "n/a") and calculates percentage reductions accurately. The signed format string makes it clear whether la-stack is faster or slower than the baseline.
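A hedged sketch of that guard and the signed formatting (the script's exact format string may differ):

```python
def pct_reduction(baseline: float, candidate: float) -> str:
    if baseline == 0.0:
        return "n/a"  # a percentage relative to zero is undefined
    # Positive means the candidate is faster (time reduced vs. the baseline).
    pct = (baseline - candidate) / baseline * 100.0
    return f"{pct:+.1f}%"
```

For example, `pct_reduction(200.0, 150.0)` yields `"+25.0%"`, while a zero baseline yields `"n/a"` instead of raising `ZeroDivisionError`.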
158-169: LGTM! The markdown table generation is well-formatted with clear column headers, thousands separators, and dual comparisons against both nalgebra and faer.
172-203: LGTM! The README update logic is robust with comprehensive error handling for edge cases (missing markers, duplicates, wrong order). The idempotent return value is a nice touch.
206-208: LGTM! The quote escaping now correctly handles both backslashes and single quotes in the proper order, resolving the past review comment. This ensures gnuplot strings are properly escaped even with edge cases like paths containing `\'`.
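A hedged sketch of the ordering argument (the escape convention of the real `_gp_quote` is not reproduced verbatim here):

```python
def gp_quote(text: str) -> str:
    # Backslashes must be escaped before quotes: doing quotes first would
    # leave the newly introduced escape backslashes to be doubled, which
    # un-escapes the quote (e.g. "\'" would become "\\'").
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"
```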
211-253: LGTM! The gnuplot integration is well-implemented with proper path quoting, clear error messages, and appropriate security considerations. The script generation is readable and the yerrorlines plotting correctly displays confidence intervals.
256-313: LGTM! The CLI argument parser is comprehensive and user-friendly with sensible defaults, clear help text, and proper validation through choices. The dynamic metric choices from the METRICS dictionary is a good design.
316-352: LGTM! The path resolution and data collection functions are well-structured. Tracking skipped dimensions and providing clear feedback about missing benchmark files enhances usability.
355-395: LGTM! The conditional execution functions provide clear separation of concerns with appropriate error handling and informative user feedback. The exit code convention (0/1/2) is consistent with Unix conventions.
398-450: LGTM! The main orchestration function has clear control flow with informative error messages that guide users toward resolution. The early validation prevents wasted processing when data is missing.
scripts/tests/test_criterion_dim_plot.py (6)
1-17: LGTM! The test setup is clean and the first test appropriately validates that marker generation is deterministic, which is important for README updates.
20-42: LGTM! The markdown table tests provide good coverage of both typical formatting (thousands separators, percentages) and edge cases (zero baseline). The assertion messages are clear and specific.
45-49: LGTM! The quote escaping test comprehensively validates all edge cases, including the previously problematic backslash-quote combination. This confirms the fix is working correctly.
52-73: LGTM! The test appropriately simulates gnuplot failures and validates error handling. The use of `type()` to create a minimal Args object is acceptable for testing purposes.
76-153: LGTM! The README update tests are comprehensive, covering both the happy path (including idempotency checks) and all error scenarios (missing, misordered, and duplicate markers). The idempotency test at line 109 is particularly valuable.
156-216: LGTM! The end-to-end test provides excellent coverage of the full workflow, from directory structure creation through CSV and README validation. The test data is realistic and the assertions verify both structural correctness and computed values.
Refactors the criterion plotting script to improve readability and maintainability. It replaces the `Row` tuple with a `Row` dataclass for structured data handling and introduces `ReadmeMarkerError` for better error context when handling README markers. The script now uses named attributes, enhancing code clarity. (internal)
Actionable comments posted: 0
🧹 Nitpick comments (1)
scripts/criterion_dim_plot.py (1)
187-268: LGTM! README updates and gnuplot rendering are solid. The quote escaping in `_gp_quote` (line 223) correctly handles both backslashes and quotes per past review feedback. The subprocess call (line 268) is safe since gnuplot is resolved via PATH and all inputs are properly quoted.

Optional: Address static analysis hints about exception messages.

Lines 199 and 205 raise `ReadmeMarkerError` with long messages. While functional, Ruff suggests defining specific exception subclasses for clearer error types:

```python
class MarkerNotFoundError(ReadmeMarkerError):
    """Raised when README markers are missing or not unique."""

class MarkerOrderError(ReadmeMarkerError):
    """Raised when README markers are out of order."""
```

Then raise these specific exceptions with shorter messages. This is a style improvement, not a functional issue.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `scripts/criterion_dim_plot.py` (1 hunks)
- `scripts/tests/test_criterion_dim_plot.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/tests/test_criterion_dim_plot.py
🧰 Additional context used
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py
145-145: Avoid specifying long messages outside the exception class
(TRY003)
199-199: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Codacy Security Scan
- GitHub Check: Code Coverage
- GitHub Check: build (windows-latest)
- GitHub Check: build (macos-latest)
🔇 Additional comments (5)
scripts/criterion_dim_plot.py (5)
1-26: LGTM! Clear module documentation and appropriate imports. The docstring effectively describes the script's purpose and I/O paths, and all imports are standard library modules suitable for the task.
28-62: LGTM! Well-designed immutable data models. The use of `frozen=True` and `slots=True` ensures immutability and memory efficiency. The Row dataclass clearly captures all benchmark data (time + CI bounds) for three libraries.
64-114: LGTM! Comprehensive benchmark metric definitions. The METRICS dictionary covers a good range of linear algebra operations and correctly handles naming differences between libraries (e.g., `nalgebra_norm_squared` vs `*_norm2_sq`).
117-184: LGTM! Robust data processing helpers. The division-by-zero safeguard in `_pct_reduction` (lines 167-168) properly handles edge cases. The CSV writing and markdown table generation are clear and correct.
271-478: LGTM! Well-orchestrated main flow with comprehensive error handling. The argument parsing is thorough, path resolution handles both absolute and relative paths correctly, and the main function validates data availability at multiple stages. Exit codes are appropriate (0=success, 1=plot failure, 2=data errors).

The handling of `argv` in line 427 correctly supports both CLI and programmatic invocation.
Refactors README table update error handling for better clarity and specificity. Introduces dedicated exception classes for missing/non-unique markers and out-of-order markers, providing more descriptive error messages.
Actionable comments posted: 0
🧹 Nitpick comments (1)
scripts/criterion_dim_plot.py (1)
125-163: Logic is correct. Consider extracting the error message.

The repository discovery, dimension parsing, and estimate reading functions are well-implemented and handle edge cases appropriately (e.g., missing confidence intervals return safe defaults).

The static analysis tool flags line 153 for specifying a long message directly in the `raise` statement. While not critical, extracting complex messages to constants or custom exception constructors can improve maintainability.

Optional refactor:

```diff
+_STAT_NOT_FOUND_MSG = "stat '{stat}' not found in {path}"
+
 def _read_estimate(estimates_json: Path, stat: str) -> tuple[float, float, float]:
     data = json.loads(estimates_json.read_text(encoding="utf-8"))
     stat_obj = data.get(stat)
     if not isinstance(stat_obj, dict):
-        raise KeyError(f"stat '{stat}' not found in {estimates_json}")
+        raise KeyError(_STAT_NOT_FOUND_MSG.format(stat=stat, path=estimates_json))
```

As per static analysis hints, this aligns with TRY003 guidance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `scripts/criterion_dim_plot.py` (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
scripts/criterion_dim_plot.py
153-153: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (windows-latest)
- GitHub Check: build (macos-latest)
- GitHub Check: Codacy Security Scan
- GitHub Check: Code Coverage
🔇 Additional comments (11)
scripts/criterion_dim_plot.py (11)
1-26: LGTM! Clean module structure and standard library imports. The docstring clearly documents the script's purpose and input/output paths, and all imports are from the Python standard library.
28-70: LGTM! Well-designed data structures. The use of `frozen=True` and `slots=True` on all dataclasses ensures immutability and memory efficiency. The exception hierarchy provides clear, specific error types for README marker issues.
72-122: LGTM! Comprehensive metric definitions. The `METRICS` constant provides a clear mapping between metric keys and benchmark names across all three crates. The inline comment at line 109 helpfully documents naming inconsistencies.
165-192: LGTM! Past division-by-zero fix confirmed. The CSV writing and markdown table generation are well-implemented. The `_pct_reduction` function now correctly handles zero baselines by returning `"n/a"`, addressing the issue flagged in the previous review.
195-227: LGTM! Robust README marker handling. The marker validation is thorough:
- Ensures exactly one begin and one end marker
- Validates correct marker ordering
- Avoids unnecessary writes when content is unchanged (line 223)
- Uses specific exception types for clear error reporting
230-277: LGTM! Past quote-escaping fix confirmed and subprocess use is safe. The `_gp_quote` function now correctly escapes backslashes before single quotes (line 232), addressing the issue from the previous review.

The subprocess call at line 277 is appropriately justified: gnuplot is resolved via PATH, the input script is fully generated (no user injection), and all file paths are properly quoted.
280-337: LGTM! Comprehensive CLI with sensible defaults. The argument parser provides clear help strings and reasonable defaults for all options. The `choices` constraint on `--metric` prevents invalid metric selection.
340-354: LGTM! Clear path resolution logic. The functions correctly handle both absolute and relative paths, with sensible default output locations under `docs/assets/bench/`.
357-389: LGTM! Robust data collection with graceful degradation. The function gracefully handles missing benchmark results by skipping dimensions rather than failing, and tracks what was skipped for user feedback.
392-432: LGTM! Clean separation of concerns with appropriate error handling. Both functions follow a clear pattern:
- Check whether to perform the action
- Execute with proper exception handling
- Return meaningful exit codes (0, 1, or 2)
- Provide user feedback via stdout/stderr
435-487: LGTM! Well-structured main function with excellent error messages. The main function orchestrates the workflow cleanly:
- Parse arguments and discover dimensions
- Validate that benchmark data exists
- Collect rows and write CSV
- Optionally update README
- Optionally render plot
Error messages provide actionable guidance (e.g., line 445 suggests running `cargo bench`), and the use of `raise SystemExit(main())` is the proper pattern for entry points.
Adds a comprehensive benchmark suite comparing `la-stack` against `nalgebra` for various linear algebra operations across different dimensions. Includes tooling to automatically generate plots and update the README with benchmark results. This facilitates ongoing performance monitoring and optimization.

Summary by CodeRabbit
- New Features
- Documentation
- Benchmarks
- Tests
- Chores