| 1 | +# AGENTS.md — Pathrex |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +**Pathrex** is a Rust library and CLI tool for benchmarking queries on edge-labeled graphs, |
| 6 | +where the queries are constrained by regular languages and context-free languages. |
| 7 | +It uses **SuiteSparse:GraphBLAS** (via **LAGraph**) for sparse Boolean matrix operations and |
| 8 | +decomposes a graph by edge label into one Boolean adjacency matrix per label. |
| 9 | + |
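To make the per-label decomposition concrete, here is a minimal sketch that uses only the standard library. It assumes a `BTreeSet` of `(row, column)` pairs as a stand-in for a Boolean adjacency matrix; `BoolMatrix` and `decompose` are hypothetical names for illustration and are not part of Pathrex, which stores these matrices in GraphBLAS.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical stand-in for a per-label Boolean adjacency matrix:
/// the set of (row, column) positions that hold `true`.
pub type BoolMatrix = BTreeSet<(usize, usize)>;

/// Splits `(source, label, target)` triples into one Boolean matrix per label.
/// Node names are mapped to dense integer ids in order of first appearance.
pub fn decompose(
    edges: &[(&str, &str, &str)],
) -> (HashMap<String, usize>, BTreeMap<String, BoolMatrix>) {
    let mut ids: HashMap<String, usize> = HashMap::new();
    let mut matrices: BTreeMap<String, BoolMatrix> = BTreeMap::new();
    for &(source, label, target) in edges {
        // Assign the next free id only if the node has not been seen yet.
        let next = ids.len();
        let row = *ids.entry(source.to_string()).or_insert(next);
        let next = ids.len();
        let col = *ids.entry(target.to_string()).or_insert(next);
        // Set the (row, col) entry in the matrix that belongs to this label.
        matrices
            .entry(label.to_string())
            .or_default()
            .insert((row, col));
    }
    (ids, matrices)
}
```

Three edges with two distinct labels produce two matrices that share one node numbering, which is what lets per-label matrices be combined later by matrix operations.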
| 10 | +## Repository Layout |
| 11 | + |
| 12 | +``` |
| 13 | +pathrex/ |
| 14 | +├── Cargo.toml # Crate manifest (edition 2024) |
| 15 | +├── build.rs # Links LAGraph + LAGraphX; optionally regenerates FFI bindings |
| 16 | +├── src/ |
| 17 | +│ ├── lib.rs # Public modules: graph, formats, lagraph_sys; utils is pub(crate) |
| 18 | +│ ├── main.rs # Binary entry point (placeholder) |
| 19 | +│ ├── lagraph_sys.rs # FFI module — includes generated bindings |
| 20 | +│ ├── lagraph_sys_generated.rs # Bindgen output (checked in, regenerated in CI) |
| 21 | +│ ├── utils.rs # Internal helpers: CountingBuilder, CountOutput, VecSource, |
| 22 | +│ │ # grb_ok! and la_ok! macros |
| 23 | +│ ├── graph/ |
| 24 | +│ │ ├── mod.rs # Core traits (GraphBuilder, GraphDecomposition, GraphSource, |
| 25 | +│ │ │ # Backend, Graph<B>), error types, RAII wrappers, GrB init |
| 26 | +│ │ └── inmemory.rs # InMemory marker, InMemoryBuilder, InMemoryGraph |
| 27 | +│ └── formats/ |
| 28 | +│ ├── mod.rs # FormatError enum, re-exports |
| 29 | +│ └── csv.rs # Csv<R> — CSV → Edge iterator (CsvConfig, ColumnSpec) |
| 30 | +├── tests/ |
| 31 | +│ └── inmemory_tests.rs # Integration tests for InMemoryBuilder / InMemoryGraph |
| 32 | +├── deps/ |
| 33 | +│ └── LAGraph/ # Git submodule (SparseLinearAlgebra/LAGraph) |
| 34 | +└── .github/workflows/ci.yml # CI: build GraphBLAS + LAGraph, cargo build & test |
| 35 | +``` |
| 36 | + |
| 37 | +## Build & Dependencies |
| 38 | + |
| 39 | +### System prerequisites |
| 40 | + |
| 41 | +| Dependency | Purpose | |
| 42 | +|---|---| |
| 43 | +| **SuiteSparse:GraphBLAS** | Sparse matrix engine (`libgraphblas`) | |
| 44 | +| **LAGraph** | Graph algorithm library on top of GraphBLAS (`liblagraph`) | |
| 45 | +| **cmake** | Building LAGraph from source | |
| 46 | +| **libclang-dev / clang** | Required by `bindgen` when the `regenerate-bindings` feature is active | |
| 47 | + |
| 48 | +### Building |
| 49 | + |
| 50 | +```bash |
| 51 | +# Ensure submodules are present |
| 52 | +git submodule update --init --recursive |
| 53 | + |
| 54 | +# Build and install SuiteSparse:GraphBLAS system-wide |
| 55 | +git clone --depth 1 https://github.com/DrTimothyAldenDavis/GraphBLAS.git |
| 56 | +cd GraphBLAS && make compact && sudo make install && cd .. |
| 57 | + |
| 58 | +# Build LAGraph inside the submodule (no system-wide install required) |
| 59 | +cd deps/LAGraph && make && cd ../.. |
| 60 | + |
| 61 | +# Build pathrex |
| 62 | +cargo build |
| 63 | + |
| 64 | +# Run tests |
| 65 | +LD_LIBRARY_PATH=deps/LAGraph/build/src:deps/LAGraph/build/experimental:/usr/local/lib cargo test |
| 66 | +``` |
| 67 | + |
| 68 | +### How `build.rs` handles linking |
| 69 | + |
| 70 | +[`build.rs`](build.rs) performs two jobs: |
| 71 | + |
| 72 | +1. **Native linking.** It emits six Cargo directives: |
| 73 | + - `cargo:rustc-link-lib=dylib=graphblas` — dynamically links `libgraphblas`. |
| 74 | + - `cargo:rustc-link-search=native=/usr/local/lib` — adds the system GraphBLAS |
| 75 | + install path to the native library search path. |
| 76 | + - `cargo:rustc-link-lib=dylib=lagraph` — dynamically links `liblagraph`. |
| 77 | + - `cargo:rustc-link-search=native=deps/LAGraph/build/src` — adds the |
| 78 | + submodule's core build output to the native library search path. |
| 79 | + - `cargo:rustc-link-lib=dylib=lagraphx` — dynamically links `liblagraphx` |
| 80 | + (experimental algorithms). |
| 81 | + - `cargo:rustc-link-search=native=deps/LAGraph/build/experimental` — |
| 82 | + adds the experimental build output to the native library search path. |
| 83 | + |
| 84 | + LAGraph does **not** need to be installed system-wide; building the submodule |
| 85 | + in `deps/LAGraph/` is sufficient for compilation and linking. |
| 86 | + SuiteSparse:GraphBLAS **must** be installed system-wide (`sudo make install`). |
| 87 | + |
| 88 | + At **runtime** the OS dynamic linker (`ld.so`) does not use Cargo's link |
| 89 | + search paths — it only consults `LD_LIBRARY_PATH`, `rpath`, and the system |
| 90 | + library cache. Set `LD_LIBRARY_PATH=/usr/local/lib` after a system-wide |
| 91 | + LAGraph install, or include the submodule build paths if not installing |
| 92 | + system-wide. |
| 93 | + |
| 94 | +2. **Optional FFI binding regeneration** (feature `regenerate-bindings`). |
| 95 | + When the feature is active, [`regenerate_bindings()`](build.rs:20) runs |
| 96 | + `bindgen` against `deps/LAGraph/include/LAGraph.h` and |
| 97 | + `deps/LAGraph/include/LAGraphX.h` (always from the submodule — no system |
| 98 | + path search), plus `GraphBLAS.h` (searched in |
| 99 | + `/usr/local/include/suitesparse` and `/usr/include/suitesparse`). The |
| 100 | + generated Rust file is written to |
| 101 | + [`src/lagraph_sys_generated.rs`](src/lagraph_sys_generated.rs). Only a |
| 102 | + curated allowlist of GraphBLAS/LAGraph types and functions is exposed |
| 103 | + (see the `allowlist_*` calls in [`build.rs`](build.rs:59)). |
| 104 | + |
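The six link directives from step 1 of the list above could be produced by code along these lines. This is a sketch and not the contents of the real `build.rs`: `link_directives` is a hypothetical helper that returns the directive strings so they can be inspected, whereas a build script would print each one to stdout.

```rust
/// Returns the six Cargo directives, one lib/search-path pair per native library.
/// In a real build script each string would be printed with `println!`.
pub fn link_directives() -> Vec<String> {
    // (library name, directory that contains the shared object)
    let libs_and_paths = [
        ("graphblas", "/usr/local/lib"),
        ("lagraph", "deps/LAGraph/build/src"),
        ("lagraphx", "deps/LAGraph/build/experimental"),
    ];
    let mut out = Vec::new();
    for (lib, path) in libs_and_paths {
        out.push(format!("cargo:rustc-link-lib=dylib={lib}"));
        out.push(format!("cargo:rustc-link-search=native={path}"));
    }
    out
}
```

These directives only affect link time. The paths still have to be visible to the dynamic linker at runtime, for example through `LD_LIBRARY_PATH`.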
| 105 | +### Feature flags |
| 106 | + |
| 107 | +| Feature | Effect | |
| 108 | +|---|---| |
| 109 | +| `regenerate-bindings` | Runs `bindgen` at build time to regenerate `src/lagraph_sys_generated.rs` from `LAGraph.h`, `LAGraphX.h` (both from `deps/LAGraph/include`) and `GraphBLAS.h`. Without this feature the checked-in bindings are used as-is. | |
| 110 | + |
| 111 | +### Pre-generated FFI bindings |
| 112 | + |
| 113 | +The file `src/lagraph_sys_generated.rs` is checked into version control. CI |
| 114 | +regenerates it with `--features regenerate-bindings`. **Do not hand-edit this file.** |
| 115 | + |
| 116 | +## Architecture & Key Abstractions |
| 117 | + |
| 118 | +### Edge |
| 119 | + |
| 120 | +[`Edge`](src/graph/mod.rs:154) is the universal currency between format parsers and graph |
| 121 | +builders: `{ source: String, target: String, label: String }`. |
| 122 | + |
| 123 | +### GraphSource trait |
| 124 | + |
| 125 | +[`GraphSource<B>`](src/graph/mod.rs:164) is implemented by any data source that knows how to |
| 126 | +feed itself into a specific [`GraphBuilder`]: |
| 127 | + |
| 128 | +- [`apply_to(self, builder: B) -> Result<B, B::Error>`](src/graph/mod.rs:165) — consumes the |
| 129 | + source and returns the populated builder. |
| 130 | + |
| 131 | +[`Csv<R>`](src/formats/csv.rs:52) implements `GraphSource<InMemoryBuilder>` directly, so it |
| 132 | +can be passed to [`GraphBuilder::load`]. |
| 133 | + |
| 134 | +### GraphBuilder trait |
| 135 | + |
| 136 | +[`GraphBuilder`](src/graph/mod.rs:169) accumulates edges and produces a |
| 137 | +[`GraphDecomposition`](src/graph/mod.rs:188): |
| 138 | + |
| 139 | +- [`load<S: GraphSource<Self>>(self, source: S)`](src/graph/mod.rs:179) — primary entry point; |
| 140 | + delegates to `GraphSource::apply_to`. |
| 141 | +- [`build(self)`](src/graph/mod.rs:184) — finalise into an immutable graph. |
| 142 | + |
| 143 | +`InMemoryBuilder` also exposes lower-level helpers outside the trait: |
| 144 | + |
| 145 | +- [`push_edge(&mut self, edge: Edge)`](src/graph/inmemory.rs:62) — ingest one edge. |
| 146 | +- [`with_stream<I, E>(self, stream: I)`](src/graph/inmemory.rs:72) — consume an |
| 147 | + `IntoIterator<Item = Result<Edge, E>>`. |
| 148 | +- [`push_grb_matrix(&mut self, label, matrix: GrB_Matrix)`](src/graph/inmemory.rs:85) — accept |
| 149 | + a pre-built `GrB_Matrix` for a label, wrapping it in an `LAGraph_Graph` immediately. |
| 150 | + |
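The relationship between `GraphSource` and `GraphBuilder` can be sketched without GraphBLAS. The trait shapes below are inferred from the signatures quoted above, so the return type of `load` and the associated types are assumptions. `EdgeCounter`, `EdgeList`, and `edge` are hypothetical illustration types, not the crate's `CountingBuilder` or `VecSource`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub source: String,
    pub target: String,
    pub label: String,
}

/// Assumed shape of the builder trait; the real trait may carry more items.
pub trait GraphBuilder: Sized {
    type Graph;
    type Error;

    /// Primary entry point: hands the builder over to the source.
    fn load<S: GraphSource<Self>>(self, source: S) -> Result<Self, Self::Error> {
        source.apply_to(self)
    }

    /// Finalises the accumulated state into a graph.
    fn build(self) -> Result<Self::Graph, Self::Error>;
}

/// A data source that knows how to feed itself into builder `B`.
pub trait GraphSource<B: GraphBuilder> {
    fn apply_to(self, builder: B) -> Result<B, B::Error>;
}

/// Hypothetical builder that only counts edges (no GraphBLAS involved).
#[derive(Default)]
pub struct EdgeCounter {
    count: usize,
}

impl GraphBuilder for EdgeCounter {
    type Graph = usize;
    type Error = String;

    fn build(self) -> Result<usize, String> {
        Ok(self.count)
    }
}

/// Hypothetical in-memory source backed by a vector of edges.
pub struct EdgeList(pub Vec<Edge>);

impl GraphSource<EdgeCounter> for EdgeList {
    fn apply_to(self, mut builder: EdgeCounter) -> Result<EdgeCounter, String> {
        builder.count += self.0.len();
        Ok(builder)
    }
}

/// Convenience constructor for the usage example.
pub fn edge(source: &str, label: &str, target: &str) -> Edge {
    Edge {
        source: source.to_string(),
        target: target.to_string(),
        label: label.to_string(),
    }
}
```

Because `load` consumes and returns the builder, several sources can be chained: `EdgeCounter::default().load(a)?.load(b)?.build()` merges both sources before finalising.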
| 151 | +### Backend trait & Graph\<B\> handle |
| 152 | + |
| 153 | +[`Backend`](src/graph/mod.rs:217) associates a marker type with a concrete builder/graph pair: |
| 154 | + |
| 155 | +```rust |
| 156 | +pub trait Backend { |
| 157 | + type Graph: GraphDecomposition; |
| 158 | + type Builder: GraphBuilder<Graph = Self::Graph>; |
| 159 | +} |
| 160 | +``` |
| 161 | + |
| 162 | +[`Graph<B>`](src/graph/mod.rs:229) is a zero-sized handle parameterised by a `Backend`: |
| 163 | + |
| 164 | +- [`Graph::<InMemory>::builder()`](src/graph/mod.rs:234) — returns a fresh `InMemoryBuilder`. |
| 165 | +- [`Graph::<InMemory>::try_from(source)`](src/graph/mod.rs:238) — builds a graph from a single |
| 166 | + source in one call. |
| 167 | + |
| 168 | +[`InMemory`](src/graph/inmemory.rs:26) is the concrete backend marker type. |
| 169 | + |
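A zero-sized handle of this kind is typically built on `PhantomData`. The sketch below simplifies the traits and assumes the builder is created through `Default`, which the real crate may not do. `Toy`, `ToyBuilder`, and `ToyGraph` are hypothetical stand-ins for `InMemory` and its builder/graph pair.

```rust
use std::marker::PhantomData;

pub trait GraphDecomposition {
    fn num_nodes(&self) -> usize;
}

/// Simplified builder trait; `Default` as the constructor is an assumption.
pub trait GraphBuilder: Default {
    type Graph: GraphDecomposition;
    fn build(self) -> Self::Graph;
}

pub trait Backend {
    type Graph: GraphDecomposition;
    type Builder: GraphBuilder<Graph = Self::Graph>;
}

/// Zero-sized handle: it stores no data, only the backend type parameter.
pub struct Graph<B: Backend> {
    _backend: PhantomData<B>,
}

impl<B: Backend> Graph<B> {
    /// Returns a fresh builder for the chosen backend.
    pub fn builder() -> B::Builder {
        Default::default()
    }
}

/// Hypothetical toy backend marker.
pub struct Toy;

pub struct ToyGraph {
    nodes: usize,
}

#[derive(Default)]
pub struct ToyBuilder {
    nodes: usize,
}

impl ToyBuilder {
    pub fn with_nodes(mut self, nodes: usize) -> Self {
        self.nodes = nodes;
        self
    }
}

impl GraphDecomposition for ToyGraph {
    fn num_nodes(&self) -> usize {
        self.nodes
    }
}

impl GraphBuilder for ToyBuilder {
    type Graph = ToyGraph;
    fn build(self) -> ToyGraph {
        ToyGraph { nodes: self.nodes }
    }
}

impl Backend for Toy {
    type Graph = ToyGraph;
    type Builder = ToyBuilder;
}
```

The marker type selects the builder at compile time, so `Graph::<Toy>::builder()` has no runtime cost beyond constructing the builder itself.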
| 170 | +### GraphDecomposition trait |
| 171 | + |
| 172 | +[`GraphDecomposition`](src/graph/mod.rs:188) is the read-only query interface: |
| 173 | + |
| 174 | +- [`get_graph(label)`](src/graph/mod.rs:192) — returns `Arc<LagraphGraph>` for a given edge label. |
| 175 | +- [`get_node_id(string_id)`](src/graph/mod.rs:195) / [`get_node_name(mapped_id)`](src/graph/mod.rs:198) — bidirectional string ↔ integer dictionary. |
| 176 | +- [`num_nodes()`](src/graph/mod.rs:199) — total unique nodes. |
| 177 | + |
| 178 | +### InMemoryBuilder / InMemoryGraph |
| 179 | + |
| 180 | +[`InMemoryBuilder`](src/graph/inmemory.rs:35) is the primary `GraphBuilder` implementation. |
| 181 | +It collects edges in RAM, then [`build()`](src/graph/inmemory.rs:110) calls |
| 182 | +GraphBLAS to create one `GrB_Matrix` per label via COO format, wraps each in an |
| 183 | +`LAGraph_Graph`, and returns an [`InMemoryGraph`](src/graph/inmemory.rs:153). |
| 184 | + |
| 185 | +Multiple CSV sources can be chained with repeated `.load()` calls; all edges are merged |
| 186 | +into a single graph. |
| 187 | + |
| 188 | +### Format parsers |
| 189 | + |
| 190 | +[`Csv<R>`](src/formats/csv.rs:52) is the only built-in parser. It implements |
| 191 | +`Iterator<Item = Result<Edge, FormatError>>` and is directly pluggable into |
| 192 | +`GraphBuilder::load()` via its `GraphSource<InMemoryBuilder>` impl. |
| 193 | + |
| 194 | +Configuration is via [`CsvConfig`](src/formats/csv.rs:17): |
| 195 | + |
| 196 | +| Field | Default | Description | |
| 197 | +|---|---|---| |
| 198 | +| `source_column` | `Index(0)` | Column for the source node (by index or name) | |
| 199 | +| `target_column` | `Index(1)` | Column for the target node | |
| 200 | +| `label_column` | `Index(2)` | Column for the edge label | |
| 201 | +| `has_header` | `true` | Whether the first row is a header | |
| 202 | +| `delimiter` | `b','` | Field delimiter byte | |
| 203 | + |
| 204 | +[`ColumnSpec`](src/formats/csv.rs:11) is either `Index(usize)` or `Name(String)`. |
| 205 | +Name-based lookup requires `has_header: true`. |
| 206 | + |
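The defaults in the table and the header requirement for name-based lookup can be sketched as follows. The field names and defaults come from the table above; the derives and the `resolve_column` helper are assumptions added for illustration and may not match the crate's internals.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnSpec {
    Index(usize),
    Name(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct CsvConfig {
    pub source_column: ColumnSpec,
    pub target_column: ColumnSpec,
    pub label_column: ColumnSpec,
    pub has_header: bool,
    pub delimiter: u8,
}

impl Default for CsvConfig {
    /// Mirrors the defaults listed in the table.
    fn default() -> Self {
        CsvConfig {
            source_column: ColumnSpec::Index(0),
            target_column: ColumnSpec::Index(1),
            label_column: ColumnSpec::Index(2),
            has_header: true,
            delimiter: b',',
        }
    }
}

/// Hypothetical helper: resolves a column spec to a zero-based index.
/// `header` is `None` when the file has no header row.
pub fn resolve_column(spec: &ColumnSpec, header: Option<&[&str]>) -> Result<usize, String> {
    match (spec, header) {
        // Index-based lookup never needs the header.
        (ColumnSpec::Index(i), _) => Ok(*i),
        // Name-based lookup searches the header row for an exact match.
        (ColumnSpec::Name(name), Some(cols)) => cols
            .iter()
            .position(|c| *c == name.as_str())
            .ok_or_else(|| format!("column `{name}` not found in header")),
        // Without a header there is nothing to match a name against.
        (ColumnSpec::Name(name), None) => Err(format!("column `{name}` needs a header row")),
    }
}
```

A config that selects columns by name would override only the relevant fields, for example `CsvConfig { label_column: ColumnSpec::Name("kind".into()), ..Default::default() }`.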
| 207 | +### FFI layer |
| 208 | + |
| 209 | +[`lagraph_sys`](src/lagraph_sys.rs) exposes raw C bindings for GraphBLAS and |
| 210 | +LAGraph. Safe Rust wrappers live in the [`graph`](src/graph/mod.rs) module: |
| 211 | + |
| 212 | +- [`LagraphGraph`](src/graph/mod.rs:48) — RAII wrapper around `LAGraph_Graph` (calls |
| 213 | + `LAGraph_Delete` on drop). Also provides |
| 214 | + [`LagraphGraph::from_coo()`](src/graph/mod.rs:85) to build directly from COO arrays. |
| 215 | +- [`GraphblasVector`](src/graph/mod.rs:124) — RAII wrapper around `GrB_Vector`. |
| 216 | +- [`ensure_grb_init()`](src/graph/mod.rs:39) — one-time `LAGraph_Init` via `std::sync::Once`. |
| 217 | + |
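The two patterns named above, one-time initialisation through `std::sync::Once` and release on drop, can be sketched without any native library. The counters stand in for the real `LAGraph_Init` and `LAGraph_Delete` calls, and `Handle`, `ensure_init`, `init_calls`, and `live_handles` are hypothetical names used only in this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INIT: Once = Once::new();
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);

/// Runs the initialisation closure at most once, however often it is called.
pub fn ensure_init() {
    INIT.call_once(|| {
        // The real code would call the library's init function here.
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    });
}

/// Hypothetical owned handle; the real wrapper holds a raw C pointer.
pub struct Handle {
    id: usize,
}

impl Handle {
    pub fn new(id: usize) -> Self {
        ensure_init();
        LIVE_HANDLES.fetch_add(1, Ordering::SeqCst);
        Handle { id }
    }

    pub fn id(&self) -> usize {
        self.id
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // The real code would call the library's delete function here.
        LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Number of times the initialisation closure actually ran.
pub fn init_calls() -> usize {
    INIT_CALLS.load(Ordering::SeqCst)
}

/// Number of handles that have been created and not yet dropped.
pub fn live_handles() -> usize {
    LIVE_HANDLES.load(Ordering::SeqCst)
}
```

Tying the release call to `Drop` means a handle cannot leak on an early return or a `?` exit, which is the main reason to wrap raw FFI pointers this way.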
| 218 | +### Macros (`src/utils.rs`) |
| 219 | + |
| 220 | +Two `#[macro_export]` macros handle FFI error mapping: |
| 221 | + |
| 222 | +- [`grb_ok!(expr)`](src/utils.rs:138) — evaluates a GraphBLAS call inside `unsafe`, maps the |
| 223 | + `i32` return code to `Result<(), GraphError>`, reporting failures as `GraphError::GraphBlas(info)`. |
| 224 | +- [`la_ok!(fn::path(args…))`](src/utils.rs:167) — evaluates a LAGraph call, automatically |
| 225 | + appending the required `*mut i8` message buffer, and maps failure to |
| 226 | + `GraphError::LAGraph(info, msg)`. |
| 227 | + |
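A status-mapping macro in the spirit of `grb_ok!` might look like the sketch below. It is not the crate's macro: `status_ok!` is a deliberately different, hypothetical name, `mock_grb_call` is a mock that replaces a real GraphBLAS function, and the `GraphError` enum is reduced to a single variant. The sketch treats `0` (the value of `GrB_SUCCESS`) as the only success code.

```rust
#[derive(Debug, PartialEq)]
pub enum GraphError {
    GraphBlas(i32),
}

/// Hypothetical mock of a C function that returns a status code.
pub unsafe fn mock_grb_call(fail: bool) -> i32 {
    if fail {
        -1
    } else {
        0
    }
}

/// Simplified status mapping: evaluates the call inside `unsafe` and
/// turns a non-zero status into `GraphError::GraphBlas`.
macro_rules! status_ok {
    ($call:expr) => {{
        let info: i32 = unsafe { $call };
        if info == 0 {
            Ok(())
        } else {
            Err(GraphError::GraphBlas(info))
        }
    }};
}

/// Safe wrapper that uses the macro to translate the status code.
pub fn run(fail: bool) -> Result<(), GraphError> {
    status_ok!(mock_grb_call(fail))
}
```

Keeping the `unsafe` block inside the macro makes each call site a single expression that composes with `?`.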
| 228 | +## Coding Conventions |
| 229 | + |
| 230 | +- **Rust edition 2024**. |
| 231 | +- Error handling via `thiserror` derive macros; two main error enums: |
| 232 | + [`GraphError`](src/graph/mod.rs:15) and [`FormatError`](src/formats/mod.rs:24). |
| 233 | +- `FormatError` converts into `GraphError` via `#[from] FormatError` on the |
| 234 | + `GraphError::Format` variant. |
| 235 | +- Unsafe FFI calls are confined to `lagraph_sys`, `graph/mod.rs`, and |
| 236 | + `graph/inmemory.rs`. All raw pointers are wrapped in RAII types that free |
| 237 | + resources on drop. |
| 238 | +- `unsafe impl Send` and `unsafe impl Sync` are provided for `LagraphGraph` and |
| 239 | + `GraphblasVector` because GraphBLAS handles are thread-safe after init. |
| 240 | +- Unit tests live in `#[cfg(test)] mod tests` blocks inside each module. |
| 241 | + Integration tests that need GraphBLAS live in [`tests/inmemory_tests.rs`](tests/inmemory_tests.rs). |
| 242 | + |
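The `FormatError` to `GraphError` conversion mentioned above is what lets `?` cross the boundary between parsing and graph building. The sketch below writes the `From` impl by hand to show roughly what `#[from]` provides, without depending on `thiserror`. The `MissingColumn` variant, `parse_row`, and `edge_summary` are hypothetical and exist only for this illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum FormatError {
    /// Hypothetical variant used only in this sketch.
    MissingColumn(String),
}

#[derive(Debug, PartialEq)]
pub enum GraphError {
    Format(FormatError),
}

/// Hand-written equivalent of `#[from] FormatError` on the `Format` variant.
impl From<FormatError> for GraphError {
    fn from(err: FormatError) -> Self {
        GraphError::Format(err)
    }
}

/// Parses a `source,target,label` row and reports the first missing field.
pub fn parse_row(row: &str) -> Result<(String, String, String), FormatError> {
    let mut fields = row.split(',').map(str::trim);
    let mut next = |name: &str| {
        fields
            .next()
            .filter(|f| !f.is_empty())
            .map(str::to_string)
            .ok_or_else(|| FormatError::MissingColumn(name.to_string()))
    };
    Ok((next("source")?, next("target")?, next("label")?))
}

/// Returns `GraphError`, so `?` converts the parser's `FormatError` via `From`.
pub fn edge_summary(row: &str) -> Result<String, GraphError> {
    let (source, target, label) = parse_row(row)?;
    Ok(format!("{source} -[{label}]-> {target}"))
}
```

With this conversion in place, functions that return `GraphError` can call format parsers directly and do not need explicit `map_err` calls.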
| 243 | +## Testing |
| 244 | + |
| 245 | +```bash |
| 246 | +# Run all tests (LAGraph installed system-wide) |
| 247 | +LD_LIBRARY_PATH=/usr/local/lib cargo test --verbose |
| 248 | + |
| 249 | +# If LAGraph is NOT installed system-wide (only built in the submodule): |
| 250 | +LD_LIBRARY_PATH=deps/LAGraph/build/src:deps/LAGraph/build/experimental:/usr/local/lib cargo test --verbose |
| 251 | +``` |
| 252 | + |
| 253 | +Tests in `src/graph/mod.rs` use `CountingBuilder` / `CountOutput` / `VecSource` from |
| 254 | +[`src/utils.rs`](src/utils.rs) — these do **not** call into GraphBLAS and run without |
| 255 | +native libraries. |
| 256 | + |
| 257 | +Tests in `src/formats/csv.rs` are pure Rust and need no native dependencies. |
| 258 | + |
| 259 | +Tests in `src/graph/inmemory.rs` and [`tests/inmemory_tests.rs`](tests/inmemory_tests.rs) |
| 260 | +call real GraphBLAS/LAGraph and require the native libraries to be present. |
| 261 | + |
| 262 | +## CI |
| 263 | + |
| 264 | +The GitHub Actions workflow ([`.github/workflows/ci.yml`](.github/workflows/ci.yml)) |
| 265 | +runs on every push and PR across `stable`, `beta`, and `nightly` toolchains: |
| 266 | + |
| 267 | +1. Checks out with `submodules: recursive`. |
| 268 | +2. Installs cmake, libclang-dev, clang. |
| 269 | +3. Builds and installs SuiteSparse:GraphBLAS from source (`sudo make install`). |
| 270 | +4. Builds and installs LAGraph from the submodule (`sudo make install`). |
| 271 | +5. `cargo build --features regenerate-bindings` — rebuilds FFI bindings. |
| 272 | +6. `LD_LIBRARY_PATH=/usr/local/lib cargo test --verbose` — runs the full test suite. |