Commit eb29a4e

ref: prettify graph abstraction
1 parent: 62d1ee7

8 files changed

Lines changed: 523 additions & 113 deletions

AGENTS.md

Lines changed: 272 additions & 0 deletions
@@ -0,0 +1,272 @@
# AGENTS.md — Pathrex

## Project Overview

**Pathrex** is a Rust library and CLI tool for benchmarking queries on edge-labeled graphs
constrained by regular and context-free languages.
It uses **SuiteSparse:GraphBLAS** (via **LAGraph**) for sparse Boolean matrix operations and
decomposes a graph by edge label into one Boolean adjacency matrix per label.

## Repository Layout

```
pathrex/
├── Cargo.toml # Crate manifest (edition 2024)
├── build.rs # Links LAGraph + LAGraphX; optionally regenerates FFI bindings
├── src/
│   ├── lib.rs # Public modules: graph, formats, lagraph_sys; utils is pub(crate)
│   ├── main.rs # Binary entry point (placeholder)
│   ├── lagraph_sys.rs # FFI module — includes generated bindings
│   ├── lagraph_sys_generated.rs # Bindgen output (checked in, regenerated in CI)
│   ├── utils.rs # Internal helpers: CountingBuilder, CountOutput, VecSource,
│   │            # grb_ok! and la_ok! macros
│   ├── graph/
│   │   ├── mod.rs # Core traits (GraphBuilder, GraphDecomposition, GraphSource,
│   │   │          # Backend, Graph<B>), error types, RAII wrappers, GrB init
│   │   └── inmemory.rs # InMemory marker, InMemoryBuilder, InMemoryGraph
│   └── formats/
│       ├── mod.rs # FormatError enum, re-exports
│       └── csv.rs # Csv<R> — CSV → Edge iterator (CsvConfig, ColumnSpec)
├── tests/
│   └── inmemory_tests.rs # Integration tests for InMemoryBuilder / InMemoryGraph
├── deps/
│   └── LAGraph/ # Git submodule (SparseLinearAlgebra/LAGraph)
└── .github/workflows/ci.yml # CI: build GraphBLAS + LAGraph, cargo build & test
```

## Build & Dependencies

### System prerequisites

| Dependency | Purpose |
|---|---|
| **SuiteSparse:GraphBLAS** | Sparse matrix engine (`libgraphblas`) |
| **LAGraph** | Graph algorithm library on top of GraphBLAS (`liblagraph`) |
| **cmake** | Building LAGraph from source |
| **libclang-dev / clang** | Required by `bindgen` when the `regenerate-bindings` feature is active |

### Building

```bash
# Ensure submodules are present
git submodule update --init --recursive

# Build and install SuiteSparse:GraphBLAS system-wide
git clone --depth 1 https://github.com/DrTimothyAldenDavis/GraphBLAS.git
cd GraphBLAS && make compact && sudo make install && cd ..

# Build LAGraph inside the submodule (no system-wide install required)
cd deps/LAGraph && make && cd ../..

# Build pathrex
cargo build

# Run tests
LD_LIBRARY_PATH=deps/LAGraph/build/src:deps/LAGraph/build/experimental:/usr/local/lib cargo test
```

### How `build.rs` handles linking

[`build.rs`](build.rs) performs two jobs:

1. **Native linking.** It emits six Cargo directives:
   - `cargo:rustc-link-lib=dylib=graphblas` — dynamically links `libgraphblas`.
   - `cargo:rustc-link-search=native=/usr/local/lib` — adds the system GraphBLAS
     install path to the native library search path.
   - `cargo:rustc-link-lib=dylib=lagraph` — dynamically links `liblagraph`.
   - `cargo:rustc-link-search=native=deps/LAGraph/build/src` — adds the
     submodule's core build output to the native library search path.
   - `cargo:rustc-link-lib=dylib=lagraphx` — dynamically links `liblagraphx`
     (experimental algorithms).
   - `cargo:rustc-link-search=native=deps/LAGraph/build/experimental` adds the
     experimental build output to the native library search path.

   LAGraph does **not** need to be installed system-wide; building the submodule
   in `deps/LAGraph/` is sufficient for compilation and linking.
   SuiteSparse:GraphBLAS **must** be installed system-wide (`sudo make install`).

   At **runtime** the OS dynamic linker (`ld.so`) does not use Cargo's link
   search paths; it consults `LD_LIBRARY_PATH`, `rpath`/`runpath` entries, the
   system library cache, and the default system library directories. Set
   `LD_LIBRARY_PATH=/usr/local/lib` after a system-wide LAGraph install, or also
   include the submodule build paths (`deps/LAGraph/build/src` and
   `deps/LAGraph/build/experimental`) if LAGraph is not installed system-wide.

2. **Optional FFI binding regeneration** (feature `regenerate-bindings`).
   When the feature is active, [`regenerate_bindings()`](build.rs:20) runs
   `bindgen` against `deps/LAGraph/include/LAGraph.h` and
   `deps/LAGraph/include/LAGraphX.h` (always from the submodule — no system
   path search), plus `GraphBLAS.h` (searched in
   `/usr/local/include/suitesparse` and `/usr/include/suitesparse`). The
   generated Rust file is written to
   [`src/lagraph_sys_generated.rs`](src/lagraph_sys_generated.rs). Only a
   curated allowlist of GraphBLAS/LAGraph types and functions is exposed
   (see the `allowlist_*` calls in [`build.rs`](build.rs:59)).
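
The six linking directives listed in item 1 amount to a few `println!` calls in a build script. The following is a minimal illustration, not pathrex's actual `build.rs`: the helper name `link_directives` is hypothetical, and the `regenerate-bindings` branch is left out entirely.

```rust
// Hypothetical sketch of the linking half of a build script.
// `link_directives` is an illustrative name, not taken from pathrex.
fn link_directives() -> Vec<String> {
    // (library name, directory containing it), in the order documented above.
    let libs = [
        ("graphblas", "/usr/local/lib"),
        ("lagraph", "deps/LAGraph/build/src"),
        ("lagraphx", "deps/LAGraph/build/experimental"),
    ];
    let mut out = Vec::new();
    for (name, dir) in libs {
        // Link the shared library dynamically...
        out.push(format!("cargo:rustc-link-lib=dylib={name}"));
        // ...and tell rustc where to find it at link time (not at run time).
        out.push(format!("cargo:rustc-link-search=native={dir}"));
    }
    out
}

fn main() {
    // Cargo reads directives that a build script prints to stdout.
    for directive in link_directives() {
        println!("{directive}");
    }
}
```

Cargo only sees what the build script prints to stdout. Paths such as `deps/LAGraph/build/src` are relative; deriving them from the `CARGO_MANIFEST_DIR` environment variable, which Cargo sets when running build scripts, is one way to make them independent of the working directory.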

### Feature flags

| Feature | Effect |
|---|---|
| `regenerate-bindings` | Runs `bindgen` at build time to regenerate `src/lagraph_sys_generated.rs` from `LAGraph.h`, `LAGraphX.h` (both from `deps/LAGraph/include`) and `GraphBLAS.h`. Without this feature the checked-in bindings are used as-is. |

### Pre-generated FFI bindings

The file `src/lagraph_sys_generated.rs` is checked into version control. CI
regenerates it with `--features regenerate-bindings`. **Do not hand-edit this file.**

## Architecture & Key Abstractions

### Edge

[`Edge`](src/graph/mod.rs:154) is the universal currency between format parsers and graph
builders: `{ source: String, target: String, label: String }`.

### GraphSource trait

[`GraphSource<B>`](src/graph/mod.rs:164) is implemented by any data source that knows how to
feed itself into a specific [`GraphBuilder`]:

- [`apply_to(self, builder: B) -> Result<B, B::Error>`](src/graph/mod.rs:165) — consumes the
  source and returns the populated builder.

[`Csv<R>`](src/formats/csv.rs:52) implements `GraphSource<InMemoryBuilder>` directly, so it
can be passed to [`GraphBuilder::load`].

### GraphBuilder trait

[`GraphBuilder`](src/graph/mod.rs:169) accumulates edges and produces a
[`GraphDecomposition`](src/graph/mod.rs:188):

- [`load<S: GraphSource<Self>>(self, source: S)`](src/graph/mod.rs:179) — primary entry point;
  delegates to `GraphSource::apply_to`.
- [`build(self)`](src/graph/mod.rs:184) — finalise into an immutable graph.

`InMemoryBuilder` also exposes lower-level helpers outside the trait:

- [`push_edge(&mut self, edge: Edge)`](src/graph/inmemory.rs:62) — ingest one edge.
- [`with_stream<I, E>(self, stream: I)`](src/graph/inmemory.rs:72) — consume an
  `IntoIterator<Item = Result<Edge, E>>`.
- [`push_grb_matrix(&mut self, label, matrix: GrB_Matrix)`](src/graph/inmemory.rs:85) — accept
  a pre-built `GrB_Matrix` for a label, wrapping it in an `LAGraph_Graph` immediately.
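
The `load` → `apply_to` delegation can be illustrated with a self-contained toy. Everything here is a simplified stand-in written for this example: the traits drop the `GraphDecomposition` bound and use a `String` error, and `EdgeCounter` / `EdgeVec` are hypothetical types, not the `CountingBuilder` / `VecSource` helpers in `src/utils.rs`.

```rust
/// Same shape as pathrex's `Edge`, redeclared so the example stands alone.
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub source: String,
    pub target: String,
    pub label: String,
}

/// Simplified stand-in for `GraphSource<B>`.
pub trait GraphSource<B: GraphBuilder> {
    fn apply_to(self, builder: B) -> Result<B, B::Error>;
}

/// Simplified stand-in for `GraphBuilder` (no bound on `Graph`).
pub trait GraphBuilder: Sized {
    type Graph;
    type Error;

    /// Entry point: hand the builder to the source and let it populate it.
    fn load<S: GraphSource<Self>>(self, source: S) -> Result<Self, Self::Error> {
        source.apply_to(self)
    }

    fn build(self) -> Result<Self::Graph, Self::Error>;
}

/// Hypothetical builder that just collects edges; its "graph" is the edge count.
#[derive(Default)]
pub struct EdgeCounter {
    edges: Vec<Edge>,
}

impl EdgeCounter {
    pub fn push_edge(&mut self, edge: Edge) {
        self.edges.push(edge);
    }
}

impl GraphBuilder for EdgeCounter {
    type Graph = usize;
    type Error = String;

    fn build(self) -> Result<usize, String> {
        Ok(self.edges.len())
    }
}

/// Hypothetical source yielding `Result<Edge, E>` items, like a parser would.
pub struct EdgeVec(pub Vec<Result<Edge, String>>);

impl GraphSource<EdgeCounter> for EdgeVec {
    fn apply_to(self, mut builder: EdgeCounter) -> Result<EdgeCounter, String> {
        for item in self.0 {
            builder.push_edge(item?); // the first parse error aborts the load
        }
        Ok(builder)
    }
}

fn edge(s: &str, t: &str, l: &str) -> Edge {
    Edge { source: s.into(), target: t.into(), label: l.into() }
}

fn main() {
    let a = EdgeVec(vec![Ok(edge("a", "b", "knows"))]);
    let b = EdgeVec(vec![Ok(edge("b", "c", "likes")), Ok(edge("c", "a", "knows"))]);
    // Chained `.load()` calls merge both sources into one builder.
    let n = EdgeCounter::default()
        .load(a)
        .and_then(|bld| bld.load(b))
        .and_then(|bld| bld.build());
    println!("{n:?}"); // Ok(3)
}
```

Because `load` is a provided method, each source only has to implement `apply_to`; the builder never needs to know which format the edges came from.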

### Backend trait & Graph\<B\> handle

[`Backend`](src/graph/mod.rs:217) associates a marker type with a concrete builder/graph pair:

```rust
pub trait Backend {
    type Graph: GraphDecomposition;
    type Builder: GraphBuilder<Graph = Self::Graph>;
}
```

[`Graph<B>`](src/graph/mod.rs:229) is a zero-sized handle parameterised by a `Backend`:

- [`Graph::<InMemory>::builder()`](src/graph/mod.rs:234) — returns a fresh `InMemoryBuilder`.
- [`Graph::<InMemory>::try_from(source)`](src/graph/mod.rs:238) — builds a graph from a single
  source in one call.

[`InMemory`](src/graph/inmemory.rs:26) is the concrete backend marker type.
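
A zero-sized, backend-parameterised handle like `Graph<B>` is commonly built on `std::marker::PhantomData`. The sketch assumes that design; it is not the code in `src/graph/mod.rs`. `Toy`, `ToyBuilder` and `ToyGraph` are hypothetical stand-ins for `InMemory`, `InMemoryBuilder` and `InMemoryGraph`, and the traits are trimmed (a `Default` bound on the builder is assumed so `builder()` can construct one).

```rust
use std::marker::PhantomData;

pub trait GraphDecomposition {
    fn num_nodes(&self) -> usize;
}

/// Trimmed builder trait; `Default` is an assumption made for this sketch.
pub trait GraphBuilder: Default {
    type Graph: GraphDecomposition;
    fn build(self) -> Self::Graph;
}

/// Same shape as the `Backend` trait shown above.
pub trait Backend {
    type Graph: GraphDecomposition;
    type Builder: GraphBuilder<Graph = Self::Graph>;
}

/// Zero-sized handle: carries only the backend type, no data.
pub struct Graph<B: Backend>(PhantomData<B>);

impl<B: Backend> Graph<B> {
    /// Returns a fresh builder for the chosen backend.
    pub fn builder() -> B::Builder {
        <B::Builder as Default>::default()
    }
}

// ---- Hypothetical backend ----
#[derive(Default)]
pub struct ToyBuilder {
    nodes: usize,
}

impl ToyBuilder {
    pub fn add_node(mut self) -> Self {
        self.nodes += 1;
        self
    }
}

pub struct ToyGraph {
    nodes: usize,
}

impl GraphDecomposition for ToyGraph {
    fn num_nodes(&self) -> usize {
        self.nodes
    }
}

impl GraphBuilder for ToyBuilder {
    type Graph = ToyGraph;
    fn build(self) -> ToyGraph {
        ToyGraph { nodes: self.nodes }
    }
}

/// Marker type, analogous to `InMemory`.
pub struct Toy;

impl Backend for Toy {
    type Graph = ToyGraph;
    type Builder = ToyBuilder;
}

fn main() {
    // The backend is chosen purely at the type level.
    let g = Graph::<Toy>::builder().add_node().add_node().build();
    assert_eq!(std::mem::size_of::<Graph<Toy>>(), 0);
    println!("{}", g.num_nodes()); // 2
}
```

Since `Graph<B>` holds no data, selecting a backend costs nothing at run time; all dispatch is resolved statically through the associated types.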

### GraphDecomposition trait

[`GraphDecomposition`](src/graph/mod.rs:188) is the read-only query interface:

- [`get_graph(label)`](src/graph/mod.rs:192) — returns `Arc<LagraphGraph>` for a given edge label.
- [`get_node_id(string_id)`](src/graph/mod.rs:195) / [`get_node_name(mapped_id)`](src/graph/mod.rs:198) — bidirectional string ↔ integer dictionary.
- [`num_nodes()`](src/graph/mod.rs:199) — total number of unique nodes.
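
A bidirectional string ↔ integer dictionary of the kind behind `get_node_id` / `get_node_name` can be kept as a `HashMap` plus a `Vec`, with the vector index serving as the dense ID. This is a sketch under that assumption: `NodeDict` and `intern` are hypothetical names, and pathrex's own storage and return types may differ.

```rust
use std::collections::HashMap;

/// Hypothetical node dictionary: name -> dense id, and id -> name.
#[derive(Default)]
pub struct NodeDict {
    ids: HashMap<String, usize>,
    names: Vec<String>,
}

impl NodeDict {
    /// Returns the existing id for `name`, or assigns the next free one.
    pub fn intern(&mut self, name: &str) -> usize {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len();
        self.names.push(name.to_owned());
        self.ids.insert(name.to_owned(), id);
        id
    }

    pub fn get_node_id(&self, name: &str) -> Option<usize> {
        self.ids.get(name).copied()
    }

    pub fn get_node_name(&self, id: usize) -> Option<&str> {
        self.names.get(id).map(String::as_str)
    }

    pub fn num_nodes(&self) -> usize {
        self.names.len()
    }
}

fn main() {
    let mut dict = NodeDict::default();
    for name in ["alice", "bob", "alice", "carol"] {
        dict.intern(name); // duplicates reuse their existing id
    }
    println!("{} nodes", dict.num_nodes()); // 3 nodes
    println!("{:?}", dict.get_node_id("bob")); // Some(1)
}
```

Dense IDs in `0..num_nodes()` are convenient here because they can serve directly as row and column indices of the per-label adjacency matrices.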

### InMemoryBuilder / InMemoryGraph

[`InMemoryBuilder`](src/graph/inmemory.rs:35) is the primary `GraphBuilder` implementation.
It collects edges in RAM, then [`build()`](src/graph/inmemory.rs:110) calls
GraphBLAS to create one `GrB_Matrix` per label via COO format, wraps each in an
`LAGraph_Graph`, and returns an [`InMemoryGraph`](src/graph/inmemory.rs:153).

Multiple CSV sources can be chained with repeated `.load()` calls; all edges are merged
into a single graph.
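
The per-label split that precedes the GraphBLAS calls can be sketched in plain Rust: each edge contributes one `(row, col)` coordinate to the COO arrays of its label, and edges from every loaded source land in the same maps. This is an illustration only; `CooByLabel` is a hypothetical name, the shared square dimension is an assumption, and the step of handing these arrays to GraphBLAS is deliberately not shown.

```rust
use std::collections::HashMap;

/// Hypothetical COO staging area: one (rows, cols) pair of arrays per edge label.
#[derive(Default)]
pub struct CooByLabel {
    node_ids: HashMap<String, u64>,
    per_label: HashMap<String, (Vec<u64>, Vec<u64>)>,
}

impl CooByLabel {
    /// Dense node id, assigned on first sight.
    fn node_id(&mut self, name: &str) -> u64 {
        let next = self.node_ids.len() as u64;
        *self.node_ids.entry(name.to_owned()).or_insert(next)
    }

    /// Records one edge as a coordinate in its label's COO arrays.
    pub fn push_edge(&mut self, source: &str, target: &str, label: &str) {
        let row = self.node_id(source);
        let col = self.node_id(target);
        let (rows, cols) = self.per_label.entry(label.to_owned()).or_default();
        rows.push(row);
        cols.push(col);
    }

    /// Assumed: every label's matrix is n x n over one shared node numbering.
    pub fn dimension(&self) -> u64 {
        self.node_ids.len() as u64
    }

    pub fn coo(&self, label: &str) -> Option<(&[u64], &[u64])> {
        self.per_label
            .get(label)
            .map(|(r, c)| (r.as_slice(), c.as_slice()))
    }
}

fn main() {
    let mut staging = CooByLabel::default();
    // Edges from different sources end up in the same staging maps.
    staging.push_edge("a", "b", "knows");
    staging.push_edge("b", "c", "likes");
    staging.push_edge("c", "a", "knows");
    println!("n = {}", staging.dimension()); // n = 3
    println!("{:?}", staging.coo("knows")); // Some(([0, 2], [1, 0]))
}
```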

### Format parsers

[`Csv<R>`](src/formats/csv.rs:52) is the only built-in parser. It implements
`Iterator<Item = Result<Edge, FormatError>>` and is directly pluggable into
`GraphBuilder::load()` via its `GraphSource<InMemoryBuilder>` impl.

Configuration is via [`CsvConfig`](src/formats/csv.rs:17):

| Field | Default | Description |
|---|---|---|
| `source_column` | `Index(0)` | Column for the source node (by index or name) |
| `target_column` | `Index(1)` | Column for the target node |
| `label_column` | `Index(2)` | Column for the edge label |
| `has_header` | `true` | Whether the first row is a header |
| `delimiter` | `b','` | Field delimiter byte |

[`ColumnSpec`](src/formats/csv.rs:11) is either `Index(usize)` or `Name(String)`.
Name-based lookup requires `has_header: true`.
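
Resolving a `ColumnSpec` to a concrete field index shows why name-based lookup needs a header: a name can only be matched against a header row. The sketch redeclares `ColumnSpec` with the two documented variants; the `resolve` function and its `String` errors are hypothetical and do not reflect pathrex's actual `FormatError` handling.

```rust
/// Same two variants as the documented `ColumnSpec`.
#[derive(Debug, Clone)]
pub enum ColumnSpec {
    Index(usize),
    Name(String),
}

/// Hypothetical resolver: maps a spec to a field index, given the header row
/// (`None` when the CSV has no header).
pub fn resolve(spec: &ColumnSpec, header: Option<&[&str]>) -> Result<usize, String> {
    match spec {
        // An index is used as-is, header or not.
        ColumnSpec::Index(i) => Ok(*i),
        // A name must be looked up in the header row.
        ColumnSpec::Name(name) => {
            let header = header.ok_or_else(|| format!("column name `{name}` needs a header row"))?;
            header
                .iter()
                .position(|h| *h == name.as_str())
                .ok_or_else(|| format!("no column named `{name}`"))
        }
    }
}

fn main() {
    let header = ["src", "dst", "type"];
    let label = ColumnSpec::Name("type".into());
    println!("{:?}", resolve(&label, Some(&header[..]))); // Ok(2)
    println!("{:?}", resolve(&label, None)); // Err("column name `type` needs a header row")
}
```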

### FFI layer

[`lagraph_sys`](src/lagraph_sys.rs) exposes raw C bindings for GraphBLAS and
LAGraph. Safe Rust wrappers live in the [`graph`](src/graph/mod.rs) module:

- [`LagraphGraph`](src/graph/mod.rs:48) — RAII wrapper around `LAGraph_Graph` (calls
  `LAGraph_Delete` on drop). Also provides
  [`LagraphGraph::from_coo()`](src/graph/mod.rs:85) to build directly from COO arrays.
- [`GraphblasVector`](src/graph/mod.rs:124) — RAII wrapper around `GrB_Vector`.
- [`ensure_grb_init()`](src/graph/mod.rs:39) — one-time `LAGraph_Init` via `std::sync::Once`.
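
The RAII-plus-`Once` pattern used by `LagraphGraph` and `ensure_grb_init()` can be shown without GraphBLAS by substituting a fake C resource. In this sketch `fake_init`, `fake_new`, `fake_free`, `Handle` and the counters are hypothetical stand-ins for `LAGraph_Init` / `LAGraph_New` / `LAGraph_Delete`; only the shape (free in `Drop`, initialise once via `std::sync::Once`) is meant to carry over.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

// Stand-ins for the native library: counters instead of real C calls.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);
static INIT: Once = Once::new();

fn fake_init() {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
}

fn fake_new() -> *mut u8 {
    LIVE_HANDLES.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(0u8))
}

fn fake_free(ptr: *mut u8) {
    // SAFETY: `ptr` came from `Box::into_raw` in `fake_new` and is freed exactly once.
    unsafe { drop(Box::from_raw(ptr)) };
    LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
}

/// Runs the one-time library initialisation, however often it is called.
pub fn ensure_init() {
    INIT.call_once(fake_init);
}

/// RAII owner of a raw handle, in the style of `LagraphGraph`.
pub struct Handle {
    raw: *mut u8,
}

impl Handle {
    pub fn new() -> Self {
        ensure_init();
        Handle { raw: fake_new() }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        fake_free(self.raw); // plays the role of `LAGraph_Delete`
    }
}

fn main() {
    {
        let _a = Handle::new();
        let _b = Handle::new();
        println!("live: {}", LIVE_HANDLES.load(Ordering::SeqCst)); // live: 2
    } // both handles are dropped here
    println!("live: {}", LIVE_HANDLES.load(Ordering::SeqCst)); // live: 0
    println!("init calls: {}", INIT_CALLS.load(Ordering::SeqCst)); // init calls: 1
}
```

Because `Handle` holds a raw pointer, Rust treats it as neither `Send` nor `Sync`; that is why wrappers like `LagraphGraph` need explicit `unsafe impl`s before they can be shared across threads.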

### Macros (`src/utils.rs`)

Two `#[macro_export]` macros handle FFI error mapping:

- [`grb_ok!(expr)`](src/utils.rs:138) — evaluates a GraphBLAS call inside `unsafe` and maps
  the `i32` return code to `Result<(), GraphError>`, reporting failures as
  `GraphError::GraphBlas(info)`.
- [`la_ok!(fn::path(args…))`](src/utils.rs:167) — evaluates a LAGraph call, automatically
  appending the required `*mut i8` message buffer, and maps failure to
  `GraphError::LAGraph(info, msg)`.
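
A macro in the spirit of `grb_ok!` can be written with `macro_rules!`. Assumptions: the `GraphError` enum is a trimmed, hand-written stand-in (no `thiserror`), `0` is treated as success (as `GrB_SUCCESS` is in the GraphBLAS C API), and `fake_grb_call` is a hypothetical function playing the role of an `extern "C"` call.

```rust
/// Trimmed stand-in for pathrex's `GraphError`.
#[derive(Debug, PartialEq)]
pub enum GraphError {
    GraphBlas(i32),
}

/// Evaluates an FFI-style call returning an `i32` status inside `unsafe`
/// and maps it to a `Result`. Assumes 0 means success.
macro_rules! grb_ok {
    ($call:expr) => {{
        let info: i32 = unsafe { $call };
        if info == 0 {
            Ok(())
        } else {
            Err(GraphError::GraphBlas(info))
        }
    }};
}

/// Hypothetical stand-in for an `extern "C"` GraphBLAS function.
unsafe fn fake_grb_call(status: i32) -> i32 {
    status
}

fn run() -> Result<(), GraphError> {
    grb_ok!(fake_grb_call(0))?; // success: execution continues
    grb_ok!(fake_grb_call(-3))?; // failure: returns early with the code
    Ok(())
}

fn main() {
    println!("{:?}", run()); // Err(GraphBlas(-3))
}
```

A `la_ok!`-style macro would additionally have to allocate a message buffer and pass it as the final argument; the `LAGRAPH_MSG_LEN` allowlist entry added to `build.rs` in this commit is presumably what sizes that buffer. That part is not sketched here.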

## Coding Conventions

- **Rust edition 2024**.
- Error handling via `thiserror` derive macros; two main error enums:
  [`GraphError`](src/graph/mod.rs:15) and [`FormatError`](src/formats/mod.rs:24).
- `FormatError` converts into `GraphError` via `#[from] FormatError` on the
  `GraphError::Format` variant.
- Unsafe FFI calls are confined to `lagraph_sys`, `graph/mod.rs`, and
  `graph/inmemory.rs`. All raw pointers are wrapped in RAII types that free
  resources on drop.
- `LagraphGraph` and `GraphblasVector` have `unsafe impl Send` and `unsafe impl Sync`,
  because GraphBLAS handles are thread-safe after init.
- Unit tests live in `#[cfg(test)] mod tests` blocks inside each module.
  Integration tests that need GraphBLAS live in [`tests/inmemory_tests.rs`](tests/inmemory_tests.rs).
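
The `#[from]` attribute on `GraphError::Format` makes `thiserror` generate a `From<FormatError> for GraphError` impl, which is what lets `?` turn a parser error into a graph error. The sketch spells that impl out by hand so it runs with the standard library alone; the `MissingField` variant and the `parse_row` / `load_row` functions are hypothetical simplifications.

```rust
/// Simplified stand-in for `FormatError`.
#[derive(Debug)]
pub enum FormatError {
    MissingField(usize),
}

/// Simplified stand-in for `GraphError`.
#[derive(Debug)]
pub enum GraphError {
    Format(FormatError),
}

// Roughly what `#[from] FormatError` on the `Format` variant generates.
impl From<FormatError> for GraphError {
    fn from(err: FormatError) -> Self {
        GraphError::Format(err)
    }
}

/// Hypothetical parser step: the label is expected in field 2.
fn parse_row(fields: &[&str]) -> Result<String, FormatError> {
    fields
        .get(2)
        .map(|s| s.to_string())
        .ok_or(FormatError::MissingField(2))
}

/// Hypothetical graph-level step that reuses the parser.
fn load_row(fields: &[&str]) -> Result<String, GraphError> {
    // `?` converts the `FormatError` via `GraphError::from`.
    let label = parse_row(fields)?;
    Ok(label)
}

fn main() {
    println!("{:?}", load_row(&["a", "b", "knows"])); // Ok("knows")
    println!("{:?}", load_row(&["a", "b"])); // Err(Format(MissingField(2)))
}
```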

## Testing

```bash
# Run all tests (LAGraph installed system-wide)
LD_LIBRARY_PATH=/usr/local/lib cargo test --verbose

# If LAGraph is NOT installed system-wide (only built in the submodule):
LD_LIBRARY_PATH=deps/LAGraph/build/src:deps/LAGraph/build/experimental:/usr/local/lib cargo test --verbose
```

Tests in `src/graph/mod.rs` use `CountingBuilder` / `CountOutput` / `VecSource` from
[`src/utils.rs`](src/utils.rs) — these do **not** call into GraphBLAS and run without
native libraries.

Tests in `src/formats/csv.rs` are pure Rust and need no native dependencies.

Tests in `src/graph/inmemory.rs` and [`tests/inmemory_tests.rs`](tests/inmemory_tests.rs)
call real GraphBLAS/LAGraph and require the native libraries to be present.

## CI

The GitHub Actions workflow ([`.github/workflows/ci.yml`](.github/workflows/ci.yml))
runs on every push and PR across `stable`, `beta`, and `nightly` toolchains:

1. Checks out with `submodules: recursive`.
2. Installs cmake, libclang-dev, clang.
3. Builds and installs SuiteSparse:GraphBLAS from source (`sudo make install`).
4. Builds and installs LAGraph from the submodule (`sudo make install`).
5. `cargo build --features regenerate-bindings` — rebuilds FFI bindings.
6. `LD_LIBRARY_PATH=/usr/local/lib cargo test --verbose` — runs the full test suite.

build.rs

Lines changed: 2 additions & 0 deletions
@@ -73,8 +73,10 @@ fn regenerate_bindings() {
 .allowlist_function("GrB_Vector_nvals")
 .allowlist_function("GrB_Vector_extractTuples_BOOL")
 .allowlist_function("GrB_vxm")
+.allowlist_item("LAGRAPH_MSG_LEN")
 .allowlist_type("LAGraph_Graph")
 .allowlist_type("LAGraph_Kind")
+.allowlist_function("LAGraph_CheckGraph")
 .allowlist_function("LAGraph_Init")
 .allowlist_function("LAGraph_Finalize")
 .allowlist_function("LAGraph_New")

src/formats/csv.rs

Lines changed: 6 additions & 2 deletions
@@ -5,7 +5,7 @@ use std::io::Read;
 use csv::StringRecord;
 
 use crate::formats::FormatError;
-use crate::graph::{Edge};
+use crate::graph::Edge;
 
 #[derive(Debug, Clone)]
 pub enum ColumnSpec {
@@ -130,7 +130,11 @@ impl<R: Read> Iterator for Csv<R> {
 let source = Self::get_field(&record, self.source_idx)?;
 let target = Self::get_field(&record, self.target_idx)?;
 let label = Self::get_field(&record, self.label_idx)?;
-Ok(Edge { source, target, label })
+Ok(Edge {
+    source,
+    target,
+    label,
+})
 })())
 }
 }
