126 changes: 90 additions & 36 deletions AGENTS.md
@@ -14,16 +14,19 @@ pathrex/
├── Cargo.toml # Crate manifest (edition 2024)
├── build.rs # Links LAGraph + LAGraphX; optionally regenerates FFI bindings
├── src/
│ ├── lib.rs # Public modules: formats, graph, sparql, lagraph_sys, utils
│ ├── lib.rs # Modules: formats, graph, rpq, sparql, utils, lagraph_sys
│ ├── main.rs # Binary entry point (placeholder)
│ ├── lagraph_sys.rs # FFI module — includes generated bindings
│ ├── lagraph_sys_generated.rs# Bindgen output (checked in, regenerated in CI)
│ ├── utils.rs # Internal helpers: CountingBuilder, CountOutput, VecSource,
│ │ # grb_ok! and la_ok! macros
│ ├── utils.rs # Public helpers: CountingBuilder, CountOutput, VecSource,
│ │ # grb_ok! and la_ok! macros, build_graph
│ ├── graph/
│ │ ├── mod.rs # Core traits (GraphBuilder, GraphDecomposition, GraphSource,
│ │ │ # Backend, Graph<B>), error types, RAII wrappers, GrB init
│ │ └── inmemory.rs # InMemory marker, InMemoryBuilder, InMemoryGraph
│ ├── rpq/
│ │ ├── mod.rs # RPQ evaluation trait (RpqEvaluator), RpqResult, RpqError
│ │ └── nfarpq.rs # NFA-based RPQ evaluator using LAGraph_RegularPathQuery
│ ├── sparql/
│ │ └── mod.rs # SPARQL parsing (spargebra), PathTriple extraction, parse_rpq
│ └── formats/
@@ -32,7 +35,8 @@ pathrex/
│ └── mm.rs # MatrixMarket directory loader (vertices.txt, edges.txt, *.txt)
├── tests/
│ ├── inmemory_tests.rs # Integration tests for InMemoryBuilder / InMemoryGraph
│ └── mm_tests.rs # Integration tests for MatrixMarket format
│ ├── mm_tests.rs # Integration tests for MatrixMarket format
│ └── nfarpq_tests.rs # Integration tests for NfaRpqEvaluator
├── deps/
│ └── LAGraph/ # Git submodule (SparseLinearAlgebra/LAGraph)
└── .github/workflows/ci.yml # CI: build GraphBLAS + LAGraph, cargo build & test
@@ -121,40 +125,40 @@ regenerates it with `--features regenerate-bindings`. **Do not hand-edit this fi

### Edge

[`Edge`](src/graph/mod.rs:154) is the universal currency between format parsers and graph
[`Edge`](src/graph/mod.rs:158) is the universal currency between format parsers and graph
builders: `{ source: String, target: String, label: String }`.

### GraphSource trait

[`GraphSource<B>`](src/graph/mod.rs:164) is implemented by any data source that knows how to
[`GraphSource<B>`](src/graph/mod.rs:168) is implemented by any data source that knows how to
feed itself into a specific [`GraphBuilder`]:

- [`apply_to(self, builder: B) -> Result<B, B::Error>`](src/graph/mod.rs:165) — consumes the
- [`apply_to(self, builder: B) -> Result<B, B::Error>`](src/graph/mod.rs:169) — consumes the
source and returns the populated builder.

[`Csv<R>`](src/formats/csv.rs:52) implements `GraphSource<InMemoryBuilder>` directly, so it
can be passed to [`GraphBuilder::load`].

### GraphBuilder trait

[`GraphBuilder`](src/graph/mod.rs:169) accumulates edges and produces a
[`GraphDecomposition`](src/graph/mod.rs:188):
[`GraphBuilder`](src/graph/mod.rs:173) accumulates edges and produces a
[`GraphDecomposition`](src/graph/mod.rs:193):

- [`load<S: GraphSource<Self>>(self, source: S)`](src/graph/mod.rs:179) — primary entry point;
- [`load<S: GraphSource<Self>>(self, source: S)`](src/graph/mod.rs:183) — primary entry point;
delegates to `GraphSource::apply_to`.
- [`build(self)`](src/graph/mod.rs:184) — finalise into an immutable graph.
- [`build(self)`](src/graph/mod.rs:188) — finalise into an immutable graph.

`InMemoryBuilder` also exposes lower-level helpers outside the trait:

- [`push_edge(&mut self, edge: Edge)`](src/graph/inmemory.rs:62) — ingest one edge.
- [`with_stream<I, E>(self, stream: I)`](src/graph/inmemory.rs:72) — consume an
- [`push_edge(&mut self, edge: Edge)`](src/graph/inmemory.rs:83) — ingest one edge.
- [`with_stream<I, E>(self, stream: I)`](src/graph/inmemory.rs:93) — consume an
`IntoIterator<Item = Result<Edge, E>>`.
- [`push_grb_matrix(&mut self, label, matrix: GrB_Matrix)`](src/graph/inmemory.rs:85) — accept
- [`push_grb_matrix(&mut self, label, matrix: GrB_Matrix)`](src/graph/inmemory.rs:106) — accept
a pre-built `GrB_Matrix` for a label, wrapping it in an `LAGraph_Graph` immediately.
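
The relationship between `GraphSource::apply_to` and `GraphBuilder::load` can be illustrated with a small standalone sketch. It is not the crate's code: the associated `Graph` type, `VecBuilder`, and `Triples` are hypothetical stand-ins, and the real `InMemoryBuilder` talks to GraphBLAS rather than collecting into a `Vec`.

```rust
/// Edge record with the `{ source, target, label }` shape described above.
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub source: String,
    pub target: String,
    pub label: String,
}

/// A data source that knows how to feed itself into builder `B`.
pub trait GraphSource<B: GraphBuilder> {
    fn apply_to(self, builder: B) -> Result<B, B::Error>;
}

/// Assumed shape of the builder trait; the real signatures may differ.
pub trait GraphBuilder: Sized {
    type Graph;
    type Error;

    /// Primary entry point: hands the builder to the source.
    fn load<S: GraphSource<Self>>(self, source: S) -> Result<Self, Self::Error> {
        source.apply_to(self)
    }

    /// Finalise into an immutable graph.
    fn build(self) -> Result<Self::Graph, Self::Error>;
}

/// Hypothetical toy builder that only collects edges in memory.
#[derive(Default)]
pub struct VecBuilder {
    edges: Vec<Edge>,
}

impl GraphBuilder for VecBuilder {
    type Graph = Vec<Edge>;
    type Error = String;

    fn build(self) -> Result<Vec<Edge>, String> {
        Ok(self.edges)
    }
}

/// Hypothetical source: a list of (source, target, label) triples.
pub struct Triples(pub Vec<(&'static str, &'static str, &'static str)>);

impl GraphSource<VecBuilder> for Triples {
    fn apply_to(self, mut builder: VecBuilder) -> Result<VecBuilder, String> {
        for (source, target, label) in self.0 {
            builder.edges.push(Edge {
                source: source.to_owned(),
                target: target.to_owned(),
                label: label.to_owned(),
            });
        }
        Ok(builder)
    }
}
```

Because `load` returns the builder, calls can be chained (`builder.load(a)?.load(b)?.build()`), which is how several sources end up merged into one graph.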

### Backend trait & Graph\<B\> handle

[`Backend`](src/graph/mod.rs:217) associates a marker type with a concrete builder/graph pair:
[`Backend`](src/graph/mod.rs:221) associates a marker type with a concrete builder/graph pair:

```rust
pub trait Backend {
@@ -163,28 +167,28 @@ pub trait Backend {
}
```

[`Graph<B>`](src/graph/mod.rs:229) is a zero-sized handle parameterised by a `Backend`:
[`Graph<B>`](src/graph/mod.rs:233) is a zero-sized handle parameterised by a `Backend`:

- [`Graph::<InMemory>::builder()`](src/graph/mod.rs:234) — returns a fresh `InMemoryBuilder`.
- [`Graph::<InMemory>::try_from(source)`](src/graph/mod.rs:238) — builds a graph from a single
- [`Graph::<InMemory>::builder()`](src/graph/mod.rs:238) — returns a fresh `InMemoryBuilder`.
- [`Graph::<InMemory>::try_from(source)`](src/graph/mod.rs:242) — builds a graph from a single
source in one call.

[`InMemory`](src/graph/inmemory.rs:26) is the concrete backend marker type.
[`InMemory`](src/graph/inmemory.rs:27) is the concrete backend marker type.

### GraphDecomposition trait

[`GraphDecomposition`](src/graph/mod.rs:188) is the read-only query interface:
[`GraphDecomposition`](src/graph/mod.rs:193) is the read-only query interface:

- [`get_graph(label)`](src/graph/mod.rs:192) — returns `Arc<LagraphGraph>` for a given edge label.
- [`get_node_id(string_id)`](src/graph/mod.rs:195) / [`get_node_name(mapped_id)`](src/graph/mod.rs:198) — bidirectional string ↔ integer dictionary.
- [`num_nodes()`](src/graph/mod.rs:199) — total unique nodes.
- [`get_graph(label)`](src/graph/mod.rs:197) — returns `Arc<LagraphGraph>` for a given edge label.
- [`get_node_id(string_id)`](src/graph/mod.rs:200) / [`get_node_name(mapped_id)`](src/graph/mod.rs:203) — bidirectional string ↔ integer dictionary.
- [`num_nodes()`](src/graph/mod.rs:204) — total unique nodes.
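
The bidirectional string ↔ integer dictionary behind `get_node_id` / `get_node_name` can be pictured with the following sketch. `NodeDictionary` and `intern` are hypothetical names chosen for illustration; the crate's actual storage and return types are not shown in this document.

```rust
use std::collections::HashMap;

/// Hypothetical two-way mapping between node names and dense integer ids.
#[derive(Default)]
pub struct NodeDictionary {
    ids: HashMap<String, usize>,
    names: Vec<String>,
}

impl NodeDictionary {
    /// Returns the existing id for `name`, or assigns the next free one.
    pub fn intern(&mut self, name: &str) -> usize {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len();
        self.ids.insert(name.to_owned(), id);
        self.names.push(name.to_owned());
        id
    }

    /// String -> integer direction.
    pub fn get_node_id(&self, name: &str) -> Option<usize> {
        self.ids.get(name).copied()
    }

    /// Integer -> string direction.
    pub fn get_node_name(&self, id: usize) -> Option<&str> {
        self.names.get(id).map(String::as_str)
    }

    /// Total number of unique nodes seen so far.
    pub fn num_nodes(&self) -> usize {
        self.names.len()
    }
}
```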

### InMemoryBuilder / InMemoryGraph

[`InMemoryBuilder`](src/graph/inmemory.rs:35) is the primary `GraphBuilder` implementation.
[`InMemoryBuilder`](src/graph/inmemory.rs:36) is the primary `GraphBuilder` implementation.
It collects edges in RAM, then [`build()`](src/graph/inmemory.rs:131) calls
GraphBLAS to create one `GrB_Matrix` per label via COO format, wraps each in an
`LAGraph_Graph`, and returns an [`InMemoryGraph`](src/graph/inmemory.rs:173).
`LAGraph_Graph`, and returns an [`InMemoryGraph`](src/graph/inmemory.rs:174).

Multiple CSV sources can be chained with repeated `.load()` calls; all edges are merged
into a single graph.
@@ -196,7 +200,7 @@ which is used by the MatrixMarket loader.

### Format parsers

Two built-in parsers are available:
CSV and MatrixMarket edge loaders are available:

#### CSV format

@@ -220,7 +224,7 @@ Name-based lookup requires `has_header: true`.

[`MatrixMarket`](src/formats/mm.rs:159) loads an edge-labeled graph from a directory with:

- `vertices.txt` — one line per node: `<node_name> <1-based-index>` on disk; [`get_node_id`](src/graph/mod.rs:199) returns the matching **0-based** matrix index
- `vertices.txt` — one line per node: `<node_name> <1-based-index>` on disk; [`get_node_id`](src/graph/mod.rs:200) returns the matching **0-based** matrix index
- `edges.txt` — one line per label: `<label_name> <1-based-index>` (selects `n.txt`)
- `<n>.txt` — MatrixMarket adjacency matrix for label with index `n`
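
The 1-based on-disk index versus the 0-based matrix index is easy to get wrong, so here is a minimal sketch of the conversion. `parse_vertex_line` is a hypothetical helper, and the whitespace-separated layout is an assumption read off the `<node_name> <1-based-index>` description above; the real loader may parse differently.

```rust
/// Parses one `vertices.txt` line of the form `<node_name> <1-based-index>`
/// and returns the name together with the 0-based matrix index.
pub fn parse_vertex_line(line: &str) -> Result<(String, usize), String> {
    let mut parts = line.split_whitespace();
    let name = parts
        .next()
        .ok_or_else(|| format!("missing node name in {line:?}"))?;
    let raw = parts
        .next()
        .ok_or_else(|| format!("missing index in {line:?}"))?;
    let one_based: usize = raw
        .parse()
        .map_err(|e| format!("bad index {raw:?}: {e}"))?;
    // Index 0 cannot occur in a 1-based file, so reject it instead of wrapping.
    let zero_based = one_based
        .checked_sub(1)
        .ok_or_else(|| "index 0 is not valid in a 1-based file".to_string())?;
    Ok((name.to_owned(), zero_based))
}
```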

@@ -272,6 +276,43 @@ The module also handles spargebra's desugaring of sequence paths
(`?x <a>/<b>/<c> ?y`) from a chain of BGP triples back into a single
[`PropertyPathExpression::Sequence`].

### RPQ evaluation (`src/rpq/`)

The [`rpq`](src/rpq/mod.rs) module provides an abstraction for evaluating
Regular Path Queries (RPQs) over edge-labeled graphs using GraphBLAS/LAGraph.

Key public items:

- [`RpqEvaluator`](src/rpq/mod.rs:47) — trait with a single method
[`evaluate(subject, path, object, graph)`](src/rpq/mod.rs:48) that takes
SPARQL [`TermPattern`] endpoints, a [`PropertyPathExpression`] path, and a
[`GraphDecomposition`], returning an [`RpqResult`](src/rpq/mod.rs:42).
- [`RpqResult`](src/rpq/mod.rs:42) — wraps a [`GraphblasVector`] of reachable
vertices.
- [`RpqError`](src/rpq/mod.rs:21) — error enum covering parse errors, extraction
errors, unsupported paths, missing labels/vertices, and GraphBLAS failures.

#### `NfaRpqEvaluator` (`src/rpq/nfarpq.rs`)

[`NfaRpqEvaluator`](src/rpq/nfarpq.rs:265) implements [`RpqEvaluator`] by:

1. Converting a [`PropertyPathExpression`] into an [`Nfa`](src/rpq/nfarpq.rs:27)
via Thompson's construction ([`Nfa::from_property_path()`](src/rpq/nfarpq.rs:35)).
2. Eliminating ε-transitions via epsilon closure
([`NfaBuilder::epsilon_closure()`](src/rpq/nfarpq.rs:198)).
3. Building one `LAGraph_Graph` per NFA label transition
([`Nfa::build_lagraph_matrices()`](src/rpq/nfarpq.rs:43)).
4. Calling [`LAGraph_RegularPathQuery`] with the NFA matrices, data-graph
matrices, start/final states, and source vertices.
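
Step 2 above (ε-elimination) can be sketched independently of GraphBLAS. The `EpsNfa` type below is a hypothetical, simplified representation and is not the layout used in `src/rpq/nfarpq.rs`. It computes the ε-closure of each state and then copies labeled transitions and final-state status across that closure.

```rust
use std::collections::BTreeSet;

/// Hypothetical NFA with explicit epsilon edges (illustration only).
pub struct EpsNfa {
    pub num_states: usize,
    pub start: usize,
    pub finals: BTreeSet<usize>,
    /// Epsilon edges as (from, to).
    pub eps: Vec<(usize, usize)>,
    /// Labeled edges as (from, label, to).
    pub labeled: Vec<(usize, String, usize)>,
}

impl EpsNfa {
    /// All states reachable from `state` using epsilon edges only.
    pub fn epsilon_closure(&self, state: usize) -> BTreeSet<usize> {
        let mut seen = BTreeSet::from([state]);
        let mut stack = vec![state];
        while let Some(current) = stack.pop() {
            for &(from, to) in &self.eps {
                if from == current && seen.insert(to) {
                    stack.push(to);
                }
            }
        }
        seen
    }

    /// Returns epsilon-free transitions and the adjusted set of final states.
    pub fn eliminate_epsilon(&self) -> (BTreeSet<(usize, String, usize)>, BTreeSet<usize>) {
        let mut transitions = BTreeSet::new();
        let mut finals = BTreeSet::new();
        for state in 0..self.num_states {
            let closure = self.epsilon_closure(state);
            // A state becomes final if it can reach a final state for free.
            if closure.iter().any(|q| self.finals.contains(q)) {
                finals.insert(state);
            }
            // Every labeled edge leaving the closure is re-rooted at `state`.
            for (from, label, to) in &self.labeled {
                if closure.contains(from) {
                    transitions.insert((state, label.clone(), *to));
                }
            }
        }
        (transitions, finals)
    }
}
```

After this step the transitions can be grouped by label, which matches the "one matrix per NFA label" shape that step 3 describes.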

Supported path operators: `NamedNode`, `Sequence`, `Alternative`,
`ZeroOrMore`, `OneOrMore`, `ZeroOrOne`. `Reverse` and `NegatedPropertySet`
return [`RpqError::UnsupportedPath`].

Subject/object resolution: a [`TermPattern::Variable`] means "all vertices";
a [`TermPattern::NamedNode`] resolves to a single vertex via
[`GraphDecomposition::get_node_id()`](src/graph/mod.rs:200).
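
As a rough illustration of the resolution rule, the sketch below maps an endpoint to a list of vertex indices. `Endpoint` and `resolve_endpoint` are hypothetical simplifications: the real code works on spargebra's `TermPattern` and returns `RpqError`, whereas this version uses a plain `String` error.

```rust
/// Simplified stand-in for a SPARQL term pattern (not spargebra's type).
pub enum Endpoint<'a> {
    Variable,
    Named(&'a str),
}

/// A variable selects every vertex; a named node selects exactly one,
/// looked up through the supplied `get_node_id` function.
pub fn resolve_endpoint(
    endpoint: &Endpoint<'_>,
    num_nodes: usize,
    get_node_id: impl Fn(&str) -> Option<usize>,
) -> Result<Vec<usize>, String> {
    match endpoint {
        Endpoint::Variable => Ok((0..num_nodes).collect()),
        Endpoint::Named(name) => get_node_id(*name)
            .map(|id| vec![id])
            .ok_or_else(|| format!("vertex not found in graph: '{name}'")),
    }
}
```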

### FFI layer

[`lagraph_sys`](src/lagraph_sys.rs) exposes raw C bindings for GraphBLAS and
@@ -280,10 +321,11 @@ LAGraph. Safe Rust wrappers live in [`graph::mod`](src/graph/mod.rs):
- [`LagraphGraph`](src/graph/mod.rs:48) — RAII wrapper around `LAGraph_Graph` (calls
`LAGraph_Delete` on drop). Also provides
[`LagraphGraph::from_coo()`](src/graph/mod.rs:85) to build directly from COO arrays.
- [`GraphblasVector`](src/graph/mod.rs:124) — RAII wrapper around `GrB_Vector`.
- [`GraphblasVector`](src/graph/mod.rs:128) — RAII wrapper around `GrB_Vector`
(derives `Debug`).
- [`ensure_grb_init()`](src/graph/mod.rs:39) — one-time `LAGraph_Init` via `std::sync::Once`.
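
The two patterns named in this list, an RAII wrapper that releases a native handle on drop and one-time initialisation through `std::sync::Once`, can be sketched without any FFI. Everything below is a hypothetical stand-in: the counters simulate native calls such as `LAGraph_Init` and `LAGraph_Delete`, and `Handle` is not a crate type.

```rust
use std::sync::Once;
use std::sync::atomic::{AtomicUsize, Ordering};

static INIT: Once = Once::new();
/// Counts how often the simulated native initialiser actually ran.
pub static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
/// Counts simulated native handles that have not been released yet.
pub static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);

/// Runs the simulated one-time initialisation at most once per process.
pub fn ensure_init() {
    INIT.call_once(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    });
}

/// RAII wrapper: acquiring bumps the live counter, dropping releases it.
pub struct Handle {
    raw: usize,
}

impl Handle {
    pub fn new(raw: usize) -> Self {
        ensure_init();
        LIVE_HANDLES.fetch_add(1, Ordering::SeqCst);
        Handle { raw }
    }

    /// Exposes the underlying (simulated) raw handle value.
    pub fn raw(&self) -> usize {
        self.raw
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // A real wrapper would call the native free function here.
        LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
    }
}
```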

### Macros (`src/utils.rs`)
### Macros & helpers (`src/utils.rs`)

Two `#[macro_export]` macros handle FFI error mapping:

@@ -293,20 +335,28 @@ Two `#[macro_export]` macros handle FFI error mapping:
appending the required `*mut i8` message buffer, and maps failure to
`GraphError::LAGraph(info, msg)`.

A convenience function is also provided:

- [`build_graph(edges)`](src/utils.rs:184) — builds an `InMemoryGraph` from a
slice of `(&str, &str, &str)` triples (source, target, label). Used by
integration tests.

## Coding Conventions

- **Rust edition 2024**.
- Error handling via `thiserror` derive macros; two main error enums:
[`GraphError`](src/graph/mod.rs:15) and [`FormatError`](src/formats/mod.rs:24).
- Error handling via `thiserror` derive macros; three main error enums:
[`GraphError`](src/graph/mod.rs:15), [`FormatError`](src/formats/mod.rs:24),
and [`RpqError`](src/rpq/mod.rs:21).
- `FormatError` converts into `GraphError` via `#[from] FormatError` on the
`GraphError::Format` variant.
- Unsafe FFI calls are confined to `lagraph_sys`, `graph/mod.rs`, and
`graph/inmemory.rs`. All raw pointers are wrapped in RAII types that free
resources on drop.
- Unsafe FFI calls are confined to `lagraph_sys`, `graph/mod.rs`,
`graph/inmemory.rs`, and `rpq/nfarpq.rs`. All raw pointers are wrapped in
RAII types that free resources on drop.
- `unsafe impl Send + Sync` is provided for `LagraphGraph` and
`GraphblasVector` because GraphBLAS handles are thread-safe after init.
- Unit tests live in `#[cfg(test)] mod tests` blocks inside each module.
Integration tests that need GraphBLAS live in [`tests/inmemory_tests.rs`](tests/inmemory_tests.rs).
Integration tests that need GraphBLAS live in [`tests/inmemory_tests.rs`](tests/inmemory_tests.rs),
[`tests/mm_tests.rs`](tests/mm_tests.rs), and [`tests/nfarpq_tests.rs`](tests/nfarpq_tests.rs).
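
To make the `FormatError` → `GraphError` conversion concrete, here is a std-only sketch of what a `#[from]` attribute amounts to: a `From` impl plus an error `source()`. The variant `MissingColumn` and the helpers `label_column` / `load_header` are hypothetical; the crate derives these impls with `thiserror` and its real variants are not listed here.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub enum FormatError {
    /// Hypothetical variant used only for this illustration.
    MissingColumn(String),
}

impl fmt::Display for FormatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FormatError::MissingColumn(name) => write!(f, "missing column: {name}"),
        }
    }
}

impl Error for FormatError {}

#[derive(Debug)]
pub enum GraphError {
    Format(FormatError),
}

impl fmt::Display for GraphError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GraphError::Format(inner) => write!(f, "format error: {inner}"),
        }
    }
}

impl Error for GraphError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            GraphError::Format(inner) => Some(inner),
        }
    }
}

/// Hand-written equivalent of `#[from] FormatError` on the `Format` variant.
impl From<FormatError> for GraphError {
    fn from(inner: FormatError) -> Self {
        GraphError::Format(inner)
    }
}

/// Format-level helper that fails with a `FormatError`.
pub fn label_column(header: &str) -> Result<usize, FormatError> {
    header
        .split(',')
        .position(|column| column.trim() == "label")
        .ok_or_else(|| FormatError::MissingColumn("label".to_owned()))
}

/// Graph-level caller: `?` converts `FormatError` into `GraphError`.
pub fn load_header(header: &str) -> Result<usize, GraphError> {
    let index = label_column(header)?;
    Ok(index)
}
```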

## Testing

@@ -326,7 +376,11 @@ Tests in `src/formats/csv.rs` are pure Rust and need no native dependencies.

Tests in `src/sparql/mod.rs` are pure Rust and need no native dependencies.

Tests in `src/graph/inmemory.rs` and [`tests/inmemory_tests.rs`](tests/inmemory_tests.rs)
Tests in `src/rpq/nfarpq.rs` (NFA construction unit tests) are pure Rust and need no
native dependencies.

Tests in `src/graph/inmemory.rs`, [`tests/inmemory_tests.rs`](tests/inmemory_tests.rs),
[`tests/mm_tests.rs`](tests/mm_tests.rs), and [`tests/nfarpq_tests.rs`](tests/nfarpq_tests.rs)
call real GraphBLAS/LAGraph and require the native libraries to be present.

## CI
1 change: 1 addition & 0 deletions build.rs
@@ -83,6 +83,7 @@ fn regenerate_bindings() {
.allowlist_function("LAGraph_Delete")
.allowlist_function("LAGraph_Cached_AT")
.allowlist_function("LAGraph_MMRead")
.allowlist_function("LAGraph_RegularPathQuery")
.default_enum_style(bindgen::EnumVariation::Rust {
non_exhaustive: false,
})
1 change: 1 addition & 0 deletions src/graph/mod.rs
@@ -125,6 +125,7 @@ impl Drop for LagraphGraph {
unsafe impl Send for LagraphGraph {}
unsafe impl Sync for LagraphGraph {}

#[derive(Debug)]
pub struct GraphblasVector {
pub inner: GrB_Vector,
}
34 changes: 34 additions & 0 deletions src/lagraph_sys_generated.rs
@@ -261,3 +261,37 @@ unsafe extern "C" {
msg: *mut ::std::os::raw::c_char,
) -> ::std::os::raw::c_int;
}
unsafe extern "C" {
    pub fn LAGraph_RegularPathQuery(
        reachable: *mut GrB_Vector,
        R: *mut LAGraph_Graph,
        nl: usize,
        QS: *const GrB_Index,
        nqs: usize,
        QF: *const GrB_Index,
        nqf: usize,
        G: *mut LAGraph_Graph,
        S: *const GrB_Index,
        ns: usize,
        msg: *mut ::std::os::raw::c_char,
    ) -> ::std::os::raw::c_int;
}
#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum RPQMatrixOp {
    RPQ_MATRIX_OP_LABEL = 0,
    RPQ_MATRIX_OP_LOR = 1,
    RPQ_MATRIX_OP_CONCAT = 2,
    RPQ_MATRIX_OP_KLEENE = 3,
    RPQ_MATRIX_OP_KLEENE_L = 4,
    RPQ_MATRIX_OP_KLEENE_R = 5,
}
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct RPQMatrixPlan {
    pub op: RPQMatrixOp,
    pub lhs: *mut RPQMatrixPlan,
    pub rhs: *mut RPQMatrixPlan,
    pub mat: GrB_Matrix,
    pub res_mat: GrB_Matrix,
}
3 changes: 2 additions & 1 deletion src/lib.rs
@@ -1,7 +1,8 @@
pub mod formats;
pub mod graph;
pub mod rpq;
pub mod sparql;
#[allow(unused_unsafe, dead_code)]
pub(crate) mod utils;
pub mod utils;

pub mod lagraph_sys;
54 changes: 54 additions & 0 deletions src/rpq/mod.rs
@@ -0,0 +1,54 @@
//! Regular Path Query (RPQ) evaluation over edge-labeled graphs.
//! ```rust,ignore
//! use pathrex::sparql::parse_rpq;
//! use pathrex::rpq::{RpqEvaluator, nfarpq::NfaRpqEvaluator};
//!
//! let triple = parse_rpq("SELECT ?x ?y WHERE { ?x <knows>/<likes>* ?y . }")?;
//! let result = NfaRpqEvaluator.evaluate(&triple.subject, &triple.path, &triple.object, &graph)?;
//! ```

pub mod nfarpq;

use crate::graph::GraphDecomposition;
use crate::graph::GraphblasVector;
use crate::sparql::ExtractError;
use spargebra::SparqlSyntaxError;
use spargebra::algebra::PropertyPathExpression;
use spargebra::term::TermPattern;
use thiserror::Error;

#[derive(Debug, Error)]
pub enum RpqError {
    #[error("SPARQL syntax error: {0}")]
    Parse(#[from] SparqlSyntaxError),

    #[error("query extraction error: {0}")]
    Extract(#[from] ExtractError),

    #[error("unsupported path expression: {0}")]
    UnsupportedPath(String),

    #[error("label not found in graph: '{0}'")]
    LabelNotFound(String),

    #[error("vertex not found in graph: '{0}'")]
    VertexNotFound(String),

    #[error("GraphBLAS/LAGraph error: {0}")]
    GraphBlas(String),
}

#[derive(Debug)]
pub struct RpqResult {
    pub reachable: GraphblasVector,
}

pub trait RpqEvaluator {
    fn evaluate<G: GraphDecomposition>(
        &self,
        subject: &TermPattern,
        path: &PropertyPathExpression,
        object: &TermPattern,
        graph: &G,
    ) -> Result<RpqResult, RpqError>;
}