Pathrex is a Rust library and CLI tool for benchmarking queries on edge-labeled graphs constrained by regular languages and context-free languages. It uses SuiteSparse:GraphBLAS (via LAGraph) for sparse Boolean matrix operations and decomposes a graph by edge label into one Boolean adjacency matrix per label.
pathrex/
├── Cargo.toml # Crate manifest (edition 2024)
├── build.rs # Links LAGraph + LAGraphX; optionally regenerates FFI bindings
├── src/
│ ├── lib.rs # Modules: eval, formats, graph, rpq, sparql, utils, lagraph_sys
│ ├── main.rs # Binary entry point (placeholder)
│ ├── lagraph_sys.rs # FFI module — includes generated bindings
│ ├── lagraph_sys_generated.rs# Bindgen output (checked in, regenerated in CI)
│ ├── utils.rs # Public helpers: CountingBuilder, CountOutput, VecSource,
│ │ # grb_ok! and la_ok! macros, build_graph
│ ├── graph/
│ │ ├── mod.rs # Core traits (GraphBuilder, GraphDecomposition, GraphSource,
│ │ │ # Backend, Graph<B>), error types, RAII wrappers, GrB init
│ │ └── inmemory.rs # InMemory marker, InMemoryBuilder, InMemoryGraph
│ ├── eval/
│ │ └── mod.rs # Evaluator, PreparedEvaluator, ResultCount traits
│ ├── rpq/
│ │ ├── mod.rs # RPQ query types, RpqError, RPQ marker subtraits
│ │ ├── nfarpq.rs # NfaRpqEvaluator (LAGraph_RegularPathQuery)
│ │ └── rpqmatrix.rs # Matrix-plan RPQ evaluator
│ ├── sparql/
│ │ └── mod.rs # parse_rpq / extract_rpq → RpqQuery (spargebra)
│ └── formats/
│ ├── mod.rs # FormatError enum, re-exports
│ ├── csv.rs # Csv<R> — CSV → Edge iterator (CsvConfig, ColumnSpec)
│ ├── mm.rs # MatrixMarket directory loader (vertices.txt, edges.txt, *.txt)
│ └── rdf.rs # Rdf — unified RDF parser (N-Triples, Turtle) → Edge iterator
├── tests/
│ ├── inmemory_tests.rs # Integration tests for InMemoryBuilder / InMemoryGraph
│ ├── mm_tests.rs # Integration tests for MatrixMarket format
│ ├── nfarpq_tests.rs # Integration tests for NfaRpqEvaluator
│ └── rpqmatrix_tests.rs # Integration tests for matrix-plan RPQ evaluator
├── deps/
│ └── LAGraph/ # Git submodule (SparseLinearAlgebra/LAGraph)
└── .github/workflows/ci.yml # CI: build GraphBLAS + LAGraph, cargo build & test
| Dependency | Purpose |
|---|---|
| SuiteSparse:GraphBLAS | Sparse matrix engine (libgraphblas) |
| LAGraph | Graph algorithm library on top of GraphBLAS (liblagraph) |
| cmake | Building LAGraph from source |
| libclang-dev / clang | Required by bindgen when regenerate-bindings feature is active |
# Ensure submodules are present
git submodule update --init --recursive
# Build and install SuiteSparse:GraphBLAS system-wide
git clone --depth 1 https://github.com/DrTimothyAldenDavis/GraphBLAS.git
cd GraphBLAS && make compact && sudo make install && cd ..
# Build LAGraph inside the submodule (no system-wide install required)
cd deps/LAGraph && make && cd ../..
# Build pathrex
cargo build
# Run tests
LD_LIBRARY_PATH=deps/LAGraph/build/src:deps/LAGraph/build/experimental:/usr/local/lib cargo testbuild.rs performs two jobs:
-
Native linking. It emits six Cargo directives:
cargo:rustc-link-lib=dylib=graphblas— dynamically linkslibgraphblas.cargo:rustc-link-search=native=/usr/local/lib— adds the system GraphBLAS install path to the native library search path.cargo:rustc-link-lib=dylib=lagraph— dynamically linksliblagraph.cargo:rustc-link-search=native=deps/LAGraph/build/src— adds the submodule's core build output to the native library search path.cargo:rustc-link-lib=dylib=lagraphx— dynamically linksliblagraphx(experimental algorithms).cargo:rustc-link-search=native=deps/LAGraph/build/experimental— adds the experimental build output to the native library search path.
LAGraph does not need to be installed system-wide; building the submodule in
deps/LAGraph/is sufficient for compilation and linking. SuiteSparse:GraphBLAS must be installed system-wide (sudo make install).At runtime the OS dynamic linker (
ld.so) does not use Cargo's link search paths — it only consultsLD_LIBRARY_PATH,rpath, and the system library cache. SetLD_LIBRARY_PATH=/usr/local/libafter a system-wide LAGraph install, or include the submodule build paths if not installing system-wide. -
Optional FFI binding regeneration (feature
regenerate-bindings). When the feature is active,regenerate_bindings()runsbindgenagainstdeps/LAGraph/include/LAGraph.handdeps/LAGraph/include/LAGraphX.h(always from the submodule — no system path search), plusGraphBLAS.h(searched in/usr/local/include/suitesparseand/usr/include/suitesparse). The generated Rust file is written tosrc/lagraph_sys_generated.rs. Only a curated allowlist of GraphBLAS/LAGraph types and functions is exposed (see theallowlist_*calls inbuild.rs).
| Feature | Effect |
|---|---|
regenerate-bindings |
Runs bindgen at build time to regenerate src/lagraph_sys_generated.rs from LAGraph.h, LAGraphX.h (both from deps/LAGraph/include) and GraphBLAS.h. Without this feature the checked-in bindings are used as-is. |
The file src/lagraph_sys_generated.rs is checked into version control. CI
regenerates it with --features regenerate-bindings. Do not hand-edit this file.
Edge is the universal currency between format parsers and graph
builders: { source: String, target: String, label: String }.
GraphSource<B> is implemented by any data source that knows how to
feed itself into a specific [GraphBuilder]:
apply_to(self, builder: B) -> Result<B, B::Error>— consumes the source and returns the populated builder.
Csv<R>, MatrixMarket, and Rdf
implement GraphSource<InMemoryBuilder> (see src/graph/inmemory.rs), so they
can be passed to [GraphBuilder::load] and [Graph::try_from].
GraphBuilder accumulates edges and produces a
GraphDecomposition:
load<S: GraphSource<Self>>(self, source: S)— primary entry point; delegates toGraphSource::apply_to.build(self)— finalise into an immutable graph.
InMemoryBuilder also exposes lower-level helpers outside the trait:
push_edge(&mut self, edge: Edge)— ingest one edge.with_stream<I, E>(self, stream: I)— consume anIntoIterator<Item = Result<Edge, E>>.push_grb_matrix(&mut self, label, matrix: GrB_Matrix)— accept a pre-builtGrB_Matrixfor a label, wrapping it in anLAGraph_Graphimmediately.
Backend associates a marker type with a concrete builder/graph pair:
pub trait Backend {
type Graph: GraphDecomposition;
type Builder: GraphBuilder<Graph = Self::Graph>;
}Graph<B> is a zero-sized handle parameterised by a Backend:
Graph::<InMemory>::builder()— returns a freshInMemoryBuilder.Graph::<InMemory>::try_from(source)— builds a graph from a single source in one call.
InMemory is the concrete backend marker type.
GraphDecomposition is the read-only query interface:
get_graph(label)— returnsArc<LagraphGraph>for a given edge label.get_node_id(string_id)/get_node_name(mapped_id)— bidirectional string ↔ integer dictionary.num_nodes()— total unique nodes.
src/eval/mod.rs defines query-language-agnostic evaluator traits:
Evaluatoruses associated types forQuery,Result,Error, andPrepared. The graph backend stays a method-level generic (G: GraphDecomposition) so one evaluator type can run against any graph backend selected at the call site.PreparedEvaluatorrepresents prepared(query, graph)state that can be executed repeatedly, which is used by benchmark timing loops.ResultCountis separate fromEvaluator::Result; only CLI runners that need counts require this bound, leaving room for future evaluators with richer result types.
InMemoryBuilder is the primary GraphBuilder implementation.
It collects edges in RAM, then build() calls
GraphBLAS to create one GrB_Matrix per label via COO format, wraps each in an
LAGraph_Graph, and returns an InMemoryGraph.
Multiple CSV sources can be chained with repeated .load() calls; all edges are merged
into a single graph.
Node ID representation: Internally, InMemoryBuilder uses HashMap<usize, String> for
id_to_node (changed from Vec<String> to support sparse/pre-assigned IDs from MatrixMarket).
The set_node_map() method allows bulk-installing a node mapping,
which is used by the MatrixMarket loader.
Three built-in parsers are available, each yielding
Iterator<Item = Result<Edge, FormatError>> and pluggable into
GraphBuilder::load() via GraphSource<InMemoryBuilder> (see src/graph/inmemory.rs).
Csv<R> parses delimiter-separated edge files.
Configuration is via CsvConfig:
| Field | Default | Description |
|---|---|---|
source_column |
Index(0) |
Column for the source node (by index or name) |
target_column |
Index(1) |
Column for the target node |
label_column |
Index(2) |
Column for the edge label |
has_header |
true |
Whether the first row is a header |
delimiter |
b',' |
Field delimiter byte |
ColumnSpec is either Index(usize) or Name(String).
Name-based lookup requires has_header: true.
MatrixMarket loads an edge-labeled graph from a directory with:
vertices.txt— one line per node:<node_name> <1-based-index>on disk;get_node_idreturns the matching 0-based matrix indexedges.txt— one line per label:<label_name> <1-based-index>(selectsn.txt)<n>.txt— MatrixMarket adjacency matrix for label with indexn
Names in mapping files may be written with SPARQL-style angle brackets (e.g. <Article1>).
parse_index_map strips a single pair of surrounding </> so
dictionary keys match short labels (Article1), aligning with IRIs after
RpqQuery::strip_base on SPARQL-derived queries.
The loader uses LAGraph_MMRead to parse each .txt file into a
GrB_Matrix, then wraps it in an LAGraph_Graph. Vertex indices from vertices.txt are
converted to 0-based and installed via InMemoryBuilder::set_node_map().
Helper functions:
load_mm_file(path)— reads a single MatrixMarket file into aGrB_Matrix.parse_index_map(path)— parses<name> <index>lines; indices must be >= 1 and unique within the file.
MatrixMarket implements GraphSource<InMemoryBuilder> in src/graph/inmemory.rs (see the impl at line 215): vertices.txt maps are converted from 1-based file indices to 0-based matrix ids before set_node_map; edges.txt indices are unchanged for n.txt lookup.
Rdf is a unified parser for RDF formats using oxttl and oxrdf.
It supports both N-Triples (.nt) and Turtle (.ttl) formats via the RdfFormat enum.
Each triple (subject, predicate, object) becomes an Edge where:
source— subject IRI or blank-node ID (_:label).target— object IRI or blank-node ID; triples whose object is an RDF literal yieldErr(FormatError::LiteralAsNode)(callers may filter these out).label— full predicate IRI string (including fragment#…when present).
Constructor:
Rdf::from_path(path)— auto-detects format from file extension (.nt→ N-Triples,.ttl→ Turtle). Parses in parallel using memory-mapping and rayon.
Format detection via RdfFormat::from_path(path):
| Extension | Format |
|---|---|
.nt, .ntriples |
RdfFormat::NTriples |
.ttl, .turtle |
RdfFormat::Turtle |
Example usage:
use pathrex::formats::Rdf;
use pathrex::graph::{Graph, InMemory};
// Auto-detect from extension
let graph = Graph::<InMemory>::try_from(
Rdf::from_path("data.ttl")?
)?;The sparql module uses the spargebra
crate to parse SPARQL 1.1 query strings and build a pathrex-native RpqQuery
for RPQ evaluators.
Supported query form: SELECT queries with exactly one triple or property
path pattern in the WHERE clause. Relative IRIs such as <knows> require a
BASE declaration (or PREFIX / full IRIs). Example:
BASE <http://example.org/>
SELECT ?x ?y WHERE { ?x <knows>/<likes>* ?y . }Key public items:
parse_rpq(sparql)— parses a SPARQL string withSparqlParserand returns anRpqQuery.extract_rpq(query)— validates a parsed [spargebra::Query] is aSELECTwith a single path pattern and returns anRpqQuery. Use this when you construct a customSparqlParser(e.g. with prefix declarations) and callparse_queryyourself.ExtractError— error enum for extraction failures (NotSelect,NotSinglePath,UnsupportedSubject,UnsupportedObject,VariablePredicate). Converts toRpqError::Extractvia#[from].
Call RpqQuery::strip_base when graph edge labels are short names
and the parsed query contains full IRIs sharing a common prefix.
The module handles spargebra's desugaring of sequence paths (?x <a>/<b>/<c> ?y)
from a chain of BGP triples back into a single path expression.
The rpq module provides an abstraction for evaluating
Regular Path Queries (RPQs) over edge-labeled graphs using GraphBLAS/LAGraph.
Key public items:
Endpoint—Variable(String)orNamed(String)(IRI string).PathExpr—Label,Sequence,Alternative,ZeroOrMore,OneOrMore,ZeroOrOne.RpqQuery—{ subject, path, object }using the types above;strip_base(&mut self, base)removes a shared IRI prefix from named endpoints and labels.RpqEvaluator— marker subtrait overEvaluator<Query = RpqQuery, Error = RpqError>, preserving the RPQ-facing trait name while the generic evaluator hierarchy lives insrc/eval/.PreparedRpq— marker subtrait overPreparedEvaluator<Error = RpqError>.RpqError— unified error type for RPQ parsing and evaluation:Parse(SPARQL syntax),Extract(query extraction),UnsupportedPath,VertexNotFound, andGraph(wrapsGraphErrorfor label-not-found and GraphBLAS/LAGraph failures).
NfaRpqResult wraps a [GraphblasVector] of reachable target
vertices. When the subject is a variable, every vertex is used as a source and
LAGraph_RegularPathQuery returns the union of targets — individual (source, target)
pairs are not preserved.
RpqMatrixEvaluator compiles [PathExpr] into a Boolean matrix plan
over label adjacency matrices and runs [LAGraph_RPQMatrix]. It returns
RpqMatrixResult: the path-relation nnz plus a
[GraphblasMatrix] duplicate of the result matrix (full reachability relation for the path).
Subject/object do not filter the matrix; a named subject is only validated to exist.
Bound objects are not supported yet ([RpqError::UnsupportedPath]).
NTriples<R> parses W3C N-Triples
RDF files using oxttl and oxrdf. Each triple (subject, predicate, object) becomes an
Edge where:
source— subject IRI or blank-node ID (_:label).target— object IRI or blank-node ID; triples whose object is an RDF literal yieldErr(FormatError::LiteralAsNode)(callers may filter these out).label— full predicate IRI string (including fragment#…when present).
Constructor:
NTriples::new(reader)— parses the stream; each predicate IRI is copied verbatim to the edge label.
The rpq module provides an abstraction for evaluating
Regular Path Queries (RPQs) over edge-labeled graphs using GraphBLAS/LAGraph.
Key public items:
Endpoint—Variable(String)orNamed(String)(IRI string).PathExpr—Label,Sequence,Alternative,ZeroOrMore,OneOrMore,ZeroOrOne.RpqQuery—{ subject, path, object }using the types above;strip_base(&mut self, base)removes a shared IRI prefix from named endpoints and labels.RpqEvaluator— marker subtrait overEvaluator<Query = RpqQuery, Error = RpqError>, preserving the RPQ-facing trait name while the generic evaluator hierarchy lives insrc/eval/.PreparedRpq— marker subtrait overPreparedEvaluator<Error = RpqError>.RpqError— unified error type for RPQ parsing and evaluation:Parse(SPARQL syntax),Extract(query extraction),UnsupportedPath,VertexNotFound, andGraph(wrapsGraphErrorfor label-not-found and GraphBLAS/LAGraph failures).
NfaRpqEvaluator implements [RpqEvaluator] by:
- Converting a [
PathExpr] into anNfavia Thompson's construction (Nfa::from_path_expr()). - Eliminating ε-transitions via epsilon closure (
NfaBuilder::epsilon_closure()). - Building one
LAGraph_Graphper NFA label transition (Nfa::build_lagraph_matrices()). - Calling [
LAGraph_RegularPathQuery] with the NFA matrices, data-graph matrices, start/final states, and source vertices.
type Result = NfaRpqResult ([GraphblasVector] of reachable targets).
Supported path operators match [PathExpr] variants above. Reverse and
NegatedPropertySet from SPARQL map to [RpqError::UnsupportedPath] when they
appear in extracted paths.
Subject/object resolution: [Endpoint::Variable] means "all vertices";
[Endpoint::Named] resolves to a single vertex via
GraphDecomposition::get_node_id().
NfaRpqResult wraps a [GraphblasVector] of reachable target
vertices. When the subject is a variable, every vertex is used as a source and
LAGraph_RegularPathQuery returns the union of targets — individual (source, target)
pairs are not preserved.
RpqMatrixEvaluator compiles [PathExpr] into a Boolean matrix plan
over label adjacency matrices and runs [LAGraph_RPQMatrix]. It returns
RpqMatrixResult: the path-relation nnz plus a
[GraphblasMatrix] duplicate of the result matrix (full reachability relation for the path).
Subject/object do not filter the matrix; a named subject is only validated to exist.
Bound objects are not supported yet ([RpqError::UnsupportedPath]).
With the bench feature enabled, src/cli/dispatch.rs is the single
mapping from Algo variants to concrete evaluator types. dispatch_query
and dispatch_bench each perform one exhaustive match per requested algorithm, then call
generic runners (run_query_for_evaluator<E> and run_bench_for_evaluator<E>) that are
monomorphized for the selected evaluator.
Adding a new algorithm requires a new Algo variant, its Display arm, one dispatch_query
arm, one dispatch_bench arm, an impl Evaluator for the evaluator type, and an
impl ResultCount for any result type used by CLI count reporting.
lagraph_sys exposes raw C bindings for GraphBLAS and
LAGraph. Safe Rust wrappers live in graph::mod:
LagraphGraph— RAII wrapper aroundLAGraph_Graph(callsLAGraph_Deleteon drop). Also providesLagraphGraph::from_coo()to build directly from COO arrays.GraphblasVector— RAII wrapper aroundGrB_Vector(derivesDebug).GraphblasMatrix— RAII wrapper aroundGrB_Matrix(dup+freeon drop).ensure_grb_init()— internal one-timeLAGraph_Initviastd::sync::Once. Called automatically by RAII-wrapped constructors (LagraphGraph::from_coo,LagraphGraph::from_matrix,ThreadScope::enter) and byload_mm_file. Crate-private; no other code should call it.
Two #[macro_export] macros handle FFI error mapping:
grb_ok!(expr)— evaluates a GraphBLAS call insideunsafe, maps thei32return toResult<(), GraphError::GraphBlas(info)>.la_ok!(fn::path(args…))— evaluates a LAGraph call, automatically appending the required*mut i8message buffer, and maps failure toGraphError::LAGraph(info, msg).
A convenience function is also provided:
build_graph(edges)— builds anInMemoryGraphfrom a slice of(&str, &str, &str)triples (source, target, label). Used by integration tests.
- Rust edition 2024.
- Error handling via
thiserrorderive macros; three main error enums:GraphError,FormatError, andRpqError. FormatErrorconverts intoGraphErrorvia#[from] FormatErroron theGraphError::Formatvariant.GraphErrorconverts intoRpqErrorvia#[from] GraphErroron theRpqError::Graphvariant, enabling?propagation in evaluators.- Unsafe FFI calls are confined to
lagraph_sys,graph/mod.rs,graph/inmemory.rs,rpq/nfarpq.rs. All raw pointers are wrapped in RAII types that free resources on drop. unsafe impl Send + Syncis provided forLagraphGraph,GraphblasVector, andGraphblasMatrixbecause GraphBLAS handles are thread-safe after init.- Unit tests live in
#[cfg(test)] mod testsblocks inside each module. Integration tests that need GraphBLAS live intests/inmemory_tests.rs,tests/mm_tests.rs,tests/nfarpq_tests.rs.
# Run all tests (LAGraph installed system-wide)
LD_LIBRARY_PATH=/usr/local/lib cargo test --verbose
# If LAGraph is NOT installed system-wide (only built in the submodule):
LD_LIBRARY_PATH=deps/LAGraph/build/src:deps/LAGraph/build/experimental:/usr/local/lib cargo test --verboseTests in src/graph/mod.rs use CountingBuilder / CountOutput / VecSource from
src/utils.rs — these do not call into GraphBLAS and run without
native libraries.
Tests in src/formats/csv.rs and src/formats/rdf.rs are pure Rust and need no native dependencies.
Tests in src/sparql/mod.rs are pure Rust and need no native dependencies.
Tests in src/rpq/nfarpq.rs (NFA construction unit tests) are pure Rust and need no
native dependencies.
Tests in src/graph/inmemory.rs, tests/inmemory_tests.rs,
tests/mm_tests.rs, tests/nfarpq_tests.rs,
and tests/rpqmatrix_tests.rs call real GraphBLAS/LAGraph and
require the native libraries to be present.
The GitHub Actions workflow (.github/workflows/ci.yml)
runs on every push and PR across stable, beta, and nightly toolchains:
- Checks out with
submodules: recursive. - Installs cmake, libclang-dev, clang.
- Builds and installs SuiteSparse:GraphBLAS from source (
sudo make install). - Builds and installs LAGraph from the submodule (
sudo make install). cargo build --features regenerate-bindings— rebuilds FFI bindings.LD_LIBRARY_PATH=/usr/local/lib cargo test --verbose— runs the full test suite.