Implement regular path query algorithm with pathes #13

Open
suvorovrain wants to merge 11 commits into stable from 2-rpq-alloc

@suvorovrain suvorovrain commented May 19, 2026

Algorithm for Evaluating 2-RPQ Queries with Path Storing

This PR introduces an algorithm for evaluating 2-RPQ queries with path storing.

The algorithm supports three path semantics:

  1. ALL-SHORTEST-PATHS
  2. ALL-SIMPLE-PATHS
  3. ALL-TRAILS

These semantics correspond to the path semantics used in the
MillenniumDB Path Query Challenge.

Algorithm idea

The algorithm is based on the standard linear-algebra approach for evaluating
RPQs. In the classical version, RPQ evaluation is reduced to
reachability computation over Boolean matrices: graph labels are represented as
adjacency matrices, the query is represented as a finite automaton, and matrix
operations propagate reachable graph/automaton states.
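
The classical Boolean version described above can be sketched as follows. This is a minimal illustration in plain Python, not the code from this PR: each label's Boolean adjacency matrix is modelled as a set of `(src, dst)` pairs, and the function name `rpq_reachable` and its parameters are assumptions made for the example.

```python
def rpq_reachable(graph, nfa, start_states, final_states, sources):
    """Return graph vertices reachable from `sources` along an accepted word.

    graph, nfa: dict mapping label -> set of (src, dst) pairs, i.e. one
    Boolean adjacency matrix per label (hypothetical representation).
    """
    # The frontier holds (automaton state, graph vertex) pairs.
    frontier = {(q, v) for q in start_states for v in sources}
    visited = set(frontier)
    while frontier:
        nxt = set()
        for label, nfa_edges in nfa.items():
            graph_edges = graph.get(label, ())
            for (q, v) in frontier:
                for (q1, q2) in nfa_edges:
                    if q1 != q:
                        continue
                    for (u, w) in graph_edges:
                        if u == v:
                            # Automaton and graph advance on the same label.
                            nxt.add((q2, w))
        frontier = nxt - visited   # keep only newly discovered pairs
        visited |= frontier
    return {v for (q, v) in visited if q in final_states}


# Query a*b over a small chain graph 0 -a-> 1 -a-> 2 -b-> 3.
graph = {"a": {(0, 1), (1, 2)}, "b": {(2, 3)}}
nfa = {"a": {(0, 0)}, "b": {(0, 1)}}
reachable = rpq_reachable(graph, nfa, {0}, {1}, [0])
```

A real implementation would express the inner loops as sparse matrix products rather than nested Python loops; the sets here only make the propagation step explicit.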

This PR generalizes that approach by replacing the Boolean semiring with custom
path semirings. Instead of storing only whether a state is reachable, matrix
elements store path information. Semiring operations define how paths are
extended, combined, and filtered according to the selected semantics.

In particular:

  • multiplication extends existing paths with graph edges;
  • addition merges alternative paths reaching the same state;
  • semantic-specific checks reject invalid path extensions:
    • repeated vertices are forbidden for ALL-SIMPLE-PATHS;
    • repeated edges are forbidden for ALL-TRAILS;
    • only shortest paths are preserved for ALL-SHORTEST-PATHS.

As a result, the Boolean reachability algorithm can be seen as a special case of
the same framework, while path-producing semantics are implemented by changing
the underlying semiring.

One step of the algorithm under ALL-SIMPLE-PATHS semantics is shown in the following figure:
[figure: one step of the algorithm under ALL-SIMPLE-PATHS semantics]
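
The semiring operations listed above can be sketched in plain Python. This is an illustrative model only, not the semiring definitions from this PR: a matrix element is modelled as a set of paths, a path is a pair `(start_vertex, edges)`, and the names `extend`, `merge`, and `merge_shortest` are hypothetical. The sketch also assumes that ALL-SIMPLE-PATHS forbids any repeated vertex, including a return to the start vertex.

```python
def extend(paths, edge, semantics):
    """'Multiplication': append `edge` = (u, label, v) to every path ending in u."""
    u, _, v = edge
    out = set()
    for start, edges in paths:
        end = edges[-1][2] if edges else start
        if end != u:
            continue
        if semantics == "ALL-SIMPLE-PATHS":
            # Reject the extension if v already occurs on the path.
            seen = {start} | {e[2] for e in edges}
            if v in seen:
                continue
        elif semantics == "ALL-TRAILS":
            # Reject the extension if this exact edge was already used.
            if edge in edges:
                continue
        out.add((start, edges + (edge,)))
    return out


def merge(a, b):
    """'Addition': keep all alternative paths reaching the same state."""
    return a | b


def merge_shortest(a, b):
    """'Addition' for ALL-SHORTEST-PATHS: keep only minimum-length paths."""
    both = a | b
    if not both:
        return both
    best = min(len(edges) for _, edges in both)
    return {p for p in both if len(p[1]) == best}


e01, e10 = (0, "a", 1), (1, "a", 0)
p1 = extend({(0, ())}, e01, "ALL-SIMPLE-PATHS")   # path 0 -> 1
simple = extend(p1, e10, "ALL-SIMPLE-PATHS")      # would revisit vertex 0
trail = extend(p1, e10, "ALL-TRAILS")             # allowed: edge e10 is new
```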

Memory management

Path-storing evaluation may create a large number of intermediate path objects.
To reduce allocation overhead, the implementation uses a custom allocator for
path data structures. The allocator centralizes ownership of intermediate path
objects and allows the algorithm to allocate paths efficiently during semiring
operations, instead of relying on many small heap allocations. This approach
also makes it possible to work with variable-size objects through custom
structures in GraphBLAS primitives, whose element size must be constant.
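
The idea of pairing an arena with fixed-size handles can be sketched as follows. This is a conceptual Python model, not the allocator from this PR (which is not shown here): the class `PathArena` and its methods are hypothetical names. Each path is written into one contiguous buffer, and the matrix element only needs to hold a constant-size `(offset, length)` handle.

```python
from array import array


class PathArena:
    """Bump ('linear') allocator over one preallocated buffer (illustrative)."""

    def __init__(self, capacity):
        self._buf = array("q", [0]) * capacity   # fixed backing storage
        self._used = 0

    def alloc(self, vertices):
        """Copy a variable-length path in and return a fixed-size handle."""
        n = len(vertices)
        if self._used + n > len(self._buf):
            # Out-of-memory is detected explicitly instead of corrupting data.
            raise MemoryError("arena exhausted")
        off = self._used
        self._buf[off:off + n] = array("q", vertices)
        self._used += n
        return (off, n)

    def get(self, handle):
        off, n = handle
        return list(self._buf[off:off + n])

    def reset(self):
        """Release every path at once, e.g. when the query finishes."""
        self._used = 0


arena = PathArena(capacity=4)
handle = arena.alloc([1, 2, 3])
```

Individual paths are never freed on their own in this scheme; the whole arena is reset in one step, which is what keeps allocation cheap.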

@suvorovrain suvorovrain changed the title Implement regular path query algorithm Implement regular path query algorithm with pathes May 19, 2026
georgiy-belyanin and others added 8 commits May 19, 2026 15:53
This commit adds an implementation of the regular path query algorithm based
on the linear-algebra graph processing approach. The algorithm finds the set of
nodes in an edge-labelled directed graph that are reachable by paths starting
from one of the source nodes and whose edge labels form a word from the
specified regular language.

This algorithm is based on breadth-first search over the adjacency
matrices. Regular languages are defined by a non-deterministic finite
automaton (NFA). The algorithm considers the paths whose "label words" are
accepted by the specified NFA.

The algorithm is used with the following inputs:
* A regular automaton adjacency matrix decomposition.
* A graph adjacency matrix decomposition.
* An array of the starting node indices.

The result is a vector v with v[i] = 1 iff node i is reachable by a
path satisfying the provided regular constraints.

This patch makes the regular path query algorithm work with
2-RPQs. 2-RPQs extend RPQs with the ability to traverse graph edges
in the direction opposite to their orientation.

E.g. the SPARQL 2-RPQ `Alice ^<mother> <daughter> ?x` could be used to find
Alice and all of her sisters by getting all of Alice's mother's daughters.

2-RPQ support is provided by adding two extra parameters to the RPQ
algorithm. The first marks some of the provided labels as inverted.
The second inverts the whole query, which allows executing
single-destination RPQs (e.g. `?x <Son> Bob` gets Bob's parents).
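
A minimal sketch of how inverted traversal can be modelled: following a label backwards is the same as following its transposed adjacency matrix. The helper `step` and the sample edges are hypothetical; the sketch assumes an edge `(m, c)` under `mother` or `daughter` is stored from the parent `m` to the child `c`, and an edge `(p, s)` under `Son` from the parent to the son.

```python
def step(vertices, edges, inverse=False):
    """Follow `edges` (a set of (src, dst) pairs) from `vertices`.

    With inverse=True the edges are traversed backwards, which corresponds
    to using the transposed adjacency matrix.
    """
    if inverse:
        edges = {(dst, src) for (src, dst) in edges}
    return {dst for (src, dst) in edges if src in vertices}


graph = {
    "mother": {("Carol", "Alice"), ("Carol", "Eve")},
    "daughter": {("Carol", "Alice"), ("Carol", "Eve")},
    "Son": {("Dan", "Bob")},
}

# Alice ^<mother> <daughter> ?x : one inverted label, then one forward label.
mothers = step({"Alice"}, graph["mother"], inverse=True)
sisters = step(mothers, graph["daughter"])

# ?x <Son> Bob : the whole query is inverted and started from the destination.
parents = step({"Bob"}, graph["Son"], inverse=True)
```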
This patch provides a workaround for benchmarking the 2-RPQ algorithm on
a few real-world datasets such as Wikidata or yago-2s: it allows
duplicate entries in MatrixMarket files corresponding to Boolean matrices,
since most publicly available graphs are likely to contain duplicates.
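
For a Boolean matrix, tolerating duplicates amounts to combining repeated entries with logical OR. The sketch below illustrates this on a MatrixMarket coordinate `pattern` file parsed in plain Python; the function `read_boolean_mm` is a hypothetical helper and is not the loader changed by this patch.

```python
def read_boolean_mm(text):
    """Parse a MatrixMarket coordinate pattern file, collapsing duplicates.

    Returns (nrows, ncols, entries) with 0-based (row, col) pairs.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    # Drop blank lines, the %%MatrixMarket banner, and % comments.
    lines = [ln for ln in lines if ln and not ln.startswith("%")]
    nrows, ncols, _declared_nnz = map(int, lines[0].split())
    entries = set()
    for ln in lines[1:]:
        i, j = ln.split()[:2]
        # A set makes a repeated (i, j) entry harmless: True OR True = True.
        entries.add((int(i) - 1, int(j) - 1))
    return nrows, ncols, entries


sample = """%%MatrixMarket matrix coordinate pattern general
% entry "1 2" appears twice
3 3 4
1 2
2 3
1 2
3 1
"""
nrows, ncols, entries = read_boolean_mm(sample)
```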
Handle too many paths via a custom arena-based linear allocator that is
cleared at the end of the 2RPQ ALL PATHS procedure. It is used to
construct elements of matrices having too many paths in them. It also
offers OOM detection.

This patch introduces ALL SHORTEST PATH semantics in the regular path
query algorithm. The key insight is very similar to the reachability
(i.e. ENDPOINTS) semantics described in detail in [^1].

Under SINGLE SOURCE ALL SHORTEST PATH semantics, given a query $Q$, a
graph $G$, and a source vertex $s$, the task is to find, for every vertex
$v$, all minimum-length paths from $s$ to $v$ that satisfy $Q$.

The implementation combines the custom semirings for ALL PATHS with
filtering of already-visited pairs of NFA states and graph vertices.
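
The combination of path collection and visited-pair filtering can be sketched as a level-synchronous search. This is a plain-Python illustration under assumed data structures (one set of `(src, dst)` pairs per label, paths as tuples of `(u, label, v)` edges); the function name `all_shortest_paths` is hypothetical and the code is not taken from this patch.

```python
def all_shortest_paths(graph, nfa, start_state, final_states, source):
    """Map each reachable vertex to the set of its shortest accepted paths."""
    frontier = {(start_state, source): {()}}   # (state, vertex) -> paths
    visited = set(frontier)
    result = {}
    while frontier:
        # Vertices first reached in a final state at this level are done.
        hits = {}
        for (q, v), paths in frontier.items():
            if q in final_states and v not in result:
                hits.setdefault(v, set()).update(paths)
        result.update(hits)

        nxt = {}
        for (q, v), paths in frontier.items():
            for label, nfa_edges in nfa.items():
                for (q1, q2) in nfa_edges:
                    if q1 != q:
                        continue
                    for (u, w) in graph.get(label, ()):
                        # Pairs seen on an earlier level can only be reached
                        # again by longer paths, so they are filtered out.
                        if u != v or (q2, w) in visited:
                            continue
                        nxt.setdefault((q2, w), set()).update(
                            p + ((u, label, w),) for p in paths
                        )
        visited.update(nxt)   # same-level alternatives were merged above
        frontier = nxt
    return result


# Query a+ on a graph where vertex 2 is reachable in one step or in two.
graph = {"a": {(0, 1), (1, 2), (0, 2)}}
nfa = {"a": {(0, 1), (1, 1)}}
paths = all_shortest_paths(graph, nfa, 0, {1}, 0)
```

Because pairs are marked visited only after a whole level has been processed, all equally short paths into the same pair are kept, while any longer path is dropped.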

[^1] https://arxiv.org/abs/2412.10287
@suvorovrain suvorovrain changed the base branch from 2-rpq-path-pr to stable May 19, 2026 12:59

2 participants