Skip to content

Comments

Spanning tree new process and driver#3073

Open
cvvergara wants to merge 31 commits intopgRouting:developfrom
cvvergara:spanningTree-new-process-and-driver
Open

Spanning tree new process and driver#3073
cvvergara wants to merge 31 commits intopgRouting:developfrom
cvvergara:spanningTree-new-process-and-driver

Conversation

@cvvergara
Copy link
Member

@cvvergara cvvergara commented Feb 19, 2026

Fixes #3075 .

Changes proposed in this pull request:

  • Creates and use a spanning tree process and driver

@pgRouting/admins

Summary by CodeRabbit

Release Notes

  • Refactor
    • Reorganized spanning tree algorithms (Kruskal and Prim) with improved internal architecture
    • Unified spanning tree computation drivers for consistent processing across algorithms
    • Refactored graph traversal methods (DFS and BFS) with standardized root vertex handling

- Using set instead of vector for roots
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 19, 2026

Walkthrough

This PR refactors spanning tree algorithms (Prim, Kruskal, BFS, DFS, driving distance) by creating a centralized process and driver infrastructure. It moves core algorithm implementations from the functions namespace to algorithms, removes individual driver files, creates unified pgr_process_spanningTree and do_spanningTree entry points, updates public signatures to use set<int64_t> for roots, and adds enum-based algorithm selection via expanded Which enum values.

Changes

Cohort / File(s) Summary
Spanning Tree Algorithm Implementations
include/spanningTree/kruskal.hpp, include/spanningTree/mst.hpp, include/spanningTree/prim.hpp
Moved class definitions from functions to algorithms namespace; updated method signatures to accept const std::set<int64_t>& roots instead of vectors; refactored internal state management and added private helper functions.
Spanning Tree Source Files
src/spanningTree/kruskal.cpp, src/spanningTree/prim.cpp, src/spanningTree/spanningTree_driver.cpp, src/spanningTree/spanningTree_process.cpp
Added new wrapper functions in functions namespace delegating to algorithms; created unified driver with enum-based algorithm selection; added centralized process handler with SQL parsing and error management.
Deleted Spanning Tree Drivers
src/spanningTree/kruskal_driver.cpp, src/spanningTree/prim_driver.cpp, src/spanningTree/mst_common.cpp, src/spanningTree/details.cpp
Removed legacy driver implementations and utility functions; consolidation into new unified process/driver pattern.
Traversal Algorithm Updates
include/traversal/depthFirstSearch.hpp, src/traversal/depthFirstSearch.c, src/traversal/depthFirstSearch_driver.cpp
Moved class from functions to algorithms namespace; updated to use set<int64_t> roots; integrated into unified spanning tree process; removed dedicated driver.
Breadth-First Search Updates
include/breadthFirstSearch/breadthFirstSearch.hpp, src/breadthFirstSearch/breadthFirstSearch.c, src/breadthFirstSearch/breadthFirstSearch_driver.cpp
Updated to delegate through new spanning tree process; removed dedicated driver; added free function wrapper in functions namespace.
Driving Distance Integration
include/dijkstra/drivingDist.hpp, src/driving_distance/driving_distance.c, src/driving_distance/driving_distance_driver.cpp
Moved function from algorithm to functions namespace; removed dedicated driver; integrated into unified spanning tree process with parameter adjustments.
Enum and Infrastructure Updates
include/c_common/enums.h, include/process/spanningTree_process.h, include/drivers/spanningTree_driver.hpp
Added KRUSKAL, PRIM, DFS, BFS variants with explicit enum values; created new process header (renamed from prim_driver.h); updated driver header with new signatures.
Deleted Driver Headers
include/drivers/breadthFirstSearch/..., include/drivers/driving_distance/..., include/drivers/traversal/..., include/drivers/spanningTree/mst_common.h
Removed public C-style driver interfaces; consolidation into unified pattern.
Type Conversion Utilities
include/cpp_common/to_postgres.hpp, src/cpp_common/to_postgres.cpp
Added new overloads for get_tuples supporting MST_rt vector conversion and depth assignment for driving distance results.
General Utilities
include/cpp_common/utilities.hpp, src/cpp_common/utilities.cpp
Added only_root_result function; extended get_name with new algorithm enum mappings; added early return guard for empty points_sql.
Build Configuration
src/spanningTree/CMakeLists.txt, src/breadthFirstSearch/CMakeLists.txt, src/driving_distance/CMakeLists.txt, src/traversal/CMakeLists.txt, tools/scripts/code_checker.sh
Updated source lists: removed old driver files, added new implementation files; added include-order checking for new process file.
Minor Cleanup
src/allpairs/allpairs_process.cpp, src/astar/astar_process.cpp, src/dijkstra/shortestPath_process.cpp, src/ordering/ordering_process.cpp
Removed redundant null-checks in error-handling blocks.
Documentation
NEWS.md, doc/src/release_notes.rst, locale/...
Added release notes and localization entries for spanning tree process/driver enhancement.

Sequence Diagram(s)

sequenceDiagram
    participant SQL as SQL Call
    participant Process as pgr_process_spanningTree
    participant Driver as do_spanningTree (Driver)
    participant Enum as Which Enum
    participant Algo as Algorithm<br/>(Prim/Kruskal/DFS/BFS)
    participant Graph as Graph Construction

    SQL->>Process: Call with edges_sql, roots, mode
    Process->>Process: Validate inputs
    Process->>Enum: Parse fn_suffix to Which
    Enum->>Process: Return enum value
    Process->>Driver: Call do_spanningTree with Which
    Driver->>Graph: Parse edges_sql, build graph
    Driver->>Enum: Switch on Which value
    Enum->>Driver: Route to algorithm
    Driver->>Algo: Instantiate & call<br/>(e.g., Prim/Kruskal)
    Algo->>Algo: Compute spanning tree
    Algo->>Driver: Return MST_rt results
    Driver->>Process: Return computed results
    Process->>SQL: Return tuples to PostgreSQL
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

Prim, Kruskal, Rewrite

Suggested reviewers

  • robe2
  • krashish8

Poem

🐰 Spanning trees now dance in one place,
Unified drivers set a steadier pace,
From Prim and Kruskal to BFS so bright,
One process to rule them all—what a delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main objective of the PR: creating and using a new spanning tree process and driver.
Linked Issues check ✅ Passed The PR successfully implements all requirements from issue #3075: creates a new spanning tree process/driver and updates Prim, Kruskal, BFS, DFS, and driving distance functions to use it.
Out of Scope Changes check ✅ Passed All changes align with the stated objectives. Minor cleanup in error handling (pfree redundancy removal) and locale updates are supportive refactoring directly related to the main changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
include/traversal/depthFirstSearch.hpp (1)

76-98: 🧹 Nitpick | 🔵 Trivial

Pre-existing inconsistency: DFS unconditionally emits root rows, BFS does not.

Line 84 pushes a root entry regardless of whether the vertex exists in the graph, whereas the BFS counterpart in include/breadthFirstSearch/breadthFirstSearch.hpp (lines 62-63) only emits the root row inside the has_vertex check. This is pre-existing behavior unchanged by this PR, but worth noting for future alignment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/traversal/depthFirstSearch.hpp` around lines 76 - 98, The
depthFirstSearch function currently unconditionally pushes a root result into
results before checking graph.has_vertex, causing inconsistent behavior versus
BFS; move the results.push_back({root, ...}) so it only executes inside the
has_vertex() block (the same block where depthFirstSearch_single_vertex(...) is
called and get_results(...) is used) to ensure root rows are emitted only for
existing vertices; update references around depthFirstSearch,
depthFirstSearch_single_vertex, get_results, and the local results vector
accordingly.
include/spanningTree/prim.hpp (1)

89-93: ⚠️ Potential issue | 🟠 Major

Pgr_prim::clear() hides Pgr_mst::clear(), leaving base-class state uncleared. When generate_mst calls this->clear(), it invokes the derived class's private clear() method (due to name hiding), which only resets data, predecessors, and distances. The base class fields—m_spanning_tree, m_components, m_tree_id—are not cleared. In contrast, Pgr_kruskal::generate_mst properly clears all state via the base class Pgr_mst::clear().

While the current usage pattern (creating fresh Pgr_prim objects per call in src/spanningTree/prim.cpp) prevents practical issues, this design is fragile and inconsistent with Kruskal. If a Pgr_prim object were reused across multiple algorithm invocations, stale edges from the previous call would accumulate in m_spanning_tree.

Also note: Line 134's cost calculation (auto cost = distances[u] - distances[v]) appears semantically incorrect (should be just distances[v] to get the edge weight). However, graph.get_edge() falls back to returning the minimum-cost edge when no exact match is found, which masks the error in typical cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/spanningTree/prim.hpp` around lines 89 - 93, Pgr_prim::clear() hides
the base Pgr_mst::clear() and therefore leaves base fields (m_spanning_tree,
m_components, m_tree_id) uncleared; update the derived clear to call the base
clear (e.g., call Pgr_mst::clear() from Pgr_prim::clear()) or remove/rename the
derived clear and ensure generate_mst invokes Pgr_mst::clear() so base state is
reset before reuse; also fix the cost calculation in generate_mst where it
currently computes "auto cost = distances[u] - distances[v]"—replace that with
the correct edge weight expression (use distances[v] or otherwise compute the
actual edge weight) so the MST edge cost is computed correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/spanningTree/mst.hpp`:
- Around line 100-113: mstBFS (and similarly mstDFS and mstDD) take the
parameter roots by value then assign it to the member m_roots, creating an extra
copy; replace the copy assignment with a move to avoid the redundant copy by
using std::move(roots) when assigning into m_roots (i.e., change m_roots = roots
to m_roots = std::move(roots)), and apply the same change in mstDFS and mstDD to
use std::move for the by-value roots parameter.

In `@include/spanningTree/prim.hpp`:
- Around line 130-136: The edge cost is computed inverted causing get_edge(...)
to miss exact matches; in the block where u = predecessors[v] and you compute
auto cost = distances[u] - distances[v], swap the operands so cost =
distances[v] - distances[u] (i.e., compute distances[v] minus
distances[predecessors[v]]), then call graph.get_edge(u, v, cost) and insert
into this->m_spanning_tree.edges to ensure exact cost matching.

In `@src/cpp_common/to_postgres.cpp`:
- Around line 267-278: The loop that sets tuples[i].depth uses d.at(row.node)
which can throw std::out_of_range if row.node is missing and also continues
scanning all maps—change the logic in the for (const auto &d : depths) loop to
first check for row.node with d.find(row.node) (or d.count(row.node)), only
assign depth when the node is present and the earlier condition (itr->second ==
0) holds, and then break out of the depths loop once a matching map is used;
ensure tuples[i].depth is left as -1 when no match is found to preserve current
behavior.

In `@src/driving_distance/driving_distance.c`:
- Line 102: The seq column in pgr_drivingDistance is declared BIGINT but the
code sets values[0] using Int32GetDatum; change the datum constructor to
Int64GetDatum and cast the call counter to int64_t (i.e. use
Int64GetDatum((int64_t)funcctx->call_cntr + 1)) so values[0] matches the BIGINT
seq type and aligns with kruskal.c/prim.c; update the statement that assigns
values[0] in driving_distance.c accordingly.

In `@src/spanningTree/kruskal.cpp`:
- Around line 34-65: Hoist the repeated type alias "using Kruskal =
pgrouting::algorithms::Pgr_kruskal<pgrouting::UndirectedGraph>;" to the
pgrouting::functions namespace scope, and in each wrapper function (kruskal,
kruskalBFS, kruskalDFS, kruskalDD) rename the local instance currently named
"kruskal" to a non-shadowing name like "algo" or "fn" before calling
algo.kruskal(...)/algo.kruskalBFS(...)/algo.kruskalDFS(...)/algo.kruskalDD(...);
this removes duplication and avoids shadowing the kruskal function name.

In `@src/spanningTree/prim.cpp`:
- Around line 37-42: The local variable prim in the function
prim(pgrouting::UndirectedGraph &graph) shadows the function name; rename the
local (e.g., to algo or fn) where you instantiate
pgrouting::algorithms::Pgr_prim<pgrouting::UndirectedGraph> (currently "Prim
prim;") and return via that instance (prim.prim(graph)) updated accordingly;
also apply the same clearer local-name convention for the similar instances in
primBFS, primDFS, and primDD to avoid confusion between the function and local
variable names.

In `@src/spanningTree/spanningTree_driver.cpp`:
- Line 137: The variable hint is being reset to an empty string twice
unnecessarily; remove the redundant assignment(s) to hint (the second reset
after the get_edges call) so hint is only initialized once. Locate uses of hint
in spanningTree_driver.cpp around the get_edges(...) call and the subsequent
redundant assignment and delete the extra `hint = ""` statement(s), leaving the
single initial initialization intact.
- Around line 159-163: Remove the unused local variables enop, eofp,
edges_of_points, and points from spanningTree_driver.cpp (they are declared but
never referenced), and also remove the orphaned include
"withPoints/withPoints.hpp" since Point_on_edge_t is no longer used; keep the
active edges variable and ensure no other code depends on Point_on_edge_t or
that include before committing the change.
- Around line 185-245: The directed branch currently only handles DFS, BFS and
DIJKSTRADD via process<DirectedGraph> and falls through to a generic "Unknown
function on directed graph" for MST enums; update the directed switch (or its
default) to detect MST-related enums (KRUSKAL, KRUSKALBFS, KRUSKALDFS,
KRUSKALDD, PRIM, PRIMBFS, PRIMDFS, PRIMDD) and return a clear error (e.g.,
"Kruskal/Prim algorithms require an undirected graph: " + get_name(which))
instead of the generic message, or alternatively implement the proper MST
handling for directed graphs by invoking the same process/handlers used in the
undirected branch (referencing process<DirectedGraph> or
pgrouting::functions::kruskal/prim as appropriate).
- Around line 169-178: BFS handling is inconsistent when edges.empty(): it
returns early without populating return_tuples/return_count while other
algorithms call pgrouting::only_root_result(roots) and set return_count via
get_tuples; update the BFS branch to mirror others by calling auto emptyresults
= pgrouting::only_root_result(roots); then set return_count =
get_tuples(emptyresults, return_tuples) before returning (or add a clear comment
in the BFS branch explaining intentional different semantics), referencing the
BFS check, only_root_result(), get_tuples(), return_tuples, and return_count to
locate and modify the logic.

In `@src/spanningTree/spanningTree_process.cpp`:
- Around line 136-140: The outer condition already ensures (*result_tuples) is
non-null, so remove the redundant inner "if (*result_tuples)" check in
spanningTree_process.cpp and directly call pfree(*result_tuples), then set
(*result_tuples) = nullptr and (*result_count) = 0; this simplifies the block
that currently checks err.str().empty() and (*result_tuples) and avoids the
duplicate null test.
- Around line 82-83: The error message for the distance validation is
inconsistent with the check: currently code checks "if (distance < 0)" and calls
pgr_throw_error("Negative value found on 'distance'", "Must be positive");
either change the guard to "if (distance <= 0)" to enforce strictly positive
distances, or change the hint to "Must be non-negative" to allow zero; apply the
same change to all four occurrences where distance is validated (the
pgr_throw_error calls for 'distance' in spanningTree_process.cpp) so the
condition and the message remain consistent.

---

Outside diff comments:
In `@include/spanningTree/prim.hpp`:
- Around line 89-93: Pgr_prim::clear() hides the base Pgr_mst::clear() and
therefore leaves base fields (m_spanning_tree, m_components, m_tree_id)
uncleared; update the derived clear to call the base clear (e.g., call
Pgr_mst::clear() from Pgr_prim::clear()) or remove/rename the derived clear and
ensure generate_mst invokes Pgr_mst::clear() so base state is reset before
reuse; also fix the cost calculation in generate_mst where it currently computes
"auto cost = distances[u] - distances[v]"—replace that with the correct edge
weight expression (use distances[v] or otherwise compute the actual edge weight)
so the MST edge cost is computed correctly.

In `@include/traversal/depthFirstSearch.hpp`:
- Around line 76-98: The depthFirstSearch function currently unconditionally
pushes a root result into results before checking graph.has_vertex, causing
inconsistent behavior versus BFS; move the results.push_back({root, ...}) so it
only executes inside the has_vertex() block (the same block where
depthFirstSearch_single_vertex(...) is called and get_results(...) is used) to
ensure root rows are emitted only for existing vertices; update references
around depthFirstSearch, depthFirstSearch_single_vertex, get_results, and the
local results vector accordingly.

@cvvergara cvvergara marked this pull request as draft February 20, 2026 17:04
- Prim & Kruskal:
  - Not shadowing function
  - Reducing duplication
- driver & process: cleanup and message fix
- process removing redundant if
- Using UInt64GetDatum on seq
- Refining get_tuples
- using const reference
@cvvergara cvvergara marked this pull request as ready for review February 21, 2026 01:20
@cvvergara cvvergara requested a review from robe2 February 21, 2026 01:26
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/spanningTree/prim.hpp`:
- Around line 95-103: Change primTree to accept the vertex descriptor type V
instead of int64_t to avoid the id↔vertex roundtrip: update the primTree
signature from primTree(const G &graph, int64_t root_vertex) to primTree(const G
&graph, V root) and remove the graph.get_V(root_vertex) call inside (use root
directly). Then update all callers (e.g., generate_mst which currently passes
graph.graph[root].id) to pass the vertex descriptor variable (root) rather than
the external id, and adjust any other sites that pass an int64_t id (lines
referenced in the review) to pass V. Ensure predecessors/distances resizing
remains unchanged.

In `@NEWS.md`:
- Line 23: The NEWS.md change was made directly but is auto-generated; instead
add the entry for issue `#3075` into the source data used by the generator (the
inputs consumed by notes2news.pl) rather than editing NEWS.md itself; locate the
notes2news.pl invocation and the repository of release-note source files
referenced by notes2news.pl and add a new note entry for "#3075: Spanning tree:
create and use a process and driver" there, then run the generator to regenerate
NEWS.md so the entry is included properly.

In `@src/driving_distance/driving_distance.c`:
- Line 199: The code uses UInt64GetDatum for the seq BIGINT column in the v3
path; change the assignment of values[0] so it uses Int64GetDatum with an
explicit cast to int64_t (mirroring the v4 path) instead of UInt64GetDatum, i.e.
replace the UInt64GetDatum(funcctx->call_cntr + 1) usage with
Int64GetDatum((int64_t)funcctx->call_cntr + 1) to correctly represent the signed
BIGINT; update the occurrence where values[0], funcctx->call_cntr, and
UInt64GetDatum are referenced in the v3 code path.

In `@src/spanningTree/spanningTree_process.cpp`:
- Around line 104-108: The DIJKSTRADD branch currently rejects negative
distances but logs the hint "Must be positive", which contradicts the check and
the other validation messages; update the error hint in the DIJKSTRADD case (in
spanningTree_process.cpp) to "Must be non negative" so the pgr_throw_error call
for the distance check matches the < 0 condition and the other validation sites.
- Around line 67-95: The code performs fragile enum arithmetic
(static_cast<Which>(static_cast<int>(which) + val)) assuming PRIM/KRUSKAL
variants are contiguous; add compile-time guards by inserting static_asserts
that verify PRIMDD == PRIM + 1, PRIMDFS == PRIM + 2, PRIMBFS == PRIM + 3 and
KRUSKALDD == KRUSKAL + 1, KRUSKALDFS == KRUSKAL + 2, KRUSKALBFS == KRUSKAL + 3
(use the actual enum variant names from Which) before this switch so any
reordering will fail to compile; alternatively replace the arithmetic with an
explicit mapping (e.g., a small switch or helper function that maps (which,
suffix) to the correct Which value) and validate values the same way in the
existing switch on val.

---

Duplicate comments:
In `@src/spanningTree/spanningTree_driver.cpp`:
- Around line 164-174: When edges.empty() the code currently returns immediately
for which == BFS, causing inconsistent behavior with other algorithms; change
the conditional so BFS follows the same path as others by calling
pgrouting::only_root_result(roots), assigning return_count =
get_tuples(emptyresults, return_tuples), and returning, instead of the early
return inside the which == BFS branch; update or remove the TODO(later) comment
(or add a tracked TODO/issue reference) to record this behavioral consistency
fix.

Comment on lines 95 to +103
void primTree(
const G &graph,
int64_t root_vertex);
int64_t root_vertex) {
clear();

void generate_mst(const G &graph) override;

private:
// Member
std::vector<V> predecessors;
std::vector<double> distances;
std::vector<V> data;
std::set<V> m_unassigned;
};
predecessors.resize(graph.num_vertices());
distances.resize(graph.num_vertices());

auto v_root(graph.get_V(root_vertex));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider accepting V directly in primTree to avoid a redundant id↔vertex roundtrip.

generate_mst already holds the vertex descriptor root (type V), converts it to an external id via graph.graph[root].id (line 156), passes it into primTree as int64_t root_vertex (line 97), which then converts it back to V via graph.get_V(root_vertex) (line 103). Accepting V directly would eliminate the unnecessary double conversion.

♻️ Suggested simplification
     void primTree(
             const G &graph,
-            int64_t root_vertex) {
+            V v_root) {
         clear();

         predecessors.resize(graph.num_vertices());
         distances.resize(graph.num_vertices());

-        auto v_root(graph.get_V(root_vertex));
-
         using prim_visitor = visitors::Prim_dijkstra_visitor<V>;

And at the call site in generate_mst:

             primTree(
                     graph,
-                    graph.graph[root].id);
+                    root);

Also applies to: 152-156

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/spanningTree/prim.hpp` around lines 95 - 103, Change primTree to
accept the vertex descriptor type V instead of int64_t to avoid the id↔vertex
roundtrip: update the primTree signature from primTree(const G &graph, int64_t
root_vertex) to primTree(const G &graph, V root) and remove the
graph.get_V(root_vertex) call inside (use root directly). Then update all
callers (e.g., generate_mst which currently passes graph.graph[root].id) to pass
the vertex descriptor variable (root) rather than the external id, and adjust
any other sites that pass an int64_t id (lines referenced in the review) to pass
V. Ensure predecessors/distances resizing remains unchanged.

* [#3056](https://github.com/pgRouting/pgrouting/issues/3056): edwardMoore: use the shortest_path process and driver
* [#3060](https://github.com/pgRouting/pgrouting/issues/3060): dagShortestPath: use the shortest_path process and driver
* [#3064](https://github.com/pgRouting/pgrouting/issues/3064): Astar: create and use a process and driver for Astar
* [#3075](https://github.com/pgRouting/pgrouting/issues/3075): Spanning tree: create and use a process and driver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Edit NEWS.md via the generator script, not directly.

Direct edits to NEWS.md will be overwritten the next time tools/release-scripts/notes2news.pl is run. The entry for #3075 should be added to the script's source files instead.

Based on learnings from PRs #2970 and #2868: "NEWS.md is auto-generated and manual edits would be overwritten" — the correct procedure is to update the source files consumed by notes2news.pl.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@NEWS.md` at line 23, The NEWS.md change was made directly but is
auto-generated; instead add the entry for issue `#3075` into the source data used
by the generator (the inputs consumed by notes2news.pl) rather than editing
NEWS.md itself; locate the notes2news.pl invocation and the repository of
release-note source files referenced by notes2news.pl and add a new note entry
for "#3075: Spanning tree: create and use a process and driver" there, then run
the generator to regenerate NEWS.md so the entry is included properly.

}
values[0] = Int32GetDatum((int32_t)funcctx->call_cntr + 1);

values[0] = UInt64GetDatum(funcctx->call_cntr + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual code at the specified line
cd src/driving_distance && cat -n driving_distance.c | sed -n '195,205p'

Repository: pgRouting/pgrouting

Length of output: 660


🏁 Script executed:

# Check the v4 path around line 102
cd src/driving_distance && cat -n driving_distance.c | sed -n '98,108p'

Repository: pgRouting/pgrouting

Length of output: 738


🏁 Script executed:

# Look for the SQL function definition to check the return type of seq
fd -e sql . | xargs grep -l "driving_distance" | head -5

Repository: pgRouting/pgrouting

Length of output: 45


🌐 Web query:

PostgreSQL UInt64GetDatum macro version history availability

💡 Result:

UInt64GetDatum(X) (and its companion DatumGetUInt64(X)) was introduced during the PostgreSQL 9.6 development cycle (patch discussion dated 2016‑02‑01 shows these macros being added to src/include/postgres.h).[1]

Availability by major version:

  • PostgreSQL 9.6+: UInt64GetDatum is available in src/include/postgres.h (example from Debian’s PostgreSQL 9.6.6 source package shows the macro present).[2]
  • PostgreSQL ≤ 9.5: not available (extensions commonly carry a compatibility shim and explicitly note “only available in 9.6+”).[3]
  • Still present in current PostgreSQL “master” / supported branches (it appears in current postgres.h documentation listings).[4]

Common compatibility fallback used by extensions for pre-9.6 is effectively:

  • #define UInt64GetDatum(X) Int64GetDatum((int64)(X)) (matches the non-byval definition used in later versions).[2][3]

Sources:
[1] Patch thread showing DatumGetUInt64 / UInt64GetDatum being added (2016‑02‑01) https://postgrespro.com/list/thread-id/1884554
[2] PostgreSQL 9.6 src/include/postgres.h (Debian sources) showing UInt64GetDatum definition https://sources.debian.org/src/postgresql-9.6/9.6.6-0%2Bdeb9u1~bpo8%2B1/src/include/postgres.h
[3] wal2json code comment noting “only available in 9.6+” and providing a fallback macro https://gitee.com/enmotech/wal2json/blob/master/wal2json.c
[4] Current PostgreSQL source documentation index listing UInt64GetDatum() in postgres.h https://doxygen.postgresql.org/globals_u.html


🏁 Script executed:

# Check pgRouting's PostgreSQL version requirements
fd -e txt -e md -e cmake . | xargs grep -i "postgres.*version\|minimum.*postgres" | head -20

Repository: pgRouting/pgrouting

Length of output: 1789


🏁 Script executed:

# Find SQL function definitions for driving_distance
fd -e sql . src/driving_distance | xargs cat

Repository: pgRouting/pgrouting

Length of output: 45


🏁 Script executed:

# Check if v3 function is marked as deprecated in the code
cd src/driving_distance && grep -n "deprecated\|DEPRECATED" driving_distance.c

Repository: pgRouting/pgrouting

Length of output: 350


🏁 Script executed:

# Look for version-specific handling or comments in the file
cd src/driving_distance && head -50 driving_distance.c

Repository: pgRouting/pgrouting

Length of output: 1648


🏁 Script executed:

# Search for SQL function definitions
find . -path "*/sql/*driving_distance*" -type f | head -10

Repository: pgRouting/pgrouting

Length of output: 268


🏁 Script executed:

# If the above doesn't find files, search more broadly for SQL function signatures
fd -e sql . | xargs grep -l "driving_distance" 2>/dev/null | head -5

Repository: pgRouting/pgrouting

Length of output: 45


🏁 Script executed:

# Check the v3 SQL function definition
cat sql/driving_distance/_drivingDistance.sql

Repository: pgRouting/pgrouting

Length of output: 1660


🏁 Script executed:

# Also check the main function for reference
cat sql/driving_distance/drivingDistance.sql

Repository: pgRouting/pgrouting

Length of output: 3180


UInt64GetDatum is semantically incorrect for the BIGINT SQL column type.

The seq column is declared as BIGINT (signed 64-bit) in the SQL function signature. The v4 path (line 102) correctly uses Int64GetDatum((int64_t)funcctx->call_cntr + 1) for this column, while this deprecated v3 path uses UInt64GetDatum, which is semantically incorrect for signed integers. For consistency and correctness, this should match the v4 implementation.

Proposed fix
-        values[0] = UInt64GetDatum(funcctx->call_cntr + 1);
+        values[0] = Int64GetDatum((int64_t)funcctx->call_cntr + 1);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/driving_distance/driving_distance.c` at line 199, The code uses
UInt64GetDatum for the seq BIGINT column in the v3 path; change the assignment
of values[0] so it uses Int64GetDatum with an explicit cast to int64_t
(mirroring the v4 path) instead of UInt64GetDatum, i.e. replace the
UInt64GetDatum(funcctx->call_cntr + 1) usage with
Int64GetDatum((int64_t)funcctx->call_cntr + 1) to correctly represent the signed
BIGINT; update the occurrence where values[0], funcctx->call_cntr, and
UInt64GetDatum are referenced in the v3 code path.

Comment on lines +67 to +95
switch (which) {
case PRIM:
case KRUSKAL:
if (suffix == "DD") {
val = 1;
} else if (suffix == "DFS") {
val = 2;
} else if (suffix == "BFS") {
val = 3;
}

which = static_cast<Which>(static_cast<int>(which) + val);

switch (val) {
case 1:
if (distance < 0) {
pgr_throw_error("Negative value found on 'distance'", "Must be non negative");
}
break;
case 2:
case 3:
if (max_depth < 0) {
pgr_throw_error("Negative value found on 'max_depth'", "Must be non negative");
}
break;
default:
break;
}
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

# Find the enum definition
fd enums.h | head -5

Repository: pgRouting/pgrouting

Length of output: 87


🏁 Script executed:

# Search for the Which enum definition
rg -n "enum.*Which|PRIM|KRUSKAL" --type=cpp --type=c -A 20 | head -80

Repository: pgRouting/pgrouting

Length of output: 4880


🏁 Script executed:

# Read the enums.h file to find the Which enum definition
cat include/c_common/enums.h | head -100

Repository: pgRouting/pgrouting

Length of output: 1706


Fragile enum arithmetic: relies on contiguous enum layout.

Line 78 (static_cast<Which>(static_cast<int>(which) + val)) assumes PRIM (510) and its variants PRIMDD (511), PRIMDFS (512), PRIMBFS (513) are contiguous—and similarly for KRUSKAL (500) and its variants. While this currently holds in the enum definition, any reordering of these values will silently produce wrong algorithm selection. Consider adding static_assert statements to guard against future enum changes.

Option: Add static_asserts
+    static_assert(static_cast<int>(PRIMDD)  == static_cast<int>(PRIM) + 1, "Enum layout assumption");
+    static_assert(static_cast<int>(PRIMDFS) == static_cast<int>(PRIM) + 2, "Enum layout assumption");
+    static_assert(static_cast<int>(PRIMBFS) == static_cast<int>(PRIM) + 3, "Enum layout assumption");
+    static_assert(static_cast<int>(KRUSKALDD)  == static_cast<int>(KRUSKAL) + 1, "Enum layout assumption");
+    static_assert(static_cast<int>(KRUSKALDFS) == static_cast<int>(KRUSKAL) + 2, "Enum layout assumption");
+    static_assert(static_cast<int>(KRUSKALBFS) == static_cast<int>(KRUSKAL) + 3, "Enum layout assumption");
     switch (which) {
🧰 Tools
🪛 Cppcheck (2.19.0)

[style] 72-72: The function 'pgr_alloc' is never used.

(unusedFunction)


[style] 83-83: The function 'pgr_free' is never used.

(unusedFunction)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spanningTree/spanningTree_process.cpp` around lines 67 - 95, The code
performs fragile enum arithmetic (static_cast<Which>(static_cast<int>(which) +
val)) assuming PRIM/KRUSKAL variants are contiguous; add compile-time guards by
inserting static_asserts that verify PRIMDD == PRIM + 1, PRIMDFS == PRIM + 2,
PRIMBFS == PRIM + 3 and KRUSKALDD == KRUSKAL + 1, KRUSKALDFS == KRUSKAL + 2,
KRUSKALBFS == KRUSKAL + 3 (use the actual enum variant names from Which) before
this switch so any reordering will fail to compile; alternatively replace the
arithmetic with an explicit mapping (e.g., a small switch or helper function
that maps (which, suffix) to the correct Which value) and validate values the
same way in the existing switch on val.

Comment on lines +104 to +108
case DIJKSTRADD:
if (distance < 0) {
pgr_throw_error("Negative value found on 'distance'", "Must be positive");
}
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent error hint: "Must be positive" vs "Must be non negative".

The < 0 check allows zero, but the hint says "Must be positive" (implying zero is invalid). The other three validation sites (lines 83, 89, 100) correctly use "Must be non negative". This one was missed.

Proposed fix
         case DIJKSTRADD:
             if (distance < 0) {
-                pgr_throw_error("Negative value found on 'distance'", "Must be positive");
+                pgr_throw_error("Negative value found on 'distance'", "Must be non negative");
             }
             break;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spanningTree/spanningTree_process.cpp` around lines 104 - 108, The
DIJKSTRADD branch currently rejects negative distances but logs the hint "Must
be positive", which contradicts the check and the other validation messages;
update the error hint in the DIJKSTRADD case (in spanningTree_process.cpp) to
"Must be non negative" so the pgr_throw_error call for the distance check
matches the < 0 condition and the other validation sites.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

spanning tree: create an use a process & driver

1 participant