feat(cugraph): upgrade to RAPIDS 25.12 / CUDA 13.1 with comprehensive e2e tests #3709

Merged
mattkjames7 merged 5 commits into master from mage-pr-710 on Jan 30, 2026

mattkjames7 commented Jan 23, 2026

MAGE PR #710 transferred to this repo. Closes #3564.

Summary

This PR upgrades MAGE's cuGraph integration from the legacy RAPIDS 22.02 / CUDA 11.5 stack to modern RAPIDS 25.12 / CUDA 13.1, bringing GPU-accelerated graph algorithms up to date with current NVIDIA tooling.

Key Changes

Infrastructure Upgrade:

  • CUDA 11.5.2 → 13.1.0
  • RAPIDS/cuGraph 22.02 → 25.12
  • Ubuntu 20.04 → 24.04
  • Python 3.8 → 3.12

API Migration:
All 9 cuGraph algorithms updated to use the modern pylibcugraph API:

  • cugraph::pagerank → cugraph::pagerank() with explicit graph view
  • cugraph::betweenness_centrality → normalized output handling
  • cugraph::hits → proper hub/authority vector management
  • cugraph::katz_centrality → updated alpha/beta parameter handling
  • cugraph::louvain / cugraph::leiden → new clustering return types
  • cugraph::personalized_pagerank → vertex list handling

Legacy API Preserved:
Two algorithms remain on the cugraph::ext_raft:: API because they have not been migrated in RAPIDS 25.x:

  • balanced_cut_clustering
  • spectral_clustering

E2E Tests Added

Comprehensive end-to-end tests for all 9 algorithms following MAGE's existing test framework:

e2e/pagerank_test/test_cugraph_networkx_validation/
e2e/betweenness_centrality_test/test_cugraph_networkx_validation/
e2e/hits_test/test_cugraph_networkx_validation/
e2e/katz_test/test_cugraph_networkx_validation/
e2e/louvain_test/test_cugraph_networkx_validation/
e2e/leiden_cugraph_test/test_cugraph_networkx_validation/
e2e/personalized_pagerank_test/test_cugraph_networkx_validation/
e2e/balanced_cut_clustering_test/test_cugraph_networkx_validation/
e2e/spectral_clustering_test/test_cugraph_networkx_validation/

Each test uses a 9-node two-community graph topology with expected values validated against NetworkX ground truth (5% tolerance for GPU floating-point variance).
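
The tolerance check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `within_tolerance` and the sample scores are hypothetical, and it assumes the 5% is a relative tolerance measured against the NetworkX value.

```python
def within_tolerance(expected: float, actual: float, rel_tol: float = 0.05) -> bool:
    """Return True if `actual` is within `rel_tol` of `expected`, relative to `expected`."""
    if expected == 0.0:
        # Relative error is undefined at zero; use a small absolute bound (assumption).
        return abs(actual) <= 1e-9
    return abs(actual - expected) <= rel_tol * abs(expected)


# Hypothetical per-node scores: NetworkX ground truth vs. the GPU result.
expected_scores = {0: 0.120, 1: 0.095, 2: 0.110}
gpu_scores = {0: 0.121, 1: 0.094, 2: 0.130}

# Collect the nodes whose GPU value falls outside the tolerance.
mismatches = {
    node: (expected_scores[node], gpu_scores[node])
    for node in expected_scores
    if not within_tolerance(expected_scores[node], gpu_scores[node])
}
print(mismatches)
```

A relative check keeps the comparison meaningful for small centrality scores, where a fixed absolute bound could hide large proportional errors.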

Validation Script

Added scripts/validate_cugraph_algorithms.py - a standalone debugging tool that:

  1. Builds an identical graph in NetworkX (ground truth)
  2. Spins up a Memgraph container with cuGraph modules
  3. Runs each algorithm and compares against NetworkX
  4. Reports pass/fail with detailed value comparisons

This is for developer debugging, not CI.
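
As a rough sketch of steps 3 and 4 only, the compare-and-report loop might look like the code below. It is not the contents of scripts/validate_cugraph_algorithms.py: `validate`, the callables, and the sample values are hypothetical stand-ins, and the Memgraph queries are replaced by plain functions so the example runs without a container or GPU.

```python
import math
from typing import Callable, Dict

Scores = Dict[int, float]


def validate(
    algorithms: Dict[str, Callable[[], Scores]],
    ground_truth: Dict[str, Scores],
    rel_tol: float = 0.05,
) -> Dict[str, bool]:
    """Run each algorithm and report pass/fail against the ground truth."""
    results: Dict[str, bool] = {}
    for name, run in algorithms.items():
        actual = run()
        expected = ground_truth[name]
        # A node fails if it is missing or its value is outside the tolerance.
        failures = [
            (node, expected[node], actual.get(node))
            for node in expected
            if node not in actual
            or not math.isclose(actual[node], expected[node], rel_tol=rel_tol)
        ]
        results[name] = not failures
        print(f"{name}: {'PASS' if not failures else 'FAIL'}")
        for node, want, got in failures:
            print(f"  node {node}: expected {want}, got {got}")
    return results


# Hypothetical stand-ins: a real tool would query Memgraph inside each callable.
truth = {"pagerank": {0: 0.5, 1: 0.5}, "katz": {0: 0.7, 1: 0.3}}
fake_runs = {
    "pagerank": lambda: {0: 0.51, 1: 0.49},
    "katz": lambda: {0: 0.7, 1: 0.4},
}
report = validate(fake_runs, truth)
```

Passing the algorithm runners in as callables keeps the comparison logic testable on machines without an NVIDIA GPU.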

Test Plan

  • All 9 cuGraph algorithms pass validation against NetworkX ground truth
  • Docker image builds successfully with Dockerfile.cugraph
  • E2E tests follow existing MAGE test conventions
  • CI pipeline runs (pending merge)

Breaking Changes

None. All algorithm signatures and return types preserved.

… e2e tests (#710)

Co-authored-by: matt <mattkjames7@gmail.com>

mattkjames7 commented Jan 23, 2026

Tracking

  • [Link to Epic/Issue]

Standard development

CI Testing Labels

  • Select the appropriate CI test labels (CI -build=build-name -test=test-suite)

Documentation checklist

  • Add the documentation label
  • Add the bug / feature label
  • Add the milestone for which this feature is intended
    • If not known, set for a later milestone
  • Write a release note, including added/changed clauses
    • This PR upgrades MAGE's cuGraph integration from the legacy RAPIDS 22.02 / CUDA 11.5 stack to modern RAPIDS 25.12 / CUDA 13.1, bringing GPU-accelerated graph algorithms up to date with current NVIDIA tooling. #3709
  • cuGraph API changes documentation#1517
    • Is back linked to this development PR

@mattkjames7

TODO: update docs with changes to module inputs

mattkjames7 marked this pull request as ready for review January 23, 2026 12:21
mattkjames7 requested a review from DavIvek January 23, 2026 12:21
antejavor requested review from antejavor and removed request for DavIvek January 23, 2026 12:29
mattkjames7 marked this pull request as draft January 28, 2026 09:20

antejavor left a comment

This is a huge PR, mostly cuGraph API changes and updates from the legacy stack.

Since there is a validate_cugraph_algorithms.py script, I assume the changes here are correct.

The only missing step is updating the API docs, since some args have changed in this PR.

@antejavor

Two questions, @mattkjames7: now that MAGE is in the main repo, are we running the MAGE tests anywhere? Did you try this on NVIDIA hardware? I assume you did.

@mattkjames7

> This is a huge PR, mostly cuGraph API changes and updates from the legacy stack.
>
> Since there is a validate_cugraph_algorithms.py script, I assume the changes here are correct.
>
> The only missing step is updating the API docs, since some args have changed in this PR.

I haven't written the docs for this just yet, but I will link them here and ping once I have. I think the API changes are fairly minimal.

> Two questions, @mattkjames7: now that MAGE is in the main repo, are we running the MAGE tests anywhere? Did you try this on NVIDIA hardware? I assume you did.

We now run the MAGE tests in this repo as part of the diff workflow, the daily build, and the RC build. We do not run any cuGraph tests in CI: the image built from this code only runs if an NVIDIA GPU is present, though I did build it and run the tests locally. Hopefully there will soon be a regularly built cuGraph image too, via this WIP PR: #3723

@sonarqubecloud

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

mattkjames7 added this pull request to the merge queue Jan 30, 2026
Merged via the queue into master with commit 203b09d Jan 30, 2026
39 of 40 checks passed
mattkjames7 deleted the mage-pr-710 branch January 30, 2026 17:05
mattkjames7 linked an issue Feb 4, 2026 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

  • Cugraph build not up to date
  • Use latest cugraph

3 participants