Add graph-store PPR E2E wiring by mkolodner-sc · Pull Request #655 · Snapchat/GiGL

mkolodner-sc · 2026-05-29T17:06:50Z

Summary

Stacked on #645.

Enables PPR sampling in the existing homogeneous Graph Store training and inference entrypoints, then adds a short E2E test that exercises that path.

Changes include:

Add parse_sampler_options for sampler_type: ppr task args.
Thread optional SamplerOptions through homogeneous Graph Store training and inference loaders.
Keep k-hop as the default behavior when no sampler type is provided.
Default Graph Store local_world_size from GraphStoreInfo and fail fast when it disagrees with the cluster topology.
Add the homogeneous Cora Graph Store PPR E2E config, E2E test registration, and Makefile target.
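A minimal sketch of the "optional sampler options, k-hop by default" threading described in the list above. The `PPRSamplerOptions` stand-in and the `build_loader_kwargs` helper are hypothetical names used for illustration only; the real GiGL loaders may accept these options differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence


@dataclass(frozen=True)
class PPRSamplerOptions:
    """Hypothetical stand-in for the real sampler options class."""

    alpha: float = 0.5
    eps: float = 1e-4


def build_loader_kwargs(
    num_neighbors: Sequence[int],
    sampler_options: Optional[PPRSamplerOptions] = None,
) -> Dict[str, Any]:
    """Build loader keyword arguments, adding sampler options only when set.

    Passing None leaves the kwargs untouched, so the loader keeps its
    default k-hop behavior.
    """
    kwargs: Dict[str, Any] = {"num_neighbors": list(num_neighbors)}
    if sampler_options is not None:
        kwargs["sampler_options"] = sampler_options
    return kwargs


khop_kwargs = build_loader_kwargs([10, 10])
ppr_kwargs = build_loader_kwargs([10, 10], PPRSamplerOptions(alpha=0.25))
```

Treating `None` as "use k-hop" means existing configs that never set a sampler type need no changes.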

mkolodner-sc · 2026-05-29T17:11:26Z


 logger = Logger()

-# Default number of inference processes per machine incase one isnt provided in inference args


In Graph Store mode, the source of truth should be cluster_info.num_processes_per_compute, not a local CPU/GPU heuristic. The previous fallback could make inference spawn a different number of compute processes than storage expected, causing storage rendezvous failures like “only N/M clients joined.”
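The fail-fast behavior described above could look roughly like the sketch below. `ClusterInfo` is a hypothetical stand-in that carries only the `num_processes_per_compute` field named in the comment, and `resolve_local_world_size` is an illustrative helper name, not necessarily the one used in the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClusterInfo:
    """Hypothetical stand-in exposing only the field discussed above."""

    num_processes_per_compute: int


def resolve_local_world_size(
    cluster_info: ClusterInfo, requested: Optional[int] = None
) -> int:
    """Return the per-machine process count, trusting the cluster topology.

    If the caller passes nothing, default to the topology value. If the
    caller passes a different value, raise immediately instead of letting
    storage wait for clients that will never join.
    """
    expected = cluster_info.num_processes_per_compute
    if requested is None:
        return expected
    if requested != expected:
        raise ValueError(
            f"local_world_size={requested} disagrees with "
            f"num_processes_per_compute={expected}"
        )
    return requested


cluster = ClusterInfo(num_processes_per_compute=4)
default_size = resolve_local_world_size(cluster)
```

Raising at startup turns a slow rendezvous timeout into an immediate, readable configuration error.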

kmontemayor2-sc · 2026-05-29T18:48:10Z

+def parse_sampler_options(args: Mapping[str, str]) -> Optional[SamplerOptions]:
+    sampler_type = args.get("sampler_type", "khop").strip().lower().replace("-", "_")
+    if sampler_type == "":
+        sampler_type = "khop"
+
+    if sampler_type in {"khop", "k_hop", "neighbor", "neighbor_sampler"}:
+        return None
+
+    if sampler_type != "ppr":
+        raise ValueError(
+            f"Unsupported sampler_type={sampler_type}. Expected one of: khop, ppr."
+        )
+
+    max_ppr_nodes = args.get("ppr_max_nodes")
+    if max_ppr_nodes is None:
+        max_ppr_nodes = args.get("ppr_max_ppr_nodes", "50")
+
+    num_neighbors_per_hop = args.get("ppr_neighbors_per_hop")
+    if num_neighbors_per_hop is None:
+        num_neighbors_per_hop = args.get("ppr_num_neighbors_per_hop", "1000")
+
+    return PPRSamplerOptions(
+        alpha=float(args.get("ppr_alpha", "0.5")),
+        eps=float(args.get("ppr_eps", "0.0001")),
+        max_ppr_nodes=int(max_ppr_nodes),
+        num_neighbors_per_hop=int(num_neighbors_per_hop),
+        max_fetch_iterations=_parse_optional_int(args.get("ppr_max_fetch_iterations")),
+    )


Instead of this, can we encode PPRSamplerOptions as a JSON dict in the config?

Also, IMO it's a bit weird to have the sampling parameterized like this, since I don't think the model can / should be the same for PPR vs. khop sampling, right?
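One way the JSON-dict suggestion could look, as a sketch only. The `ppr_sampler_options` key and the `parse_sampler_options_json` helper are hypothetical, and the dataclass below is a local stand-in whose field names and defaults are copied from the diff above rather than from the real `PPRSamplerOptions`.

```python
import json
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class PPRSamplerOptions:
    """Local stand-in; defaults mirror the values shown in the diff."""

    alpha: float = 0.5
    eps: float = 1e-4
    max_ppr_nodes: int = 50
    num_neighbors_per_hop: int = 1000
    max_fetch_iterations: Optional[int] = None


def parse_sampler_options_json(
    args: Mapping[str, str],
) -> Optional[PPRSamplerOptions]:
    """Parse PPR options from a single JSON-encoded task arg.

    A missing or empty value returns None, which keeps k-hop sampling.
    """
    raw = args.get("ppr_sampler_options", "").strip()  # hypothetical key
    if not raw:
        return None
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("ppr_sampler_options must be a JSON object")
    allowed = {f.name for f in fields(PPRSamplerOptions)}
    unknown = set(payload) - allowed
    if unknown:
        raise ValueError(f"Unknown PPR sampler options: {sorted(unknown)}")
    return PPRSamplerOptions(**payload)


opts = parse_sampler_options_json(
    {"ppr_sampler_options": '{"alpha": 0.25, "max_ppr_nodes": 100}'}
)
```

Rejecting unknown keys avoids the alias handling in the flat-argument version (`ppr_max_nodes` vs. `ppr_max_ppr_nodes`), at the cost of a stricter config format.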

mkolodner-sc · 2026-05-29T20:51:57Z

/e2e_test

github-actions · 2026-05-29T20:52:09Z

GiGL Automation

@ 20:52:09UTC : 🔄 E2E Test started.

@ 22:16:13UTC : ✅ Workflow completed successfully.

kmontemayor2-sc

Actually, is there a reason we want a full e2e test here? Why not just extend the graph store integration test 1?

We don't really need the "full" e2e test suite here, right? And I feel like doing it this way makes our examples more confusing.

kmontemayor2-sc · 2026-05-29T21:04:55Z

Also, IMO it's a bit weird to have the sampling parameterized like this, since I don't think the model can / should be the same for PPR vs. khop sampling, right?

Ping on this :)

mkolodner-sc added 2 commits May 29, 2026 17:04

Add graph-store PPR E2E wiring (a49a650)
Merge branch 'mkolodner-sc/ppr_gs_memory' into mkolodner-sc/graph_store_ppr_e2e (8c1dd36)

mkolodner-sc commented May 29, 2026

mkolodner-sc added 5 commits May 29, 2026 17:18

Merge branch 'mkolodner-sc/ppr_gs_memory' into mkolodner-sc/graph_store_ppr_e2e (851ed8b)
Merge branch 'mkolodner-sc/ppr_gs_memory' into mkolodner-sc/graph_store_ppr_e2e (ab6aecd)
Merge branch 'mkolodner-sc/ppr_gs_memory' into mkolodner-sc/graph_store_ppr_e2e (ee5806b)
Merge branch 'mkolodner-sc/ppr_gs_memory' into mkolodner-sc/graph_store_ppr_e2e (98bb3f9)
Merge branch 'mkolodner-sc/ppr_gs_memory' into mkolodner-sc/graph_store_ppr_e2e (f0e3275)

kmontemayor2-sc reviewed May 29, 2026

mkolodner-sc added 2 commits May 29, 2026 20:30

Merge branch 'mkolodner-sc/ppr_gs_memory' into mkolodner-sc/graph_store_ppr_e2e (a24e32a)
Configure graph-store PPR sampler options inline (2f35f22)

kmontemayor2-sc reviewed May 29, 2026

Clarify graph-store PPR sampler args (188525f)

mkolodner-sc wants to merge 10 commits into mkolodner-sc/ppr_gs_memory from mkolodner-sc/graph_store_ppr_e2e

2 participants

