Skip to content

Add graph-store PPR E2E wiring#655

Draft
mkolodner-sc wants to merge 10 commits into
mkolodner-sc/ppr_gs_memoryfrom
mkolodner-sc/graph_store_ppr_e2e
Draft

Add graph-store PPR E2E wiring#655
mkolodner-sc wants to merge 10 commits into
mkolodner-sc/ppr_gs_memoryfrom
mkolodner-sc/graph_store_ppr_e2e

Conversation

@mkolodner-sc
Copy link
Copy Markdown
Collaborator

@mkolodner-sc mkolodner-sc commented May 29, 2026

Summary

Stacked on #645.

Enables PPR sampling in the existing homogeneous Graph Store training and inference entrypoints, then adds a short E2E test that exercises that path.

Changes include:

  • Add parse_sampler_options for sampler_type: ppr task args.
  • Thread optional SamplerOptions through homogeneous Graph Store training and inference loaders.
  • Keep k-hop as the default behavior when no sampler type is provided.
  • Default Graph Store local_world_size from GraphStoreInfo and fail fast when it disagrees with the cluster topology.
  • Add the homogeneous Cora Graph Store PPR E2E config, E2E test registration, and Makefile target.


logger = Logger()

# Default number of inference processes per machine incase one isnt provided in inference args
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Graph Store mode, the source of truth should be cluster_info.num_processes_per_compute, not a local CPU/GPU heuristic. The previous fallback could make inference spawn a different number of compute processes than storage expected, causing storage rendezvous failures like “only N/M clients joined.”

Comment thread gigl/utils/sampling.py Outdated
Comment on lines +102 to +129
def parse_sampler_options(args: Mapping[str, str]) -> Optional[SamplerOptions]:
sampler_type = args.get("sampler_type", "khop").strip().lower().replace("-", "_")
if sampler_type == "":
sampler_type = "khop"

if sampler_type in {"khop", "k_hop", "neighbor", "neighbor_sampler"}:
return None

if sampler_type != "ppr":
raise ValueError(
f"Unsupported sampler_type={sampler_type}. Expected one of: khop, ppr."
)

max_ppr_nodes = args.get("ppr_max_nodes")
if max_ppr_nodes is None:
max_ppr_nodes = args.get("ppr_max_ppr_nodes", "50")

num_neighbors_per_hop = args.get("ppr_neighbors_per_hop")
if num_neighbors_per_hop is None:
num_neighbors_per_hop = args.get("ppr_num_neighbors_per_hop", "1000")

return PPRSamplerOptions(
alpha=float(args.get("ppr_alpha", "0.5")),
eps=float(args.get("ppr_eps", "0.0001")),
max_ppr_nodes=int(max_ppr_nodes),
num_neighbors_per_hop=int(num_neighbors_per_hop),
max_fetch_iterations=_parse_optional_int(args.get("ppr_max_fetch_iterations")),
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this can we encode PPRSamplerOptions as a json dict in the config?

Also IMO it's a bit weird to have the sampling parameterized like this since I don't think the model can / should be the same for PPR vs khop sampling right?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also IMO it's a bit weird to have the sampling parameterized like this since I don't think the model can / should be the same for PPR vs khop sampling right?

Ping on this :)

Comment thread gigl/utils/sampling.py Outdated
Comment thread gigl/utils/sampling.py Outdated
@mkolodner-sc
Copy link
Copy Markdown
Collaborator Author

/e2e_test

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

GiGL Automation

@ 20:52:09UTC : 🔄 E2E Test started.

@ 22:16:13UTC : ✅ Workflow completed successfully.

Copy link
Copy Markdown
Collaborator

@kmontemayor2-sc kmontemayor2-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, is there a reason we want a full e2e test here? Why not just example the graph store integration test 1?

We don't really need the "full" e2e test suite here right? And I feel like doing it this way makes our examples more confusing.

Comment thread gigl/utils/sampling.py Outdated
Comment on lines +102 to +129
def parse_sampler_options(args: Mapping[str, str]) -> Optional[SamplerOptions]:
sampler_type = args.get("sampler_type", "khop").strip().lower().replace("-", "_")
if sampler_type == "":
sampler_type = "khop"

if sampler_type in {"khop", "k_hop", "neighbor", "neighbor_sampler"}:
return None

if sampler_type != "ppr":
raise ValueError(
f"Unsupported sampler_type={sampler_type}. Expected one of: khop, ppr."
)

max_ppr_nodes = args.get("ppr_max_nodes")
if max_ppr_nodes is None:
max_ppr_nodes = args.get("ppr_max_ppr_nodes", "50")

num_neighbors_per_hop = args.get("ppr_neighbors_per_hop")
if num_neighbors_per_hop is None:
num_neighbors_per_hop = args.get("ppr_num_neighbors_per_hop", "1000")

return PPRSamplerOptions(
alpha=float(args.get("ppr_alpha", "0.5")),
eps=float(args.get("ppr_eps", "0.0001")),
max_ppr_nodes=int(max_ppr_nodes),
num_neighbors_per_hop=int(num_neighbors_per_hop),
max_fetch_iterations=_parse_optional_int(args.get("ppr_max_fetch_iterations")),
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also IMO it's a bit weird to have the sampling parameterized like this since I don't think the model can / should be the same for PPR vs khop sampling right?

Ping on this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants