Add sparse relation-aware Graph Transformer signals by yliu2-sc · Pull Request #652 · Snapchat/GiGL

yliu2-sc · 2026-05-27T18:58:41Z

Scope of work done

This PR adds several opt-in Graph Transformer improvements that make the encoder more relation-aware without enabling the graph-edge hard attention mask path.

Adds sparse pairwise_nonmissing_indices for pairwise structural attention bias so missing pairs can be distinguished from nonmissing pairs without carrying an extra dense nonmissing mask.
Zero-initializes the anchor-relative and pairwise structural attention-bias projection weights so the additive structural logit bias starts neutral and is learned only if useful.
Adds sparse relation-aware attention logits via relation_attention_mode="edge_type_bilinear". Relation edges are represented as sparse (batch_idx, query_pos, key_pos, relation_idx) coordinates, with source -> target mapped to query=target, key=source.
Adds relation_value_mode="sparse_residual_gate", a zero-initialized relation-specific value residual path that lets relation type affect message content without replacing the main SDPA attention implementation.
Adds edge_attr_attention_bias_mode="sparse_linear", an opt-in sparse edge-attribute-to-per-head-logit bias path with zero-initialized per-relation projections.
Adds transform and encoder unit coverage for sparse nonmissing (NM) indices, relation ordering and direction, zero-init equivalence, nonzero relation probes, edge-attr payload validation, and edge-attr logit accumulation.
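
A minimal sketch of the sparse relation coordinate layout described above, assuming edges arrive as per-relation lists of (batch_idx, source_pos, target_pos) tuples. The function name build_relation_coords, the input layout, and the sorted-name relation ordering are illustrative assumptions, not the GiGL implementation; only the (batch_idx, query_pos, key_pos, relation_idx) layout and the query=target, key=source convention come from this PR.

```python
from typing import Dict, List, Tuple

# Hypothetical edge record: (batch_idx, source_pos, target_pos)
Edge = Tuple[int, int, int]
# Coordinate layout from the PR: (batch_idx, query_pos, key_pos, relation_idx)
Coord = Tuple[int, int, int, int]


def build_relation_coords(
    edges_by_relation: Dict[str, List[Edge]],
) -> Tuple[List[Coord], Dict[str, int]]:
    """Flatten per-relation edges into sparse attention coordinates.

    Assumption: relation indices follow sorted relation names so the
    mapping is deterministic across runs.
    """
    relation_to_idx = {name: i for i, name in enumerate(sorted(edges_by_relation))}
    coords: List[Coord] = []
    for name, relation_idx in relation_to_idx.items():
        for batch_idx, source_pos, target_pos in edges_by_relation[name]:
            # source -> target: the target attends (query) to the source (key).
            coords.append((batch_idx, target_pos, source_pos, relation_idx))
    return coords, relation_to_idx


coords, relation_to_idx = build_relation_coords(
    {
        "follows": [(0, 1, 2)],
        "blocks": [(0, 0, 1), (1, 3, 0)],
    }
)
```

Building the coordinates while relation names are still available mirrors the note that payloads must be assembled before to_homogeneous() drops relation identity.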

Explicitly out of scope: graph-edge hard attention masking. That path is intentionally not included in this PR because the recent experiments showed runtime/memory issues that need a separate design pass.

Implementation notes

Default behavior remains unchanged unless the new modes are configured.
Relation-aware logits and relation value gates initialize to zero, preserving baseline outputs at initialization.
Sparse relation and edge-attr payloads are built before relation identity is lost in to_homogeneous().
Existing dense pairwise_bias stays unchanged; this PR only removes the extra dense NM-specific mask and uses sparse coordinates for the additional nonmissing bias.
The main attention path still uses PyTorch SDPA. Relation value gating is applied as a sparse residual after SDPA rather than forcing a dense manual attention implementation.
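
The zero-init equivalence claimed above can be sketched as follows, assuming [batch, heads, seq, dim] tensors and a per-relation, per-head elementwise gate. The function name, the gate shape, and the unweighted form of the residual are assumptions for illustration; the PR only states that the residual is sparse, zero-initialized, and applied after SDPA.

```python
import torch
import torch.nn.functional as F


def sdpa_with_relation_value_residual(
    q: torch.Tensor,  # [B, H, S, D]
    k: torch.Tensor,  # [B, H, S, D]
    v: torch.Tensor,  # [B, H, S, D]
    coords: torch.Tensor,  # [E, 4] long: (batch_idx, query_pos, key_pos, relation_idx)
    rel_value_gate: torch.Tensor,  # [R, H, D], hypothetical gate, zero-initialized
) -> torch.Tensor:
    # The main path stays on PyTorch SDPA.
    out = F.scaled_dot_product_attention(q, k, v)
    if coords.numel() == 0:
        return out
    b, q_pos, k_pos, rel = coords.unbind(dim=1)
    # Gather the key-side values for each sparse edge: [E, H, D].
    edge_values = v.transpose(1, 2)[b, k_pos]
    # Relation-specific gating; all zeros at init, so the message is zero.
    messages = edge_values * rel_value_gate[rel]
    # Accumulate messages into the query rows (out-of-place, autograd-safe).
    out_bshd = out.transpose(1, 2).index_put((b, q_pos), messages, accumulate=True)
    return out_bshd.transpose(1, 2)


torch.manual_seed(0)
q, k, v = (torch.randn(2, 2, 4, 3) for _ in range(3))
coords = torch.tensor([[0, 2, 1, 0]])  # batch 0, query 2 reads key 1, relation 0
baseline = F.scaled_dot_product_attention(q, k, v)
out_zero = sdpa_with_relation_value_residual(q, k, v, coords, torch.zeros(1, 2, 3))
out_ones = sdpa_with_relation_value_residual(q, k, v, coords, torch.ones(1, 2, 3))
```

With an all-zero gate the output matches plain SDPA, which is the property the zero-init equivalence tests check; a nonzero gate changes only the query rows named in the sparse coordinates.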

Where is the documentation for this feature?: N/A for this draft. I can add docs/changelog notes once we settle the final public interface names.

Did you add automated tests or write a test plan?

/Users/yliu2/Desktop/BAGL/.venv/bin/ruff check --config pyproject.toml gigl/transforms/graph_transformer.py gigl/nn/graph_transformer.py tests/unit/transforms/graph_transformer_test.py tests/unit/nn/graph_transformer_test.py
PYTHONPATH=/Users/yliu2/Desktop/BAGL/GiGL /Users/yliu2/Desktop/BAGL/.venv/bin/python -m unittest tests.unit.transforms.graph_transformer_test tests.unit.nn.graph_transformer_test

Updated Changelog.md? NO

Ready for code review?: NO - draft PR for design/API discussion first.

mkolodner-sc and others added 7 commits May 19, 2026 01:00

ed818c2 potential fix
abb8e56 Update
a0e84fa Update
088fe1b Improvements
4b04e90 readout mode
977db41 updates
c03d5ed Add sparse relation-aware graph transformer signals

yliu2-sc wants to merge 7 commits into main from codex/gt-clean-no-graph-edges

2 participants
