Add sparse relation-aware Graph Transformer signals #652

Draft
yliu2-sc wants to merge 7 commits into main from codex/gt-clean-no-graph-edges

@yliu2-sc
Collaborator

Scope of work done

This PR adds several opt-in Graph Transformer improvements that make the encoder more relation-aware without enabling the graph-edge hard attention mask path.

  • Adds sparse pairwise_nonmissing_indices for pairwise structural attention bias so missing pairs can be distinguished from nonmissing pairs without carrying an extra dense nonmissing mask.
  • Zero-initializes anchor-relative and pairwise structural attention-bias projection weights so additive structural logit bias starts neutral and can be learned only if useful.
  • Adds sparse relation-aware attention logits via relation_attention_mode="edge_type_bilinear". Relation edges are represented as sparse (batch_idx, query_pos, key_pos, relation_idx) coordinates, with source -> target mapped to query=target, key=source.
  • Adds relation_value_mode="sparse_residual_gate", a zero-initialized relation-specific value residual path that lets relation type affect message content without replacing the main SDPA attention implementation.
  • Adds edge_attr_attention_bias_mode="sparse_linear", an opt-in sparse edge-attribute-to-per-head-logit bias path with zero-initialized per-relation projections.
  • Adds transform and encoder unit coverage for sparse nonmissing (NM) indices, relation ordering/direction, zero-init equivalence, nonzero relation probes, edge-attr payload validation, and edge-attr logit accumulation.
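
To make the coordinate convention concrete, here is a minimal sketch of how sparse (batch_idx, query_pos, key_pos, relation_idx) rows could turn into an additive per-head logit bias. It is not the code in this PR: the helper names `build_relation_coords` and `relation_logit_bias`, the tensor shapes, and the choice to scatter into a dense `[B, H, L, L]` bias passed to SDPA as `attn_mask` are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def build_relation_coords(batch_idx, src_pos, dst_pos, relation_idx):
    """Hypothetical helper: one row per edge, (batch_idx, query_pos, key_pos, relation_idx).

    A source -> target edge is stored with query=target and key=source,
    so the target token attends to the source token.
    """
    return torch.stack([batch_idx, dst_pos, src_pos, relation_idx], dim=1)


def relation_logit_bias(q, k, coords, rel_weight):
    """Hypothetical bilinear relation bias, evaluated only on listed edges.

    q, k:        [B, H, L, D] per-head queries and keys
    coords:      [E, 4] int64 rows from build_relation_coords
    rel_weight:  [R, H, D, D] per-relation, per-head bilinear weights
    returns:     [B, H, L, L] additive logit bias, zero off the listed edges
    """
    B, H, L, _ = q.shape
    b, qi, ki, r = coords.unbind(dim=1)
    q_e = q.transpose(1, 2)[b, qi]  # [E, H, D] query vectors at edge targets
    k_e = k.transpose(1, 2)[b, ki]  # [E, H, D] key vectors at edge sources
    w_e = rel_weight[r]             # [E, H, D, D] weights for each edge's relation
    score = torch.einsum("ehd,ehdf,ehf->eh", q_e, w_e, k_e)  # [E, H]
    bias = q.new_zeros(B, L, L, H)
    # accumulate=True sums contributions when several relations share a pair
    bias = bias.index_put((b, qi, ki), score, accumulate=True)
    return bias.permute(0, 3, 1, 2).contiguous()


# Usage: a float attn_mask is added to the attention logits by SDPA.
# out = F.scaled_dot_product_attention(q, k, v, attn_mask=relation_logit_bias(...))
```

With `rel_weight` initialized to zeros the bias is identically zero, which is the property the zero-init equivalence tests rely on. A production implementation may well avoid materializing the dense bias; the sketch only shows the direction and accumulation semantics.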

Explicitly out of scope: graph-edge hard attention masking. That path is intentionally not included in this PR because the recent experiments showed runtime/memory issues that need a separate design pass.

Implementation notes

  • Default behavior remains unchanged unless the new modes are configured.
  • Relation-aware logits and relation value gates initialize to zero, preserving baseline outputs at initialization.
  • Sparse relation and edge-attr payloads are built before relation identity is lost in to_homogeneous().
  • Existing dense pairwise_bias stays unchanged; this PR only removes the extra dense NM-specific mask and uses sparse coordinates for the additional nonmissing bias.
  • The main attention path still uses PyTorch SDPA. Relation value gating is applied as a sparse residual after SDPA rather than forcing a dense manual attention implementation.
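
As a rough illustration of the last two notes, the sketch below keeps SDPA as the main path and adds a zero-initialized, relation-specific residual on top of its output. The module name `SparseRelationValueGate`, the per-relation per-head gate shape, and the elementwise gating of source values are assumptions for this example, not the PR's actual parameterization.

```python
import torch
import torch.nn.functional as F


class SparseRelationValueGate(torch.nn.Module):
    """Hypothetical sparse residual gate applied after SDPA."""

    def __init__(self, num_relations, num_heads, head_dim):
        super().__init__()
        # Zero init: the residual vanishes, so outputs match plain SDPA at start.
        self.gate = torch.nn.Parameter(torch.zeros(num_relations, num_heads, head_dim))

    def forward(self, q, k, v, coords):
        # q, k, v: [B, H, L, D]; coords: [E, 4] rows of
        # (batch_idx, query_pos, key_pos, relation_idx)
        out = F.scaled_dot_product_attention(q, k, v)  # main path is untouched
        B, H, L, D = v.shape
        b, qi, ki, r = coords.unbind(dim=1)
        v_src = v.transpose(1, 2)[b, ki]  # [E, H, D] values at edge sources
        msg = self.gate[r] * v_src        # [E, H, D] relation-gated messages
        residual = out.new_zeros(B, L, H, D)
        # Sum messages into their target (query) positions.
        residual = residual.index_put((b, qi), msg, accumulate=True)
        return out + residual.transpose(1, 2)
```

Because the residual touches only the positions listed in `coords`, its cost scales with the number of relation edges rather than with L², which is the motivation for not switching to a dense manual attention implementation.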

Where is the documentation for this feature?: N/A for this draft. I can add docs/changelog notes once we settle the final public interface names.

Did you add automated tests or write a test plan?

  • /Users/yliu2/Desktop/BAGL/.venv/bin/ruff check --config pyproject.toml gigl/transforms/graph_transformer.py gigl/nn/graph_transformer.py tests/unit/transforms/graph_transformer_test.py tests/unit/nn/graph_transformer_test.py
  • PYTHONPATH=/Users/yliu2/Desktop/BAGL/GiGL /Users/yliu2/Desktop/BAGL/.venv/bin/python -m unittest tests.unit.transforms.graph_transformer_test tests.unit.nn.graph_transformer_test

Updated Changelog.md? NO

Ready for code review?: NO - draft PR for design/API discussion first.

2 participants