
fix: fetch schema over network to ensure @unique check across groups #9596

Open
eileenaaa wants to merge 3 commits into dgraph-io:main from eileenaaa:fix/sync-schema-on-unique-fail

Conversation

Contributor

@eileenaaa eileenaaa commented Feb 10, 2026

Description

This PR fixes an issue where the @unique constraint is bypassed in multi-group Dgraph environments. This occurs when a mutation is processed by an Alpha node that does not own the predicate, because the local schema state lacks the necessary metadata for "remote" predicates (see #9565).

To address this, I've implemented an on-demand fetching mechanism:

  • On-Demand Fetching: If a predicate is missing from the local schema.State, the Alpha node now fetches the schema over the network from the owner group using worker.GetSchemaOverNetwork.
  • Per-Request Cache: Introduced a repaired map within addQueryIfUnique to cache these network lookups for the duration of the request. This prevents RPC amplification and ensures performance remains stable during large mutations.
  • Scope Isolation: The fetched metadata is only used for the current request context, preventing any potential stale data issues in the global schema state.

This ensures that the @unique constraint is correctly validated cluster-wide, regardless of which group handles the mutation.
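
For orientation, here is a minimal sketch of that flow, reusing the names visible in the change (repaired, fullPred, pb.SchemaRequest, worker.GetSchemaOverNetwork). The package placement, the uniqueForPredicate wrapper, and the hasUniqueDirective helper are illustrative stand-ins, not the actual patch:

    // Placement is illustrative; the real change lives alongside addQueryIfUnique.
    package edgraph

    import (
        "context"

        "github.com/golang/glog"

        // Module version suffix may differ on main.
        "github.com/dgraph-io/dgraph/v24/protos/pb"
        "github.com/dgraph-io/dgraph/v24/worker"
    )

    // uniqueForPredicate reports whether fullPred carries @unique. It consults the
    // per-request cache first and only falls back to a network schema fetch when
    // the predicate is not known locally (e.g. it is owned by another group).
    func uniqueForPredicate(ctx context.Context, repaired map[string]bool, fullPred string) bool {
        if unique, checked := repaired[fullPred]; checked {
            return unique
        }
        unique := false
        schReq := &pb.SchemaRequest{Predicates: []string{fullPred}}
        remoteNodes, err := worker.GetSchemaOverNetwork(ctx, schReq)
        if err != nil {
            // Current behavior (discussed in the review below): warn and skip the check.
            glog.Warningf("unique validation: failed to fetch schema for %q: %v", fullPred, err)
        } else {
            for _, node := range remoteNodes {
                if node.Predicate == fullPred {
                    unique = hasUniqueDirective(node) // stand-in for reading @unique off the fetched metadata
                }
            }
        }
        repaired[fullPred] = unique // cached for the remainder of this request only
        return unique
    }

    // hasUniqueDirective is a placeholder for however the fetched schema node
    // exposes the @unique directive; the real field or accessor may differ.
    func hasUniqueDirective(node *pb.SchemaNode) bool {
        _ = node
        return false
    }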

Checklist

  • The PR title follows the Conventional Commits syntax, leading with fix:, feat:, chore:, ci:, etc.
  • Code compiles correctly and linting (via trunk) passes locally
  • Tests added for new functionality, or regression tests for bug fixes added as applicable

    unique, predChecked := repaired[fullPred]
    if !predChecked {
        schReq := &pb.SchemaRequest{Predicates: []string{fullPred}}
        remoteNodes, err := worker.GetSchemaOverNetwork(ctx, schReq)
Contributor

@matthewmcneely when GetSchemaOverNetwork fails we currently skip the unique check and let the mutation through, so duplicates can still happen on schema fetch failure.
Should we instead fail the mutation when the schema fetch fails, or keep the current behavior?

Contributor

A few points here:

  1. An optimization could be made so that the schema request only happens once, not per predicate. But it's probably OK as is, since the number of unique predicates in a schema is usually limited.

  2. I don't think aborting the transaction outright is ideal, but at a minimum we should log the fact that the schema could not be acquired, e.g.,

    if err != nil {
        glog.Warningf("unique validation failed to fetch schema for predicate %s: %v", fullPred, err)
    }
    

Contributor Author

@eileenaaa eileenaaa Feb 13, 2026

Hi @shiva-istari @matthewmcneely,

I've updated the PR to address your feedback. Here’s a summary of the changes:

  • Optimized Network Fetching (Batching): Instead of fetching schema per predicate, I've implemented a pre-scan mechanism. The Alpha node now collects all predicates missing from the local state and performs a single GetSchemaOverNetwork call for the entire request. This significantly reduces RPC amplification.
  • Added Error Logging: I've added glog.Warningf when the network fetch fails, including the list of failed predicates for better observability.
  • Strictness vs. Availability: Currently, if the network fetch fails, it logs a warning and allows the mutation to proceed (bypassing the unique check).

I've kept the current "allow on failure" behavior for now to avoid breaking availability.
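
For reference, a rough sketch of that batched pre-scan, continuing with the same imports and hasUniqueDirective stand-in as the sketch in the description; predicatesInMutation and inLocalSchemaState are likewise illustrative names, not the actual helpers:

    // Collect every predicate in this request that the local schema state does
    // not know about, then resolve all of them with a single network call.
    missing := make([]string, 0)
    for _, pred := range predicatesInMutation { // illustrative: the request's predicate list
        if !inLocalSchemaState(pred) { // illustrative: the local schema.State lookup
            missing = append(missing, pred)
        }
    }
    if len(missing) > 0 {
        remoteNodes, err := worker.GetSchemaOverNetwork(ctx, &pb.SchemaRequest{Predicates: missing})
        if err != nil {
            // Current behavior: warn and proceed, which skips the unique check
            // for any predicate we could not resolve.
            glog.Warningf("unique validation: could not fetch schema for %v: %v", missing, err)
        } else {
            for _, node := range remoteNodes {
                repaired[node.Predicate] = hasUniqueDirective(node)
            }
        }
    }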

Please take another look. Thanks!

Contributor

Hi @eileenaaa... Thanks for this, it's looking good. The more I think about it, the more I think we should really enforce the unique check even if that means aborting the transaction if the schema is n/a. We'll discuss it as a team next week and get back to you.
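
If the team lands on strict enforcement, the failure branch in the sketches above would return an error to the caller instead of only warning. One possible shape, using errors from github.com/pkg/errors and not a committed decision:

    if err != nil {
        // Strict variant: abort the mutation rather than silently skipping the
        // @unique check when the owner group's schema cannot be fetched.
        return errors.Wrapf(err, "cannot validate @unique: schema fetch failed for %v", missing)
    }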
