fix: fetch schema over network to ensure @unique check across groups#9596
fix: fetch schema over network to ensure @unique check across groups#9596eileenaaa wants to merge 3 commits intodgraph-io:mainfrom
Conversation
edgraph/server.go
Outdated
| unique, predChecked := repaired[fullPred] | ||
| if !predChecked { | ||
| schReq := &pb.SchemaRequest{Predicates: []string{fullPred}} | ||
| remoteNodes, err := worker.GetSchemaOverNetwork(ctx, schReq) |
There was a problem hiding this comment.
@matthewmcneely when GetSchemaOverNetwork fails we currently skip the unique check and let the mutation through, so duplicates can still happen on schema fetch failure.
Should we instead fail the mutation when the schema fetch fails, or keep the current behavior?
There was a problem hiding this comment.
A few points here:
-
an optimization could be made so that the schema request only happens once, not per predicate. But it's probably OK as is as the number of
uniquesin a schema is usually limited. -
I don't think aborting the transaction outright is ideal, but at a minimum we should log the fact that the schema could not be acquired, e.g.,
if err != nil { glog.Warningf("unique validation failed to fetch schema for predicate %s: %v", fullPred, err) }
There was a problem hiding this comment.
Hi @shiva-istari @matthewmcneely,
I've updated the PR to address your feedback. Here’s a summary of the changes:
- Optimized Network Fetching (Batching): Instead of fetching schema per predicate, I've implemented a pre-scan mechanism. The Alpha node now collects all predicates missing from the local state and performs a single
GetSchemaOverNetworkcall for the entire request. This significantly reduces RPC amplification. - Added Error Logging: I've added
glog.Warningfwhen the network fetch fails, including the list of failed predicates for better observability. - Strictness vs. Availability: Currently, if the network fetch fails, it logs a warning and allows the mutation to proceed (bypassing the unique check).
I've kept the current "allow on failure" behavior for now to avoid breaking availability.
Please take another look. Thanks!
There was a problem hiding this comment.
Hi @eileenaaa... Thanks for this, it's looking good. The more I think about it, the more I think we should really enforce the unique check even if that means aborting the transaction if the schema is n/a. We'll discuss it as a team next week and get back to you.
edgraph/server.go
Outdated
| unique, predChecked := repaired[fullPred] | ||
| if !predChecked { | ||
| schReq := &pb.SchemaRequest{Predicates: []string{fullPred}} | ||
| remoteNodes, err := worker.GetSchemaOverNetwork(ctx, schReq) |
There was a problem hiding this comment.
A few points here:
-
an optimization could be made so that the schema request only happens once, not per predicate. But it's probably OK as is as the number of
uniquesin a schema is usually limited. -
I don't think aborting the transaction outright is ideal, but at a minimum we should log the fact that the schema could not be acquired, e.g.,
if err != nil { glog.Warningf("unique validation failed to fetch schema for predicate %s: %v", fullPred, err) }
Description
This PR fixes an issue where the
@uniqueconstraint is bypassed in multi-group Dgraph environments. This occurs when a mutation is processed by an Alpha node that does not own the predicate, as the local schema state lacks the necessary metadata for "remote" predicates.(see #9565)To address this, I've implemented an on-demand fetching mechanism:
schema.State, the Alpha node now fetches the schema over the network from the owner group usingworker.GetSchemaOverNetwork.repairedmap withinaddQueryIfUniqueto cache these network lookups for the duration of the request. This prevents RPC amplification and ensures performance remains stable during large mutations.This ensures that the
@uniqueconstraint is correctly validated cluster-wide, regardless of which group handles the mutation.Checklist
Conventional Commits syntax, leading
with
fix:,feat:,chore:,ci:, etc.