Conversation
Pull request overview
Refactors sync-function invocation preparation in resync-related code paths to reuse prepareSyncFn instead of older revision-body helpers.
Changes:
- Updated resync processing to call `prepareSyncFn` before `getChannelsAndAccess`.
- Updated active-rev sync-function recalculation to use `prepareSyncFn` instead of `getAvailable1xRev` + unmarshal.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| db/database.go | Switches per-leaf resync preparation to prepareSyncFn prior to running sync/channel/access calculation. |
| db/crud.go | Switches active-rev recalculation body preparation to prepareSyncFn. |
db/database.go
Outdated
```diff
 changed := 0
 doc.History.forEachLeaf(func(rev *RevInfo) {
-	bodyBytes, _, err := db.get1xRevFromDoc(ctx, doc, rev.ID, false)
+	body, metaMap, _, err := db.prepareSyncFn(doc, doc)
```
prepareSyncFn(doc, doc) inside forEachLeaf always prepares the current document body/rev (newRevID = doc.RevID), but the subsequent getChannelsAndAccess(..., rev.ID) is meant to evaluate each leaf revision. This mismatch means conflicting leaf revisions will all be re-synced using the same body/_rev, producing incorrect channel/access results. Consider loading the body for rev.ID (as the previous code did) or introducing a helper that prepares a sync-fn body for an arbitrary revID so BodyRev matches rev.ID.
db/database.go
Outdated
```diff
-	metaMap, err := doc.GetMetaMap(db.UserXattrKey())
-	if err != nil {
-		return
+	base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
```
If prepareSyncFn returns an error, the callback currently logs but continues into getChannelsAndAccess with potentially nil/invalid body/metaMap, which can lead to incorrect results or runtime errors in the sync function invocation. Return early from the leaf callback when preparation fails.
Suggested change:

```diff
-	base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
+	// Skip this leaf when sync function preparation fails so the sync function
+	// is not invoked with invalid or incomplete inputs.
+	base.WarnfCtx(ctx, "Error preparing sync function for document %q: %v", base.UD(docid), err)
+	return
```
db/crud.go
Outdated
```go
curBody, _, _, err := db.prepareSyncFn(doc, doc)
if err != nil {
	return
}
```
prepareSyncFn unmarshals user xattrs via doc.GetMetaMap(...) and validates the body, but this call site discards the returned metaMap and only needs the current body. This adds extra work and can introduce a new failure path (e.g., invalid rawUserXattr) even though metaMap is already provided to this function. Consider using a lighter-weight helper to get a mutable body (or refactor prepareSyncFn/add a variant) that doesn't re-fetch meta when the caller already has it.
db/database.go
Outdated
```diff
-	metaMap, err := doc.GetMetaMap(db.UserXattrKey())
-	if err != nil {
-		return
+	base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
```
This warning logs the document ID without redaction (docid). Since document IDs are user data, wrap it with base.UD(...) (and preferably use a %q-style format consistent with nearby logs) to avoid leaking sensitive data in logs.
Suggested change:

```diff
-	base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
+	base.WarnfCtx(ctx, "Error preparing sync function for document %q: %v", base.UD(docid), err)
```
@bbrks I've reverted the use of prepareSyncFn in both the functions for the following reasons:
So I just decided to remove the keys that are not required for the sync function after the
db/database.go
Outdated
```go
// removing the following fields as these fields are not required for sync function
if _, ok := body[BodyAttachments]; ok {
	delete(body, BodyAttachments)
}
```
In this resync path we first call get1xRevFromDoc, which injects _attachments into the JSON bytes when attachments exist, and then immediately unmarshal and delete _attachments. This adds avoidable CPU/memory overhead during resync. Consider fetching the revision body without stamping _attachments (e.g., via getRevision/getAvailableRev and then only injecting _id/_rev/_deleted as needed, or by introducing a helper that builds the sync-function input body without attachments).
db/database.go
Outdated
```go
if _, ok := body[BodyRevisions]; ok {
	delete(body, BodyRevisions)
}
```
This block removes _revisions before calling the sync function, but get1xRevFromDoc is called with listRevisions=false in this function, so _revisions should not be injected in the first place. If you’re guarding against legacy bodies containing _revisions, please add a brief comment explaining that; otherwise this can be removed to reduce noise and keep behavior consistent with the rest of the code.
db/crud.go
Outdated
```go
// removing _attachments, as attachments are not required to be passed
// into sync function
if _, ok := curBody[BodyAttachments]; ok {
	delete(curBody, BodyAttachments)
}
```
Similar to resync: getAvailable1xRev stamps _attachments into the JSON bytes, then this code unmarshals and deletes _attachments before running the sync function. If attachments are intentionally excluded from sync, consider avoiding injecting them earlier (e.g., a variant of getAvailable1xRev that doesn’t include attachments, or a helper to build the sync-function input body).
db/database.go
Outdated
```go
// removing the following fields as these fields are not required for sync function
if _, ok := body[BodyAttachments]; ok {
	delete(body, BodyAttachments)
}
if _, ok := body[BodyRevisions]; ok {
	delete(body, BodyRevisions)
}
```
These changes alter the document body passed into the sync function during resync / active-rev recalculation (specifically excluding _attachments, and potentially _revisions). There are existing tests for getResyncedDocument in db/database_test.go; please add coverage asserting the sync function does not see these fields in these code paths to prevent regressions.
db/crud.go
Outdated
```go
// removing _attachments, as attachments are not required to be passed
// into sync function
if _, ok := curBody[BodyAttachments]; ok {
	delete(curBody, BodyAttachments)
}
```
Please add/extend tests in db/crud_test.go to cover recalculateSyncFnForActiveRev ensuring the sync function input body does not include _attachments (and documenting/covering whether _revisions should be present or not). This is a behavior change that’s easy to regress.
Suggested change:

```diff
-	// removing _attachments, as attachments are not required to be passed
-	// into sync function
-	if _, ok := curBody[BodyAttachments]; ok {
-		delete(curBody, BodyAttachments)
-	}
+	// Recalculating channel/access for an older active revision must not expose
+	// attachment metadata to the sync function. `_attachments` is stripped here
+	// to keep this path aligned with the body shape expected by sync-function
+	// evaluation and to make the behavior explicit in this regression-prone area.
+	//
+	// Note that `_revisions` is intentionally not removed in this step; whether it
+	// is present is determined by the revision body returned from storage and any
+	// downstream body normalization, not by this attachment-specific safeguard.
+	delete(curBody, BodyAttachments)
```
```go
if err != nil {
	return
}
```
The PR description says resync paths were refactored to use prepareSyncFn instead of getAvailable1xRev/get1xRevFromDoc, but in getResyncedDocument we still build the sync-function body via get1xRevFromDoc (then strip fields). Either update this resync path to use prepareSyncFn (or an equivalent helper) or adjust the PR description to match the actual change.
Redocly previews
Pull request overview
Copilot reviewed 69 out of 71 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
db/crud.go:2269
- recalculateSyncFnForActiveRev still contains commented-out code paths (old getAvailable1xRev flow and the commented delete(_attachments)). Please remove these commented blocks or implement the intended behavior in a single place (ideally inside prepareDocForSyncFn) to keep the sync-fn preparation logic consistent.
```go
		access = nil
		roles = nil
	}
	return
}

func (db *DatabaseCollectionWithUser) addAttachments(ctx context.Context, newAttachments updatedAttachments) error {
	// Need to check and add attachments here to ensure the attachment is within size constraints
	err := db.setAttachments(ctx, newAttachments)
	if err != nil {
		if errors.Is(err, ErrAttachmentTooLarge) || err.Error() == "document value was too large" {
			err = base.HTTPErrorf(http.StatusRequestEntityTooLarge, "Attachment too large")
		} else {
			err = errors.Wrap(err, "Error adding attachment")
		}
	}
```
db/database.go
Outdated
```go
//bodyBytes, _, err := db.get1xRevFromDoc(ctx, doc, rev.ID, false)
//if err != nil {
//	base.WarnfCtx(ctx, "Error getting rev from doc %s/%s %s", base.UD(docid), rev.ID, err)
//}
//var body Body
//if err := body.Unmarshal(bodyBytes); err != nil {
//	base.WarnfCtx(ctx, "Error unmarshalling body %s/%s for sync function %s", base.UD(docid), rev.ID, err)
//	return
//}
//metaMap, err := doc.GetMetaMap(db.UserXattrKey())
//if err != nil {
//	return
//}
//
//// removing the following fields as these fields are not required for sync function
//delete(body, BodyAttachments)
//delete(body, BodyRevisions)
```
```diff
 func (db *DatabaseCollectionWithUser) recalculateSyncFnForActiveRev(ctx context.Context, doc *Document, metaMap map[string]any, newRevID string) (channelSet base.Set, access, roles channels.AccessMap, syncExpiry *uint32, oldBodyJSON string, err error) {
 	// In some cases an older revision might become the current one. If so, get its
 	// channels & access, for purposes of updating the doc:
-	curBodyBytes, err := db.getAvailable1xRev(ctx, doc, doc.GetRevTreeID())
-	if err != nil {
-		return
-	}
-
-	var curBody Body
-	err = curBody.Unmarshal(curBodyBytes)
+	//curBodyBytes, err := db.getAvailable1xRev(ctx, doc, doc.GetRevTreeID())
+	//if err != nil {
+	//	return
+	//}
+	//
+	//var curBody Body
+	//err = curBody.Unmarshal(curBodyBytes)
+	curBody, _, _, err := db.prepareDocForSyncFn(ctx, doc, nil, doc.GetRevTreeID(), true, false)
 	if err != nil {
 		return
```
```go
	}

	// If using xattrs and this isn't an SG write, we shouldn't attempt to cache.
	if syncData == nil {
```
```go
tbpOptions.TeardownFuncs = append(tbpOptions.TeardownFuncs, func() {
	if len(globalBlipTesterClients.m) != 0 {
		// must panic to bubble up through test harness
		panic(fmt.Sprintf("%v active blip tester clients should be 0 at end of tests", globalBlipTesterClients.m))
	}
})
db.BypassReleasedSequenceWait.Store(true)
serverContextGlobalsInitialized.Store(true)
db.TestBucketPoolWithIndexes(ctx, m, tbpOptions)
```
```diff
 	response = rt.Send(RequestByUser("GET", url, "", username))
 }
 assert.NoError(c, base.JSONUnmarshal(response.Body.Bytes(), &changes))
-assert.Len(c, changes.Results, numChangesExpected, "Expected %d changes, got %d changes", numChangesExpected, len(changes.Results))
+assert.Len(c, changes.Results, numChangesExpected, "Expected %d changes, got %s changes", numChangesExpected, changes.Summary())
 }, waitTime, 10*time.Millisecond)
```
```diff
 if s.RevAndVersion.CurrentVersion != "" || s.RevAndVersion.CurrentSource != "" {
 	extractedCV, err := cv.ExtractCV()
 	if !errors.Is(err, base.ErrNotFound) {
 		if err != nil {
-			base.InfofCtx(ctx, base.KeyImport, "Unable to extract cv during IsSGWrite write check - skipping cv match check: %v", err)
-			return true, true, false
+			base.InfofCtx(ctx, base.KeyImport, "Unable to extract cv during IsSGWrite check, document will not be processed: %v", err)
+			return false, false, false
 		}
```
Force-pushed fcf6b13 to 5b832b6
db/database.go
Outdated
```go
body, metaMap, _, err := db.prepareDocForSyncFn(ctx, doc, nil, rev.ID, true, false)
if err != nil {
	base.WarnfCtx(ctx, "Unable to prepare doc for rev %d: %v", rev.ID, err)
```
```go
var ancestorRevID string
bodyBytes, ancestorRevID, _, err = db.getAvailableRev(ctx, doc, revID)
if err != nil {
	return
```
db/crud.go
Outdated
```go
newRevID = revID

mutableBody[BodyId] = doc.ID
mutableBody[BodyRev] = newRevID
```
bbrks left a comment
This looks pretty good now - just a few small things
rest/admin_api_auth_test.go
Outdated
```go
{
	Method:   "POST",
	Endpoint: "/db/_offline",
	Users:    []string{syncGatewayConfigurator},
},
{
	Method:   "POST",
	Endpoint: "/db/_online",
	Users:    []string{syncGatewayConfigurator},
},
```
This change is undoing part of the change made on main in #8154
I think we've already covered how to do rebases to resolve conflicts with force pushes, but this squash merge either doesn't seem to be doing the right thing or is error-prone.
```go
mutableBodyForSyncFn, err := newDoc.GetDeepMutableBody()
if err != nil {
	return
}
```
Why does this codepath require a mutable body (a deep copy - not cheap to make), whereas others can just use doc.Body(ctx) to pass in?
I don't think the comment alone is good justification, since _rawBody is a lazily marshalled version of the doc and can always be reproduced from the unmarshalled version when needed. Curious what the previous validation behavior we're preserving is, too - is that just JSON marshalling validation?
We're paying the cost of the DeepCopy already inside prepareDocForSyncFn to make any body being passed in a mutable body so this feels like duplicated effort in this case.
There were a few tests that were failing because I initially did not perform a mutable copy. The body would not be sanitized for the next test case and would fail because of this. The only difference between prepareSyncFn and prepareDocForSyncFn was making the mutable copy. The tests that failed were TestImportInternalPropertiesHandling and TestBackupOldRevisionWithAttachments.
There was a downstream function that required newDoc's rawBody to be present; postWriteUpdateHLV was one function that I observed, and I didn't check the other downstream functions. Updating the rawDoc was only possible if newDoc.GetDeepMutableBody() was invoked. Since I was not passing newDoc, I couldn't do this inside prepareDocForSyncFn. That's why I called the GetDeepMutableBody method outside.
I missed removing the deep copy inside prepareDocForSyncFn, will remove that. I would like to know what more explanation you would like in the comment.
```go
if getAncestor {
	var bodyBytes []byte
	var ancestorRevID string
	bodyBytes, ancestorRevID, _, err = db.getAvailableRev(ctx, doc, revID)
	if err != nil {
		return
	}
	err = mutableBody.Unmarshal(bodyBytes)
	if err != nil {
		return
	}
	if ancestorRev, ok := doc.History[ancestorRevID]; ok && ancestorRev != nil && ancestorRev.Deleted {
		mutableBody[BodyDeleted] = true
	}
	newRevID = ancestorRevID
```
There was a problem hiding this comment.
This bool feels like the callers need to be doing the getAvailableRev bit and pass in the ancestorRevID and tombstone just like a regular document.
What makes the ancestor case special that it has to be handled differently here? Just avoiding the validateNewBody work?
Could we perhaps add a shouldValidate or isNewDoc bool instead to control that behavior?
The ancestor case has to be separate for recalculateSyncFnForActiveRev, which is used to calculate the previous rev's access that will be used to handle current-rev conflicts. And for the resync case, we need ancestors because we need to recompute the access for all the revisions in the history.
This was previously done separately in their respective functions. Since we wanted to ensure the use of prepareDocForSyncFn for all scenarios, I had to add this ancestor case.
The other case is just doing the same thing prepareSyncFn used to do before.
I had to add the tombstone bool because newDoc contains whether the document is deleted or not, which is required for the non-ancestor case. So I'm not sure if this can be controlled by a single bool value; I will still need two bool values to ensure the behavior doesn't change.
CBG-5061

`prepareSyncFn` with new function signature `prepareDocForSyncFn`

Pre-review checklist

- `fmt.Print`, `log.Print`, ...
- `base.UD(docID)`, `base.MD(dbName)`
- `docs/api`

Dependencies (if applicable)

Integration Tests