Skip to content

CBG-5061: Sync Function Refactor#8152

Open
RIT3shSapata wants to merge 12 commits intomainfrom
CBG-5061
Open

CBG-5061: Sync Function Refactor#8152
RIT3shSapata wants to merge 12 commits intomainfrom
CBG-5061

Conversation

@RIT3shSapata
Copy link
Copy Markdown
Contributor

@RIT3shSapata RIT3shSapata commented Apr 7, 2026

CBG-5061

Describe your PR here...

  • Refactored prepareSyncFn with new function signature
  • made the document preparation for all the processes to use prepareDocForSyncFn

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Dependencies (if applicable)

  • Link upstream PRs
  • Update Go module dependencies when merged

Integration Tests

@RIT3shSapata RIT3shSapata requested a review from bbrks April 7, 2026 13:36
Copilot AI review requested due to automatic review settings April 7, 2026 13:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors sync-function invocation preparation in resync-related code paths to reuse prepareSyncFn instead of older revision-body helpers.

Changes:

  • Updated resync processing to call prepareSyncFn before getChannelsAndAccess.
  • Updated active-rev sync-function recalculation to use prepareSyncFn instead of getAvailable1xRev + unmarshal.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
db/database.go Switches per-leaf resync preparation to prepareSyncFn prior to running sync/channel/access calculation.
db/crud.go Switches active-rev recalculation body preparation to prepareSyncFn.

db/database.go Outdated
changed := 0
doc.History.forEachLeaf(func(rev *RevInfo) {
bodyBytes, _, err := db.get1xRevFromDoc(ctx, doc, rev.ID, false)
body, metaMap, _, err := db.prepareSyncFn(doc, doc)
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prepareSyncFn(doc, doc) inside forEachLeaf always prepares the current document body/rev (newRevID = doc.RevID), but the subsequent getChannelsAndAccess(..., rev.ID) is meant to evaluate each leaf revision. This mismatch means conflicting leaf revisions will all be re-synced using the same body/_rev, producing incorrect channel/access results. Consider loading the body for rev.ID (as the previous code did) or introducing a helper that prepares a sync-fn body for an arbitrary revID so BodyRev matches rev.ID.

Copilot uses AI. Check for mistakes.
db/database.go Outdated
metaMap, err := doc.GetMetaMap(db.UserXattrKey())
if err != nil {
return
base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If prepareSyncFn returns an error, the callback currently logs but continues into getChannelsAndAccess with potentially nil/invalid body/metaMap, which can lead to incorrect results or runtime errors in the sync function invocation. Return early from the leaf callback when preparation fails.

Suggested change
base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
// Skip this leaf when sync function preparation fails so the sync function
// is not invoked with invalid or incomplete inputs.
base.WarnfCtx(ctx, "Error preparing sync function for document %q: %v", base.UD(docid), err)
return

Copilot uses AI. Check for mistakes.
db/crud.go Outdated
Comment on lines 2243 to 2246
curBody, _, _, err := db.prepareSyncFn(doc, doc)
if err != nil {
return
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prepareSyncFn unmarshals user xattrs via doc.GetMetaMap(...) and validates the body, but this call site discards the returned metaMap and only needs the current body. This adds extra work and can introduce a new failure path (e.g., invalid rawUserXattr) even though metaMap is already provided to this function. Consider using a lighter-weight helper to get a mutable body (or refactor prepareSyncFn/add a variant) that doesn't re-fetch meta when the caller already has it.

Copilot uses AI. Check for mistakes.
db/database.go Outdated
metaMap, err := doc.GetMetaMap(db.UserXattrKey())
if err != nil {
return
base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning logs the document ID without redaction (docid). Since document IDs are user data, wrap it with base.UD(...) (and preferably use a %q-style format consistent with nearby logs) to avoid leaking sensitive data in logs.

Suggested change
base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
base.WarnfCtx(ctx, "Error preparing sync function for document %q: %v", base.UD(docid), err)

Copilot uses AI. Check for mistakes.
@RIT3shSapata
Copy link
Copy Markdown
Contributor Author

@bbrks I've reverted the use of prepareSyncFn in both the functions for the following reasons:

  • getResyncedDocument: copilot's comment, made a lot of sense of how by using prepare sync function we're just recomputing the access for the same document content.
  • recalculateSyncFnForActiveRev: By using prepareSyncFn in here, resulted in lot of failure in tests with tombstones.

So I just decided to remove the keys that are not required for Sync Function after the bodyBytes is fetched.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

db/database.go Outdated
Comment on lines +1767 to +1770
// removing the following fields as these fields are not required for sync function
if _, ok := body[BodyAttachments]; ok {
delete(body, BodyAttachments)
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this resync path we first call get1xRevFromDoc, which injects _attachments into the JSON bytes when attachments exist, and then immediately unmarshal and delete _attachments. This adds avoidable CPU/memory overhead during resync. Consider fetching the revision body without stamping _attachments (e.g., via getRevision/getAvailableRev and then only injecting _id/_rev/_deleted as needed, or by introducing a helper that builds the sync-function input body without attachments).

Copilot uses AI. Check for mistakes.
db/database.go Outdated
Comment on lines +1771 to +1773
if _, ok := body[BodyRevisions]; ok {
delete(body, BodyRevisions)
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block removes _revisions before calling the sync function, but get1xRevFromDoc is called with listRevisions=false in this function, so _revisions should not be injected in the first place. If you’re guarding against legacy bodies containing _revisions, please add a brief comment explaining that; otherwise this can be removed to reduce noise and keep behavior consistent with the rest of the code.

Copilot uses AI. Check for mistakes.
db/crud.go Outdated
Comment on lines +2254 to +2258
// removing _attachments, as attachments are not required to be passed
// into sync function
if _, ok := curBody[BodyAttachments]; ok {
delete(curBody, BodyAttachments)
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to resync: getAvailable1xRev stamps _attachments into the JSON bytes, then this code unmarshals and deletes _attachments before running the sync function. If attachments are intentionally excluded from sync, consider avoiding injecting them earlier (e.g., a variant of getAvailable1xRev that doesn’t include attachments, or a helper to build the sync-function input body).

Copilot uses AI. Check for mistakes.
db/database.go Outdated
Comment on lines +1767 to +1773
// removing the following fields as these fields are not required for sync function
if _, ok := body[BodyAttachments]; ok {
delete(body, BodyAttachments)
}
if _, ok := body[BodyRevisions]; ok {
delete(body, BodyRevisions)
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes alter the document body passed into the sync function during resync / active-rev recalculation (specifically excluding _attachments, and potentially _revisions). There are existing tests for getResyncedDocument in db/database_test.go; please add coverage asserting the sync function does not see these fields in these code paths to prevent regressions.

Copilot uses AI. Check for mistakes.
db/crud.go Outdated
Comment on lines +2254 to +2258
// removing _attachments, as attachments are not required to be passed
// into sync function
if _, ok := curBody[BodyAttachments]; ok {
delete(curBody, BodyAttachments)
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add/extend tests in db/crud_test.go to cover recalculateSyncFnForActiveRev ensuring the sync function input body does not include _attachments (and documenting/covering whether _revisions should be present or not). This is a behavior change that’s easy to regress.

Suggested change
// removing _attachments, as attachments are not required to be passed
// into sync function
if _, ok := curBody[BodyAttachments]; ok {
delete(curBody, BodyAttachments)
}
// Recalculating channel/access for an older active revision must not expose
// attachment metadata to the sync function. `_attachments` is stripped here
// to keep this path aligned with the body shape expected by sync-function
// evaluation and to make the behavior explicit in this regression-prone area.
//
// Note that `_revisions` is intentionally not removed in this step; whether it
// is present is determined by the revision body returned from storage and any
// downstream body normalization, not by this attachment-specific safeguard.
delete(curBody, BodyAttachments)

Copilot uses AI. Check for mistakes.
Comment on lines 1763 to +1766
if err != nil {
return
}

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says resync paths were refactored to use prepareSyncFn instead of getAvailable1xRev/get1xRevFromDoc, but in getResyncedDocument we still build the sync-function body via get1xRevFromDoc (then strip fields). Either update this resync path to use prepareSyncFn (or an equivalent helper) or adjust the PR description to match the actual change.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 13, 2026

Redocly previews

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 69 out of 71 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

db/crud.go:2269

  • recalculateSyncFnForActiveRev still contains commented-out code paths (old getAvailable1xRev flow and the commented delete(_attachments)). Please remove these commented blocks or implement the intended behavior in a single place (ideally inside prepareDocForSyncFn) to keep the sync-fn preparation logic consistent.
		access = nil
		roles = nil
	}
	return
}

func (db *DatabaseCollectionWithUser) addAttachments(ctx context.Context, newAttachments updatedAttachments) error {
	// Need to check and add attachments here to ensure the attachment is within size constraints
	err := db.setAttachments(ctx, newAttachments)
	if err != nil {
		if errors.Is(err, ErrAttachmentTooLarge) || err.Error() == "document value was too large" {
			err = base.HTTPErrorf(http.StatusRequestEntityTooLarge, "Attachment too large")
		} else {
			err = errors.Wrap(err, "Error adding attachment")
		}
	}

db/database.go Outdated
Comment on lines +1753 to +1770
//bodyBytes, _, err := db.get1xRevFromDoc(ctx, doc, rev.ID, false)
//if err != nil {
// base.WarnfCtx(ctx, "Error getting rev from doc %s/%s %s", base.UD(docid), rev.ID, err)
//}
//var body Body
//if err := body.Unmarshal(bodyBytes); err != nil {
// base.WarnfCtx(ctx, "Error unmarshalling body %s/%s for sync function %s", base.UD(docid), rev.ID, err)
// return
//}
//metaMap, err := doc.GetMetaMap(db.UserXattrKey())
//if err != nil {
// return
//}
//
//// removing the following fields as these fields are not required for sync function
//delete(body, BodyAttachments)
//delete(body, BodyRevisions)

Comment on lines 2221 to 2235

func (db *DatabaseCollectionWithUser) recalculateSyncFnForActiveRev(ctx context.Context, doc *Document, metaMap map[string]any, newRevID string) (channelSet base.Set, access, roles channels.AccessMap, syncExpiry *uint32, oldBodyJSON string, err error) {
// In some cases an older revision might become the current one. If so, get its
// channels & access, for purposes of updating the doc:
curBodyBytes, err := db.getAvailable1xRev(ctx, doc, doc.GetRevTreeID())
if err != nil {
return
}

var curBody Body
err = curBody.Unmarshal(curBodyBytes)
//curBodyBytes, err := db.getAvailable1xRev(ctx, doc, doc.GetRevTreeID())
//if err != nil {
// return
//}
//
//var curBody Body
//err = curBody.Unmarshal(curBodyBytes)
curBody, _, _, err := db.prepareDocForSyncFn(ctx, doc, nil, doc.GetRevTreeID(), true, false)
if err != nil {
return
}

// If using xattrs and this isn't an SG write, we shouldn't attempt to cache.
if syncData == nil {
Comment on lines 2764 to 2772
tbpOptions.TeardownFuncs = append(tbpOptions.TeardownFuncs, func() {
if len(globalBlipTesterClients.m) != 0 {
// must panic to bubble up through test harness
panic(fmt.Sprintf("%v active blip tester clients should be 0 at end of tests", globalBlipTesterClients.m))
}
})
db.BypassReleasedSequenceWait.Store(true)
serverContextGlobalsInitialized.Store(true)
db.TestBucketPoolWithIndexes(ctx, m, tbpOptions)
Comment on lines 923 to 927
response = rt.Send(RequestByUser("GET", url, "", username))
}
assert.NoError(c, base.JSONUnmarshal(response.Body.Bytes(), &changes))
assert.Len(c, changes.Results, numChangesExpected, "Expected %d changes, got %d changes", numChangesExpected, len(changes.Results))
assert.Len(c, changes.Results, numChangesExpected, "Expected %d changes, got %s changes", numChangesExpected, changes.Summary())
}, waitTime, 10*time.Millisecond)
Comment on lines 687 to 693
if s.RevAndVersion.CurrentVersion != "" || s.RevAndVersion.CurrentSource != "" {
extractedCV, err := cv.ExtractCV()
if !errors.Is(err, base.ErrNotFound) {
if err != nil {
base.InfofCtx(ctx, base.KeyImport, "Unable to extract cv during IsSGWrite write check - skipping cv match check: %v", err)
return true, true, false
base.InfofCtx(ctx, base.KeyImport, "Unable to extract cv during IsSGWrite check, document will not be processed: %v", err)
return false, false, false
}
@RIT3shSapata RIT3shSapata force-pushed the CBG-5061 branch 2 times, most recently from fcf6b13 to 5b832b6 Compare April 13, 2026 16:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

db/database.go Outdated

body, metaMap, _, err := db.prepareDocForSyncFn(ctx, doc, nil, rev.ID, true, false)
if err != nil {
base.WarnfCtx(ctx, "Unable to prepare doc for rev %d: %v", rev.ID, err)
Comment on lines +2180 to +2183
var ancestorRevID string
bodyBytes, ancestorRevID, _, err = db.getAvailableRev(ctx, doc, revID)
if err != nil {
return
db/crud.go Outdated
Comment on lines 2203 to 2206
newRevID = revID

mutableBody[BodyId] = doc.ID
mutableBody[BodyRev] = newRevID
@RIT3shSapata RIT3shSapata assigned bbrks and unassigned RIT3shSapata Apr 13, 2026
Copy link
Copy Markdown
Member

@bbrks bbrks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good now - just a few small things

Comment on lines +1431 to +1440
{
Method: "POST",
Endpoint: "/db/_offline",
Users: []string{syncGatewayConfigurator},
},
{
Method: "POST",
Endpoint: "/db/_online",
Users: []string{syncGatewayConfigurator},
},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is undoing part of the change made on main in #8154

I think we've already covered how to do rebases to resolve conflicts with force pushes, but this squash merge either doesn't seem to be doing the right thing or is error-prone.

mutableBodyForSyncFn, err := newDoc.GetDeepMutableBody()
if err != nil {
return
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this codepath require a mutable body (a deep copy - not cheap to make), whereas others can just use doc.Body(ctx) to pass in?

The comment I don't think is good justification alone, since _rawBody is a lazily marshalled version of the doc, but can always be reproduced from the unmarshalled version when needed. Curious what the previous validation behavior is that we're preserving too - is that just json marshalling validation?

We're paying the cost of the DeepCopy already inside prepareDocForSyncFn to make any body being passed in a mutable body so this feels like duplicated effort in this case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a few tests, that were failing because I initially did not perform a mutable copy. The body would not be sanitized for the next test case and fail due to this. The only difference between prepareSyncFn and prepareDocForSyncFn was making the mutable copy. The tests that failed were TestImportInternalPropertiesHandling and TestBackupOldRevisionWithAttachments.

There was a downstream function that which required newDoc's rawBody to be present, postWriteUpdateHLV was one function that I observed, didn't check for other functions in downstream. The rawDoc being updated was only possible if newDoc.GetDeepMutableBody() was invoked. Since I was not passing newDoc I couldn't do this inside the prepareDocForSyncFn. That's why I make I called the GetDeepMutableBody method outside.

I missed removing the deep copy inside the prepareDocForSyncFn, will remove that. I would like to know what more explanation you would like in the comment.

Comment on lines +2178 to +2192
if getAncestor {
var bodyBytes []byte
var ancestorRevID string
bodyBytes, ancestorRevID, _, err = db.getAvailableRev(ctx, doc, revID)
if err != nil {
return
}
err = mutableBody.Unmarshal(bodyBytes)
if err != nil {
return
}
if ancestorRev, ok := doc.History[ancestorRevID]; ok && ancestorRev != nil && ancestorRev.Deleted {
mutableBody[BodyDeleted] = true
}
newRevID = ancestorRevID
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bool feels like the callers need to be doing the getAvailableRev bit and pass in the ancestorRevID and tombstone just like a regular document.

What makes the ancestor case special that it has to be handled differently here? Just avoiding the validateNewBody work?

Could we perhaps add a shouldValidate or isNewDoc bool instead to control that behavior?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ancestor case has to be separate for recalculateSyncFnForActiveRev which is used to calculate the previous rev's access which will used to handle current rev conflicts. And for the resync case, we need ancestors as we need to recompute the access to all the revisions in the history.

This was previously done separately under their respective functions. Since we wanted to ensure the use of prepareDocForSyncFn for all scenarios, I had to add this ancestor case.

The other case is just doing the same thing as what prepareSyncFn used to do before.

I had to add the tombstone bool because newDoc contains if the document is deleted or not, which is required for the non ancestor case. So i'm not sure if this can be controlled by a single bool value, I will still need two bool values to ensure the behavior doesn't change

@RIT3shSapata RIT3shSapata requested a review from bbrks April 14, 2026 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants