Make retried Onyx writes idempotent on storage retry#792
Conversation
Follow-up to Expensify#787. Splits each retriable collection write into an orchestrator (cache + subscriber notify + storage prep) and a small file-private write helper (Storage.multiSet/multiMerge + retryOperation + devtools log). retryOperation re-enters the write helper, not the orchestrator, which fixes two pre-existing bugs flagged in Expensify#787 review: 1. New keys no longer get silently downgraded to multiMerge on retry. The existing/new key split was previously re-derived from getAllKeys against a cache that the first attempt had already mutated, routing brand-new keys through multiMerge instead of multiSet on retry. Benign on IDB but wrong semantics and a crash on backends that require multiSet for missing keys. 2. keysChanged/keyChanged subscribers (notably waitForCollectionCallback ones, which re-fire on every call by contract) are now notified exactly once per logical operation rather than once per retry attempt. Affected paths: mergeCollectionWithPatches, multiSetWithRetry, setCollectionWithRetry, partialSetCollection. setWithRetry was already safe via its hasValueChanged guard. partialSetCollection shares the new persistCollectionWrite helper with setCollectionWithRetry since their storage tails are identical. Adds a 'retry side-effect idempotency' test block covering all four paths. Tests confirmed to fail without the lib changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`npm run build:docs` output for the three new file-private helpers introduced in the previous commit: persistMultiSetWrite, persistCollectionWrite, persistMergedCollectionWrite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de62ae5eae
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const {keyValuePairsToStore, newData} = params; | ||
|
|
||
| return Storage.multiSet(keyValuePairsToStore) | ||
| .catch((error) => OnyxUtils.retryOperation(error, persistMultiSetWrite, params, retryAttempt)) |
There was a problem hiding this comment.
Retry orchestrator after eviction to restore cache state
When Storage.multiSet fails with a capacity error, retryOperation() evicts an LRU key via remove(), which immediately drops that key from cache and notifies subscribers before retrying. Retrying persistMultiSetWrite (and similarly persistCollectionWrite / persistMergedCollectionWrite) only re-attempts the storage write, so if the evicted key is part of the in-flight write, its cache value is never restored and subscribers keep the stale “removed” state even though storage eventually succeeds. This is a regression from the previous behavior where retries re-ran the orchestrator and re-applied cache updates.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in 11bc0a6.
Fix: each orchestrator (multiSetWithRetry, setCollectionWithRetry, partialSetCollection, mergeCollectionWithPatches) now constructs a restoreEvictedKey(key) closure that re-applies its cache write + subscriber notification for a single in-flight key. The closure is passed via the write helper's params; retryOperation invokes it after remove(keyForRemoval) and before re-entering the write helper. No-op when the evicted key isn't part of this write (the typical case).
mergeCollection nuance: the closure snapshots the cache's post-merge state (not just the delta) so previously-merged-in fields survive when the eviction's cache.drop forces us to re-populate cache from scratch — otherwise we'd lose fields like {id: 1} when restoring after a mergeCollection({key: {value: 'new'}}) retry.
Regression coverage: added 3 tests under storage eviction — one per write helper — that mock Storage.multiSet/Storage.multiMerge to fail with database or disk is full once on a setup where the LRU evictable key is the same key being written, and assert both cache + subscriber reflect the restored value (not undefined). All 3 fail without the lib changes (Received: undefined / missing fields) and pass with them.
459/459 tests pass; tsc and prettier clean.
Addresses Codex P1 on Expensify#792 review: Expensify#792 (comment) When `retryOperation()` evicts an LRU key via `remove()` on a storage capacity error, `remove()` immediately drops the key from cache and fires `keyChanged(undefined)`. With the orchestrator/write-helper split introduced in this PR, the retry only re-attempts the storage write -- it does not re-apply cache state. So if the evicted key is part of the in-flight write, cache + subscribers stay in the "removed" state even though the storage retry eventually succeeds, leaving the App's view of the world inconsistent with persisted data. Fix: each orchestrator now constructs a `restoreEvictedKey(key)` closure that re-applies its cache write + subscriber notification for a single in-flight key, and passes it via the write helper's params. `retryOperation` invokes the closure after `remove(keyForRemoval)`, before re-entering the write helper. The closure is a no-op when the evicted key isn't part of this write (the typical case where some unrelated evictable key gets selected). For mergeCollection specifically, the closure snapshots the cache's post-merge state for in-flight keys (rather than just the merge delta) so that previously-merged-in fields are preserved when the eviction's cache.drop forces us to re-populate cache from scratch. Adds 3 regression tests under `storage eviction`: - `multiSet` -- LRU evictable is the in-flight key (persistMultiSetWrite path) - `setCollection` -- same scenario via persistCollectionWrite - `mergeCollection` -- same scenario via persistMergedCollectionWrite, asserts the previously-existing `id` field survives the eviction All 3 tests fail with `Received: undefined` / missing fields if the lib changes are reverted; pass with them in place. All 459 unit tests pass (456 existing + 3 new).
CI flagged 3 occurrences in the eviction-restoration regression tests added in the previous commit. The repo's eslint config prefers the shorthand for simple types via @typescript-eslint/array-type.
|
i will review it soon |
Details
Follow-up to #787. The cache-first / storage-second refactor of
mergeCollectionWithPatchessurfaced two pre-existing bugs that affect every Onyx write path that goes throughretryOperation. Both were flagged on the #787 review and left for this PR (review comment):"new vs existing" routing is recomputed against an already-mutated cache on retry. In
mergeCollectionWithPatches,setCollectionWithRetry, andpartialSetCollection, the existing/new split is derived fromgetAllKeys()/ cached state. When the first attempt'scache.set/cache.mergehas already populated brand-new keys into the cache, the retry attempt reclassifies them as "existing" and routes them throughmultiMergeinstead ofmultiSet. Benign on IDB (wheremultiMergeon a missing key behaves likeset) but wrong semantics, and the inline comment inmergeCollectionWithPatchesexplicitly warns this throws on some storage backends.Subscriber notifications re-fire on every retry attempt.
multiSetWithRetry,setCollectionWithRetry,mergeCollectionWithPatches, andpartialSetCollectioncallkeyChanged/keysChangedbefore the storage write. On a failed storage write,retryOperationre-enters the orchestrator, which fireskeyChanged/keysChangedagain with the same payload. The internal===dedup inkeysChangedabsorbs most subscribers, butOnyx.connect({ waitForCollectionCallback: true })subscribers re-fire on every call by contract.setWithRetrywas already safe becausebroadcastUpdateshort-circuits onhasChanged === falseand on retry the cache already reflects the new value — that pattern is the design model.Fix shape
For each of the four affected functions, split the body into:
cache.set/cache.merge, fireskeyChanged/keysChanged, prepares the storage key-value pairs, then delegates. Called exactly once perOnyx.X(...)invocation.retryOperationre-entering the write helper, never the orchestrator.Concretely:
mergeCollectionWithPatchespersistMergedCollectionWritemultiSetWithRetrypersistMultiSetWritesetCollectionWithRetrypersistCollectionWritepartialSetCollectionpersistCollectionWrite(shared — storage tail is identical)Result: the existing/new split is computed once and captured in the write helper's params (fixes bug 1), subscribers are notified once from the orchestrator (fixes bug 2), and
retryOperation's signature is unchanged — it's already generic overRetriableOnyxOperation(extended here with the three new helpers).Related Issues
No separate GH issue — this PR addresses #787 review comment directly.
Linked E/App PR
Expensify/App#91626
Automated Tests
Added a new
retry side-effect idempotencydescribe block intests/unit/onyxUtilsTest.tswith 5 cases (one per affected path + one routing-specific assertion):mergeCollection— new keys still route throughmultiSeton retry, not downgraded tomultiMerge. Seeds an existing member, mocksStorage.multiMergeto reject once then resolve, asserts the new key is never present in anymultiMergecall's args.mergeCollection—waitForCollectionCallbacksubscriber fires once across retries. Same setup, asserts subscriber call count is exactly 1.Onyx.multiSet— collection subscriber fires once across retries.Onyx.setCollection— collection subscriber fires once across retries.OnyxUtils.partialSetCollection— collection subscriber fires once across retries.All 5 fail with
Received number of calls: 2(and the routing test fails differently — multiMerge is called with the new key on retry) if you revert just the lib changes. Confirmed locally viagit stashoflib/OnyxUtils.ts lib/types.ts.All 456 existing unit tests still pass — including the timing-sensitive
retryOperationdescribe block, themergeCollection cache-first orderingregression tests from #787, and the storage-eviction tests.Manual Tests
The four refactored paths (
mergeCollectionWithPatches,multiSetWithRetry,setCollectionWithRetry,partialSetCollection) are reached from the App via Pusher events, theOpenAppresponse, everyOnyx.updatebatch that contains aMERGE_COLLECTION/MULTI_SET/SET_COLLECTIONop, LHN refresh, Search filters, chat sends, mark-all-as-read, hold/unhold, workspace switching, etc. The companion App PR Expensify/App#91626 pinsreact-native-onyxto this branch's head for end-to-end exercise; the steps below are the same protocol used for #787.Setup
elirangoshen/fix/retry-side-effects-idempotencyoncallstack-internal/Expensify-App), which bumpspackage.json'sreact-native-onyxto this branch's head.npm installunder Node 20.20.0, thennpm run web.Functional smoke — no regression in the refactored paths
The refactor preserves the cache-first invariant from #787, so all of these should look indistinguishable from
mainto the user:main. (exercisesmergeCollectionWithPatchesvia Pusher /OpenApp)reportActions_), confirms via Pusher, persists after reload. (mergeCollectionWithPatches)mergeCollectionWithPatches)mergeCollectionWithPatches+setCollectionWithRetry)mergeCollectionWithPatches)mergeCollectionWithPatches)mergeCollectionWithPatches)Storage-failure simulation — core regression guard for this PR
This is the test that exercises the retry path the fix targets. With the fix in place, retries on storage failure should produce one subscriber notification per logical operation (not one per retry attempt), and brand-new keys should stay routed through
multiSeteven when an earliermultiMergeretry kicked in.OnyxDB).mergeCollection-driven action — switch workspaces, send a chat message, or apply a search filter.retryOperationkicks in. Console shows storage error logs and retry attempts (look forFailed to save to storage. Error: ... retryAttempt: N/5inLogger.logInfo), but no white screen, no stale UI, no data loss within the session, and no duplicatewaitForCollectionCallbacksubscriber invocations (verify via React DevTools "Highlight updates when components render" or via aconsole.countinstrumented in a temporaryuseOnyxcallback if you want to be precise).multiSet-driven action (least common — typically a deep-link prep) and asetCollection-driven action (Search filter result population) to cover the other refactored paths.Onyx.connect({waitForCollectionCallback: true})callbacks firing twice in DevTools traces, or (b) on the routing bug specifically, no observable user-facing difference on IDB but the issue is fixed under the hood for non-IDB backends.Cold-cache merge — preserves the pre-warm fast path from #787
mainsession — the cache pre-warm path (existing inmergeCollectionWithPatches) is untouched by this PR.Author Checklist
### Related Issuessection above### Linked E/App PRsection above, and verified this change against it (E/App CI passed and manual testing completed) (companion PR [No QA] Pin react-native-onyx to PR #792 head for retry-idempotency end-to-end testing App#91626 pins this branch's HEAD for end-to-end exercise)TestssectiontoggleReportand notonIconClick)myBool && <MyComponent />.STYLE.md) were followedAvatar, I verified the components usingAvatarare working as expected) (459/459 unit tests green)persistCollectionWriteis shared betweensetCollectionWithRetryandpartialSetCollectionsince their storage tails are byte-identical)/** comment above it */(N/A)thisproperly so there are no scoping issues (i.e. foronClick={this.submit}the methodthis.submitshould be bound tothisin the constructor) (N/A)thisare necessary to be bound (i.e. avoidthis.submit = this.submit.bind(this);ifthis.submitis never passed to a component event handler likeonClick) (N/A)Avataris modified, I verified thatAvataris working as expected in all cases) (pending companion App PR #91626)mainbranch was merged into this PR after a review, I tested again and verified the outcome was still expected according to theTeststeps. (N/A — no merge from main yet)Screenshots/Videos
Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
Screen.Recording.2026-05-26.at.14.41.29.mov
Screen.Recording.2026-05-26.at.14.42.06.mov
Screen.Recording.2026-05-26.at.14.42.27.mov
Screen.Recording.2026-05-26.at.14.43.07.mov
Screen.Recording.2026-05-26.at.14.44.17.mov
Screen.Recording.2026-05-26.at.14.44.48.mov
🤖 Generated with Claude Code