Skip to content

fix(plan-engine): runtime block-identity self-heal on compile #3585

Open
andrii-harbour wants to merge 1 commit into
mainfrom
andrii/sd-3283-document-api-mutations-fail-with-document_identity_conflict
Open

fix(plan-engine): runtime block-identity self-heal on compile #3585
andrii-harbour wants to merge 1 commit into
mainfrom
andrii/sd-3283-document-api-mutations-fail-with-document_identity_conflict

Conversation

@andrii-harbour
Copy link
Copy Markdown
Contributor

No description provided.

DOCUMENT_IDENTITY_CONFLICT was blocking every mutation on documents whose
in-session ProseMirror state already carried duplicate paraId / sdBlockId
values. The dominant production path is a Yjs collab restore: the import-
time normalizer (`normalizeDuplicateBlockIdentitiesInContent`) runs only on
fresh DOCX imports, so any duplicate that landed in the persisted Yjs
state (whether copy-paste-cloned paraIds from Word's exporter, or older
runtime splits before keepOnSplit:false shipped) survived every later
hydration. The only recovery in production was "open in Word, re-save,
re-upload" β€” and the legacy error message even told callers to invoke a
`document.repair()` op that did not exist.

This change makes `compilePlan` self-heal a corrupted in-session state
before `assertNoDuplicateBlockIds` fires, using the same rename semantics
the importer already applies.

What landed:

* New `plan-engine/repair-block-identities.ts` walks the PM doc, dispatches
  one transaction with per-attribute `tr.setNodeAttribute` renames (AttrStep,
  no from/to β€” explicitly skipped by structured-content lock and permission-
  ranges filters), then post-dispatch verifies the attrs actually applied;
  throws REPAIR_BLOCKED with offending IDs in the message string when a
  filter rejects it (Python SDK strips `details`).
* `BlockIndex` carries an optional `explicitIdentities` side-channel populated
  during the existing `buildBlockIndex` descendants walk β€” the planner
  consumes it for O(1) clean-doc early-exit, no second walk.
* `compilePlan` gains `CompilePlanOptions { skipIdentityRepair }`; `previewPlan`
  passes it so preview stays non-mutating and surfaces the conflict as a
  PreviewFailure via the existing PlanError catch.
* The repair transaction is meta-tagged with `superdoc/block-identity-repair`.
  Both `trackRevisions` and the story-runtime host-store sync listener
  meta-skip it (placed before `docChanged` so a future metadata-only repair
  also stays revision-invisible), and `trackedTransaction` bypasses
  track-changes wrapping at an explicit check rather than the implicit
  disallowed-meta fall-through.
* `executePlan` now calls `checkRevision` BEFORE `compilePlan` so a stale
  optimistic-concurrency request rejects without first letting the repair
  rewrite identity attrs. Regression test pins that the doc identity is
  unchanged and revision unmoved across a stale-revision rejection.
* `assertNoDuplicateBlockIds` error message embeds the colliding IDs in the
  string (with per-ID + total-count truncation) and points remediation at
  `doc.open`, removing the dead `document.repair()` reference.
* Importer normalizer (`normalizeDuplicateBlockIdentitiesInContent`) refactored
  to consume `getExplicitIdentityEntries` + `groupIdentityEntriesByValue`
  from the shared `block-identity-renaming.js` module; the repair module
  imports the same primitives. Single source of truth for "which attrs
  carry identity" across import-time and runtime.

Coverage:

* Unit: `repair-block-identities.test.ts` pins the fast-path early-exit,
  per-attribute setNodeAttribute, deterministic allocator ordering, and
  the REPAIR_BLOCKED throw on filter-rejected dispatch.
* Regression: `tests/regression/duplicate-block-identities.test.js` exercises
  end-to-end recovery from a Yjs-restore-shaped state through compilePlan +
  executeCompiledPlan, previewPlan non-mutation, and stale-revision rejection
  before any repair mutation.
* Track-changes: `revision-tracker.test.ts` pins the meta-skip ordering
  invariant (skip applies even for hypothetical metadata-only repair txs).
* All affected suites green (180/180 in touched files; full super-editor
  suite ran clean during cross-review iterations).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andrii-harbour andrii-harbour self-assigned this May 30, 2026
@andrii-harbour andrii-harbour requested a review from a team as a code owner May 30, 2026 01:53
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 30, 2026

SD-3283

@github-actions
Copy link
Copy Markdown
Contributor

I wasn't able to get the ecma-spec MCP tools authorized in this session (each call returned a pending-permission error), so I verified the OOXML-relevant claims against my own knowledge of ECMA-376 rather than live spec lookups. Flagging that up front for transparency.

Status: PASS

The two files in scope β€” block-identity-renaming.js and normalizeDuplicateBlockIdentitiesInContent.js β€” are block-identity dedup utilities operating on ProseMirror node attributes, not XML element/attribute serializers. They neither parse nor emit OOXML element/attribute names, so most of the usual spec-violation surface (non-existent elements, missing required attrs) doesn't apply here. The only attribute that maps to real OOXML is paraId (w14:paraId); sdBlockId and blockId are SuperDoc-internal identifiers with no OOXML counterpart.

On the paraId handling, the code is consistent with the spec:

  • 8 uppercase hex digits (block-identity-renaming.js:28,141) β€” w14:paraId is typed ST_LongHexNumber, i.e. a 4-byte hex value = exactly 8 hex digits. The padding/length is correct, and uppercase is the conventional Word output form. βœ“
  • Synthesizing paraId on paragraph and tableRow (SYNTHETIC_PARA_ID_TYPES, line 26) β€” matches where Word actually emits w14:paraId: on both <w:p> and <w:tr>. No spurious types. βœ“
  • The 0x00000001 … 0xFFFFFFFF allocation range (lines 29, 137-148) stays within the 4-byte ST_LongHexNumber domain. βœ“

No incorrect defaults, no fabricated attributes, no missing-required-attribute issues at the OOXML layer. The refactor is a pure extract-and-share of the existing normalizer logic β€” identity semantics are unchanged β€” and the surrounding runtime-repair/revision-tracker changes don't touch OOXML serialization.

One caveat worth noting (not a spec violation): if you'd like me to confirm the w14:paraId-on-<w:tr> placement and the ST_LongHexNumber definition against the actual ECMA-376 text rather than from memory, grant the mcp__ecma-spec__* tool permissions and I'll re-run the lookups to make this airtight.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

No issues found across 15 files

Linked issue analysis

Linked issue: SD-3283: Document API mutations fail with DOCUMENT_IDENTITY_CONFLICT on duplicate DOCX paragraph identities

Status Acceptance criteria Notes
⚠️ Add a regression test with two paragraph-like blocks sharing the same imported `paraId` but having unique `sdBlockId`s. New unit and regression tests exercise duplicated paraId scenarios and the runtime repair pass, but the added tests create duplicate `paraId` cases and assert `paraId` renames; I don't see an explicit test that sets distinct `sdBlockId` values while sharing the same imported `paraId` (the grouping logic supports multi-attr groups).
βœ… Mutations no longer fail with `DOCUMENT_IDENTITY_CONFLICT` for duplicate imported paragraph IDs when a unique runtime identity can be assigned safely. compilePlan now runs a repair pass before asserting duplicates; repair dispatches metadata-only AttrSteps and tests assert compile/execute succeed and index is deduped afterward.
⚠️ Truly ambiguous/duplicate non-repairable identities still fail with a clear error. The repair code detects when a transaction filter blocks the repair and throws a `REPAIR_BLOCKED` PlanError with a remediation message; however, I don't see a focused unit/test that constructs an unrecoverable/ambiguous identity and asserts the exact error path in this PR (the code implements the behavior).
⚠️ The customer workaround of Word re-open/re-upload is no longer required for affected precedents/documents. The runtime repair reproduces import-time dedup semantics and tests show in-session duplicates are healed so mutations succeed without re-import, which addresses the workaround in typical cases. Full proof against production precedents is out of scope for unit/CI tests, so this is implemented but not proven at scale here.
βœ… Error remediation text matches an actually available API/workflow. Error messages and `remediation` payload were updated to point to `doc.open` re-import, and tests were updated to assert the new message and to preserve colliding IDs in the message for SDK consumers.

Re-trigger cubic

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9dfcf81a3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with πŸ‘.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// `sdBlockId`-only collisions, because `resolveBlockNodeId` projects via
// paraId first and the index-level scan never observes the sdBlockId.
if (!options.skipIdentityRepair) {
const repair = repairDuplicateBlockIdentities(editor);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Resolve refs before repairing duplicate identities

When a duplicate-laden document is queried before applying a mutation, queryMatchAdapter/findLegacyAdapter issue V4 refs using the current blockId values (see find/common.ts's toTextAddress and query-match-adapter.ts's ref segments) without running this repair. This new repair runs before ref resolution and is revision-invisible, so a ref that originally pointed at the later duplicate still contains the old shared blockId after that block is renamed; resolveV4TextRef then resolves index.candidates.find(c => c.nodeId === seg.blockId) to the first remaining block with that id, causing mutations.apply to mutate the wrong block or fail. Either repair before refs are issued, or reject/remap pre-repair refs after this dispatch.

Useful? React with πŸ‘Β / πŸ‘Ž.

Comment on lines +655 to +657
// the compiler dispatches an in-place identity-repair transaction before
// assembling the plan; pass `{ skipIdentityRepair: true }` if a strictly
// non-mutating compile is required (as `previewPlan` does). The story
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Check revisions before mutating selection requests

For selectionMutationWrapper, the caller's expectedRevision is still checked only after compilePlan returns (line 711), but this comment documents that compilation can now dispatch the identity-repair transaction. On a duplicate-laden doc, a stale selection mutation or dry run with expectedRevision will rewrite block identities before throwing REVISION_MISMATCH, violating the optimistic-concurrency/no-mutation-on-reject behavior that executePlan now protects by checking before compile. Move the revision guard ahead of this compile path as well.

Useful? React with πŸ‘Β / πŸ‘Ž.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

βœ… All modified and coverable lines are covered by tests.

πŸ“’ Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants