fix(export): prevent DOCX corruption from UTF-16 XML parts and schema violations (SD-2170) #2349

Open
caio-pizzol wants to merge 1 commit into main from caio/sd-2170-fix-wfootnotepr-element-ordering-in-settingsxml-export

@caio-pizzol
Contributor

  1. UTF-16 encoded customXml parts (e.g. iManage metadata) were garbled during re-zip because exportFromOriginalFile read all files as UTF-8 strings. Now reads as raw bytes and decodes with ensureXmlString.

  2. ensureXmlString now updates the XML encoding declaration after transcoding from UTF-16 to UTF-8.

  3. Empty w:numbering root element (no abstractNum/num children) written when all list definitions are pruned. Now deletes numbering.xml entirely and passes null to updatedDocs to trigger zip removal.

  4. [Content_Types].xml retained stale Override entries for deleted parts. Now prunes overrides for any part set to null in updatedDocs.

  5. w:footnotePr element ordering in settings.xml — filter+push moved it to the end, violating CT_Settings xsd:sequence. Now replaces in-place or inserts before w:endnotePr.

  6. Bibliography customXml preserved as-is when there are zero sources, avoiding unnecessary XML recreation.
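
For items 1 and 2, a minimal sketch of the idea: read the part as raw bytes, detect a UTF-16 byte order mark, decode, and then rewrite the encoding declaration so it matches the UTF-8 output. `decodeXmlBytes` is a hypothetical helper written for illustration only; it is not the project's `ensureXmlString`, and it assumes UTF-16 parts always start with a BOM.

```javascript
// Hypothetical helper (an assumption for illustration, not the real ensureXmlString).
// Accepts a Uint8Array/Buffer holding one XML part and returns a JS string
// whose XML declaration says UTF-8.
function decodeXmlBytes(bytes) {
  const isLE = bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe;
  const isBE = bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff;

  // No UTF-16 BOM: treat the part as UTF-8 (TextDecoder strips a UTF-8 BOM by default).
  if (!isLE && !isBE) {
    return new TextDecoder('utf-8').decode(bytes);
  }

  let body = bytes.subarray(2); // drop the 2-byte BOM
  if (isBE) {
    // Swap each byte pair so the data can be decoded as little-endian.
    const swapped = new Uint8Array(body.length - (body.length % 2));
    for (let i = 0; i + 1 < body.length; i += 2) {
      swapped[i] = body[i + 1];
      swapped[i + 1] = body[i];
    }
    body = swapped;
  }

  const text = new TextDecoder('utf-16le').decode(body);

  // The string will be written back as UTF-8, so the declaration must say so.
  return text.replace(/^(<\?xml[^>]*?encoding=)(["'])[^"']*\2/i, '$1$2UTF-8$2');
}
```

Reading the zip entry as bytes first matters because decoding UTF-16 data as UTF-8 is lossy, so the original content cannot be recovered afterwards.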

SD-2170: Multiple export issues caused Word to report "unreadable content".
@linear

linear bot commented Mar 10, 2026

@github-actions
Contributor

Status: PASS

The OOXML-relevant changes all look correct. A few notes worth flagging:

footnotesExporter.js — w:footnotePr insertion ordering

The new logic inserts w:footnotePr before w:endnotePr, or before w:compat as a fallback. The comment calls w:endnotePr the "next element in the XSD sequence," but CT_Settings actually uses xs:all rather than xs:sequence, meaning child element order is irrelevant for strict schema conformance. The insertion behavior is still fine in practice — maintaining a document-natural ordering is good hygiene — but the comment is slightly misleading. See https://ooxml.dev/spec?q=CT_Settings for the schema definition.
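
Independent of how the schema question is settled, the insertion behavior described here can be sketched as follows. This is an illustration under assumptions, not the code in footnotesExporter.js: the `{ name, elements }` node shape and the helper name `upsertFootnotePr` are hypothetical.

```javascript
// Hypothetical sketch: place w:footnotePr among the children of w:settings
// without moving it to the end of the list.
function upsertFootnotePr(children, footnotePr) {
  const next = children.slice(); // do not mutate the caller's array

  // 1. An existing w:footnotePr is replaced in place, keeping its position.
  const existing = next.findIndex((el) => el.name === 'w:footnotePr');
  if (existing !== -1) {
    next[existing] = footnotePr;
    return next;
  }

  // 2. Otherwise insert before w:endnotePr, falling back to w:compat.
  let anchor = next.findIndex((el) => el.name === 'w:endnotePr');
  if (anchor === -1) {
    anchor = next.findIndex((el) => el.name === 'w:compat');
  }
  if (anchor !== -1) {
    next.splice(anchor, 0, footnotePr);
    return next;
  }

  // 3. No known anchor element: append as a last resort.
  next.push(footnotePr);
  return next;
}
```

Compared with filter+push, this keeps every other child at its original index, so the exported settings.xml differs from the source only where the footnote properties changed.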

SuperConverter.js — empty w:numbering comment

The comment claims "An empty w:numbering root element violates the OOXML schema." This isn't quite right — CT_Numbering children (abstractNum, num) both have minOccurs="0", so an empty <w:numbering/> is technically schema-valid. The real problem is Microsoft Word's intolerance of it at runtime (SD-2170). The fix (omitting the file entirely when there are no definitions) is the right call — a document without numbering just omits word/numbering.xml — but the stated rationale in the comment overstates the schema violation.

Neither of these is a spec violation in the generated output. The other changes — OPC Override pruning for deleted parts, updating the XML declaration encoding attribute after transcoding, and the null-safety guard around word/numbering.xml — are all correct per their respective specs.
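
As a rough illustration of the Override pruning mentioned above, the sketch below drops [Content_Types].xml Override entries for any part set to null in `updatedDocs`. The data shapes (an array of `{ partName, contentType }` objects and a map keyed by zip path) and the helper name `pruneOverrides` are assumptions for illustration, not the PR's actual code.

```javascript
// Hypothetical sketch: remove Override entries whose part was deleted.
// updatedDocs maps zip paths (e.g. "word/numbering.xml") to XML strings,
// with null meaning "remove this part from the package".
function pruneOverrides(overrides, updatedDocs) {
  const deleted = new Set(
    Object.entries(updatedDocs)
      .filter(([, content]) => content === null)
      // Override PartName values are absolute, so normalize to a leading slash.
      .map(([path]) => '/' + path.replace(/^\//, ''))
  );
  return overrides.filter((override) => !deleted.has(override.partName));
}

const overrides = [
  {
    partName: '/word/document.xml',
    contentType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml',
  },
  {
    partName: '/word/numbering.xml',
    contentType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml',
  },
];
const remaining = pruneOverrides(overrides, { 'word/numbering.xml': null });
```

Here `remaining` keeps only the entry for /word/document.xml, since word/numbering.xml was marked for removal.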

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7306211037

// An empty w:numbering root element violates the OOXML schema
// and causes Word to report corruption (SD-2170).
if (liveAbstracts.length === 0 && liveDefinitions.length === 0) {
  delete this.convertedXml[numberingPath];

P1: Prune numbering relationship when removing numbering.xml

When #exportNumberingFile deletes word/numbering.xml, the export pipeline does not remove the corresponding relationship from word/_rels/document.xml.rels, because relationship pruning currently handles only comment types. Documents that originally had a numbering relationship can therefore be exported with a dangling internal target. In the "all lists removed" path this still leaves an invalid package graph and can continue to trigger Word repair/corruption warnings even though the part itself is deleted.
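
One possible shape for that pruning, as a sketch only: drop every internal relationship whose resolved target was set to null in `updatedDocs`. The `{ id, type, target, targetMode }` object shape and the helper name `pruneDanglingRelationships` are assumptions for illustration, and the path resolution is simplified (it does not handle `../` segments).

```javascript
// Hypothetical sketch: filter word/_rels/document.xml.rels entries so that
// no internal relationship points at a part removed from the package.
function pruneDanglingRelationships(relationships, updatedDocs, baseDir = 'word/') {
  return relationships.filter((rel) => {
    // External targets (e.g. hyperlinks) never refer to a zip part.
    if (rel.targetMode === 'External') return true;

    // Targets are relative to the source part's folder unless they start with "/".
    const path = rel.target.startsWith('/') ? rel.target.slice(1) : baseDir + rel.target;

    // null means the part was deleted; undefined means it was not touched.
    return updatedDocs[path] !== null;
  });
}

const rels = [
  {
    id: 'rId1',
    type: 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles',
    target: 'styles.xml',
  },
  {
    id: 'rId2',
    type: 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering',
    target: 'numbering.xml',
  },
];
const keptRels = pruneDanglingRelationships(rels, { 'word/numbering.xml': null });
```

Keying the filter on the deleted path rather than on the relationship type would cover any part removed through `updatedDocs`, not only numbering.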
