fix(export): prevent DOCX corruption from UTF-16 XML parts and schema violations (SD-2170) #2349
SD-2170: Multiple export issues caused Word to report "unreadable content":

1. UTF-16 encoded customXml parts (e.g. iManage metadata) were garbled during re-zip because exportFromOriginalFile read all files as UTF-8 strings. Now reads as raw bytes and decodes with ensureXmlString.
2. ensureXmlString now updates the XML encoding declaration after transcoding from UTF-16 to UTF-8.
3. Empty w:numbering root element (no abstractNum/num children) written when all list definitions are pruned. Now deletes numbering.xml entirely and passes null to updatedDocs to trigger zip removal.
4. [Content_Types].xml retained stale Override entries for deleted parts. Now prunes overrides for any part set to null in updatedDocs.
5. w:footnotePr element ordering in settings.xml: filter+push moved it to the end, violating CT_Settings xsd:sequence. Now replaces in-place or inserts before w:endnotePr.
6. Bibliography customXml preserved as-is when there are zero sources, avoiding unnecessary XML recreation.
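The transcoding described in items 1 and 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's ensureXmlString: the name `decodeXmlBytes` is hypothetical, the part is assumed to arrive as a Uint8Array, and UTF-16 is detected from the byte order mark only (the XML 1.0 spec requires UTF-16 entities to start with one).

```javascript
// Hypothetical helper, NOT the PR's ensureXmlString.
// Assumption: `bytes` is a Uint8Array holding one XML part read from the zip.
function decodeXmlBytes(bytes) {
  const isUtf16LE = bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe;
  const isUtf16BE = bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff;

  if (!isUtf16LE && !isUtf16BE) {
    // No UTF-16 BOM: treat the part as UTF-8 and leave its declaration alone.
    return new TextDecoder('utf-8').decode(bytes);
  }

  // Copy the payload without the 2-byte BOM, dropping a trailing odd byte if any.
  const evenLength = (bytes.length - 2) & ~1;
  const body = new Uint8Array(bytes.subarray(2, 2 + evenLength));

  if (isUtf16BE) {
    // Swap each byte pair so the data can be decoded as little-endian.
    for (let i = 0; i < body.length; i += 2) {
      const first = body[i];
      body[i] = body[i + 1];
      body[i + 1] = first;
    }
  }

  const text = new TextDecoder('utf-16le').decode(body);

  // The string will be written back as UTF-8, so the declaration must say so.
  return text.replace(
    /^(<\?xml\b[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2/,
    '$1$2UTF-8$2'
  );
}
```

Copying the payload before the byte swap keeps the caller's buffer untouched, which matters if the same bytes are also used for the untouched pass-through path.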
Status: PASS The OOXML-relevant changes all look correct. A few notes worth flagging: footnotesExporter.js — The new logic inserts SuperConverter.js — empty The comment claims "An empty Neither of these is a spec violation in the generated output. The other changes — OPC Override pruning for deleted parts, updating the XML declaration encoding attribute after transcoding, and the null-safety guard around |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7306211037
// An empty w:numbering root element violates the OOXML schema
// and causes Word to report corruption (SD-2170).
if (liveAbstracts.length === 0 && liveDefinitions.length === 0) {
  delete this.convertedXml[numberingPath];
Prune numbering relationship when removing numbering.xml
When #exportNumberingFile deletes word/numbering.xml, the export pipeline does not also remove the corresponding relationship from word/_rels/document.xml.rels (relationship pruning currently only handles comment types), so documents that originally had a numbering relationship can be exported with a dangling internal target. In the “all lists removed” path this still leaves an invalid package graph and can continue to trigger Word repair/corruption warnings despite deleting the part itself.
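One possible shape for the missing pruning step is sketched below. It is an illustration only, not code from this repository: `resolveTarget` and `pruneDanglingRelationships` are hypothetical names, relationships are assumed to be already parsed into plain objects with `id`, `target`, and an optional `targetMode`, and `updatedDocs` is assumed to be a plain object keyed by part path in which `null` marks a deleted part (the `null` convention comes from the PR description; the object shape is a guess).

```javascript
// Hypothetical helpers for illustration; names and data shapes are assumptions.

// Resolve a relationship Target against the directory of the source part,
// e.g. ('word', 'numbering.xml') -> 'word/numbering.xml'.
function resolveTarget(baseDir, target) {
  if (target.startsWith('/')) return target.slice(1); // absolute within the package
  const segments = baseDir ? baseDir.split('/') : [];
  for (const segment of target.split('/')) {
    if (segment === '..') segments.pop();
    else if (segment !== '.' && segment !== '') segments.push(segment);
  }
  return segments.join('/');
}

// Keep every relationship except internal ones whose target part was deleted.
function pruneDanglingRelationships(relationships, baseDir, updatedDocs) {
  return relationships.filter((rel) => {
    if (rel.targetMode === 'External') return true; // URLs never point at zip parts
    const partName = resolveTarget(baseDir, rel.target);
    return updatedDocs[partName] !== null; // null marks a part removed from the zip
  });
}
```

Keying the check on `updatedDocs` rather than on a hard-coded list of relationship types would cover numbering.xml and any other part deleted later through the same mechanism.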