Adds WP_HTML_Processor::set_inner_html() for non-atomic tag openers. The method replaces the target element's raw inner HTML only when the replacement can be parsed without changing the tree outside the target; otherwise it returns false and leaves the source unchanged.
Also adds focused PHPUnit coverage and a deterministic standalone fuzzer for the outside-tree invariant, including BODY/HTML attribute-hoisting cases and the safe template/foreign-content exceptions.
The current implementation is intentionally conservative: it validates the proposed replacement by parsing in full document/fragment context rather than trying to reason about the replacement string in isolation.
High-level flow:
Reject unsupported call sites: virtual tokens, non-matched states, tag closers, atomic/non-closer elements, integration-node tokens, or tokens without a source bookmark.
Flush pending lexical updates so validation runs against the same source that will receive the replacement.
Locate the target opener by source span and find the raw inner-HTML byte range by reparsing until the target element is popped/closed. This handles explicit closers, implicit closers, EOF virtual pops, and special full-document BODY/HTML end behavior.
Build candidate source by splicing the proposed inner HTML into the original source.
Reparse the original and candidate in the same public parsing mode and compute an outside-tree signature. The signature records tokens outside the target, including token type/name/namespace, closer state, breadcrumbs, and serialized token. Tokens inside the target are skipped for comparison, but still processed because they can affect parser state and where the target closes.
Compare active formatting element state at target entry/exit to catch parser-state leaks such as reconstruction outside the target.
Track non-visitable parser events for BODY/HTML start/end tags that are consumed without normal visitable stack events. Attribute-bearing <body ...> / <html ...> tokens inside the target/replacement range are rejected because they may hoist attributes onto the real body/html element rather than remain target-local. Safe cases such as template content and foreign-content HTML-looking tags are allowed.
Queue one raw lexical replacement only if the original and candidate outside signatures match and no unsupported/parser-error condition is encountered. Otherwise return false and leave the source unchanged.
In other words: the safety property is enforced by full in-context reparsing plus strict outside-token comparison, with extra bookkeeping for parser effects that are not visible as normal visited tokens.
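To make the splice-and-compare idea concrete, here is a minimal toy sketch. It is not the code in this PR: it replaces the real HTML parser with a regex tag scanner and a naive open-element stack, and the names `toy_walk()` and `toy_set_inner_html()` are hypothetical. It ignores void elements, active formatting elements, and BODY/HTML hoisting, so it only illustrates the shape of the check: find the inner byte range, splice, walk both sources, and accept only if the outside signatures are identical.

```php
<?php
// Toy model only: a lexical tag scanner with a naive open-element stack.
// It is NOT the HTML parsing algorithm and NOT the WP_HTML_Processor code.

/**
 * Walks $html and returns array( $signature, $inner_start, $inner_end ) for
 * the element whose opening tag starts at byte offset $opener_at.
 */
function toy_walk( string $html, int $opener_at ): array {
	preg_match_all( '~<(/?)([a-z][a-z0-9]*)[^>]*>|[^<]+~i', $html, $tokens, PREG_SET_ORDER | PREG_OFFSET_CAPTURE );

	$stack        = array(); // Names of open elements, used as breadcrumbs.
	$target_index = null;    // Stack index of the target while it is open.
	$signature    = array(); // One entry per token seen outside the target.
	$inner_start  = null;
	$inner_end    = null;

	foreach ( $tokens as $t ) {
		list( $raw, $at ) = $t[0];
		$is_tag  = isset( $t[2] );
		$name    = $is_tag ? strtoupper( $t[2][0] ) : '#text';
		$outside = null === $target_index;

		if ( $is_tag && '/' === $t[1][0] ) {
			// Closer: pop back to the nearest open element with this name.
			$found = array_search( $name, array_reverse( $stack, true ), true );
			if ( false === $found ) {
				continue; // Unmatched closer: ignored in this toy model.
			}
			$stack = array_slice( $stack, 0, $found );
			if ( ! $outside && $found <= $target_index ) {
				// The target was popped, by its own closer or an ancestor's.
				$inner_end    = $at;
				$outside      = $found < $target_index; // Ancestor closers count as outside.
				$target_index = null;
			}
		} elseif ( $is_tag ) {
			if ( $at === $opener_at && null === $inner_start ) {
				// Target opener: inner HTML starts right after it.
				$stack[]      = $name;
				$target_index = count( $stack ) - 1;
				$inner_start  = $at + strlen( $raw );
				continue;
			}
			if ( $outside ) {
				$signature[] = implode( ' > ', $stack ) . ' | ' . $raw;
			}
			$stack[] = $name;
			continue;
		}

		// Text nodes and closers are recorded only when outside the target.
		if ( $outside ) {
			$signature[] = implode( ' > ', $stack ) . ' | ' . $raw;
		}
	}

	if ( null !== $target_index ) {
		$inner_end = strlen( $html ); // Target still open at EOF.
	}

	return array( $signature, $inner_start, $inner_end );
}

/**
 * Returns the spliced source when the outside signature is unchanged,
 * otherwise false (mirroring the "leave the source unchanged" contract).
 */
function toy_set_inner_html( string $source, int $opener_at, string $replacement ) {
	list( $original_signature, $start, $end ) = toy_walk( $source, $opener_at );
	if ( null === $start ) {
		return false; // No tag opener at that offset.
	}

	$candidate = substr_replace( $source, $replacement, $start, $end - $start );
	list( $candidate_signature ) = toy_walk( $candidate, $opener_at );

	return $original_signature === $candidate_signature ? $candidate : false;
}
```

With the source `<div><p>old</p><span>tail</span></div>` and the `<p>` opener at byte offset 5, the replacement `new <em>text</em>` is accepted in this model, while `x</div><section>y` is rejected because the `</div>` closes an ancestor and shifts the breadcrumbs of every following token. The real implementation needs the full parser for this comparison, since the toy cannot see effects such as active formatting reconstruction.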
Performance notes / possible optimizations
The current version prioritizes correctness and simplicity, but it does extra work. It can parse the original once to find the target end, parse the original again for the outside signature, and parse the candidate for the candidate outside signature.
Potential follow-ups:
Merge original passes. The end-finding pass and original outside-signature pass could likely be combined into one original parse that finds inner_end, records the outside signature, and detects original-side BODY/HTML hoist events.
Stop at target close with a complete parser-state signature. Instead of parsing through EOF, compare original/candidate state immediately after the target closes. This could avoid reparsing an unchanged suffix, but only if the state signature is complete: open elements, active formatting elements, insertion mode/template mode, namespace/integration state, form/head/frameset state, and hoist events. This is the highest-risk optimization because omitting one state field could make the check unsound.
Fast path for text-only replacements. If the replacement contains no markup introducers, common cases could skip candidate reparsing after inner_end is known, because plain text cannot introduce closers, active formatting changes, table repairs, or body/html hoists.
Cheap pre-scan for obvious rejects. A lexical scan for high-risk constructs like target closers, <body, <html, nested non-nestable tags, or unclosed active formatting elements could reject many invalid replacements before constructing/parsing the full candidate. This would be an optimization only; the full parser validation should remain authoritative.
Cache target metadata. If repeated attempts are made at the same processor position, the computed original target end/signature metadata could be reused.
The safest near-term optimization is merging the original passes. The largest potential win is stopping at target close with a complete parser-state comparison, but that requires careful proof that the state snapshot fully determines parsing of the unchanged suffix.
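As a rough illustration of the text-only fast path and the lexical pre-scan, the sketch below classifies a proposed replacement before any reparsing. The function name `toy_triage_replacement()` and its return labels are hypothetical and not part of this PR. It assumes that `<` is the only markup introducer that matters, and it treats target closers and `<body` / `<html` openers as high risk. Since some of those matches are legitimate (template content and foreign content are noted above as safe), it only classifies and leaves the decision to the full parser validation.

```php
<?php
// Hypothetical triage helper, not part of WP_HTML_Processor.

/**
 * Classifies a replacement as 'text-only', 'high-risk', or 'needs-validation'.
 *
 * $target_tag is the tag name of the element being replaced, e.g. 'p'.
 */
function toy_triage_replacement( string $replacement, string $target_tag ): string {
	// Assumption: without '<', no tag, comment, or closer can start here.
	if ( false === strpos( $replacement, '<' ) ) {
		return 'text-only';
	}

	// Target closers and BODY/HTML openers, matched on a tag-name boundary
	// so that names like <bodyguard> are not flagged.
	$risky = '~<(?:/' . preg_quote( $target_tag, '~' ) . '|body|html)(?=[\s/>])~i';
	if ( 1 === preg_match( $risky, $replacement ) ) {
		return 'high-risk';
	}

	return 'needs-validation';
}
```

A caller could skip the candidate reparse for `'text-only'` and prioritize or short-circuit `'high-risk'` inputs, but in both cases that is a design choice to be proven separately; this sketch makes no claim about the target's parsing context.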
Validation