Description
Parent: #563
Context
lol_html fragments text nodes across input chunk boundaries when processing HTML incrementally. Script rewriters (NextJsNextDataRewriter, GoogleTagManagerIntegration) currently expect complete text content — if a domain string like "googletagmanager.com" is split across chunks, the rewrite silently fails.
Phase 1 works around this with a dual-mode HtmlRewriterAdapter: streaming mode when no script rewriters are registered, buffered mode when they are. This means streaming only benefits configs without GTM/NextJS script rewriters.
Phase 3 makes the rewriters themselves fragment-safe, enabling streaming for ALL configurations.
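The failure mode described above can be reproduced without lol_html. The sketch below is only an illustration: `rewrite_per_chunk` is a hypothetical stand-in for a rewriter that treats each fragment as complete text, and `proxy.example` is a made-up replacement domain.

```rust
/// Hypothetical stand-in for a rewriter that is NOT fragment-safe:
/// it applies the replacement to each chunk independently.
fn rewrite_per_chunk(chunks: &[&str], from: &str, to: &str) -> String {
    chunks
        .iter()
        // Each chunk is searched in isolation, so a match that straddles
        // two chunks is never seen.
        .map(|chunk| chunk.replace(from, to))
        .collect()
}
```

When the domain arrives whole, the replacement is applied. When it is split as `"...googletag"` followed by `"manager.com..."`, neither chunk contains the full needle, so the text passes through unchanged and no error is raised.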
Approach
Each script rewriter accumulates text fragments internally via is_last_in_text_node, then operates on the complete text. Key considerations:
- Intermediate fragments must return Replace("") (not Keep) to suppress output, since the full accumulated text is emitted on the final fragment
- When the rewriter returns Keep on the full text but fragments were suppressed, it must emit Replace(full_text) to restore the content
- When text is NOT fragmented (single fragment), return Keep as before, with no unnecessary replacement
- Multiple rewriters on the same selector (e.g., NextJsNextDataRewriter on script#__NEXT_DATA__ + NextJsRscPlaceholderRewriter on script) each accumulate independently; the last text.replace() wins, same as current behavior
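The rules above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the project's code: `Action`, `FragmentAccumulator`, and `on_fragment` are hypothetical names, and the `rewrite` closure stands in for a rewriter's existing logic, returning `Some(new_text)` when it changes the text and `None` when it would keep it.

```rust
use std::sync::Mutex;

/// Hypothetical action type mirroring the Keep / Replace outcomes named above.
#[derive(Debug, PartialEq)]
enum Action {
    Keep,
    Replace(String),
}

/// Hypothetical helper that buffers fragments until the final one arrives.
struct FragmentAccumulator {
    buffer: Mutex<String>,
}

impl FragmentAccumulator {
    fn new() -> Self {
        Self { buffer: Mutex::new(String::new()) }
    }

    fn on_fragment<F>(&self, fragment: &str, is_last: bool, rewrite: F) -> Action
    where
        F: FnOnce(&str) -> Option<String>,
    {
        let mut buf = self.buffer.lock().unwrap();

        if !is_last {
            // Intermediate fragment: stash it and suppress its output.
            buf.push_str(fragment);
            return Action::Replace(String::new());
        }

        if buf.is_empty() {
            // Nothing was suppressed earlier: behave exactly as before.
            return match rewrite(fragment) {
                Some(new_text) => Action::Replace(new_text),
                None => Action::Keep,
            };
        }

        // Final fragment of fragmented text: rewrite the full text and
        // reset the buffer for the next text node.
        buf.push_str(fragment);
        let full = std::mem::take(&mut *buf);
        match rewrite(&full) {
            Some(new_text) => Action::Replace(new_text),
            // Earlier fragments were suppressed, so "keep" must re-emit them.
            None => Action::Replace(full),
        }
    }
}
```

The buffer is cleared with `std::mem::take` on the final fragment so that one accumulator instance can serve consecutive text nodes. The `Mutex` follows the issue's task list; whether interior mutability is actually required depends on the rewriter trait's receiver type, which is not shown here.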
Tasks
- Add Mutex<String> accumulation to NextJsNextDataRewriter
- Add Mutex<String> accumulation to GoogleTagManagerIntegration
- Remove new_buffered() from HtmlRewriterAdapter (always stream)
- Remove has_script_rewriters gate from create_html_processor
- Add small-chunk-size regression tests:
  - __NEXT_DATA__ rewrite with text split across chunk boundaries
  - GTM inline script rewrite with domain split across chunk boundaries
- Full verification
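One possible shape for the small-chunk regression tests is to feed the same input at every chunk size and require identical output. The sketch below assumes nothing about the real test harness: `split_into_chunks` and `rewrite_accumulated` are hypothetical helpers, and the latter is a simplified stand-in for a fragment-safe rewriter.

```rust
/// Hypothetical test helper: splits `text` into pieces of at most
/// `max_bytes` bytes without cutting inside a UTF-8 character.
fn split_into_chunks(text: &str, max_bytes: usize) -> Vec<&str> {
    assert!(max_bytes > 0, "chunk size must be positive");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = (start + max_bytes).min(text.len());
        // Back up to the nearest character boundary.
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        if end == start {
            // The next character is wider than max_bytes: take it whole.
            end = start + text[start..].chars().next().unwrap().len_utf8();
        }
        chunks.push(&text[start..end]);
        start = end;
    }
    chunks
}

/// Simplified stand-in for a fragment-safe rewriter: buffers every chunk
/// and applies the replacement once, on the last chunk.
fn rewrite_accumulated(chunks: &[&str], from: &str, to: &str) -> String {
    let mut buffer = String::new();
    let mut output = String::new();
    for (i, chunk) in chunks.iter().enumerate() {
        buffer.push_str(chunk);
        let is_last = i + 1 == chunks.len();
        if is_last {
            output.push_str(&buffer.replace(from, to));
        }
        // Intermediate chunks emit nothing.
    }
    output
}
```

A test built on these helpers would loop over chunk sizes from 1 up to the input length and compare each result against the single-chunk result, which covers every possible split position of the domain string.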
Acceptance Criteria
- All script rewriters produce correct output regardless of chunk boundaries
- HtmlRewriterAdapter always streams (no buffered mode)
- Streaming benefits all configurations, not just those without script rewriters
- All existing tests pass