Restore stealth plugin and fix fetcher language configuration#1247

Open
clementbiron wants to merge 24 commits into main from fetcher-language-config

@clementbiron clementbiron commented May 26, 2026

Member

This PR reactivates puppeteer-extra-plugin-stealth in the full DOM fetcher. Since v10.3.1 (commit 5e4945d2), puppeteer.use(stealthPlugin(...)) had been called inside fetch() rather than before puppeteer.launch(), so puppeteer-extra never bound its onPageCreated hooks and the plugin was silently inert. As a result, navigator.webdriver and the HeadlessChrome user agent were exposed to every tracked service for around four months.
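The ordering problem can be illustrated with a toy model. This is only a sketch of the behaviour described above, assuming hooks are bound once at launch time: `ToyLauncher`, `ToyBrowser` and `toyStealth` are hypothetical stand-ins written for this illustration and are not part of puppeteer-extra or of the engine.

```javascript
// Toy model only: ToyLauncher, ToyBrowser and toyStealth are hypothetical
// stand-ins, not the puppeteer-extra API. They mimic one behaviour described
// in this PR: plugin hooks are bound to a browser once, at launch time.

class ToyBrowser {
  constructor(pageCreatedHooks) {
    // Snapshot of the hooks that were known when the browser was launched.
    this.pageCreatedHooks = [...pageCreatedHooks];
  }

  newPage() {
    // An unpatched headless page exposes automation markers.
    const page = { webdriver: true, userAgent: 'HeadlessChrome' };

    for (const hook of this.pageCreatedHooks) {
      hook(page);
    }

    return page;
  }
}

class ToyLauncher {
  constructor() {
    this.plugins = [];
  }

  use(plugin) {
    this.plugins.push(plugin);

    return this;
  }

  launch() {
    // Only plugins registered before this call get their hooks bound.
    return new ToyBrowser(this.plugins.map(plugin => plugin.onPageCreated));
  }
}

// Hypothetical stand-in for the stealth evasions.
const toyStealth = {
  onPageCreated(page) {
    page.webdriver = false;
    page.userAgent = 'Chrome';
  },
};

// Broken order: launch first, register the plugin during the fetch step.
function fetchWithLateRegistration() {
  const launcher = new ToyLauncher();
  const browser = launcher.launch();

  launcher.use(toyStealth); // Too late: this browser never sees the hook.

  return browser.newPage();
}

// Fixed order: register the plugin once, before launching.
function fetchWithEarlyRegistration() {
  const launcher = new ToyLauncher().use(toyStealth);
  const browser = launcher.launch();

  return browser.newPage();
}
```

In this model, `fetchWithLateRegistration()` returns a page that still has `webdriver: true`, while `fetchWithEarlyRegistration()` returns a patched page. Nothing throws in the broken case, which is consistent with the regression going unnoticed.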

It also applies the @opentermsarchive/engine.fetcher.language configuration to navigator.language and navigator.languages through the stealth sub-evasions, not just to the Accept-Language HTTP header, which it already affected. Until now, the stealth wrapper's locale option was silently ignored. The redundant setExtraHTTPHeaders and CDP Network.setUserAgentOverride calls in configurePage are removed: they duplicated stealth's path with subtly different results.

Breaking: quality factors (;q=...) are no longer accepted in @opentermsarchive/engine.fetcher.language. Previously accepted values such as en-IE,en-GB;q=0.9,en;q=0.8 now throw at launchHeadlessBrowser. Combined with the now-removed setExtraHTTPHeaders call, those values used to produce malformed headers like q=0.9;q=0.9, and the surface area was not worth preserving. Provide a plain comma-separated priority list instead (e.g. en-IE,en-GB,en); the browser derives the Accept-Language quality factors from tag order.

As a side effect of applying the configuration to navigator.languages, language: "en" now exposes navigator.languages as ["en"] instead of the previous default ["en-US", "en"]. Setting language: "en-US,en" restores the previous default.
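The new configuration contract described in the two paragraphs above can be sketched as follows. This is a minimal illustration and not the engine's code: `parseLanguagePriorityList` and `deriveAcceptLanguage` are hypothetical names, the exact validation performed at launchHeadlessBrowser may differ, and the derivation rule (implicit q=1 for the first tag, then 0.1 less per tag, never below 0.1) is an assumption about how the browser orders quality factors.

```javascript
// Hypothetical helpers for illustration; not the engine's implementation.

// Split a configured language value into an ordered list of tags.
// Values carrying quality factors (";q=...") are rejected.
function parseLanguagePriorityList(value) {
  if (typeof value !== 'string' || value.trim() === '') {
    throw new TypeError('language must be a non-empty string');
  }

  if (value.includes(';')) {
    throw new Error(`Quality factors are not accepted in "${value}"; use a plain list such as "en-IE,en-GB,en"`);
  }

  const tags = value.split(',').map(tag => tag.trim());

  if (tags.some(tag => tag === '')) {
    throw new Error(`Empty language tag in "${value}"`);
  }

  return tags;
}

// Assumed derivation rule: the first tag has an implicit q=1, each following
// tag gets 0.1 less, and the value never drops below 0.1.
function deriveAcceptLanguage(tags) {
  return tags
    .map((tag, index) => {
      if (index === 0) {
        return tag;
      }

      const tenths = Math.max(10 - index, 1);

      return `${tag};q=0.${tenths}`;
    })
    .join(',');
}
```

With these helpers, `parseLanguagePriorityList('en')` yields a single-entry list, matching the navigator.languages side effect noted above, and `deriveAcceptLanguage(parseLanguagePriorityList('en-IE,en-GB,en'))` yields `en-IE,en-GB;q=0.9,en;q=0.8`. Passing that derived string back in as configuration throws, which is the breaking change.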

@clementbiron clementbiron requested a review from Ndpnt May 26, 2026 14:52
Comment thread src/archivist/fetcher/fullDomFetcher.js Outdated
Comment thread src/archivist/fetcher/fullDomFetcher.test.js Outdated
@clementbiron clementbiron requested a review from Ndpnt May 27, 2026 13:17
Comment thread src/archivist/fetcher/fullDomFetcher.test.js Outdated
Comment thread src/archivist/fetcher/fullDomFetcher.test.js Outdated
Comment thread CHANGELOG.md Outdated
@clementbiron clementbiron requested a review from Ndpnt June 8, 2026 12:07
@clementbiron clementbiron force-pushed the fetcher-language-config branch from a583677 to 220ab3c June 15, 2026 13:42
@clementbiron clementbiron force-pushed the fetcher-language-config branch from 220ab3c to fef5ad3 June 15, 2026 13:49
@Ndpnt Ndpnt force-pushed the fetcher-language-config branch from 8cbb020 to 8c1bbdb June 17, 2026 13:53
2 participants