Fix: Browser context leak causes Chrome crash and cascading test failures by georgephillips · Pull Request #50 · Pagefind/toolproof

georgephillips · 2026-04-08T07:40:23Z

Summary

When running a large test suite (~240 tests) at concurrency 4, Chrome becomes unresponsive after ~50 tests, causing Failed to close browser window: Timeout panics that cascade into mass test failures. The root cause is a browser context leak — each test creates a CDP browser context but never disposes it, causing Chrome to accumulate dozens of zombie contexts until it crashes.

Root cause

In get_window() (definitions/browser/mod.rs), each test creates a new browser context via create_browser_context(), but the context ID is stored in a local variable and dropped after the page is created. Only the Page is returned as BrowserWindow::Chrome(page).

At shutdown, Civilization::shutdown() calls page.close() (CDP Target.closeTarget), which closes the tab but does not dispose the browser context. The dispose_on_detach: true flag only fires when the CDP WebSocket session disconnects (at process exit), not when the page closes.

After ~50 tests, Chrome carries ~50 empty but alive browser contexts with their associated memory/renderer processes. Chrome hits a resource limit, the WebSocket stalls, and all concurrent workers' CDP commands timeout simultaneously.

Observed behavior

~53 tests pass normally
Four thread 'tokio-runtime-worker' panicked messages appear at civilization.rs with Failed to close browser window: Timeout
Every remaining test fails with Step timed out after 30s on the browser load step
Main thread panics at main.rs with Failed to await all tests
Process exits with code 101 (no summary printed)

Fix (three changes)

1. Fix the context leak

Changed BrowserWindow::Chrome to store the BrowserContextId and an Arc<Browser> alongside the page. Updated shutdown() to explicitly call browser.dispose_browser_context(context_id) after closing the page.

2. Graceful error handling on window close

Replaced .expect("Failed to close browser window") in Civilization::shutdown() with error logging and a 5-second timeout guard. Browser teardown failures are now warnings, not panics.

3. Graceful JoinError handling in main

Replaced panic!("Failed to await all tests: {e}") with a filter_map that logs panicked tasks to stderr and continues collecting results.

Files changed

toolproof/src/definitions/browser/mod.rs — BrowserWindow::Chrome now holds { page, context_id, browser }; all match arms updated
toolproof/src/civilization.rs — shutdown disposes context, with timeout + error handling
toolproof/src/main.rs — JoinError logged instead of panicking

Verification

Tested against a ~240 test suite that reliably reproduced the bug:

	Before	After
Panics	3 (`Failed to close browser window: Timeout`)	0
Cascading timeouts	8+ (`Step timed out after 30s`)	0
Process exit	Code 101 (panic)	Code 1 (normal, with summary)
Tests passing	~53 before crash	222

The 18 remaining failures after the fix are real test-level issues (snapshot changes, selector timeouts), not infrastructure cascades.

Environment

macOS (darwin 25.3.0)
Chrome (headless, via chromiumoxide 0.7)
Concurrency: 4 (default 10)
Affected 2 out of ~6 team members (likely Chrome version dependent)

…d enhanced error logging. Updated BrowserWindow structure to include context ID and browser reference.

Improved shutdown process for Chrome windows with timeout handling an…

84f3ec4

…d enhanced error logging. Updated BrowserWindow structure to include context ID and browser reference.

georgephillips requested a review from bglw as a code owner April 8, 2026 07:40

Upgrade chromium

15e895b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Browser context leak causes Chrome crash and cascading test failures#50

Fix: Browser context leak causes Chrome crash and cascading test failures#50
georgephillips wants to merge 2 commits intomainfrom
close-browsers-gracefully

georgephillips commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

georgephillips commented Apr 8, 2026

Summary

Root cause

Observed behavior

Fix (three changes)

1. Fix the context leak

2. Graceful error handling on window close

3. Graceful JoinError handling in main

Files changed

Verification

Environment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants