Skip to content

Fix: Browser context leak causes Chrome crash and cascading test failures#50

Open
georgephillips wants to merge 2 commits intomainfrom
close-browsers-gracefully
Open

Fix: Browser context leak causes Chrome crash and cascading test failures#50
georgephillips wants to merge 2 commits intomainfrom
close-browsers-gracefully

Conversation

@georgephillips
Copy link
Copy Markdown
Contributor

Summary

When running a large test suite (~240 tests) at concurrency 4, Chrome becomes unresponsive after ~50 tests, causing Failed to close browser window: Timeout panics that cascade into mass test failures. The root cause is a browser context leak — each test creates a CDP browser context but never disposes it, causing Chrome to accumulate dozens of zombie contexts until it crashes.

Root cause

In get_window() (definitions/browser/mod.rs), each test creates a new browser context via create_browser_context(), but the context ID is stored in a local variable and dropped after the page is created. Only the Page is returned as BrowserWindow::Chrome(page).

At shutdown, Civilization::shutdown() calls page.close() (CDP Target.closeTarget), which closes the tab but does not dispose the browser context. The dispose_on_detach: true flag only fires when the CDP WebSocket session disconnects (at process exit), not when the page closes.

After ~50 tests, Chrome carries ~50 empty but alive browser contexts with their associated memory/renderer processes. Chrome hits a resource limit, the WebSocket stalls, and all concurrent workers' CDP commands timeout simultaneously.

Observed behavior

  1. ~53 tests pass normally
  2. Four thread 'tokio-runtime-worker' panicked messages appear at civilization.rs with Failed to close browser window: Timeout
  3. Every remaining test fails with Step timed out after 30s on the browser load step
  4. Main thread panics at main.rs with Failed to await all tests
  5. Process exits with code 101 (no summary printed)

Fix (three changes)

1. Fix the context leak

Changed BrowserWindow::Chrome to store the BrowserContextId and an Arc<Browser> alongside the page. Updated shutdown() to explicitly call browser.dispose_browser_context(context_id) after closing the page.

2. Graceful error handling on window close

Replaced .expect("Failed to close browser window") in Civilization::shutdown() with error logging and a 5-second timeout guard. Browser teardown failures are now warnings, not panics.

3. Graceful JoinError handling in main

Replaced panic!("Failed to await all tests: {e}") with a filter_map that logs panicked tasks to stderr and continues collecting results.

Files changed

  • toolproof/src/definitions/browser/mod.rsBrowserWindow::Chrome now holds { page, context_id, browser }; all match arms updated
  • toolproof/src/civilization.rs — shutdown disposes context, with timeout + error handling
  • toolproof/src/main.rs — JoinError logged instead of panicking

Verification

Tested against a ~240 test suite that reliably reproduced the bug:

Before After
Panics 3 (Failed to close browser window: Timeout) 0
Cascading timeouts 8+ (Step timed out after 30s) 0
Process exit Code 101 (panic) Code 1 (normal, with summary)
Tests passing ~53 before crash 222

The 18 remaining failures after the fix are real test-level issues (snapshot changes, selector timeouts), not infrastructure cascades.

Environment

  • macOS (darwin 25.3.0)
  • Chrome (headless, via chromiumoxide 0.7)
  • Concurrency: 4 (default 10)
  • Affected 2 out of ~6 team members (likely Chrome version dependent)

…d enhanced error logging. Updated BrowserWindow structure to include context ID and browser reference.
@georgephillips georgephillips requested a review from bglw as a code owner April 8, 2026 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants