test(ci): drain event loop with setImmediate after every test (mitigation hypothesis) #7844
JohnMcLear wants to merge 1 commit into
Mitigation hypothesis for the Windows backend silent ELIFECYCLE flake.

Ten captured deaths so far on the merged diagnostic infrastructure (#7838, #7842) show a consistent shape: 200-400 ms after a test starts, the heartbeat (5 Hz setInterval) goes silent for the entire death window, which is clear evidence the event loop has stopped servicing timers. The process is then externally terminated, bypassing all of Node's JS-level handlers, --report-on-fatalerror, and --report-uncaught-exception. Pre-kill state in the libuv handle trace is nominal (3-7 handles, no leak, no spike).

Dying tests span supertest+JWT HTTP, socket.io connect bursts, and DOCX export round-trips: different surface code, same fingerprint. The common substrate is rapid loopback TCP and queued I/O across test boundaries.

Insert a single setImmediate yield in the mocha root afterEach so the event loop has a deterministic drain point at every test boundary. If the kill rate drops materially on the Windows backend matrix after this lands, cumulative event-loop pressure is the trigger and we have a working mitigation; if it doesn't change, we rule that out and look at per-test pathologies (jose CNG, specific Express middleware paths, etc.).

Cost: ~600 tests × 1 setImmediate ≈ negligible compared to the multi-minute backend test phase.

Locally verified: a 3-test probe runs cleanly with the new async afterEach.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
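The change described in this commit message can be sketched roughly as follows. This is a minimal illustration, assuming the hook lives in a mocha root hook plugin file loaded via `--require`; the helper name `yieldToEventLoop` is hypothetical and the actual contents of `diagnostics.ts` may differ.

```typescript
// Sketch only: the real diagnostics.ts is not shown in this PR page.
// `yieldToEventLoop` is a hypothetical helper name used for illustration.

/**
 * Resolve from a setImmediate callback. setImmediate callbacks run in the
 * event loop's check phase, so awaiting this gives the loop a chance to run
 * pending I/O callbacks before the caller continues.
 */
export const yieldToEventLoop = (): Promise<void> =>
  new Promise<void>((resolve) => setImmediate(resolve));

// Mocha treats an exported `mochaHooks` object as root-level hooks,
// so this afterEach runs after every test in every spec file.
export const mochaHooks = {
  async afterEach(): Promise<void> {
    await yieldToEventLoop();
  },
};
```

Because the hook is async, mocha waits for the returned promise before starting the next test, which is what makes the yield a boundary between tests rather than a fire-and-forget callback.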
Review Summary by Qodo
Drain event loop with setImmediate after every test
Walkthroughs
Description
• Add setImmediate yield in mocha afterEach hook
• Drain event loop at every test boundary deterministically
• Mitigate Windows backend silent ELIFECYCLE flake hypothesis
• Cumulative I/O pressure across tests suspected root cause
Diagram
flowchart LR
A["Test execution"] -->|afterEach hook| B["setImmediate yield"]
B -->|event loop drain| C["Queued I/O callbacks processed"]
C -->|deterministic boundary| D["Next test starts clean"]
E["Hypothesis: cumulative I/O pressure"] -->|mitigation| B
File Changes
1. src/tests/backend/diagnostics.ts
Code Review by Qodo
1. Global drain not gated
Closing — the setImmediate drain in afterEach causes real plugin-test regressions (ep_subscript_and_superscript's |
… to etherpad's own backend specs

PR #7854's first iteration added the yield to 6 known-dying spec files (pad.ts, importexportGetPost.ts, socketio.ts, messages.ts, import.ts, clientvar_rev_consistency.ts). The Linux backend matrix passed, proving the yield doesn't break the affected tests' own state-sharing assumptions. But the very next Win+plugins run captured **death #13 in sessionsAndGroups.ts**, a 7th file outside the scoped fix. The flake migrated rather than being suppressed. That's strong evidence the trigger is the rapid-sequential-test pattern in general, not specific files.

Replace the per-file scope with a root-level `mochaHooks.beforeEach` yield in diagnostics.ts, gated on a file-path check: yield for ether/etherpad's own specs in `tests/backend/specs/`, SKIP for plugin tests loaded from `../node_modules/ep_*/static/tests/backend/specs/`.

The plugin-test skip exists because PR #7844 demonstrated that an unconditional global yield breaks `ep_subscript_and_superscript`'s `returns HTML with Subscript HTML tags` series: those plugin tests share state across describe-block boundaries and don't tolerate any microtask reordering. The file-path check preserves PR #7844's finding without re-breaking those tests.

Files modified:
- src/tests/backend/diagnostics.ts: root beforeEach yield, scoped

Per-file changes from the previous commit are reverted; root scope supersedes them and there's no point yielding twice per test.

Test plan unchanged from the original PR:
- Linux ± plugins must pass.
- Windows ± plugins flake rate: ~22% pre-fix. Post-fix, run the CI 5-10x and compare. If unchanged, cadence is ruled out as the trigger and we look at per-test pathologies (jose CNG on Windows, libuv IOCP edge cases unrelated to load).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
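The file-path gate described in this commit message might look something like the sketch below. It is an illustration under stated assumptions, not the code from PR #7854: the function name `shouldYield`, the `HookContext` type, and the choice to skip the yield when no file path is available are all hypothetical. It relies on mocha exposing the spec file path as `this.currentTest.file` inside a `beforeEach` hook.

```typescript
// Sketch only: names and exact matching rules are assumptions for illustration.

/** Decide whether a test from the given spec file should get the yield. */
export const shouldYield = (file: string | undefined): boolean => {
  if (!file) return false; // assumption: no path information means no yield
  // Normalise Windows separators so one check covers both platforms.
  const normalised = file.replace(/\\/g, "/");
  // Plugin specs live under node_modules/ep_*/... and must be skipped.
  if (/\/node_modules\/ep_[^/]+\//.test(normalised)) return false;
  // Etherpad's own backend specs.
  return normalised.includes("/tests/backend/specs/");
};

// Minimal shape of the mocha hook context used here.
interface HookContext {
  currentTest?: { file?: string };
}

export const mochaHooks = {
  // A regular method (not an arrow function) so mocha can bind `this`.
  async beforeEach(this: HookContext): Promise<void> {
    if (shouldYield(this.currentTest?.file)) {
      await new Promise<void>((resolve) => setImmediate(resolve));
    }
  },
};
```

The plugin check runs first because plugin spec paths also contain `tests/backend/specs/`, so testing for that substring alone would not tell the two groups apart.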
Summary
Insert a single `setImmediate` yield in the mocha root `afterEach` so the event loop drains queued I/O callbacks at every test boundary. Pure mitigation hypothesis test; no other changes.

Why
Ten captured deaths on the merged diagnostic infrastructure (#7838, #7842) show a consistent shape:
- The heartbeat `setInterval` goes silent for the entire death window (no timer events at all).
- The process is then terminated externally, bypassing `unhandledRejection`, `uncaughtException`, `--report-on-fatalerror`, `--report-uncaught-exception`, and all signal handlers.

The common substrate across all 10 deaths is rapid loopback TCP and queued I/O across test boundaries. This experiment tests one specific hypothesis: cumulative event-loop pressure across tests is the trigger. A `setImmediate` yield in `afterEach` forces a deterministic drain at every boundary instead of letting work stack across tests.

Expected outcome
- If the kill rate drops materially on the Windows backend matrix, cumulative event-loop pressure is the trigger and this is a working mitigation.
- If it doesn't change, that hypothesis is ruled out and the next step is per-test pathologies (jose CNG, specific Express middleware paths, etc.).

Cost
~600 tests × 1 `setImmediate` yield ≈ negligible compared to the multi-minute backend test phase. Locally verified: a 3-test probe runs cleanly with the new async `afterEach`.

🤖 Generated with Claude Code
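The cost estimate can be sanity-checked locally with a rough timing sketch like the one below. The function name `timeYields` is hypothetical, and the measurement covers only the bare yields on an otherwise idle loop, so it is a lower bound rather than a prediction of the overhead inside a real test run.

```typescript
// Sketch only: times N sequential setImmediate yields on an idle event loop.

const yieldOnce = (): Promise<void> =>
  new Promise<void>((resolve) => setImmediate(resolve));

export const timeYields = async (
  count: number,
): Promise<{ count: number; elapsedMs: number }> => {
  const start = performance.now();
  for (let i = 0; i < count; i++) {
    await yieldOnce(); // one yield per simulated test boundary
  }
  return { count, elapsedMs: performance.now() - start };
};

// Usage: roughly one yield per test in a ~600-test suite.
timeYields(600).then(({ count, elapsedMs }) => {
  console.log(`${count} yields took ${elapsedMs.toFixed(2)} ms`);
});
```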