fix(l1-tx-utils, e2e_ha_full): unblock node.stop() on interrupt + tryStop timeout #23540
Draft
AztecBot wants to merge 2 commits into
…_ha_full

Adds an optional `timeoutMs` argument to `tryStop` so that callers sharing a teardown loop with services that can hang indefinitely (e.g. a sequencer awaiting an L1 publish on a test-warped clock) can cap the wait per service. The inline `Promise.race` + `sleep` from the `afterAll` in #23539 moves into the helper.
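A minimal sketch of a `tryStop` with the optional `timeoutMs` described above, assuming hypothetical `Stoppable` and `Logger` shapes; the real stdlib helper may use different types and log messages. It races `service.stop()` against a timer and, on timeout, logs the service name and returns while the stop keeps running in the background.

```typescript
// Hypothetical minimal shapes; the real stdlib types may differ.
interface Stoppable {
  stop(): Promise<void>;
}

interface Logger {
  error(msg: string, err?: unknown): void;
}

/**
 * Sketch of tryStop with an optional per-service timeout. Errors from stop()
 * are logged rather than rethrown, so one failing service does not abort the
 * rest of a teardown loop.
 */
async function tryStop(service: Stoppable, logger: Logger, timeoutMs?: number): Promise<void> {
  const name = service.constructor.name;
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    // Tag a successful stop so it can be told apart from the timeout below.
    const stopping = service.stop().then(() => 'stopped' as const);
    if (timeoutMs === undefined) {
      await stopping; // unchanged behavior: wait as long as stop() takes
      return;
    }
    const timedOut = new Promise<'timeout'>(resolve => {
      timer = setTimeout(() => resolve('timeout'), timeoutMs);
    });
    if ((await Promise.race([stopping, timedOut])) === 'timeout') {
      logger.error(`${name} did not stop within ${timeoutMs}ms`);
      // stop() keeps running in the background; log a late failure rather than drop it.
      stopping.catch(err => logger.error(`${name} failed to stop after the timeout`, err));
    }
  } catch (err) {
    logger.error(`Error stopping ${name}`, err);
  } finally {
    clearTimeout(timer);
  }
}
```

In a teardown loop this might look like `for (const service of services) await tryStop(service, logger, 30_000);`, so a single hung service costs at most 30 seconds instead of the whole hook budget.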
…rrupt

The publisher's monitor loop already checked `interrupted` via `isTxTimedOut`, but only after every viem RPC await returned. When the L1 endpoint stalled (anvil paused after a test-driven warp, mismatched clocks, etc.), the next `getTransactionCount` or `getBlock` call could hang indefinitely, so `sequencer.stop()` blocked waiting for the publish to settle.

Two changes:

- `ReadOnlyL1TxUtils` now holds an `InterruptibleSleep` and a list of pending await listeners; `interrupt()` wakes the sleep and rejects every wrapped await with `InterruptError`. A small `raceInterrupt(promise)` helper lets callers race viem awaits against the interrupt signal.
- `monitorTransaction` wraps `getL1Timestamp`, `getTransactionCount`, and `tryGetTxReceipt` with `raceInterrupt`, uses `interruptibleSleep.sleep` instead of plain `sleep`, and short-circuits at the top of each iteration if `interrupted` is already set.

An `InterruptError` from a wrapped await falls through the existing timeout-handling path, so callers still see `TimeoutError` exactly as before.

This shrinks `node.stop()` from minutes (worst case) to milliseconds when a publish is in flight, and fixes the merge-train dequeue caused by the `afterAll` in `e2e_ha_full.test.ts` burning its 20-minute hook budget on a single HA peer's L1 publish.
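The interrupt mechanism described in this commit can be sketched as follows. `InterruptibleRpc` and `monitor` are hypothetical names, and `InterruptError`, `TimeoutError`, and `InterruptibleSleep` are simplified stand-ins for the project's own classes, not their actual implementations. The `isMined` callback stands in for the `getL1Timestamp` / `getTransactionCount` / `tryGetTxReceipt` round-trips.

```typescript
// Simplified stand-ins for the project's InterruptError / TimeoutError /
// InterruptibleSleep; the real classes may differ.
class InterruptError extends Error {
  constructor(message = 'Interrupted') {
    super(message);
    this.name = 'InterruptError';
  }
}

class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TimeoutError';
  }
}

class InterruptibleSleep {
  private readonly wakers = new Set<() => void>();

  /** Resolves after `ms`, or immediately when interrupt() is called. */
  sleep(ms: number): Promise<void> {
    return new Promise(resolve => {
      const wake = () => {
        clearTimeout(timer);
        this.wakers.delete(wake);
        resolve();
      };
      const timer = setTimeout(wake, ms);
      this.wakers.add(wake);
    });
  }

  interrupt(): void {
    for (const wake of [...this.wakers]) wake();
  }
}

/** Hypothetical class mirroring the interrupt plumbing described for ReadOnlyL1TxUtils. */
class InterruptibleRpc {
  private interrupted = false;
  private readonly interruptibleSleep = new InterruptibleSleep();
  private readonly listeners = new Set<(err: InterruptError) => void>();

  interrupt(): void {
    this.interrupted = true;
    this.interruptibleSleep.interrupt();
    // Reject every in-flight wrapped await, even if its RPC never settles.
    for (const reject of [...this.listeners]) reject(new InterruptError());
    this.listeners.clear();
  }

  /** Races an arbitrary promise (e.g. a viem RPC call) against interrupt(). */
  raceInterrupt<T>(promise: Promise<T>): Promise<T> {
    if (this.interrupted) {
      promise.catch(() => {}); // avoid an unhandled rejection from the abandoned call
      return Promise.reject(new InterruptError());
    }
    return new Promise<T>((resolve, reject) => {
      this.listeners.add(reject);
      promise.then(resolve, reject).finally(() => this.listeners.delete(reject));
    });
  }

  /**
   * Simplified monitor loop: polls `isMined` until it returns true or `deadlineMs`
   * elapses. An interrupt is converted to TimeoutError, the contract callers see.
   */
  async monitor(isMined: () => Promise<boolean>, pollMs: number, deadlineMs: number): Promise<void> {
    const start = Date.now();
    try {
      while (Date.now() - start <= deadlineMs) {
        if (this.interrupted) throw new InterruptError(); // short-circuit each iteration
        if (await this.raceInterrupt(isMined())) return;
        await this.interruptibleSleep.sleep(pollMs);
      }
    } catch (err) {
      if (!(err instanceof InterruptError)) throw err;
      // Fall through to the timeout path below.
    }
    throw new TimeoutError('Transaction monitoring timed out or was interrupted');
  }
}
```

In this sketch, `interrupt()` rejects the wrapper promise rather than the underlying RPC call, so a stalled request is abandoned rather than cancelled: it may still complete later, but nothing awaits it any more, which is what lets `stop()` return promptly.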
Two commits. Keep them separate at merge with the `ci-no-squash` label.

1. refactor(stdlib): add optional `timeoutMs` to `tryStop` and use it in `e2e_ha_full`
Follow-up to #23539. That PR open-coded `Promise.race(service.stop(), sleep(N))` directly in the HA `afterAll`. This commit pushes the timeout into the shared `tryStop` helper so other teardown loops can opt in the same way.

- `yarn-project/stdlib/src/interfaces/service.ts`: add optional `timeoutMs?: number` to `tryStop(service, logger, timeoutMs?)`. When set, the call returns after at most `timeoutMs` and logs an error naming the service if it did not stop in time. Without the argument, behavior is unchanged for existing call sites.
- `yarn-project/end-to-end/src/composed/ha/e2e_ha_full.test.ts`: replace the inline `Promise.race` in `afterAll` with `tryStop(service, logger, 30_000)`.

2. fix(l1-tx-utils): make `monitorTransaction` unblock immediately on interrupt
In the merge-train dequeue of PR #23344 (log, line ~19144), HA-2's `sequencer.stop()` blocked for ~23 minutes waiting on an L1 multicall publish before viem finally errored with `Transaction sending is interrupted`. Root cause: `monitorTransaction`'s loop only re-checked `interrupted` via `isTxTimedOut`, after each `getL1Timestamp` + `getTransactionCount` round-trip. When anvil's RPC stalled (after `eth.warp(..., { resetBlockInterval: true })` plus a test-warped `dateProvider`), those awaits hung indefinitely.

- `ReadOnlyL1TxUtils`: holds an `InterruptibleSleep` and a list of interrupt listeners. `interrupt()` wakes the sleep and rejects every pending wrapped await with `InterruptError`. Adds a `raceInterrupt(promise)` helper.
- `L1TxUtils.monitorTransaction`: short-circuits at the top of each iteration if `interrupted` is set; wraps `getL1Timestamp`, the `Promise.all([getTransactionCount, getTransactionCount])`, and both `tryGetTxReceipt` calls in `raceInterrupt(...)`; replaces `sleep(...)` with `interruptibleSleep.sleep(...)`; and catches `InterruptError` to fall through the existing timeout path, so callers still see `TimeoutError`.

The healthy path is unchanged. The existing `'handles interruption during SENT/SPEED_UP/CANCELLED state'` tests in `l1_tx_utils.test.ts` continue to assert the behavior.

Why one PR with two commits
Tooling constraint: `create_pr` is hardwired to a per-session branch, so a second PR can't be opened from this session. Bundling both commits here with `ci-no-squash` preserves the separation at merge time. Companion to #23539 (merged) and #23540's original scope.