fix(l1-tx-utils, e2e_ha_full): unblock node.stop() on interrupt + tryStop timeout by AztecBot · Pull Request #23540 · AztecProtocol/aztec-packages

AztecBot · 2026-05-24T12:43:12Z

Two commits — keep them separate at merge with the ci-no-squash label.

1. refactor(stdlib): add optional timeoutMs to tryStop and use it in e2e_ha_full

Follow-up to #23539. That PR open-coded Promise.race(service.stop(), sleep(N)) directly in the HA afterAll. This commit pushes the timeout into the shared tryStop helper so other teardown loops can opt in the same way.

yarn-project/stdlib/src/interfaces/service.ts: add optional timeoutMs?: number to tryStop(service, logger, timeoutMs?). When set, the call returns after at most timeoutMs milliseconds and logs an error naming the service if it did not stop in time. Without the argument, behavior is unchanged for existing call sites.
yarn-project/end-to-end/src/composed/ha/e2e_ha_full.test.ts: replace the inline Promise.race in afterAll with tryStop(service, logger, 30_000).
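
For reviewers who want the shape of the change without opening the diff, here is a minimal sketch of a tryStop with an optional timeout. It is not the code in service.ts: the Service and Logger interfaces, the name field, the stoppedWithin helper, and the log messages are all assumptions made for illustration.

```typescript
// Hypothetical minimal shapes; the real Service and Logger types in stdlib may differ.
interface Service {
  name?: string; // assumed field, used only to name the service in log messages
  stop(): Promise<void>;
}

interface Logger {
  error(message: string, err?: unknown): void;
}

// Hypothetical helper: resolves true if `stopping` fulfills before the deadline and
// false if the timer fires first. A rejection before the deadline is propagated.
function stoppedWithin(stopping: Promise<void>, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve, reject) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    stopping.then(
      () => {
        clearTimeout(timer);
        resolve(true);
      },
      err => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

async function tryStop(service: Service | undefined, logger?: Logger, timeoutMs?: number): Promise<void> {
  if (!service) {
    return;
  }
  const name = service.name ?? 'service';
  try {
    if (timeoutMs === undefined) {
      // No timeout requested: wait for stop() however long it takes.
      await service.stop();
      return;
    }
    const stopped = await stoppedWithin(service.stop(), timeoutMs);
    if (!stopped) {
      // stop() keeps running in the background; we only give up waiting for it.
      logger?.error(`${name} did not stop within ${timeoutMs}ms`);
    }
  } catch (err) {
    logger?.error(`Error stopping ${name}`, err);
  }
}
```

In a teardown loop this would be called as tryStop(service, logger, 30_000) for each service, so one hung service costs at most 30 seconds instead of the whole hook budget.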

2. fix(l1-tx-utils): make monitorTransaction unblock immediately on interrupt

In the merge-train dequeue of PR #23344 (log, line ~19144), HA-2's sequencer.stop() blocked for ~23 minutes waiting on an L1 multicall publish before viem finally errored with Transaction sending is interrupted. Root cause: monitorTransaction's loop only re-checked interrupted via isTxTimedOut, after each getL1Timestamp + getTransactionCount round-trip. When anvil's RPC stalled (after eth.warp(..., { resetBlockInterval: true }) plus a test-warped dateProvider), those awaits sat indefinitely.

ReadOnlyL1TxUtils — holds an InterruptibleSleep and a list of interrupt listeners. interrupt() wakes the sleep and rejects every pending wrapped await with InterruptError. Adds a raceInterrupt(promise) helper.
L1TxUtils.monitorTransaction — short-circuits at the top of each iteration if interrupted is set, wraps getL1Timestamp, the Promise.all([getTransactionCount, getTransactionCount]), and both tryGetTxReceipt calls in raceInterrupt(...), replaces sleep(...) with interruptibleSleep.sleep(...), and catches InterruptError to fall through the existing timeout path — callers still see TimeoutError.
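
The two pieces above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in l1-tx-utils: the Interruptible class, its sleep method, the monitor function, and the poll callback are hypothetical stand-ins for ReadOnlyL1TxUtils, InterruptibleSleep, monitorTransaction, and the viem calls.

```typescript
class InterruptError extends Error {
  constructor(message = 'Interrupted') {
    super(message);
    this.name = 'InterruptError';
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on older compile targets
  }
}

class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TimeoutError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical stand-in for the interrupt state held by ReadOnlyL1TxUtils.
class Interruptible {
  private interrupted = false;
  private listeners = new Set<(err: InterruptError) => void>();

  get isInterrupted(): boolean {
    return this.interrupted;
  }

  // Marks the instance interrupted and rejects every pending wrapped await or sleep.
  interrupt(): void {
    this.interrupted = true;
    const err = new InterruptError();
    const pending = Array.from(this.listeners);
    this.listeners.clear();
    pending.forEach(listener => listener(err));
  }

  // Settles with `promise`, or rejects with InterruptError if interrupt() comes first.
  raceInterrupt<T>(promise: Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const onInterrupt = (err: InterruptError) => reject(err);
      if (this.interrupted) {
        reject(new InterruptError());
      } else {
        this.listeners.add(onInterrupt);
      }
      // Whichever settles first wins; later settlements are ignored by the promise.
      promise.then(
        value => {
          this.listeners.delete(onInterrupt);
          resolve(value);
        },
        error => {
          this.listeners.delete(onInterrupt);
          reject(error);
        },
      );
    });
  }

  // Sleep that interrupt() cuts short, clearing the timer so nothing lingers.
  sleep(ms: number): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      if (this.interrupted) {
        reject(new InterruptError());
        return;
      }
      const onInterrupt = (err: InterruptError) => {
        clearTimeout(timer);
        reject(err);
      };
      const timer = setTimeout(() => {
        this.listeners.delete(onInterrupt);
        resolve();
      }, ms);
      this.listeners.add(onInterrupt);
    });
  }
}

// Hypothetical monitor loop: `poll` stands in for the RPC round-trips and resolves
// true once the transaction is mined.
async function monitor(signal: Interruptible, poll: () => Promise<boolean>, intervalMs: number): Promise<void> {
  while (true) {
    if (signal.isInterrupted) {
      throw new TimeoutError('Monitoring interrupted');
    }
    try {
      if (await signal.raceInterrupt(poll())) {
        return;
      }
      await signal.sleep(intervalMs);
    } catch (err) {
      // Callers keep seeing TimeoutError, as on the pre-existing timeout path.
      if (err instanceof InterruptError) {
        throw new TimeoutError('Monitoring interrupted');
      }
      throw err;
    }
  }
}
```

Note that racing does not cancel the underlying RPC call; it only stops the loop from waiting on it, which is what lets stop() return promptly when the endpoint has stalled.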

Healthy case is unchanged. Existing 'handles interruption during SENT/SPEED_UP/CANCELLED state' tests in l1_tx_utils.test.ts continue to assert the behavior.

Why one PR with two commits

Tooling constraint: create_pr is hardwired to a per-session branch, so a second PR can't be opened from this session. Bundling here with ci-no-squash preserves the separation at merge time. Companion to #23539 (merged) and #23540's original scope.

…_ha_full

Adds an optional `timeoutMs` argument to `tryStop` so callers that share a teardown loop with services that can hang indefinitely (e.g. a sequencer awaiting an L1 publish on a test-warped clock) can cap the wait per service. The inline Promise.race + sleep from #23539's afterAll moves into the helper.

…rrupt

The publisher's monitor loop already checked `interrupted` via `isTxTimedOut`, but only after every viem RPC await returned. When the L1 endpoint stalled (anvil paused after a test-driven warp, mismatched clocks, etc.) the next `getTransactionCount` or `getBlock` could sit indefinitely, so `sequencer.stop()` blocked waiting on the publish to settle.

Two changes:

- `ReadOnlyL1TxUtils` now holds an `InterruptibleSleep` and a list of pending await listeners; `interrupt()` wakes the sleep and rejects every wrapped await with `InterruptError`. A small `raceInterrupt(promise)` helper lets callers race viem awaits against the interrupt signal.
- `monitorTransaction` wraps `getL1Timestamp`, `getTransactionCount`, and `tryGetTxReceipt` with `raceInterrupt`, uses `interruptibleSleep.sleep` instead of plain `sleep`, and short-circuits at the top of each iteration if `interrupted` is already set. `InterruptError` from a wrapped await falls through the existing timeout-handling path, so callers still see `TimeoutError` exactly as before.

This shrinks `node.stop()` from minutes (worst case) to milliseconds when a publish is in flight, and fixes the merge-train dequeue caused by `e2e_ha_full.test.ts` afterAll burning its 20-minute hook budget on a single HA peer's L1 publish.

AztecBot added the ci-no-fail-fast (sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) and claudebox (owned by claudebox, which can push to this PR) labels May 24, 2026

AztecBot changed the title ~~refactor(stdlib): add optional timeoutMs to tryStop and use it in e2e_ha_full~~ fix(l1-tx-utils, e2e_ha_full): unblock node.stop() on interrupt + tryStop timeout May 24, 2026

AztecBot wants to merge 2 commits into merge-train/spartan from cb/133ce6d845a4
