
fix(l1-tx-utils, e2e_ha_full): unblock node.stop() on interrupt + tryStop timeout #23540

Draft
AztecBot wants to merge 2 commits into merge-train/spartan from cb/133ce6d845a4


AztecBot commented May 24, 2026

Two commits — keep them separate at merge with the ci-no-squash label.

1. refactor(stdlib): add optional timeoutMs to tryStop and use it in e2e_ha_full

Follow-up to #23539. That PR open-coded Promise.race(service.stop(), sleep(N)) directly in the HA afterAll. This commit pushes the timeout into the shared tryStop helper so other teardown loops can opt in the same way.

  • yarn-project/stdlib/src/interfaces/service.ts — add optional timeoutMs?: number to tryStop(service, logger, timeoutMs?). When set, the call returns at most after timeoutMs and logs an error naming the service if it did not stop in time. Without the argument, behavior is unchanged for existing call sites.
  • yarn-project/end-to-end/src/composed/ha/e2e_ha_full.test.ts — replace the inline Promise.race in afterAll with tryStop(service, logger, 30_000).
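
For readers who want a concrete picture of the helper described above, here is a minimal sketch of a `tryStop` with an optional `timeoutMs`. It is not the code from this PR: the `Service` and `Logger` shapes, the `timeout` helper, and the use of `service.constructor.name` to name the service in the log are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real interfaces in yarn-project/stdlib may differ.
interface Service {
  stop(): Promise<void>;
}
interface Logger {
  error(msg: string, err?: unknown): void;
}

/** Hypothetical helper: a cancellable timer that resolves to 'timeout'. */
function timeout(ms: number): { promise: Promise<'timeout'>; cancel: () => void } {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const promise = new Promise<'timeout'>(resolve => {
    timer = setTimeout(() => resolve('timeout'), ms);
  });
  return { promise, cancel: () => clearTimeout(timer) };
}

async function tryStop(service: Service, logger?: Logger, timeoutMs?: number): Promise<void> {
  // Errors from stop() are logged rather than thrown, so a teardown loop keeps going.
  const stopped = service.stop().then(
    () => 'stopped' as const,
    err => {
      logger?.error('Error stopping service', err);
      return 'failed' as const;
    },
  );

  // Without a timeout, wait for stop() however long it takes.
  if (timeoutMs === undefined) {
    await stopped;
    return;
  }

  // With a timeout, return as soon as either stop() settles or the timer fires.
  const timer = timeout(timeoutMs);
  const outcome = await Promise.race([stopped, timer.promise]);
  timer.cancel();
  if (outcome === 'timeout') {
    // Naming the service by its constructor is an assumption of this sketch.
    logger?.error(`${service.constructor.name} did not stop within ${timeoutMs}ms`);
  }
}
```

Under these assumptions, a teardown loop would call `await tryStop(service, logger, 30_000)` per service. A service that hangs then costs at most 30 seconds instead of the whole hook budget. The timed-out `stop()` keeps running in the background; the helper only stops waiting for it.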

2. fix(l1-tx-utils): make monitorTransaction unblock immediately on interrupt

In the merge-train dequeue of PR #23344 (log, line ~19144), HA-2's sequencer.stop() blocked for ~23 minutes waiting on an L1 multicall publish before viem finally errored with Transaction sending is interrupted. Root cause: monitorTransaction's loop only re-checked interrupted via isTxTimedOut, after each getL1Timestamp + getTransactionCount round-trip. When anvil's RPC stalled (after eth.warp(..., { resetBlockInterval: true }) plus a test-warped dateProvider), those awaits sat indefinitely.

  • ReadOnlyL1TxUtils: holds an InterruptibleSleep and a list of interrupt listeners. interrupt() wakes the sleep and rejects every pending wrapped await with InterruptError. Adds a raceInterrupt(promise) helper.
  • L1TxUtils.monitorTransaction: short-circuits at the top of each iteration if interrupted is set. It wraps getL1Timestamp, the Promise.all([getTransactionCount, getTransactionCount]), and both tryGetTxReceipt calls in raceInterrupt(...), and replaces sleep(...) with interruptibleSleep.sleep(...). It catches InterruptError and falls through to the existing timeout path, so callers still see TimeoutError.
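
The listener-based race described in the bullets above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `Interruptible` is a hypothetical stand-in for the relevant part of `ReadOnlyL1TxUtils`, and `InterruptError` is redefined locally so the block is self-contained.

```typescript
// Local stand-in for the codebase's InterruptError.
class InterruptError extends Error {
  constructor(message = 'Interrupted') {
    super(message);
    this.name = 'InterruptError';
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical class holding only the interrupt bookkeeping.
class Interruptible {
  private interrupted = false;
  private readonly listeners = new Set<(err: InterruptError) => void>();

  /** Marks the instance as interrupted and rejects every pending wrapped await. */
  interrupt(): void {
    this.interrupted = true;
    const err = new InterruptError();
    this.listeners.forEach(reject => reject(err));
    this.listeners.clear();
  }

  /** Settles with `promise`, or rejects with InterruptError if interrupt() wins. */
  raceInterrupt<T>(promise: Promise<T>): Promise<T> {
    if (this.interrupted) {
      promise.catch(() => {}); // the abandoned promise must not surface as unhandled
      return Promise.reject(new InterruptError());
    }
    return new Promise<T>((resolve, reject) => {
      const onInterrupt = (err: InterruptError) => reject(err);
      this.listeners.add(onInterrupt);
      promise.then(
        value => {
          this.listeners.delete(onInterrupt);
          resolve(value);
        },
        err => {
          this.listeners.delete(onInterrupt);
          reject(err);
        },
      );
    });
  }
}
```

The underlying RPC call is not cancelled in this sketch. The caller simply stops waiting on it, which is enough to let a `stop()` path proceed while a stalled request is still outstanding.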

Healthy case is unchanged. Existing 'handles interruption during SENT/SPEED_UP/CANCELLED state' tests in l1_tx_utils.test.ts continue to assert the behavior.

Why one PR with two commits

Tooling constraint: create_pr is hardwired to a per-session branch, so a second PR can't be opened from this session. Bundling here with ci-no-squash preserves the separation at merge time. Companion to #23539 (merged) and #23540's original scope.

…_ha_full

Adds an optional `timeoutMs` argument to `tryStop` so callers that share a
teardown loop with services that can hang indefinitely (e.g. a sequencer
awaiting an L1 publish on a test-warped clock) can cap the wait per service.
The inline Promise.race + sleep from #23539's afterAll moves into the helper.
AztecBot added the ci-no-fail-fast (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) and claudebox (Owned by claudebox. it can push to this PR.) labels May 24, 2026
…rrupt

The publisher's monitor loop already checked `interrupted` via `isTxTimedOut`,
but only after every viem RPC await returned. When the L1 endpoint stalled
(anvil paused after a test-driven warp, mismatched clocks, etc.) the next
`getTransactionCount` or `getBlock` could sit indefinitely, so `sequencer.stop()`
blocked waiting on the publish to settle.

Two changes:

- `ReadOnlyL1TxUtils` now holds an `InterruptibleSleep` and a list of pending
  await listeners; `interrupt()` wakes the sleep and rejects every wrapped
  await with `InterruptError`. A small `raceInterrupt(promise)` helper lets
  callers race viem awaits against the interrupt signal.

- `monitorTransaction` wraps `getL1Timestamp`, `getTransactionCount`, and
  `tryGetTxReceipt` with `raceInterrupt`, uses `interruptibleSleep.sleep`
  instead of plain `sleep`, and short-circuits at the top of each iteration
  if `interrupted` is already set. `InterruptError` from a wrapped await
  falls through the existing timeout-handling path, so callers still see
  `TimeoutError` exactly as before.
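
As a rough illustration of the loop shape this commit message describes, the sketch below polls for a receipt, checks the interrupt flag at the top of each iteration, sleeps interruptibly, and maps `InterruptError` to `TimeoutError`. Everything here is hypothetical: `TxMonitor`, `getReceipt`, and `pollMs` are invented names, the error classes and `InterruptibleSleep` are local stand-ins, and the interrupt race uses a single shared signal promise rather than the listener list the commit describes.

```typescript
// Local stand-ins for the codebase's error classes.
class InterruptError extends Error {
  constructor(message?: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // ES5-safe instanceof
  }
}
class TimeoutError extends Error {
  constructor(message?: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

/** A sleep that interrupt() can cut short (supports one sleeper, for brevity). */
class InterruptibleSleep {
  private wakeUp: (() => void) | undefined;

  sleep(ms: number): Promise<void> {
    return new Promise<void>(resolve => {
      const timer = setTimeout(() => this.wakeUp?.(), ms);
      this.wakeUp = () => {
        clearTimeout(timer);
        this.wakeUp = undefined;
        resolve();
      };
    });
  }

  interrupt(): void {
    this.wakeUp?.();
  }
}

class TxMonitor {
  private interrupted = false;
  private rejectSignal: (err: InterruptError) => void = () => {};
  private readonly interruptSignal: Promise<never>;
  private readonly sleeper = new InterruptibleSleep();

  constructor() {
    this.interruptSignal = new Promise<never>((_, reject) => {
      this.rejectSignal = reject;
    });
    this.interruptSignal.catch(() => {}); // never report the signal as unhandled
  }

  interrupt(): void {
    this.interrupted = true;
    this.sleeper.interrupt();
    this.rejectSignal(new InterruptError('interrupted'));
  }

  private raceInterrupt<T>(promise: Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      promise.then(resolve, reject);
      this.interruptSignal.catch(reject);
    });
  }

  /** Polls until a receipt appears; an interrupt surfaces as TimeoutError. */
  async monitor(getReceipt: () => Promise<string | undefined>, pollMs: number): Promise<string> {
    try {
      while (!this.interrupted) {
        const receipt = await this.raceInterrupt(getReceipt());
        if (receipt !== undefined) {
          return receipt;
        }
        await this.sleeper.sleep(pollMs);
      }
    } catch (err) {
      if (!(err instanceof InterruptError)) {
        throw err;
      }
    }
    throw new TimeoutError('Transaction monitoring was interrupted');
  }
}
```

In this sketch an interrupt ends the loop from either blocking point: a stalled `getReceipt()` is abandoned through `raceInterrupt`, and a pending sleep is woken early. In both cases the caller observes the same `TimeoutError`.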

This shrinks `node.stop()` from minutes (worst case) to milliseconds when a
publish is in flight, and fixes the merge-train dequeue caused by
`e2e_ha_full.test.ts` afterAll burning its 20-minute hook budget on a single
HA peer's L1 publish.
AztecBot changed the title from "refactor(stdlib): add optional timeoutMs to tryStop and use it in e2e_ha_full" to "fix(l1-tx-utils, e2e_ha_full): unblock node.stop() on interrupt + tryStop timeout" May 24, 2026