
feat: merge-train/spartan #20476

Open

AztecBot wants to merge 39 commits into next from merge-train/spartan

Conversation

AztecBot (Collaborator) commented Feb 13, 2026

BEGIN_COMMIT_OVERRIDE
fix: stringify all bigints in pino-logger (#20303)
chore: ensure consistent HA DB timestamps (#20398)
chore: log L1 trace errors once in debug (#20380)
test(archiver): add missing reorg and prune unit tests (#20441)
feat(ci.aztec-labs.com): CI cost and metrics tracking (#20100)
refactor(ethereum): cleaner initialization for tx delayer in l1txutils (#20319)
chore(validator): blob upload tests (#20463)
chore: move TXE ports out of Linux ephemeral range (#20475)
fix(node): sync ws before simulating public calls (#20499)
fix(ethereum): check timeout before consuming nonce in L1TxUtils (#20501)
chore(e2e): reenable block building test (#20504)
refactor(sequencer): rename block-level metrics to checkpoint-level (#20505)
feat(archiver): return L2 block data to avoid fetching full block (#20503)
chore(mbps): clean up TODOs for multiple blocks per slot (#20502)
feat(archiver): add l2 tips cache (#20510)
chore(e2e): toggle mbps in e2e tests (#20315)
END_COMMIT_OVERRIDE

spypsy and others added 14 commits February 9, 2026 17:22
…xUtils wrapping

E2e tests previously reached deep into component internals to replace L1TxUtils
instances with DelayedTxUtils wrappers, resulting in fragile code like:
`(((proverNode as TestProverNode).publisher as ProverNodePublisher).l1TxUtils as DelayedTxUtils).delayer!`

This refactoring makes the delayer config-driven:
- Add `enableDelayer` and `txDelayerMaxInclusionTimeIntoSlot` config fields to L1TxUtilsConfig
- Move tx_delayer.ts from test/ to l1_tx_utils/ so factories can use it
- Delete DelayedTxUtils class (replaced by delayer field on L1TxUtils)
- Apply delayer automatically in L1TxUtils factories when enabled via config
- Share a single DelayerImpl across all L1TxUtils instances per component
- Store sequencer delayer in SequencerClient, accessible via getDelayer()
- Remove Sequencer.publisher field (no longer needed by any test)
- Add interruptAll() to SequencerPublisherFactory for shutdown
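
Roughly, the wiring described above looks like this (a sketch only; apart from the identifiers named in the bullets, the types here are stubs, not the actual Aztec source):

```ts
// Stub types: only `enableDelayer`, `txDelayerMaxInclusionTimeIntoSlot`,
// `DelayerImpl`, and `L1TxUtils` are named above; everything else is assumed.
interface L1TxUtilsConfig {
  enableDelayer: boolean;
  txDelayerMaxInclusionTimeIntoSlot?: number; // seconds into the L1 slot
}

class DelayerImpl {
  // In the real component this holds txs until the configured slot offset.
  pause() {}
  resume() {}
}

class L1TxUtils {
  constructor(
    public readonly config: L1TxUtilsConfig,
    public readonly delayer?: DelayerImpl,
  ) {}
}

// The factory applies the delayer when enabled, sharing one DelayerImpl
// across all L1TxUtils instances created for a component.
function createL1TxUtils(config: L1TxUtilsConfig, shared: DelayerImpl): L1TxUtils {
  return new L1TxUtils(config, config.enableDelayer ? shared : undefined);
}
```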

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Unify ViemWallet/EthSigner factory pairs into single functions using
  L1SignerSource union type (6 low-level factories -> 2)
- Delete L1TxUtilsWithBlobs class; blob support is now opt-in via kzg
  parameter on L1TxUtils constructor
- Internalize delayer: L1TxUtils constructor wraps its own client when
  enableDelayer is set in config, deleting applyDelayer and withDelayer
- Move ethereumSlotDuration from deps threading into L1TxUtilsConfig
- Simplify node-lib factories from 6 to 4
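
For illustration, the union-type unification amounts to something like the following (only `L1SignerSource` is named in the commit; the member shapes are invented):

```ts
// Hypothetical member shapes for the union.
type ViemWallet = { kind: 'viem-wallet'; account: string };
type EthSigner = { kind: 'eth-signer'; address: string };
type L1SignerSource = ViemWallet | EthSigner;

// One factory discriminates on the union instead of maintaining two parallel
// ViemWallet/EthSigner variants (6 low-level factories -> 2).
function senderAddress(source: L1SignerSource): string {
  return source.kind === 'viem-wallet' ? source.account : source.address;
}
```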

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The createL1TxUtilsFromSigners factory now requires a dateProvider for
the config-driven delayer, but the integration test was not providing it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The default config maps `enableDelayer` to `false`. The previous code used
`config.enableDelayer ?? true`, but `??` only triggers on null/undefined,
not on `false`, so the delayer was never enabled in tests. This caused all
e2e_epochs tests to fail with "Could not find prover or sequencer delayer".
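
The pitfall in two lines:

```ts
// `??` falls back only on null/undefined; an explicit `false` short-circuits it.
const enableDelayer: boolean | undefined = false; // what the default config supplies
console.log(enableDelayer ?? true); // false -> delayer stays disabled
console.log(undefined ?? true);     // true  -> the fallback applies only here
```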

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add four new tests covering previously untested archiver reorg/prune scenarios:
- Upcoming L2 prune (handleEpochPrune removes unproven checkpoints)
- Lost proof (proven checkpoint rolls back to zero)
- Re-proof after prune (pruned blocks get re-proposed and proven)
- New checkpoint behind L1 syncpoint (L1 reorg adds checkpoint in already-scanned range)

Also adds canPruneAtTime mock support to FakeL1State and removes two xit placeholders.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes [A-543](https://linear.app/aztec-labs/issue/A-543/ensure-no-timezone-issues-when-cleaning-up-old-duties)

Also adds a note to the docs so users know they can't use the same DB for nodes running on different rollup versions.

Follow-up from comments on #20060
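
The diff isn't shown here, but the usual shape of such a timezone fix is to compare epoch timestamps rather than local-time values; a hypothetical sketch:

```ts
// Assumed approach, not the actual fix: compare duty timestamps as UTC epoch
// milliseconds so cleanup does not depend on the node's local timezone.
const RETENTION_MS = 24 * 60 * 60 * 1000; // retention window (assumed)
const isStaleDuty = (createdAtMs: number, nowMs: number = Date.now()) =>
  createdAtMs < nowMs - RETENTION_MS;
```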
## Summary
- Add four new tests covering previously untested archiver reorg/prune scenarios:
  - **Upcoming L2 prune**: `handleEpochPrune` removes unproven checkpoints when `canPruneAtTime` returns true
  - **Lost proof**: proven checkpoint rolls back to zero via `updateProvenCheckpoint` edge case
  - **Re-proof after prune**: pruned blocks get re-proposed on L1 and proven, archiver re-syncs
  - **New checkpoint behind L1 syncpoint**: `checkForNewCheckpointsBeforeL1SyncPoint` detects and recovers from an L1 reorg that added a checkpoint in an already-scanned range
- Add `canPruneAtTime` mock support to `FakeL1State` (`setCanPrune` method)
- Remove two `xit` placeholder tests
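
For reference, the first scenario's test would look roughly like this (hypothetical shape; only `setCanPrune`, `FakeL1State`, `handleEpochPrune`, and `getL2Tips` appear in this PR, the rest is assumed):

```ts
it('prunes unproven checkpoints when canPruneAtTime returns true', async () => {
  fakeL1State.setCanPrune(true); // new mock hook on FakeL1State
  await archiver.handleEpochPrune(); // assumed entry point
  const tips = await archiver.getL2Tips();
  expect(tips.latest).toEqual(tips.proven); // unproven checkpoints removed
});
```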

## Test plan
- All 32 archiver sync tests pass: `yarn workspace @aztec/archiver test src/archiver-sync.test.ts`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

socket-security bot commented Feb 13, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff  | Package                           | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| ----- | --------------------------------- | --------------------- | ------------- | ------- | ----------- | ------- |
| Added | pypi/boto3@1.42.48                | 99                    | 100           | 100     | 100         | 100     |
| Added | pypi/google-cloud-bigquery@3.40.1 | 99                    | 100           | 100     | 100         | 100     |

View full report

PhilWindle and others added 9 commits February 13, 2026 11:53
refactor(ethereum): cleaner initialization for tx delayer in l1txutils (#20319)

The tx delayer is a component we use in some e2e tests to simulate
delays on L1 txs.

This PR updates how we inject the tx delayer. Before, we would access
the internals of the sequencer or prover publisher, and manually
"attach" a tx delayer to its `L1TxUtils`. This was brittle (or "hideous"
as described in some of the comments), and also didn't work consistently
with the model of spawning one publisher per publisher address.

Now, it's the responsibility of `L1TxUtils` to create and attach the
delayer itself on construction, based on config. This pollutes the
non-test flow a bit, but makes it much more robust.

This PR also simplifies the inheritance chain of L1TxUtils by removing
L1TxUtilsWithBlobs in favor of an optional `kzg` instance when blobs
are needed. It also simplifies many of the factory methods that were
duplicated all over the place.

### Claude's summary

- Makes the tx delayer **config-driven** instead of requiring tests to
manually wrap L1TxUtils instances with DelayedTxUtils
- Adds `enableDelayer` and `txDelayerMaxInclusionTimeIntoSlot` config
fields to `L1TxUtilsConfig`
- Moves `tx_delayer.ts` from `test/` to `l1_tx_utils/` and deletes
`DelayedTxUtils` class
- Factories automatically apply the delayer when `enableDelayer` is set,
sharing a single `DelayerImpl` across all L1TxUtils instances per
component
- Stores the sequencer delayer in `SequencerClient` (accessible via
`getDelayer()`) instead of exposing an array on `AztecNodeService`
- Removes `Sequencer.publisher` field (no longer needed) and adds
`interruptAll()` to `SequencerPublisherFactory`
- Eliminates fragile deep-cast chains like `(((proverNode as
TestProverNode).publisher as ProverNodePublisher).l1TxUtils as
DelayedTxUtils).delayer!`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Blob upload tests for the validator client
- TXE servers were allocated on ports 45730+, which fall in the Linux
ephemeral range (32768-60999), risking collisions with OS-assigned
outgoing connection ports. Moves them to 14730+, which is safely below
the range.
- Adds a `ci3/check_port` utility that checks if a port is free and prints
the process tree of the holder (via `pstree`) if taken
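
For illustration, a TypeScript analogue of the port check (the real `ci3/check_port` is a shell utility, and it additionally prints the holder's `pstree`):

```ts
import net from 'node:net';

// Try to bind the port and report whether it is free. A failed bind
// (e.g. EADDRINUSE) means some process already holds it.
function isPortFree(port: number): Promise<boolean> {
  return new Promise(resolve => {
    const srv = net.createServer();
    srv.once('error', () => resolve(false));
    srv.once('listening', () => srv.close(() => resolve(true)));
    srv.listen(port, '127.0.0.1');
  });
}
```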
@AztecBot AztecBot requested a review from nventuro as a code owner February 13, 2026 18:49
spalladino and others added 8 commits February 13, 2026 16:38
The `simulatePublicCalls` method in the aztec node would simulate using
a fake block following immediately after the latest one from the
archiver. However, it used a world-state sync that didn't necessarily
include all the changes from that latest block.

This meant that a client monitoring the node for a block to be mined
could send a simulation request that relied on data from that very
latest block while it was not yet in world state, and fail.

This was causing issues, e.g. in [cross-chain-bot e2e
tests](http://ci.aztec-labs.com/4f7378c362712da7), where the simulation
for consuming the message in public land would fail: the message was
flagged as ready (an archiver check) but not yet present in world
state, erroring with:

```
17:22:06     Simulation error: Assertion failed: Tried to consume nonexistent L1-to-L2 message 'self.l1_to_l2_msg_exists(message_hash, leaf_index)'
17:22:06     Context:
17:22:06     TxExecutionRequest(0x032f68986bd02951337b130a702e9041a055bdc49a8cb809e3244d252596d2f4 called 0x9d57a239)
17:22:06     simulatePublic=true
17:22:06     skipTxValidation=true
17:22:06     scopes=0x032f68986bd02951337b130a702e9041a055bdc49a8cb809e3244d252596d2f4
17:22:06
17:22:06       228 |
17:22:06       229 |         assert(!self.nullifier_exists_unsafe(nullifier, self.this_address()), "L1-to-L2 message is already nullified");
17:22:06     > 230 |         assert(self.l1_to_l2_msg_exists(message_hash, leaf_index), "Tried to consume nonexistent L1-to-L2 message");
17:22:06           |                ^
17:22:06       231 |
17:22:06       232 |         self.push_nullifier(nullifier);
17:22:06       233 |     }
```
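Sketch of the fix with assumed method names: sync world state to the archiver tip before simulating.

```ts
interface NodeDeps {
  getLatestBlockNumber(): Promise<number>; // archiver's view of the chain tip
  syncWorldStateTo(block: number): Promise<void>;
  simulateAt(txRequest: unknown, block: number): Promise<unknown>;
}

// Wait for world state to reach the archiver's latest block before
// simulating against the fake block that immediately follows it.
async function simulatePublicCalls(deps: NodeDeps, txRequest: unknown) {
  const latest = await deps.getLatestBlockNumber();
  await deps.syncWorldStateTo(latest); // previously missing: ws could lag the archiver
  return deps.simulateAt(txRequest, latest + 1);
}
```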

## Summary

- Move the `txTimeoutAt` check before `nonceManager.consume()` in
`L1TxUtils.sendTransaction` to prevent nonce leaks
- Add regression test verifying that a timed-out send does not consume a
nonce

When a transaction timed out before sending (e.g. due to
`advanceInboxInProgress` warping L1 time), the nonce was consumed but
the tx was never submitted to L1. This created a permanent gap: all
subsequent transactions used higher nonces (108, 109, ...) but the chain
expected the leaked nonce (107) first. The sequencer got stuck in an
infinite prune-rebuild loop.
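
The reordering, sketched with assumed names:

```ts
interface SendDeps {
  now(): number; // ms, from the injected dateProvider
  consumeNonce(): Promise<number>;
  send(tx: { data: string; nonce: number }): Promise<void>;
}

// The deadline check moves BEFORE nonce consumption, so a timed-out send
// can no longer leak a nonce (e.g. 107) that blocks every later tx
// (108, 109, ...) from being mined.
async function sendTransaction(deps: SendDeps, data: string, txTimeoutAt: number) {
  if (deps.now() > txTimeoutAt) {
    throw new Error('tx timed out before sending'); // nonce NOT consumed
  }
  const nonce = await deps.consumeNonce(); // consumed only once we know we will send
  await deps.send({ data, nonce });
}
```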

Flaky failure: http://ci.aztec-labs.com/6b46aa90848758e1 (`e2e_bot >
creates bot after inbox drift`)

## Test plan

- New unit test: `does not consume nonce when transaction times out
before sending`
- Full `l1_tx_utils.test.ts` suite passes (46/46)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Given it's a constant value throughout the checkpoint.
… block

We were checking that the block pointed to by a checkpoint proposal
existed, but not that it was the last one in the slot. This would have
eventually failed due to mismatched reexecution, since reexecution
picks up all blocks in the slot, but this check is cheaper to do.
The block proposal handler requires access to a block header, along with
its checkpoint number. However, the checkpoint number is NOT part of the
block header, and it's a bit painful to add (since the block header goes
into circuits). Instead, the checkpoint number for a given block is only
returned as part of the full L2Block, including txs.

This PR adds an intermediate struct `BlockData` (similar to
`CheckpointData` from #20467) that contains the block header plus
checkpoint number, archive root, index within checkpoint, etc.
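
A hypothetical shape for the struct (field names from the description above; the types are stand-ins):

```ts
type BlockHeader = unknown; // placeholder: the real header goes into circuits
type Fr = bigint;           // placeholder field element

interface BlockData {
  header: BlockHeader;
  checkpointNumber: number;     // deliberately NOT part of the header
  archiveRoot: Fr;
  indexWithinCheckpoint: number;
}
```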
@ludamad ludamad enabled auto-merge February 13, 2026 22:06
spalladino and others added 4 commits February 13, 2026 19:24
Updates the configuration on some e2e epochs and p2p tests in order to run with multiple blocks per slot.

Also updates the fixed ports to be overridden via env vars so we can run multiple e2e tests in parallel without having them clash with each other.
Flags as flake just in case, but seems to be working properly.
refactor(sequencer): rename block-level metrics to checkpoint-level (#20505)

## Summary
- Renames sequencer metrics that were incorrectly named at the block
level to checkpoint level (proposal success, precheck failed, rewards)
- Adds new checkpoint-level metrics: build duration, block count, tx
count, total mana
- Removes old block-level metric constants
(`SEQUENCER_BLOCK_PROPOSAL_SUCCESS_COUNT`,
`SEQUENCER_BLOCK_PROPOSAL_PRECHECK_FAILED_COUNT`,
`SEQUENCER_CURRENT_BLOCK_REWARDS`)
- Fixes attestation metric descriptions to say "checkpoint" instead of
"block"
- Updates the Grafana alert for the new naming.

## Test plan
- [x] `yarn build` passes
- [x] `yarn format` and `yarn lint` clean
- [x] `yarn workspace @aztec/sequencer-client test
src/sequencer/checkpoint_proposal_job.test.ts` passes (26/26)
- [x] `yarn workspace @aztec/sequencer-client test
src/sequencer/sequencer.test.ts` passes (22/22)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- fix(validator): Reject checkpoint proposals that reference a block
which is not the latest in the slot, preventing
mismatched reexecution that would have failed later at a higher cost
- refactor(mbps): Add timestamp to checkpoint global variables, since
it's constant throughout the checkpoint
- chore(mbps): Remove outdated and no-longer-needed TODO comments across
the MBPS codebase
We were hitting a bug in the archiver where getL2Tips failed when called
during a reorg, since blocks were removed between the getBlockNumber
and getBlock calls. See
[here](http://ci.aztec-labs.com/dc9959c1d7c4af18) for a failed CI run.

An easy fix is adding a retry. But why make it easy?

This PR adds a cache for L2 tips for the archiver that gets updated only
during write operations and within store transactions. This should
ensure we don't get rugged while computing tips, and also reduce the
load on the block store, since getL2Tips gets called constantly by all
subsystems on the archiver (from their respective blockstreams).

Builds on top of #20503 so we can use getBlockData instead of getBlock
for accessing checkpoint numbers.
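
The caching idea, sketched with assumed names:

```ts
// Tips are updated only inside store write transactions, so getL2Tips never
// reads the block store while a reorg is mid-flight, and the constant polling
// by subsystems becomes a plain field read.
class L2TipsCache {
  private tips = { latest: 0, proven: 0, finalized: 0 };

  getL2Tips() {
    return this.tips; // no store access
  }

  // Invoked within the same store transaction that writes or unwinds blocks.
  onTipsChanged(latest: number, proven: number, finalized: number) {
    this.tips = { latest, proven, finalized };
  }
}
```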
- Updates the configuration on some e2e epochs and p2p tests in order to
run with multiple blocks per slot.
- Updates the fixed ports to be overridden via env vars so we can run
multiple e2e tests in parallel without having them clash with each
other.
- Updates the l1-reorg tests so they don't assume they run on the first
epoch (since setup can take variable time)

Fixes A-239
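
The port-override pattern mentioned above is just (env var name assumed):

```ts
// A fixed default, overridable per test run so parallel suites don't clash.
const anvilPort = Number(process.env.ANVIL_PORT ?? 8545);
```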
AztecBot (Collaborator, Author) commented:

Flakey Tests

🤖 says: This CI run detected 1 test that failed, but it was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/0c6e2326f7a1a249): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_l1_reorgs.parallel.test.ts "updates L1 to L2 messages changed due to an L1 reorg" (69s) (code: 0) group:e2e-p2p-epoch-flakes

