fix(archiver): skip descendants of invalid-attestations checkpoints #23502

Draft
spalladino wants to merge 2 commits into merge-train/spartan from spl/test/archiver-stuck-non-consecutive-checkpoint

@spalladino commented May 22, 2026

Motivation

archiver/src/modules/l1_synchronizer.ts skipped checkpoints with insufficient/invalid attestations under the assumption that the next proposer would invalidate them before publishing. When that assumption was violated — i.e., proposer P2 published a valid-attestations checkpoint that extended P1's invalid one — the archiver hit InitialCheckpointNumberNotSequentialError in block_store.addCheckpoints, the catch handler rolled back the L1 sync point, and the next poll re-fetched the same range and re-threw. The archiver looped indefinitely. The protocol already defines OffenseType.PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS for exactly this case but the slasher couldn't see valid-attestations descendants because the archiver threw before emitting any event.
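
The retry loop described above can be reproduced in miniature. The following is a sketch under stated assumptions, not the archiver's code: `ArchiverState`, `L1Checkpoint`, and `pollOnce` are hypothetical names, and only the error name and the skip, throw, roll back sequence come from the description.

```typescript
// Simplified stand-ins for the archiver pieces named above; not the real code.
class InitialCheckpointNumberNotSequentialError extends Error {}

interface L1Checkpoint {
  checkpointNumber: number;
  l1BlockNumber: number;
  attestationsValid: boolean;
}

interface ArchiverState {
  lastStoredCheckpoint: number;
  l1SyncPoint: number;
}

// Assumed contract: a batch must start right after the last stored checkpoint.
function addCheckpoints(state: ArchiverState, batch: L1Checkpoint[]): void {
  const first = batch[0];
  if (first === undefined) return;
  const expected = state.lastStoredCheckpoint + 1;
  if (first.checkpointNumber !== expected) {
    throw new InitialCheckpointNumberNotSequentialError(
      `expected checkpoint ${expected}, got ${first.checkpointNumber}`,
    );
  }
  for (const cp of batch) state.lastStoredCheckpoint = cp.checkpointNumber;
}

type PollResult = 'advanced' | 'rolled-back';

function pollOnce(state: ArchiverState, l1: L1Checkpoint[]): PollResult {
  const startedAt = state.l1SyncPoint;
  const fetched = l1.filter(cp => cp.l1BlockNumber > startedAt);
  // Pre-fix behaviour: silently drop invalid-attestations checkpoints.
  const toStore = fetched.filter(cp => cp.attestationsValid);
  try {
    addCheckpoints(state, toStore);
    for (const cp of fetched) {
      state.l1SyncPoint = Math.max(state.l1SyncPoint, cp.l1BlockNumber);
    }
    return 'advanced';
  } catch (_err) {
    // The catch handler rolls the sync point back, so the next poll
    // fetches exactly the same range and fails the same way.
    state.l1SyncPoint = startedAt;
    return 'rolled-back';
  }
}

const demoState: ArchiverState = { lastStoredCheckpoint: 1, l1SyncPoint: 100 };
const demoL1: L1Checkpoint[] = [
  { checkpointNumber: 2, l1BlockNumber: 101, attestationsValid: false }, // P1
  { checkpointNumber: 3, l1BlockNumber: 102, attestationsValid: true }, // P2
];
const demoResults = [1, 2, 3].map(() => pollOnce(demoState, demoL1));
console.log(demoResults, demoState);
```

In this toy model every poll returns `rolled-back` and the state never changes, which is the stuck condition the PR addresses.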

Human Note

This is particularly relevant under pipelining. Attestors now attest to a checkpoint before the previous one is pushed to L1, so they can inadvertently attest to a checkpoint built on top of one that became invalid because it was published to the rollup contract with wrong attestations. An honest attestor could therefore get slashed if the proposer was malicious.

Approach

In the synchronizer, persist rejected ancestors in the block store keyed by archive root. On each new checkpoint, before attestation validation, compare its header.lastArchiveRoot against the persisted set — if it matches, skip the checkpoint as a descendant of an invalid ancestor and emit a new L2BlockSourceEvents.CheckpointBuiltOnInvalidAncestorDetected event with enough metadata to resolve the proposer. Prune the set by L1 finality so reorg-vulnerable rows are dropped only once they can no longer be displaced. The slasher's AttestationsBlockWatcher is fixed to slash the proposer (not the attestors) under the existing offense type and gains a new handler for the new event.
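
The descendant check described above can be sketched roughly as follows. This is a minimal illustration, not the synchronizer's implementation: the `Checkpoint` shape, the `Decision` union, and `classify` are hypothetical, `attestationsValid` stands in for real attestation validation, and an in-memory `Map` stands in for the persisted set. Only `header.lastArchiveRoot`, the keying by archive root, and the ordering (descendant check before attestation validation) come from the description.

```typescript
// Hypothetical, simplified shapes; the real types carry more fields.
interface Checkpoint {
  checkpointNumber: number;
  archiveRoot: string;
  header: { lastArchiveRoot: string };
  l1BlockNumber: number;
  attestationsValid: boolean; // stand-in for actual attestation validation
}

interface RejectedCheckpoint {
  checkpointNumber: number;
  archiveRoot: string;
  l1BlockNumber: number;
}

type Decision =
  | { kind: 'store' }
  | { kind: 'skip-invalid-attestations' }
  | { kind: 'skip-descendant'; ancestor: RejectedCheckpoint };

function toRejected(cp: Checkpoint): RejectedCheckpoint {
  return {
    checkpointNumber: cp.checkpointNumber,
    archiveRoot: cp.archiveRoot,
    l1BlockNumber: cp.l1BlockNumber,
  };
}

function classify(
  cp: Checkpoint,
  rejectedByRoot: Map<string, RejectedCheckpoint>,
): Decision {
  // Descendant check runs first: does this checkpoint build on a rejected one?
  const ancestor = rejectedByRoot.get(cp.header.lastArchiveRoot);
  if (ancestor !== undefined) {
    // Persist the descendant too, so its own children are caught (cascading).
    rejectedByRoot.set(cp.archiveRoot, toRejected(cp));
    return { kind: 'skip-descendant', ancestor };
  }
  if (!cp.attestationsValid) {
    rejectedByRoot.set(cp.archiveRoot, toRejected(cp));
    return { kind: 'skip-invalid-attestations' };
  }
  return { kind: 'store' };
}
```

In the real synchronizer the `skip-descendant` outcome would correspond to emitting `CheckpointBuiltOnInvalidAncestorDetected`; here it is just a return value so the decision logic can be read on its own.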

Changes

  • stdlib: New L2BlockSourceEvents.CheckpointBuiltOnInvalidAncestorDetected enum value and CheckpointBuiltOnInvalidAncestorEvent payload (committee/seed/epoch/checkpoint/ancestor info).
  • archiver: New RejectedCheckpoint type and LMDB-backed store API in block_store.ts (getRejectedCheckpoints, addRejectedCheckpoint, removeRejectedCheckpointsBelowL1Block). l1_synchronizer.ts learns Branch A (descendant skip — emit event, persist, continue) ahead of attestation validation, and prunes rejected rows below the finalized L1 block at the end of each sync iteration. Added getEpochInfoForCheckpoint helper in validation.ts so the descendant path can populate the event payload without paying for signature verification twice.
  • slasher: Existing slashAttestorsOnAncestorInvalid renamed to slashProposerOnAncestorInvalid and switched from attestor list to a single proposer (resolved the same way slashProposer does), fixing the inconsistency between the offense's proposer-oriented naming and the previous attestor-slashing behavior. New handleDescendantOfInvalid listens to the new event and slashes the proposer of the valid descendant. Class JSDoc updated to describe both paths.
  • archiver (tests): New block_store.test.ts cases for the rejected-checkpoints store. archiver-sync.test.ts gains a regression for the valid-descendant skip and updates the existing invalid-attestations test now that descendants also persist.
  • slasher (tests): Existing descendant test updated to expect proposer-slash (one emit, not two attestor emits); new cases cover the new event handler, cascading descendants, deduplication, and the no-proposer fallback.
  • end-to-end (tests): The e2e regression introduced in this branch is flipped from asserting the stuck-state bug to asserting the post-fix behavior — the chain advances past the bad descendant, the new event fires, and both offenses (PROPOSED_INSUFFICIENT_ATTESTATIONS for P1's proposer and PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS for P2's proposer) are recorded.
  • sequencer-client: Test-only skipWaitForValidParentCheckpointOnL1 flag (from the original test commit) bypasses the parent-validity check inside CheckpointProposalJob.waitForValidParentCheckpointOnL1 so the e2e can engineer the fault.
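
To make the rejected-checkpoints store API listed above more concrete, here is an in-memory sketch. The three method names come from the change list; their signatures, the `RejectedCheckpoint` fields, and the use of a `Map` in place of LMDB are assumptions for illustration. In particular, "below" is read here as strictly less than the given L1 block number.

```typescript
// Hypothetical row shape; the real RejectedCheckpoint type may differ.
interface RejectedCheckpoint {
  archiveRoot: string;
  checkpointNumber: number;
  l1BlockNumber: number;
}

// In-memory stand-in for the LMDB-backed store in block_store.ts.
class InMemoryRejectedCheckpointStore {
  private readonly rows = new Map<string, RejectedCheckpoint>();

  // Keyed by archive root, so re-adding the same checkpoint is idempotent.
  addRejectedCheckpoint(row: RejectedCheckpoint): void {
    this.rows.set(row.archiveRoot, row);
  }

  getRejectedCheckpoints(): RejectedCheckpoint[] {
    return Array.from(this.rows.values());
  }

  // Assumption: drops rows whose L1 block is strictly below the given block
  // and returns how many rows were removed.
  removeRejectedCheckpointsBelowL1Block(l1BlockNumber: number): number {
    let removed = 0;
    for (const row of Array.from(this.rows.values())) {
      if (row.l1BlockNumber < l1BlockNumber) {
        this.rows.delete(row.archiveRoot);
        removed++;
      }
    }
    return removed;
  }
}

// At the end of a sync iteration, prune with the finalized L1 block number.
const sketchStore = new InMemoryRejectedCheckpointStore();
sketchStore.addRejectedCheckpoint({ archiveRoot: '0xb', checkpointNumber: 2, l1BlockNumber: 101 });
sketchStore.addRejectedCheckpoint({ archiveRoot: '0xc', checkpointNumber: 3, l1BlockNumber: 110 });
const finalizedL1Block = 105; // illustrative value
sketchStore.removeRejectedCheckpointsBelowL1Block(finalizedL1Block);
console.log(sketchStore.getRejectedCheckpoints());
```

Pruning only below the finalized block keeps rows whose L1 blocks could still be reorged out, which matches the stated goal of dropping rows only once they can no longer be displaced.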

Adds an e2e test that deterministically reproduces the archiver retry
loop documented at l1_synchronizer.ts:905-908: when a checkpoint with
insufficient attestations lands and the next proposer publishes a valid
descendant without first invalidating it, block_store.addCheckpoints
throws InitialCheckpointNumberNotSequentialError on every poll.

Gated by a new test-only sequencer flag skipWaitForValidParentCheckpointOnL1
that bypasses the parent-validity check inside the checkpoint proposal job.

@spalladino added the labels ci-no-fail-fast (sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) and ci-draft (run CI on draft PRs) on May 22, 2026

When a proposer published a valid-attestations checkpoint that extended
an earlier checkpoint with insufficient attestations, the archiver would
skip the bad parent then throw InitialCheckpointNumberNotSequentialError
when storing the descendant, the catch handler would roll back the L1
sync point, and the next poll would loop on the same range indefinitely.

This change persists rejected ancestors in the block store, detects
descendants by matching header.lastArchiveRoot against the stored set,
emits a new CheckpointBuiltOnInvalidAncestorDetected event, and prunes
the set by L1 finality so rows are dropped only once an L1 reorg can no
longer displace them.

The slasher's AttestationsBlockWatcher is updated to slash the proposer
(rather than the attestors) under offense
PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS for both the
existing invalid-descendant path and the new valid-descendant path.

The e2e test introduced in this branch is flipped from asserting the
stuck-state bug to asserting the post-fix behavior: the chain advances
past the bad descendant, the new event fires, and the proposer is
slashed.
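
The slasher side of this change can be pictured with a small sketch. Everything here except the offense name and the handler name `handleDescendantOfInvalid` is hypothetical: the event fields, the `resolveProposer` callback, deduplication by descendant archive root, and treating an unresolved proposer as "do not slash" are assumptions made for illustration, not the watcher's actual behaviour.

```typescript
const DESCENDANT_OFFENSE =
  'PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS';

// Hypothetical, trimmed-down event payload.
interface DescendantEvent {
  checkpointNumber: number;
  archiveRoot: string;
  ancestorArchiveRoot: string;
}

interface SlashRequest {
  validator: string;
  offense: string;
  checkpointNumber: number;
}

class DescendantWatcherSketch {
  private readonly seenRoots = new Set<string>();
  readonly emitted: SlashRequest[] = [];

  // resolveProposer is a hypothetical callback standing in for whatever the
  // watcher uses to map a checkpoint to its proposer.
  constructor(
    private readonly resolveProposer: (ev: DescendantEvent) => string | undefined,
  ) {}

  // Returns true if a slash request was emitted for this event.
  handleDescendantOfInvalid(ev: DescendantEvent): boolean {
    // Assumed deduplication: one slash per descendant checkpoint.
    if (this.seenRoots.has(ev.archiveRoot)) return false;
    const proposer = this.resolveProposer(ev);
    // Assumed fallback: with no resolvable proposer, emit nothing.
    if (proposer === undefined) return false;
    this.seenRoots.add(ev.archiveRoot);
    // Slash the single proposer, not the attestors.
    this.emitted.push({
      validator: proposer,
      offense: DESCENDANT_OFFENSE,
      checkpointNumber: ev.checkpointNumber,
    });
    return true;
  }
}
```

The point of the sketch is the shape of the fix: one slash request per descendant, aimed at its proposer, under the existing offense type.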

@spalladino changed the title from "test(e2e): repro archiver stuck on invalid-attestations descendant" to "fix(archiver): skip descendants of invalid-attestations checkpoints" on May 22, 2026