
feat: add proposer proof aggregation#284

Open
MegaRedHand wants to merge 8 commits into devnet4 from block-producer-aggregation

Conversation

@MegaRedHand
Collaborator

This PR adds true attestation aggregation to the proposer block-building pipeline. As part of this, the previously added attestation byte cap is removed, since the 16-entry attestation+proof cap now bounds block size.

@github-actions

🤖 Kimi Code Review

Review Summary: This PR implements recursive XMSS proof aggregation (leanSpec #510) and replaces the byte-limit-based block packing with a count-based limit. The cryptographic logic looks correct, but there are critical issues with determinism and error handling that must be addressed.

Critical Issues

1. Non-deterministic attestation ordering in build_block

  • File: crates/blockchain/src/store.rs, line 1174

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review



PR Review: feat: add proposer proof aggregation

Overview

This PR replaces the byte-budget cap on block attestation size with true recursive XMSS proof aggregation (leanSpec #510). When multiple proofs cover the same AttestationData, they are now merged into a single aggregated proof via aggregate_proofs rather than being unioned with an empty-proof placeholder. The MAX_ATTESTATION_PROOF_BYTES cap is removed; the 16-entry MAX_ATTESTATIONS_DATA limit now acts as the sole size guard.
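The merging shape described above can be sketched with simplified stand-ins: a `u32` id plays the role of the `AttestationData` hash_tree_root, and aggregation bits are plain `Vec<bool>`. This is not the PR's code — the real `compact_attestations` additionally merges the XMSS proofs recursively via `aggregate_proofs`.

```rust
use std::collections::BTreeMap;

// Union two aggregation bitfields: a validator is a participant in the
// merged entry if it participated in either input.
fn union_bits(a: &[bool], b: &[bool]) -> Vec<bool> {
    a.iter().zip(b).map(|(x, y)| *x || *y).collect()
}

// Group entries by data root and union the bits of duplicates, mirroring
// the compaction step (proof merging omitted in this sketch).
fn compact(entries: Vec<(u32, Vec<bool>)>) -> Vec<(u32, Vec<bool>)> {
    let mut groups: BTreeMap<u32, Vec<bool>> = BTreeMap::new();
    for (root, bits) in entries {
        groups
            .entry(root)
            .and_modify(|acc| *acc = union_bits(acc, &bits))
            .or_insert(bits);
    }
    groups.into_iter().collect()
}

fn main() {
    let merged = compact(vec![
        (7, vec![true, false, false]),
        (7, vec![false, true, false]),
        (9, vec![false, false, true]),
    ]);
    // Two duplicates for root 7 collapse into one unioned entry.
    assert_eq!(merged, vec![(7, vec![true, true, false]), (9, vec![false, false, true])]);
    println!("{merged:?}");
}
```

Entries with a unique data root pass through unchanged; only duplicate groups trigger a merge.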


Correctness

Message consistency in recursive aggregation — OK

The data_root used in aggregate_proofs (data.hash_tree_root()) at store.rs:1046 matches the message used in aggregate_signatures at store.rs:170. The cryptographic contract is preserved.

aggregate_proofs minimum-children guard — OK

The InsufficientChildren guard in crypto/src/lib.rs:129-131 is consistent with compact_attestations calling it only for groups with indices.len() >= 2. The code is logically safe.

Empty proof handling — behavioral change with no safety net

The old compact_attestations produced AggregatedSignatureProof::empty(merged_bits) on merge. The new code calls AggregatedXMSS::deserialize on each child's proof_data, which returns None for an empty byte slice and maps to ChildDeserializationFailed. In production this should not occur (all proofs in aggregated_payloads originate from aggregate_signatures with non-empty inputs), but the failure mode is now a missed block slot rather than a silent no-op. No runtime guard or warning distinguishes this case.
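Since `AggregatedXMSS::deserialize` returns `None` for an empty byte slice, one possible runtime guard — a hypothetical sketch, not code from this PR — is to detect the all-empty placeholder case up front and fall back to the old union-only behavior instead of failing the block build:

```rust
// Hypothetical guard: true when every child proof is an empty placeholder
// (as produced by the no-verification path). Callers could then skip the
// recursive aggregate_proofs call and emit an empty merged proof instead.
fn all_proofs_empty(proof_datas: &[Vec<u8>]) -> bool {
    proof_datas.iter().all(|p| p.is_empty())
}

fn main() {
    assert!(all_proofs_empty(&[Vec::new(), Vec::new()]));
    assert!(!all_proofs_empty(&[Vec::new(), vec![1u8, 2, 3]]));
}
```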


Test Coverage — Significant Gap

The only test covering the duplicate-merging path (compact_attestations_merges_empty_proofs) was removed. The two remaining compact_attestations tests exercise only the fast paths (no merging required):

  • compact_attestations_no_duplicates — all distinct data, fast-return at store.rs:1013
  • compact_attestations_preserves_order_no_duplicates — also no duplicates

The actual recursive aggregation path — where aggregate_proofs is invoked — is not exercised at all. Given that a failure here silently causes the proposer to skip a slot (lib.rs:221-225), this gap is significant. A test using real XMSS proof data (or an integration-level devnet scenario) is needed to validate the new code path.

The build_block_caps_attestation_data_entries test correctly validates the MAX_ATTESTATIONS_DATA cap, but uses AggregatedSignatureProof::empty() throughout, so compact_attestations never reaches the merge branch.


test_reorg_with_slot_gaps Removal from Skip List

test_reorg_with_slot_gaps was removed from SKIP_TESTS without explanation in either the PR description or the diff. The original comment stated it fails because the runner does not simulate gossip. If it now passes for a different reason (e.g., a prior fix), that should be documented with a comment. Silently un-skipping a test that was skipped for a known structural reason is risky — if it now passes incidentally, a future regression may be masked.


Performance

aggregate_proofs (leanVM proof recursive aggregation) is now on the critical path of block building, called synchronously inside build_block. This was previously only triggered during interval-2 aggregation. If multiple proofs share the same AttestationData in a stall scenario, all merge operations block the tick handler before the block is published.

The cost is bounded by MAX_ATTESTATIONS_DATA = 16 groups and a small number of proofs per group in practice, so this is unlikely to be problematic at current validator counts. It is worth benchmarking under stall conditions with a full validator set to confirm.
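A minimal timing harness of the suggested shape, using only `std::time::Instant`; `merge_stand_in` is a hypothetical placeholder for one `aggregate_proofs` call, and the 16 iterations match `MAX_ATTESTATIONS_DATA`. A real benchmark would use `criterion` and real proof data.

```rust
use std::time::Instant;

// Placeholder workload standing in for one recursive proof merge.
fn merge_stand_in(n: u64) -> u64 {
    (0..n).sum()
}

fn main() {
    let start = Instant::now();
    let mut results = Vec::new();
    // 16 synchronous merges on the block-building critical path.
    for _group in 0..16 {
        results.push(merge_stand_in(100_000));
    }
    println!("16 stand-in merges took {:?}", start.elapsed());
    assert_eq!(results.len(), 16);
}
```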


Minor Issues

  1. Non-deterministic sort order (store.rs:1176): The secondary sort key (data_root) was removed, leaving only target.slot. Entries with the same target slot now have non-deterministic ordering across runs. This does not affect correctness since different nodes build with different payload pools, but it reduces test reproducibility. If the prior secondary key was only added to stabilize tests, a comment to that effect would help.

  2. Unnecessary clone in aggregate_proofs (crypto/src/lib.rs:151): agg.clone() is performed when building children_refs because deserialized must remain alive for the pubkey borrows. Consuming deserialized with an into_iter() refactor would avoid the clone, though the impact is marginal given leanVM's own cost dominates.

  3. debug_assert_eq! not enforced in release (store.rs:991): The attestations.len() == proofs.len() invariant is only checked in debug builds. This is acceptable for an internal function, but worth noting if the function ever becomes more broadly used.

  4. children_refs intermediate Vec (crypto/src/lib.rs:149-152): The separate children_refs allocation exists only to satisfy the borrow checker. A comment explaining why deserialized must outlive children_refs would aid future readers.


CLAUDE.md Update

The removal of the skip-signature-verification caveat ("--is-aggregator is not needed with skip-sig-verification") is a behavior claim. If that bypass path still exists in the code, the documentation is now incomplete. If it was removed in an earlier PR, the note was already stale — either way the change is acceptable, but the reasoning should be confirmed.


Summary

The core approach is sound: recursive proof aggregation is the correct way to compact proofs per leanSpec, and the message/slot plumbing is consistent. The main concern is the missing test for the merge path — the new aggregate_proofs call in compact_attestations is entirely untested, and failures there cause silent slot misses. Adding at least one test with real (non-empty) proof data for the merging branch would substantially reduce risk before merging.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

  1. High: build_block no longer enforces any total proof-byte budget, but each AggregatedSignatureProof still allows up to 1 MiB of proof_data. With MAX_ATTESTATIONS_DATA = 16, a proposer can assemble a block with roughly 16 MiB of proof payload alone, which exceeds the 10 MiB network limit. The gossipsub publish path does not reject oversized blocks before broadcasting, so this client can build and emit blocks peers will drop. Reintroduce a total-byte cap or preflight the final SSZ size before publishing. store.rs#L1185 store.rs#L1236 block.rs#L73 handler.rs#L169 encoding.rs#L54

  2. Medium: the no-verification path now regresses on duplicate attestations. on_gossip_attestation_without_verification still stores AggregatedSignatureProof::empty(...), but compact_attestations now always calls recursive proof aggregation for duplicates, and aggregate_proofs rejects empty/placeholder proofs during deserialization. Any duplicated AttestationData in test / no-verification mode will now fail at block production instead of compacting. Either preserve the old union-only behavior for empty proofs, or skip recursive aggregation when every child proof is empty. store.rs#L502 store.rs#L1045 lib.rs#L124

  3. Low: sorting only by target.slot makes equal-slot processing depend on HashMap iteration order. Because block construction is capped and iterates until justification changes, that can change which attestations land in the block and therefore the block root across runs. Keeping data_root as a stable secondary key avoids that nondeterminism. store.rs#L1174
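The stable secondary key suggested in item 3 can be sketched as follows, with simplified stand-in types (`u64` for `target.slot`, `[u8; 32]` for `data_root`):

```rust
// Sort by target slot first, then by data_root as a deterministic
// tiebreaker, so equal-slot entries no longer depend on HashMap
// iteration order.
fn sort_entries(entries: &mut [(u64, [u8; 32])]) {
    entries.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.cmp(&b.1)));
}

fn main() {
    let mut v = [(5u64, [2u8; 32]), (5, [1u8; 32]), (3, [9u8; 32])];
    sort_entries(&mut v);
    assert_eq!(v[0], (3, [9u8; 32]));
    assert_eq!(v[1], (5, [1u8; 32]));
    assert_eq!(v[2], (5, [2u8; 32]));
}
```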

I did not run the Rust tests here because cargo tried to write under /home/runner/.rustup, which is read-only in this sandbox.
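The preflight suggested in item 1 could take this shape — a sketch only: the 10 MiB figure is the network limit cited above, and `fits_gossip_limit` is a hypothetical helper standing in for a check of the final SSZ-encoded size before publishing.

```rust
// Hypothetical gossip message limit (10 MiB, per the review above).
const MAX_GOSSIP_MESSAGE_BYTES: usize = 10 * 1024 * 1024;

// Preflight: reject a block whose encoded size exceeds the network limit
// before it is broadcast, instead of letting peers drop it.
fn fits_gossip_limit(ssz_size: usize) -> bool {
    ssz_size <= MAX_GOSSIP_MESSAGE_BYTES
}

fn main() {
    assert!(fits_gossip_limit(9 * 1024 * 1024));
    // 16 proofs at the 1 MiB per-proof cap would exceed the limit.
    assert!(!fits_gossip_limit(16 * 1024 * 1024));
}
```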


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps

greptile-apps bot commented Apr 14, 2026

Greptile Summary

This PR introduces true XMSS proof aggregation on the proposer side: a greedy proof-selection pass (extend_proofs_greedily) collects maximum validator coverage from aggregated_payloads, and a subsequent compact_attestations step recursively merges per-AttestationData proofs into one via the new aggregate_proofs helper. The log_inv_rate parameter in aggregate_signatures is also corrected from 1 to 2 to match the cross-client devnet4 convention.

  • P1 – overlapping participants in recursive aggregation: extend_proofs_greedily may select multiple proofs whose original participant sets overlap (e.g. local-aggregated {1,2,3} + gossip-received {2,3,4}). compact_attestations passes these as children to aggregate_proofs, which calls xmss_aggregate with validator 3's pubkey present in both child key lists. Whether xmss_aggregate supports overlapping child sets is unverified and likely unsupported, risking invalid proofs that will fail verification on peer nodes.

Confidence Score: 3/5

Not safe to merge until the overlapping-participant aggregation issue is resolved; invalid proofs would cause cross-client verification failures in multi-aggregator scenarios.

A P1 correctness issue exists in extend_proofs_greedily: when multiple proofs for the same AttestationData with overlapping participants are available, the recursive aggregate_proofs call receives child key lists where some validators appear in more than one child. This is likely unsupported by xmss_aggregate, producing aggregated proofs that pass locally but fail verification on peer nodes. The fix is straightforward (restrict selection to disjoint proofs or cap selection to one proof per data), but the bug is present in the current code.

crates/blockchain/src/store.rs — specifically extend_proofs_greedily (lines 1085–1138) and its interaction with compact_attestations (lines 986–1079).

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/blockchain/src/store.rs | Adds greedy proof selection (extend_proofs_greedily), proof compaction (compact_attestations), and recursive proof aggregation into build_block; the greedy selector can pick proofs with overlapping participants, which may produce invalid aggregated proofs when passed to aggregate_proofs. |
| crates/common/crypto/src/lib.rs | Adds aggregate_proofs for recursive proof merging and fixes log_inv_rate from 1 to 2 for cross-client compatibility; minor clone overhead in aggregate_proofs but otherwise correct. |
| crates/blockchain/tests/forkchoice_spectests.rs | Updates build_signed_block to emit one AggregatedSignatureProof::empty per attestation so the signatures list length matches attestations; adds gossipAggregatedAttestation step handler routing through the no-verification path with a clear TODO. |
| CLAUDE.md | Documentation-only update reflecting the new log_inv_rate=2 convention and aggregation pipeline details; no code changes. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[aggregated_payloads\ndata_root → AttestationData, Vec<Proof>] --> B[build_block loop\nfilter by source == current_justified]
    B --> C[extend_proofs_greedily\ngreedy coverage selection]
    C -->|may select overlapping proofs| D[aggregated_attestations\nmulti-entry per AttestationData]
    C -->|same proofs| E[aggregated_signatures\nVec<AggregatedSignatureProof>]
    D --> F[compact_attestations\ngroup by AttestationData]
    E --> F
    F -->|single entry| G[keep as-is]
    F -->|multiple entries ≥2| H[union_aggregation_bits\nmerge participants]
    H --> I[aggregate_proofs\nxmss_aggregate children\n⚠️ may have overlapping pubkeys]
    I --> J[merged AggregatedSignatureProof]
    G --> K[final block\ncompacted attestations + proofs]
    J --> K
    K --> L[process_block\nstate transition\ncompute state_root]
```

Reviews (1): Last reviewed commit: "test: skip known failing tests"

Comment on lines 1085 to 1138

```diff
@@ -1098,10 +1116,6 @@ fn extend_proofs_greedily(
}

let proof = &proofs[best_idx];
let proof_bytes = proof.proof_data.len();
if bytes_consumed + proof_bytes > remaining_bytes {
break;
}

// Collect coverage only for the winning proof
let new_covered: Vec<u64> = proof
@@ -1120,10 +1134,7 @@ fn extend_proofs_greedily(

covered.extend(new_covered);
remaining_indices.remove(&best_idx);
bytes_consumed += proof_bytes;
}

bytes_consumed
}
```

P1 Greedy proof selection may pass overlapping participant sets to aggregate_proofs

extend_proofs_greedily selects a proof as long as it covers at least one new validator (best_count > 0), but pushes the original proof — which may still include participants already covered by a previously selected proof. When two proofs for the same AttestationData overlap (e.g. local committee aggregation covered {1,2,3} and a received gossip proof covered {2,3,4}), both are selected and later passed as children to aggregate_proofs in compact_attestations. The xmss_aggregate call in aggregate_proofs receives child-pubkey lists [pubkeys_for_{1,2,3}, pubkeys_for_{2,3,4}] with validators 2 and 3 appearing in both children. Recursive aggregation of proofs with overlapping participants is likely unsupported by xmss_aggregate, which could produce an invalid proof that fails verification on peer nodes.

A straightforward fix is to restrict selection to proofs whose participants are fully disjoint from covered:

```rust
// Only select proof if ALL its participants are new (disjoint from covered)
let is_disjoint = proof.participant_indices().all(|vid| !covered.contains(&vid));
if !is_disjoint || best_count == 0 {
    break;
}
```

Alternatively, restrict the greedy loop to select at most one proof and rely on compact_attestations to merge only from distinct, non-overlapping sources.
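The "at most one proof per AttestationData" alternative can be sketched as picking the single widest-coverage proof and stopping. This is a hypothetical shape, with participant sets shown as plain `Vec<u64>`:

```rust
// Select the index of the proof covering the most participants, or None
// if there are no proofs; the greedy loop then stops after this single
// selection, so no overlapping child sets reach aggregate_proofs.
fn pick_best(proofs: &[Vec<u64>]) -> Option<usize> {
    proofs
        .iter()
        .enumerate()
        .max_by_key(|(_, participants)| participants.len())
        .map(|(i, _)| i)
}

fn main() {
    // Overlapping sets {1,2,3} and {2,3,4,5}: only the wider one is chosen.
    let proofs = vec![vec![1, 2, 3], vec![2, 3, 4, 5]];
    assert_eq!(pick_best(&proofs), Some(1));
}
```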


Comment on lines +148 to +155

```rust
// Build references for xmss_aggregate: &[(&[XmssPublicKey], AggregatedXMSS)]
let children_refs: Vec<(&[LeanSigPubKey], AggregatedXMSS)> = deserialized
    .iter()
    .map(|(pks, agg)| (pks.as_slice(), agg.clone()))
    .collect();

// No raw XMSS signatures; purely recursive merge of existing proofs.
let (_sorted_pubkeys, aggregate) = xmss_aggregate(&children_refs, vec![], &message.0, slot, 2);
```

P2 Avoidable AggregatedXMSS::clone() per child proof

The children_refs step borrows deserialized by shared reference and then clones each AggregatedXMSS. Since proof data can be large (potentially hundreds of KB), this allocates one full proof copy per child. The clone can be eliminated by consuming deserialized and splitting it:

```rust
let (pks_list, aggs): (Vec<Vec<LeanSigPubKey>>, Vec<AggregatedXMSS>) =
    deserialized.into_iter().unzip();
let children_refs: Vec<(&[LeanSigPubKey], AggregatedXMSS)> =
    pks_list.iter().map(|v| v.as_slice()).zip(aggs).collect();
```

This avoids the per-child clone while satisfying the borrow requirements of xmss_aggregate.


Comment on lines 1210 to 1234

```diff
@@ -1229,9 +1233,10 @@ fn build_block(
}
}
```

P2 Intermediate justification checks clone and run state transition on uncompacted attestations

Inside the loop, aggregated_attestations.clone().try_into() is called and then process_slots + process_block are run on uncompacted data that may contain multiple entries per AttestationData. process_attestations in the state transition uses a per-validator boolean map keyed on target.root, so disjoint-participant duplicates are handled correctly, but the repeated full state-transition clones (head_state.clone() every iteration) are expensive. Consider accumulating just the newly added data roots and running a single speculative state transition after all data have been collected, rather than once per justification advance.

