
feat: add proposer proof aggregation#284

Open
MegaRedHand wants to merge 8 commits into devnet4 from block-producer-aggregation

Conversation

@MegaRedHand
Collaborator

This PR adds true attestation aggregation to the proposer block-building pipeline. As part of this, the previously added attestation byte cap is removed, since the 16-entry attestation+proof cap now bounds block size.

@github-actions

🤖 Kimi Code Review

Review Summary: This PR implements recursive XMSS proof aggregation (leanSpec #510) and replaces the byte-limit-based block packing with a count-based limit. The cryptographic logic looks correct, but there are critical issues with determinism and error handling that must be addressed.

Critical Issues

1. Non-deterministic attestation ordering in build_block

  • File: crates/blockchain/src/store.rs, line 1174

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review



PR Review: feat: add proposer proof aggregation

Overview

This PR replaces the byte-budget cap on block attestation size with true recursive XMSS proof aggregation (leanSpec #510). When multiple proofs cover the same AttestationData, they are now merged into a single aggregated proof via aggregate_proofs rather than being unioned with an empty-proof placeholder. The MAX_ATTESTATION_PROOF_BYTES cap is removed; the 16-entry MAX_ATTESTATIONS_DATA limit now acts as the sole size guard.
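The merging shape described above can be sketched with simplified stand-ins: a `u32` id plays the role of the `AttestationData` hash_tree_root, and aggregation bits are plain `Vec<bool>`. This is not the PR's code — the real `compact_attestations` additionally merges the XMSS proofs recursively via `aggregate_proofs`.

```rust
use std::collections::BTreeMap;

// Union two aggregation bitfields: a validator is a participant in the
// merged entry if it participated in either input.
fn union_bits(a: &[bool], b: &[bool]) -> Vec<bool> {
    a.iter().zip(b).map(|(x, y)| *x || *y).collect()
}

// Group entries by data root and union the bits of duplicates, mirroring
// the compaction step (proof merging omitted in this sketch).
fn compact(entries: Vec<(u32, Vec<bool>)>) -> Vec<(u32, Vec<bool>)> {
    let mut groups: BTreeMap<u32, Vec<bool>> = BTreeMap::new();
    for (root, bits) in entries {
        groups
            .entry(root)
            .and_modify(|acc| *acc = union_bits(acc, &bits))
            .or_insert(bits);
    }
    groups.into_iter().collect()
}

fn main() {
    let merged = compact(vec![
        (7, vec![true, false, false]),
        (7, vec![false, true, false]),
        (9, vec![false, false, true]),
    ]);
    // Two duplicates for root 7 collapse into one unioned entry.
    assert_eq!(merged, vec![(7, vec![true, true, false]), (9, vec![false, false, true])]);
    println!("{merged:?}");
}
```

Entries with a unique data root pass through unchanged; only duplicate groups trigger a merge.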


Correctness

Message consistency in recursive aggregation — OK

The data_root used in aggregate_proofs (data.hash_tree_root()) at store.rs:1046 matches the message used in aggregate_signatures at store.rs:170. The cryptographic contract is preserved.

aggregate_proofs minimum-children guard — OK

The InsufficientChildren guard in crypto/src/lib.rs:129-131 is consistent with compact_attestations calling it only for groups with indices.len() >= 2. The code is logically safe.

Empty proof handling — behavioral change with no safety net

The old compact_attestations produced AggregatedSignatureProof::empty(merged_bits) on merge. The new code calls AggregatedXMSS::deserialize on each child's proof_data, which returns None for an empty byte slice and maps to ChildDeserializationFailed. In production this should not occur (all proofs in aggregated_payloads originate from aggregate_signatures with non-empty inputs), but the failure mode is now a missed block slot rather than a silent no-op. No runtime guard or warning distinguishes this case.
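Since `AggregatedXMSS::deserialize` returns `None` for an empty byte slice, one possible runtime guard — a hypothetical sketch, not code from this PR — is to detect the all-empty placeholder case up front and fall back to the old union-only behavior instead of failing the block build:

```rust
// Hypothetical guard: true when every child proof is an empty placeholder
// (as produced by the no-verification path). Callers could then skip the
// recursive aggregate_proofs call and emit an empty merged proof instead.
fn all_proofs_empty(proof_datas: &[Vec<u8>]) -> bool {
    proof_datas.iter().all(|p| p.is_empty())
}

fn main() {
    assert!(all_proofs_empty(&[Vec::new(), Vec::new()]));
    assert!(!all_proofs_empty(&[Vec::new(), vec![1u8, 2, 3]]));
}
```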


Test Coverage — Significant Gap

The only test covering the duplicate-merging path (compact_attestations_merges_empty_proofs) was removed. The two remaining compact_attestations tests exercise only the fast paths (no merging required):

  • compact_attestations_no_duplicates — all distinct data, fast-return at store.rs:1013
  • compact_attestations_preserves_order_no_duplicates — also no duplicates

The actual recursive aggregation path — where aggregate_proofs is invoked — is not exercised at all. Given that a failure here silently causes the proposer to skip a slot (lib.rs:221-225), this gap is significant. A test using real XMSS proof data (or an integration-level devnet scenario) is needed to validate the new code path.

The build_block_caps_attestation_data_entries test correctly validates the MAX_ATTESTATIONS_DATA cap, but uses AggregatedSignatureProof::empty() throughout, so compact_attestations never reaches the merge branch.


test_reorg_with_slot_gaps Removal from Skip List

test_reorg_with_slot_gaps was removed from SKIP_TESTS without explanation in either the PR description or the diff. The original comment stated it fails because the runner does not simulate gossip. If it now passes for a different reason (e.g., a prior fix), that should be documented with a comment. Silently un-skipping a test that was skipped for a known structural reason is risky — if it now passes incidentally, a future regression may be masked.


Performance

aggregate_proofs (leanVM proof recursive aggregation) is now on the critical path of block building, called synchronously inside build_block. This was previously only triggered during interval-2 aggregation. If multiple proofs share the same AttestationData in a stall scenario, all merge operations block the tick handler before the block is published.

The cost is bounded by MAX_ATTESTATIONS_DATA = 16 groups and a small number of proofs per group in practice, so this is unlikely to be problematic at current validator counts. It is worth benchmarking under stall conditions with a full validator set to confirm.
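A minimal timing harness of the suggested shape, using only `std::time::Instant`; `merge_stand_in` is a hypothetical placeholder for one `aggregate_proofs` call, and the 16 iterations match `MAX_ATTESTATIONS_DATA`. A real benchmark would use `criterion` and real proof data.

```rust
use std::time::Instant;

// Placeholder workload standing in for one recursive proof merge.
fn merge_stand_in(n: u64) -> u64 {
    (0..n).sum()
}

fn main() {
    let start = Instant::now();
    let mut results = Vec::new();
    // 16 synchronous merges on the block-building critical path.
    for _group in 0..16 {
        results.push(merge_stand_in(100_000));
    }
    println!("16 stand-in merges took {:?}", start.elapsed());
    assert_eq!(results.len(), 16);
}
```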


Minor Issues

  1. Non-deterministic sort order (store.rs:1176): The secondary sort key (data_root) was removed, leaving only target.slot. Entries with the same target slot now have non-deterministic ordering across runs. This does not affect correctness since different nodes build with different payload pools, but it reduces test reproducibility. If the prior secondary key was only added to stabilize tests, a comment to that effect would help.

  2. Unnecessary clone in aggregate_proofs (crypto/src/lib.rs:151): agg.clone() is performed when building children_refs because deserialized must remain alive for the pubkey borrows. Consuming deserialized with an into_iter() refactor would avoid the clone, though the impact is marginal given leanVM's own cost dominates.

  3. debug_assert_eq! not enforced in release (store.rs:991): The attestations.len() == proofs.len() invariant is only checked in debug builds. This is acceptable for an internal function, but worth noting if the function ever becomes more broadly used.

  4. children_refs intermediate Vec (crypto/src/lib.rs:149-152): The separate children_refs allocation exists only to satisfy the borrow checker. A comment explaining why deserialized must outlive children_refs would aid future readers.


CLAUDE.md Update

The removal of the skip-signature-verification caveat ("--is-aggregator is not needed with skip-sig-verification") is a behavior claim. If that bypass path still exists in the code, the documentation is now incomplete. If it was removed in an earlier PR, the note was already stale — either way the change is acceptable, but the reasoning should be confirmed.


Summary

The core approach is sound: recursive proof aggregation is the correct way to compact proofs per leanSpec, and the message/slot plumbing is consistent. The main concern is the missing test for the merge path — the new aggregate_proofs call in compact_attestations is entirely untested, and failures there cause silent slot misses. Adding at least one test with real (non-empty) proof data for the merging branch would substantially reduce risk before merging.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

  1. High: build_block no longer enforces any total proof-byte budget, but each AggregatedSignatureProof still allows up to 1 MiB of proof_data. With MAX_ATTESTATIONS_DATA = 16, a proposer can assemble a block with roughly 16 MiB of proof payload alone, which exceeds the 10 MiB network limit. The gossipsub publish path does not reject oversized blocks before broadcasting, so this client can build and emit blocks peers will drop. Reintroduce a total-byte cap or preflight the final SSZ size before publishing. store.rs#L1185 store.rs#L1236 block.rs#L73 handler.rs#L169 encoding.rs#L54

  2. Medium: the no-verification path now regresses on duplicate attestations. on_gossip_attestation_without_verification still stores AggregatedSignatureProof::empty(...), but compact_attestations now always calls recursive proof aggregation for duplicates, and aggregate_proofs rejects empty/placeholder proofs during deserialization. Any duplicated AttestationData in test / no-verification mode will now fail at block production instead of compacting. Either preserve the old union-only behavior for empty proofs, or skip recursive aggregation when every child proof is empty. store.rs#L502 store.rs#L1045 lib.rs#L124

  3. Low: sorting only by target.slot makes equal-slot processing depend on HashMap iteration order. Because block construction is capped and iterates until justification changes, that can change which attestations land in the block and therefore the block root across runs. Keeping data_root as a stable secondary key avoids that nondeterminism. store.rs#L1174
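The stable secondary key suggested in item 3 can be sketched as follows, with simplified stand-in types (`u64` for `target.slot`, `[u8; 32]` for `data_root`):

```rust
// Sort by target slot first, then by data_root as a deterministic
// tiebreaker, so equal-slot entries no longer depend on HashMap
// iteration order.
fn sort_entries(entries: &mut [(u64, [u8; 32])]) {
    entries.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.cmp(&b.1)));
}

fn main() {
    let mut v = [(5u64, [2u8; 32]), (5, [1u8; 32]), (3, [9u8; 32])];
    sort_entries(&mut v);
    assert_eq!(v[0], (3, [9u8; 32]));
    assert_eq!(v[1], (5, [1u8; 32]));
    assert_eq!(v[2], (5, [2u8; 32]));
}
```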

I did not run the Rust tests here because cargo tried to write under /home/runner/.rustup, which is read-only in this sandbox.
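The preflight suggested in item 1 could take this shape — a sketch only: the 10 MiB figure is the network limit cited above, and `fits_gossip_limit` is a hypothetical helper standing in for a check of the final SSZ-encoded size before publishing.

```rust
// Hypothetical gossip message limit (10 MiB, per the review above).
const MAX_GOSSIP_MESSAGE_BYTES: usize = 10 * 1024 * 1024;

// Preflight: reject a block whose encoded size exceeds the network limit
// before it is broadcast, instead of letting peers drop it.
fn fits_gossip_limit(ssz_size: usize) -> bool {
    ssz_size <= MAX_GOSSIP_MESSAGE_BYTES
}

fn main() {
    assert!(fits_gossip_limit(9 * 1024 * 1024));
    // 16 proofs at the 1 MiB per-proof cap would exceed the limit.
    assert!(!fits_gossip_limit(16 * 1024 * 1024));
}
```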


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps

greptile-apps bot commented Apr 14, 2026

Greptile Summary

This PR introduces true XMSS proof aggregation on the proposer side: a greedy proof-selection pass (extend_proofs_greedily) collects maximum validator coverage from aggregated_payloads, and a subsequent compact_attestations step recursively merges per-AttestationData proofs into one via the new aggregate_proofs helper. The log_inv_rate parameter in aggregate_signatures is also corrected from 1 to 2 to match the cross-client devnet4 convention.

  • P1 – overlapping participants in recursive aggregation: extend_proofs_greedily may select multiple proofs whose original participant sets overlap (e.g. local-aggregated {1,2,3} + gossip-received {2,3,4}). compact_attestations passes these as children to aggregate_proofs, which calls xmss_aggregate with validator 3's pubkey present in both child key lists. Whether xmss_aggregate supports overlapping child sets is unverified and likely unsupported, risking invalid proofs that will fail verification on peer nodes.

Confidence Score: 3/5

Not safe to merge until the overlapping-participant aggregation issue is resolved; invalid proofs would cause cross-client verification failures in multi-aggregator scenarios.

A P1 correctness issue exists in extend_proofs_greedily: when multiple proofs for the same AttestationData with overlapping participants are available, the recursive aggregate_proofs call receives child key lists where some validators appear in more than one child. This is likely unsupported by xmss_aggregate, producing aggregated proofs that pass locally but fail verification on peer nodes. The fix is straightforward (restrict selection to disjoint proofs or cap selection to one proof per data), but the bug is present in the current code.

crates/blockchain/src/store.rs — specifically extend_proofs_greedily (lines 1085–1138) and its interaction with compact_attestations (lines 986–1079).

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/blockchain/src/store.rs | Adds greedy proof selection (extend_proofs_greedily), proof compaction (compact_attestations), and recursive proof aggregation into build_block; the greedy selector can pick proofs with overlapping participants, which may produce invalid aggregated proofs when passed to aggregate_proofs. |
| crates/common/crypto/src/lib.rs | Adds aggregate_proofs for recursive proof merging and fixes log_inv_rate from 1 to 2 for cross-client compatibility; minor clone overhead in aggregate_proofs but otherwise correct. |
| crates/blockchain/tests/forkchoice_spectests.rs | Updates build_signed_block to emit one AggregatedSignatureProof::empty per attestation so the signatures list length matches attestations; adds gossipAggregatedAttestation step handler routing through the no-verification path with a clear TODO. |
| CLAUDE.md | Documentation-only update reflecting the new log_inv_rate=2 convention and aggregation pipeline details; no code changes. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[aggregated_payloads\ndata_root → AttestationData, Vec<Proof>] --> B[build_block loop\nfilter by source == current_justified]
    B --> C[extend_proofs_greedily\ngreedy coverage selection]
    C -->|may select overlapping proofs| D[aggregated_attestations\nmulti-entry per AttestationData]
    C -->|same proofs| E[aggregated_signatures\nVec<AggregatedSignatureProof>]
    D --> F[compact_attestations\ngroup by AttestationData]
    E --> F
    F -->|single entry| G[keep as-is]
    F -->|multiple entries ≥2| H[union_aggregation_bits\nmerge participants]
    H --> I[aggregate_proofs\nxmss_aggregate children\n⚠️ may have overlapping pubkeys]
    I --> J[merged AggregatedSignatureProof]
    G --> K[final block\ncompacted attestations + proofs]
    J --> K
    K --> L[process_block\nstate transition\ncompute state_root]
```

Reviews (1): Last reviewed commit: "test: skip known failing tests"

Comment on lines 1085 to 1138

```diff
@@ -1098,10 +1116,6 @@ fn extend_proofs_greedily(
}

let proof = &proofs[best_idx];
let proof_bytes = proof.proof_data.len();
if bytes_consumed + proof_bytes > remaining_bytes {
break;
}

// Collect coverage only for the winning proof
let new_covered: Vec<u64> = proof
@@ -1120,10 +1134,7 @@ fn extend_proofs_greedily(

covered.extend(new_covered);
remaining_indices.remove(&best_idx);
bytes_consumed += proof_bytes;
}

bytes_consumed
}
```

P1 Greedy proof selection may pass overlapping participant sets to aggregate_proofs

extend_proofs_greedily selects a proof as long as it covers at least one new validator (best_count > 0), but pushes the original proof — which may still include participants already covered by a previously selected proof. When two proofs for the same AttestationData overlap (e.g. local committee aggregation covered {1,2,3} and a received gossip proof covered {2,3,4}), both are selected and later passed as children to aggregate_proofs in compact_attestations. The xmss_aggregate call in aggregate_proofs receives child-pubkey lists [pubkeys_for_{1,2,3}, pubkeys_for_{2,3,4}] with validators 2 and 3 appearing in both children. Recursive aggregation of proofs with overlapping participants is likely unsupported by xmss_aggregate, which could produce an invalid proof that fails verification on peer nodes.

A straightforward fix is to restrict selection to proofs whose participants are fully disjoint from covered:

```rust
// Only select proof if ALL its participants are new (disjoint from covered)
let is_disjoint = proof.participant_indices().all(|vid| !covered.contains(&vid));
if !is_disjoint || best_count == 0 {
    break;
}
```

Alternatively, restrict the greedy loop to select at most one proof and rely on compact_attestations to merge only from distinct, non-overlapping sources.
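The "at most one proof per AttestationData" alternative can be sketched as picking the single widest-coverage proof and stopping. This is a hypothetical shape, with participant sets shown as plain `Vec<u64>`:

```rust
// Select the index of the proof covering the most participants, or None
// if there are no proofs; the greedy loop then stops after this single
// selection, so no overlapping child sets reach aggregate_proofs.
fn pick_best(proofs: &[Vec<u64>]) -> Option<usize> {
    proofs
        .iter()
        .enumerate()
        .max_by_key(|(_, participants)| participants.len())
        .map(|(i, _)| i)
}

fn main() {
    // Overlapping sets {1,2,3} and {2,3,4,5}: only the wider one is chosen.
    let proofs = vec![vec![1, 2, 3], vec![2, 3, 4, 5]];
    assert_eq!(pick_best(&proofs), Some(1));
}
```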


Comment on lines +148 to +155

```rust
// Build references for xmss_aggregate: &[(&[XmssPublicKey], AggregatedXMSS)]
let children_refs: Vec<(&[LeanSigPubKey], AggregatedXMSS)> = deserialized
    .iter()
    .map(|(pks, agg)| (pks.as_slice(), agg.clone()))
    .collect();

// No raw XMSS signatures; purely recursive merge of existing proofs.
let (_sorted_pubkeys, aggregate) = xmss_aggregate(&children_refs, vec![], &message.0, slot, 2);
```

P2 Avoidable AggregatedXMSS::clone() per child proof

The children_refs step borrows deserialized by shared reference and then clones each AggregatedXMSS. Since proof data can be large (potentially hundreds of KB), this allocates one full proof copy per child. The clone can be eliminated by consuming deserialized and splitting it:

```rust
let (pks_list, aggs): (Vec<Vec<LeanSigPubKey>>, Vec<AggregatedXMSS>) =
    deserialized.into_iter().unzip();
let children_refs: Vec<(&[LeanSigPubKey], AggregatedXMSS)> =
    pks_list.iter().map(|v| v.as_slice()).zip(aggs).collect();
```

This avoids the per-child clone while satisfying the borrow requirements of xmss_aggregate.


Comment on lines 1210 to 1234

```diff
@@ -1229,9 +1233,10 @@ fn build_block(
}
}
```

P2 Intermediate justification checks clone and run state transition on uncompacted attestations

Inside the loop, aggregated_attestations.clone().try_into() is called and then process_slots + process_block are run on uncompacted data that may contain multiple entries per AttestationData. process_attestations in the state transition uses a per-validator boolean map keyed on target.root, so disjoint-participant duplicates are handled correctly, but the repeated full state-transition clones (head_state.clone() every iteration) are expensive. Consider accumulating just the newly added data roots and running a single speculative state transition after all data have been collected, rather than once per justification advance.

