
Conversation

@mrzeszutko
Contributor

Summary

  • Add comprehensive blob storage documentation for node operators
  • Rename BLOB_SINK_ARCHIVE_API_URL to BLOB_ARCHIVE_API_URL (cleanup after BlobSink removal)
  • Remove dead environment variables BLOB_SINK_PORT and BLOB_SINK_URL

Description

Following the removal of the BlobSink HTTP server (#19143), this PR:

  1. Adds new documentation (blob_storage.md) explaining how Aztec nodes store and retrieve blob data, including:

    • Overview of blob sources (FileStore, L1 Consensus, Archive API)
    • PeerDAS and supernode requirements for L1 consensus
    • Configuration examples for GCS, S3, and Cloudflare R2
    • Authentication setup
    • Troubleshooting guide
  2. Cleans up legacy naming by renaming BLOB_SINK_ARCHIVE_API_URL to BLOB_ARCHIVE_API_URL; the "sink" terminology is no longer accurate since the HTTP server was removed

  3. Removes dead code - BLOB_SINK_PORT and BLOB_SINK_URL env vars that were left behind after BlobSink removal
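For node operators updating an existing deployment, the rename and removals above amount to a small configuration migration. This is a hedged sketch (the URL value is a placeholder; only the variable names come from this PR):

```shell
# Migration sketch for the env-var changes in this PR.
# Before (legacy BlobSink naming):
#   BLOB_SINK_ARCHIVE_API_URL=...   # renamed below
#   BLOB_SINK_PORT=...              # dead, removed
#   BLOB_SINK_URL=...               # dead, removed

# After: the archive API variable drops the "SINK" prefix
export BLOB_ARCHIVE_API_URL="https://blobscan.example/api"  # placeholder value

# The dead variables can simply be unset / deleted from your env files
unset BLOB_SINK_ARCHIVE_API_URL BLOB_SINK_PORT BLOB_SINK_URL
```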

@mrzeszutko mrzeszutko changed the title from "Blob storage documentation" to "docs: blob storage documentation" on Dec 22, 2025
@mrzeszutko mrzeszutko marked this pull request as ready for review December 22, 2025 21:07

@spalladino spalladino left a comment


Let's split the instructions related to retrieval and storage, since they are meant for different users. Also let's please delete the generic or redundant instructions inserted by Claude, like "search logs for troubleshooting".

The blob client can retrieve blobs from multiple sources, tried in order:

1. **File Store**: Fast retrieval from configured storage (S3, GCS, R2, local files, HTTPS)
2. **L1 Consensus**: Beacon node API for recent blobs (within ~18 days)

Suggested change
2. **L1 Consensus**: Beacon node API for recent blobs (within ~18 days)
2. **L1 Consensus**: Beacon node API to a (semi-)supernode for recent blobs (within ~18 days)

- **Semi-supernodes** (validators with ≥1,824 ETH / 57 validators): Handle at least 64 columns, enabling reconstruction of complete blob data.
- **Regular nodes**: Only download 1/8th of the data (8 of 128 columns) to verify availability. This is **not sufficient** to serve complete blob data.

If L1 consensus is your only blob source, your beacon node must be a supernode or semi-supernode (or connected to one) to retrieve complete blobs. A regular node cannot reconstruct full blob data from its partial columns alone.

Suggested change
If L1 consensus is your only blob source, your beacon node must be a supernode or semi-supernode (or connected to one) to retrieve complete blobs. A regular node cannot reconstruct full blob data from its partial columns alone.
:::warning Supernodes
If L1 consensus is your only blob source, your beacon node must be a supernode or semi-supernode (or connected to one) to retrieve complete blobs. A regular node cannot reconstruct full blob data from its partial columns alone.
:::
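For readers of the thread, the retrieval setup under discussion might be configured roughly as follows. This is a hedged sketch: `BLOB_ARCHIVE_API_URL` comes from this PR, but `BLOB_FILE_STORE_URL` and `L1_CONSENSUS_HOST_URLS` are hypothetical variable names used purely for illustration, and all values are placeholders.

```shell
# Sketch of the blob sources tried in order (hypothetical variable names
# except BLOB_ARCHIVE_API_URL, which this PR introduces).

# 1. File store: fastest, from configured storage (S3, GCS, R2, local, HTTPS)
export BLOB_FILE_STORE_URL="gs://my-bucket/blobs"           # hypothetical

# 2. L1 consensus: beacon node API for recent blobs (~18 days);
#    must point at a supernode or semi-supernode to get complete blobs
export L1_CONSENSUS_HOST_URLS="http://beacon:5052"          # hypothetical

# 3. Archive API: fallback for older blobs
export BLOB_ARCHIVE_API_URL="https://blobscan.example/api"  # placeholder value
```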

Comment on lines +47 to +48
- Have the Aztec node software installed
- Understand basic node operation

I think we can remove this


- Have the Aztec node software installed
- Understand basic node operation
- For uploading blobs: Have access to cloud storage (Google Cloud Storage, Amazon S3, or Cloudflare R2) with appropriate permissions

Let's not mix up instructions for blob retrieval and blob upload. Move everything related to blob upload to a separate file or at least a separate section, since only a very small subset of users will do uploads.

Comment on lines +201 to +204
1. **Check startup logs**: Look for messages about blob source connectivity
2. **Monitor blob retrieval**: Watch for successful blob fetches during sync
3. **Verify storage**: Check your storage bucket to confirm blob files exist
4. **Test retrieval**: Restart the node and verify it can retrieve previously stored blobs

Let's either have proper instructions for verification, or just delete them


- **Use file stores for production**: File stores provide faster, more reliable blob retrieval than L1 consensus
- **Configure multiple sources**: Use multiple file store URLs and L1 consensus hosts for redundancy
- **Enable blob uploads**: Configure `BLOB_FILE_STORE_UPLOAD_URL` to contribute to blob availability
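The redundancy advice above might translate into configuration like this. Hedged sketch: only `BLOB_FILE_STORE_UPLOAD_URL` is named in the docs under review; the other variable names are hypothetical placeholders, and (per the review comment below) uploading only helps others if you advertise the store, as snapshot providers do.

```shell
# Sketch of redundant blob sources (hypothetical names except
# BLOB_FILE_STORE_UPLOAD_URL, which appears in the docs this PR adds).

# Multiple file store URLs for redundancy
export BLOB_FILE_STORE_URLS="gs://primary/blobs,https://mirror.example/blobs"   # hypothetical

# Multiple L1 consensus hosts as fallback
export L1_CONSENSUS_HOST_URLS="http://beacon-1:5052,http://beacon-2:5052"       # hypothetical

# Optional: upload blobs to a store you operate
export BLOB_FILE_STORE_UPLOAD_URL="gs://primary/blobs"
```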

This is not a best practice for all users. Uploading to a store does not help anyone unless they advertise it, like snapshot providers do.

3. **Verify storage**: Check your storage bucket to confirm blob files exist
4. **Test retrieval**: Restart the node and verify it can retrieve previously stored blobs

## Troubleshooting

I understand Claude generated most of this. Let's clean it up and remove the slop.
