Predictable ledger state snapshots#6526

Draft
geo2a wants to merge 3 commits into master from geo2a/predictable-snapshots

geo2a commented Apr 13, 2026

Description

This PR brings in the Consensus feature of predictable ledger state snapshots:

  • Snapshots will be taken by all nodes at the same deterministic slot numbers, rather than depending on each node's start time.
  • To avoid a thundering-herd effect, where all nodes take a snapshot at the same time and stall the network, every node introduces a randomised delay before taking a snapshot.
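To illustrate the first point, here is a minimal Haskell sketch of one way to make the snapshot decision depend only on the chain. It assumes that a snapshot is taken of the last block before each boundary `SlotOffset + k * SnapshotInterval`, using the parameters described in the configuration section. This rule is a hypothetical reading, not the confirmed implementation. The helper names `intervalIndex` and `isSnapshotPoint` are illustrative and are not part of the ouroboros-consensus API. With the mainnet values 172800 and 432000, the rule agrees with the slots in the test log. For example, 4492799 is the last slot before 172800 + 10 × 432000 = 4492800.

```haskell
import Data.Word (Word64)

-- | Index of the snapshot interval that a slot falls into, counting from
-- the offset, or 'Nothing' for slots before the offset.
-- Hypothetical helper, not the ouroboros-consensus API.
intervalIndex :: Word64 -> Word64 -> Word64 -> Maybe Word64
intervalIndex offset interval slot
  | slot < offset = Nothing
  | otherwise     = Just ((slot - offset) `div` interval)

-- | Hypothetical rule: the block at slot @prev@ is a snapshot point when the
-- next block on the chain, at slot @next@, falls into a later interval.
-- The answer depends only on the two slot numbers, so every node reaches
-- the same decision regardless of when it was started.
isSnapshotPoint :: Word64 -> Word64 -> Word64 -> Word64 -> Bool
isSnapshotPoint offset interval prev next =
  intervalIndex offset interval prev /= intervalIndex offset interval next
```

For example, `isSnapshotPoint 172800 432000 4492799 4492800` is `True`. Two consecutive blocks inside the same interval never trigger a snapshot.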

Changes to cardano-node configuration

The LedgerDB section of config.yaml is reworked and now accepts the following parameters:

LedgerDB:
  # remains as-is
  Backend: V2InMemory

  # start taking the snapshots at slot 172800, after Byron
  SlotOffset: 172800

  # take snapshots every 432000 slots, at the end of every Shelley epoch
  SnapshotInterval: 432000

  # A minimum duration between snapshots, in seconds (used to avoid excessive snapshots while syncing).
  RateLimit: 0 # default is 10 minutes

  # Randomised snapshot delay range, in seconds.
  # Both MinDelay and MaxDelay must be specified; otherwise the default range of 5 to 10 minutes is used.
  MinDelay: 60
  MaxDelay: 120
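The defaulting rules in the configuration comments above can be sketched as follows. This is a hypothetical Haskell illustration, not the cardano-node config parser. The names `resolveDelayRange`, `resolveRateLimit`, `pickDelay` and `rateLimitAllows` are made up. Delays are assumed to be whole seconds, since the traces report values such as `73s`. The random sample `u` is passed in rather than drawn from a generator, which keeps the functions pure.

```haskell
import Data.Maybe (fromMaybe)

-- | Hypothetical helper: resolve the delay range. Both bounds must be set,
-- otherwise the documented default of 5 to 10 minutes applies.
resolveDelayRange :: Maybe Int -> Maybe Int -> (Int, Int)
resolveDelayRange (Just lo) (Just hi) = (lo, hi)
resolveDelayRange _         _         = (5 * 60, 10 * 60)

-- | Hypothetical helper: resolve RateLimit, defaulting to 10 minutes.
resolveRateLimit :: Maybe Int -> Int
resolveRateLimit = fromMaybe (10 * 60)

-- | Map a uniform sample @u@ in [0, 1) to a whole number of seconds in
-- [lo, hi]. A real node would draw @u@ from a random generator.
pickDelay :: (Int, Int) -> Double -> Int
pickDelay (lo, hi) u = lo + floor (u * fromIntegral (hi - lo + 1))

-- | A snapshot may be taken only if at least @limit@ seconds have passed
-- since the previous one; times are seconds from any fixed origin.
rateLimitAllows :: Int -> Maybe Double -> Double -> Bool
rateLimitAllows _     Nothing     _   = True
rateLimitAllows limit (Just prev) now = now - prev >= fromIntegral limit
```

With `MinDelay: 60` and `MaxDelay: 120`, `pickDelay` yields delays between 60 and 120 seconds. This matches the 64 to 108 second delays reported in the test log. The sketch does not check that MinDelay is at most MaxDelay; the PR does not say how that case is handled.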

New tracing events

  • SnapshotRequestDelayed snapshotRequestTime delayBeforeSnapshotting slots: traces that a snapshot was requested for slots, but the request will be executed only after delayBeforeSnapshotting has elapsed.
  • SnapshotRequestCompleted signifies the completion of a delayed snapshot request.
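The following is a hypothetical Haskell shape for these two events, with a renderer that reproduces the message text seen in the log below. The field names follow the description above. The types are simplified assumptions: time as a `Double`, the delay in whole seconds, and a local `SlotNo` newtype. They are not the actual ouroboros-consensus definitions.

```haskell
import Data.Word (Word64)

-- | Local stand-in for a slot number; its derived Show instance prints
-- values as "SlotNo 4492799", matching the log output.
newtype SlotNo = SlotNo Word64
  deriving (Show, Eq)

-- | Hypothetical shape of the two new trace events.
data SnapshotDelayTrace
  = SnapshotRequestDelayed
      { snapshotRequestTime     :: Double   -- assumed: seconds, any fixed origin
      , delayBeforeSnapshotting :: Int      -- assumed: whole seconds
      , slots                   :: [SlotNo] -- slots the request covers
      }
  | SnapshotRequestCompleted
  deriving (Show, Eq)

-- | Render an event as a human-readable message in the style of the log.
renderTrace :: SnapshotDelayTrace -> String
renderTrace (SnapshotRequestDelayed _ delay ss) =
  "Scheduling to take ledger state snapshots at slots " ++ show ss
    ++ " , with a randomised delay of " ++ show delay ++ "s"
renderTrace SnapshotRequestCompleted =
  "Completed taking a ledger state snapshot"
```

One delayed request can cover several slots. This is why a single SnapshotRequestDelayed event in the log is followed by several TookSnapshot events and then one SnapshotRequestCompleted.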

Manual Testing

This feature is a little tricky to test automatically, and I have not found any end-to-end tests for the ledger state snapshot functionality. Instead, I did some manual testing by analysing the logs. This process could be automated using cardano-testnet, but I'm afraid such a test could be flaky and very verbose.

To test the feature, I started a sync with mainnet and used Claude Code to grep the logs and construct a table verifying that the snapshots are indeed taken at the announced slots after the expected delay:

| # | SnapshotRequestDelayed (UTC) | Scheduled slots | TookSnapshot (UTC) | Taken slot | Reported delay (s) | Actual delta (s) |
|---|---|---|---|---|---|---|
| 1 | 2026-04-10 12:21:49.5021 | 4492799 | 2026-04-10 12:23:02.5057 | 4492799 | 73 | 73.0036 |
| 2 | 2026-04-10 12:23:03.3550 | 4924780 | 2026-04-10 12:24:07.3566 | 4924780 | 64 | 64.0016 |
| 3 | 2026-04-10 12:24:08.6039 | 5356780, 5788780 | 2026-04-10 12:25:56.6049 | 5356780 | 108 | 108.0010 |
| | | | 2026-04-10 12:25:58.6811 | 5788780 | | 110.0772 |
| 4 | 2026-04-10 12:26:00.0986 | 6220777, 6652775, 7084774 | 2026-04-10 12:27:13.1009 | 6220777 | 73 | 73.0023 |
| | | | 2026-04-10 12:27:15.2028 | 6652775 | | 75.1042 |
| | | | 2026-04-10 12:27:16.2926 | 7084774 | | 76.1940 |
| 5 | 2026-04-10 12:27:17.9178 | 7516773, 7948772, 8380772 | 2026-04-10 12:28:40.9199 | 7516773 | 83 | 83.0021 |
| | | | 2026-04-10 12:28:42.1981 | 7948772 | | 84.2803 |
| | | | 2026-04-10 12:28:43.6356 | 8380772 | | 85.7178 |
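The table was assembled manually, but the delay check it performs could be automated along the following lines. This is a minimal sketch. It assumes that both timestamps fall on the same UTC day and are given as `HH:MM:SS.ffff` strings, as in the table. The helpers `parseClock`, `splitOn` and `delayHonoured` are hypothetical.

```haskell
-- | Split a string on a separator character (hypothetical helper;
-- Data.List.Split is not part of base).
splitOn :: Char -> String -> [String]
splitOn c str = case break (== c) str of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- | Parse a wall-clock time such as "12:21:49.5021" into seconds since
-- midnight. Fails loudly on malformed input.
parseClock :: String -> Double
parseClock s = case splitOn ':' s of
  [h, m, sec] -> read h * 3600 + read m * 60 + read sec
  _           -> error ("unexpected time format: " ++ s)

-- | Given the SnapshotRequestDelayed time, the time of the first
-- "Taking ledger snapshot" trace and the reported delay in seconds,
-- return the actual delta and whether it is at least the reported delay.
delayHonoured :: String -> String -> Int -> (Double, Bool)
delayHonoured requested taken reported =
  let delta = parseClock taken - parseClock requested
  in (delta, delta >= fromIntegral reported)
```

For example, `delayHonoured "12:21:49.5021" "12:23:02.5057" 73` yields a delta of about 73.0036 s together with `True`, matching row 1.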

The relevant fragment of the log file is attached:

[2026-04-10 12:21:49.5021Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 4492799] , with a randomised delay of 73s
[2026-04-10 12:23:02.5057Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 4492799, dsSuffix = Nothing} at f8084c61b6a238acec985b59310b6ecec49c0ab8352249afd7268da5cff2a457 at slot 4492799
[2026-04-10 12:23:03.2456Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 4492799, dsSuffix = Nothing} at f8084c61b6a238acec985b59310b6ecec49c0ab8352249afd7268da5cff2a457 at slot 4492799 , duration: 0.739851377s
[2026-04-10 12:23:03.2530Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:23:03.3550Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 4924780] , with a randomised delay of 64s
[2026-04-10 12:24:07.3566Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 4924780, dsSuffix = Nothing} at a0805ae8e52318f0e499be7f85d3f1d5c7dddeacdca0dab9e9d9a8ae6c49a22c at slot 4924780
[2026-04-10 12:24:08.2799Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 4924780, dsSuffix = Nothing} at a0805ae8e52318f0e499be7f85d3f1d5c7dddeacdca0dab9e9d9a8ae6c49a22c at slot 4924780 , duration: 0.923355707s
[2026-04-10 12:24:08.2872Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:24:08.6039Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 5356780,SlotNo 5788780] , with a randomised delay of 108s
[2026-04-10 12:25:56.6049Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 5356780, dsSuffix = Nothing} at 4ddf277b3aff32931843da9f7900f5ef2fffed15b124891c485be4b3a06fca72 at slot 5356780
[2026-04-10 12:25:58.6810Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 5356780, dsSuffix = Nothing} at 4ddf277b3aff32931843da9f7900f5ef2fffed15b124891c485be4b3a06fca72 at slot 5356780 , duration: 2.076082693s
[2026-04-10 12:25:58.6811Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 5788780, dsSuffix = Nothing} at 9e6fc811d9b09f7c8c6d7a23dc8b3360a9c4a3930ba640ce107e944d5e2750e2 at slot 5788780
[2026-04-10 12:25:59.7608Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 5788780, dsSuffix = Nothing} at 9e6fc811d9b09f7c8c6d7a23dc8b3360a9c4a3930ba640ce107e944d5e2750e2 at slot 5788780 , duration: 1.079557458s
[2026-04-10 12:25:59.7861Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:26:00.0986Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 6220777,SlotNo 6652775,SlotNo 7084774] , with a randomised delay of 73s
[2026-04-10 12:27:13.1009Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 6220777, dsSuffix = Nothing} at bc98eda36819d00f424e63aeb4eb43950bd5eacf37f2c35a2b8f807aa68cd895 at slot 6220777
[2026-04-10 12:27:15.2022Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 6220777, dsSuffix = Nothing} at bc98eda36819d00f424e63aeb4eb43950bd5eacf37f2c35a2b8f807aa68cd895 at slot 6220777 , duration: 2.101244366s
[2026-04-10 12:27:15.2028Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 6652775, dsSuffix = Nothing} at 6707ef3c2e885c25d5081a1aa0dd03e81492e21c5955208f23eee3d92ae28f9f at slot 6652775
[2026-04-10 12:27:16.2922Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 6652775, dsSuffix = Nothing} at 6707ef3c2e885c25d5081a1aa0dd03e81492e21c5955208f23eee3d92ae28f9f at slot 6652775 , duration: 1.089450966s
[2026-04-10 12:27:16.2926Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 7084774, dsSuffix = Nothing} at 057c01d0a0f0b6c554589ac5baf6b72b63cd22b2d668ee86f7421199eab1c46c at slot 7084774
[2026-04-10 12:27:17.4298Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 7084774, dsSuffix = Nothing} at 057c01d0a0f0b6c554589ac5baf6b72b63cd22b2d668ee86f7421199eab1c46c at slot 7084774 , duration: 1.137220622s
[2026-04-10 12:27:17.4573Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:27:17.9178Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 7516773,SlotNo 7948772,SlotNo 8380772] , with a randomised delay of 83s
[2026-04-10 12:28:40.9199Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 7516773, dsSuffix = Nothing} at cd0dad9ea278cc82d9c3dbefa1769ddbfb9358dc800e4a70a4cc1e671489c493 at slot 7516773
[2026-04-10 12:28:42.1979Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 7516773, dsSuffix = Nothing} at cd0dad9ea278cc82d9c3dbefa1769ddbfb9358dc800e4a70a4cc1e671489c493 at slot 7516773 , duration: 1.277999016s
[2026-04-10 12:28:42.1981Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 7948772, dsSuffix = Nothing} at cff7c23b9f62ad48a2436b2270a10bb9286999a721f1da3bde35f6f1579d1464 at slot 7948772
[2026-04-10 12:28:43.6354Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 7948772, dsSuffix = Nothing} at cff7c23b9f62ad48a2436b2270a10bb9286999a721f1da3bde35f6f1579d1464 at slot 7948772 , duration: 1.43729202s
[2026-04-10 12:28:43.6356Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 8380772, dsSuffix = Nothing} at 47fef957a7152647dacbcff13242b3ef3c416930e23cd55722c36c1fd126c721 at slot 8380772
[2026-04-10 12:28:45.5275Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 8380772, dsSuffix = Nothing} at 47fef957a7152647dacbcff13242b3ef3c416930e23cd55722c36c1fd126c721 at slot 8380772 , duration: 1.891909794s
[2026-04-10 12:28:45.5622Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot

Checklist
  • Commit sequence broadly makes sense and commits have useful messages
  • New tests are added if needed and existing tests are updated. These may include:
    • golden tests
    • property tests
    • roundtrip tests
    • integration tests
      See Running tests for more details
  • Any changes are noted in the CHANGELOG.md for the affected package
  • The version bounds in .cabal files are updated
  • CI passes. See note on CI. The following CI checks are required:
    • Code is linted with hlint. See .github/workflows/check-hlint.yml to get the hlint version
    • Code is formatted with stylish-haskell. See .github/workflows/stylish-haskell.yml to get the stylish-haskell version
    • Code builds on Linux, macOS and Windows for ghc-9.6 and ghc-9.12
  • Self-reviewed the diff

Note on CI

If your PR is from a fork, the necessary CI jobs won't trigger automatically for security reasons.
You will need someone with write privileges to trigger them. Please contact the IOG node developers to do this for you.

geo2a self-assigned this Apr 13, 2026
* updates ouroboros-consensus -> 3.0.0.0
* updates dmq-node -> 0.4.2.0
* updates plutus -> 1.61
* updates cardano-ledger-core -> 1.20
* updates cardano-api -> 10.26

cardano-node: fix LedgerDB `selectorToArgs`

PrometheusSimple: expose DoS protection options

cardano-node: optimize delegMapSize metric
geo2a changed the title "Predictable snapshots ledger state snapshots" to "Predictable ledger state snapshots" on Apr 14, 2026
geo2a force-pushed the geo2a/predictable-snapshots branch 2 times, most recently from f5edd49 to 56838db, on April 14, 2026 06:27
geo2a force-pushed the geo2a/predictable-snapshots branch from 56838db to 844a4c4 on April 14, 2026 06:30