
go/oasis-node/cmd/storage: Add create and import checkpoint cmd#6454

Open
martintomazic wants to merge 5 commits into master from martin/feature/create-checkpoint-cmd

Conversation

@martintomazic
Contributor

@martintomazic martintomazic commented Feb 7, 2026

Closes #6423

  • Write our own version of BootstrapState
    • The node creating the checkpoint can dump untrusted metadata, which is then used to initialize the CometBFT stores after the checkpoint is imported.
    • We could also make it trustless, if we want to get rid of the snapshots entirely and only store checkpoints + storage diffs and blocks. :)
  • Fix TODOs depending on the review feedback.

@martintomazic martintomazic force-pushed the martin/feature/storage-inspect-cmd branch from ea89ecc to c5e2f2a Compare February 9, 2026 10:33
@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 9761a3c to 0bba8dd Compare February 9, 2026 23:08
@martintomazic martintomazic force-pushed the martin/feature/storage-inspect-cmd branch 2 times, most recently from fe09fe6 to f833d73 Compare February 10, 2026 00:00
@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 0bba8dd to 41b49b4 Compare February 10, 2026 00:05
@martintomazic martintomazic force-pushed the martin/feature/storage-inspect-cmd branch from f833d73 to b47eb6c Compare February 10, 2026 14:35
Base automatically changed from martin/feature/storage-inspect-cmd to master February 10, 2026 21:53
@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 41b49b4 to b31dfff Compare February 11, 2026 09:00
@netlify

netlify Bot commented Feb 11, 2026

Deploy Preview for oasisprotocol-oasis-core canceled.

Name Link
🔨 Latest commit 194a24f
🔍 Latest deploy log https://app.netlify.com/projects/oasisprotocol-oasis-core/deploys/69e67defc1853b00088f230e

@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from b31dfff to 744884b Compare February 11, 2026 09:04
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
Comment thread go/oasis-node/cmd/storage/checkpoint.go
@martintomazic
Contributor Author

Works! :)

The only impractical parts are finding the runtime rounds that correspond to a given consensus height, and the fact that bootstrap "eats" one height, as described.

Finally, be very careful when choosing creation/import heights and rounds, so that you have all the relevant light history for the runtime checkpoints you are importing.

@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch 3 times, most recently from ef92148 to a41d394 Compare February 11, 2026 14:12
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from a41d394 to 206c70e Compare February 11, 2026 14:25
@martintomazic
Contributor Author

martintomazic commented Feb 11, 2026

Creating checkpoints from the penultimate snapshot is dominated by the Sapphire checkpoint creation.

With 6 chunker threads, the current projection is 5-7 hours (will update). Import is a matter of minutes.

@martintomazic martintomazic marked this pull request as ready for review February 11, 2026 14:33
@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch 3 times, most recently from 817bc76 to 2be35e9 Compare February 22, 2026 22:38
@martintomazic
Contributor Author

martintomazic commented Feb 22, 2026

Added unit and e2e tests, fixed empty state corner case and improved code quality.

Two minor things left to discuss:

@codecov

codecov Bot commented Feb 22, 2026

Codecov Report

❌ Patch coverage is 59.29204% with 138 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.56%. Comparing base (c9a4b8e) to head (06b5cc4).

Files with missing lines Patch % Lines
go/oasis-node/cmd/storage/checkpoint.go 58.43% 73 Missing and 65 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6454      +/-   ##
==========================================
- Coverage   64.73%   64.56%   -0.18%     
==========================================
  Files         699      700       +1     
  Lines       68246    68581     +335     
==========================================
+ Hits        44179    44279     +100     
- Misses      19060    19183     +123     
- Partials     5007     5119     +112     


Collaborator

@peternose peternose left a comment


When I import a consensus checkpoint, I get a few lines of the following error. Afterwards, blocks execute normally.

{"caller":"grpc.go:194","err":"failed to get consensus status: failed to fetch current block: cometbft: block query failed: height 28800866 must be less than or equal to the current blockchain height 0","level":"error","method":"/oasis-core.NodeController/GetStatus","module":"grpc/internal","msg":"request failed","req_seq":15,"ts":"2026-02-24T13:00:28.934344662Z"}

Comment thread go/oasis-node/cmd/storage/checkpoint.go
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
Comment thread go/oasis-node/cmd/storage/checkpoint_test.go
Comment thread go/oasis-node/cmd/storage/checkpoint_test.go Outdated
Comment thread go/oasis-node/cmd/storage/checkpoint.go
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 2be35e9 to 06b5cc4 Compare February 24, 2026 15:32
@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch 5 times, most recently from e9c6485 to 9633a30 Compare March 4, 2026 22:47
@martintomazic
Contributor Author

When I import a consensus checkpoint, I get few lines of the following error. Afterwards, blocks execute normally.

Nice catch. Yes, this is also how CometBFT checkpoint import behaves, but I found a fixup regardless :).

What I find more annoying is that you technically cannot import a checkpoint for the latest height, so adding extra validation and documenting this in the command would probably be better than failing with an unexpected error.

@martintomazic
Contributor Author

Ready for a second review.

As you spotted, I am "abusing" the checkpoint.FileStore abstraction (#6467): it is bound to one concrete NodeDB, yet it should be able to create and import checkpoints for different versions.

However, this command technically does not create/import a checkpoint but rather a state/snapshot. Admittedly, 99% of the work is creating and importing checkpoints, and checkpoint.FileStore is used to encode the snapshot structure. Nevertheless, the CometBFT bootstrap metadata has nothing to do with the checkpoints themselves.

For this reason I have written my own (stateless) helpers, so that they can be easily refactored and possibly moved to the checkpoint package one day. As long as this command is the only client, I am not sure that makes sense. Improving the checkpoint package abstractions probably does. :)

Let's align on the user facing API and sanity checking the inputs:

  1. go/oasis-node/cmd/storage: Add create and import checkpoint cmd #6454 (comment)
    • The simpler the better, given we will be the only user of this command:
  2. Should we find a better name for this command, given that it is not just creating/importing checkpoints? Not sure.

@martintomazic martintomazic requested a review from peternose March 4, 2026 23:20
@peternose
Collaborator

Ready for a second review.

Will have a look. Merge after we release 26.0.

@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 9633a30 to f082fcd Compare March 5, 2026 10:21
@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from f082fcd to 6eb8569 Compare March 18, 2026 09:57
Comment thread go/oasis-node/cmd/storage/checkpoint.go
Comment thread go/oasis-node/cmd/storage/checkpoint.go
Comment thread go/oasis-node/cmd/storage/checkpoint.go
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
Comment thread go/oasis-node/cmd/storage/checkpoint.go
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
Comment thread go/oasis-node/cmd/storage/checkpoint.go Outdated
return fmt.Errorf("failed to stop target compute worker: %w", err)
}

// Reset the target node's state completely. Ideally we would use NoAutoStart,
Collaborator


Maybe we should start the network for 10 blocks, stop the target node, wait for 20 blocks more, stop the source node, do snapshots, start the target node, and import them afterwards.

Comment thread go/oasis-test-runner/scenario/e2e/runtime/checkpoint_create_import.go Outdated
@martintomazic martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 6eb8569 to afbc5a0 Compare March 31, 2026 12:51
Comment on lines +63 to +75
rtState, err := srcCtrl.Roothash.GetRuntimeState(ctx, &roothash.RuntimeRequest{
	RuntimeID: KeyValueRuntimeID,
	Height:    candidateHeight,
})
if err != nil {
	return fmt.Errorf("failed to get runtime state for height %d: %w", candidateHeight, err)
}

// Pick runtime state's LastBlockHeight as the consensus checkpoint height else
// runtime light history indexer might miss authoritative light block for the
// corresponding runtime round.
cpRound := rtState.LastBlock.Header.Round
cpHeight := rtState.LastBlockHeight
Contributor Author


I believe we have the same issue with our checkpoint sync.

Consensus might create a checkpoint, and we would then trigger checkpoint creation for the corresponding round of each configured runtime. The problem is that if this round was not produced at the given height but earlier, the indexer would skip it:

if state.LastBlockHeight != height {

This would cause the corresponding runtime checkpoint sync to fail due to a missing authoritative light header. Out of scope, will open an issue.
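
To make the skip condition concrete, here is a small self-contained sketch. The `runtimeState` type is a simplified stand-in for the real roothash runtime state (only the two values used in the test snippet are modelled), `indexerWouldSkip` and `alignCheckpoint` are hypothetical helper names, and the heights and rounds are made up.

```go
package main

import "fmt"

// runtimeState is a simplified stand-in for the roothash runtime state.
type runtimeState struct {
	LastBlockRound  uint64 // stand-in for LastBlock.Header.Round
	LastBlockHeight int64  // consensus height at which that round was produced
}

// indexerWouldSkip mirrors the quoted condition: the round is only indexed
// at the consensus height where it was actually produced.
func indexerWouldSkip(state runtimeState, height int64) bool {
	return state.LastBlockHeight != height
}

// alignCheckpoint picks the consensus height and runtime round for the
// checkpoint so that the indexer condition above holds for that pair.
func alignCheckpoint(state runtimeState) (cpHeight int64, cpRound uint64) {
	return state.LastBlockHeight, state.LastBlockRound
}

func main() {
	// Illustrative values: round 12 was produced at height 40, but the
	// runtime state is queried at candidate height 43.
	state := runtimeState{LastBlockRound: 12, LastBlockHeight: 40}
	candidateHeight := int64(43)

	fmt.Println("skipped at candidate height:", indexerWouldSkip(state, candidateHeight))

	cpHeight, cpRound := alignCheckpoint(state)
	fmt.Println("checkpoint height:", cpHeight, "round:", cpRound,
		"skipped:", indexerWouldSkip(state, cpHeight))
}
```

With these values the candidate height would be skipped, while the aligned height would not, which is why the test derives the checkpoint height from the runtime state rather than using the candidate height directly.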

)

// CheckpointCreateImport is the checkpoint create/import e2e scenario.
var CheckpointCreateImport scenario.Scenario = newCheckpointCreateImportImpl()
Contributor Author


There was one (flaky?) outcome in the CI that I am trying to reproduce locally (might have seen it once locally before).
https://buildkite.com/oasisprotocol/oasis-core-ci/builds/16664#019d43f3-6e51-4ddc-b8a4-a0ab5da22719

{"caller":"worker.go:1155","err":"storage/database: failed to Apply: mkvs: node not found in node db","level":"error","module":"worker/storage/committee","msg":"can't apply write log","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=12 type=state-root hash=963bc9c0eaf31ad37a8c9a25146d047fdef7d468b89f19a301ab28e76f84ee65>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:20:47.085392644Z"}

The imported checkpoint was corrupted, as the storage committee worker was unable to apply the next storage diff. It might also be a committee worker issue:

Storage committee logs from the CI
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{state-root}","round":11,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:31.02065292Z"}
{"caller":"worker.go:1175","level":"debug","module":"worker/storage/committee","msg":"finished syncing round","round":11,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:31.02067514Z"}
{"caller":"worker.go:439","level":"debug","module":"worker/storage/committee","msg":"storage round finalized","round":11,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:31.020769116Z"}
{"caller":"worker.go:1204","last_finalized":11,"last_synced":11,"level":"debug","module":"worker/storage/committee","msg":"incoming block","round":12,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:33.737682523Z"}
{"awaiting_retry":"outstanding_mask{state-root}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":12,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:33.73770249Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":12,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:33.737737249Z"}
{"caller":"worker.go:1175","level":"debug","module":"worker/storage/committee","msg":"finished syncing round","round":12,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:33.737816314Z"}
{"caller":"worker.go:439","level":"debug","module":"worker/storage/committee","msg":"storage round finalized","round":12,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:33.737930232Z"}
{"caller":"worker.go:1204","last_finalized":12,"last_synced":12,"level":"debug","module":"worker/storage/committee","msg":"incoming block","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:35.059190353Z"}
{"awaiting_retry":"outstanding_mask{state-root}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:35.059224542Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{state-root}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:35.059398047Z"}
{"caller":"worker.go:1175","level":"debug","module":"worker/storage/committee","msg":"finished syncing round","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:35.05943708Z"}
{"caller":"worker.go:439","level":"debug","module":"worker/storage/committee","msg":"storage round finalized","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:35.059690354Z"}
{"caller":"worker.go:1204","last_finalized":13,"last_synced":13,"level":"debug","module":"worker/storage/committee","msg":"incoming block","round":14,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:36.462700172Z"}
{"awaiting_retry":"outstanding_mask{state-root}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":14,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:36.462737431Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{state-root}","round":14,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:36.462816046Z"}
{"caller":"worker.go:1175","level":"debug","module":"worker/storage/committee","msg":"finished syncing round","round":14,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:36.462842958Z"}
{"caller":"worker.go:439","level":"debug","module":"worker/storage/committee","msg":"storage round finalized","round":14,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:36.463146676Z"}
{"caller":"worker.go:1204","last_finalized":14,"last_synced":14,"level":"debug","module":"worker/storage/committee","msg":"incoming block","round":15,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:37.789492905Z"}
{"awaiting_retry":"outstanding_mask{state-root}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":15,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:37.789509283Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{state-root}","round":15,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:37.789540837Z"}
{"caller":"worker.go:1175","level":"debug","module":"worker/storage/committee","msg":"finished syncing round","round":15,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:37.789551362Z"}
{"caller":"worker.go:439","level":"debug","module":"worker/storage/committee","msg":"storage round finalized","round":15,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:37.78963392Z"}
{"caller":"worker.go:789","err":"context canceled","level":"error","module":"worker/storage/committee","msg":"checkpointer stopped","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:38.169306642Z"}
{"caller":"worker.go:1318","level":"info","module":"worker/storage/committee","msg":"stopped","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:38.169313829Z"}
{"caller":"worker.go:768","level":"info","module":"worker/storage/committee","msg":"starting","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.835974673Z"}
{"caller":"worker.go:948","genesis_round":0,"last_synced":11,"level":"info","module":"worker/storage/committee","msg":"worker initialized","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.836554655Z"}
{"caller":"worker.go:1021","level":"info","module":"worker/storage/committee","msg":"initialized","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.83663171Z"}
{"caller":"worker.go:1204","last_finalized":11,"last_synced":11,"level":"debug","module":"worker/storage/committee","msg":"incoming block","round":16,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.836767167Z"}
{"awaiting_retry":"outstanding_mask{state-root}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":12,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.83698815Z"}
{"awaiting_retry":"outstanding_mask{state-root}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.837095354Z"}
{"caller":"worker.go:401","level":"debug","module":"worker/storage/committee","msg":"calling GetDiff","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=12 type=state-root hash=963bc9c0eaf31ad37a8c9a25146d047fdef7d468b89f19a301ab28e76f84ee65>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.837161045Z"}
{"awaiting_retry":"outstanding_mask{state-root}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":14,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.837259592Z"}
{"awaiting_retry":"outstanding_mask{state-root}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":15,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.837338432Z"}
{"awaiting_retry":"outstanding_mask{state-root}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":16,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.837388739Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":12,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.837444513Z"}
{"caller":"worker.go:401","level":"debug","module":"worker/storage/committee","msg":"calling GetDiff","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=14 type=io-root hash=7fa1c8d40fdd82e2c0af8ffd3009890a4d5cc1109e1b36789b7fd283a95bf07e>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=14 type=io-root hash=c672b8d1ef56ed28ab87c3622c5114069bdd3ad7b8f9737498d0c01ecef0967a>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.837508641Z"}
{"caller":"worker.go:401","level":"debug","module":"worker/storage/committee","msg":"calling GetDiff","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=14 type=state-root hash=9314bf6ac6112131839cae80d8452dfaabcb0e408413b402e6573eb53eed3333>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.837837373Z"}
{"caller":"worker.go:1175","level":"debug","module":"worker/storage/committee","msg":"finished syncing round","round":12,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.838913637Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.838980235Z"}
{"caller":"worker.go:1155","err":"storage/database: failed to Apply: mkvs: node not found in node db","level":"error","module":"worker/storage/committee","msg":"can't apply write log","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=12 type=state-root hash=963bc9c0eaf31ad37a8c9a25146d047fdef7d468b89f19a301ab28e76f84ee65>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.8390787Z"}
{"caller":"worker.go:439","level":"debug","module":"worker/storage/committee","msg":"storage round finalized","round":12,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.839273082Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.839260743Z"}
{"caller":"worker.go:401","level":"debug","module":"worker/storage/committee","msg":"calling GetDiff","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=15 type=io-root hash=a15d4949610bf4c365b0a75368b6e79bf751c57067fd456ed34f1a5693e00ee6>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=15 type=io-root hash=c672b8d1ef56ed28ab87c3622c5114069bdd3ad7b8f9737498d0c01ecef0967a>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.839331111Z"}
{"caller":"worker.go:401","level":"debug","module":"worker/storage/committee","msg":"calling GetDiff","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=15 type=state-root hash=b9f885bdbbeb1504c9b0e46d186ed90af8fed7f88b0c8772058a483ee0efd9b0>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=14 type=state-root hash=9314bf6ac6112131839cae80d8452dfaabcb0e408413b402e6573eb53eed3333>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.839631516Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.839724966Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.839799029Z"}
{"caller":"worker.go:401","level":"debug","module":"worker/storage/committee","msg":"calling GetDiff","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=12 type=state-root hash=963bc9c0eaf31ad37a8c9a25146d047fdef7d468b89f19a301ab28e76f84ee65>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.839847976Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.840478217Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.840622243Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.841242907Z"}
{"caller":"worker.go:1155","err":"storage/database: failed to Apply: mkvs: node not found in node db","level":"error","module":"worker/storage/committee","msg":"can't apply write log","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=12 type=state-root hash=963bc9c0eaf31ad37a8c9a25146d047fdef7d468b89f19a301ab28e76f84ee65>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.841528927Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.841592563Z"}
{"caller":"worker.go:401","level":"debug","module":"worker/storage/committee","msg":"calling GetDiff","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=12 type=state-root hash=963bc9c0eaf31ad37a8c9a25146d047fdef7d468b89f19a301ab28e76f84ee65>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.841656563Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.842566191Z"}
{"caller":"worker.go:1155","err":"storage/database: failed to Apply: mkvs: node not found in node db","level":"error","module":"worker/storage/committee","msg":"can't apply write log","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=12 type=state-root hash=963bc9c0eaf31ad37a8c9a25146d047fdef7d468b89f19a301ab28e76f84ee65>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:06:55.842681114Z"}
{"caller":"worker.go:1261","in_flight_rounds":4,"level":"debug","module":"worker/storage/committee","msg":"heartbeat","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:07:03.261447066Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:07:03.261550558Z"}
{"caller":"worker.go:401","level":"debug","module":"worker/storage/committee","msg":"calling GetDiff","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=12 type=state-root hash=963bc9c0eaf31ad37a8c9a25146d047fdef7d468b89f19a301ab28e76f84ee65>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:07:03.26165467Z"}
{"awaiting_retry":"outstanding_mask{}","caller":"worker.go:1074","level":"debug","module":"worker/storage/committee","msg":"preparing round sync","outstanding_mask":"outstanding_mask{}","round":13,"runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:07:03.263252444Z"}
{"caller":"worker.go:1155","err":"storage/database: failed to Apply: mkvs: node not found in node db","level":"error","module":"worker/storage/committee","msg":"can't apply write log","new_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=13 type=state-root hash=4848fde3109b8a49e559a3a10040e25af7c5d79f8eeeecbd55d30738f3baec55>","old_root":"<Root ns=8000000000000000000000000000000000000000000000000000000000000000 version=12 type=state-root hash=963bc9c0eaf31ad37a8c9a25146d047fdef7d468b89f19a301ab28e76f84ee65>","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:07:03.26339852Z"}
{"caller":"worker.go:1261","in_flight_rounds":4,"level":"debug","module":"worker/storage/committee","msg":"heartbeat","runtime_id":"8000000000000000000000000000000000000000000000000000000000000000","ts":"2026-03-31T13:07:12.7026433Z"}

^^ The worker synced rounds 0-15, then the state was reset. After that, round 11 is imported; the round 12 root does not change, so there is no need to fetch it, but we still see the round finalized. Finally, round 13 fails.

All other local/CI runs have no such issues whatsoever, including testing this command on the real mainnet data.

Submitting more runtime transactions once the checkpoint was created, so that restarted node has to catch-up with more (and new) state, did not trigger this outcome either.

@martintomazic martintomazic requested a review from peternose March 31, 2026 19:26
Ideally, the test should be hardened by making sure the
target node also syncs up to the tip of the runtime chain
and not just consensus.
Previously, compute might not sync the latest round and the checkpoint
might fail. This is fixed by querying compute explicitly.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

go/oasis-node: Enable snapshot creation with exact start version

2 participants