Add local signer backup recovery flow by ihordiachenko · Pull Request #715 · Blockstream/greenlight

ihordiachenko · 2026-05-01T12:04:51Z

Adds opt-in local VLS signer backups with CLI inspection/conversion tooling. There are two available backup strategies:

new-channels-only (default): low I/O; takes a snapshot when a channel first becomes recoverable.
periodic: snapshots new recoverable channels and then refreshes the snapshot after configured recoverable-channel updates, at the cost of more disk writes.
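To make the difference between the two strategies concrete, here is a minimal Rust sketch of the snapshot decision. All names (BackupStrategy, ChannelEvent, SnapshotPolicy, refresh_every) are hypothetical. It assumes that "configured recoverable-channel updates" means a count of updates between refreshes; the PR's actual trigger logic may differ.

```rust
/// Hypothetical model of the two strategies; not the PR's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackupStrategy {
    /// Snapshot only when a channel first becomes recoverable.
    NewChannelsOnly,
    /// Also re-snapshot after every `refresh_every` recoverable-channel updates
    /// (assumed interpretation of "configured updates").
    Periodic { refresh_every: u32 },
}

/// Events the signer might report to the backup logic (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChannelEvent {
    BecameRecoverable,
    RecoverableUpdate,
}

/// Tracks pending updates and decides when a snapshot should be written.
pub struct SnapshotPolicy {
    strategy: BackupStrategy,
    updates_since_snapshot: u32,
}

impl SnapshotPolicy {
    pub fn new(strategy: BackupStrategy) -> Self {
        Self { strategy, updates_since_snapshot: 0 }
    }

    /// Returns true if this event should trigger a snapshot.
    pub fn on_event(&mut self, event: ChannelEvent) -> bool {
        let snapshot = match (event, self.strategy) {
            // Both strategies snapshot a channel that just became recoverable.
            (ChannelEvent::BecameRecoverable, _) => true,
            // new-channels-only ignores later updates to keep I/O low.
            (ChannelEvent::RecoverableUpdate, BackupStrategy::NewChannelsOnly) => false,
            // periodic refreshes once enough updates have accumulated.
            (ChannelEvent::RecoverableUpdate, BackupStrategy::Periodic { refresh_every }) => {
                self.updates_since_snapshot += 1;
                self.updates_since_snapshot >= refresh_every.max(1)
            }
        };
        if snapshot {
            // Any snapshot captures all pending updates, so reset the counter.
            self.updates_since_snapshot = 0;
        }
        snapshot
    }
}
```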

Backups are created by running the signer with:

glcli signer run --backup-path

and can be inspected with inspect-backup.

Backups can be converted to CLN recoverchannel input with:

glcli signer convert-backup --format cln --path <backup file>

Tradeoffs

Backups are best-effort during signer operation: write failures are logged and do not interrupt signing. The backup file is created only after a snapshot trigger, not immediately at startup.
Peer addresses from Greenlight's peerlist are stored alongside the VLS state to close the main recovery-data gap.
Only v1 channels are supported for now.
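The best-effort write semantics described above can be sketched as follows. write_backup_best_effort and write_atomically are hypothetical helpers, eprintln! stands in for the signer's logger, and the temp-file-plus-rename step is a design choice assumed here, not something the PR states.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Writes `bytes` to a temporary sibling file, fsyncs it, then renames it over
/// `path`, so a crash mid-write leaves the previous backup intact (assumption).
fn write_atomically(path: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    fs::rename(&tmp, path)
}

/// Best-effort wrapper: errors are logged and swallowed so signing continues.
/// The bool is returned only for observability; callers may ignore it.
pub fn write_backup_best_effort(path: &Path, bytes: &[u8]) -> bool {
    match write_atomically(path, bytes) {
        Ok(()) => true,
        Err(e) => {
            // Stand-in for the signer's logger; the request is not failed.
            eprintln!("signer backup write to {} failed: {}", path.display(), e);
            false
        }
    }
}
```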

cdecker

Hm, not quite sure this is the direction we should go. Calling the client API from the signer is not necessary as far as I can see. The idea was to just take a snapshot of the signer state, which on its own contains all the information relevant for recovery. This change, by contrast, is sprawling: it injects new client connections in a variety of places and adds strong coupling.

The original issue had the following line:

Conclusion: VLS state contains all SCB data plus much more. Storing VLS state snapshots should be sufficient for disaster recovery.

cdecker · 2026-05-07T13:29:42Z

+use std::io::Write;
+use std::path::Path;


This would prevent us from compiling for no_std environments, and we target both wasm and embedded environments. We need to gate these imports and the functionality behind a #[cfg(...)] guard, so we can exclude these parts for no_std builds.

Good point. Done

cdecker · 2026-05-07T13:30:51Z


 mod approver;
 mod auth;
+mod backup;


We likely need a #[cfg(...)] guard on the mod; then we have a nice, clean separation.
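A sketch of what gating the whole module looks like, with the std imports moved inside it so that builds without the feature never compile them. The backup feature name follows the review; write_snapshot and backup_support_compiled are hypothetical.

```rust
// The whole module, std imports included, exists only with the `backup`
// feature. Builds without it (e.g. wasm or embedded no_std targets) never
// compile std::io or std::path here.
#[cfg(feature = "backup")]
mod backup {
    use std::io::Write;
    use std::path::Path;

    /// Writes a serialized signer-state snapshot to disk (illustrative only).
    pub fn write_snapshot(path: &Path, bytes: &[u8]) -> std::io::Result<()> {
        let mut file = std::fs::File::create(path)?;
        file.write_all(bytes)?;
        file.sync_all()
    }
}

/// Lets callers and tests check whether backup support was built in.
pub fn backup_support_compiled() -> bool {
    cfg!(feature = "backup")
}
```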

cdecker · 2026-05-07T13:33:18Z

+    async fn process_request(
+        &self,
+        req: HsmRequest,
+        mut node_client: Option<&mut crate::node::ClnClient>,


I don't quite understand the logic behind pushing a backup side effect into the request processing itself, when we can do the snapshot comparison in the caller.

cdecker · 2026-05-07T13:34:38Z

        }
    }

+    fn backup_peerlist_client(&self, channel: Channel) -> Result<Option<node::ClnClient>, Error> {


Not sure why we need a node::ClnClient here at all; we already have all the necessary data in the signer state, so let's just extract it from there.

Following up on your feedback in the sibling MR:

Ah, now I see why retrieving extra information is necessary in the public PR. I think we can work around it though, since a funding must always be preceded by a connect command, whose IP address we can just stash away. Alternatively, a much cleaner solution would be to add the IP, if known, to the VLS state itself. That would take a while to propagate back to us, but it would make the signer state a true superset.

I went with adding peer data into the VLS state to make it a true superset.
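As a rough illustration of stashing the address from the preceding connect in signer state, here is a hypothetical PeerBook keyed by node id. It is not the VLS data model; it only shows the record-on-connect, look-up-at-backup-time flow.

```rust
use std::collections::BTreeMap;

/// Hypothetical slice of signer state: peer addresses recorded on `connect`,
/// keyed by node id. Not the actual VLS representation.
#[derive(Default, Debug)]
pub struct PeerBook {
    addrs: BTreeMap<String, Vec<String>>,
}

impl PeerBook {
    /// Stash the address used for a connect; duplicates are ignored.
    pub fn record_connect(&mut self, node_id: &str, addr: &str) {
        let entry = self.addrs.entry(node_id.to_string()).or_default();
        if !entry.iter().any(|a| a == addr) {
            entry.push(addr.to_string());
        }
    }

    /// Addresses to put in a backup entry for a channel with `node_id`.
    /// Returns an empty slice if no address was ever learned.
    pub fn addrs_for(&self, node_id: &str) -> &[String] {
        self.addrs.get(node_id).map(Vec::as_slice).unwrap_or(&[])
    }
}
```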

cdecker

Very good, the functionality is all there. However, the implementation, and specifically where it hooks into the rest of the code, is rather strange to me. As I understand it, the backup is now interspersed with the signer state update, whereas we could keep the signer state processing untouched, then pass the updated signer state into the backup so it can extract the changes (regenerate the backup), and then conditionally write to disk when something changed compared to before. This would be more aligned with the phase separation we have currently:

Pre-flight checks in the form of the end-to-end verification on the requests
Snapshot of the signer state
Pass signer state and request to the VLS core for verification, state updates and response generation
Diff between pre- and post-state
Pass diff to backup so it can update itself
Return (response, post_state) to gl-plugin

This can be a follow-up PR, but it is probably simpler to separate it here rather than untangle it once merged.
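The phase separation described above can be sketched as a single request handler. SignerState, Backup, diff and handle_request are hypothetical stand-ins: the VLS core is replaced by a closure, the disk write by a counter, and the diff ignores removed keys for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical flat view of signer state: key -> serialized value.
type SignerState = BTreeMap<String, Vec<u8>>;

/// Keys whose value was added or changed between pre- and post-state.
fn diff(pre: &SignerState, post: &SignerState) -> Vec<String> {
    post.iter()
        .filter(|(k, v)| pre.get(*k) != Some(*v))
        .map(|(k, _)| k.clone())
        .collect()
}

/// Backup that regenerates itself from the post-state when something changed.
#[derive(Default)]
struct Backup {
    snapshot: Option<SignerState>,
    writes: usize,
}

impl Backup {
    /// Returns true if a new snapshot would be written to disk.
    fn update(&mut self, changed: &[String], post: &SignerState) -> bool {
        if changed.is_empty() {
            return false; // nothing changed: skip the disk write
        }
        self.snapshot = Some(post.clone()); // regenerate from the post-state
        self.writes += 1; // stand-in for the actual disk write
        true
    }
}

/// Phases from the review; the VLS core is a closure that mutates the state.
fn handle_request<F>(state: &mut SignerState, backup: &mut Backup, vls: F) -> Vec<u8>
where
    F: FnOnce(&mut SignerState) -> Vec<u8>,
{
    let pre = state.clone(); // snapshot of the signer state
    let response = vls(state); // verification, state update, response
    let changed = diff(&pre, state); // diff between pre- and post-state
    backup.update(&changed, state); // backup updates itself from the diff
    response // (response, post-state) goes back to gl-plugin
}
```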

cdecker · 2026-05-18T14:06:12Z

+        let private_key = self
+            .tls
+            .private_key
+            .clone()
+            .ok_or_else(|| Error::Other(anyhow!("missing TLS private key for CLN auth")))?;


I wonder whether this can actually ever happen 🤔

cdecker · 2026-05-18T14:12:09Z

@@ -0,0 +1,111 @@
+# Signer Backups
+
+Greenlight signers can keep a local copy of the VLS signer state to enable disaster recovery or migration to a self-hosted node. This backup is opt-in and disabled by default. When enabled, the backup file contains signer state entries for recoverable channels and known peers.


Migrating to a new self-hosted node requires the Greenlight node to be forcefully disabled; otherwise we run into the split-brain issue, and LN will penalize us. The backup really is only for disaster recovery, never for migration, as an unattended and/or uncoordinated migration will result in loss of funds!

Please reword this intro as a safety net, not to be used for migrations.

cdecker · 2026-05-18T14:13:28Z

+
+## Convert for Core Lightning
+
+Convert the signer backup to Core Lightning recovery input:


Please add a warning that this must only ever be done if the service goes down, and that the signer MUST NEVER connect to Greenlight's hosted node again; otherwise loss of funds may be inevitable.

Sort of. Actually, the backup we are building here is the SCB equivalent, not a real resumable backup. That would involve storing the shachain secrets and other related secrets in lockstep with the node, which this PR does not implement.

So technically it could be safe, but only because CLN has to immediately close the channels in the backup to recover the funds. That makes concurrent operation less risky, but still quite risky should the GL node not have been immobilized.

cdecker · 2026-05-18T14:16:11Z

+When VLS counterparty revocation secrets are present in the backup, the
+converted CLN SCB entries include the shachain TLV. If that signer state is
+absent, conversion still emits CLN recovery input without the shachain TLV.


Very good. With the shachain included, I think there isn't a whole lot missing to be able to resume the channels in their current state without closing them.
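For the optional shachain record, here is a sketch of "emit the TLV only when the secrets exist". It assumes CLN's SCB extension TLVs use the BOLT #1 TLV encoding (BigSize type, BigSize length, value). HYPOTHETICAL_SHACHAIN_TLV_TYPE is a placeholder, not CLN's real type number, and write_bigsize and append_optional_shachain are hypothetical helpers.

```rust
/// BOLT #1 BigSize encoding (minimal, big-endian), used for TLV type and length.
fn write_bigsize(out: &mut Vec<u8>, n: u64) {
    if n < 0xfd {
        out.push(n as u8);
    } else if n <= 0xffff {
        out.push(0xfd);
        out.extend_from_slice(&(n as u16).to_be_bytes());
    } else if n <= 0xffff_ffff {
        out.push(0xfe);
        out.extend_from_slice(&(n as u32).to_be_bytes());
    } else {
        out.push(0xff);
        out.extend_from_slice(&n.to_be_bytes());
    }
}

/// Placeholder type number; NOT CLN's actual shachain TLV type.
const HYPOTHETICAL_SHACHAIN_TLV_TYPE: u64 = 1;

/// Appends the shachain record only when revocation secrets are present;
/// otherwise the TLV stream is left unchanged. TLV streams must be in
/// ascending type order, so this assumes shachain is the last record.
fn append_optional_shachain(tlvs: &mut Vec<u8>, shachain: Option<&[u8]>) {
    if let Some(value) = shachain {
        write_bigsize(tlvs, HYPOTHETICAL_SHACHAIN_TLV_TYPE);
        write_bigsize(tlvs, value.len() as u64);
        tlvs.extend_from_slice(value);
    }
}
```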

cdecker · 2026-05-18T14:19:00Z

@@ -0,0 +1,96 @@
+#[cfg(feature = "backup")]
+mod enabled {


I'm not sure I follow here. I was hoping we'd just have the following in src/signer/mod.rs:

#[cfg(feature = "backup")] mod backup;

That's all we really need. Then we just prepend the call sites with an inline cfg check. Having dummy stubs around is terrible DX.

cdecker · 2026-05-18T14:20:25Z


    network: Network,
    state: Arc<Mutex<crate::persist::State>>,
+    backup: backup_runtime::Runtime,


#[cfg(feature = "backup")]

Here and at the other call sites. That way we avoid the weird roundabout way of disabling and enabling a Runtime.
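A sketch of the suggested shape: the type, the struct field and each call site carry their own #[cfg(feature = "backup")], so with the feature off there is simply no backup member and no stub Runtime. Signer, State and Backup are simplified hypothetical types, not the gl-client code.

```rust
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct State {
    channels: Vec<String>,
}

// Exists only with the feature; no disabled/enabled Runtime stub is needed.
#[cfg(feature = "backup")]
struct Backup;

#[cfg(feature = "backup")]
impl Backup {
    fn observe(&self, _state: &State) {
        // Regenerate and conditionally write the backup here.
    }
}

struct Signer {
    state: Arc<Mutex<State>>,
    #[cfg(feature = "backup")]
    backup: Backup,
}

impl Signer {
    fn new() -> Self {
        Signer {
            state: Arc::new(Mutex::new(State::default())),
            #[cfg(feature = "backup")]
            backup: Backup,
        }
    }

    /// Records a channel and returns how many the state now holds.
    fn add_channel(&self, id: &str) -> usize {
        let mut state = self.state.lock().unwrap();
        state.channels.push(id.to_string());
        // Call site gated inline, as suggested in the review.
        #[cfg(feature = "backup")]
        self.backup.observe(&state);
        state.channels.len()
    }
}
```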

ihordiachenko requested a review from cdecker May 1, 2026 12:04

ihordiachenko force-pushed the feature/state_backup branch from bf8917e to c557661 May 1, 2026 12:07

ihordiachenko marked this pull request as ready for review May 1, 2026 12:12

cdecker reviewed May 7, 2026

ihordiachenko added 12 commits May 13, 2026 02:38

gl-client: initial backup implementation (8e1e84e)
gl-client: add periodic signer backups (9f11d61)
gl-client: parse backup data (e231f25)
gl-client: added inspect backup command (31c9eb3)
signer: add CLN backup conversion (13aec33)
gl-cli: add signer backup run flags (6eea3a8)
docs: add signer backup reference (16abbeb)
signer: encode CLN shachain in backup conversion (2b0d046)
backups: fixed txid byte order (89d22e6)
docs: update signer backups description (fa6acb2)
signer: store backup peers in signer state (3adc48f)
signer: gate backup support behind feature (2237e89)

ihordiachenko force-pushed the feature/state_backup branch from 2f9ce1d to 2237e89 May 12, 2026 23:45

cdecker approved these changes May 18, 2026
