
Askrene parallel solving support #8723

Open
rustyrussell wants to merge 19 commits into ElementsProject:master from rustyrussell:guilt/askrene-parallel

@rustyrussell
Contributor

This is actually fairly simple: we fork() at the point we're going to call the solver, and the child runs until it either produces an error message or a JSON result.

The main changes are in refactoring to make it clear which parts of the code run in the child, and which run in the parent.
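
The fork-and-forward pattern described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `solve_in_child`, `run_solver`, and the JSON payload are hypothetical names and values chosen for the example. The parent here also waits synchronously, so on its own this sketch gives no parallelism; a plugin would instead watch the pipe from its event loop.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the solver: writes either a JSON result or
 * an error message to `fd`.  Returns 0 on success, 1 on a solver error,
 * 2 if the write itself failed. */
static int solve_in_child(int fd, int want_error)
{
	const char *msg = want_error ? "error: no route found"
				     : "{\"routes\":[]}";
	size_t len = strlen(msg);

	while (len > 0) {
		ssize_t w = write(fd, msg, len);
		if (w <= 0)
			return 2;
		msg += w;
		len -= (size_t)w;
	}
	return want_error ? 1 : 0;
}

/* Fork, run the solver in the child, and collect what it wrote.
 * Returns the child's exit status, or -1 on failure.  The output is
 * NUL-terminated in `buf` (truncated if it does not fit). */
static int run_solver(int want_error, char *buf, size_t buflen)
{
	int fds[2], status;
	pid_t pid;
	size_t off = 0;
	ssize_t r;

	if (buflen == 0 || pipe(fds) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: only the write end is needed. */
		close(fds[0]);
		_exit(solve_in_child(fds[1], want_error));
	}

	/* Parent: read until the child closes its end (EOF). */
	close(fds[1]);
	while (off < buflen - 1
	       && (r = read(fds[0], buf + off, buflen - 1 - off)) > 0)
		off += (size_t)r;
	buf[off] = '\0';
	close(fds[0]);

	/* Reap the child so it does not linger as a zombie. */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling `run_solver(0, buf, sizeof(buf))` fills `buf` with the child's JSON; calling it with `want_error` set returns a non-zero status and leaves the error text in `buf`.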

@rustyrussell rustyrussell added this to the v26.03 milestone Nov 25, 2025
Collaborator

@Lagrang3 Lagrang3 left a comment

Very clever parallelization!

@rustyrussell rustyrussell force-pushed the guilt/askrene-parallel branch from c3fb159 to 8ffb1fc Compare December 5, 2025 04:04
@rustyrussell rustyrussell force-pushed the guilt/askrene-parallel branch from 8ffb1fc to cb89fc4 Compare January 27, 2026 08:00
@rustyrussell
Contributor Author

Rebased on master.

@rustyrussell rustyrussell force-pushed the guilt/askrene-parallel branch 2 times, most recently from 0809ac8 to c023fed Compare January 27, 2026 22:36
…g an entire response.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We reimplemented this redundantly: hash_scid was called
short_channel_id_hash, so I obviously missed it.

Rename, and implement hash_scidd helper too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
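
As a rough illustration of what a pair of such helpers might look like, here is a sketch that hashes a 64-bit short channel id and a (short channel id, direction) pair. Everything in it is an assumption made for the example: the `example_` types and functions are hypothetical, and the splitmix64 finalizer is only a stand-in mixing function, not necessarily what the tree uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real types. */
struct example_scid {
	uint64_t u64;
};

struct example_scidd {
	struct example_scid scid;
	int dir; /* 0 or 1 */
};

/* splitmix64 finalizer: a cheap 64-bit mixing step (an assumption for
 * this sketch, chosen because it is short and well known). */
static uint64_t example_mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Hash a bare short channel id. */
static size_t example_hash_scid(struct example_scid scid)
{
	return (size_t)example_mix64(scid.u64);
}

/* Hash a short channel id plus direction: mix the id first, then fold
 * in the direction bit so the two directions land in different slots. */
static size_t example_hash_scidd(const struct example_scidd *scidd)
{
	uint64_t h = example_mix64(scidd->scid.u64);

	return (size_t)example_mix64(h + (uint64_t)(scidd->dir & 1));
}
```

Keeping the direction-aware helper next to the plain one means hash tables keyed on either type share one definition instead of each file rolling its own.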
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is fairly simple.  We do all the prep work, fire off the child,
and it continues all the way to producing JSON output (or an error).
The parent then forwards it.

Limitations (fixed in successive patches):

1. Child logging currently gets lost.
2. We wait for the child, so this code is not a speedup.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We just shim rq_log for now, but we'll be weaning the child process off
that soon.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to make it clear to future generations editing the code which
routines are called in the child (i.e. all the routing), and which in
the parent.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Weird that it was in askrene.c

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now there's only one file clearly shared by both parent and child.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Notably no access to the struct command and struct plugin.

Note: we actually *do* mess with askrene->reserves, but the previous code
used cmd to get to it.  Now we need to include a non-const pointer in
struct route_query.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: `askrene` now runs routing in parallel.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Queue them before we query local channels, so they don't use stale
information.

Changelog-Added: Config: `askrene-max-threads` to control how many CPUs we use for routing (default 4).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
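
One way to picture a concurrency cap like `askrene-max-threads` is a counter of running children plus a FIFO of waiting requests. The sketch below is only that picture: `example_limiter` and its functions are hypothetical names, the fixed-size ring buffer is an assumption, and it does not model where in the request path the queueing happens.

```c
#include <stddef.h>

#define EXAMPLE_MAX_PENDING 64

/* Hypothetical limiter: at most `max_running` children at once, the
 * rest wait in FIFO order.  Request ids must be non-negative. */
struct example_limiter {
	size_t max_running; /* e.g. the askrene-max-threads value */
	size_t running;
	int pending[EXAMPLE_MAX_PENDING]; /* ring buffer of waiting ids */
	size_t head, count;
};

static void example_limiter_init(struct example_limiter *l, size_t max_running)
{
	l->max_running = max_running;
	l->running = 0;
	l->head = 0;
	l->count = 0;
}

/* Returns 1 if the request may start now, 0 if it was queued,
 * -1 if the queue is full. */
static int example_limiter_submit(struct example_limiter *l, int id)
{
	if (l->running < l->max_running) {
		l->running++;
		return 1;
	}
	if (l->count == EXAMPLE_MAX_PENDING)
		return -1;
	l->pending[(l->head + l->count) % EXAMPLE_MAX_PENDING] = id;
	l->count++;
	return 0;
}

/* Call when a child exits.  Returns the id of the queued request that
 * should be started in its place, or -1 if nothing is waiting. */
static int example_limiter_finish(struct example_limiter *l)
{
	int id;

	l->running--;
	if (l->count == 0)
		return -1;
	id = l->pending[l->head];
	l->head = (l->head + 1) % EXAMPLE_MAX_PENDING;
	l->count--;
	l->running++;
	return id;
}
```

With a limit of 4, the fifth and sixth requests are queued, and each call to `example_limiter_finish` hands back the oldest waiting id until the queue drains.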
The fork logic itself is pretty simple, so do that directly in
askrene.c, and then call into "run_child()" almost as soon as
we do the fork.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes sure it cannot interfere with the parent askrene's
connection to lightningd, for example.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This speeds them up, and exercises the askrene parallel code.

Before: test_real_data: 260s  test_real_biases: 173s
After:  test_real_data: 133s  test_real_biases: 106s

The gain is limited because much of the time is spent uncompressing the
gossmap and on startup.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
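
For scale, the ratios implied by the timings quoted above work out as follows (plain arithmetic on those numbers, rounded to two decimals):

```latex
\[
\frac{260\,\mathrm{s}}{133\,\mathrm{s}} \approx 1.95,
\qquad
\frac{173\,\mathrm{s}}{106\,\mathrm{s}} \approx 1.63
\]
```

That is roughly 49% less wall-clock time for `test_real_data` and roughly 39% less for `test_real_biases`.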
I noticed this in the logs:

```
lightningd-1 2026-01-28T00:27:37.504Z DEBUG   gossipd: gossip_store: Read 59428/118856/0/0 cannounce/cupdate/nannounce/delete from store in 45521871 bytes, now 45521849 bytes (populated=true)
lightningd-1 2026-01-28T00:27:37.504Z DEBUG   gossipd: Got 118856 bad cupdates, ignoring them (expected on mainnet)
```

That's weird, and it turns out it was counting good updates, not bad ones!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell rustyrussell force-pushed the guilt/askrene-parallel branch from c023fed to af5110c Compare February 17, 2026 01:27