It appears that the code for pruning the channel_htlcs table of records relating to closed channels was recently moved so that it is executed during node startup rather than at the point of channel closure. This by itself might not have been a bad decision since (as I recall) the "online" pruning was causing the whole node to grind to a halt whenever a channel was closed. (#7962) However, it didn't really solve the problem, as now the node can take an unacceptably long time to start up.
My channel_htlcs table is about 6355 MB in size, and I have it partitioned so that historical records (resolved more than about a week ago) get moved off of flash onto spinning rust. Normally that's fine, since those records never really get accessed. However, wallet_delete_old_htlcs executes a query that requires a full table scan of channel_htlcs.
Actually, it's worse than one full table scan because of how the query is written:
```sql
DELETE FROM channel_htlcs
WHERE id IN (
    SELECT ch.id
    FROM channel_htlcs AS ch
    JOIN channels AS c ON c.id = ch.channel_id
    WHERE c.state = 10
);
```

That query actually requires two full table scans: one to construct a hash set containing all the IDs to delete, and then a second to delete all the records whose IDs are contained in that set. (EXPLAIN confirms that the query does in fact perform two sequential scans of channel_htlcs.)
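The shape of the problem is visible even in a toy reproduction. The sketch below (my own illustrative schema, stripped down to just the columns the query touches; it uses SQLite purely because it's self-contained) asks for the query plan of the IN-subquery form: the subquery has to walk all of channel_htlcs before the outer DELETE can touch a single row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE channels (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE channel_htlcs (id INTEGER PRIMARY KEY, channel_id INTEGER);
""")

# EXPLAIN QUERY PLAN does not execute the DELETE; it just reports the plan.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    DELETE FROM channel_htlcs
    WHERE id IN (
        SELECT ch.id
        FROM channel_htlcs AS ch
        JOIN channels AS c ON c.id = ch.channel_id
        WHERE c.state = 10
    )
""").fetchall()
for row in plan:
    print(row[-1])  # human-readable plan detail, e.g. "SCAN ch"
```

The exact plan text varies by SQLite version, but the subquery's full scan of channel_htlcs (as `ch`) shows up either way; PostgreSQL's EXPLAIN on the real table shows the two sequential scans described above.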
You can rewrite the query so that it does all the work in a single scan:
```sql
DELETE FROM channel_htlcs AS ch
USING channels AS c
WHERE c.id = ch.channel_id
  AND c.state = 10;
```

Now you'll tell me that SQLite can't do that, and I'll assert yet again that SQLite is a toy database, and we'll butt heads. Let's move on…
I think the above query is more complex than necessary, though. Wouldn't the following accomplish the same goal?
```sql
DELETE FROM channel_htlcs
WHERE channel_id IN (
    SELECT id
    FROM channels
    WHERE state = 10
);
```

That requires only one full scan of channel_htlcs, SQLite is capable of understanding the syntax, and the hash set against which to check every record may contain ~thousands of elements rather than ~millions.
Incidentally, "millions" is not hyperbole:
```
=> SELECT count(*) FROM channel_htlcs ch JOIN channels c ON c.id = ch.channel_id WHERE c.state = 10;
  count
---------
 1331377
(1 row)
```

Now, I can't actually say whether the simpler query would execute much (or any) faster. It may simply be the case that updating the unique index on (channel_id, channel_htlc_id, direction) for millions of record deletions on spinning rust is just going to be dog slow regardless of how the set of rows to be deleted is determined.
I ended up killing lightningd and canceling the PostgreSQL backend that was executing the DELETE query. Then I dropped the unique index, executed the simpler query shown above, and re-created the index. That took only a few minutes, whereas the original query had been executing for more than two hours before I killed it. Obviously, dropping and re-creating the index is not going to be faster in the common case, so I don't quite know what to recommend here. Counting up the rows that would need to be deleted is pretty quick, so maybe only drop and re-create the index if you have more than some threshold number of records to delete.
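The threshold idea could look something like the following sketch. The threshold value and the index name channel_htlcs_unique are made up for illustration (the real schema's index name may differ), and the demo uses SQLite only so it runs standalone:

```python
import sqlite3

BULK_THRESHOLD = 100_000  # illustrative cutoff, not a tuned value

def delete_old_htlcs(con, threshold=BULK_THRESHOLD):
    """Delete HTLCs of closed (state 10) channels; when the delete is
    large, rebuild the unique index once instead of updating it per row."""
    closed = "(SELECT id FROM channels WHERE state = 10)"
    (n,) = con.execute(
        f"SELECT count(*) FROM channel_htlcs WHERE channel_id IN {closed}"
    ).fetchone()
    bulk = n > threshold
    if bulk:
        # Hypothetical index name; substitute the schema's real one.
        con.execute("DROP INDEX IF EXISTS channel_htlcs_unique")
    con.execute(f"DELETE FROM channel_htlcs WHERE channel_id IN {closed}")
    if bulk:
        con.execute(
            "CREATE UNIQUE INDEX channel_htlcs_unique "
            "ON channel_htlcs (channel_id, channel_htlc_id, direction)")
    con.commit()
    return n

# Tiny demo: threshold=0 forces the drop/recreate path.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE channels (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE channel_htlcs (id INTEGER PRIMARY KEY, channel_id INTEGER,
                                channel_htlc_id INTEGER, direction INTEGER);
    CREATE UNIQUE INDEX channel_htlcs_unique
        ON channel_htlcs (channel_id, channel_htlc_id, direction);
    INSERT INTO channels VALUES (1, 10), (2, 2);
    INSERT INTO channel_htlcs (channel_id, channel_htlc_id, direction)
        VALUES (1, 0, 0), (1, 1, 0), (2, 0, 0);
""")
print(delete_old_htlcs(con, threshold=0))  # → 2 (both HTLCs on the closed channel)
```

In a real deployment the count, drop, delete, and index rebuild would want to run in one transaction so a crash can't leave the index missing.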
On my own node now I have removed the call to wallet_delete_old_htlcs in lightningd/lightningd.c:main and have set up a pg_cron job to run the simpler query once a day. Now these cleanup operations will never again delay my node. I'm not sure about the best course of action to help others, though.
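For reference, a pg_cron job along these lines does the trick (the schedule string here is illustrative; adjust to taste):

```sql
-- Requires the pg_cron extension; this registers a job that runs
-- daily at 03:00 in the database's cron schema.
SELECT cron.schedule('0 3 * * *', $$
    DELETE FROM channel_htlcs
    WHERE channel_id IN (SELECT id FROM channels WHERE state = 10)
$$);
```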