payments: kv-to-sql migration (tests + wiring)#10485
payments: kv-to-sql migration (tests + wiring)#10485ziggie1984 merged 16 commits intolightningnetwork:masterfrom
Conversation
Summary of ChangesHello @ziggie1984, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant architectural change by migrating the payments database from a Key-Value (KV) store to a native SQL schema. The primary goal is to enhance database robustness, query flexibility, and overall performance for payment-related operations. It includes the complete implementation of the migration logic, a new SQL-backed payment store, and a comprehensive suite of tests to ensure data integrity and correct functionality during and after the migration. This change also prepares the system for future database optimizations and features by leveraging the capabilities of a relational database. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive migration for the payments database from a key-value store to SQL. The changes are well-structured, with the migration logic encapsulated in a new payments/db/migration1 package. This package includes a reader for the old KV store format, cleverly handling historical data quirks like duplicate payments by migrating them as HTLC attempts on the primary payment. The migration is correctly wired into the application startup, guarded by a new migration version and a tombstone mechanism to prevent accidental use of the old database post-migration. The testing strategy is particularly strong, featuring data integrity checks, property-based tests, and specific tests for various payment features, which provides high confidence in the correctness of this complex change. My review is positive, with only a couple of minor stylistic suggestions to improve readability.
d8531d2 to
2930b37
Compare
2802da4 to
83bce46
Compare
2930b37 to
8c106c7
Compare
622021b to
fc449ad
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces the migration logic for moving payment data from the key-value store to a SQL database. The changes are extensive, including a new payments/db/migration1 package that encapsulates the migration logic, new SQL schema definitions for duplicate payments, and the necessary wiring in the configuration builder to trigger the migration. The approach is solid, with good isolation of migration code and thorough validation checks to ensure data integrity. I've found a few minor issues related to code style and typos in error messages and comments, which I've detailed in the review comments.
fc449ad to
83a9946
Compare
27f2fec to
5a25e61
Compare
0360b30 to
915f282
Compare
147db52 to
bba3e6c
Compare
328849d to
ea6085f
Compare
|
@ziggie1984 - worth checking why the BW compat tests failed here |
bba3e6c to
78ba5e4
Compare
ea6085f to
5a021fc
Compare
78ba5e4 to
e9a8826
Compare
Add a migration specific query which allows to set the failure reason when inserting a payment into the db.
Older LND versions could create multiple payments for the same hash. We need to preserve those historical records during KV→SQL migration, but they don’t fit the normal payment schema because we enforce a unique payment hash constraint. Introduce a lean payment_duplicates table to store only the essential fields (identifier, amount, timestamps, settle/fail data). This keeps the primary payment records stable and makes the migration deterministic even when duplicate records lack attempt info. The table is intentionally minimal and can be dropped after migration if no duplicate payments exist. For now there is no logic in place which allows the noderunner to fetch duplicate payments after the migration.
Copy the core payments/db code into payments/db/migration1 and add the required sqlc-generated types/queries from sqldb/sqlc. This effectively freezes the migration code so it stays robust against future query or schema changes in the main payments package. Replace the delegation to channeldb.ReadElement/WriteElement with self-contained, frozen implementations that only handle the exact types required by this migration package. This removes the dependency on the live channeldb codec so that future changes to channeldb serialization cannot silently corrupt or break the migration. UnknownElementType is also defined locally for the same reason.
Implement the KV→SQL payment migration and add an in-migration validation pass that deep-compares KV and SQL payment data in batches. Duplicate payments are migrated into the payment_duplicates table, and duplicates without attempt info or explicit resolution are marked failed to ensure terminal state. Validation checks those rows as well.
Add test helpers plus sql_migration_test coverage for KV→SQL migration. Basic migration, sequence ordering, data integrity, and feature-specific cases (MPP/AMP, custom records, blinded routes, metadata, failure messages). Also cover duplicate payment migration to payment_duplicates, including missing attempt info to ensure terminal failure is recorded. This gives broad regression coverage for the migration path and its edge-cases.
Add a developer-facing migration_external_test that allows running the KV→SQL payments migration against a real channel.db backend to debug migration failures on actual data. The accompanying testdata README documents how to supply a database file and configure the test, so users can validate their data and confirm the migration completes successfully. The test is skipped by default and meant for manual diagnostics.
Hook the payments KV→SQL migration into the SQL migration config. The migration is still only available when building with the build tag "test_native_sql". Moreover a tombstone protection similar to the invoice migration is added to prevent re-runningi with the KV backend once migration completes.
For legacy payments, the HTLC Hash field may be nil in the bbolt backend. Previously, the migration would fail with "HTLC attempt X missing payment hash" when encountering such payments. This commit fixes the migration by falling back to the parent payment hash when the HTLC-specific hash is nil. This is consistent with how the router handles legacy payments (see patchLegacyPaymentHash in payment_lifecycle.go). The validation logic is also updated to apply the same fallback when comparing bbolt data with migrated SQL data, ensuring the comparison succeeds.
When duplicatePaymentSequenceKey is missing from a duplicate payment sub-bucket, the code returned the outer function's err variable which is nil at that point. This caused corrupted duplicate entries to be silently treated as "not found" instead of failing loudly. Return a new dedicated ErrNoDuplicateSequenceNumber error so malformed data is detected immediately.
The defer closure checked a local err variable for commit/rollback decisions, but err remained nil after a successful BeginTx. When txBody failed, the error was returned directly without assigning to err, so the defer always committed instead of rolling back. Additionally, since err was not a named return value, the defer's Commit error assignment was silently swallowed. Replace the error-prone defer pattern with explicit rollback on txBody failure and a direct Commit return.
Tests that all migration files follow the defined schema and that there are not duplicates which could cause collision.
5a021fc to
50198ed
Compare
yyforyongyu
left a comment
There was a problem hiding this comment.
I don't think the frozen is complete? route and codec are frozen, but payment migration still depends on live lnwire/record serialization behavior.
| ) | ||
|
|
||
| // Open the KV backend in read-only mode. | ||
| err := kvBackend.View(func(kvTx kvdb.RTx) error { |
There was a problem hiding this comment.
This migration follows the same design as the invoice migration which is also not resumable
I don't think this is a good argument - the current design means there is a stuck startup loop, if the migration fails => users cannot go back to kvdb => restart with native sql flag => fail again => ... stuck on startup.
Migration 30 already built a resumable migration flow, we could use it here to keep things safe.
Non-blocking tho - this should be an edge case, as the only concern is that the db doesn't have enough disk space to save the WAL before the tx is committed.
| // completed. | ||
| d.logger.Debugf("Setting payments bucket " + | ||
| "tombstone") | ||
|
|
There was a problem hiding this comment.
As mentioned above, if SQL migration keeps failing, the node can be stuck in fail-closed mode. I think we could do sth like,
- Commit SQL migration + migration tracker first.
- Set tombstone after commit.
- On startup, reconcile:
- migration done + tombstone missing => set tombstone
- tombstone set + migration not done => treat as interrupted (don’t hard-lock KV forever).
Again non-blocker, just thinking about edge cases.
LegacyPayload was a hint used exclusively by the KV store to decide how to serialize and deserialize the hop payload (legacy format vs TLV). The SQL store does not serialize hop data at all — every hop field is persisted natively in its own column — so this flag has no meaning there and is never stored. Clear LegacyPayload for all hops inside normalizePaymentForCompare so that deep-equality checks between KV and SQL payments succeed even when the KV source data carries LegacyPayload=true. A dedicated test (TestMigrationLegacyPayloadNormalized) is added to verify that a payment with LegacyPayload=true hops migrates and compares correctly.
50198ed to
95d8862
Compare
Also freeze the lnwire and record packages used by the migration. Copy the minimal subset of lnwire files (16) into payments/db/migration1/lnwire/ and all record files (6) into payments/db/migration1/record/. Three lnwire files are trimmed to avoid pulling in the full message-type dispatch tree — all changes are purely subtractive and can be verified with: diff lnwire/message.go payments/db/migration1/lnwire/message.go diff lnwire/writer.go payments/db/migration1/lnwire/writer.go diff lnwire/lnwire.go payments/db/migration1/lnwire/lnwire.go All migration1 files now import only the frozen packages, removing the live dependency on lnwire and record so future changes to those packages cannot affect migration correctness.
Adds the KV db migration code to native SQL.
After merging this PR, the migration is still hidden behind
test_native_sql