Add data directory locking and partial corrupt initdb cleanup #892
reisepass wants to merge 2 commits into electric-sql:main
Tests that FAIL without the fix, reproducing real corruption:
- Overlapping instances: triple, staggered, DDL writer, rapid cycling
- HMR double-instance: lock-based blocking and rapid swap corruption
- WAL bloat burst mode: rapid kills without checkpoint cause corruption

Co-Authored-By: Matthaus Wolff <8714327+WolffM@users.noreply.github.com>
- Add PID-based file lock to NodeFS to prevent overlapping instance corruption
- Detect partially-initialized data dirs and move to .corrupt-<timestamp> backup
- Add tests verifying partial init backup behavior
This makes a lot of sense. Accidental multi-process access during local dev is very real, especially when running pnpm dev in multiple terminals. The lock file approach feels like a clean and pragmatic safeguard, especially since Postgres assumes exclusive control over the data directory. Curious: is there any plan to expose a clearer error message when the second process is rejected, so it's obvious what happened?
This is a really practical improvement. Locking the data dir should prevent a whole class of annoying dev-time corruption issues, and the partial initdb handling makes the setup much more robust. Great work!
@reisepass Thank you for this!
I would prefer that you address only the lock file issue for the moment. I just skimmed through the changes (sorry, really busy with other things), and it seems like a lot of other files (tests?) were added. For the sake of simplicity, if the only thing addressed is the locking of folder access, please consolidate the tests into a single test file, or add to the existing test files the minimum needed to test the new functionality.
Sounds reasonable.
I've been running into issues with pglite getting corrupted in persist-to-disk mode. The most human scenario is that you have one pnpm dev running in one window and then start another for a quick test, not remembering that you already had one open. In my dev work this honestly happens daily, so it was hard to use pglite practically as an SQLite replacement.
But the fix is simple: just add a lock file.
This is nothing fancy. Since Postgres assumes it has full control of the data dir, we just reject the second process that tries to use it.
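For anyone following along, the lock idea can be sketched roughly as below. This is an illustration under assumptions, not the code in this PR: the file name `.pglite.lock` and the helpers `acquireLock` and `isProcessAlive` are hypothetical, and the stale-lock handling is one possible policy. It also shows one way to give the rejected process an error message that names the owning PID.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical lock file name; the PR may use a different one.
const LOCK_FILE = ".pglite.lock";

// True if a process with this PID currently exists.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only checks for existence, nothing is sent
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Take an exclusive lock on dataDir and return a release function,
// or throw a descriptive error if a live process already holds it.
function acquireLock(dataDir: string): () => void {
  const lockPath = path.join(dataDir, LOCK_FILE);
  fs.mkdirSync(dataDir, { recursive: true });
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // The "wx" flag fails with EEXIST if the file exists, so creation is atomic.
      fs.writeFileSync(lockPath, String(process.pid), { flag: "wx" });
      return () => fs.rmSync(lockPath, { force: true });
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
      const owner = Number.parseInt(fs.readFileSync(lockPath, "utf8"), 10);
      if (Number.isInteger(owner) && owner > 0 && isProcessAlive(owner)) {
        throw new Error(
          `Data directory ${dataDir} is already in use by process ${owner}`,
        );
      }
      // Stale lock left behind by a crashed process: remove it and retry once.
      fs.rmSync(lockPath, { force: true });
    }
  }
  throw new Error(`Could not acquire lock on ${dataDir}`);
}
```

Because the check is PID-based, a second instance inside the same process (the HMR case) is rejected as well, since the recorded PID is the caller's own and is alive.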
And then another small fix for corruption caused by partial initialization. This is less common, but I might as well add another little feature to make it more robust. If a data dir is there that is so corrupt you can't even find the PG_VERSION, it just moves it to a backup and makes another.
^ I would also be fine editing this out if someone doesn't like this behavior. The lock file is the important feature.
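The partial-init cleanup described above can be sketched in the same spirit. Again this is a rough illustration, not the PR's implementation: the function name `backupIfPartiallyInitialized` is hypothetical, and using `Date.now()` for the `.corrupt-<timestamp>` suffix is an assumption about the timestamp format.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// If dataDir has content but no PG_VERSION file, move it aside and
// recreate an empty directory. Returns the backup path, or null when
// the directory is missing, empty, or looks initialized.
function backupIfPartiallyInitialized(dataDir: string): string | null {
  if (!fs.existsSync(dataDir)) return null;
  const entries = fs.readdirSync(dataDir);
  if (entries.length === 0) return null; // fresh directory, nothing to clean up
  if (fs.existsSync(path.join(dataDir, "PG_VERSION"))) return null; // marker present

  // Assumed suffix format: milliseconds since the epoch.
  const backupPath = `${dataDir}.corrupt-${Date.now()}`;
  fs.renameSync(dataDir, backupPath); // keep the old contents for inspection
  fs.mkdirSync(dataDir, { recursive: true });
  return backupPath;
}
```

Renaming instead of deleting keeps the old files around for inspection. Note that this sketch treats any non-empty directory without PG_VERSION as partial, so if a lock file lives inside the data dir, the check would need to run before the lock is created or ignore that file.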