Optimize Postgres compactor queries #446
Open
Problem Statement
In the context of #400, the current compaction query uses an OR condition with COLLATE "C" for cursor-based pagination:
The OR condition prevents Postgres from using tight index bounds, so most of the rows it scans are then removed by a filter. This is visible in the EXPLAIN ANALYZE plan for the sample query below, which was executed during compaction:
Output:
PostgreSQL scans all rows matching the Index Cond and then applies the Filter, discarding most of them, as the Rows Removed by Filter line shows.
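To make the shape of the pagination predicate concrete, here is a minimal in-memory sketch. It is an illustration only, not the query from this PR: the `Position` type and its `bucket` and `opId` fields are hypothetical names, and plain JavaScript string comparison stands in for `COLLATE "C"` ordering (a reasonable stand-in for ASCII bucket names). It shows that the OR form is a lexicographic "row is before the cursor" test, which is a single contiguous range in `(bucket, opId)` order even though the OR form does not express it as one.

```typescript
// Hypothetical row/cursor shape; the real column names are not shown in this description.
interface Position {
  bucket: string; // compared by UTF-16 code unit, standing in for COLLATE "C"
  opId: number;
}

// OR-shaped predicate: same bucket with a smaller op id, or any earlier bucket.
function beforeCursorOr(row: Position, cursor: Position): boolean {
  return (
    (row.bucket === cursor.bucket && row.opId < cursor.opId) ||
    row.bucket < cursor.bucket
  );
}

// The same test written as a lexicographic comparison of (bucket, opId) pairs.
function beforeCursorTuple(row: Position, cursor: Position): boolean {
  if (row.bucket !== cursor.bucket) {
    return row.bucket < cursor.bucket;
  }
  return row.opId < cursor.opId;
}
```

Both functions accept and reject the same rows. The difference that matters for the planner is only how the condition is phrased, not which rows it selects.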
Performance Impact
Solution
To optimize this query, I'm proposing to split the code into 3 specialized queries. Each query handles a specific parameter case passed to the compact command. See the PostgresCompactor.ts file for the changes. I've also added unit and integration tests to verify correctness.
Results
I ran both the code currently on the master branch and this branch locally, compacting a database with 1 million bucket records, and got these numbers: