@jaswdr jaswdr commented Dec 24, 2025

Problem Statement

In the context of #400, the current compaction query uses an OR condition with COLLATE "C" for cursor-based pagination:

SELECT
  op,
  op_id,
  source_table,
  table_name,
  row_id,
  source_key,
  bucket_name
FROM
  bucket_data
WHERE
  group_id = $1
  AND bucket_name >= $2
  AND (
    (
      bucket_name = $3
      AND op_id < $4
    )
    OR bucket_name < $3 COLLATE "C"
  )
ORDER BY
  bucket_name DESC,
  op_id DESC
LIMIT
  $5

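The OR condition prevents Postgres from using the cursor position as an index bound: only group_id and bucket_name >= $2 reach the Index Cond, and the cursor is applied afterwards as a Filter that removes the majority of the scanned rows. The EXPLAIN ANALYZE plan for the sample query below, executed during compaction, shows this: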

EXPLAIN ANALYSE SELECT 
  op,
  op_id,
  source_table,
  table_name,
  row_id,
  source_key,
  bucket_name
FROM bucket_data
WHERE
  group_id = 1
  AND bucket_name >= ''
  AND ((
    bucket_name = 'global[]'
    AND op_id < 49976) 
    OR bucket_name < 'global[]' COLLATE "C")
ORDER BY bucket_name DESC, op_id DESC
LIMIT 10000

Output:

Limit  (cost=0.42..27866.77 rows=10000 width=114) (actual time=144.147..146.449 rows=10000 loops=1)
  Buffers: shared hit=4729 read=50886
  ->  Index Scan Backward using unique_id on bucket_data  (cost=0.42..160948.06 rows=57757 width=114) (actual time=144.146..146.125 rows=10000 loops=1)
        Index Cond: ((group_id = 1) AND (bucket_name >= ''::text))
        Filter: (((bucket_name = 'global[]'::text) AND (op_id < 49976)) OR (bucket_name < 'global[]'::text COLLATE "C"))
        Rows Removed by Filter: 950000 <-- this is the problem, we are scanning more than we should
        Buffers: shared hit=4729 read=50886
Planning Time: 0.070 ms
Execution Time: 146.637 ms

PostgreSQL scans the rows matching the Index Cond and only then applies the Filter, discarding most of them: Rows Removed by Filter reports 950,000 rows thrown away in order to return 10,000.
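
To make the cost concrete, here is a minimal in-memory sketch of the two access patterns. It is an illustration only, not the PostgresCompactor.ts code: the Row and Cursor types and the function names are hypothetical, op_id is modelled as a plain number, and compareC assumes UTF-8 text, where code point order matches the byte order that COLLATE "C" uses.

```typescript
// Illustrative in-memory model; names and types are assumptions.
interface Row {
  bucketName: string;
  opId: number; // a plain number here, for brevity
}

interface Cursor {
  bucketName: string;
  opId: number;
}

// Compare by Unicode code point, which matches byte order for UTF-8 text.
function compareC(a: string, b: string): number {
  const as = Array.from(a);
  const bs = Array.from(b);
  const n = Math.min(as.length, bs.length);
  for (let i = 0; i < n; i++) {
    const d = as[i].codePointAt(0)! - bs[i].codePointAt(0)!;
    if (d !== 0) return d;
  }
  return as.length - bs.length;
}

// Mirrors the WHERE clause:
// (bucket_name = $3 AND op_id < $4) OR bucket_name < $3
function isBeforeCursor(row: Row, cursor: Cursor): boolean {
  const c = compareC(row.bucketName, cursor.bucketName);
  return c < 0 || (c === 0 && row.opId < cursor.opId);
}

// rows must be sorted ascending by (bucketName, opId), like the index.
// Walk backward from the end and filter each row, as in the plan above.
function scanWithFilter(rows: Row[], cursor: Cursor, limit: number) {
  const result: Row[] = [];
  let removedByFilter = 0;
  for (let i = rows.length - 1; i >= 0 && result.length < limit; i--) {
    if (isBeforeCursor(rows[i], cursor)) result.push(rows[i]);
    else removedByFilter++;
  }
  return { result, removedByFilter };
}

// Binary-search the first row that is not before the cursor,
// then walk backward from just below it. No row is discarded.
function scanWithSeek(rows: Row[], cursor: Cursor, limit: number) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (isBeforeCursor(rows[mid], cursor)) lo = mid + 1;
    else hi = mid;
  }
  const result: Row[] = [];
  for (let i = lo - 1; i >= 0 && result.length < limit; i--) {
    result.push(rows[i]);
  }
  return { result, removedByFilter: 0 };
}
```

With 1,000 rows in a single bucket (op_id 1 to 1000), a cursor at op_id 51 and a limit of 10, scanWithFilter discards 950 rows before it reaches the first match, while scanWithSeek returns the same 10 rows without discarding any.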
Performance Impact

Solution

To optimize this query, I propose splitting it into three specialized queries, each handling one specific parameter case passed to the compact command. See PostgresCompactor.ts for the changes. I've also added unit and integration tests to verify correctness.
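
As a rough illustration of what such a split could look like, the sketch below dispatches on the compact argument. The three cases shown (no bucket, a bucket definition name, a fully qualified bucket such as global[]), the type and function names, and the SQL text are all assumptions made for this example. They are not taken from PostgresCompactor.ts.

```typescript
// Hypothetical sketch: the three cases, the names, and the SQL below are
// assumptions for illustration, not the contents of PostgresCompactor.ts.
type CompactTarget =
  | { kind: 'all' }
  | { kind: 'definition'; name: string }
  | { kind: 'bucket'; bucketName: string };

// Decide which specialized query a given compact argument would need.
function classifyCompactTarget(bucket?: string): CompactTarget {
  if (bucket === undefined || bucket === '') return { kind: 'all' };
  if (bucket.includes('[')) return { kind: 'bucket', bucketName: bucket };
  return { kind: 'definition', name: bucket };
}

// Single-bucket case: equality on bucket_name plus a range on op_id,
// so the cursor can be expressed without an OR.
const SINGLE_BUCKET_PAGE_SQL = `
SELECT op, op_id, source_table, table_name, row_id, source_key, bucket_name
FROM bucket_data
WHERE group_id = $1
  AND bucket_name = $2
  AND op_id < $3
ORDER BY op_id DESC
LIMIT $4`;
```

In the single-bucket case the cursor reduces to one range condition on op_id, which is the kind of predicate that can serve as an index bound instead of a post-scan filter.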

Results

Running both the code currently on main and the code on this branch locally, I got these numbers when compacting a database with 1 million bucket records.

Branch   Time     Throughput
PR       8.67s    116,451 rec/s
main     15.61s   64,708 rec/s

changeset-bot bot commented Dec 24, 2025

⚠️ No Changeset found

Latest commit: 033b1bc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@jaswdr jaswdr force-pushed the issue-400-impl branch 2 times, most recently from 4189ea7 to 266ff72 December 27, 2025 11:35
}
}

const COMPACT_ROW_CODEC = pick(models.BucketData, [

Moving this to global scope since it never changes.

const paramPart = bucket.slice(bracketIndex);

try {
const parsed = JSON.parse(paramPart);

I'm using JSON.parse to avoid writing a custom parsing function. Let me know what you think.
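
For context, a minimal sketch of the JSON.parse approach is below. It assumes bucket names have the form name[params], where the bracketed part is a JSON array, as in global[]. The helper name, the ParsedBucket shape, and the choice to return null on failure are hypothetical.

```typescript
// Hypothetical helper; the name and return shape are assumptions.
interface ParsedBucket {
  definition: string;
  params: unknown[];
}

function parseBucketName(bucket: string): ParsedBucket | null {
  const bracketIndex = bucket.indexOf('[');
  // No parameter part, or nothing before the bracket.
  if (bracketIndex <= 0) return null;

  const paramPart = bucket.slice(bracketIndex);
  try {
    const parsed: unknown = JSON.parse(paramPart);
    if (!Array.isArray(parsed)) return null;
    return { definition: bucket.slice(0, bracketIndex), params: parsed };
  } catch {
    // The bracketed part is not valid JSON.
    return null;
  }
}
```

Under these assumptions, parseBucketName('global[]') yields the definition global with an empty parameter list, and a malformed name such as global[oops yields null instead of throwing.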

@jaswdr jaswdr marked this pull request as ready for review December 28, 2025 18:27
@jaswdr jaswdr changed the title from "[WIP] Optimize Postgres compactor queries" to "Optimize Postgres compactor queries" Dec 28, 2025