Skip to content

adapter/catalog: make CatalogState a persistent data structure#35079

Open
aljoscha wants to merge 1 commit intoMaterializeInc:mainfrom
aljoscha:adapter-truly-cow-catalog
Open

adapter/catalog: make CatalogState a persistent data structure#35079
aljoscha wants to merge 1 commit intoMaterializeInc:mainfrom
aljoscha:adapter-truly-cow-catalog

Conversation

@aljoscha
Copy link
Contributor

@aljoscha aljoscha commented Feb 19, 2026

In transact_inner, CatalogState is wrapped in Cow::Borrowed and cloned via to_mut() — twice (once for preliminary_state, once for state). This triggers a full deep clone of ~15 BTreeMap fields plus several complex wrapper types. This change makes CatalogState::clone() effectively O(1) by using structural sharing.

Two complementary changes:

  1. Replace all BTreeMap fields on CatalogState with im::OrdMap — clone is O(1) (refcount bump on shared tree root), mutations are O(log n) with path-copying. The im crate is already used in the workspace.

  2. Wrap remaining expensive non-map fields (system_configuration, default_privileges, system_privileges, comments, storage_metadata) in Arc — clone is O(1), mutation via Arc::make_mut() which clones only when shared.

Nested maps inside value types (e.g. Database.schemas_by_id) are NOT changed — with im::OrdMap, values are structurally shared at the tree-node level, so nested BTreeMaps are shared by reference until the containing node is path-copied during mutation.

A MutableMap trait is introduced in apply.rs so that helper functions apply_inverted_lookup and apply_with_update work uniformly with both BTreeMap (nested inside value types like Database, Cluster) and im::OrdMap (CatalogState top-level fields).

Experiments

Benchmarked DDL latency scaling before and after this change, measuring across
7 scale points from 0 to 50,000 user objects. Both binaries were optimized
builds run from a fresh --reset state. Tables were created incrementally
using psql -f batch files. 10 samples per DDL type per scale point.

Before: commit df6bf0cf22 (BTreeMap-based CatalogState)
After: commit 2729942ac1 (persistent CatalogState with imbl::OrdMap + Arc)

Absolute latencies (median of 10 samples)

Objects CREATE TABLE CREATE VIEW DROP TABLE DROP VIEW
Before After Before After Before After Before After
0 86ms 125ms 60ms 79ms 58ms 74ms 52ms 64ms
1,000 99ms 90ms 69ms 61ms 66ms 58ms 55ms 51ms
5,000 138ms 111ms 103ms 80ms 99ms 74ms 86ms 64ms
10,000 176ms 132ms 144ms 107ms 130ms 93ms 122ms 82ms
20,000 253ms 187ms 229ms 163ms 194ms 132ms 184ms 119ms
30,000 334ms 231ms 312ms 204ms 260ms 160ms 252ms 150ms
50,000 564ms 355ms 516ms 329ms 442ms 252ms 469ms 244ms

The 0-object "after" numbers are inflated by cold-start overhead (first few
DDLs after fresh startup). At 1,000+ objects the "after" binary is consistently
faster.

Improvement at scale

Objects CREATE TABLE CREATE VIEW DROP TABLE DROP VIEW
1,000 -10% -12% -12% -7%
5,000 -20% -22% -25% -26%
10,000 -25% -25% -28% -33%
20,000 -26% -29% -32% -35%
30,000 -31% -35% -38% -40%
50,000 -37% -36% -43% -48%

O(n) scaling factors (linear regression, R² > 0.95 for all)

DDL Type Before (ms/1k objs) After (ms/1k objs) Reduction
CREATE TABLE 9.22 4.78 -48%
CREATE VIEW 8.99 5.08 -43%
DROP TABLE 7.46 3.58 -52%
DROP VIEW 8.08 3.61 -55%

The per-object cost is reduced by ~42–55% across all DDL types. Fixed overhead
(intercept) is essentially unchanged when correcting for the cold-start anomaly.

Test machine

  • Instance: AWS EC2 c8g.4xlarge (16 vCPU Graviton4 / Neoverse-V2, 30 GiB RAM)
  • OS: Ubuntu 24.04.3 LTS, kernel 6.14.0-1015-aws (aarch64)
  • Rust: rustc 1.93.1, cargo 1.93.1
  • Storage: 512 GB EBS (gp3)
  • CockroachDB: v23.1.11 in Docker (used as catalog backing store)

@github-actions
Copy link

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@aljoscha
Copy link
Contributor Author

Let's see if this passes CI. And I still have to verify that this actually improves DDL perf with large numbers of objects.

@aljoscha aljoscha force-pushed the adapter-truly-cow-catalog branch 3 times, most recently from 1def8e5 to 2729942 Compare February 19, 2026 12:15
@aljoscha aljoscha marked this pull request as ready for review February 19, 2026 15:20
@aljoscha aljoscha requested a review from a team as a code owner February 19, 2026 15:20
@aljoscha aljoscha requested a review from SangJunBak February 19, 2026 15:20
},
SystemObjectId::System => {
let mut system_privileges = state.system_privileges.clone();
let mut system_privileges = PrivilegeMap::clone(&state.system_privileges);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to change these lines?

Comment on lines -453 to 455
match diff {
StateDiff::Addition => self
.default_privileges
StateDiff::Addition => Arc::make_mut(&mut self.default_privileges)
.grant(default_privilege.object, default_privilege.acl_item),
Copy link
Contributor

@SangJunBak SangJunBak Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticing grant and revoke actually mutate the internal privileges BTreeMap. Can there ever be more than one mutable reference to self.default_privileges at a time? And if so, wouldn't that cause the change to not be saved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Arc::make_mut is a bit confusing. The reason it works is that make_mut requires a &mut, and there can only be one &mut so this actually will guarantee to change it in place. If anyone got a snapshot of the catalog using owned_catalog they will have a diverging view, but they cannot change the Catalog as stored in the Coordinator struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that makes sense! Thanks for the detailed explanation

In `transact_inner`, `CatalogState` is wrapped in `Cow::Borrowed` and
cloned via `to_mut()` — twice (once for `preliminary_state`, once for
`state`). This triggers a full deep clone of ~15 BTreeMap fields plus
several complex wrapper types. This change makes `CatalogState::clone()`
effectively O(1) by using structural sharing.

Two complementary changes:

1. Replace all BTreeMap fields on CatalogState with `imbl::OrdMap` —
   clone is O(1) (refcount bump on shared tree root), mutations are
   O(log n) with path-copying. The `imbl` crate is already used in the
   workspace.

2. Wrap remaining expensive non-map fields (`system_configuration`,
   `default_privileges`, `system_privileges`, `comments`,
   `storage_metadata`) in `Arc` — clone is O(1), mutation via
   `Arc::make_mut()` which clones only when shared.

Nested maps inside value types (e.g. `Database.schemas_by_id`) are NOT
changed — with `imbl::OrdMap`, values are structurally shared at the
tree-node level, so nested BTreeMaps are shared by reference until the
containing node is path-copied during mutation.

A `MutableMap` trait is introduced in apply.rs so that helper functions
`apply_inverted_lookup` and `apply_with_update` work uniformly with both
`BTreeMap` (nested inside value types like Database, Cluster) and
`imbl::OrdMap` (CatalogState top-level fields).
@aljoscha aljoscha force-pushed the adapter-truly-cow-catalog branch from e444cf3 to 6a9c39e Compare February 25, 2026 10:09
@aljoscha aljoscha requested a review from SangJunBak February 25, 2026 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants