adapter/catalog: make CatalogState a persistent data structure#35079
adapter/catalog: make CatalogState a persistent data structure#35079aljoscha wants to merge 1 commit intoMaterializeInc:mainfrom
Conversation
Pre-merge checklist
|
|
Let's see if this passes CI. And I still have to verify that this actually improves DDL perf with large numbers of objects. |
1def8e5 to
2729942
Compare
| }, | ||
| SystemObjectId::System => { | ||
| let mut system_privileges = state.system_privileges.clone(); | ||
| let mut system_privileges = PrivilegeMap::clone(&state.system_privileges); |
There was a problem hiding this comment.
Any reason to change these lines?
| match diff { | ||
| StateDiff::Addition => self | ||
| .default_privileges | ||
| StateDiff::Addition => Arc::make_mut(&mut self.default_privileges) | ||
| .grant(default_privilege.object, default_privilege.acl_item), |
There was a problem hiding this comment.
Noticing grant and revoke actually mutate the internal privileges BTreeMap. Can there ever be more than one mutable reference to self.default_privileges at a time? And if so, wouldn't that cause the change to not be saved?
There was a problem hiding this comment.
Yeah, Arc::make_mut is a bit confusing. The reason it works is that make_mut requires a &mut, and there can only be one &mut so this actually will guarantee to change it in place. If anyone got a snapshot of the catalog using owned_catalog they will have a diverging view, but they cannot change the Catalog as stored in the Coordinator struct.
There was a problem hiding this comment.
Ah that makes sense! Thanks for the detailed explanation
In `transact_inner`, `CatalogState` is wrapped in `Cow::Borrowed` and cloned via `to_mut()` — twice (once for `preliminary_state`, once for `state`). This triggers a full deep clone of ~15 BTreeMap fields plus several complex wrapper types. This change makes `CatalogState::clone()` effectively O(1) by using structural sharing. Two complementary changes: 1. Replace all BTreeMap fields on CatalogState with `imbl::OrdMap` — clone is O(1) (refcount bump on shared tree root), mutations are O(log n) with path-copying. The `imbl` crate is already used in the workspace. 2. Wrap remaining expensive non-map fields (`system_configuration`, `default_privileges`, `system_privileges`, `comments`, `storage_metadata`) in `Arc` — clone is O(1), mutation via `Arc::make_mut()` which clones only when shared. Nested maps inside value types (e.g. `Database.schemas_by_id`) are NOT changed — with `imbl::OrdMap`, values are structurally shared at the tree-node level, so nested BTreeMaps are shared by reference until the containing node is path-copied during mutation. A `MutableMap` trait is introduced in apply.rs so that helper functions `apply_inverted_lookup` and `apply_with_update` work uniformly with both `BTreeMap` (nested inside value types like Database, Cluster) and `imbl::OrdMap` (CatalogState top-level fields).
e444cf3 to
6a9c39e
Compare
In
transact_inner,CatalogStateis wrapped inCow::Borrowedand cloned viato_mut()— twice (once forpreliminary_state, once forstate). This triggers a full deep clone of ~15 BTreeMap fields plus several complex wrapper types. This change makesCatalogState::clone()effectively O(1) by using structural sharing.Two complementary changes:
Replace all BTreeMap fields on CatalogState with
im::OrdMap— clone is O(1) (refcount bump on shared tree root), mutations are O(log n) with path-copying. Theimcrate is already used in the workspace.Wrap remaining expensive non-map fields (
system_configuration,default_privileges,system_privileges,comments,storage_metadata) inArc— clone is O(1), mutation viaArc::make_mut()which clones only when shared.Nested maps inside value types (e.g.
Database.schemas_by_id) are NOT changed — withim::OrdMap, values are structurally shared at the tree-node level, so nested BTreeMaps are shared by reference until the containing node is path-copied during mutation.A
MutableMaptrait is introduced in apply.rs so that helper functionsapply_inverted_lookupandapply_with_updatework uniformly with bothBTreeMap(nested inside value types like Database, Cluster) andim::OrdMap(CatalogState top-level fields).Experiments
Benchmarked DDL latency scaling before and after this change, measuring across
7 scale points from 0 to 50,000 user objects. Both binaries were optimized
builds run from a fresh
--resetstate. Tables were created incrementallyusing
psql -fbatch files. 10 samples per DDL type per scale point.Before: commit
df6bf0cf22(BTreeMap-based CatalogState)After: commit
2729942ac1(persistent CatalogState with imbl::OrdMap + Arc)Absolute latencies (median of 10 samples)
The 0-object "after" numbers are inflated by cold-start overhead (first few
DDLs after fresh startup). At 1,000+ objects the "after" binary is consistently
faster.
Improvement at scale
O(n) scaling factors (linear regression, R² > 0.95 for all)
The per-object cost is reduced by ~42–55% across all DDL types. Fixed overhead
(intercept) is essentially unchanged when correcting for the cold-start anomaly.
Test machine