
feat: postgres database resource #292

Open

jason-lynch wants to merge 1 commit into main from feat/PLAT-417/postgres-database-resource

Conversation

@jason-lynch
Member

Summary

Adds a new PostgresDatabaseResource that moves Postgres database creation later in the overall database creation process. The InstanceResource is now only responsible for creating the database users, while the PostgresDatabaseResource is responsible for creating the Postgres database, granting role privileges on the new database, and initializing the Spock node.

This change is necessary because we need to use pg_service.conf for SystemD support, and that file must exist when we create the Spock node. Spock node initialization is therefore deferred until after all of the instances are created and all nodes are ready, so we can construct the DSNs for all nodes before initializing Spock.

An added benefit is that this change opens up the possibility of supporting multiple Postgres databases per Control Plane database, since we can create multiple databases/Spock nodes per NodeResource/Patroni cluster. I've added the DatabaseName field to most, if not all, of the resources that would need to change to accommodate this. However, I stopped short of incorporating the database name into these resources' identities, which will be necessary to complete multi-database support.
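The reordering described above can be sketched as follows. The helper names (createUsers, buildDSN, initSpockNode) and the DSN format are illustrative stand-ins, not the actual functions in this PR:

```go
package main

import "fmt"

type node struct{ name, host string }

func createUsers(n node) {
	fmt.Printf("create users on %s\n", n.name)
}

// A node's DSN can only be constructed once the instance's address is known.
func buildDSN(n node, db string) string {
	return fmt.Sprintf("host=%s dbname=%s", n.host, db)
}

func initSpockNode(n node, dsn string) {
	fmt.Printf("spock.node_create(%s, %q)\n", n.name, dsn)
}

func main() {
	nodes := []node{{"n1", "10.0.0.1"}, {"n2", "10.0.0.2"}}

	// Phase 1: InstanceResource responsibility, database users only.
	for _, n := range nodes {
		createUsers(n)
	}

	// Phase 2: every instance now exists, so all DSNs can be built
	// before any Spock node is initialized.
	dsns := make(map[string]string)
	for _, n := range nodes {
		dsns[n.name] = buildDSN(n, "app")
	}
	for _, n := range nodes {
		initSpockNode(n, dsns[n.name])
	}
}
```

The key property is that phase 2 never starts until phase 1 has completed for every node, which is what makes the all-node DSNs available at Spock init time.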

Testing

There should be no difference in user-facing functionality.

PLAT-417

@coderabbitai

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

The PR introduces a Postgres database resource and threads database identity (DatabaseName/DatabaseOwner) through node/spec/operations, refactors many resources and dependencies to be database-scoped, adjusts DSN/Spock/role logic, simplifies instance initialization, and updates golden tests and orchestrator APIs for database-aware operations.

Changes

  • New Postgres resource (server/internal/database/postgres_database.go, server/internal/database/resources.go): Adds exported PostgresDatabaseResource and registers it; implements Refresh/Create/Update/Delete and helpers for DB creation, rename, Spock init/restore, and privilege updates.
  • Database-scoped dependency refactor (server/internal/database/*_resource.go, e.g., replication_slot_resource.go, subscription_resource.go, sync_event_resource.go, wait_for_sync_event_resource.go, replication_slot_create_resource.go, replication_slot_advance_from_cts_resource.go, lag_tracker_commit_ts_resource.go): Adds DatabaseName fields to many resources and changes Dependencies() to reference PostgresDatabaseResourceIdentifier(..., DatabaseName) instead of node-only identifiers; updates providerDSN to use orchestrator NodeDSN.
  • Node/Instance helpers & DSN changes (server/internal/database/node_resource.go, server/internal/database/instance_resource.go, server/internal/postgres/dsn.go): Adds NodeResource.DSN and PrimaryInstance methods; simplifies instance init (removes Spock/TLS-heavy flows); extends DSN with Service and TargetSessionAttrs and new Fields()/Host()/Port() helpers.
  • Orchestrator API & InstanceResources updates (server/internal/database/orchestrator.go, server/internal/orchestrator/swarm/orchestrator.go): Adds DatabaseDependencies to InstanceResources, a new constructor overload, RestoreConfig(), and a NodeDSN(...) orchestrator method; updates the orchestrator implementation to resolve NodeDSN.
  • Operations refactor, node/db state (server/internal/database/operations/*.go, notably common.go, add_nodes.go, update_nodes.go, end.go, populate_nodes.go, restore_database.go, restore_database_test.go): Introduces nodeResourceState() and databaseResourceState() helpers; replaces per-instance monitor insertion with aggregated node/database states; threads DatabaseName and DatabaseOwner into NodeResources and operation flows; updates signatures (addSyncResources gains dbName).
  • Populate / peer / sync resources (server/internal/database/operations/populate_nodes.go, ..._test.go): Propagates DatabaseName into replication/sync-related resources (ReplicationSlot, Subscription, SyncEvent, WaitForSyncEvent, LagTrackerCommitTimestamp, ReplicationSlotAdvanceFromCTS) and test expectations.
  • Golden tests and update tests (server/internal/database/operations/golden_test/Test*/*.json, server/internal/database/operations/*_test.go, update_database_test.go, update_nodes_test.go): Massive updates to golden test expectations: adds/moves database.postgres_database::* entries, reorders create/update/delete batches, replaces many monitor.instance sequences with database-scoped operations, and updates numerous tests to include DatabaseName and DatabaseOwner.
  • Postgres utilities and roles (server/internal/postgres/create_db.go, server/internal/postgres/errors.go, server/internal/postgres/roles.go, server/internal/postgres/roles_test.go): Adds IsInRecovery, NodeNeedsCreate, and StartRepairModeTxn; replaces InitializePgEdgeExtensions with InitializeSpockNode; removes DBName/DBOwner from role options; introduces AlterOwner and GrantBuiltinRolePrivileges helpers; updates tests.
  • Monitor & service updates (server/internal/monitor/instance_monitor_resource.go, server/internal/orchestrator/swarm/service_user_role.go): Adds NodeName to InstanceMonitorResource and includes the PostgresDatabase dependency; the service user role drops DBName/DBOwner options and declares a PostgresDatabase type dependency.
  • Resource context & workflows (server/internal/resource/resource.go, server/internal/resource/event_test.go, server/internal/workflows/*: apply_event.go, executor.go, common.go, plan_restore.go): Adds HostID to resource.Context and propagates HostID through workflows and the executor; NodeResources returned by workflow helpers now include DatabaseOwner/DatabaseName.
  • Validation (server/internal/api/apiv1/validate.go, validate_test.go): Adds validateUsers to ensure unique usernames per DB and at most one db_owner; adds a test case for invalid users.
  • Misc & cleanup (various files: imports, small signature updates, test helpers): Small signature changes (NewInstanceResources call sites), removed TLS/cert retrieval paths in instance init, updated AdminDSN to include ApplicationName, added resource helper ToResourceDataSlice.
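Several of the cohorts above thread DatabaseName into resource dependencies and identifiers. A minimal sketch of why a database-scoped identifier avoids collisions, using an Identifier shape and ID composition assumed from the diff excerpts in this review rather than the repo's exact types:

```go
package main

import "fmt"

type Identifier struct {
	Type string
	ID   string
}

// Node-scoped: two databases replicated between the same node pair collide.
func nodeScopedID(provider, subscriber string) Identifier {
	return Identifier{Type: "database.sync_event", ID: provider + subscriber}
}

// Database-scoped: unique per (provider, subscriber, database).
func databaseScopedID(provider, subscriber, db string) Identifier {
	return Identifier{
		Type: "database.sync_event",
		ID:   fmt.Sprintf("%s-%s-%s", provider, subscriber, db),
	}
}

func main() {
	// Same node pair, two databases: node-scoped IDs are identical,
	// database-scoped IDs stay distinct.
	fmt.Println(nodeScopedID("n1", "n2") == nodeScopedID("n1", "n2"))
	fmt.Println(databaseScopedID("n1", "n2", "app") == databaseScopedID("n1", "n2", "analytics"))
}
```

This is the gap the PR author acknowledges in the summary: DatabaseName is stored on the resources but not yet folded into their identities.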

Poem

🐇 I hop through schemas, carrots in my paw,

Databases named, dependencies law.
Nodes and replicas now know where they stand,
Spock and DSNs in tidy, ordered band.
A rabbit cheers builds with gentle clap—hooray!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 16.67%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title 'feat: postgres database resource' clearly describes the main change (introduction of a new PostgresDatabaseResource), follows Conventional Commits style, and matches the template requirements.
  • Description check (✅ Passed): The description covers Summary (purpose and scope shift), Changes (high-level overview of what changed), and Testing (no user-facing changes expected). Issue link (PLAT-417) is provided. Explicit testing commands and checklist completion are missing, but the core required sections are present and substantive.


@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from a4bfd8b to 0af2711 Compare March 9, 2026 22:37
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch 2 times, most recently from 887b1a8 to daeb238 Compare March 10, 2026 13:10
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from 0af2711 to 9df990e Compare March 10, 2026 13:10
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch 2 times, most recently from 20b19ea to b768409 Compare March 10, 2026 14:33
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from ae1a213 to dc3adc0 Compare March 10, 2026 14:33
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from b768409 to de8a54c Compare March 10, 2026 14:57
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from dc3adc0 to 31a709a Compare March 10, 2026 14:57
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from de8a54c to e753e90 Compare March 10, 2026 16:39
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from 31a709a to f647f73 Compare March 10, 2026 16:39
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from e753e90 to e47f053 Compare March 10, 2026 21:01
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from f647f73 to 29af08f Compare March 10, 2026 21:01
@jason-lynch
Member Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
server/internal/postgres/roles.go (1)

98-111: ⚠️ Potential issue | 🔴 Critical

Copy-paste bug: condition checks wrong role name.

The conditional statement at line 102 checks for 'pgedge_application' but creates 'pgedge_application_read_only' at line 105. This is inconsistent and will fail to create the read-only role if pgedge_application already exists.

🐛 Proposed fix
 func CreateApplicationReadOnlyRole(opts BuiltinRoleOptions) Statements {
 	statements := Statements{
 		ConditionalStatement{
 			If: Query[bool]{
-				SQL: "SELECT NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = 'pgedge_application');",
+				SQL: "SELECT NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = 'pgedge_application_read_only');",
 			},
 			Then: Statement{
 				SQL: "CREATE ROLE pgedge_application_read_only WITH NOLOGIN;",
 			},
 		},
 	}
 
 	return statements
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/postgres/roles.go` around lines 98 - 111, The condition in
CreateApplicationReadOnlyRole uses the wrong role name; update the Query inside
the ConditionalStatement so it checks for NOT EXISTS (... WHERE rolname =
'pgedge_application_read_only') to match the role created by the Statement
(pgedge_application_read_only), ensuring the conditional creates the read-only
role when it doesn't already exist.
server/internal/database/sync_event_resource.go (1)

23-27: ⚠️ Potential issue | 🟠 Major

Wire DatabaseName into identity and connection selection.

Right now this new field is only stored. Identifier() still keys the resource by providerNode + subscriberNode, and Refresh() still connects with provider.Spec.DatabaseName. That means database-scoped sync events will collide for the same node pair and can still run against the wrong database.

Suggested fix
-func SyncEventResourceIdentifier(providerNode, subscriberNode string) resource.Identifier {
+func SyncEventResourceIdentifier(providerNode, subscriberNode, databaseName string) resource.Identifier {
 	return resource.Identifier{
-		ID:   providerNode + subscriberNode,
+		ID:   providerNode + subscriberNode + "-" + databaseName,
 		Type: ResourceTypeSyncEvent,
 	}
 }
@@
 func (r *SyncEventResource) Identifier() resource.Identifier {
-	return SyncEventResourceIdentifier(r.ProviderNode, r.SubscriberNode)
+	return SyncEventResourceIdentifier(r.ProviderNode, r.SubscriberNode, r.DatabaseName)
 }
@@
-	providerConn, err := provider.Connection(ctx, rc, provider.Spec.DatabaseName)
+	providerConn, err := provider.Connection(ctx, rc, r.DatabaseName)
 	if err != nil {
-		return fmt.Errorf("failed to connect to provider database %q: %w", provider.Spec.DatabaseName, err)
+		return fmt.Errorf("failed to connect to provider database %q: %w", r.DatabaseName, err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/sync_event_resource.go` around lines 23 - 27, The
resource's identity and connection selection must include the new DatabaseName
field: update Identifier() to incorporate DatabaseName (e.g., include
r.DatabaseName in the composite key returned so keys are unique per
providerNode+subscriberNode+database), and update Refresh() to use the
resource's DatabaseName when selecting/creating the provider connection instead
of using provider.Spec.DatabaseName (ensure any connection lookup or call uses
r.DatabaseName). Also update any equality/hash logic that builds the identity
string in Identifier() to reflect the added field so database-scoped sync events
no longer collide.
server/internal/database/wait_for_sync_event_resource.go (1)

19-24: ⚠️ Potential issue | 🟠 Major

DatabaseName still isn’t part of this resource’s identity or execution path.

The new field is stored on the struct, but the identifier, dependency lookup, and connection target still key off only the node pair. Two databases replicated between the same provider/subscriber nodes will collapse into one WaitForSyncEventResource, and Refresh() can wait on the wrong database by using subscriber.Spec.DatabaseName instead of r.DatabaseName.

Also applies to: 44-52, 59-68

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/wait_for_sync_event_resource.go` around lines 19 -
24, The resource identity and execution must include DatabaseName: change
WaitForSyncEventResourceIdentifier to incorporate the database name into the ID
(e.g., providerNode + subscriberNode + ":" + databaseName) and update any places
that build or compare resource identifiers (including the other functions
mentioned around the 44-52 and 59-68 diffs) to use this new identifier format;
in Refresh() and any dependency lookup or connection logic replace uses of
subscriber.Spec.DatabaseName with r.DatabaseName so the wait targets the
resource's own DatabaseName rather than the subscriber spec.
server/internal/workflows/plan_restore.go (1)

92-103: ⚠️ Potential issue | 🟠 Major

Pass node-level restore config to the activity, not the workflow-level config.

The node parameter already carries node.RestoreConfig, but line 102 forwards the workflow-level restoreConfig to GetRestoreResourcesInput. Since the activity uses input.RestoreConfig directly (line 57 of get_restore_resources.go), node-level restore overrides are lost. Change RestoreConfig: restoreConfig, to RestoreConfig: node.RestoreConfig,.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/workflows/plan_restore.go` around lines 92 - 103, The
workflow currently passes the workflow-level restoreConfig into the activity,
causing node-level overrides to be ignored; in the loop that constructs the
GetRestoreResourcesInput (when creating in :=
&activities.GetRestoreResourcesInput{...}) replace RestoreConfig: restoreConfig
with RestoreConfig: node.RestoreConfig so the activity (used by
GetRestoreResources) receives the node-specific restore settings; locate this in
the block that builds nodeRestore (NodeRestoreResources) and the inner loop over
node.Instances where in is created using taskID and instance.
server/internal/orchestrator/swarm/service_user_role.go (1)

118-155: ⚠️ Potential issue | 🔴 Critical

Execute schema and table grants against r.DatabaseName, not the postgres database.

connectToPrimary() connects to the postgres database (line 257), so lines 147–149 grant USAGE and SELECT on the public schema within postgres, not within r.DatabaseName. This leaves the service role with CONNECT permission on r.DatabaseName but no ability to access tables there. Open a second connection targeting r.DatabaseName for the schema/table grant statements.

Additionally, line 149's ALTER DEFAULT PRIVILEGES lacks FOR ROLE, so it only affects future objects created by the executing admin role. If the database owner creates tables, the default privileges will not apply to them. Add FOR ROLE <owner> if future owner-created tables should inherit these privileges.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/swarm/service_user_role.go` around lines 118 -
155, The schema/table GRANTs are being executed against the connection returned
by connectToPrimary (which targets the postgres DB) so USAGE/SELECT/ALTER
DEFAULT PRIVILEGES run on postgres instead of r.DatabaseName; open a second
connection targeting r.DatabaseName (e.g., add a new conn2 via a
connect-to-database call or extend connectToPrimary to accept a database name)
and execute the grants variable (the statements that currently build GRANT
USAGE/SELECT and ALTER DEFAULT PRIVILEGES) against that connection, ensuring you
Close() it; also change the ALTER DEFAULT PRIVILEGES statement to include FOR
ROLE <owner> (use the actual database owner variable or query the DB owner) so
it affects objects created by the database owner rather than only the admin
executing the command.
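A hedged sketch of the statements this comment asks for: the schema-level grants must run on a connection to the target database (only the CONNECT grant works from elsewhere), and ALTER DEFAULT PRIVILEGES gains FOR ROLE so tables created later by the database owner inherit the grant. The database, owner, and role names below are placeholders:

```go
package main

import "fmt"

// grantStatements builds the corrected grant sequence. Statements after the
// first must be executed on a connection whose dbname is db, not postgres.
func grantStatements(db, owner, role string) []string {
	return []string{
		// CONNECT is a database-level grant and can run from any connection.
		fmt.Sprintf("GRANT CONNECT ON DATABASE %s TO %s;", db, role),
		// Schema and table grants only affect the database we are connected to.
		fmt.Sprintf("GRANT USAGE ON SCHEMA public TO %s;", role),
		fmt.Sprintf("GRANT SELECT ON ALL TABLES IN SCHEMA public TO %s;", role),
		// FOR ROLE <owner> makes the default apply to objects the owner
		// creates in the future, not just objects created by the admin.
		fmt.Sprintf("ALTER DEFAULT PRIVILEGES FOR ROLE %s IN SCHEMA public GRANT SELECT ON TABLES TO %s;", owner, role),
	}
}

func main() {
	for _, s := range grantStatements("app", "app_owner", "svc_reader") {
		fmt.Println(s)
	}
}
```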
server/internal/database/operations/update_nodes_test.go (1)

153-153: ⚠️ Potential issue | 🟡 Minor

Bug: t.Name() should be tc.name for sub-test naming.

Using t.Name() returns the parent test function name (TestUpdateNode), not the individual test case name. This means all sub-tests will have the same name, making test output confusing and potentially causing issues with test filtering/reporting.

Note: Line 377 in TestRollingUpdateNodes has the same issue, while line 593 in TestConcurrentUpdateNodes correctly uses tc.name.

Proposed fix
-		t.Run(t.Name(), func(t *testing.T) {
+		t.Run(tc.name, func(t *testing.T) {

And similarly at line 377:

-		t.Run(t.Name(), func(t *testing.T) {
+		t.Run(tc.name, func(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/operations/update_nodes_test.go` at line 153,
Sub-tests inside TestUpdateNode (and the similar case in TestRollingUpdateNodes)
are using t.Run(t.Name(), ...) which yields the parent test name for every case;
change those calls to use the table case identifier tc.name (i.e., replace
t.Run(t.Name(), func... ) with t.Run(tc.name, func... ) ) so each sub-test is
uniquely named — verify TestConcurrentUpdateNodes already uses tc.name as
reference.
server/internal/database/operations/populate_nodes_test.go (1)

166-176: ⚠️ Potential issue | 🟡 Minor

Bug: Test uses t.Name() instead of tc.name for subtest naming.

Using t.Name() returns the parent test's name, causing all subtests to have the same name. This makes it harder to identify which test case failed and may cause test filtering issues.

🐛 Proposed fix
-		t.Run(t.Name(), func(t *testing.T) {
+		t.Run(tc.name, func(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/operations/populate_nodes_test.go` around lines 166
- 176, The subtests in populate_nodes_test.go call t.Run(t.Name(), ...) so every
case gets the parent test name; change the subtest name to use the test-case
identifier (tc.name) in the t.Run call so each subtest is uniquely named, e.g.
t.Run(tc.name, func(t *testing.T) { ... }), ensuring you reference the
table-driven test variable tc and its name field around the call to
operations.PopulateNode.
🧹 Nitpick comments (5)
docker/control-plane-dev/config.json (1)

4-8: Consider if "error" level is appropriate for a dev environment.

Setting the api_server log level to "error" will suppress warnings, info, and debug messages. Since this is a development configuration, you may want to use a more verbose level (e.g., "debug" or "info") to aid in troubleshooting and development.

If this is intentional to reduce log noise, consider documenting the rationale.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/control-plane-dev/config.json` around lines 4 - 8, The current dev
config sets the api_server log level to "error" which will hide
warnings/info/debug; update the "logging.component_levels.api_server" value to a
more verbose level (e.g., "info" or "debug") in the config and, if keeping
"error" intentionally, add a brief comment or documentation note explaining the
rationale; locate the "logging" -> "component_levels" -> "api_server" key in the
JSON and change the value or add accompanying documentation where config is
defined.
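If a more verbose dev default is adopted, the change is a one-line edit at the key path the comment names; the surrounding JSON shape here is an assumption based on that path:

```json
{
  "logging": {
    "component_levels": {
      "api_server": "info"
    }
  }
}
```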
server/internal/postgres/roles.go (1)

152-162: Consider extracting shared schema logic.

BuiltinRolePrivilegeOptions.Schemas() (lines 157-162) is identical to BuiltinRoleOptions.Schemas() (lines 66-71). Consider extracting an embedded struct or a shared helper to reduce duplication.

♻️ Possible refactor using embedded struct
+type SchemaConfig struct {
+	ExtraSchemas []string
+}
+
+func (s SchemaConfig) Schemas() []string {
+	var schemas []string
+	schemas = append(schemas, defaultSchemas...)
+	schemas = append(schemas, s.ExtraSchemas...)
+	return schemas
+}
+
 type BuiltinRoleOptions struct {
-	PGVersion    string
-	ExtraSchemas []string
-}
-
-func (o BuiltinRoleOptions) Schemas() []string {
-	var schemas []string
-	schemas = append(schemas, defaultSchemas...)
-	schemas = append(schemas, o.ExtraSchemas...)
-	return schemas
+	PGVersion string
+	SchemaConfig
 }
 
 type BuiltinRolePrivilegeOptions struct {
-	DBName       string
-	ExtraSchemas []string
-}
-
-func (o BuiltinRolePrivilegeOptions) Schemas() []string {
-	var schemas []string
-	schemas = append(schemas, defaultSchemas...)
-	schemas = append(schemas, o.ExtraSchemas...)
-	return schemas
+	DBName string
+	SchemaConfig
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/postgres/roles.go` around lines 152 - 162,
BuiltinRolePrivilegeOptions.Schemas() duplicates BuiltinRoleOptions.Schemas() —
extract the shared logic into a single helper or embedded type and have both
structs reuse it; for example create a small SchemaOptions struct (holding
ExtraSchemas) with a Schemas() method that appends defaultSchemas and
ExtraSchemas, or add a package-level function SchemasFor(extra []string)
[]string that returns append(defaultSchemas, extra...), then replace the
duplicate implementations in BuiltinRolePrivilegeOptions and BuiltinRoleOptions
to call the shared Schemas implementation and remove the duplicated method
bodies.
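The suggested refactor relies on Go's method promotion through embedded structs; a self-contained sketch, with defaultSchemas as a stand-in for the package's real value:

```go
package main

import "fmt"

// Stand-in for the package-level default schema list.
var defaultSchemas = []string{"public", "spock"}

// SchemaConfig holds the shared ExtraSchemas field and the single
// Schemas() implementation both option structs reuse.
type SchemaConfig struct {
	ExtraSchemas []string
}

func (s SchemaConfig) Schemas() []string {
	schemas := append([]string{}, defaultSchemas...) // copy, don't alias
	return append(schemas, s.ExtraSchemas...)
}

type BuiltinRoleOptions struct {
	PGVersion string
	SchemaConfig // Schemas() is promoted onto BuiltinRoleOptions
}

type BuiltinRolePrivilegeOptions struct {
	DBName string
	SchemaConfig // same promoted method, no duplicated body
}

func main() {
	o := BuiltinRoleOptions{SchemaConfig: SchemaConfig{ExtraSchemas: []string{"audit"}}}
	fmt.Println(o.Schemas())
}
```

Callers keep writing opts.Schemas(); only the struct literals that set ExtraSchemas need to nest it under SchemaConfig.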
server/internal/database/instance_resource.go (1)

137-143: Consider updating error message for consistency.

Similar to the initializeInstance method, the Connection method connects to dbName (line 138) but the error message (line 142) references r.Spec.DatabaseName, which may not match the actual database being connected to.

Proposed fix
 	conn, err := ConnectToInstance(ctx, &ConnectionOptions{
 		DSN: r.ConnectionInfo.AdminDSN(dbName),
 		TLS: tlsCfg,
 	})
 	if err != nil {
-		return nil, fmt.Errorf("failed to connect to database %q: %w", r.Spec.DatabaseName, err)
+		return nil, fmt.Errorf("failed to connect to database %q: %w", dbName, err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/instance_resource.go` around lines 137 - 143, The
error message in the Connection method uses r.Spec.DatabaseName while the
attempted connection is to the local variable dbName; update the fmt.Errorf call
to reference dbName (the actual database passed to ConnectToInstance) for
consistency and accurate diagnostics—locate the ConnectToInstance call and the
fmt.Errorf that returns "failed to connect to database" and replace the
r.Spec.DatabaseName reference with dbName.
server/internal/database/operations/helpers_test.go (1)

69-82: Add one fixture that exercises non-nil databaseDependencies.

This helper always passes nil for the new constructor argument, so the planner tests in this file still never exercise the extra dependency layer introduced by NewInstanceResources. A regression that drops PostgresDatabase ordering would still pass here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/operations/helpers_test.go` around lines 69 - 82,
The test helper always calls NewInstanceResources with a nil third argument, so
it never exercises non-nil databaseDependencies; update the fixture(s) that call
NewInstanceResources in helpers_test.go to include at least one case where the
third parameter is a non-nil dependencies slice (e.g., a realistic
databaseDependencies value or a minimal list containing a PostgresDatabase
dependency) so tests hit the extra dependency layer introduced by
NewInstanceResources (locate calls to NewInstanceResources in this file and add
a non-nil dependencies argument to one fixture).
server/internal/database/replication_slot_advance_from_cts_resource.go (1)

18-24: Identifier does not include DatabaseName - potential collision with multi-database support.

The ReplicationSlotAdvanceFromCTSResourceIdentifier only uses providerNode + subscriberNode, but the resource now has a DatabaseName field. If multiple databases are ever created per node (as mentioned in the PR objectives for future multi-database support), resources for different databases would have colliding identifiers.

The PR summary mentions "database name was not incorporated into resource identities yet," so this may be intentional for now, but it's worth tracking for when multi-database support is fully enabled.

Future consideration: Include DatabaseName in identifier
-func ReplicationSlotAdvanceFromCTSResourceIdentifier(providerNode, subscriberNode string) resource.Identifier {
+func ReplicationSlotAdvanceFromCTSResourceIdentifier(databaseName, providerNode, subscriberNode string) resource.Identifier {
 	return resource.Identifier{
 		Type: ResourceTypeReplicationSlotAdvanceFromCTS,
-		ID:   providerNode + subscriberNode,
+		ID:   databaseName + "-" + providerNode + "-" + subscriberNode,
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/replication_slot_advance_from_cts_resource.go`
around lines 18 - 24, The identifier function
ReplicationSlotAdvanceFromCTSResourceIdentifier currently builds IDs from
providerNode + subscriberNode which can collide across databases; update this
function to incorporate the resource's DatabaseName (e.g., concatenate
providerNode, subscriberNode and DatabaseName with a stable separator) so IDs
are unique per-database; ensure you use the
ResourceTypeReplicationSlotAdvanceFromCTS as before and pick a separator that
cannot appear in node or db names (or escape values) so existing consumers of
resource.Identifier remain correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/internal/database/instance_resource.go`:
- Around line 183-189: The error message incorrectly references
r.Spec.DatabaseName even though ConnectToInstance is connecting to the
"postgres" admin DB via r.ConnectionInfo.AdminDSN("postgres"); update the error
to reference the actual target (e.g., "postgres" or derive the admin DB name
from r.ConnectionInfo.AdminDSN call) so the fmt.Errorf in the ConnectToInstance
error branch accurately names the database being connected to and not
r.Spec.DatabaseName.

In `@server/internal/database/lag_tracker_commit_ts_resource.go`:
- Line 29: The resource's DatabaseName field is not being used for identity or
connection, causing different databases on the same node pair to collide; update
the Identifier() method to include DatabaseName in the key (alongside
origin/receiver), update the resource constructor/path so DatabaseName is
threaded into the created lag-tracker resource (replace any use of
instance.Spec.DatabaseName in constructors with the resource's DatabaseName),
and modify Refresh() (and any place that dials/connects, e.g., the code
currently using instance.Spec.DatabaseName) to use r.DatabaseName as the
connection target; ensure the same change is applied where the
constructor/initialization happens (lines around the existing ctor usage noted
at 60-61).

In `@server/internal/database/operations/common.go`:
- Around line 83-86: The error message is misleading: when state.AddResource(db)
fails we're adding a PostgresDatabaseResource (db), so update the fmt.Errorf
message in the failure branch to explicitly reference the
PostgresDatabaseResource or database resource (e.g., "failed to add
PostgresDatabaseResource to state" or "failed to add database resource to
state") so logs correctly reflect the operation performed (locate the call to
state.AddResource(db) and change the error text returned alongside the wrapped
err).

In `@server/internal/database/replication_slot_resource.go`:
- Line 29: The DatabaseName field was added but not threaded into resource
identity and deletion: update the Identifier() method to include the
DatabaseName (so the resource key includes provider, subscriber and
DatabaseName) and update Delete() to use the resource's DatabaseName (not
provider.Spec.DatabaseName) when looking up and deleting the replication
slot/state; ensure any state keys, comparisons and List/Delete calls in
Identifier(), Delete(), and any helper that builds the replication-slot key use
the DatabaseName field so multiple DBs on the same provider/subscriber are kept
distinct.

In `@server/internal/database/spec.go`:
- Around line 587-593: NodeInstances() currently picks the first user with
DBOwner==true from DatabaseUsers (setting DatabaseOwner) without validation,
which makes ownership order-dependent or empty; add validation in
validateDatabaseSpec() (or equivalent service-layer validation) to assert
exactly one DatabaseUsers entry has DBOwner==true and return an error if count
!= 1 so NodeInstances() can assume a single owner; update any callers/tests to
handle the validation error and reference DatabaseUsers, DBOwner, DatabaseOwner,
NodeInstances(), and validateDatabaseSpec() when implementing the check.

In `@server/internal/postgres/dsn.go`:
- Around line 35-66: The iteration over d.Extra in DSN.Fields() makes
DSN.String() nondeterministic; before the for loop that calls addField for each
entry in d.Extra, collect the map keys, sort them (e.g., using sort.Strings),
and then iterate the sorted keys to append fields in a stable order; update
references in DSN.Fields() (and any other place that serializes d.Extra) so
addField is invoked in key-sorted order to ensure deterministic DSN.String()
output.
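The deterministic-ordering fix can be sketched as follows; `fieldsFromExtra` is a hypothetical helper, not the actual `DSN.Fields()` implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fieldsFromExtra renders extra DSN parameters in key-sorted order.
// Go randomizes map iteration, so sorting the keys is what makes the
// serialized DSN stable across runs.
func fieldsFromExtra(extra map[string]string) string {
	keys := make([]string, 0, len(extra))
	for k := range extra {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%s", k, extra[k]))
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(fieldsFromExtra(map[string]string{
		"sslmode":          "require",
		"application_name": "control-plane",
	}))
	// application_name=control-plane sslmode=require
}
```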

---

Outside diff comments:
In `@server/internal/database/operations/populate_nodes_test.go`:
- Around line 166-176: The subtests in populate_nodes_test.go call
t.Run(t.Name(), ...) so every case gets the parent test name; change the subtest
name to use the test-case identifier (tc.name) in the t.Run call so each subtest
is uniquely named, e.g. t.Run(tc.name, func(t *testing.T) { ... }), ensuring you
reference the table-driven test variable tc and its name field around the call
to operations.PopulateNode.

In `@server/internal/database/operations/update_nodes_test.go`:
- Line 153: Sub-tests inside TestUpdateNode (and the similar case in
TestRollingUpdateNodes) are using t.Run(t.Name(), ...) which yields the parent
test name for every case; change those calls to use the table case identifier
tc.name (i.e., replace t.Run(t.Name(), func... ) with t.Run(tc.name, func... ) )
so each sub-test is uniquely named — verify TestConcurrentUpdateNodes already
uses tc.name as reference.

In `@server/internal/database/sync_event_resource.go`:
- Around line 23-27: The resource's identity and connection selection must
include the new DatabaseName field: update Identifier() to incorporate
DatabaseName (e.g., include r.DatabaseName in the composite key returned so keys
are unique per providerNode+subscriberNode+database), and update Refresh() to
use the resource's DatabaseName when selecting/creating the provider connection
instead of using provider.Spec.DatabaseName (ensure any connection lookup or
call uses r.DatabaseName). Also update any equality/hash logic that builds the
identity string in Identifier() to reflect the added field so database-scoped
sync events no longer collide.

In `@server/internal/database/wait_for_sync_event_resource.go`:
- Around line 19-24: The resource identity and execution must include
DatabaseName: change WaitForSyncEventResourceIdentifier to incorporate the
database name into the ID (e.g., providerNode + subscriberNode + ":" +
databaseName) and update any places that build or compare resource identifiers
(including the other functions mentioned around the 44-52 and 59-68 diffs) to
use this new identifier format; in Refresh() and any dependency lookup or
connection logic replace uses of subscriber.Spec.DatabaseName with
r.DatabaseName so the wait targets the resource's own DatabaseName rather than
the subscriber spec.

In `@server/internal/orchestrator/swarm/service_user_role.go`:
- Around line 118-155: The schema/table GRANTs are being executed against the
connection returned by connectToPrimary (which targets the postgres DB) so
USAGE/SELECT/ALTER DEFAULT PRIVILEGES run on postgres instead of r.DatabaseName;
open a second connection targeting r.DatabaseName (e.g., add a new conn2 via a
connect-to-database call or extend connectToPrimary to accept a database name)
and execute the grants variable (the statements that currently build GRANT
USAGE/SELECT and ALTER DEFAULT PRIVILEGES) against that connection, ensuring you
Close() it; also change the ALTER DEFAULT PRIVILEGES statement to include FOR
ROLE <owner> (use the actual database owner variable or query the DB owner) so
it affects objects created by the database owner rather than only the admin
executing the command.
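A hedged sketch of the grant statements described above; the helper name and `%q` identifier quoting are simplifications, not the repository's code:

```go
package main

import "fmt"

// grantStatements builds the per-database grants. These must run on a
// connection to the target database (not "postgres"), and the
// default-privileges statement names the owner via FOR ROLE so it
// covers objects the owner creates, not just those created by the
// admin executing the command.
func grantStatements(schema, owner, grantee string) []string {
	return []string{
		fmt.Sprintf("GRANT USAGE ON SCHEMA %q TO %q;", schema, grantee),
		fmt.Sprintf("GRANT SELECT ON ALL TABLES IN SCHEMA %q TO %q;", schema, grantee),
		fmt.Sprintf("ALTER DEFAULT PRIVILEGES FOR ROLE %q IN SCHEMA %q GRANT SELECT ON TABLES TO %q;",
			owner, schema, grantee),
	}
}

func main() {
	for _, s := range grantStatements("public", "app_owner", "readonly") {
		fmt.Println(s)
	}
}
```

Without the `FOR ROLE` clause, `ALTER DEFAULT PRIVILEGES` only affects objects later created by the role that ran the statement.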

In `@server/internal/postgres/roles.go`:
- Around line 98-111: The condition in CreateApplicationReadOnlyRole uses the
wrong role name; update the Query inside the ConditionalStatement so it checks
for NOT EXISTS (... WHERE rolname = 'pgedge_application_read_only') to match the
role created by the Statement (pgedge_application_read_only), ensuring the
conditional creates the read-only role when it doesn't already exist.

In `@server/internal/workflows/plan_restore.go`:
- Around line 92-103: The workflow currently passes the workflow-level
restoreConfig into the activity, causing node-level overrides to be ignored; in
the loop that constructs the GetRestoreResourcesInput (when creating in :=
&activities.GetRestoreResourcesInput{...}) replace RestoreConfig: restoreConfig
with RestoreConfig: node.RestoreConfig so the activity (used by
GetRestoreResources) receives the node-specific restore settings; locate this in
the block that builds nodeRestore (NodeRestoreResources) and the inner loop over
node.Instances where in is created using taskID and instance.
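One way to honor node-level settings is a fallback helper; whether falling back to the workflow-level default is desired (versus always using `node.RestoreConfig`, as the comment suggests) is an assumption here, and the type is a simplified stand-in:

```go
package main

import "fmt"

// RestoreConfig is a simplified stand-in for the activities type.
type RestoreConfig struct {
	SnapshotID string
}

// effectiveRestoreConfig prefers the node-level override and falls back
// to the workflow-level default, so per-node settings are not ignored.
func effectiveRestoreConfig(workflowCfg, nodeCfg *RestoreConfig) *RestoreConfig {
	if nodeCfg != nil {
		return nodeCfg
	}
	return workflowCfg
}

func main() {
	def := &RestoreConfig{SnapshotID: "snap-default"}
	override := &RestoreConfig{SnapshotID: "snap-node-1"}
	fmt.Println(effectiveRestoreConfig(def, override).SnapshotID) // snap-node-1
	fmt.Println(effectiveRestoreConfig(def, nil).SnapshotID)      // snap-default
}
```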

---

Nitpick comments:
In `@docker/control-plane-dev/config.json`:
- Around line 4-8: The current dev config sets the api_server log level to
"error" which will hide warnings/info/debug; update the
"logging.component_levels.api_server" value to a more verbose level (e.g.,
"info" or "debug") in the config and, if keeping "error" intentionally, add a
brief comment or documentation note explaining the rationale; locate the
"logging" -> "component_levels" -> "api_server" key in the JSON and change the
value or add accompanying documentation where config is defined.

In `@server/internal/database/instance_resource.go`:
- Around line 137-143: The error message in the Connection method uses
r.Spec.DatabaseName while the attempted connection is to the local variable
dbName; update the fmt.Errorf call to reference dbName (the actual database
passed to ConnectToInstance) for consistency and accurate diagnostics—locate the
ConnectToInstance call and the fmt.Errorf that returns "failed to connect to
database" and replace the r.Spec.DatabaseName reference with dbName.

In `@server/internal/database/operations/helpers_test.go`:
- Around line 69-82: The test helper always calls NewInstanceResources with a
nil third argument, so it never exercises non-nil databaseDependencies; update
the fixture(s) that call NewInstanceResources in helpers_test.go to include at
least one case where the third parameter is a non-nil dependencies slice (e.g.,
a realistic databaseDependencies value or a minimal list containing a
PostgresDatabase dependency) so tests hit the extra dependency layer introduced
by NewInstanceResources (locate calls to NewInstanceResources in this file and
add a non-nil dependencies argument to one fixture).

In `@server/internal/database/replication_slot_advance_from_cts_resource.go`:
- Around line 18-24: The identifier function
ReplicationSlotAdvanceFromCTSResourceIdentifier currently builds IDs from
providerNode + subscriberNode which can collide across databases; update this
function to incorporate the resource's DatabaseName (e.g., concatenate
providerNode, subscriberNode and DatabaseName with a stable separator) so IDs
are unique per-database; ensure you use the
ResourceTypeReplicationSlotAdvanceFromCTS as before and pick a separator that
cannot appear in node or db names (or escape values) so existing consumers of
resource.Identifier remain correct.

In `@server/internal/postgres/roles.go`:
- Around line 152-162: BuiltinRolePrivilegeOptions.Schemas() duplicates
BuiltinRoleOptions.Schemas() — extract the shared logic into a single helper or
embedded type and have both structs reuse it; for example create a small
SchemaOptions struct (holding ExtraSchemas) with a Schemas() method that appends
defaultSchemas and ExtraSchemas, or add a package-level function
SchemasFor(extra []string) []string that returns append(defaultSchemas,
extra...), then replace the duplicate implementations in
BuiltinRolePrivilegeOptions and BuiltinRoleOptions to call the shared Schemas
implementation and remove the duplicated method bodies.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a84ae357-8fd4-49e5-877f-563e790fe1cc

📥 Commits

Reviewing files that changed from the base of the PR and between 29af08f and e47f053.

📒 Files selected for processing (64)
  • docker/control-plane-dev/config.json
  • server/internal/database/instance_resource.go
  • server/internal/database/lag_tracker_commit_ts_resource.go
  • server/internal/database/node_resource.go
  • server/internal/database/operations/add_nodes.go
  • server/internal/database/operations/add_nodes_test.go
  • server/internal/database/operations/common.go
  • server/internal/database/operations/end.go
  • server/internal/database/operations/golden_test/TestRestoreDatabase/restore_all_nodes_in_two-node_db.json
  • server/internal/database/operations/golden_test/TestRestoreDatabase/single_node_restore.json
  • server/internal/database/operations/golden_test/TestRestoreDatabase/single_node_restore_in_two-node_db.json
  • server/internal/database/operations/golden_test/TestRestoreDatabase/single_node_restore_in_two-node_db_with_replica.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/add_an_instance_dependency.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/add_an_instance_dependency_with_forced_update.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/add_update_and_remove_node.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/adding_a_replica.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/adding_multiple_replicas_concurrent.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/adding_multiple_replicas_rolling.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/delete_two_node_database.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/forced_update.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/multiple_nodes_from_empty.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_default_source_node.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_populate.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_populate_and_replica.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/remove_node.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/remove_one_node_from_three_node_database.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/remove_two_nodes_from_three_node_database.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/single_node_from_empty.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/single_node_with_service_from_empty.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/two_nodes_to_three_nodes_with_populate.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/update_database_node_with_unchanged_service.json
  • server/internal/database/operations/helpers_test.go
  • server/internal/database/operations/populate_nodes.go
  • server/internal/database/operations/populate_nodes_test.go
  • server/internal/database/operations/restore_database.go
  • server/internal/database/operations/restore_database_test.go
  • server/internal/database/operations/update_database_test.go
  • server/internal/database/operations/update_nodes.go
  • server/internal/database/operations/update_nodes_test.go
  • server/internal/database/orchestrator.go
  • server/internal/database/postgres_database.go
  • server/internal/database/replication_slot_advance_from_cts_resource.go
  • server/internal/database/replication_slot_create_resource.go
  • server/internal/database/replication_slot_resource.go
  • server/internal/database/resources.go
  • server/internal/database/spec.go
  • server/internal/database/subscription_resource.go
  • server/internal/database/switchover_resource.go
  • server/internal/database/sync_event_resource.go
  • server/internal/database/wait_for_sync_event_resource.go
  • server/internal/monitor/instance_monitor_resource.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/swarm/service_user_role.go
  • server/internal/postgres/create_db.go
  • server/internal/postgres/dsn.go
  • server/internal/postgres/errors.go
  • server/internal/postgres/roles.go
  • server/internal/postgres/roles_test.go
  • server/internal/resource/event_test.go
  • server/internal/resource/resource.go
  • server/internal/workflows/activities/apply_event.go
  • server/internal/workflows/activities/executor.go
  • server/internal/workflows/common.go
  • server/internal/workflows/plan_restore.go
💤 Files with no reviewable changes (1)
  • server/internal/postgres/roles_test.go

Comment on lines +83 to +86
err := state.AddResource(db)
if err != nil {
	return nil, fmt.Errorf("failed to add node resources to state: %w", err)
}

⚠️ Potential issue | 🟡 Minor

Misleading error message.

The error message says "failed to add node resources" but the code is adding a PostgresDatabaseResource. This could confuse debugging efforts.

📝 Proposed fix
 	err := state.AddResource(db)
 	if err != nil {
-		return nil, fmt.Errorf("failed to add node resources to state: %w", err)
+		return nil, fmt.Errorf("failed to add database resource to state: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/operations/common.go` around lines 83 - 86, The
error message is misleading: when state.AddResource(db) fails we're adding a
PostgresDatabaseResource (db), so update the fmt.Errorf message in the failure
branch to explicitly reference the PostgresDatabaseResource or database resource
(e.g., "failed to add PostgresDatabaseResource to state" or "failed to add
database resource to state") so logs correctly reflect the operation performed
(locate the call to state.AddResource(db) and change the error text returned
alongside the wrapped err).

Comment on lines +587 to +593
var owner string
for _, user := range s.DatabaseUsers {
	if user.DBOwner {
		owner = user.Username
		break
	}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for validation that enforces exactly one DBOwner before NodeInstances runs.
rg -n -C3 --type=go '\bDBOwner\b|db_owner|DatabaseOwner' server/internal/database server/internal/workflows

Repository: pgEdge/control-plane

Length of output: 9257


🏁 Script executed:

#!/bin/bash
# Search for validation functions that check DatabaseUsers
rg -n -C5 'DatabaseUsers' server/internal/database/spec.go | head -60

Repository: pgEdge/control-plane

Length of output: 2309


🏁 Script executed:

#!/bin/bash
# Search for any validation or error checking related to DatabaseUsers
rg -n -B3 -A3 'len.*DatabaseUsers|DatabaseUsers.*len' server/internal/database

Repository: pgEdge/control-plane

Length of output: 925


🏁 Script executed:

#!/bin/bash
# Search for where NodeInstances is called from
rg -n 'NodeInstances\(\)' server/

Repository: pgEdge/control-plane

Length of output: 696


🏁 Script executed:

#!/bin/bash
# Search for validation/schema validation functions in spec.go
rg -n 'Validate|validate|Error|error' server/internal/database/spec.go | head -40

Repository: pgEdge/control-plane

Length of output: 851


🏁 Script executed:

#!/bin/bash
# Search for where Spec is created/validated and check for DBOwner validation
rg -n 'DBOwner|db_owner' server/internal/workflows --type=go -C2

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Look at the actual NodeInstances implementation to see error handling
sed -n '581,660p' server/internal/database/spec.go

Repository: pgEdge/control-plane

Length of output: 2710


🏁 Script executed:

#!/bin/bash
# Search for any validation that happens at spec load/create time
rg -n 'func.*Validate|NewSpec|UnmarshalJSON' server/internal/database/spec.go

Repository: pgEdge/control-plane

Length of output: 125


🏁 Script executed:

#!/bin/bash
# Check if there's validation in the API layer or spec service
rg -n 'DBOwner|db_owner|DatabaseOwner' server/internal/api server/internal/service --type=go

Repository: pgEdge/control-plane

Length of output: 307


🏁 Script executed:

#!/bin/bash
# Check if there's validation in the convert or API layer
rg -n -B5 -A5 'DbOwner|db_owner|DBOwner' server/internal/api/apiv1/convert.go

Repository: pgEdge/control-plane

Length of output: 894


🏁 Script executed:

#!/bin/bash
# Search for validation that might happen before NodeInstances is called
rg -n 'Spec.*Validate|func.*Validate' server/internal/workflows/

Repository: pgEdge/control-plane

Length of output: 992


🏁 Script executed:

#!/bin/bash
# Check what happens when owner is empty string - look at downstream usage
rg -n 'DatabaseOwner' server/internal/database/operations/common.go -B2 -A2

Repository: pgEdge/control-plane

Length of output: 378


🏁 Script executed:

#!/bin/bash
# Look for any validation of DatabaseUsers before creating/updating a spec
rg -n 'DatabaseUsers' server/internal/api --type=go -B2 -A2

Repository: pgEdge/control-plane

Length of output: 1392


🏁 Script executed:

#!/bin/bash
# Look at ValidateSpec to see what validation is actually done
sed -n '37,120p' server/internal/workflows/validate_spec.go

Repository: pgEdge/control-plane

Length of output: 2446


🏁 Script executed:

#!/bin/bash
# Check if there's any validation in the spec service layer
rg -n 'func.*Create.*Spec|func.*Update.*Spec' server/internal/database/ --type=go

Repository: pgEdge/control-plane

Length of output: 717


🏁 Script executed:

#!/bin/bash
# Search for any places that check owner is not empty
rg -n 'owner.*=|Owner.*==' server/internal/database/spec.go

Repository: pgEdge/control-plane

Length of output: 92


🏁 Script executed:

#!/bin/bash
# See if there's validation logic anywhere that checks DBOwner count
rg -n 'count|len.*DBOwner|multiple.*owner' server/ --type=go

Repository: pgEdge/control-plane

Length of output: 7834


🏁 Script executed:

#!/bin/bash
# Check the CreateDatabase and UpdateDatabase to see if validation happens there
sed -n '53,100p' server/internal/database/service.go

Repository: pgEdge/control-plane

Length of output: 1488


🏁 Script executed:

#!/bin/bash
# Look for any test that has multiple DBOwner users
rg -n 'DBOwner.*true' server/internal/database/spec_test.go | head -30

Repository: pgEdge/control-plane

Length of output: 277


🏁 Script executed:

#!/bin/bash
# Check if there's any validation on Spec creation/API input
rg -n 'func.*Validate' server/internal/api/apiv1/ --type=go | grep -i spec

Repository: pgEdge/control-plane

Length of output: 248


🏁 Script executed:

#!/bin/bash
# Find and examine the ValidateDatabaseSpec function
rg -n 'func.*ValidateDatabaseSpec|func.*validateDatabaseSpec' server/internal/api/apiv1/ --type=go

Repository: pgEdge/control-plane

Length of output: 255


🏁 Script executed:

#!/bin/bash
# Look at the validate_test.go to see what DatabaseSpec validation tests exist
sed -n '497,650p' server/internal/api/apiv1/validate_test.go

Repository: pgEdge/control-plane

Length of output: 3692


🏁 Script executed:

#!/bin/bash
# Search for validation that checks DatabaseUsers specifically
rg -n 'DatabaseUsers|database_users' server/internal/api/apiv1/validate.go

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Check what ValidateChangedSpec does
rg -n -A20 'func ValidateChangedSpec' server/internal/database/spec.go

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Get the full validateDatabaseSpec function
sed -n '64,150p' server/internal/api/apiv1/validate.go

Repository: pgEdge/control-plane

Length of output: 2935


🏁 Script executed:

#!/bin/bash
# Search for all validations in validate.go
rg -n 'func validate' server/internal/api/apiv1/validate.go

Repository: pgEdge/control-plane

Length of output: 1830


🏁 Script executed:

#!/bin/bash
# Look for any test with multiple DBOwner entries to see if it's rejected
rg -n -B10 'DBOwner.*true' server/internal/api/apiv1/validate_test.go | head -50

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Search for DatabaseUsers or database_users in the entire codebase to see if there's any validation
rg -n 'DatabaseUsers|database_users' server/internal/api/apiv1/validate.go

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Check if the conversion function from API to internal spec has validation
sed -n '700,760p' server/internal/api/apiv1/convert.go

Repository: pgEdge/control-plane

Length of output: 2006


🏁 Script executed:

#!/bin/bash
# Look for any error handling or validation when creating NodeInstances with no owner
rg -n 'owner.*=.*""' server/internal/database/spec.go

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Check if there's validation on the Spec struct itself (UnmarshalJSON, Validate methods)
rg -n 'func.*\(.*Spec\).*Validate|UnmarshalJSON' server/internal/database/spec.go

Repository: pgEdge/control-plane

Length of output: 125


🏁 Script executed:

#!/bin/bash
# Look for any place that checks if owner is empty and handles it
rg -n 'owner == ""' server/internal/database/ --type=go

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Check what happens if DatabaseOwner is empty in downstream operations
rg -n 'Owner.*:' server/internal/database/operations/common.go -B2 -A2

Repository: pgEdge/control-plane

Length of output: 253


🏁 Script executed:

#!/bin/bash
# Look at the test for NodeInstances to see if it tests the owner extraction logic
rg -n -A30 'func Test.*NodeInstances' server/internal/database/spec_test.go

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Check if there's any validation of DatabaseUsers in validateDatabaseSpec by looking at the full function output
sed -n '64,148p' server/internal/api/apiv1/validate.go | grep -i 'user\|owner'

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Look at the entire NodeInstances function to see if there's any error return for missing owner
sed -n '581,660p' server/internal/database/spec.go | tail -30

Repository: pgEdge/control-plane

Length of output: 1100


🏁 Script executed:

#!/bin/bash
# Check if owner being empty is ever checked in operations that use it
rg -n 'Owner.*==' server/internal/database/operations/ --type=go

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Verify: are there any tests that check what happens with zero or multiple DBOwner?
rg -n 'DBOwner' server/internal/database/spec_test.go | wc -l

Repository: pgEdge/control-plane

Length of output: 65


Add validation to enforce exactly one DBOwner user before NodeInstances() runs.

NodeInstances() silently takes the first user with DBOwner == true and stamps it onto every node. There is no upstream validation enforcing exactly one owner or rejecting zero owners. If multiple DBOwner users are present, ownership becomes input-order dependent; if none exist, DatabaseOwner becomes an empty string. Validate in validateDatabaseSpec() or at service layer that exactly one user has DBOwner == true.

Also applies to: 645-646

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/spec.go` around lines 587 - 593, NodeInstances()
currently picks the first user with DBOwner==true from DatabaseUsers (setting
DatabaseOwner) without validation, which makes ownership order-dependent or
empty; add validation in validateDatabaseSpec() (or equivalent service-layer
validation) to assert exactly one DatabaseUsers entry has DBOwner==true and
return an error if count != 1 so NodeInstances() can assume a single owner;
update any callers/tests to handle the validation error and reference
DatabaseUsers, DBOwner, DatabaseOwner, NodeInstances(), and
validateDatabaseSpec() when implementing the check.

@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from 29af08f to bf3dd8b on March 11, 2026 12:58
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from e47f053 to 26aa05d on March 11, 2026 12:58
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from bf3dd8b to 0508e04 on March 11, 2026 13:25
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from 26aa05d to a856583 on March 11, 2026 13:25
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from 0508e04 to 7f89e1e on March 11, 2026 13:43
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from a856583 to 57a6663 on March 11, 2026 13:43
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from 7f89e1e to b6a8fb4 on March 11, 2026 13:49
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from 57a6663 to e656f49 on March 11, 2026 13:49
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from b6a8fb4 to 53593e9 on March 12, 2026 19:28
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from e656f49 to 4e0f475 on March 12, 2026 19:28
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/resource-type-dependencies branch from 53593e9 to 53394ff on March 12, 2026 21:17
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from 4e0f475 to 3b3cfbb on March 12, 2026 21:17
Base automatically changed from feat/PLAT-417/resource-type-dependencies to main on March 13, 2026 13:12
Adds a new `PostgresDatabaseResource` that shifts the Postgres database
creation logic to happen later in the database creation process. Now,
the `InstanceResource` is only responsible for creating the database
users, and the `PostgresDatabase` resource is responsible for creating
the Postgres database, granting role privileges to the new database, and
initializing the Spock node.

This change is necessary because we need to use `pg_service.conf` for
SystemD support, and we need that file to exist when we create the Spock
node. This change shifts the Spock node initialization until after all
of the instances are created and all nodes are ready, so we're able to
construct the DSNs for all nodes before we initialize Spock.

This change has an added benefit that it opens up the possibility of
support for multiple Postgres databases per Control Plane database since
we're able to make multiple databases/spock nodes per
`NodeResource`/Patroni cluster. I've added the `DatabaseName` field to
most, if not all, of the resources that would need to change to
accommodate this. However, I did stop short of incorporating the
database name into these resources' identities, which will be necessary
to complete that multi-database support.

PLAT-417
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/postgres-database-resource branch from 3b3cfbb to 42e1c4c on March 13, 2026 13:13
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/internal/postgres/roles.go (1)

98-111: ⚠️ Potential issue | 🟠 Major

Pre-existing bug: incorrect role check in CreateApplicationReadOnlyRole.

Line 102 checks for the existence of pgedge_application but the function creates pgedge_application_read_only on line 105. This causes the read-only role creation to depend incorrectly on whether the application role exists.

This is a pre-existing issue not introduced by this PR, but should be tracked for a follow-up fix.

🐛 Proposed fix
 func CreateApplicationReadOnlyRole(opts BuiltinRoleOptions) Statements {
 	statements := Statements{
 		ConditionalStatement{
 			If: Query[bool]{
-				SQL: "SELECT NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = 'pgedge_application');",
+				SQL: "SELECT NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = 'pgedge_application_read_only');",
 			},
 			Then: Statement{
 				SQL: "CREATE ROLE pgedge_application_read_only WITH NOLOGIN;",
 			},
 		},
 	}

Do you want me to open a new issue to track this bug fix?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/postgres/roles.go` around lines 98 - 111, The existence check
in CreateApplicationReadOnlyRole is querying for 'pgedge_application' but the
function creates 'pgedge_application_read_only', so change the Query SQL in
CreateApplicationReadOnlyRole to check for NOT EXISTS ... WHERE rolname =
'pgedge_application_read_only' (so the ConditionalStatement's If Query matches
the actual role created) and ensure the Then Statement remains CREATE ROLE
pgedge_application_read_only WITH NOLOGIN; to avoid the creation depending on
the wrong role name.
♻️ Duplicate comments (3)
server/internal/database/spec.go (1)

587-593: ⚠️ Potential issue | 🟠 Major

Enforce exactly one DBOwner before assigning DatabaseOwner.

Current logic picks the first owner and silently accepts zero/multiple owners, making ownership order-dependent or empty.

Suggested fix
-	var owner string
+	var owner string
+	ownerCount := 0
 	for _, user := range s.DatabaseUsers {
 		if user.DBOwner {
 			owner = user.Username
-			break
+			ownerCount++
 		}
 	}
+	if ownerCount != 1 {
+		return nil, fmt.Errorf("expected exactly one database owner, found %d", ownerCount)
+	}

Also applies to: 645-646

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/spec.go` around lines 587 - 593, The loop that sets
owner by taking the first s.DatabaseUsers with DBOwner is unsafe; instead, count
and collect DBOwner entries from s.DatabaseUsers, if count != 1 return or
propagate an error (or fail validation) and only then set the DatabaseOwner
field to that sole user.Username; apply the same exact validation/assignment
change for the other occurrence that currently picks an owner (the second block
using DBOwner at the later assignment).
server/internal/database/operations/common.go (1)

83-85: ⚠️ Potential issue | 🟡 Minor

Name the failing resource correctly here.

state.AddResource(db) is adding a PostgresDatabaseResource, so the current message sends debugging toward the node path instead of the database path.

📝 Proposed fix
 	err := state.AddResource(db)
 	if err != nil {
-		return nil, fmt.Errorf("failed to add node resources to state: %w", err)
+		return nil, fmt.Errorf("failed to add database resource to state: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/operations/common.go` around lines 83 - 85, The
error message after calling state.AddResource(db) incorrectly references a node
resource; update the error to name the actual resource being added (a
PostgresDatabaseResource) so debugging points to the database path. Locate the
state.AddResource(db) call and change the fmt.Errorf text from "failed to add
node resources to state: %w" to something like "failed to add database resource
to state: %w" (optionally include db identifier if available) to clearly
reference the PostgresDatabaseResource being added.
server/internal/database/instance_resource.go (1)

137-143: ⚠️ Potential issue | 🟡 Minor

Use the requested dbName in the connect error.

Connection() accepts the target database as a parameter, but the error still reports r.Spec.DatabaseName. Callers such as the one at line 178 now connect to "postgres", so this message is misleading when an admin connection fails.

📝 Proposed fix
 	conn, err := ConnectToInstance(ctx, &ConnectionOptions{
 		DSN: r.ConnectionInfo.AdminDSN(dbName),
 		TLS: tlsCfg,
 	})
 	if err != nil {
-		return nil, fmt.Errorf("failed to connect to database %q: %w", r.Spec.DatabaseName, err)
+		return nil, fmt.Errorf("failed to connect to database %q: %w", dbName, err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/instance_resource.go` around lines 137 - 143, The
error message when ConnectToInstance fails uses r.Spec.DatabaseName but should
reference the actual target database parameter dbName; update the fmt.Errorf
call in the Connection() function (the block that calls ConnectToInstance with
ConnectionOptions and r.ConnectionInfo.AdminDSN(dbName)) to interpolate dbName
instead of r.Spec.DatabaseName so the error accurately reports the admin DB
being connected to.
🧹 Nitpick comments (3)
server/internal/postgres/roles.go (1)

152-162: Consider extracting duplicate Schemas() logic.

BuiltinRolePrivilegeOptions.Schemas() duplicates BuiltinRoleOptions.Schemas() (lines 66-71). Could embed a shared base or extract a helper.

♻️ Optional: Extract shared helper
+func combineSchemas(extraSchemas []string) []string {
+	var schemas []string
+	schemas = append(schemas, defaultSchemas...)
+	schemas = append(schemas, extraSchemas...)
+	return schemas
+}
+
 func (o BuiltinRoleOptions) Schemas() []string {
-	var schemas []string
-	schemas = append(schemas, defaultSchemas...)
-	schemas = append(schemas, o.ExtraSchemas...)
-	return schemas
+	return combineSchemas(o.ExtraSchemas)
 }
 
 ...
 
 func (o BuiltinRolePrivilegeOptions) Schemas() []string {
-	var schemas []string
-	schemas = append(schemas, defaultSchemas...)
-	schemas = append(schemas, o.ExtraSchemas...)
-	return schemas
+	return combineSchemas(o.ExtraSchemas)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/postgres/roles.go` around lines 152 - 162, The Schemas()
implementation in BuiltinRolePrivilegeOptions duplicates
BuiltinRoleOptions.Schemas(); extract the shared logic into a single helper
(e.g., a package-level function like combineSchemas(extra []string) or a small
embedded struct with Schemas()) that appends defaultSchemas and the provided
extra list, then have both BuiltinRolePrivilegeOptions.Schemas() and
BuiltinRoleOptions.Schemas() call that helper; reference the existing symbols
BuiltinRolePrivilegeOptions.Schemas, BuiltinRoleOptions.Schemas, and
defaultSchemas when making the change.
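A minimal sketch of the shared helper, with `defaultSchemas` as a stand-in for the package-level list in roles.go:

```go
package main

import "fmt"

// defaultSchemas stands in for the package-level list mentioned in the
// review; the real contents live in roles.go.
var defaultSchemas = []string{"public", "spock"}

// combineSchemas is the shared helper both Schemas() methods could call:
// defaults first, then caller-provided extras.
func combineSchemas(extra []string) []string {
	schemas := make([]string, 0, len(defaultSchemas)+len(extra))
	schemas = append(schemas, defaultSchemas...)
	schemas = append(schemas, extra...)
	return schemas
}

func main() {
	fmt.Println(combineSchemas([]string{"audit"}))
	// → [public spock audit]
}
```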
server/internal/database/lag_tracker_commit_ts_resource.go (1)

38-43: Constructor does not initialize DatabaseName.

The NewLagTrackerCommitTimestampResource constructor doesn't accept or set the DatabaseName field, while the resource now depends on it for Dependencies(). Callers must manually set DatabaseName after construction. Consider either adding databaseName as a constructor parameter or documenting that callers must set it explicitly.

🔧 Optional: Add DatabaseName to constructor
-func NewLagTrackerCommitTimestampResource(originNode, receiverNode string) *LagTrackerCommitTimestampResource {
+func NewLagTrackerCommitTimestampResource(databaseName, originNode, receiverNode string) *LagTrackerCommitTimestampResource {
 	return &LagTrackerCommitTimestampResource{
+		DatabaseName: databaseName,
 		OriginNode:   originNode,
 		ReceiverNode: receiverNode,
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/lag_tracker_commit_ts_resource.go` around lines 38 -
43, The constructor NewLagTrackerCommitTimestampResource currently doesn't set
the DatabaseName field but Dependencies() relies on it; modify
NewLagTrackerCommitTimestampResource to accept a databaseName string parameter
and assign it to the returned LagTrackerCommitTimestampResource's DatabaseName
field, and update all callers to pass the database name; alternatively (if you
prefer minimal changes) document and enforce via comment that callers must set
DatabaseName immediately after construction, but the preferred fix is adding the
databaseName parameter and assigning it inside
NewLagTrackerCommitTimestampResource so Dependencies() always has a valid
DatabaseName.
server/internal/database/replication_slot_advance_from_cts_resource.go (1)

85-89: Create() uses provider.Spec.DatabaseName instead of r.DatabaseName.

While the resource now has a DatabaseName field and uses it in Dependencies(), the Create() method connects and operates using provider.Spec.DatabaseName (lines 85, 95, 109, 119, 136). Under the current single-database-per-control-plane contract this is safe, but for consistency with the ReplicationSlotResource.Delete() fix, consider using r.DatabaseName.

♻️ Suggested fix for consistency
-	conn, err := provider.Connection(ctx, rc, provider.Spec.DatabaseName)
+	conn, err := provider.Connection(ctx, rc, r.DatabaseName)

Apply similar changes to lines 95, 109, 119, and 136.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/replication_slot_advance_from_cts_resource.go`
around lines 85 - 89, The Create() method is still using
provider.Spec.DatabaseName when opening connections and operating on the DB;
update Create() to use the resource's r.DatabaseName (the same field used in
Dependencies()) for all connection and operation calls to match the Delete() fix
and ensure consistency—replace occurrences of provider.Spec.DatabaseName with
r.DatabaseName in the Create() method (affecting the connection call and
subsequent statements that reference the DB name), and leave Dependencies() and
ReplicationSlotResource.Delete() unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/internal/api/apiv1/validate_test.go`:
- Around line 773-774: The test's expected error message prefixes are wrong:
replace the "users[...]" prefixes in the assertions with "database_users[...]"
to match the validator output; update the two strings currently written as
"users[1]: usernames must be unique within a database" and "users[3]: cannot
have multiple users with db_owner = true" to "database_users[1]: usernames must
be unique within a database" and "database_users[3]: cannot have multiple users
with db_owner = true" so the assertions in validate_test.go (the expected error
messages array) match the validator's actual keys.

In `@server/internal/api/apiv1/validate.go`:
- Around line 360-368: The code currently checks hasOwner for every user and
appends a duplicate-owner error before verifying the current user's DbOwner
flag; update the condition so the duplicate-owner error is only raised when the
current user actually has DbOwner set to true. Concretely, modify the block that
constructs the error (the one using hasOwner, newValidationError, errs) to only
execute when user.DbOwner != nil && *user.DbOwner && hasOwner (or move that
block inside the subsequent if that sets hasOwner), leaving
seenNames.Add(user.Username) and the hasOwner = true assignment intact.

In `@server/internal/database/postgres_database.go`:
- Around line 28-35: The JSON tag for the HasRestoreConfig field in the
PostgresDatabaseResource struct is misspelled as `json:"hast_restore_config"`;
update the struct definition so the HasRestoreConfig field uses the correct tag
`json:"has_restore_config"` (fix the typo in the tag for the HasRestoreConfig
field) to avoid baking the incorrect key into persisted resource payloads and
snapshots.

In `@server/internal/database/subscription_resource.go`:
- Around line 25-31: SubscriptionResource must remain backward-compatible: keep
Identifier() as-is (old resource key) and ensure deserialized payloads without
database_name or with ExtraDependencies stored under dependent_subscriptions
still work by (1) adding a fallback in Dependencies() and providerDSN() to use
r.Subscriber.Spec.DatabaseName when r.DatabaseName == "" and (2) preserving or
adding compatibility for ExtraDependencies by accepting the legacy JSON key
"dependent_subscriptions" (either via struct tag or UnmarshalJSON) so
ExtraDependencies isn't lost after reload; update SubscriptionResource (fields
DatabaseName, ExtraDependencies), Dependencies(), and providerDSN() accordingly.

In `@server/internal/orchestrator/swarm/orchestrator.go`:
- Around line 759-761: Update the error message to refer to the Instance
resource instead of a node: when the call to
resource.FromContext[*database.InstanceResource](rc,
database.InstanceResourceIdentifier(fromInstanceID)) fails, change the
fmt.Errorf message from "failed to get node resource: %w" to something like
"failed to get instance resource: %w" so logs correctly identify the missing
InstanceResource (referencing resource.FromContext, database.InstanceResource,
and InstanceResourceIdentifier to locate the call).

In `@server/internal/postgres/create_db.go`:
- Around line 563-576: StartRepairModeTxn opens a transaction with conn.Begin
but returns on subsequent setup failures without closing it; before returning
after errors from IsSpockEnabled().Scalar or EnableRepairMode().Exec, call
tx.Rollback(ctx) (and ignore or log its error) to ensure the begun transaction
is cleaned up; update the error return paths in StartRepairModeTxn to perform a
rollback on tx prior to returning the wrapped error so the connection isn't left
with an open transaction.

---

Outside diff comments:
In `@server/internal/postgres/roles.go`:
- Around line 98-111: The existence check in CreateApplicationReadOnlyRole is
querying for 'pgedge_application' but the function creates
'pgedge_application_read_only', so change the Query SQL in
CreateApplicationReadOnlyRole to check for NOT EXISTS ... WHERE rolname =
'pgedge_application_read_only' (so the ConditionalStatement's If Query matches
the actual role created) and ensure the Then Statement remains CREATE ROLE
pgedge_application_read_only WITH NOLOGIN; to avoid the creation depending on
the wrong role name.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 36c456ed-9b71-48d1-a8f2-550e9af59c31

📥 Commits

Reviewing files that changed from the base of the PR and between e47f053 and 42e1c4c.

📒 Files selected for processing (66)
  • docker/control-plane-dev/config.json
  • server/internal/api/apiv1/validate.go
  • server/internal/api/apiv1/validate_test.go
  • server/internal/database/instance_resource.go
  • server/internal/database/lag_tracker_commit_ts_resource.go
  • server/internal/database/node_resource.go
  • server/internal/database/operations/add_nodes.go
  • server/internal/database/operations/add_nodes_test.go
  • server/internal/database/operations/common.go
  • server/internal/database/operations/end.go
  • server/internal/database/operations/golden_test/TestRestoreDatabase/restore_all_nodes_in_two-node_db.json
  • server/internal/database/operations/golden_test/TestRestoreDatabase/single_node_restore.json
  • server/internal/database/operations/golden_test/TestRestoreDatabase/single_node_restore_in_two-node_db.json
  • server/internal/database/operations/golden_test/TestRestoreDatabase/single_node_restore_in_two-node_db_with_replica.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/add_an_instance_dependency.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/add_an_instance_dependency_with_forced_update.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/add_update_and_remove_node.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/adding_a_replica.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/adding_multiple_replicas_concurrent.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/adding_multiple_replicas_rolling.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/delete_two_node_database.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/forced_update.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/multiple_nodes_from_empty.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_default_source_node.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_populate.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_populate_and_replica.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/remove_node.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/remove_one_node_from_three_node_database.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/remove_two_nodes_from_three_node_database.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/single_node_from_empty.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/single_node_with_service_from_empty.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/two_nodes_to_three_nodes_with_populate.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/update_database_node_with_unchanged_service.json
  • server/internal/database/operations/helpers_test.go
  • server/internal/database/operations/populate_nodes.go
  • server/internal/database/operations/populate_nodes_test.go
  • server/internal/database/operations/restore_database.go
  • server/internal/database/operations/restore_database_test.go
  • server/internal/database/operations/update_database_test.go
  • server/internal/database/operations/update_nodes.go
  • server/internal/database/operations/update_nodes_test.go
  • server/internal/database/orchestrator.go
  • server/internal/database/postgres_database.go
  • server/internal/database/replication_slot_advance_from_cts_resource.go
  • server/internal/database/replication_slot_create_resource.go
  • server/internal/database/replication_slot_resource.go
  • server/internal/database/resources.go
  • server/internal/database/spec.go
  • server/internal/database/subscription_resource.go
  • server/internal/database/switchover_resource.go
  • server/internal/database/sync_event_resource.go
  • server/internal/database/wait_for_sync_event_resource.go
  • server/internal/monitor/instance_monitor_resource.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/swarm/service_user_role.go
  • server/internal/postgres/create_db.go
  • server/internal/postgres/dsn.go
  • server/internal/postgres/errors.go
  • server/internal/postgres/roles.go
  • server/internal/postgres/roles_test.go
  • server/internal/resource/event_test.go
  • server/internal/resource/resource.go
  • server/internal/workflows/activities/apply_event.go
  • server/internal/workflows/activities/executor.go
  • server/internal/workflows/common.go
  • server/internal/workflows/plan_restore.go
💤 Files with no reviewable changes (1)
  • server/internal/postgres/roles_test.go
✅ Files skipped from review due to trivial changes (1)
  • server/internal/database/sync_event_resource.go
🚧 Files skipped from review as they are similar to previous changes (18)
  • server/internal/database/operations/golden_test/TestUpdateDatabase/remove_one_node_from_three_node_database.json
  • server/internal/database/operations/golden_test/TestUpdateDatabase/remove_two_nodes_from_three_node_database.json
  • server/internal/database/operations/add_nodes.go
  • server/internal/workflows/activities/executor.go
  • server/internal/monitor/instance_monitor_resource.go
  • server/internal/database/operations/helpers_test.go
  • server/internal/database/operations/update_nodes.go
  • server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_populate_and_replica.json
  • server/internal/workflows/plan_restore.go
  • server/internal/database/operations/golden_test/TestUpdateDatabase/one_node_to_two_nodes_with_populate.json
  • server/internal/database/operations/end.go
  • server/internal/database/operations/golden_test/TestUpdateDatabase/forced_update.json
  • server/internal/database/operations/restore_database.go
  • server/internal/resource/resource.go
  • server/internal/database/resources.go
  • server/internal/database/operations/golden_test/TestUpdateDatabase/single_node_with_service_from_empty.json
  • server/internal/resource/event_test.go
  • server/internal/database/operations/golden_test/TestUpdateDatabase/single_node_from_empty.json

Comment on lines +773 to +774
"users[1]: usernames must be unique within a database",
"users[3]: cannot have multiple users with db_owner = true",

⚠️ Potential issue | 🟡 Minor

Expected path prefix is inconsistent with validator output.

The validator reports database_users[...], but this test asserts users[...], so these assertions will fail.

🧪 Proposed fix
-				"users[1]: usernames must be unique within a database",
-				"users[3]: cannot have multiple users with db_owner = true",
+				"database_users[1]: usernames must be unique within a database",
+				"database_users[3]: cannot have multiple users with db_owner = true",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/api/apiv1/validate_test.go` around lines 773 - 774, The
test's expected error message prefixes are wrong: replace the "users[...]"
prefixes in the assertions with "database_users[...]" to match the validator
output; update the two strings currently written as "users[1]: usernames must be
unique within a database" and "users[3]: cannot have multiple users with
db_owner = true" to "database_users[1]: usernames must be unique within a
database" and "database_users[3]: cannot have multiple users with db_owner =
true" so the assertions in validate_test.go (the expected error messages array)
match the validator's actual keys.

Comment on lines +360 to +368
if hasOwner {
err := errors.New("cannot have multiple users with db_owner = true")
errs = append(errs, newValidationError(err, userPath))
}

seenNames.Add(user.Username)

if user.DbOwner != nil && *user.DbOwner {
hasOwner = true

⚠️ Potential issue | 🟠 Major

db_owner check currently rejects valid non-owner users after the first owner.

At Line 360, the hasOwner check runs for every later user, regardless of that user’s DbOwner value. This creates false errors for valid specs.

🔧 Proposed fix
 	for i, user := range users {
 		userPath := appendPath(path, arrayIndexPath(i))
+		isOwner := user.DbOwner != nil && *user.DbOwner

 		if seenNames.Has(user.Username) {
 			err := errors.New("usernames must be unique within a database")
 			errs = append(errs, newValidationError(err, userPath))
 		}
-		if hasOwner {
+		if isOwner && hasOwner {
 			err := errors.New("cannot have multiple users with db_owner = true")
 			errs = append(errs, newValidationError(err, userPath))
 		}

 		seenNames.Add(user.Username)

-		if user.DbOwner != nil && *user.DbOwner {
+		if isOwner {
 			hasOwner = true
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/api/apiv1/validate.go` around lines 360 - 368, The code
currently checks hasOwner for every user and appends a duplicate-owner error
before verifying the current user's DbOwner flag; update the condition so the
duplicate-owner error is only raised when the current user actually has DbOwner
set to true. Concretely, modify the block that constructs the error (the one
using hasOwner, newValidationError, errs) to only execute when user.DbOwner !=
nil && *user.DbOwner && hasOwner (or move that block inside the subsequent if
that sets hasOwner), leaving seenNames.Add(user.Username) and the hasOwner =
true assignment intact.

Comment on lines +28 to +35
type PostgresDatabaseResource struct {
NodeName string `json:"node_name"`
DatabaseName string `json:"database_name"`
Owner string `json:"owner"`
RenameFrom string `json:"rename_from"`
HasRestoreConfig bool `json:"hast_restore_config"`
ExtraDependencies []resource.Identifier `json:"extra_dependencies"`
}

⚠️ Potential issue | 🟡 Minor

Fix the HasRestoreConfig JSON tag before this is persisted.

json:"hast_restore_config" bakes a typo into the resource payload schema. Since this type is new in this PR, it is worth correcting now before state snapshots and goldens start depending on the misspelled key.

Suggested fix
-	HasRestoreConfig  bool                  `json:"hast_restore_config"`
+	HasRestoreConfig  bool                  `json:"has_restore_config"`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/postgres_database.go` around lines 28 - 35, The JSON
tag for the HasRestoreConfig field in the PostgresDatabaseResource struct is
misspelled as `json:"hast_restore_config"`; update the struct definition so the
HasRestoreConfig field uses the correct tag `json:"has_restore_config"` (fix the
typo in the tag for the HasRestoreConfig field) to avoid baking the incorrect
key into persisted resource payloads and snapshots.

Comment on lines +25 to +31
DatabaseName string `json:"database_name"`
SubscriberNode string `json:"subscriber_node"`
ProviderNode string `json:"provider_node"`
Disabled bool `json:"disabled"`
SyncStructure bool `json:"sync_structure"`
SyncData bool `json:"sync_data"`
ExtraDependencies []resource.Identifier `json:"dependent_subscriptions"`
ExtraDependencies []resource.Identifier `json:"extra_dependencies"`

⚠️ Potential issue | 🟠 Major

Keep SubscriptionResource backward-compatible with existing state.

Identifier() still reuses the old resource key, so previously persisted subscriptions will be deserialized into this struct on upgrade. Those payloads will not have database_name, and they will still store ExtraDependencies under dependent_subscriptions. After reload, DatabaseName == "" and ExtraDependencies == nil, so Dependencies() targets PostgresDatabaseResourceIdentifier(..., "") and providerDSN() asks the orchestrator for an empty DB name. That can break refresh/update of live subscriptions until they are recreated. Please keep the old serialized key and add a subscriber.Spec.DatabaseName fallback until a state migration exists.

Based on learnings, the current API contract allows only one Postgres database per control plane database, and DatabaseName migrations are intentionally deferred.

Also applies to: 55-58, 145-152

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/subscription_resource.go` around lines 25 - 31,
SubscriptionResource must remain backward-compatible: keep Identifier() as-is
(old resource key) and ensure deserialized payloads without database_name or
with ExtraDependencies stored under dependent_subscriptions still work by (1)
adding a fallback in Dependencies() and providerDSN() to use
r.Subscriber.Spec.DatabaseName when r.DatabaseName == "" and (2) preserving or
adding compatibility for ExtraDependencies by accepting the legacy JSON key
"dependent_subscriptions" (either via struct tag or UnmarshalJSON) so
ExtraDependencies isn't lost after reload; update SubscriptionResource (fields
DatabaseName, ExtraDependencies), Dependencies(), and providerDSN() accordingly.

Comment on lines +759 to +761
instance, err := resource.FromContext[*database.InstanceResource](rc, database.InstanceResourceIdentifier(fromInstanceID))
if err != nil {
return nil, fmt.Errorf("failed to get node resource: %w", err)

⚠️ Potential issue | 🟡 Minor

Fix the resource name in this error.

This branch is loading an InstanceResource, so the current message points debugging at the wrong missing object.

📝 Proposed fix
 	instance, err := resource.FromContext[*database.InstanceResource](rc, database.InstanceResourceIdentifier(fromInstanceID))
 	if err != nil {
-		return nil, fmt.Errorf("failed to get node resource: %w", err)
+		return nil, fmt.Errorf("failed to get instance resource: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/swarm/orchestrator.go` around lines 759-761,
update the error message to refer to the Instance resource instead of a node:
when the call to resource.FromContext[*database.InstanceResource](rc,
database.InstanceResourceIdentifier(fromInstanceID)) fails, change the
fmt.Errorf message from "failed to get node resource: %w" to something like
"failed to get instance resource: %w" so logs correctly identify the missing
InstanceResource (referencing resource.FromContext, database.InstanceResource,
and InstanceResourceIdentifier to locate the call).

Comment on lines +563 to +576
func StartRepairModeTxn(ctx context.Context, conn *pgx.Conn) (pgx.Tx, error) {
tx, err := conn.Begin(ctx)
if err != nil {
return nil, fmt.Errorf("failed to begin transaction: %w", err)
}
enabled, err := IsSpockEnabled().Scalar(ctx, tx)
if err != nil {
return nil, fmt.Errorf("failed to check if spock is enabled: %w", err)
}
if enabled {
err = EnableRepairMode().Exec(ctx, tx)
if err != nil {
return nil, fmt.Errorf("failed to enable repair mode: %w", err)
}

⚠️ Potential issue | 🟠 Major

Roll back the tx before returning setup errors.

Once Begin succeeds on Line 564, the returns on Lines 569-575 leave that transaction open. Any caller that keeps using the connection will keep the aborted tx alive until the connection is closed. Roll it back before returning.

Suggested fix
 func StartRepairModeTxn(ctx context.Context, conn *pgx.Conn) (pgx.Tx, error) {
 	tx, err := conn.Begin(ctx)
 	if err != nil {
 		return nil, fmt.Errorf("failed to begin transaction: %w", err)
 	}
 	enabled, err := IsSpockEnabled().Scalar(ctx, tx)
 	if err != nil {
+		_ = tx.Rollback(ctx)
 		return nil, fmt.Errorf("failed to check if spock is enabled: %w", err)
 	}
 	if enabled {
 		err = EnableRepairMode().Exec(ctx, tx)
 		if err != nil {
+			_ = tx.Rollback(ctx)
 			return nil, fmt.Errorf("failed to enable repair mode: %w", err)
 		}
 	}
 	return tx, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/postgres/create_db.go` around lines 563 - 576,
StartRepairModeTxn opens a transaction with conn.Begin but returns on subsequent
setup failures without closing it; before returning after errors from
IsSpockEnabled().Scalar or EnableRepairMode().Exec, call tx.Rollback(ctx) (and
ignore or log its error) to ensure the begun transaction is cleaned up; update
the error return paths in StartRepairModeTxn to perform a rollback on tx prior
to returning the wrapped error so the connection isn't left with an open
transaction.
