diff --git a/docs/configuration/pgdog.toml/general.md b/docs/configuration/pgdog.toml/general.md
index 71484ff..d1da439 100644
--- a/docs/configuration/pgdog.toml/general.md
+++ b/docs/configuration/pgdog.toml/general.md
@@ -141,6 +141,28 @@ Delay running idle healthchecks at PgDog startup to give databases (and pools) t
 
 Default: **`5_000`** (5s)
 
+### `connection_recovery`
+
+Controls whether server connections are recovered or closed when a client disconnects abruptly.
+
+Available options:
+
+- `recover` (default)
+- `rollback_only`
+- `drop`
+
+`recover` rolls back unfinished transactions and resynchronizes connections abandoned mid-response. `rollback_only` only issues a `ROLLBACK` for unfinished transactions and closes connections that would need resynchronizing. `drop` closes connections without attempting recovery.
+
+### `client_connection_recovery`
+
+Controls whether clients are disconnected when PgDog encounters connection pool errors (e.g., a checkout timeout). Set this to `drop` if your clients are asynchronous or use pipeline mode.
+
+Available options:
+
+- `recover` (default)
+- `drop`
+
+
 ## Timeouts
 
 These settings control how long PgDog waits for maintenance tasks to complete. These timeouts make sure PgDog can recover
@@ -261,21 +283,6 @@ Enable load balancer [HTTP health checks](../../features/load-balancer/healthche
 
 Default: **none** (disabled)
 
-## Service discovery
-
-### `broadcast_address`
-
-Send multicast packets to this address on the local network. Configuring this setting enables
-mutual service discovery. Instances of PgDog running on the same network will be able to see
-each other.
-
-Default: **none** (disabled)
-
-### `broadcast_port`
-
-The port used for sending and receiving broadcast messages.
-
-Default: **`6433`**
 
 ## Monitoring
 
@@ -410,11 +417,41 @@ Available options:
 
 Default: **`auto`**
 
-### `system_catalogs_omnisharded`
+### `system_catalogs`
 
-Enables sticky routing for system catalog tables and treats them as [omnisharded](../../features/sharding/omnishards.md) tables. This makes tools like `psql` work out of the box.
+Changes how the query router treats system catalog tables (like `pg_database`, `pg_class`, etc.). By default, they are assumed to be identical on all shards, and queries referencing them are sent to the same randomly chosen shard for the lifetime of each client connection. This makes tools like `psql` work out of the box.
 
-Default: **`true`** (enabled)
+Available options:
+
+- `omnisharded`
+- `omnisharded_sticky` (default)
+- `sharded`
+
+Default: **`omnisharded_sticky`**
+
+### `omnisharded_sticky`
+
+If enabled, queries touching [omnisharded](../../features/sharding/omnishards.md) tables are always sent to the same shard for a given client connection. The shard is chosen at random when the connection is created.
+
+Default: **`false`**
+
+### `resharding_copy_format`
+
+The format used for `COPY` statements during [resharding](../../features/sharding/resharding/index.md).
+
+Available options:
+
+- `binary` (default)
+- `text`
+
+The `text` format is required when migrating from `INTEGER` to `BIGINT` primary keys during resharding.
+
+### `reload_schema_on_ddl`
+
+!!! warning
+    This setting is intended for local development, CI, and single-node PgDog deployments.
+
+Automatically reloads the schema cache PgDog uses to route queries when it detects DDL statements (e.g., `CREATE TABLE`, `ALTER TABLE`).
 
 ## Logging
 
diff --git a/docs/features/connection-recovery.md b/docs/features/connection-recovery.md
new file mode 100644
index 0000000..df235bf
--- /dev/null
+++ b/docs/features/connection-recovery.md
@@ -0,0 +1,106 @@
+---
+icon: material/connection
+---
+
+# Connection recovery
+
+PostgreSQL database connections are expensive to create, so PgDog does its best not to close them unless absolutely necessary. If a client disconnects before it has fully received a query response, PgDog attempts to preserve the server connection using several recovery steps.
+
+## Abandoned transactions
+
+If a client disconnects abruptly while inside a transaction, the transaction is considered abandoned, and PgDog automatically executes a `ROLLBACK` so that none of its changes are persisted in the database.
+
+This commonly happens when a bug causes the application to crash while executing multiple statements inside an explicitly started transaction, for example:
+
+=== "Rails"
+    ```ruby
+    ActiveRecord::Base.transaction do
+      user = User.find(5)
+      # crash happens here.
+    end
+    ```
+=== "SQLAlchemy"
+    ```python
+    with session.begin():
+        user = session.get(User, 5)
+        # crash happens here.
+    ```
+=== "Go"
+    ```go
+    tx, _ := db.Begin()
+    row := tx.QueryRow("SELECT * FROM users WHERE id = $1", 5)
+    // crash happens here.
+    ```
+
+### Connection storms
+
+By preserving connections, PgDog protects the database against connection storms. Other connection poolers, like PgBouncer, close server connections without attempting any recovery.
+
+When the application restarts, such a pooler must recreate all of these connections at once, causing thousands of server connections to be opened and closed in rapid succession. This leads to unnecessary contention on database resources and can cause CPU usage on the database to spike to 100%.
+
+## Abandoned queries
+
+A client can disconnect abruptly while receiving query response data from the server. This can happen due to out-of-memory errors or hardware failures, for example:
+
+=== "Rails"
+    ```ruby
+    orders = Order.where(user_id: 5).to_a
+    # ^ crash happens inside `pg`,
+    # while receiving multiple rows
+    ```
+=== "SQLAlchemy"
+    ```python
+    orders = session.execute(
+        select(Order).where(Order.user_id == 5)
+    ).all()
+    # ^ crash happens while receiving multiple rows
+    ```
+=== "Go"
+    ```go
+    rows, _ := db.Query("SELECT * FROM orders WHERE user_id = $1", 5)
+    for rows.Next() {
+        // crash happens here while iterating over rows
+    }
+    ```
+
+PgDog detects this and drains the server connection, restoring it to a normal state before returning it to the connection pool. Draining works by receiving and discarding the remaining `DataRow` messages and sending [`Sync`](https://www.postgresql.org/docs/current/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-SYNC) to the server to resynchronize the extended protocol state.
+
+Just like [abandoned transactions](#abandoned-transactions), this protects PostgreSQL databases from connection storms caused by unreliable clients. If the client was inside a transaction, the transaction is rolled back as well.
+
+### Configuration
+
+Connection recovery is enabled by default. You can change its behavior with the `connection_recovery` setting:
+
+```toml
+[general]
+connection_recovery = "recover"
+```
+
+| Configuration value | Description |
+|-|-|
+| `recover` | Attempt full connection recovery, including rollback and resynchronization. This is the default. |
+| `rollback_only` | Roll back abandoned transactions, but close the connection if a query was abandoned mid-response. |
+| `drop` | Disable connection recovery and close the server connection (same behavior as PgBouncer). |
+
+To make sure abandoned server connections don't block normal operations, recovery is bounded by a configurable timeout, in milliseconds. If recovery doesn't complete in time, the connection is closed:
+
+```toml
+[general]
+rollback_timeout = 5_000
+```
+
+## Client connections
+
+Just like server connections, PgDog can keep client connections (application to PgDog) open during incidents. This helps preserve application-side connection pools and avoids re-creating thousands of connections unnecessarily.
+
+This is enabled by default, but some applications don't behave well when their queries return errors instead of results, so it can be disabled:
+
+```toml
+[general]
+client_connection_recovery = "drop"
+```
+
+| Configuration value | Description |
+|-|-|
+| `recover` | Keep client connections open after database-related errors, like `checkout timeout`. |
+| `drop` | Disable connection recovery and close the client connection (same behavior as PgBouncer). |
diff --git a/docs/features/sharding/omnishards.md b/docs/features/sharding/omnishards.md
index 0189283..8679005 100644
--- a/docs/features/sharding/omnishards.md
+++ b/docs/features/sharding/omnishards.md
@@ -106,11 +106,11 @@ tables = [
 ]
 ```
 
-This is configurable with the `system_catalogs_omnisharded` setting in [`pgdog.toml`](../../configuration/pgdog.toml/general.md#system_catalogs_omnisharded):
+This is configurable with the `system_catalogs` setting in [`pgdog.toml`](../../configuration/pgdog.toml/general.md#system_catalogs):
 
 ```toml
 [general]
-system_catalogs_omnisharded = true
+system_catalogs = "omnisharded_sticky"
 ```
 
-If enabled (it is by default), commands like `\d`, `\d+` and others sent from `psql` will start to return correct results.
+With the default value (`omnisharded_sticky`), commands like `\d`, `\d+` and others sent from `psql` return correct results.
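The recovery decisions described in `docs/features/connection-recovery.md` above can be modeled as a small decision function. This is a hypothetical sketch, not PgDog's implementation: `Mode`, `recover()`, and the `pending` list of `(message type, payload)` pairs are illustrative names. It assumes the extended query protocol, treats the server's remaining messages as already available, and does not model `rollback_timeout`. The message type bytes (`D` for `DataRow`, `C` for `CommandComplete`, `Z` for `ReadyForQuery`) and the `ReadyForQuery` transaction status codes come from the PostgreSQL wire protocol.

```python
from enum import Enum
from typing import Iterable, List, Optional, Tuple

# Message type byte of ReadyForQuery in the PostgreSQL wire protocol.
READY_FOR_QUERY = "Z"
# ReadyForQuery status codes meaning a transaction is still open:
# 'T' (in a transaction) and 'E' (in a failed transaction); 'I' is idle.
OPEN_TRANSACTION = {"T", "E"}


class Mode(Enum):
    """Values accepted by the `connection_recovery` setting."""
    RECOVER = "recover"
    ROLLBACK_ONLY = "rollback_only"
    DROP = "drop"


def recover(mode: Mode, mid_response: bool, in_transaction: bool,
            pending: Iterable[Tuple[str, str]]) -> List[str]:
    """Hypothetical model of what happens to a server connection
    after its client disconnects abruptly.

    `pending` stands in for the messages still to be read from the
    server, as (type byte, payload) pairs, e.g. ("D", row) for DataRow
    or ("Z", status) for ReadyForQuery.
    """
    if mode is Mode.DROP:
        return ["close"]

    actions: List[str] = []
    if mid_response:
        if mode is Mode.ROLLBACK_ONLY:
            # rollback_only never resynchronizes an abandoned response.
            return ["close"]
        # Resynchronize the extended protocol, then discard everything
        # (DataRow, CommandComplete, ...) until ReadyForQuery arrives.
        actions.append("send Sync")
        status: Optional[str] = None
        for kind, payload in pending:
            if kind == READY_FOR_QUERY:
                status = payload
                break
            actions.append(f"discard {kind}")
        if status is None:
            # No ReadyForQuery: the connection can't be reused.
            actions.append("close")
            return actions
        # Trust the server's view of the transaction state.
        in_transaction = status in OPEN_TRANSACTION

    if in_transaction:
        actions.append("ROLLBACK")
    actions.append("return to pool")
    return actions


if __name__ == "__main__":
    # The client vanished halfway through a multi-row SELECT inside BEGIN.
    print(recover(Mode.RECOVER, True, False,
                  [("D", "row 3"), ("D", "row 4"),
                   ("C", "SELECT 4"), ("Z", "T")]))
```

Running it prints `['send Sync', 'discard D', 'discard D', 'discard C', 'ROLLBACK', 'return to pool']`. In this model, the status reported by `ReadyForQuery` decides whether a `ROLLBACK` is needed after draining. In the `rollback_only` mid-response branch, closing is sufficient because PostgreSQL aborts any open transaction when a connection closes.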