Skip to content

Conversation

@coderdan
Copy link
Contributor

@coderdan coderdan commented Feb 12, 2026

Summary

  • Adds an ARCHITECTURE.md document describing the internal design of CipherStash Proxy for developers and anyone browsing the repo.
  • Covers the SQL type inference engine (unification, trait bounds, associated types), the transformation pipeline, the PostgreSQL protocol bridge, encryption/key management, and schema management.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive Architecture guide describing the proxy's design, components, query lifecycle, encryption/key management, protocol handling, and operational behaviors.
    • Added links to the new Architecture guide in the README's "More info" sections for easier discovery.

Explains how the system works for developers and casual readers:
the SQL type inference engine, unification algorithm, transformation
pipeline, PostgreSQL protocol bridge, and schema management.
Copilot AI review requested due to automatic review settings February 12, 2026 09:25
@coderabbitai
Copy link

coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

Walkthrough

Adds a new ARCHITECTURE.md describing the CipherStash Proxy internals and inserts links to it from README.md. The architecture doc details proxy positioning, eql-mapper and protocol-bridge subsystems, query lifecycle, type inference/unification, transformation rules, encryption/key management, authentication bridging, schema handling, and package layout.

Changes

Cohort / File(s) Summary
Architecture doc
ARCHITECTURE.md
New comprehensive architecture documentation covering proxy positioning, core subsystems (eql-mapper, protocol bridge), query lifecycle, EQL type system and unification, AST transformation rules, batch decryption, authentication bridging, ZeroKMS/key caching, schema discovery/reload strategies, and package structure.
Readme updates
README.md
Two insertions adding links to the new Architecture guide in the "More info" sections. No code changes.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 I hopped through pages, found the plan so neat,

Blueprints of the proxy — every path complete,
Types, keys, and bridges in tidy rows,
Rules that tell the AST where it goes,
I nibble docs with joy — architecture sweet. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add ARCHITECTURE.md' directly and clearly describes the main change: the addition of a new architecture documentation file.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch docs/architecture-md

No actionable comments were generated in the recent review. 🎉


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@ARCHITECTURE.md`:
- Around line 205-229: The fenced package tree in ARCHITECTURE.md is missing a
language identifier causing markdownlint warnings; update the opening code fence
for the package tree (the triple-backtick block that contains the directory
listing for packages/ and entries like cipherstash-proxy/, eql-mapper/,
eql-mapper-macros/, showcase/) to include a language (use "text") so the block
starts with ```text instead of ```; no other content changes are needed.
- Around line 16-33: The fenced ASCII diagram block in ARCHITECTURE.md is
missing a language identifier (causing markdownlint MD040); update the opening
fence for the diagram (the triple-backtick block containing the "Application
CipherStash Proxy PostgreSQL" ASCII flow) to include a language token such as
text (i.e., change ``` to ```text) so the diagram is treated as plain text and
the linter warning is resolved.

Comment on lines +16 to +33
```
Application CipherStash Proxy PostgreSQL
| | |
|--- SQL statement ------------->| |
| Parse SQL into AST |
| Import schema (tables, columns, EQL types) |
| Run type inference (unification) |
| Identify encrypted literals & parameters |
| Encrypt values via ZeroKMS |
| Apply transformation rules to AST |
| Emit rewritten SQL |
| |--- transformed SQL ------------------>|
| |<-- result rows ----------------------|
| Identify encrypted columns in results |
| Batch-decrypt values via ZeroKMS |
| Re-encode to PostgreSQL wire format |
|<-- plaintext results ----------| |
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language identifier to the code fence.

Markdownlint flags the diagram block as missing a language specifier. Consider text (or plain) for the ASCII diagram to satisfy MD040.

Suggested fix
-```
+```text
 Application                    CipherStash Proxy                         PostgreSQL
     |                                |                                        |
     |--- SQL statement ------------->|                                        |
     |                          Parse SQL into AST                             |
     |                          Import schema (tables, columns, EQL types)     |
     |                          Run type inference (unification)               |
     |                          Identify encrypted literals & parameters       |
     |                          Encrypt values via ZeroKMS                     |
     |                          Apply transformation rules to AST             |
     |                          Emit rewritten SQL                             |
     |                                |--- transformed SQL ------------------>|
     |                                |<-- result rows ----------------------|
     |                          Identify encrypted columns in results          |
     |                          Batch-decrypt values via ZeroKMS              |
     |                          Re-encode to PostgreSQL wire format            |
     |<-- plaintext results ----------|                                        |
-```
+```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
Application CipherStash Proxy PostgreSQL
| | |
|--- SQL statement ------------->| |
| Parse SQL into AST |
| Import schema (tables, columns, EQL types) |
| Run type inference (unification) |
| Identify encrypted literals & parameters |
| Encrypt values via ZeroKMS |
| Apply transformation rules to AST |
| Emit rewritten SQL |
| |--- transformed SQL ------------------>|
| |<-- result rows ----------------------|
| Identify encrypted columns in results |
| Batch-decrypt values via ZeroKMS |
| Re-encode to PostgreSQL wire format |
|<-- plaintext results ----------| |
```
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 16-16: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@ARCHITECTURE.md` around lines 16 - 33, The fenced ASCII diagram block in
ARCHITECTURE.md is missing a language identifier (causing markdownlint MD040);
update the opening fence for the diagram (the triple-backtick block containing
the "Application CipherStash Proxy PostgreSQL" ASCII flow) to include a language
token such as text (i.e., change ``` to ```text) so the diagram is treated as
plain text and the linter warning is resolved.

Comment on lines +205 to +229
```
packages/
├── cipherstash-proxy/ # Main proxy binary
│ └── src/
│ ├── postgresql/ # Wire protocol implementation
│ │ ├── frontend.rs # Client → Server message handling
│ │ ├── backend.rs # Server → Client message handling
│ │ ├── handler.rs # Connection startup and auth
│ │ ├── protocol.rs # Low-level message reading
│ │ ├── parser.rs # SQL parsing entry point
│ │ └── context/ # Session state (statements, portals, metadata)
│ ├── proxy/ # Encryption service, schema management, config
│ └── config/ # Configuration parsing
├── eql-mapper/ # SQL type inference and transformation
│ └── src/
│ ├── inference/ # Type inference engine
│ │ ├── unifier/ # Unification algorithm, type definitions, trait bounds
│ │ ├── sql_types/ # Operator and function type signatures
│ │ └── infer_type_impls/# Per-AST-node type inference implementations
│ ├── transformation_rules/# AST rewriting rules
│ ├── model/ # Schema, tables, columns, DDL tracking
│ └── scope_tracker.rs # Lexical scope management
├── eql-mapper-macros/ # Proc macros for operator/function declarations
└── showcase/ # Example healthcare data model
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language identifier to the code fence.

Markdownlint flags the package tree block as missing a language specifier. Consider text for the tree.

Suggested fix
-```
+```text
 packages/
 ├── cipherstash-proxy/           # Main proxy binary
 │   └── src/
 │       ├── postgresql/          # Wire protocol implementation
 │       │   ├── frontend.rs      # Client → Server message handling
 │       │   ├── backend.rs       # Server → Client message handling
 │       │   ├── handler.rs       # Connection startup and auth
 │       │   ├── protocol.rs      # Low-level message reading
 │       │   ├── parser.rs        # SQL parsing entry point
 │       │   └── context/         # Session state (statements, portals, metadata)
 │       ├── proxy/               # Encryption service, schema management, config
 │       └── config/              # Configuration parsing
 ├── eql-mapper/                  # SQL type inference and transformation
 │   └── src/
 │       ├── inference/           # Type inference engine
 │       │   ├── unifier/         # Unification algorithm, type definitions, trait bounds
 │       │   ├── sql_types/       # Operator and function type signatures
 │       │   └── infer_type_impls/# Per-AST-node type inference implementations
 │       ├── transformation_rules/# AST rewriting rules
 │       ├── model/               # Schema, tables, columns, DDL tracking
 │       └── scope_tracker.rs     # Lexical scope management
 ├── eql-mapper-macros/           # Proc macros for operator/function declarations
 └── showcase/                    # Example healthcare data model
-```
+```
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 205-205: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@ARCHITECTURE.md` around lines 205 - 229, The fenced package tree in
ARCHITECTURE.md is missing a language identifier causing markdownlint warnings;
update the opening code fence for the package tree (the triple-backtick block
that contains the directory listing for packages/ and entries like
cipherstash-proxy/, eql-mapper/, eql-mapper-macros/, showcase/) to include a
language (use "text") so the block starts with ```text instead of ```; no other
content changes are needed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new ARCHITECTURE.md document to describe CipherStash Proxy’s internal design for developers and repo readers, covering the query flow, eql-mapper inference/transformation, protocol bridging, encryption/key management, and schema/config management.

Changes:

  • Introduces a comprehensive architecture write-up for CipherStash Proxy.
  • Documents the SQL type inference + transformation pipeline concepts used by eql-mapper.
  • Describes the PostgreSQL wire-protocol bridge and encryption/config reload behaviors at a high level.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


## Schema Management

The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns, then checks `eql_v2_configuration` to determine which columns are encrypted and what index types they support.
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema-loading description is inaccurate: the proxy’s schema loader marks encrypted columns by checking information_schema.columns.udt_name == 'eql_v2_encrypted' (see select_table_schemas.sql / SchemaManager::load_schema). It does not consult eql_v2_configuration for this, and index support comes from the separate Encrypt configuration loaded from public.eql_v2_configuration (via select_config.sql). Please update this paragraph to reflect the actual split between schema discovery vs encrypt-config loading.

Suggested change
The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns, then checks `eql_v2_configuration` to determine which columns are encrypted and what index types they support.
The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns and marks encrypted columns by checking `information_schema.columns.udt_name = 'eql_v2_encrypted'` (via `select_table_schemas.sql` / `SchemaManager::load_schema`). Separately, it loads the Encrypt configuration from `public.eql_v2_configuration` (via `select_config.sql`) to determine index types and other search capabilities for those encrypted columns.

Copilot uses AI. Check for mistakes.
Comment on lines +199 to +201
1. **Startup** — load schema with exponential backoff retry (up to 10 attempts, max 2-second backoff) to handle cases where PostgreSQL isn't ready yet
2. **Periodic** — a background task reloads schema on a configurable interval
3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reload-cycle bullet for Startup says schema is loaded with exponential backoff retry. In the current implementation, startup loads schema/config without retries (SchemaManager::init calls load_schema, and EncryptConfigManager::init_reloader calls load_encrypt_config); retries with backoff only happen during reloads (load_schema_with_retry / load_encrypt_config_with_retry). Please adjust the bullet list so it matches runtime behavior.

Suggested change
1. **Startup** — load schema with exponential backoff retry (up to 10 attempts, max 2-second backoff) to handle cases where PostgreSQL isn't ready yet
2. **Periodic** — a background task reloads schema on a configurable interval
3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes
1. **Startup** — load schema and encryption configuration once; if this fails, proxy startup fails (no retry/backoff is performed during startup)
2. **Periodic** — a background task reloads schema and encryption configuration on a configurable interval, using exponential backoff retry on failure (up to 10 attempts, max 2-second backoff)
3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes, also using the same exponential backoff retry behavior on failure

Copilot uses AI. Check for mistakes.

This process propagates type information across the entire statement — through subqueries, CTEs, JOINs, `UNION` branches, function calls, and `RETURNING` clauses.

A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. This means the system automatically infers the minimum encryption payload needed for each value.
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section claims the system infers the minimum encryption payload needed via Partial bounds. While eql-mapper does infer EqlTerm::Partial(_, bounds), the proxy currently discards those bounds when mapping/encrypting (it uses EqlTermVariant and treats Full/Partial/Tokenized the same as EqlOperation::Store). Please either clarify that the payload-minimization is an eql-mapper internal concept not currently used by the proxy encryption path, or update the description to match the current behavior.

Suggested change
A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. This means the system automatically infers the minimum encryption payload needed for each value.
A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. Within `eql-mapper`, this allows the type system to infer the minimum bounds required for each value; however, the current proxy encryption path does not yet use these `Partial` bounds and treats `Full`/`Partial`/`Tokenized` the same when deciding what to encrypt.

Copilot uses AI. Check for mistakes.
Comment on lines +181 to +187
| `Full` | `EqlOperation::Store` | Inserting a new encrypted value with all search terms |
| `Partial(Eq)` | `EqlOperation::Store` | Equality query — only equality search terms needed |
| `Partial(Ord)` | `EqlOperation::Store` | Comparison query — only ORE search terms needed |
| `Tokenized` | `EqlOperation::Store` | LIKE query — tokenized search terms |
| `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument |
| `JsonAccessor` | `EqlOperation::Query` with field selector | JSON field access argument |

Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EQL operation routing table suggests Partial(Eq) vs Partial(Ord) map to different storage behavior and that JsonAccessor uses a distinct “field selector” query op. In the current proxy encryption service, Full/Partial/Tokenized all use EqlOperation::Store, and both JsonPath and JsonAccessor currently use EqlOperation::Query(..., QueryOp::SteVecSelector) when a SteVec index exists. Please align this table with the implemented routing logic (or note where behavior differs by design).

Suggested change
| `Full` | `EqlOperation::Store` | Inserting a new encrypted value with all search terms |
| `Partial(Eq)` | `EqlOperation::Store` | Equality query — only equality search terms needed |
| `Partial(Ord)` | `EqlOperation::Store` | Comparison query — only ORE search terms needed |
| `Tokenized` | `EqlOperation::Store` | LIKE query — tokenized search terms |
| `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument |
| `JsonAccessor` | `EqlOperation::Query` with field selector | JSON field access argument |
| `Full` | `EqlOperation::Store` | Insert/update with all configured search terms materialised |
| `Partial(Eq)` | `EqlOperation::Store` | Equality-oriented operations — only equality search terms are constructed |
| `Partial(Ord)` | `EqlOperation::Store` | Ordering/comparison operations — only ORE search terms are constructed |
| `Tokenized` | `EqlOperation::Store` | Pattern/LIKE-style operations — tokenized search terms are constructed |
| `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument (uses SteVec index when available) |
| `JsonAccessor` | `EqlOperation::Query` with `SteVecSelector` | JSON field access argument (same SteVec selector as `JsonPath` in current implementation) |
**Note:** In the current proxy implementation, all of `Full`, `Partial(Eq)`, `Partial(Ord)`, and `Tokenized` are routed to `EqlOperation::Store`; the `Partial`/`Tokenized` variants only affect which search terms are built inside the payload. Likewise, both `JsonPath` and `JsonAccessor` use `EqlOperation::Query(..., QueryOp::SteVecSelector)` when a SteVec index exists, even though the mapper distinguishes them conceptually.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant