-
Notifications
You must be signed in to change notification settings - Fork 1
Add ARCHITECTURE.md #374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add ARCHITECTURE.md #374
Conversation
Explains how the system works for developers and casual readers: the SQL type inference engine, unification algorithm, transformation pipeline, PostgreSQL protocol bridge, and schema management.
📝 WalkthroughWalkthroughAdds a new ARCHITECTURE.md describing the CipherStash Proxy internals and inserts links to it from README.md. The architecture doc details proxy positioning, eql-mapper and protocol-bridge subsystems, query lifecycle, type inference/unification, transformation rules, encryption/key management, authentication bridging, schema handling, and package layout. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@ARCHITECTURE.md`:
- Around line 205-229: The fenced package tree in ARCHITECTURE.md is missing a
language identifier causing markdownlint warnings; update the opening code fence
for the package tree (the triple-backtick block that contains the directory
listing for packages/ and entries like cipherstash-proxy/, eql-mapper/,
eql-mapper-macros/, showcase/) to include a language (use "text") so the block
starts with ```text instead of ```; no other content changes are needed.
- Around line 16-33: The fenced ASCII diagram block in ARCHITECTURE.md is
missing a language identifier (causing markdownlint MD040); update the opening
fence for the diagram (the triple-backtick block containing the "Application
CipherStash Proxy PostgreSQL" ASCII flow) to include a language token such as
text (i.e., change ``` to ```text) so the diagram is treated as plain text and
the linter warning is resolved.
| ``` | ||
| Application CipherStash Proxy PostgreSQL | ||
| | | | | ||
| |--- SQL statement ------------->| | | ||
| | Parse SQL into AST | | ||
| | Import schema (tables, columns, EQL types) | | ||
| | Run type inference (unification) | | ||
| | Identify encrypted literals & parameters | | ||
| | Encrypt values via ZeroKMS | | ||
| | Apply transformation rules to AST | | ||
| | Emit rewritten SQL | | ||
| | |--- transformed SQL ------------------>| | ||
| | |<-- result rows ----------------------| | ||
| | Identify encrypted columns in results | | ||
| | Batch-decrypt values via ZeroKMS | | ||
| | Re-encode to PostgreSQL wire format | | ||
| |<-- plaintext results ----------| | | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a language identifier to the code fence.
Markdownlint flags the diagram block as missing a language specifier. Consider text (or plain) for the ASCII diagram to satisfy MD040.
Suggested fix
-```
+```text
Application CipherStash Proxy PostgreSQL
| | |
|--- SQL statement ------------->| |
| Parse SQL into AST |
| Import schema (tables, columns, EQL types) |
| Run type inference (unification) |
| Identify encrypted literals & parameters |
| Encrypt values via ZeroKMS |
| Apply transformation rules to AST |
| Emit rewritten SQL |
| |--- transformed SQL ------------------>|
| |<-- result rows ----------------------|
| Identify encrypted columns in results |
| Batch-decrypt values via ZeroKMS |
| Re-encode to PostgreSQL wire format |
|<-- plaintext results ----------| |
-```
+```📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ``` | |
| Application CipherStash Proxy PostgreSQL | |
| | | | | |
| |--- SQL statement ------------->| | | |
| | Parse SQL into AST | | |
| | Import schema (tables, columns, EQL types) | | |
| | Run type inference (unification) | | |
| | Identify encrypted literals & parameters | | |
| | Encrypt values via ZeroKMS | | |
| | Apply transformation rules to AST | | |
| | Emit rewritten SQL | | |
| | |--- transformed SQL ------------------>| | |
| | |<-- result rows ----------------------| | |
| | Identify encrypted columns in results | | |
| | Batch-decrypt values via ZeroKMS | | |
| | Re-encode to PostgreSQL wire format | | |
| |<-- plaintext results ----------| | | |
| ``` |
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@ARCHITECTURE.md` around lines 16 - 33, The fenced ASCII diagram block in
ARCHITECTURE.md is missing a language identifier (causing markdownlint MD040);
update the opening fence for the diagram (the triple-backtick block containing
the "Application CipherStash Proxy PostgreSQL" ASCII flow) to include a language
token such as text (i.e., change ``` to ```text) so the diagram is treated as
plain text and the linter warning is resolved.
| ``` | ||
| packages/ | ||
| ├── cipherstash-proxy/ # Main proxy binary | ||
| │ └── src/ | ||
| │ ├── postgresql/ # Wire protocol implementation | ||
| │ │ ├── frontend.rs # Client → Server message handling | ||
| │ │ ├── backend.rs # Server → Client message handling | ||
| │ │ ├── handler.rs # Connection startup and auth | ||
| │ │ ├── protocol.rs # Low-level message reading | ||
| │ │ ├── parser.rs # SQL parsing entry point | ||
| │ │ └── context/ # Session state (statements, portals, metadata) | ||
| │ ├── proxy/ # Encryption service, schema management, config | ||
| │ └── config/ # Configuration parsing | ||
| ├── eql-mapper/ # SQL type inference and transformation | ||
| │ └── src/ | ||
| │ ├── inference/ # Type inference engine | ||
| │ │ ├── unifier/ # Unification algorithm, type definitions, trait bounds | ||
| │ │ ├── sql_types/ # Operator and function type signatures | ||
| │ │ └── infer_type_impls/# Per-AST-node type inference implementations | ||
| │ ├── transformation_rules/# AST rewriting rules | ||
| │ ├── model/ # Schema, tables, columns, DDL tracking | ||
| │ └── scope_tracker.rs # Lexical scope management | ||
| ├── eql-mapper-macros/ # Proc macros for operator/function declarations | ||
| └── showcase/ # Example healthcare data model | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a language identifier to the code fence.
Markdownlint flags the package tree block as missing a language specifier. Consider text for the tree.
Suggested fix
-```
+```text
packages/
├── cipherstash-proxy/ # Main proxy binary
│ └── src/
│ ├── postgresql/ # Wire protocol implementation
│ │ ├── frontend.rs # Client → Server message handling
│ │ ├── backend.rs # Server → Client message handling
│ │ ├── handler.rs # Connection startup and auth
│ │ ├── protocol.rs # Low-level message reading
│ │ ├── parser.rs # SQL parsing entry point
│ │ └── context/ # Session state (statements, portals, metadata)
│ ├── proxy/ # Encryption service, schema management, config
│ └── config/ # Configuration parsing
├── eql-mapper/ # SQL type inference and transformation
│ └── src/
│ ├── inference/ # Type inference engine
│ │ ├── unifier/ # Unification algorithm, type definitions, trait bounds
│ │ ├── sql_types/ # Operator and function type signatures
│ │ └── infer_type_impls/# Per-AST-node type inference implementations
│ ├── transformation_rules/# AST rewriting rules
│ ├── model/ # Schema, tables, columns, DDL tracking
│ └── scope_tracker.rs # Lexical scope management
├── eql-mapper-macros/ # Proc macros for operator/function declarations
└── showcase/ # Example healthcare data model
-```
+```🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 205-205: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@ARCHITECTURE.md` around lines 205 - 229, The fenced package tree in
ARCHITECTURE.md is missing a language identifier causing markdownlint warnings;
update the opening code fence for the package tree (the triple-backtick block
that contains the directory listing for packages/ and entries like
cipherstash-proxy/, eql-mapper/, eql-mapper-macros/, showcase/) to include a
language (use "text") so the block starts with ```text instead of ```; no other
content changes are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds a new ARCHITECTURE.md document to describe CipherStash Proxy’s internal design for developers and repo readers, covering the query flow, eql-mapper inference/transformation, protocol bridging, encryption/key management, and schema/config management.
Changes:
- Introduces a comprehensive architecture write-up for CipherStash Proxy.
- Documents the SQL type inference + transformation pipeline concepts used by
eql-mapper. - Describes the PostgreSQL wire-protocol bridge and encryption/config reload behaviors at a high level.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| ## Schema Management | ||
|
|
||
| The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns, then checks `eql_v2_configuration` to determine which columns are encrypted and what index types they support. |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema-loading description is inaccurate: the proxy’s schema loader marks encrypted columns by checking information_schema.columns.udt_name == 'eql_v2_encrypted' (see select_table_schemas.sql / SchemaManager::load_schema). It does not consult eql_v2_configuration for this, and index support comes from the separate Encrypt configuration loaded from public.eql_v2_configuration (via select_config.sql). Please update this paragraph to reflect the actual split between schema discovery vs encrypt-config loading.
| The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns, then checks `eql_v2_configuration` to determine which columns are encrypted and what index types they support. | |
| The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns and marks encrypted columns by checking `information_schema.columns.udt_name = 'eql_v2_encrypted'` (via `select_table_schemas.sql` / `SchemaManager::load_schema`). Separately, it loads the Encrypt configuration from `public.eql_v2_configuration` (via `select_config.sql`) to determine index types and other search capabilities for those encrypted columns. |
| 1. **Startup** — load schema with exponential backoff retry (up to 10 attempts, max 2-second backoff) to handle cases where PostgreSQL isn't ready yet | ||
| 2. **Periodic** — a background task reloads schema on a configurable interval | ||
| 3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reload-cycle bullet for Startup says schema is loaded with exponential backoff retry. In the current implementation, startup loads schema/config without retries (SchemaManager::init calls load_schema, and EncryptConfigManager::init_reloader calls load_encrypt_config); retries with backoff only happen during reloads (load_schema_with_retry / load_encrypt_config_with_retry). Please adjust the bullet list so it matches runtime behavior.
| 1. **Startup** — load schema with exponential backoff retry (up to 10 attempts, max 2-second backoff) to handle cases where PostgreSQL isn't ready yet | |
| 2. **Periodic** — a background task reloads schema on a configurable interval | |
| 3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes | |
| 1. **Startup** — load schema and encryption configuration once; if this fails, proxy startup fails (no retry/backoff is performed during startup) | |
| 2. **Periodic** — a background task reloads schema and encryption configuration on a configurable interval, using exponential backoff retry on failure (up to 10 attempts, max 2-second backoff) | |
| 3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes, also using the same exponential backoff retry behavior on failure |
|
|
||
| This process propagates type information across the entire statement — through subqueries, CTEs, JOINs, `UNION` branches, function calls, and `RETURNING` clauses. | ||
|
|
||
| A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. This means the system automatically infers the minimum encryption payload needed for each value. |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section claims the system infers the minimum encryption payload needed via Partial bounds. While eql-mapper does infer EqlTerm::Partial(_, bounds), the proxy currently discards those bounds when mapping/encrypting (it uses EqlTermVariant and treats Full/Partial/Tokenized the same as EqlOperation::Store). Please either clarify that the payload-minimization is an eql-mapper internal concept not currently used by the proxy encryption path, or update the description to match the current behavior.
| A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. This means the system automatically infers the minimum encryption payload needed for each value. | |
| A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. Within `eql-mapper`, this allows the type system to infer the minimum bounds required for each value; however, the current proxy encryption path does not yet use these `Partial` bounds and treats `Full`/`Partial`/`Tokenized` the same when deciding what to encrypt. |
| | `Full` | `EqlOperation::Store` | Inserting a new encrypted value with all search terms | | ||
| | `Partial(Eq)` | `EqlOperation::Store` | Equality query — only equality search terms needed | | ||
| | `Partial(Ord)` | `EqlOperation::Store` | Comparison query — only ORE search terms needed | | ||
| | `Tokenized` | `EqlOperation::Store` | LIKE query — tokenized search terms | | ||
| | `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument | | ||
| | `JsonAccessor` | `EqlOperation::Query` with field selector | JSON field access argument | | ||
|
|
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EQL operation routing table suggests Partial(Eq) vs Partial(Ord) map to different storage behavior and that JsonAccessor uses a distinct “field selector” query op. In the current proxy encryption service, Full/Partial/Tokenized all use EqlOperation::Store, and both JsonPath and JsonAccessor currently use EqlOperation::Query(..., QueryOp::SteVecSelector) when a SteVec index exists. Please align this table with the implemented routing logic (or note where behavior differs by design).
| | `Full` | `EqlOperation::Store` | Inserting a new encrypted value with all search terms | | |
| | `Partial(Eq)` | `EqlOperation::Store` | Equality query — only equality search terms needed | | |
| | `Partial(Ord)` | `EqlOperation::Store` | Comparison query — only ORE search terms needed | | |
| | `Tokenized` | `EqlOperation::Store` | LIKE query — tokenized search terms | | |
| | `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument | | |
| | `JsonAccessor` | `EqlOperation::Query` with field selector | JSON field access argument | | |
| | `Full` | `EqlOperation::Store` | Insert/update with all configured search terms materialised | | |
| | `Partial(Eq)` | `EqlOperation::Store` | Equality-oriented operations — only equality search terms are constructed | | |
| | `Partial(Ord)` | `EqlOperation::Store` | Ordering/comparison operations — only ORE search terms are constructed | | |
| | `Tokenized` | `EqlOperation::Store` | Pattern/LIKE-style operations — tokenized search terms are constructed | | |
| | `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument (uses SteVec index when available) | | |
| | `JsonAccessor` | `EqlOperation::Query` with `SteVecSelector` | JSON field access argument (same SteVec selector as `JsonPath` in current implementation) | | |
| **Note:** In the current proxy implementation, all of `Full`, `Partial(Eq)`, `Partial(Ord)`, and `Tokenized` are routed to `EqlOperation::Store`; the `Partial`/`Tokenized` variants only affect which search terms are built inside the payload. Likewise, both `JsonPath` and `JsonAccessor` use `EqlOperation::Query(..., QueryOp::SteVecSelector)` when a SteVec index exists, even though the mapper distinguishes them conceptually. |
Summary
ARCHITECTURE.mddocument describing the internal design of CipherStash Proxy for developers and anyone browsing the repo.Summary by CodeRabbit