chore(qwp): simplify the wire protocol to one version with inline schemas#39
Conversation
Mirror the server-side QWP simplification on the Java client. The wire format drops the per-table-block schema mode byte and schema id; column schemas always travel inline. Egress RESULT_BATCH carries the schema only on the first batch (batch_seq == 0); continuation batches reuse it. Ingress encode: QwpColumnWriter writes columns inline (no mode byte or schema_id), QwpWebSocketEncoder and QwpUdpSender drop the useSchemaRef parameter, and QwpWebSocketSender drops the schema-id allocator and the maxSchemasPerConnection plumbing across its connect overloads. QwpTableBuffer loses its schemaId field. Egress decode: QwpResultBatchDecoder parses the inline schema only on batch_seq == 0 into a reused per-query schema, replacing the connection schema registry; QwpEgressIoThread invalidates it when a new query starts. QwpQueryClient always reads SERVER_INFO now that the version gate is gone. Constants collapse to a single VERSION = 1, and CACHE_RESET keeps only RESET_MASK_DICT. The Sender builder loses maxSchemasPerConnection() and the max_schemas_per_connection connect-string key (a breaking change; QWP is beta). Client unit tests updated to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up doc cleanup after removing QWP schema references and protocol v2. No logic changes. Reword comments that still described the removed two-version, schema-reference world: the CACHE_RESET doc no longer advertises a schema-cache mask bit, SERVER_INFO is "every connection" rather than "every v2 connection", and the capability-trailer comments are version-neutral instead of naming a nonexistent v2.0/v2.1. Expand the QwpResultBatchDecoder querySchema comment to record two consequences of dropping the schema registry: every query's batch 0 now re-parses the inline schema and allocates a column-name String per column, and the single schema slot is correct only while the IoThread runs one query at a time (pipelining would need request_id keying). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Un-glue eight connect() parameter lists in QwpWebSocketSender where removing the maxSchemasPerConnection parameter had collapsed two arguments onto a single line with stray indentation. No behaviour change -- whitespace only. Repoint three Javadoc/comment references that pointed at the deleted docs/qwp/*.md design docs to their published equivalents under https://questdb.com/docs/ -- in QwpColumnBatch, ColumnView, and WebSocketResponse. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add client-module coverage for the per-query schema state machine that replaced the connection-scoped registry when schema-reference mode was removed, plus the decoder and connect edge cases the change exposed: - testResetQuerySchemaRejectsContinuationFromPriorQuery drives resetQuerySchema() and asserts a continuation no longer binds to the prior query's schema (the IoThread call site was at 0% coverage). - testSchemaSwitchesAcrossQueriesWithDifferentColumnCount and testSchemaSlotsReusedAcrossQueriesWithDifferentTypes exercise the ensureQuerySchema grow/shrink branches, querySchema.size() on continuation batches, and pooled-slot reuse across type changes. - testEmptyResultSetCarriesSchema covers a result with the schema present but zero rows (per-column null flags, no bodies). - testColumnCountOutOfRangeIsRejected pins the surviving column_count guard, and testVersionTwoIsRejected pins the version-2 rejection the flattened version check introduced. - testWalk_ServerInfoTimeoutIsTransportNotTerminal verifies the walk treats a post-upgrade SERVER_INFO timeout as a transport error and falls through to a healthy node. Also convert a raw NUL byte and raw U+FFFF bytes in a char[] fixture to \u escapes. The stray NUL (pre-existing) made the source read as binary data, so grep and similar tooling silently skipped the whole file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
QwpResultBatchDecoder.close() now clears querySchemaValid alongside the other connection-scoped state it tears down. The decoder is single-use per connection today, so this is future-proofing: if a decoder instance were ever pooled and reused across connections, a surviving querySchemaValid would let the first continuation batch (batch_seq > 0) on the reused decoder bind rows to the previous connection's schema. No behaviour change in the current single-use lifecycle. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three test comments survived the schema-reference / protocol-v2 removal with stale wording. One was actively contradictory: the fall-through reset test claimed the test server "doesn't emit" SERVER_INFO directly above the setSendServerInfo(true) call the scenario now relies on -- once the rejection clears, the rehabilitated node must send the frame because the client always expects it. Reword that comment to explain why target=any keeps the test focused on the fall-through reset rather than the role filter, and drop the obsolete "v2 server" / "v2 SERVER_INFO" qualifiers from the WalkTracker class javadoc and the getServerInfo unit-test note. Also refresh the decoder hardening class javadoc, which still listed "growing the schema registry without bound" as a guarded scenario; that guard is gone, so it now names the surviving guard (out-of-range column_count). Comment-only changes; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The connect-walk role filter rejects an endpoint whose SERVER_INFO frame advertises a role the target= filter excludes, but no in-repo test exercised that branch: the existing REPLICA tests all drive the 421 X-QuestDB-Role header path, and the only test that sent a real SERVER_INFO role used target=any, so matchesTarget(info.getRole(), target) was never reached. Removing the v1-mismatch branch left this surviving branch uncovered. Add two tests against TestWebSocketServer with a clean 101 upgrade and setSendServerInfo(true), so the role is carried only by the decoded SERVER_INFO frame (a clean 101 ignores the upgrade-time role header): - a single REPLICA endpoint under target=primary throws QwpRoleMismatchException with the observed role taken from the frame; - a REPLICA-then-PRIMARY endpoint pair under target=primary skips the replica and binds the primary. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review: PR #39 —
|
The max_schemas_per_connection connection-string option was removed along with the QWP schema-reference mechanism, but the tests that exercised it were deleted with nothing asserting the new behaviour. The key's removal is a breaking change, so its rejection deserves explicit coverage on both transports. Add a test on each fromConfig path that the key now fails: - Egress (QwpQueryClient.fromConfig): asserts the exact IllegalArgumentException "unknown configuration key: max_schemas_per_connection". This restores the coverage lost when the key was dropped from testIngressOnlyKeysSilentlyAcceptedOnEgress, pinning that egress now rejects the key rather than silently accepting it. - Ingress (Sender.fromConfig): asserts the LineSenderException "unknown configuration key [key=max_schemas_per_connection]". The exact-message assertions also pin the wording so a future refactor that tweaks the diagnostic must update the tests deliberately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
[PR Coverage check]😍 pass : 38 / 43 (88.37%) file detail
|
bluestreak01
left a comment
There was a problem hiding this comment.
Approve (Level 3 review).
No Critical, correctness, concurrency, or resource-leak issues. The decoder rewrite (connection-scoped schemaRegistry → per-query inline querySchema + querySchemaValid) is sound, leak-safe, and correctly confined to the single I/O thread (resetQuerySchema() runs per query before any decode; a decode error sets currentQueryDone=true so no continuation can bind a stale schema). Module compiles; affected suites pass. All changed-symbol callsites in this repo and the parent questdb repo are SAFE — parent-repo hits for maxSchemasPerConnection/SCHEMA_MODE_REFERENCE/MAX_SUPPORTED_VERSION are the server's own io.questdb.cutlass.qwp.* symbols (tandem #7200), not callers of the removed client API.
Non-blocking items:
- Title:
chore(qwp):understates a breaking wire + public-API change (removesSender.builder().maxSchemasPerConnection(...)+ themax_schemas_per_connectionoption). Since the squash title lands onmainand feeds release notes, preferfeat(qwp)!:or add aBREAKING CHANGE:footer. - Nits: stale comment "post CACHE_RESET re-registration" (
QwpResultBatchDecoder.java:93);getServerInfo()Javadoc says null "if not connected" but a cleanclose()leavesserverInfonon-null;testEmptyResultSetCarriesSchemadoesn't feed a continuation to provequerySchemaValidwas set; theclose()-pathquerySchemaValidreset is untested.
Note vs the earlier review: the latest commits resolved its M1 (description now says "unknown configuration key", which is accurate) and M3 (egress + ingress reject tests now exist with exact-message assertions).
Design dependency to confirm at merge: the client now always blocks on receiveServerInfoSync(), so it is hard-dependent on the tandem server (#7200) always emitting SERVER_INFO post-upgrade — merge the pair together.
What
Removes the QWP schema-reference mechanism and the protocol-v2 designation from the Java client. Column schemas now always travel inline on the wire, and the client runs a single protocol version.
This is the client (Phase 2) half of a tandem change; the server half is questdb/questdb#7200. The QWP wire format is breaking, but QWP is beta so there is no compatibility window to preserve.
Why
QWP shipped with a schema-reference optimisation: a sender could transmit a column schema once, receive a schema id, and reference that id on later frames. Store-and-forward retry made this unworkable -- a recorded frame can replay against a fresh server connection that never saw the registering frame, leaving the reference dangling. Senders already stopped using reference mode in production, so the wire kept paying for a mechanism nobody used.
Wire format
Ingress table block (sender -> server):
Egress
RESULT_BATCH(server -> client): the schema rides only the first batch of a query (batch_seq == 0); continuation batches carry rows and bind against the schema parsed there.The per-frame version byte stays at offset 4, hardcoded to
1and validated== 1. TheX-QWP-Max-Versionhandshake header and negotiation machinery stay so a future version bump can re-introduce a range.Notable changes
batch_seq == 0is now rejected rather than binding rows to a stale schema.Sender.builder().maxSchemasPerConnection(...)and themax_schemas_per_connectionconnection-string option are gone. Code or config strings that reference them now fail -- with a compile error and an "unknown configuration key" error respectively.SERVER_INFOis always delivered post-upgrade (no v2 gate), so the failover role filter no longer has a v1 "no role available" branch.Trade-offs
maxSchemasPerConnectionbuilder method disappears; external callers that set it will no longer compile. Acceptable for beta.Test plan
mvn -pl core test(client unit + decoder-hardening suites)SelfSufficientFramesTesttrimmed to its symbol-dict-delta invariant.cutlass-test-suite, which runs this client against the matching server branch.Coordination
Tandem with questdb/questdb#7200. The wire format is breaking; merge the two together. This PR's branch is named to match the server branch so the client CI builds against the server PR instead of master.
🤖 Generated with Claude Code