From b0c29e8a67708c2fa84894ee815c1ded70f36bad Mon Sep 17 00:00:00 2001
From: Kat Batuigas
Date: Wed, 13 May 2026 22:14:48 -0700
Subject: [PATCH 1/3] Draft struct support reference

---
 .../pages/sql/sql-data-types/row.adoc    | 146 +++++++++++++++++-
 .../sql/sql-statements/create-table.adoc |   8 +-
 2 files changed, 148 insertions(+), 6 deletions(-)

diff --git a/modules/reference/pages/sql/sql-data-types/row.adoc b/modules/reference/pages/sql/sql-data-types/row.adoc
index bc55a2dac..453c24dc0 100644
--- a/modules/reference/pages/sql/sql-data-types/row.adoc
+++ b/modules/reference/pages/sql/sql-data-types/row.adoc
@@ -2,7 +2,7 @@
 :description: The ROW data type represents a composite value containing one or more fields of different types.
 :page-topic-type: reference
 
-The `ROW` data type represents a composite value (also known as a struct or record) containing one or more fields of different types.
+The `ROW` data type represents a composite value (also known as a struct or record) containing one or more fields of different types. ROW values support field access, lexicographic comparison, NULL checks, conversion to text, and use in `GROUP BY`, `ORDER BY`, and `JOIN` clauses.
 
 == Syntax
 
@@ -75,3 +75,147 @@ SELECT ROW();
 ()
 (1 row)
 ----
+
+== Access fields
+
+=== Access by position
+
+For anonymous ROW expressions, access fields by the positional names `f1`, `f2`, and so on, which follow the order of the values in the constructor:
+
+[source,sql]
+----
+SELECT (ROW(1, 'hello', 3.14)).f1, (ROW(1, 'hello', 3.14)).f2;
+----
+
+[source,sql]
+----
+ f1 |  f2
+----+-------
+  1 | hello
+(1 row)
+----
+
+You must wrap the ROW expression in parentheses to access one of its fields.
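+
+Positional names also apply when a ROW value comes from a subquery column. This minimal sketch assumes that Redpanda SQL resolves positional names through a subquery alias the same way PostgreSQL does. The alias `r` and the derived table `t` are illustrative only:
+
+[source,sql]
+----
+-- Build an anonymous ROW in a subquery (hypothetical alias r), then read its fields by position
+SELECT (r).f1, (r).f2
+FROM (SELECT ROW(1, 'hello') AS r) AS t;
+----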
+
+=== Access by name
+
+For composite columns with declared field names, such as columns mapped from a topic with `struct_mapping_policy = 'COMPOUND'` (see xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]), access fields by their declared names:
+
+[source,sql]
+----
+SELECT (customer).customer_id, (customer).region FROM orders;
+----
+
+=== Expand all fields with a wildcard
+
+To project every field of a ROW value as a separate result column, use the `.*` form:
+
+[source,sql]
+----
+SELECT (ROW(1, 'hello', 3.14)).*;
+----
+
+[source,sql]
+----
+ f1 |  f2   |  f3
+----+-------+------
+  1 | hello | 3.14
+(1 row)
+----
+
+You can also use the wildcard form inside a `ROW(...)` constructor to copy the fields of one composite value into a new ROW value.
+
+== Compare ROW values
+
+ROW values support the standard comparison operators `=`, `<>`, `<`, `\<=`, `>`, and `>=`. Comparison is *lexicographic*: fields are compared in order, left to right, and the first differing field determines the result.
+
+[source,sql]
+----
+SELECT ROW(1, 'a') < ROW(1, 'b');
+----
+
+[source,sql]
+----
+ ?column?
+----------
+ t
+(1 row)
+----
+
+Both ROW values must have the same number of fields, and corresponding fields must have comparable types.
+
+== Check for NULL
+
+ROW values support `IS NULL` and `IS NOT NULL`.
+
+[source,sql]
+----
+SELECT ROW(1, 'a') IS NULL;
+----
+
+[source,sql]
+----
+ ?column?
+----------
+ f
+(1 row)
+----
+
+To check whether a specific field within a ROW is NULL, test the field itself, for example `(customer).region IS NULL`.
+
+== Convert to text
+
+Cast a ROW value to `text` to produce the standard PostgreSQL composite literal form:
+
+[source,sql]
+----
+SELECT ROW(1, 'hello', 3.14)::text;
+----
+
+[source,sql]
+----
+       row
+-----------------
+ (1,"hello",3.14)
+(1 row)
+----
+
+== Use ROW in queries
+
+You can use ROW values in `GROUP BY`, `ORDER BY`, and `JOIN` clauses. `GROUP BY` groups equal ROW values together, join conditions compare ROW values field by field for equality, and `ORDER BY` sorts them lexicographically.
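+
+=== Filter with a ROW comparison
+
+ROW comparisons can also appear in a `WHERE` clause. For example, you can select every row that sorts after a given `(amount, order_id)` pair. This sketch assumes that Redpanda SQL accepts ROW comparisons in `WHERE` the same way PostgreSQL does, and that `orders` has `amount` and `order_id` columns. The literal values are hypothetical:
+
+[source,sql]
+----
+-- Lexicographic: a larger amount, or an equal amount with a later order_id
+SELECT order_id, amount
+FROM orders
+WHERE (amount, order_id) > (100.00, 'A-1000')
+ORDER BY amount, order_id;
+----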
+
+=== Group by a ROW field
+
+[source,sql]
+----
+SELECT (customer).region, COUNT(*)
+FROM orders
+GROUP BY (customer).region;
+----
+
+=== Order by a whole ROW
+
+[source,sql]
+----
+SELECT * FROM orders ORDER BY customer;
+----
+
+Redpanda SQL sorts the rows lexicographically by the fields of the `customer` composite column, in their declared order.
+
+=== Join on a multi-column key
+
+Compare row constructors to match a multi-column key in a single join condition, instead of combining per-column equality checks with `AND`:
+
+[source,sql]
+----
+SELECT *
+FROM table_a a
+JOIN table_b b
+ON (a.col1, a.col2) = (b.col1, b.col2);
+----
+
+// TODO: SME to confirm whether nested array-of-struct access (for example, `(arr_of_rows[1]).field_name`) works at GA, and whether wildcard expansion on an empty ROW (`(ROW()).*`) is supported. Both are tracked under OXLA-9444 and OXLA-9431 respectively and remain open as of 2026-05-13.
+
+== See also
+
+* xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]: maps a Redpanda topic to a SQL table. Use `struct_mapping_policy = 'COMPOUND'` to surface nested topic fields as ROW columns.
diff --git a/modules/reference/pages/sql/sql-statements/create-table.adoc b/modules/reference/pages/sql/sql-statements/create-table.adoc
index bcd8bdf6a..d84899c9c 100644
--- a/modules/reference/pages/sql/sql-statements/create-table.adoc
+++ b/modules/reference/pages/sql/sql-statements/create-table.adoc
@@ -51,12 +51,10 @@ a|How to handle records that fail deserialization.
 |`struct_mapping_policy`
 |STRING
 |No
-a|How to map nested structures to SQL columns.
+a|How to map nested structures from the topic schema to SQL columns.
 
-* `JSON` (default): Stores nested data as JSON.
-* `FLATTEN`: Expands nested fields into top-level columns.
-* `COMPOUND`: Maps to ROW types.
-* `VARIANT`: Stores as a variant type.
+* `JSON` (default): Stores each nested structure as a JSON value. Required for recursive (cyclic) types.
+* `COMPOUND`: Maps each nested structure to a SQL xref:reference:sql/sql-data-types/row.adoc[ROW] value with named fields, queryable using `(column).field_name` syntax. Cyclic types are not supported in `COMPOUND` mode; use `JSON` for recursive schemas.
 
 |`output_schema_message_full_name`
 |STRING

From 2099c0106a00753b73c01d5bd0c6395ed750aa08 Mon Sep 17 00:00:00 2001
From: Kat Batuigas
Date: Wed, 13 May 2026 22:18:16 -0700
Subject: [PATCH 2/3] How to query structs/nested fields

---
 modules/ROOT/nav.adoc                         |   1 +
 .../pages/query-data/query-nested-fields.adoc | 134 ++++++++++++++++++
 2 files changed, 135 insertions(+)
 create mode 100644 modules/sql/pages/query-data/query-nested-fields.adoc

diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc
index f93d23547..884ac7104 100644
--- a/modules/ROOT/nav.adoc
+++ b/modules/ROOT/nav.adoc
@@ -355,6 +355,7 @@
 *** xref:sql:query-data/redpanda-catalogs.adoc[Redpanda Catalogs]
 *** xref:sql:query-data/query-streaming-topics.adoc[Query Streaming Topics]
 *** xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg Topics]
+*** xref:sql:query-data/query-nested-fields.adoc[Query Topics with Nested Fields]
 ** xref:sql:manage/index.adoc[Manage Redpanda SQL]
 ** xref:sql:troubleshoot/index.adoc[Troubleshoot]
 *** xref:sql:troubleshoot/degraded-state-handling.adoc[]
diff --git a/modules/sql/pages/query-data/query-nested-fields.adoc b/modules/sql/pages/query-data/query-nested-fields.adoc
new file mode 100644
index 000000000..e5fe753a1
--- /dev/null
+++ b/modules/sql/pages/query-data/query-nested-fields.adoc
@@ -0,0 +1,134 @@
+= Query topics with nested fields
+:description: Map a topic with nested protobuf or Avro fields to SQL ROW columns, then query those fields directly.
+:page-topic-type: how-to
+:personas: app_developer, data_engineer
+:learning-objective-1: Map a topic with a nested schema as a SQL table using struct_mapping_policy = 'COMPOUND'
+:learning-objective-2: Query nested fields using ROW field-access syntax
+:learning-objective-3: Recognize and resolve cyclic-reference errors
+
+When a topic schema includes nested protobuf or Avro message types, you can map those nested structures as SQL `ROW` columns instead of opaque JSON. This makes nested fields queryable by name, includable in projections, and usable in `WHERE`, `GROUP BY`, and `ORDER BY` clauses, without parsing JSON at query time.
+
+After completing these steps, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
+
+== Prerequisites
+
+Before you query a topic with nested fields:
+
+* Enable Redpanda SQL on your Redpanda Bring Your Own Cloud (BYOC) cluster. See xref:sql:get-started/deploy-sql-cluster.adoc[Enable Redpanda SQL].
+* Connect to Redpanda SQL with `psql` or another PostgreSQL client. See xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL].
+* The topic has a schema (Protobuf or Avro) registered in Schema Registry. The schema includes one or more nested message types.
+* You have a Redpanda catalog connection. See xref:reference:sql/sql-statements/create-redpanda-catalog.adoc[CREATE REDPANDA CATALOG].
+
+== Map the topic as a SQL table
+
+Create the SQL table with `struct_mapping_policy = 'COMPOUND'` to surface each nested message as a SQL `ROW` column:
+
+[source,sql]
+----
+CREATE TABLE default_redpanda_catalog=>orders WITH (
+  topic = 'orders',
+  schema_subject = 'orders-value',
+  struct_mapping_policy = 'COMPOUND'
+);
+----
+
+Replace the `topic` value `orders` with your topic name, and replace `orders-value` with the Schema Registry subject that holds the topic's value schema.
+
+For a topic schema with this Protobuf definition:
+
+[source,proto]
+----
+message Order {
+  string order_id = 1;
+  Customer customer = 2;
+  double amount = 3;
+}
+
+message Customer {
+  string customer_id = 1;
+  string name = 2;
+  string region = 3;
+}
+----
+
+Redpanda SQL creates the table with three columns: `order_id` (text), `customer` (a `ROW` with fields `customer_id`, `name`, and `region`), and `amount` (double precision).
+
+TIP: To map nested structures as JSON instead, use `struct_mapping_policy = 'JSON'`. `JSON` is the default, and it is the only option that supports recursive (cyclic) types. See <<handle-recursive-cyclic-schemas>>.
+
+== Query nested fields
+
+Access a nested field by its declared name using the `(column).field` form. The parentheses around the column are required:
+
+[source,sql]
+----
+SELECT order_id, (customer).name, (customer).region, amount
+FROM default_redpanda_catalog=>orders
+WHERE (customer).region = 'EMEA';
+----
+
+To project every field of a nested structure as separate result columns, use the wildcard `.*` form:
+
+[source,sql]
+----
+SELECT order_id, (customer).*
+FROM default_redpanda_catalog=>orders
+LIMIT 10;
+----
+
+For schemas with multiple levels of nesting, chain the parenthesized field access:
+
+[source,sql]
+----
+SELECT ((order_metadata).customer).customer_id FROM default_redpanda_catalog=>orders;
+----
+
+For the full `ROW` reference, including comparison operators, NULL handling, and `::text` casting, see xref:reference:sql/sql-data-types/row.adoc[ROW].
+
+[[handle-recursive-cyclic-schemas]]
+== Handle recursive (cyclic) schemas
+
+If your topic schema includes a recursive structure — for example, a `Comment` message that references itself, or two messages that reference each other — mapping the table with `COMPOUND` fails at table-creation time with the following error:
+
+[source,text]
+----
+Cyclic reference at '.' → ''. Cyclic types are not supported in COMPOUND struct mapping policy; use struct_mapping_policy=JSON for recursive types.
+----
+
+To resolve the error, create the table with `struct_mapping_policy = 'JSON'` instead. In JSON mode, Redpanda SQL stores each nested structure as a JSON value:
+
+[source,sql]
+----
+CREATE TABLE default_redpanda_catalog=>comments WITH (
+  topic = 'comments',
+  schema_subject = 'comments-value',
+  struct_mapping_policy = 'JSON'
+);
+----
+
+Query JSON-mapped fields with standard JSON functions instead of ROW field access. See xref:reference:sql/sql-data-types/json.adoc[JSON].
+
+== Choose between COMPOUND and JSON
+
+[cols="<20%,<40%,<40%",options="header"]
+|===
+| Policy | Use when | Trade-offs
+
+| `COMPOUND`
+| The topic schema has nested structures that are not recursive, and you want to query nested fields directly by name.
+| Typed access; usable in `WHERE`, `GROUP BY`, and `ORDER BY`. Required if you also plan to run xref:sql:query-data/query-iceberg-topics.adoc[bridge queries] against an Iceberg catalog, so that nested fields align as typed `ROW` columns on both sides of the union.
+
+| `JSON` (default)
+| The topic schema is recursive, or you prefer flexible access through JSON functions.
+| Recursive types supported; fields are untyped until extracted with JSON functions. Bridge queries that compare nested fields across the Redpanda topic and the linked Iceberg table do not align cleanly, because Iceberg always exposes nested structures as `ROW` columns.
+|===
+
+== Next steps
+
+* xref:sql:query-data/query-streaming-topics.adoc[Query streaming topics]: query a topic without Iceberg history.
+* xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg topics]: query the Iceberg-translated history of a topic. Use `struct_mapping_policy = 'COMPOUND'` so nested fields align between the Redpanda topic and the linked Iceberg table.
+* xref:reference:sql/sql-data-types/row.adoc[ROW]: full reference for the `ROW` data type, including comparisons, NULL semantics, and conversion to text.
+* xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]: complete option list for mapping a Redpanda topic to a SQL table.

From f4233e3eaf8bf201e11426583384d905891de4f2 Mon Sep 17 00:00:00 2001
From: Kat Batuigas
Date: Wed, 13 May 2026 22:26:33 -0700
Subject: [PATCH 3/3] Review pass

---
 .../pages/query-data/query-nested-fields.adoc | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/modules/sql/pages/query-data/query-nested-fields.adoc b/modules/sql/pages/query-data/query-nested-fields.adoc
index e5fe753a1..339d0f8a7 100644
--- a/modules/sql/pages/query-data/query-nested-fields.adoc
+++ b/modules/sql/pages/query-data/query-nested-fields.adoc
@@ -1,14 +1,14 @@
 = Query topics with nested fields
-:description: Map a topic with nested protobuf or Avro fields to SQL ROW columns, then query those fields directly.
+:description: Map a topic with nested Protobuf or Avro fields to SQL ROW columns, then query those fields directly.
 :page-topic-type: how-to
 :personas: app_developer, data_engineer
 :learning-objective-1: Map a topic with a nested schema as a SQL table using struct_mapping_policy = 'COMPOUND'
 :learning-objective-2: Query nested fields using ROW field-access syntax
 :learning-objective-3: Recognize and resolve cyclic-reference errors
 
-When a topic schema includes nested protobuf or Avro message types, you can map those nested structures as SQL `ROW` columns instead of opaque JSON. This makes nested fields queryable by name, includable in projections, and usable in `WHERE`, `GROUP BY`, and `ORDER BY` clauses, without parsing JSON at query time.
+When the schema of a glossterm:topic[] includes nested Protobuf or Avro message types, you can map those nested structures as SQL `ROW` columns instead of opaque JSON. Nested fields then become queryable by name and usable in projections and in `WHERE`, `GROUP BY`, and `ORDER BY` clauses, without parsing JSON at query time.
 
-After completing these steps, you will be able to:
+After reading this page, you will be able to:
 
 * [ ] {learning-objective-1}
 * [ ] {learning-objective-2}
@@ -20,7 +20,7 @@ Before you query a topic with nested fields:
 
 * Enable Redpanda SQL on your Redpanda Bring Your Own Cloud (BYOC) cluster. See xref:sql:get-started/deploy-sql-cluster.adoc[Enable Redpanda SQL].
 * Connect to Redpanda SQL with `psql` or another PostgreSQL client. See xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL].
-* The topic has a schema (Protobuf or Avro) registered in Schema Registry. The schema includes one or more nested message types.
+* The topic has a schema (Protobuf or Avro) registered in glossterm:schema-registry[Schema Registry]. The schema includes one or more nested message types.
 * You have a Redpanda catalog connection. See xref:reference:sql/sql-statements/create-redpanda-catalog.adoc[CREATE REDPANDA CATALOG].
 
 == Map the topic as a SQL table
@@ -61,7 +61,7 @@ TIP: To map nested structures as JSON instead, use `struct_mapping_policy = 'JSO
 
 == Query nested fields
 
-Access a nested field by its declared name using the `(column).field` form. The parentheses around the column are required:
+Access a nested field by its declared name using the `(column).field` form. You must wrap the column in parentheses:
 
 [source,sql]
 ----
@@ -79,11 +79,11 @@ FROM default_redpanda_catalog=>orders
 LIMIT 10;
 ----
 
-For schemas with multiple levels of nesting, chain the parenthesized field access:
+For schemas with multiple levels of nesting, chain the parenthesized field access. For example, if `Customer` contained a nested `address` message with a `zip_code` field, query the zip code as follows:
 
 [source,sql]
 ----
-SELECT ((order_metadata).customer).customer_id FROM default_redpanda_catalog=>orders;
+SELECT ((customer).address).zip_code FROM default_redpanda_catalog=>orders;
 ----
 
 For the full `ROW` reference, including comparison operators, NULL handling, and `::text` casting, see xref:reference:sql/sql-data-types/row.adoc[ROW].
@@ -91,7 +91,7 @@ For the full `ROW` reference, including comparison operators, NULL handling, and
 [[handle-recursive-cyclic-schemas]]
 == Handle recursive (cyclic) schemas
 
-If your topic schema includes a recursive structure — for example, a `Comment` message that references itself, or two messages that reference each other — mapping the table with `COMPOUND` fails at table-creation time with the following error:
+Topic schemas can include recursive structures, such as a `Comment` message that references itself or two messages that reference each other. Mapping such a schema with `COMPOUND` fails at table-creation time with the following error:
 
 [source,text]
 ----