Merged
10 changes: 7 additions & 3 deletions modules/manage/pages/iceberg/about-iceberg-topics.adoc
@@ -77,7 +77,7 @@ ifdef::env-cloud[]
To create an Iceberg table for a Redpanda topic, you must set the cluster configuration property config_ref:iceberg_enabled,true,properties/cluster-properties[`iceberg_enabled`] to `true`, and also configure the topic property `redpanda.iceberg.mode`. You can choose to provide a schema if you need the Iceberg table to be structured with defined columns.
endif::[]

. Set the `iceberg_enabled` configuration option on your cluster to `true`.
. Set the `iceberg_enabled` configuration option on your cluster to `true`.
ifdef::env-cloud[]
+
[tabs]
@@ -88,7 +88,7 @@ rpk::
[,bash]
----
rpk cloud login
rpk profile create --from-cloud <CLUSTER ID>
rpk profile create --from-cloud <cluster-id>
rpk cluster config set iceberg_enabled true
----
--
@@ -122,9 +122,13 @@ The link:/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecl
endif::[]
ifndef::env-cloud[]
+
When multiple clusters write to the same catalog, each cluster must use a distinct namespace to avoid table name collisions. This is especially critical for REST catalog providers that offer a single global catalog per account (such as AWS Glue), where there is no other isolation mechanism. By default, Redpanda creates Iceberg tables in a namespace called `redpanda`. To use a unique namespace for your cluster's REST catalog integration, set config_ref:iceberg_default_catalog_namespace,true,properties/cluster-properties[`iceberg_default_catalog_namespace`] at the same time. This property cannot be changed after you enable Iceberg topics on the cluster.
Contributor

To use a unique namespace for your cluster's REST catalog integration, set config_ref:iceberg_default_catalog_namespace,true,properties/cluster-properties[iceberg_default_catalog_namespace] at the same time.

This sentence seems incomplete.

"at the same time" -> "at the same time Iceberg is enabled"

or -> "while/before enabling Iceberg"?

+
[,bash]
----
rpk cluster config set iceberg_enabled true
rpk cluster config set iceberg_enabled true
# Optional: set a custom namespace (default is "redpanda")
# rpk cluster config set iceberg_default_catalog_namespace '["<custom-namespace>"]'
----
+
[,bash,role=no-copy]
4 changes: 4 additions & 0 deletions modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
@@ -130,11 +130,15 @@ To configure your Redpanda cluster to enable Iceberg on a topic and integrate wi
. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below.
ifndef::env-cloud[]
+
By default, Redpanda creates Iceberg tables in a namespace called `redpanda`. Because AWS Glue provides a single catalog per account, each Redpanda cluster that writes to the same Glue catalog must use a distinct namespace to avoid table name collisions. To set a unique namespace, set config_ref:iceberg_default_catalog_namespace,true,properties/cluster-properties[`iceberg_default_catalog_namespace`] at the same time. This property cannot be changed after Iceberg is enabled.
+
Run `rpk cluster config edit` to update these properties:
+
[,bash]
----
iceberg_enabled: true
# Set a custom namespace instead of the default "redpanda"
iceberg_default_catalog_namespace: ["<custom-namespace>"]
Contributor Author

@nvartolomei Based on the description for ENG-917 I included the property in this example without commenting it out or saying it's optional. Does that sound good, and should I also change the wording on 133 and be explicit that it's required?

Is it strictly required, @nvartolomei, or just commonly needed because Glue uses a single catalog (or something like that), so namespaces are the primary table isolation mechanism? Is that right?

Contributor

It is not required, as we have a default, BUT I believe you almost always want to set it to something unique: in Glue you get only one catalog, so once you create a second cluster you'll start getting conflicts.

This document should make this very clear to the user so they don't shoot themselves in the foot now or later.

This section:

By default, Redpanda creates Iceberg tables in a namespace called redpanda. To use a custom namespace, set config_ref:iceberg_default_catalog_namespace,true,properties/cluster-properties[iceberg_default_catalog_namespace] at the same time. This property cannot be changed after Iceberg is enabled.

Suggested change by Claude:

The explanatory paragraph should be expanded to explain the multi-cluster / shared-catalog conflict risk explicitly. Something along the lines of:

"When multiple clusters write to the same catalog, each cluster must use a distinct namespace to avoid table name collisions. This is especially critical for catalog providers that offer a single global catalog per account (e.g., AWS Glue), where there is no other isolation mechanism."

Then, at each point where the user is prompted to enable Iceberg, the docs should actively prompt the user to evaluate whether a unique namespace is needed, with a cross-reference to the expanded rationale above.
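
As a rough illustration of the ordering discussed in this thread (namespace first, then enable Iceberg), here is a minimal dry-run sketch. The helper name `iceberg_enable_commands`, the cluster label argument, and the `redpanda_<label>` naming scheme are all hypothetical and only for illustration; the two `rpk cluster config set` commands are the ones already shown in this PR. The function only prints the commands, it does not run them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the rpk commands in the order discussed above.
# The "redpanda_<label>" naming scheme is an assumption, not a Redpanda convention.
iceberg_enable_commands() {
  local cluster_label="$1"
  local namespace="redpanda_${cluster_label}"
  # Namespace first: the property cannot be changed once Iceberg is enabled.
  printf "rpk cluster config set iceberg_default_catalog_namespace '[\"%s\"]'\n" "$namespace"
  # Then enable Iceberg on the cluster.
  printf "rpk cluster config set iceberg_enabled true\n"
}

# Dry run for a cluster labeled "cluster_a" (hypothetical label).
iceberg_enable_commands "cluster_a"
```

Giving each cluster that shares a Glue catalog a different label would yield a different namespace per cluster, which is the isolation the comment above asks the docs to call out.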

# Glue requires Redpanda Iceberg tables to be manually deleted
iceberg_delete: false
iceberg_catalog_type: rest
@@ -191,8 +191,13 @@ echo "hello world\nfoo bar\nbaz qux" | rpk topic produce <topic-name> --format='

You should see the topic as a table with data in Unity Catalog. The data may take some time to become visible, depending on your config_ref:iceberg_target_lag_ms,true,properties/cluster-properties[`iceberg_target_lag_ms`] setting.

ifndef::env-cloud[]
. In Catalog Explorer, open your catalog. You should see a `redpanda` schema (or the namespace you configured with config_ref:iceberg_default_catalog_namespace,true,properties/cluster-properties[`iceberg_default_catalog_namespace`]), in addition to `default` and `information_schema`.
endif::[]
ifdef::env-cloud[]
. In Catalog Explorer, open your catalog. You should see a `redpanda` schema, in addition to `default` and `information_schema`.
. The `redpanda` schema and the table residing within this schema are automatically added for you. The table name is the same as the topic name.
endif::[]
. The schema and the table residing within it are automatically added for you. The table name is the same as the topic name.

== Query Iceberg table using Databricks SQL

4 changes: 4 additions & 0 deletions modules/manage/pages/iceberg/query-iceberg-topics.adoc
@@ -97,6 +97,10 @@ endif::[]
{"user_id": 2324, "event_type": "BUTTON_CLICK", "ts": "2024-11-25T20:23:59.380Z"}
----

ifndef::env-cloud[]
NOTE: The query examples on this page use `redpanda` as the Iceberg namespace, which is the default. If you configured a different namespace using config_ref:iceberg_default_catalog_namespace,true,properties/cluster-properties[`iceberg_default_catalog_namespace`], replace `redpanda` with your configured namespace.
endif::[]

=== Topic with schema (`value_schema_id_prefix` mode)

NOTE: The steps in this section also apply to the `value_schema_latest` mode, except the produce step. The `value_schema_latest` mode is not compatible with the Schema Registry wire format. The xref:reference:rpk/rpk-topic/rpk-topic-produce[`rpk topic produce`] command embeds the wire format header, so you must use your own producer code with `value_schema_latest`.
@@ -169,7 +169,12 @@ echo "hello world\nfoo bar\nbaz qux" | rpk topic produce <topic-name> --format='
You should see the topic as a table in Open Catalog.

. In Open Catalog, select *Catalogs*, then open your catalog.
. Under your catalog, you should see the `redpanda` namespace, and a table with the name of your topic. The `redpanda` namespace and the table are automatically added for you.
ifndef::env-cloud[]
. Under your catalog, you should see the `redpanda` namespace (or the namespace you configured with config_ref:iceberg_default_catalog_namespace,true,properties/cluster-properties[`iceberg_default_catalog_namespace`]), and a table with the name of your topic. The namespace and the table are automatically added for you.
endif::[]
ifdef::env-cloud[]
. Under your catalog, you should see the `redpanda` namespace and a table with the name of your topic. The namespace and the table are automatically added for you.
endif::[]

== Query Iceberg table in Snowflake

13 changes: 12 additions & 1 deletion modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
@@ -90,6 +90,14 @@ To connect to a REST catalog, set the following cluster configuration properties

NOTE: You must set `iceberg_rest_catalog_endpoint` at the same time that you set `iceberg_catalog_type` to `rest`.

ifndef::env-cloud[]
==== Configure table namespace

Check if your REST catalog provider has specific requirements or recommendations for namespaces. For example, AWS Glue offers only a single global catalog per account, and each cluster that writes to the same Glue catalog must use a distinct namespace to avoid table name collisions.

By default, Redpanda creates Iceberg tables in a namespace called `redpanda`. To use a unique namespace, configure the config_ref:iceberg_default_catalog_namespace,true,properties/cluster-properties[`iceberg_default_catalog_namespace`] cluster property. You must set this property before enabling the Iceberg integration or at the same time. After you have enabled Iceberg, do not change this property value.
endif::[]

==== Configure authentication

To authenticate with the REST catalog, set the following cluster properties:
@@ -272,7 +280,10 @@ The Spark engine can use the REST catalog to automatically discover the topic's
SELECT * FROM streaming.redpanda.<table-name>;
----

The Iceberg table name is the name of your Redpanda topic. Redpanda puts the Iceberg table into a namespace called `redpanda`, creating the namespace if necessary.
The Iceberg table name is the name of your Redpanda topic.
ifndef::env-cloud[]
If you configured a different namespace using config_ref:iceberg_default_catalog_namespace,true,properties/cluster-properties[`iceberg_default_catalog_namespace`], replace `redpanda` with your configured namespace.
endif::[]

TIP: You may need to explicitly create a table for the Iceberg data in your query engine. For an example, see xref:manage:iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc[].
