Iceberg: Manage access for end users/clients #1648
✅ Deploy Preview for redpanda-docs-preview ready!
Walkthrough

This pull request adds a new documentation section to the Iceberg configuration guide that instructs users how to grant read access for query engines and end users to Iceberg data. The section outlines two complementary access control strategies: storage bucket/prefix-level access through cloud provider IAM/RBAC mechanisms (AWS S3, GCP GCS, Azure Blob Storage) and catalog-level table access via REST catalog access control layers (AWS Glue Lake Formation, Databricks Unity Catalog, Snowflake Open Catalog, GCP BigLake). The new guidance is positioned between existing sections on accessing Iceberg tables and refreshing table data.

Estimated code review effort: 2 (Simple) | ~10 minutes
Pre-merge checks: 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
modules/manage/pages/iceberg/query-iceberg-topics.adoc (1)
83-83: Consider using the established bucket naming pattern.

The placeholder `<cluster-storage-bucket-name>` is generic, while lines 36 and 45 establish the specific pattern `redpanda-cloud-storage-<cluster-id>`. For consistency and clarity, consider updating the example to match:

s3:GetObject and s3:ListBucket on the Iceberg prefix (for example, `redpanda-cloud-storage-<cluster-id>/redpanda-iceberg-catalog/*`)

This helps readers connect the permission example to the bucket naming convention already documented in the file.
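To make the permission example in this comment concrete, here is a minimal sketch of a read-only IAM policy document scoped to the Iceberg prefix. The helper name `iceberg_read_policy`, the `Sid` labels, and the standard `aws` ARN partition are assumptions made for illustration; the bucket pattern and prefix come from the review comment, and nothing here is taken from the PR's actual content.

```python
import json


def iceberg_read_policy(cluster_id: str, prefix: str = "redpanda-iceberg-catalog") -> dict:
    """Build a read-only IAM policy document scoped to the Iceberg prefix.

    Hypothetical helper: the bucket name follows the pattern quoted in the
    review (redpanda-cloud-storage-<cluster-id>); the standard "aws" ARN
    partition is assumed.
    """
    bucket = f"redpanda-cloud-storage-{cluster_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object reads are limited to keys under the Iceberg prefix.
                "Sid": "ReadIcebergObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # s3:ListBucket is a bucket-level action, so the prefix is
                # enforced through the s3:prefix condition key instead of
                # the resource ARN.
                "Sid": "ListIcebergPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


if __name__ == "__main__":
    # Replace the placeholder with a real cluster ID before using the output.
    print(json.dumps(iceberg_read_policy("<cluster-id>"), indent=2))
```

Splitting the two actions into separate statements reflects that `s3:GetObject` applies to object ARNs while `s3:ListBucket` applies to the bucket ARN; whether the docs page should show a full policy like this is a separate editorial decision.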
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/manage/pages/iceberg/query-iceberg-topics.adoc` at line 83, Replace the generic placeholder `<cluster-storage-bucket-name>` in the AWS S3 permission example with the established bucket naming pattern used earlier (`redpanda-cloud-storage-<cluster-id>`) so the example reads like `redpanda-cloud-storage-<cluster-id>/redpanda-iceberg-catalog/*`; update the sentence that references `s3:GetObject` and `s3:ListBucket` accordingly to keep consistency with the naming convention already shown on lines referencing `redpanda-cloud-storage-<cluster-id>`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/manage/pages/iceberg/query-iceberg-topics.adoc`:
- Around line 91-94: Replace the incorrect Snowflake Open Catalog link used in
the "Snowflake Open Catalog: See
https://other-docs.snowflake.com/en/opencatalog/access-control[Open Catalog
access control^]." bullet by updating the URL to the correct one
(https://docs.snowflake.com/en/user-guide/opencatalog/access-control) so the
"Snowflake Open Catalog" link points to the proper documentation; locate the
string "Snowflake Open Catalog" or the existing incorrect URL and substitute it
with the corrected URL.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b007682f-c6e9-4b4d-8926-5cfd95dabd42
📒 Files selected for processing (1)
modules/manage/pages/iceberg/query-iceberg-topics.adoc
* AWS Glue: Use https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html[AWS Lake Formation^] to grant table-level and column-level permissions.
* Databricks Unity Catalog: See the https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/index.html[Unity Catalog privileges documentation^].
* Snowflake Open Catalog: See https://other-docs.snowflake.com/en/opencatalog/access-control[Open Catalog access control^].
* GCP BigLake: See https://cloud.google.com/bigquery/docs/manage-open-source-metadata#grant_permissions[BigLake table permissions^].
🧩 Analysis chain
🌐 Web query:
`Are the following catalog provider documentation links valid and accessible?
- AWS Lake Formation: https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html
- Databricks Unity Catalog privileges: https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/index.html
- Snowflake Open Catalog access control: https://other-docs.snowflake.com/en/opencatalog/access-control
- GCP BigLake permissions: https://cloud.google.com/bigquery/docs/manage-open-source-metadata#grant_permissions`
💡 Result:
1. AWS Lake Formation: Valid and accessible.
2. Databricks Unity Catalog privileges: Valid and accessible (note: exact path is /aws/en/data-governance/unity-catalog/manage-privileges/, but content matches).
3. Snowflake Open Catalog access control: Valid and accessible.
4. GCP BigLake permissions: Valid and accessible (note: full page is https://cloud.google.com/bigquery/docs/manage-open-source-metadata, which covers permissions including grant_permissions section).
Citations:
- 1: https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html
- 2: https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/
- 3: https://docs.snowflake.com/en/user-guide/opencatalog/access-control
- 4: https://docs.cloud.google.com/bigquery/docs/manage-open-source-metadata
Fix incorrect Snowflake Open Catalog documentation link.
The Snowflake Open Catalog documentation link uses the incorrect domain other-docs.snowflake.com. The correct URL is: https://docs.snowflake.com/en/user-guide/opencatalog/access-control
The other three catalog documentation links (AWS Lake Formation, Databricks Unity Catalog, and GCP BigLake) are valid and accessible.
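If the finding is confirmed, the fix amounts to a single string substitution in the AsciiDoc source. The sketch below shows one way to apply it; the function name `fix_snowflake_link` is hypothetical, and the two URLs are copied from the finding above rather than independently verified.

```python
# URLs as quoted in the review finding; treat both as unverified assumptions.
OLD_URL = "https://other-docs.snowflake.com/en/opencatalog/access-control"
NEW_URL = "https://docs.snowflake.com/en/user-guide/opencatalog/access-control"


def fix_snowflake_link(adoc_text: str) -> tuple[str, int]:
    """Return the text with the old URL replaced, plus the replacement count."""
    count = adoc_text.count(OLD_URL)
    return adoc_text.replace(OLD_URL, NEW_URL), count


if __name__ == "__main__":
    line = (
        "* Snowflake Open Catalog: See "
        "https://other-docs.snowflake.com/en/opencatalog/access-control"
        "[Open Catalog access control^]."
    )
    fixed, replaced = fix_snowflake_link(line)
    print(replaced)
    print(fixed)
```

Returning the count makes it easy to confirm that exactly one link was touched, which matters because the link text and the trailing `^` attribute must stay unchanged.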
Feediver1 left a comment
Docs standards review
Files reviewed: 1 .adoc file (modules/manage/pages/iceberg/query-iceberg-topics.adoc)
Net diff: 25+/0-.
What this PR does
Adds a new === Grant access to query engine users H3 section under the existing == Access Iceberg tables H2. The section explains the Redpanda-vs-customer responsibility split (Redpanda manages service-to-service catalog permissions; customers handle end-user/query-engine read access) and presents two access-control approaches with concrete vendor-doc links:
- Cloud storage prefix-level access: IAM/RBAC scoping on the underlying S3/GCS/Blob Storage bucket prefix, with specific actions (`s3:GetObject`, `storage.objects.get`, "Storage Blob Data Reader")
- Catalog-level table access: REST-catalog access control via Lake Formation, Unity Catalog, Open Catalog, or BigLake
As a side benefit, the PR also adds an === Refresh table data H3 above what was previously unanchored prose, making the "manual refresh" guidance anchorable for the first time.
Jira ticket alignment
Ticket: DOC-1692 (per the branch name; PR body has the placeholder <jira-ticket> still in it).
Status: Addressed. Fills a real content gap — the page already mentioned that the customer is responsible for end-user access without telling them how to grant it. The new section makes that responsibility explicit and gives them the IAM/RBAC primitives to act on.
Critical issues
None.
Suggestions
1. CodeRabbit's placeholder-consistency finding has merit. The existing page (lines 36 and 45) establishes the bucket naming pattern as `redpanda-cloud-storage-<cluster-id>`. The new AWS bullet uses a generic `<cluster-storage-bucket-name>` instead: `<cluster-storage-bucket-name>/redpanda-iceberg-catalog/*`. Aligning to the established pattern would be: `redpanda-cloud-storage-<cluster-id>/redpanda-iceberg-catalog/*`. Minor but worth picking up since the convention already exists on the same page.
2. PR description has a placeholder Jira link: `Resolves https://redpandadata.atlassian.net/browse/<jira-ticket>` still literally contains the placeholder. Branch name says DOC-1692; fill it in.
3. None of the PR's "Checks" boxes are ticked, but the change is clearly a "Content gap" (per the PR's own framing of customer-vs-Redpanda responsibility for end-user access). Worth ticking that box.
4. No Preview pages section filled in. Standard for docs PRs; the template still has the example URL. A real preview link to https://deploy-preview-1648--redpanda-docs-preview.netlify.app/current/manage/iceberg/query-iceberg-topics/#grant-access-to-query-engine-users would speed reviewers.
5. "Storage Blob Data Reader" not in a code span. This is a proper-noun Azure RBAC role name. Consistent with how the doc treats other proper-noun product names (AWS Lake Formation, Databricks Unity Catalog), so probably correct as-is. Worth a quick sanity-check against sibling Azure-RBAC mentions elsewhere in the docs to confirm convention.
6. Consider a cross-link from the Iceberg integration guides. Pages like Snowflake catalog setup, AWS Glue catalog setup, etc., might benefit from a "For end-user access, see <>" pointer back to this new section. The integration guides probably document the Redpanda-to-catalog permissions in isolation; pointing readers to the end-user side closes the loop. Out of scope for this PR but worth a follow-up.
7. The new section renders for both cloud (`ifdef::env-cloud[]`) and self-managed (`ifndef::env-cloud[]`), which is the correct scoping. The "you are responsible for granting end-user access" framing applies regardless of deployment model. ✓
Impact on other files
- No nav.adoc / antora.yml changes needed — just adding content to an existing page.
- No xref breakage — only new headings added; nothing renamed or moved.
- Iceberg integration guides (Snowflake / AWS Glue / Unity Catalog / Polaris pages, if they exist) could optionally link back to this new section. See Suggestion 6.
- No What's New entry needed — this is documenting existing customer responsibility, not announcing a new feature.
CodeRabbit findings worth considering
- Placeholder consistency on line 83 — see Suggestion 1.
What works well
- Clear two-approach framing (prefix-level vs catalog-level) with explicit "you can use them together." Readers don't have to guess whether they pick one or stack them.
- Specific IAM actions called out (`s3:GetObject`, `s3:ListBucket`, `storage.objects.get`, `storage.objects.list`) rather than vague "grant read access." A customer can copy these into an IAM policy directly.
- Real vendor doc links for each cloud + catalog combination: AWS, GCP, Azure, AWS Lake Formation, Databricks Unity Catalog, Snowflake Open Catalog, GCP BigLake. The customer can click through to authoritative reference without us having to maintain detailed how-tos in our docs.
- Responsibility framing is explicit: "Redpanda manages the service-to-service permissions… However, you are responsible for granting your end users and query engines…" — directly addresses the support-question pattern where customers assume Redpanda's catalog access covers their app's access too.
- Implicit cleanup of "Refresh table data" prose — adding an H3 over what was previously unanchored makes it linkable from elsewhere. Bonus value beyond the PR's stated goal.
- Symmetric coverage for the three major cloud providers and the four major catalog vendors. No "AWS-only" omission for a feature that works on all three.
Process note
PR has been open 2 months without human review. The Iceberg topic area has its own SME pool worth pinging if you want this to land — could be Iceberg/SQL engineering or a Cloud product manager.
@kbatuigas Open for 2 months with no updates. Can you please provide a status update? thx
d6ec360 to d519140
Description
This pull request adds a new section to the Iceberg documentation, clarifying how to grant query engine users access to Iceberg data. The update explains both cloud storage-level and catalog-level access control, providing practical guidance and links for AWS, GCP, Azure, and popular catalogs.
Access control documentation improvements:
Resolves https://redpandadata.atlassian.net/browse/
Review deadline:
Page previews
Checks