docs (k8s): Document rolling restart impact during Helm to Operator migration by david-yu · Pull Request #1625 · redpanda-data/docs

david-yu · 2026-03-20T16:15:20Z

Summary

Adds a new "Expected impact: rolling restart of broker pods" section to the Helm-to-Operator migration guide
Explains why pods restart (resource adoption, StatefulSet re-render, ControllerRevision mismatch)
Documents impact by cluster configuration (3+ brokers RF>=3, RF=1, single broker)
Adds recommended producer settings (acks=all, retries, idempotence) for zero-downtime migration
Clarifies that Redpanda uses Raft consensus (not Kafka ISR), so there is no min.insync.replicas to configure

Context

Customers migrating from Helm to Operator were surprised that broker pods restart during migration, even when clusterSpec values match existing Helm values and updateStrategy.type: OnDelete is set. The current docs don't warn about this. The restart is unavoidable because the operator re-renders the StatefulSet from its own templates, creating a new ControllerRevision that triggers pod rolling.

Test plan

Verify AsciiDoc renders correctly in the docs site preview
Confirm technical accuracy of the rolling restart sequence against operator source code
Review producer settings recommendations

🤖 Generated with Claude Code

When migrating from Helm to the Redpanda Operator, all broker pods are restarted sequentially even if clusterSpec values match existing Helm values. This adds a new section explaining why the restart happens, impact by cluster configuration, and recommended producer settings to avoid message loss during migration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

netlify · 2026-03-20T16:15:26Z

✅ Deploy Preview for redpanda-docs-preview ready!

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`f46c77b`
🔍 Latest deploy log	https://app.netlify.com/projects/redpanda-docs-preview/deploys/69bd7a153b8e980008a97203
😎 Deploy Preview	https://deploy-preview-1625--redpanda-docs-preview.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

coderabbitai · 2026-03-20T16:15:35Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 725a73c3-66bf-492a-a37c-16de963c46c0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This change adds an "Expected impact" section to the Helm-to-operator migration documentation that explains the rolling restart behavior triggered when applying a Redpanda custom resource. The section details the adoption mechanism via server-side apply, StatefulSet spec re-rendering effects, pod deletion sequence with health checks, and timing considerations. It provides scenario-specific guidance based on cluster topology (broker count and replication factor) and recommends producer configuration adjustments for operational resilience during the transition.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Suggested reviewers

JakeSCahill

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description provides context, motivation, and test plan items, but is missing required template sections: JIRA ticket link, review deadline, and page preview URL.	Add the missing template sections: Resolves link with JIRA ticket number, review deadline, and Netlify page preview URL for the modified documentation page.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: documenting rolling restart impact during Helm to Operator migration, which matches the PR's core objective.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

- Fix passive voice ("is flagged" → "the operator flags") - Fix nested list continuation under ordered list (add +/-- block) - Use active voice consistently in list items - Use "Apache Kafka" on first reference per terminology standards - Add serial commas where missing - Ensure parallel structure in list items Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

modules/migrate/pages/kubernetes/helm-to-operator.adoc (2)
109-117: Consider mentioning write caching as a tuning alternative.

The current recommendations prioritize maximum durability with acks=all. For users who prioritize lower latency during migration and can tolerate some risk, write caching (documented in modules/develop/pages/config-topics.adoc#configure-write-caching) offers a relaxed mode of acks=all that acknowledges on majority without waiting for fsync.

This is optional—the current conservative recommendation is appropriate for most migration scenarios. However, mentioning write caching would give advanced users awareness of tuning options for their specific durability/performance trade-offs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/migrate/pages/kubernetes/helm-to-operator.adoc` around lines 109 -
117, Add a brief note after the "Recommended producer settings" block mentioning
write caching as an optional, lower-latency tuning alternative for advanced
users: reference the existing settings `acks=all`, `retries`, and
`enable.idempotence=true`, explain that write caching relaxes fsync waits while
still acknowledging to a majority (link to the existing docs at
modules/develop/pages/config-topics.adoc#configure-write-caching), and state it
is only recommended when users can tolerate reduced durability during migration.
93-108: Well-structured impact scenarios.

The table clearly differentiates impact based on broker count and replication factor. The explanation of Raft quorum semantics (2 of 3 brokers for RF=3) correctly justifies why writes continue during individual broker restarts.

Consider using the glossterm macro for domain-specific terms like "replication factor", "quorum", and "Raft consensus" for consistency with the repository's glossary usage. Based on learnings, the syntax would be glossterm:replication-factor[] or glossterm:raft-consensus[].
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/migrate/pages/kubernetes/helm-to-operator.adoc` around lines 93 -
108, Update the document to wrap domain-specific terms with the repository
glossary macro: replace occurrences of "replication factor", "quorum", and "Raft
consensus" in the "Impact by cluster configuration" section (the table content
in helm-to-operator.adoc) with the glossterm macro forms (e.g.,
glossterm:replication-factor[], glossterm:quorum[], glossterm:raft-consensus[])
to ensure consistent glossary linking and styling.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modules/migrate/pages/kubernetes/helm-to-operator.adoc`:
- Around line 109-117: Replace the duplicated producer configuration block (the
recommendations for `acks=all`, `retries`, and `enable.idempotence=true` and the
NOTE about Raft) with a single xref to the authoritative producer guide
`modules/develop/pages/produce-data/configure-producers.adoc` using
empty-bracket xref syntax
`xref:modules/develop/pages/produce-data/configure-producers.adoc[]`; in
addition, keep a short migration-specific callout that explicitly highlights two
critical bits of context from the guide: that `enable.idempotence=true` must be
used together with `acks=all` to guarantee idempotent behavior, and that high
`retries` can interact with `max.in.flight.requests.per.connection` to affect
ordering (point readers to the corresponding sections in the referenced doc for
full details).

---

Nitpick comments:
In `@modules/migrate/pages/kubernetes/helm-to-operator.adoc`:
- Around line 109-117: Add a brief note after the "Recommended producer
settings" block mentioning write caching as an optional, lower-latency tuning
alternative for advanced users: reference the existing settings `acks=all`,
`retries`, and `enable.idempotence=true`, explain that write caching relaxes
fsync waits while still acknowledging to a majority (link to the existing docs
at modules/develop/pages/config-topics.adoc#configure-write-caching), and state
it is only recommended when users can tolerate reduced durability during
migration.
- Around line 93-108: Update the document to wrap domain-specific terms with the
repository glossary macro: replace occurrences of "replication factor",
"quorum", and "Raft consensus" in the "Impact by cluster configuration" section
(the table content in helm-to-operator.adoc) with the glossterm macro forms
(e.g., glossterm:replication-factor[], glossterm:quorum[],
glossterm:raft-consensus[]) to ensure consistent glossary linking and styling.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 448631bc-9409-4fdb-918b-d99cd8987722

📥 Commits

Reviewing files that changed from the base of the PR and between 4230c3f and 43ab2e4.

📒 Files selected for processing (1)

modules/migrate/pages/kubernetes/helm-to-operator.adoc

coderabbitai · 2026-03-20T16:19:55Z

modules/migrate/pages/kubernetes/helm-to-operator.adoc

+=== Recommended producer settings
+
+To avoid message loss during the rolling restart, configure producers with:
+
+- `acks=all` (or `-1`): Waits for the write to be committed by the Raft quorum (majority of replicas).
+- `retries`: Set to a high value to handle `NOT_LEADER_FOR_PARTITION` errors during leader elections.
+- `enable.idempotence=true`: Prevents duplicate messages from retries.
+
+NOTE: Redpanda uses Raft-based replication, not Kafka's in-sync replica (ISR) mechanism. With Raft, writes require a majority quorum (2 of 3 for RF=3). There is no `min.insync.replicas` setting to configure. The replication factor alone determines fault tolerance.


🛠️ Refactor suggestion | 🟠 Major

Replace inline producer guidance with xref to comprehensive guide.

This section duplicates content from the existing comprehensive producer configuration guide at modules/develop/pages/produce-data/configure-producers.adoc. Inline duplication creates maintenance burden and risks inconsistency.

Additionally, the recommendations are missing critical context:

Line 115 suggests enable.idempotence=true but doesn't mention that true idempotent behavior requires pairing with acks=all. The comprehensive guide (configure-producers.adoc:138-143) explicitly states: "To guarantee true idempotent behavior, you must also set acks=all."

Line 114 suggests high retries without mentioning ordering risks related to max.in.flight.requests.per.connection, which is documented in configure-producers.adoc:91-116.

♻️ Proposed refactor to reference comprehensive producer guide

Replace lines 109-117 with a focused reference to the authoritative guide:

=== Recommended producer settings -To avoid message loss during the rolling restart, configure producers with: - -- `acks=all` (or `-1`): Waits for the write to be committed by the Raft quorum (majority of replicas). -- `retries`: Set to a high value to handle `NOT_LEADER_FOR_PARTITION` errors during leader elections. -- `enable.idempotence=true`: Prevents duplicate messages from retries. - -NOTE: Redpanda uses Raft-based replication, not Kafka's in-sync replica (ISR) mechanism. With Raft, writes require a majority quorum (2 of 3 for RF=3). There is no `min.insync.replicas` setting to configure. The replication factor alone determines fault tolerance. +To avoid message loss during the rolling restart, configure producers for durability and resilience. See xref:develop:produce-data/configure-producers.adoc[] for comprehensive guidance. Key settings include: + +- xref:develop:produce-data/configure-producers.adoc#acksall[`acks=all`] to ensure writes are committed by the Raft quorum (majority of replicas) +- xref:develop:produce-data/configure-producers.adoc#retries[`retries`] set to a high value to handle transient errors during leader elections +- xref:develop:produce-data/configure-producers.adoc#enableidempotence[`enable.idempotence=true`] paired with `acks=all` to prevent duplicate messages + +NOTE: Redpanda uses Raft-based replication, not Kafka's in-sync replica (ISR) mechanism. With Raft, writes require a majority quorum (2 of 3 for RF=3). The replication factor alone determines fault tolerance.

This approach maintains migration-specific context while linking to the authoritative source for configuration details. Based on learnings, xref links should use empty brackets [] to automatically pull the target document's title when publishing.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modules/migrate/pages/kubernetes/helm-to-operator.adoc` around lines 109 - 117, Replace the duplicated producer configuration block (the recommendations for `acks=all`, `retries`, and `enable.idempotence=true` and the NOTE about Raft) with a single xref to the authoritative producer guide `modules/develop/pages/produce-data/configure-producers.adoc` using empty-bracket xref syntax `xref:modules/develop/pages/produce-data/configure-producers.adoc[]`; in addition, keep a short migration-specific callout that explicitly highlights two critical bits of context from the guide: that `enable.idempotence=true` must be used together with `acks=all` to guarantee idempotent behavior, and that high `retries` can interact with `max.in.flight.requests.per.connection` to affect ordering (point readers to the corresponding sections in the referenced doc for full details).

- Rename "Why pods restart" to "How the rolling restart works" - Remove NOTE about Raft vs ISR comparison as it is not needed here Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

david-yu requested a review from a team as a code owner March 20, 2026 16:15

coderabbitai bot reviewed Mar 20, 2026

View reviewed changes

docs: Remove ISR note and rename rolling restart subsection

2e2ef3e

- Rename "Why pods restart" to "How the rolling restart works" - Remove NOTE about Raft vs ISR comparison as it is not needed here Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

david-yu changed the title ~~docs: Document rolling restart impact during Helm to Operator migration~~ docs (k8s): Document rolling restart impact during Helm to Operator migration Mar 20, 2026

docs: Rename heading to clarify migration context

f46c77b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs (k8s): Document rolling restart impact during Helm to Operator migration#1625

docs (k8s): Document rolling restart impact during Helm to Operator migration#1625
david-yu wants to merge 4 commits intoredpanda-data:mainfrom
david-yu:doc/helm-to-operator-rolling-restart-impact

david-yu commented Mar 20, 2026

Uh oh!

netlify bot commented Mar 20, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Mar 20, 2026 •

edited

Loading

Review skipped

Walkthrough

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

david-yu commented Mar 20, 2026

Summary

Context

Test plan

Uh oh!

netlify bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for redpanda-docs-preview ready!

Uh oh!

coderabbitai bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

netlify bot commented Mar 20, 2026 •

edited

Loading

coderabbitai bot commented Mar 20, 2026 •

edited

Loading