
fix: tls+opentext ports for zookeeper -> only tls (2281) #204

Merged
HardMax71 merged 8 commits into main from
fix/zookeeper-ports
Feb 18, 2026

Conversation


@HardMax71 (Owner) commented Feb 18, 2026


Summary by cubic

Switch Kafka to single-node KRaft and remove ZooKeeper; bind external ports to localhost. Make saga cancellation atomic with a revision bump and tighten SSE event typing; update compose, CI/CD, and docs.

  • New Features

    • Kafka runs in single-node KRaft (broker+controller); ZooKeeper and its certgen image removed.
    • Compose and workflows bind ports to 127.0.0.1; CI/CD now builds/scans 4 images.
    • Docs and health checks updated for KRaft and the new topology.
  • Bug Fixes

    • Atomic saga cancellation via findOneAndUpdate that bumps revision_id to prevent concurrent write races; orchestrator updated with clear not-found/invalid-state errors.
    • E2E tests deflaked by deriving saga_id from SAGA_STARTED and waiting for the final command.
    • SSE event typing fixed by validating Redis payloads to coerce enums/timestamps and ignore extra keys.

Written for commit cb84794. Summary will update on new commits.

Summary by CodeRabbit

  • Chores

    • Switched ZooKeeper and Kafka to TLS/mTLS for client connections with a CA-backed PKI; added automated certificate generation and keystore/truststore provisioning; services wait for certs before starting
    • Replaced plaintext client ports with TLS ports and restricted many service bindings to localhost
  • Documentation

    • Updated deployment healthchecks and guidance for TLS-only configuration
  • Tests

    • E2E tests now wait for a SagaStarted event and derive saga IDs from that event to ensure persistence before proceeding
  • Chores

    • SSE event emission now uses a centralized validation/merging flow for event payloads
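The centralized validation flow mentioned above can be sketched with stdlib-only coercion. The PR reportedly delegates this to a validation adapter's validate_python; the event fields and names below are assumptions for illustration only.

```python
# Sketch of validating a raw Redis payload into a typed SSE event:
# coerce enum strings and ISO timestamps, drop unknown keys.
# (Hypothetical field set; the PR uses an adapter-based flow for this.)
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class EventType(Enum):
    SAGA_STARTED = "SAGA_STARTED"
    SAGA_CANCELLED = "SAGA_CANCELLED"

@dataclass
class SSEEvent:
    event_type: EventType
    timestamp: datetime
    saga_id: str

def validate_payload(raw: dict) -> SSEEvent:
    known = {"event_type", "timestamp", "saga_id"}
    data = {k: v for k, v in raw.items() if k in known}  # ignore extra keys
    return SSEEvent(
        event_type=EventType(data["event_type"]),             # str -> enum
        timestamp=datetime.fromisoformat(data["timestamp"]),  # str -> datetime
        saga_id=data["saga_id"],
    )

event = validate_payload({
    "event_type": "SAGA_STARTED",
    "timestamp": "2026-02-18T12:00:00",
    "saga_id": "s1",
    "extra_key": "ignored",
})
```

Centralizing this in one validator keeps every SSE emission path producing the same typed shape instead of each call site hand-building payloads.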


coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Replaces the plaintext ZooKeeper client port with a TLS-only port and adds a CA-backed certificate-generation flow that produces keystores and truststores for ZooKeeper and a Kafka client. docker-compose now requires and mounts the generated certs and uses TLS ports; the docs healthcheck now uses the TLS port; tests obtain saga_id from the SAGA_STARTED event; SSE event construction is delegated to adapter-based validation.

Changes

  • ZooKeeper Core Configuration (backend/zookeeper/conf/zoo.cfg): removed clientPort=2181 and sslQuorum=true; metricsProvider.httpPort=7070 unchanged (whitespace-only diff).
  • Certificate Generation (backend/zookeeper/generate-certs.sh): rewrote to a CA-backed PKI: create a CA key/cert, generate and CA-sign ZooKeeper server and Kafka client CSRs, produce PKCS#12 then JKS keystores/truststores for zookeeper and kafka-client; adjusted existence checks and status messages.
  • Container Orchestration (docker-compose.yaml): adds a zookeeper_certgen service and zookeeper_certs volume, mounts certs into ZooKeeper/Kafka, switches the ZooKeeper client port to TLS-only 2281, updates Kafka to use TLS/ZK SSL env vars, binds many services to 127.0.0.1, and switches healthchecks to TLS ports.
  • Operational Documentation (docs/operations/deployment.md): replaced the plain-text ZK healthcheck (`echo ruok` …) with one that targets the TLS port.
  • End-to-end Tests (backend/tests/e2e/conftest.py, backend/tests/e2e/test_saga_routes.py): conftest.py updates the import and return type to SagaStartedEvent and asserts the event type before returning; test_saga_routes.py obtains saga_id from the SAGA_STARTED event via wait_for_saga_started and uses it for cancel/status operations.
  • SSE Service (backend/app/services/sse/sse_service.py): refactored event construction to delegate coercion/validation to _sse_event_adapter.validate_python, removing manual ResourceUsageDomain construction and direct resource_usage handling.
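The e2e deflaking described for test_saga_routes.py can be sketched like this. The event shape and helper signature are assumptions for illustration; the real fixture consumes live events from the running stack.

```python
# Derive saga_id from the SAGA_STARTED event rather than assuming the
# saga document already exists when the HTTP response returns.
# Event dicts here are illustrative stand-ins for real event objects.
from typing import Iterable

def wait_for_saga_started(events: Iterable[dict], max_events: int = 100) -> str:
    """Return the saga_id carried by the first SAGA_STARTED event."""
    for i, event in enumerate(events):
        if i >= max_events:
            break
        if event.get("event_type") == "SAGA_STARTED":
            # Once this event is observed, the saga is persisted, so
            # cancel/status calls on its saga_id no longer race creation.
            return event["saga_id"]
    raise TimeoutError("no SAGA_STARTED event observed")

stream = [
    {"event_type": "EXECUTION_REQUESTED"},
    {"event_type": "SAGA_STARTED", "saga_id": "abc123"},
]
saga_id = wait_for_saga_started(stream)
```

Waiting on the event rather than the API response removes the window where a test could query or cancel a saga before the orchestrator persisted it.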

Sequence Diagram

sequenceDiagram
    actor Operator
    participant CertGen as Certificate Generator
    participant CA as Certificate Authority
    participant ZK as ZooKeeper
    participant Kafka as Kafka
    participant Store as Keystore/Truststore

    Operator->>CertGen: trigger cert generation
    CertGen->>CA: create CA key & self-signed cert
    CA-->>CertGen: CA cert
    CertGen->>CA: submit ZooKeeper CSR
    CA-->>CertGen: ZooKeeper cert (CA-signed)
    CertGen->>CA: submit Kafka client CSR
    CA-->>CertGen: Kafka client cert (CA-signed)
    CertGen->>Store: build PKCS#12 -> JKS keystore & truststore for ZooKeeper
    CertGen->>Store: build PKCS#12 -> JKS keystore & truststore for Kafka client
    Operator->>ZK: start ZooKeeper with TLS config (port 2281) and mounted keystores
    ZK->>Store: load keystore & truststore
    Operator->>Kafka: start Kafka with TLS config and client keystore
    Kafka->>Store: load client keystore & truststore
    Kafka->>ZK: connect via TLS/mTLS (2281)
    ZK-->>Kafka: TLS handshake & client verification
    Operator->>ZK: healthcheck (TLS/metrics port)
    ZK-->>Operator: metrics/health response

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 I nibbled roots of keys and crests,

Sprung a CA beneath my nests,
No plain-text whispers on the line,
Keystores tucked, certs signed fine,
Hops of joy — secure connections blessed.

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 75.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): the pull request title 'fix: tls+opentext ports for zookeeper -> only tls (2281)' directly describes the primary change: switching ZooKeeper from supporting both TLS and plaintext ports to TLS-only on port 2281.
  • Description check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@cubic-dev-ai bot left a comment

2 issues found across 4 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/zookeeper/generate-certs.sh">

<violation number="1" location="backend/zookeeper/generate-certs.sh:5">
P2: Incomplete existence check: truststores are not verified. If the script is interrupted between keystore and truststore creation, a re-run will skip generation, leaving the system without truststores and causing TLS connection failures.</violation>

<violation number="2" location="backend/zookeeper/generate-certs.sh:11">
P1: The new CA private key (`ca.key`) and `kafka-client.key` will be made world-readable by the existing `chmod 644 /certs/*`. A leaked CA key allows signing arbitrary certificates trusted by both Zookeeper and Kafka. Consider restricting private key permissions to 600 and only setting 644 on certificates and keystores.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/zookeeper/generate-certs.sh (1)

33-44: ⚠️ Potential issue | 🟡 Minor

Add explicit -deststoretype JKS to ensure the keystore format matches the filename and intended format.

The keytool -importkeystore command without -deststoretype uses the JDK's default keystore type, which is PKCS12 on Java 9+ (JEP 229). Confluent Platform images use Java 17 or 21 (both defaulting to PKCS12), so without the explicit flag, the resulting keystore will be PKCS12 despite the .jks filename. Add -deststoretype JKS to ensure predictable behavior and match the filename's format intent.

Suggested fix
 keytool -importkeystore \
     -deststorepass zookeeper_keystore_password \
     -destkeypass zookeeper_keystore_password \
     -destkeystore /certs/zookeeper.keystore.jks \
+    -deststoretype JKS \
     -srckeystore /certs/zookeeper.p12 \
     -srcstoretype PKCS12 \
     -srcstorepass zookeeper_keystore_password \
     -alias zookeeper \
     -noprompt
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/zookeeper/generate-certs.sh` around lines 33 - 44, The keytool import
step currently relies on the JDK default keystore type and may produce a PKCS12
keystore despite the .jks filename; update the keytool -importkeystore
invocation (the command that references -destkeystore
/certs/zookeeper.keystore.jks and -srckeystore /certs/zookeeper.p12) to include
the explicit flag -deststoretype JKS so the destination keystore format matches
the .jks intent.
docker-compose.yaml (1)

256-259: ⚠️ Potential issue | 🟡 Minor

Remove the unused zoo.cfg from the mounted ./backend/zookeeper/conf directory.

The Confluent cp-zookeeper image generates its ZooKeeper configuration dynamically from ZOOKEEPER_* environment variables into /etc/kafka/zookeeper.properties at startup, not from a zoo.cfg file. The mounted zoo.cfg is ignored and creates a maintenance trap where changes to it will have no effect. However, the directory mount is still needed for log4j.properties, which configures ZooKeeper's logging.

Recommendation: Either remove zoo.cfg from the directory and document that only log4j.properties is mounted, or move zoo.cfg elsewhere to clarify that it's not actively used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yaml` around lines 256 - 259, The mounted
./backend/zookeeper/conf directory currently contains an unused zoo.cfg which is
ignored by the cp-zookeeper image (it generates /etc/kafka/zookeeper.properties
from ZOOKEEPER_* env vars); remove zoo.cfg from ./backend/zookeeper/conf (or
move it to a separate docs/ or backups/ directory) and update any documentation
to state that only log4j.properties in that mount is required for ZooKeeper
logging, keeping the volumes entry that mounts
./backend/zookeeper/conf:/etc/kafka/conf:ro solely for log4j.properties.
🧹 Nitpick comments (5)
backend/zookeeper/conf/zoo.cfg (1)

8-11: Hardcoded keystore/truststore passwords in config file.

Passwords like zookeeper_keystore_password and zookeeper_truststore_password are committed in plaintext. For a local dev stack this is pragmatic, but if this config is ever used in a non-dev environment, these should be injected via environment variables or secrets management. Consider adding a comment noting these are dev-only values.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/zookeeper/conf/zoo.cfg` around lines 8 - 11, The config currently
contains hardcoded secrets for ssl.keyStore.password and
ssl.trustStore.password; change these to read from environment variables or a
secrets manager (e.g., replace the literal values with references like an env
variable) so passwords are injected at runtime, and add a clear comment next to
ssl.keyStore.location / ssl.trustStore.location indicating the current values
are dev-only placeholders and must not be used in non-dev environments.
backend/zookeeper/generate-certs.sh (2)

5-8: Skip-check doesn't cover truststores.

If a previous run partially failed after creating keystores but before creating truststores, re-running the script would skip generation entirely, leaving a broken PKI state. Consider checking all four output files, or alternatively, check only the CA cert (the first artifact produced).

💡 Suggested improvement
-if [ -f /certs/zookeeper.keystore.jks ] && [ -f /certs/kafka-client.keystore.jks ]; then
+if [ -f /certs/zookeeper.keystore.jks ] && [ -f /certs/zookeeper.truststore.jks ] && \
+   [ -f /certs/kafka-client.keystore.jks ] && [ -f /certs/kafka-client.truststore.jks ]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/zookeeper/generate-certs.sh` around lines 5 - 8, The skip-check
currently only verifies /certs/zookeeper.keystore.jks and
/certs/kafka-client.keystore.jks, which can falsely skip when truststores are
missing; update the generation guard in generate-certs.sh to either verify all
four expected output files (e.g., /certs/zookeeper.keystore.jks,
/certs/kafka-client.keystore.jks, /certs/zookeeper.truststore.jks,
/certs/kafka-client.truststore.jks) or, more simply and robustly, check for the
CA certificate file produced first (the CA cert path used earlier in the script)
and only skip when that CA cert exists, so partial runs don't leave PKI in a
broken state.

10-21: Add Subject Alternative Name (SAN) extensions to certificates for best practice.

The generated certificates use CN (e.g., CN=zookeeper) but lack SAN entries. While Java 17 will fall back to CN for DNS hostname matching, SAN is the modern standard per RFC 2818 and future-proofs the certificates against stricter hostname verification policies. Adding SAN is especially important if the environment later enforces stricter verification or uses IP-based connections (Java requires SAN for IP matching and does not fall back to CN).

Example fix for ZooKeeper cert (apply similarly for Kafka client)
 openssl x509 -req -days 365 -in /certs/zookeeper.csr \
     -CA /certs/ca.crt -CAkey /certs/ca.key -CAcreateserial \
-    -out /certs/zookeeper.crt
+    -out /certs/zookeeper.crt \
+    -extfile <(printf "subjectAltName=DNS:zookeeper,DNS:localhost")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/zookeeper/generate-certs.sh` around lines 10 - 21, The certificates
lack Subject Alternative Names (SANs); update generate-certs.sh so the zookeeper
CSR and final certificate include SAN entries (e.g., DNS:zookeeper and any
relevant IPs). Concretely, modify the openssl req invocation that creates
/certs/zookeeper.csr to add SANs (use -addext
"subjectAltName=DNS:zookeeper,IP:127.0.0.1" or an -config/-req_ext section), and
ensure the openssl x509 signing step that writes /certs/zookeeper.crt includes
the same SANs via -extfile/-extensions (or -addext if available) so the issued
/certs/zookeeper.crt carries the subjectAltName extension. Ensure the CA cert
generation remains unchanged.
docker-compose.yaml (2)

285-286: Kafka mounts the entire zookeeper_certs volume, including ZooKeeper's private key.

Both zookeeper (line 259) and kafka (line 335) mount the same zookeeper_certs volume. This gives Kafka read access to zookeeper.key and ca.key, which it doesn't need — Kafka only requires kafka-client.keystore.jks and kafka-client.truststore.jks.

For a dev setup this is acceptable, but in production you'd want separate volumes or a sidecar that copies only the needed files. Worth a TODO comment at minimum.

Also applies to: 335-335

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yaml` around lines 285 - 286, zookeeper currently exposes the
entire zookeeper_certs volume (including zookeeper.key and ca.key) to kafka;
restrict kafka to only the certs it needs by creating a separate volume (e.g.,
zookeeper_public_certs) or a sidecar that copies only kafka-client.keystore.jks
and kafka-client.truststore.jks into a kafka-only volume, update the kafka
service to mount that new volume instead of zookeeper_certs, and update the
zookeeper service or its init job (zookeeper-certgen) to populate both volumes
(private keys stay in zookeeper_certs, public keystores go to
zookeeper_public_certs); add a TODO comment near the services noting this
separation for production.

227-231: Hardcoded TLS passwords in docker-compose environment.

Keystore/truststore passwords (zookeeper_keystore_password, kafka_keystore_password, etc.) are hardcoded across the compose file, zoo.cfg, and generate-certs.sh. For a local dev stack this is pragmatic; for any non-dev use, these should be externalized (e.g., Docker secrets, .env file, or secrets: top-level element).

Also applies to: 297-300

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yaml` around lines 227 - 231, Replace hardcoded
keystore/truststore passwords by externalized secrets or environment variables:
change the environment keys ZOOKEEPER_SSL_KEYSTORE_PASSWORD and
ZOOKEEPER_SSL_TRUSTSTORE_PASSWORD (and the analogous kafka_* variables
referenced elsewhere) in the compose service definitions to reference Docker
secrets or an env var (e.g., use secrets: and file-based secret mounts or
${ZOOKEEPER_SSL_KEYSTORE_PASSWORD} from an env_file), update the secret
definitions at top-level and ensure secrets are created at deploy time; then
update any places in zoo.cfg and generate-certs.sh that currently embed the
literal passwords to read them from the corresponding secret file or environment
variable (e.g., read from /run/secrets/<name> or $ENV_VAR), and ensure .env with
real passwords is excluded from VCS and documented for local dev.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/zookeeper/generate-certs.sh`:
- Line 74: The script currently sets permissions for all files in /certs to 644
which makes private keys world-readable; update the permission logic in
generate-certs.sh so that private key files (ca.key, zookeeper.key,
kafka-client.key) are set to 600 (owner read/write only) while leaving
certificate/public files (e.g., *.crt, *.pem) at 644; locate the chmod
invocation (the line containing "chmod 644 /certs/*") and replace it with
targeted chmod calls or a selective loop that applies chmod 600 to those key
filenames and chmod 644 to the rest.


@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
docker-compose.yaml (1)

221-234: Duplicate 4lw whitelist configuration.

-Dzookeeper.4lw.commands.whitelist=* is set in KAFKA_OPTS (line 224) and again as ZOOKEEPER_4LW_COMMANDS_WHITELIST: '*' (line 234). The env-var form is the canonical way for the Confluent image; the JVM property in KAFKA_OPTS is redundant. Also, whitelisting all commands (*) is more permissive than needed — in production you'd typically restrict to mntr, ruok, srvr, stat or the subset you actually use.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yaml` around lines 221 - 234, Remove the redundant 4lw
whitelist JVM property from KAFKA_OPTS (remove
"-Dzookeeper.4lw.commands.whitelist=*") and rely on the canonical env var
ZOOKEEPER_4LW_COMMANDS_WHITELIST for the Confluent image; also tighten
ZOOKEEPER_4LW_COMMANDS_WHITELIST from '*' to a restricted comma-separated list
such as "mntr,ruok,srvr,stat" (or the minimal subset your deployment requires)
to avoid permitting all 4lw commands.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker-compose.yaml`:
- Around line 226-231: Replace hardcoded JKS passwords by parameterizing the
compose file to use environment substitution with defaults (like how
MONGO_ROOT_PASSWORD is handled): change ZOOKEEPER_SSL_KEYSTORE_PASSWORD and
ZOOKEEPER_SSL_TRUSTSTORE_PASSWORD to reference env vars (e.g.,
${ZOOKEEPER_SSL_KEYSTORE_PASSWORD:-default}) and do the same for the
kafka_keystore_password/kafka_truststore_password variables; also update the
cert generation script to read those same environment variables so generated
keystore/truststore passwords stay in sync with the compose configuration.


@coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docker-compose.yaml (1)

268-273: ⚠️ Potential issue | 🟠 Major

nc -z healthcheck only verifies TCP reachability — TLS misconfiguration still reports healthy.

nc -z localhost 2281 opens a TCP connection and immediately closes it without performing a TLS handshake. ZooKeeper's Netty listener accepts the TCP SYN before any certificate exchange occurs, so this check passes even when the keystore or truststore is misconfigured and Kafka's mTLS negotiation would fail. Since the deployment enforces ZOOKEEPER_SSL_CLIENT_AUTH: need and Kafka connects with KAFKA_ZOOKEEPER_SSL_CLIENT_ENABLE: 'true', a healthy healthcheck followed by TLS handshake failure creates a harder-to-diagnose crash loop.

The suggested openssl s_client probe below is executable: openssl is available in the image, and the PEM certificates generated by the certgen service are mounted at /etc/kafka/certs/ in the ZooKeeper container.

💡 Stronger healthcheck alternative (openssl is available in the image)
    healthcheck:
-     test: ["CMD-SHELL", "nc -z localhost 2281"]
+     test: ["CMD-SHELL", "echo '' | openssl s_client -connect localhost:2281 -CAfile /etc/kafka/certs/ca.crt -cert /etc/kafka/certs/zookeeper.crt -key /etc/kafka/certs/zookeeper.key -quiet 2>&1 | grep -q 'Verify return code: 0'"]
      interval: 3s
      timeout: 5s
      retries: 15
      start_period: 5s

This probe verifies the TLS handshake completes successfully, catching misconfigured keystores before Kafka attempts to connect.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yaml` around lines 268 - 273, The current healthcheck block
uses "test: [\"CMD-SHELL\", \"nc -z localhost 2281\"]" which only verifies TCP
reachability; replace the test command in the healthcheck (the "healthcheck"
stanza / "test" field) to run an openssl TLS handshake probe (e.g. using
"openssl s_client") against localhost:2281 and point it to the certs mounted at
/etc/kafka/certs/ so the handshake and client auth are validated; keep the rest
of the healthcheck settings (interval, timeout, retries, start_period) but
ensure the new test returns non-zero on TLS failure so the container is marked
unhealthy when keystore/truststore or mTLS is misconfigured.
🧹 Nitpick comments (2)
docker-compose.yaml (2)

278-282: zookeeper-certgen dependency in Kafka's depends_on is redundant.

Kafka already depends on zookeeper: service_healthy, and zookeeper itself depends on zookeeper-certgen: service_completed_successfully. By the time ZooKeeper is healthy, cert generation has already finished, so the explicit zookeeper-certgen entry in Kafka's depends_on is unreachable dead config.

♻️ Proposed cleanup
     depends_on:
-      zookeeper-certgen:
-        condition: service_completed_successfully
       zookeeper:
         condition: service_healthy
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yaml` around lines 278 - 282, Remove the redundant depends_on
entry for zookeeper-certgen under the Kafka service: since kafka already depends
on zookeeper with condition service_healthy and the zookeeper service itself
depends on zookeeper-certgen with condition service_completed_successfully, the
explicit zookeeper-certgen dependency in Kafka is dead config; update the Kafka
service's depends_on block to only reference zookeeper (condition:
service_healthy) and remove the zookeeper-certgen entry to simplify the compose
file.

222-224: KAFKA_OPTS duplicates settings already expressed through typed env vars.

ZOOKEEPER_SERVER_CNXN_FACTORY (line 223) already translates to serverCnxnFactory=… in zoo.cfg; the -Dzookeeper.serverCnxnFactory=… JVM system property in KAFKA_OPTS (line 224) sets the same thing via a different mechanism. Likewise ZOOKEEPER_4LW_COMMANDS_WHITELIST: '*' (line 234) duplicates the -Dzookeeper.4lw.commands.whitelist=* flag already in KAFKA_OPTS. The env-var approach is idiomatic for cp-zookeeper and the JVM property duplication can be removed to reduce confusion.

♻️ Proposed cleanup
      KAFKA_OPTS: '-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
-                  -Dzookeeper.4lw.commands.whitelist=*
                   -Djava.security.auth.login.config=/etc/kafka/secrets/kafka_jaas.conf'

Remove the serverCnxnFactory and 4lw flags from KAFKA_OPTS, retaining only the JAAS config system property there (which cannot be expressed as a ZOOKEEPER_ env var).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yaml` around lines 222 - 224, KAFKA_OPTS currently duplicates
settings already provided via ZOOKEEPER_SERVER_CNXN_FACTORY and
ZOOKEEPER_4LW_COMMANDS_WHITELIST; remove the redundant JVM system properties
"-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory"
and "-Dzookeeper.4lw.commands.whitelist=*" from the KAFKA_OPTS value and leave
only the JAAS config property
"-Djava.security.auth.login.config=/etc/kafka/secrets/kafka_jaas.conf", keeping
ZOOKEEPER_SERVER_CNXN_FACTORY and ZOOKEEPER_4LW_COMMANDS_WHITELIST env vars as
the authoritative configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker-compose.yaml`:
- Around line 304-306: Set KAFKA_ZOOKEEPER_SET_ACL to 'true' so ZooKeeper znodes
get ACLs when brokers authenticate via the existing ZOOKEEPER_SSL_CLIENT_AUTH
and KAFKA_OPTS (SASL/mTLS) settings; update the docker-compose env value for
KAFKA_ZOOKEEPER_SET_ACL and ensure the cluster operator runs the
zookeeper-security-migration tool to apply ACLs to existing znodes before
rolling brokers; also verify KAFKA_AUTHORIZER_CLASS_NAME and KAFKA_SUPER_USERS
are configured consistently with ACL-based authorization.
- Around line 211-213: The cp-zookeeper:7.8.2 image requires
ZOOKEEPER_CLIENT_PORT to be set even when using ZOOKEEPER_SECURE_CLIENT_PORT, so
add ZOOKEEPER_CLIENT_PORT (e.g., "2181") to the environment alongside
ZOOKEEPER_SECURE_CLIENT_PORT in the docker-compose service for
confluentinc/cp-zookeeper:7.8.2, do not expose or bind that plaintext port in
the compose ports mapping, and ensure ZooKeeper is configured to disable the
plaintext listener (clientPort) at the application level so only the secure
listener (ZOOKEEPER_SECURE_CLIENT_PORT) is active.
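A sketch of that arrangement — the plaintext port is defined (the image requires it) but never published:

```yaml
# Sketch — cp-zookeeper's config template requires ZOOKEEPER_CLIENT_PORT even
# when only the secure listener should serve traffic; 2181 stays unpublished.
zookeeper:
  environment:
    ZOOKEEPER_CLIENT_PORT: '2181'         # required by the image, not exposed
    ZOOKEEPER_SECURE_CLIENT_PORT: '2281'  # the only reachable listener
  ports:
    - '127.0.0.1:2281:2281'               # TLS port only, bound to localhost
```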

---

Outside diff comments:
In `@docker-compose.yaml`:
- Around line 268-273: The current healthcheck block uses "test: [\"CMD-SHELL\",
\"nc -z localhost 2281\"]" which only verifies TCP reachability; replace the
test command in the healthcheck (the "healthcheck" stanza / "test" field) to run
an openssl TLS handshake probe (e.g. using "openssl s_client") against
localhost:2281 and point it to the certs mounted at /etc/kafka/certs/ so the
handshake and client auth are validated; keep the rest of the healthcheck
settings (interval, timeout, retries, start_period) but ensure the new test
returns non-zero on TLS failure so the container is marked unhealthy when
keystore/truststore or mTLS is misconfigured.
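One possible probe, assuming the certs are mounted at /etc/kafka/certs/ with the filenames shown (illustrative — adjust to the actual mount):

```yaml
# Sketch — replaces the nc probe with a real TLS handshake. -verify_return_error
# makes s_client bail on verification failure, so the grep then finds no
# "Verify return code: 0" line and the healthcheck exits non-zero.
healthcheck:
  test:
    - CMD-SHELL
    - >-
      openssl s_client -connect localhost:2281
      -CAfile /etc/kafka/certs/ca.crt
      -cert /etc/kafka/certs/zookeeper.crt
      -key /etc/kafka/certs/zookeeper.key
      -verify_return_error </dev/null 2>/dev/null
      | grep -q 'Verify return code: 0'
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 30s
```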

---

Duplicate comments:
In `@docker-compose.yaml`:
- Around line 226-231: The compose snippet currently hardcodes
ZOOKEEPER_SSL_KEYSTORE_PASSWORD and ZOOKEEPER_SSL_TRUSTSTORE_PASSWORD; change
this to pull secrets or environment variables instead (e.g., reference external
Docker secrets or use env_file/${VAR_NAME} substitution) and remove the literal
passwords. Update the service to use secret names or variables (retain the env
var keys ZOOKEEPER_SSL_KEYSTORE_PASSWORD and ZOOKEEPER_SSL_TRUSTSTORE_PASSWORD
so callers remain the same) and add corresponding secret definitions or .env
entries so the actual passwords are not stored in the repo.
- Around line 290-294: The docker-compose currently hardcodes
keystore/truststore passwords in the env vars
KAFKA_ZOOKEEPER_SSL_KEYSTORE_PASSWORD and
KAFKA_ZOOKEEPER_SSL_TRUSTSTORE_PASSWORD (and similarly any other plaintext
creds); remove the literal values and wire these to a secure source instead —
e.g., use Docker secrets (declare secrets and reference them in the service), or
read from an external .env that is excluded from version control, or mount a
credential file and reference it, and update the compose service to use the
secret names (and ensure the secret files/secret store are provisioned and not
checked in).
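One way to wire this, assuming an untracked .env file (variable names illustrative):

```yaml
# Sketch — literal passwords replaced by compose variable substitution; the
# :?message form fails fast when the variable is missing from the env/.env file.
zookeeper:
  environment:
    ZOOKEEPER_SSL_KEYSTORE_PASSWORD: ${ZK_KEYSTORE_PASSWORD:?missing in .env}
    ZOOKEEPER_SSL_TRUSTSTORE_PASSWORD: ${ZK_TRUSTSTORE_PASSWORD:?missing in .env}
```

Docker secrets are the stronger option, but env substitution keeps the same env-var keys the images already read, as the comment suggests.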

---

Nitpick comments:
In `@docker-compose.yaml`:
- Around line 278-282: Remove the redundant depends_on entry for
zookeeper-certgen under the Kafka service: since kafka already depends on
zookeeper with condition service_healthy and the zookeeper service itself
depends on zookeeper-certgen with condition service_completed_successfully, the
explicit zookeeper-certgen dependency in Kafka is dead config; update the Kafka
service's depends_on block to only reference zookeeper (condition:
service_healthy) and remove the zookeeper-certgen entry to simplify the compose
file.
- Around line 222-224: KAFKA_OPTS currently duplicates settings already provided
via ZOOKEEPER_SERVER_CNXN_FACTORY and ZOOKEEPER_4LW_COMMANDS_WHITELIST; remove
the redundant JVM system properties
"-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory"
and "-Dzookeeper.4lw.commands.whitelist=*" from the KAFKA_OPTS value and leave
only the JAAS config property
"-Djava.security.auth.login.config=/etc/kafka/secrets/kafka_jaas.conf", keeping
ZOOKEEPER_SERVER_CNXN_FACTORY and ZOOKEEPER_4LW_COMMANDS_WHITELIST env vars as
the authoritative configuration.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/sse/sse_service.py`:
- Around line 149-154: The dict currently spreads **msg.data after the explicit
"event_type": msg.event_type which allows msg.data["event_type"] to overwrite
msg.event_type; change the merge order in the return passed to
_sse_event_adapter.validate_python so that **msg.data comes first and then
"event_type": msg.event_type (keeping execution_id and result as-is) to ensure
the explicit msg.event_type wins; update the call site where
_sse_event_adapter.validate_python is invoked (the return statement building the
dict) to reorder keys accordingly.
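The ordering rule is plain dict semantics — in a dict literal, later keys win. A minimal sketch (field names taken from the comment; payload values invented):

```python
# Putting the explicit event_type after **msg.data means a stray "event_type"
# key inside the untrusted payload can never overwrite the authoritative value.
msg_event_type = "execution.completed"                       # authoritative
msg_data = {"event_type": "stale", "result": {"ok": True}}   # Redis payload

buggy = {"event_type": msg_event_type, **msg_data}  # payload key overwrites
fixed = {**msg_data, "event_type": msg_event_type}  # explicit value wins

print(buggy["event_type"])  # stale
print(fixed["event_type"])  # execution.completed
```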


@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/app/db/repositories/saga_repository.py">

<violation number="1" location="backend/app/db/repositories/saga_repository.py:88">
P2: `atomic_cancel_saga` bypasses the revision check but doesn’t bump `revision_id`, so concurrent `save_saga()` calls can still succeed after cancellation and mutate a cancelled saga. Update the revision as part of this atomic update to preserve optimistic concurrency guarantees.</violation>
</file>
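The race can be shown with a toy in-memory model (names mirror the review comment; a plain dict stands in for MongoDB and findOneAndUpdate — this is a sketch, not the repository code):

```python
# save_saga() does an optimistic revision check, so a cancel that skips the
# revision bump lets a concurrent stale writer mutate a cancelled saga.
store = {"saga-1": {"state": "running", "revision_id": 7}}

def atomic_cancel_saga(saga_id: str, bump_revision: bool) -> None:
    doc = store[saga_id]            # atomic in this single-threaded toy
    doc["state"] = "cancelled"
    if bump_revision:
        doc["revision_id"] += 1     # the fix: invalidate in-flight writers

def save_saga(saga_id: str, fields: dict, expected_revision: int) -> bool:
    if store[saga_id]["revision_id"] != expected_revision:
        return False                # optimistic check rejects stale write
    store[saga_id] = {**fields, "revision_id": expected_revision + 1}
    return True

# A writer loaded the saga at revision 7; cancellation lands without a bump:
atomic_cancel_saga("saga-1", bump_revision=False)
bug = save_saga("saga-1", {"state": "running"}, expected_revision=7)

# Same race with the bump: the stale writer is rejected.
store["saga-1"] = {"state": "running", "revision_id": 7}
atomic_cancel_saga("saga-1", bump_revision=True)
fixed = save_saga("saga-1", {"state": "running"}, expected_revision=7)
print(bug, fixed)  # True False — the bump turns the lost update into a rejection
```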

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@sonarqubecloud

@HardMax71 HardMax71 merged commit 6200d83 into main Feb 18, 2026
23 of 24 checks passed
@HardMax71 HardMax71 deleted the fix/zookeeper-ports branch February 18, 2026 11:46