clusterd: remove unused CTP server-FQDN validation #36876

Draft. jasonhernandez wants to merge 1 commit into jason/distroless-orchestrator-fqdn from jason/distroless-ctp-rip-fqdn

@jasonhernandez (Contributor) commented Jun 2, 2026

Stacks on #36872

Base: jason/distroless-orchestrator-fqdn (#36872). Review the diff against that branch — it's just the FQDN removal; the SIGTERM handler lives in the base PR.

Removes the CTP server_fqdn handshake check that CLUSTERD_GRPC_HOST fed. Despite its name, CLUSTERD_GRPC_HOST has nothing to do with gRPC (which is now used only for persist pubsub). In the CTP handshake, clusterd advertised its FQDN and the controller compared it against the address it dialed, failing the handshake on mismatch (transport.rs::handshake). That check is:

  • optional — only fires when both sides set a value, so it's already a no-op when the host is unset;
  • narrow — guards only against reaching a misrouted/stale replica (DNS / pod-IP reuse);
  • misnamed, which is what prompted the review question on the previous PR.
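
For reviewers who have not read transport.rs, the behavior being removed can be sketched roughly as below. This is an illustration only, not the code in the diff: `FqdnCheck`, `check_server_fqdn`, and this body of `host_from_address` are hypothetical, and the sketch assumes dialed addresses look like `host:port`.

```rust
/// Outcome of the optional FQDN comparison (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
enum FqdnCheck {
    /// At least one side had no value, so nothing was compared.
    Skipped,
    /// Both sides had a value and they agreed.
    Matched,
    /// Both sides had a value and they disagreed; the handshake would fail.
    Mismatch { dialed: String, advertised: String },
}

/// Simplified stand-in for the removed helper: drops a trailing `:port`.
/// Assumption: addresses are `host:port`; bare IPv6 literals are not handled.
fn host_from_address(address: &str) -> &str {
    match address.rsplit_once(':') {
        Some((host, port)) if !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit()) => host,
        _ => address,
    }
}

/// Compares the host the controller dialed with the FQDN the server advertised.
/// The comparison only happens when both values are present.
fn check_server_fqdn(dialed_address: Option<&str>, advertised_fqdn: Option<&str>) -> FqdnCheck {
    match (dialed_address.map(host_from_address), advertised_fqdn) {
        (Some(dialed), Some(advertised)) if dialed == advertised => FqdnCheck::Matched,
        (Some(dialed), Some(advertised)) => FqdnCheck::Mismatch {
            dialed: dialed.to_string(),
            advertised: advertised.to_string(),
        },
        _ => FqdnCheck::Skipped,
    }
}
```

With CLUSTERD_GRPC_HOST unset, the advertised value is absent and the sketch always lands in `Skipped`, which is the no-op case described in the first bullet.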

The distroless migration removes entrypoint.sh, which set CLUSTERD_GRPC_HOST via hostname --fqdn. Rather than re-plumb that, this removes the feature: drops --grpc-host/CLUSTERD_GRPC_HOST, the server_fqdn field from the CTP Hello, the host_from_address helper, and the test_handshake_fqdn_mismatch test.

Notes

  • Wire change: Hello loses a field. CTP version-gates the handshake (mismatched versions fail and reconnect), so this is safe across a release boundary.
  • test_metrics byte-count bounds were loosened (the handshake shrank).
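
The version-gating argument in the first note can be illustrated with a minimal sketch. The `Hello` struct, `HandshakeError`, and `accept_hello` below are hypothetical and do not reflect the real wire format; the sketch assumes the version is compared before any other `Hello` field is interpreted.

```rust
/// Hypothetical handshake error for illustration.
#[derive(Debug, PartialEq, Eq)]
enum HandshakeError {
    VersionMismatch { ours: String, theirs: String },
}

/// Hypothetical `Hello` after this PR: no `server_fqdn`, only a version.
/// The real message may carry other fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Hello {
    version: String,
}

/// Rejects peers on a different version. A caller would drop the connection
/// and retry, so peers only exchange a full `Hello` when they agree on its layout.
fn accept_hello(our_version: &str, peer: &Hello) -> Result<(), HandshakeError> {
    if peer.version == our_version {
        Ok(())
    } else {
        Err(HandshakeError::VersionMismatch {
            ours: our_version.to_string(),
            theirs: peer.version.clone(),
        })
    }
}
```

Under that assumption, an old controller talking to a new clusterd (or the reverse) fails at the version comparison and never depends on whether `server_fqdn` is present.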

This is the "rip it out" answer to the FQDN question; it makes #36100 (in-process resolve) unnecessary.

Test plan

  • cargo check -p mz-clusterd -p mz-service -p mz-compute-client -p mz-storage-controller (rustc 1.96.0)
  • cargo test -p mz-service --test transport — green across 10 consecutive runs
  • Confirm controller↔replica CTP connects in a k8s/kind cluster

🤖 Generated with Claude Code

@jasonhernandez force-pushed the jason/distroless-ctp-rip-fqdn branch from d27b55c to 7e976bd on June 2, 2026 20:24
jasonhernandez added a commit that referenced this pull request Jun 2, 2026
Distroless containers run the binary directly as PID 1 (no tini/shell). On
Linux, PID 1 ignores signals with a SIG_DFL disposition, so SIGTERM from
Kubernetes pod termination would be silently dropped. Install an explicit
termination-signal handler in clusterd (environmentd already has one), and
derive CLUSTERD_PROCESS from the StatefulSet ordinal in-process (previously
done by the entrypoint.sh that distroless removes).
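
Deriving the process index from the StatefulSet ordinal could look roughly like the sketch below. It relies on the Kubernetes convention that StatefulSet pods are named `<statefulset-name>-<ordinal>`; the function name and the choice of `u64` are assumptions, not the code in the base PR.

```rust
/// Extracts the trailing StatefulSet ordinal from a pod name such as `web-2`.
/// Returns `None` when the name has no `-<digits>` suffix.
/// Hypothetical helper for illustration only.
fn process_index_from_pod_name(pod_name: &str) -> Option<u64> {
    let (_, ordinal) = pod_name.rsplit_once('-')?;
    // Require plain ASCII digits so inputs like "+3" are rejected.
    if ordinal.is_empty() || !ordinal.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    ordinal.parse().ok()
}
```

A caller would typically pass the pod's hostname and fall back to an explicitly configured value when this returns `None`.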

Minimal distroless-lifecycle change. #36876 stacks on this to remove the
now-unused CTP server-FQDN validation.

Part of SEC-236 distroless migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stacks on #36872 (SIGTERM handler). Removes the optional CTP `server_fqdn`
handshake check: clusterd advertised its FQDN (via CLUSTERD_GRPC_HOST, set by
the now-removed entrypoint.sh) and the controller compared it to the address
it dialed. The check only fired when the value was set, is unrelated to gRPC
despite the name, and guards only against reaching a misrouted replica.

Drops `--grpc-host`/`CLUSTERD_GRPC_HOST`, the `server_fqdn` field from the CTP
`Hello`, the `host_from_address` helper, and the `test_handshake_fqdn_mismatch`
test. CTP version-gates the handshake, so dropping the field is fine across a
release boundary.

Part of SEC-236 distroless migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jasonhernandez force-pushed the jason/distroless-ctp-rip-fqdn branch from 7e976bd to d5356e4 on June 2, 2026 20:34
@jasonhernandez changed the base branch from main to jason/distroless-orchestrator-fqdn on June 2, 2026 20:34
@jasonhernandez changed the title from "clusterd: SIGTERM handler + rip out CTP server-FQDN validation (Variant C)" to "clusterd: remove unused CTP server-FQDN validation" on Jun 2, 2026