Wait for the renewal daemon's next cycle (~2 minutes). Re-compare:
diff <(openssl x509 -in /tmp/server.crt -pubkey -noout | openssl pkey -pubin -outform DER | sha256sum) \
<(openssl pkey -in /tmp/server.key -pubout -outform DER | sha256sum)
# → DIFFER. cert.pubkey is the keypair from step 1 (cached in the daemon),
# key.pubkey is the keypair from step 4 (on disk).
The daemon's stdout reports the renewal as successful. No error, no warning.
Reproducible on Alpine 3.23 container, Linux 5.15 host
Expected Behavior
One of:
Per-renewal re-read. Before each renewal, re-read key_path from disk and use that key to sign the CSR. Slight added I/O cost per cycle, but matches the user expectation that the daemon "operates on the files at the given paths."
Detect change and exit / warn. Stat the key file at renewal time; if mtime or inode has changed since startup, log an error and exit (or refuse to renew) rather than silently producing a mismatched pair.
Documentation. If caching is intentional and considered correct behavior, the man page / docs should explicitly state: "the key file MUST NOT be modified while the daemon is running; replace it only by stopping the daemon, replacing the file, and restarting." Currently this constraint is invisible to users.
Actual Behavior
The daemon silently signs CSRs with its in-memory cached key indefinitely, producing certs that fail validation against the on-disk key for any downstream consumer.
Additional Context
step ca renew --daemon reads the certificate and private key once at process startup and re-uses the cached private key to sign the CSR for every subsequent renewal. It never re-reads the key file from disk. If the on-disk key is replaced externally (manual rotation, backup restore, GitOps deploy, a separate step ca certificate invocation, etc.) while the daemon is running, every following renewal produces a certificate whose pubkey matches the cached key, not the current on-disk key. The renewal itself reports success — exit 0, --exec hook fires, log says "certificate renewed" — and the mismatch only surfaces when a downstream consumer reloads TLS from the same files and fails with tls: private key does not match public key.
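To illustrate where the mismatch finally surfaces, here is a small sketch of a Go consumer reloading a cert/key pair with the standard library's `tls.X509KeyPair`. The helper names (`newKey`, `certPEMFor`, `keyPEMFor`, `loadPair`) are hypothetical, and the self-signed certificate is only a stand-in for one renewed by the daemon with its cached key.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// newKey generates a P-256 key for the demo.
func newKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// certPEMFor returns a PEM-encoded self-signed certificate bound to key.
func certPEMFor(key *ecdsa.PrivateKey) ([]byte, error) {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "demo.internal"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// keyPEMFor returns key as a PEM-encoded PKCS#8 private key.
func keyPEMFor(key *ecdsa.PrivateKey) ([]byte, error) {
	der, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: der}), nil
}

// loadPair mimics a consumer reloading TLS material: the certificate was
// issued for certKey, while the key file on disk holds fileKey.
func loadPair(certKey, fileKey *ecdsa.PrivateKey) error {
	certPEM, err := certPEMFor(certKey)
	if err != nil {
		return err
	}
	keyPEM, err := keyPEMFor(fileKey)
	if err != nil {
		return err
	}
	_, err = tls.X509KeyPair(certPEM, keyPEM)
	return err
}

func main() {
	cachedKey, err := newKey()
	if err != nil {
		panic(err)
	}
	diskKey, err := newKey()
	if err != nil {
		panic(err)
	}
	fmt.Println("cert and key from same keypair:", loadPair(cachedKey, cachedKey))
	fmt.Println("cert from cached key, file replaced:", loadPair(cachedKey, diskKey))
}
```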
Impact
This bit our cert-rotation pipeline (multi-cert, openbao stack). Recovery procedure for any mismatch involves regenerating the keypair via a separate provisioner-direct issuance — which, when run while the daemon is alive, creates the very mismatch we were trying to fix. The pattern is identifiable: the bad cert's pubkey is identical across multiple consecutive failed renewals even though the on-disk key has been regenerated to different values between failures. The "ghost" pubkey is the value of the key file at daemon-startup time.
Anyone running the daemon long-lived and rotating keys out-of-band — GitOps pipelines, backup/restore drills, multi-cert rotators, our own setup — is exposed.
Proposed fix
PR #1441 (currently open against issue #1343) is already extending (r *renewer).Daemon in command/ca/renew.go with a per-cycle rekeyFunc callback:
rekeyFunc fires after each successful renewal cycle and, when set by rekey --daemon, generates a fresh key. For plain renew --daemon, rekeyFunc is nil, so the existing renewal path is untouched and the cached-key bug described above persists.
The natural extension is a symmetric callback that re-loads the on-disk key before the CSR is constructed:
    // proposed extension on top of PR #1441
    func (r *renewer) Daemon(
        outFile string,
        next, expiresIn, renewPeriod time.Duration,
        afterRenew func() error,
        rekeyFunc func() error,
        reloadKeyFunc func() error, // <-- new; nil = preserve current behavior
    ) error {
        // ... existing loop ...
        for {
            select {
            case <-tickerC:
                if reloadKeyFunc != nil {
                    if err := reloadKeyFunc(); err != nil {
                        errLog.Println(err)
                        continue // skip this cycle; try again next tick
                    }
                }
                // ... existing renew code that constructs the CSR + calls /renew ...
            }
        }
    }
reloadKeyFunc would be implemented by renewCertificateAction (the renew command's setup) as something like:
    reloadKey := func() error {
        signer, err := cryptoutil.CreateSigner(kmsURI, keyFile,
            pemutil.WithFilename(keyFile),
            pemutil.WithPasswordFile(passFile),
        )
        if err != nil {
            return err
        }
        r.signer = signer // or whatever field the renewer stores the cached key in
        return nil
    }
Properties of this extension:
Backwards compatible. Callers that pass nil (or that haven't been updated for the new signature) get exactly the current behavior. No surprise regressions for users who rely on the cached-key semantics (if any do).
Both daemon modes (renew --daemon and rekey --daemon) become robust against external key changes between cycles.
Optional KMS consideration. If the key is in a KMS rather than a file, reloadKeyFunc is a no-op — the KMS handle is already a live reference, not a cached blob. Worth a sentence in the docs but no code change.
Workaround
Replace step ca renew --daemon with a bash poll loop calling one-shot step ca renew --force per cycle. Each invocation re-reads cert + key fresh from disk, sidestepping the cache. We've shipped this fix locally and it eliminates the failure class. Not a long-term answer — step ca renew --daemon should Just Work for the documented use case.
Related
smallstep/cli#1343 — rekey --daemon silent failure (sister bug). Same file, same root cause shape (key handling tied to daemon-startup state rather than per-cycle state), opposite manifestation: rekey-daemon never rotates the key when it should; renew-daemon rotates a stale cached key when it shouldn't.
Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).
Steps to Reproduce
1. Issue an initial server cert/key:
2. Start the renewal daemon. To make the cycle observable quickly, use --renew-period:
3. Wait for at least one successful renewal ("certificate renewed, next in 2m0s" in the daemon log). Confirm cert/key still match:
4. Externally replace the keypair without notifying the daemon (equivalent to a manual key rotation):
5. Confirm cert and key still match (they were just minted together):
6. Wait for the renewal daemon's next cycle (~2 minutes). Re-compare:
The daemon's stdout reports the renewal as successful. No error, no warning.
Your Environment
step-cli 0.30.2 (Linux/amd64)
step-ca 0.30.2 (admin provisioner, JWK, badger v2 db)