MIR-786

TLS cert startup race: OIDC requests fail until autocert controller finishes reconciling

Done Bug public
phinze · Opened Mar 11, 2026 · Updated Mar 12, 2026

After a server restart, TLS handshakes for domains whose http_route hasn't been reconciled yet get the self-signed fallback certificate (which only has localhost in its SANs) instead of the cached ACME cert.

Observed: On miren-garden, after a restart at ~22:13, the OIDC client tried to reach https://multipass.miren.garden before the autocert controller had reconciled that route. The fallback cert was served, and the OIDC discovery request failed with:

x509: certificate is valid for localhost, not multipass.miren.garden

Five seconds later, the cert was provisioned and subsequent requests succeeded.

Root cause: AutocertController.GetCertificate() checks isAllowedHost() before consulting the autocert manager. During startup, allowedHosts is populated incrementally as each route is reconciled. Any TLS handshake arriving before a domain's Reconcile call skips autocert entirely and gets the fallback cert — even though the real cert is sitting in the DirCache on disk from the previous run.
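The ordering problem can be reproduced in a small standard-library-only sketch. This is not the actual AutocertController code: `guardedController`, `certSource`, `newStartupDemo`, and `certFor` are hypothetical names, and the cached-cert lookup is a plain function standing in for the autocert manager and its DirCache. Only the check order (allow-list first, cert lookup second) is taken from the description above.

```go
package main

import (
	"crypto/tls"
	"sync"
)

// certSource is a hypothetical stand-in for the autocert manager: it returns
// the certificate cached for a host, or nil if there is none.
type certSource func(host string) *tls.Certificate

// guardedController models the pre-fix ordering: the allow-list is checked
// before the certificate source is consulted.
type guardedController struct {
	mu           sync.RWMutex
	allowedHosts map[string]bool
	cached       certSource
	fallback     *tls.Certificate
}

// Reconcile marks a host as allowed. At startup this happens one route at a
// time, so the allow-list is incomplete for a short window.
func (c *guardedController) Reconcile(host string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.allowedHosts[host] = true
}

func (c *guardedController) isAllowedHost(host string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.allowedHosts[host]
}

// GetCertificate matches the signature of tls.Config.GetCertificate.
func (c *guardedController) GetCertificate(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
	if !c.isAllowedHost(hello.ServerName) {
		// Host not reconciled yet: the cached cert is never looked up.
		return c.fallback, nil
	}
	if cert := c.cached(hello.ServerName); cert != nil {
		return cert, nil
	}
	return c.fallback, nil
}

// certFor is a small helper that simulates a handshake for the given SNI name.
func (c *guardedController) certFor(host string) *tls.Certificate {
	cert, _ := c.GetCertificate(&tls.ClientHelloInfo{ServerName: host})
	return cert
}

// newStartupDemo builds a controller in its just-restarted state: a cert is
// already cached for multipass.miren.garden, but no route is reconciled yet.
func newStartupDemo() (c *guardedController, cachedCert, fallbackCert *tls.Certificate) {
	cachedCert, fallbackCert = &tls.Certificate{}, &tls.Certificate{}
	c = &guardedController{
		allowedHosts: map[string]bool{},
		cached: func(host string) *tls.Certificate {
			if host == "multipass.miren.garden" {
				return cachedCert
			}
			return nil
		},
		fallback: fallbackCert,
	}
	return c, cachedCert, fallbackCert
}
```

In this model, `certFor("multipass.miren.garden")` returns the fallback certificate until `Reconcile("multipass.miren.garden")` has run, and the cached certificate afterwards, which mirrors the window observed on miren-garden.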

Fix: Remove the isAllowedHost guard in GetCertificate and always try c.mgr.GetCertificate() first. The autocert manager serves cached certs from its DirCache without consulting HostPolicy, so previously-provisioned certs are served immediately on restart. HostPolicy only gates new ACME provisioning, so unknown hosts still can't trigger cert issuance — they fall through to the fallback as before.
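A minimal sketch of the cache-first ordering, again using only the standard library. `fakeManager` is a hypothetical stand-in that imitates the behaviour described above (cache hits bypass the host policy, and the policy gates new issuance only). It is not the real autocert manager, and `cacheFirstController`, `newFixedDemo`, and `certFor` are likewise illustrative names rather than the project's code.

```go
package main

import (
	"crypto/tls"
	"errors"
)

// fakeManager imitates the manager behaviour described in the fix: cached
// certs are returned without consulting the host policy, which gates only
// new issuance. It is an assumption-laden model, not the autocert package.
type fakeManager struct {
	cache      map[string]*tls.Certificate // plays the role of the DirCache
	hostPolicy func(host string) error     // plays the role of HostPolicy
	issued     []string                    // hosts for which issuance was triggered
}

func (m *fakeManager) GetCertificate(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
	if cert, ok := m.cache[hello.ServerName]; ok {
		return cert, nil // cache hit: host policy is not consulted
	}
	if err := m.hostPolicy(hello.ServerName); err != nil {
		return nil, err // unknown host: no issuance
	}
	m.issued = append(m.issued, hello.ServerName)
	cert := &tls.Certificate{}
	m.cache[hello.ServerName] = cert
	return cert, nil
}

// cacheFirstController models the post-fix ordering: always ask the manager
// first, and use the fallback only when the manager returns an error.
type cacheFirstController struct {
	mgr      *fakeManager
	fallback *tls.Certificate
}

// GetCertificate matches the signature of tls.Config.GetCertificate.
func (c *cacheFirstController) GetCertificate(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
	if cert, err := c.mgr.GetCertificate(hello); err == nil {
		return cert, nil
	}
	return c.fallback, nil
}

// certFor is a small helper that simulates a handshake for the given SNI name.
func (c *cacheFirstController) certFor(host string) *tls.Certificate {
	cert, _ := c.GetCertificate(&tls.ClientHelloInfo{ServerName: host})
	return cert
}

// newFixedDemo builds a just-restarted controller: one cert is cached from
// the previous run and the host policy rejects every host, as it would
// before any route has been reconciled.
func newFixedDemo() (c *cacheFirstController, cachedCert, fallbackCert *tls.Certificate) {
	cachedCert, fallbackCert = &tls.Certificate{}, &tls.Certificate{}
	mgr := &fakeManager{
		cache:      map[string]*tls.Certificate{"multipass.miren.garden": cachedCert},
		hostPolicy: func(string) error { return errors.New("host not allowed") },
	}
	return &cacheFirstController{mgr: mgr, fallback: fallbackCert}, cachedCert, fallbackCert
}
```

Under these assumptions, a handshake for multipass.miren.garden gets the cached certificate immediately after restart, while a handshake for a host with no cached cert still gets the fallback and leaves `mgr.issued` empty.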

Introduced by #661.