MIR-819

`distributedrunners` labs flag breaks server startup — flannel can't connect to mTLS etcd

Done Bug public
phinze Opened Mar 13, 2026 Updated Mar 13, 2026

Summary

When distributedrunners is enabled (directly or via --labs all), the server hangs during startup and never starts the runner or HTTP ingress. The flannel subnet manager in grunge.Start() doesn't receive TLS credentials, so it can't connect to the mTLS-enabled etcd.

Root Cause

In pkg/grunge/grunge.go:242-246, Start() creates a flannel EtcdConfig with only Endpoints and Prefix — no TLS fields. Meanwhile, NewNetwork() correctly uses mTLS for its own etcd client (to write flannel config), but Start() creates a separate flannel subnet manager that connects without certs.

When distributedrunners is enabled:

  1. etcd starts with --client-cert-auth (mTLS required)
  2. grunge.NewNetwork() writes flannel config to etcd using mTLS ✓
  3. grunge.Start() → fetcd.NewLocalManager() connects without TLS ✗
  4. Flannel logs "no certificate provided: connecting to etcd with http" and hangs
  5. Runner and HTTP ingress never start (ports 443/80 never bind)
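
The mismatch in steps 1-3 can be sketched as a fail-fast check. This is only an illustration: etcdConfig is a local stand-in limited to the fields named in this issue (not the real flannel type), checkTLS and errMissingTLS are hypothetical names, and the endpoint and prefix values are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// etcdConfig is a local stand-in for flannel's EtcdConfig, limited to the
// fields named in this issue. It is NOT the real flannel type.
type etcdConfig struct {
	Endpoints []string
	Prefix    string
	Keyfile   string
	Certfile  string
	CAFile    string
}

// errMissingTLS is a hypothetical sentinel error for this sketch.
var errMissingTLS = errors.New("etcd requires client certs but flannel config has no TLS files")

// checkTLS (hypothetical) returns errMissingTLS when etcd requires client
// certificates and any of the three TLS file paths is empty.
func checkTLS(cfg etcdConfig, clientCertAuth bool) error {
	if !clientCertAuth {
		return nil
	}
	if cfg.Keyfile == "" || cfg.Certfile == "" || cfg.CAFile == "" {
		return errMissingTLS
	}
	return nil
}

func main() {
	// Roughly what Start() builds according to this issue:
	// Endpoints and Prefix only, no TLS fields.
	cfg := etcdConfig{
		Endpoints: []string{"https://127.0.0.1:2379"}, // placeholder endpoint
		Prefix:    "/example/network",                 // placeholder prefix
	}

	fmt.Println("mTLS required:", checkTLS(cfg, true))
	fmt.Println("mTLS not required:", checkTLS(cfg, false))
}
```

A check along these lines would turn the silent hang into an immediate startup error; whether it belongs in grunge.Start() or in the caller is a design choice this issue does not settle.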

Observed on miren-garden

Reproduced by setting MIREN_LABS=all and restarting the service. The server appeared to start (the coordinator and controllers all came up), but ingress never listened. Reverting to MIREN_LABS=adminapi,routeoidc fixed it immediately, since etcd then runs without mTLS.

Fix

grunge.Start() needs to pass TLS credentials to flannel's EtcdConfig. The Network struct already has TLSCert, TLSKey, TLSCACert fields populated by the caller. Flannel's EtcdConfig takes file paths (Keyfile, Certfile, CAFile), so the PEM bytes need to be written to temp files and passed through.

Key Files

  • pkg/grunge/grunge.go:242-246 — missing TLS passthrough
  • cli/commands/server.go:275 — distributedrunners gates etcd mTLS
  • cli/commands/server.go:676-679 — TLS certs are set on NetworkOptions but unused by Start()