MIR-968

Harden embedded etcd: freelist type, automated defrag, monitoring

Done public

phinze Opened Apr 3, 2026 Updated Apr 6, 2026

Harden embedded etcd against freelist bloat

Follow-up from MIR-967 (production outage caused by wedged etcd). Because defrag never ran, the embedded etcd's BoltDB file grew to 627 MB with only 53 MB of live data (roughly 12x bloat). The freelist eventually became so large that write operations stalled, which took down the cluster.

Changes

1. Switch BoltDB freelist type to `map`

Add `--experimental-backend-bbolt-freelist-type=map` to the etcd container args in components/etcd/etcd.go. The default `array` freelist does O(n) page allocation in the number of free pages; `map` uses a hashmap for O(1). This does not shrink the file, but it directly mitigates the allocation cost of a bloated freelist, which is what caused the outage. Available since etcd 3.4.9, stable and widely used despite the `experimental-` prefix.
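A minimal sketch of the args change, not the actual contents of components/etcd/etcd.go. The helper name `etcdArgs` and the base args in `main` are assumptions for illustration; only the freelist flag itself comes from this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// freelistFlagName is the etcd flag named in this issue.
const freelistFlagName = "--experimental-backend-bbolt-freelist-type"

// etcdArgs is a hypothetical helper. It returns a copy of base with the
// freelist flag set to map, leaving any existing freelist setting untouched.
func etcdArgs(base []string) []string {
	args := append([]string{}, base...)
	for _, a := range base {
		if strings.HasPrefix(a, freelistFlagName) {
			return args
		}
	}
	return append(args, freelistFlagName+"=map")
}

func main() {
	// Placeholder base args; the real list lives in components/etcd/etcd.go.
	base := []string{
		"--data-dir=/var/lib/etcd",
		"--auto-compaction-mode=periodic",
		"--auto-compaction-retention=1h",
	}
	for _, a := range etcdArgs(base) {
		fmt.Println(a)
	}
}
```

Skipping the append when a freelist type is already set keeps an explicit operator override from being silently replaced.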

2. Automated defrag

Compaction (already configured: periodic/1h) removes old revisions, but the freed pages stay inside the BoltDB file until an explicit defrag, so the file never shrinks on its own. Add a periodic check that triggers defrag when `dbSize > 2 * dbSizeInUse`. This adapts to the cluster's workload: heavy-write clusters defrag more often, quiet ones less.

Considerations:

- Defrag briefly blocks the etcd server (sub-second for typical DB sizes).
- For single-node embedded etcd this means a brief unavailability window.
- Should log when defrag runs and how much space was reclaimed.

3. etcd health monitoring

Track and expose key etcd health metrics:

- db_size (total file size)
- db_size_in_use (live data)
- Bloat ratio (db_size / db_size_in_use)
- Backend commit duration

At minimum, log warnings when the bloat ratio exceeds thresholds. Ideally expose via the metrics endpoint.
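A rough sketch of the bloat-ratio warning. The threshold values and the names `bloatRatio` and `bloatLevel` are placeholders; this issue does not specify thresholds. If etcd's Prometheus metrics are reachable in the embedded setup, the inputs likely correspond to `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`, with commit latency in `etcd_disk_backend_commit_duration_seconds`, but that mapping is an assumption to verify.

```go
package main

import "fmt"

// Placeholder thresholds; the issue leaves the actual values open.
const (
	warnRatio     = 2.0
	criticalRatio = 5.0
)

// bloatRatio returns db_size / db_size_in_use.
// It returns 0 when db_size_in_use is not positive (ratio undefined).
func bloatRatio(dbSize, dbSizeInUse int64) float64 {
	if dbSizeInUse <= 0 {
		return 0
	}
	return float64(dbSize) / float64(dbSizeInUse)
}

// bloatLevel maps a ratio to a log severity.
func bloatLevel(ratio float64) string {
	switch {
	case ratio >= criticalRatio:
		return "critical"
	case ratio >= warnRatio:
		return "warn"
	default:
		return "ok"
	}
}

func main() {
	// Sizes in MB from the MIR-967 outage.
	r := bloatRatio(627, 53)
	fmt.Printf("bloat ratio %.2f -> %s\n", r, bloatLevel(r))
	// prints "bloat ratio 11.83 -> critical"
}
```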

Context