Harden embedded etcd: freelist type, automated defrag, monitoring
Follow-up from MIR-967 (production outage caused by a wedged etcd). The embedded etcd's BoltDB file grew to 627 MB with only 53 MB of live data because defrag never ran. The freelist eventually became so large that write operations stalled, which took down the cluster.
Changes
1. Switch BoltDB freelist type to map
Add --experimental-backend-bbolt-freelist-type=map to the etcd container args in components/etcd/etcd.go. The default array freelist does O(n) page allocation; map uses a hashmap for O(1). This directly mitigates the freelist bloat issue that caused the outage. Available since etcd 3.4.9, stable and widely used despite the experimental- prefix.
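A minimal sketch of the args change, assuming the container args are built as a plain []string. The helper name withMapFreelist is hypothetical and not taken from components/etcd/etcd.go; only the flag itself comes from this issue.

```go
package main

import "strings"

// freelistFlag is the etcd flag proposed in this issue.
const freelistFlag = "--experimental-backend-bbolt-freelist-type=map"

// withMapFreelist (hypothetical helper) returns args with the freelist flag
// appended. If a freelist type is already set, args is returned unchanged so
// an explicit override is never duplicated. The input slice is not modified.
func withMapFreelist(args []string) []string {
	const prefix = "--experimental-backend-bbolt-freelist-type"
	for _, a := range args {
		if strings.HasPrefix(a, prefix) {
			return args
		}
	}
	out := make([]string, 0, len(args)+1)
	out = append(out, args...)
	return append(out, freelistFlag)
}
```

Calling withMapFreelist on the existing arg list before it is handed to the container spec keeps the change to a single line at the call site.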
2. Automated defrag
Compaction (already configured: periodic/1h) marks old revisions as deleted, but the freed pages stay inside the BoltDB file until an explicit defrag rewrites it. Add a periodic check that triggers defrag when dbSize > 2 * dbSizeInUse. Because the trigger is a ratio rather than a fixed schedule, it adapts to the workload: heavy-write clusters defrag more often, quiet ones less.
Considerations:
- Defrag briefly blocks the etcd server (sub-second for typical DB sizes)
- For single-node embedded etcd this means a brief unavailability window
- Should log when defrag runs and how much space was reclaimed
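The check and the logging requirement above could be sketched roughly as follows. The maintainer interface and dbStatus type are hypothetical stand-ins. In practice they would likely wrap the etcd clientv3 Maintenance calls Status and Defragment, whose status response reports DbSize and DbSizeInUse, but that wiring is an assumption and is not shown here.

```go
package main

import (
	"context"
	"fmt"
	"log"
)

// dbStatus carries the two sizes the check needs, in bytes.
type dbStatus struct {
	DBSize      int64 // total BoltDB file size
	DBSizeInUse int64 // live data
}

// maintainer is a hypothetical abstraction over the etcd maintenance API.
type maintainer interface {
	Status(ctx context.Context) (dbStatus, error)
	Defragment(ctx context.Context) error
}

// shouldDefrag implements the trigger from this issue: dbSize > 2 * dbSizeInUse.
func shouldDefrag(s dbStatus) bool {
	return s.DBSize > 2*s.DBSizeInUse
}

// maybeDefrag runs one check. It defragments only when the trigger fires,
// logs the before/after sizes, and returns the number of bytes reclaimed.
func maybeDefrag(ctx context.Context, m maintainer) (int64, error) {
	before, err := m.Status(ctx)
	if err != nil {
		return 0, fmt.Errorf("etcd status: %w", err)
	}
	if !shouldDefrag(before) {
		return 0, nil
	}
	if err := m.Defragment(ctx); err != nil {
		return 0, fmt.Errorf("etcd defrag: %w", err)
	}
	after, err := m.Status(ctx)
	if err != nil {
		return 0, fmt.Errorf("etcd status after defrag: %w", err)
	}
	reclaimed := before.DBSize - after.DBSize
	log.Printf("etcd defrag: db size %d -> %d bytes, reclaimed %d",
		before.DBSize, after.DBSize, reclaimed)
	return reclaimed, nil
}
```

A caller would invoke maybeDefrag from a ticker loop. The check interval is left open here because the issue does not specify one.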
3. etcd health monitoring
Track and expose key etcd health metrics:
- db_size (total file size)
- db_size_in_use (live data)
- Bloat ratio (db_size / db_size_in_use)
- Backend commit duration
At minimum, log warnings when the bloat ratio exceeds thresholds. Ideally expose via the metrics endpoint.
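As an illustration of the bloat-ratio warning, here is a small sketch. The threshold values and function names are assumptions chosen for the example, not agreed defaults. For reference, the MIR-967 numbers give 627 / 53 ≈ 11.8. etcd itself already exports the underlying values as Prometheus metrics (etcd_mvcc_db_total_size_in_bytes, etcd_mvcc_db_total_size_in_use_in_bytes, etcd_disk_backend_commit_duration_seconds), which may be the simplest source for these numbers.

```go
package main

import "fmt"

// Hypothetical thresholds; the issue does not fix concrete values.
const (
	warnBloatRatio     = 2.0
	criticalBloatRatio = 5.0
)

// bloatRatio returns db_size / db_size_in_use, or 0 when the in-use size
// is not yet known (zero or negative).
func bloatRatio(dbSize, dbSizeInUse int64) float64 {
	if dbSizeInUse <= 0 {
		return 0
	}
	return float64(dbSize) / float64(dbSizeInUse)
}

// bloatWarning returns a log message when the ratio crosses a threshold,
// or an empty string when the database is healthy.
func bloatWarning(dbSize, dbSizeInUse int64) string {
	r := bloatRatio(dbSize, dbSizeInUse)
	switch {
	case r >= criticalBloatRatio:
		return fmt.Sprintf("etcd bloat ratio critical: %.1f (db_size=%d, db_size_in_use=%d)",
			r, dbSize, dbSizeInUse)
	case r >= warnBloatRatio:
		return fmt.Sprintf("etcd bloat ratio high: %.1f (db_size=%d, db_size_in_use=%d)",
			r, dbSize, dbSizeInUse)
	}
	return ""
}
```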
Context
- etcd is the primary data store for all entity metadata
- We manage it as a component in components/etcd/etcd.go
- Clusters will vary widely in size and write rates, so defaults should be broadly reasonable
- The freelist type change is the highest-impact, lowest-risk item