Overlay IP allocator assigns duplicate IPs to concurrent sandboxes
Bug
The overlay network IP allocator assigned the same IP address (10.8.95.4) to two different running sandboxes, causing cross-app traffic routing. Requests to meet.miren.garden sporadically returned the websocket-echo test page instead.
This is a data-integrity and security issue: traffic intended for one application was delivered to a completely different application.
Timeline
All times UTC, 2026-03-27. Coordinator: miren-garden (main:1226e9a then dev build).
16:05:05 — websocket-echo sandbox CZTsQRFt created, assigned IP 10.8.95.4:3000
dns added sandbox to DNS mapping │ sandbox: websocket-echo-web-CZTsQRFt app: websocket-echo ip: 10.8.95.4
16:37–16:40 — websocket-echo sandbox survived two miren restarts (KillMode=process preserves shims), recovered each time on same IP, confirmed running:
coordinator.activator recovered sandbox │ app: websocket-echo sandbox: CZTsQRFt url: http://10.8.95.4:3000
16:55:00 — New meet deploy, meet sandbox CZTwDFSB created and scheduled to node/miren:
coordinator.sandboxpool created sandbox │ sandbox: meet-web-CZTwDFSB
coordinator.scheduler assigning sandbox to node │ sandbox: meet-web-CZTwDFSB node: node/miren
16:55:01 — Meet sandbox assigned the same IP 10.8.95.4, container starts:
dns added sandbox to DNS mapping │ sandbox: meet-web-CZTwDFSB app: meet ip: 10.8.95.4
runner.sandbox container started │ id: sandbox.meet-web-CZTwDFSB-app
16:55:01 onward — Both sandboxes running on 10.8.95.4:3000. HTTP ingress routes meet.miren.garden to app/meet correctly, but the IP resolves to either sandbox's container depending on timing. Operator observed websocket-echo responses on meet.miren.garden.
16:56:46 — Connection resets as the two containers fight over the same IP:
coordinator.httpingress proxy error │ error: "write tcp 10.8.95.1:37232->10.8.95.4:3000: connection reset by peer" app: meet
Context
This occurred after multiple server restarts during MIR-890 dogfooding. The crash/restart churn may have caused the IP allocator to lose track of which IPs were in use. The websocket-echo sandbox survived restarts via containerd shim preservation (KillMode=process), but the IP allocator may not have accounted for these surviving sandboxes when assigning IPs to new sandboxes.
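If the allocator did lose track of surviving sandboxes, a consistency check over the set of known sandboxes after recovery would surface the collision early. The following is a minimal sketch of such a check, not the coordinator's actual code: the `Sandbox` type, `findDuplicateIPs`, and `reportedSandboxes` are hypothetical names introduced for illustration, and the sample data is taken from the timeline above.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// Sandbox is a hypothetical, minimal view of a running sandbox.
type Sandbox struct {
	ID string
	IP netip.Addr
}

// findDuplicateIPs returns every IP claimed by more than one sandbox,
// mapped to the sorted IDs of the sandboxes claiming it.
func findDuplicateIPs(sandboxes []Sandbox) map[netip.Addr][]string {
	owners := make(map[netip.Addr][]string)
	for _, sb := range sandboxes {
		owners[sb.IP] = append(owners[sb.IP], sb.ID)
	}
	dups := make(map[netip.Addr][]string)
	for ip, ids := range owners {
		if len(ids) > 1 {
			sort.Strings(ids)
			dups[ip] = ids
		}
	}
	return dups
}

// reportedSandboxes returns the two sandboxes from this report,
// both of which were given 10.8.95.4.
func reportedSandboxes() []Sandbox {
	ip := netip.MustParseAddr("10.8.95.4")
	return []Sandbox{
		{ID: "websocket-echo-web-CZTsQRFt", IP: ip},
		{ID: "meet-web-CZTwDFSB", IP: ip},
	}
}

func main() {
	for ip, ids := range findDuplicateIPs(reportedSandboxes()) {
		fmt.Printf("duplicate IP %s claimed by %v\n", ip, ids)
	}
}
```

A check like this only detects the problem after the fact; it does not replace an allocator that refuses duplicates in the first place.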
Impact
- Cross-application traffic routing (security issue)
- Application instability (connection resets)
- Operator confusion (wrong app responding)
Expected behavior
The IP allocator must never assign an IP that is currently in use by a running sandbox. It should do at least one of the following:
- Track all allocated IPs in etcd and refuse duplicates
- Check containerd for running containers before allocating
- Use a lease-based allocation that's tied to sandbox lifecycle
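The options above could be combined roughly as sketched below. This is an in-memory illustration under stated assumptions, not the existing allocator: `Allocator`, `Reserve`, `Allocate`, `Release`, and `newDemoAllocator` are hypothetical names, the two-address pool is made up for the demo, and a real implementation would need to persist leases (for example in etcd) so they survive coordinator restarts.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"sync"
)

var (
	ErrInUse     = errors.New("ip already leased to another sandbox")
	ErrExhausted = errors.New("no free ip in pool")
)

// Allocator is a hypothetical in-memory lease table keyed by IP.
type Allocator struct {
	mu     sync.Mutex
	pool   []netip.Addr          // candidate addresses, in allocation order
	leases map[netip.Addr]string // ip -> ID of the sandbox holding the lease
}

func NewAllocator(pool []netip.Addr) *Allocator {
	return &Allocator{pool: pool, leases: make(map[netip.Addr]string)}
}

// Reserve records that a recovered sandbox already holds ip. It is
// idempotent for the same owner and fails if another sandbox holds it.
func (a *Allocator) Reserve(ip netip.Addr, sandboxID string) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if owner, ok := a.leases[ip]; ok && owner != sandboxID {
		return fmt.Errorf("%w: %s held by %s", ErrInUse, ip, owner)
	}
	a.leases[ip] = sandboxID
	return nil
}

// Allocate leases the first pool address that has no current lease.
func (a *Allocator) Allocate(sandboxID string) (netip.Addr, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for _, ip := range a.pool {
		if _, taken := a.leases[ip]; !taken {
			a.leases[ip] = sandboxID
			return ip, nil
		}
	}
	return netip.Addr{}, ErrExhausted
}

// Release drops the lease only if sandboxID is the current holder.
func (a *Allocator) Release(ip netip.Addr, sandboxID string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.leases[ip] == sandboxID {
		delete(a.leases, ip)
	}
}

// newDemoAllocator models a restart: the surviving websocket-echo
// sandbox is re-registered before any new allocation is served.
func newDemoAllocator() *Allocator {
	a := NewAllocator([]netip.Addr{
		netip.MustParseAddr("10.8.95.4"),
		netip.MustParseAddr("10.8.95.5"),
	})
	if err := a.Reserve(netip.MustParseAddr("10.8.95.4"), "websocket-echo-web-CZTsQRFt"); err != nil {
		panic(err)
	}
	return a
}

func main() {
	a := newDemoAllocator()
	ip, err := a.Allocate("meet-web-CZTwDFSB")
	fmt.Println(ip, err) // prints "10.8.95.5 <nil>"
}
```

The key point is ordering: recovered sandboxes are reserved before `Allocate` is ever called, so a new sandbox cannot be handed an address that a surviving container still owns. Tying `Release` to the holder's ID keeps a stale cleanup for one sandbox from freeing an address that has since been leased to another.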