MIR-1245

Disk lease not released when owning sandbox dies, blocking restarts for ~lease-timeout

Done runtime Bug public
phinze Opened Jun 22, 2026 Updated Jun 23, 2026

Observed on club while debugging jobsv2 (koikonom/jobsv2), a fixed-scale-1 app with a single-writer persistent disk (jobs-data, 8GB, mounted at /data/, holding a DuckDB file at /data/merged.duckdb).

When a sandbox holding a single-writer disk lease is marked DEAD (here: a failed 15s port health check), its disk lease stays in status.bound instead of releasing. Every subsequent boot attempt then fails volume config with "disk … has an active lease (status.bound) for sandbox …" and is itself marked DEAD. The lease only frees on timeout, which I observed taking ~66 minutes (bound ~14:07, released ~15:14).

This turns a quick app misconfiguration (binding the wrong port) into a ~1-hour wedge plus a graveyard of dozens of DEAD jobsv2-web-* sandboxes, and it's self-perpetuating: when the stale lease finally frees, the replacement grabs it, dies the same way, and re-locks the disk.

Expected: when a sandbox transitions to a terminal/DEAD state, its disk leases should release promptly (or be preemptable by a newer sandbox for the same service), so a replacement can boot without waiting out the full lease timeout.

Evidence: journalctl -u miren.service on miren-club, filtered with grep for disk-CcAtEJ8SEjy8dKKYLgyih. Key lines: sandbox boot failed, marking DEAD … has an active lease (status.bound); lease disk-lease-CcCkP6Z… bound 14:07, released 15:14.