Sandbox pool: add backoff for repeated sandbox failures
Problem
When a sandbox crashes immediately after creation, the pool controller creates a replacement with no delay. If the underlying cause persists (e.g. a corrupted data volume or missing config), this produces rapid accumulation of dead sandboxes and constant churn.
On James's server, postgres couldn't start due to corrupted WAL data. The pool controller created 15+ dead postgres sandboxes in rapid succession, each dying immediately, with new ones spawning as fast as old ones were cleaned up.
The activator's fail-fast check already tracks dead sandbox counts per pool (visible in logs as has_pending_or_running: false with growing dead counts), but this information isn't used to slow down creation.
Expected Behavior
When sandboxes in a pool are repeatedly failing:
- Apply exponential backoff to sandbox creation after consecutive failures
- Cap the number of dead sandboxes that can accumulate before pausing creation
- Transition the pool to a "failing" state that is visible to users (see MIR-{doctor issue})
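The backoff and cap behavior above could be tracked per pool with a small piece of state. This is a minimal sketch, not the actual controller code; all names (`PoolBackoff`, `record_failure`, `may_create`) and the default thresholds are hypothetical:

```python
class PoolBackoff:
    """Hypothetical per-pool creation backoff state (names and defaults
    are illustrative, not the controller's real API)."""

    def __init__(self, base_delay=1.0, max_delay=300.0, max_dead=10):
        self.base_delay = base_delay          # first retry delay, seconds
        self.max_delay = max_delay            # backoff ceiling, seconds
        self.max_dead = max_dead              # dead-sandbox cap before pausing
        self.consecutive_failures = 0

    def record_failure(self):
        self.consecutive_failures += 1

    def record_success(self):
        # Any sandbox that comes up healthy resets the backoff.
        self.consecutive_failures = 0

    def next_delay(self):
        # Exponential backoff: base, 2x, 4x, ... capped at max_delay.
        return min(self.base_delay * 2 ** self.consecutive_failures,
                   self.max_delay)

    def may_create(self, dead_count):
        # Once too many dead sandboxes accumulate, pause creation entirely;
        # the pool would transition to the visible "failing" state instead.
        return dead_count < self.max_dead
```

A jittered delay (randomizing around `next_delay()`) would avoid synchronized retries across pools, but the core shape is the counter plus the cap.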
Observed Log Pattern
coordinator.activator fail-fast check │ app: app/gleester ...
sandboxes: "[...5 dead sandboxes...]"
has_pending_or_running: false
increment_pool: true
sandbox_count_before: 5
The controller sees 5 dead sandboxes and no pending or running ones, yet still decides to increment the pool.
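The fail-fast check already has the inputs it needs (`sandbox_count_before` and `has_pending_or_running`); the fix is to consult them before incrementing. A hedged sketch of what that decision could look like, with hypothetical function and parameter names:

```python
def creation_decision(dead_count, has_pending_or_running,
                      consecutive_failures,
                      base_delay=1.0, max_delay=300.0, max_dead=10):
    """Illustrative decision logic: gate pool increments on accumulated
    dead sandboxes instead of incrementing unconditionally."""
    if has_pending_or_running:
        # An existing sandbox may still come up; no new one needed yet.
        return ("skip", None)
    if dead_count >= max_dead:
        # Cap reached: stop creating and surface the "failing" state.
        return ("pause", None)
    # Otherwise create, but only after an exponential backoff delay.
    delay = min(base_delay * 2 ** consecutive_failures, max_delay)
    return ("create_after", delay)
```

In the logged scenario (5 dead, nothing pending or running, 5 consecutive failures) this would still allow creation, but only after a multi-second delay, and it would pause outright before the dead count grew unbounded.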