MIR-905

Add `miren app restart` command to reset crash cooldown

Done public
phinze · Opened Mar 27, 2026 · Updated Mar 27, 2026

Problem

After a server restart or crash recovery, apps enter crash cooldown with exponential backoff (e.g. "application in crash cooldown until 2026-03-27T15:38:37Z, consecutive crashes: 8"). There's no way to tell the coordinator "the underlying issue is resolved, try again now" — you just have to wait for the cooldown timer to expire.

This came up during MIR-890 when a dev build regression crashed Garden. After rolling back to a healthy build, all apps were stuck in crash cooldown for ~15 minutes even though the server was perfectly healthy.

Proposal

Add `miren app restart <app-name>`, which:

  • Resets the crash counter and cooldown timer for the app
  • Triggers an immediate reconciliation attempt
  • Optionally restarts running sandboxes (`--force` kills and recreates them)

Without `--force`, it only clears the backoff so the normal reconciliation loop retries immediately.

Context

Discovered during MIR-890 dogfooding. Any time the server restarts ungracefully, operators currently have no recourse but to wait out the cooldown.