Deploy doesn't stop old stateful sandbox before starting new one (flock conflict)
Summary
When deploying a new version of an app with a `provider = "local"` disk, the old version's stateful sandbox keeps running while the new version's sandbox tries to start. Both mount the same local disk, causing flock conflicts.
Observed behavior
Deploying victoriametrics (which uses a local disk with flock):

- Old sandbox `victoriametrics-vCZbsR9x` (previous version) stayed running
- New sandbox `victoriametrics-vCZfG64h` tried to start and panicked: `FATAL: cannot acquire lock on file "/miren/data/local/victoria-metrics-data/flock.lock": resource temporarily unavailable; make sure a single process has exclusive access`
- New sandboxes crash-looped 3 times before the old sandbox was eventually stopped
- Once the old sandbox went dead, the new sandbox started successfully
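The lock contention above can be reproduced in miniature. This is a minimal sketch (assuming the usual `flock(2)` semantics behind the panic, not anything specific to the deploy system): two open file descriptions take an exclusive flock on the same lock file, standing in for the old and new sandboxes sharing one local disk. The second non-blocking attempt fails with EAGAIN, the errno behind "resource temporarily unavailable".

```python
import fcntl
import os
import tempfile

# Stand-in for flock.lock on the shared local disk.
lock_path = os.path.join(tempfile.mkdtemp(), "flock.lock")

def try_exclusive_lock(fd: int) -> bool:
    """Attempt a non-blocking exclusive flock; False means EAGAIN."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False

fd_old = os.open(lock_path, os.O_CREAT | os.O_RDWR)  # "old sandbox"
fd_new = os.open(lock_path, os.O_CREAT | os.O_RDWR)  # "new sandbox"

old_holds = try_exclusive_lock(fd_old)           # old version owns the lock
new_blocked = not try_exclusive_lock(fd_new)     # new version gets EAGAIN

fcntl.flock(fd_old, fcntl.LOCK_UN)               # "stopping" the old sandbox
new_after_release = try_exclusive_lock(fd_new)   # now the new one can start
```

Because the new sandbox can only proceed once the old one releases the lock, each crash-loop attempt before the old sandbox is stopped is guaranteed to fail.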
Expected behavior
For stateful services with local disks, the deploy should stop the old version's sandbox before starting the new one — a rolling deploy isn't possible when they share exclusive access to a local disk.
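The expected ordering can be sketched as a branch in the deploy logic. `Sandbox`, `deploy`, and the `uses_exclusive_local_disk` flag are hypothetical names for illustration, not the actual orchestrator API; the point is only the ordering: stop-then-start when the versions share an exclusively locked local disk, rolling otherwise.

```python
from dataclasses import dataclass
from typing import List

# Records the order of lifecycle operations so the ordering is observable.
events: List[str] = []

@dataclass
class Sandbox:
    name: str

    def start(self) -> None:
        events.append(f"start {self.name}")

    def stop(self) -> None:
        events.append(f"stop {self.name}")

def deploy(old: Sandbox, new: Sandbox, uses_exclusive_local_disk: bool) -> None:
    if uses_exclusive_local_disk:
        # Release the flock before the new version tries to acquire it.
        old.stop()
        new.start()
    else:
        # Rolling deploy: brief overlap of old and new is acceptable.
        new.start()
        old.stop()

deploy(Sandbox("victoriametrics-old"), Sandbox("victoriametrics-new"),
       uses_exclusive_local_disk=True)
```

The trade-off is availability: stop-then-start means a short window with no running instance, which is unavoidable when only one process may hold the disk's exclusive lock.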
Environment
- Cluster: Garden
- App: victoriametrics (`[[services.victoriametrics.disks]]` with `provider = "local"`)