MIR-596

Simplify activator cache architecture to reduce consistency bugs

Open public
phinze Opened Dec 18, 2025 Updated Apr 21, 2026

Problem

The activator maintains three separate caches that overlap in confusing ways:

  1. versions (map[verKey]*versionPoolRef) - maps version+service → pool reference
  2. pools (map[verKey]*poolState) - maps version+service → pool state with sentinel pattern
  3. poolSandboxes (map[entity.Id]*poolSandboxes) - maps pool ID → sandboxes

These caches duplicate data:

  • Pool entity is stored in both pools[key].pool and poolSandboxes[poolID].pool
  • Strategy is stored in both versions[key].strategy and poolSandboxes[poolID].strategy
  • Service is duplicated across caches

All three are guarded by a single RWMutex, so the separation provides no concurrency benefit. It only adds cognitive overhead and invites consistency bugs when the caches get out of sync.
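To make the drift concrete, here is a rough sketch of the current layout. It is not the real activator code: entityID stands in for entity.Id, the field layouts of versionPoolRef, poolState and poolSandboxes are guesses based on the description above, and forgetPoolSandboxes and staleKeys are hypothetical helpers that exist only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// All types below are simplified stand-ins; field layouts are assumptions.
type entityID string // stands in for entity.Id

type verKey struct{ version, service string }

type pool struct{ id entityID }

type versionPoolRef struct {
	poolID   entityID
	strategy string // duplicated in poolSandboxes
}

type poolState struct{ pool *pool } // pool duplicated in poolSandboxes

type poolSandboxes struct {
	pool      *pool
	strategy  string
	sandboxes []string
}

type activator struct {
	mu            sync.RWMutex // one lock guards all three maps
	versions      map[verKey]*versionPoolRef
	pools         map[verKey]*poolState
	poolSandboxes map[entityID]*poolSandboxes
}

// newExample seeds all three caches with one consistent pool.
func newExample(key verKey, p *pool, strategy string) *activator {
	return &activator{
		versions:      map[verKey]*versionPoolRef{key: {poolID: p.id, strategy: strategy}},
		pools:         map[verKey]*poolState{key: {pool: p}},
		poolSandboxes: map[entityID]*poolSandboxes{p.id: {pool: p, strategy: strategy}},
	}
}

// forgetPoolSandboxes (hypothetical) cleans up only one of the three caches,
// which is the kind of partial cleanup that lets them drift apart.
func (a *activator) forgetPoolSandboxes(id entityID) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.poolSandboxes, id)
}

// staleKeys (hypothetical) reports version keys whose pool no longer has a
// poolSandboxes entry.
func (a *activator) staleKeys() []verKey {
	a.mu.RLock()
	defer a.mu.RUnlock()
	var stale []verKey
	for k, ref := range a.versions {
		if _, ok := a.poolSandboxes[ref.poolID]; !ok {
			stale = append(stale, k)
		}
	}
	return stale
}

func main() {
	key := verKey{version: "v1", service: "web"}
	a := newExample(key, &pool{id: "pool-1"}, "auto")
	a.forgetPoolSandboxes("pool-1")
	fmt.Println("stale version keys:", a.staleKeys())
}
```

Every code path that removes a pool has to remember all three maps. Missing one leaves entries like the stale key above.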

We recently hit a production issue where a deleted pool remained in the caches, causing "pool has reached maximum size" errors. This was fixed in https://github.com/mirendev/runtime/pull/498 by adding a watchPools goroutine, but that is a band-aid on a fundamentally fragile architecture.

Suggested Direction

Consolidate to a two-cache model:

versionToPool map[verKey]entity.Id    // Index for hot path lookup
pools map[entity.Id]*poolState        // Single source of truth for pool data

Where poolState contains everything:

  • Pool entity + revision
  • Sandboxes list
  • Strategy
  • Sentinel pattern fields (inProgress, done, err)
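As a starting point for discussion, the consolidated types might look roughly like this. Only versionToPool, pools, poolState and the sentinel field names (inProgress, done, err) come from this issue. entityID stands in for entity.Id, and the pool type, the field types, and the newActivator and register helpers are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

type entityID string // stands in for entity.Id

type verKey struct{ version, service string }

// pool is a placeholder for the real pool entity.
type pool struct{ id entityID }

// poolState is the single record for everything known about a pool.
// Field types are assumptions.
type poolState struct {
	pool      *pool
	revision  int64
	sandboxes []string
	strategy  string

	// Sentinel pattern fields.
	inProgress bool
	done       chan struct{}
	err        error
}

type activator struct {
	mu            sync.RWMutex
	versionToPool map[verKey]entityID     // routing index only
	pools         map[entityID]*poolState // single source of truth
}

func newActivator() *activator {
	return &activator{
		versionToPool: make(map[verKey]entityID),
		pools:         make(map[entityID]*poolState),
	}
}

// register (hypothetical) records a fully created pool and routes key to it.
func (a *activator) register(key verKey, p *pool, revision int64, strategy string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	done := make(chan struct{})
	close(done) // creation already finished, so waiters never block
	a.pools[p.id] = &poolState{pool: p, revision: revision, strategy: strategy, done: done}
	a.versionToPool[key] = p.id
}

func main() {
	a := newActivator()
	key := verKey{version: "v1", service: "web"}
	a.register(key, &pool{id: "pool-1"}, 7, "auto")
	fmt.Println("routes to:", a.versionToPool[key])
}
```

The pool entity and strategy now have exactly one home, and the index stores only an ID.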

Benefits:

  • Pool data lives in one place - no sync bugs between pools and poolSandboxes
  • versionToPool is just a routing index, not duplicated data
  • Cleanup on pool deletion is straightforward: delete the entry from pools, then scan versionToPool and remove any keys that still reference the deleted pool
  • Hot path (AcquireLease) remains two map lookups under RLock
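A minimal sketch of the deletion cleanup and the hot-path read under this model. The lookup and removePool names are hypothetical (the real hot path is AcquireLease, which does more than a lookup), entityID stands in for entity.Id, and poolState is trimmed to the fields the example needs.

```go
package main

import (
	"fmt"
	"sync"
)

type entityID string // stands in for entity.Id

type verKey struct{ version, service string }

// poolState is trimmed down for this example.
type poolState struct {
	id       entityID
	strategy string
}

type activator struct {
	mu            sync.RWMutex
	versionToPool map[verKey]entityID
	pools         map[entityID]*poolState
}

// lookup (hypothetical) is the hot-path read: index lookup, then state
// lookup, both under the read lock.
func (a *activator) lookup(key verKey) (*poolState, bool) {
	a.mu.RLock()
	defer a.mu.RUnlock()
	id, ok := a.versionToPool[key]
	if !ok {
		return nil, false
	}
	st, ok := a.pools[id]
	return st, ok
}

// removePool (hypothetical) deletes the pool and every index entry that
// still points at it. It returns the number of index entries removed.
func (a *activator) removePool(id entityID) int {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.pools, id)
	removed := 0
	for k, v := range a.versionToPool {
		if v == id {
			delete(a.versionToPool, k) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	k1 := verKey{version: "v1", service: "web"}
	k2 := verKey{version: "v2", service: "web"}
	a := &activator{
		versionToPool: map[verKey]entityID{k1: "pool-1", k2: "pool-2"},
		pools: map[entityID]*poolState{
			"pool-1": {id: "pool-1", strategy: "auto"},
			"pool-2": {id: "pool-2", strategy: "fixed"},
		},
	}
	fmt.Println("index entries removed:", a.removePool("pool-1"))
	_, ok := a.lookup(k1)
	fmt.Println("k1 still resolves:", ok)
}
```

The scan is linear in the number of version keys, which should be acceptable if pool deletion is rare compared to lease acquisition. A reverse index from pool ID to keys would avoid the scan, at the cost of reintroducing a second structure to keep in sync.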