MIR-1026

Add first-party OTel metric instruments for server operational telemetry

Open · public
Opened by phinze on Apr 17, 2026 · Updated Apr 30, 2026

While auditing log noise in MIR-1024, we identified several log lines that were really metrics wearing a trenchcoat. The OTel SDK is already bootstrapped in pkg/rpc/otel.go, but the server itself doesn't currently emit any first-party Counter, Histogram, or UpDownCounter instruments. (Note: the VictoriaMetrics pipeline under metrics/ carries customer app telemetry, not server operational telemetry.)

Natural first instruments

  1. HTTP ingress request count and latency by app and route_type ("route" vs "default" vs fallback). Replaces the old "using http route" debug log and gives us RED (rate, errors, duration) metrics for the proxy.
  2. Controller reconcile count, duration, and error rate keyed by entity type, emitted from the generic controller framework in pkg/controller. Replaces the old INFO-level "Processing event" log with actual dashboards.
  3. Sandbox pool gauges: desired / actual / ready per pool. Replaces the "sandbox counts" debug log, which is literally already shaped as a gauge.

Acceptance

  • A meter + instruments pattern established somewhere reusable (likely pkg/rpc or a new pkg/observability).
  • The three instruments above wired up and verified in a local OTel collector.
  • Docs updated to list the metrics we expose.