rdev/ai-lookup/services/external-health.md
jordan 210064d490 feat: add diagnostics endpoint and external health monitoring
- Add /diagnostics endpoint for system health overview
- Add external health worker for monitoring Gitea, Woodpecker, Registry
- Add health check methods to Gitea and Woodpecker clients
- Remove hardcoded fallback projects (pantheon, aeries)
- Add diagnostics domain types and service layer
- Add comprehensive tests for diagnostics handler and service
- Fix tests to use registered test project instead of hardcoded one

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:10:56 -07:00


External Health Checker

Last Updated: 2026-02-03
Confidence: High

Summary

Background worker that continuously monitors external systems (registry, CI, git) and surfaces problems through metrics, logs, and the /ready endpoint. By default it runs every 30s, caches the latest results so /ready can answer without network calls, and logs only state transitions.

Key Facts:

  • Monitors: registry (zot), ci (woodpecker), git (gitea)
  • Check interval: 30 seconds (configurable)
  • Caches results for /ready endpoint (no blocking network calls)
  • Logs only on state changes (healthy→unhealthy, unhealthy→healthy)
  • Preserves LastHealthy timestamp through unhealthy periods

File Pointers:

  • Domain types: internal/domain/external_health.go
  • Worker implementation: internal/worker/external_health.go
  • Port interface: internal/port/health.go:ExternalHealthChecker
  • Handler integration: internal/handlers/health.go:WithExternalHealthChecker
  • Wiring: cmd/rdev-api/main.go:433-455

How It Works

  1. Background goroutine polls all configured external systems every 30s (the default interval)
  2. Checks run in parallel, each with a 10s timeout per system
  3. Results cached in thread-safe map
  4. /ready reads cached statuses (no network calls)
  5. Prometheus metrics updated on each check cycle

Adapter implementations:

  • Registry: internal/adapter/zot/client.go:Check() - calls /v2/ endpoint
  • CI: internal/adapter/woodpecker/client.go:Check() - calls Self() API
  • Git: internal/adapter/gitea/client.go:Check() - calls ListMyOrgs()

Prometheus Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| rdev_external_system_healthy | Gauge | system | 1=healthy, 0=unhealthy |
| rdev_external_system_latency_seconds | Gauge | system | Check latency in seconds |
| rdev_external_system_last_check_timestamp | Gauge | system | Unix timestamp of last check |