rdev/ai-lookup/services/external-health.md

# External Health Checker
**Last Updated:** 2026-02-03
**Confidence:** High
## Summary
Background worker that polls external systems (registry, CI, git) and surfaces problems through metrics, logs, and the `/ready` endpoint. Runs every 30s by default, caches the latest result so lookups never wait on the network, and logs only state transitions.
**Key Facts:**
- Monitors: `registry` (zot), `ci` (woodpecker), `git` (gitea)
- Check interval: 30 seconds (configurable)
- Caches results for `/ready` endpoint (no blocking network calls)
- Logs only on state changes (healthy→unhealthy, unhealthy→healthy)
- Preserves `LastHealthy` timestamp through unhealthy periods
**File Pointers:**
- Domain types: `internal/domain/external_health.go`
- Worker implementation: `internal/worker/external_health.go`
- Port interface: `internal/port/health.go:ExternalHealthChecker`
- Handler integration: `internal/handlers/health.go:WithExternalHealthChecker`
- Wiring: `cmd/rdev-api/main.go:433-455`
## How It Works
1. Background goroutine polls all configured external systems at the check interval (30s by default)
2. Checks run in parallel with 10s timeout per system
3. Results cached in thread-safe map
4. `/ready` reads cached statuses (no network calls)
5. Prometheus metrics updated on each check cycle
**Adapter implementations:**
- Registry: `internal/adapter/zot/client.go:Check()` - calls `/v2/` endpoint
- CI: `internal/adapter/woodpecker/client.go:Check()` - calls `Self()` API
- Git: `internal/adapter/gitea/client.go:Check()` - calls `ListMyOrgs()`
## Prometheus Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `rdev_external_system_healthy` | Gauge | `system` | 1=healthy, 0=unhealthy |
| `rdev_external_system_latency_seconds` | Gauge | `system` | Duration of the most recent check, in seconds |
| `rdev_external_system_last_check_timestamp` | Gauge | `system` | Unix timestamp of last check |
## Related Topics
- [Work Queue](./work-queue.md) - Uses similar background worker pattern
- [CI Provider](./ci-provider.md) - Woodpecker adapter details
- [Worker Pool](./worker-pool.md) - Another background worker example