# External Health Checker

**Last Updated:** 2026-02-03
**Confidence:** High

## Summary

Background worker that continuously monitors external systems (registry, CI, git) and surfaces issues proactively via metrics, logs, and the `/ready` endpoint. Runs every 30s, caches results for instant lookups, and logs state transitions.

**Key Facts:**
- Monitors: `registry` (zot), `ci` (woodpecker), `git` (gitea)
- Check interval: 30 seconds (configurable)
- Caches results for `/ready` endpoint (no blocking network calls)
- Logs only on state changes (healthy→unhealthy, unhealthy→healthy)
- Preserves `LastHealthy` timestamp through unhealthy periods

**File Pointers:**
- Domain types: `internal/domain/external_health.go`
- Worker implementation: `internal/worker/external_health.go`
- Port interface: `internal/port/health.go:ExternalHealthChecker`
- Handler integration: `internal/handlers/health.go:WithExternalHealthChecker`
- Wiring: `cmd/rdev-api/main.go:433-455`

## How It Works

1. Background goroutine polls all configured external systems every 30s
2. Checks run in parallel with 10s timeout per system
3. Results cached in thread-safe map
4. `/ready` reads cached statuses (no network calls)
5. Prometheus metrics updated on each check cycle

**Adapter implementations:**
- Registry: `internal/adapter/zot/client.go:Check()` - calls `/v2/` endpoint
- CI: `internal/adapter/woodpecker/client.go:Check()` - calls `Self()` API
- Git: `internal/adapter/gitea/client.go:Check()` - calls `ListMyOrgs()`

## Prometheus Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `rdev_external_system_healthy` | Gauge | `system` | 1=healthy, 0=unhealthy |
| `rdev_external_system_latency_seconds` | Gauge | `system` | Check latency in seconds |
| `rdev_external_system_last_check_timestamp` | Gauge | `system` | Unix timestamp of last check |

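To show roughly what a scrape of these gauges looks like, the sketch below renders one metric family in the Prometheus text exposition format using only the standard library. The `renderGauge` helper and the sample values are hypothetical; the worker itself presumably registers its gauges through a Prometheus client library, which this note does not name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderGauge formats one gauge family in the Prometheus text exposition
// format, with one sample per value of the `system` label.
func renderGauge(name, help string, samples map[string]float64) string {
	systems := make([]string, 0, len(samples))
	for s := range samples {
		systems = append(systems, s)
	}
	sort.Strings(systems) // stable output order

	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s gauge\n", name, help, name)
	for _, s := range systems {
		fmt.Fprintf(&b, "%s{system=%q} %g\n", name, s, samples[s])
	}
	return b.String()
}

func main() {
	// Illustrative values only: registry and git healthy, ci unhealthy.
	fmt.Print(renderGauge("rdev_external_system_healthy", "1=healthy, 0=unhealthy",
		map[string]float64{"registry": 1, "ci": 0, "git": 1}))
}
```
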
## Related Topics

- [Work Queue](./work-queue.md) - Uses similar background worker pattern
- [CI Provider](./ci-provider.md) - Woodpecker adapter details
- [Worker Pool](./worker-pool.md) - Another background worker example