Implements horizontally-scalable worker pool architecture:
- claudebox-sidecar: HTTP server for Claude Code, git, and SDLC ops
- rdev-worker: standalone worker binary polling rdev-api for tasks
- HTTP client adapter for sidecar communication
- HPA with custom Prometheus metrics for autoscaling
- ServiceMonitor for metrics scraping
Code review fixes applied:
- URL-encode query parameters in GitStatus (Critical #1)
- Remove unused shellQuote function (Critical #2)
- Use stdlib strings.Split/TrimSpace (Critical #3)
- Add version injection via ldflags (Warning #4)
- Add debug logging for swallowed git/sdlc errors (Warning #5, #6)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Composable monorepo CI fixes:
- Add empty go.sum.tmpl files for pkg, service, worker, and cli components
- Fix Dockerfile.tmpl glob patterns (COPY go.work.sum* is invalid in Kaniko)
- Add deps step to CI that runs go work sync and go mod tidy before builds
- Fix scalar-go dependency version (v0.1.2 doesn't exist, use v0.13.0)
Health endpoint improvements:
- Add registry health check (zot OCI /v2/ endpoint)
- Add health metrics for CI, registry, and Git
- Add /health/ci endpoint for Woodpecker health
Visual verification scaffolding:
- Add Playwright pod and scripts ConfigMap
- Add vision.md and implementation breakdown plan
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add POST /workers/register and POST /workers/{workerId}/heartbeat endpoints
- Start worker health checker goroutine in main.go
- Fix network policy to allow K8s API server access (includes real endpoint IPs)
- Add rdev.orchard9.ai/role: worker label to claudebox StatefulSet
This enables the embedded WorkExecutor to reach claudebox-0 for executing
builds on composable projects that don't have dedicated pods.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Short-form DNS names (e.g. postgres.databases.svc) fail to resolve in
new pods due to k8s DNS search domain limitations. Switch all service
hostnames to FQDNs (*.svc.cluster.local).
Remove commonLabels from kustomization.yaml — it injected labels into
all selectors including NetworkPolicy egress rules (blocking DNS to
CoreDNS) and Deployment selectors (causing immutability errors).
Add OTEL_EXPORTER_OTLP_ENDPOINT env var to deployment YAML so the
telemetry collector endpoint uses the FQDN without requiring a binary
rebuild.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The shared worker pool (claudebox-0) now handles all project builds
with dynamic git cloning. The dedicated per-project pods were stuck
in Init state and are no longer needed.
Removed:
- claudebox-aeries StatefulSet and PVC
- claudebox-pantheon StatefulSet and PVC
- Associated secrets and configmaps (deleted from cluster)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>