Woodpecker's K8s backend creates a PVC per pipeline for workspace sharing.
If the agent misses cleanup, stale PVCs cause "already exists" errors that
mark pipelines as failed despite all steps succeeding.
Two-part fix:
1. Scale woodpecker-agent from 2 to 1 replica (eliminates PVC name race
between agents processing the same repo)
2. Add CronJob that garbage-collects wp-* PVCs older than 30 minutes
every 5 minutes (handles crash/restart edge cases)
Includes dedicated ServiceAccount and least-privilege RBAC (PVC list/delete
only in threesix namespace).
Ref: https://github.com/woodpecker-ci/woodpecker/issues/1594
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CI deploy step runs `kubectl set image statefulset/claudebox` but
the woodpecker-deployer Role only included `deployments`. Add
`statefulsets` to the allowed resources.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Woodpecker CI was timing out when watching deployment rollout status
due to missing RBAC permissions. The deployments were succeeding but
CI couldn't verify completion.
Changes:
- Add 'watch' verb to woodpecker-deployer Role
- Add threesix/default service account to RoleBinding
- Consolidate woodpecker-deployer RBAC into base/rbac.yaml
This resolves the "Failed to watch: deployments.apps is forbidden"
errors in CI logs while maintaining successful deployment rollouts.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add OpenTelemetry environment variables to export Claude Code logs
and metrics to the existing OTEL collector. Provides visibility into
long-running builds.
- claudebox-worker: sidecar in rdev-worker deployment
- claudebox-standalone: StatefulSet for direct access
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add longhorn-rwx StorageClass for RWX volume support
- Add slackpath-5-full-lifecycle.yaml cookbook tree (all 10 SDLC phases)
- Update worker-pool.md documentation
- Consolidate PVC configuration, remove separate pvc-shared-claude.yaml
- Update rdev-worker and kustomization for new PVC structure
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add WaitGroup for graceful shutdown of in-flight tasks
- Change replicas to 1 with Recreate strategy (RWO PVC limitation)
- Optimize Dockerfile: combine RUN commands for smaller layers
- Add compiled binaries to .gitignore
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add .woodpecker.yml with build steps for api, worker, claudebox
- Update K8s manifests to use registry.threesix.ai/rdev/*
- Remove ghcr-secret imagePullSecrets (Zot is unauthenticated)
Builds will run on Woodpecker using kaniko, pushing to our internal
Zot registry. This eliminates the QEMU cross-compilation issues on
Apple Silicon.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements horizontally-scalable worker pool architecture:
- claudebox-sidecar: HTTP server for Claude Code, git, and SDLC ops
- rdev-worker: standalone worker binary polling rdev-api for tasks
- HTTP client adapter for sidecar communication
- HPA with custom Prometheus metrics for autoscaling
- ServiceMonitor for metrics scraping
Code review fixes applied:
- URL-encode query parameters in GitStatus (Critical #1)
- Remove unused shellQuote function (Critical #2)
- Use stdlib strings.Split/TrimSpace (Critical #3)
- Add version injection via ldflags (Warning #4)
- Add debug logging for swallowed git/sdlc errors (Warning #5, #6)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Composable monorepo CI fixes:
- Add empty go.sum.tmpl files for pkg, service, worker, and cli components
- Fix Dockerfile.tmpl glob patterns (COPY go.work.sum* is invalid in Kaniko)
- Add deps step to CI that runs go work sync and go mod tidy before builds
- Fix scalar-go dependency version (v0.1.2 doesn't exist, use v0.13.0)
Health endpoint improvements:
- Add registry health check (zot OCI /v2/ endpoint)
- Add health metrics for CI, registry, and Git
- Add /health/ci endpoint for Woodpecker health
Visual verification scaffolding:
- Add Playwright pod and scripts ConfigMap
- Add vision.md and implementation breakdown plan
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add POST /workers/register and POST /workers/{workerId}/heartbeat endpoints
- Start worker health checker goroutine in main.go
- Fix network policy to allow K8s API server access (includes real endpoint IPs)
- Add rdev.orchard9.ai/role: worker label to claudebox StatefulSet
This enables the embedded WorkExecutor to reach claudebox-0 for executing
builds on composable projects that don't have dedicated pods.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Short-form DNS names (e.g. postgres.databases.svc) fail to resolve in
new pods due to k8s DNS search domain limitations. Switch all service
hostnames to FQDNs (*.svc.cluster.local).
Remove commonLabels from kustomization.yaml — it injected labels into
all selectors including NetworkPolicy egress rules (blocking DNS to
CoreDNS) and Deployment selectors (causing immutability errors).
Add OTEL_EXPORTER_OTLP_ENDPOINT env var to deployment YAML so the
telemetry collector endpoint uses the FQDN without requiring a binary
rebuild.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>