rdev - Remote Developer
Run Claude Code instances in isolated Kubernetes pods with REST API control. Enables bots, CI/CD systems, and external orchestrators to dispatch agentic development work to isolated environments.
Platform: threesix.ai - Agent-driven development at scale with shared worker pools.
Terminology
| Term | Meaning | Location |
|---|---|---|
| platform | rdev itself (orchestrator API, handlers, workers) | cmd/rdev-api/, internal/, pkg/api/ |
| skeleton | Code that ships in generated projects | internal/adapter/templates/templates/skeleton/ |
| component templates | Service/worker/app/cli templates added to skeleton | templates/components/{service,worker,cli,app-*}/ |
When discussing code: "add to platform" = edit rdev; "add to skeleton" = edit project templates.
Find Your Guide
| If you need to... | Read this |
|---|---|
| Set up local dev | local/setup.md |
| Run tests | local/testing.md |
| Write Go code / handlers | backend/go-guidelines.md |
| Understand pkg/api | packages/api-framework.md |
| Add a new handler/endpoint | backend/adding-handlers.md |
| Understand hexagonal architecture | backend/hexagonal.md |
| Deploy to k3s | ops/deploying.md |
| Release a new version | ops/releasing.md |
| Work with Kubernetes adapters | services/kubernetes.md |
| Database / migrations | ops/database.md |
| Manage credentials | ops/credentials.md |
| Work queue system | services/work-queue.md |
| Worker pool management | services/worker-pool.md |
| Project templates | services/templates.md |
| Composable monorepo templates | services/composable-monorepo.md |
| E2E testing strategy | services/e2e-testing-strategy.md |
| Cookbook tree system (commands) | services/cookbook-trees.md |
| Slackpath reference architectures | services/cookbook-trees.md |
| Write cookbook trees | cookbook-trees/SKILL.md |
| Build orchestration | services/build-orchestration.md |
| Build event streaming | services/build-streaming.md |
| Resource provisioning plan | services/resource-provisioning-plan.md |
| Database provisioning | services/database-provisioning.md |
| Cache provisioning | services/cache-provisioning.md |
| CockroachDB operations | services/cockroachdb.md |
| Redis operations | services/redis.md |
| DNS / Cloudflare | services/dns-cloudflare.md |
| Network policies / internal routing | ops/networking.md |
| Debug external system health | ops/external-health-diagnostics.md |
| SDLC orchestration | services/sdlc.md |
| Visual verification (Playwright) | services/visual-verification.md |
| Interactive remote development | services/interactive-remote-dev.md |
| Structured logging | internal/logging/ - field constants, context propagation, redaction |
Critical Rules
- Root cause fixes: When diagnosing failures in generated projects, NEVER patch the project directly. Find the systemic root cause in: (1) platform - rdev handlers/services that create resources, (2) skeleton - templates that ship in generated projects, or (3) cookbook - test scripts with wrong assumptions. Fix the source, not the symptom. Every project-specific fix is technical debt that will recur.
- LLM vs rdev: LLMs generate code; rdev executes deterministic operations (git, lint, deploy). Never rely on LLMs for runbook tasks.
- Pod git ops: Git operations run inside pods via `PodGitOperations` (kubectl exec), never locally.
- No dead code: Delete unused code immediately. Don't leave "might use later" exports.
- KUBECONFIG: ALWAYS set `export KUBECONFIG=~/.kube/orchard9-k3sf.yaml` before kubectl commands.
- Container builds: NEVER build Docker images locally. ALWAYS use `git push origin main` to trigger Woodpecker CI which builds via in-cluster Kaniko. Local Docker builds produce wrong architecture (arm64 vs amd64). If an image is missing from registry.threesix.ai, push to origin; don't improvise.
- Hexagonal: Domain models in `internal/domain/` must have ZERO external dependencies.
- Ports: All adapters implement interfaces from `internal/port/`.
- Migrations: NEVER modify committed migrations. Create NEW ones.
- 500-line limit: Files exceeding 500 lines must be split
- Tests: All handlers and services require tests
- Multi-step ops: NEVER log-and-continue after partial failure. Rollback or document partial state.
- Logging: Use `logging.FromContext(ctx)` or injected `*slog.Logger`. NEVER `fmt.Println`, `log.Fatal`, `log.Printf`, or bare `slog.Info()`. Error key is ALWAYS `"error"` (not `"err"`). Use field constants from `internal/logging/fields.go` (e.g., `logging.FieldProjectID`, `logging.FieldError`). Log once at boundary (handlers/workers log, services return errors). Sensitive data (passwords, tokens, keys) is auto-redacted.
- HTTP clients: NEVER create `&http.Client{}` without a `Timeout` field. All HTTP clients must have explicit timeouts (30s standard, 5s for health checks). A bare client can hang indefinitely.
- Config: Use `envutil.GetEnv()` / `GetEnvInt()` / `GetEnvBool()` from `internal/envutil` for all env var reads with defaults. NEVER define local `getEnv` helpers; they duplicate and drift. Raw `os.Getenv()` is fine for required values with no default (secrets, passwords).
- Handler timeouts: NEVER use inline `time.Duration` in `context.WithTimeout` inside handlers. Use constants from `internal/handlers/timeouts.go`: `TimeoutFastLookup` (5s), `TimeoutLookup` (10s), `TimeoutStandard` (30s), `TimeoutHeavyWrite` (60s), `TimeoutOrchestration` (90s), `TimeoutLongRunning` (10m).
- Worker timeouts: NEVER use inline `time.Duration` in `context.WithTimeout` inside worker code. Use constants from `internal/worker/timeouts.go`: `TimeoutQuickOp` (5s), `TimeoutHealthCheck` (10s), `TimeoutMaintenance` (30s), `TimeoutWorkExecution` (10m).
- Response helpers: Use `api.WriteUnauthorized`, `api.WriteForbidden`, `api.WriteBadRequest`, `api.WriteNotFound`, `api.WriteInternalError` instead of bare `api.WriteError` with status codes. Only use `api.WriteError` directly for custom error codes (e.g., KEY_REVOKED, IP_NOT_ALLOWED).
- Auth scopes: EVERY route in a handler's `Mount()` function MUST use `r.With(auth.RequireScope(...))`. Use `ScopeProjectsRead` for GET endpoints, `ScopeProjectsExecute` for mutation endpoints. Use the appropriate domain scope (e.g., `ScopeQueueRead`, `ScopeBuildWrite`) when available. Admin-only endpoints use `auth.ScopeAdmin` alone. See `internal/handlers/builds.go` for the canonical pattern.
- JSON decoding: ALWAYS use `api.DecodeJSON(r, &req)` to decode request bodies. NEVER use raw `json.NewDecoder(r.Body).Decode()`. The helper handles nil body, EOF, and returns typed errors. Decode error message is always `"invalid request body"`.
- Validation: Use `validate.New()` accumulator for 2+ field checks in handlers: `v := validate.New(); v.Required(req.Name, "name"); v.Required(req.Type, "type"); if err := v.Error(); err != nil { ... }`. Single-field checks can stay inline. NEVER duplicate validation logic that exists in `internal/validate`.
- Error wrapping: ALWAYS use `%w` (not `%v`) when wrapping errors in `fmt.Errorf`. Using `%v` stringifies the error and breaks `errors.Is`/`errors.As` chains. For non-error types (structs, slices), create a typed error implementing `error` instead of stringifying with `%v`.
- Context propagation: NEVER use `context.Background()` in handlers, services, or adapters that receive a context parameter. Always derive from parent context. Use `context.WithoutCancel(ctx)` for fire-and-forget goroutines that need tracing but independent cancellation.
- Cookbooks: Load `.claude/skills/cookbook-trees/SKILL.md` before writing/modifying any cookbook tree.
- Version alignment: Skeleton templates MUST use consistent versions across all files: Go 1.25 (go.work, go.mod, Dockerfiles, CI images), Node 20, Alpine 3.19. When updating a version, grep the entire templates/ tree and update ALL occurrences to prevent drift.
Quick Reference
# Environment (should already be in ~/.zshrc)
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml
export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai"
export RDEV_API_KEY="<your-api-key>" # Already set in ~/.zshrc
# Verify environment is loaded
echo $RDEV_API_KEY # Should print a base64 string
# If empty: source ~/.zshrc
# For scripts: use cookbooks/scripts/common.sh library
# Provides: api_call(), wait_for_build(), wait_for_pipeline(), wait_for_site()
# Example: source "$(dirname "$0")/common.sh" && api_call GET "/health"
# Run locally
go run ./cmd/rdev-api
# Run tests
go test ./...
# Automated deploy (push triggers Woodpecker CI)
git push origin main # Builds and deploys automatically via Woodpecker
# Manual deploy (if Woodpecker unavailable)
kubectl apply -f deployments/k8s/base/rdev-api.yaml
kubectl rollout restart -n rdev deployment/rdev-api
# Images are at registry.threesix.ai/rdev/{api,worker,claudebox}
# Verify pods
kubectl get pods -n rdev
# View logs
./scripts/logs.sh # Last 100 lines
./scripts/logs.sh -f # Follow/stream
./scripts/logs.sh -n 500 # Last 500 lines
./scripts/logs.sh -e # Errors only
./scripts/logs.sh -p # Previous crashed container
# Shell aliases (after source ~/.zshrc)
rdev-logs # Last 100 lines
rdev-logs-f # Follow/stream
rdev-pods # List pods
# API calls - use cookbook test scripts (they handle auth via common.sh)
./cookbooks/scripts/landing-test.sh run|status|teardown <name>
./cookbooks/scripts/tree-runner.sh run <tree-name> --project-name <name>
# Or direct API calls (requires env vars above)
curl -s -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/health | jq
curl -s -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/projects | jq
Architecture Overview
cmd/rdev-api/ # Entry point, DI, OpenAPI spec
cmd/sdlc/ # SDLC CLI binary (runs inside project pods)
internal/
├── sdlc/ # SDLC library (types, classifier, state I/O)
├── domain/ # Pure business models (no deps)
├── port/ # Interface contracts
├── service/ # Business logic orchestration
├── handlers/ # HTTP handlers (REST endpoints)
├── adapter/ # Infrastructure implementations
│ ├── kubernetes/ # K8s client, pod executor
│ ├── postgres/ # Audit, queue, webhooks, credentials
│ ├── cockroach/ # Database provisioning (project DBs)
│ ├── redis/ # Cache provisioning via ACLs
│ ├── gitea/ # Git repository management
│ ├── cloudflare/ # DNS provider
│ └── woodpecker/ # CI provider
├── auth/ # API key auth, scopes
├── middleware/ # Rate limiting
├── worker/ # Background queue processor
└── webhook/ # Event dispatcher
pkg/api/ # HTTP framework (app, responses)
deployments/k8s/ # Kustomize manifests
└── base/templates/ # Project templates
scripts/ # Operational scripts
├── load-credentials.sh # Load secrets to rdev-api
├── release.sh # Build, tag, push releases
└── logs.sh # View rdev-api logs
cookbooks/ # End-to-end workflow guides
├── landing-page.md # Landing page deployment flow
└── scripts/ # Executable cookbook scripts
Key Concepts
- Projects: Kubernetes pods with Claude Code, discovered by label `rdev.orchard9.ai/project=true`
- Workers: Shared claudebox pods that execute any project's tasks, labeled `rdev.orchard9.ai/role=worker`
- Work Queue: Async task queue for build/test/deploy jobs
- Credentials: Infrastructure secrets (tokens, keys) stored encrypted in PostgreSQL
- Commands: Claude/shell/git commands executed via kubectl exec, streamed via SSE
- API Keys: Scoped auth with project restrictions, IP filtering, expiration
- Webhooks: Event subscriptions with retry delivery
- Templates: Project scaffolding with .woodpecker.yml, .claude/, and stack files
threesix.ai Platform Status
| Feature | Status | Description |
|---|---|---|
| Woodpecker Auto-Activation | Done | CI enabled on project creation via SDK |
| Project Templates | Done | Embedded templates (astro-landing, go-api, default) |
| Work Queue | Done | PostgreSQL with atomic dequeue, retry logic |
| Multi-Provider Agents | Done | Claude Code + OpenCode via registry |
| Webhooks | Done | Event dispatcher with retry delivery |
| Embedded Worker | Done | Goroutine in rdev-api, polls queue |
| Multi-Domain Support | Done | Auto-slugs, custom subdomains, DNS aliases |
| Build Event Streaming | Done | Real-time SSE/WebSocket for build output |
| Database Provisioning | Done | CockroachDB adapter with auto-provisioning |
| Cache Provisioning | Done | Redis ACL-based adapter with auto-provisioning |
| Build Orchestration | Planned | Structured build specs via API |
| SDLC Orchestration | Done | Deterministic feature lifecycle with classifier engine, API, orchestrator, and 15 skeleton commands |
| Composable Monorepo Templates | Done | Monorepo skeleton + component templates (service, worker, app-astro, app-react, cli) |
| Visual Verification | Planned | Playwright screenshots/video + AI evaluation for feature completeness |
| Checkout/Checkin | Done | Sidecar dev flow: temporary git tokens, branch checkout, review on checkin |
| Interactive Remote Dev | Done | Sessions with pod binding, command execution, SSE streaming, ephemeral preview URLs |
Current Version: v0.10.25
Constraints
- ON-PREM k3s - not GKE, always set KUBECONFIG
- Kustomize only - no ArgoCD
- chi/v5 router - no gin, echo, or other frameworks
- sqlx for DB - no GORM
- slog for logging - no logrus, zap