rdev/CLAUDE.md
jordan 9226454b85
feat: label-based undeploy, GC reconciliation, checkout/sessions, pool status
- Add UndeployAll() using label selectors to clean up monorepo components
  on project deletion (replaces name-based Undeploy in DeleteProject and
  the direct undeploy handler)
- Add ResourceGC background worker that periodically finds K8s resources
  whose project label has no matching DB record, deletes after 1h safety
  window
- Widen deployer client type from *kubernetes.Clientset to
  kubernetes.Interface for testability
- UndeployAll accumulates errors via errors.Join instead of failing fast
- Add checkout/checkin sidecar dev flow: temporary git tokens, branch
  checkout, review on checkin with cleanup workers
- Add interactive sessions: pod binding, command execution, SSE streaming,
  ephemeral preview URLs with session cleanup workers
- Add GET /workers/pool endpoint for aggregate capacity and queue depth
- Add sessions:read and sessions:execute auth scopes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 19:11:28 -07:00

rdev - Remote Developer

Run Claude Code instances in isolated Kubernetes pods with REST API control. This lets bots, CI/CD systems, and external orchestrators dispatch agentic development work to isolated environments.

Platform: threesix.ai - Agent-driven development at scale with shared worker pools.

Terminology

| Term | Meaning | Location |
| --- | --- | --- |
| platform | rdev itself (orchestrator API, handlers, workers) | cmd/rdev-api/, internal/, pkg/api/ |
| skeleton | Code that ships in generated projects | internal/adapter/templates/templates/skeleton/ |
| component templates | Service/worker/app/cli templates added to skeleton | templates/components/{service,worker,cli,app-*}/ |

When discussing code: "add to platform" = edit rdev; "add to skeleton" = edit project templates.

Find Your Guide

| If you need to... | Read this |
| --- | --- |
| Set up local dev | local/setup.md |
| Run tests | local/testing.md |
| Write Go code / handlers | backend/go-guidelines.md |
| Understand pkg/api | packages/api-framework.md |
| Add a new handler/endpoint | backend/adding-handlers.md |
| Understand hexagonal architecture | backend/hexagonal.md |
| Deploy to k3s | ops/deploying.md |
| Release a new version | ops/releasing.md |
| Work with Kubernetes adapters | services/kubernetes.md |
| Database / migrations | ops/database.md |
| Manage credentials | ops/credentials.md |
| Work queue system | services/work-queue.md |
| Worker pool management | services/worker-pool.md |
| Project templates | services/templates.md |
| Composable monorepo templates | services/composable-monorepo.md |
| E2E testing strategy | services/e2e-testing-strategy.md |
| Cookbook tree system (commands) | services/cookbook-trees.md |
| Slackpath reference architectures | services/cookbook-trees.md |
| Write cookbook trees | cookbook-trees/SKILL.md |
| Build orchestration | services/build-orchestration.md |
| Build event streaming | services/build-streaming.md |
| Resource provisioning plan | services/resource-provisioning-plan.md |
| Database provisioning | services/database-provisioning.md |
| Cache provisioning | services/cache-provisioning.md |
| CockroachDB operations | services/cockroachdb.md |
| Redis operations | services/redis.md |
| DNS / Cloudflare | services/dns-cloudflare.md |
| Network policies / internal routing | ops/networking.md |
| Debug external system health | ops/external-health-diagnostics.md |
| SDLC orchestration | services/sdlc.md |
| Visual verification (Playwright) | services/visual-verification.md |
| Interactive remote development | services/interactive-remote-dev.md |
| Structured logging | internal/logging/ - field constants, context propagation, redaction |

Critical Rules

  • Root cause fixes: When diagnosing failures in generated projects, NEVER patch the project directly. Find the systemic root cause in: (1) platform - rdev handlers/services that create resources, (2) skeleton - templates that ship in generated projects, or (3) cookbook - test scripts with wrong assumptions. Fix the source, not the symptom. Every project-specific fix is technical debt that will recur.
  • LLM vs rdev: LLMs generate code; rdev executes deterministic operations (git, lint, deploy). Never rely on LLMs for runbook tasks.
  • Pod git ops: Git operations run inside pods via PodGitOperations (kubectl exec), never locally.
  • No dead code: Delete unused code immediately. Don't leave "might use later" exports.
  • KUBECONFIG: ALWAYS run export KUBECONFIG=~/.kube/orchard9-k3sf.yaml before any kubectl command.
  • Container builds: NEVER build Docker images locally. ALWAYS use git push origin main to trigger Woodpecker CI, which builds via in-cluster Kaniko. Local Docker builds produce the wrong architecture (arm64 instead of amd64). If an image is missing from registry.threesix.ai, push to origin; don't improvise.
  • Hexagonal: Domain models in internal/domain/ must have ZERO external dependencies
  • Ports: All adapters implement interfaces from internal/port/
  • Migrations: NEVER modify committed migrations. Create NEW ones.
  • 500-line limit: Files exceeding 500 lines must be split
  • Tests: All handlers and services require tests
  • Multi-step ops: NEVER log-and-continue after partial failure. Rollback or document partial state.
  • Logging: Use logging.FromContext(ctx) or injected *slog.Logger. NEVER fmt.Println, log.Fatal, log.Printf, or bare slog.Info(). Error key is ALWAYS "error" (not "err"). Use field constants from internal/logging/fields.go (e.g., logging.FieldProjectID, logging.FieldError). Log once at boundary (handlers/workers log, services return errors). Sensitive data (passwords, tokens, keys) is auto-redacted.
  • HTTP clients: NEVER create &http.Client{} without a Timeout field. All HTTP clients must have explicit timeouts (30s standard, 5s for health checks). A bare client can hang indefinitely.
  • Config: Use envutil.GetEnv() / GetEnvInt() / GetEnvBool() from internal/envutil for all env var reads with defaults. NEVER define local getEnv helpers — they duplicate and drift. Raw os.Getenv() is fine for required values with no default (secrets, passwords).
  • Handler timeouts: NEVER use inline time.Duration in context.WithTimeout inside handlers. Use constants from internal/handlers/timeouts.go: TimeoutFastLookup (5s), TimeoutLookup (10s), TimeoutStandard (30s), TimeoutHeavyWrite (60s), TimeoutOrchestration (90s), TimeoutLongRunning (10m).
  • Worker timeouts: NEVER use inline time.Duration in context.WithTimeout inside worker code. Use constants from internal/worker/timeouts.go: TimeoutQuickOp (5s), TimeoutHealthCheck (10s), TimeoutMaintenance (30s), TimeoutWorkExecution (10m).
  • Response helpers: Use api.WriteUnauthorized, api.WriteForbidden, api.WriteBadRequest, api.WriteNotFound, api.WriteInternalError instead of bare api.WriteError with status codes. Only use api.WriteError directly for custom error codes (e.g., KEY_REVOKED, IP_NOT_ALLOWED).
  • Auth scopes: EVERY route in a handler's Mount() function MUST use r.With(auth.RequireScope(...)). Use ScopeProjectsRead for GET endpoints, ScopeProjectsExecute for mutation endpoints. Use the appropriate domain scope (e.g., ScopeQueueRead, ScopeBuildWrite) when available. Admin-only endpoints use auth.ScopeAdmin alone. See internal/handlers/builds.go for the canonical pattern.
  • JSON decoding: ALWAYS use api.DecodeJSON(r, &req) to decode request bodies. NEVER use raw json.NewDecoder(r.Body).Decode(). The helper handles nil body, EOF, and returns typed errors. Decode error message is always "invalid request body".
  • Validation: Use validate.New() accumulator for 2+ field checks in handlers: v := validate.New(); v.Required(req.Name, "name"); v.Required(req.Type, "type"); if err := v.Error() { ... }. Single-field checks can stay inline. NEVER duplicate validation logic that exists in internal/validate.
  • Error wrapping: ALWAYS use %w (not %v) when wrapping errors in fmt.Errorf. Using %v stringifies the error and breaks errors.Is/errors.As chains. For non-error types (structs, slices), create a typed error implementing error instead of stringifying with %v.
  • Context propagation: NEVER use context.Background() in handlers, services, or adapters that receive a context parameter. Always derive from the parent context. Use context.WithoutCancel(ctx) for fire-and-forget goroutines that must keep the parent's values (such as tracing data) but must not be cancelled when the parent is.
  • Cookbooks: Load .claude/skills/cookbook-trees/SKILL.md before writing/modifying any cookbook tree.
  • Version alignment: Skeleton templates MUST use consistent versions across all files: Go 1.25 (go.work, go.mod, Dockerfiles, CI images), Node 20, Alpine 3.19. When updating a version, grep the entire templates/ tree and update ALL occurrences to prevent drift.

Quick Reference

# Environment (should already be in ~/.zshrc)
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml
export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai"
export RDEV_API_KEY="<your-api-key>"  # Already set in ~/.zshrc

# Verify environment is loaded
echo $RDEV_API_KEY  # Should print a base64 string
# If empty: source ~/.zshrc

# For scripts: use cookbooks/scripts/common.sh library
# Provides: api_call(), wait_for_build(), wait_for_pipeline(), wait_for_site()
# Example: source "$(dirname "$0")/common.sh" && api_call GET "/health"

# Run locally
go run ./cmd/rdev-api

# Run tests
go test ./...

# Automated deploy (push triggers Woodpecker CI)
git push origin main  # Builds and deploys automatically via Woodpecker

# Manual deploy (if Woodpecker unavailable)
kubectl apply -f deployments/k8s/base/rdev-api.yaml
kubectl rollout restart -n rdev deployment/rdev-api

# Images are at registry.threesix.ai/rdev/{api,worker,claudebox}

# Verify pods
kubectl get pods -n rdev

# View logs
./scripts/logs.sh           # Last 100 lines
./scripts/logs.sh -f        # Follow/stream
./scripts/logs.sh -n 500    # Last 500 lines
./scripts/logs.sh -e        # Errors only
./scripts/logs.sh -p        # Previous crashed container

# Shell aliases (after source ~/.zshrc)
rdev-logs                   # Last 100 lines
rdev-logs-f                 # Follow/stream
rdev-pods                   # List pods

# API calls - use cookbook test scripts (they handle auth via common.sh)
./cookbooks/scripts/landing-test.sh run|status|teardown <name>
./cookbooks/scripts/tree-runner.sh run <tree-name> --project-name <name>

# Or direct API calls (requires env vars above)
curl -s -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/health" | jq
curl -s -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/projects" | jq

Architecture Overview

cmd/rdev-api/          # Entry point, DI, OpenAPI spec
cmd/sdlc/              # SDLC CLI binary (runs inside project pods)
internal/
├── sdlc/              # SDLC library (types, classifier, state I/O)
├── domain/            # Pure business models (no deps)
├── port/              # Interface contracts
├── service/           # Business logic orchestration
├── handlers/          # HTTP handlers (REST endpoints)
├── adapter/           # Infrastructure implementations
│   ├── kubernetes/    # K8s client, pod executor
│   ├── postgres/      # Audit, queue, webhooks, credentials
│   ├── cockroach/     # Database provisioning (project DBs)
│   ├── redis/         # Cache provisioning via ACLs
│   ├── gitea/         # Git repository management
│   ├── cloudflare/    # DNS provider
│   └── woodpecker/    # CI provider
├── auth/              # API key auth, scopes
├── middleware/        # Rate limiting
├── worker/            # Background queue processor
└── webhook/           # Event dispatcher
pkg/api/               # HTTP framework (app, responses)
deployments/k8s/       # Kustomize manifests
  └── base/templates/  # Project templates
scripts/               # Operational scripts
  ├── load-credentials.sh  # Load secrets to rdev-api
  ├── release.sh           # Build, tag, push releases
  └── logs.sh              # View rdev-api logs
cookbooks/             # End-to-end workflow guides
  ├── landing-page.md      # Landing page deployment flow
  └── scripts/             # Executable cookbook scripts

Key Concepts

  • Projects: Kubernetes pods with Claude Code, discovered by label rdev.orchard9.ai/project=true
  • Workers: Shared claudebox pods that execute any project's tasks, labeled rdev.orchard9.ai/role=worker
  • Work Queue: Async task queue for build/test/deploy jobs
  • Credentials: Infrastructure secrets (tokens, keys) stored encrypted in PostgreSQL
  • Commands: Claude/shell/git commands executed via kubectl exec, streamed via SSE
  • API Keys: Scoped auth with project restrictions, IP filtering, expiration
  • Webhooks: Event subscriptions with retry delivery
  • Templates: Project scaffolding with .woodpecker.yml, .claude/, and stack files

threesix.ai Platform Status

| Feature | Status | Description |
| --- | --- | --- |
| Woodpecker Auto-Activation | Done | CI enabled on project creation via SDK |
| Project Templates | Done | Embedded templates (astro-landing, go-api, default) |
| Work Queue | Done | PostgreSQL with atomic dequeue, retry logic |
| Multi-Provider Agents | Done | Claude Code + OpenCode via registry |
| Webhooks | Done | Event dispatcher with retry delivery |
| Embedded Worker | Done | Goroutine in rdev-api, polls queue |
| Multi-Domain Support | Done | Auto-slugs, custom subdomains, DNS aliases |
| Build Event Streaming | Done | Real-time SSE/WebSocket for build output |
| Database Provisioning | Done | CockroachDB adapter with auto-provisioning |
| Cache Provisioning | Done | Redis ACL-based adapter with auto-provisioning |
| Build Orchestration | Planned | Structured build specs via API |
| SDLC Orchestration | Done | Deterministic feature lifecycle with classifier engine, API, orchestrator, and 15 skeleton commands |
| Composable Monorepo Templates | Done | Monorepo skeleton + component templates (service, worker, app-astro, app-react, cli) |
| Visual Verification | Planned | Playwright screenshots/video + AI evaluation for feature completeness |
| Checkout/Checkin | Done | Sidecar dev flow: temporary git tokens, branch checkout, review on checkin |
| Interactive Remote Dev | Done | Sessions with pod binding, command execution, SSE streaming, ephemeral preview URLs |

Current Version: v0.10.25

Constraints

  • ON-PREM k3s - not GKE, always set KUBECONFIG
  • Kustomize only - no ArgoCD
  • chi/v5 router - no gin, echo, or other frameworks
  • sqlx for DB - no GORM
  • slog for logging - no logrus, zap