rdev - Remote Developer

Run Claude Code instances in isolated Kubernetes pods with REST API control. Enables bots, CI/CD systems, and external orchestrators to dispatch agentive development work to isolated environments.

Platform: threesix.ai - Agent-driven development at scale with shared worker pools.

Terminology

| Term | Meaning | Location |
|------|---------|----------|
| platform | rdev itself (orchestrator API, handlers, workers) | cmd/rdev-api/, internal/, pkg/api/ |
| skeleton | Code that ships in generated projects | internal/adapter/templates/templates/skeleton/ |
| component templates | Service/worker/app/cli templates added to skeleton | templates/components/{service,worker,cli,app-*}/ |

When discussing code: "add to platform" = edit rdev; "add to skeleton" = edit project templates.

Find Your Guide

| If you need to... | Read this |
|-------------------|-----------|
| Set up local dev | local/setup.md |
| Run tests | local/testing.md |
| Write Go code / handlers | backend/go-guidelines.md |
| Understand pkg/api | packages/api-framework.md |
| Add a new handler/endpoint | backend/adding-handlers.md |
| Understand hexagonal architecture | backend/hexagonal.md |
| Deploy to k3s | ops/deploying.md |
| Release a new version | ops/releasing.md |
| Work with Kubernetes adapters | services/kubernetes.md |
| Database / migrations | ops/database.md |
| Manage credentials | ops/credentials.md |
| Work queue system | services/work-queue.md |
| Worker pool management | services/worker-pool.md |
| Project templates | services/templates.md |
| Composable monorepo templates | services/composable-monorepo.md |
| E2E testing strategy | services/e2e-testing-strategy.md |
| Cookbook tree system (commands) | services/cookbook-trees.md |
| Slackpath reference architectures | services/cookbook-trees.md |
| Write E2E cookbook scripts | cookbook-scripts/SKILL.md |
| Build orchestration | services/build-orchestration.md |
| Build event streaming | services/build-streaming.md |
| Resource provisioning plan | services/resource-provisioning-plan.md |
| Database provisioning | services/database-provisioning.md |
| Cache provisioning | services/cache-provisioning.md |
| CockroachDB operations | services/cockroachdb.md |
| Redis operations | services/redis.md |
| DNS / Cloudflare | services/dns-cloudflare.md |
| Network policies / internal routing | ops/networking.md |
| Debug external system health | ops/external-health-diagnostics.md |
| SDLC orchestration | services/sdlc.md |
| Visual verification (Playwright) | services/visual-verification.md |
| Structured logging | internal/logging/ - field constants, context propagation, redaction |

Critical Rules

  • LLM vs rdev: LLMs generate code; rdev executes deterministic operations (git, lint, deploy). Never rely on LLMs for runbook tasks.
  • Pod git ops: Git operations run inside pods via PodGitOperations (kubectl exec), never locally.
  • No dead code: Delete unused code immediately. Don't leave "might use later" exports.
  • KUBECONFIG: ALWAYS run export KUBECONFIG=~/.kube/orchard9-k3sf.yaml before any kubectl command.
  • Hexagonal: Domain models in internal/domain/ must have ZERO external dependencies
  • Ports: All adapters implement interfaces from internal/port/
  • Migrations: NEVER modify committed migrations. Create NEW ones.
  • 500-line limit: Files exceeding 500 lines must be split
  • Tests: All handlers and services require tests
  • Multi-step ops: NEVER log-and-continue after partial failure. Rollback or document partial state.
  • Logging: Use logging.FromContext(ctx) or injected *slog.Logger. NEVER fmt.Println, log.Fatal, log.Printf, or bare slog.Info(). Error key is ALWAYS "error" (not "err"). Use field constants from internal/logging/fields.go (e.g., logging.FieldProjectID, logging.FieldError). Log once at boundary (handlers/workers log, services return errors). Sensitive data (passwords, tokens, keys) is auto-redacted.
  • HTTP clients: NEVER create &http.Client{} without a Timeout field. All HTTP clients must have explicit timeouts (30s standard, 5s for health checks). A bare client can hang indefinitely.
  • Config: Use envutil.GetEnv() / GetEnvInt() / GetEnvBool() from internal/envutil for all env var reads with defaults. NEVER define local getEnv helpers — they duplicate and drift. Raw os.Getenv() is fine for required values with no default (secrets, passwords).
  • Handler timeouts: NEVER use inline time.Duration in context.WithTimeout inside handlers. Use constants from internal/handlers/timeouts.go: TimeoutFastLookup (5s), TimeoutLookup (10s), TimeoutStandard (30s), TimeoutHeavyWrite (60s), TimeoutOrchestration (90s), TimeoutLongRunning (10m).
  • Worker timeouts: NEVER use inline time.Duration in context.WithTimeout inside worker code. Use constants from internal/worker/timeouts.go: TimeoutQuickOp (5s), TimeoutHealthCheck (10s), TimeoutMaintenance (30s), TimeoutWorkExecution (10m).
  • Response helpers: Use api.WriteUnauthorized, api.WriteForbidden, api.WriteBadRequest, api.WriteNotFound, api.WriteInternalError instead of bare api.WriteError with status codes. Only use api.WriteError directly for custom error codes (e.g., KEY_REVOKED, IP_NOT_ALLOWED).
  • Auth scopes: EVERY route in a handler's Mount() function MUST use r.With(auth.RequireScope(...)). Use ScopeProjectsRead for GET endpoints, ScopeProjectsExecute for mutation endpoints. Use the appropriate domain scope (e.g., ScopeQueueRead, ScopeBuildWrite) when available. Admin-only endpoints use auth.ScopeAdmin alone. See internal/handlers/builds.go for the canonical pattern.
  • JSON decoding: ALWAYS use api.DecodeJSON(r, &req) to decode request bodies. NEVER use raw json.NewDecoder(r.Body).Decode(). The helper handles nil bodies and EOF, and returns typed errors. The decode error message is always "invalid request body".
  • Validation: Use the validate.New() accumulator for 2+ field checks in handlers: v := validate.New(); v.Required(req.Name, "name"); v.Required(req.Type, "type"); if err := v.Error(); err != nil { ... }. Single-field checks can stay inline. NEVER duplicate validation logic that exists in internal/validate.
  • Error wrapping: ALWAYS use %w (not %v) when wrapping errors in fmt.Errorf. Using %v stringifies the error and breaks errors.Is/errors.As chains. For non-error types (structs, slices), create a typed error implementing error instead of stringifying with %v.

Quick Reference

# Required env vars (add to ~/.zshrc)
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml
export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai"
export RDEV_API_KEY="<from rdev-credentials secret>"

# Infrastructure credentials stored in .secrets (gitignored)
# See: .claude/guides/ops/credentials.md for setup
# Keys: GITEA_TOKEN, CLOUDFLARE_API_TOKEN, CLOUDFLARE_ZONE_ID, WOODPECKER_*

# Run locally
go run ./cmd/rdev-api

# Run tests
go test ./...

# Release + deploy (one command)
./scripts/release.sh v0.10.1 "Description of changes" --deploy

# Release only (no deploy)
./scripts/release.sh v0.10.1 "Description of changes"

# Manual deploy (if needed)
kubectl apply -f deployments/k8s/base/rdev-api.yaml
kubectl rollout restart -n rdev deployment/rdev-api

# Deploy claudebox worker (when Dockerfile changes)
./scripts/build-push.sh v0.4.0 claudebox && kubectl apply -f deployments/k8s/base/claudebox.yaml && kubectl rollout restart -n rdev statefulset/claudebox

# Verify pods
kubectl get pods -n rdev

# View logs
./scripts/logs.sh           # Last 100 lines
./scripts/logs.sh -f        # Follow/stream
./scripts/logs.sh -n 500    # Last 500 lines
./scripts/logs.sh -e        # Errors only
./scripts/logs.sh -p        # Previous crashed container

# Shell aliases (after source ~/.zshrc)
rdev-logs                   # Last 100 lines
rdev-logs-f                 # Follow/stream
rdev-pods                   # List pods

# API calls (NOTE: $RDEV_API_KEY doesn't expand in curl -H, use the test script instead)
# ./cookbooks/scripts/landing-test.sh run|status|teardown <name>
curl -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/health"
curl -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/projects"
curl -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/work/stats"

Architecture Overview

cmd/rdev-api/          # Entry point, DI, OpenAPI spec
cmd/sdlc/              # SDLC CLI binary (runs inside project pods)
internal/
├── sdlc/              # SDLC library (types, classifier, state I/O)
├── domain/            # Pure business models (no deps)
├── port/              # Interface contracts
├── service/           # Business logic orchestration
├── handlers/          # HTTP handlers (REST endpoints)
├── adapter/           # Infrastructure implementations
│   ├── kubernetes/    # K8s client, pod executor
│   ├── postgres/      # Audit, queue, webhooks, credentials
│   ├── cockroach/     # Database provisioning (project DBs)
│   ├── redis/         # Cache provisioning via ACLs
│   ├── gitea/         # Git repository management
│   ├── cloudflare/    # DNS provider
│   └── woodpecker/    # CI provider
├── auth/              # API key auth, scopes
├── middleware/        # Rate limiting
├── worker/            # Background queue processor
└── webhook/           # Event dispatcher
pkg/api/               # HTTP framework (app, responses)
deployments/k8s/       # Kustomize manifests
  └── base/templates/  # Project templates
scripts/               # Operational scripts
  ├── load-credentials.sh  # Load secrets to rdev-api
  ├── release.sh           # Build, tag, push releases
  └── logs.sh              # View rdev-api logs
cookbooks/             # End-to-end workflow guides
  ├── landing-page.md      # Landing page deployment flow
  └── scripts/             # Executable cookbook scripts

Key Concepts

  • Projects: Kubernetes pods with Claude Code, discovered by label rdev.orchard9.ai/project=true
  • Workers: Shared claudebox pods that execute any project's tasks, labeled rdev.orchard9.ai/role=worker
  • Work Queue: Async task queue for build/test/deploy jobs
  • Credentials: Infrastructure secrets (tokens, keys) stored encrypted in PostgreSQL
  • Commands: Claude/shell/git commands executed via kubectl exec, streamed via SSE
  • API Keys: Scoped auth with project restrictions, IP filtering, expiration
  • Webhooks: Event subscriptions with retry delivery
  • Templates: Project scaffolding with .woodpecker.yml, .claude/, and stack files

threesix.ai Platform Status

| Feature | Status | Description |
|---------|--------|-------------|
| Woodpecker Auto-Activation | Done | CI enabled on project creation via SDK |
| Project Templates | Done | Embedded templates (astro-landing, go-api, default) |
| Work Queue | Done | PostgreSQL with atomic dequeue, retry logic |
| Multi-Provider Agents | Done | Claude Code + OpenCode via registry |
| Webhooks | Done | Event dispatcher with retry delivery |
| Embedded Worker | Done | Goroutine in rdev-api, polls queue |
| Multi-Domain Support | Done | Auto-slugs, custom subdomains, DNS aliases |
| Build Event Streaming | Done | Real-time SSE/WebSocket for build output |
| Database Provisioning | Done | CockroachDB adapter with auto-provisioning |
| Cache Provisioning | Done | Redis ACL-based adapter with auto-provisioning |
| Build Orchestration | Planned | Structured build specs via API |
| SDLC Orchestration | Done | Deterministic feature lifecycle with classifier engine, API, orchestrator, and 15 skeleton commands |
| Composable Monorepo Templates | Done | Monorepo skeleton + component templates (service, worker, app-astro, app-react, cli) |
| Visual Verification | Planned | Playwright screenshots/video + AI evaluation for feature completeness |

Current Version: v0.10.25

Constraints

  • ON-PREM k3s - not GKE, always set KUBECONFIG
  • Kustomize only - no ArgoCD
  • chi/v5 router - no gin, echo, or other frameworks
  • sqlx for DB - no GORM
  • slog for logging - no logrus, zap