# rdev - Remote Developer
Run Claude Code instances in isolated Kubernetes pods with REST API control. Enables bots, CI/CD systems, and external orchestrators to dispatch agentic development work to isolated environments.
**Platform:** threesix.ai - Agent-driven development at scale with shared worker pools.
## Terminology
| Term | Meaning | Location |
|------|---------|----------|
| **platform** | rdev itself (orchestrator API, handlers, workers) | `cmd/rdev-api/`, `internal/`, `pkg/api/` |
| **skeleton** | Code that ships in generated projects | `internal/adapter/templates/templates/skeleton/` |
| **component templates** | Service/worker/app/cli templates added to skeleton | `templates/components/{service,worker,cli,app-*}/` |
When discussing code: "add to **platform**" = edit rdev; "add to **skeleton**" = edit project templates.
## Find Your Guide
| If you need to... | Read this |
|-------------------|-----------|
| **Set up local dev** | [local/setup.md](.claude/guides/local/setup.md) |
| **Run tests** | [local/testing.md](.claude/guides/local/testing.md) |
| **Write Go code / handlers** | [backend/go-guidelines.md](.claude/guides/backend/go-guidelines.md) |
| **Understand pkg/api** | [packages/api-framework.md](.claude/guides/packages/api-framework.md) |
| **Add a new handler/endpoint** | [backend/adding-handlers.md](.claude/guides/backend/adding-handlers.md) |
| **Understand hexagonal architecture** | [backend/hexagonal.md](.claude/guides/backend/hexagonal.md) |
| **Deploy to k3s** | [ops/deploying.md](.claude/guides/ops/deploying.md) |
| **Release a new version** | [ops/releasing.md](.claude/guides/ops/releasing.md) |
| **Work with Kubernetes adapters** | [services/kubernetes.md](.claude/guides/services/kubernetes.md) |
| **Database / migrations** | [ops/database.md](.claude/guides/ops/database.md) |
| **Manage credentials** | [ops/credentials.md](.claude/guides/ops/credentials.md) |
| **Work queue system** | [services/work-queue.md](.claude/guides/services/work-queue.md) |
| **Worker pool management** | [services/worker-pool.md](.claude/guides/services/worker-pool.md) |
| **Project templates** | [services/templates.md](.claude/guides/services/templates.md) |
| **Composable monorepo templates** | [services/composable-monorepo.md](.claude/guides/services/composable-monorepo.md) |
| **E2E testing strategy** | [services/e2e-testing-strategy.md](.claude/guides/services/e2e-testing-strategy.md) |
| **Cookbook tree system (commands)** | [services/cookbook-trees.md](.claude/guides/services/cookbook-trees.md) |
| **Slackpath reference architectures** | [services/cookbook-trees.md](.claude/guides/services/cookbook-trees.md#slackpath-trees-reference-architectures) |
| **Write E2E cookbook scripts** | [cookbook-scripts/SKILL.md](.claude/skills/cookbook-scripts/SKILL.md) |
| **Build orchestration** | [services/build-orchestration.md](.claude/guides/services/build-orchestration.md) |
| **Build event streaming** | [services/build-streaming.md](.claude/guides/services/build-streaming.md) |
| **Resource provisioning plan** | [services/resource-provisioning-plan.md](.claude/guides/services/resource-provisioning-plan.md) |
| **Database provisioning** | [services/database-provisioning.md](.claude/guides/services/database-provisioning.md) |
| **Cache provisioning** | [services/cache-provisioning.md](.claude/guides/services/cache-provisioning.md) |
| **CockroachDB operations** | [services/cockroachdb.md](.claude/guides/services/cockroachdb.md) |
| **Redis operations** | [services/redis.md](.claude/guides/services/redis.md) |
| **DNS / Cloudflare** | [services/dns-cloudflare.md](.claude/guides/services/dns-cloudflare.md) |
| **Network policies / internal routing** | [ops/networking.md](.claude/guides/ops/networking.md) |
| **Debug external system health** | [ops/external-health-diagnostics.md](.claude/guides/ops/external-health-diagnostics.md) |
| **SDLC orchestration** | [services/sdlc.md](.claude/guides/services/sdlc.md) |
| **Visual verification (Playwright)** | [services/visual-verification.md](.claude/guides/services/visual-verification.md) |
| **Structured logging** | `internal/logging/` - field constants, context propagation, redaction |
## Critical Rules
- **Root cause fixes:** When diagnosing failures in generated projects, NEVER patch the project directly. Find the systemic root cause in: (1) **platform** - rdev handlers/services that create resources, (2) **skeleton** - templates that ship in generated projects, or (3) **cookbook** - test scripts with wrong assumptions. Fix the source, not the symptom. Every project-specific fix is technical debt that will recur.
- **LLM vs rdev:** LLMs generate code; rdev executes deterministic operations (git, lint, deploy). Never rely on LLMs for runbook tasks.
- **Pod git ops:** Git operations run inside pods via `PodGitOperations` (kubectl exec), never locally.
- **No dead code:** Delete unused code immediately. Don't leave "might use later" exports.
- **KUBECONFIG:** ALWAYS set `export KUBECONFIG=~/.kube/orchard9-k3sf.yaml` before running kubectl commands.
- **Container builds:** NEVER build Docker images locally. ALWAYS use `git push origin main` to trigger Woodpecker CI which builds via in-cluster Kaniko. Local Docker builds produce wrong architecture (arm64 vs amd64). If an image is missing from registry.threesix.ai, push to origin — don't improvise.
- **Hexagonal:** Domain models in `internal/domain/` must have ZERO external dependencies.
- **Ports:** All adapters implement interfaces from `internal/port/`.
- **Migrations:** NEVER modify committed migrations. Create NEW ones.
- **500-line limit:** Files exceeding 500 lines must be split.
- **Tests:** All handlers and services require tests.
- **Multi-step ops:** NEVER log-and-continue after partial failure. Rollback or document partial state.
- **Logging:** Use `logging.FromContext(ctx)` or injected `*slog.Logger`. NEVER `fmt.Println`, `log.Fatal`, `log.Printf`, or bare `slog.Info()`. Error key is ALWAYS `"error"` (not `"err"`). Use field constants from `internal/logging/fields.go` (e.g., `logging.FieldProjectID`, `logging.FieldError`). Log once at boundary (handlers/workers log, services return errors). Sensitive data (passwords, tokens, keys) is auto-redacted.
- **HTTP clients:** NEVER create `&http.Client{}` without a `Timeout` field. All HTTP clients must have explicit timeouts (30s standard, 5s for health checks). A bare client can hang indefinitely.
- **Config:** Use `envutil.GetEnv()` / `GetEnvInt()` / `GetEnvBool()` from `internal/envutil` for all env var reads with defaults. NEVER define local `getEnv` helpers — they duplicate and drift. Raw `os.Getenv()` is fine for required values with no default (secrets, passwords).
- **Handler timeouts:** NEVER use inline `time.Duration` in `context.WithTimeout` inside handlers. Use constants from `internal/handlers/timeouts.go`: `TimeoutFastLookup` (5s), `TimeoutLookup` (10s), `TimeoutStandard` (30s), `TimeoutHeavyWrite` (60s), `TimeoutOrchestration` (90s), `TimeoutLongRunning` (10m).
- **Worker timeouts:** NEVER use inline `time.Duration` in `context.WithTimeout` inside worker code. Use constants from `internal/worker/timeouts.go`: `TimeoutQuickOp` (5s), `TimeoutHealthCheck` (10s), `TimeoutMaintenance` (30s), `TimeoutWorkExecution` (10m).
- **Response helpers:** Use `api.WriteUnauthorized`, `api.WriteForbidden`, `api.WriteBadRequest`, `api.WriteNotFound`, `api.WriteInternalError` instead of bare `api.WriteError` with status codes. Only use `api.WriteError` directly for custom error codes (e.g., KEY_REVOKED, IP_NOT_ALLOWED).
- **Auth scopes:** EVERY route in a handler's `Mount()` function MUST use `r.With(auth.RequireScope(...))`. Use `ScopeProjectsRead` for GET endpoints, `ScopeProjectsExecute` for mutation endpoints. Use the appropriate domain scope (e.g., `ScopeQueueRead`, `ScopeBuildWrite`) when available. Admin-only endpoints use `auth.ScopeAdmin` alone. See `internal/handlers/builds.go` for the canonical pattern.
- **JSON decoding:** ALWAYS use `api.DecodeJSON(r, &req)` to decode request bodies. NEVER use raw `json.NewDecoder(r.Body).Decode()`. The helper handles nil body, EOF, and returns typed errors. Decode error message is always `"invalid request body"`.
- **Validation:** Use `validate.New()` accumulator for 2+ field checks in handlers: `v := validate.New(); v.Required(req.Name, "name"); v.Required(req.Type, "type"); if err := v.Error(); err != nil { ... }`. Single-field checks can stay inline. NEVER duplicate validation logic that exists in `internal/validate`.
- **Error wrapping:** ALWAYS use `%w` (not `%v`) when wrapping errors in `fmt.Errorf`. Using `%v` stringifies the error and breaks `errors.Is`/`errors.As` chains. For non-error types (structs, slices), create a typed error implementing `error` instead of stringifying with `%v`.
- **Context propagation:** NEVER use `context.Background()` in handlers, services, or adapters that receive a context parameter. Always derive from the parent context. Use `context.WithoutCancel(ctx)` for fire-and-forget goroutines that must keep the parent's values (e.g., tracing) but must not be cancelled when the parent is.
## Quick Reference
```bash
# Required env vars (add to ~/.zshrc)
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml
export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai"
export RDEV_API_KEY="<from rdev-credentials secret>"
# Infrastructure credentials stored in .secrets (gitignored)
# See: .claude/guides/ops/credentials.md for setup
# Keys: GITEA_TOKEN, CLOUDFLARE_API_TOKEN, CLOUDFLARE_ZONE_ID, WOODPECKER_*
# Run locally
go run ./cmd/rdev-api
# Run tests
go test ./...
# Automated deploy (push triggers Woodpecker CI)
git push origin main # Builds and deploys automatically via Woodpecker
# Manual deploy (if Woodpecker unavailable)
kubectl apply -f deployments/k8s/base/rdev-api.yaml
kubectl rollout restart -n rdev deployment/rdev-api
# Images are at registry.threesix.ai/rdev/{api,worker,claudebox}
# Verify pods
kubectl get pods -n rdev
# View logs
./scripts/logs.sh # Last 100 lines
./scripts/logs.sh -f # Follow/stream
./scripts/logs.sh -n 500 # Last 500 lines
./scripts/logs.sh -e # Errors only
./scripts/logs.sh -p # Previous crashed container
# Shell aliases (after source ~/.zshrc)
rdev-logs # Last 100 lines
rdev-logs-f # Follow/stream
rdev-pods # List pods
# API calls (NOTE: $RDEV_API_KEY doesn't expand in curl -H, use the test script instead)
# ./cookbooks/scripts/landing-test.sh run|status|teardown <name>
curl -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/health"
curl -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/projects"
curl -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/work/stats"
```
## Architecture Overview
```
cmd/rdev-api/ # Entry point, DI, OpenAPI spec
cmd/sdlc/ # SDLC CLI binary (runs inside project pods)
internal/
├── sdlc/ # SDLC library (types, classifier, state I/O)
├── domain/ # Pure business models (no deps)
├── port/ # Interface contracts
├── service/ # Business logic orchestration
├── handlers/ # HTTP handlers (REST endpoints)
├── adapter/ # Infrastructure implementations
│ ├── kubernetes/ # K8s client, pod executor
│ ├── postgres/ # Audit, queue, webhooks, credentials
│ ├── cockroach/ # Database provisioning (project DBs)
│ ├── redis/ # Cache provisioning via ACLs
│ ├── gitea/ # Git repository management
│ ├── cloudflare/ # DNS provider
│ └── woodpecker/ # CI provider
├── auth/ # API key auth, scopes
├── middleware/ # Rate limiting
├── worker/ # Background queue processor
└── webhook/ # Event dispatcher
pkg/api/ # HTTP framework (app, responses)
deployments/k8s/ # Kustomize manifests
└── base/templates/ # Project templates
scripts/ # Operational scripts
├── load-credentials.sh # Load secrets to rdev-api
├── release.sh # Build, tag, push releases
└── logs.sh # View rdev-api logs
cookbooks/ # End-to-end workflow guides
├── landing-page.md # Landing page deployment flow
└── scripts/ # Executable cookbook scripts
```
## Key Concepts
- **Projects**: Kubernetes pods with Claude Code, discovered by label `rdev.orchard9.ai/project=true`
- **Workers**: Shared claudebox pods that execute any project's tasks, labeled `rdev.orchard9.ai/role=worker`
- **Work Queue**: Async task queue for build/test/deploy jobs
- **Credentials**: Infrastructure secrets (tokens, keys) stored encrypted in PostgreSQL
- **Commands**: Claude/shell/git commands executed via kubectl exec, streamed via SSE
- **API Keys**: Scoped auth with project restrictions, IP filtering, expiration
- **Webhooks**: Event subscriptions with retry delivery
- **Templates**: Project scaffolding with .woodpecker.yml, .claude/, and stack files
## threesix.ai Platform Status
| Feature | Status | Description |
|---------|--------|-------------|
| Woodpecker Auto-Activation | **Done** | CI enabled on project creation via SDK |
| Project Templates | **Done** | Embedded templates (astro-landing, go-api, default) |
| Work Queue | **Done** | PostgreSQL with atomic dequeue, retry logic |
| Multi-Provider Agents | **Done** | Claude Code + OpenCode via registry |
| Webhooks | **Done** | Event dispatcher with retry delivery |
| Embedded Worker | **Done** | Goroutine in rdev-api, polls queue |
| Multi-Domain Support | **Done** | Auto-slugs, custom subdomains, DNS aliases |
| Build Event Streaming | **Done** | Real-time SSE/WebSocket for build output |
| Database Provisioning | **Done** | CockroachDB adapter with auto-provisioning |
| Cache Provisioning | **Done** | Redis ACL-based adapter with auto-provisioning |
| Build Orchestration | Planned | Structured build specs via API |
| SDLC Orchestration | **Done** | Deterministic feature lifecycle with classifier engine, API, orchestrator, and 15 skeleton commands |
| Composable Monorepo Templates | **Done** | Monorepo skeleton + component templates (service, worker, app-astro, app-react, cli) |
| Visual Verification | Planned | Playwright screenshots/video + AI evaluation for feature completeness |
**Current Version:** v0.10.25
## Constraints
- **ON-PREM k3s** - not GKE, always set KUBECONFIG
- **Kustomize only** - no ArgoCD
- **chi/v5 router** - no gin, echo, or other frameworks
- **sqlx for DB** - no GORM
- **slog for logging** - no logrus, zap