# rdev - Remote Developer
Run Claude Code instances in isolated Kubernetes pods with REST API control. Enables bots, CI/CD systems, and external orchestrators to dispatch agentive development work to isolated environments.
**Platform:** threesix.ai - Agent-driven development at scale with shared worker pools.
## Terminology
| Term | Meaning | Location |
|------|---------|----------|
| **platform** | rdev itself (orchestrator API, handlers, workers) | `cmd/rdev-api/`, `internal/`, `pkg/api/` |
| **skeleton** | Code that ships in generated projects | `internal/adapter/templates/templates/skeleton/` |
| **component templates** | Service/worker/app/cli templates added to skeleton | `templates/components/{service,worker,cli,app-*}/` |

When discussing code: "add to **platform**" = edit rdev; "add to **skeleton**" = edit project templates.
### Database Rule
| Context | Database | Details |
|---------|----------|---------|
| **rdev platform** | PostgreSQL | API keys, audit logs, work queue, credentials (`internal/adapter/postgres/`) |
| **Generated projects (production)** | CockroachDB | Provisioned per-project by rdev (`internal/adapter/cockroach/`) |
| **Generated projects (local dev)** | PostgreSQL | Via docker-compose, wire-compatible with CockroachDB |

PostgreSQL and CockroachDB connections both use the `lib/pq` driver. The `type: postgres` component API provisions **CockroachDB** in production; the name is a legacy artifact. Skeleton SQL must be compatible with both PostgreSQL and CockroachDB.
## Find Your Guide
| If you need to... | Read this |
|-------------------|-----------|
| **Set up local dev** | [local/setup.md](.claude/guides/local/setup.md) |
| **Run tests** | [local/testing.md](.claude/guides/local/testing.md) |
| **Write Go code / handlers** | [backend/go-guidelines.md](.claude/guides/backend/go-guidelines.md) |
| **Understand pkg/api** | [packages/api-framework.md](.claude/guides/packages/api-framework.md) |
| **Add a new handler/endpoint** | [backend/adding-handlers.md](.claude/guides/backend/adding-handlers.md) |
| **Understand hexagonal architecture** | [backend/hexagonal.md](.claude/guides/backend/hexagonal.md) |
| **Deploy to k3s** | [ops/deploying.md](.claude/guides/ops/deploying.md) |
| **Release a new version** | [ops/releasing.md](.claude/guides/ops/releasing.md) |
| **Work with Kubernetes adapters** | [services/kubernetes.md](.claude/guides/services/kubernetes.md) |
| **Database / migrations** | [ops/database.md](.claude/guides/ops/database.md) |
| **Manage credentials** | [ops/credentials.md](.claude/guides/ops/credentials.md) |
| **Work queue system** | [services/work-queue.md](.claude/guides/services/work-queue.md) |
| **Worker pool management** | [services/worker-pool.md](.claude/guides/services/worker-pool.md) |
| **Project templates** | [services/templates.md](.claude/guides/services/templates.md) |
| **Composable monorepo templates** | [services/composable-monorepo.md](.claude/guides/services/composable-monorepo.md) |
| **E2E testing strategy** | [services/e2e-testing-strategy.md](.claude/guides/services/e2e-testing-strategy.md) |
| **Cookbook tree system (commands)** | [services/cookbook-trees.md](.claude/guides/services/cookbook-trees.md) |
| **Slackpath reference architectures** | [services/cookbook-trees.md](.claude/guides/services/cookbook-trees.md#slackpath-trees-reference-architectures) |
| **Write cookbook trees** | [cookbook-trees/SKILL.md](.claude/skills/cookbook-trees/SKILL.md) |
| **Build/maintain skeleton packages** | [skeleton-craftsman/SKILL.md](.claude/skills/skeleton-craftsman/SKILL.md) |
| **Build orchestration** | [services/build-orchestration.md](.claude/guides/services/build-orchestration.md) |
| **Build event streaming** | [services/build-streaming.md](.claude/guides/services/build-streaming.md) |
| **Resource provisioning plan** | [services/resource-provisioning-plan.md](.claude/guides/services/resource-provisioning-plan.md) |
| **Database provisioning** | [services/database-provisioning.md](.claude/guides/services/database-provisioning.md) |
| **Cache provisioning** | [services/cache-provisioning.md](.claude/guides/services/cache-provisioning.md) |
| **CockroachDB operations** | [services/cockroachdb.md](.claude/guides/services/cockroachdb.md) |
| **Redis operations** | [services/redis.md](.claude/guides/services/redis.md) |
| **DNS / Cloudflare** | [services/dns-cloudflare.md](.claude/guides/services/dns-cloudflare.md) |
| **Network policies / internal routing** | [ops/networking.md](.claude/guides/ops/networking.md) |
| **Debug external system health** | [ops/external-health-diagnostics.md](.claude/guides/ops/external-health-diagnostics.md) |
| **SDLC orchestration** | [services/sdlc.md](.claude/guides/services/sdlc.md) |
| **Visual verification (Playwright)** | [services/visual-verification.md](.claude/guides/services/visual-verification.md) |
| **Interactive remote development** | [services/interactive-remote-dev.md](.claude/guides/services/interactive-remote-dev.md) |
| **Gitea 1.22 / SDK / webhooks** | [ops/gitea-1.22.md](.claude/guides/ops/gitea-1.22.md) |
| **Go 1.25 features & migration** | [backend/go-1.25.md](.claude/guides/backend/go-1.25.md) |
| **Woodpecker CI v3 pipelines** | [ops/woodpecker-v3.md](.claude/guides/ops/woodpecker-v3.md) |
| **Traefik v3 ingress & middleware** | [ops/traefik-v3.md](.claude/guides/ops/traefik-v3.md) |
| **Zot container registry** | [ops/zot-registry.md](.claude/guides/ops/zot-registry.md) |
| **cert-manager / TLS certificates** | [ops/cert-manager.md](.claude/guides/ops/cert-manager.md) |
| **Structured logging** | `internal/logging/` - field constants, context propagation, redaction |
## Critical Rules
- **Frustration = systemic fix:** When the user says they're tired of repeating something, stop what you're doing and find or create a systemic fix in `.claude/**/*` or `CLAUDE.md` — don't just apologize and do the same thing again.
- **AI credentials are provisioned:** rdev injects `LAOZHANG_API_KEY` and `GEMINI_API_KEY` as env vars into every deployed component (`component_deploy.go:fetchProjectCredentials`). Skeleton code reads them with `os.Getenv()`. Never treat AI packages as needing external setup.
- **Root cause fixes:** When diagnosing failures in generated projects, NEVER patch the project directly. Find the systemic root cause in: (1) **platform** - rdev handlers/services that create resources, (2) **skeleton** - templates that ship in generated projects, or (3) **cookbook** - test scripts with wrong assumptions. Fix the source, not the symptom. Every project-specific fix is technical debt that will recur.
- **LLM vs rdev:** LLMs generate code; rdev executes deterministic operations (git, lint, deploy). Never rely on LLMs for runbook tasks.
- **Pod git ops:** Git operations run inside pods via `PodGitOperations` (kubectl exec), never locally.
- **No dead code:** Delete unused code immediately. Don't leave "might use later" exports.
- **KUBECONFIG:** ALWAYS set `export KUBECONFIG=~/.kube/orchard9-k3sf.yaml` before kubectl commands
- **Container builds:** NEVER build Docker images locally. ALWAYS use `git push origin main` to trigger Woodpecker CI, which builds via in-cluster Kaniko. Local Docker builds produce images for the wrong architecture (arm64 vs amd64). If an image is missing from registry.threesix.ai, push to origin; don't improvise.
- **Hexagonal:** Domain models in `internal/domain/` must have ZERO external dependencies
- **Ports:** All adapters implement interfaces from `internal/port/`
- **Migrations:** NEVER modify committed migrations. Create NEW ones.
- **500-line limit:** Files exceeding 500 lines must be split
- **Tests:** All handlers and services require tests
- **No fallbacks:** NEVER design "try X, fall back to Y" flows — fix X. Fallbacks hide errors and deliver inferior experiences.
- **Multi-step ops:** NEVER log-and-continue after partial failure. Rollback or document partial state.
- **Logging:** Use `logging.FromContext(ctx)` or injected `*slog.Logger`. NEVER `fmt.Println`, `log.Fatal`, `log.Printf`, or bare `slog.Info()`. Error key is ALWAYS `"error"` (not `"err"`). Use field constants from `internal/logging/fields.go` (e.g., `logging.FieldProjectID`, `logging.FieldError`). Log once at boundary (handlers/workers log, services return errors). Sensitive data (passwords, tokens, keys) is auto-redacted.
- **HTTP clients:** NEVER create `&http.Client{}` without a `Timeout` field. All HTTP clients must have explicit timeouts (30s standard, 5s for health checks). A bare client can hang indefinitely.
- **Config:** Use `envutil.GetEnv()` / `GetEnvInt()` / `GetEnvBool()` from `internal/envutil` for all env var reads with defaults. NEVER define local `getEnv` helpers — they duplicate and drift. Raw `os.Getenv()` is fine for required values with no default (secrets, passwords).
- **Handler timeouts:** NEVER use inline `time.Duration` in `context.WithTimeout` inside handlers. Use constants from `internal/handlers/timeouts.go`: `TimeoutFastLookup` (5s), `TimeoutLookup` (10s), `TimeoutStandard` (30s), `TimeoutHeavyWrite` (60s), `TimeoutOrchestration` (90s), `TimeoutLongRunning` (10m).
- **Worker timeouts:** NEVER use inline `time.Duration` in `context.WithTimeout` inside worker code. Use constants from `internal/worker/timeouts.go`: `TimeoutQuickOp` (5s), `TimeoutHealthCheck` (10s), `TimeoutMaintenance` (30s), `TimeoutWorkExecution` (10m).
- **Response helpers:** Use `api.WriteUnauthorized`, `api.WriteForbidden`, `api.WriteBadRequest`, `api.WriteNotFound`, `api.WriteInternalError` instead of bare `api.WriteError` with status codes. Only use `api.WriteError` directly for custom error codes (e.g., KEY_REVOKED, IP_NOT_ALLOWED).
- **Auth scopes:** EVERY route in a handler's `Mount()` function MUST use `r.With(auth.RequireScope(...))`. Use `ScopeProjectsRead` for GET endpoints, `ScopeProjectsExecute` for mutation endpoints. Use the appropriate domain scope (e.g., `ScopeQueueRead`, `ScopeBuildWrite`) when available. Admin-only endpoints use `auth.ScopeAdmin` alone. See `internal/handlers/builds.go` for the canonical pattern.
- **JSON decoding:** ALWAYS use `api.DecodeJSON(r, &req)` to decode request bodies. NEVER use raw `json.NewDecoder(r.Body).Decode()`. The helper handles nil body, EOF, and returns typed errors. Decode error message is always `"invalid request body"`.
- **Validation:** Use the `validate.New()` accumulator for 2+ field checks in handlers: `v := validate.New(); v.Required(req.Name, "name"); v.Required(req.Type, "type"); if err := v.Error(); err != nil { ... }`. Single-field checks can stay inline. NEVER duplicate validation logic that exists in `internal/validate`.
- **Error wrapping:** ALWAYS use `%w` (not `%v`) when wrapping errors in `fmt.Errorf`. Using `%v` stringifies the error and breaks `errors.Is`/`errors.As` chains. For non-error types (structs, slices), create a typed error implementing `error` instead of stringifying with `%v`.
- **Context propagation:** NEVER use `context.Background()` in handlers, services, or adapters that receive a context parameter. Always derive from parent context. Use `context.WithoutCancel(ctx)` for fire-and-forget goroutines that need tracing but independent cancellation.
- **Cookbooks:** Load `.claude/skills/cookbook-trees/SKILL.md` before writing/modifying any cookbook tree.
- **Version alignment:** Skeleton templates MUST use consistent versions across all files: Go 1.25 (go.work, go.mod, Dockerfiles, CI images), Node 20, Alpine 3.19. When updating a version, grep the entire templates/ tree and update ALL occurrences to prevent drift.
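
The **Validation** rule above describes an accumulator pattern. The real `internal/validate` package is not shown in this document, so the following is only a sketch of how such an accumulator could work: the `Validator` type, its message format, and `createRequest` are all hypothetical. The point is that every failed check is collected before a single error is returned.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Validator is a hypothetical stand-in for the accumulator returned by
// validate.New(); the real implementation may differ.
type Validator struct{ problems []string }

// New returns an empty accumulator.
func New() *Validator { return &Validator{} }

// Required records a problem when value is empty or whitespace only.
func (v *Validator) Required(value, field string) {
	if strings.TrimSpace(value) == "" {
		v.problems = append(v.problems, field+" is required")
	}
}

// Error returns nil when every check passed, otherwise one error that
// lists all recorded problems.
func (v *Validator) Error() error {
	if len(v.problems) == 0 {
		return nil
	}
	return errors.New(strings.Join(v.problems, "; "))
}

// createRequest is a hypothetical request body with two required fields.
type createRequest struct{ Name, Type string }

// validateCreate runs both field checks and reports them together.
func validateCreate(req createRequest) error {
	v := New()
	v.Required(req.Name, "name")
	v.Required(req.Type, "type")
	return v.Error()
}

func main() {
	fmt.Println(validateCreate(createRequest{}))                              // name is required; type is required
	fmt.Println(validateCreate(createRequest{Name: "demo", Type: "service"})) // <nil>
}
```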
## Quick Reference
```bash
# Environment (should already be in ~/.zshrc)
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml
export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai"
export RDEV_API_KEY="<your-api-key>" # Already set in ~/.zshrc
# Verify environment is loaded
echo "$RDEV_API_KEY" # Should print a base64 string
# If empty: source ~/.zshrc
# For scripts: use cookbooks/scripts/common.sh library
# Provides: api_call(), wait_for_build(), wait_for_pipeline(), wait_for_site()
# Example: source "$(dirname "$0")/common.sh" && api_call GET "/health"
# Run locally
go run ./cmd/rdev-api
# Run tests
go test ./...
# Automated deploy (push triggers Woodpecker CI)
git push origin main # Builds and deploys automatically via Woodpecker
# Manual deploy (if Woodpecker unavailable)
kubectl apply -f deployments/k8s/base/rdev-api.yaml
kubectl rollout restart -n rdev deployment/rdev-api
# Images are at registry.threesix.ai/rdev/{api,worker,claudebox}
# Verify pods
kubectl get pods -n rdev
# View logs
./scripts/logs.sh # Last 100 lines
./scripts/logs.sh -f # Follow/stream
./scripts/logs.sh -n 500 # Last 500 lines
./scripts/logs.sh -e # Errors only
./scripts/logs.sh -p # Previous crashed container
# Shell aliases (after source ~/.zshrc)
rdev-logs # Last 100 lines
rdev-logs-f # Follow/stream
rdev-pods # List pods
# API calls - use cookbook test scripts (they handle auth via common.sh)
./cookbooks/scripts/landing-test.sh run|status|teardown <name>
./cookbooks/scripts/tree-runner.sh run <tree-name> --project-name <name>
# Or direct API calls (requires env vars above)
curl -s -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/health" | jq
curl -s -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/projects" | jq
```
## Architecture Overview
```
cmd/rdev-api/ # Entry point, DI, OpenAPI spec
cmd/sdlc/ # SDLC CLI binary (runs inside project pods)
internal/
├── sdlc/ # SDLC library (types, classifier, state I/O)
├── domain/ # Pure business models (no deps)
├── port/ # Interface contracts
├── service/ # Business logic orchestration
├── handlers/ # HTTP handlers (REST endpoints)
├── adapter/ # Infrastructure implementations
│ ├── kubernetes/ # K8s client, pod executor
│ ├── postgres/ # Audit, queue, webhooks, credentials
│ ├── cockroach/ # Database provisioning (project DBs)
│ ├── redis/ # Cache provisioning via ACLs
│ ├── gitea/ # Git repository management
│ ├── cloudflare/ # DNS provider
│ └── woodpecker/ # CI provider
├── auth/ # API key auth, scopes
├── middleware/ # Rate limiting
├── worker/ # Background queue processor
└── webhook/ # Event dispatcher
pkg/api/ # HTTP framework (app, responses)
deployments/k8s/ # Kustomize manifests
└── base/templates/ # Project templates
scripts/ # Operational scripts
├── load-credentials.sh # Load secrets to rdev-api
├── release.sh # Build, tag, push releases
└── logs.sh # View rdev-api logs
cookbooks/ # End-to-end workflow guides
├── landing-page.md # Landing page deployment flow
└── scripts/ # Executable cookbook scripts
```
## Key Concepts
- **Projects**: Kubernetes pods with Claude Code, discovered by label `rdev.orchard9.ai/project=true`
- **Workers**: Shared claudebox pods that execute any project's tasks, labeled `rdev.orchard9.ai/role=worker`
- **Work Queue**: Async task queue for build/test/deploy jobs
- **Credentials**: Infrastructure secrets (tokens, keys) stored encrypted in PostgreSQL
- **Commands**: Claude/shell/git commands executed via kubectl exec, streamed via SSE
- **API Keys**: Scoped auth with project restrictions, IP filtering, expiration
- **Webhooks**: Event subscriptions with retry delivery
- **Templates**: Project scaffolding with .woodpecker.yml, .claude/, and stack files
## threesix.ai Platform Status
| Feature | Status | Description |
|---------|--------|-------------|
| Woodpecker Auto-Activation | **Done** | CI enabled on project creation via SDK |
| Project Templates | **Done** | Embedded templates (astro-landing, go-api, default) |
| Work Queue | **Done** | PostgreSQL with atomic dequeue, retry logic |
| Multi-Provider Agents | **Done** | Claude Code + OpenCode via registry |
| Webhooks | **Done** | Event dispatcher with retry delivery |
| Embedded Worker | **Done** | Goroutine in rdev-api, polls queue |
| Multi-Domain Support | **Done** | Auto-slugs, custom subdomains, DNS aliases |
| Build Event Streaming | **Done** | Real-time SSE/WebSocket for build output |
| Database Provisioning | **Done** | CockroachDB adapter with auto-provisioning |
| Cache Provisioning | **Done** | Redis ACL-based adapter with auto-provisioning |
| Build Orchestration | Planned | Structured build specs via API |
| SDLC Orchestration | **Done** | Deterministic feature lifecycle with classifier engine, API, orchestrator, and 15 skeleton commands |
| Composable Monorepo Templates | **Done** | Monorepo skeleton + component templates (service, worker, app-astro, app-react, cli) |
| Visual Verification | Planned | Playwright screenshots/video + AI evaluation for feature completeness |
| Checkout/Checkin | **Done** | Sidecar dev flow: temporary git tokens, branch checkout, review on checkin |
| Interactive Remote Dev | **Done** | Sessions with pod binding, command execution, SSE streaming, ephemeral preview URLs |

**Current Version:** v0.10.25
## Constraints
- **ON-PREM k3s** - not GKE, always set KUBECONFIG
- **Kustomize only** - no ArgoCD
- **chi/v5 router** - no gin, echo, or other frameworks
- **sqlx for DB** - no GORM
- **slog for logging** - no logrus, zap