rdev/CLAUDE.md
jordan c280a92012 feat: add operations audit system and template improvements
Operations Audit (new feature):
- Add Operation domain model with status tracking (pending, running, completed, failed, cancelled)
- Add OperationRepository with PostgreSQL implementation
- Add OperationService for CRUD and lifecycle management
- Add operations handlers (list, get, cancel endpoints)
- Add migration 015_operations.sql for operations table
- Add operation cleanup worker for stale operation handling
- Add ErrOperationNotFound to domain errors

Template Improvements:
- Add CLAUDE.md configuration files to astro-landing, default, and go-api templates
- Fix PORT template variable usage in nginx configs for app templates
- Add replace directives for local pkg module in Go templates
- Simplify Go service/worker Dockerfiles for workspace builds
- Fix TypeScript error in logger template

Other:
- Refactor landing-test.sh cookbook script
- Update CLAUDE.md version reference

Note: Some files exceed 500-line limit (pre-existing debt + new feature)
- component.go: 550 lines (unchanged, pre-existing)
- main.go: 522 lines (added operations wiring)
- operation_repo.go: 569 lines (new, needs splitting)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:08:57 -07:00


# rdev - Remote Developer
Run Claude Code instances in isolated Kubernetes pods, controlled through a REST API. This lets bots, CI/CD systems, and external orchestrators dispatch agentic development work to isolated environments.
**Platform:** threesix.ai - Agent-driven development at scale with shared worker pools.
## Find Your Guide
| If you need to... | Read this |
|-------------------|-----------|
| **Set up local dev** | [local/setup.md](.claude/guides/local/setup.md) |
| **Run tests** | [local/testing.md](.claude/guides/local/testing.md) |
| **Write Go code / handlers** | [backend/go-guidelines.md](.claude/guides/backend/go-guidelines.md) |
| **Understand pkg/api** | [packages/api-framework.md](.claude/guides/packages/api-framework.md) |
| **Add a new handler/endpoint** | [backend/adding-handlers.md](.claude/guides/backend/adding-handlers.md) |
| **Understand hexagonal architecture** | [backend/hexagonal.md](.claude/guides/backend/hexagonal.md) |
| **Deploy to k3s** | [ops/deploying.md](.claude/guides/ops/deploying.md) |
| **Release a new version** | [ops/releasing.md](.claude/guides/ops/releasing.md) |
| **Work with Kubernetes adapters** | [services/kubernetes.md](.claude/guides/services/kubernetes.md) |
| **Database / migrations** | [ops/database.md](.claude/guides/ops/database.md) |
| **Manage credentials** | [ops/credentials.md](.claude/guides/ops/credentials.md) |
| **Work queue system** | [services/work-queue.md](.claude/guides/services/work-queue.md) |
| **Worker pool management** | [services/worker-pool.md](.claude/guides/services/worker-pool.md) |
| **Project templates** | [services/templates.md](.claude/guides/services/templates.md) |
| **Composable monorepo templates** | [services/composable-monorepo.md](.claude/guides/services/composable-monorepo.md) |
| **Write E2E cookbook test scripts** | [cookbook-scripts/SKILL.md](.claude/skills/cookbook-scripts/SKILL.md) |
| **Build orchestration** | [services/build-orchestration.md](.claude/guides/services/build-orchestration.md) |
| **Build event streaming** | [services/build-streaming.md](.claude/guides/services/build-streaming.md) |
| **Resource provisioning plan** | [services/resource-provisioning-plan.md](.claude/guides/services/resource-provisioning-plan.md) |
| **Database provisioning** | [services/database-provisioning.md](.claude/guides/services/database-provisioning.md) |
| **Cache provisioning** | [services/cache-provisioning.md](.claude/guides/services/cache-provisioning.md) |
| **CockroachDB operations** | [services/cockroachdb.md](.claude/guides/services/cockroachdb.md) |
| **Redis operations** | [services/redis.md](.claude/guides/services/redis.md) |
| **DNS / Cloudflare** | [services/dns-cloudflare.md](.claude/guides/services/dns-cloudflare.md) |
| **Network policies / internal routing** | [ops/networking.md](.claude/guides/ops/networking.md) |
## Critical Rules
- **LLM vs rdev:** LLMs generate code; rdev executes deterministic operations (git, lint, deploy). Never rely on LLMs for runbook tasks.
- **Pod git ops:** Git operations run inside pods via `PodGitOperations` (kubectl exec), never locally.
- **No dead code:** Delete unused code immediately. Don't leave "might use later" exports.
- **KUBECONFIG:** ALWAYS set `export KUBECONFIG=~/.kube/orchard9-k3sf.yaml` before running kubectl commands.
- **Hexagonal:** Domain models in `internal/domain/` must have ZERO external dependencies.
- **Ports:** All adapters implement interfaces from `internal/port/`.
- **Migrations:** NEVER modify committed migrations. Create NEW ones.
- **500-line limit:** Files exceeding 500 lines must be split.
- **Tests:** All handlers and services require tests.
- **Multi-step ops:** NEVER log-and-continue after partial failure. Rollback or document partial state.
- **Logging:** Use injected `*slog.Logger` only. NEVER `fmt.Println`, `log.Fatal`, `log.Printf`, or bare `slog.Info()`. Error key is ALWAYS `"error"` (not `"err"`). Log once at boundary (handlers/workers log, services return errors).
- **HTTP clients:** NEVER create `&http.Client{}` without a `Timeout` field. All HTTP clients must have explicit timeouts (30s standard, 5s for health checks). A bare client can hang indefinitely.
- **Config:** Use `envutil.GetEnv()` / `GetEnvInt()` / `GetEnvBool()` from `internal/envutil` for all env var reads with defaults. NEVER define local `getEnv` helpers — they duplicate and drift. Raw `os.Getenv()` is fine for required values with no default (secrets, passwords).
- **Handler timeouts:** NEVER use inline `time.Duration` in `context.WithTimeout` inside handlers. Use constants from `internal/handlers/timeouts.go`: `TimeoutFastLookup` (5s), `TimeoutLookup` (10s), `TimeoutStandard` (30s), `TimeoutHeavyWrite` (60s), `TimeoutOrchestration` (90s), `TimeoutLongRunning` (10m).
- **Response helpers:** Use `api.WriteUnauthorized`, `api.WriteForbidden`, `api.WriteBadRequest`, `api.WriteNotFound`, `api.WriteInternalError` instead of bare `api.WriteError` with status codes. Only use `api.WriteError` directly for custom error codes (e.g., KEY_REVOKED, IP_NOT_ALLOWED).
- **JSON decoding:** ALWAYS use `api.DecodeJSON(r, &req)` to decode request bodies. NEVER use raw `json.NewDecoder(r.Body).Decode()`. The helper handles nil body, EOF, and returns typed errors. Decode error message is always `"invalid request body"`.
- **Validation:** Use the `validate.New()` accumulator for 2+ field checks in handlers: `v := validate.New(); v.Required(req.Name, "name"); v.Required(req.Type, "type"); if err := v.Error(); err != nil { ... }`. Single-field checks can stay inline. NEVER duplicate validation logic that exists in `internal/validate`.
## Quick Reference
```bash
# Required env vars (add to ~/.zshrc)
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml
export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai"
export RDEV_API_KEY="<from rdev-credentials secret>"
# Infrastructure credentials stored in .secrets (gitignored)
# See: .claude/guides/ops/credentials.md for setup
# Keys: GITEA_TOKEN, CLOUDFLARE_API_TOKEN, CLOUDFLARE_ZONE_ID, WOODPECKER_*
# Run locally
go run ./cmd/rdev-api
# Run tests
go test ./...
# Release + deploy (one command)
./scripts/release.sh v0.10.1 "Description of changes" --deploy
# Release only (no deploy)
./scripts/release.sh v0.10.1 "Description of changes"
# Manual deploy (if needed)
kubectl apply -f deployments/k8s/base/rdev-api.yaml
kubectl rollout restart -n rdev deployment/rdev-api
# Verify pods
kubectl get pods -n rdev
# View logs
./scripts/logs.sh # Last 100 lines
./scripts/logs.sh -f # Follow/stream
./scripts/logs.sh -n 500 # Last 500 lines
./scripts/logs.sh -e # Errors only
./scripts/logs.sh -p # Previous crashed container
# Shell aliases (after source ~/.zshrc)
rdev-logs # Last 100 lines
rdev-logs-f # Follow/stream
rdev-pods # List pods
# API calls (NOTE: $RDEV_API_KEY doesn't expand in curl -H, use the test script instead)
# ./cookbooks/scripts/landing-test.sh run|status|teardown <name>
curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/health
curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/projects
curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/work/stats
```
## Architecture Overview
```
cmd/rdev-api/ # Entry point, DI, OpenAPI spec
internal/
├── domain/ # Pure business models (no deps)
├── port/ # Interface contracts
├── service/ # Business logic orchestration
├── handlers/ # HTTP handlers (REST endpoints)
├── adapter/ # Infrastructure implementations
│ ├── kubernetes/ # K8s client, pod executor
│ ├── postgres/ # Audit, queue, webhooks, credentials
│ ├── cockroach/ # Database provisioning (project DBs)
│ ├── redis/ # Cache provisioning via ACLs
│ ├── gitea/ # Git repository management
│ ├── cloudflare/ # DNS provider
│ └── woodpecker/ # CI provider
├── auth/ # API key auth, scopes
├── middleware/ # Rate limiting
├── worker/ # Background queue processor
└── webhook/ # Event dispatcher
pkg/api/ # HTTP framework (app, responses)
deployments/k8s/ # Kustomize manifests
└── base/templates/ # Project templates
scripts/ # Operational scripts
├── load-credentials.sh # Load secrets to rdev-api
├── release.sh # Build, tag, push releases
└── logs.sh # View rdev-api logs
cookbooks/ # End-to-end workflow guides
├── landing-page.md # Landing page deployment flow
└── scripts/ # Executable cookbook scripts
```
## Key Concepts
- **Projects**: Kubernetes pods with Claude Code, discovered by label `rdev.orchard9.ai/project=true`
- **Workers**: Shared claudebox pods that execute any project's tasks, labeled `rdev.orchard9.ai/role=worker`
- **Work Queue**: Async task queue for build/test/deploy jobs
- **Credentials**: Infrastructure secrets (tokens, keys) stored encrypted in PostgreSQL
- **Commands**: Claude/shell/git commands executed via kubectl exec, streamed via SSE
- **API Keys**: Scoped auth with project restrictions, IP filtering, expiration
- **Webhooks**: Event subscriptions with retry delivery
- **Templates**: Project scaffolding with .woodpecker.yml, .claude/, and stack files
## threesix.ai Platform Status
| Feature | Status | Description |
|---------|--------|-------------|
| Woodpecker Auto-Activation | **Done** | CI enabled on project creation via SDK |
| Project Templates | **Done** | Embedded templates (astro-landing, go-api, default) |
| Work Queue | **Done** | PostgreSQL with atomic dequeue, retry logic |
| Multi-Provider Agents | **Done** | Claude Code + OpenCode via registry |
| Webhooks | **Done** | Event dispatcher with retry delivery |
| Embedded Worker | **Done** | Goroutine in rdev-api, polls queue |
| Multi-Domain Support | **Done** | Auto-slugs, custom subdomains, DNS aliases |
| Build Event Streaming | **Done** | Real-time SSE/WebSocket for build output |
| Database Provisioning | **Done** | CockroachDB adapter with auto-provisioning |
| Cache Provisioning | **Done** | Redis ACL-based adapter with auto-provisioning |
| Build Orchestration | Planned | Structured build specs via API |
| Composable Monorepo Templates | **Done** | Monorepo skeleton + component templates (service, worker, app-astro, app-react, cli) |
**Current Version:** v0.10.25
## Constraints
- **ON-PREM k3s** - not GKE, always set KUBECONFIG
- **Kustomize only** - no ArgoCD
- **chi/v5 router** - no gin, echo, or other frameworks
- **sqlx for DB** - no GORM
- **slog for logging** - no logrus, zap