jordan d69da6d627 feat: add structured logging infrastructure and SDLC extensions

Major changes:
- Add internal/logging package with field constants, context propagation,
  sensitive data auto-redaction, and per-component log levels
- Add worker timeout constants (TimeoutQuickOp, TimeoutHealthCheck, etc.)
- Extend SDLC with callback handlers, generate endpoints, and executor
- Add new cookbook trees for aeries and slackpath progression
- Add skeleton templates for queue, realtime, and microservices
- Add worker component template with async job processing
- Refactor services and handlers to use new logging infrastructure
- Split component.go into component_infra.go and component_listing.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-04 22:56:04 -07:00

13 KiB

rdev - Remote Developer

Run Claude Code instances in isolated Kubernetes pods under REST API control. This lets bots, CI/CD systems, and external orchestrators dispatch agentic development work to isolated environments.

Platform: threesix.ai - Agent-driven development at scale with shared worker pools.

Find Your Guide

If you need to...	Read this
Set up local dev	local/setup.md
Run tests	local/testing.md
Write Go code / handlers	backend/go-guidelines.md
Understand pkg/api	packages/api-framework.md
Add a new handler/endpoint	backend/adding-handlers.md
Understand hexagonal architecture	backend/hexagonal.md
Deploy to k3s	ops/deploying.md
Release a new version	ops/releasing.md
Work with Kubernetes adapters	services/kubernetes.md
Database / migrations	ops/database.md
Manage credentials	ops/credentials.md
Work queue system	services/work-queue.md
Worker pool management	services/worker-pool.md
Project templates	services/templates.md
Composable monorepo templates	services/composable-monorepo.md
E2E testing strategy	services/e2e-testing-strategy.md
Cookbook tree system (commands)	services/cookbook-trees.md
Write E2E cookbook scripts	cookbook-scripts/SKILL.md
Build orchestration	services/build-orchestration.md
Build event streaming	services/build-streaming.md
Resource provisioning plan	services/resource-provisioning-plan.md
Database provisioning	services/database-provisioning.md
Cache provisioning	services/cache-provisioning.md
CockroachDB operations	services/cockroachdb.md
Redis operations	services/redis.md
DNS / Cloudflare	services/dns-cloudflare.md
Network policies / internal routing	ops/networking.md
Debug external system health	ops/external-health-diagnostics.md
SDLC orchestration	services/sdlc.md
Visual verification (Playwright)	services/visual-verification.md
Structured logging	`internal/logging/` - field constants, context propagation, redaction

Critical Rules

LLM vs rdev: LLMs generate code; rdev executes deterministic operations (git, lint, deploy). Never rely on LLMs for runbook tasks.
Pod git ops: Git operations run inside pods via PodGitOperations (kubectl exec), never locally.
No dead code: Delete unused code immediately. Don't leave "might use later" exports.
KUBECONFIG: ALWAYS run export KUBECONFIG=~/.kube/orchard9-k3sf.yaml before any kubectl command.
Hexagonal: Domain models in internal/domain/ must have ZERO external dependencies
Ports: All adapters implement interfaces from internal/port/
Migrations: NEVER modify committed migrations. Create NEW ones.
500-line limit: Files exceeding 500 lines must be split
Tests: All handlers and services require tests
Multi-step ops: NEVER log-and-continue after partial failure. Rollback or document partial state.
Logging: Use logging.FromContext(ctx) or injected *slog.Logger. NEVER fmt.Println, log.Fatal, log.Printf, or bare slog.Info(). Error key is ALWAYS "error" (not "err"). Use field constants from internal/logging/fields.go (e.g., logging.FieldProjectID, logging.FieldError). Log once at boundary (handlers/workers log, services return errors). Sensitive data (passwords, tokens, keys) is auto-redacted.
HTTP clients: NEVER create &http.Client{} without a Timeout field. All HTTP clients must have explicit timeouts (30s standard, 5s for health checks). A bare client can hang indefinitely.
Config: Use envutil.GetEnv() / GetEnvInt() / GetEnvBool() from internal/envutil for all env var reads with defaults. NEVER define local getEnv helpers — they duplicate and drift. Raw os.Getenv() is fine for required values with no default (secrets, passwords).
Handler timeouts: NEVER use inline time.Duration in context.WithTimeout inside handlers. Use constants from internal/handlers/timeouts.go: TimeoutFastLookup (5s), TimeoutLookup (10s), TimeoutStandard (30s), TimeoutHeavyWrite (60s), TimeoutOrchestration (90s), TimeoutLongRunning (10m).
Worker timeouts: NEVER use inline time.Duration in context.WithTimeout inside worker code. Use constants from internal/worker/timeouts.go: TimeoutQuickOp (5s), TimeoutHealthCheck (10s), TimeoutMaintenance (30s), TimeoutWorkExecution (10m).
Response helpers: Use api.WriteUnauthorized, api.WriteForbidden, api.WriteBadRequest, api.WriteNotFound, api.WriteInternalError instead of bare api.WriteError with status codes. Only use api.WriteError directly for custom error codes (e.g., KEY_REVOKED, IP_NOT_ALLOWED).
Auth scopes: EVERY route in a handler's Mount() function MUST use r.With(auth.RequireScope(...)). Use ScopeProjectsRead for GET endpoints, ScopeProjectsExecute for mutation endpoints. Use the appropriate domain scope (e.g., ScopeQueueRead, ScopeBuildWrite) when available. Admin-only endpoints use auth.ScopeAdmin alone. See internal/handlers/builds.go for the canonical pattern.
JSON decoding: ALWAYS use api.DecodeJSON(r, &req) to decode request bodies. NEVER use raw json.NewDecoder(r.Body).Decode(). The helper handles nil body, EOF, and returns typed errors. Decode error message is always "invalid request body".
Validation: Use the validate.New() accumulator for 2+ field checks in handlers: v := validate.New(); v.Required(req.Name, "name"); v.Required(req.Type, "type"); if err := v.Error(); err != nil { ... }. Single-field checks can stay inline. NEVER duplicate validation logic that exists in internal/validate.
Error wrapping: ALWAYS use %w (not %v) when wrapping errors in fmt.Errorf. Using %v stringifies the error and breaks errors.Is/errors.As chains. For non-error types (structs, slices), create a typed error implementing error instead of stringifying with %v.
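
Several of the rules above (explicit HTTP client timeouts, named timeout constants, %w wrapping, and the "error" log key) can be sketched together. This is a minimal illustration using only the standard library, not code from the repository: timeoutStandard, errNotFound, lookupProject, and newHTTPClient are hypothetical stand-ins, and the "project_id" key is an assumed value. Real code would use the constants in internal/handlers/timeouts.go and internal/logging/fields.go.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log/slog"
	"net/http"
	"os"
	"time"
)

// timeoutStandard is a hypothetical local stand-in for the 30s constant
// that the rules place in internal/handlers/timeouts.go.
const timeoutStandard = 30 * time.Second

// errNotFound is a hypothetical sentinel error used only in this sketch.
var errNotFound = errors.New("project not found")

// newHTTPClient always sets an explicit Timeout so a stalled peer cannot
// hang the caller indefinitely.
func newHTTPClient() *http.Client {
	return &http.Client{Timeout: timeoutStandard}
}

// lookupProject stands in for a service call: it returns errors and never logs.
func lookupProject(ctx context.Context, id string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("lookup project %s: %w", id, err)
	}
	// %w keeps errNotFound reachable through errors.Is / errors.As.
	return fmt.Errorf("lookup project %s: %w", id, errNotFound)
}

// wrappedWithW reports whether errors.Is still matches after %w wrapping.
func wrappedWithW() bool {
	return errors.Is(lookupProject(context.Background(), "demo"), errNotFound)
}

// wrappedWithV reports whether errors.Is still matches after %v wrapping.
func wrappedWithV() bool {
	err := fmt.Errorf("lookup project: %v", errNotFound)
	return errors.Is(err, errNotFound)
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// Named constant instead of an inline duration.
	ctx, cancel := context.WithTimeout(context.Background(), timeoutStandard)
	defer cancel()

	// Log once at the boundary, with the error under the "error" key.
	// "project_id" is an assumed key name for this sketch.
	if err := lookupProject(ctx, "demo"); err != nil {
		logger.Error("project lookup failed", "project_id", "demo", "error", err)
	}

	logger.Info("wrapping comparison",
		"is_match_with_w", wrappedWithW(),
		"is_match_with_v", wrappedWithV(),
		"client_timeout", newHTTPClient().Timeout.String())
}
```

The %v variant flattens the sentinel into text, so errors.Is no longer finds it; that is the chain breakage the error wrapping rule describes.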

Quick Reference

# Required env vars (add to ~/.zshrc)
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml
export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai"
export RDEV_API_KEY="<from rdev-credentials secret>"

# Infrastructure credentials stored in .secrets (gitignored)
# See: .claude/guides/ops/credentials.md for setup
# Keys: GITEA_TOKEN, CLOUDFLARE_API_TOKEN, CLOUDFLARE_ZONE_ID, WOODPECKER_*

# Run locally
go run ./cmd/rdev-api

# Run tests
go test ./...

# Release + deploy (one command)
./scripts/release.sh v0.10.1 "Description of changes" --deploy

# Release only (no deploy)
./scripts/release.sh v0.10.1 "Description of changes"

# Manual deploy (if needed)
kubectl apply -f deployments/k8s/base/rdev-api.yaml
kubectl rollout restart -n rdev deployment/rdev-api

# Verify pods
kubectl get pods -n rdev

# View logs
./scripts/logs.sh           # Last 100 lines
./scripts/logs.sh -f        # Follow/stream
./scripts/logs.sh -n 500    # Last 500 lines
./scripts/logs.sh -e        # Errors only
./scripts/logs.sh -p        # Previous crashed container

# Shell aliases (after source ~/.zshrc)
rdev-logs                   # Last 100 lines
rdev-logs-f                 # Follow/stream
rdev-pods                   # List pods

# API calls (NOTE: $RDEV_API_KEY doesn't expand in curl -H, use the test script instead)
# ./cookbooks/scripts/landing-test.sh run|status|teardown <name>
curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/health
curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/projects
curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/work/stats
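
The same health call can be made from Go, which reads the environment directly and so avoids shell expansion issues. The sketch below is illustrative only: newRequest is a hypothetical helper, the 5s timeout follows the health check guidance in Critical Rules, and nothing is assumed about the response body beyond its HTTP status.

```go
package main

import (
	"fmt"
	"log/slog"
	"net/http"
	"os"
	"time"
)

// newRequest builds a GET request for path with the X-API-Key header set.
// It is a hypothetical helper for this sketch, not part of rdev.
func newRequest(baseURL, apiKey, path string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
	if err != nil {
		return nil, fmt.Errorf("build request for %s: %w", path, err)
	}
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))

	baseURL := os.Getenv("RDEV_API_URL")
	apiKey := os.Getenv("RDEV_API_KEY")
	if baseURL == "" || apiKey == "" {
		logger.Error("RDEV_API_URL and RDEV_API_KEY must be set")
		return
	}

	req, err := newRequest(baseURL, apiKey, "/health")
	if err != nil {
		logger.Error("could not build request", "error", err)
		return
	}

	// Explicit timeout: a bare http.Client can hang indefinitely.
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		logger.Error("health check failed", "error", err)
		return
	}
	defer resp.Body.Close()

	logger.Info("health check completed", "status", resp.Status)
}
```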

Architecture Overview

cmd/rdev-api/          # Entry point, DI, OpenAPI spec
cmd/sdlc/              # SDLC CLI binary (runs inside project pods)
internal/
├── sdlc/              # SDLC library (types, classifier, state I/O)
├── domain/            # Pure business models (no deps)
├── port/              # Interface contracts
├── service/           # Business logic orchestration
├── handlers/          # HTTP handlers (REST endpoints)
├── adapter/           # Infrastructure implementations
│   ├── kubernetes/    # K8s client, pod executor
│   ├── postgres/      # Audit, queue, webhooks, credentials
│   ├── cockroach/     # Database provisioning (project DBs)
│   ├── redis/         # Cache provisioning via ACLs
│   ├── gitea/         # Git repository management
│   ├── cloudflare/    # DNS provider
│   └── woodpecker/    # CI provider
├── auth/              # API key auth, scopes
├── middleware/        # Rate limiting
├── worker/            # Background queue processor
└── webhook/           # Event dispatcher
pkg/api/               # HTTP framework (app, responses)
deployments/k8s/       # Kustomize manifests
  └── base/templates/  # Project templates
scripts/               # Operational scripts
  ├── load-credentials.sh  # Load secrets to rdev-api
  ├── release.sh           # Build, tag, push releases
  └── logs.sh              # View rdev-api logs
cookbooks/             # End-to-end workflow guides
  ├── landing-page.md      # Landing page deployment flow
  └── scripts/             # Executable cookbook scripts
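
The layering in this tree can be illustrated in a single file. This is a rough sketch under stated assumptions: Project, ProjectStore, memoryStore, and ProjectService are hypothetical names chosen for the example, and the real packages will differ. It shows only the dependency direction: the domain has no dependencies, the port is an interface, the adapter implements it, and the service depends on the port alone.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Project is a hypothetical domain model (internal/domain/): plain data only.
type Project struct {
	ID   string
	Name string
}

// ErrProjectNotFound is a hypothetical domain error.
var ErrProjectNotFound = errors.New("project not found")

// ProjectStore is a hypothetical port (internal/port/): an interface contract.
type ProjectStore interface {
	Get(ctx context.Context, id string) (Project, error)
}

// memoryStore is a hypothetical adapter (internal/adapter/) that satisfies
// the port with an in-memory map instead of real infrastructure.
type memoryStore struct {
	projects map[string]Project
}

func (m memoryStore) Get(_ context.Context, id string) (Project, error) {
	p, ok := m.projects[id]
	if !ok {
		return Project{}, ErrProjectNotFound
	}
	return p, nil
}

// ProjectService (internal/service/) depends on the port, never on an adapter.
type ProjectService struct {
	store ProjectStore
}

// DisplayName returns the project's name; it wraps errors and does not log.
func (s ProjectService) DisplayName(ctx context.Context, id string) (string, error) {
	p, err := s.store.Get(ctx, id)
	if err != nil {
		return "", fmt.Errorf("get project %s: %w", id, err)
	}
	return p.Name, nil
}

// newDemoService wires the service to the in-memory adapter, roughly the
// kind of dependency injection an entry point would perform.
func newDemoService() ProjectService {
	return ProjectService{store: memoryStore{projects: map[string]Project{
		"p1": {ID: "p1", Name: "landing"},
	}}}
}

// demoLookup runs one lookup and reports whether it failed with the
// domain's not-found error.
func demoLookup(id string) (string, bool) {
	name, err := newDemoService().DisplayName(context.Background(), id)
	return name, errors.Is(err, ErrProjectNotFound)
}

func main() {
	fmt.Println(demoLookup("p1")) // prints "landing false"
	_, notFound := demoLookup("missing")
	fmt.Println("missing is not-found:", notFound)
}
```

Because the service sees only the interface, swapping the in-memory adapter for a database-backed one would not change the service or domain code.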

Key Concepts

Projects: Kubernetes pods with Claude Code, discovered by label rdev.orchard9.ai/project=true
Workers: Shared claudebox pods that execute any project's tasks, labeled rdev.orchard9.ai/role=worker
Work Queue: Async task queue for build/test/deploy jobs
Credentials: Infrastructure secrets (tokens, keys) stored encrypted in PostgreSQL
Commands: Claude/shell/git commands executed via kubectl exec, streamed via SSE
API Keys: Scoped auth with project restrictions, IP filtering, expiration
Webhooks: Event subscriptions with retry delivery
Templates: Project scaffolding with .woodpecker.yml, .claude/, and stack files
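
The label-based discovery of projects and workers described above comes down to a key/value match on pod labels. The sketch below uses plain maps and no Kubernetes client, so it illustrates only the matching rule. isProject and isWorker are hypothetical helpers; the label keys and values are the ones quoted in this section.

```go
package main

import "fmt"

// Label keys quoted in Key Concepts.
const (
	labelProject = "rdev.orchard9.ai/project"
	labelRole    = "rdev.orchard9.ai/role"
)

// isProject reports whether a pod's labels mark it as a project pod.
func isProject(labels map[string]string) bool {
	return labels[labelProject] == "true"
}

// isWorker reports whether a pod's labels mark it as a shared worker pod.
func isWorker(labels map[string]string) bool {
	return labels[labelRole] == "worker"
}

func main() {
	// Hypothetical pods for illustration; names and extra labels are made up.
	pods := map[string]map[string]string{
		"demo-project": {labelProject: "true"},
		"claudebox-0":  {labelRole: "worker"},
		"unrelated":    {"app": "other"},
	}
	for name, labels := range pods {
		fmt.Printf("%s project=%t worker=%t\n", name, isProject(labels), isWorker(labels))
	}
}
```

Against a live cluster, the equivalent filter would normally be a label selector passed to the Kubernetes API rather than a client-side loop.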

threesix.ai Platform Status

Feature	Status	Description
Woodpecker Auto-Activation	Done	CI enabled on project creation via SDK
Project Templates	Done	Embedded templates (astro-landing, go-api, default)
Work Queue	Done	PostgreSQL with atomic dequeue, retry logic
Multi-Provider Agents	Done	Claude Code + OpenCode via registry
Webhooks	Done	Event dispatcher with retry delivery
Embedded Worker	Done	Goroutine in rdev-api, polls queue
Multi-Domain Support	Done	Auto-slugs, custom subdomains, DNS aliases
Build Event Streaming	Done	Real-time SSE/WebSocket for build output
Database Provisioning	Done	CockroachDB adapter with auto-provisioning
Cache Provisioning	Done	Redis ACL-based adapter with auto-provisioning
Build Orchestration	Planned	Structured build specs via API
SDLC Orchestration	Done	Deterministic feature lifecycle with classifier engine, API, orchestrator, and 15 skeleton commands
Composable Monorepo Templates	Done	Monorepo skeleton + component templates (service, worker, app-astro, app-react, cli)
Visual Verification	Planned	Playwright screenshots/video + AI evaluation for feature completeness

Current Version: v0.10.25

Constraints

ON-PREM k3s - not GKE, always set KUBECONFIG
Kustomize only - no ArgoCD
chi/v5 router - no gin, echo, or other frameworks
sqlx for DB - no GORM
slog for logging - no logrus, zap
