diff --git a/.gitignore b/.gitignore index 31ebb81..cc8c57e 100644 --- a/.gitignore +++ b/.gitignore @@ -35,3 +35,4 @@ tmp/ *-deploy-key *-deploy-key.pub *-deploy-key.b64 +.agentive-remediation/ diff --git a/CLAUDE.md b/CLAUDE.md index 33b5ebb..12fdd30 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -22,16 +22,28 @@ Run Claude Code instances in isolated Kubernetes pods with REST API control. Ena | **Work queue system** | [services/work-queue.md](.claude/guides/services/work-queue.md) | | **Worker pool management** | [services/worker-pool.md](.claude/guides/services/worker-pool.md) | | **Project templates** | [services/templates.md](.claude/guides/services/templates.md) | +| **Composable monorepo templates** | [services/composable-monorepo.md](.claude/guides/services/composable-monorepo.md) | | **Build orchestration** | [services/build-orchestration.md](.claude/guides/services/build-orchestration.md) | +| **Build event streaming** | [services/build-streaming.md](.claude/guides/services/build-streaming.md) | +| **Resource provisioning plan** | [services/resource-provisioning-plan.md](.claude/guides/services/resource-provisioning-plan.md) | +| **Database provisioning** | [services/database-provisioning.md](.claude/guides/services/database-provisioning.md) | +| **Cache provisioning** | [services/cache-provisioning.md](.claude/guides/services/cache-provisioning.md) | +| **CockroachDB operations** | [services/cockroachdb.md](.claude/guides/services/cockroachdb.md) | +| **Redis operations** | [services/redis.md](.claude/guides/services/redis.md) | +| **DNS / Cloudflare** | [services/dns-cloudflare.md](.claude/guides/services/dns-cloudflare.md) | ## Critical Rules +- **LLM vs rdev:** LLMs generate code; rdev executes deterministic operations (git, lint, deploy). Never rely on LLMs for runbook tasks. +- **Pod git ops:** Git operations run inside pods via `PodGitOperations` (kubectl exec), never locally. +- **No dead code:** Delete unused code immediately. 
Don't leave "might use later" exports. - **KUBECONFIG:** ALWAYS set `export KUBECONFIG=~/.kube/orchard9-k3sf.yaml` before kubectl commands - **Hexagonal:** Domain models in `internal/domain/` must have ZERO external dependencies - **Ports:** All adapters implement interfaces from `internal/port/` - **Migrations:** NEVER modify committed migrations. Create NEW ones. - **500-line limit:** Files exceeding 500 lines must be split - **Tests:** All handlers and services require tests +- **Multi-step ops:** NEVER log-and-continue after partial failure. Rollback or document partial state. ## Quick Reference @@ -41,6 +53,10 @@ export KUBECONFIG=~/.kube/orchard9-k3sf.yaml export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai" export RDEV_API_KEY="" +# Infrastructure credentials stored in .secrets (gitignored) +# See: .claude/guides/ops/credentials.md for setup +# Keys: GITEA_TOKEN, CLOUDFLARE_API_TOKEN, CLOUDFLARE_ZONE_ID, WOODPECKER_* + # Run locally go run ./cmd/rdev-api @@ -91,6 +107,8 @@ internal/ ├── adapter/ # Infrastructure implementations │ ├── kubernetes/ # K8s client, pod executor │ ├── postgres/ # Audit, queue, webhooks, credentials +│ ├── cockroach/ # Database provisioning (project DBs) +│ ├── redis/ # Cache provisioning via ACLs │ ├── gitea/ # Git repository management │ ├── cloudflare/ # DNS provider │ └── woodpecker/ # CI provider @@ -132,9 +150,13 @@ cookbooks/ # End-to-end workflow guides | Webhooks | **Done** | Event dispatcher with retry delivery | | Embedded Worker | **Done** | Goroutine in rdev-api, polls queue | | Multi-Domain Support | **Done** | Auto-slugs, custom subdomains, DNS aliases | +| Build Event Streaming | **Done** | Real-time SSE/WebSocket for build output | +| Database Provisioning | **Done** | CockroachDB adapter with auto-provisioning | +| Cache Provisioning | **Done** | Redis ACL-based adapter with auto-provisioning | | Build Orchestration | Planned | Structured build specs via API | +| Composable Monorepo Templates | Planned | 
Monorepo skeleton + component templates | -**Current Version:** v0.10.0 +**Current Version:** v0.10.12 ## Constraints diff --git a/ai-lookup/features/build-orchestration.md b/ai-lookup/features/build-orchestration.md index 36c5da7..9f67520 100644 --- a/ai-lookup/features/build-orchestration.md +++ b/ai-lookup/features/build-orchestration.md @@ -23,7 +23,7 @@ Build orchestration enables structured build specs for bot-driven development. B - Handler: `internal/handlers/builds.go` (StartBuild, ListBuilds, GetBuild) - Handler: `internal/handlers/create_and_build.go` (CreateAndBuild) - Executor: `internal/worker/build_executor.go` (BuildSpec→AgentRequest translation) -- Git: `internal/worker/git_operations.go` (clone, commit, push with token injection) +- Git: `internal/worker/pod_git_operations.go` (post-build commit/push via kubectl exec) - Migration: `internal/db/migrations/012_worker_registry.sql` (build_audit table) ## API Endpoints @@ -42,9 +42,10 @@ Build orchestration enables structured build specs for bot-driven development. B 3. Creates BuildAuditEntry with status "pending" 4. Returns task ID immediately 5. WorkExecutor poll loop claims task from queue -6. BuildExecutor translates spec: clones repo, builds AgentRequest, calls CodeAgent.Execute() -7. On success with auto_commit: GitOperations commits and pushes changes -8. WorkExecutor reports completion with BuildResult +6. BuildExecutor builds AgentRequest, calls CodeAgent.Execute() in pod via kubectl exec +7. **Post-build phase**: If auto_commit, PodGitOperations runs `git add/commit/push` in pod + - Git operations are programmatic, not LLM-driven (deterministic) +8. WorkExecutor reports completion with BuildResult (includes commit_sha, files_changed) 9. 
Audit entry updated, callback URL notified ## Build Audit Statuses diff --git a/ai-lookup/features/composable-monorepo.md b/ai-lookup/features/composable-monorepo.md new file mode 100644 index 0000000..cb35f68 --- /dev/null +++ b/ai-lookup/features/composable-monorepo.md @@ -0,0 +1,106 @@ +# Composable Monorepo Templates + +**Last Updated:** 2026-01-30 +**Confidence:** High (Planned) + +## Summary + +Composable Monorepo Templates evolve rdev's project scaffolding from single templates to full monorepo architecture. Every project starts as a monorepo skeleton, with components (services, workers, apps, cli) added via API calls. Deployment can target the whole monorepo or individual components. + +**Key Facts:** +- `POST /projects` creates monorepo skeleton (not single template) +- `POST /projects/{id}/components` adds services/workers/apps/cli +- Convention-based discovery: `services/*/`, `workers/*/`, `apps/*/`, `cli/*/` +- Optional `component.yaml` per component for ports, dependencies, build order +- Shared `pkg/` from Aeries chassis + Colix patterns (8 packages) +- Deployment supports whole-monorepo or individual-component targets + +**File Pointers:** +- Plan: `tmp/template-monorepo-plan.md` +- Current templates: `internal/adapter/templates/templates/` +- Port: `internal/port/template_provider.go` + +## How It Works + +### Project Creation Flow + +``` +POST /projects {"name": "acme"} + ↓ +Creates monorepo skeleton: + - CLAUDE.md, README.md, Procfile + - docker-compose.yml, go.work, .golangci.yml + - scripts/ (discover, install, quality, dev) + - pkg/ (8 shared packages from Aeries + Colix) + - .claude/ (guides, skills, commands) +``` + +### Component Addition Flow + +``` +POST /projects/acme/components {"type": "service", "name": "auth-api"} + ↓ +Creates services/auth-api/: + - cmd/server/main.go + - internal/, Makefile, Dockerfile + - component.yaml (port, deps) + ↓ +Auto-updates: + - Procfile (add service entry) + - go.work (add module) + - CLAUDE.md (add 
routing) +``` + +### Monorepo Structure + +``` +acme/ +├── CLAUDE.md # AI router +├── Procfile # Local dev (auto-updated) +├── docker-compose.yml # Local services +├── go.work # Go workspace (auto-updated) +├── scripts/ # Discovery scripts +├── pkg/ # Shared packages (8 total) +├── services/auth-api/ # Go API component +├── workers/email-worker/ # Background worker component +├── apps/dashboard/ # Frontend component +└── cli/acme-cli/ # CLI tool component +``` + +## Shared Packages (pkg/) + +Combines best patterns from Aeries (chassis) and Colix (modular): + +| Package | Source | Purpose | +|---------|--------|---------| +| `app/` | Aeries chassis | Service bootstrapper | +| `middleware/` | Colix | HTTP middleware (CORS, recovery, request_id, logger) | +| `httpcontext/` | Colix | Type-safe context helpers | +| `httpresponse/` | Aeries+Colix | JSON helpers + envelope pattern | +| `httpvalidation/` | Colix | Request validation | +| `logging/` | Both | Structured logging (slog + env detection) | +| `config/` | Aeries | Configuration via viper | +| `httpclient/` | Both | Resilient HTTP client with retry | + +## Component Types + +| Type | Directory | Template | Identifier | +|------|-----------|----------|------------| +| Service | `services/` | go-api | `Makefile` or `go.mod` | +| Worker | `workers/` | worker | `Makefile` or `go.mod` | +| App | `apps/` | app-astro, app-react | `package.json` | +| CLI | `cli/` | cli | `Makefile` or `go.mod` | + +## Template Migration + +| Current Template | New Location | Component Type | +|------------------|--------------|----------------| +| `go-api` | `components/service/` | `services/*` | +| `astro-landing` | `components/app-astro/` | `apps/*` | +| `default` | Becomes skeleton | N/A | + +## Related Topics + +- [Template Provider](../services/template-provider.md) - Current template system +- [Project Service](../services/project-service.md) - Project lifecycle +- [Build Orchestration](./build-orchestration.md) - Component builds 
diff --git a/ai-lookup/index.md b/ai-lookup/index.md index 0743f5f..e7fdb38 100644 --- a/ai-lookup/index.md +++ b/ai-lookup/index.md @@ -14,12 +14,14 @@ Quick reference for rdev concepts and facts. | Work Queue | [services/work-queue.md](./services/work-queue.md) | High | 2025-01 | Task queue for worker pool | | Worker Pool | [services/worker-pool.md](./services/worker-pool.md) | High | 2026-01 | Embedded work executor with queue maintenance and metrics | | CI Provider | [services/ci-provider.md](./services/ci-provider.md) | High | 2025-01 | Woodpecker auto-activation | +| DNS / Cloudflare | [services/dns-cloudflare.md](./services/dns-cloudflare.md) | High | 2026-01 | Domain management for threesix.ai | | Template Provider | [services/template-provider.md](./services/template-provider.md) | High | 2025-01 | Project template seeding | | **Features** | | Command Execution | [features/command-execution.md](./features/command-execution.md) | High | 2025-01 | Claude/shell/git command flow | | SSE Streaming | [features/sse-streaming.md](./features/sse-streaming.md) | High | 2025-01 | Real-time output streaming | | Infrastructure Management | [features/infrastructure.md](./features/infrastructure.md) | High | 2025-01 | Gitea, Cloudflare, deployment | | Build Orchestration | [features/build-orchestration.md](./features/build-orchestration.md) | High | 2026-01 | Bot-driven build specs with audit trail | +| Composable Monorepo | [features/composable-monorepo.md](./features/composable-monorepo.md) | High | 2026-01 | Monorepo skeleton + component templates | ## Roadmap Reference diff --git a/ai-lookup/services/dns-cloudflare.md b/ai-lookup/services/dns-cloudflare.md new file mode 100644 index 0000000..7e04557 --- /dev/null +++ b/ai-lookup/services/dns-cloudflare.md @@ -0,0 +1,66 @@ +# DNS Management (Cloudflare) + +**Last Updated:** 2026-01 +**Confidence:** High + +## Summary + +DNS for threesix.ai domains is managed via Cloudflare API. 
Projects get auto-generated subdomains on creation, and users can add custom subdomains or external domain aliases. The Cloudflare adapter implements the `DNSProvider` port interface. + +**Key Facts:** +- Auto-provisioned subdomains: `{random}.threesix.ai` created on project creation +- Custom subdomains: User-chosen `{name}.threesix.ai` auto-configured via API +- External aliases: User manages DNS, rdev only configures ingress +- Credentials: `CLOUDFLARE_API_TOKEN`, `CLOUDFLARE_ZONE_ID` in `.secrets` → loaded to PostgreSQL + +**Credential Keys:** `internal/domain/credential.go:23-24` + +## Domain Types + +| Type | Example | Auto-DNS | +|------|---------|----------| +| `primary_auto` | `k7m2x9p4.threesix.ai` | Yes | +| `primary_custom` | `my-app.threesix.ai` | Yes | +| `alias` | `www.myapp.com` | No | + +## Architecture + +**Port Interface:** `internal/port/dns_provider.go` +``` +CreateRecord, UpdateRecord, UpsertRecord, DeleteRecord +DeleteRecordByName, GetRecord, ListRecords, FindRecord +``` + +**Adapter:** `internal/adapter/cloudflare/client.go` +- Uses Cloudflare API v4 with Bearer token auth +- 3-attempt retry on UpsertRecord for race conditions +- Auto-normalizes subdomain names + +**Service:** `internal/service/project_infra_domains.go` +- AddDomain, RemoveDomain, ListDomains, GetPrimaryDomain +- Coordinates between Cloudflare, database, and K8s ingress + +**Handler:** `internal/handlers/infrastructure_domains.go` +- REST endpoints: GET/POST/DELETE `/projects/{id}/domains` + +## Database Schema + +**Table:** `project_domains` +- `project_id` UUID → cascade delete +- `domain` VARCHAR(255) UNIQUE +- `type` CHECK (primary_auto|primary_custom|alias) +- `dns_record_id` VARCHAR(64) - Cloudflare record ID for cleanup +- `verified` BOOLEAN + +## API Endpoints + +``` +GET /projects/{id}/domains - List all domains +POST /projects/{id}/domains - Add domain +DELETE /projects/{id}/domains/{domain} - Remove domain +``` + +## Related Topics + +- [Infrastructure 
Management](../features/infrastructure.md) - Broader infra context +- [Credentials Guide](../../.claude/guides/ops/credentials.md) - Loading secrets diff --git a/ai-lookup/services/template-provider.md b/ai-lookup/services/template-provider.md index 4fc51f5..cf4e700 100644 --- a/ai-lookup/services/template-provider.md +++ b/ai-lookup/services/template-provider.md @@ -1,7 +1,9 @@ # Template Provider -**Last Updated:** 2025-01 -**Confidence:** High (Planned - see address-the-gaps.md) +**Last Updated:** 2026-01 +**Confidence:** High + +> **Evolution:** This documents the current single-template system. See [Composable Monorepo](../features/composable-monorepo.md) for the upcoming monorepo architecture. ## Summary @@ -72,5 +74,6 @@ POST /project ## Related Topics +- [Composable Monorepo](../features/composable-monorepo.md) - Upcoming monorepo architecture - [Infrastructure Management](../features/infrastructure.md) - [Project Service](./project-service.md) diff --git a/cmd/rdev-api/config.go b/cmd/rdev-api/config.go index 165c1e0..ad092d2 100644 --- a/cmd/rdev-api/config.go +++ b/cmd/rdev-api/config.go @@ -61,6 +61,17 @@ type InfraConfig struct { WoodpeckerURL string WoodpeckerAPIToken string WoodpeckerWebhookSecret string + + // CockroachDB provisioner (for project databases) + CRDBHost string // e.g., "cockroachdb-public.databases.svc" + CRDBPort int // e.g., 26257 + CRDBUser string // e.g., "root" (insecure mode) + CRDBSSLMode string // e.g., "disable" (insecure) or "verify-full" (production) + + // Redis provisioner (for project cache) + RedisHost string // e.g., "redis.threesix.svc" + RedisPort int // e.g., 6379 + RedisPassword string // admin password for ACL management } func loadConfig() Config { @@ -148,6 +159,20 @@ func loadInfraConfig(ctx context.Context, store port.CredentialStore, cfg Config return envFallback } + // Parse CRDB and Redis ports + crdbPort := 26257 + if v := os.Getenv("CRDB_PORT"); v != "" { + if p, err := strconv.Atoi(v); err == nil { + 
crdbPort = p + } + } + redisPort := 6379 + if v := os.Getenv("REDIS_PORT"); v != "" { + if p, err := strconv.Atoi(v); err == nil { + redisPort = p + } + } + infraCfg := InfraConfig{ GiteaURL: getOrFallback(domain.CredKeyGiteaURL, cfg.GiteaURL), GiteaToken: getOrFallback(domain.CredKeyGiteaToken, cfg.GiteaToken), @@ -162,6 +187,15 @@ func loadInfraConfig(ctx context.Context, store port.CredentialStore, cfg Config WoodpeckerURL: getOrFallback(domain.CredKeyWoodpeckerURL, cfg.WoodpeckerURL), WoodpeckerAPIToken: getOrFallback(domain.CredKeyWoodpeckerAPIToken, cfg.WoodpeckerAPIToken), WoodpeckerWebhookSecret: getOrFallback(domain.CredKeyWoodpeckerWebhookSecret, cfg.WoodpeckerWebhookSecret), + + // CockroachDB and Redis provisioners (env-only for now) + CRDBHost: os.Getenv("CRDB_HOST"), // e.g., "cockroachdb-public.databases.svc" + CRDBPort: crdbPort, + CRDBUser: getEnv("CRDB_USER", "root"), + CRDBSSLMode: getEnv("CRDB_SSL_MODE", "disable"), + RedisHost: os.Getenv("REDIS_HOST"), // e.g., "redis.threesix.svc" + RedisPort: redisPort, + RedisPassword: os.Getenv("REDIS_PASSWORD"), } // Log which credentials were loaded from store vs env diff --git a/cookbooks/fullstack-app.md b/cookbooks/fullstack-app.md new file mode 100644 index 0000000..b510dc3 --- /dev/null +++ b/cookbooks/fullstack-app.md @@ -0,0 +1,383 @@ +# Full-Stack App Cookbook + +> Deploy a full-stack application (Next.js + Go backend) built entirely by Claude through the threesix.ai infrastructure. 
+ +## Overview + +This cookbook creates and deploys a complete full-stack application using **agent-driven development**: + +``` +POST /project/create-and-build + ↓ +Creates: Gitea repo + DNS + Woodpecker CI + K8s deployment + ↓ +Enqueues build task with comprehensive prompt + ↓ +Worker picks up task → Claude builds the entire stack + ↓ +Agent commits + pushes + ↓ +CI builds and deploys + ↓ +Live full-stack app +``` + +**Claude builds everything from scratch: Next.js frontend with shadcn/ui, Go backend API, Docker configs, and CI pipeline.** + +--- + +## Prerequisites + +### API Access +```bash +export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai" +export RDEV_API_KEY="" +``` + +### Infrastructure Required +- rdev-api running with embedded worker +- Gitea at https://git.threesix.ai +- Woodpecker CI at https://ci.threesix.ai +- claudebox-0 pod running in rdev namespace + +--- + +## Step 1: Create Project and Build Full-Stack App + +Single API call that creates infrastructure AND enqueues the full-stack build: + +```bash +curl -X POST "$RDEV_API_URL/project/create-and-build" \ + -H "X-API-Key: $RDEV_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "my-fullstack-app", + "description": "Full-stack app with Next.js frontend and Go backend", + "build": { + "prompt": "Build a full-stack task management application with the following structure:\n\nFRONTEND (Next.js 14 + shadcn/ui):\n- Create a Next.js 14 app with App Router in /frontend\n- Use shadcn/ui for all components (install with npx shadcn-ui@latest init)\n- Dark theme with modern aesthetic\n- Pages: Dashboard showing tasks, Add Task form, Task detail view\n- Use Tailwind CSS for styling\n- Connect to backend API at /api proxy\n\nBACKEND (Go):\n- Create a Go HTTP server in /backend using chi router\n- Endpoints: GET /api/tasks, POST /api/tasks, GET /api/tasks/{id}, DELETE /api/tasks/{id}\n- In-memory task storage (no database needed)\n- Structured JSON responses\n- CORS middleware for 
frontend\n\nDOCKER:\n- /frontend/Dockerfile: Multi-stage build for Next.js (node:20-alpine)\n- /backend/Dockerfile: Multi-stage build for Go (golang:1.22-alpine)\n- /docker-compose.yml: Run both services, frontend proxies to backend\n\nCI/CD:\n- /.woodpecker.yml: Build both images, push to registry, deploy to k8s\n\nCreate all necessary files including package.json, go.mod, and configuration files.", + "auto_commit": true, + "auto_push": true + } + }' +``` + +**Response:** +```json +{ + "project": { + "project_id": "my-fullstack-app", + "domain": "xyz789ab.threesix.ai", + "git": { + "html_url": "https://git.threesix.ai/jordan/my-fullstack-app" + } + }, + "build": { + "task_id": "task-uuid", + "status": "pending", + "status_url": "/builds/task-uuid" + } +} +``` + +--- + +## Step 2: Monitor Build Progress + +Poll the build status: + +```bash +curl -s "$RDEV_API_URL/builds/{task_id}" \ + -H "X-API-Key: $RDEV_API_KEY" | jq . +``` + +**Status progression:** `pending` → `running` → `completed` (or `failed`) + +Full-stack builds take longer than simple landing pages. Expect 2-5 minutes. + +When completed: +```json +{ + "task_id": "task-uuid", + "status": "completed", + "result": { + "success": true, + "commit_sha": "def456", + "files_changed": [ + "frontend/package.json", + "frontend/app/page.tsx", + "frontend/app/layout.tsx", + "frontend/components/task-list.tsx", + "frontend/components/add-task-form.tsx", + "frontend/Dockerfile", + "backend/main.go", + "backend/go.mod", + "backend/Dockerfile", + "docker-compose.yml", + ".woodpecker.yml" + ], + "duration_ms": 180000 + } +} +``` + +--- + +## Step 3: Monitor CI Pipeline + +The agent's push triggers Woodpecker CI to build both services: + +```bash +curl -s "$RDEV_API_URL/projects/my-fullstack-app/pipelines" \ + -H "X-API-Key: $RDEV_API_KEY" | jq '.data[0]' +``` + +Pipeline stages: +1. Build frontend Docker image +2. Build backend Docker image +3. Push both to registry +4. 
Deploy to Kubernetes + +Wait for `status: "success"`. + +--- + +## Step 4: Verify Deployment + +```bash +# Check site is live +curl -I https://xyz789ab.threesix.ai + +# Test frontend loads +curl -s https://xyz789ab.threesix.ai | head -20 + +# Test backend API +curl -s https://xyz789ab.threesix.ai/api/tasks | jq . + +# Open in browser +open https://xyz789ab.threesix.ai +``` + +--- + +## Iterating on the App + +### Add a Feature + +```bash +curl -X POST "$RDEV_API_URL/projects/my-fullstack-app/builds" \ + -H "X-API-Key: $RDEV_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "prompt": "Add task priority levels (low, medium, high) with color-coded badges in the UI. Update the backend Task struct and frontend components to support priorities. Add a priority filter dropdown on the dashboard.", + "auto_commit": true, + "auto_push": true + }' +``` + +### Fix a Bug + +```bash +curl -X POST "$RDEV_API_URL/projects/my-fullstack-app/builds" \ + -H "X-API-Key: $RDEV_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "prompt": "Fix the task deletion - ensure the DELETE endpoint returns 204 No Content and the frontend removes the task from the list immediately without requiring a page refresh.", + "auto_commit": true, + "auto_push": true + }' +``` + +### Add Authentication + +```bash +curl -X POST "$RDEV_API_URL/projects/my-fullstack-app/builds" \ + -H "X-API-Key: $RDEV_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "prompt": "Add simple JWT authentication:\n- Backend: Add /api/auth/login endpoint that accepts username/password and returns JWT\n- Backend: Add auth middleware to protect /api/tasks endpoints\n- Frontend: Add login page with shadcn form components\n- Frontend: Store JWT in localStorage, include in API requests\n- Create a demo user (admin/admin123) for testing", + "auto_commit": true, + "auto_push": true + }' +``` + +Each build: +1. Claude clones the existing repo +2. Makes the requested changes +3. Commits and pushes +4. 
CI deploys automatically + +--- + +## Alternative Prompts + +### E-commerce Storefront + +```bash +curl -X POST "$RDEV_API_URL/project/create-and-build" \ + -H "X-API-Key: $RDEV_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "my-store", + "description": "E-commerce storefront", + "build": { + "prompt": "Build an e-commerce storefront:\n\nFRONTEND: Next.js 14 with shadcn/ui, dark theme\n- Product grid with images, prices, descriptions\n- Product detail page\n- Shopping cart (localStorage)\n- Checkout form (no payment processing)\n\nBACKEND: Go with chi router\n- GET /api/products - list products\n- GET /api/products/{id} - product detail\n- POST /api/orders - create order (log to console)\n- Seed with 6 sample products\n\nInclude Dockerfiles and .woodpecker.yml for CI/CD.", + "auto_commit": true, + "auto_push": true + } + }' +``` + +### Dashboard App + +```bash +curl -X POST "$RDEV_API_URL/project/create-and-build" \ + -H "X-API-Key: $RDEV_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "my-dashboard", + "description": "Analytics dashboard", + "build": { + "prompt": "Build an analytics dashboard:\n\nFRONTEND: Next.js 14 with shadcn/ui + recharts\n- Dashboard with 4 stat cards (users, revenue, orders, growth)\n- Line chart showing weekly trends\n- Bar chart showing top products\n- Recent activity table\n- Dark theme, responsive grid layout\n\nBACKEND: Go with chi router\n- GET /api/stats - return dashboard statistics\n- GET /api/trends - return weekly trend data\n- GET /api/activity - return recent activity\n- Generate realistic sample data\n\nInclude Dockerfiles and .woodpecker.yml for CI/CD.", + "auto_commit": true, + "auto_push": true + } + }' +``` + +--- + +## Adding Custom Domains + +```bash +# Add custom domain +curl -X POST "$RDEV_API_URL/projects/my-fullstack-app/domains" \ + -H "X-API-Key: $RDEV_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"domain": "app.mycompany.com"}' + +# List all domains +curl -s 
"$RDEV_API_URL/projects/my-fullstack-app/domains" \ + -H "X-API-Key: $RDEV_API_KEY" | jq '.data.domains' +``` + +--- + +## Teardown + +```bash +curl -X DELETE "$RDEV_API_URL/project/my-fullstack-app" \ + -H "X-API-Key: $RDEV_API_KEY" +``` + +Removes: DNS records, K8s deployment, project metadata. Gitea repo preserved for safety. + +--- + +## E2E Test Script + +Run the full flow: +```bash +./cookbooks/scripts/fullstack-test.sh run my-test-fullstack +``` + +Check status: +```bash +./cookbooks/scripts/fullstack-test.sh status my-test-fullstack +``` + +Cleanup: +```bash +./cookbooks/scripts/fullstack-test.sh teardown my-test-fullstack +``` + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ Agent-Driven Full-Stack App │ +│ │ +│ POST /project/create-and-build │ +│ │ │ +│ ├──► Gitea: creates repo │ +│ ├──► Cloudflare: creates DNS │ +│ ├──► Woodpecker: activates CI │ +│ ├──► K8s: creates Deployment/Service/Ingress │ +│ └──► Work Queue: enqueues build task │ +│ │ │ +│ ▼ │ +│ Worker polls queue, claims task │ +│ │ │ +│ ▼ │ +│ Claude Code executes in claudebox-0: │ +│ - Clones repo │ +│ - Creates Next.js frontend with shadcn/ui │ +│ - Creates Go backend with chi router │ +│ - Writes Dockerfiles and CI config │ +│ - Commits and pushes │ +│ │ │ +│ ▼ │ +│ Woodpecker CI triggered by push: │ +│ - Builds frontend Docker image │ +│ - Builds backend Docker image │ +│ - Pushes to registry │ +│ - Deploys to K8s │ +│ │ │ +│ ▼ │ +│ Full-stack app live at https://{slug}.threesix.ai │ +│ - Frontend: Next.js + shadcn/ui │ +│ - Backend: Go API │ +│ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Troubleshooting + +### Build stuck in pending +```bash +# Check worker status +curl -s "$RDEV_API_URL/workers" -H "X-API-Key: $RDEV_API_KEY" | jq '.data.summary' + +# Should show at least 1 idle worker +``` + +### Build failed +```bash +# Get build details with full output +curl -s 
"$RDEV_API_URL/builds/{task_id}" -H "X-API-Key: $RDEV_API_KEY" | jq '.result' + +# Check rdev-api logs for worker errors +./scripts/logs.sh -e +``` + +### Pipeline not triggering +```bash +# Check if commit was pushed +curl -s "https://git.threesix.ai/api/v1/repos/jordan/my-fullstack-app/commits" | jq '.[0]' + +# Check Woodpecker +open https://ci.threesix.ai/jordan/my-fullstack-app +``` + +### Frontend/Backend connection issues +```bash +# Check both containers are running +kubectl get pods -n projects -l app=my-fullstack-app + +# Check frontend logs +kubectl logs -n projects -l app=my-fullstack-app -c frontend + +# Check backend logs +kubectl logs -n projects -l app=my-fullstack-app -c backend +``` + +--- + +## Related + +- [Landing Page Cookbook](./landing-page.md) - Simpler single-page deployment +- [Worker Pool Guide](../.claude/guides/services/worker-pool.md) +- [Build Orchestration](../.claude/guides/services/build-orchestration.md) diff --git a/cookbooks/scripts/common.sh b/cookbooks/scripts/common.sh new file mode 100755 index 0000000..aa6e38e --- /dev/null +++ b/cookbooks/scripts/common.sh @@ -0,0 +1,213 @@ +#!/bin/bash +# Common utilities for rdev cookbook scripts +# +# Usage: +# source "$(dirname "${BASH_SOURCE[0]}")/common.sh" +# +# Provides: +# - api_call() - Make authenticated API calls +# - wait_for_build() - Poll for build completion +# - wait_for_pipeline() - Poll for CI pipeline completion +# - wait_for_site() - Wait for site to respond +# - Colors for output + +set -euo pipefail + +# Require environment variables +: "${RDEV_API_URL:?RDEV_API_URL must be set}" +: "${RDEV_API_KEY:?RDEV_API_KEY must be set}" + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +CYAN='\033[0;36m' +NC='\033[0m' # No Color + +# Make an authenticated API call +# Arguments: method endpoint [data] +# Example: api_call GET "/projects" +# Example: api_call POST "/projects" '{"name": "test"}' +api_call() { + local method="$1" + 
local endpoint="$2" + local data="${3:-}" + + if [[ -n "$data" ]]; then + curl -s -X "$method" "$RDEV_API_URL$endpoint" \ + -H "X-API-Key: $RDEV_API_KEY" \ + -H "Content-Type: application/json" \ + -d "$data" + else + curl -s -X "$method" "$RDEV_API_URL$endpoint" \ + -H "X-API-Key: $RDEV_API_KEY" + fi +} + +# Wait for a build to complete +# Arguments: task_id [max_attempts] [poll_interval] +# Returns: 0 on success, 1 on failure, 2 on timeout +wait_for_build() { + local task_id="$1" + local max_attempts="${2:-60}" # 5 minutes default (5s * 60) + local poll_interval="${3:-5}" + local attempt=0 + + echo -e "${CYAN}Waiting for build to complete (task: $task_id)...${NC}" + + while [[ $attempt -lt $max_attempts ]]; do + local result + result=$(api_call GET "/builds/$task_id") + local status + status=$(echo "$result" | jq -r '.status // .data.status // "unknown"') + + case "$status" in + completed) + local success + success=$(echo "$result" | jq -r '.result.success // .data.result.success // false') + if [[ "$success" == "true" ]]; then + echo -e "${GREEN}Build completed successfully!${NC}" + echo "$result" | jq '.result // .data.result' + return 0 + else + echo -e "${RED}Build completed but failed:${NC}" + echo "$result" | jq '.result // .data.result' + return 1 + fi + ;; + failed) + echo -e "${RED}Build failed:${NC}" + echo "$result" | jq '.' + return 1 + ;; + running) + echo " Build running... (attempt $((attempt + 1))/$max_attempts)" + ;; + pending) + echo " Build pending... 
(attempt $((attempt + 1))/$max_attempts)" + ;; + *) + echo " Unknown status: $status (attempt $((attempt + 1))/$max_attempts)" + ;; + esac + + sleep "$poll_interval" + attempt=$((attempt + 1)) + done + + echo -e "${YELLOW}Timeout waiting for build to complete${NC}" + return 2 +} + +# Wait for CI pipeline to complete +# Arguments: project_id [max_attempts] [poll_interval] +# Returns: 0 on success, 1 on failure, 2 on timeout +wait_for_pipeline() { + local project_id="$1" + local max_attempts="${2:-60}" # 5 minutes default + local poll_interval="${3:-5}" + local attempt=0 + + echo -e "${CYAN}Waiting for CI pipeline...${NC}" + + # Wait a bit for pipeline to be created + sleep 5 + + while [[ $attempt -lt $max_attempts ]]; do + local result + result=$(api_call GET "/projects/$project_id/pipelines") + + # Check if we have any pipelines + local pipeline_count + pipeline_count=$(echo "$result" | jq '.data | length // 0') + + if [[ "$pipeline_count" -eq 0 ]]; then + echo " No pipelines yet... (attempt $((attempt + 1))/$max_attempts)" + sleep "$poll_interval" + attempt=$((attempt + 1)) + continue + fi + + # Get latest pipeline status + local status + status=$(echo "$result" | jq -r '.data[0].status // "unknown"') + local pipeline_number + pipeline_number=$(echo "$result" | jq -r '.data[0].number // "?"') + + case "$status" in + success) + echo -e "${GREEN}Pipeline #$pipeline_number completed successfully!${NC}" + return 0 + ;; + failure|error|killed) + echo -e "${RED}Pipeline #$pipeline_number failed with status: $status${NC}" + return 1 + ;; + running|pending) + echo " Pipeline #$pipeline_number $status... 
(attempt $((attempt + 1))/$max_attempts)" + ;; + *) + echo " Pipeline #$pipeline_number status: $status (attempt $((attempt + 1))/$max_attempts)" + ;; + esac + + sleep "$poll_interval" + attempt=$((attempt + 1)) + done + + echo -e "${YELLOW}Timeout waiting for pipeline to complete${NC}" + return 2 +} + +# Wait for site to be accessible +# Arguments: domain [max_attempts] [poll_interval] +# Returns: 0 on success, 1 on timeout +wait_for_site() { + local domain="$1" + local max_attempts="${2:-30}" + local poll_interval="${3:-5}" + local attempt=0 + + echo -e "${CYAN}Waiting for site to be accessible at https://$domain...${NC}" + + while [[ $attempt -lt $max_attempts ]]; do + local http_code + http_code=$(curl -s -o /dev/null -w "%{http_code}" "https://$domain" 2>/dev/null || true) # curl -w already prints 000 when the connection fails + + if [[ "$http_code" == "200" ]]; then + echo -e "${GREEN}Site is live! (HTTP $http_code)${NC}" + return 0 + fi + + echo " HTTP $http_code... (attempt $((attempt + 1))/$max_attempts)" + sleep "$poll_interval" + attempt=$((attempt + 1)) + done + + echo -e "${YELLOW}Timeout waiting for site to respond${NC}" + return 1 +} + +# Print a section header +print_header() { + local title="$1" + echo "" + echo -e "${BLUE}=== $title ===${NC}" + echo "" +} + +# Print success message +print_success() { + echo -e "${GREEN}✓ $1${NC}" +} + +# Print error message +print_error() { + echo -e "${RED}✗ $1${NC}" +} + +# Print warning message +print_warning() { + echo -e "${YELLOW}⚠ $1${NC}" +} diff --git a/cookbooks/scripts/fullstack-test.sh b/cookbooks/scripts/fullstack-test.sh new file mode 100755 index 0000000..f269724 --- /dev/null +++ b/cookbooks/scripts/fullstack-test.sh @@ -0,0 +1,202 @@ +#!/bin/bash +set -euo pipefail + +# Full-Stack App E2E Test Script +# Usage: ./cookbooks/scripts/fullstack-test.sh +# Commands: run, status, teardown + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +source "$SCRIPT_DIR/common.sh" + +COMMAND="${1:-}" 
+PROJECT_NAME="${2:-}" + +if [[ -z "$COMMAND" || -z "$PROJECT_NAME" ]]; 
then + echo "Usage: $0 " + echo "Commands:" + echo " run - Create project and run full-stack build" + echo " status - Check build and deployment status" + echo " teardown - Delete the project" + exit 1 +fi + +# Full-stack app build prompt +FULLSTACK_PROMPT='Build a full-stack task management application with the following structure: + +FRONTEND (Next.js 14 + shadcn/ui): +- Create a Next.js 14 app with App Router in /frontend +- Use shadcn/ui for all components (install with npx shadcn-ui@latest init) +- Dark theme with modern aesthetic +- Pages: Dashboard showing tasks, Add Task form, Task detail view +- Use Tailwind CSS for styling +- Connect to backend API at /api proxy + +BACKEND (Go): +- Create a Go HTTP server in /backend using chi router +- Endpoints: GET /api/tasks, POST /api/tasks, GET /api/tasks/{id}, DELETE /api/tasks/{id} +- In-memory task storage (no database needed) +- Structured JSON responses +- CORS middleware for frontend + +DOCKER: +- /frontend/Dockerfile: Multi-stage build for Next.js (node:20-alpine) +- /backend/Dockerfile: Multi-stage build for Go (golang:1.22-alpine) +- /docker-compose.yml: Run both services, frontend proxies to backend + +CI/CD: +- /.woodpecker.yml: Build both images, push to registry, deploy to k8s + +Create all necessary files including package.json, go.mod, and configuration files.' + +# Test backend API +test_backend_api() { + local domain="$1" + + echo "Testing backend API..." + + # Test GET /api/tasks + local response + response=$(curl -s "https://$domain/api/tasks" 2>/dev/null || echo '{"error":"failed"}') + + if echo "$response" | jq -e '.' 
> /dev/null 2>&1; then + echo " GET /api/tasks: OK" + echo " Response: $response" + return 0 + else + echo " GET /api/tasks: FAILED" + echo " Response: $response" + return 1 + fi +} + +run_flow() { + echo "=== Full-Stack App E2E Test ===" + echo "Project: $PROJECT_NAME" + echo "" + + # Step 1: Create project with build + echo "Step 1: Creating project and submitting full-stack build..." + local create_result + # Build the JSON payload (prompt, auto_commit, auto_push are top-level fields) + local payload + payload=$(jq -n \ + --arg name "$PROJECT_NAME" \ + --arg desc "Full-stack app E2E test" \ + --arg prompt "$FULLSTACK_PROMPT" \ + '{ + name: $name, + description: $desc, + prompt: $prompt, + auto_commit: true, + auto_push: true + }') + create_result=$(api_call POST "/project/create-and-build" "$payload") + + echo "$create_result" | jq '.' + + local domain + domain=$(echo "$create_result" | jq -r '.data.domain // .domain // ""') + local task_id + task_id=$(echo "$create_result" | jq -r '.data.task_id // .task_id // ""') + + if [[ -z "$domain" || -z "$task_id" ]]; then + echo "ERROR: Failed to create project" + exit 1 + fi + + echo "" + echo "Domain: $domain" + echo "Build Task: $task_id" + echo "" + + # Step 2: Wait for build + echo "Step 2: Waiting for Claude to build the full-stack app..." + if ! wait_for_build "$task_id"; then + echo "ERROR: Build failed" + exit 1 + fi + echo "" + + # Step 3: Wait for CI pipeline + echo "Step 3: Waiting for CI pipeline to build and deploy..." + if ! wait_for_pipeline "$PROJECT_NAME"; then + echo "WARNING: Pipeline may have failed, continuing to check site..." + fi + echo "" + + # Step 4: Wait for site + echo "Step 4: Verifying site is accessible..." + if ! wait_for_site "$domain"; then + echo "ERROR: Site not accessible" + exit 1 + fi + echo "" + + # Step 5: Test backend API + echo "Step 5: Testing backend API..." + if ! 
test_backend_api "$domain"; then + echo "WARNING: Backend API test failed" + fi + echo "" + + # Summary + echo "=== E2E Test Results ===" + echo "Project created: PASS" + echo "Build completed: PASS" + echo "CI Pipeline: $(wait_for_pipeline "$PROJECT_NAME" > /dev/null 2>&1 && echo "PASS" || echo "CHECK")" + echo "Site accessible: PASS" + echo "Backend API: $(test_backend_api "$domain" > /dev/null 2>&1 && echo "PASS" || echo "CHECK")" + echo "" + echo "Site URL: https://$domain" + echo "Git repo: https://git.threesix.ai/jordan/$PROJECT_NAME" + echo "CI: https://ci.threesix.ai/jordan/$PROJECT_NAME" +} + +check_status() { + echo "=== Project Status: $PROJECT_NAME ===" + echo "" + + # Get project info + local project_result + project_result=$(api_call GET "/projects/$PROJECT_NAME") + echo "Project:" + echo "$project_result" | jq '.' + echo "" + + # Get latest build + echo "Latest Builds:" + api_call GET "/projects/$PROJECT_NAME/builds" | jq '.data[:3]' + echo "" + + # Get latest pipeline + echo "Latest Pipelines:" + api_call GET "/projects/$PROJECT_NAME/pipelines" | jq '.data[:3]' +} + +teardown() { + echo "=== Tearing down: $PROJECT_NAME ===" + + local result + result=$(api_call DELETE "/project/$PROJECT_NAME") + echo "$result" | jq '.' + + echo "" + echo "Project deleted. Gitea repo preserved." 
+} + +case "$COMMAND" in + run) + run_flow + ;; + status) + check_status + ;; + teardown) + teardown + ;; + *) + echo "Unknown command: $COMMAND" + echo "Valid commands: run, status, teardown" + exit 1 + ;; +esac diff --git a/cookbooks/scripts/landing-test.sh b/cookbooks/scripts/landing-test.sh index 7a85388..1370ae0 100755 --- a/cookbooks/scripts/landing-test.sh +++ b/cookbooks/scripts/landing-test.sh @@ -26,12 +26,15 @@ log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; } log_error() { echo -e "${RED}[ERROR]${NC} $1"; } # Timeouts -BUILD_TIMEOUT=180 # 3 minutes for Claude to build the site +BUILD_TIMEOUT=600 # 10 minutes for Claude to build the site BUILD_POLL_INTERVAL=5 # Check every 5 seconds PIPELINE_TIMEOUT=300 # 5 minutes max wait for CI pipeline PIPELINE_POLL_INTERVAL=10 SITE_TIMEOUT=60 # 1 minute max wait for site to be live +# Streaming mode (set to true to stream live build output via SSE) +STREAM_MODE="${STREAM_MODE:-false}" + api_call() { local method="$1" local endpoint="$2" @@ -62,17 +65,82 @@ check_health() { fi } +# Stream build events via SSE (real-time output) +# Arguments: project_id, task_id +stream_build_events() { + local project_id="$1" + local task_id="$2" + local stream_url="${API_URL}/projects/${project_id}/events?stream_id=${task_id}" + + log_info "Streaming build events from: $stream_url" + echo "" + + # Use curl to stream SSE events + curl -s -N \ + -H "X-API-Key: ${API_KEY}" \ + -H "Accept: text/event-stream" \ + "$stream_url" 2>/dev/null | while IFS= read -r line; do + # Skip empty lines and event headers + if [[ -z "$line" || "$line" == "event:"* || "$line" == "id:"* ]]; then + continue + fi + + # Parse data lines + if [[ "$line" == "data:"* ]]; then + local data="${line#data: }" + + # Parse event type and content + local event_type content + event_type=$(echo "$data" | jq -r '.type // "unknown"' 2>/dev/null) + + case "$event_type" in + build.started) + echo -e "${GREEN}[BUILD STARTED]${NC}" + ;; + build.output) + content=$(echo 
"$data" | jq -r '.content // ""' 2>/dev/null) + [[ -n "$content" ]] && echo "$content" + ;; + build.tool_use) + local tool_name + tool_name=$(echo "$data" | jq -r '.tool_name // "unknown"' 2>/dev/null) + echo -e "${YELLOW}[TOOL: $tool_name]${NC}" + ;; + build.completed) + echo -e "${GREEN}[BUILD COMPLETED]${NC}" + return 0 + ;; + build.failed) + local error + error=$(echo "$data" | jq -r '.error // "unknown error"' 2>/dev/null) + echo -e "${RED}[BUILD FAILED]${NC} $error" + return 1 + ;; + esac + fi + done +} + # Wait for build to complete (Claude building the site) # Returns: 0 on success, 1 on failure/timeout wait_for_build() { local task_id="$1" + local project_id="${2:-}" local start_time=$(date +%s) log_info "Waiting for Claude to build the site (timeout: ${BUILD_TIMEOUT}s)..." + # If streaming mode is enabled and we have a project_id, use SSE + if [[ "$STREAM_MODE" == "true" && -n "$project_id" ]]; then + log_info "Streaming mode enabled - showing live build output" + stream_build_events "$project_id" "$task_id" & + local stream_pid=$! + fi + while true; do local elapsed=$(($(date +%s) - start_time)) if [[ $elapsed -ge $BUILD_TIMEOUT ]]; then + [[ -n "${stream_pid:-}" ]] && kill "$stream_pid" 2>/dev/null || true log_error "Build timeout after ${BUILD_TIMEOUT}s" return 1 fi @@ -85,6 +153,7 @@ wait_for_build() { case "$status" in completed) + [[ -n "${stream_pid:-}" ]] && kill "$stream_pid" 2>/dev/null || true local success success=$(echo "$response" | jq -r '.data.result.success // false') if [[ "$success" == "true" ]]; then @@ -98,18 +167,25 @@ wait_for_build() { fi ;; failed) + [[ -n "${stream_pid:-}" ]] && kill "$stream_pid" 2>/dev/null || true log_error "Build failed" echo "$response" | jq '.data.result // .data' return 1 ;; running) - echo -ne "\r${BLUE}[INFO]${NC} Build status: running (${elapsed}s)... " + if [[ "$STREAM_MODE" != "true" ]]; then + echo -ne "\r${BLUE}[INFO]${NC} Build status: running (${elapsed}s)... 
" + fi ;; pending) - echo -ne "\r${BLUE}[INFO]${NC} Build status: pending (${elapsed}s)... " + if [[ "$STREAM_MODE" != "true" ]]; then + echo -ne "\r${BLUE}[INFO]${NC} Build status: pending (${elapsed}s)... " + fi ;; *) - echo -ne "\r${BLUE}[INFO]${NC} Build status: $status (${elapsed}s)... " + if [[ "$STREAM_MODE" != "true" ]]; then + echo -ne "\r${BLUE}[INFO]${NC} Build status: $status (${elapsed}s)... " + fi ;; esac @@ -334,7 +410,7 @@ run_flow() { log_info "Step 2: Monitoring build progress..." echo "" local build_success=false - if wait_for_build "$task_id"; then + if wait_for_build "$task_id" "$project_name"; then build_success=true else log_error "Build did not complete successfully" @@ -588,12 +664,14 @@ case "${1:-}" in echo "Environment:" echo " RDEV_API_URL API endpoint (default: https://rdev.masq-ops.orchard9.ai)" echo " RDEV_API_KEY API key (required)" + echo " STREAM_MODE Set to 'true' for live SSE streaming of build output" echo "" echo "Examples:" - echo " $0 run # Run with default project name 'landing-test'" - echo " $0 run my-landing # Run with custom project name" - echo " $0 status my-landing # Check status, builds, and pipelines" - echo " $0 teardown my-landing # Clean up project" + echo " $0 run # Run with default project name 'landing-test'" + echo " $0 run my-landing # Run with custom project name" + echo " STREAM_MODE=true $0 run # Run with live build output streaming" + echo " $0 status my-landing # Check status, builds, and pipelines" + echo " $0 teardown my-landing # Clean up project" exit 1 ;; esac diff --git a/cookbooks/scripts/lib/stream-client.sh b/cookbooks/scripts/lib/stream-client.sh new file mode 100755 index 0000000..bf9888c --- /dev/null +++ b/cookbooks/scripts/lib/stream-client.sh @@ -0,0 +1,283 @@ +#!/bin/bash +# SSE Stream Client Library for rdev API +# Provides functions for consuming Server-Sent Events from build streams +# +# Usage: +# source cookbooks/scripts/lib/stream-client.sh +# stream_build_with_progress "$API_URL" 
"$API_KEY" "$PROJECT_ID" "$TASK_ID" + +# Colors for output +STREAM_RED='\033[0;31m' +STREAM_GREEN='\033[0;32m' +STREAM_YELLOW='\033[1;33m' +STREAM_BLUE='\033[0;34m' +STREAM_CYAN='\033[0;36m' +STREAM_NC='\033[0m' + +# Progress bar width +PROGRESS_BAR_WIDTH=40 + +# Draw a progress bar +# Arguments: percentage (0-100) +draw_progress_bar() { + local percent="${1:-0}" + local filled=$((percent * PROGRESS_BAR_WIDTH / 100)) + local empty=$((PROGRESS_BAR_WIDTH - filled)) + + printf "\r[" + printf '%*s' "$filled" '' | tr ' ' '=' + if [[ $filled -lt $PROGRESS_BAR_WIDTH ]]; then + printf ">" + printf '%*s' "$((empty - 1))" '' | tr ' ' ' ' + fi + printf "] %3d%%" "$percent" +} + +# Parse SSE data line and extract JSON +# Arguments: data line (after "data: " prefix) +parse_sse_data() { + local data="$1" + echo "$data" +} + +# Stream build events with progress bar +# Arguments: api_url, api_key, project_id, task_id +# Options: +# --verbose Show all output (not just progress) +# --last-id Last-Event-ID for reconnection +stream_build_with_progress() { + local api_url="$1" + local api_key="$2" + local project_id="$3" + local task_id="$4" + shift 4 + + local verbose=false + local last_event_id="" + + # Parse options + while [[ $# -gt 0 ]]; do + case "$1" in + --verbose) + verbose=true + shift + ;; + --last-id) + last_event_id="$2" + shift 2 + ;; + *) + shift + ;; + esac + done + + local stream_url="${api_url}/projects/${project_id}/events?stream_id=${task_id}" + local curl_args=( + -s -N + -H "X-API-Key: ${api_key}" + -H "Accept: text/event-stream" + ) + + if [[ -n "$last_event_id" ]]; then + curl_args+=(-H "Last-Event-ID: ${last_event_id}") + fi + + echo -e "${STREAM_CYAN}Streaming build events...${STREAM_NC}" + echo "" + + # Track state + local current_phase="starting" + local current_percent=0 + local last_event_id_received="" + + # Stream events + curl "${curl_args[@]}" "$stream_url" 2>/dev/null | while IFS= read -r line; do + # Skip empty lines + [[ -z "$line" ]] && continue + 
+ # Parse event ID + if [[ "$line" == "id:"* ]]; then + last_event_id_received="${line#id: }" + continue + fi + + # Skip event type lines (we parse data directly) + [[ "$line" == "event:"* ]] && continue + + # Parse data lines + if [[ "$line" == "data:"* ]]; then + local data="${line#data: }" + local event_type + event_type=$(echo "$data" | jq -r '.type // ""' 2>/dev/null) + + case "$event_type" in + build.started) + echo -e "${STREAM_GREEN}[BUILD STARTED]${STREAM_NC}" + current_phase="starting" + current_percent=0 + draw_progress_bar 0 + ;; + + build.progress) + current_phase=$(echo "$data" | jq -r '.phase // "unknown"' 2>/dev/null) + current_percent=$(echo "$data" | jq -r '.percentage // 0' 2>/dev/null | cut -d. -f1) + draw_progress_bar "$current_percent" + printf " [%s]" "$current_phase" + ;; + + build.output) + if [[ "$verbose" == "true" ]]; then + local content + content=$(echo "$data" | jq -r '.content // ""' 2>/dev/null) + [[ -n "$content" ]] && printf "\n%s" "$content" + fi + ;; + + build.tool_use) + local tool_name + tool_name=$(echo "$data" | jq -r '.tool_name // "unknown"' 2>/dev/null) + if [[ "$verbose" == "true" ]]; then + printf "\n${STREAM_YELLOW}[TOOL: %s]${STREAM_NC}" "$tool_name" + fi + ;; + + build.error) + local error_content + error_content=$(echo "$data" | jq -r '.content // ""' 2>/dev/null) + printf "\n${STREAM_RED}[ERROR] %s${STREAM_NC}" "$error_content" + ;; + + build.completed) + echo "" + draw_progress_bar 100 + printf " [complete]" + echo "" + echo -e "${STREAM_GREEN}[BUILD COMPLETED]${STREAM_NC}" + local duration_ms + duration_ms=$(echo "$data" | jq -r '.duration_ms // 0' 2>/dev/null) + local duration_s=$((duration_ms / 1000)) + echo "Duration: ${duration_s}s" + return 0 + ;; + + build.failed) + echo "" + local error + error=$(echo "$data" | jq -r '.error // "unknown error"' 2>/dev/null) + echo -e "${STREAM_RED}[BUILD FAILED]${STREAM_NC}" + echo "Error: $error" + return 1 + ;; + + connected) + local reconnecting + reconnecting=$(echo 
"$data" | jq -r '.reconnecting // false' 2>/dev/null) + if [[ "$reconnecting" == "true" ]]; then + echo -e "${STREAM_YELLOW}[RECONNECTED]${STREAM_NC}" + fi + ;; + + heartbeat) + # Silent heartbeat - just proves connection is alive + ;; + esac + fi + done + + # If we get here, the stream closed unexpectedly + echo "" + echo -e "${STREAM_YELLOW}[STREAM CLOSED]${STREAM_NC}" + echo "Last event ID: $last_event_id_received" + echo "To reconnect: stream_build_with_progress ... --last-id \"$last_event_id_received\"" + return 2 +} + +# Simple stream consumer that just prints events +# Arguments: api_url, api_key, project_id, task_id +stream_build_simple() { + local api_url="$1" + local api_key="$2" + local project_id="$3" + local task_id="$4" + + local stream_url="${api_url}/projects/${project_id}/events?stream_id=${task_id}" + + curl -s -N \ + -H "X-API-Key: ${api_key}" \ + -H "Accept: text/event-stream" \ + "$stream_url" 2>/dev/null | while IFS= read -r line; do + + [[ -z "$line" ]] && continue + + if [[ "$line" == "data:"* ]]; then + local data="${line#data: }" + local event_type content + event_type=$(echo "$data" | jq -r '.type // ""' 2>/dev/null) + + case "$event_type" in + build.output|build.error) + content=$(echo "$data" | jq -r '.content // ""' 2>/dev/null) + [[ -n "$content" ]] && echo "$content" + ;; + build.completed) + echo "[BUILD COMPLETED]" + return 0 + ;; + build.failed) + local error + error=$(echo "$data" | jq -r '.error // ""' 2>/dev/null) + echo "[BUILD FAILED] $error" + return 1 + ;; + esac + fi + done +} + +# Wait for build completion with polling fallback +# Arguments: api_url, api_key, task_id, timeout_seconds +# Returns: 0 on success, 1 on failure, 2 on timeout +wait_for_build_completion() { + local api_url="$1" + local api_key="$2" + local task_id="$3" + local timeout="${4:-600}" + + local start_time=$(date +%s) + + while true; do + local elapsed=$(($(date +%s) - start_time)) + if [[ $elapsed -ge $timeout ]]; then + return 2 # Timeout + fi + + 
local response + response=$(curl -s -X GET "${api_url}/builds/${task_id}" \ + -H "X-API-Key: ${api_key}" 2>/dev/null) + + local status + status=$(echo "$response" | jq -r '.data.status // "unknown"' 2>/dev/null) + + case "$status" in + completed) + local success + success=$(echo "$response" | jq -r '.data.result.success // false' 2>/dev/null) + if [[ "$success" == "true" ]]; then + return 0 + else + return 1 + fi + ;; + failed) + return 1 + ;; + running|pending) + sleep 5 + ;; + *) + sleep 5 + ;; + esac + done +} diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..7aaea23 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,77 @@ +# rdev Documentation + +Documentation for the rdev remote development API. + +## Quick Start + +- **[Quick Reference](reference.md)** - Essential commands for daily operations +- **[API Documentation](api/README.md)** - REST API reference + +## Documentation Structure + +``` +docs/ +├── reference.md # Quick reference for operations +├── api/ # API documentation +│ ├── README.md # API overview +│ ├── authentication.md # API key auth +│ ├── sse-examples.md # SSE streaming +│ └── errors.md # Error codes +├── architecture/ # System design +│ ├── README.md # Architecture overview +│ ├── hexagonal.md # Ports & adapters +│ ├── security.md # Auth, sanitization +│ └── streaming.md # SSE protocol +├── operations/ # Operational guides +│ ├── deployment.md # K8s deployment +│ ├── monitoring.md # Prometheus/Grafana +│ ├── troubleshooting.md # Common issues +│ ├── database-connections.md # CRDB/Redis/Postgres +│ └── runbooks/ # Incident runbooks +├── features/ # Feature documentation +│ └── multi-provider.md # Code agent providers +└── plans/ # Planning documents +``` + +## Developer Guides + +For day-to-day development, see `.claude/guides/`: + +| Guide | Description | +|-------|-------------| +| [local/setup.md](../.claude/guides/local/setup.md) | Local development setup | +| 
[local/testing.md](../.claude/guides/local/testing.md) | Running tests | +| [backend/go-guidelines.md](../.claude/guides/backend/go-guidelines.md) | Go coding standards | +| [backend/hexagonal.md](../.claude/guides/backend/hexagonal.md) | Hexagonal architecture | +| [ops/credentials.md](../.claude/guides/ops/credentials.md) | Credentials management | +| [ops/deploying.md](../.claude/guides/ops/deploying.md) | Deployment process | + +## Key Resources + +### Database Connections + +See [operations/database-connections.md](operations/database-connections.md) for: +- CockroachDB SQL shell access +- Redis CLI access +- PostgreSQL access for rdev metadata + +### Credentials + +Infrastructure credentials (Cloudflare, Gitea, Woodpecker) are stored in: +- **Source:** `.secrets` file at repo root (gitignored) +- **Storage:** PostgreSQL with encryption +- **Guide:** [.claude/guides/ops/credentials.md](../.claude/guides/ops/credentials.md) + +### Service URLs + +| Service | External URL | +|---------|--------------| +| rdev API | https://rdev.masq-ops.orchard9.ai | +| CockroachDB Console | https://cockroachdb.threesix.ai | +| Gitea | https://git.threesix.ai | +| Woodpecker CI | https://ci.threesix.ai | + +## Related + +- **CLAUDE.md** - Project root documentation (always in context) +- **ai-lookup/** - Quick fact lookups for Claude diff --git a/docs/api/README.md b/docs/api/README.md index 0078695..0eb1b97 100644 --- a/docs/api/README.md +++ b/docs/api/README.md @@ -44,7 +44,7 @@ curl -N https://rdev.example.com/projects/my-project/events?stream_id=cmd-001 \ ## Base URL ``` -https://rdev.example.com +https://rdev.masq-ops.orchard9.ai ``` ## Authentication diff --git a/docs/operations/database-connections.md b/docs/operations/database-connections.md new file mode 100644 index 0000000..357290a --- /dev/null +++ b/docs/operations/database-connections.md @@ -0,0 +1,183 @@ +# Database Connections + +Quick reference for connecting to rdev infrastructure databases. 
+ +## Prerequisites + +```bash +# REQUIRED: Set kubeconfig before any kubectl command +export KUBECONFIG=~/.kube/orchard9-k3sf.yaml +``` + +## CockroachDB + +CockroachDB is the distributed SQL database for threesix.ai project databases. + +| Property | Value | +|----------|-------| +| Service | `cockroachdb-public.databases.svc:26257` | +| Version | v25.1.3 | +| Nodes | 2-3 (StatefulSet) | +| Console | https://cockroachdb.threesix.ai | + +### Interactive SQL Shell + +```bash +kubectl exec -it -n databases cockroachdb-0 -- \ + /cockroach/cockroach sql --insecure --host=localhost:26257 +``` + +### Run a Query + +```bash +kubectl exec -n databases cockroachdb-0 -- \ + /cockroach/cockroach sql --insecure --host=localhost:26257 \ + -e "SHOW DATABASES;" +``` + +### Check Cluster Status + +```bash +kubectl exec -n databases cockroachdb-0 -- \ + /cockroach/cockroach node status --insecure --host=localhost:26257 +``` + +### Check Ranges Distribution + +```bash +kubectl exec -n databases cockroachdb-0 -- \ + /cockroach/cockroach sql --insecure --host=localhost:26257 \ + -e "SHOW RANGES FROM DATABASE rdev;" +``` + +### Internal Connection URL + +For apps running inside the cluster: + +``` +postgresql://root@cockroachdb-public.databases.svc:26257/defaultdb?sslmode=disable +``` + +## Redis + +Redis provides caching and session storage for threesix.ai projects. 
+ +| Property | Value | +|----------|-------| +| Service | `redis.threesix.svc:6379` | +| Version | 7-alpine | +| Replicas | 1 (StatefulSet) | +| Auth | Password required | + +### Get Password + +```bash +REDIS_PASS=$(kubectl get secret -n threesix redis-credentials -o jsonpath="{.data.REDIS_PASSWORD}" | base64 -d) +``` + +### Interactive CLI + +```bash +kubectl exec -it -n threesix redis-0 -- redis-cli -a "$REDIS_PASS" +``` + +### Ping Test + +```bash +kubectl exec -n threesix redis-0 -- redis-cli -a "$REDIS_PASS" ping +``` + +### Check Memory Usage + +```bash +kubectl exec -n threesix redis-0 -- redis-cli -a "$REDIS_PASS" info memory +``` + +### List Keys for a Project + +```bash +kubectl exec -n threesix redis-0 -- redis-cli -a "$REDIS_PASS" keys "project:myapp:*" +``` + +### Internal Connection URL + +For apps running inside the cluster: + +``` +redis://:password@redis.threesix.svc:6379 +``` + +## PostgreSQL (rdev metadata) + +PostgreSQL stores rdev API metadata (API keys, audit logs, work queue, credentials). + +| Property | Value | +|----------|-------| +| Service | `postgres.databases.svc:5432` | +| Database | `rdev` | + +### Connect to rdev Database + +```bash +kubectl exec -it -n databases postgres-0 -- \ + psql -U rdev -d rdev +``` + +### Check Recent API Keys + +```sql +SELECT id, name, created_at FROM api_keys ORDER BY created_at DESC LIMIT 10; +``` + +### Check Work Queue + +```sql +SELECT id, project_id, status, created_at FROM work_items ORDER BY created_at DESC LIMIT 10; +``` + +## Credentials Storage + +Infrastructure credentials (Cloudflare, Gitea, Woodpecker tokens) are stored in PostgreSQL with encryption. 
+ +**Source file:** `.secrets` at repo root (gitignored) + +**Load credentials:** +```bash +./scripts/load-credentials.sh $RDEV_API_URL +``` + +**Verify credentials loaded:** +```bash +curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/credentials | jq +``` + +See [Credentials Management](../../.claude/guides/ops/credentials.md) for full documentation. + +## Troubleshooting + +### CockroachDB: "Connection Refused" + +1. Check pods are running: + ```bash + kubectl get pods -n databases -l app=cockroachdb + ``` + +2. Check service exists: + ```bash + kubectl get svc -n databases cockroachdb-public + ``` + +### Redis: "NOAUTH Authentication Required" + +Get the password first: +```bash +REDIS_PASS=$(kubectl get secret -n threesix redis-credentials -o jsonpath="{.data.REDIS_PASSWORD}" | base64 -d) +``` + +### PostgreSQL: "Role does not exist" + +Check the correct user/database: +```bash +kubectl exec -n databases postgres-0 -- psql -U postgres -c "\l" +kubectl exec -n databases postgres-0 -- psql -U postgres -c "\du" +``` diff --git a/docs/operations/deployment.md b/docs/operations/deployment.md index bccb3ef..eef749d 100644 --- a/docs/operations/deployment.md +++ b/docs/operations/deployment.md @@ -1,19 +1,28 @@ # Deployment Guide -This guide covers deploying rdev API to a Kubernetes cluster. +This guide covers deploying rdev API to the k3s cluster. 
## Prerequisites -- Kubernetes cluster (1.24+) -- kubectl configured +```bash +# REQUIRED: Set kubeconfig before any kubectl command +export KUBECONFIG=~/.kube/orchard9-k3sf.yaml +``` + +- k3s cluster (orchard9-k3sf) +- kubectl configured with correct kubeconfig - PostgreSQL database -- Container registry access +- Container registry access (ghcr.io/orchard9) ## Quick Deploy ```bash -# Apply all manifests -kubectl apply -k deployments/k8s/base/ +# Release + deploy (recommended) +./scripts/release.sh v0.10.1 "Description of changes" --deploy + +# Or manual deploy +kubectl apply -f deployments/k8s/base/rdev-api.yaml +kubectl rollout restart -n rdev deployment/rdev-api # Verify deployment kubectl -n rdev get pods diff --git a/docs/operations/monitoring.md b/docs/operations/monitoring.md index 4d81593..13d9fdc 100644 --- a/docs/operations/monitoring.md +++ b/docs/operations/monitoring.md @@ -2,6 +2,13 @@ This guide covers monitoring rdev API with Prometheus and Grafana. +## Prerequisites + +```bash +# REQUIRED: Set kubeconfig before any kubectl command +export KUBECONFIG=~/.kube/orchard9-k3sf.yaml +``` + ## Metrics Endpoint rdev exposes Prometheus metrics at `/metrics`: diff --git a/docs/operations/troubleshooting.md b/docs/operations/troubleshooting.md index b6406c1..1b9430f 100644 --- a/docs/operations/troubleshooting.md +++ b/docs/operations/troubleshooting.md @@ -2,14 +2,22 @@ Common issues and their resolutions for rdev API. 
+## Prerequisites + +```bash +# REQUIRED: Set kubeconfig before any kubectl command +export KUBECONFIG=~/.kube/orchard9-k3sf.yaml +``` + ## Quick Diagnostics ```bash # Check pod status kubectl -n rdev get pods -l app=rdev-api -# Check logs -kubectl -n rdev logs -l app=rdev-api --tail=100 +# Check logs (use script for convenience) +./scripts/logs.sh # Last 100 lines +./scripts/logs.sh -e # Errors only # Check events kubectl -n rdev get events --sort-by='.lastTimestamp' @@ -18,7 +26,7 @@ kubectl -n rdev get events --sort-by='.lastTimestamp' kubectl -n rdev get endpoints rdev-api # Test health -kubectl -n rdev exec -it deployment/rdev-api -- wget -qO- localhost:8080/health +curl $RDEV_API_URL/health ``` ## Common Issues @@ -67,11 +75,23 @@ kubectl -n rdev logs -l app=rdev-api --previous **Diagnosis:** ```bash -# Check database connectivity from pod -kubectl -n rdev exec -it deployment/rdev-api -- sh -nc -zv postgres.databases.svc 5432 +# Check database pods +kubectl get pods -n databases + +# Test CockroachDB +kubectl exec -n databases cockroachdb-0 -- \ + /cockroach/cockroach node status --insecure --host=localhost:26257 + +# Test Redis +REDIS_PASS=$(kubectl get secret -n threesix redis-credentials -o jsonpath="{.data.REDIS_PASSWORD}" | base64 -d) +kubectl exec -n threesix redis-0 -- redis-cli -a "$REDIS_PASS" ping + +# Test PostgreSQL +kubectl exec -n databases postgres-0 -- psql -U rdev -d rdev -c "SELECT 1;" ``` +See [database-connections.md](database-connections.md) for full connection details. + **Common Causes:** 1. **Wrong host/port:** diff --git a/docs/plans/worker-executor-breakdown.md b/docs/plans/worker-executor-breakdown.md index 50c6042..c82c80a 100644 --- a/docs/plans/worker-executor-breakdown.md +++ b/docs/plans/worker-executor-breakdown.md @@ -99,12 +99,11 @@ The work queue, worker registry, build audit, and code agent systems are **all i ### Tasks -1. 
**Create `internal/worker/git_operations.go`** - - `CloneRepo(ctx, gitURL, dir, token) error` — clone via HTTPS with token auth - - `CommitAndPush(ctx, dir, message) (commitSHA string, filesChanged []string, err error)` - - `ConfigureGit(dir, name, email)` — set git user for commits - - Uses `os/exec` for git commands (same pattern as `kubernetes.Executor` uses for kubectl) - - Workspace management: creates temp dir per task, cleans up after +1. **Create `internal/worker/pod_git_operations.go`** ✅ IMPLEMENTED + - `CommitAndPush(ctx, podName, workDir, message, push) *PostBuildResult` + - Runs git commands **inside the pod** via `kubectl exec` (not locally) + - Post-build phase: Claude writes code, then rdev programmatically commits/pushes + - Follows "LLM vs rdev" principle: LLMs generate code, rdev handles deterministic ops 2. **Add git credential resolution to `BuildExecutor`** - Option A (simplest): Use the Gitea token already in `InfraConfig.GiteaToken` @@ -125,11 +124,10 @@ The work queue, worker registry, build audit, and code agent systems are **all i - Add a method to retrieve git info by project ID - Or: include `git_url` in the `WorkTask.Spec` at enqueue time (simpler, no extra lookup) -5. **Create `internal/worker/git_operations_test.go`** - - Test: clone with token auth - - Test: commit and push - - Test: workspace cleanup on success and failure - - Test: git URL construction with token +5. **Test pod git operations** + - Integration test via cookbook scripts + - Verify commit is created in pod workspace + - Verify push succeeds via kubectl exec 6. 
**Integration test** - Enqueue a build task with a real prompt @@ -149,8 +147,7 @@ The work queue, worker registry, build audit, and code agent systems are **all i | File | Action | |------|--------| -| `internal/worker/git_operations.go` | Create | -| `internal/worker/git_operations_test.go` | Create | +| `internal/worker/pod_git_operations.go` | Create ✅ | | `internal/worker/build_executor.go` | Modify (add git integration) | | `internal/worker/work_executor.go` | Modify (pass git config) | | `cmd/rdev-api/main.go` | Modify (pass gitea token to executor) | @@ -304,7 +301,7 @@ The work queue, worker registry, build audit, and code agent systems are **all i | Gitea token may lack permissions for new repos created by different users | Test with actual token; all repos should be in the same org | | Agent execution may take longer than expected (10+ minutes for complex prompts) | Make timeout configurable; increase default | | Worker process crash loses in-flight task | Stale requeue (Week 4) handles this automatically | -| 500-line file limit may require splitting new files | Plan for split from the start; `work_executor.go` + `build_executor.go` + `git_operations.go` keeps things modular | +| 500-line file limit may require splitting new files | Plan for split from the start; `work_executor.go` + `build_executor.go` + `pod_git_operations.go` keeps things modular | ## Architecture Decision: In-Process vs External Worker diff --git a/docs/reference.md b/docs/reference.md index 07de22f..edbd435 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -1,12 +1,211 @@ -# Multi-monorepo Claude Code infrastructure: Complete reference guide +# rdev Quick Reference -This guide covers deploying claudebox across multiple monorepos with Discord control, using Claude Pro or Team subscription authentication. This architecture provides complete project isolation, parallel AI development, and team-based access control without API keys. 
+Quick reference for operating rdev on the k3s cluster. -## Architecture overview +## Prerequisites -**Pattern: Multi-claudebox with single-bot routing** +```bash +# REQUIRED: Set kubeconfig before any kubectl command +export KUBECONFIG=~/.kube/orchard9-k3sf.yaml -This architecture runs separate Docker containers for each monorepo, managed by a single Discord bot that routes commands based on channel context. Each container maintains independent dependencies, git state, network policies, and Claude authentication sessions. +# API access +export RDEV_API_URL="https://rdev.masq-ops.orchard9.ai" +export RDEV_API_KEY="" +``` + +## Essential Commands + +### rdev API + +```bash +# Health check +curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/health + +# List projects +curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/projects + +# Work queue stats +curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/work/stats + +# List credentials (masked) +curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/credentials +``` + +### Pod Management + +```bash +# View rdev pods +kubectl get pods -n rdev + +# View logs (use script for convenience) +./scripts/logs.sh # Last 100 lines +./scripts/logs.sh -f # Follow/stream +./scripts/logs.sh -e # Errors only + +# Restart rdev-api +kubectl rollout restart deployment/rdev-api -n rdev +``` + +### Database Connections + +#### CockroachDB + +```bash +# Interactive SQL shell +kubectl exec -it -n databases cockroachdb-0 -- \ + /cockroach/cockroach sql --insecure --host=localhost:26257 + +# Run a query +kubectl exec -n databases cockroachdb-0 -- \ + /cockroach/cockroach sql --insecure --host=localhost:26257 \ + -e "SHOW DATABASES;" + +# Check cluster status +kubectl exec -n databases cockroachdb-0 -- \ + /cockroach/cockroach node status --insecure --host=localhost:26257 +``` + +**Console:** https://cockroachdb.threesix.ai + +#### Redis + +```bash +# Get password +REDIS_PASS=$(kubectl get secret -n threesix redis-credentials -o 
jsonpath="{.data.REDIS_PASSWORD}" | base64 -d) + +# Interactive CLI +kubectl exec -it -n threesix redis-0 -- redis-cli -a "$REDIS_PASS" + +# Ping test +kubectl exec -n threesix redis-0 -- redis-cli -a "$REDIS_PASS" ping + +# Memory info +kubectl exec -n threesix redis-0 -- redis-cli -a "$REDIS_PASS" info memory +``` + +#### PostgreSQL (rdev metadata) + +```bash +kubectl exec -it -n databases postgres-0 -- psql -U rdev -d rdev +``` + +### Credentials Management + +Infrastructure credentials are stored in `.secrets` (gitignored) at repo root. + +```bash +# Load credentials to database +./scripts/load-credentials.sh $RDEV_API_URL + +# Verify loaded (URL quoted so the shell does not glob-expand the "?") +curl -H "X-API-Key: $RDEV_API_KEY" "$RDEV_API_URL/credentials?category=cloudflare" +``` + +**Key credential categories:** +- `gitea` - Git repository management +- `cloudflare` - DNS record management +- `woodpecker` - CI/CD pipelines + +See [Credentials Guide](../.claude/guides/ops/credentials.md) for `.secrets` file format. + +### Release & Deploy + +```bash +# Release + deploy (one command) +./scripts/release.sh v0.10.1 "Description of changes" --deploy + +# Release only (no deploy) +./scripts/release.sh v0.10.1 "Description of changes" + +# Manual deploy +kubectl apply -f deployments/k8s/base/rdev-api.yaml +kubectl rollout restart -n rdev deployment/rdev-api +``` + +## Service URLs + +| Service | Internal URL | External URL | +|---------|--------------|--------------| +| rdev API | `rdev-api.rdev.svc:8080` | https://rdev.masq-ops.orchard9.ai | +| CockroachDB | `cockroachdb-public.databases.svc:26257` | https://cockroachdb.threesix.ai | +| Redis | `redis.threesix.svc:6379` | (internal only) | +| PostgreSQL | `postgres.databases.svc:5432` | (internal only) | +| Gitea | `gitea.gitea.svc:3000` | https://git.threesix.ai | +| Woodpecker | `woodpecker-server.woodpecker.svc:8000` | https://ci.threesix.ai | + +## Documentation Index + +| Topic | Location | +|-------|----------| +| **Developer Guides** | `.claude/guides/` | +|
Setup & Local Dev | `.claude/guides/local/setup.md` | +| Go Guidelines | `.claude/guides/backend/go-guidelines.md` | +| Hexagonal Architecture | `.claude/guides/backend/hexagonal.md` | +| **Operations** | `docs/operations/` | +| Deployment | `docs/operations/deployment.md` | +| Monitoring | `docs/operations/monitoring.md` | +| Troubleshooting | `docs/operations/troubleshooting.md` | +| Database Connections | `docs/operations/database-connections.md` | +| **API Reference** | `docs/api/` | +| Authentication | `docs/api/authentication.md` | +| SSE Streaming | `docs/api/sse-examples.md` | +| **Architecture** | `docs/architecture/` | +| Overview | `docs/architecture/README.md` | +| Security | `docs/architecture/security.md` | + +## Troubleshooting + +### rdev-api Not Responding + +```bash +# Check pod status +kubectl get pods -n rdev -l app=rdev-api + +# Check logs +./scripts/logs.sh -e # errors only + +# Restart +kubectl rollout restart deployment/rdev-api -n rdev +``` + +### Database Connection Issues + +```bash +# Check CockroachDB +kubectl get pods -n databases -l app=cockroachdb + +# Check Redis +kubectl get pods -n threesix -l app=redis + +# Check PostgreSQL +kubectl get pods -n databases -l app=postgres +``` + +### Credentials Not Working + +```bash +# Verify credentials loaded +curl -H "X-API-Key: $RDEV_API_KEY" $RDEV_API_URL/credentials | jq + +# Reload and restart +./scripts/load-credentials.sh $RDEV_API_URL +kubectl rollout restart deployment/rdev-api -n rdev +``` + +--- + +## Legacy: Claudebox VM Reference + +> **Note:** This section documents the original claudebox Docker/VM deployment pattern. +> rdev now runs on k3s. See above for current commands. + +The original architecture used Docker containers on a VM with Discord bot integration. +This reference is preserved for historical context and for deployments not using k3s. + +
+Click to expand legacy claudebox documentation + +### Architecture (VM-based) ``` Physical Layout: @@ -15,1678 +214,47 @@ Physical Layout: │ ├── Docker Container: claudebox-project-b │ └── Docker Container: claudebox-project-c │ └── Discord Bot Process (routes to containers) -│ -Discord Server: -├── #project-a-dev → claudebox-project-a -├── #project-b-dev → claudebox-project-b -└── #project-c-dev → claudebox-project-c ``` -**Why this pattern:** -- Dependency isolation (Project A: Node 18, Project B: Node 20, Project C: Python 3.11) -- Parallel execution (Claude works on all projects simultaneously) -- Security boundaries (network policies and file access per project) -- Resource allocation (CPU/memory limits per project) -- Independent git state (no cross-contamination) +### Key Differences from k3s ---- +| Aspect | Claudebox (VM) | rdev (k3s) | +|--------|----------------|------------| +| Orchestration | Docker Compose | Kubernetes | +| Commands | `docker exec` | `kubectl exec` | +| Storage | Host volumes | Longhorn PVCs | +| Credentials | `~/.claude/.credentials.json` | PVC mount | +| Bot | Deno Discord bot | REST API | -## Subscription requirements and authentication - -**Required subscriptions:** - -You need ONE of the following: -- **Claude Pro** ($20/month per developer) - individual use -- **Claude Team** ($30/month per developer, 5 minimum) - team collaboration with centralized billing -- **Claude Enterprise** - contact Anthropic sales for multi-team deployments - -**Authentication model:** - -Claude Code authenticates via OAuth to your claude.ai account. Each developer on your team needs their own subscription, and each will authenticate their claudebox instances with their personal credentials. This is fundamentally different from API-based usage—you're using the web subscription's usage limits, not paying per token. - -**Per-developer or shared authentication:** - -You have two deployment options: - -1. 
**Per-developer claudebox** (Recommended for teams): - - Each developer has their own VM with their own claudebox instances - - Each authenticates with their personal Claude subscription - - Usage tracked per developer - - No credential sharing - -2. **Shared claudebox with team account** (Single shared development server): - - One VM running all claudebox instances - - Authenticate with a shared Claude Team account - - All developers use same Discord bot - - Usage pooled across team - -This guide assumes **shared claudebox** for simplicity, but the architecture works for both models. - ---- - -## Initial VM setup and prerequisites - -**Provision your remote VM:** - -Minimum specifications: -- **CPU**: 8 cores (for 3 parallel containers) -- **RAM**: 16GB (allocate 4GB per container minimum) -- **Storage**: 100GB SSD (monorepos + Docker images + build artifacts) -- **OS**: Ubuntu 22.04 LTS or Debian 12 - -Cloud provider recommendations: -- AWS: t3.2xlarge or c6i.2xlarge -- GCP: n2-standard-8 -- Azure: Standard_D8s_v3 -- DigitalOcean: CPU-Optimized 8GB/4vCPU droplet - -**Install system dependencies:** +### Claudebox Commands (VM) ```bash -# Update system -sudo apt update && sudo apt upgrade -y - -# Install Docker -curl -fsSL https://get.docker.com -o get-docker.sh -sudo sh get-docker.sh -sudo usermod -aG docker $USER -newgrp docker - -# Install Node.js (for Claude Code CLI) -curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - -sudo apt install -y nodejs - -# Install Deno (for Discord bot) -curl -fsSL https://deno.land/install.sh | sh -echo 'export DENO_INSTALL="$HOME/.deno"' >> ~/.bashrc -echo 'export PATH="$DENO_INSTALL/bin:$PATH"' >> ~/.bashrc -source ~/.bashrc - -# Install Git and development tools -sudo apt install -y git build-essential curl wget vim - -# Install Docker Compose -sudo apt install -y docker-compose-plugin -``` - -**Clone your monorepos:** - -```bash -mkdir -p ~/projects -cd ~/projects - -# Clone your monorepos -git clone 
https://github.com/yourorg/monorepo-a.git -git clone https://github.com/yourorg/monorepo-b.git -git clone https://github.com/yourorg/monorepo-c.git -``` - ---- - -## Claude Code CLI installation and authentication - -**Install Claude Code globally:** - -```bash -npm install -g @anthropic-ai/claude-code -``` - -**Authenticate with your subscription:** - -```bash -claude /login -``` - -This opens a browser window where you'll sign in with your Claude Pro/Team account. The authentication token is saved to `~/.claude/.credentials.json` and will be mounted into each container. - -**Verify authentication:** - -```bash -claude --version -ls -la ~/.claude/.credentials.json -``` - -The credentials file should exist and contain your authentication state. This single authentication covers all three claudebox instances since they'll mount the same credentials directory. - -**Important: Credential persistence** - -Your authentication lasts approximately 30 days. When it expires: -1. Run `claude /login` again on the host VM -2. Restart all claudebox containers to pick up new credentials -3. 
No need to re-authenticate inside containers - ---- - -## Installing and configuring claudebox - -**Download and install claudebox:** - -```bash -cd ~ -wget https://github.com/RchGrav/claudebox/releases/latest/download/claudebox.run -chmod +x claudebox.run -./claudebox.run - -# Add to PATH -echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc -source ~/.bashrc - -# Verify installation -claudebox info -``` - -**Initialize claudebox for each project:** - -```bash -# Project A -cd ~/projects/monorepo-a -claudebox init project-a - -# Project B -cd ~/projects/monorepo-b -claudebox init project-b - -# Project C -cd ~/projects/monorepo-c -claudebox init project-c -``` - -This creates isolated configuration directories: -``` -~/.claudebox/ -├── project-a/ -│ ├── .claude/ # Will mount host credentials -│ ├── firewall/ # Network allowlist -│ └── workspace/ # Symlink to ~/projects/monorepo-a -├── project-b/ -│ └── ... -└── project-c/ - └── ... -``` - -**Install profiles for each project:** - -Profiles define the development environment for each container. 
- -```bash -# Project A: JavaScript/TypeScript monorepo with Turborepo -claudebox profile javascript typescript -claudebox profile project-a --install - -# Project B: Python monorepo with Poetry -claudebox profile python ml -claudebox profile project-b --install - -# Project C: Rust monorepo -claudebox profile rust -claudebox profile project-c --install -``` - -**Configure project-specific profiles:** - -Create custom profile files for monorepo tooling: - -```ini -# ~/.claudebox/profiles/project-a.ini -[base] -name = JavaScript Monorepo (Turborepo) -extends = javascript,typescript - -[packages] -# System packages -apt = build-essential python3 git - -# Node.js tools -npm = turbo nx pnpm typescript eslint prettier - -[environment] -NODE_VERSION = 20 -PNPM_HOME = /root/.local/share/pnpm -PATH = $PNPM_HOME:$PATH - -[startup] -# Install dependencies on container start -commands = - cd /workspace && pnpm install - turbo daemon start - -[firewall] -# Allow access to package registries -allow = registry.npmjs.org,github.com,*.github.com - -[resource_limits] -cpus = 2.0 -memory = 4g -``` - -```ini -# ~/.claudebox/profiles/project-b.ini -[base] -name = Python Monorepo (Poetry) -extends = python,ml - -[packages] -apt = python3.11 python3-pip python3-venv -pip = poetry pytest black mypy pandas numpy torch - -[environment] -PYTHON_VERSION = 3.11 -POETRY_VIRTUALENVS_IN_PROJECT = true - -[startup] -commands = - cd /workspace && poetry install - poetry run python --version - -[firewall] -allow = pypi.org,files.pythonhosted.org,github.com - -[resource_limits] -cpus = 4.0 -memory = 8g -``` - -```ini -# ~/.claudebox/profiles/project-c.ini -[base] -name = Rust Monorepo -extends = rust - -[packages] -apt = build-essential pkg-config libssl-dev -cargo = cargo-workspaces cargo-watch cargo-edit - -[environment] -RUST_VERSION = stable -CARGO_HOME = /usr/local/cargo - -[startup] -commands = - cd /workspace && cargo fetch - cargo build --workspace --release - -[firewall] -allow = 
crates.io,static.crates.io,github.com - -[resource_limits] -cpus = 3.0 -memory = 6g -``` - ---- - -## Docker compose orchestration - -Create a centralized orchestration file for all containers. - -**Create infrastructure directory:** - -```bash -mkdir -p ~/claude-infra -cd ~/claude-infra -``` - -**Create docker-compose.yml:** - -```yaml -# ~/claude-infra/docker-compose.yml -version: '3.8' - -services: - claudebox-project-a: - image: ghcr.io/rchgrav/claudebox:latest - container_name: claudebox-project-a - hostname: project-a-dev - - volumes: - # Mount monorepo - - ~/projects/monorepo-a:/workspace:rw - - # Mount shared Claude credentials (READ-ONLY) - - ~/.claude:/root/.claude:ro - - # Mount project-specific config - - ~/.claudebox/project-a/firewall:/etc/claudebox/firewall:rw - - # Shared caches for faster builds - - npm-cache:/root/.npm - - pnpm-store:/root/.local/share/pnpm/store - - environment: - - PROJECT_NAME=project-a - - WORKSPACE=/workspace - - CLAUDEBOX_PROFILE=project-a - - working_dir: /workspace - - deploy: - resources: - limits: - cpus: '2.0' - memory: 4G - reservations: - cpus: '1.0' - memory: 2G - - networks: - - claude-network-a - - restart: unless-stopped - - # Keep container running - command: tail -f /dev/null - - claudebox-project-b: - image: ghcr.io/rchgrav/claudebox:latest - container_name: claudebox-project-b - hostname: project-b-dev - - volumes: - - ~/projects/monorepo-b:/workspace:rw - - ~/.claude:/root/.claude:ro - - ~/.claudebox/project-b/firewall:/etc/claudebox/firewall:rw - - pip-cache:/root/.cache/pip - - poetry-cache:/root/.cache/pypoetry - - environment: - - PROJECT_NAME=project-b - - WORKSPACE=/workspace - - CLAUDEBOX_PROFILE=project-b - - PYTHON_VERSION=3.11 - - working_dir: /workspace - - deploy: - resources: - limits: - cpus: '4.0' - memory: 8G - reservations: - cpus: '2.0' - memory: 4G - - networks: - - claude-network-b - - restart: unless-stopped - command: tail -f /dev/null - - claudebox-project-c: - image: 
ghcr.io/rchgrav/claudebox:latest - container_name: claudebox-project-c - hostname: project-c-dev - - volumes: - - ~/projects/monorepo-c:/workspace:rw - - ~/.claude:/root/.claude:ro - - ~/.claudebox/project-c/firewall:/etc/claudebox/firewall:rw - - cargo-registry:/usr/local/cargo/registry - - cargo-git:/usr/local/cargo/git - - environment: - - PROJECT_NAME=project-c - - WORKSPACE=/workspace - - CLAUDEBOX_PROFILE=project-c - - RUST_VERSION=stable - - working_dir: /workspace - - deploy: - resources: - limits: - cpus: '3.0' - memory: 6G - reservations: - cpus: '1.5' - memory: 3G - - networks: - - claude-network-c - - restart: unless-stopped - command: tail -f /dev/null - -networks: - claude-network-a: - driver: bridge - claude-network-b: - driver: bridge - claude-network-c: - driver: bridge - -volumes: - npm-cache: - pnpm-store: - pip-cache: - poetry-cache: - cargo-registry: - cargo-git: -``` - -**Container lifecycle management:** - -```bash -# Start all containers +# Start containers cd ~/claude-infra docker compose up -d # View status docker compose ps -# View logs -docker compose logs -f claudebox-project-a +# Execute command +docker exec -it claudebox-project-a bash -# Stop all containers -docker compose stop - -# Restart specific container -docker compose restart claudebox-project-b - -# Remove all containers (keeps volumes) -docker compose down - -# Full cleanup including volumes -docker compose down -v +# Claude session +docker exec -it claudebox-project-a claude "your prompt" ``` ---- - -## Discord bot setup and configuration - -**Create Discord application:** - -1. Go to https://discord.com/developers/applications -2. Click **New Application**, name it "Claude-MultiProject" -3. Navigate to **Bot** section -4. Click **Add Bot** -5. Enable **Message Content Intent** -6. Copy the **Bot Token** (save securely) -7. Copy the **Application ID** from General Information - -**Invite bot to your server:** - -1. Go to **OAuth2** → **URL Generator** -2. 
Select scopes: `bot`, `applications.commands` -3. Select permissions: - - Send Messages - - Use Slash Commands - - Read Message History - - Embed Links - - Attach Files - - Manage Messages (for cleanup) -4. Copy generated URL and open in browser -5. Select your server and authorize - -**Clone and configure the Discord bot:** +### Management Script ```bash -cd ~/claude-infra -git clone https://github.com/zebbern/claude-code-discord.git discord-bot -cd discord-bot -``` - -**Create environment configuration:** - -```bash -# ~/claude-infra/discord-bot/.env -DISCORD_TOKEN=your_bot_token_here -APPLICATION_ID=your_application_id_here - -# Project routing configuration -PROJECT_A_CONTAINER=claudebox-project-a -PROJECT_B_CONTAINER=claudebox-project-b -PROJECT_C_CONTAINER=claudebox-project-c - -# Channel mapping (will configure in code) -# This is just documentation -CHANNEL_PROJECT_A=project-a-dev -CHANNEL_PROJECT_B=project-b-dev -CHANNEL_PROJECT_C=project-c-dev -``` - -**Create enhanced routing configuration:** - -```typescript -// ~/claude-infra/discord-bot/config.ts -export interface ProjectConfig { - container: string; - workdir: string; - profile: string; - allowedRoles: string[]; - description: string; - monorepo: { - tool: 'turbo' | 'nx' | 'poetry' | 'cargo'; - packages: string[]; - }; -} - -export const PROJECT_CONFIGS: Record<string, ProjectConfig> = { - 'project-a-dev': { - container: 'claudebox-project-a', - workdir: '/workspace', - profile: 'javascript typescript', - allowedRoles: ['project-a-devs', 'admin', 'engineering'], - description: 'JavaScript/TypeScript monorepo with Turborepo', - monorepo: { - tool: 'turbo', - packages: ['@myorg/api', '@myorg/web', '@myorg/shared'] - } - }, - 'project-b-dev': { - container: 'claudebox-project-b', - workdir: '/workspace', - profile: 'python ml', - allowedRoles: ['project-b-devs', 'admin', 'data-science'], - description: 'Python monorepo with Poetry for ML workflows', - monorepo: { - tool: 'poetry', - packages: ['ml-pipeline',
'data-processing', 'inference-service'] - } - }, - 'project-c-dev': { - container: 'claudebox-project-c', - workdir: '/workspace', - profile: 'rust', - allowedRoles: ['project-c-devs', 'admin', 'systems'], - description: 'Rust monorepo with Cargo workspaces', - monorepo: { - tool: 'cargo', - packages: ['core', 'cli', 'server'] - } - } -}; - -// Monorepo context templates -export const MONOREPO_CONTEXT = { - turbo: `This is a Turborepo monorepo. When making changes: -- Use 'turbo run ' for builds/tests -- Changes may affect multiple packages -- Check package.json workspaces for dependencies -- Use 'turbo run build --filter=' for specific packages`, - - nx: `This is an Nx monorepo. When making changes: -- Use 'nx run :' for tasks -- Check nx.json for task configuration -- Use 'nx affected:build' for changed packages`, - - poetry: `This is a Poetry monorepo. When making changes: -- Use 'poetry run ' for scripts -- Update pyproject.toml for dependencies -- Run 'poetry install' after dependency changes`, - - cargo: `This is a Cargo workspace. 
When making changes: -- Use 'cargo build --workspace' for all packages -- Update Cargo.toml in workspace root -- Use 'cargo build -p ' for specific crates` -}; -``` - -**Create enhanced bot with routing:** - -```typescript -// ~/claude-infra/discord-bot/bot.ts -import { Client, GatewayIntentBits, REST, Routes, SlashCommandBuilder } from 'discord.js'; -import { exec } from 'child_process'; -import { promisify } from 'util'; -import { PROJECT_CONFIGS, MONOREPO_CONTEXT } from './config.ts'; - -const execAsync = promisify(exec); - -const client = new Client({ - intents: [ - GatewayIntentBits.Guilds, - GatewayIntentBits.GuildMessages, - GatewayIntentBits.MessageContent, - ], -}); - -// Helper: Check user permissions -function hasPermission(member: any, allowedRoles: string[]): boolean { - return member.roles.cache.some((role: any) => - allowedRoles.includes(role.name.toLowerCase()) - ); -} - -// Helper: Execute command in container -async function execInContainer( - container: string, - command: string, - cwd: string = '/workspace' -): Promise<string> { - const dockerCmd = `docker exec -w ${cwd} ${container} bash -c "${command.replace(/"/g, '\\"')}"`; - - try { - const { stdout, stderr } = await execAsync(dockerCmd, { - maxBuffer: 10 * 1024 * 1024, // 10MB buffer for large outputs - timeout: 300000 // 5 minute timeout - }); - return stdout || stderr; - } catch (error: any) { - throw new Error(`Container execution failed: ${error.message}`); - } -} - -// Helper: Get project config from channel -function getProjectConfig(channelName: string) { - const config = PROJECT_CONFIGS[channelName]; - if (!config) { - throw new Error(`No configuration found for channel: ${channelName}`); - } - return config; -} - -// Command: /claude -const claudeCommand = new SlashCommandBuilder() - .setName('claude') - .setDescription('Send a prompt to Claude Code') - .addStringOption(option => - option.setName('prompt') - .setDescription('Your prompt for Claude') - .setRequired(true) - ) -
.addStringOption(option => - option.setName('mode') - .setDescription('Execution mode') - .addChoices( - { name: 'normal', value: 'normal' }, - { name: 'auto-accept', value: 'auto' }, - { name: 'plan-only', value: 'plan' } - ) - ); - -// Command: /status -const statusCommand = new SlashCommandBuilder() - .setName('status') - .setDescription('Check container and project status'); - -// Command: /shell -const shellCommand = new SlashCommandBuilder() - .setName('shell') - .setDescription('Execute shell command in project container') - .addStringOption(option => - option.setName('command') - .setDescription('Shell command to execute') - .setRequired(true) - ); - -// Command: /git -const gitCommand = new SlashCommandBuilder() - .setName('git') - .setDescription('Execute git command') - .addStringOption(option => - option.setName('args') - .setDescription('Git arguments (e.g., "status", "diff HEAD")') - .setRequired(true) - ); - -// Command: /monorepo-info -const monorepoInfoCommand = new SlashCommandBuilder() - .setName('monorepo-info') - .setDescription('Show monorepo structure and packages'); - -// Register commands -const commands = [ - claudeCommand, - statusCommand, - shellCommand, - gitCommand, - monorepoInfoCommand, -].map(cmd => cmd.toJSON()); - -// Handle interactions -client.on('interactionCreate', async interaction => { - if (!interaction.isCommand()) return; - - try { - const channelName = interaction.channel?.name || ''; - const config = getProjectConfig(channelName); - - // Check permissions - if (!hasPermission(interaction.member, config.allowedRoles)) { - await interaction.reply({ - content: `⛔ You don't have permission to use Claude Code on ${config.description}. 
Required roles: ${config.allowedRoles.join(', ')}`, - ephemeral: true - }); - return; - } - - await interaction.deferReply(); - - switch (interaction.commandName) { - case 'claude': { - const prompt = interaction.options.getString('prompt', true); - const mode = interaction.options.getString('mode') || 'normal'; - - // Add monorepo context to prompt - const monorepContext = MONOREPO_CONTEXT[config.monorepo.tool]; - const enhancedPrompt = `${monorepContext}\n\nUser request: ${prompt}`; - - const modeFlag = mode === 'auto' ? '--dangerously-skip-permissions' : - mode === 'plan' ? '--plan-only' : ''; - - const output = await execInContainer( - config.container, - `claude "${enhancedPrompt}" ${modeFlag}`, - config.workdir - ); - - // Split long responses - const chunks = output.match(/[\s\S]{1,1900}/g) || []; - for (const chunk of chunks) { - await interaction.followUp({ - content: `\`\`\`\n${chunk}\n\`\`\`` - }); - } - break; - } - - case 'status': { - const containerStatus = await execAsync(`docker inspect ${config.container} --format '{{.State.Status}}'`); - const diskUsage = await execInContainer(config.container, 'df -h /workspace | tail -1'); - const gitStatus = await execInContainer(config.container, 'git status -s', config.workdir); - - await interaction.editReply({ - content: `📊 **Project Status: ${config.description}**\n\n` + - `Container: ${containerStatus.stdout.trim()}\n` + - `Disk: ${diskUsage}\n` + - `\`\`\`\n${gitStatus || 'No changes'}\n\`\`\`` - }); - break; - } - - case 'shell': { - const command = interaction.options.getString('command', true); - - // Safety check for destructive commands - const dangerous = ['rm -rf', 'dd if=', 'mkfs', '> /dev/']; - if (dangerous.some(d => command.includes(d))) { - await interaction.editReply('⛔ Potentially destructive command blocked. 
Use with caution.'); - return; - } - - const output = await execInContainer(config.container, command, config.workdir); - await interaction.editReply({ - content: `\`\`\`bash\n$ ${command}\n${output.slice(0, 1900)}\n\`\`\`` - }); - break; - } - - case 'git': { - const args = interaction.options.getString('args', true); - const output = await execInContainer(config.container, `git ${args}`, config.workdir); - await interaction.editReply({ - content: `\`\`\`\n${output.slice(0, 1900)}\n\`\`\`` - }); - break; - } - - case 'monorepo-info': { - const packages = config.monorepo.packages.join('\n- '); - const tree = await execInContainer(config.container, 'tree -L 2 -d', config.workdir); - - await interaction.editReply({ - content: `📦 **Monorepo Structure**\n\n` + - `Tool: ${config.monorepo.tool}\n` + - `Packages:\n- ${packages}\n\n` + - `\`\`\`\n${tree.slice(0, 1500)}\n\`\`\`` - }); - break; - } - } - - } catch (error: any) { - console.error('Command error:', error); - await interaction.editReply({ - content: `❌ Error: ${error.message}` - }); - } -}); - -// Bot ready -client.on('ready', async () => { - console.log(`✅ Bot logged in as ${client.user?.tag}`); - - // Register slash commands - const rest = new REST({ version: '10' }).setToken(process.env.DISCORD_TOKEN!); - - try { - await rest.put( - Routes.applicationCommands(process.env.APPLICATION_ID!), - { body: commands } - ); - console.log('✅ Slash commands registered'); - } catch (error) { - console.error('Failed to register commands:', error); - } -}); - -// Start bot -client.login(process.env.DISCORD_TOKEN); -``` - -**Start the Discord bot:** - -```bash -cd ~/claude-infra/discord-bot -deno run --allow-all bot.ts -``` - -**Run bot as systemd service:** - -```ini -# /etc/systemd/system/claude-discord-bot.service -[Unit] -Description=Claude Code Discord Bot (Multi-Project) -After=network.target docker.service - -[Service] -Type=simple -User=youruser -WorkingDirectory=/home/youruser/claude-infra/discord-bot 
-EnvironmentFile=/home/youruser/claude-infra/discord-bot/.env -ExecStart=/home/youruser/.deno/bin/deno run --allow-all bot.ts -Restart=always -RestartSec=10 - -[Install] -WantedBy=multi-user.target -``` - -```bash -sudo systemctl daemon-reload -sudo systemctl enable claude-discord-bot -sudo systemctl start claude-discord-bot -sudo systemctl status claude-discord-bot -``` - ---- - -## Discord server organization - -**Create structured channel layout:** - -``` -Your Discord Server -│ -├── 📁 INFRASTRUCTURE -│ ├── #system-status (bot health, container stats) -│ ├── #announcements (maintenance windows, updates) -│ └── #admin-commands (restricted admin-only channel) -│ -├── 📁 PROJECT A - JavaScript Monorepo -│ ├── #project-a-dev (Claude commands) -│ ├── #project-a-logs (git commits, CI/CD) -│ └── #project-a-discuss (team chat) -│ -├── 📁 PROJECT B - Python ML -│ ├── #project-b-dev -│ ├── #project-b-logs -│ └── #project-b-discuss -│ -├── 📁 PROJECT C - Rust Systems -│ ├── #project-c-dev -│ ├── #project-c-logs -│ └── #project-c-discuss -│ -└── 📁 RESOURCES - ├── #documentation (setup guides, workflows) - └── #troubleshooting (common issues, solutions) -``` - -**Configure role-based permissions:** - -Create Discord roles: -- `admin` - Full access to all projects -- `project-a-devs` - Access to Project A -- `project-b-devs` - Access to Project B -- `project-c-devs` - Access to Project C -- `engineering` - Read access to all projects -- `observers` - View-only access - -Set channel permissions: -1. Each `#project-X-dev` channel: Only respective role can send messages -2. Each `#project-X-logs` channel: Read-only for all, bot can post -3. `#admin-commands`: Admin-only -4. 
`#system-status`: Bot posts, all can view - ---- - -## Management scripts and automation - -**Create container management wrapper:** - -```bash -#!/bin/bash -# ~/claude-infra/manage.sh - -set -e - -COMPOSE_FILE="$HOME/claude-infra/docker-compose.yml" - -function usage() { - cat << EOF -Claude Multi-Project Manager - -Usage: ./manage.sh [options] - -Commands: - start [project] Start all containers or specific project - stop [project] Stop all containers or specific project - restart [project] Restart containers - status Show status of all containers - logs View logs for project - exec Execute command in project container - shell Open shell in project container - rebuild [project] Rebuild container images - backup Backup all container state - restore Restore from backup - health Run health checks on all projects - update Update claudebox and bot - -Projects: project-a, project-b, project-c - -Examples: - ./manage.sh start project-a - ./manage.sh logs project-b - ./manage.sh exec project-c "git status" - ./manage.sh shell project-a -EOF -} - -function get_container() { - case "$1" in - project-a) echo "claudebox-project-a" ;; - project-b) echo "claudebox-project-b" ;; - project-c) echo "claudebox-project-c" ;; - *) echo "Unknown project: $1" >&2; exit 1 ;; - esac -} - -case "$1" in - start) - if [ -z "$2" ]; then - docker compose -f "$COMPOSE_FILE" up -d - echo "✅ All containers started" - else - CONTAINER=$(get_container "$2") - docker compose -f "$COMPOSE_FILE" up -d "$CONTAINER" - echo "✅ Started $CONTAINER" - fi - ;; - - stop) - if [ -z "$2" ]; then - docker compose -f "$COMPOSE_FILE" stop - echo "✅ All containers stopped" - else - CONTAINER=$(get_container "$2") - docker compose -f "$COMPOSE_FILE" stop "$CONTAINER" - echo "✅ Stopped $CONTAINER" - fi - ;; - - restart) - if [ -z "$2" ]; then - docker compose -f "$COMPOSE_FILE" restart - echo "✅ All containers restarted" - else - CONTAINER=$(get_container "$2") - docker compose -f "$COMPOSE_FILE" restart 
"$CONTAINER" - echo "✅ Restarted $CONTAINER" - fi - ;; - - status) - docker compose -f "$COMPOSE_FILE" ps - echo "" - echo "Resource usage:" - docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}" \ - claudebox-project-a claudebox-project-b claudebox-project-c - ;; - - logs) - if [ -z "$2" ]; then - echo "Error: Specify project" >&2 - exit 1 - fi - CONTAINER=$(get_container "$2") - docker compose -f "$COMPOSE_FILE" logs -f --tail=100 "$CONTAINER" - ;; - - exec) - if [ -z "$2" ] || [ -z "$3" ]; then - echo "Error: Specify project and command" >&2 - exit 1 - fi - CONTAINER=$(get_container "$2") - shift 2 - docker exec -it "$CONTAINER" bash -c "$@" - ;; - - shell) - if [ -z "$2" ]; then - echo "Error: Specify project" >&2 - exit 1 - fi - CONTAINER=$(get_container "$2") - docker exec -it "$CONTAINER" bash - ;; - - rebuild) - if [ -z "$2" ]; then - docker compose -f "$COMPOSE_FILE" build --no-cache - docker compose -f "$COMPOSE_FILE" up -d - else - CONTAINER=$(get_container "$2") - docker compose -f "$COMPOSE_FILE" build --no-cache "$CONTAINER" - docker compose -f "$COMPOSE_FILE" up -d "$CONTAINER" - fi - echo "✅ Rebuild complete" - ;; - - backup) - BACKUP_DIR="$HOME/claude-backups/$(date +%Y%m%d-%H%M%S)" - mkdir -p "$BACKUP_DIR" - - # Backup container state - for project in project-a project-b project-c; do - CONTAINER=$(get_container "$project") - echo "Backing up $CONTAINER..." - docker export "$CONTAINER" | gzip > "$BACKUP_DIR/${CONTAINER}.tar.gz" - done - - # Backup configurations - cp -r ~/.claudebox "$BACKUP_DIR/config" - - echo "✅ Backup saved to $BACKUP_DIR" - ;; - - health) - echo "Running health checks..." - echo "" - - for project in project-a project-b project-c; do - CONTAINER=$(get_container "$project") - echo "Checking $project..." - - # Container running? 
- if docker ps --filter "name=$CONTAINER" --format '{{.Names}}' | grep -q "$CONTAINER"; then - echo " ✅ Container running" - else - echo " ❌ Container not running" - continue - fi - - # Claude authenticated? - if docker exec "$CONTAINER" bash -c "[ -f /root/.claude/.credentials.json ]"; then - echo " ✅ Claude authenticated" - else - echo " ⚠️ Claude not authenticated" - fi - - # Workspace accessible? - if docker exec "$CONTAINER" bash -c "[ -d /workspace ]"; then - echo " ✅ Workspace mounted" - else - echo " ❌ Workspace not mounted" - fi - - # Git repository? - if docker exec "$CONTAINER" bash -c "cd /workspace && git status >/dev/null 2>&1"; then - echo " ✅ Git repository valid" - else - echo " ⚠️ Not a git repository" - fi - - echo "" - done - ;; - - update) - echo "Updating Claude Code CLI..." - npm install -g @anthropic-ai/claude-code - - echo "Updating Discord bot..." - cd ~/claude-infra/discord-bot - git pull - - echo "Updating containers..." - docker compose -f "$COMPOSE_FILE" pull - docker compose -f "$COMPOSE_FILE" up -d - - echo "✅ Update complete" - ;; - - *) - usage - exit 1 - ;; -esac -``` - -**Make executable:** - -```bash -chmod +x ~/claude-infra/manage.sh - -# Add alias for convenience -echo "alias claude-manage='~/claude-infra/manage.sh'" >> ~/.bashrc -source ~/.bashrc -``` - -**Create automatic checkpoint script:** - -```bash -#!/bin/bash -# ~/claude-infra/auto-checkpoint.sh - -# Commits work in progress for all projects every hour - -for project in project-a project-b project-c; do - CONTAINER="claudebox-$project" - - echo "Checkpointing $project..." 
- docker exec "$CONTAINER" bash -c " - cd /workspace && \ - git add -A && \ - git commit -m 'Auto-checkpoint: $(date +"%Y-%m-%d %H:%M:%S")' >/dev/null 2>&1 || true - " -done - -echo "Checkpoints created at $(date)" -``` - -**Schedule with cron:** - -```bash -chmod +x ~/claude-infra/auto-checkpoint.sh - -# Add to crontab -crontab -e - -# Add line: -0 * * * * /home/youruser/claude-infra/auto-checkpoint.sh >> /home/youruser/claude-infra/checkpoint.log 2>&1 -``` - -**Create health monitoring:** - -```bash -#!/bin/bash -# ~/claude-infra/health-monitor.sh - -WEBHOOK_URL="your_discord_webhook_url" - -function send_alert() { - curl -X POST "$WEBHOOK_URL" \ - -H "Content-Type: application/json" \ - -d "{\"content\": \"🚨 $1\"}" -} - -# Check each container -for container in claudebox-project-a claudebox-project-b claudebox-project-c; do - if ! docker ps --format '{{.Names}}' | grep -q "^${container}$"; then - send_alert "Container $container is not running!" - - # Attempt restart - docker start "$container" - sleep 5 - - if docker ps --format '{{.Names}}' | grep -q "^${container}$"; then - send_alert "Container $container restarted successfully" - else - send_alert "Failed to restart container $container" - fi - fi -done - -# Check disk space -DISK_USAGE=$(df -h /home | tail -1 | awk '{print $5}' | sed 's/%//') -if [ "$DISK_USAGE" -gt 85 ]; then - send_alert "Disk usage is at ${DISK_USAGE}% - cleanup recommended" -fi - -# Check memory -MEM_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100}') -if [ "$MEM_USAGE" -gt 90 ]; then - send_alert "Memory usage is at ${MEM_USAGE}% - may need to restart containers" -fi -``` - -**Schedule health checks:** - -```bash -chmod +x ~/claude-infra/health-monitor.sh - -# Add to crontab (every 5 minutes) -crontab -e - -# Add: -*/5 * * * * /home/youruser/claude-infra/health-monitor.sh -``` - ---- - -## Operational workflows - -**Daily developer workflow:** - -1. **Check system status** (via Discord): - ``` - /status - ``` - -2. 
**Start working on a feature**: - ``` - /git checkout -b feature/new-api-endpoint - /claude Create a new REST endpoint for user profile updates in the api package. Follow our existing patterns. - ``` - -3. **Review changes**: - ``` - /git diff - /git status - ``` - -4. **Run tests**: - ``` - /shell turbo run test --filter=@myorg/api - ``` - -5. **Commit work**: - ``` - /git add . - /git commit -m "feat(api): add user profile update endpoint" - /git push origin feature/new-api-endpoint - ``` - -**Cross-project coordination:** - -When changes in one project affect another: - -1. Make changes in Project A: - ``` - In #project-a-dev: - /claude Update the API client types to match the new endpoint schema - ``` - -2. Export the changes: - ``` - /git diff HEAD~1 src/types/api.ts > /tmp/api-changes.patch - ``` - -3. Apply to Project B: - ``` - In #project-b-dev: - /shell cat /tmp/api-changes.patch - /claude Review this API change from project-a and update our integration accordingly: [paste diff] - ``` - -**Emergency rollback:** - -If Claude makes unwanted changes: - -``` -/git status -/git diff HEAD -/git checkout . -/git clean -fd -``` - -Or restore from auto-checkpoint: - -``` -/git reflog -/git reset --hard HEAD@{1} -``` - -**Subscription usage monitoring:** - -Since you're using claude.ai subscriptions (not API), monitor usage via: - -1. Go to https://claude.ai/settings -2. Check "Usage" tab for your subscription tier limits -3. Usage resets monthly based on your subscription date - -Each container shares the same subscription, so total usage is aggregate across all three projects. - -**Credential refresh:** - -When your 30-day authentication expires: - -```bash -# On the VM -claude /login - -# Restart containers to pick up new credentials -claude-manage restart -``` - ---- - -## Troubleshooting - -**Container won't start:** - -```bash -# Check logs -claude-manage logs project-a - -# Common issues: -# 1. Port conflict -docker ps -a | grep claudebox - -# 2. 
Volume mount permissions -ls -la ~/projects/monorepo-a - -# 3. Credentials missing -ls -la ~/.claude/.credentials.json -``` - -**"Not authenticated" error in container:** - -```bash -# Re-authenticate on host -claude /login - -# Verify credentials exist -cat ~/.claude/.credentials.json - -# Restart containers -claude-manage restart - -# Verify mount inside container -docker exec claudebox-project-a ls -la /root/.claude/ -``` - -**Discord bot not responding:** - -```bash -# Check bot process -sudo systemctl status claude-discord-bot - -# View bot logs -sudo journalctl -u claude-discord-bot -f - -# Common issues: -# 1. Invalid token -grep DISCORD_TOKEN ~/claude-infra/discord-bot/.env - -# 2. Missing permissions -# Check bot has "Use Slash Commands" in Discord server settings - -# 3. Commands not registered -# Wait 1 hour or restart bot -sudo systemctl restart claude-discord-bot -``` - -**Claude Code command hangs:** - -```bash -# Check container CPU/memory -docker stats claudebox-project-a - -# Kill hung process inside container -docker exec claudebox-project-a pkill -f claude - -# Or restart container -claude-manage restart project-a -``` - -**Monorepo build failures:** - -```bash -# Clear caches -claude-manage exec project-a "rm -rf node_modules .turbo && pnpm install" -claude-manage exec project-b "poetry cache clear --all pypi" -claude-manage exec project-c "cargo clean" - -# Rebuild container from scratch -claude-manage rebuild project-a -``` - -**Disk space issues:** - -```bash -# Check usage -df -h - -# Clean Docker -docker system prune -a --volumes -docker volume prune - -# Clean build artifacts -claude-manage exec project-a "turbo run clean" -claude-manage exec project-b "find /workspace -type d -name __pycache__ -exec rm -rf {} +" -claude-manage exec project-c "cargo clean" -``` - ---- - -## Security considerations - -**Credential security:** - -- Never commit `.env` files to git -- Restrict `~/.claude/.credentials.json` permissions: `chmod 600 
~/.claude/.credentials.json` -- Use SSH keys for git operations, not passwords -- Rotate Discord bot token if exposed - -**Container isolation:** - -- Each container has separate network namespace -- Firewall rules prevent unauthorized egress -- Containers run as non-root user (UID 1000) -- No privileged mode - -**Discord permissions:** - -- Use role-based access control -- Audit channel permissions monthly -- Restrict `/shell` and destructive commands to admins -- Enable 2FA for all Discord accounts - -**Git security:** - -- Always work on feature branches -- Require code review for main/master -- Use GPG signing for commits -- Never commit secrets or API keys - -**VM hardening:** - -```bash -# Setup UFW firewall -sudo ufw default deny incoming -sudo ufw default allow outgoing -sudo ufw allow 22/tcp # SSH -sudo ufw enable - -# Disable password authentication -sudo sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config -sudo systemctl restart sshd - -# Setup fail2ban -sudo apt install fail2ban -sudo systemctl enable fail2ban -``` - -**Backup strategy:** - -```bash -# Weekly full backup -0 2 * * 0 /home/youruser/claude-infra/manage.sh backup - -# Keep last 4 weeks -find ~/claude-backups -type d -mtime +28 -exec rm -rf {} + -``` - ---- - -## Subscription usage optimization - -**Usage limits by tier:** - -- **Claude Pro**: Approximately 5x more usage than free tier, resets monthly -- **Claude Team**: Higher limits, usage pooled across team members -- **Usage tracked per subscription**, not per container - -**Strategies to optimize usage:** - -1. **Use planning mode first**: - ``` - /claude --mode plan "Design the new API endpoint structure" - ``` - Reviews design without making changes. - -2. **Batch related tasks**: - ``` - /claude "Update all API endpoints to use the new error handling pattern, then update tests, then update documentation" - ``` - Single conversation vs. three separate ones. - -3. 
**Use shell commands for simple operations**: - ``` - /shell grep -r "TODO" src/ - /git log --oneline -10 - ``` - Don't waste Claude on simple lookups. - -4. **Enable auto-checkpoints**: - Prevents needing to regenerate work if you hit usage limits mid-task. - -5. **Schedule heavy operations**: - Run large refactors at the start of your billing cycle when usage resets. - -**Monitor usage per project:** - -Create a tracking webhook: - -```typescript -// Add to bot.ts -async function logUsage(project: string, prompt: string) { - await fetch('YOUR_TRACKING_ENDPOINT', { - method: 'POST', - body: JSON.stringify({ - project, - timestamp: new Date(), - promptLength: prompt.length, - user: interaction.user.id - }) - }); -} -``` - -This helps identify which projects consume the most usage. - ---- - -## Conclusion - -This infrastructure provides production-grade multi-project Claude Code access with strong isolation, team collaboration, and subscription-based authentication. The multi-container architecture scales horizontally—add more projects by extending the docker-compose file and updating the bot's PROJECT_CONFIGS. 
- -**Quick reference:** - -```bash -# Management claude-manage start # Start all containers claude-manage status # Check system status claude-manage logs project-a # View logs claude-manage shell project-b # Open shell claude-manage health # Run health checks - -# Discord commands -/claude # Main interaction -/status # Project status -/shell # Execute shell -/git # Git operations -/monorepo-info # Show structure - -# Maintenance -claude # Interactive mode (triggers auth if needed) -claude-manage restart # Restart containers -claude-manage update # Update everything ``` -For advanced configurations, refer to the individual project documentation: -- claudebox: https://github.com/RchGrav/claudebox -- claude-code-discord: https://github.com/zebbern/claude-code-discord +For full claudebox documentation, see: +- https://github.com/RchGrav/claudebox +- https://github.com/zebbern/claude-code-discord ---- - -## rdev: K3s Implementation Notes - -This section documents our actual implementation running on k3s instead of a standalone VM. - -### Architecture Difference - -The reference guide above describes a VM-based deployment with Docker Compose. 
Our implementation uses: - -- **Kubernetes (k3s)** instead of Docker Compose -- **StatefulSets** instead of standalone containers -- **Longhorn PVCs** instead of host volume mounts -- **GitHub Container Registry** instead of local images - -``` -k3s cluster (orchard9-k3sf) -└── rdev namespace - ├── claudebox-0 (StatefulSet pod) - │ ├── Claude Code CLI - │ ├── /workspace (PVC: 20Gi) - │ └── /root/.claude (PVC: 1Gi) - └── Future: discord-bot, claudebox-pantheon, claudebox-aeries -``` - -### Key Commands - -```bash -# REQUIRED: Set kubeconfig before any kubectl command -export KUBECONFIG=~/.kube/orchard9-k3sf.yaml - -# Interactive Claude session (triggers OAuth if not authenticated) -kubectl exec -it -n rdev claudebox-0 -- claude - -# Run Claude with a prompt -kubectl exec -it -n rdev claudebox-0 -- claude "your prompt here" - -# Shell access -kubectl exec -it -n rdev claudebox-0 -- bash - -# Check status -kubectl get pods -n rdev - -# View logs -kubectl logs -n rdev claudebox-0 -``` - -### Authentication - -Claude authenticates via OAuth on first run. Auth persists in the `/root/.claude` PVC: - -```bash -kubectl exec -it -n rdev claudebox-0 -- claude -# Follow the URL to authenticate -# Auth persists across pod restarts -``` - -### Image - -``` -ghcr.io/orchard9/rdev-claudebox:v0.1.0 -``` - -Built for `linux/amd64` (k3s node architecture). - -### Differences from Reference Guide - -| Reference Guide | rdev Implementation | -|-----------------|---------------------| -| VM with Docker Compose | k3s with Kustomize | -| `docker exec` | `kubectl exec` | -| Host volume mounts | Longhorn PVCs | -| `~/.claude/.credentials.json` | PVC at `/root/.claude` | -| claudebox binary | Custom Dockerfile | -| Deno Discord bot | TBD (v0.4+) | - -### Version History - -See `history/` directory for detailed release notes. +
diff --git a/go.mod b/go.mod index 786d36b..a683346 100644 --- a/go.mod +++ b/go.mod @@ -7,8 +7,10 @@ require ( github.com/bdpiprava/scalar-go v0.13.0 github.com/go-chi/chi/v5 v5.1.0 github.com/google/uuid v1.6.0 + github.com/gorilla/websocket v1.5.4-0.20250319132907-e064f32e3674 github.com/lib/pq v1.10.9 github.com/prometheus/client_golang v1.23.2 + github.com/redis/go-redis/v9 v9.17.3 github.com/stretchr/testify v1.11.1 go.opentelemetry.io/otel v1.39.0 go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.39.0 @@ -27,6 +29,7 @@ require ( github.com/cespare/xxhash/v2 v2.3.0 // indirect github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect github.com/davidmz/go-pageant v1.0.2 // indirect + github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect github.com/emicklei/go-restful/v3 v3.12.2 // indirect github.com/fxamacker/cbor/v2 v2.9.0 // indirect github.com/go-fed/httpsig v1.1.0 // indirect diff --git a/go.sum b/go.sum index 039e99a..9a5b63b 100644 --- a/go.sum +++ b/go.sum @@ -8,6 +8,10 @@ github.com/bdpiprava/scalar-go v0.13.0 h1:TuhOwYalDpLAziohyEwZlq4PqtEJ+6P/V92dDC github.com/bdpiprava/scalar-go v0.13.0/go.mod h1:e5Nn4yIhcYjlucu4ACMqcs410nIAe5whqj78H3Qv7vw= github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM= github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw= +github.com/bsm/ginkgo/v2 v2.12.0 h1:Ny8MWAHyOepLGlLKYmXG4IEkioBysk6GpaRTLC8zwWs= +github.com/bsm/ginkgo/v2 v2.12.0/go.mod h1:SwYbGRRDovPVboqFv0tPTcG1sN61LM1Z4ARdbAV9g4c= +github.com/bsm/gomega v1.27.10 h1:yeMWxP2pV2fG3FgAODIY8EiRE3dy0aeFYt4l7wh6yKA= +github.com/bsm/gomega v1.27.10/go.mod h1:JyEr/xRbxbtgWNi8tIEVPUYZ5Dzef52k01W3YH0H+O0= github.com/cenkalti/backoff/v5 v5.0.3 h1:ZN+IMa753KfX5hd8vVaMixjnqRZ3y8CuJKRKj1xcsSM= github.com/cenkalti/backoff/v5 v5.0.3/go.mod h1:rkhZdG3JZukswDf7f0cwqPNk4K0sa+F97BxZthm/crw= github.com/cespare/xxhash/v2 v2.3.0 
h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs= @@ -18,6 +22,8 @@ github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1 github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davidmz/go-pageant v1.0.2 h1:bPblRCh5jGU+Uptpz6LgMZGD5hJoOt7otgT454WvHn0= github.com/davidmz/go-pageant v1.0.2/go.mod h1:P2EDDnMqIwG5Rrp05dTRITj9z2zpGcD9efWSkTNKLIE= +github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78= +github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc= github.com/emicklei/go-restful/v3 v3.12.2 h1:DhwDP0vY3k8ZzE0RunuJy8GhNpPL6zqLkDf9B/a0/xU= github.com/emicklei/go-restful/v3 v3.12.2/go.mod h1:6n3XBCmQQb25CM2LCACGz8ukIrRry+4bhvbpWn3mrbc= github.com/fxamacker/cbor/v2 v2.9.0 h1:NpKPmjDBgUfBms6tr6JZkTHtfFGcMKsw3eGcmD/sapM= @@ -50,6 +56,8 @@ github.com/google/pprof v0.0.0-20250403155104-27863c87afa6 h1:BHT72Gu3keYf3ZEu2J github.com/google/pprof v0.0.0-20250403155104-27863c87afa6/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA= github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0= github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo= +github.com/gorilla/websocket v1.5.4-0.20250319132907-e064f32e3674 h1:JeSE6pjso5THxAzdVpqr6/geYxZytqFMBCOtn/ujyeo= +github.com/gorilla/websocket v1.5.4-0.20250319132907-e064f32e3674/go.mod h1:r4w70xmWCQKmi1ONH4KIaBptdivuRPyosB9RmPlGEwA= github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.3 h1:NmZ1PKzSTQbuGHw9DGPFomqkkLWMC+vZCkfs+FHv1Vg= github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.3/go.mod h1:zQrxl1YP88HQlA6i9c63DSVPFklWpGX4OWAc9bFuaH4= github.com/hashicorp/go-version v1.7.0 h1:5tqGy27NaOTB8yJKUZELlFAS/LTKJkrmONwQKeRZfjY= @@ -94,6 +102,8 @@ github.com/prometheus/common v0.66.1 h1:h5E0h5/Y8niHc5DlaLlWLArTQI7tMrsfQjHV+d9Z github.com/prometheus/common 
v0.66.1/go.mod h1:gcaUsgf3KfRSwHY4dIMXLPV0K/Wg1oZ8+SbZk/HH/dA= github.com/prometheus/procfs v0.16.1 h1:hZ15bTNuirocR6u0JZ6BAHHmwS1p8B4P6MRqxtzMyRg= github.com/prometheus/procfs v0.16.1/go.mod h1:teAbpZRB1iIAJYREa1LsoWUXykVXA1KlTmWl8x/U+Is= +github.com/redis/go-redis/v9 v9.17.3 h1:fN29NdNrE17KttK5Ndf20buqfDZwGNgoUr9qjl1DQx4= +github.com/redis/go-redis/v9 v9.17.3/go.mod h1:u410H11HMLoB+TP67dz8rL9s6QW2j76l0//kSOd3370= github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ= github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc= github.com/spf13/pflag v1.0.9 h1:9exaQaMOCwffKiiiYk6/BndUBv+iRViNW+4lEMi0PvY= diff --git a/internal/adapter/cockroach/provisioner.go b/internal/adapter/cockroach/provisioner.go new file mode 100644 index 0000000..3145be7 --- /dev/null +++ b/internal/adapter/cockroach/provisioner.go @@ -0,0 +1,245 @@ +// Package cockroach provides CockroachDB database provisioning for projects. +// Creates isolated databases and users for each project. +package cockroach + +import ( + "context" + "crypto/rand" + "database/sql" + "encoding/hex" + "fmt" + "log/slog" + "strings" + "time" + + _ "github.com/lib/pq" // PostgreSQL driver (CockroachDB is PG-compatible) + + "github.com/orchard9/rdev/internal/domain" +) + +// Provisioner implements port.DatabaseProvisioner using CockroachDB. +type Provisioner struct { + db *sql.DB + host string + port int + logger *slog.Logger +} + +// Config holds CockroachDB provisioner configuration. +type Config struct { + Host string // e.g., "cockroachdb-public.databases.svc" + Port int // e.g., 26257 + User string // e.g., "root" (for insecure mode) + SSLMode string // e.g., "disable" (for insecure mode) +} + +// NewProvisioner creates a new CockroachDB database provisioner. 
+func NewProvisioner(cfg Config, logger *slog.Logger) (*Provisioner, error) { + if cfg.SSLMode == "" { + cfg.SSLMode = "disable" + } + + dsn := fmt.Sprintf("postgresql://%s@%s:%d/defaultdb?sslmode=%s", + cfg.User, cfg.Host, cfg.Port, cfg.SSLMode) + + db, err := sql.Open("postgres", dsn) + if err != nil { + return nil, fmt.Errorf("open connection: %w", err) + } + + // Configure connection pool + db.SetMaxOpenConns(5) + db.SetMaxIdleConns(2) + db.SetConnMaxLifetime(5 * time.Minute) + + // Verify connection + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + defer cancel() + + if err := db.PingContext(ctx); err != nil { + return nil, fmt.Errorf("cockroachdb connection failed: %w", err) + } + + return &Provisioner{ + db: db, + host: cfg.Host, + port: cfg.Port, + logger: logger, + }, nil +} + +// CreateProjectDatabase provisions an isolated database for a project. +func (p *Provisioner) CreateProjectDatabase(ctx context.Context, projectID string) (*domain.DatabaseCredentials, error) { + dbName := p.databaseNameFor(projectID) + username := p.usernameFor(projectID) + password, err := generateToken(32) + if err != nil { + return nil, fmt.Errorf("generate password: %w", err) + } + + // Check if database already exists (pg_database lists every database; information_schema.schemata only covers the connected one) + var exists bool + err = p.db.QueryRowContext(ctx, + "SELECT EXISTS(SELECT 1 FROM pg_catalog.pg_database WHERE datname = $1)", + dbName).Scan(&exists) + if err != nil { + return nil, fmt.Errorf("check database exists: %w", err) + } + + if exists { + p.logger.Warn("database already exists, recreating user", + "project_id", projectID, + "database", dbName) + // Drop existing user to recreate with new password + _, _ = p.db.ExecContext(ctx, fmt.Sprintf("DROP USER IF EXISTS %s", quoteIdent(username))) + } + + // Create database + if _, err := p.db.ExecContext(ctx, fmt.Sprintf("CREATE DATABASE IF NOT EXISTS %s", quoteIdent(dbName))); err != nil { + return nil, fmt.Errorf("create database: %w", err) + } + + // Create user + // In 
CockroachDB insecure mode, passwords are not enforced but we store one for future TLS mode + if _, err := p.db.ExecContext(ctx, fmt.Sprintf("CREATE USER IF NOT EXISTS %s", quoteIdent(username))); err != nil { + return nil, fmt.Errorf("create user: %w", err) + } + + // Grant permissions + if _, err := p.db.ExecContext(ctx, fmt.Sprintf("GRANT ALL ON DATABASE %s TO %s", quoteIdent(dbName), quoteIdent(username))); err != nil { + return nil, fmt.Errorf("grant permissions: %w", err) + } + + // Build connection URL + // In insecure mode, password is not used in connection, but we store it for future TLS migration + url := fmt.Sprintf("postgresql://%s@%s:%d/%s?sslmode=disable", + username, p.host, p.port, dbName) + + p.logger.Info("created project database", + "project_id", projectID, + "database", dbName, + "username", username) + + return &domain.DatabaseCredentials{ + ProjectID: projectID, + DatabaseName: dbName, + Username: username, + Password: password, + Host: p.host, + Port: p.port, + SSLMode: "disable", + URL: url, + URLStaging: url, // Same for now; separate staging cluster in future + CreatedAt: time.Now().UTC(), + }, nil +} + +// DeleteProjectDatabase removes database access for a project. +func (p *Provisioner) DeleteProjectDatabase(ctx context.Context, projectID string) error { + dbName := p.databaseNameFor(projectID) + username := p.usernameFor(projectID) + + // Revoke permissions first + _, _ = p.db.ExecContext(ctx, fmt.Sprintf("REVOKE ALL ON DATABASE %s FROM %s", quoteIdent(dbName), quoteIdent(username))) + + // Drop database (CASCADE drops all tables, indexes, etc.) 
+ if _, err := p.db.ExecContext(ctx, fmt.Sprintf("DROP DATABASE IF EXISTS %s CASCADE", quoteIdent(dbName))); err != nil { + p.logger.Warn("failed to drop database", "database", dbName, "error", err) + } + + // Drop user + if _, err := p.db.ExecContext(ctx, fmt.Sprintf("DROP USER IF EXISTS %s", quoteIdent(username))); err != nil { + p.logger.Warn("failed to drop user", "username", username, "error", err) + } + + p.logger.Info("deleted project database", + "project_id", projectID, + "database", dbName, + "username", username) + + return nil +} + +// GetProjectDatabase retrieves database credentials for a project. +// Note: Password cannot be retrieved from CockroachDB; use stored credentials. +func (p *Provisioner) GetProjectDatabase(ctx context.Context, projectID string) (*domain.DatabaseCredentials, error) { + dbName := p.databaseNameFor(projectID) + username := p.usernameFor(projectID) + + // Check if database exists (pg_database lists every database; information_schema.schemata only covers the connected one) + var exists bool + err := p.db.QueryRowContext(ctx, + "SELECT EXISTS(SELECT 1 FROM pg_catalog.pg_database WHERE datname = $1)", + dbName).Scan(&exists) + if err != nil { + return nil, fmt.Errorf("check database exists: %w", err) + } + if !exists { + return nil, nil // Database not provisioned + } + + // Database exists; construct credentials without password + url := fmt.Sprintf("postgresql://%s@%s:%d/%s?sslmode=disable", + username, p.host, p.port, dbName) + + return &domain.DatabaseCredentials{ + ProjectID: projectID, + DatabaseName: dbName, + Username: username, + Password: "", // Not available; use credential store + Host: p.host, + Port: p.port, + SSLMode: "disable", + URL: url, + URLStaging: url, + }, nil +} + +// TestConnection verifies CockroachDB connectivity. +func (p *Provisioner) TestConnection(ctx context.Context) error { + return p.db.PingContext(ctx) +} + +// Close closes the database connection. +func (p *Provisioner) Close() error { + return p.db.Close() +} + +// databaseNameFor returns the database name for a project. 
+func (p *Provisioner) databaseNameFor(projectID string) string { + return "project_" + sanitizeIdentifier(projectID) +} + +// usernameFor returns the database username for a project. +func (p *Provisioner) usernameFor(projectID string) string { + return "project_" + sanitizeIdentifier(projectID) +} + +// sanitizeIdentifier sanitizes a string for use as a SQL identifier. +// Replaces non-alphanumeric characters with underscores and lowercases. +func sanitizeIdentifier(s string) string { + return strings.Map(func(r rune) rune { + if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' { + return r + } + if r >= 'A' && r <= 'Z' { + return r + 32 // lowercase + } + return '_' + }, s) +} + +// quoteIdent quotes a SQL identifier to prevent injection. +// CockroachDB uses double quotes for identifiers. +func quoteIdent(s string) string { + return `"` + strings.ReplaceAll(s, `"`, `""`) + `"` +} + +// generateToken generates a cryptographically secure random token. +func generateToken(length int) (string, error) { + bytes := make([]byte, length) + if _, err := rand.Read(bytes); err != nil { + return "", err + } + return hex.EncodeToString(bytes), nil +} diff --git a/internal/adapter/cockroach/provisioner_test.go b/internal/adapter/cockroach/provisioner_test.go new file mode 100644 index 0000000..1cef8d0 --- /dev/null +++ b/internal/adapter/cockroach/provisioner_test.go @@ -0,0 +1,123 @@ +package cockroach + +import ( + "testing" +) + +func TestSanitizeIdentifier(t *testing.T) { + tests := []struct { + input string + expected string + }{ + {"simple", "simple"}, + {"with-dash", "with_dash"}, + {"with_underscore", "with_underscore"}, + {"UPPERCASE", "uppercase"}, + {"MixedCase", "mixedcase"}, + {"with spaces", "with_spaces"}, + {"with.dots", "with_dots"}, + {"123numeric", "123numeric"}, + {"special!@#$%", "special_____"}, + {"", ""}, + {"project-abc-123", "project_abc_123"}, + } + + for _, tt := range tests { + t.Run(tt.input, func(t *testing.T) { + result := 
sanitizeIdentifier(tt.input) + if result != tt.expected { + t.Errorf("sanitizeIdentifier(%q) = %q, want %q", tt.input, result, tt.expected) + } + }) + } +} + +func TestQuoteIdent(t *testing.T) { + tests := []struct { + input string + expected string + }{ + {"simple", `"simple"`}, + {"with space", `"with space"`}, + {`with"quote`, `"with""quote"`}, + {"", `""`}, + } + + for _, tt := range tests { + t.Run(tt.input, func(t *testing.T) { + result := quoteIdent(tt.input) + if result != tt.expected { + t.Errorf("quoteIdent(%q) = %q, want %q", tt.input, result, tt.expected) + } + }) + } +} + +func TestGenerateToken(t *testing.T) { + // Test that tokens are generated with correct length + lengths := []int{16, 32, 64} + for _, length := range lengths { + token, err := generateToken(length) + if err != nil { + t.Errorf("generateToken(%d) returned error: %v", length, err) + continue + } + // Hex encoding doubles the length + expectedLen := length * 2 + if len(token) != expectedLen { + t.Errorf("generateToken(%d) returned token of length %d, want %d", length, len(token), expectedLen) + } + } + + // Test that tokens are unique + token1, _ := generateToken(32) + token2, _ := generateToken(32) + if token1 == token2 { + t.Error("generateToken returned duplicate tokens") + } +} + +func TestDatabaseNameFor(t *testing.T) { + p := &Provisioner{} + + tests := []struct { + projectID string + expected string + }{ + {"myproject", "project_myproject"}, + {"my-project", "project_my_project"}, + {"MY_PROJECT", "project_my_project"}, + {"123", "project_123"}, + } + + for _, tt := range tests { + t.Run(tt.projectID, func(t *testing.T) { + result := p.databaseNameFor(tt.projectID) + if result != tt.expected { + t.Errorf("databaseNameFor(%q) = %q, want %q", tt.projectID, result, tt.expected) + } + }) + } +} + +func TestUsernameFor(t *testing.T) { + p := &Provisioner{} + + tests := []struct { + projectID string + expected string + }{ + {"myproject", "project_myproject"}, + {"my-project", 
"project_my_project"}, + {"MY_PROJECT", "project_my_project"}, + } + + for _, tt := range tests { + t.Run(tt.projectID, func(t *testing.T) { + result := p.usernameFor(tt.projectID) + if result != tt.expected { + t.Errorf("usernameFor(%q) = %q, want %q", tt.projectID, result, tt.expected) + } + }) + } +} diff --git a/internal/adapter/codeagent/claudecode/adapter.go b/internal/adapter/codeagent/claudecode/adapter.go index 18e6b63..ec7841f 100644 --- a/internal/adapter/codeagent/claudecode/adapter.go +++ b/internal/adapter/codeagent/claudecode/adapter.go @@ -191,14 +191,26 @@ func (a *Adapter) Execute(ctx context.Context, req *domain.AgentRequest, handler return result, nil } +// defaultAllowedTools is the list of tools to allow when running Claude Code +// in automated mode. Using --allowedTools instead of --dangerously-skip-permissions +// because the latter is blocked when running as root (which claudebox pods do). +var defaultAllowedTools = []string{ + "Bash", "Edit", "Write", "Read", "Glob", "Grep", "Task", "WebFetch", "WebSearch", +} + // buildCommandArgs constructs the kubectl exec arguments for Claude Code. +// IMPORTANT: The prompt MUST come immediately after "claude" (before other flags) +// because Claude Code's CLI parser expects the positional prompt argument early. 
func (a *Adapter) buildCommandArgs(namespace, podName string, req *domain.AgentRequest) []string { + // Start with kubectl exec and the prompt right after "claude" + // This is required because Claude Code's CLI doesn't accept the prompt at the end args := []string{ "exec", "-n", namespace, podName, "--", "claude", - "-p", // Print mode (non-interactive) + req.Prompt, // Prompt MUST come first after "claude" + "-p", // Print mode (non-interactive) + "--verbose", // Required for stream-json output "--output-format", "stream-json", - "--dangerously-skip-permissions", } // Add session continuation if resuming @@ -206,11 +218,14 @@ func (a *Adapter) buildCommandArgs(namespace, podName string, req *domain.AgentR args = append(args, "--resume", req.SessionID) } - // Add allowed tools if specified - if len(req.AllowedTools) > 0 { - for _, tool := range req.AllowedTools { - args = append(args, "--allowedTools", tool) - } + // Add allowed tools - use request's tools if specified, otherwise use defaults. + // This replaces --dangerously-skip-permissions which is blocked when running as root. 
+ allowedTools := req.AllowedTools + if len(allowedTools) == 0 { + allowedTools = defaultAllowedTools + } + for _, tool := range allowedTools { + args = append(args, "--allowedTools", tool) } // Add working directory if specified @@ -218,9 +233,6 @@ func (a *Adapter) buildCommandArgs(namespace, podName string, req *domain.AgentR args = append(args, "--add-dir", req.WorkingDir) } - // Add the prompt as the final argument - args = append(args, req.Prompt) - return args } diff --git a/internal/adapter/codeagent/claudecode/adapter_test.go b/internal/adapter/codeagent/claudecode/adapter_test.go index 044fbcc..063102f 100644 --- a/internal/adapter/codeagent/claudecode/adapter_test.go +++ b/internal/adapter/codeagent/claudecode/adapter_test.go @@ -73,8 +73,15 @@ func TestAdapter_buildCommandArgs_Basic(t *testing.T) { if !strings.Contains(argsStr, "--output-format stream-json") { t.Error("expected stream-json output format") } - if !strings.Contains(argsStr, "--dangerously-skip-permissions") { - t.Error("expected permission skip flag") + if !strings.Contains(argsStr, "--verbose") { + t.Error("expected --verbose flag for stream-json output") + } + // Should include default allowed tools instead of --dangerously-skip-permissions + expectedTools := []string{"Bash", "Edit", "Write", "Read", "Glob", "Grep", "Task", "WebFetch", "WebSearch"} + for _, tool := range expectedTools { + if !strings.Contains(argsStr, "--allowedTools "+tool) { + t.Errorf("expected default allowed tool: %s", tool) + } } if !strings.Contains(argsStr, "Hello, Claude") { t.Error("expected prompt in args") @@ -197,7 +204,7 @@ func TestAdapter_parseStreamOutput(t *testing.T) { input := strings.NewReader(`{"type":"init","session_id":"test-123"} {"type":"message","role":"assistant","content":[{"type":"text","text":"Hello!"}]} -{"type":"result","status":"success","duration_ms":100} +{"type":"result","subtype":"success","is_error":false,"duration_ms":100} `) var events []domain.AgentEvent diff --git 
a/internal/adapter/codeagent/claudecode/parser.go b/internal/adapter/codeagent/claudecode/parser.go index 0fc26c1..a97c134 100644 --- a/internal/adapter/codeagent/claudecode/parser.go +++ b/internal/adapter/codeagent/claudecode/parser.go @@ -27,6 +27,8 @@ const ( // StreamMessage represents a single NDJSON message from Claude Code's stream-json output. type StreamMessage struct { Type StreamMessageType `json:"type"` + Subtype string `json:"subtype,omitempty"` // "success" or "error" (for result type) + IsError bool `json:"is_error,omitempty"` // true if result is an error Timestamp string `json:"timestamp,omitempty"` SessionID string `json:"session_id,omitempty"` Role string `json:"role,omitempty"` // "assistant" or "user" @@ -34,7 +36,6 @@ type StreamMessage struct { Name string `json:"name,omitempty"` // Tool name for tool_use Input json.RawMessage `json:"input,omitempty"` // Tool input for tool_use Output string `json:"output,omitempty"` - Status string `json:"status,omitempty"` // "success" or "error" DurationMs int64 `json:"duration_ms,omitempty"` Error string `json:"error,omitempty"` } @@ -109,15 +110,15 @@ func (m *StreamMessage) ToAgentEvent() domain.AgentEvent { case StreamMessageResult: event.Type = domain.AgentEventComplete - if m.Status == "error" { + if m.Subtype == "error" || m.IsError { event.Type = domain.AgentEventError event.Content = m.Error } if m.DurationMs > 0 { event.Metadata["duration_ms"] = m.DurationMs } - if m.Status != "" { - event.Metadata["status"] = m.Status + if m.Subtype != "" { + event.Metadata["status"] = m.Subtype } default: @@ -158,5 +159,5 @@ func (m *StreamMessage) IsTerminal() bool { // IsSuccess returns true if this is a successful result message. 
func (m *StreamMessage) IsSuccess() bool { - return m.Type == StreamMessageResult && m.Status == "success" + return m.Type == StreamMessageResult && m.Subtype == "success" && !m.IsError } diff --git a/internal/adapter/codeagent/claudecode/parser_test.go b/internal/adapter/codeagent/claudecode/parser_test.go index 0873f00..cec7a07 100644 --- a/internal/adapter/codeagent/claudecode/parser_test.go +++ b/internal/adapter/codeagent/claudecode/parser_test.go @@ -79,22 +79,25 @@ func TestParseStreamMessage_ToolResult(t *testing.T) { func TestParseStreamMessage_Result(t *testing.T) { tests := []struct { - name string - line string - wantStatus string - wantMs int64 + name string + line string + wantSubtype string + wantIsError bool + wantMs int64 }{ { - name: "success", - line: `{"type":"result","status":"success","duration_ms":1234}`, - wantStatus: "success", - wantMs: 1234, + name: "success", + line: `{"type":"result","subtype":"success","is_error":false,"duration_ms":1234}`, + wantSubtype: "success", + wantIsError: false, + wantMs: 1234, }, { - name: "error", - line: `{"type":"result","status":"error","error":"something went wrong"}`, - wantStatus: "error", - wantMs: 0, + name: "error", + line: `{"type":"result","subtype":"error","is_error":true,"error":"something went wrong"}`, + wantSubtype: "error", + wantIsError: true, + wantMs: 0, }, } @@ -108,8 +111,11 @@ func TestParseStreamMessage_Result(t *testing.T) { if msg.Type != StreamMessageResult { t.Errorf("expected type 'result', got %q", msg.Type) } - if msg.Status != tt.wantStatus { - t.Errorf("expected status %q, got %q", tt.wantStatus, msg.Status) + if msg.Subtype != tt.wantSubtype { + t.Errorf("expected subtype %q, got %q", tt.wantSubtype, msg.Subtype) + } + if msg.IsError != tt.wantIsError { + t.Errorf("expected is_error %v, got %v", tt.wantIsError, msg.IsError) } if msg.DurationMs != tt.wantMs { t.Errorf("expected duration_ms %d, got %d", tt.wantMs, msg.DurationMs) @@ -205,7 +211,8 @@ func 
TestStreamMessage_ToAgentEvent_ToolResult(t *testing.T) { func TestStreamMessage_ToAgentEvent_ResultSuccess(t *testing.T) { msg := &StreamMessage{ Type: StreamMessageResult, - Status: "success", + Subtype: "success", + IsError: false, DurationMs: 5000, } @@ -224,9 +231,10 @@ func TestStreamMessage_ToAgentEvent_ResultSuccess(t *testing.T) { func TestStreamMessage_ToAgentEvent_ResultError(t *testing.T) { msg := &StreamMessage{ - Type: StreamMessageResult, - Status: "error", - Error: "execution failed", + Type: StreamMessageResult, + Subtype: "error", + IsError: true, + Error: "execution failed", } event := msg.ToAgentEvent() @@ -267,8 +275,8 @@ func TestStreamMessage_IsSuccess(t *testing.T) { msg StreamMessage success bool }{ - {"success result", StreamMessage{Type: StreamMessageResult, Status: "success"}, true}, - {"error result", StreamMessage{Type: StreamMessageResult, Status: "error"}, false}, + {"success result", StreamMessage{Type: StreamMessageResult, Subtype: "success", IsError: false}, true}, + {"error result", StreamMessage{Type: StreamMessageResult, Subtype: "error", IsError: true}, false}, {"non-result", StreamMessage{Type: StreamMessageMessage}, false}, } diff --git a/internal/adapter/gitea/templates/infrastructure.md b/internal/adapter/gitea/templates/infrastructure.md new file mode 100644 index 0000000..c731b0a --- /dev/null +++ b/internal/adapter/gitea/templates/infrastructure.md @@ -0,0 +1,92 @@ +# Infrastructure + +This project has provisioned database and cache access. + +## Database (CockroachDB) + +PostgreSQL-compatible distributed SQL database. 
+ +### Connection + +| Environment | Variable | +|-------------|----------| +| Production | `DATABASE_URL` | +| Staging | `DATABASE_URL_STAGING` | + +### Usage + +**Go (sqlx):** +```go +import "github.com/jmoiron/sqlx" +import _ "github.com/lib/pq" + +db, err := sqlx.Connect("postgres", os.Getenv("DATABASE_URL")) +``` + +**Node.js (pg):** +```javascript +import pg from 'pg'; +const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL }); +``` + +**Python (psycopg2):** +```python +import psycopg2 +conn = psycopg2.connect(os.environ['DATABASE_URL']) +``` + +### Schema Migrations + +Use any PostgreSQL migration tool. Recommended: +- Go: `golang-migrate/migrate` +- Node.js: `node-pg-migrate` +- Python: `alembic` + +## Cache (Redis) + +Redis cache with project-isolated key prefix. + +### Connection + +| Environment | Variable | +|-------------|----------| +| Production | `REDIS_URL` | +| Staging | `REDIS_URL_STAGING` | +| Key Prefix | `REDIS_PREFIX` | + +**Important:** Always prefix your keys with `REDIS_PREFIX` to ensure isolation. 
+ +### Usage + +**Go (go-redis):** +```go +import "github.com/redis/go-redis/v9" + +opt, _ := redis.ParseURL(os.Getenv("REDIS_URL")) +client := redis.NewClient(opt) + +prefix := os.Getenv("REDIS_PREFIX") +client.Set(ctx, prefix+"users:123", data, time.Hour) +``` + +**Node.js (ioredis):** +```javascript +import Redis from 'ioredis'; + +const redis = new Redis(process.env.REDIS_URL); +const prefix = process.env.REDIS_PREFIX; + +await redis.set(`${prefix}session:abc`, JSON.stringify(data), 'EX', 3600); +``` + +## Environment Variables + +These are automatically injected into your deployment: + +| Variable | Description | +|----------|-------------| +| `DATABASE_URL` | CockroachDB production connection | +| `DATABASE_URL_STAGING` | CockroachDB staging connection | +| `REDIS_URL` | Redis production connection | +| `REDIS_URL_STAGING` | Redis staging connection | +| `REDIS_PREFIX` | Key prefix for Redis isolation | diff --git a/internal/adapter/memory/stream_publisher.go b/internal/adapter/memory/stream_publisher.go index e6a967b..f868806 100644 --- a/internal/adapter/memory/stream_publisher.go +++ b/internal/adapter/memory/stream_publisher.go @@ -4,6 +4,7 @@ import ( "fmt" "sync" "sync/atomic" + "time" "github.com/orchard9/rdev/internal/port" ) @@ -170,9 +171,16 @@ func (sp *StreamPublisher) unsubscribe(streamID string, sub *subscriber) { func (sp *StreamPublisher) Publish(streamID string, event port.StreamEvent) string { state := sp.getOrCreateStream(streamID) - // Generate event ID + // Generate event ID and populate metadata seq := state.eventSeq.Add(1) event.ID = fmt.Sprintf("%s:%d", streamID, seq) + event.Sequence = int64(seq) + if event.Timestamp.IsZero() { + event.Timestamp = time.Now() + } + if event.TaskID == "" { + event.TaskID = streamID // Default to stream ID as task ID + } sp.mu.Lock() // Add to buffer for replay diff --git a/internal/adapter/postgres/build_events.go b/internal/adapter/postgres/build_events.go new file mode 100644 index 0000000..67d2c9b 
--- /dev/null +++ b/internal/adapter/postgres/build_events.go @@ -0,0 +1,107 @@ +package postgres + +import ( + "context" + "database/sql" + "encoding/json" + "time" + + "github.com/orchard9/rdev/internal/domain" + "github.com/orchard9/rdev/internal/port" +) + +// BuildEventRepository implements port.BuildEventStore using PostgreSQL. +type BuildEventRepository struct { + db *sql.DB +} + +// NewBuildEventRepository creates a new build event repository. +func NewBuildEventRepository(db *sql.DB) *BuildEventRepository { + return &BuildEventRepository{db: db} +} + +// Ensure BuildEventRepository implements port.BuildEventStore at compile time. +var _ port.BuildEventStore = (*BuildEventRepository)(nil) + +// buildEventRow is the database representation of a build event. +type buildEventRow struct { + ID string `db:"id"` + TaskID string `db:"task_id"` + ProjectID string `db:"project_id"` + Type string `db:"type"` + Sequence int64 `db:"sequence"` + Timestamp time.Time `db:"timestamp"` + Data []byte `db:"data"` + CreatedAt time.Time `db:"created_at"` +} + +// Record stores a build event for later replay. +func (r *BuildEventRepository) Record(ctx context.Context, event *domain.BuildEvent) error { + dataBytes, err := json.Marshal(event.Data) + if err != nil { + return err + } + + _, err = r.db.ExecContext(ctx, ` + INSERT INTO build_events (id, task_id, project_id, type, sequence, timestamp, data) + VALUES ($1, $2, $3, $4, $5, $6, $7) + ON CONFLICT (id) DO NOTHING + `, event.ID, event.TaskID, event.ProjectID, event.Type, event.Sequence, event.Timestamp, dataBytes) + + return err +} + +// ListByTask retrieves events for a task, optionally after a sequence number. 
+func (r *BuildEventRepository) ListByTask(ctx context.Context, taskID string, afterSequence int64) ([]*domain.BuildEvent, error) { + rows, err := r.db.QueryContext(ctx, ` + SELECT id, task_id, project_id, type, sequence, timestamp, data + FROM build_events + WHERE task_id = $1 AND sequence > $2 + ORDER BY sequence ASC + `, taskID, afterSequence) + if err != nil { + return nil, err + } + defer func() { _ = rows.Close() }() + + var events []*domain.BuildEvent + for rows.Next() { + var row buildEventRow + if err := rows.Scan(&row.ID, &row.TaskID, &row.ProjectID, &row.Type, &row.Sequence, &row.Timestamp, &row.Data); err != nil { + return nil, err + } + + var data domain.BuildEventData + if err := json.Unmarshal(row.Data, &data); err != nil { + // If unmarshal fails, use empty data + data = domain.BuildEventData{} + } + + events = append(events, &domain.BuildEvent{ + ID: row.ID, + TaskID: row.TaskID, + ProjectID: row.ProjectID, + Type: domain.BuildEventType(row.Type), + Sequence: row.Sequence, + Timestamp: row.Timestamp, + Data: data, + }) + } + + return events, rows.Err() +} + +// Cleanup removes events older than the specified age. +func (r *BuildEventRepository) Cleanup(ctx context.Context, olderThan time.Duration) (int64, error) { + cutoff := time.Now().Add(-olderThan) + + result, err := r.db.ExecContext(ctx, ` + DELETE FROM build_events + WHERE created_at < $1 + `, cutoff) + if err != nil { + return 0, err + } + + return result.RowsAffected() +} diff --git a/internal/adapter/redis/provisioner.go b/internal/adapter/redis/provisioner.go new file mode 100644 index 0000000..f1d3c65 --- /dev/null +++ b/internal/adapter/redis/provisioner.go @@ -0,0 +1,265 @@ +// Package redis provides Redis cache provisioning for projects. +// Uses Redis ACLs to isolate each project to its own key prefix. 
+package redis + +import ( + "context" + "crypto/rand" + "encoding/hex" + "fmt" + "log/slog" + "strings" + "time" + + "github.com/redis/go-redis/v9" + + "github.com/orchard9/rdev/internal/domain" +) + +// Provisioner implements port.CacheProvisioner using Redis ACLs. +type Provisioner struct { + client *redis.Client + host string + port int + keyPrefix string + logger *slog.Logger +} + +// Config holds Redis provisioner configuration. +type Config struct { + Host string + Port int + Password string + KeyPrefix string // Base prefix for project keys, default "project:" +} + +// NewProvisioner creates a new Redis cache provisioner. +func NewProvisioner(cfg Config, logger *slog.Logger) (*Provisioner, error) { + if cfg.KeyPrefix == "" { + cfg.KeyPrefix = "project:" + } + + client := redis.NewClient(&redis.Options{ + Addr: fmt.Sprintf("%s:%d", cfg.Host, cfg.Port), + Password: cfg.Password, + DB: 0, + }) + + // Verify connection + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + defer cancel() + + if err := client.Ping(ctx).Err(); err != nil { + return nil, fmt.Errorf("redis connection failed: %w", err) + } + + return &Provisioner{ + client: client, + host: cfg.Host, + port: cfg.Port, + keyPrefix: cfg.KeyPrefix, + logger: logger, + }, nil +} + +// CreateProjectCache provisions isolated cache access for a project. 
+func (p *Provisioner) CreateProjectCache(ctx context.Context, projectID string) (*domain.CacheCredentials, error) { + username := p.usernameFor(projectID) + password, err := generateToken(32) + if err != nil { + return nil, fmt.Errorf("generate password: %w", err) + } + prefix := p.prefixFor(projectID) + + // Check if user already exists + existing, err := p.client.Do(ctx, "ACL", "GETUSER", username).Result() + if err == nil && existing != nil { + p.logger.Warn("cache user already exists, recreating", + "project_id", projectID, + "username", username) + // Delete existing user to recreate with new password + if err := p.client.Do(ctx, "ACL", "DELUSER", username).Err(); err != nil { + return nil, fmt.Errorf("delete existing user: %w", err) + } + } + + // Create ACL user with scoped permissions: + // - on: user is active + // - >password: set password + // - ~prefix*: can only access keys matching this pattern + // - +@all: allow all command categories + // - -@dangerous: deny dangerous commands (FLUSHALL, SHUTDOWN, DEBUG, etc.) + // - -@admin: deny admin commands (CONFIG, ACL, SLAVEOF, etc.) 
+ err = p.client.Do(ctx, + "ACL", "SETUSER", username, + "on", + ">"+password, + "~"+prefix+"*", + "+@all", + "-@dangerous", + "-@admin", + ).Err() + if err != nil { + return nil, fmt.Errorf("create ACL user: %w", err) + } + + // Persist ACL changes to disk + if err := p.client.Do(ctx, "ACL", "SAVE").Err(); err != nil { + p.logger.Warn("failed to persist ACL to disk", "error", err) + // Non-fatal: ACLs will still work until Redis restarts + } + + p.logger.Info("created project cache", + "project_id", projectID, + "username", username, + "prefix", prefix) + + url := fmt.Sprintf("redis://%s:%s@%s:%d", username, password, p.host, p.port) + return &domain.CacheCredentials{ + ProjectID: projectID, + URL: url, + URLStaging: url, // Same for now; separate staging instance in future + Prefix: prefix, + Username: username, + Host: p.host, + Port: p.port, + CreatedAt: time.Now().UTC(), + }, nil +} + +// DeleteProjectCache removes cache access for a project. +func (p *Provisioner) DeleteProjectCache(ctx context.Context, projectID string, purgeKeys bool) error { + username := p.usernameFor(projectID) + prefix := p.prefixFor(projectID) + + // Delete ACL user + result, err := p.client.Do(ctx, "ACL", "DELUSER", username).Result() + if err != nil { + return fmt.Errorf("delete ACL user: %w", err) + } + + // ACL DELUSER returns number of users deleted + deleted, ok := result.(int64) + if !ok || deleted == 0 { + p.logger.Warn("cache user did not exist", "project_id", projectID, "username", username) + } + + // Optionally purge all project keys + if purgeKeys { + if err := p.purgeKeys(ctx, prefix); err != nil { + p.logger.Warn("failed to purge project keys", + "project_id", projectID, + "prefix", prefix, + "error", err) + // Non-fatal: user is already deleted + } + } + + // Persist ACL changes + if err := p.client.Do(ctx, "ACL", "SAVE").Err(); err != nil { + p.logger.Warn("failed to persist ACL to disk", "error", err) + } + + p.logger.Info("deleted project cache", + "project_id", 
projectID, + "username", username, + "purged_keys", purgeKeys) + + return nil +} + +// GetProjectCache retrieves cache credentials for a project. +// Note: Password cannot be retrieved from Redis ACL, only verified. +// Returns nil if user doesn't exist. +func (p *Provisioner) GetProjectCache(ctx context.Context, projectID string) (*domain.CacheCredentials, error) { + username := p.usernameFor(projectID) + prefix := p.prefixFor(projectID) + + // Check if user exists; go-redis reports a nil reply (no such user) as redis.Nil + result, err := p.client.Do(ctx, "ACL", "GETUSER", username).Result() + if err != nil { + if err == redis.Nil || strings.Contains(err.Error(), "User") { + return nil, nil // User doesn't exist + } + return nil, fmt.Errorf("get ACL user: %w", err) + } + if result == nil { + return nil, nil + } + + // User exists but we can't retrieve password + // Caller should use stored credentials from credential store + return &domain.CacheCredentials{ + ProjectID: projectID, + URL: "", // Password not available + Prefix: prefix, + Username: username, + Host: p.host, + Port: p.port, + }, nil +} + +// TestConnection verifies Redis connectivity. +func (p *Provisioner) TestConnection(ctx context.Context) error { + return p.client.Ping(ctx).Err() +} + +// Close closes the Redis connection. +func (p *Provisioner) Close() error { + return p.client.Close() +} + +// usernameFor returns the Redis username for a project. +func (p *Provisioner) usernameFor(projectID string) string { + // Sanitize project ID for Redis username (alphanumeric + hyphen) + safe := strings.Map(func(r rune) rune { + if (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') || r == '-' { + return r + } + return '-' + }, projectID) + return "proj-" + safe +} + +// prefixFor returns the key prefix for a project. +func (p *Provisioner) prefixFor(projectID string) string { + return p.keyPrefix + projectID + ":" +} + +// purgeKeys deletes all keys matching the project prefix. 
+func (p *Provisioner) purgeKeys(ctx context.Context, prefix string) error { + var cursor uint64 + var deleted int64 + + for { + keys, nextCursor, err := p.client.Scan(ctx, cursor, prefix+"*", 100).Result() + if err != nil { + return fmt.Errorf("scan keys: %w", err) + } + + if len(keys) > 0 { + n, err := p.client.Del(ctx, keys...).Result() + if err != nil { + return fmt.Errorf("delete keys: %w", err) + } + deleted += n + } + + cursor = nextCursor + if cursor == 0 { + break + } + } + + p.logger.Debug("purged project keys", "prefix", prefix, "count", deleted) + return nil +} + +// generateToken generates a cryptographically secure random token. +func generateToken(length int) (string, error) { + bytes := make([]byte, length) + if _, err := rand.Read(bytes); err != nil { + return "", err + } + return hex.EncodeToString(bytes), nil +} diff --git a/internal/adapter/redis/provisioner_test.go b/internal/adapter/redis/provisioner_test.go new file mode 100644 index 0000000..d14385f --- /dev/null +++ b/internal/adapter/redis/provisioner_test.go @@ -0,0 +1,138 @@ +package redis + +import ( + "context" + "os" + "testing" + "time" + + "log/slog" +) + +// Integration tests - require REDIS_TEST_URL env var +// Example: REDIS_TEST_URL=redis://:password@localhost:6379 go test ./internal/adapter/redis/... 
+ +func TestProvisioner_Integration(t *testing.T) { + redisURL := os.Getenv("REDIS_TEST_URL") + if redisURL == "" { + t.Skip("REDIS_TEST_URL not set, skipping integration test") + } + + // NOTE: REDIS_TEST_URL only gates this test; its value is not parsed. + // The config below is hard-coded to localhost:6379 with no password. + cfg := Config{ + Host: "localhost", + Port: 6379, + Password: "", + KeyPrefix: "test:", + } + + logger := slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelDebug})) + prov, err := NewProvisioner(cfg, logger) + if err != nil { + t.Fatalf("failed to create provisioner: %v", err) + } + defer prov.Close() + + ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) + defer cancel() + + projectID := "test-project-" + time.Now().Format("20060102150405") + + // Test CreateProjectCache + t.Run("CreateProjectCache", func(t *testing.T) { + creds, err := prov.CreateProjectCache(ctx, projectID) + if err != nil { + t.Fatalf("CreateProjectCache failed: %v", err) + } + + if creds.ProjectID != projectID { + t.Errorf("ProjectID = %q, want %q", creds.ProjectID, projectID) + } + if creds.Username == "" { + t.Error("Username is empty") + } + if creds.URL == "" { + t.Error("URL is empty") + } + if creds.Prefix == "" { + t.Error("Prefix is empty") + } + t.Logf("Created cache: username=%s prefix=%s", creds.Username, creds.Prefix) + }) + + // Test GetProjectCache + t.Run("GetProjectCache", func(t *testing.T) { + creds, err := prov.GetProjectCache(ctx, projectID) + if err != nil { + t.Fatalf("GetProjectCache failed: %v", err) + } + if creds == nil { + t.Fatal("GetProjectCache returned nil") + } + if creds.Username == "" { + t.Error("Username is empty") + } + }) + + // Test DeleteProjectCache + t.Run("DeleteProjectCache", func(t *testing.T) { + err := prov.DeleteProjectCache(ctx, projectID, true) + if err != nil { + t.Fatalf("DeleteProjectCache failed: %v", err) + } + + // Verify user is deleted + creds, err := prov.GetProjectCache(ctx, 
projectID) + if err != nil { + t.Fatalf("GetProjectCache after delete failed: %v", err) + } + if creds != nil { + t.Error("User still exists after delete") + } + }) +} + +func TestUsernameFor(t *testing.T) { + p := &Provisioner{keyPrefix: "project:"} + + tests := []struct { + projectID string + want string + }{ + {"my-app", "proj-my-app"}, + {"my_app", "proj-my-app"}, // underscore converted to hyphen + {"MyApp123", "proj-MyApp123"}, + {"app.name", "proj-app-name"}, // dot converted to hyphen + } + + for _, tt := range tests { + t.Run(tt.projectID, func(t *testing.T) { + got := p.usernameFor(tt.projectID) + if got != tt.want { + t.Errorf("usernameFor(%q) = %q, want %q", tt.projectID, got, tt.want) + } + }) + } +} + +func TestPrefixFor(t *testing.T) { + p := &Provisioner{keyPrefix: "project:"} + + tests := []struct { + projectID string + want string + }{ + {"my-app", "project:my-app:"}, + {"app123", "project:app123:"}, + } + + for _, tt := range tests { + t.Run(tt.projectID, func(t *testing.T) { + got := p.prefixFor(tt.projectID) + if got != tt.want { + t.Errorf("prefixFor(%q) = %q, want %q", tt.projectID, got, tt.want) + } + }) + } +} diff --git a/internal/adapter/woodpecker/client.go b/internal/adapter/woodpecker/client.go index a24ee2e..e7e02e8 100644 --- a/internal/adapter/woodpecker/client.go +++ b/internal/adapter/woodpecker/client.go @@ -117,11 +117,12 @@ func (c *Client) ActivateRepo(ctx context.Context, forge, owner, repo string) (* fullName := owner + "/" + repo - // Retry loop for newly created repos - Woodpecker sync from Gitea is async - // and can take 30+ seconds for newly created repos to appear with valid metadata + // Retry loop for newly created repos - Woodpecker sync from Gitea is async. + // Limited to 5 attempts (15s max) to stay under Traefik's 30s proxy timeout. + // If repo doesn't appear in time, CI activation will be skipped (non-fatal). 
var targetRepo *woodpecker.Repo var lastErr error - maxAttempts := 15 + maxAttempts := 5 retryDelay := 3 * time.Second for attempt := 1; attempt <= maxAttempts; attempt++ { diff --git a/internal/db/migrations/014_build_events.sql b/internal/db/migrations/014_build_events.sql new file mode 100644 index 0000000..b21bc21 --- /dev/null +++ b/internal/db/migrations/014_build_events.sql @@ -0,0 +1,33 @@ +-- Build Events: PostgreSQL-backed event persistence for SSE replay +-- Stores build events for reconnection support beyond in-memory buffer + +CREATE TABLE IF NOT EXISTS build_events ( + id TEXT PRIMARY KEY, -- Event ID (format: "{task_id}:{sequence}") + task_id TEXT NOT NULL, -- Build task ID + project_id TEXT NOT NULL, -- Project ID + type TEXT NOT NULL, -- Event type (build.started, build.output, etc.) + sequence BIGINT NOT NULL, -- Monotonic sequence within task + timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(), -- When event occurred + data JSONB NOT NULL DEFAULT '{}', -- Event-specific payload + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() -- When stored (for cleanup) +); + +-- Index for efficient task-based queries (replay by task ID) +CREATE INDEX IF NOT EXISTS idx_build_events_task_id ON build_events(task_id); + +-- Index for efficient sequence-based replay (Last-Event-ID support) +CREATE INDEX IF NOT EXISTS idx_build_events_task_sequence ON build_events(task_id, sequence); + +-- Index for cleanup queries (delete old events) +CREATE INDEX IF NOT EXISTS idx_build_events_created_at ON build_events(created_at); + +-- Comments +COMMENT ON TABLE build_events IS 'Persisted build events for SSE replay support'; +COMMENT ON COLUMN build_events.id IS 'Unique event ID (format: {task_id}:{sequence})'; +COMMENT ON COLUMN build_events.task_id IS 'Build task this event belongs to'; +COMMENT ON COLUMN build_events.project_id IS 'Project this event belongs to'; +COMMENT ON COLUMN build_events.type IS 'Event type (build.started, build.output, build.completed, etc.)'; +COMMENT ON 
COLUMN build_events.sequence IS 'Monotonically increasing sequence number within task'; +COMMENT ON COLUMN build_events.timestamp IS 'When the event occurred'; +COMMENT ON COLUMN build_events.data IS 'Event-specific payload as JSONB'; +COMMENT ON COLUMN build_events.created_at IS 'When stored in database (used for cleanup)'; diff --git a/internal/domain/build.go b/internal/domain/build.go index 0801e4f..86b57b1 100644 --- a/internal/domain/build.go +++ b/internal/domain/build.go @@ -30,6 +30,10 @@ type BuildSpec struct { // CallbackURL is the webhook URL for completion notification. CallbackURL string `json:"callback_url,omitempty"` + + // GitCloneURL is the HTTPS URL for cloning the project repository. + // Required for builds that use AutoCommit/AutoPush on shared worker pods. + GitCloneURL string `json:"git_clone_url,omitempty"` } // Validate checks that the BuildSpec has required fields. diff --git a/internal/domain/build_events.go b/internal/domain/build_events.go new file mode 100644 index 0000000..b6d24c7 --- /dev/null +++ b/internal/domain/build_events.go @@ -0,0 +1,112 @@ +package domain + +import "time" + +// BuildEventType categorizes build streaming events. +type BuildEventType string + +// Build event types for SSE streaming. +const ( + // BuildEventStarted is emitted when a build begins execution. + BuildEventStarted BuildEventType = "build.started" + + // BuildEventOutput is emitted for each line of agent output (stdout/stderr). + BuildEventOutput BuildEventType = "build.output" + + // BuildEventToolUse is emitted when the agent invokes a tool. + BuildEventToolUse BuildEventType = "build.tool_use" + + // BuildEventToolResult is emitted when a tool returns a result. + BuildEventToolResult BuildEventType = "build.tool_result" + + // BuildEventProgress is emitted periodically with progress estimates. + BuildEventProgress BuildEventType = "build.progress" + + // BuildEventError is emitted for error output during execution. 
+ BuildEventError BuildEventType = "build.error" + + // BuildEventCompleted is emitted when a build finishes successfully. + BuildEventCompleted BuildEventType = "build.completed" + + // BuildEventFailed is emitted when a build fails. + BuildEventFailed BuildEventType = "build.failed" +) + +// BuildEvent represents a single event in a build's execution stream. +// Events are published to SSE subscribers in real-time and persisted for replay. +type BuildEvent struct { + // ID is the unique event identifier (format: "{taskID}:{sequence}"). + ID string `json:"id"` + + // TaskID links this event to a build task. + TaskID string `json:"task_id"` + + // ProjectID links this event to a project. + ProjectID string `json:"project_id"` + + // Type categorizes this event. + Type BuildEventType `json:"type"` + + // Timestamp when the event occurred. + Timestamp time.Time `json:"timestamp"` + + // Sequence is the monotonically increasing event number within a task. + Sequence int64 `json:"sequence"` + + // Data contains event-type-specific payload. + Data BuildEventData `json:"data"` +} + +// BuildEventData holds the payload for different event types. +type BuildEventData struct { + // Content is the main text content (output lines, error messages). + Content string `json:"content,omitempty"` + + // Stream identifies the output source ("stdout", "stderr"). + Stream string `json:"stream,omitempty"` + + // ToolName is set for tool_use and tool_result events. 
+ ToolName string `json:"tool_name,omitempty"` + + // Progress fields (for build.progress events) + Phase string `json:"phase,omitempty"` // Current phase: "starting", "reading", "writing", "testing", "committing", "complete" + Percentage float64 `json:"percentage,omitempty"` // Estimated completion percentage (0-100) + + // Completion fields (for build.completed and build.failed events) + Success bool `json:"success,omitempty"` + Error string `json:"error,omitempty"` + CommitSHA string `json:"commit_sha,omitempty"` + FilesChanged []string `json:"files_changed,omitempty"` + DurationMs int64 `json:"duration_ms,omitempty"` +} + +// BuildPhase represents the current phase of a build. +type BuildPhase string + +// Build phases for progress tracking. +const ( + BuildPhaseStarting BuildPhase = "starting" + BuildPhaseReading BuildPhase = "reading" + BuildPhaseWriting BuildPhase = "writing" + BuildPhaseTesting BuildPhase = "testing" + BuildPhaseCommitting BuildPhase = "committing" + BuildPhaseComplete BuildPhase = "complete" +) + +// Weight returns the progress weight for a phase (used for percentage estimation). +func (p BuildPhase) Weight() float64 { + switch p { + case BuildPhaseStarting: + return 0.05 + case BuildPhaseReading: + return 0.15 + case BuildPhaseWriting: + return 0.50 + case BuildPhaseTesting: + return 0.20 + case BuildPhaseCommitting: + return 0.10 + default: + return 0 + } +} diff --git a/internal/domain/cache.go b/internal/domain/cache.go new file mode 100644 index 0000000..778adfa --- /dev/null +++ b/internal/domain/cache.go @@ -0,0 +1,30 @@ +package domain + +import "time" + +// CacheCredentials represents isolated cache access for a project. +// Uses Redis ACLs to scope access to project-specific key prefix.
+type CacheCredentials struct { + ProjectID string `json:"project_id" db:"project_id"` + URL string `json:"url" db:"url"` // redis://user:pass@host:port + URLStaging string `json:"url_staging" db:"url_staging"` // staging connection (same for now) + Prefix string `json:"prefix" db:"prefix"` // project:{id}: + Username string `json:"username" db:"username"` // proj-{id} + Host string `json:"host" db:"host"` // redis.databases.svc + Port int `json:"port" db:"port"` // 6379 + CreatedAt time.Time `json:"created_at" db:"created_at"` +} + +// CacheConfig holds cache provisioning configuration. +type CacheConfig struct { + // AdminHost is the Redis host for admin operations + AdminHost string + // AdminPort is the Redis port + AdminPort int + // AdminPassword is the password for the default/admin user + AdminPassword string + // KeyPrefix is the base prefix for all project keys (e.g., "project:") + KeyPrefix string + // DefaultTTL is the default TTL for cached items (0 = no default TTL) + DefaultTTL time.Duration +} diff --git a/internal/domain/database.go b/internal/domain/database.go new file mode 100644 index 0000000..79d25e5 --- /dev/null +++ b/internal/domain/database.go @@ -0,0 +1,30 @@ +package domain + +import "time" + +// DatabaseCredentials represents isolated database access for a project. +// Each project gets its own database and user in CockroachDB. 
+type DatabaseCredentials struct { + ProjectID string `json:"project_id" db:"project_id"` + DatabaseName string `json:"database_name" db:"database_name"` // project_{id} + Username string `json:"username" db:"username"` // project_{id} + Password string `json:"password" db:"password"` // generated (stored encrypted) + Host string `json:"host" db:"host"` // cockroachdb-public.databases.svc + Port int `json:"port" db:"port"` // 26257 + SSLMode string `json:"ssl_mode" db:"ssl_mode"` // disable (insecure mode) + URL string `json:"url" db:"url"` // full connection string + URLStaging string `json:"url_staging" db:"url_staging"` // staging connection (same for now) + CreatedAt time.Time `json:"created_at" db:"created_at"` +} + +// DatabaseConfig holds database provisioning configuration. +type DatabaseConfig struct { + // Host is the CockroachDB host for provisioning operations + Host string + // Port is the CockroachDB port + Port int + // User is the admin user for provisioning (typically "root" in insecure mode) + User string + // SSLMode is the SSL mode (typically "disable" for insecure mode) + SSLMode string +} diff --git a/internal/handlers/builds.go b/internal/handlers/builds.go index 745c985..39a964f 100644 --- a/internal/handlers/builds.go +++ b/internal/handlers/builds.go @@ -50,6 +50,7 @@ type StartBuildRequest struct { AutoCommit bool `json:"auto_commit"` AutoPush bool `json:"auto_push"` CallbackURL string `json:"callback_url,omitempty"` + GitCloneURL string `json:"git_clone_url,omitempty"` // Required when auto_commit or auto_push is true } // StartBuildResponse is the response for POST /projects/{id}/builds. @@ -58,6 +59,7 @@ type StartBuildResponse struct { ProjectID string `json:"project_id"` Status string `json:"status"` StatusURL string `json:"status_url"` + StreamURL string `json:"stream_url"` // SSE endpoint for real-time build events } // BuildAuditDTO is the data transfer object for build audit entries. 
@@ -73,6 +75,7 @@ type BuildAuditDTO struct { Result *BuildResultDTO `json:"result,omitempty"` StartedAt string `json:"started_at"` CompletedAt string `json:"completed_at,omitempty"` + StreamURL string `json:"stream_url"` // SSE endpoint for real-time build events } // BuildResultDTO is the data transfer object for build results. @@ -100,6 +103,7 @@ func toBuildAuditDTO(e *domain.BuildAuditEntry) *BuildAuditDTO { AutoCommit: e.Spec.AutoCommit, AutoPush: e.Spec.AutoPush, StartedAt: e.StartedAt.Format("2006-01-02T15:04:05Z07:00"), + StreamURL: "/projects/" + e.ProjectID + "/events?stream_id=" + e.TaskID, } if e.CompletedAt != nil { dto.CompletedAt = e.CompletedAt.Format("2006-01-02T15:04:05Z07:00") @@ -154,6 +158,13 @@ func (h *BuildsHandler) StartBuild(w http.ResponseWriter, r *http.Request) { AutoCommit: req.AutoCommit, AutoPush: req.AutoPush, CallbackURL: req.CallbackURL, + GitCloneURL: req.GitCloneURL, + } + + // Validate git_clone_url is provided when auto_commit or auto_push is enabled + if (req.AutoCommit || req.AutoPush) && req.GitCloneURL == "" { + api.WriteBadRequest(w, r, "git_clone_url is required when auto_commit or auto_push is enabled") + return } taskID, err := h.buildService.StartBuild(r.Context(), projectID, spec) @@ -171,6 +182,7 @@ func (h *BuildsHandler) StartBuild(w http.ResponseWriter, r *http.Request) { ProjectID: projectID, Status: "pending", StatusURL: "/builds/" + taskID, + StreamURL: "/projects/" + projectID + "/events?stream_id=" + taskID, }) } diff --git a/internal/handlers/builds_ws.go b/internal/handlers/builds_ws.go new file mode 100644 index 0000000..21d9989 --- /dev/null +++ b/internal/handlers/builds_ws.go @@ -0,0 +1,154 @@ +package handlers + +import ( + "net/http" + "time" + + "github.com/go-chi/chi/v5" + "github.com/gorilla/websocket" + "github.com/orchard9/rdev/internal/auth" + "github.com/orchard9/rdev/internal/port" + "github.com/orchard9/rdev/pkg/api" +) + +// BuildsWSHandler handles WebSocket connections for build event 
streaming. +type BuildsWSHandler struct { + streams port.StreamPublisher + upgrader websocket.Upgrader +} + +// NewBuildsWSHandler creates a new WebSocket handler for builds. +func NewBuildsWSHandler(streams port.StreamPublisher) *BuildsWSHandler { + return &BuildsWSHandler{ + streams: streams, + upgrader: websocket.Upgrader{ + ReadBufferSize: 1024, + WriteBufferSize: 1024, + CheckOrigin: func(r *http.Request) bool { + return true // Allow all origins (configure for production) + }, + }, + } +} + +// Mount registers the WebSocket routes. +func (h *BuildsWSHandler) Mount(r api.Router) { + r.With(auth.RequireScope(auth.ScopeBuildRead, auth.ScopeAdmin)). + Get("/builds/{taskId}/ws", h.StreamEvents) +} + +// wsMessage is the structure sent over WebSocket. +type wsMessage struct { + ID string `json:"id,omitempty"` + Type string `json:"type"` + TaskID string `json:"task_id,omitempty"` + Timestamp string `json:"timestamp,omitempty"` + Data map[string]any `json:"data,omitempty"` +} + +// StreamEvents handles WebSocket connections for streaming build events. 
+// GET /builds/{taskId}/ws +func (h *BuildsWSHandler) StreamEvents(w http.ResponseWriter, r *http.Request) { + taskID := chi.URLParam(r, "taskId") + if taskID == "" { + api.WriteBadRequest(w, r, "task ID is required") + return + } + + // Get optional last event ID from query string + lastEventID := r.URL.Query().Get("last_event_id") + + // Upgrade to WebSocket + conn, err := h.upgrader.Upgrade(w, r, nil) + if err != nil { + // Upgrade already wrote error response + return + } + defer func() { _ = conn.Close() }() + + // Subscribe to events + var events <-chan port.StreamEvent + var cleanup func() + + if lastEventID != "" { + events, cleanup = h.streams.SubscribeFromID(taskID, lastEventID) + } else { + events, cleanup = h.streams.Subscribe(taskID) + } + defer cleanup() + + // Send connected message + _ = conn.WriteJSON(wsMessage{ + Type: "connected", + TaskID: taskID, + Timestamp: time.Now().UTC().Format(time.RFC3339), + Data: map[string]any{ + "reconnecting": lastEventID != "", + }, + }) + + // Set up ping/pong for keepalive + conn.SetPongHandler(func(string) error { + return conn.SetReadDeadline(time.Now().Add(60 * time.Second)) + }) + + // Start a goroutine to read from WebSocket (for close detection) + done := make(chan struct{}) + go func() { + defer close(done) + for { + _, _, err := conn.ReadMessage() + if err != nil { + return + } + } + }() + + // Stream events + pingTicker := time.NewTicker(30 * time.Second) + defer pingTicker.Stop() + + for { + select { + case <-done: + // Client disconnected + return + + case event, ok := <-events: + if !ok { + // Stream closed + _ = conn.WriteJSON(wsMessage{ + Type: "stream_closed", + TaskID: taskID, + Timestamp: time.Now().UTC().Format(time.RFC3339), + }) + return + } + + // Convert port.StreamEvent to wsMessage + msg := wsMessage{ + ID: event.ID, + Type: event.Type, + TaskID: event.TaskID, + Timestamp: event.Timestamp.Format(time.RFC3339), + Data: event.Data, + } + + if err := conn.WriteJSON(msg); err != nil { + 
return // Write error, close connection + } + + // Check for terminal events + if event.Type == "build.completed" || event.Type == "build.failed" { + // Give client time to process final message + time.Sleep(100 * time.Millisecond) + return + } + + case <-pingTicker.C: + if err := conn.WriteMessage(websocket.PingMessage, nil); err != nil { + return + } + } + } +} diff --git a/internal/handlers/create_and_build.go b/internal/handlers/create_and_build.go index 10e3da4..d49976a 100644 --- a/internal/handlers/create_and_build.go +++ b/internal/handlers/create_and_build.go @@ -139,6 +139,7 @@ func (h *CreateAndBuildHandler) CreateAndBuild(w http.ResponseWriter, r *http.Re AutoCommit: req.AutoCommit, AutoPush: req.AutoPush, CallbackURL: req.CallbackURL, + GitCloneURL: projectResult.CloneHTTP, // Required for git ops on shared worker pods } taskID, err := h.buildService.StartBuild(ctx, projectResult.ProjectID, spec) diff --git a/internal/metrics/metrics.go b/internal/metrics/metrics.go index 1cd24be..5b1c7e6 100644 --- a/internal/metrics/metrics.go +++ b/internal/metrics/metrics.go @@ -87,6 +87,22 @@ var ( Help: "Total number of SSE stream reconnections", }, []string{"project"}) + // Build Events (SSE streaming) + buildEventsTotal = promauto.NewCounterVec(prometheus.CounterOpts{ + Name: "rdev_build_events_total", + Help: "Total number of build events published", + }, []string{"type"}) + + buildEventSubscribers = promauto.NewGaugeVec(prometheus.GaugeOpts{ + Name: "rdev_build_event_subscribers", + Help: "Number of active build event subscribers", + }, []string{"task_id"}) + + buildEventBufferSize = promauto.NewGauge(prometheus.GaugeOpts{ + Name: "rdev_build_event_buffer_size", + Help: "Total number of events in replay buffers", + }) + // Authentication authFailures = promauto.NewCounterVec(prometheus.CounterOpts{ Name: "rdev_auth_failures_total", @@ -127,6 +143,21 @@ func RecordStreamReconnect(project string) { streamReconnects.WithLabelValues(project).Inc() } +// 
RecordBuildEvent records a build event publication. +func RecordBuildEvent(eventType string) { + buildEventsTotal.WithLabelValues(eventType).Inc() +} + +// SetBuildEventSubscribers sets the number of subscribers for a build stream. +func SetBuildEventSubscribers(taskID string, count int) { + buildEventSubscribers.WithLabelValues(taskID).Set(float64(count)) +} + +// SetBuildEventBufferSize sets the total buffer size for event replay. +func SetBuildEventBufferSize(size int64) { + buildEventBufferSize.Set(float64(size)) +} + // RecordAuthFailure records an authentication failure. func RecordAuthFailure(reason string) { authFailures.WithLabelValues(reason).Inc() diff --git a/internal/port/build_event_store.go b/internal/port/build_event_store.go new file mode 100644 index 0000000..6cab0eb --- /dev/null +++ b/internal/port/build_event_store.go @@ -0,0 +1,24 @@ +package port + +import ( + "context" + "time" + + "github.com/orchard9/rdev/internal/domain" +) + +// BuildEventStore defines persistence operations for build events. +// Used for SSE reconnection replay when in-memory buffer is exhausted. +type BuildEventStore interface { + // Record stores a build event for later replay. + Record(ctx context.Context, event *domain.BuildEvent) error + + // ListByTask retrieves events for a task, optionally after a sequence number. + // If afterSequence > 0, only events with sequence > afterSequence are returned. + // Results are ordered by sequence ascending. + ListByTask(ctx context.Context, taskID string, afterSequence int64) ([]*domain.BuildEvent, error) + + // Cleanup removes events older than the specified age. + // Returns the number of events deleted. 
+ Cleanup(ctx context.Context, olderThan time.Duration) (int64, error) +} diff --git a/internal/port/cache_provisioner.go b/internal/port/cache_provisioner.go new file mode 100644 index 0000000..a19d825 --- /dev/null +++ b/internal/port/cache_provisioner.go @@ -0,0 +1,27 @@ +package port + +import ( + "context" + + "github.com/orchard9/rdev/internal/domain" +) + +// CacheProvisioner provisions isolated cache access for projects. +// Implementation uses Redis ACLs to scope each project to its own key prefix. +type CacheProvisioner interface { + // CreateProjectCache provisions isolated cache access for a project. + // Creates a Redis ACL user scoped to the project's key prefix. + // Returns credentials that should be injected into the project's environment. + CreateProjectCache(ctx context.Context, projectID string) (*domain.CacheCredentials, error) + + // DeleteProjectCache removes cache access for a project. + // Deletes the Redis ACL user and optionally purges all project keys. + DeleteProjectCache(ctx context.Context, projectID string, purgeKeys bool) error + + // GetProjectCache retrieves cache credentials for a project. + // Returns nil if the project has no cache provisioned. + GetProjectCache(ctx context.Context, projectID string) (*domain.CacheCredentials, error) + + // TestConnection verifies the cache provisioner can connect to Redis. + TestConnection(ctx context.Context) error +} diff --git a/internal/port/database_provisioner.go b/internal/port/database_provisioner.go new file mode 100644 index 0000000..fc9f5f6 --- /dev/null +++ b/internal/port/database_provisioner.go @@ -0,0 +1,28 @@ +package port + +import ( + "context" + + "github.com/orchard9/rdev/internal/domain" +) + +// DatabaseProvisioner provisions isolated databases for projects. +// Implementation uses CockroachDB to create per-project databases and users. +type DatabaseProvisioner interface { + // CreateProjectDatabase provisions an isolated database for a project. 
+ // Creates a database and user scoped to the project. + // Returns credentials that should be injected into the project's environment. + CreateProjectDatabase(ctx context.Context, projectID string) (*domain.DatabaseCredentials, error) + + // DeleteProjectDatabase removes database access for a project. + // Drops the database and user. + DeleteProjectDatabase(ctx context.Context, projectID string) error + + // GetProjectDatabase retrieves database credentials for a project. + // Returns nil if the project has no database provisioned. + // Note: Password cannot be retrieved, only verified against stored credentials. + GetProjectDatabase(ctx context.Context, projectID string) (*domain.DatabaseCredentials, error) + + // TestConnection verifies the database provisioner can connect to CockroachDB. + TestConnection(ctx context.Context) error +} diff --git a/internal/port/stream_publisher.go b/internal/port/stream_publisher.go index 301288f..9faafe5 100644 --- a/internal/port/stream_publisher.go +++ b/internal/port/stream_publisher.go @@ -1,9 +1,32 @@ package port +import "time" + // StreamEvent represents an event to be published on a stream. +// Events are delivered to SSE clients and can be replayed using Last-Event-ID. type StreamEvent struct { - ID string // Event ID for Last-Event-ID support + // ID uniquely identifies this event for Last-Event-ID reconnection support. + // Format: "{streamID}:{sequence}". Populated by the publisher on Publish(). + ID string + + // Type identifies the event category (e.g., "build.output", "build.completed"). + // Use the BuildEvent* constants from the worker package for build events. Type string + + // TaskID associates the event with a specific task for filtering and replay. + // Defaults to the stream ID if not explicitly set. + TaskID string + + // Timestamp records when the event was created or published. + // Populated by the publisher if zero when Publish() is called. 
+ Timestamp time.Time + + // Sequence is a monotonically increasing number within a stream. + // Used for ordering and detecting missed events. + Sequence int64 + + // Data contains the event-specific payload as key-value pairs. + // Contents vary by event type. Data map[string]any } diff --git a/internal/service/build_progress.go b/internal/service/build_progress.go new file mode 100644 index 0000000..cc02e83 --- /dev/null +++ b/internal/service/build_progress.go @@ -0,0 +1,209 @@ +package service + +import ( + "sync" + "time" + + "github.com/orchard9/rdev/internal/domain" + "github.com/orchard9/rdev/internal/port" +) + +// BuildProgressTracker estimates build progress based on agent activity patterns. +// It tracks phases and emits progress events periodically. +type BuildProgressTracker struct { + streams port.StreamPublisher + mu sync.RWMutex + tasks map[string]*buildProgress +} + +// buildProgress tracks the progress state for a single build. +type buildProgress struct { + taskID string + projectID string + phase domain.BuildPhase + percentage float64 + toolCount int + outputLines int + startTime time.Time + lastUpdate time.Time +} + +// NewBuildProgressTracker creates a new progress tracker. +func NewBuildProgressTracker(streams port.StreamPublisher) *BuildProgressTracker { + return &BuildProgressTracker{ + streams: streams, + tasks: make(map[string]*buildProgress), + } +} + +// Start begins tracking progress for a build. +func (t *BuildProgressTracker) Start(taskID, projectID string) { + t.mu.Lock() + defer t.mu.Unlock() + + now := time.Now() + t.tasks[taskID] = &buildProgress{ + taskID: taskID, + projectID: projectID, + phase: domain.BuildPhaseStarting, + percentage: 0, + startTime: now, + lastUpdate: now, + } + + t.emitProgress(taskID) +} + +// RecordToolUse updates progress when a tool is used. 
+func (t *BuildProgressTracker) RecordToolUse(taskID, toolName string) { + t.mu.Lock() + defer t.mu.Unlock() + + progress, exists := t.tasks[taskID] + if !exists { + return + } + + progress.toolCount++ + progress.lastUpdate = time.Now() + + // Infer phase from tool usage + switch toolName { + case "Read", "Glob", "Grep": + if progress.phase == domain.BuildPhaseStarting { + progress.phase = domain.BuildPhaseReading + } + case "Write", "Edit": + progress.phase = domain.BuildPhaseWriting + case "Bash": + // Could be testing or committing depending on context + if progress.phase == domain.BuildPhaseWriting { + progress.phase = domain.BuildPhaseTesting + } + } + + t.updatePercentage(progress) + t.emitProgress(taskID) +} + +// RecordOutput updates progress when output is received. +func (t *BuildProgressTracker) RecordOutput(taskID string) { + t.mu.Lock() + defer t.mu.Unlock() + + progress, exists := t.tasks[taskID] + if !exists { + return + } + + progress.outputLines++ + sinceLast := time.Since(progress.lastUpdate) // measured before lastUpdate is reset, so the 5-second check can fire + progress.lastUpdate = time.Now() + // Emit progress periodically (every 10 lines or 5 seconds) + if progress.outputLines%10 == 0 || sinceLast > 5*time.Second { + t.updatePercentage(progress) + t.emitProgress(taskID) + } +} + +// Complete marks a build as complete. +func (t *BuildProgressTracker) Complete(taskID string, success bool) { + t.mu.Lock() + defer t.mu.Unlock() + + progress, exists := t.tasks[taskID] + if !exists { + return + } + + progress.phase = domain.BuildPhaseComplete + progress.percentage = 100 + progress.lastUpdate = time.Now() + + t.emitProgress(taskID) + + // Clean up + delete(t.tasks, taskID) +} + +// GetProgress returns current progress for a build.
+func (t *BuildProgressTracker) GetProgress(taskID string) (phase domain.BuildPhase, percentage float64, ok bool) { + t.mu.RLock() + defer t.mu.RUnlock() + + progress, exists := t.tasks[taskID] + if !exists { + return "", 0, false + } + + return progress.phase, progress.percentage, true +} + +// updatePercentage estimates completion percentage based on phase and activity. +func (t *BuildProgressTracker) updatePercentage(progress *buildProgress) { + // Base percentage from phase + var basePercent float64 + switch progress.phase { + case domain.BuildPhaseStarting: + basePercent = 5 + case domain.BuildPhaseReading: + basePercent = 15 + case domain.BuildPhaseWriting: + basePercent = 50 + case domain.BuildPhaseTesting: + basePercent = 80 + case domain.BuildPhaseCommitting: + basePercent = 95 + case domain.BuildPhaseComplete: + basePercent = 100 + } + + // Add progress within phase based on activity + // Tool count adds ~1% per tool (max +10% within phase) + toolBonus := float64(progress.toolCount) * 1.0 + if toolBonus > 10 { + toolBonus = 10 + } + + // Time-based bonus: ~1% per 10 seconds (max +5% within phase) + elapsed := time.Since(progress.startTime).Seconds() + timeBonus := elapsed / 10.0 + if timeBonus > 5 { + timeBonus = 5 + } + + // Calculate total but don't exceed next phase threshold + progress.percentage = basePercent + toolBonus + timeBonus + + // Cap to reasonable maximum for current phase + maxForPhase := map[domain.BuildPhase]float64{ + domain.BuildPhaseStarting: 14, + domain.BuildPhaseReading: 49, + domain.BuildPhaseWriting: 79, + domain.BuildPhaseTesting: 94, + domain.BuildPhaseCommitting: 99, + domain.BuildPhaseComplete: 100, + } + if max, ok := maxForPhase[progress.phase]; ok && progress.percentage > max { + progress.percentage = max + } +} + +// emitProgress publishes a progress event. Must be called with lock held. 
+func (t *BuildProgressTracker) emitProgress(taskID string) { + progress, exists := t.tasks[taskID] + if !exists || t.streams == nil { + return + } + + t.streams.Publish(taskID, port.StreamEvent{ + Type: "build.progress", + TaskID: taskID, + Data: map[string]any{ + "phase": string(progress.phase), + "percentage": progress.percentage, + "tool_count": progress.toolCount, + "elapsed_ms": time.Since(progress.startTime).Milliseconds(), + }, + }) +} diff --git a/internal/service/build_progress_test.go b/internal/service/build_progress_test.go new file mode 100644 index 0000000..343ae7e --- /dev/null +++ b/internal/service/build_progress_test.go @@ -0,0 +1,127 @@ +package service + +import ( + "testing" + + "github.com/orchard9/rdev/internal/adapter/memory" + "github.com/orchard9/rdev/internal/domain" +) + +func TestBuildProgressTracker_Start(t *testing.T) { + streams := memory.NewStreamPublisher() + tracker := NewBuildProgressTracker(streams) + + tracker.Start("task-1", "project-1") + + phase, percentage, ok := tracker.GetProgress("task-1") + if !ok { + t.Fatal("expected to find progress for task-1") + } + if phase != domain.BuildPhaseStarting { + t.Errorf("got phase %q, want %q", phase, domain.BuildPhaseStarting) + } + if percentage < 0 || percentage > 100 { + t.Errorf("got invalid percentage %f", percentage) + } +} + +func TestBuildProgressTracker_RecordToolUse(t *testing.T) { + streams := memory.NewStreamPublisher() + tracker := NewBuildProgressTracker(streams) + + tracker.Start("task-1", "project-1") + + // Simulate reading phase + tracker.RecordToolUse("task-1", "Read") + phase, _, _ := tracker.GetProgress("task-1") + if phase != domain.BuildPhaseReading { + t.Errorf("after Read tool, got phase %q, want %q", phase, domain.BuildPhaseReading) + } + + // Simulate writing phase + tracker.RecordToolUse("task-1", "Write") + phase, _, _ = tracker.GetProgress("task-1") + if phase != domain.BuildPhaseWriting { + t.Errorf("after Write tool, got phase %q, want %q", phase, 
domain.BuildPhaseWriting) + } + + // Simulate testing phase + tracker.RecordToolUse("task-1", "Bash") + phase, _, _ = tracker.GetProgress("task-1") + if phase != domain.BuildPhaseTesting { + t.Errorf("after Bash tool, got phase %q, want %q", phase, domain.BuildPhaseTesting) + } +} + +func TestBuildProgressTracker_Complete(t *testing.T) { + streams := memory.NewStreamPublisher() + tracker := NewBuildProgressTracker(streams) + + tracker.Start("task-1", "project-1") + tracker.Complete("task-1", true) + + // After completion, task should be removed + _, _, ok := tracker.GetProgress("task-1") + if ok { + t.Error("expected task to be removed after completion") + } +} + +func TestBuildProgressTracker_PercentageIncrease(t *testing.T) { + streams := memory.NewStreamPublisher() + tracker := NewBuildProgressTracker(streams) + + tracker.Start("task-1", "project-1") + + _, initialPercent, _ := tracker.GetProgress("task-1") + + // Record several tool uses + for i := 0; i < 5; i++ { + tracker.RecordToolUse("task-1", "Read") + } + + _, newPercent, _ := tracker.GetProgress("task-1") + if newPercent <= initialPercent { + t.Errorf("expected percentage to increase from %f, got %f", initialPercent, newPercent) + } +} + +func TestBuildProgressTracker_NonexistentTask(t *testing.T) { + streams := memory.NewStreamPublisher() + tracker := NewBuildProgressTracker(streams) + + // These should not panic + tracker.RecordToolUse("nonexistent", "Read") + tracker.RecordOutput("nonexistent") + tracker.Complete("nonexistent", true) + + _, _, ok := tracker.GetProgress("nonexistent") + if ok { + t.Error("expected not to find progress for nonexistent task") + } +} + +func TestBuildProgressTracker_EmitsEvents(t *testing.T) { + streams := memory.NewStreamPublisher() + tracker := NewBuildProgressTracker(streams) + + // Subscribe to events + events, cleanup := streams.Subscribe("task-1") + defer cleanup() + + // Start should emit a progress event + tracker.Start("task-1", "project-1") + + select { + case 
event := <-events: + if event.Type != "build.progress" { + t.Errorf("got event type %q, want build.progress", event.Type) + } + phase, ok := event.Data["phase"].(string) + if !ok || phase != "starting" { + t.Errorf("got phase %v, want 'starting'", event.Data["phase"]) + } + default: + t.Error("expected progress event to be published") + } +} diff --git a/internal/service/build_service.go b/internal/service/build_service.go index 3151562..258549e 100644 --- a/internal/service/build_service.go +++ b/internal/service/build_service.go @@ -58,6 +58,9 @@ func (s *BuildService) StartBuild(ctx context.Context, projectID string, spec do if len(spec.Variables) > 0 { taskSpec["variables"] = spec.Variables } + if spec.GitCloneURL != "" { + taskSpec["git_clone_url"] = spec.GitCloneURL + } // Create work task task := &domain.WorkTask{ diff --git a/internal/service/project_infra.go b/internal/service/project_infra.go index 76a0f3c..9f5ff6b 100644 --- a/internal/service/project_infra.go +++ b/internal/service/project_infra.go @@ -21,7 +21,8 @@ func ValidateProjectName(name string) error { } // ProjectInfraService orchestrates project infrastructure operations. -// It coordinates git repo creation, DNS, CI activation, template seeding, and deployment. +// It coordinates git repo creation, DNS, CI activation, template seeding, deployment, +// and database/cache provisioning. type ProjectInfraService struct { db *sql.DB gitRepo port.GitRepository @@ -31,6 +32,9 @@ type ProjectInfraService struct { templateProvider port.TemplateProvider domainRepo port.ProjectDomainRepository slugGenerator port.SlugGenerator + credentialStore port.CredentialStore + dbProvisioner port.DatabaseProvisioner + cacheProvisioner port.CacheProvisioner logger *slog.Logger // Config @@ -86,6 +90,24 @@ func NewProjectInfraService( } } +// WithCredentialStore sets the credential store for storing provisioned credentials. 
+func (s *ProjectInfraService) WithCredentialStore(cs port.CredentialStore) *ProjectInfraService { + s.credentialStore = cs + return s +} + +// WithDatabaseProvisioner sets the database provisioner for project databases. +func (s *ProjectInfraService) WithDatabaseProvisioner(dp port.DatabaseProvisioner) *ProjectInfraService { + s.dbProvisioner = dp + return s +} + +// WithCacheProvisioner sets the cache provisioner for project cache access. +func (s *ProjectInfraService) WithCacheProvisioner(cp port.CacheProvisioner) *ProjectInfraService { + s.cacheProvisioner = cp + return s +} + // CreateProjectRequest contains parameters for creating a new project. type CreateProjectRequest struct { Name string diff --git a/internal/service/project_infra_crud.go b/internal/service/project_infra_crud.go index e77533b..d74f9f2 100644 --- a/internal/service/project_infra_crud.go +++ b/internal/service/project_infra_crud.go @@ -4,6 +4,7 @@ import ( "context" "database/sql" "fmt" + "strings" "time" "github.com/orchard9/rdev/internal/domain" @@ -66,13 +67,16 @@ func (s *ProjectInfraService) CreateProject(ctx context.Context, req CreateProje // 7. Seed repository with template templateSeeded := s.seedTemplate(ctx, req, result) - // 8. Create initial K8s deployment (before triggering CI build) + // 8. Provision database and cache + s.provisionResources(ctx, result) + + // 9. Create initial K8s deployment (before triggering CI build) // This ensures the deployment exists for `kubectl set image` in CI pipeline if templateSeeded { s.createInitialDeployment(ctx, req, result) } - // 9. Trigger initial CI build if both CI and template are ready + // 10. 
Trigger initial CI build if both CI and template are ready if ciActivated && templateSeeded && s.ciProvider != nil { pipelineNum, err := s.ciProvider.TriggerBuild(ctx, result.GitRepoOwner, result.GitRepoName, "main") if err != nil { @@ -131,9 +135,21 @@ func (s *ProjectInfraService) createGitRepo(ctx context.Context, req CreateProje repo, err := s.gitRepo.CreateRepo(ctx, req.Name, req.Description, req.Private) if err != nil { - s.logger.Error("failed to create git repo", "error", err) - result.NextSteps = append(result.NextSteps, "Create git repo manually: failed to auto-create") - return + // Check if repo already exists - if so, fetch it instead + if strings.Contains(err.Error(), "already exists") { + s.logger.Info("git repo already exists, fetching existing", "name", req.Name) + existingRepo, getErr := s.gitRepo.GetRepo(ctx, s.defaultGitOwner, req.Name) + if getErr != nil { + s.logger.Error("failed to get existing git repo", "error", getErr) + result.NextSteps = append(result.NextSteps, "Git repo exists but couldn't fetch details") + return + } + repo = existingRepo + } else { + s.logger.Error("failed to create git repo", "error", err) + result.NextSteps = append(result.NextSteps, "Create git repo manually: failed to auto-create") + return + } } result.GitRepoOwner = repo.Owner @@ -293,6 +309,94 @@ func (s *ProjectInfraService) seedTemplate(ctx context.Context, req CreateProjec return true } +// provisionResources provisions database and cache for a project. +// Credentials are stored in the credential store for injection into deployments. +// If credential storage fails after provisioning, the resources are rolled back to prevent orphans. 
+func (s *ProjectInfraService) provisionResources(ctx context.Context, result *CreateProjectResult) { + projectID := result.ProjectID + + // Provision database + if s.dbProvisioner != nil { + dbCreds, err := s.dbProvisioner.CreateProjectDatabase(ctx, projectID) + if err != nil { + s.logger.Error("failed to provision database", "project", projectID, "error", err) + result.NextSteps = append(result.NextSteps, "Database provisioning failed - contact admin") + } else if s.credentialStore != nil { + // Store credentials - rollback on failure to prevent orphaned database + var storeErr error + if err := s.storeCredential(ctx, projectID, "database", "DATABASE_URL", dbCreds.URL); err != nil { + storeErr = err + s.logger.Error("failed to store DATABASE_URL", "project", projectID, "error", err) + } + if err := s.storeCredential(ctx, projectID, "database", "DATABASE_URL_STAGING", dbCreds.URLStaging); err != nil { + storeErr = err + s.logger.Error("failed to store DATABASE_URL_STAGING", "project", projectID, "error", err) + } + + // Rollback database if credential storage failed + if storeErr != nil { + s.logger.Warn("rolling back database due to credential storage failure", "project", projectID) + if rollbackErr := s.dbProvisioner.DeleteProjectDatabase(ctx, projectID); rollbackErr != nil { + s.logger.Error("failed to rollback database", "project", projectID, "error", rollbackErr) + result.NextSteps = append(result.NextSteps, "Database created but credentials not stored - manual cleanup required") + } else { + result.NextSteps = append(result.NextSteps, "Database provisioning rolled back due to credential storage failure") + } + } else { + s.logger.Info("database provisioned", "project", projectID, "database", dbCreds.DatabaseName) + } + } + } + + // Provision cache + if s.cacheProvisioner != nil { + cacheCreds, err := s.cacheProvisioner.CreateProjectCache(ctx, projectID) + if err != nil { + s.logger.Error("failed to provision cache", "project", projectID, "error", err) + 
result.NextSteps = append(result.NextSteps, "Cache provisioning failed - contact admin") + } else if s.credentialStore != nil { + // Store credentials - rollback on failure to prevent orphaned cache + var storeErr error + if err := s.storeCredential(ctx, projectID, "cache", "REDIS_URL", cacheCreds.URL); err != nil { + storeErr = err + s.logger.Error("failed to store REDIS_URL", "project", projectID, "error", err) + } + if err := s.storeCredential(ctx, projectID, "cache", "REDIS_URL_STAGING", cacheCreds.URLStaging); err != nil { + storeErr = err + s.logger.Error("failed to store REDIS_URL_STAGING", "project", projectID, "error", err) + } + if err := s.storeCredential(ctx, projectID, "cache", "REDIS_PREFIX", cacheCreds.Prefix); err != nil { + storeErr = err + s.logger.Error("failed to store REDIS_PREFIX", "project", projectID, "error", err) + } + + // Rollback cache if credential storage failed + if storeErr != nil { + s.logger.Warn("rolling back cache due to credential storage failure", "project", projectID) + if rollbackErr := s.cacheProvisioner.DeleteProjectCache(ctx, projectID, true); rollbackErr != nil { + s.logger.Error("failed to rollback cache", "project", projectID, "error", rollbackErr) + result.NextSteps = append(result.NextSteps, "Cache created but credentials not stored - manual cleanup required") + } else { + result.NextSteps = append(result.NextSteps, "Cache provisioning rolled back due to credential storage failure") + } + } else { + s.logger.Info("cache provisioned", "project", projectID, "prefix", cacheCreds.Prefix) + } + } + } +} + +// storeCredential stores a project-scoped credential in the credential store. +// Keys are prefixed with the project ID for isolation (e.g., "myproject:DATABASE_URL"). 
+func (s *ProjectInfraService) storeCredential(ctx context.Context, projectID, category, key, value string) error { + scopedKey := projectID + ":" + key + return s.credentialStore.Set(ctx, domain.Credential{ + Key: scopedKey, + Value: value, + Category: category, + }) +} + // createInitialDeployment creates the initial K8s deployment for a project. // This is called after template seeding to ensure the deployment exists before // the CI pipeline runs `kubectl set image`. The deployment will be in ImagePullBackOff @@ -496,20 +600,34 @@ func (s *ProjectInfraService) DeleteProject(ctx context.Context, projectID strin } } - // 2. Delete all DNS records for project domains + // 2. Delete provisioned database + if s.dbProvisioner != nil { + if err := s.dbProvisioner.DeleteProjectDatabase(ctx, projectID); err != nil { + s.logger.Warn("failed to delete project database", "error", err) + } + } + + // 3. Delete provisioned cache (and purge keys) + if s.cacheProvisioner != nil { + if err := s.cacheProvisioner.DeleteProjectCache(ctx, projectID, true); err != nil { + s.logger.Warn("failed to delete project cache", "error", err) + } + } + + // 4. Delete all DNS records for project domains s.deleteDNSRecords(ctx, status) - // 3. Delete all project_domains entries (CASCADE should handle this, but be explicit) + // 5. Delete all project_domains entries (CASCADE should handle this, but be explicit) if s.domainRepo != nil { if err := s.domainRepo.DeleteByProject(ctx, projectID); err != nil { s.logger.Warn("failed to delete project domains", "error", err) } } - // 4. Delete git repo (optional - might want to keep it) + // 6. Delete git repo (optional - might want to keep it) // Skipping git repo deletion for safety - // 5. Delete from database + // 7. 
Delete from database _, err = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = $1`, projectID) if err != nil { return fmt.Errorf("failed to delete project from database: %w", err) diff --git a/internal/worker/build_executor.go b/internal/worker/build_executor.go index 932b042..5d5d396 100644 --- a/internal/worker/build_executor.go +++ b/internal/worker/build_executor.go @@ -11,12 +11,24 @@ import ( "github.com/orchard9/rdev/internal/port" ) +// Build event type constants for SSE streaming. +const ( + BuildEventStarted = "build.started" + BuildEventOutput = "build.output" + BuildEventCompleted = "build.completed" + BuildEventFailed = "build.failed" + BuildEventToolUse = "build.tool_use" + BuildEventToolResult = "build.tool_result" + BuildEventError = "build.error" +) + // BuildExecutor handles WorkTaskTypeBuild tasks. // It translates BuildSpec fields from the work task's Spec map into an // AgentRequest, executes via a CodeAgent, and returns a BuildResult. type BuildExecutor struct { agentRegistry port.CodeAgentRegistry - gitOps *GitOperations + podGitOps *PodGitOperations // Post-build git operations (runs in pod) + streams port.StreamPublisher // SSE stream publisher for real-time events logger *slog.Logger defaultPodName string // Default claudebox pod for agent execution namespace string // Kubernetes namespace for the pod @@ -31,7 +43,8 @@ type BuildExecutorConfig struct { // NewBuildExecutor creates a new build executor. 
func NewBuildExecutor( agentRegistry port.CodeAgentRegistry, - gitOps *GitOperations, + podGitOps *PodGitOperations, + streams port.StreamPublisher, logger *slog.Logger, cfg *BuildExecutorConfig, ) *BuildExecutor { @@ -46,7 +59,8 @@ func NewBuildExecutor( } return &BuildExecutor{ agentRegistry: agentRegistry, - gitOps: gitOps, + podGitOps: podGitOps, + streams: streams, logger: logger.With("component", "build-executor"), defaultPodName: cfg.DefaultPodName, namespace: cfg.Namespace, @@ -56,9 +70,21 @@ func NewBuildExecutor( // Execute runs a build task by translating its spec into an agent call. func (b *BuildExecutor) Execute(ctx context.Context, task *domain.WorkTask) *domain.BuildResult { start := time.Now() + streamID := task.ID // Use task ID as stream ID for SSE + + // Publish BuildEventStarted event + b.publishEvent(streamID, BuildEventStarted, map[string]any{ + "task_id": task.ID, + "project_id": task.ProjectID, + "started_at": start.Format(time.RFC3339), + }) spec, err := b.parseSpec(task.Spec) if err != nil { + b.publishEvent(streamID, BuildEventFailed, map[string]any{ + "task_id": task.ID, + "error": fmt.Sprintf("invalid build spec: %v", err), + }) return &domain.BuildResult{ Success: false, Error: fmt.Sprintf("invalid build spec: %v", err), @@ -66,24 +92,9 @@ func (b *BuildExecutor) Execute(ctx context.Context, task *domain.WorkTask) *dom } } - // Determine working directory + // Working directory in the pod where the project repo is cloned workDir := "/workspace" - // Clone repo if git URL is provided in the spec - gitURL, _ := task.Spec["git_url"].(string) - if gitURL != "" && b.gitOps != nil { - cloneDir, cleanup, err := b.gitOps.CloneToTemp(ctx, gitURL) - if err != nil { - return &domain.BuildResult{ - Success: false, - Error: fmt.Sprintf("git clone failed: %v", err), - DurationMs: time.Since(start).Milliseconds(), - } - } - defer cleanup() - workDir = cloneDir - } - // Get a code agent agent := b.agentRegistry.Default() if agent == nil { @@
-100,6 +111,47 @@ func (b *BuildExecutor) Execute(ctx context.Context, task *domain.WorkTask) *dom podName = b.defaultPodName } + // Clone or update the git repository if git operations are needed. + // This ensures the workspace is a valid git repo before the agent runs. + if (spec.AutoCommit || spec.AutoPush) && b.podGitOps != nil { + if spec.GitCloneURL == "" { + b.publishEvent(streamID, BuildEventFailed, map[string]any{ + "task_id": task.ID, + "error": "git_clone_url is required when auto_commit or auto_push is enabled", + }) + return &domain.BuildResult{ + Success: false, + Error: "git_clone_url is required when auto_commit or auto_push is enabled", + DurationMs: time.Since(start).Milliseconds(), + } + } + + b.logger.Info("ensuring git repository is ready", + "task_id", task.ID, + "pod", podName, + "workDir", workDir, + ) + + cloneResult := b.podGitOps.CloneRepo(ctx, podName, workDir, spec.GitCloneURL) + if cloneResult.Error != nil { + b.publishEvent(streamID, BuildEventFailed, map[string]any{ + "task_id": task.ID, + "error": fmt.Sprintf("git clone failed: %v", cloneResult.Error), + }) + return &domain.BuildResult{ + Success: false, + Error: fmt.Sprintf("git clone failed: %v", cloneResult.Error), + DurationMs: time.Since(start).Milliseconds(), + } + } + + if cloneResult.Cloned { + b.publishEvent(streamID, BuildEventOutput, map[string]any{ + "content": fmt.Sprintf("Cloned repository to %s", workDir), + }) + } + } + // Build the agent request with pod metadata for Claude Code adapter agentReq := &domain.AgentRequest{ Prompt: spec.Prompt, @@ -125,6 +177,23 @@ func (b *BuildExecutor) Execute(ctx context.Context, task *domain.WorkTask) *dom // Execute the agent agentResult, err := agent.Execute(ctx, agentReq, func(event domain.AgentEvent) { + // Publish all agent events to the SSE stream + eventType := BuildEventOutput + switch event.Type { + case domain.AgentEventToolUse: + eventType = BuildEventToolUse + case domain.AgentEventToolResult: + eventType = 
BuildEventToolResult + case domain.AgentEventError: + eventType = BuildEventError + } + b.publishEvent(streamID, eventType, map[string]any{ + "content": event.Content, + "stream": event.Stream, + "tool_name": event.ToolName, + }) + + // Also buffer output for final result if event.Type == domain.AgentEventOutput || event.Type == domain.AgentEventError { if outputBuilder.Len() >= maxOutputSize { return // Output cap reached, discard further output @@ -143,6 +212,12 @@ func (b *BuildExecutor) Execute(ctx context.Context, task *domain.WorkTask) *dom }) if err != nil { + b.publishEvent(streamID, BuildEventFailed, map[string]any{ + "task_id": task.ID, + "error": fmt.Sprintf("agent execution failed: %v", err), + "duration_ms": time.Since(start).Milliseconds(), + }) + b.closeStream(ctx, streamID) return &domain.BuildResult{ Success: false, Error: fmt.Sprintf("agent execution failed: %v", err), @@ -165,33 +240,96 @@ func (b *BuildExecutor) Execute(ctx context.Context, task *domain.WorkTask) *dom result.Error = errMsg } - // Handle git commit/push if requested - if result.Success && b.gitOps != nil && gitURL != "" { - if spec.AutoCommit { - commitMsg := fmt.Sprintf("build: %s", truncate(spec.Prompt, 72)) - sha, filesChanged, err := b.gitOps.CommitAndPush(ctx, workDir, commitMsg, spec.AutoPush) - if err != nil { - b.logger.Warn("git commit/push failed", - "task_id", task.ID, - "error", err, - ) - result.Success = false - result.Error = fmt.Sprintf("build succeeded but git operations failed: %v", err) - } else { - result.CommitSHA = sha - result.FilesChanged = filesChanged - } + // Post-build git operations: commit and push changes programmatically. + // This is deterministic - we don't rely on the LLM to run git commands.
+ if result.Success && spec.AutoCommit && b.podGitOps != nil { + commitMsg := fmt.Sprintf("build: %s", truncate(spec.Prompt, 72)) + gitResult := b.podGitOps.CommitAndPush(ctx, podName, workDir, commitMsg, spec.AutoPush) + + if gitResult.Error != nil { + b.logger.Warn("post-build git operations failed", + "task_id", task.ID, + "error", gitResult.Error, + ) + result.Success = false + result.Error = fmt.Sprintf("build succeeded but git operations failed: %v", gitResult.Error) + } else if gitResult.HasChanges { + result.CommitSHA = gitResult.CommitSHA + result.FilesChanged = gitResult.FilesChanged + b.logger.Info("post-build git operations completed", + "task_id", task.ID, + "commit", gitResult.CommitSHA, + "files", len(gitResult.FilesChanged), + "pushed", gitResult.Pushed, + ) + } else { + b.logger.Info("no changes to commit after build", + "task_id", task.ID, + ) } } + // Publish completion event + if result.Success { + b.publishEvent(streamID, BuildEventCompleted, map[string]any{ + "task_id": task.ID, + "success": true, + "commit_sha": result.CommitSHA, + "files_changed": result.FilesChanged, + "duration_ms": result.DurationMs, + }) + } else { + b.publishEvent(streamID, BuildEventFailed, map[string]any{ + "task_id": task.ID, + "error": result.Error, + "duration_ms": result.DurationMs, + }) + } + b.closeStream(ctx, streamID) + return result } +// publishEvent publishes an event to the SSE stream if a stream publisher is configured. +func (b *BuildExecutor) publishEvent(streamID, eventType string, data map[string]any) { + if b.streams == nil { + return + } + b.streams.Publish(streamID, port.StreamEvent{ + Type: eventType, + Data: data, + }) +} + +// streamCloseDelay is the time to wait before closing a stream after build completion. +// This allows SSE clients to receive final events before the stream is closed. +const streamCloseDelay = 5 * time.Second + +// closeStream closes the stream after a delay to allow clients to receive final events.
+// The close is context-aware and respects cancellation. +func (b *BuildExecutor) closeStream(ctx context.Context, streamID string) { + if b.streams == nil { + return + } + // Close stream after a short delay to ensure final events are delivered. + // Run the close in a goroutine so Execute returns without blocking; a cancelled context closes the stream immediately. + go func() { + select { + case <-ctx.Done(): + // Context cancelled, close immediately + b.streams.Close(streamID) + case <-time.After(streamCloseDelay): + b.streams.Close(streamID) + } + }() +} + // parsedBuildSpec holds typed fields extracted from the task spec map. type parsedBuildSpec struct { - Prompt string - AutoCommit bool - AutoPush bool + Prompt string + AutoCommit bool + AutoPush bool + GitCloneURL string } // parseSpec extracts typed BuildSpec fields from the generic map[string]any. @@ -203,11 +341,13 @@ func (b *BuildExecutor) parseSpec(spec map[string]any) (*parsedBuildSpec, error) autoCommit, _ := spec["auto_commit"].(bool) autoPush, _ := spec["auto_push"].(bool) + gitCloneURL, _ := spec["git_clone_url"].(string) return &parsedBuildSpec{ - Prompt: prompt, - AutoCommit: autoCommit, - AutoPush: autoPush, + Prompt: prompt, + AutoCommit: autoCommit, + AutoPush: autoPush, + GitCloneURL: gitCloneURL, }, nil } diff --git a/internal/worker/git_operations.go b/internal/worker/git_operations.go deleted file mode 100644 index bdba0bc..0000000 --- a/internal/worker/git_operations.go +++ /dev/null @@ -1,233 +0,0 @@ -package worker - -import ( - "bytes" - "context" - "fmt" - "log/slog" - "os" - "os/exec" - "path/filepath" - "strings" -) - -// GitOperations provides git clone, commit, and push functionality -// for the build executor. It uses os/exec to run git commands. -type GitOperations struct { - giteaToken string - gitUser string - gitEmail string - logger *slog.Logger -} - -// GitOperationsConfig configures git operations. 
-type GitOperationsConfig struct { - // GiteaToken is the token for HTTPS clone/push authentication.
- GiteaToken string - - // GitUser is the git commit author name. - GitUser string - - // GitEmail is the git commit author email. - GitEmail string - - Logger *slog.Logger -} - -// NewGitOperations creates a new git operations helper. -func NewGitOperations(cfg GitOperationsConfig) *GitOperations { - if cfg.GitUser == "" { - cfg.GitUser = "rdev-worker" - } - if cfg.GitEmail == "" { - cfg.GitEmail = "worker@threesix.ai" - } - if cfg.Logger == nil { - cfg.Logger = slog.Default() - } - return &GitOperations{ - giteaToken: cfg.GiteaToken, - gitUser: cfg.GitUser, - gitEmail: cfg.GitEmail, - logger: cfg.Logger.With("component", "git-ops"), - } -} - -// CloneToTemp clones a repository to a temporary directory. -// Returns the clone directory and a cleanup function. -func (g *GitOperations) CloneToTemp(ctx context.Context, gitURL string) (string, func(), error) { - tmpDir, err := os.MkdirTemp("", "rdev-build-*") - if err != nil { - return "", nil, fmt.Errorf("create temp dir: %w", err) - } - - cleanup := func() { - if err := os.RemoveAll(tmpDir); err != nil { - g.logger.Warn("failed to cleanup temp dir", "dir", tmpDir, "error", err) - } - } - - // Inject token into clone URL for authentication - authURL := g.injectToken(gitURL) - - if err := g.runGit(ctx, tmpDir, "clone", authURL, "."); err != nil { - cleanup() - return "", nil, fmt.Errorf("git clone: %w", err) - } - - // Configure git user for commits - if err := g.runGit(ctx, tmpDir, "config", "user.name", g.gitUser); err != nil { - cleanup() - return "", nil, fmt.Errorf("git config user.name: %w", err) - } - if err := g.runGit(ctx, tmpDir, "config", "user.email", g.gitEmail); err != nil { - cleanup() - return "", nil, fmt.Errorf("git config user.email: %w", err) - } - - g.logger.Info("cloned repository", "url", gitURL, "dir", tmpDir) - return tmpDir, cleanup, nil -} - -// CommitAndPush stages all changes, commits, and optionally pushes. -// Returns the commit SHA and list of changed files. 
-func (g *GitOperations) CommitAndPush(ctx context.Context, dir, message string, push bool) (string, []string, error) { - // Stage all changes - if err := g.runGit(ctx, dir, "add", "-A"); err != nil { - return "", nil, fmt.Errorf("git add: %w", err) - } - - // Check if there are changes to commit - status, err := g.runGitOutput(ctx, dir, "status", "--porcelain") - if err != nil { - return "", nil, fmt.Errorf("git status: %w", err) - } - if strings.TrimSpace(status) == "" { - g.logger.Info("no changes to commit", "dir", dir) - return "", nil, nil - } - - // Get list of changed files - diffOutput, err := g.runGitOutput(ctx, dir, "diff", "--cached", "--name-only") - if err != nil { - return "", nil, fmt.Errorf("git diff: %w", err) - } - var filesChanged []string - for _, f := range strings.Split(strings.TrimSpace(diffOutput), "\n") { - if f != "" { - filesChanged = append(filesChanged, f) - } - } - - // Commit - if err := g.runGit(ctx, dir, "commit", "-m", message); err != nil { - return "", nil, fmt.Errorf("git commit: %w", err) - } - - // Get commit SHA - sha, err := g.runGitOutput(ctx, dir, "rev-parse", "HEAD") - if err != nil { - return "", nil, fmt.Errorf("git rev-parse: %w", err) - } - sha = strings.TrimSpace(sha) - - g.logger.Info("committed changes", - "sha", sha, - "files", len(filesChanged), - ) - - // Push if requested - if push { - if err := g.runGit(ctx, dir, "push"); err != nil { - return sha, filesChanged, fmt.Errorf("git push: %w", err) - } - g.logger.Info("pushed changes", "sha", sha) - } - - return sha, filesChanged, nil -} - -// injectToken adds the Gitea token to an HTTPS git URL for authentication. -// Converts "https://git.example.com/org/repo.git" to -// "https://token@git.example.com/org/repo.git". 
-func (g *GitOperations) injectToken(gitURL string) string { - if g.giteaToken == "" { - return gitURL - } - // Handle https:// URLs - if strings.HasPrefix(gitURL, "https://") { - return "https://" + g.giteaToken + "@" + gitURL[len("https://"):] - } - if strings.HasPrefix(gitURL, "http://") { - return "http://" + g.giteaToken + "@" + gitURL[len("http://"):] - } - return gitURL -} - -// gitEnv returns a minimal environment for git subprocesses. -// Only PATH and HOME are inherited; all other host env vars are excluded -// to prevent credential or config leakage. -func gitEnv() []string { - env := []string{"GIT_TERMINAL_PROMPT=0"} - for _, key := range []string{"PATH", "HOME"} { - if v := os.Getenv(key); v != "" { - env = append(env, key+"="+v) - } - } - return env -} - -// runGit executes a git command in the given directory. -func (g *GitOperations) runGit(ctx context.Context, dir string, args ...string) error { - cmd := exec.CommandContext(ctx, "git", args...) - cmd.Dir = dir - cmd.Env = gitEnv() - - var stderr bytes.Buffer - cmd.Stderr = &stderr - - if err := cmd.Run(); err != nil { - // Redact token from error messages - errMsg := g.redactToken(stderr.String()) - return fmt.Errorf("%s: %s", err, errMsg) - } - return nil -} - -// runGitOutput executes a git command and returns its stdout. -func (g *GitOperations) runGitOutput(ctx context.Context, dir string, args ...string) (string, error) { - cmd := exec.CommandContext(ctx, "git", args...) - cmd.Dir = dir - cmd.Env = gitEnv() - - var stdout, stderr bytes.Buffer - cmd.Stdout = &stdout - cmd.Stderr = &stderr - - if err := cmd.Run(); err != nil { - errMsg := g.redactToken(stderr.String()) - return "", fmt.Errorf("%s: %s", err, errMsg) - } - return stdout.String(), nil -} - -// redactToken removes the Gitea token from log/error output. 
-func (g *GitOperations) redactToken(s string) string { - if g.giteaToken == "" { - return s - } - return strings.ReplaceAll(s, g.giteaToken, "[REDACTED]") -} - -// EnsureGitDir verifies that the given path is a valid git repository. -func (g *GitOperations) EnsureGitDir(dir string) error { - gitDir := filepath.Join(dir, ".git") - info, err := os.Stat(gitDir) - if err != nil { - return fmt.Errorf("not a git repository: %w", err) - } - if !info.IsDir() { - return fmt.Errorf("not a git repository: .git is not a directory") - } - return nil -} diff --git a/internal/worker/git_operations_test.go b/internal/worker/git_operations_test.go deleted file mode 100644 index 4efb200..0000000 --- a/internal/worker/git_operations_test.go +++ /dev/null @@ -1,415 +0,0 @@ -package worker - -import ( - "context" - "log/slog" - "os" - "path/filepath" - "strings" - "testing" -) - -func testGitOps(token string) *GitOperations { - return NewGitOperations(GitOperationsConfig{ - GiteaToken: token, - GitUser: "test-user", - GitEmail: "test@example.com", - Logger: slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelWarn})), - }) -} - -func TestNewGitOperations_Defaults(t *testing.T) { - g := NewGitOperations(GitOperationsConfig{}) - if g.gitUser != "rdev-worker" { - t.Errorf("expected default gitUser 'rdev-worker', got %q", g.gitUser) - } - if g.gitEmail != "worker@threesix.ai" { - t.Errorf("expected default gitEmail 'worker@threesix.ai', got %q", g.gitEmail) - } - if g.logger == nil { - t.Error("expected non-nil logger") - } -} - -func TestNewGitOperations_CustomValues(t *testing.T) { - g := NewGitOperations(GitOperationsConfig{ - GiteaToken: "my-token", - GitUser: "custom-user", - GitEmail: "custom@example.com", - }) - if g.giteaToken != "my-token" { - t.Errorf("expected token 'my-token', got %q", g.giteaToken) - } - if g.gitUser != "custom-user" { - t.Errorf("expected gitUser 'custom-user', got %q", g.gitUser) - } - if g.gitEmail != "custom@example.com" { - 
t.Errorf("expected gitEmail 'custom@example.com', got %q", g.gitEmail) - } -} - -func TestInjectToken(t *testing.T) { - tests := []struct { - name string - token string - url string - expect string - }{ - { - name: "https URL with token", - token: "ghp_abc123", - url: "https://git.example.com/org/repo.git", - expect: "https://ghp_abc123@git.example.com/org/repo.git", - }, - { - name: "http URL with token", - token: "ghp_abc123", - url: "http://git.example.com/org/repo.git", - expect: "http://ghp_abc123@git.example.com/org/repo.git", - }, - { - name: "no token", - token: "", - url: "https://git.example.com/org/repo.git", - expect: "https://git.example.com/org/repo.git", - }, - { - name: "ssh URL unchanged", - token: "ghp_abc123", - url: "git@git.example.com:org/repo.git", - expect: "git@git.example.com:org/repo.git", - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - g := testGitOps(tt.token) - got := g.injectToken(tt.url) - if got != tt.expect { - t.Errorf("injectToken(%q) = %q, want %q", tt.url, got, tt.expect) - } - }) - } -} - -func TestRedactToken(t *testing.T) { - tests := []struct { - name string - token string - input string - expect string - }{ - { - name: "redacts token from message", - token: "secret123", - input: "fatal: Authentication failed for 'https://secret123@git.example.com/repo.git'", - expect: "fatal: Authentication failed for 'https://[REDACTED]@git.example.com/repo.git'", - }, - { - name: "no token to redact", - token: "", - input: "fatal: repository not found", - expect: "fatal: repository not found", - }, - { - name: "token not present in message", - token: "secret123", - input: "fatal: repository not found", - expect: "fatal: repository not found", - }, - { - name: "multiple occurrences", - token: "tok", - input: "tok appears twice: tok", - expect: "[REDACTED] appears twice: [REDACTED]", - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - g := testGitOps(tt.token) - got := 
g.redactToken(tt.input) - if got != tt.expect { - t.Errorf("redactToken(%q) = %q, want %q", tt.input, got, tt.expect) - } - }) - } -} - -func TestEnsureGitDir(t *testing.T) { - g := testGitOps("") - - t.Run("valid git directory", func(t *testing.T) { - dir := t.TempDir() - if err := os.MkdirAll(filepath.Join(dir, ".git"), 0o755); err != nil { - t.Fatal(err) - } - if err := g.EnsureGitDir(dir); err != nil { - t.Errorf("expected no error for valid git dir, got: %v", err) - } - }) - - t.Run("no .git directory", func(t *testing.T) { - dir := t.TempDir() - err := g.EnsureGitDir(dir) - if err == nil { - t.Error("expected error for non-git directory") - } - }) - - t.Run(".git is a file not directory", func(t *testing.T) { - dir := t.TempDir() - if err := os.WriteFile(filepath.Join(dir, ".git"), []byte("gitdir: .."), 0o644); err != nil { - t.Fatal(err) - } - err := g.EnsureGitDir(dir) - if err == nil { - t.Error("expected error when .git is a file") - } - }) -} - -// TestCommitAndPush_NoChanges tests that CommitAndPush returns nil when -// there are no staged changes in the repository. 
-func TestCommitAndPush_NoChanges(t *testing.T) { - g := testGitOps("") - ctx := context.Background() - - // Create a real git repo with an initial commit - dir := t.TempDir() - if err := g.runGit(ctx, dir, "init"); err != nil { - t.Fatal("git init:", err) - } - if err := g.runGit(ctx, dir, "config", "user.name", "test"); err != nil { - t.Fatal("git config user.name:", err) - } - if err := g.runGit(ctx, dir, "config", "user.email", "test@test.com"); err != nil { - t.Fatal("git config user.email:", err) - } - // Create initial commit so HEAD exists - if err := os.WriteFile(filepath.Join(dir, "README.md"), []byte("init"), 0o644); err != nil { - t.Fatal(err) - } - if err := g.runGit(ctx, dir, "add", "-A"); err != nil { - t.Fatal("git add:", err) - } - if err := g.runGit(ctx, dir, "commit", "-m", "initial"); err != nil { - t.Fatal("git commit:", err) - } - - // No new changes — should return empty with no error - sha, files, err := g.CommitAndPush(ctx, dir, "no changes", false) - if err != nil { - t.Errorf("expected no error, got: %v", err) - } - if sha != "" { - t.Errorf("expected empty SHA, got: %q", sha) - } - if len(files) != 0 { - t.Errorf("expected no files, got: %v", files) - } -} - -// TestCommitAndPush_WithChanges tests that CommitAndPush correctly stages, -// commits, and returns SHA and changed file list. 
-func TestCommitAndPush_WithChanges(t *testing.T) { - g := testGitOps("") - ctx := context.Background() - - // Create a real git repo - dir := t.TempDir() - if err := g.runGit(ctx, dir, "init"); err != nil { - t.Fatal("git init:", err) - } - if err := g.runGit(ctx, dir, "config", "user.name", "test"); err != nil { - t.Fatal("git config user.name:", err) - } - if err := g.runGit(ctx, dir, "config", "user.email", "test@test.com"); err != nil { - t.Fatal("git config user.email:", err) - } - // Initial commit - if err := os.WriteFile(filepath.Join(dir, "README.md"), []byte("init"), 0o644); err != nil { - t.Fatal(err) - } - if err := g.runGit(ctx, dir, "add", "-A"); err != nil { - t.Fatal("git add:", err) - } - if err := g.runGit(ctx, dir, "commit", "-m", "initial"); err != nil { - t.Fatal("git commit:", err) - } - - // Create new files to commit - if err := os.WriteFile(filepath.Join(dir, "main.go"), []byte("package main"), 0o644); err != nil { - t.Fatal(err) - } - if err := os.WriteFile(filepath.Join(dir, "go.mod"), []byte("module test"), 0o644); err != nil { - t.Fatal(err) - } - - // CommitAndPush without push (no remote) - sha, files, err := g.CommitAndPush(ctx, dir, "add go files", false) - if err != nil { - t.Errorf("expected no error, got: %v", err) - } - if sha == "" { - t.Error("expected non-empty SHA") - } - if len(sha) < 7 { - t.Errorf("expected SHA to be at least 7 chars, got: %q", sha) - } - if len(files) != 2 { - t.Errorf("expected 2 changed files, got %d: %v", len(files), files) - } - - // Verify the files are in the list - fileSet := make(map[string]bool) - for _, f := range files { - fileSet[f] = true - } - if !fileSet["main.go"] { - t.Error("expected main.go in changed files") - } - if !fileSet["go.mod"] { - t.Error("expected go.mod in changed files") - } -} - -// TestCommitAndPush_PushWithoutRemote tests that push fails gracefully -// when there's no remote configured. 
-func TestCommitAndPush_PushWithoutRemote(t *testing.T) { - g := testGitOps("") - ctx := context.Background() - - dir := t.TempDir() - if err := g.runGit(ctx, dir, "init"); err != nil { - t.Fatal("git init:", err) - } - if err := g.runGit(ctx, dir, "config", "user.name", "test"); err != nil { - t.Fatal("git config:", err) - } - if err := g.runGit(ctx, dir, "config", "user.email", "test@test.com"); err != nil { - t.Fatal("git config:", err) - } - if err := os.WriteFile(filepath.Join(dir, "file.txt"), []byte("init"), 0o644); err != nil { - t.Fatal(err) - } - if err := g.runGit(ctx, dir, "add", "-A"); err != nil { - t.Fatal("git add:", err) - } - if err := g.runGit(ctx, dir, "commit", "-m", "initial"); err != nil { - t.Fatal("git commit:", err) - } - - // Add a new file - if err := os.WriteFile(filepath.Join(dir, "new.txt"), []byte("new"), 0o644); err != nil { - t.Fatal(err) - } - - // Push should fail (no remote) but commit succeeds — SHA is returned - sha, files, err := g.CommitAndPush(ctx, dir, "test push", true) - if err == nil { - t.Error("expected push error when no remote configured") - } - // Even though push failed, commit should have succeeded - if sha == "" { - t.Error("expected SHA from successful commit before push failure") - } - if len(files) != 1 || files[0] != "new.txt" { - t.Errorf("expected [new.txt], got: %v", files) - } -} - -// TestCloneToTemp_InvalidURL tests that CloneToTemp fails on a bad URL. -func TestCloneToTemp_InvalidURL(t *testing.T) { - g := testGitOps("") - ctx := context.Background() - - _, _, err := g.CloneToTemp(ctx, "https://invalid.example.com/no-such-repo.git") - if err == nil { - t.Error("expected error cloning invalid URL") - } -} - -// TestCloneToTemp_LocalRepo tests cloning a local bare repository. 
-func TestCloneToTemp_LocalRepo(t *testing.T) { - g := testGitOps("") - ctx := context.Background() - - // Create a bare repo to clone from - bareDir := t.TempDir() - if err := g.runGit(ctx, bareDir, "init", "--bare"); err != nil { - t.Fatal("git init --bare:", err) - } - - // Create a source repo and push to the bare repo - srcDir := t.TempDir() - if err := g.runGit(ctx, srcDir, "init"); err != nil { - t.Fatal("git init:", err) - } - if err := g.runGit(ctx, srcDir, "config", "user.name", "test"); err != nil { - t.Fatal(err) - } - if err := g.runGit(ctx, srcDir, "config", "user.email", "test@test.com"); err != nil { - t.Fatal(err) - } - if err := os.WriteFile(filepath.Join(srcDir, "hello.txt"), []byte("hello"), 0o644); err != nil { - t.Fatal(err) - } - if err := g.runGit(ctx, srcDir, "add", "-A"); err != nil { - t.Fatal(err) - } - if err := g.runGit(ctx, srcDir, "commit", "-m", "initial"); err != nil { - t.Fatal(err) - } - if err := g.runGit(ctx, srcDir, "remote", "add", "origin", bareDir); err != nil { - t.Fatal(err) - } - if err := g.runGit(ctx, srcDir, "push", "origin", "master"); err != nil { - // Some git versions use "main" as default branch - if err2 := g.runGit(ctx, srcDir, "push", "origin", "main"); err2 != nil { - t.Fatalf("push failed for both master and main: master=%v, main=%v", err, err2) - } - } - - // Clone the bare repo using file:// protocol - cloneDir, cleanup, err := g.CloneToTemp(ctx, "file://"+bareDir) - if err != nil { - t.Fatalf("CloneToTemp failed: %v", err) - } - defer cleanup() - - // Verify the cloned file exists - content, err := os.ReadFile(filepath.Join(cloneDir, "hello.txt")) - if err != nil { - t.Fatalf("failed to read cloned file: %v", err) - } - if string(content) != "hello" { - t.Errorf("expected file content 'hello', got %q", string(content)) - } - - // Verify .git dir exists - if err := g.EnsureGitDir(cloneDir); err != nil { - t.Errorf("cloned dir should be a git repo: %v", err) - } - - // Verify git config was set - userName, 
err := g.runGitOutput(ctx, cloneDir, "config", "user.name") - if err != nil { - t.Fatalf("failed to get user.name: %v", err) - } - if got := strings.TrimSpace(userName); got != "test-user" { - t.Errorf("expected user.name 'test-user', got %q", got) - } -} - -func TestRunGit_ContextCancellation(t *testing.T) { - g := testGitOps("") - ctx, cancel := context.WithCancel(context.Background()) - cancel() // Cancel immediately - - dir := t.TempDir() - err := g.runGit(ctx, dir, "status") - if err == nil { - t.Error("expected error when context is cancelled") - } -} diff --git a/internal/worker/pod_git_operations.go b/internal/worker/pod_git_operations.go new file mode 100644 index 0000000..a24c1bf --- /dev/null +++ b/internal/worker/pod_git_operations.go @@ -0,0 +1,342 @@ +package worker + +import ( + "bytes" + "context" + "fmt" + "log/slog" + "os/exec" + "strings" +) + +// PodGitOperations provides git operations that run inside a Kubernetes pod +// via kubectl exec. This ensures git commands execute in the same environment +// where the code agent runs. +type PodGitOperations struct { + namespace string + giteaToken string + gitUser string + gitEmail string + logger *slog.Logger +} + +// PodGitOperationsConfig configures pod git operations. +type PodGitOperationsConfig struct { + // Namespace is the Kubernetes namespace for kubectl exec. + Namespace string + + // GiteaToken is the token for HTTPS push authentication. + GiteaToken string + + // GitUser is the git commit author name. + GitUser string + + // GitEmail is the git commit author email. + GitEmail string + + Logger *slog.Logger +} + +// NewPodGitOperations creates a new pod git operations helper. 
+func NewPodGitOperations(cfg PodGitOperationsConfig) *PodGitOperations { + if cfg.GitUser == "" { + cfg.GitUser = "rdev-worker" + } + if cfg.GitEmail == "" { + cfg.GitEmail = "worker@threesix.ai" + } + if cfg.Logger == nil { + cfg.Logger = slog.Default() + } + return &PodGitOperations{ + namespace: cfg.Namespace, + giteaToken: cfg.GiteaToken, + gitUser: cfg.GitUser, + gitEmail: cfg.GitEmail, + logger: cfg.Logger.With("component", "pod-git-ops"), + } +} + +// PostBuildResult contains the result of post-build git operations. +type PostBuildResult struct { + HasChanges bool + CommitSHA string + FilesChanged []string + Pushed bool + Error error +} + +// CloneResult contains the result of a git clone operation. +type CloneResult struct { + Cloned bool // True if repo was cloned, false if already existed + Error error +} + +// IsGitRepo checks if the given directory is a git repository. +func (g *PodGitOperations) IsGitRepo(ctx context.Context, podName, workDir string) bool { + // Check if .git directory exists + kubectlArgs := []string{ + "exec", "-n", g.namespace, podName, "--", + "test", "-d", workDir + "/.git", + } + cmd := exec.CommandContext(ctx, "kubectl", kubectlArgs...) + return cmd.Run() == nil +} + +// CloneRepo clones a git repository into the workspace if it doesn't already exist. +// If the workspace already contains a git repo, it pulls the latest changes instead. +// If the workspace exists but is not a git repo, it clears the directory first. 
+func (g *PodGitOperations) CloneRepo(ctx context.Context, podName, workDir, cloneURL string) *CloneResult { + result := &CloneResult{} + + if cloneURL == "" { + result.Error = fmt.Errorf("git clone URL is required") + return result + } + + // Check if already a git repo with the correct remote + if g.IsGitRepo(ctx, podName, workDir) { + // Verify the remote URL matches the expected clone URL + currentRemote, err := g.runGitInPodOutput(ctx, podName, workDir, "config", "--get", "remote.origin.url") + currentRemote = strings.TrimSpace(currentRemote) + expectedURL := cloneURL + + // Normalize URLs for comparison (both should be HTTPS) + if err == nil && currentRemote == expectedURL { + g.logger.Info("workspace is already a git repo with correct remote, pulling latest", + "pod", podName, + "workDir", workDir, + ) + // Pull latest changes + if err := g.runGitInPod(ctx, podName, workDir, "pull", "--ff-only"); err != nil { + // Pull failed, but repo exists - not fatal, might have local changes + g.logger.Warn("git pull failed, continuing with existing state", + "pod", podName, + "error", err, + ) + } + return result + } + + // Remote doesn't match - this is a different project's repo + g.logger.Info("workspace has different git remote, will re-clone", + "pod", podName, + "workDir", workDir, + "currentRemote", currentRemote, + "expectedURL", expectedURL, + ) + } + + // Check if directory exists but is not a git repo - clear it first + if g.dirExists(ctx, podName, workDir) { + g.logger.Info("workspace exists but is not a git repo, clearing", + "pod", podName, + "workDir", workDir, + ) + // Clear the directory contents (but keep the directory itself) + clearArgs := []string{ + "exec", "-n", g.namespace, podName, "--", + "sh", "-c", fmt.Sprintf("rm -rf %s/* %s/.[!.]*", workDir, workDir), + } + cmd := exec.CommandContext(ctx, "kubectl", clearArgs...) 
+ if err := cmd.Run(); err != nil { + g.logger.Warn("failed to clear workspace, attempting clone anyway", + "pod", podName, + "error", err, + ) + } + } + + // Configure credential helper for clone (for private repos) + authCloneURL := cloneURL + if g.giteaToken != "" { + // Inject token into clone URL for authentication + // https://git.example.com/owner/repo.git -> https://token:TOKEN@git.example.com/owner/repo.git + authCloneURL = strings.Replace(cloneURL, "https://", "https://token:"+g.giteaToken+"@", 1) + } + + g.logger.Info("cloning repository", + "pod", podName, + "workDir", workDir, + "url", cloneURL, // Log without token + ) + + // Clone the repository + kubectlArgs := []string{ + "exec", "-n", g.namespace, podName, "--", + "git", "clone", authCloneURL, workDir, + } + cmd := exec.CommandContext(ctx, "kubectl", kubectlArgs...) + + var stderr bytes.Buffer + cmd.Stderr = &stderr + + if err := cmd.Run(); err != nil { + errMsg := g.redactToken(stderr.String()) + result.Error = fmt.Errorf("git clone failed: %s: %s", err, errMsg) + return result + } + + result.Cloned = true + g.logger.Info("repository cloned successfully", + "pod", podName, + "workDir", workDir, + ) + + return result +} + +// dirExists checks if a directory exists in the pod. +func (g *PodGitOperations) dirExists(ctx context.Context, podName, path string) bool { + kubectlArgs := []string{ + "exec", "-n", g.namespace, podName, "--", + "test", "-d", path, + } + cmd := exec.CommandContext(ctx, "kubectl", kubectlArgs...) + return cmd.Run() == nil +} + +// CommitAndPush performs post-build git operations inside the pod: +// 1. Configures git user/email +// 2. Checks for changes (git status) +// 3. Stages all changes (git add -A) +// 4. Commits with the given message +// 5. Pushes if requested +// +// This is the programmatic alternative to relying on LLMs for git operations. 
+func (g *PodGitOperations) CommitAndPush(ctx context.Context, podName, workDir, message string, push bool) *PostBuildResult { + result := &PostBuildResult{} + + // Configure git user for commits + if err := g.runGitInPod(ctx, podName, workDir, "config", "user.name", g.gitUser); err != nil { + result.Error = fmt.Errorf("git config user.name: %w", err) + return result + } + if err := g.runGitInPod(ctx, podName, workDir, "config", "user.email", g.gitEmail); err != nil { + result.Error = fmt.Errorf("git config user.email: %w", err) + return result + } + + // Check for changes + status, err := g.runGitInPodOutput(ctx, podName, workDir, "status", "--porcelain") + if err != nil { + result.Error = fmt.Errorf("git status: %w", err) + return result + } + if strings.TrimSpace(status) == "" { + g.logger.Info("no changes to commit", "pod", podName, "workDir", workDir) + return result + } + result.HasChanges = true + + // Stage all changes + if err := g.runGitInPod(ctx, podName, workDir, "add", "-A"); err != nil { + result.Error = fmt.Errorf("git add: %w", err) + return result + } + + // Get list of staged files + diffOutput, err := g.runGitInPodOutput(ctx, podName, workDir, "diff", "--cached", "--name-only") + if err != nil { + result.Error = fmt.Errorf("git diff: %w", err) + return result + } + for _, f := range strings.Split(strings.TrimSpace(diffOutput), "\n") { + if f != "" { + result.FilesChanged = append(result.FilesChanged, f) + } + } + + // Commit + if err := g.runGitInPod(ctx, podName, workDir, "commit", "-m", message); err != nil { + result.Error = fmt.Errorf("git commit: %w", err) + return result + } + + // Get commit SHA + sha, err := g.runGitInPodOutput(ctx, podName, workDir, "rev-parse", "HEAD") + if err != nil { + result.Error = fmt.Errorf("git rev-parse: %w", err) + return result + } + result.CommitSHA = strings.TrimSpace(sha) + + g.logger.Info("committed changes", + "pod", podName, + "sha", result.CommitSHA, + "files", len(result.FilesChanged), + ) + + // Push 
if requested + if push { + // Configure credential helper for push + if g.giteaToken != "" { + // Use git credential helper to inject token + // This avoids putting the token in the URL which would be visible in logs + credHelper := fmt.Sprintf("!f() { echo username=token; echo password=%s; }; f", g.giteaToken) + if err := g.runGitInPod(ctx, podName, workDir, "config", "credential.helper", credHelper); err != nil { + g.logger.Warn("failed to configure credential helper", "error", err) + // Continue anyway - push might still work if pod has other auth configured + } + } + + if err := g.runGitInPod(ctx, podName, workDir, "push", "origin", "HEAD"); err != nil { + result.Error = fmt.Errorf("git push: %w", err) + return result + } + result.Pushed = true + g.logger.Info("pushed changes", "pod", podName, "sha", result.CommitSHA) + } + + return result +} + +// runGitInPod executes a git command inside the pod via kubectl exec. +func (g *PodGitOperations) runGitInPod(ctx context.Context, podName, workDir string, args ...string) error { + // Build: kubectl exec -n -- git -C + kubectlArgs := []string{ + "exec", "-n", g.namespace, podName, "--", + "git", "-C", workDir, + } + kubectlArgs = append(kubectlArgs, args...) + + cmd := exec.CommandContext(ctx, "kubectl", kubectlArgs...) + + var stderr bytes.Buffer + cmd.Stderr = &stderr + + if err := cmd.Run(); err != nil { + errMsg := g.redactToken(stderr.String()) + return fmt.Errorf("%s: %s", err, errMsg) + } + return nil +} + +// runGitInPodOutput executes a git command and returns stdout. +func (g *PodGitOperations) runGitInPodOutput(ctx context.Context, podName, workDir string, args ...string) (string, error) { + kubectlArgs := []string{ + "exec", "-n", g.namespace, podName, "--", + "git", "-C", workDir, + } + kubectlArgs = append(kubectlArgs, args...) + + cmd := exec.CommandContext(ctx, "kubectl", kubectlArgs...) 
+ + var stdout, stderr bytes.Buffer + cmd.Stdout = &stdout + cmd.Stderr = &stderr + + if err := cmd.Run(); err != nil { + errMsg := g.redactToken(stderr.String()) + return "", fmt.Errorf("%s: %s", err, errMsg) + } + return stdout.String(), nil +} + +// redactToken removes the Gitea token from output. +func (g *PodGitOperations) redactToken(s string) string { + if g.giteaToken == "" { + return s + } + return strings.ReplaceAll(s, g.giteaToken, "[REDACTED]") +} diff --git a/internal/worker/work_executor_test.go b/internal/worker/work_executor_test.go index fad6ff3..c8dc5d7 100644 --- a/internal/worker/work_executor_test.go +++ b/internal/worker/work_executor_test.go @@ -200,7 +200,7 @@ func TestBuildExecutor_Execute(t *testing.T) { result: &domain.AgentResult{ExitCode: 0, DurationMs: 500}, } registry := &mockCodeAgentRegistry{agent: agent} - exec := NewBuildExecutor(registry, nil, nil, nil) + exec := NewBuildExecutor(registry, nil, nil, nil, nil) task := &domain.WorkTask{ ID: "task-1", @@ -220,7 +220,7 @@ func TestBuildExecutor_Execute(t *testing.T) { t.Run("missing prompt", func(t *testing.T) { registry := &mockCodeAgentRegistry{agent: &mockCodeAgent{}} - exec := NewBuildExecutor(registry, nil, nil, nil) + exec := NewBuildExecutor(registry, nil, nil, nil, nil) task := &domain.WorkTask{ ID: "task-1", @@ -236,7 +236,7 @@ func TestBuildExecutor_Execute(t *testing.T) { t.Run("no agent available", func(t *testing.T) { registry := &mockCodeAgentRegistry{agent: nil} - exec := NewBuildExecutor(registry, nil, nil, nil) + exec := NewBuildExecutor(registry, nil, nil, nil, nil) task := &domain.WorkTask{ ID: "task-1", @@ -253,7 +253,7 @@ func TestBuildExecutor_Execute(t *testing.T) { t.Run("agent execution error", func(t *testing.T) { agent := &mockCodeAgent{err: fmt.Errorf("connection refused")} registry := &mockCodeAgentRegistry{agent: agent} - exec := NewBuildExecutor(registry, nil, nil, nil) + exec := NewBuildExecutor(registry, nil, nil, nil, nil) task := &domain.WorkTask{ 
ID: "task-1", @@ -275,7 +275,7 @@ func TestBuildExecutor_Execute(t *testing.T) { result: &domain.AgentResult{ExitCode: 1, DurationMs: 500}, } registry := &mockCodeAgentRegistry{agent: agent} - exec := NewBuildExecutor(registry, nil, nil, nil) + exec := NewBuildExecutor(registry, nil, nil, nil, nil) task := &domain.WorkTask{ ID: "task-1", @@ -291,7 +291,7 @@ func TestBuildExecutor_Execute(t *testing.T) { } func TestBuildExecutor_ParseSpec(t *testing.T) { - exec := NewBuildExecutor(nil, nil, nil, nil) + exec := NewBuildExecutor(nil, nil, nil, nil, nil) t.Run("valid spec", func(t *testing.T) { spec, err := exec.parseSpec(map[string]any{ diff --git a/scripts/load-test-builds.sh b/scripts/load-test-builds.sh new file mode 100755 index 0000000..b464c00 --- /dev/null +++ b/scripts/load-test-builds.sh @@ -0,0 +1,165 @@ +#!/bin/bash +# Load Test Script for Build Streaming +# Simulates concurrent builds with SSE clients to verify event delivery +# +# Usage: +# ./scripts/load-test-builds.sh [num_concurrent] [duration_seconds] +# +# Example: +# ./scripts/load-test-builds.sh 10 60 # 10 concurrent streams for 60 seconds + +set -euo pipefail + +# Configuration +API_URL="${RDEV_API_URL:-https://rdev.masq-ops.orchard9.ai}" +API_KEY="${RDEV_API_KEY:?RDEV_API_KEY environment variable required}" + +NUM_CONCURRENT="${1:-5}" +DURATION="${2:-30}" + +# Colors +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' + +log_info() { echo -e "${BLUE}[INFO]${NC} $1"; } +log_success() { echo -e "${GREEN}[OK]${NC} $1"; } +log_error() { echo -e "${RED}[ERROR]${NC} $1"; } + +# Track metrics +EVENTS_RECEIVED=0 +CONNECTIONS_OPENED=0 +CONNECTIONS_FAILED=0 +STREAM_PIDS=() + +# Cleanup on exit +cleanup() { + log_info "Cleaning up..." 
+ for pid in "${STREAM_PIDS[@]}"; do + kill "$pid" 2>/dev/null || true + done +} +trap cleanup EXIT + +# Simulate an SSE client that counts events +run_sse_client() { + local client_id="$1" + local stream_id="load-test-stream-$client_id" + local events_file="/tmp/load-test-events-$client_id.txt" + + echo "0" > "$events_file" + + # Connect and count events + curl -s -N \ + -H "X-API-Key: ${API_KEY}" \ + -H "Accept: text/event-stream" \ + "${API_URL}/projects/load-test-project/events?stream_id=${stream_id}" 2>/dev/null | \ + while IFS= read -r line; do + if [[ "$line" == "data:"* ]]; then + # Increment event count + count=$(<"$events_file") + echo "$((count + 1))" > "$events_file" + fi + done & + + echo $! +} + +# Publish events to simulate build activity +publish_events() { + local num_events="$1" + local stream_id="$2" + + for i in $(seq 1 "$num_events"); do + # Use curl to trigger some activity (this is a simulation) + # In real usage, events come from BuildExecutor + sleep 0.1 + done +} + +echo "" +echo "==========================================" +echo " Build Streaming Load Test" +echo "==========================================" +echo "" +echo " Concurrent clients: $NUM_CONCURRENT" +echo " Test duration: ${DURATION}s" +echo " API URL: $API_URL" +echo "" + +# Check API health first +log_info "Checking API health..." +if ! curl -sf "${API_URL}/health" > /dev/null 2>&1; then + log_error "API health check failed" + exit 1 +fi +log_success "API is healthy" +echo "" + +# Start concurrent SSE clients +log_info "Starting $NUM_CONCURRENT SSE clients..." +for i in $(seq 1 "$NUM_CONCURRENT"); do + pid=$(run_sse_client "$i") + if [[ -n "$pid" ]]; then + STREAM_PIDS+=("$pid") + CONNECTIONS_OPENED=$((CONNECTIONS_OPENED + 1)) + else + CONNECTIONS_FAILED=$((CONNECTIONS_FAILED + 1)) + fi +done + +log_success "Started $CONNECTIONS_OPENED clients ($CONNECTIONS_FAILED failed)" +echo "" + +# Wait for test duration +log_info "Running load test for ${DURATION}s..." 
+sleep "$DURATION" + +# Collect results +log_info "Collecting results..." + +total_events=0 +for i in $(seq 1 "$NUM_CONCURRENT"); do + events_file="/tmp/load-test-events-$i.txt" + if [[ -f "$events_file" ]]; then + count=$(<"$events_file") + total_events=$((total_events + count)) + rm -f "$events_file" + fi +done + +# Kill remaining clients +for pid in "${STREAM_PIDS[@]}"; do + kill "$pid" 2>/dev/null || true +done +STREAM_PIDS=() + +echo "" +echo "==========================================" +echo " Load Test Results" +echo "==========================================" +echo "" +echo " Duration: ${DURATION}s" +echo " Clients started: $CONNECTIONS_OPENED" +echo " Clients failed: $CONNECTIONS_FAILED" +echo " Total events received: $total_events" +echo " Events per second: $(echo "scale=2; $total_events / $DURATION" | bc 2>/dev/null || echo "N/A")" +echo " Events per client: $(echo "scale=2; $total_events / $CONNECTIONS_OPENED" | bc 2>/dev/null || echo "N/A")" +echo "" + +# Check for memory usage if running locally +if command -v ps > /dev/null 2>&1; then + log_info "Memory usage check:" + echo " Run 'kubectl top pods -n rdev' to check rdev-api memory" +fi + +echo "" +if [[ $CONNECTIONS_FAILED -eq 0 ]] && [[ $total_events -gt 0 ]]; then + log_success "Load test completed successfully" + exit 0 +else + log_error "Load test completed with issues" + exit 1 +fi