feat: add RWX storage class and full SDLC lifecycle cookbook
- Add longhorn-rwx StorageClass for RWX volume support
- Add slackpath-5-full-lifecycle.yaml cookbook tree (all 10 SDLC phases)
- Update worker-pool.md documentation
- Consolidate PVC configuration, remove separate pvc-shared-claude.yaml
- Update rdev-worker and kustomization for new PVC structure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent
d74efb75ff
commit
bc010c4746
@@ -14,7 +14,7 @@ Quick reference for rdev concepts and facts.
|
||||
| Webhooks | [services/webhooks.md](./services/webhooks.md) | High | 2025-01 | Event subscriptions and delivery |
|
||||
| **Worker Infrastructure** |
|
||||
| Work Queue | [services/work-queue.md](./services/work-queue.md) | High | 2025-01 | Task queue for worker pool |
|
||||
| Worker Pool | [services/worker-pool.md](./services/worker-pool.md) | High | 2026-01 | Embedded work executor with queue maintenance and metrics |
|
||||
| Worker Pool | [services/worker-pool.md](./services/worker-pool.md) | High | 2026-02 | Standalone worker pods with claudebox sidecar, HTTP polling |
|
||||
| External Health | [services/external-health.md](./services/external-health.md) | High | 2026-02 | Background health monitoring of registry, CI, git |
|
||||
| CI Provider | [services/ci-provider.md](./services/ci-provider.md) | High | 2025-01 | Woodpecker auto-activation |
|
||||
| DNS / Cloudflare | [services/dns-cloudflare.md](./services/dns-cloudflare.md) | High | 2026-01 | Domain management for threesix.ai |
|
||||
|
||||
@@ -1,79 +1,193 @@
|
||||
# Worker Pool
|
||||
|
||||
**Last Updated:** 2026-01-31
|
||||
**Last Updated:** 2026-02-06
|
||||
**Confidence:** High
|
||||
|
||||
## Summary
|
||||
|
||||
Shared worker pool that executes build tasks for any project. Currently runs as an embedded WorkExecutor daemon inside rdev-api. Workers register with the worker registry, poll the work queue for tasks, execute Claude Code in pods via kubectl exec. Post-build git operations (commit/push) are programmatic via PodGitOperations, not LLM-driven.
|
||||
Distributed task execution system where standalone worker pods poll rdev-api for tasks and execute them via a claudebox sidecar. Supports horizontal scaling by adding more worker pods.
|
||||
|
||||
**Key Facts:**
|
||||
- **LLM vs rdev boundary:** Claude writes code; rdev handles git ops programmatically (no LLM for runbook tasks)
|
||||
- Embedded WorkExecutor daemon runs inside rdev-api process
|
||||
- Workers poll work queue every 5 seconds, heartbeat every 30 seconds
|
||||
- Stale workers (no heartbeat for 2 minutes) automatically marked offline by QueueMaintenance
|
||||
- Stale tasks (running >30 min without completion) automatically requeued
|
||||
- Old tasks (>7 days) automatically cleaned up
|
||||
- Queue depth and worker counts exported as Prometheus metrics
|
||||
- Future: external worker binary for separate pod deployment
|
||||
- **Architecture:** Pull-based polling (not push/websocket)
|
||||
- **Sidecar pattern:** Worker + claudebox in same pod, communicate via localhost HTTP
|
||||
- **Atomic dequeue:** PostgreSQL `FOR UPDATE SKIP LOCKED` prevents duplicate claims
|
||||
- **Task types:** `build` (Claude Code prompts), `sdlc` (SDLC commands)
|
||||
- **Scaling:** Add replicas to handle more concurrent tasks
|
||||
- **Resilience:** Stale workers marked offline, stuck tasks re-queued automatically
|
||||
|
||||
**File Pointers:**
|
||||
- Domain: `internal/domain/worker.go` (Worker, WorkerStatus)
|
||||
- Domain: `internal/domain/build.go` (BuildSpec, BuildResult)
|
||||
- Port: `internal/port/worker_registry.go` (WorkerRegistry interface)
|
||||
- Port: `internal/port/build_audit.go` (BuildAudit interface)
|
||||
- Adapter: `internal/adapter/postgres/worker_registry.go`
|
||||
- Adapter: `internal/adapter/postgres/build_audit.go`
|
||||
- Service: `internal/service/worker_service.go`
|
||||
- Service: `internal/service/build_service.go`
|
||||
- Executor: `internal/worker/work_executor.go` (poll loop, heartbeat, task routing)
|
||||
- Executor: `internal/worker/build_executor.go` (BuildSpec→AgentRequest)
|
||||
- Git: `internal/worker/pod_git_operations.go` (post-build commit/push via kubectl exec)
|
||||
- Maintenance: `internal/worker/queue_maintenance.go` (stale recovery, cleanup, metrics)
|
||||
- Handler: `internal/handlers/workers.go` (REST API for workers)
|
||||
- Handler: `internal/handlers/builds.go` (REST API for builds)
|
||||
- Handler: `internal/handlers/create_and_build.go` (combined create+build)
|
||||
- Migration: `internal/db/migrations/012_worker_registry.sql`
|
||||
## File Pointers
|
||||
|
||||
## Worker Lifecycle (Embedded)
|
||||
### Standalone Worker Binary
|
||||
- **Entry:** `cmd/rdev-worker/main.go` - Main binary, registration, heartbeat, poll loop
|
||||
- **API Client:** `internal/worker/api_client.go` - HTTP client to rdev-api
|
||||
- **Build Executor:** `internal/worker/http_build_executor.go` - Execute builds via claudebox
|
||||
- **SDLC Executor:** `internal/worker/http_sdlc_executor.go` - Execute SDLC tasks via claudebox
|
||||
|
||||
1. rdev-api starts → WorkExecutor registers as worker in registry
|
||||
2. Heartbeat loop: every 30s sends heartbeat via WorkerService
|
||||
3. Poll loop: every 5s dequeues next task from work queue
|
||||
4. BuildExecutor: executes CodeAgent in pod, then programmatically commits/pushes if auto_commit
|
||||
5. Reports completion with BuildResult via WorkerService
|
||||
6. Graceful shutdown: deregisters worker on rdev-api stop
|
||||
### Claudebox Sidecar Client
|
||||
- **Client:** `internal/adapter/claudebox/client.go` - HTTP client to claudebox sidecar
|
||||
- **Endpoints:** `/health`, `/execute`, `/git/clone`, `/git/commit-and-push`, `/sdlc`
|
||||
|
||||
### rdev-api Server-Side
|
||||
- **Handlers:** `internal/handlers/workers.go` - `/workers/*` endpoints
|
||||
- **Service:** `internal/service/worker_service.go` - Claim, complete, fail logic
|
||||
- **Registry:** `internal/adapter/postgres/worker_registry.go` - Worker state persistence
|
||||
- **Queue:** `internal/adapter/postgres/work_queue.go` - Task queue with atomic dequeue
|
||||
|
||||
### Domain
|
||||
- **Worker:** `internal/domain/worker.go` - Worker, WorkerStatus
|
||||
- **Task:** `internal/domain/work.go` - WorkTask, WorkTaskType, WorkTaskStatus
|
||||
- **Build:** `internal/domain/build.go` - BuildSpec, BuildResult
|
||||
|
||||
### Kubernetes
|
||||
- **Deployment:** `deployments/k8s/base/rdev-worker.yaml` - Worker + claudebox pod spec
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────┐ HTTP Polling (5s) ┌──────────────────────────┐
|
||||
│ rdev-api │◄────────────────────────────────►│ Worker Pod │
|
||||
│ │ │ ┌─────────┐ ┌─────────┐ │
|
||||
│ POST /workers/register ← Register at startup │ │ worker │→│claudebox│ │
|
||||
│ POST /workers/{id}/heartbeat ← Every 30s │ └─────────┘ └─────────┘ │
|
||||
│ POST /workers/{id}/claim ← Poll for tasks │ ↓ HTTP localhost │
|
||||
│ POST /workers/{id}/complete/{taskId} ← Success │ Claude Code execution │
|
||||
│ POST /workers/{id}/fail/{taskId} ← Failure └──────────────────────────┘
|
||||
│ │
|
||||
│ PostgreSQL │
|
||||
│ ├─ workers │ (worker registry)
|
||||
│ ├─ work_queue │ (task queue)
|
||||
│ └─ build_audit │ (execution history)
|
||||
└─────────────────────┘
|
||||
```
|
||||
|
||||
## Worker Lifecycle
|
||||
|
||||
1. **Register:** Worker pod starts → `POST /workers/register` with ID, hostname, capabilities
|
||||
2. **Heartbeat:** Every 30s → `POST /workers/{id}/heartbeat` to stay alive
|
||||
3. **Poll:** Every 5s → `POST /workers/{id}/claim` to get next task
|
||||
4. **Execute:** Call claudebox sidecar HTTP API to run Claude Code / SDLC commands
|
||||
5. **Report:** `POST /workers/{id}/complete/{taskId}` or `/fail/{taskId}` with results
|
||||
6. **Shutdown:** Graceful wait for in-flight tasks via `sync.WaitGroup`
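
Steps 3 to 5 of the lifecycle can be sketched as a single poll iteration. This is an illustrative Python sketch, not the Go code in `cmd/rdev-worker/main.go`: `Task`, `poll_once`, and the injected callables are hypothetical names standing in for the HTTP calls to `/workers/{id}/claim`, `/complete/{taskId}`, and `/fail/{taskId}`. A `None` return from `claim` models the 204 response when no task is available.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Task:
    """Hypothetical stand-in for a claimed work item."""
    id: str
    type: str  # "build" or "sdlc"


def poll_once(
    claim: Callable[[], Optional[Task]],
    execute: Callable[[Task], Any],
    complete: Callable[[str, Any], None],
    fail: Callable[[str, str], None],
) -> str:
    """Run one claim/execute/report cycle and return what happened."""
    task = claim()  # None models the 204 "no task available" response
    if task is None:
        return "idle"
    try:
        result = execute(task)  # would call the claudebox sidecar
    except Exception as exc:
        fail(task.id, str(exc))  # report failure with the error text
        return "failed"
    complete(task.id, result)  # report success with the result
    return "completed"
```

Injecting the calls keeps the loop logic testable without a running rdev-api; a real worker would invoke something like this once per poll interval.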
|
||||
|
||||
## Worker Statuses
|
||||
|
||||
- `idle` - available for new tasks
|
||||
- `busy` - currently executing a task
|
||||
- `draining` - not accepting new tasks (pre-shutdown)
|
||||
- `offline` - missed heartbeat threshold
|
||||
| Status | Meaning |
|
||||
|--------|---------|
|
||||
| `idle` | Ready to claim new tasks |
|
||||
| `busy` | Currently executing a task |
|
||||
| `draining` | Not accepting new tasks (pre-shutdown) |
|
||||
| `offline` | Missed heartbeat threshold (>90s) |
|
||||
|
||||
## Task Types
|
||||
|
||||
### Build Tasks (`WorkTaskTypeBuild`)
|
||||
|
||||
Execute Claude Code prompts with optional git operations.
|
||||
|
||||
**Spec:**
|
||||
```json
|
||||
{
|
||||
"prompt": "Build a React app with...",
|
||||
"auto_commit": true,
|
||||
"auto_push": false,
|
||||
"git_clone_url": "https://gitea.../repo.git"
|
||||
}
|
||||
```
|
||||
|
||||
**Execution Flow:**
|
||||
1. Clone repo via `claudebox /git/clone`
|
||||
2. Execute prompt via `claudebox /execute` (streaming)
|
||||
3. Commit/push via `claudebox /git/commit-and-push`
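
The flow above can be sketched as an ordered list of claudebox calls derived from the spec. This is a minimal Python sketch: `plan_build_calls` is a hypothetical helper, and both skipping the clone when no `git_clone_url` is set and skipping the final step when `auto_commit` is false are assumptions, not documented behavior.

```python
from typing import List


def plan_build_calls(spec: dict) -> List[str]:
    """Return the claudebox endpoints a build task would hit, in order."""
    calls = []
    if spec.get("git_clone_url"):  # assumption: clone only when a URL is given
        calls.append("/git/clone")
    calls.append("/execute")  # the prompt always runs
    if spec.get("auto_commit"):  # assumption: commit step gated on auto_commit
        calls.append("/git/commit-and-push")
    return calls


spec = {
    "prompt": "Build a React app with...",
    "auto_commit": True,
    "auto_push": False,
    "git_clone_url": "https://gitea.../repo.git",
}
print(plan_build_calls(spec))
```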
|
||||
|
||||
### SDLC Tasks (`WorkTaskTypeSDLC`)
|
||||
|
||||
Execute SDLC CLI commands.
|
||||
|
||||
**Spec:**
|
||||
```json
|
||||
{
|
||||
"command": "feature",
|
||||
"args": ["init", "feature-name"],
|
||||
"git_clone_url": "https://gitea.../repo.git"
|
||||
}
|
||||
```
|
||||
|
||||
**Execution Flow:**
|
||||
1. Clone repo via `claudebox /git/clone`
|
||||
2. Run SDLC command via `claudebox /sdlc`
|
||||
3. Commit/push changes
|
||||
|
||||
## API Endpoints
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/workers` | List all workers with status summary |
|
||||
| GET | `/workers/{workerId}` | Get worker details |
|
||||
| POST | `/workers/{workerId}/drain` | Set worker to draining |
|
||||
| POST | `/projects/{id}/builds` | Start build for project |
|
||||
| GET | `/projects/{id}/builds` | List builds for project |
|
||||
| GET | `/builds/{taskId}` | Get build status |
|
||||
| POST | `/project/create-and-build` | Create project + start build |
|
||||
| POST | `/workers/register` | Register new worker |
|
||||
| POST | `/workers/{id}/heartbeat` | Keep worker alive |
|
||||
| POST | `/workers/{id}/claim` | Claim next available task (204 if none) |
|
||||
| POST | `/workers/{id}/complete/{taskId}` | Report successful completion |
|
||||
| POST | `/workers/{id}/fail/{taskId}` | Report failure |
|
||||
| GET | `/workers` | List all workers |
|
||||
| GET | `/workers/{id}` | Get worker details |
|
||||
| POST | `/workers/{id}/drain` | Set worker to draining |
|
||||
|
||||
## Kubernetes Deployment
|
||||
|
||||
```yaml
|
||||
# deployments/k8s/base/rdev-worker.yaml
|
||||
spec:
|
||||
replicas: 1 # Scale by increasing
|
||||
strategy:
|
||||
type: RollingUpdate # RWX PVC enables multi-pod mounts
|
||||
rollingUpdate:
|
||||
maxSurge: 2
|
||||
maxUnavailable: 0
|
||||
containers:
|
||||
- name: worker
|
||||
image: registry.threesix.ai/rdev/worker:latest
|
||||
env:
|
||||
- RDEV_API_URL: http://rdev-api.rdev.svc.cluster.local:8080
|
||||
- CLAUDEBOX_URL: http://localhost:8080
|
||||
- WORKER_POLL_INTERVAL: 5s
|
||||
- WORKER_HEARTBEAT_INTERVAL: 30s
|
||||
- WORKER_TASK_TIMEOUT: 15m
|
||||
- name: claudebox
|
||||
image: registry.threesix.ai/rdev/claudebox:latest
|
||||
volumeMounts:
|
||||
- /workspace (EmptyDir)
|
||||
- /root/.claude (RWX PVC - shared Claude auth)
|
||||
```
|
||||
|
||||
**Storage:** The `claudebox-claude-config` PVC uses `ReadWriteMany` (RWX) access mode with Longhorn NFS, allowing multiple worker pods to share Claude OAuth credentials.
|
||||
|
||||
## Error Classification
|
||||
|
||||
Failed tasks are classified for smart retry logic:
|
||||
|
||||
| Code | Trigger | Retryable |
|
||||
|------|---------|-----------|
|
||||
| `RATE_LIMITED` | "rate limit", "quota exceeded" | Yes (with backoff) |
|
||||
| `AUTH_FAILED` | "unauthorized", "invalid api key" | No |
|
||||
| `TIMEOUT` | "context deadline exceeded" | Yes |
|
||||
| `AGENT_ERROR` | Generic error | Yes (limited retries) |
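
The table above can be read as a first-match substring lookup. The sketch below is illustrative Python, not the project's Go code: `Classification` and `classify_error` are hypothetical names, and case-insensitive matching is an assumption. The codes, triggers, and retry flags come from the table.

```python
from typing import NamedTuple


class Classification(NamedTuple):
    code: str
    retryable: bool


# Trigger substrings and retry flags taken from the table above.
_RULES = [
    ("RATE_LIMITED", ("rate limit", "quota exceeded"), True),
    ("AUTH_FAILED", ("unauthorized", "invalid api key"), False),
    ("TIMEOUT", ("context deadline exceeded",), True),
]


def classify_error(message: str) -> Classification:
    """Map an error message to a code; the first matching rule wins."""
    text = message.lower()  # assumption: matching is case-insensitive
    for code, triggers, retryable in _RULES:
        if any(trigger in text for trigger in triggers):
            return Classification(code, retryable)
    return Classification("AGENT_ERROR", True)  # generic fallback
```

Backoff for `RATE_LIMITED` and the retry cap for `AGENT_ERROR` are policy decisions that sit on top of this lookup and are not modeled here.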
|
||||
|
||||
## Queue Maintenance
|
||||
|
||||
The QueueMaintenance worker runs inside rdev-api alongside the WorkExecutor:
|
||||
- **Stale task recovery** (every 1m): Requeues tasks running >30m without completion. Also syncs build_audit status to "pending" so API correctly reflects requeued state.
|
||||
- **Stale worker marking** (every 1m): Marks workers offline after 2m without heartbeat
|
||||
- **Old task cleanup** (every 1m): Removes completed/failed/cancelled tasks >7 days old
|
||||
- **Metrics refresh** (every 15s): Updates Prometheus gauges for queue depth and worker counts
|
||||
Background goroutine in rdev-api:
|
||||
- **Stale worker marking:** Workers without heartbeat >90s → `offline`
|
||||
- **Stale task recovery:** Tasks running >30m without completion → re-queued
|
||||
- **Old task cleanup:** Completed/failed tasks >7 days → deleted
|
||||
- **Metrics refresh:** Queue depth and worker counts → Prometheus
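
The thresholds in this list can be sketched as pure predicates. This is illustrative Python rather than the Go maintenance goroutine: the function names and the status strings (`running`, `completed`, `failed`) are assumptions, and only the 90 s, 30 min, and 7 day thresholds come from the list above.

```python
from datetime import datetime, timedelta
from typing import Optional

WORKER_STALE_AFTER = timedelta(seconds=90)
TASK_STALE_AFTER = timedelta(minutes=30)
TASK_RETENTION = timedelta(days=7)


def worker_is_stale(last_heartbeat: datetime, now: datetime) -> bool:
    """True when the worker should be marked offline."""
    return now - last_heartbeat > WORKER_STALE_AFTER


def task_action(
    status: str,
    started_at: Optional[datetime],
    finished_at: Optional[datetime],
    now: datetime,
) -> str:
    """Return "requeue", "delete", or "keep" for one task."""
    if status == "running" and started_at is not None:
        if now - started_at > TASK_STALE_AFTER:
            return "requeue"  # stuck task goes back on the queue
    if status in ("completed", "failed") and finished_at is not None:
        if now - finished_at > TASK_RETENTION:
            return "delete"  # old finished task is cleaned up
    return "keep"
```

Passing `now` explicitly keeps the checks deterministic, which makes the thresholds easy to unit test.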
|
||||
|
||||
**Build Audit Sync:** When stale tasks are requeued, both `work_queue` and `build_audit` tables are updated atomically. This prevents builds from appearing stuck in "running" when the underlying task has been requeued for retry due to worker death.
|
||||
## Graceful Shutdown
|
||||
|
||||
Worker uses `sync.WaitGroup` to track in-flight tasks:
|
||||
1. Receive SIGTERM/SIGINT
|
||||
2. Cancel context (stops polling)
|
||||
3. Wait for WaitGroup with timeout (`WORKER_TASK_TIMEOUT`)
|
||||
4. Log success or timeout warning
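
Steps 2 to 4 can be sketched with an analogous pattern in Python. The worker itself is Go and uses `sync.WaitGroup`; here a `threading.Event` plays the role of the cancelled context and `concurrent.futures.wait` plays the role of the bounded WaitGroup wait. `graceful_shutdown` is a hypothetical name, and signal handling (step 1) is left out.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Iterable


def graceful_shutdown(
    stop: threading.Event,
    in_flight: Iterable[Future],
    timeout_s: float,
) -> bool:
    """Stop polling, then wait for in-flight tasks; True if all finished."""
    stop.set()  # step 2: the poll loop checks this flag and exits
    _, pending = wait(list(in_flight), timeout=timeout_s)  # step 3
    if pending:
        # step 4: timeout warning
        print(f"shutdown timed out with {len(pending)} task(s) still running")
        return False
    print("all in-flight tasks finished")  # step 4: success
    return True
```

In the deployment shown earlier, the timeout would correspond to `WORKER_TASK_TIMEOUT` (15m).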
|
||||
|
||||
## Related Topics
|
||||
|
||||
- [Work Queue](./work-queue.md)
|
||||
- [Build Orchestration](../features/build-orchestration.md)
|
||||
- [Work Queue](./work-queue.md) - Task queue implementation
|
||||
- [Build Orchestration](../features/build-orchestration.md) - Build API and specs
|
||||
- [SDLC Orchestration](./sdlc.md) - SDLC task integration
|
||||
|
||||
536
cookbooks/trees/slackpath-5-full-lifecycle.yaml
Normal file
@@ -0,0 +1,536 @@
|
||||
name: full-lifecycle
|
||||
description: "Slack Path 5: The Full Lifecycle. Tests all 10 SDLC phases with explicit artifact approvals."
|
||||
version: 1
|
||||
|
||||
vars:
|
||||
project_name: ""
|
||||
feature_slug: "user-preferences"
|
||||
feature_title: "User Preferences API"
|
||||
|
||||
steps:
|
||||
# ============================================================
|
||||
# INFRASTRUCTURE
|
||||
# ============================================================
|
||||
create-project:
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: /project
|
||||
body:
|
||||
name: "{{ .vars.project_name }}"
|
||||
description: "Slack Path 5: Full SDLC Lifecycle"
|
||||
outputs:
|
||||
- project_id: .data.name
|
||||
- domain: .data.domain
|
||||
|
||||
add-db:
|
||||
description: Add database for preferences storage
|
||||
depends_on: [create-project]
|
||||
on_error: continue
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/components"
|
||||
body:
|
||||
type: postgres
|
||||
name: "main-db"
|
||||
|
||||
add-service:
|
||||
description: Add API service
|
||||
depends_on: [add-db]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/components"
|
||||
body:
|
||||
type: service
|
||||
name: "preferences-api"
|
||||
|
||||
wait-init:
|
||||
depends_on: [add-service]
|
||||
action: wait_pipeline
|
||||
project_id: "{{ .outputs.create-project.project_id }}"
|
||||
|
||||
# ============================================================
|
||||
# PHASE 1: DRAFT
|
||||
# Create feature (starts in draft phase)
|
||||
# ============================================================
|
||||
create-feature:
|
||||
description: "Create feature in draft phase"
|
||||
depends_on: [wait-init]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features"
|
||||
body:
|
||||
slug: "{{ .vars.feature_slug }}"
|
||||
title: "{{ .vars.feature_title }}"
|
||||
outputs:
|
||||
- feature_phase: .data.phase
|
||||
|
||||
verify-draft:
|
||||
description: "Verify feature is in draft phase"
|
||||
depends_on: [create-feature]
|
||||
action: shell
|
||||
command: |
|
||||
PHASE="{{ .outputs.create-feature.feature_phase }}"
|
||||
if [ "$PHASE" = "draft" ]; then
|
||||
echo "Feature created in draft phase"
|
||||
exit 0
|
||||
else
|
||||
echo "Expected draft, got $PHASE"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# ============================================================
|
||||
# PHASE 2: DRAFT → SPECIFIED
|
||||
# Agent writes spec, API approves, transition
|
||||
# ============================================================
|
||||
write-spec:
|
||||
description: "Agent writes the spec artifact"
|
||||
depends_on: [verify-draft]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
|
||||
body:
|
||||
prompt: "/spec-feature {{ .vars.feature_slug }} --requirements 'CRUD API for user preferences. GET/PUT /preferences/{user_id}. Preferences are key-value pairs stored in DB. Support theme, language, notifications settings.'"
|
||||
auto_commit: true
|
||||
auto_push: true
|
||||
git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
|
||||
outputs:
|
||||
- build_id: .data.task_id
|
||||
|
||||
wait-spec:
|
||||
depends_on: [write-spec]
|
||||
action: wait_build
|
||||
build_id: "{{ .outputs.write-spec.build_id }}"
|
||||
max_attempts: 60
|
||||
poll_interval: 5
|
||||
|
||||
approve-spec:
|
||||
description: "API approves the spec artifact"
|
||||
depends_on: [wait-spec]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/spec/approve"
|
||||
body:
|
||||
comment: "Spec approved by automation"
|
||||
|
||||
transition-to-specified:
|
||||
description: "Transition from draft to specified"
|
||||
depends_on: [approve-spec]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
|
||||
body:
|
||||
phase: "specified"
|
||||
outputs:
|
||||
- new_phase: .data.phase
|
||||
|
||||
# ============================================================
|
||||
# PHASE 3: SPECIFIED → PLANNED
|
||||
# Agent writes design, tasks, qa_plan. API approves each.
|
||||
# ============================================================
|
||||
write-design:
|
||||
description: "Agent writes the design artifact"
|
||||
depends_on: [transition-to-specified]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
|
||||
body:
|
||||
prompt: "/design-feature {{ .vars.feature_slug }}"
|
||||
auto_commit: true
|
||||
auto_push: true
|
||||
git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
|
||||
outputs:
|
||||
- build_id: .data.task_id
|
||||
|
||||
wait-design:
|
||||
depends_on: [write-design]
|
||||
action: wait_build
|
||||
build_id: "{{ .outputs.write-design.build_id }}"
|
||||
max_attempts: 60
|
||||
poll_interval: 5
|
||||
|
||||
approve-design:
|
||||
description: "API approves the design artifact"
|
||||
depends_on: [wait-design]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/design/approve"
|
||||
body:
|
||||
comment: "Design approved by automation"
|
||||
|
||||
write-tasks:
|
||||
description: "Agent breaks down into tasks"
|
||||
depends_on: [approve-design]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
|
||||
body:
|
||||
prompt: "/breakdown-feature {{ .vars.feature_slug }}"
|
||||
auto_commit: true
|
||||
auto_push: true
|
||||
git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
|
||||
outputs:
|
||||
- build_id: .data.task_id
|
||||
|
||||
wait-tasks:
|
||||
depends_on: [write-tasks]
|
||||
action: wait_build
|
||||
build_id: "{{ .outputs.write-tasks.build_id }}"
|
||||
max_attempts: 60
|
||||
poll_interval: 5
|
||||
|
||||
approve-tasks:
|
||||
description: "API approves the tasks artifact"
|
||||
depends_on: [wait-tasks]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/tasks/approve"
|
||||
body:
|
||||
comment: "Tasks approved by automation"
|
||||
|
||||
write-qa-plan:
|
||||
description: "Agent writes QA plan"
|
||||
depends_on: [approve-tasks]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
|
||||
body:
|
||||
prompt: "/create-qa-plan {{ .vars.feature_slug }}"
|
||||
auto_commit: true
|
||||
auto_push: true
|
||||
git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
|
||||
outputs:
|
||||
- build_id: .data.task_id
|
||||
|
||||
wait-qa-plan:
|
||||
depends_on: [write-qa-plan]
|
||||
action: wait_build
|
||||
build_id: "{{ .outputs.write-qa-plan.build_id }}"
|
||||
max_attempts: 60
|
||||
poll_interval: 5
|
||||
|
||||
approve-qa-plan:
|
||||
description: "API approves the QA plan artifact"
|
||||
depends_on: [wait-qa-plan]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/qa_plan/approve"
|
||||
body:
|
||||
comment: "QA plan approved by automation"
|
||||
|
||||
transition-to-planned:
|
||||
description: "Transition from specified to planned"
|
||||
depends_on: [approve-qa-plan]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
|
||||
body:
|
||||
phase: "planned"
|
||||
outputs:
|
||||
- new_phase: .data.phase
|
||||
|
||||
# ============================================================
|
||||
# PHASE 4: PLANNED → READY
|
||||
# No new artifacts needed, just transition
|
||||
# ============================================================
|
||||
transition-to-ready:
|
||||
description: "Transition from planned to ready"
|
||||
depends_on: [transition-to-planned]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
|
||||
body:
|
||||
phase: "ready"
|
||||
outputs:
|
||||
- new_phase: .data.phase
|
||||
|
||||
# ============================================================
|
||||
# PHASE 5: READY → IMPLEMENTATION
|
||||
# Agent implements all tasks
|
||||
# ============================================================
|
||||
implement-feature:
|
||||
description: "Agent implements all tasks for the feature"
|
||||
depends_on: [transition-to-ready]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
|
||||
body:
|
||||
prompt: "/implement-feature {{ .vars.feature_slug }}"
|
||||
auto_commit: true
|
||||
auto_push: true
|
||||
git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
|
||||
outputs:
|
||||
- build_id: .data.task_id
|
||||
|
||||
wait-implement:
|
||||
depends_on: [implement-feature]
|
||||
action: wait_build
|
||||
build_id: "{{ .outputs.implement-feature.build_id }}"
|
||||
max_attempts: 120
|
||||
poll_interval: 5
|
||||
|
||||
wait-deploy-impl:
|
||||
description: "Wait for implementation to deploy"
|
||||
depends_on: [wait-implement]
|
||||
action: wait_pipeline
|
||||
project_id: "{{ .outputs.create-project.project_id }}"
|
||||
|
||||
transition-to-implementation:
|
||||
description: "Transition to implementation phase (marks code complete)"
|
||||
depends_on: [wait-deploy-impl]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
|
||||
body:
|
||||
phase: "implementation"
|
||||
outputs:
|
||||
- new_phase: .data.phase
|
||||
|
||||
# ============================================================
|
||||
# PHASE 6: IMPLEMENTATION → REVIEW
|
||||
# Agent writes code review
|
||||
# ============================================================
|
||||
write-review:
|
||||
description: "Agent writes code review"
|
||||
depends_on: [transition-to-implementation]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
|
||||
body:
|
||||
prompt: "/review-feature {{ .vars.feature_slug }}"
|
||||
auto_commit: true
|
||||
auto_push: true
|
||||
git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
|
||||
outputs:
|
||||
- build_id: .data.task_id
|
||||
|
||||
wait-review:
|
||||
depends_on: [write-review]
|
||||
action: wait_build
|
||||
build_id: "{{ .outputs.write-review.build_id }}"
|
||||
max_attempts: 60
|
||||
poll_interval: 5
|
||||
|
||||
approve-review:
|
||||
description: "API approves the review"
|
||||
depends_on: [wait-review]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/review/approve"
|
||||
body:
|
||||
comment: "Review approved by automation"
|
||||
|
||||
transition-to-review:
|
||||
description: "Transition to review phase"
|
||||
depends_on: [approve-review]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
|
||||
body:
|
||||
phase: "review"
|
||||
outputs:
|
||||
- new_phase: .data.phase
|
||||
|
||||
# ============================================================
|
||||
# PHASE 7: REVIEW → AUDIT
|
||||
# Agent writes security/architecture audit
|
||||
# ============================================================
|
||||
write-audit:
|
||||
description: "Agent writes security audit"
|
||||
depends_on: [transition-to-review]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
|
||||
body:
|
||||
prompt: "/audit-feature {{ .vars.feature_slug }}"
|
||||
auto_commit: true
|
||||
auto_push: true
|
||||
git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
|
||||
outputs:
|
||||
- build_id: .data.task_id
|
||||
|
||||
wait-audit:
|
||||
depends_on: [write-audit]
|
||||
action: wait_build
|
||||
build_id: "{{ .outputs.write-audit.build_id }}"
|
||||
max_attempts: 60
|
||||
poll_interval: 5
|
||||
|
||||
approve-audit:
|
||||
description: "API approves the audit"
|
||||
depends_on: [wait-audit]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/audit/approve"
|
||||
body:
|
||||
comment: "Audit approved by automation"
|
||||
|
||||
transition-to-audit:
|
||||
description: "Transition to audit phase"
|
||||
depends_on: [approve-audit]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
|
||||
body:
|
||||
phase: "audit"
|
||||
outputs:
|
||||
- new_phase: .data.phase
|
||||
|
||||
# ============================================================
|
||||
# PHASE 8: AUDIT → QA
|
||||
# Agent runs QA tests
|
||||
# ============================================================
|
||||
run-qa:
|
||||
description: "Agent runs QA plan"
|
||||
depends_on: [transition-to-audit]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
|
||||
body:
|
||||
prompt: "/run-qa {{ .vars.feature_slug }}"
|
||||
auto_commit: true
|
||||
auto_push: true
|
||||
git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
|
||||
outputs:
|
||||
- build_id: .data.task_id
|
||||
|
||||
wait-qa:
|
||||
depends_on: [run-qa]
|
||||
action: wait_build
|
||||
build_id: "{{ .outputs.run-qa.build_id }}"
|
||||
max_attempts: 60
|
||||
poll_interval: 5
|
||||
|
||||
transition-to-qa:
|
||||
description: "Transition to QA phase"
|
||||
depends_on: [wait-qa]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
|
||||
body:
|
||||
phase: "qa"
|
||||
outputs:
|
||||
- new_phase: .data.phase
|
||||
|
||||
# ============================================================
|
||||
# PHASE 9: QA → MERGE
|
||||
# Merge feature branch to main
|
||||
# ============================================================
|
||||
merge-feature:
|
||||
description: "Merge feature branch to main"
|
||||
depends_on: [transition-to-qa]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/merge"
|
||||
body:
|
||||
strategy: "squash"
|
||||
outputs:
|
||||
- merge_commit: .data.commit_sha
|
||||
|
||||
transition-to-merge:
|
||||
description: "Transition to merge phase"
|
||||
depends_on: [merge-feature]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
|
||||
body:
|
||||
phase: "merge"
|
||||
outputs:
|
||||
- new_phase: .data.phase
|
||||
|
||||
# ============================================================
|
||||
# PHASE 10: MERGE → RELEASED
|
||||
# Archive the feature
|
||||
# ============================================================
|
||||
wait-final-deploy:
|
||||
description: "Wait for merged code to deploy"
|
||||
depends_on: [transition-to-merge]
|
||||
action: wait_pipeline
|
||||
project_id: "{{ .outputs.create-project.project_id }}"
|
||||
|
||||
archive-feature:
|
||||
description: "Archive the completed feature"
|
||||
depends_on: [wait-final-deploy]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/archive"
|
||||
|
||||
transition-to-released:
|
||||
description: "Transition to released phase"
|
||||
depends_on: [archive-feature]
|
||||
action: api
|
||||
method: POST
|
||||
endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
|
||||
body:
|
||||
phase: "released"
|
||||
outputs:
|
||||
- final_phase: .data.phase
|
||||
|
||||
# ============================================================
|
||||
# VERIFICATION
|
||||
# ============================================================
|
||||
verify-service-running:
|
||||
description: "Verify the preferences API is running"
|
||||
depends_on: [transition-to-released]
|
||||
action: shell
|
||||
command: |
|
||||
DOMAIN="{{ .outputs.create-project.domain }}"
|
||||
HEALTH=$(curl -s "https://$DOMAIN/api/preferences-api/health" | jq -r '.data.status // empty')
|
||||
if [ "$HEALTH" = "healthy" ]; then
|
||||
echo "Service healthy"
|
||||
exit 0
|
||||
else
|
||||
echo "Service not healthy: $HEALTH"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
verify-preferences-api:
|
||||
description: "Test CRUD operations on preferences"
|
||||
depends_on: [verify-service-running]
|
||||
on_error: continue
|
||||
action: shell
|
||||
command: |
|
||||
DOMAIN="{{ .outputs.create-project.domain }}"
|
||||
BASE_URL="https://$DOMAIN/api/preferences-api"
|
||||
USER_ID="test-user-123"
|
||||
|
||||
# PUT preferences
|
||||
echo "Setting preferences..."
|
||||
PUT_RESP=$(curl -s -X PUT "$BASE_URL/preferences/$USER_ID" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"theme":"dark","language":"en","notifications":true}')
|
||||
echo "PUT response: $PUT_RESP"
|
||||
|
||||
# GET preferences
|
||||
echo "Getting preferences..."
|
||||
GET_RESP=$(curl -s "$BASE_URL/preferences/$USER_ID")
|
||||
echo "GET response: $GET_RESP"
|
||||
|
||||
# Verify theme is dark
|
||||
THEME=$(echo "$GET_RESP" | jq -r '.theme // .data.theme // empty')
|
||||
if [ "$THEME" == "dark" ]; then
|
||||
echo "Preferences API working correctly"
|
||||
exit 0
|
||||
else
|
||||
echo "Expected theme=dark, got: $THEME"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
verify-lifecycle-complete:
|
||||
description: "Verify feature reached released phase"
|
||||
depends_on: [verify-preferences-api]
|
||||
action: shell
|
||||
command: |
|
||||
FINAL_PHASE="{{ .outputs.transition-to-released.final_phase }}"
|
||||
if [ "$FINAL_PHASE" == "released" ]; then
|
||||
echo "SUCCESS: Feature completed full lifecycle (draft → released)"
|
||||
echo "All 10 phases traversed with explicit approvals"
|
||||
exit 0
|
||||
else
|
||||
echo "FAIL: Expected released, got $FINAL_PHASE"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
teardown:
|
||||
- action: api
|
||||
method: DELETE
|
||||
endpoint: "/project/{{ .outputs.create-project.project_id }}"
|
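Note on `verify-service-running`: the step probes the health endpoint once, right after the final deploy. If the service needs a moment to become ready, a bounded retry around the same probe may be less flaky. The following is a minimal sketch and not part of this commit: `retry` and `check_health` are hypothetical helper names, and it assumes the step runs in a POSIX-compatible shell with `curl` and `jq` available, as the cookbook already does.

```shell
#!/bin/sh
# retry MAX DELAY CMD [ARGS...]
# Runs CMD until it succeeds or MAX attempts have been made,
# sleeping DELAY seconds between attempts. (Hypothetical helper.)
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# check_health mirrors the curl + jq probe used by the cookbook step.
# It expects DOMAIN to be set by the caller. (Hypothetical helper.)
check_health() {
  status=$(curl -s "https://$DOMAIN/api/preferences-api/health" | jq -r '.data.status // empty')
  [ "$status" = "healthy" ]
}

# Usage inside the step's command block (left commented out here):
#   DOMAIN="{{ .outputs.create-project.domain }}"
#   retry 10 5 check_health || { echo "Service not healthy"; exit 1; }
```

The attempt count and delay in the usage line are placeholders; suitable values depend on how long the deployment takes to report healthy.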
@@ -6,9 +6,11 @@ namespace: rdev
 resources:
   - namespace.yaml
 
+  # Storage classes (must be applied before PVCs)
+  - storageclass-rwx.yaml
+
   # Shared worker claudebox (runs all project builds)
   - pvc.yaml
-  - pvc-shared-claude.yaml
   - claudebox.yaml
   - configmaps.yaml
 
@@ -1,29 +0,0 @@
-# Shared Claude credentials PVC
-# v0.6 - All claudebox pods share this for auth
-# Commands/skills/agents live in /workspace/.claude (per-project, in git)
-#
-# IMPORTANT: ReadWriteMany (RWX) requires Longhorn with NFS enabled.
-# Verify with: kubectl get settings -n longhorn-system rwx-volume-fast-failover
-# If RWX is not available, either:
-# 1. Enable Longhorn NFS: kubectl apply -f longhorn-nfs-provisioner.yaml
-# 2. Or use separate PVCs per pod (revert to per-project claude-config PVCs)
-#
-# RWX is needed because multiple claudebox pods mount this simultaneously
-# to share Claude authentication credentials.
-
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: claudebox-shared-claude-config
-  namespace: rdev
-  labels:
-    app.kubernetes.io/name: claudebox
-    app.kubernetes.io/part-of: rdev
-    rdev.orchard9.ai/type: shared-config
-spec:
-  accessModes:
-    - ReadWriteMany # Multiple pods can mount simultaneously
-  storageClassName: longhorn
-  resources:
-    requests:
-      storage: 1Gi
@@ -14,6 +14,12 @@ spec:
     requests:
       storage: 20Gi
 ---
+# Claude config PVC - shared across claudebox and worker pods
+# RWX (ReadWriteMany) allows multiple pods to mount simultaneously
+# Contains Claude subscription OAuth credentials (~/.claude)
+#
+# IMPORTANT: Requires longhorn-rwx StorageClass (see storageclass-rwx.yaml)
+# After recreating this PVC, re-authenticate with: claude login
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
@@ -22,10 +28,11 @@ metadata:
   labels:
     app.kubernetes.io/name: claudebox
     app.kubernetes.io/part-of: rdev
+    rdev.orchard9.ai/type: shared-config
 spec:
   accessModes:
-    - ReadWriteOnce
-  storageClassName: longhorn
+    - ReadWriteMany
+  storageClassName: longhorn-rwx
   resources:
     requests:
       storage: 1Gi
@@ -10,10 +10,13 @@ metadata:
     app.kubernetes.io/part-of: rdev
 spec:
   replicas: 1
-  # Recreate strategy required: claudebox-claude-config PVC is RWO (ReadWriteOnce)
-  # and cannot be attached to multiple pods simultaneously
+  # RollingUpdate enabled by RWX (ReadWriteMany) PVC for claude-config
+  # See: deployments/k8s/base/pvc.yaml and storageclass-rwx.yaml
   strategy:
-    type: Recreate
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 2
+      maxUnavailable: 0
   selector:
     matchLabels:
       app: rdev-worker
24
deployments/k8s/base/storageclass-rwx.yaml
Normal file
@@ -0,0 +1,24 @@
+# RWX (ReadWriteMany) StorageClass for shared volumes
+# Enables multiple pods to mount the same PVC simultaneously
+# Used for: claudebox-claude-config (shared Claude auth credentials)
+#
+# Prerequisites:
+# - Longhorn 1.4.0+ with NFS support
+# - Verify: kubectl get settings -n longhorn-system | grep -i rwx
+#
+# If RWX is not available, enable it:
+# kubectl patch -n longhorn-system settings rwx-volume-fast-failover --type merge -p '{"value":"true"}'
+
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: longhorn-rwx
+  labels:
+    app.kubernetes.io/part-of: rdev
+provisioner: driver.longhorn.io
+allowVolumeExpansion: true
+reclaimPolicy: Retain
+parameters:
+  numberOfReplicas: "2"
+  staleReplicaTimeout: "30"
+  nfsOptions: "vers=4.1,noresvport"