feat: add RWX storage class and full SDLC lifecycle cookbook

- Add longhorn-rwx StorageClass for RWX volume support
- Add slackpath-5-full-lifecycle.yaml cookbook tree (all 10 SDLC phases)
- Update worker-pool.md documentation
- Consolidate PVC configuration, remove separate pvc-shared-claude.yaml
- Update rdev-worker and kustomization for new PVC structure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jordan 2026-02-06 11:37:57 -07:00
parent d74efb75ff
commit bc010c4746
8 changed files with 746 additions and 89 deletions


@@ -14,7 +14,7 @@ Quick reference for rdev concepts and facts.
 | Webhooks | [services/webhooks.md](./services/webhooks.md) | High | 2025-01 | Event subscriptions and delivery |
 | **Worker Infrastructure** |
 | Work Queue | [services/work-queue.md](./services/work-queue.md) | High | 2025-01 | Task queue for worker pool |
-| Worker Pool | [services/worker-pool.md](./services/worker-pool.md) | High | 2026-01 | Embedded work executor with queue maintenance and metrics |
+| Worker Pool | [services/worker-pool.md](./services/worker-pool.md) | High | 2026-02 | Standalone worker pods with claudebox sidecar, HTTP polling |
 | External Health | [services/external-health.md](./services/external-health.md) | High | 2026-02 | Background health monitoring of registry, CI, git |
 | CI Provider | [services/ci-provider.md](./services/ci-provider.md) | High | 2025-01 | Woodpecker auto-activation |
 | DNS / Cloudflare | [services/dns-cloudflare.md](./services/dns-cloudflare.md) | High | 2026-01 | Domain management for threesix.ai |


@@ -1,79 +1,193 @@
 # Worker Pool
-**Last Updated:** 2026-01-31
+**Last Updated:** 2026-02-06
 **Confidence:** High
 ## Summary
-Shared worker pool that executes build tasks for any project. Currently runs as an embedded WorkExecutor daemon inside rdev-api. Workers register with the worker registry, poll the work queue for tasks, execute Claude Code in pods via kubectl exec. Post-build git operations (commit/push) are programmatic via PodGitOperations, not LLM-driven.
+Distributed task execution system where standalone worker pods poll rdev-api for tasks and execute them via a claudebox sidecar. Supports horizontal scaling by adding more worker pods.
 **Key Facts:**
-- **LLM vs rdev boundary:** Claude writes code; rdev handles git ops programmatically (no LLM for runbook tasks)
-- Embedded WorkExecutor daemon runs inside rdev-api process
-- Workers poll work queue every 5 seconds, heartbeat every 30 seconds
-- Stale workers (no heartbeat for 2 minutes) automatically marked offline by QueueMaintenance
-- Stale tasks (running >30 min without completion) automatically requeued
-- Old tasks (>7 days) automatically cleaned up
-- Queue depth and worker counts exported as Prometheus metrics
-- Future: external worker binary for separate pod deployment
+- **Architecture:** Pull-based polling (not push/websocket)
+- **Sidecar pattern:** Worker + claudebox in same pod, communicate via localhost HTTP
+- **Atomic dequeue:** PostgreSQL `FOR UPDATE SKIP LOCKED` prevents duplicate claims
+- **Task types:** `build` (Claude Code prompts), `sdlc` (SDLC commands)
+- **Scaling:** Add replicas to handle more concurrent tasks
+- **Resilience:** Stale workers marked offline, stuck tasks re-queued automatically
-**File Pointers:**
+## File Pointers
-- Domain: `internal/domain/worker.go` (Worker, WorkerStatus)
-- Domain: `internal/domain/build.go` (BuildSpec, BuildResult)
-- Port: `internal/port/worker_registry.go` (WorkerRegistry interface)
-- Port: `internal/port/build_audit.go` (BuildAudit interface)
-- Adapter: `internal/adapter/postgres/worker_registry.go`
-- Adapter: `internal/adapter/postgres/build_audit.go`
-- Service: `internal/service/worker_service.go`
-- Service: `internal/service/build_service.go`
-- Executor: `internal/worker/work_executor.go` (poll loop, heartbeat, task routing)
-- Executor: `internal/worker/build_executor.go` (BuildSpec→AgentRequest)
-- Git: `internal/worker/pod_git_operations.go` (post-build commit/push via kubectl exec)
-- Maintenance: `internal/worker/queue_maintenance.go` (stale recovery, cleanup, metrics)
-- Handler: `internal/handlers/workers.go` (REST API for workers)
-- Handler: `internal/handlers/builds.go` (REST API for builds)
-- Handler: `internal/handlers/create_and_build.go` (combined create+build)
-- Migration: `internal/db/migrations/012_worker_registry.sql`
-## Worker Lifecycle (Embedded)
+### Standalone Worker Binary
+- **Entry:** `cmd/rdev-worker/main.go` - Main binary, registration, heartbeat, poll loop
+- **API Client:** `internal/worker/api_client.go` - HTTP client to rdev-api
+- **Build Executor:** `internal/worker/http_build_executor.go` - Execute builds via claudebox
+- **SDLC Executor:** `internal/worker/http_sdlc_executor.go` - Execute SDLC tasks via claudebox
-1. rdev-api starts → WorkExecutor registers as worker in registry
-2. Heartbeat loop: every 30s sends heartbeat via WorkerService
-3. Poll loop: every 5s dequeues next task from work queue
-4. BuildExecutor: executes CodeAgent in pod, then programmatically commits/pushes if auto_commit
-5. Reports completion with BuildResult via WorkerService
-6. Graceful shutdown: deregisters worker on rdev-api stop
+### Claudebox Sidecar Client
+- **Client:** `internal/adapter/claudebox/client.go` - HTTP client to claudebox sidecar
+- **Endpoints:** `/health`, `/execute`, `/git/clone`, `/git/commit-and-push`, `/sdlc`
+### rdev-api Server-Side
+- **Handlers:** `internal/handlers/workers.go` - `/workers/*` endpoints
+- **Service:** `internal/service/worker_service.go` - Claim, complete, fail logic
+- **Registry:** `internal/adapter/postgres/worker_registry.go` - Worker state persistence
+- **Queue:** `internal/adapter/postgres/work_queue.go` - Task queue with atomic dequeue
+### Domain
+- **Worker:** `internal/domain/worker.go` - Worker, WorkerStatus
+- **Task:** `internal/domain/work.go` - WorkTask, WorkTaskType, WorkTaskStatus
+- **Build:** `internal/domain/build.go` - BuildSpec, BuildResult
+### Kubernetes
+- **Deployment:** `deployments/k8s/base/rdev-worker.yaml` - Worker + claudebox pod spec
+## Architecture
+```
+┌─────────────────────┐ HTTP Polling (5s) ┌──────────────────────────┐
+│ rdev-api │◄────────────────────────────────►│ Worker Pod │
+│ │ │ ┌─────────┐ ┌─────────┐ │
+│ POST /workers/register ← Register at startup │ │ worker │→│claudebox│ │
+│ POST /workers/{id}/heartbeat ← Every 30s │ └─────────┘ └─────────┘ │
+│ POST /workers/{id}/claim ← Poll for tasks │ ↓ HTTP localhost │
+│ POST /workers/{id}/complete/{taskId} ← Success │ Claude Code execution │
+│ POST /workers/{id}/fail/{taskId} ← Failure └──────────────────────────┘
+│ │
+│ PostgreSQL │
+│ ├─ workers │ (worker registry)
+│ ├─ work_queue │ (task queue)
+│ └─ build_audit │ (execution history)
+└─────────────────────┘
+```
+## Worker Lifecycle
+1. **Register:** Worker pod starts → `POST /workers/register` with ID, hostname, capabilities
+2. **Heartbeat:** Every 30s → `POST /workers/{id}/heartbeat` to stay alive
+3. **Poll:** Every 5s → `POST /workers/{id}/claim` to get next task
+4. **Execute:** Call claudebox sidecar HTTP API to run Claude Code / SDLC commands
+5. **Report:** `POST /workers/{id}/complete/{taskId}` or `/fail/{taskId}` with results
+6. **Shutdown:** Graceful wait for in-flight tasks via `sync.WaitGroup`
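The heartbeat and poll steps of the lifecycle above can be sketched as two independent tickers in one loop. This is a minimal illustration, not the code in `cmd/rdev-worker/main.go`: the `WorkerAPI` struct, `runWorker`, and `simulate` are hypothetical names, and the in-memory API stands in for the real HTTP client.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// WorkerAPI holds the two calls the loop needs. The shape is hypothetical;
// the real client lives in internal/worker/api_client.go.
type WorkerAPI struct {
	Heartbeat func(ctx context.Context) error
	Claim     func(ctx context.Context) (taskID string, ok bool, err error)
}

// runWorker heartbeats and polls on separate tickers until ctx is cancelled.
func runWorker(ctx context.Context, api WorkerAPI, pollEvery, heartbeatEvery time.Duration, execute func(taskID string)) {
	poll := time.NewTicker(pollEvery)
	defer poll.Stop()
	beat := time.NewTicker(heartbeatEvery)
	defer beat.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-beat.C:
			if err := api.Heartbeat(ctx); err != nil {
				fmt.Println("heartbeat failed:", err)
			}
		case <-poll.C:
			taskID, ok, err := api.Claim(ctx)
			if err != nil {
				fmt.Println("claim failed:", err)
			} else if ok {
				execute(taskID)
			}
		}
		// Stop promptly if the context was cancelled while handling a tick.
		if ctx.Err() != nil {
			return
		}
	}
}

// simulate runs the loop against an in-memory API that hands out n tasks
// and then cancels the context. It returns the executed task IDs.
func simulate(n int) []string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	issued := 0
	api := WorkerAPI{
		Heartbeat: func(context.Context) error { return nil },
		Claim: func(context.Context) (string, bool, error) {
			issued++
			if issued >= n {
				cancel()
			}
			return fmt.Sprintf("task-%d", issued), true, nil
		},
	}
	var done []string
	runWorker(ctx, api, time.Millisecond, time.Hour, func(id string) { done = append(done, id) })
	return done
}

func main() {
	fmt.Println(simulate(3)) // [task-1 task-2 task-3]
}
```

In a deployment matching this document, `pollEvery` and `heartbeatEvery` would be 5s and 30s; the millisecond interval here only keeps the simulation fast.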
 ## Worker Statuses
-- `idle` - available for new tasks
-- `busy` - currently executing a task
-- `draining` - not accepting new tasks (pre-shutdown)
-- `offline` - missed heartbeat threshold
+| Status | Meaning |
+|--------|---------|
+| `idle` | Ready to claim new tasks |
+| `busy` | Currently executing a task |
+| `draining` | Not accepting new tasks (pre-shutdown) |
+| `offline` | Missed heartbeat threshold (>90s) |
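A small sketch of how the status table might map to code, assuming the 90s threshold stated above. The type and function names are illustrative rather than taken from `internal/domain/worker.go`, and the rule that only `idle` workers may claim is an assumption inferred from the table.

```go
package main

import (
	"fmt"
	"time"
)

// WorkerStatus mirrors the four statuses in the table (names are illustrative).
type WorkerStatus string

const (
	StatusIdle     WorkerStatus = "idle"
	StatusBusy     WorkerStatus = "busy"
	StatusDraining WorkerStatus = "draining"
	StatusOffline  WorkerStatus = "offline"
)

// heartbeatThreshold is the ">90s" value from the table.
const heartbeatThreshold = 90 * time.Second

// effectiveStatus returns offline once the last heartbeat is older than the
// threshold; otherwise it returns the status the worker last reported.
func effectiveStatus(reported WorkerStatus, sinceHeartbeat time.Duration) WorkerStatus {
	if sinceHeartbeat > heartbeatThreshold {
		return StatusOffline
	}
	return reported
}

// canClaim assumes that only idle workers are handed new tasks.
func canClaim(s WorkerStatus) bool {
	return s == StatusIdle
}

func main() {
	fmt.Println(effectiveStatus(StatusBusy, 30*time.Second)) // busy
	fmt.Println(effectiveStatus(StatusIdle, 2*time.Minute))  // offline
	fmt.Println(canClaim(StatusDraining))                    // false
}
```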
+## Task Types
+### Build Tasks (`WorkTaskTypeBuild`)
+Execute Claude Code prompts with optional git operations.
+**Spec:**
+```json
+{
+  "prompt": "Build a React app with...",
+  "auto_commit": true,
+  "auto_push": false,
+  "git_clone_url": "https://gitea.../repo.git"
+}
+```
+**Execution Flow:**
+1. Clone repo via `claudebox /git/clone`
+2. Execute prompt via `claudebox /execute` (streaming)
+3. Commit/push via `claudebox /git/commit-and-push`
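To make the build flow concrete, the sketch below decodes a spec with the four JSON fields shown above and lists the claudebox endpoints a worker would call in order. It is an illustration under assumptions: the Go struct is not copied from `internal/domain/build.go`, the clone URL is a placeholder, and skipping the clone or commit step when `git_clone_url` is empty or `auto_commit` is false is a guess at the "optional git operations" behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BuildSpec carries the JSON fields from the spec example (struct is illustrative).
type BuildSpec struct {
	Prompt      string `json:"prompt"`
	AutoCommit  bool   `json:"auto_commit"`
	AutoPush    bool   `json:"auto_push"`
	GitCloneURL string `json:"git_clone_url"`
}

// planBuild returns the claudebox endpoints to call, in order.
// Assumption: clone and commit are skipped when not requested.
func planBuild(spec BuildSpec) []string {
	var steps []string
	if spec.GitCloneURL != "" {
		steps = append(steps, "/git/clone")
	}
	steps = append(steps, "/execute")
	if spec.AutoCommit {
		steps = append(steps, "/git/commit-and-push")
	}
	return steps
}

// parseAndPlan decodes a raw JSON spec and plans its steps.
func parseAndPlan(raw string) ([]string, error) {
	var spec BuildSpec
	if err := json.Unmarshal([]byte(raw), &spec); err != nil {
		return nil, fmt.Errorf("decode build spec: %w", err)
	}
	return planBuild(spec), nil
}

func main() {
	// The clone URL is a placeholder, not a real repository.
	raw := `{"prompt": "Build a React app", "auto_commit": true, "auto_push": false, "git_clone_url": "https://git.example.com/repo.git"}`
	steps, err := parseAndPlan(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(steps) // [/git/clone /execute /git/commit-and-push]
}
```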
+### SDLC Tasks (`WorkTaskTypeSDLC`)
+Execute SDLC CLI commands.
+**Spec:**
+```json
+{
+  "command": "feature",
+  "args": ["init", "feature-name"],
+  "git_clone_url": "https://gitea.../repo.git"
+}
+```
+**Execution Flow:**
+1. Clone repo via `claudebox /git/clone`
+2. Run SDLC command via `claudebox /sdlc`
+3. Commit/push changes
 ## API Endpoints
 | Method | Path | Description |
 |--------|------|-------------|
-| GET | `/workers` | List all workers with status summary |
-| GET | `/workers/{workerId}` | Get worker details |
-| POST | `/workers/{workerId}/drain` | Set worker to draining |
-| POST | `/projects/{id}/builds` | Start build for project |
-| GET | `/projects/{id}/builds` | List builds for project |
-| GET | `/builds/{taskId}` | Get build status |
-| POST | `/project/create-and-build` | Create project + start build |
+| POST | `/workers/register` | Register new worker |
+| POST | `/workers/{id}/heartbeat` | Keep worker alive |
+| POST | `/workers/{id}/claim` | Claim next available task (204 if none) |
+| POST | `/workers/{id}/complete/{taskId}` | Report successful completion |
+| POST | `/workers/{id}/fail/{taskId}` | Report failure |
+| GET | `/workers` | List all workers |
+| GET | `/workers/{id}` | Get worker details |
+| POST | `/workers/{id}/drain` | Set worker to draining |
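The claim endpoint's "204 if none" contract is the one a polling client has to handle explicitly. Below is a hedged sketch of such a client: the `Task` response shape and the `rdev-api.invalid` host are assumptions, and `stubClient` replaces the network with an in-process `http.RoundTripper` so the example runs without a server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// Task is a hypothetical response body; the real payload is not shown in this document.
type Task struct {
	ID   string `json:"id"`
	Type string `json:"type"`
}

// claimTask POSTs to /workers/{id}/claim and returns (nil, nil) on 204 No Content.
func claimTask(c *http.Client, baseURL, workerID string) (*Task, error) {
	resp, err := c.Post(baseURL+"/workers/"+workerID+"/claim", "application/json", nil)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNoContent:
		return nil, nil // queue is empty; poll again later
	case http.StatusOK:
		var t Task
		if err := json.NewDecoder(resp.Body).Decode(&t); err != nil {
			return nil, fmt.Errorf("decode task: %w", err)
		}
		return &t, nil
	default:
		return nil, fmt.Errorf("claim: unexpected status %d", resp.StatusCode)
	}
}

// roundTripFunc adapts a function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// stubClient returns a client that answers every request with a fixed response.
func stubClient(status int, body string) *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: status,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader(body)),
			Request:    r,
		}, nil
	})}
}

func main() {
	task, err := claimTask(stubClient(http.StatusNoContent, ""), "http://rdev-api.invalid", "worker-1")
	fmt.Println(task == nil, err) // true <nil>

	task, err = claimTask(stubClient(http.StatusOK, `{"id":"t1","type":"build"}`), "http://rdev-api.invalid", "worker-1")
	fmt.Println(task.ID, task.Type, err) // t1 build <nil>
}
```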
+## Kubernetes Deployment
+```yaml
+# deployments/k8s/base/rdev-worker.yaml
+spec:
+  replicas: 1 # Scale by increasing
+  strategy:
+    type: RollingUpdate # RWX PVC enables multi-pod mounts
+    rollingUpdate:
+      maxSurge: 2
+      maxUnavailable: 0
+  containers:
+    - name: worker
+      image: registry.threesix.ai/rdev/worker:latest
+      env:
+        - RDEV_API_URL: http://rdev-api.rdev.svc.cluster.local:8080
+        - CLAUDEBOX_URL: http://localhost:8080
+        - WORKER_POLL_INTERVAL: 5s
+        - WORKER_HEARTBEAT_INTERVAL: 30s
+        - WORKER_TASK_TIMEOUT: 15m
+    - name: claudebox
+      image: registry.threesix.ai/rdev/claudebox:latest
+      volumeMounts:
+        - /workspace (EmptyDir)
+        - /root/.claude (RWX PVC - shared Claude auth)
+```
+**Storage:** The `claudebox-claude-config` PVC uses `ReadWriteMany` (RWX) access mode with Longhorn NFS, allowing multiple worker pods to share Claude OAuth credentials.
+## Error Classification
+Failed tasks are classified for smart retry logic:
+| Code | Trigger | Retryable |
+|------|---------|-----------|
+| `RATE_LIMITED` | "rate limit", "quota exceeded" | Yes (with backoff) |
+| `AUTH_FAILED` | "unauthorized", "invalid api key" | No |
+| `TIMEOUT` | "context deadline exceeded" | Yes |
+| `AGENT_ERROR` | Generic error | Yes (limited retries) |
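The classification table reads naturally as a substring match over the error message. The sketch below is one way that could look; case-insensitive substring matching and the `Classification` type are assumptions, and backoff and retry limits are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Classification pairs an error code from the table with its retry decision.
type Classification struct {
	Code      string
	Retryable bool
}

// classify maps an error message to a code using the trigger phrases in the table.
// Matching is case-insensitive here, which is an assumption.
func classify(msg string) Classification {
	m := strings.ToLower(msg)
	switch {
	case strings.Contains(m, "rate limit"), strings.Contains(m, "quota exceeded"):
		return Classification{Code: "RATE_LIMITED", Retryable: true}
	case strings.Contains(m, "unauthorized"), strings.Contains(m, "invalid api key"):
		return Classification{Code: "AUTH_FAILED", Retryable: false}
	case strings.Contains(m, "context deadline exceeded"):
		return Classification{Code: "TIMEOUT", Retryable: true}
	default:
		return Classification{Code: "AGENT_ERROR", Retryable: true}
	}
}

func main() {
	for _, msg := range []string{
		"429: Rate limit reached",
		"401 Unauthorized",
		"execute: context deadline exceeded",
		"exit status 1",
	} {
		c := classify(msg)
		fmt.Printf("%-36q -> %s (retryable=%v)\n", msg, c.Code, c.Retryable)
	}
}
```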
 ## Queue Maintenance
-The QueueMaintenance worker runs inside rdev-api alongside the WorkExecutor:
+Background goroutine in rdev-api:
-- **Stale task recovery** (every 1m): Requeues tasks running >30m without completion. Also syncs build_audit status to "pending" so API correctly reflects requeued state.
-- **Stale worker marking** (every 1m): Marks workers offline after 2m without heartbeat
-- **Old task cleanup** (every 1m): Removes completed/failed/cancelled tasks >7 days old
-- **Metrics refresh** (every 15s): Updates Prometheus gauges for queue depth and worker counts
+- **Stale worker marking:** Workers without heartbeat >90s → `offline`
+- **Stale task recovery:** Tasks running >30m without completion → re-queued
+- **Old task cleanup:** Completed/failed tasks >7 days → deleted
+- **Metrics refresh:** Queue depth and worker counts → Prometheus
-**Build Audit Sync:** When stale tasks are requeued, both `work_queue` and `build_audit` tables are updated atomically. This prevents builds from appearing stuck in "running" when the underlying task has been requeued for retry due to worker death.
+## Graceful Shutdown
+Worker uses `sync.WaitGroup` to track in-flight tasks:
+1. Receive SIGTERM/SIGINT
+2. Cancel context (stops polling)
+3. Wait for WaitGroup with timeout (`WORKER_TASK_TIMEOUT`)
+4. Log success or timeout warning
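Step 3 of the shutdown sequence, waiting on a `sync.WaitGroup` with a deadline, needs a small helper because `WaitGroup.Wait` has no timeout of its own. The following is a sketch with hypothetical helper names (`waitWithTimeout`, `drain`). Signal handling is omitted; a real worker could obtain the cancellable context from `signal.NotifyContext`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitWithTimeout reports whether wg finished before the timeout elapsed.
func waitWithTimeout(wg *sync.WaitGroup, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// drain simulates one in-flight task of the given duration and waits for it
// up to timeout, as a worker would after it stops polling.
func drain(taskDuration, timeout time.Duration) bool {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(taskDuration)
	}()
	return waitWithTimeout(&wg, timeout)
}

func main() {
	if drain(10*time.Millisecond, time.Second) {
		fmt.Println("all in-flight tasks finished")
	}
	if !drain(time.Second, 10*time.Millisecond) {
		fmt.Println("warning: timed out waiting for in-flight tasks")
	}
}
```

With the configuration shown in this document, the timeout passed to the helper would be `WORKER_TASK_TIMEOUT` (15m).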
 ## Related Topics
-- [Work Queue](./work-queue.md)
-- [Build Orchestration](../features/build-orchestration.md)
+- [Work Queue](./work-queue.md) - Task queue implementation
+- [Build Orchestration](../features/build-orchestration.md) - Build API and specs
+- [SDLC Orchestration](./sdlc.md) - SDLC task integration


@@ -0,0 +1,536 @@
+name: full-lifecycle
+description: "Slack Path 5: The Full Lifecycle. Tests all 10 SDLC phases with explicit artifact approvals."
+version: 1
+vars:
+  project_name: ""
+  feature_slug: "user-preferences"
+  feature_title: "User Preferences API"
+steps:
+  # ============================================================
+  # INFRASTRUCTURE
+  # ============================================================
+  create-project:
+    action: api
+    method: POST
+    endpoint: /project
+    body:
+      name: "{{ .vars.project_name }}"
+      description: "Slack Path 5: Full SDLC Lifecycle"
+    outputs:
+      - project_id: .data.name
+      - domain: .data.domain
+  add-db:
+    description: Add database for preferences storage
+    depends_on: [create-project]
+    on_error: continue
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/components"
+    body:
+      type: postgres
+      name: "main-db"
+  add-service:
+    description: Add API service
+    depends_on: [add-db]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/components"
+    body:
+      type: service
+      name: "preferences-api"
+  wait-init:
+    depends_on: [add-service]
+    action: wait_pipeline
+    project_id: "{{ .outputs.create-project.project_id }}"
+  # ============================================================
+  # PHASE 1: DRAFT
+  # Create feature (starts in draft phase)
+  # ============================================================
+  create-feature:
+    description: "Create feature in draft phase"
+    depends_on: [wait-init]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features"
+    body:
+      slug: "{{ .vars.feature_slug }}"
+      title: "{{ .vars.feature_title }}"
+    outputs:
+      - feature_phase: .data.phase
+  verify-draft:
+    description: "Verify feature is in draft phase"
+    depends_on: [create-feature]
+    action: shell
+    command: |
+      PHASE="{{ .outputs.create-feature.feature_phase }}"
+      if [ "$PHASE" == "draft" ]; then
+        echo "Feature created in draft phase"
+        exit 0
+      else
+        echo "Expected draft, got $PHASE"
+        exit 1
+      fi
+  # ============================================================
+  # PHASE 2: DRAFT → SPECIFIED
+  # Agent writes spec, API approves, transition
+  # ============================================================
+  write-spec:
+    description: "Agent writes the spec artifact"
+    depends_on: [verify-draft]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
+    body:
+      prompt: "/spec-feature {{ .vars.feature_slug }} --requirements 'CRUD API for user preferences. GET/PUT /preferences/{user_id}. Preferences are key-value pairs stored in DB. Support theme, language, notifications settings.'"
+      auto_commit: true
+      auto_push: true
+      git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
+    outputs:
+      - build_id: .data.task_id
+  wait-spec:
+    depends_on: [write-spec]
+    action: wait_build
+    build_id: "{{ .outputs.write-spec.build_id }}"
+    max_attempts: 60
+    poll_interval: 5
+  approve-spec:
+    description: "API approves the spec artifact"
+    depends_on: [wait-spec]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/spec/approve"
+    body:
+      comment: "Spec approved by automation"
+  transition-to-specified:
+    description: "Transition from draft to specified"
+    depends_on: [approve-spec]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
+    body:
+      phase: "specified"
+    outputs:
+      - new_phase: .data.phase
+  # ============================================================
+  # PHASE 3: SPECIFIED → PLANNED
+  # Agent writes design, tasks, qa_plan. API approves each.
+  # ============================================================
+  write-design:
+    description: "Agent writes the design artifact"
+    depends_on: [transition-to-specified]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
+    body:
+      prompt: "/design-feature {{ .vars.feature_slug }}"
+      auto_commit: true
+      auto_push: true
+      git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
+    outputs:
+      - build_id: .data.task_id
+  wait-design:
+    depends_on: [write-design]
+    action: wait_build
+    build_id: "{{ .outputs.write-design.build_id }}"
+    max_attempts: 60
+    poll_interval: 5
+  approve-design:
+    description: "API approves the design artifact"
+    depends_on: [wait-design]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/design/approve"
+    body:
+      comment: "Design approved by automation"
+  write-tasks:
+    description: "Agent breaks down into tasks"
+    depends_on: [approve-design]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
+    body:
+      prompt: "/breakdown-feature {{ .vars.feature_slug }}"
+      auto_commit: true
+      auto_push: true
+      git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
+    outputs:
+      - build_id: .data.task_id
+  wait-tasks:
+    depends_on: [write-tasks]
+    action: wait_build
+    build_id: "{{ .outputs.write-tasks.build_id }}"
+    max_attempts: 60
+    poll_interval: 5
+  approve-tasks:
+    description: "API approves the tasks artifact"
+    depends_on: [wait-tasks]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/tasks/approve"
+    body:
+      comment: "Tasks approved by automation"
+  write-qa-plan:
+    description: "Agent writes QA plan"
+    depends_on: [approve-tasks]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
+    body:
+      prompt: "/create-qa-plan {{ .vars.feature_slug }}"
+      auto_commit: true
+      auto_push: true
+      git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
+    outputs:
+      - build_id: .data.task_id
+  wait-qa-plan:
+    depends_on: [write-qa-plan]
+    action: wait_build
+    build_id: "{{ .outputs.write-qa-plan.build_id }}"
+    max_attempts: 60
+    poll_interval: 5
+  approve-qa-plan:
+    description: "API approves the QA plan artifact"
+    depends_on: [wait-qa-plan]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/qa_plan/approve"
+    body:
+      comment: "QA plan approved by automation"
+  transition-to-planned:
+    description: "Transition from specified to planned"
+    depends_on: [approve-qa-plan]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
+    body:
+      phase: "planned"
+    outputs:
+      - new_phase: .data.phase
+  # ============================================================
+  # PHASE 4: PLANNED → READY
+  # No new artifacts needed, just transition
+  # ============================================================
+  transition-to-ready:
+    description: "Transition from planned to ready"
+    depends_on: [transition-to-planned]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
+    body:
+      phase: "ready"
+    outputs:
+      - new_phase: .data.phase
+  # ============================================================
+  # PHASE 5: READY → IMPLEMENTATION
+  # Agent implements all tasks
+  # ============================================================
+  implement-feature:
+    description: "Agent implements all tasks for the feature"
+    depends_on: [transition-to-ready]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
+    body:
+      prompt: "/implement-feature {{ .vars.feature_slug }}"
+      auto_commit: true
+      auto_push: true
+      git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
+    outputs:
+      - build_id: .data.task_id
+  wait-implement:
+    depends_on: [implement-feature]
+    action: wait_build
+    build_id: "{{ .outputs.implement-feature.build_id }}"
+    max_attempts: 120
+    poll_interval: 5
+  wait-deploy-impl:
+    description: "Wait for implementation to deploy"
+    depends_on: [wait-implement]
+    action: wait_pipeline
+    project_id: "{{ .outputs.create-project.project_id }}"
+  transition-to-implementation:
+    description: "Transition to implementation phase (marks code complete)"
+    depends_on: [wait-deploy-impl]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
+    body:
+      phase: "implementation"
+    outputs:
+      - new_phase: .data.phase
+  # ============================================================
+  # PHASE 6: IMPLEMENTATION → REVIEW
+  # Agent writes code review
+  # ============================================================
+  write-review:
+    description: "Agent writes code review"
+    depends_on: [transition-to-implementation]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
+    body:
+      prompt: "/review-feature {{ .vars.feature_slug }}"
+      auto_commit: true
+      auto_push: true
+      git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
+    outputs:
+      - build_id: .data.task_id
+  wait-review:
+    depends_on: [write-review]
+    action: wait_build
+    build_id: "{{ .outputs.write-review.build_id }}"
+    max_attempts: 60
+    poll_interval: 5
+  approve-review:
+    description: "API approves the review"
+    depends_on: [wait-review]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/review/approve"
+    body:
+      comment: "Review approved by automation"
+  transition-to-review:
+    description: "Transition to review phase"
+    depends_on: [approve-review]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
+    body:
+      phase: "review"
+    outputs:
+      - new_phase: .data.phase
+  # ============================================================
+  # PHASE 7: REVIEW → AUDIT
+  # Agent writes security/architecture audit
+  # ============================================================
+  write-audit:
+    description: "Agent writes security audit"
+    depends_on: [transition-to-review]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
+    body:
+      prompt: "/audit-feature {{ .vars.feature_slug }}"
+      auto_commit: true
+      auto_push: true
+      git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
+    outputs:
+      - build_id: .data.task_id
+  wait-audit:
+    depends_on: [write-audit]
+    action: wait_build
+    build_id: "{{ .outputs.write-audit.build_id }}"
+    max_attempts: 60
+    poll_interval: 5
+  approve-audit:
+    description: "API approves the audit"
+    depends_on: [wait-audit]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/artifacts/audit/approve"
+    body:
+      comment: "Audit approved by automation"
+  transition-to-audit:
+    description: "Transition to audit phase"
+    depends_on: [approve-audit]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
+    body:
+      phase: "audit"
+    outputs:
+      - new_phase: .data.phase
+  # ============================================================
+  # PHASE 8: AUDIT → QA
+  # Agent runs QA tests
+  # ============================================================
+  run-qa:
+    description: "Agent runs QA plan"
+    depends_on: [transition-to-audit]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/builds"
+    body:
+      prompt: "/run-qa {{ .vars.feature_slug }}"
+      auto_commit: true
+      auto_push: true
+      git_clone_url: "https://git.threesix.ai/jordan/{{ .outputs.create-project.project_id }}.git"
+    outputs:
+      - build_id: .data.task_id
+  wait-qa:
+    depends_on: [run-qa]
+    action: wait_build
+    build_id: "{{ .outputs.run-qa.build_id }}"
+    max_attempts: 60
+    poll_interval: 5
+  transition-to-qa:
+    description: "Transition to QA phase"
+    depends_on: [wait-qa]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
+    body:
+      phase: "qa"
+    outputs:
+      - new_phase: .data.phase
+  # ============================================================
+  # PHASE 9: QA → MERGE
+  # Merge feature branch to main
+  # ============================================================
+  merge-feature:
+    description: "Merge feature branch to main"
+    depends_on: [transition-to-qa]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/merge"
+    body:
+      strategy: "squash"
+    outputs:
+      - merge_commit: .data.commit_sha
+  transition-to-merge:
+    description: "Transition to merge phase"
+    depends_on: [merge-feature]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
+    body:
+      phase: "merge"
+    outputs:
+      - new_phase: .data.phase
+  # ============================================================
+  # PHASE 10: MERGE → RELEASED
+  # Archive the feature
+  # ============================================================
+  wait-final-deploy:
+    description: "Wait for merged code to deploy"
+    depends_on: [transition-to-merge]
+    action: wait_pipeline
+    project_id: "{{ .outputs.create-project.project_id }}"
+  archive-feature:
+    description: "Archive the completed feature"
+    depends_on: [wait-final-deploy]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/archive"
+  transition-to-released:
+    description: "Transition to released phase"
+    depends_on: [archive-feature]
+    action: api
+    method: POST
+    endpoint: "/projects/{{ .outputs.create-project.project_id }}/sdlc/features/{{ .vars.feature_slug }}/transition"
+    body:
+      phase: "released"
+    outputs:
+      - final_phase: .data.phase
+  # ============================================================
+  # VERIFICATION
+  # ============================================================
+  verify-service-running:
+    description: "Verify the preferences API is running"
+    depends_on: [transition-to-released]
+    action: shell
+    command: |
+      DOMAIN="{{ .outputs.create-project.domain }}"
+      HEALTH=$(curl -s "https://$DOMAIN/api/preferences-api/health" | jq -r '.data.status // empty')
+      if [ "$HEALTH" == "healthy" ]; then
+        echo "Service healthy"
+        exit 0
+      else
+        echo "Service not healthy: $HEALTH"
+        exit 1
+      fi
+  verify-preferences-api:
+    description: "Test CRUD operations on preferences"
+    depends_on: [verify-service-running]
+    on_error: continue
+    action: shell
+    command: |
+      DOMAIN="{{ .outputs.create-project.domain }}"
+      BASE_URL="https://$DOMAIN/api/preferences-api"
+      USER_ID="test-user-123"
+      # PUT preferences
+      echo "Setting preferences..."
+      PUT_RESP=$(curl -s -X PUT "$BASE_URL/preferences/$USER_ID" \
+        -H "Content-Type: application/json" \
+        -d '{"theme":"dark","language":"en","notifications":true}')
+      echo "PUT response: $PUT_RESP"
+      # GET preferences
+      echo "Getting preferences..."
+      GET_RESP=$(curl -s "$BASE_URL/preferences/$USER_ID")
+      echo "GET response: $GET_RESP"
+      # Verify theme is dark
+      THEME=$(echo "$GET_RESP" | jq -r '.theme // .data.theme // empty')
+      if [ "$THEME" == "dark" ]; then
+        echo "Preferences API working correctly"
+        exit 0
+      else
+        echo "Expected theme=dark, got: $THEME"
+        exit 1
+      fi
+  verify-lifecycle-complete:
+    description: "Verify feature reached released phase"
+    depends_on: [verify-preferences-api]
+    action: shell
+    command: |
+      FINAL_PHASE="{{ .outputs.transition-to-released.final_phase }}"
+      if [ "$FINAL_PHASE" == "released" ]; then
+        echo "SUCCESS: Feature completed full lifecycle (draft → released)"
+        echo "All 10 phases traversed with explicit approvals"
+        exit 0
+      else
+        echo "FAIL: Expected released, got $FINAL_PHASE"
+        exit 1
+      fi
+teardown:
+  - action: api
+    method: DELETE
+    endpoint: "/project/{{ .outputs.create-project.project_id }}"


@@ -6,9 +6,11 @@ namespace: rdev
 resources:
   - namespace.yaml
+  # Storage classes (must be applied before PVCs)
+  - storageclass-rwx.yaml
   # Shared worker claudebox (runs all project builds)
   - pvc.yaml
-  - pvc-shared-claude.yaml
   - claudebox.yaml
   - configmaps.yaml


@@ -1,29 +0,0 @@
-# Shared Claude credentials PVC
-# v0.6 - All claudebox pods share this for auth
-# Commands/skills/agents live in /workspace/.claude (per-project, in git)
-#
-# IMPORTANT: ReadWriteMany (RWX) requires Longhorn with NFS enabled.
-# Verify with: kubectl get settings -n longhorn-system rwx-volume-fast-failover
-# If RWX is not available, either:
-# 1. Enable Longhorn NFS: kubectl apply -f longhorn-nfs-provisioner.yaml
-# 2. Or use separate PVCs per pod (revert to per-project claude-config PVCs)
-#
-# RWX is needed because multiple claudebox pods mount this simultaneously
-# to share Claude authentication credentials.
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: claudebox-shared-claude-config
-  namespace: rdev
-  labels:
-    app.kubernetes.io/name: claudebox
-    app.kubernetes.io/part-of: rdev
-    rdev.orchard9.ai/type: shared-config
-spec:
-  accessModes:
-    - ReadWriteMany # Multiple pods can mount simultaneously
-  storageClassName: longhorn
-  resources:
-    requests:
-      storage: 1Gi


@@ -14,6 +14,12 @@ spec:
     requests:
       storage: 20Gi
 ---
+# Claude config PVC - shared across claudebox and worker pods
+# RWX (ReadWriteMany) allows multiple pods to mount simultaneously
+# Contains Claude subscription OAuth credentials (~/.claude)
+#
+# IMPORTANT: Requires longhorn-rwx StorageClass (see storageclass-rwx.yaml)
+# After recreating this PVC, re-authenticate with: claude login
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
@@ -22,10 +28,11 @@ metadata:
   labels:
     app.kubernetes.io/name: claudebox
     app.kubernetes.io/part-of: rdev
+    rdev.orchard9.ai/type: shared-config
 spec:
   accessModes:
-    - ReadWriteOnce
-  storageClassName: longhorn
+    - ReadWriteMany
+  storageClassName: longhorn-rwx
   resources:
     requests:
       storage: 1Gi


@@ -10,10 +10,13 @@ metadata:
     app.kubernetes.io/part-of: rdev
 spec:
   replicas: 1
-  # Recreate strategy required: claudebox-claude-config PVC is RWO (ReadWriteOnce)
-  # and cannot be attached to multiple pods simultaneously
+  # RollingUpdate enabled by RWX (ReadWriteMany) PVC for claude-config
+  # See: deployments/k8s/base/pvc.yaml and storageclass-rwx.yaml
   strategy:
-    type: Recreate
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 2
+      maxUnavailable: 0
   selector:
     matchLabels:
       app: rdev-worker


@@ -0,0 +1,24 @@
+# RWX (ReadWriteMany) StorageClass for shared volumes
+# Enables multiple pods to mount the same PVC simultaneously
+# Used for: claudebox-claude-config (shared Claude auth credentials)
+#
+# Prerequisites:
+# - Longhorn 1.4.0+ with NFS support
+# - Verify: kubectl get settings -n longhorn-system | grep -i rwx
+#
+# If RWX is not available, enable it:
+# kubectl patch -n longhorn-system settings rwx-volume-fast-failover --type merge -p '{"value":"true"}'
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: longhorn-rwx
+  labels:
+    app.kubernetes.io/part-of: rdev
+provisioner: driver.longhorn.io
+allowVolumeExpansion: true
+reclaimPolicy: Retain
+parameters:
+  numberOfReplicas: "2"
+  staleReplicaTimeout: "30"
+  nfsOptions: "vers=4.1,noresvport"