Worker Executor Implementation Plan
Close the last gap in the landing page cookbook: automated code generation via the worker pool.
Context
The work queue, worker registry, build audit, and code agent systems are all implemented. The single missing piece is a work executor — a background loop that consumes queued tasks and executes them via a code agent. This is analogous to the existing QueueProcessor (which processes per-project command queue tasks), but for the generic WorkQueue (cross-project worker pool tasks).
What Already Exists
| Component | File | Status |
|---|---|---|
| Work queue (PostgreSQL) | internal/adapter/postgres/work_queue.go | Done |
| Worker registry (PostgreSQL) | internal/adapter/postgres/worker_registry.go | Done |
| Build audit (PostgreSQL) | internal/adapter/postgres/build_audit.go | Done |
| WorkService (enqueue/dequeue/complete/fail) | internal/service/work_service.go | Done |
| WorkerService (claim/complete/health) | internal/service/worker_service.go | Done |
| BuildService (start/status/complete) | internal/service/build_service.go | Done |
| WorkHandler (REST API) | internal/handlers/work.go | Done |
| AgentsHandler (REST API) | internal/handlers/agents.go | Done |
| CodeAgent interface | internal/port/code_agent.go | Done |
| Domain models (WorkTask, Worker, BuildSpec) | internal/domain/ | Done |
| Command QueueProcessor (reference pattern) | internal/worker/queue_processor.go | Done |
What's Missing
| Gap | Priority |
|---|---|
| Work executor daemon (poll loop) | Critical |
| BuildSpec → AgentRequest translation | Critical |
| Git clone/commit/push in executor | Critical |
| Git credential resolution for cross-project | High |
| Worker management REST endpoints | Medium |
| DNS alias endpoint | Medium |
| Create-and-build endpoint | Medium |
| Woodpecker build status proxy | Low |
Week 1: Work Executor Core
Goal: A background loop that claims tasks from the work queue and executes them via a code agent. By end of week, POST /work/enqueue → task claimed → agent executes → result recorded.
Tasks
- Create internal/worker/work_executor.go
  - Follow the QueueProcessor pattern from queue_processor.go
  - Poll loop: calls WorkerService.ClaimTask(workerID) on a ticker
  - On task claim: route to appropriate handler based on task.Type
  - On completion: call WorkerService.CompleteTask(workerID, taskID, result)
  - On failure: call WorkService.FailTask(taskID, errMsg) (handles retry logic)
  - Graceful shutdown via context cancellation
  - Self-registers as a worker via WorkerService.Register() on start
  - Sends heartbeats via WorkerService.Heartbeat() on a 30s ticker
- Create internal/worker/build_executor.go
  - Handles WorkTaskTypeBuild tasks specifically
  - Extracts BuildSpec fields from WorkTask.Spec (map[string]any → typed fields)
  - Translates BuildSpec.Prompt into domain.AgentRequest
  - Calls CodeAgent.Execute() with event streaming
  - Collects output, files changed, duration into domain.BuildResult
  - Returns BuildResult to the work executor
- Wire into cmd/rdev-api/main.go
  - Create WorkExecutor alongside existing QueueProcessor
  - Inject: WorkerService, BuildService, CodeAgentRegistry
  - Start on boot, stop on shutdown
  - Worker ID: hostname or pod name (from HOSTNAME env var)
- Create internal/worker/work_executor_test.go
  - Test: executor starts and registers as a worker
  - Test: executor claims a task and routes to build handler
  - Test: build handler translates spec and calls code agent
  - Test: results are recorded via CompleteTask
  - Test: failures trigger FailTask with retry
  - Test: graceful shutdown stops the poll loop
  - Use mock implementations of ports
Deliverables
- POST /work/enqueue with a build task → executor picks it up → agent runs → result in GET /work/{taskId}
- Worker visible in registry during execution
- Build audit entry created with spec and result
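The spec translation step (map[string]any → typed fields → agent request) from the build executor task might look like the sketch below. The struct fields and the "prompt" key are assumptions; git_url, auto_commit, and auto_push are the names this plan uses elsewhere. The real domain.BuildSpec and domain.AgentRequest may carry more fields.

```go
package main

import (
	"errors"
	"fmt"
)

// BuildSpec holds only the fields this plan mentions (hypothetical shape).
type BuildSpec struct {
	Prompt     string
	GitURL     string
	AutoCommit bool
	AutoPush   bool
}

// AgentRequest is a hypothetical stand-in for domain.AgentRequest.
type AgentRequest struct {
	Prompt  string
	WorkDir string
}

// stringField reads an optional string value and rejects other types.
func stringField(spec map[string]any, key string) (string, error) {
	v, ok := spec[key]
	if !ok {
		return "", nil
	}
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("spec field %q: want string, got %T", key, v)
	}
	return s, nil
}

// boolField reads an optional bool value and rejects other types.
func boolField(spec map[string]any, key string) (bool, error) {
	v, ok := spec[key]
	if !ok {
		return false, nil
	}
	b, ok := v.(bool)
	if !ok {
		return false, fmt.Errorf("spec field %q: want bool, got %T", key, v)
	}
	return b, nil
}

// ParseBuildSpec converts the untyped task spec into typed fields.
func ParseBuildSpec(spec map[string]any) (BuildSpec, error) {
	var out BuildSpec
	var err error
	if out.Prompt, err = stringField(spec, "prompt"); err != nil {
		return BuildSpec{}, err
	}
	if out.GitURL, err = stringField(spec, "git_url"); err != nil {
		return BuildSpec{}, err
	}
	if out.AutoCommit, err = boolField(spec, "auto_commit"); err != nil {
		return BuildSpec{}, err
	}
	if out.AutoPush, err = boolField(spec, "auto_push"); err != nil {
		return BuildSpec{}, err
	}
	if out.Prompt == "" {
		return BuildSpec{}, errors.New(`spec field "prompt" is required`)
	}
	return out, nil
}

// ToAgentRequest builds the agent request for a given working directory.
func ToAgentRequest(spec BuildSpec, workDir string) AgentRequest {
	return AgentRequest{Prompt: spec.Prompt, WorkDir: workDir}
}
```

Rejecting wrong types at parse time means a malformed spec fails before the agent starts, so the error can be recorded via FailTask without consuming agent time.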
Files Created/Modified
| File | Action |
|---|---|
| internal/worker/work_executor.go | Create |
| internal/worker/build_executor.go | Create |
| internal/worker/work_executor_test.go | Create |
| cmd/rdev-api/main.go | Modify (wire executor) |
Week 2: Git Operations & Cross-Project Execution
Goal: The executor can clone any project's repo, run the agent in that directory, and push results back. By end of week, the full build cycle works: enqueue → clone → agent generates code → commit → push → CI triggers.
Tasks
- Create internal/worker/pod_git_operations.go ✅ IMPLEMENTED
  - CommitAndPush(ctx, podName, workDir, message, push) *PostBuildResult
  - Runs git commands inside the pod via kubectl exec (not locally)
  - Post-build phase: Claude writes code, then rdev programmatically commits/pushes
  - Follows "LLM vs rdev" principle: LLMs generate code, rdev handles deterministic ops
- Add git credential resolution to BuildExecutor
  - Option A (simplest): Use the Gitea token already in InfraConfig.GiteaToken
    - All project repos are in Gitea, so one token covers all repos
    - Pass token via HTTPS clone URL: https://token@git.threesix.ai/org/repo.git
  - Option B (per-project): Look up project's git URL from database, resolve credentials
  - Recommendation: Option A, because the Gitea token is already loaded and available
- Integrate git ops into BuildExecutor
  - Before agent execution: clone the project's repo to a temp directory
  - Look up project git URL from database (add ProjectStore port or query directly)
  - After agent execution: if auto_commit is true, commit changes
  - After commit: if auto_push is true, push to remote
  - Capture commit_sha and files_changed in BuildResult
- Add project git URL lookup
  - The ProjectInfraService stores git URLs in the database during CreateProject
  - Add a method to retrieve git info by project ID
  - Or: include git_url in the WorkTask.Spec at enqueue time (simpler, no extra lookup)
- Test pod git operations
  - Integration test via cookbook scripts
  - Verify commit is created in pod workspace
  - Verify push succeeds via kubectl exec
- Integration test
  - Enqueue a build task with a real prompt
  - Verify agent executes in cloned repo
  - Verify commit is created (if auto_commit)
  - Verify push succeeds (if auto_push)
  - Verify BuildResult has correct fields
Deliverables
- Full build cycle: enqueue → clone → execute → commit → push
- Git credentials resolved from infrastructure config
- Temp workspace created and cleaned per task
- Build audit shows commit SHA and files changed
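Option A's token-in-URL approach can be sketched with the standard net/url package. The function names are hypothetical; the token is placed in the username position, matching the https://token@git.threesix.ai/org/repo.git form given in the tasks.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// AuthenticatedCloneURL embeds the token as the URL username.
// It refuses non-HTTPS URLs so the token is never sent in clear text.
func AuthenticatedCloneURL(repoURL, token string) (string, error) {
	if token == "" {
		return "", errors.New("empty git token")
	}
	u, err := url.Parse(repoURL)
	if err != nil {
		return "", fmt.Errorf("parse repo URL: %w", err)
	}
	if u.Scheme != "https" {
		return "", fmt.Errorf("refusing to embed a token in a %q URL", u.Scheme)
	}
	u.User = url.User(token)
	return u.String(), nil
}

// StripCredentials removes user info so the URL is safe to log or to
// store in the build audit.
func StripCredentials(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	u.User = nil
	return u.String()
}
```

Because the token sits in the username position, url.URL.Redacted would not hide it (that method only masks passwords), which is why the sketch strips the user info entirely before logging.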
Files Created/Modified
| File | Action |
|---|---|
| internal/worker/pod_git_operations.go | Create ✅ |
| internal/worker/build_executor.go | Modify (add git integration) |
| internal/worker/work_executor.go | Modify (pass git config) |
| cmd/rdev-api/main.go | Modify (pass gitea token to executor) |
Week 3: API Enhancements
Goal: Add the REST endpoints that complete the platform experience. By end of week, users can create a project, enqueue a build, monitor CI status, and manage DNS — all through rdev-api.
Tasks
- Worker management endpoints (internal/handlers/workers.go)
  - GET /workers: list all workers with status
  - GET /workers/{id}: get worker details
  - POST /workers/{id}/drain: drain a worker
  - Wire WorkerService into handler
  - Register in cmd/rdev-api/main.go and openapi.go
- Build management endpoints (internal/handlers/builds.go)
  - POST /projects/{id}/builds: enqueue a build (wraps BuildService.StartBuild())
  - GET /projects/{id}/builds: list build history
  - GET /projects/{id}/builds/{taskId}: get build status
  - Simpler API than raw /work/enqueue: project-scoped, build-specific
  - Register in cmd/rdev-api/main.go and openapi.go
- DNS alias endpoint (internal/handlers/infrastructure.go)
  - POST /projects/{id}/domains: add DNS alias (A or CNAME record)
  - GET /projects/{id}/domains: list domains for project
  - DELETE /projects/{id}/domains/{domain}: remove alias
  - Uses existing Cloudflare adapter's CreateRecord() and DeleteRecordByName()
  - The adapter already supports full CRUD; it just needs a handler
- Woodpecker build status proxy (internal/handlers/ci.go)
  - GET /projects/{id}/ci/pipelines: list recent Woodpecker pipelines
  - GET /projects/{id}/ci/pipelines/{number}: get pipeline details
  - Add ListPipelines() and GetPipeline() to port.CIProvider
  - Implement in internal/adapter/woodpecker/client.go using Woodpecker SDK
  - Low priority: can defer if time is tight
- Create-and-build endpoint (internal/handlers/project_management.go)
  - POST /project/create-and-build
  - Request: { name, description, template, prompt, auto_push }
  - Calls ProjectInfraService.CreateProject() then BuildService.StartBuild()
  - Returns project info + task ID
  - Trivial once executor is working
- Tests for all new handlers
  - Follow existing patterns in handlers/*_test.go
  - Test request validation, success paths, error handling
Deliverables
- POST /projects/{id}/builds as the clean API for code generation
- GET /workers for monitoring the worker pool
- POST /projects/{id}/domains for DNS aliases
- POST /project/create-and-build for the single-call flow
- All endpoints documented in openapi.go
Files Created/Modified
| File | Action |
|---|---|
| internal/handlers/workers.go | Create |
| internal/handlers/workers_test.go | Create |
| internal/handlers/builds.go | Create |
| internal/handlers/builds_test.go | Create |
| internal/handlers/infrastructure.go | Modify (add domain endpoints) |
| internal/handlers/ci.go | Create (if time) |
| internal/handlers/project_management.go | Modify (add create-and-build) |
| internal/adapter/woodpecker/client.go | Modify (add pipeline methods, if time) |
| internal/port/ci.go or port updates | Modify (add pipeline interface, if time) |
| cmd/rdev-api/main.go | Modify (wire new handlers) |
| cmd/rdev-api/openapi.go | Modify (add routes to spec) |
Week 4: Polish, Validation & Observability
Goal: End-to-end validation of the cookbook flow. Observability for production operation. Documentation updated.
Tasks
- End-to-end cookbook validation
  - Run the landing page cookbook flow from start to finish
  - POST /project with astro-landing template
  - POST /projects/landing/builds with customization prompt
  - Monitor via GET /work/{taskId}/status
  - Verify CI triggers on push
  - Verify site is live at https://landing.threesix.ai
  - Fix any issues found during validation
- Stale task recovery
  - Add periodic RequeueStale() call to the work executor
  - Requeue tasks where the worker crashed mid-execution
  - Add periodic CleanupOld() call to remove ancient completed tasks
  - These methods exist on WorkQueue but nothing calls them
- Observability additions
  - Add metrics to work executor: tasks_claimed, tasks_completed, tasks_failed, execution_duration
  - Add metrics to worker service: workers_registered, workers_idle, workers_busy
  - Follow existing pattern in internal/metrics/metrics.go
  - Add work executor health to readiness check (GET /ready)
- Queue maintenance worker
  - Create internal/worker/queue_maintenance.go
  - Runs on a slower ticker (every 5 minutes)
  - Calls RequeueStale(ctx, 10*time.Minute): requeue tasks running > 10min with no heartbeat
  - Calls CleanupOld(ctx, 7*24*time.Hour): prune tasks older than 7 days
  - Wire into main.go
- Update documentation
  - Update cookbooks/landing-page.md with final validated flow
  - Update ai-lookup/features/build-orchestration.md
  - Update ai-lookup/services/worker-pool.md
  - Add .claude/guides/services/build-orchestration.md if needed
- Update CLAUDE.md roadmap
  - Mark "Work Queue" as implemented
  - Mark "Worker Pool" as implemented
  - Mark "Build Orchestration" as implemented
  - Update "Bot Communication" status
Deliverables
- Cookbook flow works end-to-end without manual intervention (except code generation prompt)
- Stale task recovery running in production
- Metrics visible in /metrics endpoint
- All documentation reflects actual capabilities
Files Created/Modified
| File | Action |
|---|---|
| internal/worker/queue_maintenance.go | Create |
| internal/metrics/metrics.go | Modify (add work executor metrics) |
| internal/handlers/health.go | Modify (add executor health) |
| cookbooks/landing-page.md | Modify (final validation) |
| ai-lookup/features/build-orchestration.md | Modify |
| ai-lookup/services/worker-pool.md | Modify |
| CLAUDE.md | Modify (update roadmap) |
| cmd/rdev-api/main.go | Modify (wire maintenance worker) |
Risk & Dependencies
| Risk | Mitigation |
|---|---|
| CodeAgent execution in a temp directory (not a K8s pod) may not work the same as in-pod execution | Test early in Week 1; fallback is to kubectl exec into a worker pod |
| Gitea token may lack permissions for new repos created by different users | Test with actual token; all repos should be in the same org |
| Agent execution may take longer than expected (10+ minutes for complex prompts) | Make timeout configurable; increase default |
| Worker process crash loses in-flight task | Stale requeue (Week 4) handles this automatically |
| 500-line file limit may require splitting new files | Plan for split from the start; work_executor.go + build_executor.go + pod_git_operations.go keeps things modular |
Architecture Decision: In-Process vs External Worker
The plan above implements the executor in-process (running inside the rdev-api binary). This is simpler and matches the existing QueueProcessor pattern. The alternative would be a separate worker binary, which would allow independent scaling. The in-process approach is the right starting point — it can be extracted into a separate binary later if scaling requires it.
Summary
| Week | Focus | Key Deliverable |
|---|---|---|
| 1 | Work executor core | Tasks flow from queue → agent → result |
| 2 | Git operations | Clone → execute → commit → push cycle |
| 3 | API enhancements | Build, worker, DNS, create-and-build endpoints |
| 4 | Polish & validation | E2E cookbook flow, observability, docs |