rdev/docs/plans/worker-executor-breakdown.md
jordan bc47e426b0 feat: Add CI pipeline proxy, DNS alias management, and worker executor system
- Add ListPipelines/GetPipeline to CIProvider port with Woodpecker adapter
- Add DNS alias endpoints: GET/POST/DELETE /projects/{id}/domains
- Implement worker executor daemon, build executor, and git operations
- Add build service, worker service, and build audit tracking
- Add worker registry with PostgreSQL adapter and migration
- Add multi-provider code agent interface (Claude Code + OpenCode)
- Add create-and-build combo endpoint
- Update landing-page cookbook to reflect all gaps closed
- Fix tech debt: unified validation, auth scopes, error wrapping, slog patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 21:05:28 -07:00

Worker Executor Implementation Plan

Close the last gap in the landing page cookbook: automated code generation via the worker pool.

Context

The work queue, worker registry, build audit, and code agent systems are all implemented. The single missing piece is a work executor — a background loop that consumes queued tasks and executes them via a code agent. This is analogous to the existing QueueProcessor (which processes per-project command queue tasks), but for the generic WorkQueue (cross-project worker pool tasks).

What Already Exists

| Component | File | Status |
|---|---|---|
| Work queue (PostgreSQL) | internal/adapter/postgres/work_queue.go | Done |
| Worker registry (PostgreSQL) | internal/adapter/postgres/worker_registry.go | Done |
| Build audit (PostgreSQL) | internal/adapter/postgres/build_audit.go | Done |
| WorkService (enqueue/dequeue/complete/fail) | internal/service/work_service.go | Done |
| WorkerService (claim/complete/health) | internal/service/worker_service.go | Done |
| BuildService (start/status/complete) | internal/service/build_service.go | Done |
| WorkHandler (REST API) | internal/handlers/work.go | Done |
| AgentsHandler (REST API) | internal/handlers/agents.go | Done |
| CodeAgent interface | internal/port/code_agent.go | Done |
| Domain models (WorkTask, Worker, BuildSpec) | internal/domain/ | Done |
| Command QueueProcessor (reference pattern) | internal/worker/queue_processor.go | Done |

What's Missing

| Gap | Priority |
|---|---|
| Work executor daemon (poll loop) | Critical |
| BuildSpec → AgentRequest translation | Critical |
| Git clone/commit/push in executor | Critical |
| Git credential resolution for cross-project execution | High |
| Worker management REST endpoints | Medium |
| DNS alias endpoint | Medium |
| Create-and-build endpoint | Medium |
| Woodpecker build status proxy | Low |

Week 1: Work Executor Core

Goal: A background loop that claims tasks from the work queue and executes them via a code agent. By end of week, POST /work/enqueue → task claimed → agent executes → result recorded.

Tasks

  1. Create internal/worker/work_executor.go

    • Follow the QueueProcessor pattern from queue_processor.go
    • Poll loop: calls WorkerService.ClaimTask(workerID) on a ticker
    • On task claim: route to appropriate handler based on task.Type
    • On completion: call WorkerService.CompleteTask(workerID, taskID, result)
    • On failure: call WorkService.FailTask(taskID, errMsg) (handles retry logic)
    • Graceful shutdown via context cancellation
    • Self-registers as a worker via WorkerService.Register() on start
    • Sends heartbeats via WorkerService.Heartbeat() on a 30s ticker
  2. Create internal/worker/build_executor.go

    • Handles WorkTaskTypeBuild tasks specifically
    • Extracts BuildSpec fields from WorkTask.Spec (map[string]any → typed fields)
    • Translates BuildSpec.Prompt into domain.AgentRequest
    • Calls CodeAgent.Execute() with event streaming
    • Collects output, files changed, duration into domain.BuildResult
    • Returns BuildResult to the work executor
  3. Wire into cmd/rdev-api/main.go

    • Create WorkExecutor alongside existing QueueProcessor
    • Inject: WorkerService, WorkService (needed for FailTask), BuildService, CodeAgentRegistry
    • Start on boot, stop on shutdown
    • Worker ID: hostname or pod name (from HOSTNAME env var)
  4. Create internal/worker/work_executor_test.go

    • Test: executor starts and registers as a worker
    • Test: executor claims a task and routes to build handler
    • Test: build handler translates spec and calls code agent
    • Test: results are recorded via CompleteTask
    • Test: failures trigger FailTask with retry
    • Test: graceful shutdown stops the poll loop
    • Use mock implementations of ports

Deliverables

  • POST /work/enqueue with a build task → executor picks it up → agent runs → result in GET /work/{taskId}
  • Worker visible in registry during execution
  • Build audit entry created with spec and result
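
Task 2 above converts WorkTask.Spec (map[string]any) into typed fields. One way to do that is a JSON round trip, sketched below. The `BuildSpec` shape and the `project_id` and `prompt` keys are assumptions; only `auto_commit` and `auto_push` are named in this plan, and the real domain.BuildSpec may carry more fields.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// BuildSpec is a hypothetical typed view of WorkTask.Spec.
type BuildSpec struct {
	ProjectID  string `json:"project_id"`
	Prompt     string `json:"prompt"`
	AutoCommit bool   `json:"auto_commit"`
	AutoPush   bool   `json:"auto_push"`
}

// ParseBuildSpec converts the untyped spec map into a BuildSpec.
// A value of the wrong type (for example a numeric prompt) surfaces
// as an unmarshal error instead of being silently dropped.
func ParseBuildSpec(spec map[string]any) (BuildSpec, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return BuildSpec{}, fmt.Errorf("encode spec: %w", err)
	}
	var bs BuildSpec
	if err := json.Unmarshal(raw, &bs); err != nil {
		return BuildSpec{}, fmt.Errorf("decode spec: %w", err)
	}
	if bs.Prompt == "" {
		return BuildSpec{}, errors.New("build spec: prompt is required")
	}
	return bs, nil
}
```

The round trip costs an extra allocation per task, which is negligible next to an agent run, and it avoids a chain of hand-written type assertions.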

Files Created/Modified

| File | Action |
|---|---|
| internal/worker/work_executor.go | Create |
| internal/worker/build_executor.go | Create |
| internal/worker/work_executor_test.go | Create |
| cmd/rdev-api/main.go | Modify (wire executor) |

Week 2: Git Operations & Cross-Project Execution

Goal: The executor can clone any project's repo, run the agent in that directory, and push results back. By end of week, the full build cycle works: enqueue → clone → agent generates code → commit → push → CI triggers.

Tasks

  1. Create internal/worker/git_operations.go

    • CloneRepo(ctx, gitURL, dir, token) error — clone via HTTPS with token auth
    • CommitAndPush(ctx, dir, message) (commitSHA string, filesChanged []string, err error)
    • ConfigureGit(dir, name, email) — set git user for commits
    • Uses os/exec for git commands (same pattern as kubernetes.Executor uses for kubectl)
    • Workspace management: creates temp dir per task, cleans up after
  2. Add git credential resolution to BuildExecutor

    • Option A (simplest): Use the Gitea token already in InfraConfig.GiteaToken
      • All project repos are in Gitea, so one token covers all repos
      • Pass token via HTTPS clone URL: https://token@git.threesix.ai/org/repo.git
    • Option B (per-project): Look up project's git URL from database, resolve credentials
    • Recommendation: Option A — the Gitea token is already loaded and available
  3. Integrate git ops into BuildExecutor

    • Before agent execution: clone the project's repo to a temp directory
    • Look up project git URL from database (add ProjectStore port or query directly)
    • After agent execution: if auto_commit is true, commit changes
    • After commit: if auto_push is true, push to remote
    • Capture commit_sha and files_changed in BuildResult
  4. Add project git URL lookup

    • The ProjectInfraService stores git URLs in the database during CreateProject
    • Add a method to retrieve git info by project ID
    • Or: include git_url in the WorkTask.Spec at enqueue time (simpler, no extra lookup)
  5. Create internal/worker/git_operations_test.go

    • Test: clone with token auth
    • Test: commit and push
    • Test: workspace cleanup on success and failure
    • Test: git URL construction with token
  6. Integration test

    • Enqueue a build task with a real prompt
    • Verify agent executes in cloned repo
    • Verify commit is created (if auto_commit)
    • Verify push succeeds (if auto_push)
    • Verify BuildResult has correct fields
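
For the clone step and Option A's token-in-URL approach, a minimal sketch using net/url and os/exec might look like this. `AuthURL` is a hypothetical helper name, and the token redaction in error output is a best-effort addition that is not part of the plan.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/url"
	"os/exec"
	"strings"
)

// AuthURL embeds the token as the URL user, giving
// https://<token>@host/org/repo.git as described in Option A.
func AuthURL(gitURL, token string) (string, error) {
	if token == "" {
		return "", errors.New("git token is empty")
	}
	u, err := url.Parse(gitURL)
	if err != nil {
		return "", fmt.Errorf("parse git url: %w", err)
	}
	if u.Scheme != "https" || u.Host == "" {
		return "", fmt.Errorf("git url must be https, got %q", gitURL)
	}
	u.User = url.User(token)
	return u.String(), nil
}

// CloneRepo clones gitURL into dir over HTTPS with token auth.
func CloneRepo(ctx context.Context, gitURL, dir, token string) error {
	authURL, err := AuthURL(gitURL, token)
	if err != nil {
		return err
	}
	out, err := exec.CommandContext(ctx, "git", "clone", authURL, dir).CombinedOutput()
	if err != nil {
		// Best-effort redaction so the token does not end up in logs.
		msg := strings.ReplaceAll(string(out), token, "***")
		return fmt.Errorf("git clone: %w: %s", err, strings.TrimSpace(msg))
	}
	return nil
}
```

Note that a token embedded in the clone URL is also written to the clone's .git/config as the origin URL, which is one more reason to delete the workspace after each task.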

Deliverables

  • Full build cycle: enqueue → clone → execute → commit → push
  • Git credentials resolved from infrastructure config
  • Temp workspace created and cleaned per task
  • Build audit shows commit SHA and files changed
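
The commit SHA and changed-file list can be captured with plain git commands. The sketch below keeps the `CommitAndPush` signature from task 1; the `git` and `nonEmptyLines` helpers are hypothetical, and the choice of `git diff --cached --name-only` to list staged files is an assumption about how files_changed is computed.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// git runs one git command in dir and returns its trimmed stdout.
func git(ctx context.Context, dir string, args ...string) (string, error) {
	cmd := exec.CommandContext(ctx, "git", args...)
	cmd.Dir = dir
	var stdout, stderr bytes.Buffer
	cmd.Stdout, cmd.Stderr = &stdout, &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("git %s: %w: %s", args[0], err, strings.TrimSpace(stderr.String()))
	}
	return strings.TrimSpace(stdout.String()), nil
}

// nonEmptyLines splits command output into its non-blank lines.
func nonEmptyLines(out string) []string {
	var lines []string
	for _, l := range strings.Split(out, "\n") {
		if l = strings.TrimSpace(l); l != "" {
			lines = append(lines, l)
		}
	}
	return lines
}

// CommitAndPush stages everything, commits, and pushes the current branch.
// It returns an empty SHA and no files when the agent changed nothing.
func CommitAndPush(ctx context.Context, dir, message string) (string, []string, error) {
	if _, err := git(ctx, dir, "add", "-A"); err != nil {
		return "", nil, err
	}
	staged, err := git(ctx, dir, "diff", "--cached", "--name-only")
	if err != nil {
		return "", nil, err
	}
	files := nonEmptyLines(staged)
	if len(files) == 0 {
		return "", nil, nil // nothing to commit, so skip commit and push
	}
	if _, err := git(ctx, dir, "commit", "-m", message); err != nil {
		return "", nil, err
	}
	sha, err := git(ctx, dir, "rev-parse", "HEAD")
	if err != nil {
		return "", nil, err
	}
	if _, err := git(ctx, dir, "push", "origin", "HEAD"); err != nil {
		return sha, files, err // commit exists locally; the push failed
	}
	return sha, files, nil
}
```

Because auto_commit and auto_push are separate flags in the spec, the real executor would likely split the push into its own step. It is kept together here only to match the signature listed in task 1.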

Files Created/Modified

| File | Action |
|---|---|
| internal/worker/git_operations.go | Create |
| internal/worker/git_operations_test.go | Create |
| internal/worker/build_executor.go | Modify (add git integration) |
| internal/worker/work_executor.go | Modify (pass git config) |
| cmd/rdev-api/main.go | Modify (pass gitea token to executor) |

Week 3: API Enhancements

Goal: Add the REST endpoints that complete the platform experience. By end of week, users can create a project, enqueue a build, monitor CI status, and manage DNS — all through rdev-api.

Tasks

  1. Worker management endpoints — internal/handlers/workers.go

    • GET /workers — list all workers with status
    • GET /workers/{id} — get worker details
    • POST /workers/{id}/drain — drain a worker
    • Wire WorkerService into handler
    • Register in cmd/rdev-api/main.go and openapi.go
  2. Build management endpoints — internal/handlers/builds.go

    • POST /projects/{id}/builds — enqueue a build (wraps BuildService.StartBuild())
    • GET /projects/{id}/builds — list build history
    • GET /projects/{id}/builds/{taskId} — get build status
    • Simpler API than raw /work/enqueue — project-scoped, build-specific
    • Register in cmd/rdev-api/main.go and openapi.go
  3. DNS alias endpoint — internal/handlers/infrastructure.go

    • POST /projects/{id}/domains — add DNS alias (A or CNAME record)
    • GET /projects/{id}/domains — list domains for project
    • DELETE /projects/{id}/domains/{domain} — remove alias
    • Uses existing Cloudflare adapter's CreateRecord() and DeleteRecordByName()
    • The adapter already supports full CRUD — just needs a handler
  4. Woodpecker build status proxy — internal/handlers/ci.go

    • GET /projects/{id}/ci/pipelines — list recent Woodpecker pipelines
    • GET /projects/{id}/ci/pipelines/{number} — get pipeline details
    • Add ListPipelines() and GetPipeline() to port.CIProvider
    • Implement in internal/adapter/woodpecker/client.go using Woodpecker SDK
    • Low priority — can defer if time is tight
  5. Create-and-build endpoint — internal/handlers/project_management.go

    • POST /project/create-and-build
    • Request: { name, description, template, prompt, auto_push }
    • Calls ProjectInfraService.CreateProject() then BuildService.StartBuild()
    • Returns project info + task ID
    • Trivial once executor is working
  6. Tests for all new handlers

    • Follow existing patterns in handlers/*_test.go
    • Test request validation, success paths, error handling
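
Request validation for the create-and-build endpoint in task 5 could look like the sketch below. The field names come from the request shape listed there. Treating name, template, and prompt as required and rejecting unknown fields are assumptions, and `DecodeCreateAndBuild` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// CreateAndBuildRequest mirrors { name, description, template, prompt, auto_push }.
type CreateAndBuildRequest struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	Template    string `json:"template"`
	Prompt      string `json:"prompt"`
	AutoPush    bool   `json:"auto_push"`
}

// DecodeCreateAndBuild parses and validates a request body.
func DecodeCreateAndBuild(body []byte) (CreateAndBuildRequest, error) {
	var req CreateAndBuildRequest
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields() // catches misspelled keys such as "autopush"
	if err := dec.Decode(&req); err != nil {
		return CreateAndBuildRequest{}, fmt.Errorf("decode request: %w", err)
	}
	required := []struct{ field, value string }{
		{"name", req.Name}, {"template", req.Template}, {"prompt", req.Prompt},
	}
	for _, r := range required {
		if strings.TrimSpace(r.value) == "" {
			return CreateAndBuildRequest{}, fmt.Errorf("%s is required", r.field)
		}
	}
	return req, nil
}
```

A handler would call this first and return 400 on error, before calling ProjectInfraService.CreateProject(), so that an invalid prompt never leaves a half-created project behind.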

Deliverables

  • POST /projects/{id}/builds as the clean API for code generation
  • GET /workers for monitoring the worker pool
  • POST /projects/{id}/domains for DNS aliases
  • POST /project/create-and-build for the single-call flow
  • All endpoints documented in openapi.go
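
The DNS alias endpoint creates either an A or a CNAME record. One plausible rule for choosing between them, shown here purely as an assumption about the handler's behavior, is to use A for an IPv4 literal and CNAME for a hostname. `RecordTypeFor` is a hypothetical helper and does not describe the Cloudflare adapter's API.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// RecordTypeFor picks the DNS record type for an alias target:
// an IPv4 literal maps to "A", a hostname maps to "CNAME".
func RecordTypeFor(target string) (string, error) {
	target = strings.TrimSpace(target)
	if target == "" {
		return "", fmt.Errorf("alias target is required")
	}
	if addr, err := netip.ParseAddr(target); err == nil {
		if addr.Is4() {
			return "A", nil
		}
		// The plan only mentions A and CNAME, so IPv6 is rejected here.
		return "", fmt.Errorf("IPv6 target %q: only A and CNAME records are in scope", target)
	}
	if strings.ContainsAny(target, " /:") {
		return "", fmt.Errorf("target %q is neither an IPv4 address nor a hostname", target)
	}
	return "CNAME", nil
}
```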

Files Created/Modified

| File | Action |
|---|---|
| internal/handlers/workers.go | Create |
| internal/handlers/workers_test.go | Create |
| internal/handlers/builds.go | Create |
| internal/handlers/builds_test.go | Create |
| internal/handlers/infrastructure.go | Modify (add domain endpoints) |
| internal/handlers/ci.go | Create (if time) |
| internal/handlers/project_management.go | Modify (add create-and-build) |
| internal/adapter/woodpecker/client.go | Modify (add pipeline methods, if time) |
| internal/port/ci.go or port updates | Modify (add pipeline interface, if time) |
| cmd/rdev-api/main.go | Modify (wire new handlers) |
| cmd/rdev-api/openapi.go | Modify (add routes to spec) |

Week 4: Polish, Validation & Observability

Goal: End-to-end validation of the cookbook flow. Observability for production operation. Documentation updated.

Tasks

  1. End-to-end cookbook validation

    • Run the landing page cookbook flow from start to finish
    • POST /project with astro-landing template
    • POST /projects/landing/builds with customization prompt
    • Monitor via GET /work/{taskId} (the endpoint from the Week 1 deliverable)
    • Verify CI triggers on push
    • Verify site is live at https://landing.threesix.ai
    • Fix any issues found during validation
  2. Stale task recovery

    • Schedule periodic RequeueStale() calls so that tasks whose worker crashed mid-execution are requeued
    • Schedule periodic CleanupOld() calls to remove old completed tasks
    • Both methods exist on WorkQueue but nothing calls them yet; the queue maintenance worker (task 4) is where the calls run
  3. Observability additions

    • Add metrics to work executor: tasks_claimed, tasks_completed, tasks_failed, execution_duration
    • Add metrics to worker service: workers_registered, workers_idle, workers_busy
    • Follow existing pattern in internal/metrics/metrics.go
    • Add work executor health to readiness check (GET /ready)
  4. Queue maintenance worker

    • Create internal/worker/queue_maintenance.go
    • Runs on a slower ticker (every 5 minutes)
    • Calls RequeueStale(ctx, 10*time.Minute) — requeue tasks running > 10min with no heartbeat
    • Calls CleanupOld(ctx, 7*24*time.Hour) — prune tasks older than 7 days
    • Wire into main.go
  5. Update documentation

    • Update cookbooks/landing-page.md with final validated flow
    • Update ai-lookup/features/build-orchestration.md
    • Update ai-lookup/services/worker-pool.md
    • Add .claude/guides/services/build-orchestration.md if needed
  6. Update CLAUDE.md roadmap

    • Mark "Work Queue" as implemented
    • Mark "Worker Pool" as implemented
    • Mark "Build Orchestration" as implemented
    • Update "Bot Communication" status

Deliverables

  • Cookbook flow works end-to-end with no manual steps other than supplying the code generation prompt
  • Stale task recovery running in production
  • Metrics visible in /metrics endpoint
  • All documentation reflects actual capabilities

Files Created/Modified

| File | Action |
|---|---|
| internal/worker/queue_maintenance.go | Create |
| internal/metrics/metrics.go | Modify (add work executor metrics) |
| internal/handlers/health.go | Modify (add executor health) |
| cookbooks/landing-page.md | Modify (final validation) |
| ai-lookup/features/build-orchestration.md | Modify |
| ai-lookup/services/worker-pool.md | Modify |
| CLAUDE.md | Modify (update roadmap) |
| cmd/rdev-api/main.go | Modify (wire maintenance worker) |

Risk & Dependencies

| Risk | Mitigation |
|---|---|
| CodeAgent execution in a temp directory (not a K8s pod) may not work the same as in-pod execution | Test early in Week 1; fallback is to kubectl exec into a worker pod |
| Gitea token may lack permissions for new repos created by different users | Test with actual token; all repos should be in the same org |
| Agent execution may take longer than expected (10+ minutes for complex prompts) | Make timeout configurable; increase default |
| Worker process crash loses in-flight task | Stale requeue (Week 4) handles this automatically |
| 500-line file limit may require splitting new files | Plan for split from the start; work_executor.go + build_executor.go + git_operations.go keeps things modular |

Architecture Decision: In-Process vs External Worker

The plan above implements the executor in-process (running inside the rdev-api binary). This is simpler and matches the existing QueueProcessor pattern. The alternative would be a separate worker binary, which would allow independent scaling. The in-process approach is the right starting point — it can be extracted into a separate binary later if scaling requires it.

Summary

| Week | Focus | Key Deliverable |
|---|---|---|
| 1 | Work executor core | Tasks flow from queue → agent → result |
| 2 | Git operations | Clone → execute → commit → push cycle |
| 3 | API enhancements | Build, worker, DNS, create-and-build endpoints |
| 4 | Polish & validation | E2E cookbook flow, observability, docs |