rdev/ai-lookup/services/worker-pool.md

Worker Pool

Last Updated: 2026-01-27
Confidence: High

Summary

A shared worker pool that executes build tasks for any project. It currently runs as an embedded WorkExecutor daemon inside rdev-api. Workers register with the worker registry, poll the work queue for tasks, run Claude Code in repos cloned via GitOperations, and report results with audit trails.

Key Facts:

  • Embedded WorkExecutor daemon runs inside rdev-api process
  • Workers poll work queue every 5 seconds, heartbeat every 30 seconds
  • Stale workers (no heartbeat for 2 minutes) automatically marked offline by QueueMaintenance
  • Stale tasks (running >30 min without completion) automatically requeued
  • Old tasks (>7 days) automatically cleaned up
  • Queue depth and worker counts exported as Prometheus metrics
  • Future: external worker binary for separate pod deployment

File Pointers:

  • Domain: internal/domain/worker.go (Worker, WorkerStatus)
  • Domain: internal/domain/build.go (BuildSpec, BuildResult)
  • Port: internal/port/worker_registry.go (WorkerRegistry interface)
  • Port: internal/port/build_audit.go (BuildAudit interface)
  • Adapter: internal/adapter/postgres/worker_registry.go
  • Adapter: internal/adapter/postgres/build_audit.go
  • Service: internal/service/worker_service.go
  • Service: internal/service/build_service.go
  • Executor: internal/worker/work_executor.go (poll loop, heartbeat, task routing)
  • Executor: internal/worker/build_executor.go (BuildSpec→AgentRequest)
  • Git: internal/worker/git_operations.go (clone, commit, push)
  • Maintenance: internal/worker/queue_maintenance.go (stale recovery, cleanup, metrics)
  • Handler: internal/handlers/workers.go (REST API for workers)
  • Handler: internal/handlers/builds.go (REST API for builds)
  • Handler: internal/handlers/create_and_build.go (combined create+build)
  • Migration: internal/db/migrations/012_worker_registry.sql

Worker Lifecycle (Embedded)

  1. rdev-api starts → WorkExecutor registers as worker in registry
  2. Heartbeat loop: every 30s sends heartbeat via WorkerService
  3. Poll loop: every 5s dequeues next task from work queue
  4. BuildExecutor: clones repo, executes CodeAgent, commits and pushes if auto_commit is enabled
  5. Reports completion with BuildResult via WorkerService
  6. Graceful shutdown: deregisters worker on rdev-api stop
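
The lifecycle above can be sketched as two ticker-driven loops wrapped by register and deregister calls. This is a minimal illustration, not the code in internal/worker/work_executor.go: the `hooks` struct, `every`, `runWorker`, and `demo` are hypothetical names, and the demo uses millisecond intervals in place of the documented 5s poll and 30s heartbeat so it finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// hooks groups the lifecycle callbacks. All names here are hypothetical
// stand-ins for calls into WorkerService and the work queue.
type hooks struct {
	register, heartbeat, poll, deregister func()
}

// every calls fn on each tick of interval until ctx is cancelled.
func every(ctx context.Context, interval time.Duration, fn func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			fn()
		}
	}
}

// runWorker registers, runs the heartbeat and poll loops concurrently,
// and deregisters once ctx is cancelled and both loops have returned.
func runWorker(ctx context.Context, pollEvery, beatEvery time.Duration, h hooks) {
	h.register()
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		every(ctx, beatEvery, h.heartbeat)
	}()
	go func() {
		defer wg.Done()
		every(ctx, pollEvery, h.poll)
	}()
	wg.Wait()
	h.deregister()
}

// demo runs the worker for a short time and returns the recorded events.
// A real deployment would pass 5*time.Second and 30*time.Second.
func demo() []string {
	var mu sync.Mutex
	var events []string
	record := func(name string) func() {
		return func() {
			mu.Lock()
			defer mu.Unlock()
			events = append(events, name)
		}
	}
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	runWorker(ctx, 5*time.Millisecond, 30*time.Millisecond, hooks{
		register:   record("register"),
		heartbeat:  record("heartbeat"),
		poll:       record("poll"),
		deregister: record("deregister"),
	})
	return events
}

func main() {
	events := demo()
	fmt.Println("first:", events[0], "last:", events[len(events)-1], "total:", len(events))
}
```

Waiting on the WaitGroup before calling deregister ensures no heartbeat or poll fires after the worker has left the registry, which is one plausible way to get the graceful shutdown described in step 6.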

Worker Statuses

  • idle - available for new tasks
  • busy - currently executing a task
  • draining - not accepting new tasks (pre-shutdown)
  • offline - missed heartbeat threshold
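
The four statuses map naturally onto a typed string constant set. The sketch below is an assumption about how that could look: the constant names and the `AcceptsTasks` helper are hypothetical and may differ from what internal/domain/worker.go actually defines. It also assumes a worker runs one task at a time, so only idle workers are eligible for new work.

```go
package main

import "fmt"

// WorkerStatus is a hypothetical rendering of the documented statuses.
type WorkerStatus string

const (
	StatusIdle     WorkerStatus = "idle"     // available for new tasks
	StatusBusy     WorkerStatus = "busy"     // currently executing a task
	StatusDraining WorkerStatus = "draining" // not accepting new tasks (pre-shutdown)
	StatusOffline  WorkerStatus = "offline"  // missed heartbeat threshold
)

// AcceptsTasks reports whether a worker in this status may be handed new
// work. Assumption: one task per worker, so only idle qualifies.
func (s WorkerStatus) AcceptsTasks() bool {
	return s == StatusIdle
}

func main() {
	all := []WorkerStatus{StatusIdle, StatusBusy, StatusDraining, StatusOffline}
	for _, s := range all {
		fmt.Printf("%-8s accepts tasks: %t\n", s, s.AcceptsTasks())
	}
}
```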

API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | /workers | List all workers with status summary |
| GET | /workers/{workerId} | Get worker details |
| POST | /workers/{workerId}/drain | Set worker to draining |
| POST | /projects/{id}/builds | Start build for project |
| GET | /projects/{id}/builds | List builds for project |
| GET | /builds/{taskId} | Get build status |
| POST | /project/create-and-build | Create project + start build |

Queue Maintenance

The QueueMaintenance worker runs inside rdev-api alongside the WorkExecutor:

  • Stale task recovery (every 1m): Requeues tasks running >30m without completion
  • Stale worker marking (every 1m): Marks workers offline after 2m without heartbeat
  • Old task cleanup (every 1m): Removes completed/failed/cancelled tasks >7 days old
  • Metrics refresh (every 15s): Updates Prometheus gauges for queue depth and worker counts
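
The three time-based rules above reduce to simple age comparisons. The sketch below encodes the documented thresholds as constants and checks them with pure functions. The function and constant names are hypothetical, and treating each threshold as strictly "greater than" is an assumption; the actual logic in internal/worker/queue_maintenance.go may differ.

```go
package main

import (
	"fmt"
	"time"
)

// Thresholds taken from the maintenance rules described above.
const (
	staleTaskAfter   = 30 * time.Minute   // running task is requeued past this age
	staleWorkerAfter = 2 * time.Minute    // worker is marked offline past this silence
	oldTaskAfter     = 7 * 24 * time.Hour // finished task is removed past this age
)

// taskIsStale reports whether a task that has been running for runningFor
// should be requeued.
func taskIsStale(runningFor time.Duration) bool {
	return runningFor > staleTaskAfter
}

// workerIsStale reports whether a worker whose last heartbeat was
// sinceHeartbeat ago should be marked offline.
func workerIsStale(sinceHeartbeat time.Duration) bool {
	return sinceHeartbeat > staleWorkerAfter
}

// taskIsOld reports whether a completed, failed, or cancelled task that
// finished sinceFinish ago should be cleaned up.
func taskIsOld(sinceFinish time.Duration) bool {
	return sinceFinish > oldTaskAfter
}

func main() {
	now := time.Date(2026, 1, 27, 12, 0, 0, 0, time.UTC)
	startedAt := now.Add(-45 * time.Minute)
	lastBeat := now.Add(-90 * time.Second)
	finishedAt := now.Add(-8 * 24 * time.Hour)

	fmt.Println("requeue task:", taskIsStale(now.Sub(startedAt)))
	fmt.Println("mark worker offline:", workerIsStale(now.Sub(lastBeat)))
	fmt.Println("clean up task:", taskIsOld(now.Sub(finishedAt)))
}
```

Keeping the predicates free of clock access makes them easy to test with fixed timestamps, while the periodic sweep itself would supply the current time on each run.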