# Worker Pool

**Last Updated:** 2026-01-27

**Confidence:** High

## Summary

Shared worker pool that executes build tasks for any project. Currently runs as an embedded WorkExecutor daemon inside rdev-api. Workers register with the worker registry, poll the work queue for tasks, execute Claude Code in cloned repos via GitOperations, and report results with audit trails.

**Key Facts:**

- Embedded WorkExecutor daemon runs inside rdev-api process
- Workers poll work queue every 5 seconds, heartbeat every 30 seconds
- Stale workers (no heartbeat for 2 minutes) automatically marked offline by QueueMaintenance
- Stale tasks (running >30 min without completion) automatically requeued
- Old tasks (>7 days) automatically cleaned up
- Queue depth and worker counts exported as Prometheus metrics
- Future: external worker binary for separate pod deployment

**File Pointers:**

- Domain: `internal/domain/worker.go` (Worker, WorkerStatus)
- Domain: `internal/domain/build.go` (BuildSpec, BuildResult)
- Port: `internal/port/worker_registry.go` (WorkerRegistry interface)
- Port: `internal/port/build_audit.go` (BuildAudit interface)
- Adapter: `internal/adapter/postgres/worker_registry.go`
- Adapter: `internal/adapter/postgres/build_audit.go`
- Service: `internal/service/worker_service.go`
- Service: `internal/service/build_service.go`
- Executor: `internal/worker/work_executor.go` (poll loop, heartbeat, task routing)
- Executor: `internal/worker/build_executor.go` (BuildSpec→AgentRequest)
- Git: `internal/worker/git_operations.go` (clone, commit, push)
- Maintenance: `internal/worker/queue_maintenance.go` (stale recovery, cleanup, metrics)
- Handler: `internal/handlers/workers.go` (REST API for workers)
- Handler: `internal/handlers/builds.go` (REST API for builds)
- Handler: `internal/handlers/create_and_build.go` (combined create+build)
- Migration: `internal/db/migrations/012_worker_registry.sql`

## Worker Lifecycle (Embedded)

1. rdev-api starts → WorkExecutor registers as worker in registry
2. Heartbeat loop: every 30s sends heartbeat via WorkerService
3. Poll loop: every 5s dequeues next task from work queue
4. BuildExecutor: clones repo, executes CodeAgent, commits/pushes if auto_commit
5. Reports completion with BuildResult via WorkerService
6. Graceful shutdown: deregisters worker on rdev-api stop

## Worker Statuses

- `idle` - available for new tasks
- `busy` - currently executing a task
- `draining` - not accepting new tasks (pre-shutdown)
- `offline` - missed heartbeat threshold

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/workers` | List all workers with status summary |
| GET | `/workers/{workerId}` | Get worker details |
| POST | `/workers/{workerId}/drain` | Set worker to draining |
| POST | `/projects/{id}/builds` | Start build for project |
| GET | `/projects/{id}/builds` | List builds for project |
| GET | `/builds/{taskId}` | Get build status |
| POST | `/project/create-and-build` | Create project + start build |

## Queue Maintenance

The QueueMaintenance worker runs inside rdev-api alongside the WorkExecutor:

- **Stale task recovery** (every 1m): Requeues tasks running >30m without completion
- **Stale worker marking** (every 1m): Marks workers offline after 2m without heartbeat
- **Old task cleanup** (every 1m): Removes completed/failed/cancelled tasks >7 days old
- **Metrics refresh** (every 15s): Updates Prometheus gauges for queue depth and worker counts

## Related Topics

- [Work Queue](./work-queue.md)
- [Build Orchestration](../features/build-orchestration.md)