- Add ListPipelines/GetPipeline to CIProvider port with Woodpecker adapter
- Add DNS alias endpoints: GET/POST/DELETE /projects/{id}/domains
- Implement worker executor daemon, build executor, and git operations
- Add build service, worker service, and build audit tracking
- Add worker registry with PostgreSQL adapter and migration
- Add multi-provider code agent interface (Claude Code + OpenCode)
- Add create-and-build combo endpoint
- Update landing-page cookbook to reflect all gaps closed
- Fix tech debt: unified validation, auth scopes, error wrapping, slog patterns
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
77 lines
3.5 KiB
Markdown
# Worker Pool

**Last Updated:** 2026-01-27
**Confidence:** High

## Summary

Shared worker pool that executes build tasks for any project. Currently runs as an embedded WorkExecutor daemon inside rdev-api. Workers register with the worker registry, poll the work queue for tasks, execute Claude Code in cloned repos via GitOperations, and report results with audit trails.

**Key Facts:**
- Embedded WorkExecutor daemon runs inside rdev-api process
- Workers poll work queue every 5 seconds, heartbeat every 30 seconds
- Stale workers (no heartbeat for 2 minutes) automatically marked offline by QueueMaintenance
- Stale tasks (running >30 min without completion) automatically requeued
- Old tasks (>7 days) automatically cleaned up
- Queue depth and worker counts exported as Prometheus metrics
- Future: external worker binary for separate pod deployment

**File Pointers:**
- Domain: `internal/domain/worker.go` (Worker, WorkerStatus)
- Domain: `internal/domain/build.go` (BuildSpec, BuildResult)
- Port: `internal/port/worker_registry.go` (WorkerRegistry interface)
- Port: `internal/port/build_audit.go` (BuildAudit interface)
- Adapter: `internal/adapter/postgres/worker_registry.go`
- Adapter: `internal/adapter/postgres/build_audit.go`
- Service: `internal/service/worker_service.go`
- Service: `internal/service/build_service.go`
- Executor: `internal/worker/work_executor.go` (poll loop, heartbeat, task routing)
- Executor: `internal/worker/build_executor.go` (BuildSpec→AgentRequest)
- Git: `internal/worker/git_operations.go` (clone, commit, push)
- Maintenance: `internal/worker/queue_maintenance.go` (stale recovery, cleanup, metrics)
- Handler: `internal/handlers/workers.go` (REST API for workers)
- Handler: `internal/handlers/builds.go` (REST API for builds)
- Handler: `internal/handlers/create_and_build.go` (combined create+build)
- Migration: `internal/db/migrations/012_worker_registry.sql`

## Worker Lifecycle (Embedded)

1. rdev-api starts → WorkExecutor registers as a worker in the registry
2. Heartbeat loop: every 30s, sends a heartbeat via WorkerService
3. Poll loop: every 5s, dequeues the next task from the work queue
4. BuildExecutor: clones the repo, executes CodeAgent, then commits and pushes if `auto_commit` is set
5. Reports completion with a BuildResult via WorkerService
6. Graceful shutdown: deregisters the worker when rdev-api stops

## Worker Statuses

- `idle` - available for new tasks
- `busy` - currently executing a task
- `draining` - not accepting new tasks (pre-shutdown)
- `offline` - missed heartbeat threshold

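One way to model these statuses in Go is a string-backed type with a small helper for scheduling decisions. The status values are taken from the list above; the constant names and the `Valid` and `AcceptsTasks` methods are assumptions for illustration and may not match `internal/domain/worker.go`.

```go
package main

// WorkerStatus is a string-backed enum for the four documented statuses.
type WorkerStatus string

const (
	StatusIdle     WorkerStatus = "idle"
	StatusBusy     WorkerStatus = "busy"
	StatusDraining WorkerStatus = "draining"
	StatusOffline  WorkerStatus = "offline"
)

// Valid reports whether s is one of the documented statuses.
func (s WorkerStatus) Valid() bool {
	switch s {
	case StatusIdle, StatusBusy, StatusDraining, StatusOffline:
		return true
	}
	return false
}

// AcceptsTasks reports whether a worker in this status may take new work.
// Only idle qualifies: busy is occupied, draining refuses new tasks,
// and offline has missed the heartbeat threshold.
func (s WorkerStatus) AcceptsTasks() bool {
	return s == StatusIdle
}
```
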
## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/workers` | List all workers with status summary |
| GET | `/workers/{workerId}` | Get worker details |
| POST | `/workers/{workerId}/drain` | Set worker to draining |
| POST | `/projects/{id}/builds` | Start build for project |
| GET | `/projects/{id}/builds` | List builds for project |
| GET | `/builds/{taskId}` | Get build status |
| POST | `/project/create-and-build` | Create project + start build |

## Queue Maintenance

The QueueMaintenance worker runs inside rdev-api alongside the WorkExecutor:
- **Stale task recovery** (every 1m): Requeues tasks running >30m without completion
- **Stale worker marking** (every 1m): Marks workers offline after 2m without heartbeat
- **Old task cleanup** (every 1m): Removes completed/failed/cancelled tasks >7 days old
- **Metrics refresh** (every 15s): Updates Prometheus gauges for queue depth and worker counts

## Related Topics

- [Work Queue](./work-queue.md)
- [Build Orchestration](../features/build-orchestration.md)