Operations Audit (new feature):
- Add Operation domain model with status tracking (pending, running, completed, failed, cancelled)
- Add OperationRepository with PostgreSQL implementation
- Add OperationService for CRUD and lifecycle management
- Add operations handlers (list, get, cancel endpoints)
- Add migration 015_operations.sql for operations table
- Add operation cleanup worker for stale operation handling
- Add ErrOperationNotFound to domain errors
Template Improvements:
- Add CLAUDE.md configuration files to astro-landing, default, and go-api templates
- Fix PORT template variable usage in nginx configs for app templates
- Add replace directives for local pkg module in Go templates
- Simplify Go service/worker Dockerfiles for workspace builds
- Fix TypeScript error in logger template
Other:
- Refactor landing-test.sh cookbook script
- Update CLAUDE.md version reference
Note: Some files exceed 500-line limit (pre-existing debt + new feature)
- component.go: 550 lines (unchanged, pre-existing)
- main.go: 522 lines (added operations wiring)
- operation_repo.go: 569 lines (new, needs splitting)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Operations Audit System
Status: Spec
Purpose: Make automated development debuggable via API
Overview
Every action on a project is an Operation. Operations capture what happened, step-by-step, with enough detail to pinpoint failures without digging through logs.
GET /projects/testgo1/operations?status=failed
→ Operation "build" failed at step "build-api": git executable not found
Design Principles
- Queryable via API - No kubectl, no Woodpecker UI, no guessing
- Comprehensive, not verbose - Capture essence + detail separately
- 30-day retention - Operations are for debugging, not compliance
- Linked to permanent audit - audit_log stays forever, operations link to it
Data Model
Operations Table
CREATE TABLE operations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id TEXT NOT NULL,
    type TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'running',
    -- Correlation
    request_id TEXT,    -- HTTP request that initiated
    triggered_by UUID,  -- Parent operation (build triggered by component.add)
    commit_sha TEXT,    -- Git commit this operation created/triggered
    external_ref TEXT,  -- Woodpecker build#, K8s deployment, etc.
    -- Timing
    started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMPTZ,
    duration_ms INT,
    -- Content (JSONB for flexibility)
    input JSONB,        -- What was requested
    output JSONB,       -- What was produced
    -- Error handling: essence + detail
    error TEXT,         -- One-line summary
    error_detail TEXT,  -- Full stack/output (truncated to 10KB)
    -- Steps
    steps JSONB NOT NULL DEFAULT '[]'
);
-- Indexes
CREATE INDEX idx_ops_project_time ON operations(project_id, started_at DESC);
CREATE INDEX idx_ops_project_status ON operations(project_id, status) WHERE status IN ('running', 'failed');
CREATE INDEX idx_ops_commit ON operations(commit_sha) WHERE commit_sha IS NOT NULL;
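-- Cleanup scans by age across all projects. PostgreSQL rejects NOW() in a
-- partial-index predicate (only IMMUTABLE functions are allowed there),
-- so index the column directly.
CREATE INDEX idx_ops_cleanup ON operations(started_at);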
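The error_detail column is described as truncated to 10KB. A minimal sketch of that truncation follows. It assumes 10KB means 10 * 1024 bytes and that the head of the output is kept rather than the tail; the function and constant names are hypothetical.

```go
package main

import "unicode/utf8"

// maxErrorDetailBytes mirrors the "truncated to 10KB" note on error_detail.
// Reading 10KB as 10 * 1024 bytes is an assumption.
const maxErrorDetailBytes = 10 * 1024

// truncateDetail caps s at limit bytes and never cuts through a multi-byte
// UTF-8 character, so valid input stays valid after truncation.
func truncateDetail(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if len(s) <= limit {
		return s
	}
	cut := limit
	// Walk back until cut sits on the first byte of a character.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}
```

Calling truncateDetail(output, maxErrorDetailBytes) before the write keeps the column bounded. Build logs often carry the real error near the end, so keeping the tail instead may turn out to be the better choice.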
Step Structure
{
  "name": "build-api",
  "status": "failed",
  "started_at": "2026-02-01T20:31:45Z",
  "duration_ms": 17000,
  "output": {"image": "registry.threesix.ai/testgo1/api:abc123"},
  "error": "git executable not found",
  "error_detail": "exec: \"git\": executable file not found in $PATH\n at /app/pkg/app.go:24"
}
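One way the step document above could map onto a Go type is sketched here. OperationStep is the name used in the file list later in this spec; the Go field names, the choice of json.RawMessage for output, and the decodeStep helper are assumptions.

```go
package main

import (
	"encoding/json"
	"time"
)

// OperationStep mirrors one element of the operations.steps JSONB array.
type OperationStep struct {
	Name        string          `json:"name"`
	Status      string          `json:"status"`
	StartedAt   time.Time       `json:"started_at"`
	DurationMS  int64           `json:"duration_ms"`
	Output      json.RawMessage `json:"output,omitempty"` // free-form, kept as raw JSON
	Error       string          `json:"error,omitempty"`
	ErrorDetail string          `json:"error_detail,omitempty"`
}

// decodeStep parses a single step document.
func decodeStep(data []byte) (OperationStep, error) {
	var s OperationStep
	err := json.Unmarshal(data, &s)
	return s, err
}
```

Keeping output as raw JSON avoids committing to a per-step schema, which matches the "JSONB for flexibility" note on the table.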
Operation Types
| Type | Trigger | Key Steps |
|---|---|---|
| project.create | POST /projects | create_pod, create_repo, activate_ci, create_dns |
| component.add | POST /projects/{id}/components | render_template, commit_files, create_deployment |
| build | Woodpecker webhook | git, build-{component}, deploy-{component} |
| resource.provision | POST /projects/{id}/databases | create_database, create_user, store_credentials |
API
List Operations
GET /projects/{id}/operations
GET /projects/{id}/operations?status=failed
GET /projects/{id}/operations?type=build
GET /projects/{id}/operations?since=1h
GET /projects/{id}/operations?limit=50
Response:
{
  "data": [
    {
      "id": "op-abc123",
      "type": "build",
      "status": "failed",
      "started_at": "2026-02-01T20:31:45Z",
      "duration_ms": 87000,
      "error": "build-api: git executable not found",
      "steps_summary": "git ✓ → build-web ✓ → build-api ✗"
    }
  ]
}
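The steps_summary string in the list response can be derived from the stored steps. A minimal sketch, assuming the ✓ and ✗ marks map to the completed and failed statuses; the mark used for any other status, and the type and function names, are assumptions.

```go
package main

import "strings"

// step holds the two fields the summary needs.
type step struct {
	Name   string
	Status string
}

// stepsSummary renders steps in order, e.g. "git ✓ → build-api ✗".
func stepsSummary(steps []step) string {
	parts := make([]string, 0, len(steps))
	for _, s := range steps {
		mark := "…" // assumption: anything not yet finished
		switch s.Status {
		case "completed":
			mark = "✓"
		case "failed":
			mark = "✗"
		}
		parts = append(parts, s.Name+" "+mark)
	}
	return strings.Join(parts, " → ")
}
```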
Get Operation Detail
GET /projects/{id}/operations/{operation_id}
Response:
{
  "data": {
    "id": "op-abc123",
    "type": "build",
    "status": "failed",
    "triggered_by": "op-xyz789",
    "commit_sha": "abc123",
    "external_ref": "build#42",
    "started_at": "2026-02-01T20:31:45Z",
    "completed_at": "2026-02-01T20:33:12Z",
    "duration_ms": 87000,
    "input": {
      "commit_message": "Add service component: api"
    },
    "steps": [
      {"name": "git", "status": "completed", "duration_ms": 5000},
      {"name": "build-web", "status": "completed", "duration_ms": 48000},
      {
        "name": "build-api",
        "status": "failed",
        "duration_ms": 17000,
        "error": "git executable not found",
        "error_detail": "/app/pkg/app/app.go:24:2: github.com/jordan/testgo1/pkg@v0.0.0: exec: \"git\": executable file not found..."
      }
    ],
    "error": "build-api: git executable not found",
    "error_detail": "Full kaniko output..."
  }
}
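In the detail response, the operation-level error repeats the failed step's error prefixed with the step name, and duration_ms equals completed_at minus started_at. A sketch of how closing an operation might compute those fields; the types, the finish function, and the rule that the first failed step wins are assumptions.

```go
package main

import "time"

// Step and Operation carry only the fields this sketch needs.
type Step struct {
	Name   string
	Status string
	Error  string
}

type Operation struct {
	Status      string
	StartedAt   time.Time
	CompletedAt time.Time
	DurationMS  int64
	Error       string
	Steps       []Step
}

// finish closes an operation: it stamps completed_at and duration_ms, and
// promotes the first failed step to the one-line operation error.
func finish(op *Operation, completedAt time.Time) {
	op.CompletedAt = completedAt
	op.DurationMS = completedAt.Sub(op.StartedAt).Milliseconds()
	op.Status = "completed"
	for _, s := range op.Steps {
		if s.Status == "failed" {
			op.Status = "failed"
			op.Error = s.Name + ": " + s.Error
			return
		}
	}
}
```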
Find by Commit
GET /projects/{id}/operations?commit=abc123
Returns operations that created or were triggered by this commit.
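The query parameters shown in this section (status, type, since, limit, commit) could be parsed into a filter as sketched below. The default and maximum limits are assumptions, as are the type and function names. The sketch leans on time.ParseDuration, which accepts values such as 1h or 30m but has no day unit.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

const (
	defaultLimit = 50  // assumption: the spec does not state a default
	maxLimit     = 200 // assumption: upper bound to protect the database
)

// listFilter is the parsed form of the list endpoint's query string.
type listFilter struct {
	Status string
	Type   string
	Commit string
	Since  time.Duration // look-back window; zero means no lower bound
	Limit  int
}

// parseListFilter validates the query parameters of GET .../operations.
func parseListFilter(q url.Values) (listFilter, error) {
	f := listFilter{
		Status: q.Get("status"),
		Type:   q.Get("type"),
		Commit: q.Get("commit"),
		Limit:  defaultLimit,
	}
	if raw := q.Get("since"); raw != "" {
		d, err := time.ParseDuration(raw)
		if err != nil || d < 0 {
			return f, fmt.Errorf("invalid since %q", raw)
		}
		f.Since = d
	}
	if raw := q.Get("limit"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n < 1 {
			return f, fmt.Errorf("invalid limit %q", raw)
		}
		if n > maxLimit {
			n = maxLimit
		}
		f.Limit = n
	}
	return f, nil
}
```

A handler would pass r.URL.Query() and turn a non-zero Since into a started_at lower bound when building the SQL query.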
Correlation
Request → Operation
HTTP Request (X-Request-ID: req-123)
↓
Handler creates Operation (id: op-abc, request_id: req-123)
↓
Service executes steps, updates operation
↓
Response includes operation_id
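The flow above can be sketched with net/http. The handler name, the 202 status code, and the hex ID format are assumptions made for illustration; in the real table the ID would come from gen_random_uuid().

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"net/http"
)

// Operation carries the correlation fields relevant to this flow.
type Operation struct {
	ID        string `json:"id"`
	Type      string `json:"type"`
	Status    string `json:"status"`
	RequestID string `json:"request_id,omitempty"`
}

// startOperation creates a running operation tied to the initiating request.
// The random hex ID is a stand-in for the database-generated UUID.
func startOperation(opType, requestID string) (Operation, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return Operation{}, err
	}
	return Operation{
		ID:        hex.EncodeToString(buf),
		Type:      opType,
		Status:    "running",
		RequestID: requestID,
	}, nil
}

// createProject is a hypothetical handler for POST /projects.
func createProject(w http.ResponseWriter, r *http.Request) {
	op, err := startOperation("project.create", r.Header.Get("X-Request-ID"))
	if err != nil {
		http.Error(w, "could not start operation", http.StatusInternalServerError)
		return
	}
	// The service would execute steps and update op here.
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusAccepted) // assumption: the work continues after the response
	_ = json.NewEncoder(w).Encode(map[string]string{"operation_id": op.ID})
}
```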
Component Add → Build
component.add (op-abc)
→ commits to git (sha: abc123)
→ operation.commit_sha = "abc123"
Woodpecker webhook fires for abc123
→ rdev looks up: SELECT id FROM operations WHERE commit_sha = 'abc123'
→ creates build operation (triggered_by: op-abc)
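The commit lookup above can be illustrated with an in-memory store standing in for the operations table. One detail is an assumption: the build operation also carries the commit SHA, so the lookup here skips earlier build operations to avoid a build pointing at another build when a webhook is redelivered. All names are hypothetical.

```go
package main

// operation holds the correlation fields used to link a build to its parent.
type operation struct {
	ID          string
	Type        string
	CommitSHA   string
	TriggeredBy string
	ExternalRef string
}

// store is an in-memory stand-in for the operations table.
type store struct {
	ops []operation
}

// recordBuild creates a build operation for sha and links it to the first
// non-build operation that recorded the same commit, if there is one.
func (s *store) recordBuild(id, sha, externalRef string) operation {
	build := operation{ID: id, Type: "build", CommitSHA: sha, ExternalRef: externalRef}
	for _, op := range s.ops {
		if op.CommitSHA == sha && op.Type != "build" {
			build.TriggeredBy = op.ID
			break
		}
	}
	s.ops = append(s.ops, build)
	return build
}
```

A build for a commit pushed outside the API simply ends up with an empty triggered_by.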
Linking to Permanent Audit
Operations are temporary (30d). For compliance, audit_log is permanent.
-- Add operation_id to audit_log
ALTER TABLE audit_log ADD COLUMN operation_id UUID;
CREATE INDEX idx_audit_operation ON audit_log(operation_id) WHERE operation_id IS NOT NULL;
Query permanent history via audit_log, debug recent issues via operations.
Implementation
Phase 1: Foundation
- Migration: operations table
- Domain: Operation, OperationStep
- Port: OperationRepository
- Adapter: PostgreSQL implementation
- Handler: GET /projects/{id}/operations
Phase 2: Instrumentation
- Instrument: project.create handler
- Instrument: component.add handler
- Instrument: resource provisioning
- Add operation_id to responses
Phase 3: Build Integration
- Woodpecker webhook receiver endpoint
- Parse build events into operation steps
- Link via commit_sha
Phase 4: Cleanup
- Background job: delete operations older than 30d
- Add operation_id column to audit_log
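The Phase 4 background job could issue a single DELETE per run, as sketched below. The execer interface matches the ExecContext method of *sql.DB from database/sql, so a real pool can be passed in; the function names and the use of PostgreSQL's make_interval are assumptions, and scheduling is left out.

```go
package main

import (
	"context"
	"database/sql"
)

// retentionDays follows the 30-day retention principle.
const retentionDays = 30

// execer is the subset of *sql.DB that the cleanup needs.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// cleanupStatement returns the DELETE and its arguments for a retention window.
func cleanupStatement(days int) (string, []any) {
	return `DELETE FROM operations WHERE started_at < NOW() - make_interval(days => $1)`, []any{days}
}

// runCleanup deletes expired operations and reports how many rows went away.
func runCleanup(ctx context.Context, db execer) (int64, error) {
	query, args := cleanupStatement(retentionDays)
	res, err := db.ExecContext(ctx, query, args...)
	if err != nil {
		return 0, err
	}
	return res.RowsAffected()
}
```

Whether operations still in the running state after 30 days should be deleted or first marked failed is not settled by this spec; the sketch deletes purely by age.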
Files to Create/Modify
internal/
├── domain/
│   └── operation.go            # NEW: Operation, OperationStep, OperationType
├── port/
│   └── operation.go            # NEW: OperationRepository interface
├── adapter/
│   └── postgres/
│       └── operation_repo.go   # NEW: PostgreSQL implementation
├── service/
│   └── operation_service.go    # NEW: Business logic
├── handlers/
│   ├── operations.go           # NEW: API handlers
│   ├── project.go              # MODIFY: Create operation on project.create
│   ├── component.go            # MODIFY: Create operation on component.add
│   └── webhooks.go             # MODIFY: Handle Woodpecker build events
└── worker/
    └── cleanup.go              # NEW: 30-day retention cleanup
migrations/
└── 015_operations.sql          # NEW: Table + indexes
Example Debugging Session
# Project deployment failing. What happened?
# (Responses are wrapped in "data", so the jq paths start there.)
$ curl -s "$API/projects/testgo1/operations?status=failed" | jq '.data[0] | {id, type, error, steps_summary}'
{
  "id": "op-abc123",
  "type": "build",
  "error": "build-api: git executable not found",
  "steps_summary": "git ✓ → build-web ✓ → build-api ✗"
}
# Get details of the last step
$ curl -s "$API/projects/testgo1/operations/op-abc123" | jq '.data.steps[-1] | {name, status, error, error_detail}'
{
  "name": "build-api",
  "status": "failed",
  "error": "git executable not found",
  "error_detail": "exec: \"git\": executable file not found in $PATH..."
}
# What triggered this build?
$ curl -s "$API/projects/testgo1/operations/op-abc123" | jq '.data.triggered_by'
"op-xyz789"
# What was that operation?
$ curl -s "$API/projects/testgo1/operations/op-xyz789" | jq '.data | {type, input}'
{
  "type": "component.add",
  "input": {"template": "service", "name": "api"}
}
# Root cause: component.add triggered build, build failed due to missing git in Dockerfile
Open Questions
- Stream running operations? - Could add SSE endpoint for real-time step updates
- CLI integration? - rdev debug testgo1 to show recent failures
- Alerting? - Webhook when operation fails?