# Operations Audit System

**Status**: Spec

**Purpose**: Make automated development debuggable via API

## Overview

Every action on a project is an **Operation**. Operations capture what happened, step-by-step, with enough detail to pinpoint failures without digging through logs.

```
GET /projects/testgo1/operations?status=failed
→ Operation "build" failed at step "build-api": git executable not found
```

## Design Principles

1. **Queryable via API** - No kubectl, no Woodpecker UI, no guessing
2. **Comprehensive, not verbose** - Capture essence + detail separately
3. **30-day retention** - Operations are for debugging, not compliance
4. **Linked to permanent audit** - `audit_log` stays forever and references operations by `operation_id`

## Data Model

### Operations Table

```sql
CREATE TABLE operations (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      TEXT NOT NULL,
    type            TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'running',

    -- Correlation
    request_id      TEXT,           -- HTTP request that initiated
    triggered_by    UUID,           -- Parent operation (build triggered by component.add)
    commit_sha      TEXT,           -- Git commit this operation created/triggered
    external_ref    TEXT,           -- Woodpecker build#, K8s deployment, etc.

    -- Timing
    started_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at    TIMESTAMPTZ,
    duration_ms     INT,

    -- Content (JSONB for flexibility)
    input           JSONB,          -- What was requested
    output          JSONB,          -- What was produced

    -- Error handling: essence + detail
    error           TEXT,           -- One-line summary
    error_detail    TEXT,           -- Full stack/output (truncated to 10KB)

    -- Steps
    steps           JSONB NOT NULL DEFAULT '[]'
);

-- Indexes
CREATE INDEX idx_ops_project_time ON operations(project_id, started_at DESC);
CREATE INDEX idx_ops_project_status ON operations(project_id, status)
    WHERE status IN ('running', 'failed');
CREATE INDEX idx_ops_commit ON operations(commit_sha)
    WHERE commit_sha IS NOT NULL;
-- Plain index for the retention job. A partial index with
-- WHERE started_at < NOW() - INTERVAL '30 days' is rejected by PostgreSQL
-- because NOW() is not IMMUTABLE.
CREATE INDEX idx_ops_cleanup ON operations(started_at);
```

### Step Structure

```json
{
  "name": "build-api",
  "status": "failed",
  "started_at": "2026-02-01T20:31:45Z",
  "duration_ms": 17000,
  "output": {"image": "registry.threesix.ai/testgo1/api:abc123"},
  "error": "git executable not found",
  "error_detail": "exec: \"git\": executable file not found in $PATH\n at /app/pkg/app.go:24"
}
```

### Operation Types

| Type | Trigger | Key Steps |
|------|---------|-----------|
| `project.create` | `POST /projects` | create_pod, create_repo, activate_ci, create_dns |
| `component.add` | `POST /projects/{id}/components` | render_template, commit_files, create_deployment |
| `build` | Woodpecker webhook | git, build-{component}, deploy-{component} |
| `resource.provision` | `POST /projects/{id}/databases` | create_database, create_user, store_credentials |

## API

### List Operations

```
GET /projects/{id}/operations
GET /projects/{id}/operations?status=failed
GET /projects/{id}/operations?type=build
GET /projects/{id}/operations?since=1h
GET /projects/{id}/operations?limit=50
```

Response:

```json
{
  "data": [
    {
      "id": "op-abc123",
      "type": "build",
      "status": "failed",
      "started_at": "2026-02-01T20:31:45Z",
      "duration_ms": 87000,
      "error": "build-api: git executable not found",
      "steps_summary": "git ✓ → build-web ✓ → build-api ✗"
    }
  ]
}
```

### Get Operation Detail

```
GET /projects/{id}/operations/{operation_id}
```

Response:

```json
{
  "data": {
    "id": "op-abc123",
    "type": "build",
    "status": "failed",
    "triggered_by": "op-xyz789",
    "commit_sha": "abc123",
    "external_ref": "build#42",
    "started_at": "2026-02-01T20:31:45Z",
    "completed_at": "2026-02-01T20:33:12Z",
    "duration_ms": 87000,
    "input": {
      "commit_message": "Add service component: api"
    },
    "steps": [
      {"name": "git", "status": "completed", "duration_ms": 5000},
      {"name": "build-web", "status": "completed", "duration_ms": 48000},
      {
        "name": "build-api",
        "status": "failed",
        "duration_ms": 17000,
        "error": "git executable not found",
        "error_detail": "/app/pkg/app/app.go:24:2: github.com/jordan/testgo1/pkg@v0.0.0: exec: \"git\": executable file not found..."
      }
    ],
    "error": "build-api: git executable not found",
    "error_detail": "Full kaniko output..."
  }
}
```

### Find by Commit

```
GET /projects/{id}/operations?commit=abc123
```

Returns operations that created or were triggered by this commit.

## Correlation

### Request → Operation

```
HTTP Request (X-Request-ID: req-123)
    ↓
Handler creates Operation (id: op-abc, request_id: req-123)
    ↓
Service executes steps, updates operation
    ↓
Response includes operation_id
```

### Component Add → Build

```
component.add (op-abc)
    → commits to git (sha: abc123)
    → operation.commit_sha = "abc123"

Woodpecker webhook fires for abc123
    → rdev looks up: SELECT id FROM operations WHERE commit_sha = 'abc123'
    → creates build operation (triggered_by: op-abc)
```

### Linking to Permanent Audit

Operations are temporary (30d). For compliance, `audit_log` is permanent.

```sql
-- Add operation_id to audit_log
ALTER TABLE audit_log ADD COLUMN operation_id UUID;
CREATE INDEX idx_audit_operation ON audit_log(operation_id)
    WHERE operation_id IS NOT NULL;
```

Query permanent history via `audit_log`; debug recent issues via operations.
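
The Component Add → Build lookup above can be sketched in Go. This is a minimal in-memory illustration, not the rdev implementation: `Store`, `ParentForCommit`, and `OnBuildWebhook` are hypothetical names, and skipping `build` rows when resolving the parent is an assumption (builds carry the same `commit_sha`, so the bare `SELECT` could otherwise match an earlier build).

```go
package main

import (
	"errors"
	"fmt"
)

// Operation mirrors a few columns of the operations table (hypothetical struct).
type Operation struct {
	ID          string
	Type        string
	CommitSHA   string
	TriggeredBy string // empty when there is no parent operation
}

// Store is an in-memory stand-in for the operations table.
type Store struct {
	ops    []Operation
	nextID int
}

// ErrNoParent is returned when no non-build operation recorded the commit.
var ErrNoParent = errors.New("no operation recorded for commit")

// Create appends an operation and assigns it a sequential ID.
func (s *Store) Create(op Operation) Operation {
	s.nextID++
	op.ID = fmt.Sprintf("op-%d", s.nextID)
	s.ops = append(s.ops, op)
	return op
}

// ParentForCommit returns the most recent non-build operation with this commit SHA.
// Assumption: build rows are skipped so a rebuild still links to the original parent.
func (s *Store) ParentForCommit(sha string) (Operation, error) {
	for i := len(s.ops) - 1; i >= 0; i-- {
		if op := s.ops[i]; op.CommitSHA == sha && op.Type != "build" {
			return op, nil
		}
	}
	return Operation{}, ErrNoParent
}

// OnBuildWebhook records a build operation for sha and sets TriggeredBy
// when a parent operation is found; otherwise TriggeredBy stays empty.
func (s *Store) OnBuildWebhook(sha string) Operation {
	build := Operation{Type: "build", CommitSHA: sha}
	if parent, err := s.ParentForCommit(sha); err == nil {
		build.TriggeredBy = parent.ID
	}
	return s.Create(build)
}

func main() {
	store := &Store{}
	add := store.Create(Operation{Type: "component.add", CommitSHA: "abc123"})
	build := store.OnBuildWebhook("abc123")
	fmt.Println(add.ID, build.ID, build.TriggeredBy) // op-1 op-2 op-1

	manual := store.OnBuildWebhook("def456") // commit pushed outside the API
	fmt.Printf("%q\n", manual.TriggeredBy)   // ""
}
```

In PostgreSQL the same rule might be expressed by appending `AND type <> 'build' ORDER BY started_at DESC LIMIT 1` to the lookup. Whether a build with no parent should be stored with a NULL `triggered_by` is left open here.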
## Implementation

### Phase 1: Foundation

- [ ] Migration: operations table
- [ ] Domain: Operation, OperationStep
- [ ] Port: OperationRepository
- [ ] Adapter: PostgreSQL implementation
- [ ] Handler: GET /projects/{id}/operations

### Phase 2: Instrumentation

- [ ] Instrument: project.create handler
- [ ] Instrument: component.add handler
- [ ] Instrument: resource provisioning
- [ ] Add operation_id to responses

### Phase 3: Build Integration

- [ ] Woodpecker webhook receiver endpoint
- [ ] Parse build events into operation steps
- [ ] Link via commit_sha

### Phase 4: Cleanup

- [ ] Background job: delete operations older than 30d
- [ ] Add operation_id column to audit_log

## Files to Create/Modify

```
internal/
├── domain/
│   └── operation.go          # NEW: Operation, OperationStep, OperationType
├── port/
│   └── operation.go          # NEW: OperationRepository interface
├── adapter/
│   └── postgres/
│       └── operation_repo.go # NEW: PostgreSQL implementation
├── service/
│   └── operation_service.go  # NEW: Business logic
├── handlers/
│   ├── operations.go         # NEW: API handlers
│   ├── project.go            # MODIFY: Create operation on project.create
│   ├── component.go          # MODIFY: Create operation on component.add
│   └── webhooks.go           # MODIFY: Handle Woodpecker build events
└── worker/
    └── cleanup.go            # NEW: 30-day retention cleanup

migrations/
└── 015_operations.sql        # NEW: Table + indexes
```

## Example Debugging Session

```bash
# Project deployment failing. What happened?
# (Responses are wrapped in a "data" envelope, so jq paths start at .data)
$ curl -s "$API/projects/testgo1/operations?status=failed" | jq '.data[0]'
{
  "id": "op-abc123",
  "type": "build",
  "error": "build-api: git executable not found",
  "steps_summary": "git ✓ → build-web ✓ → build-api ✗"
}

# Get details
$ curl -s "$API/projects/testgo1/operations/op-abc123" | jq '.data.steps[-1]'
{
  "name": "build-api",
  "status": "failed",
  "error": "git executable not found",
  "error_detail": "exec: \"git\": executable file not found in $PATH..."
}

# What triggered this build?
$ curl -s "$API/projects/testgo1/operations/op-abc123" | jq '.data.triggered_by'
"op-xyz789"

# What was that operation?
$ curl -s "$API/projects/testgo1/operations/op-xyz789" | jq '.data | {type, input}'
{
  "type": "component.add",
  "input": {"template": "service", "name": "api"}
}

# Root cause: component.add triggered build, build failed due to missing git in Dockerfile
```

## Open Questions

1. **Stream running operations?** - Could add SSE endpoint for real-time step updates
2. **CLI integration?** - `rdev debug testgo1` to show recent failures
3. **Alerting?** - Webhook when operation fails?
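
If the CLI idea in question 2 is pursued, the one-line `steps_summary` from the list response could double as its output format. Below is a minimal sketch in Go of deriving that string from a step list. The `Step` struct, the `StepsSummary` function, and the `…` marker for steps that are neither completed nor failed are assumptions, not part of the spec.

```go
package main

import (
	"fmt"
	"strings"
)

// Step holds the two fields of a step that the summary needs (hypothetical struct).
type Step struct {
	Name   string
	Status string // "completed" and "failed" appear in the spec; others are treated as unfinished
}

// StepsSummary renders steps as "name mark" pairs joined by arrows,
// matching the steps_summary format shown in the list response.
func StepsSummary(steps []Step) string {
	parts := make([]string, 0, len(steps))
	for _, s := range steps {
		mark := "…" // assumption: marker for running or unknown statuses
		switch s.Status {
		case "completed":
			mark = "✓"
		case "failed":
			mark = "✗"
		}
		parts = append(parts, s.Name+" "+mark)
	}
	return strings.Join(parts, " → ")
}

func main() {
	steps := []Step{
		{Name: "git", Status: "completed"},
		{Name: "build-web", Status: "completed"},
		{Name: "build-api", Status: "failed"},
	}
	fmt.Println(StepsSummary(steps)) // git ✓ → build-web ✓ → build-api ✗
}
```

Computing the summary at read time from the `steps` column would avoid storing a second, possibly stale copy, though that is a design choice the spec does not settle.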