# Operations Audit System

**Status**: Spec

**Purpose**: Make automated development debuggable via API

## Overview

Every action on a project is an **Operation**. Operations capture what happened, step-by-step, with enough detail to pinpoint failures without digging through logs.

```
GET /projects/testgo1/operations?status=failed

→ Operation "build" failed at step "build-api": git executable not found
```

## Design Principles

1. **Queryable via API** - No kubectl, no Woodpecker UI, no guessing
2. **Comprehensive, not verbose** - Store a one-line summary (`error`) and the full detail (`error_detail`) separately
3. **30-day retention** - Operations are for debugging, not compliance
4. **Linked to permanent audit** - `audit_log` rows are kept forever and reference operations via `operation_id`

## Data Model

### Operations Table

```sql
CREATE TABLE operations (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id   TEXT NOT NULL,
    type         TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'running',

    -- Correlation
    request_id   TEXT,   -- HTTP request that initiated this operation
    triggered_by UUID,   -- Parent operation (e.g. build triggered by component.add)
    commit_sha   TEXT,   -- Git commit this operation created/triggered
    external_ref TEXT,   -- Woodpecker build #, K8s deployment, etc.

    -- Timing
    started_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMPTZ,
    duration_ms  INT,

    -- Content (JSONB for flexibility)
    input        JSONB,  -- What was requested
    output       JSONB,  -- What was produced

    -- Error handling: essence + detail
    error        TEXT,   -- One-line summary
    error_detail TEXT,   -- Full stack/output (truncated to 10KB)

    -- Steps
    steps        JSONB NOT NULL DEFAULT '[]'
);

-- Indexes
CREATE INDEX idx_ops_project_time   ON operations(project_id, started_at DESC);
CREATE INDEX idx_ops_project_status ON operations(project_id, status) WHERE status IN ('running', 'failed');
CREATE INDEX idx_ops_commit         ON operations(commit_sha) WHERE commit_sha IS NOT NULL;
-- Retention cleanup scans by age across all projects. An index predicate cannot use
-- NOW() (PostgreSQL requires IMMUTABLE functions there), so index started_at directly.
CREATE INDEX idx_ops_cleanup        ON operations(started_at);
```

### Step Structure

```json
{
  "name": "build-api",
  "status": "failed",
  "started_at": "2026-02-01T20:31:45Z",
  "duration_ms": 17000,
  "output": {"image": "registry.threesix.ai/testgo1/api:abc123"},
  "error": "git executable not found",
  "error_detail": "exec: \"git\": executable file not found in $PATH\n at /app/pkg/app.go:24"
}
```

### Operation Types

| Type | Trigger | Key Steps |
|------|---------|-----------|
| `project.create` | `POST /projects` | create_pod, create_repo, activate_ci, create_dns |
| `component.add` | `POST /projects/{id}/components` | render_template, commit_files, create_deployment |
| `build` | Woodpecker webhook | git, build-{component}, deploy-{component} |
| `resource.provision` | `POST /projects/{id}/databases` | create_database, create_user, store_credentials |

## API

### List Operations

```
GET /projects/{id}/operations
GET /projects/{id}/operations?status=failed
GET /projects/{id}/operations?type=build
GET /projects/{id}/operations?since=1h
GET /projects/{id}/operations?limit=50
```

Response (IDs such as `op-abc123` are shortened placeholders; stored IDs are UUIDs):
```json
{
  "data": [
    {
      "id": "op-abc123",
      "type": "build",
      "status": "failed",
      "started_at": "2026-02-01T20:31:45Z",
      "duration_ms": 87000,
      "error": "build-api: git executable not found",
      "steps_summary": "git ✓ → build-web ✓ → build-api ✗"
    }
  ]
}
```

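`steps_summary` is a compact rendering of the `steps` array. One way to derive it at read time is sketched below. It assumes `✓` for `completed`, `✗` for `failed`, and the raw status in parentheses for anything else; the spec only shows the first two. `stepsSummary` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

type step struct {
	Name   string
	Status string
}

// stepsSummary renders steps as "git ✓ → build-web ✓ → build-api ✗".
func stepsSummary(steps []step) string {
	parts := make([]string, 0, len(steps))
	for _, s := range steps {
		switch s.Status {
		case "completed":
			parts = append(parts, s.Name+" ✓")
		case "failed":
			parts = append(parts, s.Name+" ✗")
		default: // running, pending, cancelled: assumed rendering
			parts = append(parts, s.Name+" ("+s.Status+")")
		}
	}
	return strings.Join(parts, " → ")
}

func main() {
	fmt.Println(stepsSummary([]step{
		{"git", "completed"},
		{"build-web", "completed"},
		{"build-api", "failed"},
	}))
}
```

Deriving the summary rather than storing it keeps the `steps` column as the single source of truth. The cost is a small amount of work per listed operation.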
### Get Operation Detail

```
GET /projects/{id}/operations/{operation_id}
```

Response:
```json
{
  "data": {
    "id": "op-abc123",
    "type": "build",
    "status": "failed",
    "triggered_by": "op-xyz789",
    "commit_sha": "abc123",
    "external_ref": "build#42",
    "started_at": "2026-02-01T20:31:45Z",
    "completed_at": "2026-02-01T20:33:12Z",
    "duration_ms": 87000,
    "input": {
      "commit_message": "Add service component: api"
    },
    "steps": [
      {"name": "git", "status": "completed", "duration_ms": 5000},
      {"name": "build-web", "status": "completed", "duration_ms": 48000},
      {
        "name": "build-api",
        "status": "failed",
        "duration_ms": 17000,
        "error": "git executable not found",
        "error_detail": "/app/pkg/app/app.go:24:2: github.com/jordan/testgo1/pkg@v0.0.0: exec: \"git\": executable file not found..."
      }
    ],
    "error": "build-api: git executable not found",
    "error_detail": "Full kaniko output..."
  }
}
```

### Find by Commit

```
GET /projects/{id}/operations?commit=abc123
```

Returns operations whose `commit_sha` matches, i.e. the operation that created this commit and any builds triggered by it.

## Correlation

### Request → Operation

```
HTTP Request (X-Request-ID: req-123)
    ↓
Handler creates Operation (id: op-abc, request_id: req-123)
    ↓
Service executes steps, updates operation
    ↓
Response includes operation_id
```

### Component Add → Build

```
component.add (op-abc)
  → commits to git (sha: abc123)
  → operation.commit_sha = "abc123"

Woodpecker webhook fires for abc123
  → rdev looks up the originating operation
    (build operations also record commit_sha, so exclude them):
      SELECT id FROM operations
      WHERE project_id = 'testgo1' AND commit_sha = 'abc123' AND type <> 'build'
      ORDER BY started_at DESC LIMIT 1
  → creates build operation (triggered_by: op-abc)
```

### Linking to Permanent Audit

Operations are temporary (30d). For compliance, `audit_log` is permanent.

```sql
-- Add operation_id to audit_log.
-- Not a foreign key: operations rows are deleted after 30 days while audit_log
-- rows are kept, so the reference is allowed to dangle.
ALTER TABLE audit_log ADD COLUMN operation_id UUID;
CREATE INDEX idx_audit_operation ON audit_log(operation_id) WHERE operation_id IS NOT NULL;
```

Query permanent history via `audit_log`; debug recent issues via operations.

## Implementation

### Phase 1: Foundation
- [ ] Migration: operations table
- [ ] Domain: Operation, OperationStep
- [ ] Port: OperationRepository
- [ ] Adapter: PostgreSQL implementation
- [ ] Handler: GET /projects/{id}/operations

### Phase 2: Instrumentation
- [ ] Instrument: project.create handler
- [ ] Instrument: component.add handler
- [ ] Instrument: resource provisioning
- [ ] Add operation_id to responses

### Phase 3: Build Integration
- [ ] Woodpecker webhook receiver endpoint
- [ ] Parse build events into operation steps
- [ ] Link via commit_sha

### Phase 4: Cleanup
- [ ] Background job: delete operations older than 30d
- [ ] Add operation_id column to audit_log

## Files to Create/Modify

```
internal/
├── domain/
│   └── operation.go          # NEW: Operation, OperationStep, OperationType
├── port/
│   └── operation.go          # NEW: OperationRepository interface
├── adapter/
│   └── postgres/
│       └── operation_repo.go # NEW: PostgreSQL implementation
├── service/
│   └── operation_service.go  # NEW: Business logic
├── handlers/
│   ├── operations.go         # NEW: API handlers
│   ├── project.go            # MODIFY: Create operation on project.create
│   ├── component.go          # MODIFY: Create operation on component.add
│   └── webhooks.go           # MODIFY: Handle Woodpecker build events
└── worker/
    └── cleanup.go            # NEW: 30-day retention cleanup

migrations/
└── 015_operations.sql        # NEW: Table + indexes
```

## Example Debugging Session

```bash
# Project deployment failing. What happened?
$ curl -s "$API/projects/testgo1/operations?status=failed" | jq '.data[0] | {id, type, error, steps_summary}'
{
  "id": "op-abc123",
  "type": "build",
  "error": "build-api: git executable not found",
  "steps_summary": "git ✓ → build-web ✓ → build-api ✗"
}

# Get details
$ curl -s "$API/projects/testgo1/operations/op-abc123" | jq '.data.steps[-1] | {name, status, error, error_detail}'
{
  "name": "build-api",
  "status": "failed",
  "error": "git executable not found",
  "error_detail": "exec: \"git\": executable file not found in $PATH..."
}

# What triggered this build?
$ curl -s "$API/projects/testgo1/operations/op-abc123" | jq '.data.triggered_by'
"op-xyz789"

# What was that operation?
$ curl -s "$API/projects/testgo1/operations/op-xyz789" | jq '.data | {type, input}'
{
  "type": "component.add",
  "input": {"template": "service", "name": "api"}
}

# Root cause: component.add triggered the build, and the build failed because git is missing in the Dockerfile
```

## Open Questions

1. **Stream running operations?** - Could add an SSE endpoint for real-time step updates
2. **CLI integration?** - `rdev debug testgo1` to show recent failures
3. **Alerting?** - Webhook when an operation fails?