jordan 9a1309a0c5 feat: fix composable monorepo CI builds + health endpoint improvements

Composable monorepo CI fixes:
- Add empty go.sum.tmpl files for pkg, service, worker, and cli components
- Fix Dockerfile.tmpl glob patterns (COPY go.work.sum* is invalid in Kaniko)
- Add deps step to CI that runs go work sync and go mod tidy before builds
- Fix scalar-go dependency version (v0.1.2 doesn't exist, use v0.13.0)

Health endpoint improvements:
- Add registry health check (zot OCI /v2/ endpoint)
- Add health metrics for CI, registry, and Git
- Add /health/ci endpoint for Woodpecker health

Visual verification scaffolding:
- Add Playwright pod and scripts ConfigMap
- Add vision.md and implementation breakdown plan

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-03 18:46:51 -07:00

Visual Verification Implementation Breakdown

Goal: Add Playwright-based visual verification to rdev, enabling automated screenshot/video capture of deployed sites and AI-driven feature completeness evaluation. Integrate with SDLC as an optional QA gate and add a cookbook E2E test.

Estimated Duration: 4 weeks (assumes ~25 hours/week of focused work)

Week 1: Foundation — Domain + Capture Infrastructure

Goals:

Playwright pod deployed and reachable via kubectl exec
Capture script working end-to-end
Domain models and work task type in place
Manual verification via kubectl exec confirms capture works

Tasks:

Day 1-2: Playwright Pod Infrastructure

Create Playwright pod manifest (deployments/k8s/base/playwright-pod.yaml)
- StatefulSet with mcr.microsoft.com/playwright:v1.50.0-noble image
- sleep infinity command (stays alive for kubectl exec)
- Labels: app: playwright, rdev.orchard9.ai/role: playwright
- Volumes: /captures (emptyDir), /scripts (ConfigMap)
- Resources: 500m CPU / 1Gi request, 2 CPU / 4Gi limit
Create capture script (deployments/k8s/base/playwright-scripts/capture.js)
- ~60 lines Node.js using Playwright
- CLI: --url, --viewports (comma-sep), --output, --wait-for, --full-page, --video, --timeout
- Output: JSON manifest with screenshot paths, printed to stdout and saved as manifest.json in the --output directory
- Error handling: catch navigation failures, timeout gracefully
Create ConfigMap for script (deployments/k8s/base/playwright-configmap.yaml)
- Mount capture.js at /scripts/capture.js

Deploy to cluster and test manually

kubectl apply -f deployments/k8s/base/playwright-configmap.yaml
kubectl apply -f deployments/k8s/base/playwright-pod.yaml
kubectl exec playwright-0 -- node /scripts/capture.js \
  --url=https://example.com --viewports=1920x1080 --output=/captures/test/
kubectl exec playwright-0 -- cat /captures/test/manifest.json

Day 3: Domain Models

Create domain types (internal/domain/verify.go)
- VerifySpec struct with fields: URL, Viewports, WaitFor, WaitTimeout, FullPage, Video, Evaluate, Prompt, SpecPath, CallbackURL
- Validate() method: URL required, callback URL validation (reuse ValidateCallbackURL)
- VerifyResult struct: Success, Screenshots, Video, Evaluation, Score, Passed, DurationMs, Error
- ToWorkResult() method (promote screenshots to artifacts map)
Add work task type (internal/domain/work.go)
- Add WorkTaskTypeVerify WorkTaskType = "verify" to constants
- Update IsValid() to include verify
Unit tests (internal/domain/verify_test.go)
- Test Validate() with valid/invalid specs
- Test ToWorkResult() conversion
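
The domain types above could be sketched roughly as follows. This is only an illustration under stated assumptions: the plan names the fields but not their types, `validateCallbackURL` is a hypothetical stand-in for the existing ValidateCallbackURL helper, and `WorkResult` with its `screenshot_N` artifact keys is an invented shape, not the real work result type.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// VerifySpec mirrors the fields listed in the plan. Field types are assumptions.
type VerifySpec struct {
	URL         string
	Viewports   []string
	WaitFor     string
	WaitTimeout int // assumed to be milliseconds
	FullPage    bool
	Video       bool
	Evaluate    bool
	Prompt      string
	SpecPath    string
	CallbackURL string
}

// validateCallbackURL is a hypothetical stand-in for the existing ValidateCallbackURL helper.
func validateCallbackURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("callback URL: %w", err)
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return errors.New("callback URL must be an absolute http(s) URL")
	}
	return nil
}

// Validate applies the two rules from the plan: URL is required, and the
// callback URL, when present, must be valid.
func (s VerifySpec) Validate() error {
	if s.URL == "" {
		return errors.New("url is required")
	}
	if s.CallbackURL != "" {
		return validateCallbackURL(s.CallbackURL)
	}
	return nil
}

// VerifyResult mirrors the result fields listed in the plan (types assumed).
type VerifyResult struct {
	Success     bool
	Screenshots []string
	Video       string
	Evaluation  string
	Score       int
	Passed      bool
	DurationMs  int64
	Error       string
}

// WorkResult is a hypothetical shape for the generic work result.
type WorkResult struct {
	Success   bool
	Artifacts map[string]string
	Error     string
}

// ToWorkResult promotes screenshots (and the video, if any) into the artifacts map.
func (r VerifyResult) ToWorkResult() WorkResult {
	artifacts := make(map[string]string, len(r.Screenshots)+1)
	for i, path := range r.Screenshots {
		artifacts[fmt.Sprintf("screenshot_%d", i)] = path
	}
	if r.Video != "" {
		artifacts["video"] = r.Video
	}
	return WorkResult{Success: r.Success, Artifacts: artifacts, Error: r.Error}
}

func main() {
	fmt.Println(VerifySpec{}.Validate())
	fmt.Println(VerifySpec{URL: "https://example.com"}.Validate())
	res := VerifyResult{Success: true, Screenshots: []string{"/captures/test/1920x1080.png"}}
	fmt.Println(res.ToWorkResult().Artifacts)
}
```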

Day 4-5: Verify Executor (Capture Only)

Create verify executor (internal/worker/verify_executor.go)
- Follow BuildExecutor pattern exactly
- Execute(ctx, task) method:
  - Parse VerifySpec from task.Spec map
  - Build kubectl exec command: kubectl exec playwright-0 -- node /scripts/capture.js --url=X ...
  - Execute via existing CommandExecutor port
  - Parse JSON manifest from stdout
  - Return VerifyResult (converted via ToWorkResult()) with artifacts map containing screenshot paths
- Config struct: VerifyExecutorConfig with playwright pod name, namespace
- Constructor: NewVerifyExecutor(executor, streams, logger, cfg)
Wire executor to WorkExecutor (internal/worker/work_executor.go)
- Add verifyExec *VerifyExecutor field
- Add case in executeTask() switch for WorkTaskTypeVerify
- Update NewWorkExecutor() to accept VerifyExecutor
Unit tests (internal/worker/verify_executor_test.go)
- Mock CommandExecutor to return capture manifest JSON
- Test successful capture with multiple viewports
- Test failure handling (command fails, invalid JSON)
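
As a rough sketch of the capture-only executor flow described above (build the kubectl exec command, run it through a command port, parse the manifest), something like the following might work. The `CommandExecutor` interface here is a hypothetical, trimmed-down stand-in for the existing port, the `screenshots` manifest key is an assumed JSON shape for what capture.js prints, and the `default` namespace is a placeholder.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"strings"
)

// CommandExecutor is a hypothetical, trimmed-down stand-in for the existing port.
type CommandExecutor interface {
	Run(ctx context.Context, name string, args ...string) ([]byte, error)
}

// VerifyExecutorConfig holds the Playwright pod name and namespace, as in the plan.
type VerifyExecutorConfig struct {
	PodName   string
	Namespace string
}

// captureManifest is an assumed shape for the JSON that capture.js prints.
type captureManifest struct {
	Screenshots []string `json:"screenshots"`
}

// buildCaptureArgs assembles the kubectl arguments for one capture run.
func buildCaptureArgs(cfg VerifyExecutorConfig, url string, viewports []string, output string) []string {
	return []string{
		"exec", "-n", cfg.Namespace, cfg.PodName, "--",
		"node", "/scripts/capture.js",
		"--url=" + url,
		"--viewports=" + strings.Join(viewports, ","),
		"--output=" + output,
	}
}

// parseManifest turns the script's stdout into an artifacts map.
func parseManifest(stdout []byte) (map[string]string, error) {
	var m captureManifest
	if err := json.Unmarshal(stdout, &m); err != nil {
		return nil, fmt.Errorf("parse capture manifest: %w", err)
	}
	artifacts := make(map[string]string, len(m.Screenshots))
	for i, path := range m.Screenshots {
		artifacts[fmt.Sprintf("screenshot_%d", i)] = path
	}
	return artifacts, nil
}

// capture runs the command through the port and parses the result.
func capture(ctx context.Context, exec CommandExecutor, cfg VerifyExecutorConfig,
	url string, viewports []string, output string) (map[string]string, error) {
	stdout, err := exec.Run(ctx, "kubectl", buildCaptureArgs(cfg, url, viewports, output)...)
	if err != nil {
		return nil, fmt.Errorf("capture command failed: %w", err)
	}
	return parseManifest(stdout)
}

// fakeExecutor returns canned stdout, similar to the mock described for the unit tests.
type fakeExecutor struct{ stdout string }

func (f fakeExecutor) Run(context.Context, string, ...string) ([]byte, error) {
	return []byte(f.stdout), nil
}

func main() {
	cfg := VerifyExecutorConfig{PodName: "playwright-0", Namespace: "default"}
	fake := fakeExecutor{stdout: `{"screenshots": ["/captures/test/1920x1080.png"]}`}
	artifacts, err := capture(context.Background(), fake, cfg,
		"https://example.com", []string{"1920x1080"}, "/captures/test/")
	fmt.Println(artifacts, err)
}
```

Keeping argument building and manifest parsing as pure functions makes both failure cases from the test list (command fails, invalid JSON) easy to cover without a cluster.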

Deliverables:

Playwright pod running in cluster
Capture script takes screenshots successfully
VerifySpec/VerifyResult domain types with tests
VerifyExecutor can dispatch capture via kubectl exec
Work queue can dispatch verify tasks (manual test via SQL insert)

Foundation this enables:

Week 2 can build API layer knowing capture works
Executor pattern established for AI evaluation later

Week 2: API Layer + Manual E2E

Goals:

Full API surface: POST /verify, GET /verify/{id}, GET /verifications
Auth scopes configured
Manual E2E working: API call → queue → capture → result
Initial release candidate deployed to staging

Tasks:

Day 1: Auth and Service Layer

Add auth scopes (internal/auth/scopes.go)
- ScopeVerifyRead Scope = "verify:read"
- ScopeVerifyWrite Scope = "verify:write"
- Add to AllScopes if needed
Create verify service (internal/service/verify_service.go)
- Follow BuildService pattern
- StartVerify(ctx, projectID, spec) → validate, enqueue task, return task ID
- GetVerifyStatus(ctx, taskID) → get task from work queue
- ListVerifications(ctx, projectID, limit) → list tasks by project
- Dependencies: WorkQueue port (existing)
Unit tests (internal/service/verify_service_test.go)
- Mock work queue
- Test enqueue, status, list
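
A minimal sketch of the enqueue path of the service, assuming a much smaller `WorkQueue` interface than the real port and a spec passed as a plain map (both hypothetical simplifications). The in-memory queue plays the role of the mock used in the unit tests.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// WorkQueue is a hypothetical, trimmed-down version of the existing port.
type WorkQueue interface {
	Enqueue(ctx context.Context, projectID, taskType string, spec map[string]any) (string, error)
}

// VerifyService validates input and delegates to the work queue.
type VerifyService struct{ queue WorkQueue }

// StartVerify validates the request, enqueues a "verify" task, and returns its ID.
func (s *VerifyService) StartVerify(ctx context.Context, projectID string, spec map[string]any) (string, error) {
	if projectID == "" {
		return "", errors.New("project ID is required")
	}
	if u, _ := spec["url"].(string); u == "" {
		return "", errors.New("url is required")
	}
	return s.queue.Enqueue(ctx, projectID, "verify", spec)
}

// memoryQueue is an in-memory fake that records enqueued task IDs.
type memoryQueue struct{ tasks []string }

func (q *memoryQueue) Enqueue(_ context.Context, projectID, taskType string, _ map[string]any) (string, error) {
	id := fmt.Sprintf("%s-%s-%d", projectID, taskType, len(q.tasks)+1)
	q.tasks = append(q.tasks, id)
	return id, nil
}

func main() {
	svc := &VerifyService{queue: &memoryQueue{}}
	id, err := svc.StartVerify(context.Background(), "myproject",
		map[string]any{"url": "https://example.com"})
	fmt.Println(id, err)
}
```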

Day 2-3: Handler Layer

Create verify handler (internal/handlers/verify.go)
- Follow BuildsHandler pattern exactly
- Mount(r api.Router) with scopes:
  - POST /projects/{id}/verify → ScopeVerifyWrite
  - GET /projects/{id}/verifications → ScopeVerifyRead
  - GET /verify/{taskId} → ScopeVerifyRead
- Use api.DecodeJSON(), validate.New(), response helpers
- Request struct: VerifyRequest matching VerifySpec
- Response structs: match existing patterns
Wire DI (cmd/rdev-api/main.go)
- Create VerifyExecutor in worker setup
- Create VerifyService
- Create VerifyHandler
- Mount routes
Handler tests (internal/handlers/verify_test.go)
- Test POST with valid/invalid specs
- Test auth scope enforcement
- Test GET status/list
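
To illustrate the POST path of the handler, here is a sketch using only the standard library rather than the project's api.Router and response helpers, which are not shown in this plan. The `enqueue` callback is a hypothetical stand-in for VerifyService.StartVerify, the 202 status code is an assumption, and scope enforcement is left out. The `task_id` response field follows the manual E2E example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// VerifyRequest carries a subset of the VerifySpec fields, using the JSON
// names from the manual E2E example.
type VerifyRequest struct {
	URL       string   `json:"url"`
	Viewports []string `json:"viewports"`
}

// startVerifyHandler decodes the body, rejects a missing URL, and hands the
// request to enqueue (a hypothetical stand-in for VerifyService.StartVerify).
func startVerifyHandler(enqueue func(VerifyRequest) (string, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req VerifyRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		if req.URL == "" {
			http.Error(w, "url is required", http.StatusBadRequest)
			return
		}
		taskID, err := enqueue(req)
		if err != nil {
			http.Error(w, "could not enqueue task", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusAccepted) // assumed status code
		json.NewEncoder(w).Encode(map[string]string{"task_id": taskID})
	}
}

// postVerify exercises the handler in memory and returns the status and body.
func postVerify(body string) (int, string) {
	h := startVerifyHandler(func(VerifyRequest) (string, error) { return "task-1", nil })
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/projects/myproject/verify", strings.NewReader(body))
	h.ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	fmt.Println(postVerify(`{"url": "https://myproject.threesix.ai", "viewports": ["1920x1080"]}`))
	fmt.Println(postVerify(`{}`))
}
```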

Day 4: SSE Events

Add verify events (internal/worker/verify_executor.go)
- Publish events via StreamPublisher:
  - verify.started - task claimed
  - verify.capturing - starting capture
  - verify.captured - capture complete with manifest
  - verify.completed / verify.failed - final status
- Event constants in verify_executor.go (follow BuildExecutor pattern)
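
The event sequence above might be sketched as follows. The event names come from the list; the `StreamPublisher` interface shown here is a hypothetical one-method simplification of the real publisher, and the constant names are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// Event names from the plan. The Go constant names are illustrative.
const (
	EventVerifyStarted   = "verify.started"
	EventVerifyCapturing = "verify.capturing"
	EventVerifyCaptured  = "verify.captured"
	EventVerifyCompleted = "verify.completed"
	EventVerifyFailed    = "verify.failed"
)

// StreamPublisher is a hypothetical simplification of the real publisher.
type StreamPublisher interface {
	Publish(taskID, event string)
}

// recordingPublisher keeps events in memory so the order can be inspected.
type recordingPublisher struct{ events []string }

func (p *recordingPublisher) Publish(_, event string) {
	p.events = append(p.events, event)
}

var errCaptureFailed = errors.New("capture failed")

// runVerify publishes progress events around a capture function.
func runVerify(pub StreamPublisher, taskID string, capture func() error) error {
	pub.Publish(taskID, EventVerifyStarted)
	pub.Publish(taskID, EventVerifyCapturing)
	if err := capture(); err != nil {
		pub.Publish(taskID, EventVerifyFailed)
		return err
	}
	pub.Publish(taskID, EventVerifyCaptured)
	pub.Publish(taskID, EventVerifyCompleted)
	return nil
}

func main() {
	ok := &recordingPublisher{}
	runVerify(ok, "task-1", func() error { return nil })
	fmt.Println(ok.events)

	bad := &recordingPublisher{}
	err := runVerify(bad, "task-2", func() error { return errCaptureFailed })
	fmt.Println(bad.events, err)
}
```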

Day 5: Manual E2E + Deploy

Manual E2E test sequence

# 1. Start verification
curl -X POST $RDEV_API_URL/projects/myproject/verify \
  -H "X-API-Key: $RDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://myproject.threesix.ai", "viewports": ["1920x1080"]}'
# Response: {"task_id": "xxx"}

# 2. Poll for completion
curl $RDEV_API_URL/verify/xxx -H "X-API-Key: $RDEV_API_KEY"
# Response: screenshots in artifacts

Build and deploy

./scripts/release.sh v0.11.0 "feat: add visual verification (capture-only MVP)" --deploy

Deliverables:

Auth scopes for verify:read/write
VerifyService with enqueue/status/list
VerifyHandler with 3 endpoints
SSE events for verification progress
Deployed to staging, manual E2E passing

Foundation this enables:

Week 3 can add AI evaluation knowing API works
Cookbook script can use standard api_call() pattern

Week 3: AI Evaluation + Cookbook Test

Goals:

AI evaluation path working (Claude reads screenshots, returns verdict)
Cookbook E2E test script: visual-verify-test.sh
Add to common.sh utilities
Full E2E passing in CI

Tasks:

Day 1-2: AI Evaluation Path

Add evaluation to VerifyExecutor (internal/worker/verify_executor.go)
- After successful capture, if spec.Evaluate:
  - Build evaluation prompt: "Compare these screenshots against the specification..."
  - Include spec.Prompt or read spec.SpecPath content
  - Call Claude Code via CodeAgentRegistry
  - Pass screenshots as attachments (file paths in pod)
  - Parse evaluation output for score (look for "Score: XX/100" pattern)
  - Set result.Evaluation, result.Score, result.Passed

Evaluation prompt template (hardcoded in executor for now)

Evaluate these screenshots against the following specification:

{spec.Prompt or contents of spec.SpecPath}

For each screenshot, assess:
1. Does the UI match the specification?
2. Are all required elements present?
3. Is the layout correct at this viewport?

End with: "Score: XX/100" and "PASSED" or "FAILED"

Handle partial failures (internal/worker/verify_executor.go)
- If capture succeeds but evaluation fails:
  - Set success=true (screenshots are still useful)
  - Leave evaluation=""
  - Log warning
Unit tests for evaluation path
- Mock CodeAgentRegistry
- Test evaluation output parsing
- Test partial failure handling
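
One way the score extraction and partial-failure handling could look is sketched below. It assumes the model follows the prompt template, so the verdict word appears after the final "Score: XX/100" line; the trimmed `VerifyResult` and the helper names are illustrative, not the executor's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"regexp"
	"strconv"
	"strings"
)

// scorePattern matches the "Score: XX/100" line requested by the prompt template.
var scorePattern = regexp.MustCompile(`Score:\s*(\d{1,3})\s*/\s*100`)

// parseEvaluation extracts the last score in the output and the verdict after it.
func parseEvaluation(output string) (score int, passed bool, err error) {
	matches := scorePattern.FindAllStringSubmatchIndex(output, -1)
	if len(matches) == 0 {
		return 0, false, errors.New(`no "Score: XX/100" line found`)
	}
	last := matches[len(matches)-1]
	raw := output[last[2]:last[3]]
	score, err = strconv.Atoi(raw)
	if err != nil || score > 100 {
		return 0, false, fmt.Errorf("score %q is not in 0..100", raw)
	}
	// Assumption: the verdict follows the score, as the template asks.
	verdict := output[last[1]:]
	return score, strings.Contains(verdict, "PASSED"), nil
}

// VerifyResult is trimmed to the fields touched by evaluation.
type VerifyResult struct {
	Success    bool
	Evaluation string
	Score      int
	Passed     bool
}

// applyEvaluation implements the partial-failure rule: a failed evaluation
// leaves Success untouched and Evaluation empty, and only logs a warning.
func applyEvaluation(r *VerifyResult, output string) {
	score, passed, err := parseEvaluation(output)
	if err != nil {
		log.Printf("warning: evaluation skipped: %v", err)
		return
	}
	r.Evaluation, r.Score, r.Passed = output, score, passed
}

func main() {
	res := VerifyResult{Success: true}
	applyEvaluation(&res, "Hero section present.\nScore: 85/100\nPASSED")
	fmt.Println(res.Score, res.Passed)

	partial := VerifyResult{Success: true}
	applyEvaluation(&partial, "model returned no score")
	fmt.Println(partial.Success, partial.Evaluation == "")
}
```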

Day 3-4: Cookbook Test Script

Add utility to common.sh (cookbooks/scripts/common.sh)

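# Wait for verification to complete
# Arguments: task_id [max_attempts] [poll_interval]
wait_for_verify() {
    local task_id="$1"
    local max_attempts="${2:-30}"
    local poll_interval="${3:-5}"
    local attempt status
    # Poll GET /verify/{task_id} until completed/failed (assumes a top-level "status" field)
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        status=$(curl -sf "$RDEV_API_URL/verify/$task_id" \
            -H "X-API-Key: $RDEV_API_KEY" | jq -r '.status // empty') || true
        case "$status" in completed) return 0 ;; failed) return 1 ;; esac
        sleep "$poll_interval"
    done
    return 1 # timed out without reaching a terminal status
}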

Create visual-verify-test.sh (cookbooks/scripts/visual-verify-test.sh)
- Follow cookbook script SKILL.md patterns exactly
- Commands: run, status, diagnose, teardown
- Flow:
  1. Create composable project with app-astro component
  2. Wait for initial deploy (site is live)
  3. Start build: "Create a hero section with a call-to-action button"
  4. Wait for build to complete
  5. Wait for CI pipeline
  6. Wait for site to respond
  7. Start verification: POST /projects/{id}/verify {url, evaluate: true, prompt: ...}
  8. Wait for verify to complete
  9. Assert: result.passed == true OR result.score >= 70
  10. Teardown
Add auto-teardown support
- Parse --auto-teardown flag
- Register cleanup trap
- Set CLEANUP_PROJECT

Day 5: Integration + CI

Test locally

./cookbooks/scripts/visual-verify-test.sh run vv-test --auto-teardown

Add to CI (if CI runs cookbook tests)
- Add visual-verify-test to test matrix
- Ensure playwright-0 pod is available in test environment
Document in cookbook skill (.claude/skills/cookbook-scripts/SKILL.md)
- Add wait_for_verify() to utilities list
- Add visual-verify-test.sh to examples

Deliverables:

AI evaluation working with score extraction
Partial failure handling (capture ok, eval fail)
wait_for_verify() in common.sh
visual-verify-test.sh passing end-to-end
Documentation updated

Foundation this enables:

Week 4 can add SDLC integration knowing full flow works
Cookbook pattern established for future tests

Week 4: SDLC Integration + Polish

Goals:

Visual verification as optional SDLC gate between QA and merge
Skeleton command: /verify-feature
Build chaining: auto-verify after deploy
Release v0.12.0 with full feature

Tasks:

Day 1-2: SDLC Types and Rules

Add artifact type (internal/sdlc/types.go)
- ArtifactVerification ArtifactType = "verification"
- Add to ValidArtifactTypes slice
- Add case in ArtifactFilename() → returns "verification.md"
Add action types (internal/sdlc/types.go)
- ActionVerifyFeature ActionType = "VERIFY_FEATURE"
- ActionFixVerificationIssues ActionType = "FIX_VERIFICATION_ISSUES"
Add classifier rules (internal/sdlc/rules_execution.go)
- needsVerificationRule():
  - Condition: Phase=QA, qa_results=passed, verification=nil or pending
  - Action: ActionVerifyFeature
  - NextCommand: "/verify-feature {slug}"
- verificationFailedRule():
  - Condition: Phase=QA, verification=failed
  - Action: ActionFixVerificationIssues
  - NextCommand: "/fix-verification-issues {slug}"
- verificationPassedRule():
  - Condition: Phase=QA, qa_results=passed, verification=passed
  - Action: ActionTransition to PhaseMerge
Update rule ordering (internal/sdlc/rules.go)
- Insert verification rules after qaPassedRule
- Update qaPassedRule: only transition if verification also passed OR feature doesn't require verification (config flag)
Unit tests (internal/sdlc/rules_execution_test.go)
- Test all three verification rules
- Test interaction with existing QA rules
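
The three verification rules can be read as a small decision function. The sketch below is an illustration under stated assumptions: the state and decision structs, the string values for phases and statuses, and the `"TRANSITION"` value are all hypothetical, since the plan names the constants and conditions but not the surrounding types.

```go
package main

import "fmt"

const (
	ActionVerifyFeature         = "VERIFY_FEATURE"
	ActionFixVerificationIssues = "FIX_VERIFICATION_ISSUES"
	ActionTransition            = "TRANSITION" // assumed value; the plan names only the constant
)

// featureState is a hypothetical snapshot of what the classifier inspects.
type featureState struct {
	Phase        string // e.g. "QA"
	QAResults    string // "passed", "failed", or ""
	Verification string // "passed", "failed", "pending", or "" when no artifact exists
}

// decision is a hypothetical classifier output.
type decision struct {
	Action      string
	NextCommand string
	NextPhase   string
}

// classifyVerification applies the three rules in the order the plan lists them.
// The boolean is false when no verification rule matches.
func classifyVerification(s featureState, slug string) (decision, bool) {
	if s.Phase != "QA" {
		return decision{}, false
	}
	switch {
	case s.QAResults == "passed" && (s.Verification == "" || s.Verification == "pending"):
		return decision{Action: ActionVerifyFeature, NextCommand: "/verify-feature " + slug}, true
	case s.Verification == "failed":
		return decision{Action: ActionFixVerificationIssues, NextCommand: "/fix-verification-issues " + slug}, true
	case s.QAResults == "passed" && s.Verification == "passed":
		return decision{Action: ActionTransition, NextPhase: "merge"}, true
	}
	return decision{}, false
}

func main() {
	fmt.Println(classifyVerification(featureState{Phase: "QA", QAResults: "passed"}, "hero-section"))
	fmt.Println(classifyVerification(featureState{Phase: "QA", QAResults: "passed", Verification: "failed"}, "hero-section"))
	fmt.Println(classifyVerification(featureState{Phase: "QA", QAResults: "passed", Verification: "passed"}, "hero-section"))
}
```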

Day 3: Skeleton Command

Create verify-feature command (embedded template: templates/skeleton/.claude/commands/verify-feature.md)

---
description: Visually verify a deployed feature
argument-hint: <feature-slug>
allowed-tools: Bash, Read, Write, Edit, Glob, Grep
---

Visually verify feature: $ARGUMENTS

## Instructions

1. Load feature spec from `.sdlc/features/$ARGUMENTS/spec.md`
2. Get project domain from CLAUDE.md or config
3. Determine the deployed URL
4. Execute verification via rdev API (if available) or Playwright directly
5. Write results to `.sdlc/features/$ARGUMENTS/verification.md`
6. Register artifact: `sdlc artifact create $ARGUMENTS verification`

## Output Format

Write `.sdlc/features/$ARGUMENTS/verification.md`:

```markdown
# Visual Verification: [Feature Title]

## Screenshots

| Viewport | Status | Notes |
|----------|--------|-------|
| Desktop (1920x1080) | PASS | All elements visible |
| Mobile (375x667) | PASS | Responsive layout correct |

## Evaluation

[AI or manual evaluation notes]

## Result

**Status:** PASSED
**Score:** 95/100
```

Update skeleton template to include the command
- Ensure new projects get verify-feature.md

Day 4: Build Chaining (Optional)

Add verify_after to BuildSpec (internal/domain/build.go)
- VerifyAfter bool - auto-verify after successful deploy
- VerifyURL string - URL to verify (if different from project domain)
Chain verification in BuildExecutor (internal/worker/build_executor.go)
- After successful build + push (line ~270):
```
if spec.VerifyAfter {
    // Enqueue verify task for spec.VerifyURL, falling back to the
    // project domain when VerifyURL is empty
}
```
- Or: callback webhook triggers external verification
Update build handler to accept verify_after/verify_url

Day 5: Documentation + Release

Update documentation
- CLAUDE.md: Update platform status to "Done"
- visual-verification.md: Add SDLC integration examples
- sdlc.md: Document verification rules
Integration test
- Test full SDLC flow with verification gate
- Test classifier transitions correctly

Final release

./scripts/release.sh v0.12.0 "feat: visual verification with SDLC integration" --deploy

Deliverables:

ArtifactVerification type in SDLC
3 classifier rules for verification gate
verify-feature.md skeleton command
Build chaining (verify_after flag)
Full integration test passing
v0.12.0 released

Summary

| Week | Theme | Key Output |
|------|-------|------------|
| 1 | Foundation | Playwright pod + capture script + domain types + executor |
| 2 | API Layer | Handlers + service + auth scopes + manual E2E |
| 3 | AI + Cookbook | Evaluation path + visual-verify-test.sh + common.sh utils |
| 4 | SDLC + Polish | Classifier rules + skeleton command + build chaining + release |

Risks and Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Playwright pod OOM | Capture fails | Start with conservative limits (4Gi), tune based on usage |
| AI evaluation unreliable | Poor pass/fail decisions | Start with high threshold (70), tune; partial success mode |
| Screenshot storage fills up | Pod crashes | EmptyDir for now, add cleanup job or PVC later |
| SDLC rules conflict | Features stuck | Test extensively, make verification optional via config |
| Claude Code can't read screenshots | Evaluation broken | Test multimodal support; fallback to manual verification |

Files Created/Modified

New Files (13):

internal/domain/verify.go
internal/domain/verify_test.go
internal/service/verify_service.go
internal/service/verify_service_test.go
internal/handlers/verify.go
internal/handlers/verify_test.go
internal/worker/verify_executor.go
internal/worker/verify_executor_test.go
deployments/k8s/base/playwright-pod.yaml
deployments/k8s/base/playwright-configmap.yaml
deployments/k8s/base/playwright-scripts/capture.js
cookbooks/scripts/visual-verify-test.sh
templates/skeleton/.claude/commands/verify-feature.md

Modified Files (8):

internal/domain/work.go - Add WorkTaskTypeVerify
internal/auth/scopes.go - Add verify scopes
internal/worker/work_executor.go - Add dispatch case
internal/sdlc/types.go - Add artifact/action types
internal/sdlc/rules.go - Register verification rules
internal/sdlc/rules_execution.go - Add verification rules
cookbooks/scripts/common.sh - Add wait_for_verify()
cmd/rdev-api/main.go - Wire DI
