rdev/docs/plans/visual-verification-breakdown.md
jordan 9a1309a0c5 feat: fix composable monorepo CI builds + health endpoint improvements
Composable monorepo CI fixes:
- Add empty go.sum.tmpl files for pkg, service, worker, and cli components
- Fix Dockerfile.tmpl glob patterns (COPY go.work.sum* is invalid in Kaniko)
- Add deps step to CI that runs go work sync and go mod tidy before builds
- Fix scalar-go dependency version (v0.1.2 doesn't exist, use v0.13.0)

Health endpoint improvements:
- Add registry health check (zot OCI /v2/ endpoint)
- Add health metrics for CI, registry, and Git
- Add /health/ci endpoint for Woodpecker health

Visual verification scaffolding:
- Add Playwright pod and scripts ConfigMap
- Add vision.md and implementation breakdown plan

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:46:51 -07:00

# Visual Verification Implementation Breakdown

**Goal:** Add Playwright-based visual verification to rdev, enabling automated screenshot/video capture of deployed sites and AI-driven feature completeness evaluation. Integrate with SDLC as an optional QA gate and add a cookbook E2E test.

**Estimated Duration:** 4 weeks (assumes ~25 hours/week of focused work)

---
## Week 1: Foundation — Domain + Capture Infrastructure
**Goals:**
- Playwright pod deployed and reachable via kubectl exec
- Capture script working end-to-end
- Domain models and work task type in place
- Manual verification via kubectl exec confirms capture works
**Tasks:**
### Day 1-2: Playwright Pod Infrastructure
1. **Create Playwright pod manifest** (`deployments/k8s/base/playwright-pod.yaml`)
- StatefulSet with `mcr.microsoft.com/playwright:v1.50.0-noble` image
- `sleep infinity` command (stays alive for kubectl exec)
- Labels: `app: playwright`, `rdev.orchard9.ai/role: playwright`
- Volumes: `/captures` (emptyDir), `/scripts` (ConfigMap)
- Resources: requests of 500m CPU / 1Gi memory, limits of 2 CPU / 4Gi memory
2. **Create capture script** (`deployments/k8s/base/playwright-scripts/capture.js`)
- ~60 lines Node.js using Playwright
- CLI: `--url`, `--viewports` (comma-separated), `--output`, `--wait-for`, `--full-page`, `--video`, `--timeout`
- Output: JSON manifest to stdout with screenshot paths
- Error handling: catch navigation failures, time out gracefully
3. **Create ConfigMap for script** (`deployments/k8s/base/playwright-configmap.yaml`)
- Mount `capture.js` at `/scripts/capture.js`
4. **Deploy to cluster and test manually**
```bash
kubectl apply -f deployments/k8s/base/playwright-configmap.yaml
kubectl apply -f deployments/k8s/base/playwright-pod.yaml
kubectl exec playwright-0 -- node /scripts/capture.js \
  --url=https://example.com --viewports=1920x1080 --output=/captures/test/
kubectl exec playwright-0 -- cat /captures/test/manifest.json
```
### Day 3: Domain Models
5. **Create domain types** (`internal/domain/verify.go`)
- `VerifySpec` struct with fields: URL, Viewports, WaitFor, WaitTimeout, FullPage, Video, Evaluate, Prompt, SpecPath, CallbackURL
- `Validate()` method: URL required, callback URL validation (reuse `ValidateCallbackURL`)
- `VerifyResult` struct: Success, Screenshots, Video, Evaluation, Score, Passed, DurationMs, Error
- `ToWorkResult()` method (promote screenshots to artifacts map)
6. **Add work task type** (`internal/domain/work.go`)
- Add `WorkTaskTypeVerify WorkTaskType = "verify"` to constants
- Update `IsValid()` to include verify
7. **Unit tests** (`internal/domain/verify_test.go`)
- Test Validate() with valid/invalid specs
- Test ToWorkResult() conversion
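
A minimal sketch of the `VerifySpec` type and its `Validate()` method may help pin down the Day 3 work. The field names come from the list above; the Go types, JSON tags, and the millisecond unit are assumptions, and `validateCallbackURL` is only a stand-in for the project's existing `ValidateCallbackURL`, whose real signature is not shown in this plan.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// VerifySpec mirrors the field list in the plan.
// Types and JSON tags are assumptions, not the final schema.
type VerifySpec struct {
	URL         string   `json:"url"`
	Viewports   []string `json:"viewports,omitempty"` // e.g. "1920x1080"
	WaitFor     string   `json:"wait_for,omitempty"`
	WaitTimeout int      `json:"wait_timeout,omitempty"` // unit assumed: milliseconds
	FullPage    bool     `json:"full_page,omitempty"`
	Video       bool     `json:"video,omitempty"`
	Evaluate    bool     `json:"evaluate,omitempty"`
	Prompt      string   `json:"prompt,omitempty"`
	SpecPath    string   `json:"spec_path,omitempty"`
	CallbackURL string   `json:"callback_url,omitempty"`
}

// validateCallbackURL is a hypothetical stand-in for the existing
// ValidateCallbackURL helper; it only checks for an absolute http(s) URL.
func validateCallbackURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return errors.New("callback URL must be an absolute http(s) URL")
	}
	return nil
}

// Validate enforces the two rules named in the plan:
// URL is required, and CallbackURL (when set) must be valid.
func (s VerifySpec) Validate() error {
	if s.URL == "" {
		return errors.New("url is required")
	}
	if s.CallbackURL != "" {
		if err := validateCallbackURL(s.CallbackURL); err != nil {
			return fmt.Errorf("invalid callback_url: %w", err)
		}
	}
	return nil
}

func main() {
	fmt.Println(VerifySpec{}.Validate())
	fmt.Println(VerifySpec{URL: "https://example.com"}.Validate())
}
```

The same table of valid and invalid specs can drive the unit tests in `verify_test.go`.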
### Day 4-5: Verify Executor (Capture Only)
8. **Create verify executor** (`internal/worker/verify_executor.go`)
- Follow `BuildExecutor` pattern exactly
- `Execute(ctx, task)` method:
  - Parse VerifySpec from task.Spec map
  - Build kubectl exec command: `kubectl exec playwright-0 -- node /scripts/capture.js --url=X ...`
  - Execute via existing `CommandExecutor` port
  - Parse JSON manifest from stdout
  - Return `BuildResult` with artifacts map containing screenshot paths
- Config struct: `VerifyExecutorConfig` with playwright pod name, namespace
- Constructor: `NewVerifyExecutor(executor, streams, logger, cfg)`
9. **Wire executor to WorkExecutor** (`internal/worker/work_executor.go`)
- Add `verifyExec *VerifyExecutor` field
- Add case in `executeTask()` switch for `WorkTaskTypeVerify`
- Update `NewWorkExecutor()` to accept VerifyExecutor
10. **Unit tests** (`internal/worker/verify_executor_test.go`)
- Mock CommandExecutor to return capture manifest JSON
- Test successful capture with multiple viewports
- Test failure handling (command fails, invalid JSON)
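
The two pure pieces of the executor, building the `kubectl exec` argument list and parsing the manifest, can be sketched and unit-tested without a cluster. This is an illustration under assumptions: the manifest shape (`screenshots` with `viewport` and `path`), the `screenshot_<viewport>` artifact keys, and the helper names `buildCaptureArgs` and `parseManifest` are hypothetical, since the plan only says the script prints a JSON manifest with screenshot paths.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// captureManifest is an assumed shape for the JSON printed by capture.js.
type captureManifest struct {
	Screenshots []struct {
		Viewport string `json:"viewport"`
		Path     string `json:"path"`
	} `json:"screenshots"`
}

// buildCaptureArgs returns the argument list for the kubectl binary.
// Passing arguments as a slice (not a shell string) avoids quoting issues
// with user-supplied URLs.
func buildCaptureArgs(namespace, pod, targetURL string, viewports []string, outputDir string, fullPage bool) []string {
	args := []string{
		"exec", "-n", namespace, pod, "--",
		"node", "/scripts/capture.js",
		"--url=" + targetURL,
		"--viewports=" + strings.Join(viewports, ","),
		"--output=" + outputDir,
	}
	if fullPage {
		args = append(args, "--full-page")
	}
	return args
}

// parseManifest converts capture.js stdout into an artifacts map.
func parseManifest(stdout []byte) (map[string]string, error) {
	var m captureManifest
	if err := json.Unmarshal(stdout, &m); err != nil {
		return nil, fmt.Errorf("parse capture manifest: %w", err)
	}
	artifacts := make(map[string]string, len(m.Screenshots))
	for _, s := range m.Screenshots {
		artifacts["screenshot_"+s.Viewport] = s.Path
	}
	return artifacts, nil
}

func main() {
	args := buildCaptureArgs("default", "playwright-0", "https://example.com",
		[]string{"1920x1080", "375x667"}, "/captures/test/", false)
	fmt.Println("kubectl", strings.Join(args, " "))

	artifacts, err := parseManifest([]byte(
		`{"screenshots":[{"viewport":"1920x1080","path":"/captures/test/1920x1080.png"}]}`))
	fmt.Println(artifacts, err)
}
```

In the real executor the argument slice would go through the existing `CommandExecutor` port, and the mocked port in the unit tests would return the manifest JSON.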
**Deliverables:**
- [ ] Playwright pod running in cluster
- [ ] Capture script takes screenshots successfully
- [ ] VerifySpec/VerifyResult domain types with tests
- [ ] VerifyExecutor can dispatch capture via kubectl exec
- [ ] Work queue can dispatch verify tasks (manual test via SQL insert)
**Foundation this enables:**
- Week 2 can build API layer knowing capture works
- Executor pattern established for AI evaluation later
---
## Week 2: API Layer + Manual E2E
**Goals:**
- Full API surface: POST /verify, GET /verify/{id}, GET /verifications
- Auth scopes configured
- Manual E2E working: API call → queue → capture → result
- Initial release candidate deployed to staging
**Tasks:**
### Day 1: Auth and Service Layer
1. **Add auth scopes** (`internal/auth/scopes.go`)
- `ScopeVerifyRead Scope = "verify:read"`
- `ScopeVerifyWrite Scope = "verify:write"`
- Add to `AllScopes` if needed
2. **Create verify service** (`internal/service/verify_service.go`)
- Follow `BuildService` pattern
- `StartVerify(ctx, projectID, spec)` → validate, enqueue task, return task ID
- `GetVerifyStatus(ctx, taskID)` → get task from work queue
- `ListVerifications(ctx, projectID, limit)` → list tasks by project
- Dependencies: WorkQueue port (existing)
3. **Unit tests** (`internal/service/verify_service_test.go`)
- Mock work queue
- Test enqueue, status, list
### Day 2-3: Handler Layer
4. **Create verify handler** (`internal/handlers/verify.go`)
- Follow `BuildsHandler` pattern exactly
- `Mount(r api.Router)` with scopes:
  - POST `/projects/{id}/verify` → ScopeVerifyWrite
  - GET `/projects/{id}/verifications` → ScopeVerifyRead
  - GET `/verify/{taskId}` → ScopeVerifyRead
- Use `api.DecodeJSON()`, `validate.New()`, response helpers
- Request struct: `VerifyRequest` matching VerifySpec
- Response structs: match existing patterns
5. **Wire DI** (`cmd/rdev-api/main.go`)
- Create VerifyExecutor in worker setup
- Create VerifyService
- Create VerifyHandler
- Mount routes
6. **Handler tests** (`internal/handlers/verify_test.go`)
- Test POST with valid/invalid specs
- Test auth scope enforcement
- Test GET status/list
### Day 4: SSE Events
7. **Add verify events** (`internal/worker/verify_executor.go`)
- Publish events via StreamPublisher:
  - `verify.started` - task claimed
  - `verify.capturing` - starting capture
  - `verify.captured` - capture complete with manifest
  - `verify.completed` / `verify.failed` - final status
- Event constants in verify_executor.go (follow BuildExecutor pattern)
### Day 5: Manual E2E + Deploy
8. **Manual E2E test sequence**
```bash
# 1. Start verification
curl -X POST $RDEV_API_URL/projects/myproject/verify \
  -H "X-API-Key: $RDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://myproject.threesix.ai", "viewports": ["1920x1080"]}'
# Response: {"task_id": "xxx"}
# 2. Poll for completion
curl $RDEV_API_URL/verify/xxx -H "X-API-Key: $RDEV_API_KEY"
# Response: screenshots in artifacts
```
9. **Build and deploy**
```bash
./scripts/release.sh v0.11.0 "feat: add visual verification (capture-only MVP)" --deploy
```
**Deliverables:**
- [ ] Auth scopes for verify:read/write
- [ ] VerifyService with enqueue/status/list
- [ ] VerifyHandler with 3 endpoints
- [ ] SSE events for verification progress
- [ ] Deployed to staging, manual E2E passing
**Foundation this enables:**
- Week 3 can add AI evaluation knowing API works
- Cookbook script can use standard api_call() pattern
---
## Week 3: AI Evaluation + Cookbook Test
**Goals:**
- AI evaluation path working (Claude reads screenshots, returns verdict)
- Cookbook E2E test script: `visual-verify-test.sh`
- Add to common.sh utilities
- Full E2E passing in CI
**Tasks:**
### Day 1-2: AI Evaluation Path
1. **Add evaluation to VerifyExecutor** (`internal/worker/verify_executor.go`)
- After successful capture, if `spec.Evaluate`:
  - Build evaluation prompt: "Compare these screenshots against the specification..."
  - Include spec.Prompt or read spec.SpecPath content
  - Call Claude Code via CodeAgentRegistry
  - Pass screenshots as attachments (file paths in pod)
  - Parse evaluation output for score (look for "Score: XX/100" pattern)
  - Set result.Evaluation, result.Score, result.Passed
2. **Evaluation prompt template** (hardcoded in executor for now)
```
Evaluate these screenshots against the following specification:
{spec.Prompt or contents of spec.SpecPath}
For each screenshot, assess:
1. Does the UI match the specification?
2. Are all required elements present?
3. Is the layout correct at this viewport?
End with: "Score: XX/100" and "PASSED" or "FAILED"
```
3. **Handle partial failures** (`internal/worker/verify_executor.go`)
- If capture succeeds but evaluation fails:
  - Set success=true (screenshots are still useful)
  - Leave evaluation=""
  - Log warning
4. **Unit tests for evaluation path**
- Mock CodeAgentRegistry
- Test evaluation output parsing
- Test partial failure handling
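
Score extraction is the most test-worthy piece of the evaluation path, so a sketch of the parsing may be useful. It looks for the "Score: XX/100" pattern and the PASSED/FAILED verdict requested by the prompt template. Two choices here are assumptions rather than decisions from this plan: the last match wins (the prompt asks the model to end with the score), and when no explicit verdict is present the result falls back to the threshold of 70 used by the cookbook assertion. `parseEvaluation` is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

var (
	scoreRe   = regexp.MustCompile(`Score:\s*(\d{1,3})\s*/\s*100`)
	verdictRe = regexp.MustCompile(`\b(PASSED|FAILED)\b`)
)

// passThreshold matches the ">= 70" assertion in the cookbook flow.
const passThreshold = 70

// parseEvaluation extracts the score and verdict from the agent's output.
// It returns an error when no usable score is found, which the executor
// can treat as a partial failure (capture ok, evaluation missing).
func parseEvaluation(output string) (int, bool, error) {
	scores := scoreRe.FindAllStringSubmatch(output, -1)
	if len(scores) == 0 {
		return 0, false, errors.New("no score found in evaluation output")
	}
	raw := scores[len(scores)-1][1] // last match: the prompt asks to end with the score
	score, err := strconv.Atoi(raw)
	if err != nil || score > 100 {
		return 0, false, fmt.Errorf("invalid score %q", raw)
	}
	passed := score >= passThreshold // assumed fallback when no verdict is given
	if v := verdictRe.FindAllString(output, -1); len(v) > 0 {
		passed = v[len(v)-1] == "PASSED"
	}
	return score, passed, nil
}

func main() {
	score, passed, err := parseEvaluation("Hero section present.\nScore: 85/100\nPASSED")
	fmt.Println(score, passed, err)
}
```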
### Day 3-4: Cookbook Test Script
5. **Add utility to common.sh** (`cookbooks/scripts/common.sh`)
```bash
# Wait for verification to complete
# Arguments: task_id [max_attempts] [poll_interval]
wait_for_verify() {
  local task_id="$1"
  local max_attempts="${2:-30}"
  local poll_interval="${3:-5}"
  # Poll GET /verify/{task_id} until completed/failed
}
```
6. **Create visual-verify-test.sh** (`cookbooks/scripts/visual-verify-test.sh`)
- Follow cookbook script SKILL.md patterns exactly
- Commands: run, status, diagnose, teardown
- Flow:
  1. Create composable project with app-astro component
  2. Wait for initial deploy (site is live)
  3. Start build: "Create a hero section with a call-to-action button"
  4. Wait for build to complete
  5. Wait for CI pipeline
  6. Wait for site to respond
  7. Start verification: `POST /projects/{id}/verify {url, evaluate: true, prompt: ...}`
  8. Wait for verify to complete
  9. Assert: result.passed == true OR result.score >= 70
  10. Teardown
7. **Add auto-teardown support**
- Parse `--auto-teardown` flag
- Register cleanup trap
- Set CLEANUP_PROJECT
### Day 5: Integration + CI
8. **Test locally**
```bash
./cookbooks/scripts/visual-verify-test.sh run vv-test --auto-teardown
```
9. **Add to CI** (if CI runs cookbook tests)
- Add visual-verify-test to test matrix
- Ensure playwright-0 pod is available in test environment
10. **Document in cookbook skill** (`.claude/skills/cookbook-scripts/SKILL.md`)
- Add `wait_for_verify()` to utilities list
- Add visual-verify-test.sh to examples
**Deliverables:**
- [ ] AI evaluation working with score extraction
- [ ] Partial failure handling (capture ok, eval fail)
- [ ] wait_for_verify() in common.sh
- [ ] visual-verify-test.sh passing end-to-end
- [ ] Documentation updated
**Foundation this enables:**
- Week 4 can add SDLC integration knowing full flow works
- Cookbook pattern established for future tests
---
## Week 4: SDLC Integration + Polish
**Goals:**
- Visual verification as optional SDLC gate between QA and merge
- Skeleton command: `/verify-feature`
- Build chaining: auto-verify after deploy
- Release v0.12.0 with full feature
**Tasks:**
### Day 1-2: SDLC Types and Rules
1. **Add artifact type** (`internal/sdlc/types.go`)
- `ArtifactVerification ArtifactType = "verification"`
- Add to `ValidArtifactTypes` slice
- Add case in `ArtifactFilename()` → returns `"verification.md"`
2. **Add action types** (`internal/sdlc/types.go`)
- `ActionVerifyFeature ActionType = "VERIFY_FEATURE"`
- `ActionFixVerificationIssues ActionType = "FIX_VERIFICATION_ISSUES"`
3. **Add classifier rules** (`internal/sdlc/rules_execution.go`)
- `needsVerificationRule()`:
  - Condition: Phase=QA, qa_results=passed, verification=nil or pending
  - Action: ActionVerifyFeature
  - NextCommand: "/verify-feature {slug}"
- `verificationFailedRule()`:
  - Condition: Phase=QA, verification=failed
  - Action: ActionFixVerificationIssues
  - NextCommand: "/fix-verification-issues {slug}"
- `verificationPassedRule()`:
  - Condition: Phase=QA, qa_results=passed, verification=passed
  - Action: ActionTransition to PhaseMerge
4. **Update rule ordering** (`internal/sdlc/rules.go`)
- Insert verification rules after qaPassedRule
- Update qaPassedRule: only transition if verification also passed OR feature doesn't require verification (config flag)
5. **Unit tests** (`internal/sdlc/rules_execution_test.go`)
- Test all three verification rules
- Test interaction with existing QA rules
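
The three rules can be read as one decision table. The sketch below folds them into a single pure function to make the conditions easy to test; the real classifier presumably registers them as separate rules in `rules.go`. The `Feature` and `Decision` types, the status constants, and the `"TRANSITION"` value are hypothetical; only the two new action names and the next-command strings come from this plan.

```go
package main

import "fmt"

type (
	Phase      string
	Status     string
	ActionType string
)

const (
	PhaseQA Phase = "qa" // value assumed

	StatusNone    Status = "" // verification artifact does not exist yet
	StatusPending Status = "pending"
	StatusPassed  Status = "passed"
	StatusFailed  Status = "failed"

	ActionVerifyFeature         ActionType = "VERIFY_FEATURE"
	ActionFixVerificationIssues ActionType = "FIX_VERIFICATION_ISSUES"
	ActionTransition            ActionType = "TRANSITION" // value assumed
)

// Feature is a hypothetical, reduced view of the SDLC feature state.
type Feature struct {
	Slug         string
	Phase        Phase
	QAResults    Status
	Verification Status
}

// Decision is a hypothetical classifier result.
type Decision struct {
	Action      ActionType
	NextCommand string
}

// classifyVerification applies the three verification rules.
// The boolean reports whether any rule matched.
func classifyVerification(f Feature) (Decision, bool) {
	if f.Phase != PhaseQA {
		return Decision{}, false
	}
	switch {
	case f.Verification == StatusFailed:
		return Decision{Action: ActionFixVerificationIssues, NextCommand: "/fix-verification-issues " + f.Slug}, true
	case f.QAResults == StatusPassed && f.Verification == StatusPassed:
		return Decision{Action: ActionTransition}, true // transition to the merge phase
	case f.QAResults == StatusPassed && (f.Verification == StatusNone || f.Verification == StatusPending):
		return Decision{Action: ActionVerifyFeature, NextCommand: "/verify-feature " + f.Slug}, true
	}
	return Decision{}, false
}

func main() {
	d, ok := classifyVerification(Feature{Slug: "hero", Phase: PhaseQA, QAResults: StatusPassed})
	fmt.Println(d.Action, d.NextCommand, ok)
}
```

A table-driven test over `Feature` values covers all three rules plus the no-match cases (wrong phase, QA not yet passed).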
### Day 3: Skeleton Command
6. **Create verify-feature command** (embedded template: `templates/skeleton/.claude/commands/verify-feature.md`)
````markdown
---
description: Visually verify a deployed feature
argument-hint: <feature-slug>
allowed-tools: Bash, Read, Write, Edit, Glob, Grep
---
Visually verify feature: $ARGUMENTS
## Instructions
1. Load feature spec from `.sdlc/features/$ARGUMENTS/spec.md`
2. Get project domain from CLAUDE.md or config
3. Determine the deployed URL
4. Execute verification via rdev API (if available) or Playwright directly
5. Write results to `.sdlc/features/$ARGUMENTS/verification.md`
6. Register artifact: `sdlc artifact create $ARGUMENTS verification`
## Output Format
Write `.sdlc/features/$ARGUMENTS/verification.md`:
```markdown
# Visual Verification: [Feature Title]
## Screenshots
| Viewport | Status | Notes |
|----------|--------|-------|
| Desktop (1920x1080) | PASS | All elements visible |
| Mobile (375x667) | PASS | Responsive layout correct |
## Evaluation
[AI or manual evaluation notes]
## Result
**Status:** PASSED
**Score:** 95/100
```
````
7. **Update skeleton template** to include the command
- Ensure new projects get verify-feature.md
### Day 4: Build Chaining (Optional)
8. **Add verify_after to BuildSpec** (`internal/domain/build.go`)
- `VerifyAfter bool` - auto-verify after successful deploy
- `VerifyURL string` - URL to verify (if different from project domain)
9. **Chain verification in BuildExecutor** (`internal/worker/build_executor.go`)
- After successful build + push (line ~270):
```go
if spec.VerifyAfter {
	// VerifyURL is optional: when empty, verify the project domain instead.
	// Enqueue verify task
}
```
- Or: callback webhook triggers external verification
10. **Update build handler** to accept verify_after/verify_url
### Day 5: Documentation + Release
11. **Update documentation**
- CLAUDE.md: Update platform status to "Done"
- visual-verification.md: Add SDLC integration examples
- sdlc.md: Document verification rules
12. **Integration test**
- Test full SDLC flow with verification gate
- Test classifier transitions correctly
13. **Final release**
```bash
./scripts/release.sh v0.12.0 "feat: visual verification with SDLC integration" --deploy
```
**Deliverables:**
- [ ] ArtifactVerification type in SDLC
- [ ] 3 classifier rules for verification gate
- [ ] verify-feature.md skeleton command
- [ ] Build chaining (verify_after flag)
- [ ] Full integration test passing
- [ ] v0.12.0 released
---
## Summary
| Week | Theme | Key Output |
|------|-------|------------|
| 1 | Foundation | Playwright pod + capture script + domain types + executor |
| 2 | API Layer | Handlers + service + auth scopes + manual E2E |
| 3 | AI + Cookbook | Evaluation path + visual-verify-test.sh + common.sh utils |
| 4 | SDLC + Polish | Classifier rules + skeleton command + build chaining + release |
## Risks and Mitigations
| Risk | Impact | Mitigation |
|------|--------|------------|
| Playwright pod OOM | Capture fails | Start with conservative limits (4Gi), tune based on usage |
| AI evaluation unreliable | Poor pass/fail decisions | Start with a pass threshold of 70 and tune it; keep partial-success mode (capture succeeds even if evaluation fails) |
| Screenshot storage fills up | Pod crashes | Use emptyDir for now; add a cleanup job or PVC later |
| SDLC rules conflict | Features stuck | Test extensively, make verification optional via config |
| Claude Code can't read screenshots | Evaluation broken | Test multimodal support; fallback to manual verification |
## Files Created/Modified
**New Files (13):**
- `internal/domain/verify.go`
- `internal/domain/verify_test.go`
- `internal/service/verify_service.go`
- `internal/service/verify_service_test.go`
- `internal/handlers/verify.go`
- `internal/handlers/verify_test.go`
- `internal/worker/verify_executor.go`
- `internal/worker/verify_executor_test.go`
- `deployments/k8s/base/playwright-pod.yaml`
- `deployments/k8s/base/playwright-configmap.yaml`
- `deployments/k8s/base/playwright-scripts/capture.js`
- `cookbooks/scripts/visual-verify-test.sh`
- `templates/skeleton/.claude/commands/verify-feature.md`
**Modified Files (8):**
- `internal/domain/work.go` - Add WorkTaskTypeVerify
- `internal/auth/scopes.go` - Add verify scopes
- `internal/worker/work_executor.go` - Add dispatch case
- `internal/sdlc/types.go` - Add artifact/action types
- `internal/sdlc/rules.go` - Register verification rules
- `internal/sdlc/rules_execution.go` - Add verification rules
- `cookbooks/scripts/common.sh` - Add wait_for_verify()
- `cmd/rdev-api/main.go` - Wire DI