# Visual Verification Implementation Breakdown

**Goal:** Add Playwright-based visual verification to rdev, enabling automated screenshot/video capture of deployed sites and AI-driven feature completeness evaluation. Integrate with SDLC as an optional QA gate and add a cookbook E2E test.

**Estimated Duration:** 4 weeks (assumes ~25 hours/week of focused work)

---

## Week 1: Foundation - Domain + Capture Infrastructure

**Goals:**
- Playwright pod deployed and reachable via kubectl exec
- Capture script working end-to-end
- Domain models and work task type in place
- Manual verification via kubectl exec confirms capture works

**Tasks:**

### Day 1-2: Playwright Pod Infrastructure

1. **Create Playwright pod manifest** (`deployments/k8s/base/playwright-pod.yaml`)
   - StatefulSet with `mcr.microsoft.com/playwright:v1.50.0-noble` image
   - `sleep infinity` command (stays alive for kubectl exec)
   - Labels: `app: playwright`, `rdev.orchard9.ai/role: playwright`
   - Volumes: `/captures` (emptyDir), `/scripts` (ConfigMap)
   - Resources: 500m CPU / 1Gi request, 2 CPU / 4Gi limit

2. **Create capture script** (`deployments/k8s/base/playwright-scripts/capture.js`)
   - ~60 lines Node.js using Playwright
   - CLI: `--url`, `--viewports` (comma-sep), `--output`, `--wait-for`, `--full-page`, `--video`, `--timeout`
   - Output: JSON manifest to stdout with screenshot paths
   - Error handling: catch navigation failures, timeout gracefully

3. **Create ConfigMap for script** (`deployments/k8s/base/playwright-configmap.yaml`)
   - Mount `capture.js` at `/scripts/capture.js`

4. **Deploy to cluster and test manually**
   ```bash
   kubectl apply -f deployments/k8s/base/playwright-configmap.yaml
   kubectl apply -f deployments/k8s/base/playwright-pod.yaml
   kubectl exec playwright-0 -- node /scripts/capture.js \
     --url=https://example.com --viewports=1920x1080 --output=/captures/test/
   kubectl exec playwright-0 -- cat /captures/test/manifest.json
   ```

### Day 3: Domain Models

5. **Create domain types** (`internal/domain/verify.go`)
   - `VerifySpec` struct with fields: URL, Viewports, WaitFor, WaitTimeout, FullPage, Video, Evaluate, Prompt, SpecPath, CallbackURL
   - `Validate()` method: URL required, callback URL validation (reuse `ValidateCallbackURL`)
   - `VerifyResult` struct: Success, Screenshots, Video, Evaluation, Score, Passed, DurationMs, Error
   - `ToWorkResult()` method (promote screenshots to artifacts map)

6. **Add work task type** (`internal/domain/work.go`)
   - Add `WorkTaskTypeVerify WorkTaskType = "verify"` to constants
   - Update `IsValid()` to include verify

7. **Unit tests** (`internal/domain/verify_test.go`)
   - Test Validate() with valid/invalid specs
   - Test ToWorkResult() conversion

### Day 4-5: Verify Executor (Capture Only)

8. **Create verify executor** (`internal/worker/verify_executor.go`)
   - Follow `BuildExecutor` pattern exactly
   - `Execute(ctx, task)` method:
     - Parse VerifySpec from task.Spec map
     - Build kubectl exec command: `kubectl exec playwright-0 -- node /scripts/capture.js --url=X ...`
     - Execute via existing `CommandExecutor` port
     - Parse JSON manifest from stdout
     - Return `BuildResult` with artifacts map containing screenshot paths
   - Config struct: `VerifyExecutorConfig` with playwright pod name, namespace
   - Constructor: `NewVerifyExecutor(executor, streams, logger, cfg)`

9. **Wire executor to WorkExecutor** (`internal/worker/work_executor.go`)
   - Add `verifyExec *VerifyExecutor` field
   - Add case in `executeTask()` switch for `WorkTaskTypeVerify`
   - Update `NewWorkExecutor()` to accept VerifyExecutor

10. **Unit tests** (`internal/worker/verify_executor_test.go`)
    - Mock CommandExecutor to return capture manifest JSON
    - Test successful capture with multiple viewports
    - Test failure handling (command fails, invalid JSON)

**Deliverables:**
- [ ] Playwright pod running in cluster
- [ ] Capture script takes screenshots successfully
- [ ] VerifySpec/VerifyResult domain types with tests
- [ ] VerifyExecutor can dispatch capture via kubectl exec
- [ ] Work queue can dispatch verify tasks (manual test via SQL insert)

**Foundation this enables:**
- Week 2 can build API layer knowing capture works
- Executor pattern established for AI evaluation later

---

## Week 2: API Layer + Manual E2E

**Goals:**
- Full API surface: POST /verify, GET /verify/{id}, GET /verifications
- Auth scopes configured
- Manual E2E working: API call → queue → capture → result
- Initial release candidate deployed to staging

**Tasks:**

### Day 1: Auth and Service Layer

1. **Add auth scopes** (`internal/auth/scopes.go`)
   - `ScopeVerifyRead Scope = "verify:read"`
   - `ScopeVerifyWrite Scope = "verify:write"`
   - Add to `AllScopes` if needed

2. **Create verify service** (`internal/service/verify_service.go`)
   - Follow `BuildService` pattern
   - `StartVerify(ctx, projectID, spec)` → validate, enqueue task, return task ID
   - `GetVerifyStatus(ctx, taskID)` → get task from work queue
   - `ListVerifications(ctx, projectID, limit)` → list tasks by project
   - Dependencies: WorkQueue port (existing)

3. **Unit tests** (`internal/service/verify_service_test.go`)
   - Mock work queue
   - Test enqueue, status, list

### Day 2-3: Handler Layer

4. **Create verify handler** (`internal/handlers/verify.go`)
   - Follow `BuildsHandler` pattern exactly
   - `Mount(r api.Router)` with scopes:
     - POST `/projects/{id}/verify` → ScopeVerifyWrite
     - GET `/projects/{id}/verifications` → ScopeVerifyRead
     - GET `/verify/{taskId}` → ScopeVerifyRead
   - Use `api.DecodeJSON()`, `validate.New()`, response helpers
   - Request struct: `VerifyRequest` matching VerifySpec
   - Response structs: match existing patterns

5. **Wire DI** (`cmd/rdev-api/main.go`)
   - Create VerifyExecutor in worker setup
   - Create VerifyService
   - Create VerifyHandler
   - Mount routes

6. **Handler tests** (`internal/handlers/verify_test.go`)
   - Test POST with valid/invalid specs
   - Test auth scope enforcement
   - Test GET status/list

### Day 4: SSE Events

7. **Add verify events** (`internal/worker/verify_executor.go`)
   - Publish events via StreamPublisher:
     - `verify.started` - task claimed
     - `verify.capturing` - starting capture
     - `verify.captured` - capture complete with manifest
     - `verify.completed` / `verify.failed` - final status
   - Event constants in verify_executor.go (follow BuildExecutor pattern)

### Day 5: Manual E2E + Deploy

8. **Manual E2E test sequence**
   ```bash
   # 1. Start verification
   curl -X POST $RDEV_API_URL/projects/myproject/verify \
     -H "X-API-Key: $RDEV_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"url": "https://myproject.threesix.ai", "viewports": ["1920x1080"]}'
   # Response: {"task_id": "xxx"}

   # 2. Poll for completion
   curl $RDEV_API_URL/verify/xxx -H "X-API-Key: $RDEV_API_KEY"
   # Response: screenshots in artifacts
   ```

9. **Build and deploy**
   ```bash
   ./scripts/release.sh v0.11.0 "feat: add visual verification (capture-only MVP)" --deploy
   ```

**Deliverables:**
- [ ] Auth scopes for verify:read/write
- [ ] VerifyService with enqueue/status/list
- [ ] VerifyHandler with 3 endpoints
- [ ] SSE events for verification progress
- [ ] Deployed to staging, manual E2E passing

**Foundation this enables:**
- Week 3 can add AI evaluation knowing API works
- Cookbook script can use standard api_call() pattern

---

## Week 3: AI Evaluation + Cookbook Test

**Goals:**
- AI evaluation path working (Claude reads screenshots, returns verdict)
- Cookbook E2E test script: `visual-verify-test.sh`
- Add to common.sh utilities
- Full E2E passing in CI

**Tasks:**

### Day 1-2: AI Evaluation Path

1. **Add evaluation to VerifyExecutor** (`internal/worker/verify_executor.go`)
   - After successful capture, if `spec.Evaluate`:
     - Build evaluation prompt: "Compare these screenshots against the specification..."
     - Include spec.Prompt or read spec.SpecPath content
     - Call Claude Code via CodeAgentRegistry
     - Pass screenshots as attachments (file paths in pod)
     - Parse evaluation output for score (look for "Score: XX/100" pattern)
     - Set result.Evaluation, result.Score, result.Passed

2. **Evaluation prompt template** (hardcoded in executor for now)
   ```
   Evaluate these screenshots against the following specification:

   {spec.Prompt or contents of spec.SpecPath}

   For each screenshot, assess:
   1. Does the UI match the specification?
   2. Are all required elements present?
   3. Is the layout correct at this viewport?

   End with: "Score: XX/100" and "PASSED" or "FAILED"
   ```

3. **Handle partial failures** (`internal/worker/verify_executor.go`)
   - If capture succeeds but evaluation fails:
     - Set success=true (screenshots are still useful)
     - Leave evaluation=""
     - Log warning

4. **Unit tests for evaluation path**
   - Mock CodeAgentRegistry
   - Test evaluation output parsing
   - Test partial failure handling

### Day 3-4: Cookbook Test Script

5. **Add utility to common.sh** (`cookbooks/scripts/common.sh`)
   ```bash
   # Wait for verification to complete
   # Arguments: task_id [max_attempts] [poll_interval]
   wait_for_verify() {
       local task_id="$1"
       local max_attempts="${2:-30}"
       local poll_interval="${3:-5}"
       # Poll GET /verify/{task_id} until completed/failed
   }
   ```

6. **Create visual-verify-test.sh** (`cookbooks/scripts/visual-verify-test.sh`)
   - Follow cookbook script SKILL.md patterns exactly
   - Commands: run, status, diagnose, teardown
   - Flow:
     1. Create composable project with app-astro component
     2. Wait for initial deploy (site is live)
     3. Start build: "Create a hero section with a call-to-action button"
     4. Wait for build to complete
     5. Wait for CI pipeline
     6. Wait for site to respond
     7. Start verification: `POST /projects/{id}/verify {url, evaluate: true, prompt: ...}`
     8. Wait for verify to complete
     9. Assert: result.passed == true OR result.score >= 70
     10. Teardown

7. **Add auto-teardown support**
   - Parse `--auto-teardown` flag
   - Register cleanup trap
   - Set CLEANUP_PROJECT

### Day 5: Integration + CI

8. **Test locally**
   ```bash
   ./cookbooks/scripts/visual-verify-test.sh run vv-test --auto-teardown
   ```

9. **Add to CI** (if CI runs cookbook tests)
   - Add visual-verify-test to test matrix
   - Ensure playwright-0 pod is available in test environment

10. **Document in cookbook skill** (`.claude/skills/cookbook-scripts/SKILL.md`)
    - Add `wait_for_verify()` to utilities list
    - Add visual-verify-test.sh to examples

**Deliverables:**
- [ ] AI evaluation working with score extraction
- [ ] Partial failure handling (capture ok, eval fail)
- [ ] wait_for_verify() in common.sh
- [ ] visual-verify-test.sh passing end-to-end
- [ ] Documentation updated

**Foundation this enables:**
- Week 4 can add SDLC integration knowing full flow works
- Cookbook pattern established for future tests

---

## Week 4: SDLC Integration + Polish

**Goals:**
- Visual verification as optional SDLC gate between QA and merge
- Skeleton command: `/verify-feature`
- Build chaining: auto-verify after deploy
- Release v0.12.0 with full feature

**Tasks:**

### Day 1-2: SDLC Types and Rules

1. **Add artifact type** (`internal/sdlc/types.go`)
   - `ArtifactVerification ArtifactType = "verification"`
   - Add to `ValidArtifactTypes` slice
   - Add case in `ArtifactFilename()` → returns `"verification.md"`

2. **Add action types** (`internal/sdlc/types.go`)
   - `ActionVerifyFeature ActionType = "VERIFY_FEATURE"`
   - `ActionFixVerificationIssues ActionType = "FIX_VERIFICATION_ISSUES"`

3. **Add classifier rules** (`internal/sdlc/rules_execution.go`)
   - `needsVerificationRule()`:
     - Condition: Phase=QA, qa_results=passed, verification=nil or pending
     - Action: ActionVerifyFeature
     - NextCommand: "/verify-feature {slug}"
   - `verificationFailedRule()`:
     - Condition: Phase=QA, verification=failed
     - Action: ActionFixVerificationIssues
     - NextCommand: "/fix-verification-issues {slug}"
   - `verificationPassedRule()`:
     - Condition: Phase=QA, qa_results=passed, verification=passed
     - Action: ActionTransition to PhaseMerge

4. **Update rule ordering** (`internal/sdlc/rules.go`)
   - Insert verification rules after qaPassedRule
   - Update qaPassedRule: only transition if verification also passed OR feature doesn't require verification (config flag)

5. **Unit tests** (`internal/sdlc/rules_execution_test.go`)
   - Test all three verification rules
   - Test interaction with existing QA rules

### Day 3: Skeleton Command

6. **Create verify-feature command** (embedded template: `templates/skeleton/.claude/commands/verify-feature.md`)
   ````markdown
   ---
   description: Visually verify a deployed feature
   argument-hint:
   allowed-tools: Bash, Read, Write, Edit, Glob, Grep
   ---

   Visually verify feature: $ARGUMENTS

   ## Instructions

   1. Load feature spec from `.sdlc/features/$ARGUMENTS/spec.md`
   2. Get project domain from CLAUDE.md or config
   3. Determine the deployed URL
   4. Execute verification via rdev API (if available) or Playwright directly
   5. Write results to `.sdlc/features/$ARGUMENTS/verification.md`
   6. Register artifact: `sdlc artifact create $ARGUMENTS verification`

   ## Output Format

   Write `.sdlc/features/$ARGUMENTS/verification.md`:

   ```markdown
   # Visual Verification: [Feature Title]

   ## Screenshots

   | Viewport | Status | Notes |
   |----------|--------|-------|
   | Desktop (1920x1080) | PASS | All elements visible |
   | Mobile (375x667) | PASS | Responsive layout correct |

   ## Evaluation

   [AI or manual evaluation notes]

   ## Result

   **Status:** PASSED
   **Score:** 95/100
   ```
   ````

7. **Update skeleton template** to include the command
   - Ensure new projects get verify-feature.md

### Day 4: Build Chaining (Optional)

8. **Add verify_after to BuildSpec** (`internal/domain/build.go`)
   - `VerifyAfter bool` - auto-verify after successful deploy
   - `VerifyURL string` - URL to verify (if different from project domain)

9. **Chain verification in BuildExecutor** (`internal/worker/build_executor.go`)
   - After successful build + push (line ~270):
     ```go
     if spec.VerifyAfter && spec.VerifyURL != "" {
         // Enqueue verify task
     }
     ```
   - Or: callback webhook triggers external verification

10. **Update build handler** to accept verify_after/verify_url

### Day 5: Documentation + Release

11. **Update documentation**
    - CLAUDE.md: Update platform status to "Done"
    - visual-verification.md: Add SDLC integration examples
    - sdlc.md: Document verification rules

12. **Integration test**
    - Test full SDLC flow with verification gate
    - Test classifier transitions correctly

13. **Final release**
    ```bash
    ./scripts/release.sh v0.12.0 "feat: visual verification with SDLC integration" --deploy
    ```

**Deliverables:**
- [ ] ArtifactVerification type in SDLC
- [ ] 3 classifier rules for verification gate
- [ ] verify-feature.md skeleton command
- [ ] Build chaining (verify_after flag)
- [ ] Full integration test passing
- [ ] v0.12.0 released

---

## Summary

| Week | Theme | Key Output |
|------|-------|------------|
| 1 | Foundation | Playwright pod + capture script + domain types + executor |
| 2 | API Layer | Handlers + service + auth scopes + manual E2E |
| 3 | AI + Cookbook | Evaluation path + visual-verify-test.sh + common.sh utils |
| 4 | SDLC + Polish | Classifier rules + skeleton command + build chaining + release |

## Risks and Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Playwright pod OOM | Capture fails | Start with conservative limits (4Gi), tune based on usage |
| AI evaluation unreliable | Poor pass/fail decisions | Start with high threshold (70), tune; partial success mode |
| Screenshot storage fills up | Pod crashes | EmptyDir for now, add cleanup job or PVC later |
| SDLC rules conflict | Features stuck | Test extensively, make verification optional via config |
| Claude Code can't read screenshots | Evaluation broken | Test multimodal support; fallback to manual verification |

## Files Created/Modified

**New Files (13):**
- `internal/domain/verify.go`
- `internal/domain/verify_test.go`
- `internal/service/verify_service.go`
- `internal/service/verify_service_test.go`
- `internal/handlers/verify.go`
- `internal/handlers/verify_test.go`
- `internal/worker/verify_executor.go`
- `internal/worker/verify_executor_test.go`
- `deployments/k8s/base/playwright-pod.yaml`
- `deployments/k8s/base/playwright-configmap.yaml`
- `deployments/k8s/base/playwright-scripts/capture.js`
- `cookbooks/scripts/visual-verify-test.sh`
- `templates/skeleton/.claude/commands/verify-feature.md`

**Modified Files (8):**
- `internal/domain/work.go` - Add WorkTaskTypeVerify
- `internal/auth/scopes.go` - Add verify scopes
- `internal/worker/work_executor.go` - Add dispatch case
- `internal/sdlc/types.go` - Add artifact/action types
- `internal/sdlc/rules.go` - Register verification rules
- `internal/sdlc/rules_execution.go` - Add verification rules
- `cookbooks/scripts/common.sh` - Add wait_for_verify()
- `cmd/rdev-api/main.go` - Wire DI
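
## Sketch: Evaluation Score Parsing

The Week 3 step "Parse evaluation output for score (look for "Score: XX/100" pattern)" could be sketched in Go roughly as follows. This is a minimal illustration, not the executor's actual code: the `parseEvaluation` function, the `evalVerdict` type, and the fallback to the 70-point threshold when no explicit PASSED/FAILED verdict appears are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// evalVerdict is a hypothetical holder for the parsed fields; the plan's
// VerifyResult carries the equivalent Score and Passed fields.
type evalVerdict struct {
	Score  int
	Passed bool
}

// scorePattern matches the "Score: XX/100" line the prompt template asks for.
var scorePattern = regexp.MustCompile(`Score:\s*(\d{1,3})/100`)

// passThreshold mirrors the 70-point cutoff used in the cookbook assertion.
const passThreshold = 70

// parseEvaluation extracts the score and verdict from raw evaluation text.
// It returns an error when no valid score line is present, so a caller can
// keep the capture result and treat only the evaluation as failed.
func parseEvaluation(output string) (evalVerdict, error) {
	m := scorePattern.FindStringSubmatch(output)
	if m == nil {
		return evalVerdict{}, fmt.Errorf("no \"Score: XX/100\" line found")
	}
	score, err := strconv.Atoi(m[1])
	if err != nil || score > 100 {
		return evalVerdict{}, fmt.Errorf("invalid score %q", m[1])
	}
	v := evalVerdict{Score: score}
	switch {
	case strings.Contains(output, "FAILED"):
		v.Passed = false
	case strings.Contains(output, "PASSED"):
		v.Passed = true
	default:
		// Assumption: with no explicit verdict, fall back to the threshold.
		v.Passed = score >= passThreshold
	}
	return v, nil
}

func main() {
	v, err := parseEvaluation("Layout matches the spec.\nScore: 85/100\nPASSED")
	fmt.Println(v.Score, v.Passed, err) // prints: 85 true <nil>
}
```

Returning an error rather than a zero score keeps the partial-failure path explicit: the executor can log a warning and leave the evaluation empty while still reporting the screenshots.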