rdev/docs/plans/visual-verification-breakdown.md

# Visual Verification Implementation Breakdown

**Goal:** Add Playwright-based visual verification to rdev, enabling automated screenshot/video capture of deployed sites and AI-driven feature completeness evaluation. Integrate with SDLC as an optional QA gate and add a cookbook E2E test.

**Estimated Duration:** 4 weeks (assumes ~25 hours/week of focused work)

---

## Week 1: Foundation — Domain + Capture Infrastructure

**Goals:**
- Playwright pod deployed and reachable via kubectl exec
- Capture script working end-to-end
- Domain models and work task type in place
- Manual verification via kubectl exec confirms capture works

**Tasks:**

### Day 1-2: Playwright Pod Infrastructure

1. **Create Playwright pod manifest** (`deployments/k8s/base/playwright-pod.yaml`)
   - StatefulSet with `mcr.microsoft.com/playwright:v1.50.0-noble` image
   - `sleep infinity` command (stays alive for kubectl exec)
   - Labels: `app: playwright`, `rdev.orchard9.ai/role: playwright`
   - Volumes: `/captures` (emptyDir), `/scripts` (ConfigMap)
   - Resources: requests of 500m CPU / 1Gi memory; limits of 2 CPU / 4Gi memory

2. **Create capture script** (`deployments/k8s/base/playwright-scripts/capture.js`)
   - ~60 lines of Node.js using Playwright
   - CLI: `--url`, `--viewports` (comma-separated), `--output`, `--wait-for`, `--full-page`, `--video`, `--timeout`
   - Output: JSON manifest to stdout with screenshot paths
   - Error handling: catch navigation failures, time out gracefully

3. **Create ConfigMap for script** (`deployments/k8s/base/playwright-configmap.yaml`)
   - Mount `capture.js` at `/scripts/capture.js`

4. **Deploy to cluster and test manually**
   ```bash
   kubectl apply -f deployments/k8s/base/playwright-configmap.yaml
   kubectl apply -f deployments/k8s/base/playwright-pod.yaml
   kubectl exec playwright-0 -- node /scripts/capture.js \
     --url=https://example.com --viewports=1920x1080 --output=/captures/test/
   # The manifest is printed to stdout by capture.js; list the output directory to confirm the screenshots exist
   kubectl exec playwright-0 -- ls -l /captures/test/
   ```

### Day 3: Domain Models

5. **Create domain types** (`internal/domain/verify.go`)
   - `VerifySpec` struct with fields: URL, Viewports, WaitFor, WaitTimeout, FullPage, Video, Evaluate, Prompt, SpecPath, CallbackURL
   - `Validate()` method: URL required, callback URL validation (reuse `ValidateCallbackURL`)
   - `VerifyResult` struct: Success, Screenshots, Video, Evaluation, Score, Passed, DurationMs, Error
   - `ToWorkResult()` method (promote screenshots to artifacts map)

6. **Add work task type** (`internal/domain/work.go`)
   - Add `WorkTaskTypeVerify WorkTaskType = "verify"` to constants
   - Update `IsValid()` to include verify

7. **Unit tests** (`internal/domain/verify_test.go`)
   - Test Validate() with valid/invalid specs
   - Test ToWorkResult() conversion
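
The domain types in items 5-7 could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the field types, JSON tags, the `WorkResult` shape, the `screenshot_<viewport>` artifact key, and the `validateCallbackURL` stand-in are all hypothetical, since the existing `ValidateCallbackURL` helper and work result type are not shown in this plan.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// VerifySpec mirrors the fields listed in the plan. Field types and JSON tags
// are assumptions for illustration.
type VerifySpec struct {
	URL         string   `json:"url"`
	Viewports   []string `json:"viewports,omitempty"`
	WaitFor     string   `json:"wait_for,omitempty"`
	WaitTimeout int      `json:"wait_timeout,omitempty"` // unit is an assumption (milliseconds)
	FullPage    bool     `json:"full_page,omitempty"`
	Video       bool     `json:"video,omitempty"`
	Evaluate    bool     `json:"evaluate,omitempty"`
	Prompt      string   `json:"prompt,omitempty"`
	SpecPath    string   `json:"spec_path,omitempty"`
	CallbackURL string   `json:"callback_url,omitempty"`
}

// validateCallbackURL is a hypothetical stand-in for the existing
// ValidateCallbackURL helper; the real rules may be stricter.
func validateCallbackURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return errors.New("must be an absolute http(s) URL")
	}
	return nil
}

// Validate enforces the two rules named in the plan: URL is required, and a
// callback URL, when present, must pass callback validation.
func (s VerifySpec) Validate() error {
	if s.URL == "" {
		return errors.New("url is required")
	}
	if s.CallbackURL != "" {
		if err := validateCallbackURL(s.CallbackURL); err != nil {
			return fmt.Errorf("callback_url: %w", err)
		}
	}
	return nil
}

// VerifyResult carries the capture and evaluation outcome.
type VerifyResult struct {
	Success     bool
	Screenshots map[string]string // viewport -> path inside the pod (assumed shape)
	Video       string
	Evaluation  string
	Score       int
	Passed      bool
	DurationMs  int64
	Error       string
}

// WorkResult is a hypothetical stand-in for the existing work result type.
type WorkResult struct {
	Success   bool
	Artifacts map[string]string
	Error     string
}

// ToWorkResult promotes screenshots (and the video, if any) into the artifacts map.
func (r VerifyResult) ToWorkResult() WorkResult {
	artifacts := make(map[string]string, len(r.Screenshots)+1)
	for viewport, path := range r.Screenshots {
		artifacts["screenshot_"+viewport] = path
	}
	if r.Video != "" {
		artifacts["video"] = r.Video
	}
	return WorkResult{Success: r.Success, Artifacts: artifacts, Error: r.Error}
}

func main() {
	spec := VerifySpec{URL: "https://example.com", CallbackURL: "not-a-url"}
	fmt.Println("validate:", spec.Validate())

	res := VerifyResult{
		Success:     true,
		Screenshots: map[string]string{"1920x1080": "/captures/test/1920x1080.png"},
	}
	fmt.Println("artifacts:", res.ToWorkResult().Artifacts)
}
```

The unit tests in item 7 would then cover the same cases as `main`: an empty URL, a malformed callback URL, and the screenshot-to-artifact promotion.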

### Day 4-5: Verify Executor (Capture Only)

8. **Create verify executor** (`internal/worker/verify_executor.go`)
   - Follow `BuildExecutor` pattern exactly
   - `Execute(ctx, task)` method:
     - Parse VerifySpec from task.Spec map
     - Build kubectl exec command: `kubectl exec playwright-0 -- node /scripts/capture.js --url=X ...`
     - Execute via existing `CommandExecutor` port
     - Parse JSON manifest from stdout
     - Return `BuildResult` with artifacts map containing screenshot paths
   - Config struct: `VerifyExecutorConfig` with playwright pod name, namespace
   - Constructor: `NewVerifyExecutor(executor, streams, logger, cfg)`

9. **Wire executor to WorkExecutor** (`internal/worker/work_executor.go`)
   - Add `verifyExec *VerifyExecutor` field
   - Add case in `executeTask()` switch for `WorkTaskTypeVerify`
   - Update `NewWorkExecutor()` to accept VerifyExecutor

10. **Unit tests** (`internal/worker/verify_executor_test.go`)
    - Mock CommandExecutor to return capture manifest JSON
    - Test successful capture with multiple viewports
    - Test failure handling (command fails, invalid JSON)
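
The two pure steps of the executor (building the `kubectl exec` arguments and parsing the manifest) can be sketched in isolation, which is also what the unit tests in item 10 would exercise. This is an illustration only: the manifest schema, the `captureRequest` type, and the assumption that `--full-page` is a bare boolean flag are hypothetical, and the real `CommandExecutor` port may take its command in a different form.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// captureRequest holds the subset of VerifySpec that maps onto capture.js flags.
// It is a hypothetical helper type for this sketch.
type captureRequest struct {
	URL       string
	Viewports []string
	Output    string
	FullPage  bool
}

// captureManifest is an assumed shape for the JSON that capture.js prints;
// the real manifest schema is not specified in this plan.
type captureManifest struct {
	Screenshots []struct {
		Viewport string `json:"viewport"`
		Path     string `json:"path"`
	} `json:"screenshots"`
}

// buildCaptureArgs returns the argument vector for kubectl. Passing arguments
// as a slice (rather than one shell string) keeps a caller-supplied URL from
// being interpreted by a shell.
func buildCaptureArgs(namespace, pod string, req captureRequest) []string {
	args := []string{
		"exec", "-n", namespace, pod, "--",
		"node", "/scripts/capture.js",
		"--url=" + req.URL,
		"--viewports=" + strings.Join(req.Viewports, ","),
		"--output=" + req.Output,
	}
	if req.FullPage {
		args = append(args, "--full-page") // assumed to be a bare boolean flag
	}
	return args
}

// parseManifest decodes the capture script's stdout and rejects empty results.
func parseManifest(stdout []byte) (captureManifest, error) {
	var m captureManifest
	if err := json.Unmarshal(stdout, &m); err != nil {
		return m, fmt.Errorf("parse capture manifest: %w", err)
	}
	if len(m.Screenshots) == 0 {
		return m, fmt.Errorf("capture manifest contains no screenshots")
	}
	return m, nil
}

func main() {
	req := captureRequest{URL: "https://example.com", Viewports: []string{"1920x1080"}, Output: "/captures/test/"}
	fmt.Println("kubectl", strings.Join(buildCaptureArgs("default", "playwright-0", req), " "))

	m, err := parseManifest([]byte(`{"screenshots":[{"viewport":"1920x1080","path":"/captures/test/1920x1080.png"}]}`))
	fmt.Println(len(m.Screenshots), err)
}
```

Keeping these as plain functions means the mocked `CommandExecutor` only has to return canned stdout, and the invalid-JSON case needs no cluster at all.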

**Deliverables:**
- [ ] Playwright pod running in cluster
- [ ] Capture script takes screenshots successfully
- [ ] VerifySpec/VerifyResult domain types with tests
- [ ] VerifyExecutor can dispatch capture via kubectl exec
- [ ] Work queue can dispatch verify tasks (manual test via SQL insert)

**Foundation this enables:**
- Week 2 can build API layer knowing capture works
- Executor pattern established for AI evaluation later

---

## Week 2: API Layer + Manual E2E

**Goals:**
- Full API surface: POST /projects/{id}/verify, GET /verify/{taskId}, GET /projects/{id}/verifications
- Auth scopes configured
- Manual E2E working: API call → queue → capture → result
- Initial release candidate deployed to staging

**Tasks:**

### Day 1: Auth and Service Layer

1. **Add auth scopes** (`internal/auth/scopes.go`)
   - `ScopeVerifyRead Scope = "verify:read"`
   - `ScopeVerifyWrite Scope = "verify:write"`
   - Add to `AllScopes` if needed

2. **Create verify service** (`internal/service/verify_service.go`)
   - Follow `BuildService` pattern
   - `StartVerify(ctx, projectID, spec)` → validate, enqueue task, return task ID
   - `GetVerifyStatus(ctx, taskID)` → get task from work queue
   - `ListVerifications(ctx, projectID, limit)` → list tasks by project
   - Dependencies: WorkQueue port (existing)

3. **Unit tests** (`internal/service/verify_service_test.go`)
   - Mock work queue
   - Test enqueue, status, list
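
As a rough illustration of the service layer, the sketch below wires `StartVerify`, `GetVerifyStatus`, and `ListVerifications` to an in-memory queue. The `WorkQueue` interface, its method names, and the `WorkTask` fields are hypothetical; the existing port in rdev will differ, and the real service would take a `VerifySpec` rather than a raw map.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// WorkTask is a hypothetical, trimmed-down view of a queued task.
type WorkTask struct {
	ID, ProjectID, Type, Status string
	Spec                        map[string]any
}

// WorkQueue is an assumed slice of the existing port; method names are illustrative.
type WorkQueue interface {
	Enqueue(ctx context.Context, task WorkTask) (string, error)
	Get(ctx context.Context, id string) (WorkTask, error)
	ListByProject(ctx context.Context, projectID, taskType string, limit int) ([]WorkTask, error)
}

type VerifyService struct{ queue WorkQueue }

// StartVerify validates the spec, enqueues a verify task, and returns its ID.
func (s *VerifyService) StartVerify(ctx context.Context, projectID string, spec map[string]any) (string, error) {
	if u, _ := spec["url"].(string); u == "" {
		return "", errors.New("url is required")
	}
	return s.queue.Enqueue(ctx, WorkTask{ProjectID: projectID, Type: "verify", Status: "pending", Spec: spec})
}

// GetVerifyStatus fetches a task and rejects IDs that belong to other task types.
func (s *VerifyService) GetVerifyStatus(ctx context.Context, taskID string) (WorkTask, error) {
	t, err := s.queue.Get(ctx, taskID)
	if err != nil {
		return WorkTask{}, err
	}
	if t.Type != "verify" {
		return WorkTask{}, errors.New("task is not a verification")
	}
	return t, nil
}

// ListVerifications lists verify tasks for one project.
func (s *VerifyService) ListVerifications(ctx context.Context, projectID string, limit int) ([]WorkTask, error) {
	return s.queue.ListByProject(ctx, projectID, "verify", limit)
}

// memQueue is an in-memory fake, the kind of mock the unit tests would use.
type memQueue struct{ tasks []WorkTask }

func (q *memQueue) Enqueue(_ context.Context, t WorkTask) (string, error) {
	t.ID = fmt.Sprintf("task-%d", len(q.tasks)+1)
	q.tasks = append(q.tasks, t)
	return t.ID, nil
}

func (q *memQueue) Get(_ context.Context, id string) (WorkTask, error) {
	for _, t := range q.tasks {
		if t.ID == id {
			return t, nil
		}
	}
	return WorkTask{}, errors.New("task not found")
}

func (q *memQueue) ListByProject(_ context.Context, projectID, taskType string, limit int) ([]WorkTask, error) {
	var out []WorkTask
	for _, t := range q.tasks {
		if t.ProjectID == projectID && t.Type == taskType && len(out) < limit {
			out = append(out, t)
		}
	}
	return out, nil
}

func main() {
	svc := &VerifyService{queue: &memQueue{}}
	ctx := context.Background()
	id, err := svc.StartVerify(ctx, "myproject", map[string]any{"url": "https://example.com"})
	fmt.Println(id, err)
	task, err := svc.GetVerifyStatus(ctx, id)
	fmt.Println(task.Status, err)
}
```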

### Day 2-3: Handler Layer

4. **Create verify handler** (`internal/handlers/verify.go`)
   - Follow `BuildsHandler` pattern exactly
   - `Mount(r api.Router)` with scopes:
     - POST `/projects/{id}/verify` → ScopeVerifyWrite
     - GET `/projects/{id}/verifications` → ScopeVerifyRead
     - GET `/verify/{taskId}` → ScopeVerifyRead
   - Use `api.DecodeJSON()`, `validate.New()`, response helpers
   - Request struct: `VerifyRequest` matching VerifySpec
   - Response structs: match existing patterns

5. **Wire DI** (`cmd/rdev-api/main.go`)
   - Create VerifyExecutor in worker setup
   - Create VerifyService
   - Create VerifyHandler
   - Mount routes

6. **Handler tests** (`internal/handlers/verify_test.go`)
   - Test POST with valid/invalid specs
   - Test auth scope enforcement
   - Test GET status/list

### Day 4: SSE Events

7. **Add verify events** (`internal/worker/verify_executor.go`)
   - Publish events via StreamPublisher:
     - `verify.started` - task claimed
     - `verify.capturing` - starting capture
     - `verify.captured` - capture complete with manifest
     - `verify.completed` / `verify.failed` - final status
   - Event constants in verify_executor.go (follow BuildExecutor pattern)
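
One way to picture the event sequence is a small wrapper that publishes around the capture step. The event names come from the list above; the `StreamPublisher` interface, the `recorder` fake, and the `runVerify` helper are hypothetical and only approximate how `BuildExecutor` publishes today.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Event names taken from the plan.
const (
	EventVerifyStarted   = "verify.started"
	EventVerifyCapturing = "verify.capturing"
	EventVerifyCaptured  = "verify.captured"
	EventVerifyCompleted = "verify.completed"
	EventVerifyFailed    = "verify.failed"
)

// StreamPublisher is a hypothetical one-method view of the existing publisher port.
type StreamPublisher interface {
	Publish(ctx context.Context, taskID, event string, data map[string]any)
}

// recorder is a test fake that remembers the order of published events.
type recorder struct{ events []string }

func (r *recorder) Publish(_ context.Context, _ string, event string, _ map[string]any) {
	r.events = append(r.events, event)
}

// captureFunc stands in for the kubectl exec capture step.
type captureFunc func(ctx context.Context) (map[string]any, error)

// runVerify publishes the lifecycle events around a capture call and ends
// with exactly one terminal event (completed or failed).
func runVerify(ctx context.Context, pub StreamPublisher, taskID string, capture captureFunc) error {
	pub.Publish(ctx, taskID, EventVerifyStarted, nil)
	pub.Publish(ctx, taskID, EventVerifyCapturing, nil)
	manifest, err := capture(ctx)
	if err != nil {
		pub.Publish(ctx, taskID, EventVerifyFailed, map[string]any{"error": err.Error()})
		return err
	}
	pub.Publish(ctx, taskID, EventVerifyCaptured, manifest)
	pub.Publish(ctx, taskID, EventVerifyCompleted, nil)
	return nil
}

// okCapture and failCapture are canned capture steps for the demo.
func okCapture(context.Context) (map[string]any, error) {
	return map[string]any{"screenshots": 1}, nil
}

func failCapture(context.Context) (map[string]any, error) {
	return nil, errors.New("navigation timeout")
}

func main() {
	rec := &recorder{}
	err := runVerify(context.Background(), rec, "task-1", okCapture)
	fmt.Println(rec.events, err)

	rec = &recorder{}
	err = runVerify(context.Background(), rec, "task-2", failCapture)
	fmt.Println(rec.events, err)
}
```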

### Day 5: Manual E2E + Deploy

8. **Manual E2E test sequence**
   ```bash
   # 1. Start verification
   curl -X POST $RDEV_API_URL/projects/myproject/verify \
     -H "X-API-Key: $RDEV_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"url": "https://myproject.threesix.ai", "viewports": ["1920x1080"]}'
   # Response: {"task_id": "xxx"}

   # 2. Poll for completion
   curl $RDEV_API_URL/verify/xxx -H "X-API-Key: $RDEV_API_KEY"
   # Response: screenshots in artifacts
   ```

9. **Build and deploy**
   ```bash
   ./scripts/release.sh v0.11.0 "feat: add visual verification (capture-only MVP)" --deploy
   ```

**Deliverables:**
- [ ] Auth scopes for verify:read/write
- [ ] VerifyService with enqueue/status/list
- [ ] VerifyHandler with 3 endpoints
- [ ] SSE events for verification progress
- [ ] Deployed to staging, manual E2E passing

**Foundation this enables:**
- Week 3 can add AI evaluation knowing API works
- Cookbook script can use standard api_call() pattern

---

## Week 3: AI Evaluation + Cookbook Test

**Goals:**
- AI evaluation path working (Claude reads screenshots, returns verdict)
- Cookbook E2E test script: `visual-verify-test.sh`
- `wait_for_verify()` utility added to common.sh
- Full E2E passing in CI

**Tasks:**

### Day 1-2: AI Evaluation Path

1. **Add evaluation to VerifyExecutor** (`internal/worker/verify_executor.go`)
   - After successful capture, if `spec.Evaluate`:
     - Build evaluation prompt: "Compare these screenshots against the specification..."
     - Include spec.Prompt or read spec.SpecPath content
     - Call Claude Code via CodeAgentRegistry
     - Pass screenshots as attachments (file paths in pod)
     - Parse evaluation output for score (look for "Score: XX/100" pattern)
     - Set result.Evaluation, result.Score, result.Passed

2. **Evaluation prompt template** (hardcoded in executor for now)
   ```
   Evaluate these screenshots against the following specification:

   {spec.Prompt or contents of spec.SpecPath}

   For each screenshot, assess:
   1. Does the UI match the specification?
   2. Are all required elements present?
   3. Is the layout correct at this viewport?

   End with: "Score: XX/100" and "PASSED" or "FAILED"
   ```

3. **Handle partial failures** (`internal/worker/verify_executor.go`)
   - If capture succeeds but evaluation fails:
     - Set success=true (screenshots are still useful)
     - Leave evaluation=""
     - Log warning

4. **Unit tests for evaluation path**
   - Mock CodeAgentRegistry
   - Test evaluation output parsing
   - Test partial failure handling
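
The score extraction step is easy to get subtly wrong, so a small sketch may help. It assumes the model output follows the prompt template above and takes the last `Score: XX/100` and the last `PASSED`/`FAILED` token; the fallback to a numeric threshold when no verdict is present, and the `parseEvaluation` name itself, are assumptions for illustration rather than settled behavior.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

var (
	// Matches "Score: 85/100", tolerating extra whitespace.
	scoreRe = regexp.MustCompile(`Score:\s*(\d{1,3})\s*/\s*100`)
	// Matches the verdict keyword requested by the prompt template.
	verdictRe = regexp.MustCompile(`\b(PASSED|FAILED)\b`)
)

// parseEvaluation extracts the score and verdict from evaluation output.
// The last occurrence wins, because the template asks the model to END with
// the score and verdict. If no verdict is present, passed falls back to
// score >= threshold (an assumption for this sketch).
func parseEvaluation(output string, threshold int) (score int, passed bool, err error) {
	scores := scoreRe.FindAllStringSubmatch(output, -1)
	if len(scores) == 0 {
		return 0, false, errors.New(`no "Score: XX/100" line found`)
	}
	score, err = strconv.Atoi(scores[len(scores)-1][1])
	if err != nil {
		return 0, false, err
	}
	if score > 100 {
		return 0, false, fmt.Errorf("score %d is out of range", score)
	}
	verdicts := verdictRe.FindAllString(output, -1)
	if len(verdicts) == 0 {
		return score, score >= threshold, nil
	}
	return score, verdicts[len(verdicts)-1] == "PASSED", nil
}

func main() {
	score, passed, err := parseEvaluation("Hero section present.\nScore: 85/100\nPASSED", 70)
	fmt.Println(score, passed, err)

	_, _, err = parseEvaluation("The page did not load.", 70)
	fmt.Println(err)
}
```

An error from this function maps onto the partial-failure case in item 3: the capture result stays successful and the evaluation fields are left empty.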

### Day 3-4: Cookbook Test Script

5. **Add utility to common.sh** (`cookbooks/scripts/common.sh`)
   ```bash
   # Wait for verification to complete
   # Arguments: task_id [max_attempts] [poll_interval]
   wait_for_verify() {
       local task_id="$1"
       local max_attempts="${2:-30}"
       local poll_interval="${3:-5}"
       # Poll GET /verify/{task_id} until completed/failed
   }
   ```

6. **Create visual-verify-test.sh** (`cookbooks/scripts/visual-verify-test.sh`)
   - Follow cookbook script SKILL.md patterns exactly
   - Commands: run, status, diagnose, teardown
   - Flow:
     1. Create composable project with app-astro component
     2. Wait for initial deploy (site is live)
     3. Start build: "Create a hero section with a call-to-action button"
     4. Wait for build to complete
     5. Wait for CI pipeline
     6. Wait for site to respond
     7. Start verification: `POST /projects/{id}/verify {url, evaluate: true, prompt: ...}`
     8. Wait for verify to complete
     9. Assert: result.passed == true OR result.score >= 70
     10. Teardown

7. **Add auto-teardown support**
   - Parse `--auto-teardown` flag
   - Register cleanup trap
   - Set CLEANUP_PROJECT

### Day 5: Integration + CI

8. **Test locally**
   ```bash
   ./cookbooks/scripts/visual-verify-test.sh run vv-test --auto-teardown
   ```

9. **Add to CI** (if CI runs cookbook tests)
   - Add visual-verify-test to test matrix
   - Ensure playwright-0 pod is available in test environment

10. **Document in cookbook skill** (`.claude/skills/cookbook-scripts/SKILL.md`)
    - Add `wait_for_verify()` to utilities list
    - Add visual-verify-test.sh to examples

**Deliverables:**
- [ ] AI evaluation working with score extraction
- [ ] Partial failure handling (capture ok, eval fail)
- [ ] wait_for_verify() in common.sh
- [ ] visual-verify-test.sh passing end-to-end
- [ ] Documentation updated

**Foundation this enables:**
- Week 4 can add SDLC integration knowing full flow works
- Cookbook pattern established for future tests

---

## Week 4: SDLC Integration + Polish

**Goals:**
- Visual verification as optional SDLC gate between QA and merge
- Skeleton command: `/verify-feature`
- Build chaining: auto-verify after deploy
- Release v0.12.0 with full feature

**Tasks:**

### Day 1-2: SDLC Types and Rules

1. **Add artifact type** (`internal/sdlc/types.go`)
   - `ArtifactVerification ArtifactType = "verification"`
   - Add to `ValidArtifactTypes` slice
   - Add case in `ArtifactFilename()` → returns `"verification.md"`

2. **Add action types** (`internal/sdlc/types.go`)
   - `ActionVerifyFeature ActionType = "VERIFY_FEATURE"`
   - `ActionFixVerificationIssues ActionType = "FIX_VERIFICATION_ISSUES"`

3. **Add classifier rules** (`internal/sdlc/rules_execution.go`)
   - `needsVerificationRule()`:
     - Condition: Phase=QA, qa_results=passed, verification=nil or pending
     - Action: ActionVerifyFeature
     - NextCommand: "/verify-feature {slug}"
   - `verificationFailedRule()`:
     - Condition: Phase=QA, verification=failed
     - Action: ActionFixVerificationIssues
     - NextCommand: "/fix-verification-issues {slug}"
   - `verificationPassedRule()`:
     - Condition: Phase=QA, qa_results=passed, verification=passed
     - Action: ActionTransition to PhaseMerge

4. **Update rule ordering** (`internal/sdlc/rules.go`)
   - Insert verification rules after qaPassedRule
   - Update qaPassedRule: only transition if verification also passed OR feature doesn't require verification (config flag)

5. **Unit tests** (`internal/sdlc/rules_execution_test.go`)
   - Test all three verification rules
   - Test interaction with existing QA rules
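
To make the rule ordering concrete, here is a simplified first-match classifier covering the three verification rules plus the updated `qaPassedRule`. The action names `VERIFY_FEATURE` and `FIX_VERIFICATION_ISSUES` come from the plan; the `FeatureState` and `Decision` types, the `"TRANSITION"`, `"QA"`, and `"MERGE"` string values, and the `RequireVerification` flag name are hypothetical stand-ins for the real `internal/sdlc` types.

```go
package main

import "fmt"

type Status string

const (
	StatusNone    Status = "" // artifact not created yet
	StatusPending Status = "pending"
	StatusPassed  Status = "passed"
	StatusFailed  Status = "failed"
)

// FeatureState is a hypothetical, trimmed-down classifier input.
type FeatureState struct {
	Phase               string
	QAResults           Status
	Verification        Status
	RequireVerification bool // the config flag mentioned in the plan
}

// Decision is a hypothetical classifier output.
type Decision struct {
	Action, NextCommand, NextPhase string
}

type rule func(slug string, s FeatureState) (Decision, bool)

// qaPassedRule transitions straight to merge only when verification is not required.
func qaPassedRule(_ string, s FeatureState) (Decision, bool) {
	if s.Phase == "QA" && s.QAResults == StatusPassed && !s.RequireVerification {
		return Decision{Action: "TRANSITION", NextPhase: "MERGE"}, true
	}
	return Decision{}, false
}

func verificationFailedRule(slug string, s FeatureState) (Decision, bool) {
	if s.Phase == "QA" && s.Verification == StatusFailed {
		return Decision{Action: "FIX_VERIFICATION_ISSUES", NextCommand: "/fix-verification-issues " + slug}, true
	}
	return Decision{}, false
}

func needsVerificationRule(slug string, s FeatureState) (Decision, bool) {
	pending := s.Verification == StatusNone || s.Verification == StatusPending
	if s.Phase == "QA" && s.QAResults == StatusPassed && pending {
		return Decision{Action: "VERIFY_FEATURE", NextCommand: "/verify-feature " + slug}, true
	}
	return Decision{}, false
}

func verificationPassedRule(_ string, s FeatureState) (Decision, bool) {
	if s.Phase == "QA" && s.QAResults == StatusPassed && s.Verification == StatusPassed {
		return Decision{Action: "TRANSITION", NextPhase: "MERGE"}, true
	}
	return Decision{}, false
}

// Verification rules are registered after qaPassedRule, as the plan describes.
var rules = []rule{qaPassedRule, verificationFailedRule, needsVerificationRule, verificationPassedRule}

// classify returns the decision of the first rule that matches.
func classify(slug string, s FeatureState) (Decision, bool) {
	for _, r := range rules {
		if d, ok := r(slug, s); ok {
			return d, true
		}
	}
	return Decision{}, false
}

func main() {
	state := FeatureState{Phase: "QA", QAResults: StatusPassed, RequireVerification: true}
	d, ok := classify("hero-section", state)
	fmt.Println(d.Action, d.NextCommand, ok)
}
```

Because the first match wins, `qaPassedRule` has to check the flag itself; otherwise it would fire before any verification rule and the gate would never be reached.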

### Day 3: Skeleton Command

6. **Create verify-feature command** (embedded template: `templates/skeleton/.claude/commands/verify-feature.md`)
   ````markdown
   ---
   description: Visually verify a deployed feature
   argument-hint: <feature-slug>
   allowed-tools: Bash, Read, Write, Edit, Glob, Grep
   ---

   Visually verify feature: $ARGUMENTS

   ## Instructions

   1. Load feature spec from `.sdlc/features/$ARGUMENTS/spec.md`
   2. Get project domain from CLAUDE.md or config
   3. Determine the deployed URL
   4. Execute verification via rdev API (if available) or Playwright directly
   5. Write results to `.sdlc/features/$ARGUMENTS/verification.md`
   6. Register artifact: `sdlc artifact create $ARGUMENTS verification`

   ## Output Format

   Write `.sdlc/features/$ARGUMENTS/verification.md`:

   ```markdown
   # Visual Verification: [Feature Title]

   ## Screenshots

   | Viewport | Status | Notes |
   |----------|--------|-------|
   | Desktop (1920x1080) | PASS | All elements visible |
   | Mobile (375x667) | PASS | Responsive layout correct |

   ## Evaluation

   [AI or manual evaluation notes]

   ## Result

   **Status:** PASSED
   **Score:** 95/100
   ```
   ````

7. **Update skeleton template** to include the command
   - Ensure new projects get verify-feature.md

### Day 4: Build Chaining (Optional)

8. **Add verify_after to BuildSpec** (`internal/domain/build.go`)
   - `VerifyAfter bool` - auto-verify after successful deploy
   - `VerifyURL string` - URL to verify (if different from project domain)

9. **Chain verification in BuildExecutor** (`internal/worker/build_executor.go`)
   - After successful build + push (line ~270):
     ```go
     if spec.VerifyAfter {
         // Enqueue a verify task for spec.VerifyURL,
         // falling back to the project domain when VerifyURL is empty
     }
     ```
   - Or: callback webhook triggers external verification

10. **Update build handler** to accept verify_after/verify_url

### Day 5: Documentation + Release

11. **Update documentation**
    - CLAUDE.md: Update platform status to "Done"
    - visual-verification.md: Add SDLC integration examples
    - sdlc.md: Document verification rules

12. **Integration test**
    - Test full SDLC flow with verification gate
    - Test classifier transitions correctly

13. **Final release**
    ```bash
    ./scripts/release.sh v0.12.0 "feat: visual verification with SDLC integration" --deploy
    ```

**Deliverables:**
- [ ] ArtifactVerification type in SDLC
- [ ] 3 classifier rules for verification gate
- [ ] verify-feature.md skeleton command
- [ ] Build chaining (verify_after flag)
- [ ] Full integration test passing
- [ ] v0.12.0 released

---

## Summary

| Week | Theme | Key Output |
|------|-------|------------|
| 1 | Foundation | Playwright pod + capture script + domain types + executor |
| 2 | API Layer | Handlers + service + auth scopes + manual E2E |
| 3 | AI + Cookbook | Evaluation path + visual-verify-test.sh + common.sh utils |
| 4 | SDLC + Polish | Classifier rules + skeleton command + build chaining + release |

## Risks and Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Playwright pod OOM | Capture fails | Start with conservative limits (4Gi), tune based on usage |
| AI evaluation unreliable | Poor pass/fail decisions | Start with a pass threshold of 70/100 and tune; keep partial-success mode so screenshots survive a failed evaluation |
| Screenshot storage fills up | Pod crashes | EmptyDir for now, add cleanup job or PVC later |
| SDLC rules conflict | Features stuck | Test extensively, make verification optional via config |
| Claude Code can't read screenshots | Evaluation broken | Test multimodal support; fallback to manual verification |

## Files Created/Modified

**New Files (13):**
- `internal/domain/verify.go`
- `internal/domain/verify_test.go`
- `internal/service/verify_service.go`
- `internal/service/verify_service_test.go`
- `internal/handlers/verify.go`
- `internal/handlers/verify_test.go`
- `internal/worker/verify_executor.go`
- `internal/worker/verify_executor_test.go`
- `deployments/k8s/base/playwright-pod.yaml`
- `deployments/k8s/base/playwright-configmap.yaml`
- `deployments/k8s/base/playwright-scripts/capture.js`
- `cookbooks/scripts/visual-verify-test.sh`
- `templates/skeleton/.claude/commands/verify-feature.md`

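**Modified Files (11):**
- `internal/domain/work.go` - Add WorkTaskTypeVerify
- `internal/domain/build.go` - Add VerifyAfter/VerifyURL (optional build chaining)
- `internal/auth/scopes.go` - Add verify scopes
- `internal/worker/work_executor.go` - Add dispatch case
- `internal/worker/build_executor.go` - Chain verify task after deploy (optional build chaining)
- `internal/sdlc/types.go` - Add artifact/action types
- `internal/sdlc/rules.go` - Register verification rules
- `internal/sdlc/rules_execution.go` - Add verification rules
- `cookbooks/scripts/common.sh` - Add wait_for_verify()
- `.claude/skills/cookbook-scripts/SKILL.md` - Document wait_for_verify() and visual-verify-test.sh
- `cmd/rdev-api/main.go` - Wire DI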