rdev/docs/plans/visual-verification-breakdown.md

Visual Verification Implementation Breakdown

Goal: Add Playwright-based visual verification to rdev, enabling automated screenshot/video capture of deployed sites and AI-driven feature completeness evaluation. Integrate with SDLC as an optional QA gate and add a cookbook E2E test.

Estimated Duration: 4 weeks (assumes ~25 hours/week of focused work)


Week 1: Foundation — Domain + Capture Infrastructure

Goals:

  • Playwright pod deployed and reachable via kubectl exec
  • Capture script working end-to-end
  • Domain models and work task type in place
  • Manual verification via kubectl exec confirms capture works

Tasks:

Day 1-2: Playwright Pod Infrastructure

  1. Create Playwright pod manifest (deployments/k8s/base/playwright-pod.yaml)

    • StatefulSet with mcr.microsoft.com/playwright:v1.50.0-noble image
    • sleep infinity command (stays alive for kubectl exec)
    • Labels: app: playwright, rdev.orchard9.ai/role: playwright
    • Volumes: /captures (emptyDir), /scripts (ConfigMap)
    • Resources: 500m CPU / 1Gi request, 2 CPU / 4Gi limit
  2. Create capture script (deployments/k8s/base/playwright-scripts/capture.js)

    • ~60 lines Node.js using Playwright
    • CLI: --url, --viewports (comma-sep), --output, --wait-for, --full-page, --video, --timeout
    • Output: JSON manifest with screenshot paths, printed to stdout and saved as manifest.json in the --output directory
    • Error handling: catch navigation failures, timeout gracefully
  3. Create ConfigMap for script (deployments/k8s/base/playwright-configmap.yaml)

    • Mount capture.js at /scripts/capture.js
  4. Deploy to cluster and test manually

    kubectl apply -f deployments/k8s/base/playwright-configmap.yaml
    kubectl apply -f deployments/k8s/base/playwright-pod.yaml
    kubectl exec playwright-0 -- node /scripts/capture.js \
      --url=https://example.com --viewports=1920x1080 --output=/captures/test/
    kubectl exec playwright-0 -- cat /captures/test/manifest.json
    

Day 3: Domain Models

  1. Create domain types (internal/domain/verify.go)

    • VerifySpec struct with fields: URL, Viewports, WaitFor, WaitTimeout, FullPage, Video, Evaluate, Prompt, SpecPath, CallbackURL
    • Validate() method: URL required, callback URL validation (reuse ValidateCallbackURL)
    • VerifyResult struct: Success, Screenshots, Video, Evaluation, Score, Passed, DurationMs, Error
    • ToWorkResult() method (promote screenshots to artifacts map)
  2. Add work task type (internal/domain/work.go)

    • Add WorkTaskTypeVerify WorkTaskType = "verify" to constants
    • Update IsValid() to include verify
  3. Unit tests (internal/domain/verify_test.go)

    • Test Validate() with valid/invalid specs
    • Test ToWorkResult() conversion
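
The domain types above could be sketched roughly as follows. This is a minimal illustration, not the final design: the field types, the `WorkResult` stand-in, and the `screenshot_N` artifact key scheme are assumptions, and the existing `ValidateCallbackURL` helper is only mentioned in a comment rather than reproduced.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// VerifySpec carries the fields named in the plan; the Go types are assumptions.
type VerifySpec struct {
	URL         string
	Viewports   []string
	WaitFor     string
	WaitTimeout int // assumed to be milliseconds
	FullPage    bool
	Video       bool
	Evaluate    bool
	Prompt      string
	SpecPath    string
	CallbackURL string
}

// Validate checks that URL is present and is an absolute http(s) URL.
// The plan also reuses ValidateCallbackURL for CallbackURL; that existing
// helper is not reproduced here.
func (s VerifySpec) Validate() error {
	if s.URL == "" {
		return errors.New("url is required")
	}
	u, err := url.Parse(s.URL)
	if err != nil {
		return fmt.Errorf("invalid url: %w", err)
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return fmt.Errorf("url must be an absolute http(s) URL: %q", s.URL)
	}
	return nil
}

// VerifyResult holds a subset of the planned result fields.
type VerifyResult struct {
	Success     bool
	Screenshots []string
	Score       int
	Passed      bool
	Error       string
}

// WorkResult is a hypothetical stand-in for the existing work result type.
type WorkResult struct {
	Success   bool
	Artifacts map[string]string
	Error     string
}

// ToWorkResult promotes each screenshot path into the artifacts map.
// The "screenshot_N" key scheme is an assumption.
func (r VerifyResult) ToWorkResult() WorkResult {
	artifacts := make(map[string]string, len(r.Screenshots))
	for i, path := range r.Screenshots {
		artifacts[fmt.Sprintf("screenshot_%d", i)] = path
	}
	return WorkResult{Success: r.Success, Artifacts: artifacts, Error: r.Error}
}

func main() {
	spec := VerifySpec{URL: "https://example.com", Viewports: []string{"1920x1080"}}
	fmt.Println("validation error:", spec.Validate())

	res := VerifyResult{Success: true, Screenshots: []string{"/captures/test/1920x1080.png"}}
	fmt.Println(res.ToWorkResult().Artifacts)
}
```

Keeping `Validate()` on the spec itself means the handler and the executor can both reject bad input without duplicating the checks.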

Day 4-5: Verify Executor (Capture Only)

  1. Create verify executor (internal/worker/verify_executor.go)

    • Follow BuildExecutor pattern exactly
    • Execute(ctx, task) method:
      • Parse VerifySpec from task.Spec map
      • Build kubectl exec command: kubectl exec playwright-0 -- node /scripts/capture.js --url=X ...
      • Execute via existing CommandExecutor port
      • Parse JSON manifest from stdout
      • Return WorkResult (via VerifyResult.ToWorkResult()) with artifacts map containing screenshot paths
    • Config struct: VerifyExecutorConfig with playwright pod name, namespace
    • Constructor: NewVerifyExecutor(executor, streams, logger, cfg)
  2. Wire executor to WorkExecutor (internal/worker/work_executor.go)

    • Add verifyExec *VerifyExecutor field
    • Add case in executeTask() switch for WorkTaskTypeVerify
    • Update NewWorkExecutor() to accept VerifyExecutor
  3. Unit tests (internal/worker/verify_executor_test.go)

    • Mock CommandExecutor to return capture manifest JSON
    • Test successful capture with multiple viewports
    • Test failure handling (command fails, invalid JSON)
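
The two pure steps inside `Execute` (building the `kubectl exec` argument list and parsing the manifest from stdout) might look like the sketch below. The manifest shape (`screenshots`, `video` keys) and the helper names `BuildCaptureArgs` and `ParseManifest` are assumptions for illustration; the real capture.js output format is not defined in this plan, and the actual command would still run through the existing CommandExecutor port.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// CaptureManifest is an assumed shape for the JSON that capture.js prints.
type CaptureManifest struct {
	Screenshots []string `json:"screenshots"`
	Video       string   `json:"video,omitempty"`
}

// BuildCaptureArgs returns the arguments to pass to kubectl. Passing them as
// separate argv entries (not one shell string) avoids quoting problems when
// the target URL contains characters such as '&' or '?'.
func BuildCaptureArgs(pod, namespace, targetURL string, viewports []string, outputDir string) []string {
	var args []string
	if namespace != "" {
		args = append(args, "-n", namespace)
	}
	return append(args,
		"exec", pod, "--", "node", "/scripts/capture.js",
		"--url="+targetURL,
		"--viewports="+strings.Join(viewports, ","),
		"--output="+outputDir,
	)
}

// ParseManifest decodes the manifest and rejects output with no screenshots.
func ParseManifest(stdout []byte) (CaptureManifest, error) {
	var m CaptureManifest
	if err := json.Unmarshal(stdout, &m); err != nil {
		return CaptureManifest{}, fmt.Errorf("parse capture manifest: %w", err)
	}
	if len(m.Screenshots) == 0 {
		return CaptureManifest{}, errors.New("capture manifest lists no screenshots")
	}
	return m, nil
}

func main() {
	args := BuildCaptureArgs("playwright-0", "", "https://example.com",
		[]string{"1920x1080", "375x667"}, "/captures/test/")
	fmt.Println("kubectl", strings.Join(args, " "))

	m, err := ParseManifest([]byte(`{"screenshots":["/captures/test/1920x1080.png"]}`))
	fmt.Println(m.Screenshots, err)
}
```

Splitting these out as plain functions keeps the unit tests listed above simple: the mocked CommandExecutor only needs to return canned stdout.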

Deliverables:

  • Playwright pod running in cluster
  • Capture script takes screenshots successfully
  • VerifySpec/VerifyResult domain types with tests
  • VerifyExecutor can dispatch capture via kubectl exec
  • Work queue can dispatch verify tasks (manual test via SQL insert)

Foundation this enables:

  • Week 2 can build API layer knowing capture works
  • Executor pattern established for AI evaluation later

Week 2: API Layer + Manual E2E

Goals:

  • Full API surface: POST /projects/{id}/verify, GET /verify/{taskId}, GET /projects/{id}/verifications
  • Auth scopes configured
  • Manual E2E working: API call → queue → capture → result
  • Initial release candidate deployed to staging

Tasks:

Day 1: Auth and Service Layer

  1. Add auth scopes (internal/auth/scopes.go)

    • ScopeVerifyRead Scope = "verify:read"
    • ScopeVerifyWrite Scope = "verify:write"
    • Add to AllScopes if needed
  2. Create verify service (internal/service/verify_service.go)

    • Follow BuildService pattern
    • StartVerify(ctx, projectID, spec) → validate, enqueue task, return task ID
    • GetVerifyStatus(ctx, taskID) → get task from work queue
    • ListVerifications(ctx, projectID, limit) → list tasks by project
    • Dependencies: WorkQueue port (existing)
  3. Unit tests (internal/service/verify_service_test.go)

    • Mock work queue
    • Test enqueue, status, list
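
A rough sketch of the service layer, assuming a `WorkQueue` port with `Enqueue`, `Get`, and `ListByProject` methods. These method names, the `WorkTask` fields, and the in-memory queue are all hypothetical; the real port already exists in the codebase and may differ. For brevity `StartVerify` takes a URL string here instead of a full `VerifySpec`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// WorkTask is a hypothetical, trimmed-down view of a queued task.
type WorkTask struct {
	ID        string
	ProjectID string
	Type      string
	Status    string
	Spec      map[string]any
}

// WorkQueue is an assumed shape for the existing work queue port.
type WorkQueue interface {
	Enqueue(ctx context.Context, task WorkTask) (string, error)
	Get(ctx context.Context, taskID string) (WorkTask, error)
	ListByProject(ctx context.Context, projectID string, limit int) ([]WorkTask, error)
}

// VerifyService validates input and delegates to the queue.
type VerifyService struct{ queue WorkQueue }

func NewVerifyService(q WorkQueue) *VerifyService { return &VerifyService{queue: q} }

// StartVerify validates, enqueues a "verify" task, and returns its ID.
func (s *VerifyService) StartVerify(ctx context.Context, projectID, targetURL string) (string, error) {
	if projectID == "" || targetURL == "" {
		return "", errors.New("project id and url are required")
	}
	return s.queue.Enqueue(ctx, WorkTask{
		ProjectID: projectID,
		Type:      "verify",
		Spec:      map[string]any{"url": targetURL},
	})
}

func (s *VerifyService) GetVerifyStatus(ctx context.Context, taskID string) (WorkTask, error) {
	return s.queue.Get(ctx, taskID)
}

func (s *VerifyService) ListVerifications(ctx context.Context, projectID string, limit int) ([]WorkTask, error) {
	return s.queue.ListByProject(ctx, projectID, limit)
}

// memQueue is an in-memory WorkQueue used only for this demonstration.
type memQueue struct{ tasks []WorkTask }

func (q *memQueue) Enqueue(_ context.Context, t WorkTask) (string, error) {
	t.ID = fmt.Sprintf("task-%d", len(q.tasks)+1)
	t.Status = "pending"
	q.tasks = append(q.tasks, t)
	return t.ID, nil
}

func (q *memQueue) Get(_ context.Context, id string) (WorkTask, error) {
	for _, t := range q.tasks {
		if t.ID == id {
			return t, nil
		}
	}
	return WorkTask{}, errors.New("task not found")
}

func (q *memQueue) ListByProject(_ context.Context, projectID string, limit int) ([]WorkTask, error) {
	var out []WorkTask
	for _, t := range q.tasks {
		if t.ProjectID == projectID && len(out) < limit {
			out = append(out, t)
		}
	}
	return out, nil
}

func main() {
	svc := NewVerifyService(&memQueue{})
	id, err := svc.StartVerify(context.Background(), "myproject", "https://myproject.threesix.ai")
	fmt.Println(id, err)
}
```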

Day 2-3: Handler Layer

  1. Create verify handler (internal/handlers/verify.go)

    • Follow BuildsHandler pattern exactly
    • Mount(r api.Router) with scopes:
      • POST /projects/{id}/verify → ScopeVerifyWrite
      • GET /projects/{id}/verifications → ScopeVerifyRead
      • GET /verify/{taskId} → ScopeVerifyRead
    • Use api.DecodeJSON(), validate.New(), response helpers
    • Request struct: VerifyRequest matching VerifySpec
    • Response structs: match existing patterns
  2. Wire DI (cmd/rdev-api/main.go)

    • Create VerifyExecutor in worker setup
    • Create VerifyService
    • Create VerifyHandler
    • Mount routes
  3. Handler tests (internal/handlers/verify_test.go)

    • Test POST with valid/invalid specs
    • Test auth scope enforcement
    • Test GET status/list

Day 4: SSE Events

  1. Add verify events (internal/worker/verify_executor.go)
    • Publish events via StreamPublisher:
      • verify.started - task claimed
      • verify.capturing - starting capture
      • verify.captured - capture complete with manifest
      • verify.completed / verify.failed - final status
    • Event constants in verify_executor.go (follow BuildExecutor pattern)
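
As an illustration of the event sequence above, the sketch below defines the five event names as constants and publishes them in order around a capture step. The `StreamPublisher` interface shape, the `recorder` test double, and `runVerify` are assumptions; the real publisher signature should follow whatever BuildExecutor already uses.

```go
package main

import (
	"errors"
	"fmt"
)

// Event names taken from the list above.
const (
	EventVerifyStarted   = "verify.started"
	EventVerifyCapturing = "verify.capturing"
	EventVerifyCaptured  = "verify.captured"
	EventVerifyCompleted = "verify.completed"
	EventVerifyFailed    = "verify.failed"
)

// StreamPublisher is an assumed shape for the existing publisher port.
type StreamPublisher interface {
	Publish(taskID, event string, data map[string]any)
}

// recorder collects event names so the ordering can be inspected.
type recorder struct{ events []string }

func (r *recorder) Publish(_ string, event string, _ map[string]any) {
	r.events = append(r.events, event)
}

var errCaptureFailed = errors.New("capture failed")

// runVerify publishes progress events around a capture function.
func runVerify(pub StreamPublisher, taskID string, capture func() ([]string, error)) error {
	pub.Publish(taskID, EventVerifyStarted, nil)
	pub.Publish(taskID, EventVerifyCapturing, nil)
	shots, err := capture()
	if err != nil {
		pub.Publish(taskID, EventVerifyFailed, map[string]any{"error": err.Error()})
		return err
	}
	pub.Publish(taskID, EventVerifyCaptured, map[string]any{"screenshots": shots})
	pub.Publish(taskID, EventVerifyCompleted, nil)
	return nil
}

func main() {
	ok := &recorder{}
	_ = runVerify(ok, "task-1", func() ([]string, error) {
		return []string{"/captures/test/1920x1080.png"}, nil
	})
	fmt.Println(ok.events)

	bad := &recorder{}
	_ = runVerify(bad, "task-2", func() ([]string, error) { return nil, errCaptureFailed })
	fmt.Println(bad.events)
}
```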

Day 5: Manual E2E + Deploy

  1. Manual E2E test sequence

    # 1. Start verification
    curl -X POST $RDEV_API_URL/projects/myproject/verify \
      -H "X-API-Key: $RDEV_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"url": "https://myproject.threesix.ai", "viewports": ["1920x1080"]}'
    # Response: {"task_id": "xxx"}
    
    # 2. Poll for completion
    curl $RDEV_API_URL/verify/xxx -H "X-API-Key: $RDEV_API_KEY"
    # Response: screenshots in artifacts
    
  2. Build and deploy

    ./scripts/release.sh v0.11.0 "feat: add visual verification (capture-only MVP)" --deploy
    

Deliverables:

  • Auth scopes for verify:read/write
  • VerifyService with enqueue/status/list
  • VerifyHandler with 3 endpoints
  • SSE events for verification progress
  • Deployed to staging, manual E2E passing

Foundation this enables:

  • Week 3 can add AI evaluation knowing API works
  • Cookbook script can use standard api_call() pattern

Week 3: AI Evaluation + Cookbook Test

Goals:

  • AI evaluation path working (Claude reads screenshots, returns verdict)
  • Cookbook E2E test script: visual-verify-test.sh
  • Add to common.sh utilities
  • Full E2E passing in CI

Tasks:

Day 1-2: AI Evaluation Path

  1. Add evaluation to VerifyExecutor (internal/worker/verify_executor.go)

    • After successful capture, if spec.Evaluate:
      • Build evaluation prompt: "Compare these screenshots against the specification..."
      • Include spec.Prompt or read spec.SpecPath content
      • Call Claude Code via CodeAgentRegistry
      • Pass screenshots as attachments (file paths in pod)
      • Parse evaluation output for score (look for "Score: XX/100" pattern)
      • Set result.Evaluation, result.Score, result.Passed
  2. Evaluation prompt template (hardcoded in executor for now)

    Evaluate these screenshots against the following specification:
    
    {spec.Prompt or contents of spec.SpecPath}
    
    For each screenshot, assess:
    1. Does the UI match the specification?
    2. Are all required elements present?
    3. Is the layout correct at this viewport?
    
    End with: "Score: XX/100" and "PASSED" or "FAILED"
    
  3. Handle partial failures (internal/worker/verify_executor.go)

    • If capture succeeds but evaluation fails:
      • Set success=true (screenshots are still useful)
      • Leave evaluation=""
      • Log warning
  4. Unit tests for evaluation path

    • Mock CodeAgentRegistry
    • Test evaluation output parsing
    • Test partial failure handling
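
The score-extraction step could be implemented along these lines. This is a sketch under the assumption that the model follows the prompt template and ends with a "Score: XX/100" line plus a PASSED or FAILED verdict; `ParseEvaluation` and the `Evaluation` struct are hypothetical names. Because the evaluation text may mention either word earlier on, the sketch treats whichever verdict appears last as final.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// scoreRe matches "Score: 85/100" with optional surrounding whitespace.
var scoreRe = regexp.MustCompile(`Score:\s*(\d{1,3})\s*/\s*100`)

// Evaluation holds the parsed verdict.
type Evaluation struct {
	Score  int
	Passed bool
}

// ParseEvaluation extracts the score and the final PASSED/FAILED verdict.
// It returns an error when no valid score is found, which the executor can
// treat as a partial failure (capture succeeded, evaluation did not).
func ParseEvaluation(output string) (Evaluation, error) {
	m := scoreRe.FindStringSubmatch(output)
	if m == nil {
		return Evaluation{}, errors.New(`no "Score: XX/100" line found`)
	}
	score, err := strconv.Atoi(m[1])
	if err != nil {
		return Evaluation{}, fmt.Errorf("invalid score %q: %w", m[1], err)
	}
	if score > 100 {
		return Evaluation{}, fmt.Errorf("score %d is out of range", score)
	}
	// The verdict that appears last wins; if neither word is present,
	// both indexes are -1 and Passed stays false.
	passed := strings.LastIndex(output, "PASSED") > strings.LastIndex(output, "FAILED")
	return Evaluation{Score: score, Passed: passed}, nil
}

func main() {
	ev, err := ParseEvaluation("Hero section and CTA button are present.\nScore: 85/100\nPASSED")
	fmt.Println(ev.Score, ev.Passed, err)
}
```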

Day 3-4: Cookbook Test Script

  1. Add utility to common.sh (cookbooks/scripts/common.sh)

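    # Wait for verification to complete
    # Arguments: task_id [max_attempts] [poll_interval]
    wait_for_verify() {
        local task_id="$1"
        local max_attempts="${2:-30}"
        local poll_interval="${3:-5}"
        local attempt status
        # Poll GET /verify/{task_id} until completed/failed (assumes a top-level "status" field; needs jq)
        for ((attempt = 1; attempt <= max_attempts; attempt++)); do
            status=$(curl -fsS "$RDEV_API_URL/verify/$task_id" -H "X-API-Key: $RDEV_API_KEY" | jq -r '.status' || true)
            [[ "$status" == "completed" ]] && return 0
            [[ "$status" == "failed" ]] && return 1
            sleep "$poll_interval"
        done
        return 1 # timed out
    }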
    
  2. Create visual-verify-test.sh (cookbooks/scripts/visual-verify-test.sh)

    • Follow cookbook script SKILL.md patterns exactly
    • Commands: run, status, diagnose, teardown
    • Flow:
      1. Create composable project with app-astro component
      2. Wait for initial deploy (site is live)
      3. Start build: "Create a hero section with a call-to-action button"
      4. Wait for build to complete
      5. Wait for CI pipeline
      6. Wait for site to respond
      7. Start verification: POST /projects/{id}/verify {url, evaluate: true, prompt: ...}
      8. Wait for verify to complete
      9. Assert: result.passed == true OR result.score >= 70
      10. Teardown
  3. Add auto-teardown support

    • Parse --auto-teardown flag
    • Register cleanup trap
    • Set CLEANUP_PROJECT

Day 5: Integration + CI

  1. Test locally

    ./cookbooks/scripts/visual-verify-test.sh run vv-test --auto-teardown
    
  2. Add to CI (if CI runs cookbook tests)

    • Add visual-verify-test to test matrix
    • Ensure playwright-0 pod is available in test environment
  3. Document in cookbook skill (.claude/skills/cookbook-scripts/SKILL.md)

    • Add wait_for_verify() to utilities list
    • Add visual-verify-test.sh to examples

Deliverables:

  • AI evaluation working with score extraction
  • Partial failure handling (capture ok, eval fail)
  • wait_for_verify() in common.sh
  • visual-verify-test.sh passing end-to-end
  • Documentation updated

Foundation this enables:

  • Week 4 can add SDLC integration knowing full flow works
  • Cookbook pattern established for future tests

Week 4: SDLC Integration + Polish

Goals:

  • Visual verification as optional SDLC gate between QA and merge
  • Skeleton command: /verify-feature
  • Build chaining: auto-verify after deploy
  • Release v0.12.0 with full feature

Tasks:

Day 1-2: SDLC Types and Rules

  1. Add artifact type (internal/sdlc/types.go)

    • ArtifactVerification ArtifactType = "verification"
    • Add to ValidArtifactTypes slice
    • Add case in ArtifactFilename() → returns "verification.md"
  2. Add action types (internal/sdlc/types.go)

    • ActionVerifyFeature ActionType = "VERIFY_FEATURE"
    • ActionFixVerificationIssues ActionType = "FIX_VERIFICATION_ISSUES"
  3. Add classifier rules (internal/sdlc/rules_execution.go)

    • needsVerificationRule():
      • Condition: Phase=QA, qa_results=passed, verification=nil or pending
      • Action: ActionVerifyFeature
      • NextCommand: "/verify-feature {slug}"
    • verificationFailedRule():
      • Condition: Phase=QA, verification=failed
      • Action: ActionFixVerificationIssues
      • NextCommand: "/fix-verification-issues {slug}"
    • verificationPassedRule():
      • Condition: Phase=QA, qa_results=passed, verification=passed
      • Action: ActionTransition to PhaseMerge
  4. Update rule ordering (internal/sdlc/rules.go)

    • Insert verification rules after qaPassedRule
    • Update qaPassedRule: only transition if verification also passed OR feature doesn't require verification (config flag)
  5. Unit tests (internal/sdlc/rules_execution_test.go)

    • Test all three verification rules
    • Test interaction with existing QA rules
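
The three rules can be read as a single decision function, sketched below. The `FeatureState` struct, the string values for phases and statuses, and the `ActionTransition` constant value are assumptions made for this illustration; the real classifier registers separate rule functions and has its own state types.

```go
package main

import "fmt"

type ActionType string

const (
	ActionVerifyFeature         ActionType = "VERIFY_FEATURE"
	ActionFixVerificationIssues ActionType = "FIX_VERIFICATION_ISSUES"
	ActionTransition            ActionType = "TRANSITION" // assumed value
)

// FeatureState is a hypothetical, flattened view of what the rules inspect.
// An empty Verification means no verification artifact exists yet.
type FeatureState struct {
	Slug         string
	Phase        string
	QAResults    string
	Verification string
}

// Decision is a hypothetical result type for a matched rule.
type Decision struct {
	Action      ActionType
	NextCommand string
	NextPhase   string
}

// classifyVerification applies the three verification rules in order.
// The second return value reports whether any rule matched.
func classifyVerification(f FeatureState) (Decision, bool) {
	if f.Phase != "QA" {
		return Decision{}, false
	}
	switch {
	case f.Verification == "failed":
		return Decision{Action: ActionFixVerificationIssues,
			NextCommand: "/fix-verification-issues " + f.Slug}, true
	case f.QAResults == "passed" && (f.Verification == "" || f.Verification == "pending"):
		return Decision{Action: ActionVerifyFeature,
			NextCommand: "/verify-feature " + f.Slug}, true
	case f.QAResults == "passed" && f.Verification == "passed":
		return Decision{Action: ActionTransition, NextPhase: "merge"}, true
	}
	return Decision{}, false
}

func main() {
	d, ok := classifyVerification(FeatureState{Slug: "hero", Phase: "QA", QAResults: "passed"})
	fmt.Println(d.Action, d.NextCommand, ok)
}
```

Checking the failed case first means a failed verification always routes to the fix command, regardless of the QA result.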

Day 3: Skeleton Command

  1. Create verify-feature command (embedded template: templates/skeleton/.claude/commands/verify-feature.md)

    ---
    description: Visually verify a deployed feature
    argument-hint: <feature-slug>
    allowed-tools: Bash, Read, Write, Edit, Glob, Grep
    ---
    
    Visually verify feature: $ARGUMENTS
    
    ## Instructions
    
    1. Load feature spec from `.sdlc/features/$ARGUMENTS/spec.md`
    2. Get project domain from CLAUDE.md or config
    3. Determine the deployed URL
    4. Execute verification via rdev API (if available) or Playwright directly
    5. Write results to `.sdlc/features/$ARGUMENTS/verification.md`
    6. Register artifact: `sdlc artifact create $ARGUMENTS verification`
    
    ## Output Format
    
    Write `.sdlc/features/$ARGUMENTS/verification.md`:
    
    ```markdown
    # Visual Verification: [Feature Title]
    
    ## Screenshots
    
    | Viewport | Status | Notes |
    |----------|--------|-------|
    | Desktop (1920x1080) | PASS | All elements visible |
    | Mobile (375x667) | PASS | Responsive layout correct |
    
    ## Evaluation
    
    [AI or manual evaluation notes]
    
    ## Result
    
    **Status:** PASSED
    **Score:** 95/100
    ```

  2. Update skeleton template to include the command

    • Ensure new projects get verify-feature.md

Day 4: Build Chaining (Optional)

  1. Add verify_after to BuildSpec (internal/domain/build.go)

    • VerifyAfter bool - auto-verify after successful deploy
    • VerifyURL string - URL to verify (if different from project domain)
  2. Chain verification in BuildExecutor (internal/worker/build_executor.go)

    • After successful build + push (line ~270):
      if spec.VerifyAfter {
          // Enqueue verify task for spec.VerifyURL,
          // falling back to the project domain when VerifyURL is empty
      }
      
    • Or: callback webhook triggers external verification
  3. Update build handler to accept verify_after/verify_url
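
One way to resolve the verification target for build chaining is sketched below. It reads `VerifyURL` as an optional override of the project domain, which is one interpretation of the field description above; the `verifyTarget` helper and the `https://` prefix applied to the domain are assumptions.

```go
package main

import "fmt"

// BuildSpec shows only the two fields added for build chaining.
type BuildSpec struct {
	VerifyAfter bool
	VerifyURL   string
}

// verifyTarget returns the URL to verify after a successful build and
// whether a verify task should be enqueued at all.
func verifyTarget(spec BuildSpec, projectDomain string) (string, bool) {
	if !spec.VerifyAfter {
		return "", false
	}
	if spec.VerifyURL != "" {
		return spec.VerifyURL, true
	}
	if projectDomain == "" {
		// Nothing to verify against; skip chaining rather than fail the build.
		return "", false
	}
	return "https://" + projectDomain, true
}

func main() {
	target, ok := verifyTarget(BuildSpec{VerifyAfter: true}, "myproject.threesix.ai")
	fmt.Println(target, ok)
}
```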

Day 5: Documentation + Release

  1. Update documentation

    • CLAUDE.md: Update platform status to "Done"
    • visual-verification.md: Add SDLC integration examples
    • sdlc.md: Document verification rules
  2. Integration test

    • Test full SDLC flow with verification gate
    • Test classifier transitions correctly
  3. Final release

    ./scripts/release.sh v0.12.0 "feat: visual verification with SDLC integration" --deploy
    

Deliverables:

  • ArtifactVerification type in SDLC
  • 3 classifier rules for verification gate
  • verify-feature.md skeleton command
  • Build chaining (verify_after flag)
  • Full integration test passing
  • v0.12.0 released

Summary

| Week | Theme | Key Output |
|------|-------|------------|
| 1 | Foundation | Playwright pod + capture script + domain types + executor |
| 2 | API Layer | Handlers + service + auth scopes + manual E2E |
| 3 | AI + Cookbook | Evaluation path + visual-verify-test.sh + common.sh utils |
| 4 | SDLC + Polish | Classifier rules + skeleton command + build chaining + release |

Risks and Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Playwright pod OOM | Capture fails | Start with conservative limits (4Gi), tune based on usage |
| AI evaluation unreliable | Poor pass/fail decisions | Start with high threshold (70), tune; partial success mode |
| Screenshot storage fills up | Pod crashes | EmptyDir for now, add cleanup job or PVC later |
| SDLC rules conflict | Features stuck | Test extensively, make verification optional via config |
| Claude Code can't read screenshots | Evaluation broken | Test multimodal support; fallback to manual verification |

Files Created/Modified

New Files (13):

  • internal/domain/verify.go
  • internal/domain/verify_test.go
  • internal/service/verify_service.go
  • internal/service/verify_service_test.go
  • internal/handlers/verify.go
  • internal/handlers/verify_test.go
  • internal/worker/verify_executor.go
  • internal/worker/verify_executor_test.go
  • deployments/k8s/base/playwright-pod.yaml
  • deployments/k8s/base/playwright-configmap.yaml
  • deployments/k8s/base/playwright-scripts/capture.js
  • cookbooks/scripts/visual-verify-test.sh
  • templates/skeleton/.claude/commands/verify-feature.md

Modified Files (8):

  • internal/domain/work.go - Add WorkTaskTypeVerify
  • internal/auth/scopes.go - Add verify scopes
  • internal/worker/work_executor.go - Add dispatch case
  • internal/sdlc/types.go - Add artifact/action types
  • internal/sdlc/rules.go - Register verification rules
  • internal/sdlc/rules_execution.go - Add verification rules
  • cookbooks/scripts/common.sh - Add wait_for_verify()
  • cmd/rdev-api/main.go - Wire DI