Orchard Studio: Gap Analysis

This document maps the delta between current rdev capabilities and what Orchard Studio requires.

Current Foundation (What We Have)

| Capability | Status | Location |
|------------|--------|----------|
| SDLC Classifier | Complete | internal/sdlc/classifier.go |
| Feature State Machine | Complete | internal/sdlc/ (10 phases, 31 rules) |
| Composable Templates | Complete | internal/adapter/templates/ |
| Worker Pod Execution | Complete | internal/worker/sdlc_executor.go |
| Webhook Dispatcher | Complete | internal/webhook/dispatcher.go |
| Project Provisioning | Complete | K8s namespace, DNS, git repo |
| Database Provisioning | Complete | CockroachDB adapter |
| Tree Workflows | Proven | cookbooks/trees/*.yaml |

Gap 0: Design Reference Capture & Processing

Current: No mechanism for users to provide visual inspiration. Features are described purely in text.

Required: Users can provide URLs or screenshots as design references, which inform the Architect's questions and the Blueprint's design system section.

What's Missing

┌─────────────────────────────────────────────────────────────────────────┐
│  CURRENT FLOW                                                            │
│                                                                          │
│  User: "Build a pricing page"                                            │
│  Architect: *asks about data model, endpoints...*                        │
│  (No visual context, design decisions are guesswork)                     │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│  REQUIRED FLOW                                                           │
│                                                                          │
│  User: "Build a pricing page like this" + [URL or screenshot]            │
│  System: Captures screenshot, stores with Blueprint                      │
│  Architect: "I see a dark theme with 3 tiers..." → asks clarifying Qs   │
│  Blueprint: Populates designSystem section with extracted tokens         │
└─────────────────────────────────────────────────────────────────────────┘

Two Input Types

| Input | Capture Method | Storage |
|-------|----------------|---------|
| URL | Playwright screenshots the page automatically | /references/{blueprintId}/{refId}.png |
| Screenshot | User uploads image (drag/drop, paste, file picker) | Same storage path |

Implementation Required

  1. Reference Capture Service:

    • For URLs: Reuse verify_executor.go pattern (Playwright pod)
    • For uploads: Standard file upload handling
    • Store thumbnails alongside Blueprint
  2. Chat Endpoint Enhancement:

    • Accept references[] array in request body
    • Process references before LLM call
    • Include reference images in Architect prompt context
  3. Architect Prompt Updates:

    • Describe what it observes in natural language
    • Ask clarifying questions about design intent
    • Extract structured design tokens into Blueprint
  4. Blueprint Schema:

    • Add references.items[] array
    • Add sections.designSystem section
    • Track which references informed which design decisions
  5. Plan Pane Rendering:

    • Show reference thumbnails in UI
    • Display extracted design tokens
    • Allow user to add annotations

Complexity: Medium

  • URL capture reuses existing Playwright infrastructure
  • File upload is standard pattern
  • Main work is Architect prompt engineering for visual understanding
  • Requires an LLM with vision input (Claude accepts images natively)
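
A minimal sketch of how a `references[]` entry could be validated and mapped to the storage path from the table above. The `Reference` type, its field names, and the `"url"`/`"screenshot"` kind values are assumptions for illustration, not an existing rdev type.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Reference is a hypothetical shape for one entry of the references[] array.
type Reference struct {
	ID   string // refId, assumed to be assigned by the capture service
	Kind string // "url" or "screenshot" (assumed values)
	URL  string // set only when Kind == "url"
}

// Validate rejects references the capture service could not process.
func (r Reference) Validate() error {
	switch r.Kind {
	case "url":
		u, err := url.Parse(r.URL)
		if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			return fmt.Errorf("reference %s: not an http(s) URL", r.ID)
		}
		return nil
	case "screenshot":
		// Uploaded images are checked by the upload handler, not here.
		return nil
	default:
		return errors.New("reference kind must be \"url\" or \"screenshot\"")
	}
}

// StoragePath follows the /references/{blueprintId}/{refId}.png layout.
func StoragePath(blueprintID, refID string) string {
	return fmt.Sprintf("/references/%s/%s.png", blueprintID, refID)
}

func main() {
	ref := Reference{ID: "ref_1", Kind: "url", URL: "https://example.com/pricing"}
	fmt.Println(ref.Validate(), StoragePath("bp_1", ref.ID))
}
```

Validating before the Playwright capture keeps obviously bad input from consuming a capture pod.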

Gap 1: Blueprint Storage & Chat API

Current: Features are created via POST /sdlc/features with a complete spec. No iterative refinement.

Required: Multi-turn conversation that builds a Blueprint incrementally.

What's Missing

┌─────────────────────────────────────────────────────────────────┐
│  CURRENT FLOW                                                   │
│                                                                 │
│  User writes spec → POST /sdlc/features → Feature created       │
│  (one shot, no iteration)                                       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED FLOW                                                  │
│                                                                 │
│  User message → Architect responds + updates Blueprint →        │
│  User message → Architect responds + updates Blueprint →        │
│  ...repeat until ready...                                       │
│  User: "build it" → Blueprint → SDLC Feature → Build            │
└─────────────────────────────────────────────────────────────────┘

Implementation Required

  1. Database Tables:

    • blueprints - stores structured Blueprint JSON
    • blueprint_messages - conversation history with snapshots
  2. API Endpoints:

    • POST /projects/{id}/blueprint/chat - send message, get reply + updated blueprint
    • GET /projects/{id}/blueprints - list blueprints
    • GET /projects/{id}/blueprints/{blueprintId} - get specific blueprint
    • DELETE /projects/{id}/blueprints/{blueprintId} - discard draft
  3. Service Layer:

    • ArchitectService - manages conversation, calls LLM, updates Blueprint

Complexity: Medium

  • Schema is defined (see app-vision.md)
  • Standard CRUD + LLM integration
  • Most work is in prompt engineering for Architect

Gap 2: Architect Agent Persona

Current: We have coding agents (/implement-feature). They write code, not specs.

Required: An agent that asks questions, fills in a structured Blueprint, knows when to stop.

What's Missing

┌─────────────────────────────────────────────────────────────────┐
│  CURRENT AGENTS                                                 │
│                                                                 │
│  User: "Add cat photos"                                         │
│  Agent: *immediately writes code*                               │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  ARCHITECT AGENT                                                │
│                                                                 │
│  User: "Add cat photos"                                         │
│  Architect: "Should photos be public or friends-only?"          │
│  User: "Public"                                                 │
│  Architect: "Got it. Do you want likes, comments, or neither?"  │
│  ...continues until Blueprint is complete...                    │
└─────────────────────────────────────────────────────────────────┘

Implementation Required

  1. System Prompt:

    • .claude/agents/architect.md - detailed persona
    • Structured output format (reply + Blueprint JSON)
    • Question strategy (when to ask vs assume)
  2. Structured Output Parsing:

    • LLM returns {reply: string, blueprint: Blueprint}
    • Validate Blueprint against schema
    • Handle partial updates (delta vs full replacement)
  3. Completeness Logic:

    • isReadyToBuild(blueprint) function
    • Clear rules for when questions are resolved
    • Override mechanism for user to force build

Complexity: Medium-High

  • Prompt engineering is iterative
  • Structured output from LLMs can be fragile
  • Need fallback handling for malformed responses
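
A sketch of the parsing, fallback, and completeness rules listed above. The `Blueprint` fields (`feature`, `summary`, `openQuestions`) are placeholders rather than the schema in app-vision.md, and the sketch treats the returned Blueprint as a full replacement rather than a delta.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Blueprint uses placeholder fields; the real schema lives in app-vision.md.
type Blueprint struct {
	Feature       string   `json:"feature"`
	Summary       string   `json:"summary"`
	OpenQuestions []string `json:"openQuestions"`
}

// ArchitectTurn is the {reply, blueprint} pair the LLM is asked to return.
type ArchitectTurn struct {
	Reply     string     `json:"reply"`
	Blueprint *Blueprint `json:"blueprint"`
}

// parseTurn decodes the LLM output. On malformed output it returns an error
// together with a fallback turn that keeps the previous Blueprint, so the
// caller can retry or show the raw text without losing state.
func parseTurn(raw string, prev Blueprint) (ArchitectTurn, error) {
	var turn ArchitectTurn
	if err := json.Unmarshal([]byte(raw), &turn); err != nil {
		return ArchitectTurn{Reply: raw, Blueprint: &prev}, fmt.Errorf("malformed architect output: %w", err)
	}
	if turn.Reply == "" {
		return ArchitectTurn{Blueprint: &prev}, errors.New("architect output has no reply")
	}
	if turn.Blueprint == nil {
		turn.Blueprint = &prev // no update this turn
	}
	return turn, nil
}

// isReadyToBuild applies an assumed rule: a feature name and summary are
// always required; open questions block the build unless the user forces it.
func isReadyToBuild(bp Blueprint, force bool) bool {
	if bp.Feature == "" || bp.Summary == "" {
		return false
	}
	return force || len(bp.OpenQuestions) == 0
}

func main() {
	prev := Blueprint{Feature: "Cat photos", Summary: "Photo sharing", OpenQuestions: []string{"Public or friends-only?"}}
	turn, err := parseTurn(`{"reply":"Got it, public.","blueprint":{"feature":"Cat photos","summary":"Photo sharing","openQuestions":[]}}`, prev)
	fmt.Println(err, isReadyToBuild(*turn.Blueprint, false))
}
```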

Gap 3: Operation Tracking (Tree Runner in DB)

Current: Tree workflows run via shell script (tree-runner.sh). State in local JSON files.

Required: Operations tracked in database, queryable via API, streamable to UI.

What's Missing

┌─────────────────────────────────────────────────────────────────┐
│  CURRENT                                                        │
│                                                                 │
│  ./tree-runner.sh slackpath-1.yaml                              │
│  → Runs in terminal                                             │
│  → State in .checkpoints/slackpath-1.json                       │
│  → No API visibility                                            │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED                                                       │
│                                                                 │
│  POST /operations/start {tree: "slackpath-1"}                   │
│  → Returns operation_id                                         │
│  → State in operations table                                    │
│  → GET /operations/{id}/stream returns SSE events               │
└─────────────────────────────────────────────────────────────────┘

Implementation Required

  1. Database Tables:

    • operations - tracks running/completed operations
    • operation_events - event log for replay/streaming
  2. Service Layer:

    • OrchestratorService - manages operation lifecycle
    • Port tree-runner logic from bash to Go
    • Event emission during execution
  3. API Endpoints:

    • POST /projects/{id}/operations - start operation
    • GET /projects/{id}/operations/{operationId} - get status
    • GET /projects/{id}/operations/{operationId}/stream - SSE stream
  4. Worker Integration:

    • SDLC executor emits events as it progresses
    • Events written to operation_events table
    • SSE handler reads from table and streams

Complexity: High

  • Tree runner logic is non-trivial (dependencies, outputs, error handling)
  • SSE streaming requires careful connection management
  • Need to handle operation cancellation, resumption

Gap 4: Real-Time Progress Streaming

Current: Webhooks fire on build complete. No per-step visibility.

Required: SSE stream showing "Designing schema... Writing handlers... Running tests..."

What's Missing

┌─────────────────────────────────────────────────────────────────┐
│  CURRENT                                                        │
│                                                                 │
│  Build starts → ... silence ... → Webhook: "build complete"    │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED                                                       │
│                                                                 │
│  Build starts →                                                 │
│    event: {"phase": "spec", "status": "complete"}               │
│    event: {"phase": "design", "status": "in_progress"}          │
│    event: {"phase": "design", "status": "complete"}             │
│    event: {"phase": "implement", "progress": 0.5}               │
│    ...                                                          │
│    event: {"status": "complete", "url": "..."}                  │
└─────────────────────────────────────────────────────────────────┘

Implementation Required

  1. SDLC Executor Changes:

    • Emit events at phase transitions
    • Emit progress within phases (task completion)
    • Write events to operation_events table
  2. SSE Handler:

    • GET /operations/{id}/stream
    • Long-lived connection
    • Read events from DB (or Redis pub/sub)
    • Handle client disconnection gracefully
  3. Event Types:

    type OperationEvent struct {
        Type      string    // "phase", "progress", "artifact", "error", "complete"
        Phase     string    // "spec", "design", "implement", "test", "deploy"
        Status    string    // "in_progress", "complete", "failed"
        Message   string    // Human-readable
        Progress  float64   // 0.0 to 1.0 for granular progress
        Timestamp time.Time
    }
    

Complexity: Medium

  • SSE is straightforward in Go
  • Main work is instrumenting SDLC executor
  • Need to balance granularity vs noise

Gap 5: Blueprint → SDLC Feature Conversion

Current: SDLC features are created manually with spec documents.

Required: Automated conversion from structured Blueprint to SDLC feature spec.

What's Missing

┌─────────────────────────────────────────────────────────────────┐
│  CURRENT                                                        │
│                                                                 │
│  Human writes: spec.md with prose description                   │
│  → POST /sdlc/features                                          │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED                                                       │
│                                                                 │
│  Blueprint JSON → Template rendering → spec.md                  │
│  → Automated POST /sdlc/features                                │
└─────────────────────────────────────────────────────────────────┘

Implementation Required

  1. Spec Template:

    # Feature: {{.Feature}}
    
    ## Summary
    {{.Summary}}
    
    ## Data Model
    {{range .Sections.DataModel.Entities}}
    ### {{.Name}}
    | Field | Type |
    |-------|------|
    {{range .Fields}}| {{.Name}} | {{.Type}} |
    {{end}}
    {{end}}
    
    ## API Endpoints
    {{range .Sections.APIEndpoints.Endpoints}}
    - `{{.Method}} {{.Path}}` - {{.Description}}
    {{end}}
    
    ## UI Components
    {{range .Sections.UIComponents.Components}}
    - **{{.Name}}**: {{.Purpose}}
    {{end}}
    
    ## Assumptions
    {{range .Assumptions}}
    - {{.Assumption}}
    {{end}}
    
  2. Conversion Service:

    • Takes Blueprint, renders spec.md
    • Creates SDLC feature via existing API
    • Links Blueprint to created feature (built_feature_slug)

Complexity: Low

  • Template rendering is straightforward
  • SDLC feature creation already exists
  • Main work is template design

Gap 6: Frontend (Next.js Studio)

Current: No frontend. All interaction via API/CLI.

Required: Three-pane interface (Chat, Plan, Preview).

What's Missing

Everything. This is a new application.

Implementation Required

  1. Project Setup:

    • Next.js 14 with App Router
    • Tailwind CSS for styling
    • Authentication (integrate with rdev auth)
  2. Core Components:

    apps/studio/
    ├── app/
    │   ├── page.tsx              # Template selection
    │   ├── projects/
    │   │   └── [id]/
    │   │       └── page.tsx      # Three-pane workspace
    │   └── api/                  # Proxy to rdev-api
    ├── components/
    │   ├── ChatPane.tsx
    │   ├── PlanPane.tsx
    │   ├── PreviewPane.tsx
    │   ├── ActivityFeed.tsx
    │   └── BuildProgress.tsx
    └── lib/
        ├── api.ts               # rdev-api client
        └── sse.ts               # SSE connection manager
    
  3. State Management:

    • Blueprint state (updated on each chat response)
    • Operation state (updated via SSE)
    • UI state (which pane is focused, etc.)
  4. Key Interactions:

    • Send chat message → receive reply + blueprint
    • Click "Build It" → start operation → show progress
    • Operation complete → refresh preview iframe

Complexity: Medium

  • Standard Next.js app
  • SSE client requires careful handling
  • Most complexity is in polish and UX

Gap 7: Platform Service Infrastructure

Current: Projects manage their own integrations. No shared services, no credential management.

Required: A service catalog with provisioning, credential injection, and upgrade paths for existing projects.

The "Upgrade" Problem

┌─────────────────────────────────────────────────────────────────┐
│  CURRENT                                                        │
│                                                                 │
│  Project created 3 months ago                                   │
│  → No centralized logging                                       │
│  → No analytics                                                 │
│  → Rolling your own email                                       │
│  → No easy way to add platform services                         │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED                                                       │
│                                                                 │
│  POST /projects/{id}/services                                   │
│  { "type": "logging", "provider": "loki" }                      │
│                                                                 │
│  → Provision credentials                                        │
│  → Inject into K8s secrets                                      │
│  → Create integration PR with config changes                    │
│  → Project now ships logs to centralized system                 │
└─────────────────────────────────────────────────────────────────┘

Service Rollout Order

Build the infrastructure with the simplest service first, then add complexity:

| Order | Service | Why This Order |
|-------|---------|----------------|
| 1 | Logging | Pure infrastructure, no user-facing code changes |
| 2 | Email | Simple API calls, clear success/failure |
| 3 | Stats | Frontend SDK + backend events |
| 4 | Auth | Most complex (middleware, user model, protected routes) |

Implementation Required

1. Service Catalog

# internal/platform/catalog.yaml
services:
  logging:
    description: "Centralized log aggregation"
    providers:
      loki:
        name: "Grafana Loki"
        credentials:
          - LOKI_URL
          - LOKI_TENANT_ID
        integration:
          go:
            config_template: "loki-logger.go.tmpl"
            env_example: ["LOKI_URL", "LOKI_TENANT_ID"]
          node:
            packages: ["pino", "pino-loki"]
            config_template: "pino-loki.ts.tmpl"

  email:
    description: "Transactional email"
    providers:
      resend:
        name: "Resend"
        credentials:
          - RESEND_API_KEY
        integration:
          go:
            packages: ["github.com/resendlabs/resend-go"]
            service_template: "email-service.go.tmpl"
          node:
            packages: ["resend"]
            service_template: "email-client.ts.tmpl"

  stats:
    description: "Product analytics"
    providers:
      posthog:
        name: "PostHog"
        credentials:
          - POSTHOG_API_KEY
          - POSTHOG_HOST
        integration:
          go:
            packages: ["github.com/posthog/posthog-go"]
          node:
            packages: ["posthog-js", "posthog-node"]
            provider_template: "analytics-provider.tsx.tmpl"

  auth:
    description: "User authentication"
    providers:
      clerk:
        name: "Clerk"
        credentials:
          - CLERK_SECRET_KEY
          - NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY
        integration:
          node:
            packages: ["@clerk/nextjs"]
            middleware_template: "clerk-middleware.ts.tmpl"
            provider_template: "clerk-provider.tsx.tmpl"

2. Database Schema

-- Track which services a project uses
CREATE TABLE project_services (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID NOT NULL REFERENCES projects(id),
    service_type TEXT NOT NULL,      -- 'logging', 'email', 'stats', 'auth'
    provider TEXT NOT NULL,           -- 'loki', 'resend', 'posthog', 'clerk'
    environment TEXT NOT NULL,        -- 'staging', 'production', 'all'

    -- Encrypted credentials
    credentials_encrypted BYTEA,

    -- Non-sensitive config
    config JSONB NOT NULL DEFAULT '{}',

    -- Status tracking
    status TEXT NOT NULL DEFAULT 'provisioning',
    -- provisioning → active → needs_update → deprovisioned

    -- Integration tracking
    integration_status TEXT DEFAULT 'pending',
    -- pending → pr_created → integrated → needs_update
    integration_pr_url TEXT,
    integration_commit TEXT,

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(project_id, service_type, environment)
);

3. Provisioner Interface

// internal/port/platform_provisioner.go
package port

import (
    "context"

    "github.com/google/uuid" // assumed UUID package; any type providing uuid.UUID works
)

// PlatformProvisioner provisions per-project credentials for one platform service.
type PlatformProvisioner interface {
    // Provision creates credentials for a project
    Provision(ctx context.Context, req ProvisionRequest) (*ProvisionResult, error)

    // Verify checks if credentials are still valid
    Verify(ctx context.Context, projectID uuid.UUID, creds map[string]string) error

    // Deprovision cleans up (optional, for account removal)
    Deprovision(ctx context.Context, projectID uuid.UUID) error
}

// ProvisionRequest identifies the project and environment to provision for.
type ProvisionRequest struct {
    ProjectID   uuid.UUID
    ProjectName string
    Environment string  // "staging", "production"
}

// ProvisionResult separates secrets from plain configuration.
type ProvisionResult struct {
    Credentials map[string]string  // Encrypted before storage
    Config      map[string]string  // Non-sensitive config
}

4. Service Addition API

POST /projects/{projectId}/services
{
  "serviceType": "logging",
  "provider": "loki"       // Optional, uses platform default
}

Response:
{
  "serviceId": "svc_abc123",
  "status": "provisioning",
  "integrationMethod": "pr",  // or "direct"
  "prUrl": null  // Populated when PR is created
}

GET /projects/{projectId}/services/{serviceId}
{
  "serviceId": "svc_abc123",
  "serviceType": "logging",
  "provider": "loki",
  "status": "active",
  "integrationStatus": "integrated",
  "integrationCommit": "abc123...",
  "credentials": {
    "LOKI_URL": "[redacted]",
    "LOKI_TENANT_ID": "project-xyz"
  }
}
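
Since `provider` is optional in the request, the handler needs a resolution rule. The sketch below assumes the platform default is the first provider listed for a service in the catalog; that rule, and the in-code `catalog` map standing in for catalog.yaml, are illustrative assumptions.

```go
package main

import "fmt"

// catalog mirrors the service → providers structure of catalog.yaml.
var catalog = map[string][]string{
	"logging": {"loki"},
	"email":   {"resend"},
	"stats":   {"posthog"},
	"auth":    {"clerk"},
}

// resolveProvider returns the requested provider, or the first catalog entry
// when the request leaves it empty (assumed meaning of "platform default").
func resolveProvider(serviceType, provider string) (string, error) {
	providers, ok := catalog[serviceType]
	if !ok || len(providers) == 0 {
		return "", fmt.Errorf("unknown service type %q", serviceType)
	}
	if provider == "" {
		return providers[0], nil
	}
	for _, p := range providers {
		if p == provider {
			return p, nil
		}
	}
	return "", fmt.Errorf("provider %q is not offered for %q", provider, serviceType)
}

func main() {
	fmt.Println(resolveProvider("logging", ""))
	fmt.Println(resolveProvider("email", "loki"))
}
```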

5. Integration Flow

POST /projects/{id}/services {type: "logging", provider: "loki"}
                    │
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│  1. PROVISION                                                   │
│                                                                 │
│  LokiProvisioner.Provision()                                    │
│  → Create tenant in Loki (or use shared with project prefix)   │
│  → Generate credentials                                         │
│  → Store encrypted in project_services                          │
└─────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│  2. INJECT                                                      │
│                                                                 │
│  K8sSecretInjector.Inject()                                     │
│  → Add LOKI_URL, LOKI_TENANT_ID to project's K8s secret        │
│  → Trigger deployment restart to pick up new env vars          │
└─────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│  3. INTEGRATE                                                   │
│                                                                 │
│  IntegrationService.CreatePR() or .DirectCommit()               │
│  → Clone project repo                                           │
│  → Apply integration templates:                                 │
│    • Update logger config to ship to Loki                       │
│    • Add env vars to .env.example                               │
│    • Update deployment to mount secrets                         │
│  → Create PR (or direct commit for new projects)                │
└─────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│  4. VERIFY                                                      │
│                                                                 │
│  After PR merge / deploy:                                       │
│  → Check logs appearing in Loki                                 │
│  → Update integration_status to "integrated"                    │
└─────────────────────────────────────────────────────────────────┘

Complexity: High

  • Service catalog is straightforward (YAML/DB)
  • Each provisioner is unique (Loki vs Resend vs PostHog)
  • Credential encryption and management needs care
  • Integration templates need to handle Go + Node + various frameworks
  • PR creation requires git operations

Starting Point: Logging with Loki

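// internal/adapter/loki/provisioner.go
package loki

import (
    "context"
    "fmt"
)

// ProvisionRequest and ProvisionResult are the types declared in
// internal/port/platform_provisioner.go; qualify them with that package
// when wiring this adapter into the repository.

type LokiProvisioner struct {
    lokiURL    string
    adminToken string  // For tenant creation if using multi-tenant Loki
}

// Provision derives a per-project tenant ID; no call to Loki is needed.
func (p *LokiProvisioner) Provision(ctx context.Context, req ProvisionRequest) (*ProvisionResult, error) {
    // For single-tenant Loki, just create a unique label prefix
    tenantID := fmt.Sprintf("project-%s", req.ProjectID)

    return &ProvisionResult{
        Credentials: map[string]string{
            "LOKI_URL":       p.lokiURL,
            "LOKI_TENANT_ID": tenantID,
        },
        Config: map[string]string{
            "service_name": req.ProjectName,
        },
    }, nil
}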

Gap 8: Dual Environment Support

Current: Single deployment per project. Main branch = production.

Required: Staging + Production environments. Build deploys to staging, "Publish" promotes to production.

The Environment Model

┌─────────────────────────────────────────────────────────────────┐
│  Project: cool-project                                          │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  STAGING                                                 │   │
│  │  staging.cool-project.threesix.ai                       │   │
│  │                                                          │   │
│  │  • Where development happens                             │   │
│  │  • Preview pane shows this                               │   │
│  │  • "Build It" deploys here                               │   │
│  │  • May use test credentials for services                 │   │
│  └─────────────────────────────────────────────────────────┘   │
│                         │                                       │
│                    [Publish]                                    │
│                         │                                       │
│                         ▼                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  PRODUCTION                                              │   │
│  │  cool-project.threesix.ai                               │   │
│  │                                                          │   │
│  │  • User-facing, stable                                   │   │
│  │  • Only updated via explicit "Publish"                   │   │
│  │  • Production credentials for services                   │   │
│  │  • Enabled after first publish                           │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Implementation Required

1. DNS Changes

// On project creation, create both records (prod may be placeholder)
CreateDNSRecord("staging.cool-project.threesix.ai", stagingIP)
CreateDNSRecord("cool-project.threesix.ai", prodIP)  // Or placeholder until first publish

2. K8s Deployment Model

# Option A: Two deployments in same namespace
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cool-project-staging
  namespace: cool-project
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cool-project-production
  namespace: cool-project

# Option B: Two namespaces (cleaner isolation)
# cool-project-staging namespace
# cool-project-production namespace

Recommendation: Same namespace, two deployments. Simpler to manage, secrets can be shared or scoped.

3. Database Model

Two options:

A. Same database, table-name prefixes:

-- Staging tables
staging_users, staging_posts, staging_...

-- Production tables
prod_users, prod_posts, prod_...

B. Separate databases (cleaner):

cool-project-staging (CockroachDB database)
cool-project-production (CockroachDB database)

Recommendation: Separate databases. Cleaner isolation, no risk of cross-env data access.

4. Project Schema Updates

ALTER TABLE projects ADD COLUMN environments JSONB NOT NULL DEFAULT '{
  "staging": {"enabled": true, "deployed_at": null},
  "production": {"enabled": false, "deployed_at": null, "published_at": null}
}';

5. Publish API

POST /projects/{projectId}/publish
{
  "fromEnvironment": "staging",  // Usually staging
  "toEnvironment": "production"
}

Response:
{
  "operationId": "op_xyz789",
  "status": "publishing",
  "streamUrl": "/operations/op_xyz789/stream"
}

Publish Flow:

  1. Validate staging is healthy
  2. Provision production credentials for any services (if not exist)
  3. Run migrations on production database
  4. Deploy staging image to production deployment
  5. Health check production
  6. Update DNS if needed
  7. Update project.environments.production

Complexity: Medium

  • DNS: Already have CloudflareAdapter, just create two records
  • K8s: Straightforward deployment duplication
  • Database: CockroachDB adapter supports multiple databases
  • Main complexity is the publish flow coordination

Defer Until After Gap 7

Dual environments build on platform services, so Gap 7 (services) comes first:

  • Services provision for a single environment initially
  • Then extend to environment-aware provisioning
  • Then add the publish flow that syncs services to production

Summary: Work Required

| Gap | Effort | Dependencies | Critical Path |
|-----|--------|--------------|---------------|
| 0. Design References | 2-3 days | Gap 1 (storage) | Yes (for design flows) |
| 1. Blueprint Storage | 2-3 days | None | Yes |
| 2. Architect Agent | 3-5 days | Gap 1 | Yes |
| 3. Operation Tracking | 4-6 days | None | Yes |
| 4. Progress Streaming | 2-3 days | Gap 3 | Yes |
| 5. Blueprint → SDLC | 1-2 days | Gap 1 | Yes |
| 6. Frontend | 5-7 days | Gaps 1-5 | Yes |
| 7. Platform Services | 5-8 days | None (can start now) | Parallel track |
| 8. Dual Environments | 3-5 days | Gap 7 | After services work |

Total Estimate: 4-5 weeks of focused work (Gaps 7-8 can run in parallel with Gaps 0-6)
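
As a rough cross-check, the effort ranges and dependencies for Gaps 0-6 can be summed two ways: strictly serial, and along the longest dependency chain assuming unlimited parallelism. This is an illustrative calculation over the table's numbers, not a schedule; real staffing falls somewhere between the two.

```go
package main

import "fmt"

// gap holds one row of the summary table; deps are gap numbers.
type gap struct {
	name      string
	low, high int // effort range in days
	deps      []int
}

// Gaps 0-6, indexed by gap number.
var gaps = []gap{
	{"Design References", 2, 3, []int{1}},
	{"Blueprint Storage", 2, 3, nil},
	{"Architect Agent", 3, 5, []int{1}},
	{"Operation Tracking", 4, 6, nil},
	{"Progress Streaming", 2, 3, []int{3}},
	{"Blueprint → SDLC", 1, 2, []int{1}},
	{"Frontend", 5, 7, []int{1, 2, 3, 4, 5}},
}

func low(g gap) int  { return g.low }
func high(g gap) int { return g.high }

// serial sums every gap, as if one person did them back to back.
func serial(pick func(gap) int) int {
	total := 0
	for _, g := range gaps {
		total += pick(g)
	}
	return total
}

// finish returns the earliest day gap i can be done if every gap starts
// as soon as its dependencies finish.
func finish(i int, pick func(gap) int) int {
	start := 0
	for _, d := range gaps[i].deps {
		if f := finish(d, pick); f > start {
			start = f
		}
	}
	return start + pick(gaps[i])
}

func main() {
	fmt.Printf("serial: %d-%d days\n", serial(low), serial(high))
	fmt.Printf("longest chain to Frontend: %d-%d days\n", finish(6, low), finish(6, high))
}
```

With the table's numbers this gives 19-29 days serial and 11-16 days along the longest chain (Operation Tracking → Progress Streaming → Frontend).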

Service Rollout (within Gap 7):

  1. Logging (Loki) - 2 days
  2. Email (Resend) - 2 days
  3. Stats (PostHog) - 2 days
  4. Auth (Clerk) - 3 days

Note: Gap 0 (Design References) can be implemented in parallel with Gap 2 (Architect Agent) since both involve Architect prompt engineering. The reference capture infrastructure (Gap 0) builds on Gap 1's storage layer.

Critical Path

                    ┌──► Gap 0 (References) ──┐
                    │                         │
Gap 1 (Blueprint) ──┼──► Gap 2 (Architect) ───┼──► Gap 5 (Conversion)
                    │                         │
                    │                         └──► Gap 6 (Frontend)
                    │                              ▲
Gap 3 (Operations) ─┴──► Gap 4 (Streaming) ────────┘


Parallel Track:

Gap 7 (Services) ──► Logging ──► Email ──► Stats ──► Auth
        │
        └──► Gap 8 (Environments) ──► Publish Flow

Gap 7 can start immediately and run parallel to the Studio work. Gap 8 depends on Gap 7 for service credential handling per environment.


Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Architect outputs malformed JSON | High | Medium | JSON schema validation, retry logic |
| SSE connections drop | Medium | Low | Client-side reconnection, event replay from DB |
| Blueprint schema too restrictive | Medium | Medium | Start minimal, add sections iteratively |
| LLM latency affects chat UX | Low | High | Stream partial responses, show typing indicator |
| Build failures leave broken state | Low | Medium | SDLC already handles partial state |

What's NOT a Gap

These are already solved by the current rdev foundation:

  • Project provisioning - K8s, DNS, git all work
  • Template seeding - Composable monorepo templates
  • SDLC execution - Classifier + worker + artifact tracking
  • CI/CD - Woodpecker integration
  • Database provisioning - CockroachDB adapter
  • Webhooks - Event dispatcher with retry

The foundation is solid. The gaps are about exposing existing capabilities through a conversational UI, not rebuilding core functionality.