# Orchard Studio: Gap Analysis

This document maps the delta between current `rdev` capabilities and what Orchard Studio requires.

## Current Foundation (What We Have)

| Capability | Status | Location |
|------------|--------|----------|
| SDLC Classifier | ✅ Complete | `internal/sdlc/classifier.go` |
| Feature State Machine | ✅ Complete | `internal/sdlc/` (10 phases, 31 rules) |
| Composable Templates | ✅ Complete | `internal/adapter/templates/` |
| Worker Pod Execution | ✅ Complete | `internal/worker/sdlc_executor.go` |
| Webhook Dispatcher | ✅ Complete | `internal/webhook/dispatcher.go` |
| Project Provisioning | ✅ Complete | K8s namespace, DNS, git repo |
| Database Provisioning | ✅ Complete | CockroachDB adapter |
| Tree Workflows | ✅ Proven | `cookbooks/trees/*.yaml` |

---

## Gap 0: Design Reference Capture & Processing

**Current:** No mechanism for users to provide visual inspiration. Features are described purely in text.

**Required:** Users can provide URLs or screenshots as design references, which inform the Architect's questions and the Blueprint's design system section.

### What's Missing

```
┌─────────────────────────────────────────────────────────────────────────┐
│  CURRENT FLOW                                                            │
│                                                                          │
│  User: "Build a pricing page"                                            │
│  Architect: *asks about data model, endpoints...*                        │
│  (No visual context, design decisions are guesswork)                     │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│  REQUIRED FLOW                                                           │
│                                                                          │
│  User: "Build a pricing page like this" + [URL or screenshot]            │
│  System: Captures screenshot, stores with Blueprint                      │
│  Architect: "I see a dark theme with 3 tiers..." → asks clarifying Qs   │
│  Blueprint: Populates designSystem section with extracted tokens         │
└─────────────────────────────────────────────────────────────────────────┘
```

### Two Input Types

| Input | Capture Method | Storage |
|-------|----------------|---------|
| **URL** | Playwright screenshots the page automatically | `/references/{blueprintId}/{refId}.png` |
| **Screenshot** | User uploads image (drag/drop, paste, file picker) | Same storage path |

### Implementation Required

1. **Reference Capture Service:**
   - For URLs: Reuse `verify_executor.go` pattern (Playwright pod)
   - For uploads: Standard file upload handling
   - Store thumbnails alongside Blueprint

2. **Chat Endpoint Enhancement:**
   - Accept `references[]` array in request body
   - Process references before LLM call
   - Include reference images in Architect prompt context

3. **Architect Prompt Updates:**
   - Describe what it observes in natural language
   - Ask clarifying questions about design intent
   - Extract structured design tokens into Blueprint

4. **Blueprint Schema:**
   - Add `references.items[]` array
   - Add `sections.designSystem` section
   - Track which references informed which design decisions

5. **Plan Pane Rendering:**
   - Show reference thumbnails in UI
   - Display extracted design tokens
   - Allow user to add annotations
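
The two input types and the shared storage path can be sketched in Go as follows. This is a minimal sketch, not the project's implementation: `Reference`, `Classify`, and `StoragePath` are hypothetical names, and it assumes each reference carries either a URL or an uploaded image (never both) and that IDs are generated server-side.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"path"
)

// Reference is a hypothetical shape for one entry in the request's references[] array.
type Reference struct {
	ID       string // refId, assumed to be generated server-side
	URL      string // set when the user pasted a link
	Uploaded bool   // set when the user supplied an image directly
}

// CaptureMethod names how the reference image is obtained.
type CaptureMethod string

const (
	CapturePlaywright CaptureMethod = "playwright" // screenshot the URL in a pod
	CaptureUpload     CaptureMethod = "upload"     // image already supplied by the user
)

// Classify decides which capture path a reference takes.
func Classify(ref Reference) (CaptureMethod, error) {
	switch {
	case ref.Uploaded && ref.URL == "":
		return CaptureUpload, nil
	case !ref.Uploaded && ref.URL != "":
		u, err := url.Parse(ref.URL)
		if err != nil {
			return "", fmt.Errorf("parse reference URL: %w", err)
		}
		if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			return "", errors.New("reference URL must be an absolute http(s) URL")
		}
		return CapturePlaywright, nil
	default:
		return "", errors.New("reference must be exactly one of URL or upload")
	}
}

// StoragePath builds the location both input types share:
// /references/{blueprintId}/{refId}.png
func StoragePath(blueprintID, refID string) string {
	return path.Join("/references", blueprintID, refID+".png")
}

func main() {
	ref := Reference{ID: "ref_1", URL: "https://example.com/pricing"}
	method, err := Classify(ref)
	fmt.Println(method, err, StoragePath("bp_42", ref.ID))
}
```

Rejecting non-http(s) URLs before handing them to the Playwright pod keeps the capture service from being pointed at arbitrary schemes.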

### Complexity: Medium

- URL capture reuses existing Playwright infrastructure
- File upload is standard pattern
- Main work is Architect prompt engineering for visual understanding
- LLM vision capabilities needed (Claude can see images natively)

---

## Gap 1: Blueprint Storage & Chat API

**Current:** Features are created via `POST /sdlc/features` with a complete spec. No iterative refinement.

**Required:** Multi-turn conversation that builds a Blueprint incrementally.

### What's Missing

```
┌─────────────────────────────────────────────────────────────────┐
│  CURRENT FLOW                                                   │
│                                                                 │
│  User writes spec → POST /sdlc/features → Feature created       │
│  (one shot, no iteration)                                       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED FLOW                                                  │
│                                                                 │
│  User message → Architect responds + updates Blueprint →        │
│  User message → Architect responds + updates Blueprint →        │
│  ...repeat until ready...                                       │
│  User: "build it" → Blueprint → SDLC Feature → Build            │
└─────────────────────────────────────────────────────────────────┘
```

### Implementation Required

1. **Database Tables:**
   - `blueprints` - stores structured Blueprint JSON
   - `blueprint_messages` - conversation history with snapshots

2. **API Endpoints:**
   - `POST /projects/{id}/blueprint/chat` - send message, get reply + updated blueprint
   - `GET /projects/{id}/blueprints` - list blueprints
   - `GET /projects/{id}/blueprints/{blueprintId}` - get specific blueprint
   - `DELETE /projects/{id}/blueprints/{blueprintId}` - discard draft

3. **Service Layer:**
   - `ArchitectService` - manages conversation, calls LLM, updates Blueprint

### Complexity: Medium
- Schema is defined (see app-vision.md)
- Standard CRUD + LLM integration
- Most work is in prompt engineering for Architect
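
One chat turn of the service layer might look like the sketch below. Everything here is an assumption made for illustration: the `LLM` interface, the trimmed `Blueprint` fields, and the in-memory state stand in for the real schema in app-vision.md and the `blueprints` / `blueprint_messages` tables.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Blueprint is a placeholder; the real schema is defined in app-vision.md.
type Blueprint struct {
	Feature string
	Summary string
}

// Message mirrors a blueprint_messages row, including the snapshot taken at that turn.
type Message struct {
	Role     string // "user" or "architect"
	Content  string
	Snapshot Blueprint
}

// LLM is a hypothetical seam around the model call.
type LLM interface {
	Complete(ctx context.Context, history []Message, current Blueprint) (string, Blueprint, error)
}

// ArchitectService keeps state in memory here; a real one would persist it.
type ArchitectService struct {
	llm       LLM
	blueprint Blueprint
	history   []Message
}

// Chat runs one turn: record the user message, call the model, store the result.
func (s *ArchitectService) Chat(ctx context.Context, userMsg string) (string, Blueprint, error) {
	if userMsg == "" {
		return "", s.blueprint, errors.New("empty message")
	}
	pending := append(append([]Message(nil), s.history...),
		Message{Role: "user", Content: userMsg, Snapshot: s.blueprint})
	reply, updated, err := s.llm.Complete(ctx, pending, s.blueprint)
	if err != nil {
		// Stored state is untouched, so a failed turn can simply be retried.
		return "", s.blueprint, fmt.Errorf("architect turn: %w", err)
	}
	s.blueprint = updated
	s.history = append(pending, Message{Role: "architect", Content: reply, Snapshot: updated})
	return reply, updated, nil
}

// echoLLM is a stand-in model that copies the latest user message into the summary.
type echoLLM struct{}

func (echoLLM) Complete(_ context.Context, h []Message, cur Blueprint) (string, Blueprint, error) {
	cur.Summary = h[len(h)-1].Content
	return "Noted. Anything else?", cur, nil
}

// demoConversation plays two turns against the stand-in model.
func demoConversation() (*ArchitectService, error) {
	svc := &ArchitectService{llm: echoLLM{}}
	for _, msg := range []string{"Add cat photos", "Make them public"} {
		if _, _, err := svc.Chat(context.Background(), msg); err != nil {
			return nil, err
		}
	}
	return svc, nil
}

func main() {
	svc, err := demoConversation()
	fmt.Println(svc.blueprint.Summary, len(svc.history), err)
}
```

Committing the user message and the reply together means a failed model call leaves no half-recorded turn in the history.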

---

## Gap 2: Architect Agent Persona

**Current:** We have coding agents (`/implement-feature`). They write code, not specs.

**Required:** An agent that asks questions, fills in a structured Blueprint, and knows when to stop.

### What's Missing

```
┌─────────────────────────────────────────────────────────────────┐
│  CURRENT AGENTS                                                 │
│                                                                 │
│  User: "Add cat photos"                                         │
│  Agent: *immediately writes code*                               │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  ARCHITECT AGENT                                                │
│                                                                 │
│  User: "Add cat photos"                                         │
│  Architect: "Should photos be public or friends-only?"          │
│  User: "Public"                                                 │
│  Architect: "Got it. Do you want likes, comments, or neither?"  │
│  ...continues until Blueprint is complete...                    │
└─────────────────────────────────────────────────────────────────┘
```

### Implementation Required

1. **System Prompt:**
   - `.claude/agents/architect.md` - detailed persona
   - Structured output format (reply + Blueprint JSON)
   - Question strategy (when to ask vs assume)

2. **Structured Output Parsing:**
   - LLM returns `{reply: string, blueprint: Blueprint}`
   - Validate Blueprint against schema
   - Handle partial updates (delta vs full replacement)

3. **Completeness Logic:**
   - `isReadyToBuild(blueprint)` function
   - Clear rules for when questions are resolved
   - Override mechanism for user to force build

### Complexity: Medium-High
- Prompt engineering is iterative
- Structured output from LLMs can be fragile
- Need fallback handling for malformed responses
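
The parsing and completeness pieces can be sketched together. This is an illustration under assumptions: the `openQuestions` field and the readiness rules are hypothetical (the real Blueprint schema lives in app-vision.md), and the "first `{` to last `}`" extraction is a simple heuristic for models that wrap JSON in prose, not a full recovery strategy.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Blueprint is trimmed to the fields this sketch needs; OpenQuestions is hypothetical.
type Blueprint struct {
	Feature       string   `json:"feature"`
	Summary       string   `json:"summary"`
	OpenQuestions []string `json:"openQuestions"`
}

// ArchitectTurn is the {reply, blueprint} pair the model is asked to return.
type ArchitectTurn struct {
	Reply     string    `json:"reply"`
	Blueprint Blueprint `json:"blueprint"`
}

// ParseTurn tolerates prose around the JSON object and rejects unusable output.
func ParseTurn(raw string) (ArchitectTurn, error) {
	var turn ArchitectTurn
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return turn, errors.New("no JSON object in model output")
	}
	if err := json.Unmarshal([]byte(raw[start:end+1]), &turn); err != nil {
		return turn, fmt.Errorf("decode architect turn: %w", err)
	}
	if strings.TrimSpace(turn.Reply) == "" {
		return turn, errors.New("architect turn has an empty reply")
	}
	return turn, nil
}

// IsReadyToBuild applies assumed rules: the basics must be filled in, and either
// no questions remain or the user has forced the build.
func IsReadyToBuild(bp Blueprint, force bool) bool {
	if bp.Feature == "" || bp.Summary == "" {
		return false
	}
	return force || len(bp.OpenQuestions) == 0
}

func main() {
	raw := "Here is the update:\n" +
		`{"reply":"Public or friends-only?","blueprint":{"feature":"cat-photos","summary":"Photo feed","openQuestions":["visibility"]}}`
	turn, err := ParseTurn(raw)
	fmt.Println(turn.Reply, err, IsReadyToBuild(turn.Blueprint, false))
}
```

A caller would typically retry the model call when `ParseTurn` returns an error and keep the previous Blueprint until a turn parses cleanly.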

---

## Gap 3: Operation Tracking (Tree Runner in DB)

**Current:** Tree workflows run via shell script (`tree-runner.sh`). State in local JSON files.

**Required:** Operations tracked in database, queryable via API, streamable to UI.

### What's Missing

```
┌─────────────────────────────────────────────────────────────────┐
│  CURRENT                                                        │
│                                                                 │
│  ./tree-runner.sh slackpath-1.yaml                              │
│  → Runs in terminal                                             │
│  → State in .checkpoints/slackpath-1.json                       │
│  → No API visibility                                            │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED                                                       │
│                                                                 │
│  POST /operations/start {tree: "slackpath-1"}                   │
│  → Returns operation_id                                         │
│  → State in operations table                                    │
│  → GET /operations/{id}/stream returns SSE events               │
└─────────────────────────────────────────────────────────────────┘
```

### Implementation Required

1. **Database Tables:**
   - `operations` - tracks running/completed operations
   - `operation_events` - event log for replay/streaming

2. **Service Layer:**
   - `OrchestratorService` - manages operation lifecycle
   - Port tree-runner logic from bash to Go
   - Event emission during execution

3. **API Endpoints:**
   - `POST /projects/{id}/operations` - start operation
   - `GET /projects/{id}/operations/{operationId}` - get status
   - `GET /projects/{id}/operations/{operationId}/stream` - SSE stream

4. **Worker Integration:**
   - SDLC executor emits events as it progresses
   - Events written to `operation_events` table
   - SSE handler reads from table and streams

### Complexity: High
- Tree runner logic is non-trivial (dependencies, outputs, error handling)
- SSE streaming requires careful connection management
- Need to handle operation cancellation, resumption
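
The dependency handling is the part of the tree runner most worth sketching before porting. The example below orders steps with Kahn's algorithm. The `Step` shape is hypothetical, since the tree YAML format is not shown in this document, and outputs, retries, and resumption are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Step is a hypothetical tree node with named dependencies.
type Step struct {
	Name      string
	DependsOn []string
}

// ExecutionOrder returns step names in an order that respects DependsOn,
// or an error for duplicates, unknown dependencies, or cycles.
func ExecutionOrder(steps []Step) ([]string, error) {
	indegree := make(map[string]int, len(steps))
	dependents := make(map[string][]string)
	for _, s := range steps {
		if _, dup := indegree[s.Name]; dup {
			return nil, fmt.Errorf("duplicate step %q", s.Name)
		}
		indegree[s.Name] = 0
	}
	for _, s := range steps {
		for _, dep := range s.DependsOn {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("step %q depends on unknown step %q", s.Name, dep)
			}
			indegree[s.Name]++
			dependents[dep] = append(dependents[dep], s.Name)
		}
	}
	var ready []string
	for name, n := range indegree {
		if n == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready) // deterministic order among independent steps
	order := make([]string, 0, len(steps))
	for len(ready) > 0 {
		next := ready[0]
		ready = ready[1:]
		order = append(order, next)
		for _, d := range dependents[next] {
			indegree[d]--
			if indegree[d] == 0 {
				ready = append(ready, d)
			}
		}
		sort.Strings(ready)
	}
	if len(order) != len(steps) {
		return nil, fmt.Errorf("dependency cycle among %d steps", len(steps)-len(order))
	}
	return order, nil
}

func main() {
	order, err := ExecutionOrder([]Step{
		{Name: "tests", DependsOn: []string{"handlers"}},
		{Name: "schema"},
		{Name: "handlers", DependsOn: []string{"schema"}},
	})
	fmt.Println(order, err)
}
```

Detecting cycles and unknown dependencies up front lets `POST` reject a bad tree before any operation row is written.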

---

## Gap 4: Real-Time Progress Streaming

**Current:** Webhooks fire on build complete. No per-step visibility.

**Required:** SSE stream showing "Designing schema... Writing handlers... Running tests..."

### What's Missing

```
┌─────────────────────────────────────────────────────────────────┐
│  CURRENT                                                        │
│                                                                 │
│  Build starts → ... silence ... → Webhook: "build complete"    │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED                                                       │
│                                                                 │
│  Build starts →                                                 │
│    event: {"phase": "spec", "status": "complete"}               │
│    event: {"phase": "design", "status": "in_progress"}          │
│    event: {"phase": "design", "status": "complete"}             │
│    event: {"phase": "implement", "progress": 0.5}               │
│    ...                                                          │
│    event: {"status": "complete", "url": "..."}                  │
└─────────────────────────────────────────────────────────────────┘
```

### Implementation Required

1. **SDLC Executor Changes:**
   - Emit events at phase transitions
   - Emit progress within phases (task completion)
   - Write events to `operation_events` table

2. **SSE Handler:**
   - `GET /operations/{id}/stream`
   - Long-lived connection
   - Read events from DB (or Redis pub/sub)
   - Handle client disconnection gracefully

3. **Event Types:**
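   ```go
   import "time"

   // OperationEvent is one entry in the operation_events log.
   // The JSON tags match the lowercase keys used in the event examples above.
   type OperationEvent struct {
       Type      string    `json:"type"`               // "phase", "progress", "artifact", "error", "complete"
       Phase     string    `json:"phase,omitempty"`    // "spec", "design", "implement", "test", "deploy"
       Status    string    `json:"status,omitempty"`   // "in_progress", "complete", "failed"
       Message   string    `json:"message,omitempty"`  // Human-readable
       Progress  float64   `json:"progress,omitempty"` // 0.0 to 1.0 for granular progress
       Timestamp time.Time `json:"timestamp"`
   }
   ```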

### Complexity: Medium
- SSE is straightforward in Go
- Main work is instrumenting SDLC executor
- Need to balance granularity vs noise
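
A minimal SSE handler using only the standard library might look like this. It is a sketch under assumptions: events arrive on an in-process channel (the real source would be the `operation_events` table or pub/sub), the JSON field names are illustrative, and the frame `id` is a simple counter where a real handler would likely use the event row ID so clients can resume with `Last-Event-ID`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// OperationEvent follows the struct above, with assumed JSON field names.
type OperationEvent struct {
	Type      string    `json:"type"`
	Phase     string    `json:"phase,omitempty"`
	Status    string    `json:"status,omitempty"`
	Message   string    `json:"message,omitempty"`
	Progress  float64   `json:"progress,omitempty"`
	Timestamp time.Time `json:"timestamp"`
}

// FormatSSE renders one event as a server-sent-events frame.
func FormatSSE(id int64, ev OperationEvent) (string, error) {
	payload, err := json.Marshal(ev)
	if err != nil {
		return "", fmt.Errorf("encode event: %w", err)
	}
	return fmt.Sprintf("id: %d\nevent: %s\ndata: %s\n\n", id, ev.Type, payload), nil
}

// StreamHandler writes events until the channel closes or the client disconnects.
func StreamHandler(events <-chan OperationEvent) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		var id int64
		for {
			select {
			case <-r.Context().Done():
				return // client went away
			case ev, open := <-events:
				if !open {
					return // operation finished
				}
				id++
				frame, err := FormatSSE(id, ev)
				if err != nil {
					return
				}
				if _, err := io.WriteString(w, frame); err != nil {
					return
				}
				flusher.Flush() // push the frame out immediately
			}
		}
	}
}

// demoStream runs the handler against an in-memory recorder and returns the body.
func demoStream() string {
	ts := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	events := make(chan OperationEvent, 2)
	events <- OperationEvent{Type: "phase", Phase: "design", Status: "in_progress", Timestamp: ts}
	events <- OperationEvent{Type: "progress", Phase: "implement", Progress: 0.5, Timestamp: ts}
	close(events)
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/operations/op_1/stream", nil)
	StreamHandler(events)(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(demoStream())
}
```

Selecting on `r.Context().Done()` is what handles client disconnection: the handler returns as soon as the connection closes instead of blocking on the next event.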

---

## Gap 5: Blueprint → SDLC Feature Conversion

**Current:** SDLC features are created manually with spec documents.

**Required:** Automated conversion from structured Blueprint to SDLC feature spec.

### What's Missing

```
┌─────────────────────────────────────────────────────────────────┐
│  CURRENT                                                        │
│                                                                 │
│  Human writes: spec.md with prose description                   │
│  → POST /sdlc/features                                          │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED                                                       │
│                                                                 │
│  Blueprint JSON → Template rendering → spec.md                  │
│  → Automated POST /sdlc/features                                │
└─────────────────────────────────────────────────────────────────┘
```

### Implementation Required

1. **Spec Template:**
   ```markdown
   # Feature: {{.Feature}}

   ## Summary
   {{.Summary}}

   ## Data Model
   {{range .Sections.DataModel.Entities}}
   ### {{.Name}}
   | Field | Type |
   |-------|------|
   {{range .Fields}}| {{.Name}} | {{.Type}} |
   {{end}}
   {{end}}

   ## API Endpoints
   {{range .Sections.APIEndpoints.Endpoints}}
   - `{{.Method}} {{.Path}}` - {{.Description}}
   {{end}}

   ## UI Components
   {{range .Sections.UIComponents.Components}}
   - **{{.Name}}**: {{.Purpose}}
   {{end}}

   ## Assumptions
   {{range .Assumptions}}
   - {{.Assumption}}
   {{end}}
   ```

2. **Conversion Service:**
   - Takes Blueprint, renders spec.md
   - Creates SDLC feature via existing API
   - Links Blueprint to created feature (`built_feature_slug`)

### Complexity: Low
- Template rendering is straightforward
- SDLC feature creation already exists
- Main work is template design
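
Rendering can be done with Go's `text/template`. The sketch below uses a trimmed version of the spec template (summary, data model, assumptions) and hypothetical Go types that mirror the field names the template references. The real Blueprint types may differ.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Types below mirror the names used in the template; they are assumptions.
type Field struct{ Name, Type string }

type Entity struct {
	Name   string
	Fields []Field
}

type Assumption struct{ Assumption string }

type Blueprint struct {
	Feature  string
	Summary  string
	Sections struct {
		DataModel struct{ Entities []Entity }
	}
	Assumptions []Assumption
}

const specTemplate = `# Feature: {{.Feature}}

## Summary
{{.Summary}}

## Data Model
{{range .Sections.DataModel.Entities}}
### {{.Name}}
| Field | Type |
|-------|------|
{{range .Fields}}| {{.Name}} | {{.Type}} |
{{end}}{{end}}
## Assumptions
{{range .Assumptions}}- {{.Assumption}}
{{end}}`

// Parsing once at startup surfaces template syntax errors early.
var specTmpl = template.Must(template.New("spec").Parse(specTemplate))

// RenderSpec turns a Blueprint into the spec.md body.
func RenderSpec(bp Blueprint) (string, error) {
	var out strings.Builder
	if err := specTmpl.Execute(&out, bp); err != nil {
		return "", fmt.Errorf("render spec: %w", err)
	}
	return out.String(), nil
}

// demoBlueprint builds a small example input.
func demoBlueprint() Blueprint {
	bp := Blueprint{
		Feature:     "cat-photos",
		Summary:     "Share cat photos",
		Assumptions: []Assumption{{Assumption: "Photos are public"}},
	}
	bp.Sections.DataModel.Entities = []Entity{
		{Name: "Photo", Fields: []Field{{Name: "id", Type: "UUID"}, {Name: "url", Type: "TEXT"}}},
	}
	return bp
}

func main() {
	spec, err := RenderSpec(demoBlueprint())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(spec)
}
```

`text/template` is used rather than `html/template` because the output is Markdown, so HTML escaping would corrupt characters such as `<` in field types.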

---

## Gap 6: Frontend (Next.js Studio)

**Current:** No frontend. All interaction via API/CLI.

**Required:** Three-pane interface (Chat, Plan, Preview).

### What's Missing

Everything. This is a new application.

### Implementation Required

1. **Project Setup:**
   - Next.js 14 with App Router
   - Tailwind CSS for styling
   - Authentication (integrate with rdev auth)

2. **Core Components:**
   ```
   apps/studio/
   ├── app/
   │   ├── page.tsx              # Template selection
   │   ├── projects/
   │   │   └── [id]/
   │   │       └── page.tsx      # Three-pane workspace
   │   └── api/                  # Proxy to rdev-api
   ├── components/
   │   ├── ChatPane.tsx
   │   ├── PlanPane.tsx
   │   ├── PreviewPane.tsx
   │   ├── ActivityFeed.tsx
   │   └── BuildProgress.tsx
   └── lib/
       ├── api.ts               # rdev-api client
       └── sse.ts               # SSE connection manager
   ```

3. **State Management:**
   - Blueprint state (updated on each chat response)
   - Operation state (updated via SSE)
   - UI state (which pane is focused, etc.)

4. **Key Interactions:**
   - Send chat message → receive reply + blueprint
   - Click "Build It" → start operation → show progress
   - Operation complete → refresh preview iframe

### Complexity: Medium
- Standard Next.js app
- SSE client requires careful handling
- Most complexity is in polish and UX

---

## Gap 7: Platform Service Infrastructure

**Current:** Projects manage their own integrations. No shared services, no credential management.

**Required:** A service catalog with provisioning, credential injection, and upgrade paths for existing projects.

### The "Upgrade" Problem

```
┌─────────────────────────────────────────────────────────────────┐
│  CURRENT                                                        │
│                                                                 │
│  Project created 3 months ago                                   │
│  → No centralized logging                                       │
│  → No analytics                                                 │
│  → Rolling your own email                                       │
│  → No easy way to add platform services                         │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  REQUIRED                                                       │
│                                                                 │
│  POST /projects/{id}/services                                   │
│  { "type": "logging", "provider": "loki" }                      │
│                                                                 │
│  → Provision credentials                                        │
│  → Inject into K8s secrets                                      │
│  → Create integration PR with config changes                    │
│  → Project now ships logs to centralized system                 │
└─────────────────────────────────────────────────────────────────┘
```

### Service Rollout Order

Build the infrastructure with the simplest service first, then add complexity:

| Order | Service | Why This Order |
|-------|---------|----------------|
| 1 | **Logging** | Pure infrastructure, no user-facing code changes |
| 2 | **Email** | Simple API calls, clear success/failure |
| 3 | **Stats** | Frontend SDK + backend events |
| 4 | **Auth** | Most complex (middleware, user model, protected routes) |

### Implementation Required

#### 1. Service Catalog

```yaml
# internal/platform/catalog.yaml
services:
  logging:
    description: "Centralized log aggregation"
    providers:
      loki:
        name: "Grafana Loki"
        credentials:
          - LOKI_URL
          - LOKI_TENANT_ID
        integration:
          go:
            config_template: "loki-logger.go.tmpl"
            env_example: ["LOKI_URL", "LOKI_TENANT_ID"]
          node:
            packages: ["pino", "pino-loki"]
            config_template: "pino-loki.ts.tmpl"

  email:
    description: "Transactional email"
    providers:
      resend:
        name: "Resend"
        credentials:
          - RESEND_API_KEY
        integration:
          go:
            packages: ["github.com/resendlabs/resend-go"]
            service_template: "email-service.go.tmpl"
          node:
            packages: ["resend"]
            service_template: "email-client.ts.tmpl"

  stats:
    description: "Product analytics"
    providers:
      posthog:
        name: "PostHog"
        credentials:
          - POSTHOG_API_KEY
          - POSTHOG_HOST
        integration:
          go:
            packages: ["github.com/posthog/posthog-go"]
          node:
            packages: ["posthog-js", "posthog-node"]
            provider_template: "analytics-provider.tsx.tmpl"

  auth:
    description: "User authentication"
    providers:
      clerk:
        name: "Clerk"
        credentials:
          - CLERK_SECRET_KEY
          - NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY
        integration:
          node:
            packages: ["@clerk/nextjs"]
            middleware_template: "clerk-middleware.ts.tmpl"
            provider_template: "clerk-provider.tsx.tmpl"
```
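
Once loaded, the catalog needs a lookup that answers "what does this service need for this stack?". The sketch below hard-codes two entries from the YAML above instead of parsing the file (YAML parsing needs a third-party library), and the type and function names are hypothetical.

```go
package main

import "fmt"

// Integration lists what a stack needs in order to adopt a provider.
type Integration struct {
	Packages  []string
	Templates []string
}

// Provider is one catalog entry under a service type.
type Provider struct {
	Name         string
	Credentials  []string
	Integrations map[string]Integration // keyed by stack: "go", "node"
}

// catalog holds two of the entries from catalog.yaml, keyed by service type then provider.
var catalog = map[string]map[string]Provider{
	"logging": {
		"loki": {
			Name:        "Grafana Loki",
			Credentials: []string{"LOKI_URL", "LOKI_TENANT_ID"},
			Integrations: map[string]Integration{
				"go":   {Templates: []string{"loki-logger.go.tmpl"}},
				"node": {Packages: []string{"pino", "pino-loki"}, Templates: []string{"pino-loki.ts.tmpl"}},
			},
		},
	},
	"auth": {
		"clerk": {
			Name:        "Clerk",
			Credentials: []string{"CLERK_SECRET_KEY", "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY"},
			Integrations: map[string]Integration{
				"node": {
					Packages:  []string{"@clerk/nextjs"},
					Templates: []string{"clerk-middleware.ts.tmpl", "clerk-provider.tsx.tmpl"},
				},
			},
		},
	},
}

// Resolve finds the provider and the integration recipe for a given stack.
func Resolve(serviceType, provider, stack string) (Provider, Integration, error) {
	providers, ok := catalog[serviceType]
	if !ok {
		return Provider{}, Integration{}, fmt.Errorf("unknown service type %q", serviceType)
	}
	p, ok := providers[provider]
	if !ok {
		return Provider{}, Integration{}, fmt.Errorf("unknown provider %q for %q", provider, serviceType)
	}
	integ, ok := p.Integrations[stack]
	if !ok {
		return p, Integration{}, fmt.Errorf("%s/%s has no %q integration", serviceType, provider, stack)
	}
	return p, integ, nil
}

func main() {
	p, integ, err := Resolve("logging", "loki", "node")
	fmt.Println(p.Name, integ.Packages, err)
	_, _, err = Resolve("auth", "clerk", "go")
	fmt.Println(err)
}
```

The second call fails because the catalog defines only a `node` integration for Clerk, which is the kind of gap the API should report before provisioning anything.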

#### 2. Database Schema

```sql
-- Track which services a project uses
CREATE TABLE project_services (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID NOT NULL REFERENCES projects(id),
    service_type TEXT NOT NULL,      -- 'logging', 'email', 'stats', 'auth'
    provider TEXT NOT NULL,           -- 'loki', 'resend', 'posthog', 'clerk'
    environment TEXT NOT NULL,        -- 'staging', 'production', 'all'

    -- Encrypted credentials
    credentials_encrypted BYTEA,

    -- Non-sensitive config
    config JSONB NOT NULL DEFAULT '{}',

    -- Status tracking
    status TEXT NOT NULL DEFAULT 'provisioning',
    -- provisioning → active → needs_update → deprovisioned

    -- Integration tracking
    integration_status TEXT DEFAULT 'pending',
    -- pending → pr_created → integrated → needs_update
    integration_pr_url TEXT,
    integration_commit TEXT,

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(project_id, service_type, environment)
);
```

#### 3. Provisioner Interface

```go
// internal/port/platform_provisioner.go
type PlatformProvisioner interface {
    // Provision creates credentials for a project
    Provision(ctx context.Context, req ProvisionRequest) (*ProvisionResult, error)

    // Verify checks if credentials are still valid
    Verify(ctx context.Context, projectID uuid.UUID, creds map[string]string) error

    // Deprovision cleans up (optional, for account removal)
    Deprovision(ctx context.Context, projectID uuid.UUID) error
}

type ProvisionRequest struct {
    ProjectID   uuid.UUID
    ProjectName string
    Environment string  // "staging", "production"
}

type ProvisionResult struct {
    Credentials map[string]string  // Encrypted before storage
    Config      map[string]string  // Non-sensitive config
}
```

#### 4. Service Addition API

```
POST /projects/{projectId}/services
{
  "serviceType": "logging",
  "provider": "loki"       // Optional, uses platform default
}

Response:
{
  "serviceId": "svc_abc123",
  "status": "provisioning",
  "integrationMethod": "pr",  // or "direct"
  "prUrl": null  // Populated when PR is created
}

GET /projects/{projectId}/services/{serviceId}
{
  "serviceId": "svc_abc123",
  "serviceType": "logging",
  "provider": "loki",
  "status": "active",
  "integrationStatus": "integrated",
  "integrationCommit": "abc123...",
  "credentials": {
    "LOKI_URL": "[redacted]",
    "LOKI_TENANT_ID": "project-xyz"
  }
}
```

#### 5. Integration Flow

```
POST /projects/{id}/services {type: "logging", provider: "loki"}
                    │
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│  1. PROVISION                                                   │
│                                                                 │
│  LokiProvisioner.Provision()                                    │
│  → Create tenant in Loki (or use shared with project prefix)   │
│  → Generate credentials                                         │
│  → Store encrypted in project_services                          │
└─────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│  2. INJECT                                                      │
│                                                                 │
│  K8sSecretInjector.Inject()                                     │
│  → Add LOKI_URL, LOKI_TENANT_ID to project's K8s secret        │
│  → Trigger deployment restart to pick up new env vars          │
└─────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│  3. INTEGRATE                                                   │
│                                                                 │
│  IntegrationService.CreatePR() or .DirectCommit()               │
│  → Clone project repo                                           │
│  → Apply integration templates:                                 │
│    • Update logger config to ship to Loki                       │
│    • Add env vars to .env.example                               │
│    • Update deployment to mount secrets                         │
│  → Create PR (or direct commit for new projects)                │
└─────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│  4. VERIFY                                                      │
│                                                                 │
│  After PR merge / deploy:                                       │
│  → Check logs appearing in Loki                                 │
│  → Update integration_status to "integrated"                    │
└─────────────────────────────────────────────────────────────────┘
```

### Complexity: High

- Service catalog is straightforward (YAML/DB)
- Each provisioner is unique (Loki vs Resend vs PostHog)
- Credential encryption and management needs care
- Integration templates need to handle Go + Node + various frameworks
- PR creation requires git operations
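
For the credential encryption, one common approach is AES-256-GCM from the standard library, with the result stored in `credentials_encrypted`. The sketch below assumes a 32-byte key supplied from outside (key storage and rotation are not covered) and a `nonce || ciphertext` layout. The function names are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// newGCM builds an AES-256-GCM AEAD from a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, fmt.Errorf("create cipher: %w", err)
	}
	return cipher.NewGCM(block)
}

// EncryptCredentials serializes the map to JSON and seals it.
// Output layout: nonce || ciphertext (the GCM tag is part of the ciphertext).
func EncryptCredentials(key []byte, creds map[string]string) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	plaintext, err := json.Marshal(creds)
	if err != nil {
		return nil, fmt.Errorf("encode credentials: %w", err)
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, fmt.Errorf("generate nonce: %w", err)
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// DecryptCredentials reverses EncryptCredentials and fails on tampering.
func DecryptCredentials(key, sealed []byte) (map[string]string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	plaintext, err := gcm.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		return nil, fmt.Errorf("decrypt credentials: %w", err)
	}
	var creds map[string]string
	if err := json.Unmarshal(plaintext, &creds); err != nil {
		return nil, fmt.Errorf("decode credentials: %w", err)
	}
	return creds, nil
}

func main() {
	key := make([]byte, 32) // demo only: a real key must be random and kept secret
	sealed, err := EncryptCredentials(key, map[string]string{"LOKI_TENANT_ID": "project-xyz"})
	if err != nil {
		fmt.Println("encrypt:", err)
		return
	}
	creds, err := DecryptCredentials(key, sealed)
	fmt.Println(creds, err)
}
```

GCM is authenticated, so a modified `credentials_encrypted` value fails to decrypt instead of yielding corrupted credentials.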

### Starting Point: Logging with Loki

```go
// internal/adapter/loki/provisioner.go
type LokiProvisioner struct {
    lokiURL    string
    adminToken string  // For tenant creation if using multi-tenant Loki
}

func (p *LokiProvisioner) Provision(ctx context.Context, req ProvisionRequest) (*ProvisionResult, error) {
    // For single-tenant Loki, just create a unique label prefix
    tenantID := fmt.Sprintf("project-%s", req.ProjectID)

    return &ProvisionResult{
        Credentials: map[string]string{
            "LOKI_URL":       p.lokiURL,
            "LOKI_TENANT_ID": tenantID,
        },
        Config: map[string]string{
            "service_name": req.ProjectName,
        },
    }, nil
}
```

---

## Gap 8: Dual Environment Support

**Current:** Single deployment per project. Main branch = production.

**Required:** Staging + Production environments. Build deploys to staging, "Publish" promotes to production.

### The Environment Model

```
┌─────────────────────────────────────────────────────────────────┐
│  Project: cool-project                                          │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  STAGING                                                 │   │
│  │  staging.cool-project.threesix.ai                       │   │
│  │                                                          │   │
│  │  • Where development happens                             │   │
│  │  • Preview pane shows this                               │   │
│  │  • "Build It" deploys here                               │   │
│  │  • May use test credentials for services                 │   │
│  └─────────────────────────────────────────────────────────┘   │
│                         │                                       │
│                    [Publish]                                    │
│                         │                                       │
│                         ▼                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  PRODUCTION                                              │   │
│  │  cool-project.threesix.ai                               │   │
│  │                                                          │   │
│  │  • User-facing, stable                                   │   │
│  │  • Only updated via explicit "Publish"                   │   │
│  │  • Production credentials for services                   │   │
│  │  • Enabled after first publish                           │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
```

### Implementation Required

#### 1. DNS Changes

```go
// On project creation, create both records (prod may be placeholder)
CreateDNSRecord("staging.cool-project.threesix.ai", stagingIP)
CreateDNSRecord("cool-project.threesix.ai", prodIP)  // Or placeholder until first publish
```
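
Deriving both hostnames from the project name keeps the naming rule in one place. This is a small sketch: `EnvironmentHostnames` is a hypothetical helper, and the validation assumes project names must be valid lowercase DNS labels.

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsLabel matches a lowercase DNS label of 1 to 63 characters.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)

// Hostnames holds the two per-project records.
type Hostnames struct {
	Staging    string
	Production string
}

// EnvironmentHostnames builds staging.<project>.<zone> and <project>.<zone>.
func EnvironmentHostnames(project, zone string) (Hostnames, error) {
	if !dnsLabel.MatchString(project) {
		return Hostnames{}, fmt.Errorf("project %q is not a valid DNS label", project)
	}
	return Hostnames{
		Staging:    "staging." + project + "." + zone,
		Production: project + "." + zone,
	}, nil
}

func main() {
	h, err := EnvironmentHostnames("cool-project", "threesix.ai")
	fmt.Println(h.Staging, h.Production, err)
}
```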

#### 2. K8s Deployment Model

```yaml
# Option A: Two deployments in same namespace
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cool-project-staging
  namespace: cool-project
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cool-project-production
  namespace: cool-project

# Option B: Two namespaces (cleaner isolation)
# cool-project-staging namespace
# cool-project-production namespace
```

**Recommendation:** Same namespace, two deployments. Simpler to manage, secrets can be shared or scoped.

#### 3. Database Model

Two options:

**A. Same database, table-name prefixes:**
```sql
-- Staging tables
staging_users, staging_posts, staging_...

-- Production tables
prod_users, prod_posts, prod_...
```

**B. Separate databases (cleaner):**
```
cool-project-staging (CockroachDB database)
cool-project-production (CockroachDB database)
```

**Recommendation:** Separate databases. Cleaner isolation, no risk of cross-env data access.

#### 4. Project Schema Updates

```sql
ALTER TABLE projects ADD COLUMN environments JSONB NOT NULL DEFAULT '{
  "staging": {"enabled": true, "deployed_at": null},
  "production": {"enabled": false, "deployed_at": null, "published_at": null}
}';
```

#### 5. Publish API

```
POST /projects/{projectId}/publish
{
  "fromEnvironment": "staging",  // Usually staging
  "toEnvironment": "production"
}

Response:
{
  "operationId": "op_xyz789",
  "status": "publishing",
  "streamUrl": "/operations/{operationId}/stream"
}
```

**Publish Flow:**
1. Validate staging is healthy
2. Provision production credentials for any services (if not exist)
3. Run migrations on production database
4. Deploy staging image to production deployment
5. Health check production
6. Update DNS if needed
7. Update project.environments.production
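
The publish flow is an ordered pipeline that stops at the first failure. A minimal sketch follows. `PublishStep` and `RunPublish` are hypothetical names, the step bodies are stubs, and rollback of already-completed steps is deliberately left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// PublishStep is one stage of the publish flow.
type PublishStep struct {
	Name string
	Run  func(ctx context.Context) error
}

// RunPublish executes steps in order and stops at the first failure.
// It returns the names of the steps that completed, which a caller could
// record as operation events.
func RunPublish(ctx context.Context, steps []PublishStep) ([]string, error) {
	var completed []string
	for _, s := range steps {
		if err := ctx.Err(); err != nil {
			return completed, err // cancelled between steps
		}
		if err := s.Run(ctx); err != nil {
			return completed, fmt.Errorf("publish step %q: %w", s.Name, err)
		}
		completed = append(completed, s.Name)
	}
	return completed, nil
}

// publishStepNames mirrors the seven steps listed above.
var publishStepNames = []string{
	"validate staging health",
	"provision production credentials",
	"run production migrations",
	"deploy staging image to production",
	"health check production",
	"update DNS",
	"update project.environments.production",
}

// demoPublish runs stub steps, optionally failing at the named one.
func demoPublish(failAt string) ([]string, error) {
	steps := make([]PublishStep, 0, len(publishStepNames))
	for _, name := range publishStepNames {
		name := name // capture per iteration for older Go versions
		steps = append(steps, PublishStep{Name: name, Run: func(context.Context) error {
			if name == failAt {
				return errors.New("simulated failure")
			}
			return nil
		}})
	}
	return RunPublish(context.Background(), steps)
}

func main() {
	done, err := demoPublish("run production migrations")
	fmt.Println(len(done), err)
}
```

Because migrations run before the deploy, a failure at step 3 leaves the production deployment untouched, though any credentials provisioned in step 2 remain in place.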

### Complexity: Medium

- DNS: Already have CloudflareAdapter, just create two records
- K8s: Straightforward deployment duplication
- Database: CockroachDB adapter supports multiple databases
- Main complexity is the publish flow coordination

### Defer Until After Gap 7

Dual environments depend on per-environment service credentials, so build Gap 7 (services) first:
- Services provision for a single environment initially
- Then extend to environment-aware provisioning
- Then add the publish flow that syncs services to production

---

## Summary: Work Required

| Gap | Effort | Dependencies | Critical Path |
|-----|--------|--------------|---------------|
| 0. Design References | 2-3 days | Gap 1 (storage) | Yes (for design flows) |
| 1. Blueprint Storage | 2-3 days | None | Yes |
| 2. Architect Agent | 3-5 days | Gap 1 | Yes |
| 3. Operation Tracking | 4-6 days | None | Yes |
| 4. Progress Streaming | 2-3 days | Gap 3 | Yes |
| 5. Blueprint → SDLC | 1-2 days | Gap 1 | Yes |
| 6. Frontend | 5-7 days | Gaps 1-5 | Yes |
| 7. Platform Services | 5-8 days | None (can start now) | Parallel track |
| 8. Dual Environments | 3-5 days | Gap 7 | After services work |

**Total Estimate:** 4-5 weeks of focused work (Gaps 7-8 can run in parallel with Gaps 1-6)

**Service Rollout (within Gap 7):**
1. Logging (Loki) - 2 days
2. Email (Resend) - 2 days
3. Stats (PostHog) - 2 days
4. Auth (Clerk) - 3 days

**Note:** Gap 0 (Design References) can be implemented in parallel with Gap 2 (Architect Agent) since both involve Architect prompt engineering. The reference capture infrastructure (Gap 0) builds on Gap 1's storage layer.

### Critical Path

```
                    ┌──► Gap 0 (References) ──┐
                    │                         │
Gap 1 (Blueprint) ──┼──► Gap 2 (Architect) ───┼──► Gap 5 (Conversion)
                    │                         │
                    │                         └──► Gap 6 (Frontend)
                    │                              ▲
Gap 3 (Operations) ─┴──► Gap 4 (Streaming) ────────┘


Parallel Track:

Gap 7 (Services) ──► Logging ──► Email ──► Stats ──► Auth
        │
        └──► Gap 8 (Environments) ──► Publish Flow
```

Gap 7 can start immediately and run parallel to the Studio work.
Gap 8 depends on Gap 7 for service credential handling per environment.

---

## Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Architect outputs malformed JSON | High | Medium | JSON schema validation, retry logic |
| SSE connections drop | Medium | Low | Client-side reconnection, event replay from DB |
| Blueprint schema too restrictive | Medium | Medium | Start minimal, add sections iteratively |
| LLM latency affects chat UX | Low | High | Stream partial responses, show typing indicator |
| Build failures leave broken state | Low | Medium | SDLC already handles partial state |

---

## What's NOT a Gap

These are already solved by the current rdev foundation:

- **Project provisioning** - K8s, DNS, git all work
- **Template seeding** - Composable monorepo templates
- **SDLC execution** - Classifier + worker + artifact tracking
- **CI/CD** - Woodpecker integration
- **Database provisioning** - CockroachDB adapter
- **Webhooks** - Event dispatcher with retry

The foundation is solid. The gaps are about **exposing** existing capabilities through a conversational UI, not rebuilding core functionality.