Media Handling Specification

Version: 1.0
Status: Implementation
Owner: Platform Team
Last Updated: 2026-02-08

Overview

This specification defines comprehensive media handling for rdev, enabling generated projects to store and serve user-uploaded files (images, videos, documents) via Google Cloud Storage. The implementation follows rdev's established patterns for infrastructure provisioning, skeleton packages, and component templates.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      rdev Platform Layer                     │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  GCS Provisioner (internal/adapter/gcs)               │  │
│  │  - Creates per-project buckets                        │  │
│  │  - Generates service account credentials              │  │
│  │  - Stores credentials in rdev credential store        │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            ↓ credentials
┌─────────────────────────────────────────────────────────────┐
│                    Generated Project Layer                   │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  pkg/storage (skeleton package)                       │  │
│  │  - Storage interface abstraction                      │  │
│  │  - GCS implementation                                 │  │
│  │  - Memory implementation (testing)                    │  │
│  │  - Signed URL generation                              │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  media-upload Component (optional)                    │  │
│  │  - HTTP upload/download/delete endpoints             │  │
│  │  - File validation                                    │  │
│  │  - Path management                                    │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│              Google Cloud Storage (External)                 │
│  - Per-project buckets (project-{name}-media)               │
│  - Per-project service accounts                             │
│  - IAM bindings (objectAdmin)                               │
│  - Lifecycle rules (temp/ auto-cleanup)                     │
│  - CORS configuration                                       │
└─────────────────────────────────────────────────────────────┘

Storage Patterns

Path Conventions

Projects should organize objects using consistent path patterns:

uploads/{user_id}/{timestamp}-{filename}     # User-uploaded files
avatars/{user_id}.jpg                        # User avatars
temp/{session_id}/{filename}                 # Temporary files (auto-delete after 24h)
public/{category}/{filename}                 # Public assets (logos, etc.)
private/{user_id}/{document_id}/{filename}   # Private documents (signed URLs only)

Content Types

Supported MIME types:

  • Images: image/jpeg, image/png, image/gif, image/webp, image/svg+xml
  • Videos: video/mp4, video/webm, video/quicktime
  • Documents: application/pdf, application/msword, application/vnd.openxmlformats-officedocument.*
  • Archives: application/zip, application/x-tar, application/gzip
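One way to enforce this list is an allow-list check on the declared Content-Type. The sketch below is illustrative only: `isAllowedContentType` is a hypothetical name, and it treats the `application/vnd.openxmlformats-officedocument.*` entry as a prefix match.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// allowedTypes mirrors the exact MIME types listed in the spec.
var allowedTypes = map[string]bool{
	"image/jpeg": true, "image/png": true, "image/gif": true,
	"image/webp": true, "image/svg+xml": true,
	"video/mp4": true, "video/webm": true, "video/quicktime": true,
	"application/pdf": true, "application/msword": true,
	"application/zip": true, "application/x-tar": true, "application/gzip": true,
}

// officePrefix covers the wildcard entry for Office Open XML documents.
const officePrefix = "application/vnd.openxmlformats-officedocument."

// isAllowedContentType reports whether a Content-Type header value is on the
// allow-list. Parameters such as "; charset=..." are ignored.
func isAllowedContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return false
	}
	return allowedTypes[mediaType] || strings.HasPrefix(mediaType, officePrefix)
}

func main() {
	for _, ct := range []string{"image/png", "text/html", "application/pdf; charset=binary"} {
		fmt.Printf("%-35s allowed=%v\n", ct, isAllowedContentType(ct))
	}
}
```

The declared type is controlled by the client, so a stricter implementation might also sniff the first bytes of the body with `http.DetectContentType` and compare the two.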

Size Limits

  • Default max upload: 100MB per file
  • Component template: Configurable via MAX_UPLOAD_SIZE env var
  • GCS bucket: No hard limit (quota-based)
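A minimal sketch of how the component might resolve its limit is shown below. It rests on two assumptions not stated in the spec: that `MAX_UPLOAD_SIZE` holds a plain byte count, and that "100MB" means 100 MiB. The function name `maxUploadSize` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultMaxUpload is the spec's 100MB default, interpreted here as 100 MiB.
const defaultMaxUpload int64 = 100 << 20

// maxUploadSize reads MAX_UPLOAD_SIZE (assumed to be a byte count) through the
// supplied getenv function and falls back to the default when the variable is
// unset, malformed, or not positive.
func maxUploadSize(getenv func(string) string) int64 {
	n, err := strconv.ParseInt(getenv("MAX_UPLOAD_SIZE"), 10, 64)
	if err != nil || n <= 0 {
		return defaultMaxUpload
	}
	return n
}

func main() {
	fmt.Println("max upload bytes:", maxUploadSize(os.Getenv))
}
```

In a handler, the resolved value would typically be applied with `http.MaxBytesReader(w, r.Body, limit)` before the multipart form is parsed.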

TTL and Expiry

Lifecycle rules automatically delete objects:

  • temp/* paths: 24 hours
  • Users can configure custom rules in the GCS console

Security Model

Authentication Flow

  1. Provisioning Time (rdev API):

    • Create GCS bucket: project-{name}-media
    • Create service account: project-{name}-storage@{gcp-project}.iam.gserviceaccount.com
    • Grant IAM role: roles/storage.objectAdmin on bucket
    • Generate service account JSON key
    • Store credentials in rdev credential store (encrypted)
  2. Runtime (generated project):

    • Read credentials from env vars: GCS_BUCKET, GCS_SERVICE_ACCOUNT_JSON
    • Initialize pkg/storage client with service account JSON
    • Client authenticates as that service account (the key is supplied explicitly rather than discovered through Application Default Credentials)
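The runtime step can fail early if the environment is incomplete or the key is malformed. The sketch below assumes `GCS_SERVICE_ACCOUNT_JSON` contains the key document itself rather than a file path; `storageConfig` and `loadStorageConfig` are hypothetical names, and only the `type` and `client_email` fields of the key are inspected.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// storageConfig holds what a storage client needs at startup.
type storageConfig struct {
	Bucket      string
	Credentials []byte // raw service account key JSON
	ClientEmail string
}

// loadStorageConfig reads GCS_BUCKET and GCS_SERVICE_ACCOUNT_JSON through the
// supplied getenv function and checks that the key looks like a service
// account key before any client is constructed.
func loadStorageConfig(getenv func(string) string) (storageConfig, error) {
	bucket := getenv("GCS_BUCKET")
	raw := getenv("GCS_SERVICE_ACCOUNT_JSON")
	if bucket == "" || raw == "" {
		return storageConfig{}, errors.New("GCS_BUCKET and GCS_SERVICE_ACCOUNT_JSON must be set")
	}
	var key struct {
		Type        string `json:"type"`
		ClientEmail string `json:"client_email"`
	}
	if err := json.Unmarshal([]byte(raw), &key); err != nil {
		return storageConfig{}, fmt.Errorf("parse GCS_SERVICE_ACCOUNT_JSON: %w", err)
	}
	if key.Type != "service_account" || key.ClientEmail == "" {
		return storageConfig{}, errors.New("GCS_SERVICE_ACCOUNT_JSON is not a service account key")
	}
	return storageConfig{Bucket: bucket, Credentials: []byte(raw), ClientEmail: key.ClientEmail}, nil
}

func main() {
	cfg, err := loadStorageConfig(os.Getenv)
	if err != nil {
		fmt.Println("storage disabled:", err)
		return
	}
	fmt.Println("bucket:", cfg.Bucket, "account:", cfg.ClientEmail)
}
```

The returned `Credentials` bytes are what would be handed to the storage client constructor; the private key itself should never be logged.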

IAM Roles

Per-project service accounts have isolated permissions:

  • Bucket-scoped: roles/storage.objectAdmin (CRUD on objects)
  • No cross-project access: Service account A cannot access bucket B
  • No IAM permissions: Cannot modify IAM policies or create resources

Signed URLs

For temporary access without service account credentials:

// Generate read URL (1 hour expiry)
signedURL, _ := storageClient.SignURL(ctx, "uploads/photo.jpg", time.Hour, false)

// Generate write URL (15 min expiry) for client-side uploads
uploadURL, _ := storageClient.SignURL(ctx, "uploads/photo.jpg", 15*time.Minute, true)

Use cases:

  • Direct browser downloads (avoid proxying through API)
  • Client-side uploads (the client sends the file directly to GCS instead of through the API)
  • Sharing files with external users (time-limited links)

CORS Configuration

Buckets are created with CORS rules:

MaxAge: 3600
Methods: [GET, POST, PUT, DELETE, OPTIONS]
Origins: ["https://*.threesix.ai"]
ResponseHeaders: ["Content-Type", "ETag"]

Projects should override origins for custom domains.

API Standards

Upload Endpoint

Request:

POST /api/media-upload/upload
Content-Type: multipart/form-data

--boundary
Content-Disposition: form-data; name="file"; filename="photo.jpg"
Content-Type: image/jpeg

<binary data>
--boundary--

Response (201 Created):

{
  "data": {
    "url": "https://storage.googleapis.com/project-myapp-media/uploads/1706889600-photo.jpg",
    "path": "uploads/1706889600-photo.jpg",
    "filename": "photo.jpg",
    "size": 245678
  }
}

Error Responses:

  • 400 Bad Request: File too large, invalid form, missing file
  • 500 Internal Server Error: Upload failed (GCS error)

Download Endpoint

Request:

GET /api/media-upload/download/uploads/1706889600-photo.jpg

Response (307 Temporary Redirect):

HTTP/1.1 307 Temporary Redirect
Location: https://storage.googleapis.com/project-myapp-media/uploads/1706889600-photo.jpg?X-Goog-Algorithm=...

Error Responses:

  • 404 Not Found: File does not exist

Delete Endpoint

Request:

DELETE /api/media-upload/delete/uploads/1706889600-photo.jpg

Response (204 No Content):

HTTP/1.1 204 No Content

Error Responses:

  • 404 Not Found: File does not exist
  • 500 Internal Server Error: Delete failed

Rate Limiting

The component template should include rate limiting:

  • Upload: 10 requests/minute per IP
  • Download: 100 requests/minute per IP
  • Delete: 10 requests/minute per IP

Use the github.com/go-chi/httprate middleware.

Testing Strategy

Unit Tests

Platform (GCS Provisioner):

  • TestSanitizeForGCP: Validate bucket name sanitization
  • TestBucketNameFor: Validate bucket naming convention
  • TestServiceAccountEmailFor: Validate service account email format
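The naming helpers these tests target might look like the sketch below. The function names are inferred from the test names and their bodies are assumptions, not the provisioner's actual code. The output formats follow the conventions stated in this spec (`project-{name}-media` and `project-{name}-storage@{gcp-project}.iam.gserviceaccount.com`).

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeForGCP lowercases name, replaces every run of characters outside
// [a-z0-9] with a single dash, and trims dashes from both ends.
func sanitizeForGCP(name string) string {
	var b strings.Builder
	lastDash := true // starting true suppresses a leading dash
	for _, r := range strings.ToLower(name) {
		switch {
		case (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9'):
			b.WriteRune(r)
			lastDash = false
		case !lastDash:
			b.WriteByte('-')
			lastDash = true
		}
	}
	return strings.TrimRight(b.String(), "-")
}

// bucketNameFor applies the project-{name}-media convention.
func bucketNameFor(project string) string {
	return "project-" + sanitizeForGCP(project) + "-media"
}

// serviceAccountEmailFor applies the project-{name}-storage convention.
func serviceAccountEmailFor(project, gcpProject string) string {
	return fmt.Sprintf("project-%s-storage@%s.iam.gserviceaccount.com", sanitizeForGCP(project), gcpProject)
}

func main() {
	fmt.Println(sanitizeForGCP("My App_v2!"))                // my-app-v2
	fmt.Println(bucketNameFor("MyApp"))                      // project-myapp-media
	fmt.Println(serviceAccountEmailFor("MyApp", "rdev-prod")) // project-myapp-storage@rdev-prod.iam.gserviceaccount.com
}
```

This sketch does not enforce length limits. Bucket names must be 3 to 63 characters, and service account IDs must be 6 to 30 characters, which leaves 14 characters for the name inside `project-{name}-storage`; a real implementation would need to truncate or hash longer names.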

Skeleton (pkg/storage):

  • TestMemoryStorage: Verify in-memory implementation
  • TestUploadOptions: Validate option handling
  • TestErrorHandling: Verify error types (ErrNotFound, etc.)

Integration Tests

GCS Provisioner:

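// Requires GCS_TEST_PROJECT_ID and GCS_TEST_CREDENTIALS_PATH env vars
func TestGCSProvisionerIntegration(t *testing.T) {
    if os.Getenv("GCS_TEST_PROJECT_ID") == "" {
        t.Skip("GCS_TEST_PROJECT_ID not set; skipping integration test")
    }
    // Create bucket (provisioner and ctx come from the test setup)
    creds, err := provisioner.CreateProjectBucket(ctx, "test-project-123")
    if err != nil {
        t.Fatalf("CreateProjectBucket: %v", err)
    }
    // Cleanup runs even if a later assertion fails
    t.Cleanup(func() { provisioner.DeleteProjectBucket(ctx, "test-project-123", true) })
    _ = creds // use creds for the checks below
    // Verify bucket exists in GCS
    // Verify service account exists
    // Verify IAM bindings
}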

pkg/storage:

// Requires test GCS bucket
func TestGCSStorageIntegration(t *testing.T) {
    // Upload file
    // Verify file exists
    // Download file
    // Delete file
}

E2E Tests (Cookbook Tree)

See cookbooks/trees/media-upload-flow.yaml:

  1. Create project
  2. Provision GCS component
  3. Add media-upload service component
  4. Wait for CI/CD pipeline
  5. Test upload endpoint
  6. Verify file in GCS bucket
  7. Test download endpoint
  8. Cleanup (delete project, bucket)

Mocks

For projects using pkg/storage, provide a mock implementation:

// pkg/storage/mock.go (generated projects can create this)
type MockStorage struct {
    UploadFunc   func(ctx context.Context, path string, r io.Reader, opts UploadOptions) (string, error)
    DownloadFunc func(ctx context.Context, path string) (io.ReadCloser, *ObjectAttrs, error)
    // ... other methods
}

Operational Concerns

Bucket Lifecycle

Creation:

  • Triggered by POST /projects/{id}/components with type=gcs
  • Returns as soon as the bucket and credentials have been created
  • Credentials stored in rdev credential store

Deletion:

  • Triggered by DELETE /projects/{id}
  • Deletes all objects first (if force=true)
  • Deletes bucket
  • Deletes service account and keys

Orphan Prevention:

  • Project deletion hook cleans up all infra (CockroachDB, Redis, GCS)
  • If cleanup fails, the hook logs a warning and continues; the leftover resources must then be removed manually

Cost Management

Estimates (per project):

  • Storage: $0.020/GB/month (Standard class, US region)
  • Operations: $0.005/10k reads, $0.05/10k writes
  • Network: $0.12/GB egress (to internet)

Typical project (1k users, 10GB media):

  • Storage: $0.20/month
  • Operations: ~$0.01/month (10k reads, 1k writes)
  • Total: ~$0.21/month, excluding network egress
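The arithmetic can be checked directly from the per-unit rates listed under Estimates. The sketch below is a worked calculation only: `monthlyCost` is a hypothetical helper, the rates are the ones quoted in this section, and real bills depend on region, storage class, and current pricing.

```go
package main

import "fmt"

// Rates quoted in the Estimates list (USD).
const (
	storagePerGBMonth = 0.020
	readPer10k        = 0.005
	writePer10k       = 0.05
	egressPerGB       = 0.12
)

// monthlyCost sums storage, operation, and egress charges for one month.
func monthlyCost(storedGB, reads, writes, egressGB float64) float64 {
	storage := storedGB * storagePerGBMonth
	operations := reads/10_000*readPer10k + writes/10_000*writePer10k
	network := egressGB * egressPerGB
	return storage + operations + network
}

func main() {
	// 10 GB stored, 10k reads, 1k writes, no egress:
	// 0.20 + (0.005 + 0.005) + 0 = 0.21
	fmt.Printf("without egress: $%.2f\n", monthlyCost(10, 10_000, 1_000, 0))
	// Serving 5 GB to the internet adds 5 * 0.12 = 0.60
	fmt.Printf("with 5 GB egress: $%.2f\n", monthlyCost(10, 10_000, 1_000, 5))
}
```

At these volumes storage dominates the operation charges, and egress quickly dominates both once files are served to the internet.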

Cost optimization:

  • Use lifecycle rules to auto-delete temp files
  • Serve images via CDN (reduce GCS egress)
  • Use signed URLs (avoid API proxy overhead)

Monitoring

Metrics to track (Prometheus):

  • rdev_gcs_buckets_total: Total buckets created
  • rdev_gcs_provision_duration_seconds: Bucket creation latency
  • rdev_gcs_provision_errors_total: Provisioning failures
  • storage_upload_duration_seconds: Upload latency (in generated projects)
  • storage_upload_errors_total: Upload failures
  • storage_upload_bytes_total: Total bytes uploaded

Logs to monitor:

  • Provisioning errors (insufficient permissions, quota exceeded)
  • Upload errors (file too large, invalid content type)
  • Download 404s (broken links, deleted files)

Quotas

GCS limits:

  • Bucket creation: 100/day per GCP project (sufficient for small deployments)
  • Service accounts: 100 per GCP project (shared quota with other services)
  • IAM policies: 1500 bindings per bucket (one per service account)

Scaling beyond limits:

  • Use multiple GCP projects (shard by project ID hash)
  • Use single bucket with path prefixes (less isolation, not recommended)
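Sharding by project ID hash can be as simple as the sketch below, which maps an rdev project ID onto one of several GCP projects using FNV-1a. The function name `gcpProjectFor` and the example project names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// gcpProjectFor deterministically picks a GCP project for an rdev project ID.
// It returns "" when no GCP projects are configured.
func gcpProjectFor(projectID string, gcpProjects []string) string {
	if len(gcpProjects) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(projectID)) // hash.Hash writes never return an error
	return gcpProjects[h.Sum32()%uint32(len(gcpProjects))]
}

func main() {
	shards := []string{"rdev-media-0", "rdev-media-1", "rdev-media-2"}
	for _, id := range []string{"myapp", "blog", "shop"} {
		fmt.Printf("%s -> %s\n", id, gcpProjectFor(id, shards))
	}
}
```

With plain modulo hashing, changing the number of GCP projects remaps most existing IDs, so the chosen shard should be stored alongside the project at provisioning time rather than recomputed on every lookup.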

Backup and Recovery

Bucket versioning:

  • Enable versioning for critical projects: gsutil versioning set on gs://bucket
  • Allows recovery from accidental deletions

Cross-region replication:

  • For high-availability projects, enable dual-region buckets
  • Example: change Location: "US" (multi-region) to Location: "NAM4" (dual-region)

Implementation Phases

Phase 1: Platform - GCS Provisioner

  • Define port.StorageProvisioner interface
  • Implement adapter/gcs.Provisioner
  • Wire into ComponentService
  • Add GCS config to cmd/rdev-api

Phase 2: Skeleton - pkg/storage

  • Define storage.Storage interface
  • Implement GCSStorage
  • Implement MemoryStorage (testing)
  • Add to skeleton templates

Phase 3: Component - media-upload

  • Create component template with upload/download handlers
  • Add Woodpecker build/deploy steps
  • Add to component registry

Phase 4: Testing - Cookbook Tree

  • Write E2E cookbook tree
  • Run in CI pipeline
  • Document in guides

Security Checklist

  • Service accounts have minimal IAM roles (objectAdmin only)
  • Credentials stored encrypted in rdev credential store
  • Bucket names do not expose sensitive project details
  • CORS origins restricted to *.threesix.ai (or custom domains)
  • Signed URLs have reasonable expiry times (≤1 hour for reads, ≤15 min for writes)
  • File size limits enforced (prevent DoS via large uploads)
  • Content-Type validation (prevent malicious file uploads)
  • Public read ACLs only set when explicitly requested (Public: true)

Future Enhancements

  1. Multi-Backend Support: Add S3, MinIO, R2 adapters
  2. Image Processing: Automatic thumbnail generation, format conversion
  3. CDN Integration: Cloudflare R2 + cache purging
  4. Quota Management: Per-project storage limits, alerting
  5. Virus Scanning: ClamAV integration for uploaded files
  6. Resumable Uploads: For large files (>100MB)
  7. Streaming: Direct browser-to-GCS uploads (bypass API)

References