rdev/docs/media-handling-spec.md
# Media Handling Specification
**Version:** 1.0
**Status:** Implementation
**Owner:** Platform Team
**Last Updated:** 2026-02-08
## Overview
This specification defines comprehensive media handling for rdev, enabling generated projects to store and serve user-uploaded files (images, videos, documents) via Google Cloud Storage. The implementation follows rdev's established patterns for infrastructure provisioning, skeleton packages, and component templates.
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ rdev Platform Layer │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ GCS Provisioner (internal/adapter/gcs) │ │
│ │ - Creates per-project buckets │ │
│ │ - Generates service account credentials │ │
│ │ - Stores credentials in rdev credential store │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
↓ credentials
┌─────────────────────────────────────────────────────────────┐
│ Generated Project Layer │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ pkg/storage (skeleton package) │ │
│ │ - Storage interface abstraction │ │
│ │ - GCS implementation │ │
│ │ - Memory implementation (testing) │ │
│ │ - Signed URL generation │ │
│ └───────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ media-upload Component (optional) │ │
│ │ - HTTP upload/download/delete endpoints │ │
│ │ - File validation │ │
│ │ - Path management │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Google Cloud Storage (External) │
│ - Per-project buckets (project-{name}-media) │
│ - Per-project service accounts │
│ - IAM bindings (objectAdmin) │
│ - Lifecycle rules (temp/ auto-cleanup) │
│ - CORS configuration │
└─────────────────────────────────────────────────────────────┘
```
## Storage Patterns
### Path Conventions
Projects should organize objects using consistent path patterns:
```
uploads/{user_id}/{timestamp}-{filename} # User-uploaded files
avatars/{user_id}.jpg # User avatars
temp/{session_id}/{filename} # Temporary files (auto-delete after 24h)
public/{category}/{filename} # Public assets (logos, etc.)
private/{user_id}/{document_id}/{filename} # Private documents (signed URLs only)
```
### Content Types
Supported MIME types:
- **Images:** `image/jpeg`, `image/png`, `image/gif`, `image/webp`, `image/svg+xml`
- **Videos:** `video/mp4`, `video/webm`, `video/quicktime`
- **Documents:** `application/pdf`, `application/msword`, `application/vnd.openxmlformats-officedocument.*`
- **Archives:** `application/zip`, `application/x-tar`, `application/gzip`
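One way to enforce the list above is an allowlist check on the declared `Content-Type`. This is a sketch under assumptions: `AllowedContentType` is a hypothetical name, and the wildcard `application/vnd.openxmlformats-officedocument.*` is interpreted as a prefix match.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// allowedTypes mirrors the MIME types listed in this section.
var allowedTypes = map[string]bool{
	"image/jpeg": true, "image/png": true, "image/gif": true,
	"image/webp": true, "image/svg+xml": true,
	"video/mp4": true, "video/webm": true, "video/quicktime": true,
	"application/pdf": true, "application/msword": true,
	"application/zip": true, "application/x-tar": true, "application/gzip": true,
}

// officePrefix covers application/vnd.openxmlformats-officedocument.*
const officePrefix = "application/vnd.openxmlformats-officedocument."

// AllowedContentType reports whether a Content-Type header value is on
// the allowlist. Parameters such as "; charset=utf-8" are ignored.
func AllowedContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return false
	}
	return allowedTypes[mediaType] || strings.HasPrefix(mediaType, officePrefix)
}

func main() {
	fmt.Println(AllowedContentType("image/jpeg; charset=utf-8")) // true
	fmt.Println(AllowedContentType("text/html"))                 // false
}
```

The declared type is client-controlled, so a handler may also want to sniff the first bytes of the file with `http.DetectContentType` before trusting it.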
### Size Limits
- **Default max upload:** 100MB per file
- **Component template:** Configurable via `MAX_UPLOAD_SIZE` env var
- **GCS bucket:** No hard limit (quota-based)
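A sketch of how the limit might be resolved at startup. It assumes `MAX_UPLOAD_SIZE` holds a byte count and treats the 100MB default as 100 MiB; neither detail is fixed by this spec, and `MaxUploadSize` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultMaxUpload is the 100MB default, taken here as 100 MiB (assumption).
const defaultMaxUpload int64 = 100 << 20

// MaxUploadSize parses the raw MAX_UPLOAD_SIZE value as bytes and falls
// back to the default when the value is empty, malformed, or not positive.
func MaxUploadSize(raw string) int64 {
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || n <= 0 {
		return defaultMaxUpload
	}
	return n
}

func main() {
	limit := MaxUploadSize(os.Getenv("MAX_UPLOAD_SIZE"))
	fmt.Printf("max upload size: %d bytes\n", limit)
}
```

In a handler, the resolved limit can be applied with `http.MaxBytesReader` so that oversized bodies are cut off while streaming rather than after buffering.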
### TTL and Expiry
Lifecycle rules automatically delete objects:
- `temp/*` paths: 24 hours
- Users can configure custom rules in the GCS console
## Security Model
### Authentication Flow
1. **Provisioning Time** (rdev API):
   - Create GCS bucket: `project-{name}-media`
   - Create service account: `project-{name}-storage@{gcp-project}.iam.gserviceaccount.com`
   - Grant IAM role: `roles/storage.objectAdmin` on bucket
   - Generate service account JSON key
   - Store credentials in rdev credential store (encrypted)
2. **Runtime** (generated project):
   - Read credentials from env vars: `GCS_BUCKET`, `GCS_SERVICE_ACCOUNT_JSON`
   - Initialize `pkg/storage` client with service account JSON
   - Client uses ADC (Application Default Credentials) with service account
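The naming conventions used at provisioning time can be sketched as follows. The function names echo the unit tests listed later in this spec, but the signatures and the sanitization rule (lowercase, runs of other characters collapsed to a dash) are assumptions, not the provisioner's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// sanitizeForGCP lowercases the name and collapses every run of
// characters outside [a-z0-9] into a single dash (assumed rule).
func sanitizeForGCP(name string) string {
	s := nonAlnum.ReplaceAllString(strings.ToLower(name), "-")
	return strings.Trim(s, "-")
}

// bucketNameFor returns "project-{name}-media".
func bucketNameFor(name string) string {
	return "project-" + sanitizeForGCP(name) + "-media"
}

// serviceAccountEmailFor returns
// "project-{name}-storage@{gcp-project}.iam.gserviceaccount.com".
func serviceAccountEmailFor(name, gcpProject string) string {
	return fmt.Sprintf("project-%s-storage@%s.iam.gserviceaccount.com",
		sanitizeForGCP(name), gcpProject)
}

func main() {
	fmt.Println(bucketNameFor("My_App!"))                     // project-my-app-media
	fmt.Println(serviceAccountEmailFor("myapp", "acme-prod")) // project-myapp-storage@acme-prod.iam.gserviceaccount.com
}
```

The sketch does not enforce length limits. GCS bucket names are capped at 63 characters and service account IDs at 30, so a real implementation would need to truncate or reject long project names.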
### IAM Roles
Per-project service accounts have **isolated permissions**:
- **Bucket-scoped:** `roles/storage.objectAdmin` (CRUD on objects)
- **No cross-project access:** Service account A cannot access bucket B
- **No IAM permissions:** Cannot modify IAM policies or create resources
### Signed URLs
For temporary access without service account credentials:
```go
// Generate read URL (1 hour expiry)
signedURL, _ := storageClient.SignURL(ctx, "uploads/photo.jpg", time.Hour, false)
// Generate write URL (15 min expiry) for client-side uploads
uploadURL, _ := storageClient.SignURL(ctx, "uploads/photo.jpg", 15*time.Minute, true)
```
**Use cases:**
- Direct browser downloads (avoid proxying through API)
- Client-side uploads (PUT directly to GCS, not through the API)
- Sharing files with external users (time-limited links)
### CORS Configuration
Buckets are created with CORS rules:
```yaml
MaxAge: 3600
Methods: [GET, POST, PUT, DELETE, OPTIONS]
Origins: ["https://*.threesix.ai"]
ResponseHeaders: ["Content-Type", "ETag"]
```
Projects should override origins for custom domains.
## API Standards
### Upload Endpoint
**Request:**
```http
POST /api/media-upload/upload
Content-Type: multipart/form-data; boundary=boundary

--boundary
Content-Disposition: form-data; name="file"; filename="photo.jpg"
Content-Type: image/jpeg

<binary data>
--boundary--
```
**Response (201 Created):**
```json
{
  "data": {
    "url": "https://storage.googleapis.com/project-myapp-media/uploads/1706889600-photo.jpg",
    "path": "uploads/1706889600-photo.jpg",
    "filename": "photo.jpg",
    "size": 245678
  }
}
```
**Error Responses:**
- `400 Bad Request`: File too large, invalid form, missing file
- `500 Internal Server Error`: Upload failed (GCS error)
### Download Endpoint
**Request:**
```http
GET /api/media-upload/download/uploads/1706889600-photo.jpg
```
**Response (307 Temporary Redirect):**
```http
HTTP/1.1 307 Temporary Redirect
Location: https://storage.googleapis.com/project-myapp-media/uploads/1706889600-photo.jpg?X-Goog-Algorithm=...
```
**Error Responses:**
- `404 Not Found`: File does not exist
### Delete Endpoint
**Request:**
```http
DELETE /api/media-upload/delete/uploads/1706889600-photo.jpg
```
**Response (204 No Content):**
```http
HTTP/1.1 204 No Content
```
**Error Responses:**
- `404 Not Found`: File does not exist
- `500 Internal Server Error`: Delete failed
### Rate Limiting
Component template should include rate limiting:
- **Upload:** 10 requests/minute per IP
- **Download:** 100 requests/minute per IP
- **Delete:** 10 requests/minute per IP
Use `github.com/go-chi/httprate` middleware.
## Testing Strategy
### Unit Tests
**Platform (GCS Provisioner):**
- `TestSanitizeForGCP`: Validate bucket name sanitization
- `TestBucketNameFor`: Validate bucket naming convention
- `TestServiceAccountEmailFor`: Validate service account email format
**Skeleton (pkg/storage):**
- `TestMemoryStorage`: Verify in-memory implementation
- `TestUploadOptions`: Validate option handling
- `TestErrorHandling`: Verify error types (ErrNotFound, etc.)
### Integration Tests
**GCS Provisioner:**
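```go
// Requires GCS_TEST_PROJECT_ID and GCS_TEST_CREDENTIALS_PATH env vars
func TestGCSProvisionerIntegration(t *testing.T) {
	if os.Getenv("GCS_TEST_PROJECT_ID") == "" || os.Getenv("GCS_TEST_CREDENTIALS_PATH") == "" {
		t.Skip("GCS integration env vars not set")
	}
	ctx := context.Background()
	// Create bucket (provisioner is built from the env vars in test setup, not shown)
	creds, err := provisioner.CreateProjectBucket(ctx, "test-project-123")
	if err != nil {
		t.Fatalf("CreateProjectBucket: %v", err)
	}
	// Cleanup, registered first so it runs even if a check below fails
	t.Cleanup(func() { provisioner.DeleteProjectBucket(ctx, "test-project-123", true) })
	_ = creds // used by the checks below
	// Verify bucket exists in GCS
	// Verify service account exists
	// Verify IAM bindings
}
```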
**pkg/storage:**
```go
// Requires test GCS bucket
func TestGCSStorageIntegration(t *testing.T) {
	// Upload file
	// Verify file exists
	// Download file
	// Delete file
}
```
### E2E Tests (Cookbook Tree)
See `cookbooks/trees/media-upload-flow.yaml`:
1. Create project
2. Provision GCS component
3. Add media-upload service component
4. Wait for CI/CD pipeline
5. Test upload endpoint
6. Verify file in GCS bucket
7. Test download endpoint
8. Cleanup (delete project, bucket)
### Mocks
For projects using `pkg/storage`, provide mock implementation:
```go
// pkg/storage/mock.go (generated projects can create this)
type MockStorage struct {
	UploadFunc   func(ctx context.Context, path string, r io.Reader, opts UploadOptions) (string, error)
	DownloadFunc func(ctx context.Context, path string) (io.ReadCloser, *ObjectAttrs, error)
	// ... other methods
}
```
## Operational Concerns
### Bucket Lifecycle
**Creation:**
- Triggered by `POST /projects/{id}/components` with `type=gcs`
- Returns immediately after bucket + credentials created
- Credentials stored in rdev credential store
**Deletion:**
- Triggered by `DELETE /projects/{id}`
- Deletes all objects first (if `force=true`)
- Deletes bucket
- Deletes service account and keys
**Orphan Prevention:**
- Project deletion hook cleans up all infra (postgres, redis, gcs)
- If cleanup fails, logs warning but continues (manual cleanup required)
### Cost Management
**Estimates (per project):**
- **Storage:** $0.020/GB/month (Standard class, US region)
- **Operations:** $0.005/10k reads, $0.05/10k writes
- **Network:** $0.12/GB egress (to internet)
**Typical project (1k users, 10GB media):**
- Storage: $0.20/month (10 GB × $0.020)
- Operations: ~$0.01/month (10k reads at $0.005, 1k writes at $0.005)
- **Total:** ~$0.21/month, excluding network egress
**Cost optimization:**
- Use lifecycle rules to auto-delete temp files
- Serve images via CDN (reduce GCS egress)
- Use signed URLs (avoid API proxy overhead)
### Monitoring
**Metrics to track (Prometheus):**
- `rdev_gcs_buckets_total`: Total buckets created
- `rdev_gcs_provision_duration_seconds`: Bucket creation latency
- `rdev_gcs_provision_errors_total`: Provisioning failures
- `storage_upload_duration_seconds`: Upload latency (in generated projects)
- `storage_upload_errors_total`: Upload failures
- `storage_upload_bytes_total`: Total bytes uploaded
**Logs to monitor:**
- Provisioning errors (insufficient permissions, quota exceeded)
- Upload errors (file too large, invalid content type)
- Download 404s (broken links, deleted files)
### Quotas
**GCS limits:**
- **Bucket creation:** 100/day per GCP project (sufficient for small deployments)
- **Service accounts:** 100 per GCP project (shared quota with other services)
- **IAM policies:** 1500 bindings per bucket (one per service account)
**Scaling beyond limits:**
- Use multiple GCP projects (shard by project ID hash)
- Use single bucket with path prefixes (less isolation, not recommended)
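Sharding by project ID hash could look like the sketch below. `gcpProjectFor` and the GCP project names are hypothetical, and FNV-1a is just one reasonable hash choice.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// gcpProjectFor deterministically maps an rdev project ID to one of the
// available GCP projects. It returns "" when no GCP projects are configured.
func gcpProjectFor(projectID string, gcpProjects []string) string {
	if len(gcpProjects) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(projectID)) // hash.Hash writes never return an error
	return gcpProjects[h.Sum32()%uint32(len(gcpProjects))]
}

func main() {
	shards := []string{"rdev-media-0", "rdev-media-1", "rdev-media-2"}
	for _, id := range []string{"myapp", "blog", "shop"} {
		fmt.Println(id, "->", gcpProjectFor(id, shards))
	}
}
```

Adding a shard changes the modulus and therefore remaps existing IDs, so the chosen GCP project should be stored alongside the credentials at provisioning time rather than recomputed later.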
### Backup and Recovery
**Bucket versioning:**
- Enable versioning for critical projects: `gsutil versioning set on gs://bucket`
- Allows recovery from accidental deletions
**Cross-region replication:**
- For high-availability projects, use dual-region buckets
- Example: change `Location: "US"` (multi-region) to `Location: "NAM4"` (dual-region)
## Implementation Phases
### Phase 1: Platform - GCS Provisioner ✅
- Define `port.StorageProvisioner` interface
- Implement `adapter/gcs.Provisioner`
- Wire into `ComponentService`
- Add GCS config to `cmd/rdev-api`
### Phase 2: Skeleton - pkg/storage ✅
- Define `storage.Storage` interface
- Implement `GCSStorage`
- Implement `MemoryStorage` (testing)
- Add to skeleton templates
### Phase 3: Component - media-upload ✅
- Create component template with upload/download handlers
- Add Woodpecker build/deploy steps
- Add to component registry
### Phase 4: Testing - Cookbook Tree ✅
- Write E2E cookbook tree
- Run in CI pipeline
- Document in guides
## Security Checklist
- [ ] Service accounts have minimal IAM roles (objectAdmin only)
- [ ] Credentials stored encrypted in rdev credential store
- [ ] Bucket names do not expose sensitive project details
- [ ] CORS origins restricted to *.threesix.ai (or custom domains)
- [ ] Signed URLs have reasonable expiry times (≤1 hour for reads, ≤15 min for writes)
- [ ] File size limits enforced (prevent DoS via large uploads)
- [ ] Content-Type validation (prevent malicious file uploads)
- [ ] Public read ACLs only set when explicitly requested (`Public: true`)
## Future Enhancements
1. **Multi-Backend Support:** Add S3, MinIO, R2 adapters
2. **Image Processing:** Automatic thumbnail generation, format conversion
3. **CDN Integration:** Cloudflare R2 + cache purging
4. **Quota Management:** Per-project storage limits, alerting
5. **Virus Scanning:** ClamAV integration for uploaded files
6. **Resumable Uploads:** For large files (>100MB)
7. **Streaming:** Direct browser-to-GCS uploads (bypass API)
## References
- **GCS Client Docs:** https://cloud.google.com/go/docs/reference/cloud.google.com/go/storage/latest
- **IAM Best Practices:** https://cloud.google.com/iam/docs/best-practices
- **Signed URLs:** https://cloud.google.com/storage/docs/access-control/signed-urls
- **rdev Postgres Provisioner:** `internal/adapter/postgres/provisioner.go`
- **rdev Redis Provisioner:** `internal/adapter/redis/provisioner.go`