Commit Graph

19 Commits

Author SHA1 Message Date
jordan
e42c18a9a3 feat: add session web UI mode + aeries-daeya cookbook tree
Session WebUI:
- Add `web_ui` flag to session create — launches claude-code-ui in pod on port 3001
- Install @siteboon/claude-code-ui in claudebox Dockerfile, expose port 3001
- Migration 027: add web_ui column to sessions table
- startWebUI/stopWebUI fire-and-forget helpers in SessionsHandler
- Service selects preview port 3001 (web UI) vs 8080 (sidecar) based on flag

Aeries Daeya cookbook:
- Add cookbooks/trees/aeries-daeya.yaml: privacy-first avatar social platform
  (infra → avatar data model → AI generation pipeline → studio UI)
- Add cookbooks/scripts/aeries-daeya-test.sh: run/status/diagnose/teardown harness
- Fix race condition in common.sh wait_for_pipeline: detect already-running pipelines
  at startup and track directly instead of waiting for a newer one

Docs/tooling:
- Add SDK Update Workflow section to CLAUDE.md
- Add `make sdk` and `make sdk-check` targets for OpenAPI spec management

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 23:14:08 -07:00
jordan
3dbde72966 feat: add claude_id tracking and session improvements for interactive dev
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add claude_id field to sessions (migration 026) for tracking Claude
  process IDs across pod restarts
- Extend session repository with UpdateClaudeID and session lookup methods
- Improve kubernetes executor with better error handling and exec streaming
- Add claudebox client/server improvements for session lifecycle
- Expand sessions handler with exec streaming endpoint
- Add comprehensive tests for sessions and kubernetes executor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 00:20:32 -07:00
jordan
a8c8a0a14d feat: add GCS-based persistent media storage, AI generation pipeline, and composable skeleton packages
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Adds complete media storage pipeline with GCS presigned uploads, AI image/video/text generation
via queue-based workers, realtime SSE event streaming, and comprehensive skeleton packages
(storage, mediagen, textgen, generation, realtime, persona, routing, ai-client). Includes
security fixes for media delete authorization, nil pointer guards in handlers, video persistence
via download-then-upload, consistent signed URLs, and Image→ImageIcon rename to avoid DOM collision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:29:09 -07:00
jordan
7249575dea feat(sessions): add command execution endpoint and activity tracking
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add POST /sessions/:id/exec endpoint for executing commands in sessions
- Add session activity tracking (last_activity_at timestamp)
- Add database migration 024 for session activity column
- Add comprehensive tests for session handlers and service layer
- Add wildcard TLS certificate for preview.threesix.ai subdomain
- Add infrastructure mocks for testing preview service
- Refactor preview cleanup logic to remove unused methods
- Add AIOS core documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-13 08:41:05 -07:00
jordan
9226454b85 feat: label-based undeploy, GC reconciliation, checkout/sessions, pool status
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add UndeployAll() using label selectors to clean up monorepo components
  on project deletion (replaces name-based Undeploy in DeleteProject and
  the direct undeploy handler)
- Add ResourceGC background worker that periodically finds K8s resources
  whose project label has no matching DB record, deletes after 1h safety
  window
- Widen deployer client type from *kubernetes.Clientset to
  kubernetes.Interface for testability
- UndeployAll accumulates errors via errors.Join instead of failing fast
- Add checkout/checkin sidecar dev flow: temporary git tokens, branch
  checkout, review on checkin with cleanup workers
- Add interactive sessions: pod binding, command execution, SSE streaming,
  ephemeral preview URLs with session cleanup workers
- Add GET /workers/pool endpoint for aggregate capacity and queue depth
- Add sessions:read and sessions:execute auth scopes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 19:11:28 -07:00
jordan
a69eb7e587 feat(foundary): implement complete backend for conversational project design
Implements all 5 phases of Foundary Studio backend:

Phase 1: Chat Persistence (8 API endpoints)
- Conversations and messages with proper cascading deletes
- PostgreSQL schema with auto-update triggers
- Full CRUD operations with structured logging

Phase 2: Blueprint Entity (5 API endpoints)
- JSONB spec storage with GIN indexes
- Flexible structured data for project specifications
- Version-controlled blueprint management

Phase 3: Architect Service (3 API endpoints)
- Conversational AI orchestration with Claude
- Multi-turn dialogue with context building
- Blueprint spec extraction from conversations

Phase 4: Work Queue Integration
- Verified existing endpoint compatibility

Phase 5: Structured Questions (6 API endpoints)
- Four question types: text, choice, multichoice, yesno
- Answer validation with proper constraints
- Conversation-linked Q&A flow

Architecture:
- Textbook hexagonal architecture (domain → port → adapter → service → handler)
- Zero external dependencies in domain layer
- Consistent error handling with proper wrapping
- Auth scopes on all routes (projects:read, projects:execute)
- Structured logging with operation context and duration tracking
- NULL-safe DTO converters throughout

Database:
- 3 new migrations (019, 020, 021)
- UUIDs for all primary keys
- Proper foreign key constraints with ON DELETE CASCADE
- Optimized indexes including partial index for unanswered questions
- Auto-update triggers for timestamps

OpenAPI Documentation:
- Complete API documentation under 'Foundary' tag
- 22 new endpoints documented with examples
- Request/response schemas for all operations

Logging Improvements:
- Added operation field to all service logs
- Added duration_ms tracking for performance monitoring
- Log response_length instead of full response content
- Consistent use of logging field constants
- Execute-then-log pattern for delete operations

Files: 32 changed, 2800+ lines added
- 7 domain models
- 3 database migrations
- 3 port interfaces
- 3 postgres adapters
- 4 services (conversation, blueprint, question, architect)
- 4 handlers with DTOs
- OpenAPI documentation
- Integration in main.go

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-09 00:50:46 -07:00
jordan
f20fc6c51c feat(saga): implement enterprise-grade resilience architecture
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Fixes issues from code review of resilience implementation:

- Wire saga system in main.go (SagaRepository, SagaExecutor, SagaHandler)
- Fix CompletedSteps() to include skipped steps for dependency resolution
- Fix reverse loop bug in saga compensation (use standard swap pattern)
- Add circuit breaker state change callbacks for Prometheus metrics

Phase 1 (Build Resilience):
- Add failure:retry to all component Kaniko build steps
- Add preflight registry health check before builds
- Add services-deployed sync point to decouple docs from critical path

Phase 2 (API Resilience):
- Add pipeline retry endpoint (POST /projects/{id}/pipelines/{number}/retry)
- Wire circuit breakers with metrics callbacks
- Add /health/circuits endpoint for circuit breaker status

Phase 3 (Saga Engine):
- Full domain model (Saga, SagaStep, RetryPolicy, BackoffType)
- PostgreSQL saga repository with CRUD and step management
- Saga executor with retry, compensation, skip step support
- Saga API handlers with CRUD and control operations

Phase 4 (Observability):
- Add saga metrics (total, step_duration, retry, circuit_breaker_state)
- Add logging fields (saga_id, saga_name, step_name)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 01:58:02 -07:00
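
Commit f20fc6c51c mentions replacing a buggy reverse loop in saga compensation with the standard swap pattern. The sketch below shows that idiom on a slice of step names. It assumes compensation walks completed steps in reverse order; `reverseSteps` and the step names are hypothetical.

```go
package main

import "fmt"

// reverseSteps reverses the slice in place using the two-index swap idiom.
// The loop stops when the indices meet, so each pair is swapped exactly once.
func reverseSteps(steps []string) {
	for i, j := 0, len(steps)-1; i < j; i, j = i+1, j-1 {
		steps[i], steps[j] = steps[j], steps[i]
	}
}

func main() {
	// Hypothetical completed steps, in execution order.
	completed := []string{"create-repo", "provision-db", "deploy"}
	reverseSteps(completed)
	// Compensation would then run in this order.
	fmt.Println(completed)
}
```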
jordan
3b35900a2d feat: enterprise worker pool with HTTP sidecar pattern
Implements horizontally-scalable worker pool architecture:
- claudebox-sidecar: HTTP server for Claude Code, git, and SDLC ops
- rdev-worker: standalone worker binary polling rdev-api for tasks
- HTTP client adapter for sidecar communication
- HPA with custom Prometheus metrics for autoscaling
- ServiceMonitor for metrics scraping

Code review fixes applied:
- URL-encode query parameters in GitStatus (Critical #1)
- Remove unused shellQuote function (Critical #2)
- Use stdlib strings.Split/TrimSpace (Critical #3)
- Add version injection via ldflags (Warning #4)
- Add debug logging for swallowed git/sdlc errors (Warning #5, #6)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:21:11 -07:00
jordan
cfba724f8a feat: add work task error classification and user-facing error codes
- Add WorkErrorCode type with RATE_LIMITED, AUTH_FAILED, TIMEOUT, STALE_WORKER, AGENT_ERROR, INVALID_SPEC
- Add ClassifyAgentError function to detect error patterns from stderr
- Add error_code column to work_queue table (migration 016)
- Add FailWithCode method to WorkQueue interface and implementations
- Update RequeueStaleWithIDs to mark permanently failed tasks with STALE_WORKER
- Add ErrorCode to BuildResult for API responses
- Update work executor to classify errors before failing tasks

This lets users see the actual failure reason (e.g., "RATE_LIMITED") instead of
a build stuck in the "running" state indefinitely when Claude hits rate limits.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 00:07:34 -07:00
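
The stderr classification introduced in commit cfba724f8a might look roughly like the sketch below. The error code values come from the commit message; the function signature, the constant names, and every matched substring are assumptions made for illustration. STALE_WORKER and INVALID_SPEC are omitted because the commit describes them as assigned elsewhere, not derived from stderr.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkErrorCode is a user-facing failure code for a work task.
type WorkErrorCode string

const (
	CodeRateLimited WorkErrorCode = "RATE_LIMITED"
	CodeAuthFailed  WorkErrorCode = "AUTH_FAILED"
	CodeTimeout     WorkErrorCode = "TIMEOUT"
	CodeAgentError  WorkErrorCode = "AGENT_ERROR"
)

// ClassifyAgentError maps agent stderr output to an error code.
// The substrings below are hypothetical patterns, not the real ones.
func ClassifyAgentError(stderr string) WorkErrorCode {
	s := strings.ToLower(stderr)
	switch {
	case strings.Contains(s, "rate limit"), strings.Contains(s, "too many requests"):
		return CodeRateLimited
	case strings.Contains(s, "unauthorized"), strings.Contains(s, "invalid api key"):
		return CodeAuthFailed
	case strings.Contains(s, "timed out"), strings.Contains(s, "deadline exceeded"):
		return CodeTimeout
	default:
		// Anything unrecognized falls back to a generic agent error.
		return CodeAgentError
	}
}

func main() {
	fmt.Println(ClassifyAgentError("Error: Rate limit exceeded, retry later"))
}
```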
jordan
c280a92012 feat: add operations audit system and template improvements
Operations Audit (new feature):
- Add Operation domain model with status tracking (pending, running, completed, failed, cancelled)
- Add OperationRepository with PostgreSQL implementation
- Add OperationService for CRUD and lifecycle management
- Add operations handlers (list, get, cancel endpoints)
- Add migration 015_operations.sql for operations table
- Add operation cleanup worker for stale operation handling
- Add ErrOperationNotFound to domain errors

Template Improvements:
- Add CLAUDE.md configuration files to astro-landing, default, and go-api templates
- Fix PORT template variable usage in nginx configs for app templates
- Add replace directives for local pkg module in Go templates
- Simplify Go service/worker Dockerfiles for workspace builds
- Fix TypeScript error in logger template

Other:
- Refactor landing-test.sh cookbook script
- Update CLAUDE.md version reference

Note: Some files exceed 500-line limit (pre-existing debt + new feature)
- component.go: 550 lines (unchanged, pre-existing)
- main.go: 522 lines (added operations wiring)
- operation_repo.go: 569 lines (new, needs splitting)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:08:57 -07:00
jordan
f6ced22e06 fix: Use FQDN for k8s service hostnames and remove broken commonLabels
Short-form DNS names (e.g. postgres.databases.svc) fail to resolve in
new pods due to k8s DNS search domain limitations. Switch all service
hostnames to FQDNs (*.svc.cluster.local).

Remove commonLabels from kustomization.yaml — it injected labels into
all selectors including NetworkPolicy egress rules (blocking DNS to
CoreDNS) and Deployment selectors (causing immutability errors).

Add OTEL_EXPORTER_OTLP_ENDPOINT env var to deployment YAML so the
telemetry collector endpoint uses the FQDN without requiring a binary
rebuild.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 20:46:04 -07:00
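
The hostname change in commit f6ced22e06 amounts to always spelling out the cluster domain. A small sketch, assuming the default Kubernetes cluster domain `cluster.local` and a hypothetical helper named `serviceFQDN`:

```go
package main

import "fmt"

// serviceFQDN builds the fully qualified in-cluster DNS name for a Service.
// It assumes the default cluster domain "cluster.local".
func serviceFQDN(service, namespace string) string {
	return fmt.Sprintf("%s.%s.svc.cluster.local", service, namespace)
}

func main() {
	// The short form postgres.databases.svc becomes:
	fmt.Println(serviceFQDN("postgres", "databases")) // postgres.databases.svc.cluster.local
}
```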
jordan
c59d348040 chore: prepare for composable monorepo template implementation
This commit captures the current state before implementing the composable
monorepo template system. Key changes included:

Infrastructure:
- Add CockroachDB provisioner adapter for database provisioning
- Add Redis provisioner adapter for cache provisioning
- Add build events system with PostgreSQL storage
- Add WebSocket endpoint for real-time build progress

Code agent improvements:
- Fix Claude Code adapter to use default allowed tools instead of dangerously-skip-permissions
- Add context-aware stream closing for cancellation support
- Improve parser tests for edge cases

Build system:
- Add build event constants and metrics
- Remove deprecated git_operations.go (replaced by pod_git_operations.go)
- Add rollback logic for multi-step provisioning operations

Documentation:
- Add composable-monorepo feature documentation
- Add DNS/Cloudflare service documentation
- Update deployment and troubleshooting guides

Cookbooks:
- Add fullstack-app cookbook
- Refactor landing-test with shared library

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 11:39:28 -07:00
jordan
1ac8efa4c7 feat: Expose Woodpecker pipeline errors in API response
- Add CIPipelineError struct to domain with Type, Message, IsWarning fields
- Map Woodpecker Pipeline.Errors to domain.CIPipeline.Errors
- Fix migration 013: UUID type for project_id, cast id to text for MD5
- Remove invalid domain data migration (columns don't exist)
- Update release.sh with --deploy flag and migration support
- Fix test nil pointer: check errors in TestAPIKeyRepository_ProjectIDArrayHandling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 16:16:36 -07:00
jordan
c86516c53a feat: Add multi-domain support with auto-generated slugs for landing page cookbook
Landing page cookbook implementation (Weeks 1-4):

Domain Infrastructure:
- Add project_domains table with migration (013_project_domains.sql)
- Add ProjectDomain model with domain types (primary_auto, primary_custom, alias)
- Add SlugGenerator and ProjectDomainRepository interfaces
- Implement postgres adapters for domain and slug management

Service Layer:
- Add domain CRUD methods to ProjectInfraService
- Generate 8-char random slugs for auto-domains
- Support custom subdomains during project creation
- Add site_live health check to project status
- Trigger CI build after template seeding

Handler Updates:
- Add DomainService interface and adapter pattern
- Rewrite domain handlers to use database-backed service
- Add proper error handling for duplicate/missing domains

CI Integration:
- Add TriggerBuild to CIProvider interface
- Implement TriggerBuild in Woodpecker adapter
- Manually trigger initial build after template seed

Cookbook & Scripts:
- Add landing-test.sh script for E2E testing
- Add release.sh for version releases
- Add logs.sh for quick log access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 12:55:59 -07:00
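
Commit c86516c53a generates 8-character random slugs for auto-domains. One way to do this is sketched below. The alphabet (lowercase letters and digits, which are valid in DNS labels), the use of `crypto/rand`, and the name `generateSlug` are assumptions; the commit does not say how the SlugGenerator is implemented.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// slugAlphabet is an assumed character set that is safe inside a DNS label.
const slugAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// generateSlug returns n characters drawn uniformly from slugAlphabet.
func generateSlug(n int) (string, error) {
	limit := big.NewInt(int64(len(slugAlphabet)))
	out := make([]byte, n)
	for i := range out {
		// rand.Int returns a uniform value in [0, limit).
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = slugAlphabet[idx.Int64()]
	}
	return string(out), nil
}

func main() {
	slug, err := generateSlug(8)
	if err != nil {
		panic(err)
	}
	fmt.Println(slug)
}
```

A real generator would also need a uniqueness check against existing domains, since random slugs can collide.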
jordan
bc47e426b0 feat: Add CI pipeline proxy, DNS alias management, and worker executor system
- Add ListPipelines/GetPipeline to CIProvider port with Woodpecker adapter
- Add DNS alias endpoints: GET/POST/DELETE /projects/{id}/domains
- Implement worker executor daemon, build executor, and git operations
- Add build service, worker service, and build audit tracking
- Add worker registry with PostgreSQL adapter and migration
- Add multi-provider code agent interface (Claude Code + OpenCode)
- Add create-and-build combo endpoint
- Update landing-page cookbook to reflect that all gaps are closed
- Fix tech debt: unified validation, auth scopes, error wrapping, slog patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 21:05:28 -07:00
jordan
39df51defd feat: Add multi-provider code agent interface with Claude Code and OpenCode adapters
Implements weeks 1-4 of the multi-provider architecture:

Week 1 - Foundation:
- Add domain models (AgentProvider, AgentRequest, AgentEvent, AgentResult)
- Define CodeAgent port interface with Execute, Cancel, Capabilities
- Create thread-safe provider registry with first-registered default

Week 2 - Claude Code Adapter:
- Extract kubectl exec logic into CodeAgent implementation
- Parse stream-json output format (init, message, tool_use, result)
- Support session continuation via --resume flag

Week 3 - OpenCode Adapter:
- HTTP/SSE client for opencode serve API
- Session management (create, send message, abort)
- Event streaming with documented buffer rationale

Week 4 - Quality & Polish:
- Fix race condition in OpenCode Cancel method
- Add AgentRequest.Validate() with ErrPromptRequired, ErrInvalidTimeout
- Document DefaultAvailabilityTimeout constants
- Add HTTP error context for debugging

Also includes:
- Work queue system with PostgreSQL adapter
- Credential store for infrastructure secrets
- Project templates with Woodpecker CI integration
- Comprehensive test coverage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 09:25:51 -07:00
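
The thread-safe provider registry with a first-registered default from commit 39df51defd could be sketched as follows. The `CodeAgent` interface here is a reduced, hypothetical stand-in with only a `Name` method (the commit lists Execute, Cancel, and Capabilities, whose signatures are not shown), and `Registry`, `NewRegistry`, and `stubAgent` are illustrative names.

```go
package main

import (
	"fmt"
	"sync"
)

// CodeAgent is a hypothetical, reduced stand-in for the real port interface.
type CodeAgent interface {
	Name() string
}

// Registry stores providers by name and remembers the first one registered.
type Registry struct {
	mu          sync.RWMutex
	providers   map[string]CodeAgent
	defaultName string
}

func NewRegistry() *Registry {
	return &Registry{providers: make(map[string]CodeAgent)}
}

// Register adds a provider; the first registration becomes the default.
func (r *Registry) Register(a CodeAgent) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.defaultName == "" {
		r.defaultName = a.Name()
	}
	r.providers[a.Name()] = a
}

// Get looks up a provider by name; an empty name selects the default.
func (r *Registry) Get(name string) (CodeAgent, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if name == "" {
		name = r.defaultName
	}
	a, ok := r.providers[name]
	return a, ok
}

// stubAgent is a trivial CodeAgent used only to exercise the registry.
type stubAgent string

func (s stubAgent) Name() string { return string(s) }

func main() {
	reg := NewRegistry()
	reg.Register(stubAgent("claude-code"))
	reg.Register(stubAgent("opencode"))
	if a, ok := reg.Get(""); ok {
		fmt.Println("default provider:", a.Name())
	}
}
```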
jordan
0fd4e32073 feat: Add infrastructure adapters for threesix.ai
Add Gitea, Cloudflare DNS, and Kubernetes deployer adapters following
hexagonal architecture. These enable automated project provisioning:
- Git repository creation/management via Gitea
- DNS record management via Cloudflare
- Container deployment to Kubernetes

Includes domain models, ports, handlers, and Woodpecker CI webhook
integration for automated deployments on push.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 22:49:58 -07:00
jordan
72d16929ca feat: Implement hexagonal architecture with services, webhooks, queue, and telemetry
Major refactoring to hexagonal (ports & adapters) architecture:

- Add service layer (apikey_service, project_service) for business logic
- Add webhook system with dispatcher and delivery tracking
- Add command queue with priority-based processing
- Add rate limiting with sliding window algorithm
- Add audit logging for command execution
- Add OpenTelemetry integration (traces, metrics, spans)
- Add circuit breaker for fault tolerance
- Add cached repository wrapper for performance
- Add comprehensive validation package
- Add Kubernetes client integration for pod management
- Add database migrations (allowed_ips, audit_log, rate_limiting, queue, webhooks)
- Add network policy and PodDisruptionBudget for k8s
- Remove legacy executor and projects/registry packages
- Untrack secrets.yaml (now managed via envault)
- Add coverage.out to .gitignore
- Add e2e test infrastructure with docker-compose
- Add comprehensive documentation (API, architecture, operations, plans)
- Add golangci-lint config and pre-commit hook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 19:57:46 -07:00
jordan
d2de49a591 feat: Add API key authentication with auto-migrations
Implements API key authentication for all rdev endpoints:

## Database (internal/db)
- Auto-migrating postgres connection
- Embedded SQL migrations via go:embed
- api_keys table with scopes, expiration, project restrictions

## Auth Package (internal/auth)
- Key generation: rdev_sk_<prefix>_<random> format
- Scopes: projects:read, projects:execute, keys:read, keys:write, admin
- SHA-256 key hashing (plaintext keys are never stored)
- Expiration options: 30d, 60d, 90d, 1y, never
- Middleware skips /health, /ready, /docs, /openapi.json

## Key Management API
- GET /keys - List keys (keys:read)
- POST /keys - Create key (keys:write)
- GET /keys/{id} - Get key details (keys:read)
- DELETE /keys/{id} - Revoke key (keys:write)

## Environment Variables
- DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, DB_NAME
- RDEV_ADMIN_KEY - Super admin key for bootstrapping

Version bumped to 0.5.0.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:26:26 -07:00
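
The key format and hashing described in commit d2de49a591 can be sketched as below. Only the `rdev_sk_<prefix>_<random>` shape and the use of SHA-256 come from the commit message; the byte lengths, hex encoding, and the function names `generateKey` and `hashKey` are assumptions for illustration.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// generateKey returns a plaintext key shaped like rdev_sk_<prefix>_<random>
// together with the SHA-256 hash that would be stored in place of the key.
// The 4-byte prefix and 24-byte random part are assumed sizes.
func generateKey() (plaintext, hash string, err error) {
	buf := make([]byte, 28)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	prefix := hex.EncodeToString(buf[:4])
	secret := hex.EncodeToString(buf[4:])
	plaintext = fmt.Sprintf("rdev_sk_%s_%s", prefix, secret)
	return plaintext, hashKey(plaintext), nil
}

// hashKey returns the hex-encoded SHA-256 digest of a key.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

func main() {
	key, hash, err := generateKey()
	if err != nil {
		panic(err)
	}
	// The plaintext is shown to the caller once; only the hash is persisted.
	fmt.Println("key:", key)
	fmt.Println("stored hash:", hash)
}
```

On each request, the middleware would hash the presented key with `hashKey` and compare the result against the stored value, so the plaintext never needs to be kept.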