Commit Graph

20 Commits

Author SHA1 Message Date
jordan
c59d348040 chore: prepare for composable monorepo template implementation
This commit captures the current state before implementing the composable
monorepo template system. Key changes included:

Infrastructure:
- Add CockroachDB provisioner adapter for database provisioning
- Add Redis provisioner adapter for cache provisioning
- Add build events system with PostgreSQL storage
- Add WebSocket endpoint for real-time build progress

Code agent improvements:
- Fix Claude Code adapter to use default allowed tools instead of dangerously-skip-permissions
- Add context-aware stream closing for cancellation support
- Improve parser tests for edge cases

Build system:
- Add build event constants and metrics
- Remove deprecated git_operations.go (replaced by pod_git_operations.go)
- Add rollback logic for multi-step provisioning operations

Documentation:
- Add composable-monorepo feature documentation
- Add DNS/Cloudflare service documentation
- Update deployment and troubleshooting guides

Cookbooks:
- Add fullstack-app cookbook
- Refactor landing-test with shared library

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 11:39:28 -07:00
jordan
910bcb62e1 fix: Sync build audit with work queue when stale tasks are requeued
When a worker dies mid-build, queue maintenance now updates both
work_queue and build_audit tables when requeuing stale tasks.
This prevents builds from showing "running" forever in the API.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 02:07:52 -07:00
jordan
e9984ebc07 fix: Include stderr and troubleshooting help in Claude Code errors
When Claude fails to execute, error messages now include:
- Captured stderr output from the failed command
- Troubleshooting commands to exec into pod and run `claude login`

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 23:12:01 -07:00
jordan
4a18b1cd07 fix: Persist build audit status when worker claims task
Root cause: WorkerService.ClaimTask() was modifying the audit entry
in memory but never persisting it to the database. This caused build
tasks to remain stuck at "pending" status even after being claimed.

Changes:
- Add UpdateStatus method to port.BuildAudit interface
- Implement UpdateStatus in postgres.BuildAuditRepository
- Fix ClaimTask to call audit.UpdateStatus() to persist status
- Add test coverage for audit update during task claim
- Update all mock implementations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 21:25:04 -07:00
jordan
34e72687e6 feat: Complete automation gaps for repeatable project deployments
- Initial K8s deployment auto-creation during project creation
- DNS record upsert support (create or update existing records)
- Ingress host management for domain aliases (AddIngressHost/RemoveIngressHost)
- Woodpecker deployer RBAC manifest for CI deploy steps
- Single-commit template seeding via Gitea bulk file API

Closes automation gaps exposed during www.threesix.ai launch:
- Projects now auto-create K8s Deployment/Service/Ingress on creation
- Domain aliases automatically update both DNS and K8s ingress
- CI deploy steps work without manual RBAC setup
- Template seeding triggers only one CI pipeline (not per-file)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 15:18:31 -07:00
jordan
4c41bc3a3f fix: Use cluster-issuer for TLS certs in project deploys
The deployer was using cert-manager.io/issuer (namespace-scoped)
referencing letsencrypt-threesix which only exists in the threesix
namespace. Projects deploy to the projects namespace, so changed to
cert-manager.io/cluster-issuer with letsencrypt-prod.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:29:34 -07:00
jordan
ee2c0d6482 fix: Use repo/tags format for Kaniko plugin (not destinations)
The destinations format caused Kaniko to push images with the full
registry URL as part of the repo path (registry.threesix.ai/name
instead of just name). Using registry + repo + tags format pushes
to the correct path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:07:49 -07:00
jordan
5a7b9342c6 fix: Use registry.threesix.ai instead of nonexistent zot.orchard9.ai
The templates referenced zot.orchard9.ai which has no DNS record.
The actual zot registry is at registry.threesix.ai. Also updated
static templates to use Kaniko plugin instead of docker:24-dind.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:01:48 -07:00
jordan
043cc8c63b fix: ensureNamespace uses Get-then-Create to avoid RBAC failures
The deployer was blindly calling Namespaces().Create() which triggered
cluster-scope RBAC checks even when the namespace already existed.
Now checks with Get() first and only creates if NotFound.

Also adds namespace get/create and secrets create/update/patch
permissions to the rdev-api-deployer ClusterRole.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 21:34:32 -07:00
jordan
41aca7813c fix: Use Woodpecker Kaniko plugin with destinations format
Switch from raw gcr.io/kaniko-project/executor:debug to
woodpeckerci/plugin-kaniko with destinations setting. Also use
npm install instead of npm ci (no lock file in templates) and
skip-tls-verify for self-signed registry certs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 21:23:28 -07:00
jordan
29696ec135 fix: Simplify Kaniko templates for anonymous zot registry
Zot is configured without authentication, so remove the auth
configuration step from templates. Added --insecure flag for
internal registry access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 18:47:39 -07:00
jordan
4d2076d144 feat: Update templates to use Kaniko for rootless builds
Replace Docker-in-Docker (privileged mode) with Kaniko for container
builds. This allows CI pipelines to run without requiring trusted
repo status in Woodpecker.

- astro-landing: Use Kaniko with from_secret for registry auth
- go-api: Use Kaniko with from_secret for registry auth
- default: Use Kaniko with from_secret for registry auth

Kaniko builds and pushes images without requiring privileged mode,
making it compatible with Woodpecker's default security settings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 18:44:24 -07:00
jordan
1ac8efa4c7 feat: Expose Woodpecker pipeline errors in API response
- Add CIPipelineError struct to domain with Type, Message, IsWarning fields
- Map Woodpecker Pipeline.Errors to domain.CIPipeline.Errors
- Fix migration 013: UUID type for project_id, cast id to text for MD5
- Remove invalid domain data migration (columns don't exist)
- Update release.sh with --deploy flag and migration support
- Fix test nil pointer: check errors in TestAPIKeyRepository_ProjectIDArrayHandling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 16:16:36 -07:00
jordan
c86516c53a feat: Add multi-domain support with auto-generated slugs for landing page cookbook
Landing page cookbook implementation (Weeks 1-4):

Domain Infrastructure:
- Add project_domains table with migration (013_project_domains.sql)
- Add ProjectDomain model with domain types (primary_auto, primary_custom, alias)
- Add SlugGenerator and ProjectDomainRepository interfaces
- Implement postgres adapters for domain and slug management

Service Layer:
- Add domain CRUD methods to ProjectInfraService
- Generate 8-char random slugs for auto-domains
- Support custom subdomains during project creation
- Add site_live health check to project status
- Trigger CI build after template seeding

Handler Updates:
- Add DomainService interface and adapter pattern
- Rewrite domain handlers to use database-backed service
- Add proper error handling for duplicate/missing domains

CI Integration:
- Add TriggerBuild to CIProvider interface
- Implement TriggerBuild in Woodpecker adapter
- Manually trigger initial build after template seed

Cookbook & Scripts:
- Add landing-test.sh script for E2E testing
- Add release.sh for version releases
- Add logs.sh for quick log access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 12:55:59 -07:00
jordan
bc47e426b0 feat: Add CI pipeline proxy, DNS alias management, and worker executor system
- Add ListPipelines/GetPipeline to CIProvider port with Woodpecker adapter
- Add DNS alias endpoints: GET/POST/DELETE /projects/{id}/domains
- Implement worker executor daemon, build executor, and git operations
- Add build service, worker service, and build audit tracking
- Add worker registry with PostgreSQL adapter and migration
- Add multi-provider code agent interface (Claude Code + OpenCode)
- Add create-and-build combo endpoint
- Update landing-page cookbook to reflect all gaps closed
- Fix tech debt: unified validation, auth scopes, error wrapping, slog patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 21:05:28 -07:00
jordan
39df51defd feat: Add multi-provider code agent interface with Claude Code and OpenCode adapters
Implements weeks 1-4 of the multi-provider architecture:

Week 1 - Foundation:
- Add domain models (AgentProvider, AgentRequest, AgentEvent, AgentResult)
- Define CodeAgent port interface with Execute, Cancel, Capabilities
- Create thread-safe provider registry with first-registered default

Week 2 - Claude Code Adapter:
- Extract kubectl exec logic into CodeAgent implementation
- Parse stream-json output format (init, message, tool_use, result)
- Support session continuation via --resume flag

Week 3 - OpenCode Adapter:
- HTTP/SSE client for opencode serve API
- Session management (create, send message, abort)
- Event streaming with documented buffer rationale

Week 4 - Quality & Polish:
- Fix race condition in OpenCode Cancel method
- Add AgentRequest.Validate() with ErrPromptRequired, ErrInvalidTimeout
- Document DefaultAvailabilityTimeout constants
- Add HTTP error context for debugging

Also includes:
- Work queue system with PostgreSQL adapter
- Credential store for infrastructure secrets
- Project templates with Woodpecker CI integration
- Comprehensive test coverage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 09:25:51 -07:00
jordan
812b8341be refactor: Split large files to comply with 500-line limit
- cmd/rdev-api/main.go: Extract OpenAPI spec to openapi.go (1073→386 lines)
- internal/adapter/deployer/deployer.go: Extract K8s resources to resources.go (502→264 lines)
- internal/handlers/infrastructure.go: Extract deploy handlers to infrastructure_deploy.go (592→342 lines)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 23:02:31 -07:00
jordan
0fd4e32073 feat: Add infrastructure adapters for threesix.ai
Add Gitea, Cloudflare DNS, and Kubernetes deployer adapters following
hexagonal architecture. These enable automated project provisioning:
- Git repository creation/management via Gitea
- DNS record management via Cloudflare
- Container deployment to Kubernetes

Includes domain models, ports, handlers, and Woodpecker CI webhook
integration for automated deployments on push.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 22:49:58 -07:00
jordan
72d16929ca feat: Implement hexagonal architecture with services, webhooks, queue, and telemetry
Major refactoring to hexagonal (ports & adapters) architecture:

- Add service layer (apikey_service, project_service) for business logic
- Add webhook system with dispatcher and delivery tracking
- Add command queue with priority-based processing
- Add rate limiting with sliding window algorithm
- Add audit logging for command execution
- Add OpenTelemetry integration (traces, metrics, spans)
- Add circuit breaker for fault tolerance
- Add cached repository wrapper for performance
- Add comprehensive validation package
- Add Kubernetes client integration for pod management
- Add database migrations (allowed_ips, audit_log, rate_limiting, queue, webhooks)
- Add network policy and PodDisruptionBudget for k8s
- Remove legacy executor and projects/registry packages
- Untrack secrets.yaml (now managed via envault)
- Add coverage.out to .gitignore
- Add e2e test infrastructure with docker-compose
- Add comprehensive documentation (API, architecture, operations, plans)
- Add golangci-lint config and pre-commit hook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 19:57:46 -07:00
jordan
538ea57ed4 feat: Add claude-config API, security hardening, and testing infrastructure
Claude Config API (v0.6):
- Add CRUD endpoints for commands, skills, and agents
- Commands/skills/agents stored in /workspace/.claude/ (per-project, in git)
- Credentials shared via PVC at /root/.claude/ (shared across pods)
- Use base64 encoding for file writes (prevents shell injection)
- Add content size limits (1MB max)

Security Hardening:
- Add sanitize package for command/prompt validation
- Add rate limiting middleware (token bucket algorithm)
- Add concurrent command limiting
- Add input sanitization to all command handlers
- Gitignore secrets.yaml and credentials.yaml
- Add *.example templates for secrets

Testing Infrastructure:
- Add testutil package with mocks and fixtures
- Add unit tests for auth package (63% coverage)
- Add unit tests for executor (47% coverage)
- Add handler integration tests (40% coverage)
- Add 100% coverage for sanitize, cmdlimit packages
- Add 96% coverage for ratelimit package

Infrastructure:
- Shared Claude credentials PVC (ReadWriteMany)
- Reduced workspace PVC size from 20Gi to 5Gi
- Add init container cleanup before git clone
- Document Longhorn RWX requirements

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 01:29:13 -07:00