Commit Graph

55 Commits

Author SHA1 Message Date
jordan
32d50a6952 feat: make infra provisioning idempotent + aeries-daeya public discovery feed
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Make postgres and redis provisioning idempotent: return success when already
  provisioned with credentials stored, allowing cookbook trees to safely include
  explicit add-db/add-redis steps alongside auto-provisioned project creation
- Update tests to reflect new idempotent behavior
- Consolidate docs CI into single multi-stage Docker build (remove separate
  build-docs step; Dockerfile.nginx now builds Slate then serves with nginx)
- Delete redundant skeleton docs/Dockerfile (replaced by multi-stage nginx image)
- Add watch verb to woodpecker-deployer RBAC (required by kubectl rollout status)
- Aeries Daeya cookbook: add public discovery feed (/) + character profiles (/c/:handle),
  characters.published/handle/tagline fields, dark pink design system, /studio/* routes,
  verify-public-discovery + verify-otp-endpoint smoke test steps
- Fix Input.tsx: remove non-existent --border-hover CSS variable hover effect
2026-02-28 17:32:21 -07:00
jordan
e42c18a9a3 feat: add session web UI mode + aeries-daeya cookbook tree
Session WebUI:
- Add `web_ui` flag to session create — launches claude-code-ui in pod on port 3001
- Install @siteboon/claude-code-ui in claudebox Dockerfile, expose port 3001
- Migration 027: add web_ui column to sessions table
- startWebUI/stopWebUI fire-and-forget helpers in SessionsHandler
- Service selects preview port 3001 (web UI) vs 8080 (sidecar) based on flag

Aeries Daeya cookbook:
- Add cookbooks/trees/aeries-daeya.yaml: privacy-first avatar social platform
  (infra → avatar data model → AI generation pipeline → studio UI)
- Add cookbooks/scripts/aeries-daeya-test.sh: run/status/diagnose/teardown harness
- Fix race condition in common.sh wait_for_pipeline: detect already-running pipelines
  at startup and track directly instead of waiting for a newer one

Docs/tooling:
- Add SDK Update Workflow section to CLAUDE.md
- Add `make sdk` and `make sdk-check` targets for OpenAPI spec management

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 23:14:08 -07:00
jordan
ddcfe52b5c feat: implement shared notify host model for platform email delivery
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Replace per-project notify host provisioning (7-9 API calls + DNS + async
Resend verification) with a shared platform host for all *.threesix.ai projects.

Under the new model:
- CreateProjectNotify: 3 calls only (account + send key + host grant)
- No per-project Resend domain, DNS records, or async verification
- All *.threesix.ai projects share `threesix.ai` as the platform host
- Custom domains still get a dedicated host via ReprovisionNotifyHost

Changes:
- domain/notify.go: slim NotifyCredentials (no Host/From/ResendDomainID);
  add NotifyHostCredentials for reprovision return path
- port/notify_provisioner.go: update interface signatures and docs
- adapter/notify/provisioner.go: rewrite CreateProjectNotify (3 steps);
  rewrite DeleteProjectNotify (account-only vs full cleanup)
- adapter/notify/provisioner_reprovision.go: return *NotifyHostCredentials
- adapter/notify/provisioner_test.go: update tests for new model
- service/project_infra_crud.go: store only NOTIFY_API_KEY on provision
- domain/credential.go: add CredKeyNotifySharedHost/CredKeyNotifySharedFrom
- cmd/rdev-api/config.go: add NotifySharedHost/NotifySharedFrom to InfraConfig
- service/component.go: add notifySharedHost/notifySharedFrom + WithNotifyDefaults
- service/component_deploy.go: inject shared host defaults when no custom host stored
- handlers/notify.go: handle shared-host projects in Reprovision guard;
  add WithSharedNotifyHost builder
- cmd/rdev-api/main.go: wire SharedHost to provisioner, component service,
  and notify handler

Bootstrap: NOTIFY_SHARED_HOST=threesix.ai and NOTIFY_SHARED_FROM=noreply@threesix.ai
stored in credential store (host id=1 already provisioned with Resend provider).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 17:04:11 -07:00
jordan
17240f4efd fix(rc-5): add Redis ACL persistence + cache reprovision endpoint
## Changes

### port.Deployer interface
- Add PatchProjectSecrets(ctx, projectName, patch) to merge key-value pairs
  into all K8s secrets labeled project={projectName}
- Add RestartAll(ctx, projectName) to trigger rolling restart of all deployments
  for a project, picking up fresh secrets without waiting for CI

### deployer adapter
- Implement PatchProjectSecrets: lists secrets by label, merges patch into Data,
  writes each secret back
- Implement RestartAll: lists deployments by label, sets restartedAt annotation

### domain/credential.go
- Add CredentialCategoryCache = "cache" constant
- Use constant in component_infra.go (was raw string "cache")

### handlers/cache.go (new)
- POST /projects/{projectID}/cache/reprovision
- Calls CreateProjectCache (which handles delete+recreate with new password)
- Updates credential store (REDIS_URL, REDIS_URL_STAGING, REDIS_PREFIX)
- Patches all K8s secrets for the project immediately
- Triggers RestartAll so pods pick up new credentials without waiting for deploy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 20:22:31 -07:00
jordan
3dbde72966 feat: add claude_id tracking and session improvements for interactive dev
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add claude_id field to sessions (migration 026) for tracking Claude
  process IDs across pod restarts
- Extend session repository with UpdateClaudeID and session lookup methods
- Improve kubernetes executor with better error handling and exec streaming
- Add claudebox client/server improvements for session lifecycle
- Expand sessions handler with exec streaming endpoint
- Add comprehensive tests for sessions and kubernetes executor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 00:20:32 -07:00
jordan
fa0d030def feat: improve notify domain verification reliability and add status endpoints
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add verifyWithRetry to provisioner: 60s initial DNS propagation delay,
  5 retries with 30s backoff before marking verification as failed
- Add GetNotifyDomainStatus: polls Resend API for domain verification status,
  returns "not_configured" when Resend not set up
- Add VerifyProjectNotify: synchronous re-verification for handler use
- Add getDomainStatus to resendAPI interface + resendClient implementation
- Add NotifyDomainStatus domain struct (host, resend_domain_id, status)
- Guard NOTIFY_RESEND_DOMAIN_ID storage against empty string writes
- New handler: GET /projects/{id}/notify/status (returns verification state)
- New handler: POST /projects/{id}/notify/verify (triggers re-verification)
- Add verify-notify-domain cookbook step to persona-community,
  slackpath-1, and slackpath-4 trees (polls status for up to 6 min)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 16:25:55 -07:00
jordan
2612de8446 fix: inject SERVICE_NAME/SERVICE_PORT for app components in batch path
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
AddComponentBatch was missing the SERVICE_NAME injection that AddComponent
has. When an app-react component (e.g. creator-ui) was rendered via the
batch endpoint alongside a service component, {{SERVICE_NAME}} in App.tsx.tmpl
was never substituted — rendering the literal string into the repo.

Fix: scan the batch's own code requests for a service component first
(since the service isn't in the DB yet during batch processing), then
fall back to findFirstServiceComponent from DB.

This is the same AddComponent vs AddComponentBatch parity gap that caused
the JWT_SECRET issue (RC-2). The auth API URL in every app-react project
was broken when deployed via the batch endpoint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 11:52:03 -07:00
jordan
d91bfc50fa fix: handle missing Redis credentials and redis.Nil in provisioner
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Two bugs fixed:

1. redis.Nil not handled in GetProjectCache: When ACL GETUSER returns nil
   (user doesn't exist), go-redis represents this as redis.Nil error. The
   provisioner only checked for err.Contains("User") which didn't match,
   causing spurious "get ACL user: redis: nil" errors on re-provision.

2. provisionRedis returns 409 even when REDIS_URL not in credential store:
   If the Redis ACL user exists but REDIS_URL was never stored (e.g., due
   to a failed previous run or lost state), the service would permanently
   refuse to provision, leaving the project without usable Redis credentials.
   Now checks the credential store: if REDIS_URL exists → true 409 duplicate;
   if REDIS_URL missing → re-provision (CreateProjectCache resets the password).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 05:19:00 -07:00
jordan
a593605caa fix: call ensureProjectJWTSecret in AddComponentBatch
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
AddComponent (single-component path) already calls ensureProjectJWTSecret,
but AddComponentBatch has its own parallel implementation that bypassed it.
Components added via the /batch endpoint never had JWT_SECRET provisioned,
causing CrashLoopBackOff on startup ("JWT_SECRET must be set").

Add the call before the createInitialComponentDeployment loop so the secret
is stored in the credential store before K8s Secrets are created.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 05:12:26 -07:00
jordan
3247ce3ca0 fix: worker deployments and JWT_SECRET auto-provisioning
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
RC-1: Workers now get a Kubernetes Deployment on component creation.
NeedsPort() (port assignment) was incorrectly used to gate Deployment
creation - workers have no HTTP port but still need a Deployment so
CI `kubectl set image` can succeed. Added NeedsDeployment() returning
true for service/worker/app-react/app-astro/app-nextjs. AddIngressPath
is now guarded by port > 0 so workers don't attempt HTTP routing.

RC-2: JWT_SECRET is now auto-provisioned per-project when the first
code component is added. The skeleton service template fatally requires
JWT_SECRET at startup; previously fetchProjectCredentials() never fetched
it. ensureProjectJWTSecret() generates a cryptographically random 32-byte
secret, stores it as "{projectID}:JWT_SECRET", and JWT_SECRET is now
included in projectScopedKeys so it's injected into every deployment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 03:42:53 -07:00
jordan
4f01015132 feat: implement project access enforcement and management API
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Fix no-op RequireProjectAccess middleware to enforce project_ids
- Apply project access middleware to all project-scoped routes
- Filter GET /projects by allowed project IDs for restricted keys
- Add GET /me endpoint with key identity, scopes, and project access info
- Add PATCH /keys/{id} for partial key updates (name, scopes, project_ids, allowed_ips, expires_in)
- Add GET/POST/DELETE /projects/{id}/access for project-centric access management
- Auto-grant creating key access when using POST /project/create-and-build
- Accept grant_to_key_ids in create-and-build to grant multiple keys on project creation
- Move newProvisionerWithDeps test helper from production code to test file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 15:38:37 -07:00
jordan
0f25bd8dbe feat: hook in notify service for per-project email delivery
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add NotifyProvisioner (port + adapter) using real notify admin API
- Create notify account + send key + host grant per project
- Inject NOTIFY_API_KEY/HOST/FROM into component deployments
- Store NOTIFY_URL, NOTIFY_ADMIN_KEY, RESEND_API_KEY in credential store
- Add setup-notify.sh for one-time host/provider/domain setup
- Add NOTIFY_ADMIN_KEY constant to domain/credential.go
- Wire provisioner in main.go with connection test guard
- Add .claude/guides/services/notify.md and CLAUDE.md entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 00:30:32 -07:00
jordan
a8c8a0a14d feat: add GCS-based persistent media storage, AI generation pipeline, and composable skeleton packages
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Adds complete media storage pipeline with GCS presigned uploads, AI image/video/text generation
via queue-based workers, realtime SSE event streaming, and comprehensive skeleton packages
(storage, mediagen, textgen, generation, realtime, persona, routing, ai-client). Includes
security fixes for media delete authorization, nil pointer guards in handlers, video persistence
via download-then-upload, consistent signed URLs, and Image→ImageIcon rename to avoid DOM collision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:29:09 -07:00
jordan
7249575dea feat(sessions): add command execution endpoint and activity tracking
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add POST /sessions/:id/exec endpoint for executing commands in sessions
- Add session activity tracking (last_activity_at timestamp)
- Add database migration 024 for session activity column
- Add comprehensive tests for session handlers and service layer
- Add wildcard TLS certificate for preview.threesix.ai subdomain
- Add infrastructure mocks for testing preview service
- Refactor preview cleanup logic to remove unused methods
- Add AIOS core documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-13 08:41:05 -07:00
jordan
84af398d85 refactor: add timeout constants for agent execution tiers
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Add TimeoutAgentExecution (22m) to handlers for synchronous SDLC
execution, and TimeoutAgent{Default,Medium,Heavy} (12/22/47m) to
workers for tiered agent task execution. Aligns with SDLC action
complexity tiers and prevents inline duration literals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-11 10:48:24 -07:00
jordan
542bc722ab fix(architect): handle missing projects in repo, add cookbook hooks/validation
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
The architect API returned "failed to start conversation" because
projectRepo.Get() failed — the in-memory K8s repo watches the rdev
namespace but projects deploy to the projects namespace. Made project
lookup non-fatal with fallback to default pod. Added error logging to
all architect handler methods (were silently swallowing errors).

Also adds setup-hooks, commit-after-qa, and pre-merge-validate steps
to the foundary cookbook tree for git hooks and code quality gates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 02:25:40 -07:00
jordan
c68fadbccd fix(architect): add pod_name to agent requests, rewrite foundary cookbook
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
The architect service was missing pod_name/namespace in AgentRequest
metadata, causing Claude Code adapter to reject all requests. Added
ArchitectServiceConfig with pod resolution (project PodName → default
claudebox-0). Removed silent JSON fallback in extractSpecFromMessages
that masked errors.

Rewrote foundary cookbook from 90-step SDLC flow to focused 25-step
cookbook using natural language build prompts instead of /slash-commands
that claudebox cannot execute. Added "no fallbacks" rule to CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 01:24:34 -07:00
jordan
a9ad3d8304 chore: accumulated platform hardening and CI fixes
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
CI / Woodpecker:
- Add explicit depends_on to all .woodpecker.yml steps (rdev + templates)
- Fix skip_tls_verify -> skip-tls-verify (correct Kaniko flag name)
- Add replicasets get/list to deployer RBAC for rollout status
- Skeleton template: add failure:ignore on docs steps, Traefik TLS
  annotations on ingress, depends_on on verify step

Component templates:
- Fix container name in deploy steps (PROJECT_NAME-COMPONENT_NAME)
- Replace kubectl scale with kubectl patch for replicas
- Add post-deploy image verification and rollout status checks
- Applied consistently across all 5 component templates

Adapters:
- gitea: Add HTTP client timeout (30s), context cancellation checks,
  handle 404 on GetRepo/DeleteRepo
- zot: Add retry with exponential backoff (doWithRetry), limit response
  body reads to 10MB
- cockroach: Use net.JoinHostPort for IPv6-safe DSN construction
- woodpecker: Fix error wrapping (%v -> %w)
- redis: Fix error wrapping (%v -> %w)
- deployer: Add context cancellation checks

Services:
- apikey_service: Fix error wrapping (%v -> %w)
- component_deploy: Fix error wrapping (%v -> %w)
- project_infra: Fix error wrapping (%v -> %w)
- webhook/dispatcher: Fix error wrapping (%v -> %w)

Other:
- CLAUDE.md: Add guide links for Gitea, Go 1.25, Woodpecker v3,
  Traefik v3, Zot registry
- circuitbreaker: Add test for error wrapping
- docs: Update deployment, troubleshooting, and runbook docs
- health: Fix error wrapping (%v -> %w)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 23:16:56 -07:00
jordan
b7d0e84946 fix(deploy): create component deployments with 0 replicas to prevent ImagePullBackOff
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Components are scaffolded before CI builds their images. Previously deployments
started with 1 replica, causing ImagePullBackOff until the first build completed.
Now deployments start at 0 replicas; CI deploy steps scale to 1 after verifying
the image exists in the registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 10:16:14 -07:00
jordan
9f957d6e75 fix(templates): harden component CI steps and compile regexes
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add --connect-timeout 10 and --max-time 15 to all verify step curl
  calls to prevent hanging on registry health checks
- Fix cli template: depends_on [deps] -> [preflight] for consistency
- Add cross-reference comment to service template about verify logic
  being replicated across all 5 component templates
- Document component CI step rules in composable-monorepo.md
- Compile regexes at package level instead of per-call in
  component_updates.go
- Add component_updates_test.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 19:36:23 -07:00
jordan
9226454b85 feat: label-based undeploy, GC reconciliation, checkout/sessions, pool status
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add UndeployAll() using label selectors to clean up monorepo components
  on project deletion (replaces name-based Undeploy in DeleteProject and
  the direct undeploy handler)
- Add ResourceGC background worker that periodically finds K8s resources
  whose project label has no matching DB record, deletes after 1h safety
  window
- Widen deployer client type from *kubernetes.Clientset to
  kubernetes.Interface for testability
- UndeployAll accumulates errors via errors.Join instead of failing fast
- Add checkout/checkin sidecar dev flow: temporary git tokens, branch
  checkout, review on checkin with cleanup workers
- Add interactive sessions: pod binding, command execution, SSE streaming,
  ephemeral preview URLs with session cleanup workers
- Add GET /workers/pool endpoint for aggregate capacity and queue depth
- Add sessions:read and sessions:execute auth scopes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 19:11:28 -07:00
jordan
a69eb7e587 feat(foundary): implement complete backend for conversational project design
Implements all 5 phases of Foundary Studio backend:

Phase 1: Chat Persistence (8 API endpoints)
- Conversations and messages with proper cascading deletes
- PostgreSQL schema with auto-update triggers
- Full CRUD operations with structured logging

Phase 2: Blueprint Entity (5 API endpoints)
- JSONB spec storage with GIN indexes
- Flexible structured data for project specifications
- Version-controlled blueprint management

Phase 3: Architect Service (3 API endpoints)
- Conversational AI orchestration with Claude
- Multi-turn dialogue with context building
- Blueprint spec extraction from conversations

Phase 4: Work Queue Integration
- Verified existing endpoint compatibility

Phase 5: Structured Questions (6 API endpoints)
- Four question types: text, choice, multichoice, yesno
- Answer validation with proper constraints
- Conversation-linked Q&A flow

Architecture:
- Textbook hexagonal architecture (domain → port → adapter → service → handler)
- Zero external dependencies in domain layer
- Consistent error handling with proper wrapping
- Auth scopes on all routes (projects:read, projects:execute)
- Structured logging with operation context and duration tracking
- NULL-safe DTO converters throughout

Database:
- 3 new migrations (019, 020, 021)
- UUIDs for all primary keys
- Proper foreign key constraints with ON DELETE CASCADE
- Optimized indexes including partial index for unanswered questions
- Auto-update triggers for timestamps

OpenAPI Documentation:
- Complete API documentation under 'Foundary' tag
- 22 new endpoints documented with examples
- Request/response schemas for all operations

Logging Improvements:
- Added operation field to all service logs
- Added duration_ms tracking for performance monitoring
- Log response_length instead of full response content
- Consistent use of logging field constants
- Execute-then-log pattern for delete operations

Files: 32 changed, 2800+ lines added
- 7 domain models
- 3 database migrations
- 3 port interfaces
- 3 postgres adapters
- 4 services (conversation, blueprint, question, architect)
- 4 handlers with DTOs
- OpenAPI documentation
- Integration in main.go

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-09 00:50:46 -07:00
jordan
adcea2fc1f fix(templates): upgrade Go to 1.25 and fix Woodpecker syntax
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
## Template Version Alignment
- Go: 1.23 → 1.25 across all templates (go.work, go.mod, Dockerfiles, CI)
- Alpine: latest → 3.19 (explicit version pinning)
- Woodpecker: failure:retry → failure:ignore (invalid syntax fix)

## SDLC Tree Fixes (slackpath-5-full-lifecycle)
Fixed merge failures by correcting lifecycle flow:

1. **Branch Creation**: Added missing create-branch step (planned → ready)
   - Bug: Merge command requires feature.Branch field to be set
   - Fix: POST /projects/{id}/sdlc/features/{slug}/branch

2. **Artifact Status**: Changed approval to pass for execution artifacts
   - Bug: Review/audit/QA need status="passed" not "approved"
   - Fix: /artifacts/{type}/approve → /artifacts/{type}/pass
   - Added: pass-qa step after wait-qa

3. **Phase Transition Order**: Reordered merge phase transition
   - Bug: Merge command checks if phase == "merge" first
   - Fix: transition-to-merge BEFORE merge-feature (not after)

## GCS Provisioner Fix
- Replaced deprecated option.WithCredentialsFile with env var approach
- Now uses GOOGLE_APPLICATION_CREDENTIALS for ADC (Application Default Credentials)
- Avoids security risk from deprecated credential options
- Fixed test: Added ComponentTypeGCS to ValidComponentTypes test

## Critical Rules Added
- Version alignment: All template versions must stay in sync
- When updating versions, grep entire templates/ tree

## Files Changed
- 27 template files: Go version + Woodpecker syntax
- 1 tree file: SDLC lifecycle flow corrections
- 1 CLAUDE.md: Version alignment rule
- 1 GCS provisioner: Deprecated API fix
- 1 test file: Added missing component type

Root cause: Skeleton templates lagged behind Go 1.25 release and had
invalid Woodpecker syntax. SDLC tree skipped required branch creation
and used wrong artifact approval endpoints.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 23:57:38 -07:00
jordan
4486042155 fix(registry): delete container images on project teardown
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Root cause of DIGEST_INVALID errors was registry disk exhaustion.
Project teardown wasn't cleaning up container images, causing the
registry PVC to fill up over time.

Changes:
- Add RegistryProvider port interface for registry operations
- Extend zot.Client with DeleteProjectRepositories method
- Wire registry provider into ProjectInfraService
- Delete images during DeleteProject cleanup (step 4)

The zot client uses the OCI distribution API:
- Lists all repos, filters by project prefix
- Gets manifest digests via HEAD request
- Deletes manifests by digest to trigger GC

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 02:56:18 -07:00
jordan
f20fc6c51c feat(saga): implement enterprise-grade resilience architecture
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Fixes issues from code review of resilience implementation:

- Wire saga system in main.go (SagaRepository, SagaExecutor, SagaHandler)
- Fix CompletedSteps() to include skipped steps for dependency resolution
- Fix reverse loop bug in saga compensation (use standard swap pattern)
- Add circuit breaker state change callbacks for Prometheus metrics

Phase 1 (Build Resilience):
- Add failure:retry to all component Kaniko build steps
- Add preflight registry health check before builds
- Add services-deployed sync point to decouple docs from critical path

Phase 2 (API Resilience):
- Add pipeline retry endpoint (POST /projects/{id}/pipelines/{number}/retry)
- Wire circuit breakers with metrics callbacks
- Add /health/circuits endpoint for circuit breaker status

Phase 3 (Saga Engine):
- Full domain model (Saga, SagaStep, RetryPolicy, BackoffType)
- PostgreSQL saga repository with CRUD and step management
- Saga executor with retry, compensation, skip step support
- Saga API handlers with CRUD and control operations

Phase 4 (Observability):
- Add saga metrics (total, step_duration, retry, circuit_breaker_state)
- Add logging fields (saga_id, saga_name, step_name)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 01:58:02 -07:00
jordan
863dfd3214 fix: skip root deployment for empty template (defaults to skeleton)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When req.Template is empty, it defaults to 'skeleton' but the check
in createInitialDeployment only matched 'skeleton' explicitly, not
empty string. This caused a broken deployment to be created for
monorepo projects with a non-existent image.

Root cause: slackpath-5 creates project with empty template, which
defaults to skeleton, but createInitialDeployment was still creating
a root deployment that references registry.threesix.ai/{project}:latest
which never gets built (skeleton has no root Dockerfile).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 19:32:19 -07:00
jordan
af91bad0ff feat: add Slate documentation templates to skeleton
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Adds complete Slate documentation infrastructure to generated projects:
- docs/ directory with Gemfile, config.rb, and source templates
- Dockerfile for building docs site
- Dockerfile.nginx for serving static docs
- generate-docs.sh script for CI integration
- Claude command for AI-assisted docs generation
- OpenAPI → Slate markdown conversion via widdershins

Also includes:
- --export-openapi flag for service binaries
- DNS provisioning for docs.{domain} subdomain
- Updated project_infra for docs DNS records

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:06:36 -07:00
jordan
d74efb75ff fix: wire workService to WorkersHandler and add /work/tasks endpoint
Critical fix: WorkersHandler was missing workService dependency, causing
500 errors when workers tried to fail tasks. This caused tasks to get
stuck in "running" state permanently.

Also adds:
- /work/tasks endpoint for debugging all tasks across projects
- List method to WorkQueue interface for admin views
- HTTP client tests for api_client.go and claudebox/client.go (48 tests)
- Split work.go DTOs into work_dto.go to stay under 500 lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 10:35:39 -07:00
jordan
853ec4cf81 fix: go.work race condition with batch components and idempotent provisioning
Three coordinated fixes for CI pipeline race conditions:

1. Woodpecker step dependencies: Added depends_on: [deps] to all 6 component
   templates (service, worker, cli, app-astro, app-react, app-nextjs) so build
   steps wait for go work sync to complete.

2. Idempotent resource provisioning: Modified provisionResources() to check
   for existing database/cache before creating, preventing "already exists"
   errors on component re-adds.

3. Batch component endpoint: POST /projects/{id}/components/batch enables
   atomic multi-component additions in a single git commit. Validates all
   components upfront, provisions infra sequentially, commits code components
   atomically.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:31:40 -07:00
jordan
1e853980e4 feat: inject provisioned credentials into component deployments
Components now automatically receive DATABASE_URL, REDIS_URL, and other
infrastructure credentials when deployed. Previously, credentials were
provisioned and stored but never injected into K8s deployments.

Changes:
- Add fetchProjectCredentials() to component_deploy.go
- Populate spec.Secrets before calling deployer.Deploy()
- Fix slackpath-4 to provision postgres + redis before services
- Add terminology docs to clarify platform vs skeleton code

This completes the infrastructure provisioning flow:
1. add-db → provisions CockroachDB, stores DATABASE_URL
2. add-service → deploys with DATABASE_URL in environment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:09:15 -07:00
jordan
53862c773b fix: resolve systemic debt in worker and skeleton templates
Worker template fixes:
- Replace panic() with logger.Error() + os.Exit(1) for config errors
- Remove double-timeout application (context + middleware)
- Add error message truncation to prevent log bloat
- Use named constants for shutdown grace period and stale check interval

Skeleton pkg/auth fixes:
- Fix error wrapping to use %w consistently in jwt.go
- Add GetUserOrError() as safe alternative to MustGetUser() panic

Skeleton pkg/queue fixes:
- Check RowsAffected() errors instead of ignoring them
- Add input validation to EnqueueWithOptions (require job type, cap retries)
- Add log truncation for error messages
- Fix inaccurate doc comment claiming exponential backoff

Worker timeout consolidation:
- Add internal/worker/timeouts.go with named constants
- Migrate all workers to use timeout constants

Cleanup:
- Remove obsolete slack-preparation-thoughts.md files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 23:44:55 -07:00
jordan
d69da6d627 feat: add structured logging infrastructure and SDLC extensions
Major changes:
- Add internal/logging package with field constants, context propagation,
  sensitive data auto-redaction, and per-component log levels
- Add worker timeout constants (TimeoutQuickOp, TimeoutHealthCheck, etc.)
- Extend SDLC with callback handlers, generate endpoints, and executor
- Add new cookbook trees for aeries and slackpath progression
- Add skeleton templates for queue, realtime, and microservices
- Add worker component template with async job processing
- Refactor services and handlers to use new logging infrastructure
- Split component.go into component_infra.go and component_listing.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:56:04 -07:00
jordan
1790afd0ee feat: add path-based ingress management for component lifecycle
Adds AddIngressPath and RemoveIngressPath to the Deployer interface
for managing per-component ingress rules in monorepo projects.

- Implement conflict retry logic for concurrent ingress updates
- Add K8s client interface for testability
- Add comprehensive unit tests for ingress path operations
- Add component deployment and teardown methods to ComponentService
- Update service templates with OpenAPI spec improvements
- Add evolving-app cookbook tree for reference
- Split resources.go into resources_ingress.go for path-based routing
- Split component.go into component_deploy.go for deployment helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:31:50 -07:00
jordan
b093a4b26d feat: implement Visual Verification API layer (Week 2)
Add REST API endpoints for submitting visual verification tasks,
tracking progress via SSE, and retrieving screenshot/video artifacts.

Changes:
- Add ScopeVerifyRead/ScopeVerifyWrite auth scopes
- Create VerifyService for task submission and lifecycle management
- Create VerifyHandler with POST/GET/DELETE/SSE endpoints:
  - POST /verify - Submit capture task
  - GET /verify/{taskId} - Get task status and artifacts
  - GET /verify/{taskId}/stream - SSE progress stream
  - DELETE /verify/{taskId} - Cancel pending task
  - GET /projects/{id}/verify - List verify tasks
- Wire VerifyExecutor in main.go for Playwright pod execution
- Fix work.go validation to include "verify" task type
- Add comprehensive handler tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:29:40 -07:00
jordan
210064d490 feat: add diagnostics endpoint and external health monitoring
- Add /diagnostics endpoint for system health overview
- Add external health worker for monitoring Gitea, Woodpecker, Registry
- Add health check methods to Gitea and Woodpecker clients
- Remove hardcoded fallback projects (pantheon, aeries)
- Add diagnostics domain types and service layer
- Add comprehensive tests for diagnostics handler and service
- Fix tests to use registered test project instead of hardcoded one

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:10:56 -07:00
jordan
b5fdf35f1b feat: add WorkerService.FailTask for audit updates + visual verification scaffolding
- Add FailTask to WorkerService to update build_audit on failure path
  (fixes bug where audit showed "running" when task actually failed)
- Add WorkServiceFailer interface to avoid circular dependency
- Add VerifyExecutor with Playwright-based visual verification
- Add verify domain types (VerifySpec, VerifyResult, screenshot capture)
- Wire VerifyExecutor placeholder into WorkExecutor (impl in Week 2)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 00:09:16 -07:00
jordan
cfba724f8a feat: add work task error classification and user-facing error codes
- Add WorkErrorCode type with RATE_LIMITED, AUTH_FAILED, TIMEOUT, STALE_WORKER, AGENT_ERROR, INVALID_SPEC
- Add ClassifyAgentError function to detect error patterns from stderr
- Add error_code column to work_queue table (migration 016)
- Add FailWithCode method to WorkQueue interface and implementations
- Update RequeueStaleWithIDs to mark permanently failed tasks with STALE_WORKER
- Add ErrorCode to BuildResult for API responses
- Update work executor to classify errors before failing tasks

This enables users to see actual failure reasons (e.g., "RATE_LIMITED") instead of
builds stuck in "running" state forever when Claude hits rate limits.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 00:07:34 -07:00
jordan
6e8f5821af feat: add artifact pass/fail/needs-fix lifecycle for SDLC execution phases
- Add pass/fail/needs-fix CLI commands to cmd/sdlc/cmd_artifact.go
- Add 3 new methods to SDLCExecutor interface in internal/port
- Implement methods in kubernetes adapter
- Add service methods to SDLCService
- Add HTTP handlers for POST .../artifacts/{type}/pass|fail|needs-fix
- Update 6 skeleton commands to evaluate and set artifact status
- Update test mocks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 22:14:53 -07:00
jordan
572b221e20 feat: add automatic cleanup for cookbook test projects
- Add AUTO_TEARDOWN env var and --auto-teardown flag to cookbook scripts
- Scripts automatically delete created projects on exit (including Ctrl+C)
- Add DELETE /projects/cleanup API endpoint for bulk cleanup
- Supports shell-style glob patterns (e.g., "tree-test-*")
- Includes dry_run mode and older_than_hours filter for safety
- Requires admin scope for actual deletion
- Update cookbook scripts: landing-test, composable-test, template-validation,
  feature-test, tree-runner

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 17:54:15 -07:00
jordan
56e3f83955 feat: add auth scopes, OpenAPI docs, SDLC guides, and code quality improvements
- Add auth.RequireScope() to all handler routes for proper authorization
- Add SDLC OpenAPI endpoint documentation (state, features, tasks, branches, merge, archive, orchestrator)
- Add SDLC documentation guides (getting-started, cli-reference, api-reference, command-catalog)
- Add artifact_test.go for SDLC artifact coverage
- Add CLAUDE.md rules: auth scopes requirement, error wrapping with %w
- Fix error wrapping to use %w instead of %v throughout codebase
- Improve CLI merge command with conflict detection and resolution
- Fix handler tests to include auth middleware for RequireScope
- Add cookbook tree runner scripts for automated testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 13:55:50 -07:00
jordan
f22b220c6d feat: add SDLC branch management, merge, archive, and orchestrator APIs
Add branch lifecycle commands (branch, merge, archive) to the SDLC CLI.
Introduce orchestrator handler and service for multi-step SDLC workflows.
Expand skeleton template with 15 Claude commands covering the full feature
lifecycle. Extend classifier rules, error types, and executor port for
branch operations. Split rules.go and classifier_test.go to stay within
500-line limit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 12:30:03 -07:00
jordan
425ef0f806 feat: add SDLC orchestration - library, CLI, and API integration
Implements deterministic feature lifecycle management for agent-driven
development. Agents use the CLI in pods; operators control via REST API.

Library (internal/sdlc/):
- Feature lifecycle with 10 phases (draft → released)
- Classifier engine with priority-ordered rules
- Artifact tracking with approval workflow
- Task management within features
- YAML-based state persistence

CLI (cmd/sdlc/):
- init, state, next, feature, artifact, task, query commands
- --json flag for machine-readable output
- Runs inside project pods

API (21 endpoints under /projects/{id}/sdlc/):
- State: GET /state, GET /next
- Features: CRUD + transition/block/unblock
- Artifacts: approve/reject per type
- Tasks: add/start/complete/block
- Queries: blocked/ready/needs-approval

Architecture:
- Port: SDLCExecutor interface (internal/port/)
- Adapter: kubectl exec into pods (internal/adapter/kubernetes/)
- Service: pod resolution + logging (internal/service/)
- Handlers: 5 files under 500-line limit (internal/handlers/)

Also includes template upgrades (chassis framework, UI components,
OpenAPI helpers, backend/frontend guides) and component improvements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 09:57:05 -07:00
jordan
c280a92012 feat: add operations audit system and template improvements
Operations Audit (new feature):
- Add Operation domain model with status tracking (pending, running, completed, failed, cancelled)
- Add OperationRepository with PostgreSQL implementation
- Add OperationService for CRUD and lifecycle management
- Add operations handlers (list, get, cancel endpoints)
- Add migration 015_operations.sql for operations table
- Add operation cleanup worker for stale operation handling
- Add ErrOperationNotFound to domain errors

Template Improvements:
- Add CLAUDE.md configuration files to astro-landing, default, and go-api templates
- Fix PORT template variable usage in nginx configs for app templates
- Add replace directives for local pkg module in Go templates
- Simplify Go service/worker Dockerfiles for workspace builds
- Fix TypeScript error in logger template

Other:
- Refactor landing-test.sh cookbook script
- Update CLAUDE.md version reference

Note: Some files exceed 500-line limit (pre-existing debt + new feature)
- component.go: 550 lines (unchanged, pre-existing)
- main.go: 522 lines (added operations wiring)
- operation_repo.go: 569 lines (new, needs splitting)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:08:57 -07:00
jordan
05a64c51e7 release: v0.10.27 - fix: woodpecker step YAML multi-line command syntax 2026-02-01 12:42:18 -07:00
jordan
c2b0447d80 feat: add per-component deploy steps and component templates endpoint
Add deploy-{name} CI steps to all component templates (app-astro,
app-react, service, worker) so each component deploys independently
via kubectl set image on merge to main. Replace the skeleton's
generic deploy step with a verify step that confirms deployments.

Add GET /templates/components endpoint for listing available component
templates with optional type filter. Simplify component API by merging
type+template into a single type field (e.g., "app-react" instead of
type="app" template="app-react").

Include ESLint configs and pnpm-workspace.yaml in app templates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 22:31:41 -07:00
jordan
8282d60c69 feat: implement composable monorepo template system with component architecture
Adds the composable monorepo template system that generates project skeletons
with pluggable components (service, worker, app-react, app-astro, cli).

Key changes:
- Monorepo skeleton templates with shared pkg/, scripts/, and git hooks
- Component templates (service, worker, app-react, app-astro, cli) with
  Dockerfiles, CI steps, and component.yaml manifests
- Component domain model with validation and dependency resolution
- Component handler endpoints for CRUD and composition
- Template provider extended with BuildComposableProject and component assembly
- Deployer extended with composable project deployment support
- Handler timeout constants (TimeoutFastLookup through TimeoutLongRunning)
- envutil package for centralized env var reads with defaults
- api.DecodeJSON helper for standardized request body decoding
- Standardized response helpers (WriteBadRequest, WriteNotFound, etc.)
- Replaced fullstack-app cookbook with composable-app cookbook
- Hardened handler timeouts, logging, and error responses across all handlers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 19:11:42 -07:00
jordan
c59d348040 chore: prepare for composable monorepo template implementation
This commit captures the current state before implementing the composable
monorepo template system. Key changes included:

Infrastructure:
- Add CockroachDB provisioner adapter for database provisioning
- Add Redis provisioner adapter for cache provisioning
- Add build events system with PostgreSQL storage
- Add WebSocket endpoint for real-time build progress

Code agent improvements:
- Fix Claude Code adapter to use default allowed tools instead of dangerously-skip-permissions
- Add context-aware stream closing for cancellation support
- Improve parser tests for edge cases

Build system:
- Add build event constants and metrics
- Remove deprecated git_operations.go (replaced by pod_git_operations.go)
- Add rollback logic for multi-step provisioning operations

Documentation:
- Add composable-monorepo feature documentation
- Add DNS/Cloudflare service documentation
- Update deployment and troubleshooting guides

Cookbooks:
- Add fullstack-app cookbook
- Refactor landing-test with shared library

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 11:39:28 -07:00
jordan
910bcb62e1 fix: Sync build audit with work queue when stale tasks are requeued
When a worker dies mid-build, queue maintenance now updates both
work_queue and build_audit tables when requeuing stale tasks.
This prevents builds from showing "running" forever in the API.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 02:07:52 -07:00
jordan
4a18b1cd07 fix: Persist build audit status when worker claims task
Root cause: WorkerService.ClaimTask() was modifying the audit entry
in memory but never persisting it to the database. This caused build
tasks to remain stuck at "pending" status even after being claimed.

Changes:
- Add UpdateStatus method to port.BuildAudit interface
- Implement UpdateStatus in postgres.BuildAuditRepository
- Fix ClaimTask to call audit.UpdateStatus() to persist status
- Add test coverage for audit update during task claim
- Update all mock implementations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 21:25:04 -07:00
jordan
34e72687e6 feat: Complete automation gaps for repeatable project deployments
- Initial K8s deployment auto-creation during project creation
- DNS record upsert support (create or update existing records)
- Ingress host management for domain aliases (AddIngressHost/RemoveIngressHost)
- Woodpecker deployer RBAC manifest for CI deploy steps
- Single-commit template seeding via Gitea bulk file API

Closes automation gaps exposed during www.threesix.ai launch:
- Projects now auto-create K8s Deployment/Service/Ingress on creation
- Domain aliases automatically update both DNS and K8s ingress
- CI deploy steps work without manual RBAC setup
- Template seeding triggers only one CI pipeline (not per-file)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 15:18:31 -07:00