Commit Graph

67 Commits

Author SHA1 Message Date
jordan
be80fd2d4a fix: correct kaniko dockerfile path for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When --context=docs is set, the --dockerfile path should be relative
to the context directory. Changed from docs/Dockerfile.nginx to
Dockerfile.nginx since kaniko already looks in the docs/ directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 17:35:54 -07:00
jordan
caf0990ceb fix: downgrade rouge to 3.x for middleman-syntax compatibility
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
middleman-syntax ~> 3.2 requires rouge ~> 3.2, but Gemfile had rouge ~> 4.0
causing bundle install to fail with version resolution error.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:48:49 -07:00
jordan
b41e0dfbf9 fix: use raw JSON responses in claudebox server
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The claudebox sidecar was using api.WriteJSON which wraps responses in
{data: ..., meta: ...} format. The claudebox HTTP client expects raw
JSON responses without wrapping.

This caused git clone to appear to fail - the HTTP request succeeded
and returned {data: {success: true, cloned: true}, meta: {...}}, but
the client decoded success=false because it couldn't find the fields
at the top level.

Added writeRawJSON helper and replaced all api.WriteJSON calls with it
for actual responses. Error responses still use api.WriteBadRequest
which returns proper error format.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:41:21 -07:00
jordan
af91bad0ff feat: add Slate documentation templates to skeleton
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Adds complete Slate documentation infrastructure to generated projects:
- docs/ directory with Gemfile, config.rb, and source templates
- Dockerfile for building docs site
- Dockerfile.nginx for serving static docs
- generate-docs.sh script for CI integration
- Claude command for AI-assisted docs generation
- OpenAPI → Slate markdown conversion via widdershins

Also includes:
- --export-openapi flag for service binaries
- DNS provisioning for docs.{domain} subdomain
- Updated project_infra for docs DNS records

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:06:36 -07:00
jordan
f64377116a fix: add build-complete sync point for docs pipeline ordering
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The export-openapi step was running in parallel with component builds
because it had no explicit dependency. This could cause docs generation
to run before component services were fully built.

Changes:
- Add build-complete step with NO depends_on (waits for ALL prior steps)
- Make export-openapi depend on build-complete
- Complete docs pipeline: export-openapi → generate-docs → build-docs →
  build-docs-image → deploy-docs
- Update verify step label selector to use project= instead of app=

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:02:17 -07:00
jordan
59aa173384 fix: clear stale error when dequeuing work tasks
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When a task is retried (dequeued again after failure), the previous
error message was persisting in the work_queue table. This caused the
API to return confusing responses with status="running" but also
containing an error message from the previous attempt.

Now clears error and completed_at when claiming a task, matching the
fix already applied to build_audit.UpdateStatus.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 08:51:34 -07:00
jordan
9833725f31 fix: preserve work on build retry, clear stale audit data
Two critical fixes for build retry behavior:

1. pod_git_operations.go: Normalize remote URL before comparison
   - Clone stores URL with token (https://token:x@host/...)
   - Subsequent retry compares against URL without token
   - Without normalization, URLs never match, so workspace is always
     cleared and re-cloned, losing all code from previous attempt

2. build_audit.go: Clear stale result data when task transitions to running
   - When a failed task is retried, UpdateStatus only updated status/worker_id
   - Result and completed_at from previous failure remained, causing
     API to return stale failure data even while retry was running
   - Now clears result, completed_at and resets started_at when
     status is set to "running"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 08:40:36 -07:00
jordan
9cca5cc41b fix: add proper instrumentation to git clone for debugging
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Log clone request with work_dir, URL, and token presence
- Log workspace state (is_git_repo, existing remote)
- Log all decision points (pull vs clone, clear workspace)
- Detect and clear non-empty non-git directories before clone
- Capture both stdout and stderr for clone failures
- Include exit code in error messages
2026-02-07 07:59:53 -07:00
jordan
e58d679e67 fix: add go mod download to component Dockerfiles
Empty go.sum files were causing Docker builds to fail because
Go couldn't verify dependencies. Added go mod download steps
for both pkg and component directories before building.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:35:02 -07:00
jordan
d74efb75ff fix: wire workService to WorkersHandler and add /work/tasks endpoint
Critical fix: WorkersHandler was missing workService dependency, causing
500 errors when workers tried to fail tasks. This caused tasks to get
stuck in "running" state permanently.

Also adds:
- /work/tasks endpoint for debugging all tasks across projects
- List method to WorkQueue interface for admin views
- HTTP client tests for api_client.go and claudebox/client.go (48 tests)
- Split work.go DTOs into work_dto.go to stay under 500 lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 10:35:39 -07:00
jordan
d7a6f37593 fix: worker graceful shutdown and RWO PVC compatibility
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add WaitGroup for graceful shutdown of in-flight tasks
- Change replicas to 1 with Recreate strategy (RWO PVC limitation)
- Optimize Dockerfile: combine RUN commands for smaller layers
- Add compiled binaries to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:35:00 -07:00
jordan
f6a2b61b16 fix: add skeleton settings.local.json (was globally gitignored)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 22:55:17 -07:00
jordan
3b35900a2d feat: enterprise worker pool with HTTP sidecar pattern
Implements horizontally-scalable worker pool architecture:
- claudebox-sidecar: HTTP server for Claude Code, git, and SDLC ops
- rdev-worker: standalone worker binary polling rdev-api for tasks
- HTTP client adapter for sidecar communication
- HPA with custom Prometheus metrics for autoscaling
- ServiceMonitor for metrics scraping

Code review fixes applied:
- URL-encode query parameters in GitStatus (Critical #1)
- Remove unused shellQuote function (Critical #2)
- Use stdlib strings.Split/TrimSpace (Critical #3)
- Add version injection via ldflags (Warning #4)
- Add debug logging for swallowed git/sdlc errors (Warning #5, #6)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:21:11 -07:00
jordan
3b0779fbe8 fix: slackpath trees use batch endpoint for atomic multi-component adds
Updates slackpath-2 and slackpath-4 to use POST /projects/{id}/components/batch
for adding multiple Go components atomically in a single git commit. This
prevents the go.work race condition where individual commits reference modules
that don't exist yet.

Also adds on_error: continue for infrastructure provisioning steps that may
already exist from skeleton (redis, postgres).

Verified:
- slackpath-1:  Complete (wait_build polled 5 times, detected success)
- slackpath-2:  Complete (wait_build polled 111 times, detected success)
- slackpath-3:  Infrastructure passed (worker capacity limited testing)
- slackpath-4:  Infrastructure passed (worker capacity limited testing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 14:44:53 -07:00
jordan
853ec4cf81 fix: go.work race condition with batch components and idempotent provisioning
Three coordinated fixes for CI pipeline race conditions:

1. Woodpecker step dependencies: Added depends_on: [deps] to all 6 component
   templates (service, worker, cli, app-astro, app-react, app-nextjs) so build
   steps wait for go work sync to complete.

2. Idempotent resource provisioning: Modified provisionResources() to check
   for existing database/cache before creating, preventing "already exists"
   errors on component re-adds.

3. Batch component endpoint: POST /projects/{id}/components/batch enables
   atomic multi-component additions in a single git commit. Validates all
   components upfront, provisions infra sequentially, commits code components
   atomically.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:31:40 -07:00
jordan
1e853980e4 feat: inject provisioned credentials into component deployments
Components now automatically receive DATABASE_URL, REDIS_URL, and other
infrastructure credentials when deployed. Previously, credentials were
provisioned and stored but never injected into K8s deployments.

Changes:
- Add fetchProjectCredentials() to component_deploy.go
- Populate spec.Secrets before calling deployer.Deploy()
- Fix slackpath-4 to provision postgres + redis before services
- Add terminology docs to clarify platform vs skeleton code

This completes the infrastructure provisioning flow:
1. add-db → provisions CockroachDB, stores DATABASE_URL
2. add-service → deploys with DATABASE_URL in environment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:09:15 -07:00
jordan
53862c773b fix: resolve systemic debt in worker and skeleton templates
Worker template fixes:
- Replace panic() with logger.Error() + os.Exit(1) for config errors
- Remove double-timeout application (context + middleware)
- Add error message truncation to prevent log bloat
- Use named constants for shutdown grace period and stale check interval

Skeleton pkg/auth fixes:
- Fix error wrapping to use %w consistently in jwt.go
- Add GetUserOrError() as safe alternative to MustGetUser() panic

Skeleton pkg/queue fixes:
- Check RowsAffected() errors instead of ignoring them
- Add input validation to EnqueueWithOptions (require job type, cap retries)
- Add log truncation for error messages
- Fix inaccurate doc comment claiming exponential backoff

Worker timeout consolidation:
- Add internal/worker/timeouts.go with named constants
- Migrate all workers to use timeout constants

Cleanup:
- Remove obsolete slack-preparation-thoughts.md files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 23:44:55 -07:00
jordan
d69da6d627 feat: add structured logging infrastructure and SDLC extensions
Major changes:
- Add internal/logging package with field constants, context propagation,
  sensitive data auto-redaction, and per-component log levels
- Add worker timeout constants (TimeoutQuickOp, TimeoutHealthCheck, etc.)
- Extend SDLC with callback handlers, generate endpoints, and executor
- Add new cookbook trees for aeries and slackpath progression
- Add skeleton templates for queue, realtime, and microservices
- Add worker component template with async job processing
- Refactor services and handlers to use new logging infrastructure
- Split component.go into component_infra.go and component_listing.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:56:04 -07:00
jordan
1790afd0ee feat: add path-based ingress management for component lifecycle
Adds AddIngressPath and RemoveIngressPath to the Deployer interface
for managing per-component ingress rules in monorepo projects.

- Implement conflict retry logic for concurrent ingress updates
- Add K8s client interface for testability
- Add comprehensive unit tests for ingress path operations
- Add component deployment and teardown methods to ComponentService
- Update service templates with OpenAPI spec improvements
- Add evolving-app cookbook tree for reference
- Split resources.go into resources_ingress.go for path-based routing
- Split component.go into component_deploy.go for deployment helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:31:50 -07:00
jordan
196e3d96e8 fix: make go.work.sum optional in Dockerfiles
Use glob pattern go.work.su[m] instead of go.work.sum to allow
the COPY to succeed even when go.work.sum doesn't exist yet.
This happens on fresh monorepos before dependencies are synced.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:58:46 -07:00
jordan
b093a4b26d feat: implement Visual Verification API layer (Week 2)
Add REST API endpoints for submitting visual verification tasks,
tracking progress via SSE, and retrieving screenshot/video artifacts.

Changes:
- Add ScopeVerifyRead/ScopeVerifyWrite auth scopes
- Create VerifyService for task submission and lifecycle management
- Create VerifyHandler with POST/GET/DELETE/SSE endpoints:
  - POST /verify - Submit capture task
  - GET /verify/{taskId} - Get task status and artifacts
  - GET /verify/{taskId}/stream - SSE progress stream
  - DELETE /verify/{taskId} - Cancel pending task
  - GET /projects/{id}/verify - List verify tasks
- Wire VerifyExecutor in main.go for Playwright pod execution
- Fix work.go validation to include "verify" task type
- Add comprehensive handler tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:29:40 -07:00
jordan
210064d490 feat: add diagnostics endpoint and external health monitoring
- Add /diagnostics endpoint for system health overview
- Add external health worker for monitoring Gitea, Woodpecker, Registry
- Add health check methods to Gitea and Woodpecker clients
- Remove hardcoded fallback projects (pantheon, aeries)
- Add diagnostics domain types and service layer
- Add comprehensive tests for diagnostics handler and service
- Fix tests to use registered test project instead of hardcoded one

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:10:56 -07:00
jordan
9a1309a0c5 feat: fix composable monorepo CI builds + health endpoint improvements
Composable monorepo CI fixes:
- Add empty go.sum.tmpl files for pkg, service, worker, and cli components
- Fix Dockerfile.tmpl glob patterns (COPY go.work.sum* is invalid in Kaniko)
- Add deps step to CI that runs go work sync and go mod tidy before builds
- Fix scalar-go dependency version (v0.1.2 doesn't exist, use v0.13.0)

Health endpoint improvements:
- Add registry health check (zot OCI /v2/ endpoint)
- Add health metrics for CI, registry, and Git
- Add /health/ci endpoint for Woodpecker health

Visual verification scaffolding:
- Add Playwright pod and scripts ConfigMap
- Add vision.md and implementation breakdown plan

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:46:51 -07:00
jordan
b5fdf35f1b feat: add WorkerService.FailTask for audit updates + visual verification scaffolding
- Add FailTask to WorkerService to update build_audit on failure path
  (fixes bug where audit showed "running" when task actually failed)
- Add WorkServiceFailer interface to avoid circular dependency
- Add VerifyExecutor with Playwright-based visual verification
- Add verify domain types (VerifySpec, VerifyResult, screenshot capture)
- Wire VerifyExecutor placeholder into WorkExecutor (impl in Week 2)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 00:09:16 -07:00
jordan
cfba724f8a feat: add work task error classification and user-facing error codes
- Add WorkErrorCode type with RATE_LIMITED, AUTH_FAILED, TIMEOUT, STALE_WORKER, AGENT_ERROR, INVALID_SPEC
- Add ClassifyAgentError function to detect error patterns from stderr
- Add error_code column to work_queue table (migration 016)
- Add FailWithCode method to WorkQueue interface and implementations
- Update RequeueStaleWithIDs to mark permanently failed tasks with STALE_WORKER
- Add ErrorCode to BuildResult for API responses
- Update work executor to classify errors before failing tasks

This enables users to see actual failure reasons (e.g., "RATE_LIMITED") instead of
builds stuck in "running" state forever when Claude hits rate limits.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 00:07:34 -07:00
jordan
6e8f5821af feat: add artifact pass/fail/needs-fix lifecycle for SDLC execution phases
- Add pass/fail/needs-fix CLI commands to cmd/sdlc/cmd_artifact.go
- Add 3 new methods to SDLCExecutor interface in internal/port
- Implement methods in kubernetes adapter
- Add service methods to SDLCService
- Add HTTP handlers for POST .../artifacts/{type}/pass|fail|needs-fix
- Update 6 skeleton commands to evaluate and set artifact status
- Update test mocks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 22:14:53 -07:00
jordan
64ccf0b85d feat: add feature development E2E test and SDLC handler fixes
- Add feature-dev-test.sh: full 10-step E2E test for SDLC + Claude Code workflow
- Update feature-development.md cookbook with complete workflow documentation
- Fix SDLC orchestrator and project management handler improvements
- Update scaffold-test.sh with minor fixes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 20:12:40 -07:00
jordan
aaf66764fb feat: add worker pool infrastructure for composable projects
- Add POST /workers/register and POST /workers/{workerId}/heartbeat endpoints
- Start worker health checker goroutine in main.go
- Fix network policy to allow K8s API server access (includes real endpoint IPs)
- Add rdev.orchard9.ai/role: worker label to claudebox StatefulSet

This enables the embedded WorkExecutor to reach claudebox-0 for executing
builds on composable projects that don't have dedicated pods.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:55:37 -07:00
jordan
572b221e20 feat: add automatic cleanup for cookbook test projects
- Add AUTO_TEARDOWN env var and --auto-teardown flag to cookbook scripts
- Scripts automatically delete created projects on exit (including Ctrl+C)
- Add DELETE /projects/cleanup API endpoint for bulk cleanup
- Supports shell-style glob patterns (e.g., "tree-test-*")
- Includes dry_run mode and older_than_hours filter for safety
- Requires admin scope for actual deletion
- Update cookbook scripts: landing-test, composable-test, template-validation,
  feature-test, tree-runner

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 17:54:15 -07:00
jordan
6c51469c89 fix: cookbook tree runner stdout/stderr separation and bash brace expansion
- Fix bash brace expansion issue with ${2:-{}} defaults causing extra } chars
- Redirect step status messages to stderr to prevent JSON output pollution
- Redirect wait_pipeline/wait_site/diagnose output to stderr
- Add SDLC handler tests for state, features, tasks, artifacts endpoints
- Add SDLC classifier tests for phase transitions and blocking
- Add SDLC CLI command tests for feature, task, branch, merge operations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 15:15:02 -07:00
jordan
56e3f83955 feat: add auth scopes, OpenAPI docs, SDLC guides, and code quality improvements
- Add auth.RequireScope() to all handler routes for proper authorization
- Add SDLC OpenAPI endpoint documentation (state, features, tasks, branches, merge, archive, orchestrator)
- Add SDLC documentation guides (getting-started, cli-reference, api-reference, command-catalog)
- Add artifact_test.go for SDLC artifact coverage
- Add CLAUDE.md rules: auth scopes requirement, error wrapping with %w
- Fix error wrapping to use %w instead of %v throughout codebase
- Improve CLI merge command with conflict detection and resolution
- Fix handler tests to include auth middleware for RequireScope
- Add cookbook tree runner scripts for automated testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 13:55:50 -07:00
jordan
f22b220c6d feat: add SDLC branch management, merge, archive, and orchestrator APIs
Add branch lifecycle commands (branch, merge, archive) to the SDLC CLI.
Introduce orchestrator handler and service for multi-step SDLC workflows.
Expand skeleton template with 15 Claude commands covering the full feature
lifecycle. Extend classifier rules, error types, and executor port for
branch operations. Split rules.go and classifier_test.go to stay within
500-line limit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 12:30:03 -07:00
jordan
425ef0f806 feat: add SDLC orchestration - library, CLI, and API integration
Implements deterministic feature lifecycle management for agent-driven
development. Agents use the CLI in pods; operators control via REST API.

Library (internal/sdlc/):
- Feature lifecycle with 10 phases (draft → released)
- Classifier engine with priority-ordered rules
- Artifact tracking with approval workflow
- Task management within features
- YAML-based state persistence

CLI (cmd/sdlc/):
- init, state, next, feature, artifact, task, query commands
- --json flag for machine-readable output
- Runs inside project pods

API (21 endpoints under /projects/{id}/sdlc/):
- State: GET /state, GET /next
- Features: CRUD + transition/block/unblock
- Artifacts: approve/reject per type
- Tasks: add/start/complete/block
- Queries: blocked/ready/needs-approval

Architecture:
- Port: SDLCExecutor interface (internal/port/)
- Adapter: kubectl exec into pods (internal/adapter/kubernetes/)
- Service: pod resolution + logging (internal/service/)
- Handlers: 5 files under 500-line limit (internal/handlers/)

Also includes template upgrades (chassis framework, UI components,
OpenAPI helpers, backend/frontend guides) and component improvements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 09:57:05 -07:00
jordan
62460bf098 feat: complete template upgrade - chassis framework, UI library, auth, app-nextjs, OpenAPI, and cookbook
Weeks 1-7 of the template upgrade plan:
- pkg/api: typed HTTPError with sentinels, Wrap/WrapMiddleware, Bind, health probes, OpenAPI schema/param builders
- skeleton/packages: ui (design tokens, components), layout (DashboardShell), auth (AuthProvider, ProtectedRoute), api-client
- skeleton/pkg: httperror, app/handler, app/bind, app/health, auth (JWT/API key middleware)
- components/app-nextjs: Next.js 14 App Router template with dashboard, server actions, auth
- cookbooks/feature-development.md with test and validation scripts
- Handler tests for components, project management, and woodpecker webhook
- 3 rounds of code review fixes applied

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 00:46:51 -07:00
jordan
c280a92012 feat: add operations audit system and template improvements
Operations Audit (new feature):
- Add Operation domain model with status tracking (pending, running, completed, failed, cancelled)
- Add OperationRepository with PostgreSQL implementation
- Add OperationService for CRUD and lifecycle management
- Add operations handlers (list, get, cancel endpoints)
- Add migration 015_operations.sql for operations table
- Add operation cleanup worker for stale operation handling
- Add ErrOperationNotFound to domain errors

Template Improvements:
- Add CLAUDE.md configuration files to astro-landing, default, and go-api templates
- Fix PORT template variable usage in nginx configs for app templates
- Add replace directives for local pkg module in Go templates
- Simplify Go service/worker Dockerfiles for workspace builds
- Fix TypeScript error in logger template

Other:
- Refactor landing-test.sh cookbook script
- Update CLAUDE.md version reference

Note: Some files exceed 500-line limit (pre-existing debt + new feature)
- component.go: 550 lines (unchanged, pre-existing)
- main.go: 522 lines (added operations wiring)
- operation_repo.go: 569 lines (new, needs splitting)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:08:57 -07:00
jordan
b3d47abd7c feat: add curated skills, commands, and agents to skeleton template
Add best-of-best Claude Code configuration from local setup to the
composable monorepo skeleton template, giving new projects a powerful
starting configuration.

Commands added (4):
- do-parallel: Execute tasks in parallel waves with agent selection
- remember: Store learnings as institutional memory
- prepare: Pre-implementation readiness assessment
- root-cause: Root cause analysis with parallel investigation

Skills added (5):
- orchestrated-execution: Task pipelines with implementation → review → fix
- root-cause-analyst: Systematic diagnosis with confidence scoring
- knowledge-librarian: Organize learnings in ai-lookup/ structure
- feature-verifier: Verify features work with evidence matrix
- prepare: Binary outcome readiness assessment (brief or gap list)

Agents added (1):
- quality-engineer: Code quality, test coverage, error handling reviewer

All Citadel-specific references genericized to use skeleton's existing
agents (go-specialist, testing-strategist, security-architect, etc).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 15:33:25 -07:00
jordan
e26bb28b61 feat: add pipeline steps API with debugging diagnostics
- Add GET /projects/{id}/pipelines/{number}/steps endpoint
- Return step name, status, duration, exit_code for all steps
- Include last 50 lines of log for failed steps
- Enhance test script with automatic diagnostics on failure
- Add diagnose subcommand for deep pipeline analysis
- Show K8s pod state on site accessibility failures
- Split woodpecker adapter into client.go and pipelines.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 12:44:26 -07:00
jordan
05a64c51e7 release: v0.10.27 - fix: woodpecker step YAML multi-line command syntax 2026-02-01 12:42:18 -07:00
jordan
c2b0447d80 feat: add per-component deploy steps and component templates endpoint
Add deploy-{name} CI steps to all component templates (app-astro,
app-react, service, worker) so each component deploys independently
via kubectl set image on merge to main. Replace the skeleton's
generic deploy step with a verify step that confirms deployments.

Add GET /templates/components endpoint for listing available component
templates with optional type filter. Simplify component API by merging
type+template into a single type field (e.g., "app-react" instead of
type="app" template="app-react").

Include ESLint configs and pnpm-workspace.yaml in app templates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 22:31:41 -07:00
jordan
f6ced22e06 fix: Use FQDN for k8s service hostnames and remove broken commonLabels
Short-form DNS names (e.g. postgres.databases.svc) fail to resolve in
new pods due to k8s DNS search domain limitations. Switch all service
hostnames to FQDNs (*.svc.cluster.local).

Remove commonLabels from kustomization.yaml — it injected labels into
all selectors including NetworkPolicy egress rules (blocking DNS to
CoreDNS) and Deployment selectors (causing immutability errors).

Add OTEL_EXPORTER_OTLP_ENDPOINT env var to deployment YAML so the
telemetry collector endpoint uses the FQDN without requiring a binary
rebuild.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 20:46:04 -07:00
jordan
8282d60c69 feat: implement composable monorepo template system with component architecture
Adds the composable monorepo template system that generates project skeletons
with pluggable components (service, worker, app-react, app-astro, cli).

Key changes:
- Monorepo skeleton templates with shared pkg/, scripts/, and git hooks
- Component templates (service, worker, app-react, app-astro, cli) with
  Dockerfiles, CI steps, and component.yaml manifests
- Component domain model with validation and dependency resolution
- Component handler endpoints for CRUD and composition
- Template provider extended with BuildComposableProject and component assembly
- Deployer extended with composable project deployment support
- Handler timeout constants (TimeoutFastLookup through TimeoutLongRunning)
- envutil package for centralized env var reads with defaults
- api.DecodeJSON helper for standardized request body decoding
- Standardized response helpers (WriteBadRequest, WriteNotFound, etc.)
- Replaced fullstack-app cookbook with composable-app cookbook
- Hardened handler timeouts, logging, and error responses across all handlers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 19:11:42 -07:00
jordan
c59d348040 chore: prepare for composable monorepo template implementation
This commit captures the current state before implementing the composable
monorepo template system. Key changes included:

Infrastructure:
- Add CockroachDB provisioner adapter for database provisioning
- Add Redis provisioner adapter for cache provisioning
- Add build events system with PostgreSQL storage
- Add WebSocket endpoint for real-time build progress

Code agent improvements:
- Fix Claude Code adapter to use default allowed tools instead of dangerously-skip-permissions
- Add context-aware stream closing for cancellation support
- Improve parser tests for edge cases

Build system:
- Add build event constants and metrics
- Remove deprecated git_operations.go (replaced by pod_git_operations.go)
- Add rollback logic for multi-step provisioning operations

Documentation:
- Add composable-monorepo feature documentation
- Add DNS/Cloudflare service documentation
- Update deployment and troubleshooting guides

Cookbooks:
- Add fullstack-app cookbook
- Refactor landing-test with shared library

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 11:39:28 -07:00
jordan
910bcb62e1 fix: Sync build audit with work queue when stale tasks are requeued
When a worker dies mid-build, queue maintenance now updates both
work_queue and build_audit tables when requeuing stale tasks.
This prevents builds from showing "running" forever in the API.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 02:07:52 -07:00
jordan
e9984ebc07 fix: Include stderr and troubleshooting help in Claude Code errors
When Claude fails to execute, error messages now include:
- Captured stderr output from the failed command
- Troubleshooting commands to exec into pod and run `claude login`

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 23:12:01 -07:00
jordan
9c15976f86 feat: Complete Claude endpoint and update cookbook
- Add session_id, model, allowed_tools to Claude request handler
- Update OpenAPI spec for Claude endpoint
- Fix BuildExecutor constructor call sites
- Rewrite landing-test.sh for agent-driven flow
- Fix cookbook documentation for correct API format

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 21:25:29 -07:00
jordan
4a18b1cd07 fix: Persist build audit status when worker claims task
Root cause: WorkerService.ClaimTask() was modifying the audit entry
in memory but never persisting it to the database. This caused build
tasks to remain stuck at "pending" status even after being claimed.

Changes:
- Add UpdateStatus method to port.BuildAudit interface
- Implement UpdateStatus in postgres.BuildAuditRepository
- Fix ClaimTask to call audit.UpdateStatus() to persist status
- Add test coverage for audit update during task claim
- Update all mock implementations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 21:25:04 -07:00
jordan
34e72687e6 feat: Complete automation gaps for repeatable project deployments
- Initial K8s deployment auto-creation during project creation
- DNS record upsert support (create or update existing records)
- Ingress host management for domain aliases (AddIngressHost/RemoveIngressHost)
- Woodpecker deployer RBAC manifest for CI deploy steps
- Single-commit template seeding via Gitea bulk file API

Closes automation gaps exposed during www.threesix.ai launch:
- Projects now auto-create K8s Deployment/Service/Ingress on creation
- Domain aliases automatically update both DNS and K8s ingress
- CI deploy steps work without manual RBAC setup
- Template seeding triggers only one CI pipeline (not per-file)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 15:18:31 -07:00
jordan
4c41bc3a3f fix: Use cluster-issuer for TLS certs in project deploys
The deployer was using cert-manager.io/issuer (namespace-scoped)
referencing letsencrypt-threesix which only exists in the threesix
namespace. Projects deploy to the projects namespace, so changed to
cert-manager.io/cluster-issuer with letsencrypt-prod.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:29:34 -07:00
jordan
ee2c0d6482 fix: Use repo/tags format for Kaniko plugin (not destinations)
The destinations format caused Kaniko to push images with the full
registry URL as part of the repo path (registry.threesix.ai/name
instead of just name). Using registry + repo + tags format pushes
to the correct path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:07:49 -07:00
jordan
5a7b9342c6 fix: Use registry.threesix.ai instead of nonexistent zot.orchard9.ai
The templates referenced zot.orchard9.ai which has no DNS record.
The actual zot registry is at registry.threesix.ai. Also updated
static templates to use Kaniko plugin instead of docker:24-dind.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:01:48 -07:00