Commit Graph

73 Commits

Author SHA1 Message Date
jordan
592b2d5ec0 fix: clarify database types across docs and fix video storage persistence
Two distinct fixes:

1. Database terminology: Make it crystal clear that generated projects use
   CockroachDB in production and PostgreSQL for local dev, while the rdev
   platform itself uses PostgreSQL. Updated 15 files across skeleton agents,
   component templates, cookbook trees, and platform docs.

2. Video storage: VideoHandler was ignoring vid.Data bytes (already downloaded
   by the Gemini adapter with auth) and re-downloading from the provider URL
   with a plain GET — which fails because Gemini URLs require API key auth.
   Now uses vid.Data first, falls back to downloadURL only for public URLs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 23:13:21 -07:00
jordan
a8c8a0a14d feat: add GCS-based persistent media storage, AI generation pipeline, and composable skeleton packages
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Adds complete media storage pipeline with GCS presigned uploads, AI image/video/text generation
via queue-based workers, realtime SSE event streaming, and comprehensive skeleton packages
(storage, mediagen, textgen, generation, realtime, persona, routing, ai-client). Includes
security fixes for media delete authorization, nil pointer guards in handlers, video persistence
via download-then-upload, consistent signed URLs, and Image→ImageIcon rename to avoid DOM collision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:29:09 -07:00
jordan
7249575dea feat(sessions): add command execution endpoint and activity tracking
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add POST /sessions/:id/exec endpoint for executing commands in sessions
- Add session activity tracking (last_activity_at timestamp)
- Add database migration 024 for session activity column
- Add comprehensive tests for session handlers and service layer
- Add wildcard TLS certificate for preview.threesix.ai subdomain
- Add infrastructure mocks for testing preview service
- Refactor preview cleanup logic to remove unused methods
- Add AIOS core documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-13 08:41:05 -07:00
jordan
84af398d85 refactor: add timeout constants for agent execution tiers
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Add TimeoutAgentExecution (22m) to handlers for synchronous SDLC
execution, and TimeoutAgent{Default,Medium,Heavy} (12/22/47m) to
workers for tiered agent task execution. Aligns with SDLC action
complexity tiers and prevents inline duration literals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-11 10:48:24 -07:00
jordan
542bc722ab fix(architect): handle missing projects in repo, add cookbook hooks/validation
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
The architect API returned "failed to start conversation" because
projectRepo.Get() failed — the in-memory K8s repo watches the rdev
namespace but projects deploy to the projects namespace. Made project
lookup non-fatal with fallback to default pod. Added error logging to
all architect handler methods (were silently swallowing errors).

Also adds setup-hooks, commit-after-qa, and pre-merge-validate steps
to the foundary cookbook tree for git hooks and code quality gates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 02:25:40 -07:00
jordan
a9ad3d8304 chore: accumulated platform hardening and CI fixes
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
CI / Woodpecker:
- Add explicit depends_on to all .woodpecker.yml steps (rdev + templates)
- Fix skip_tls_verify -> skip-tls-verify (correct Kaniko flag name)
- Add replicasets get/list to deployer RBAC for rollout status
- Skeleton template: add failure:ignore on docs steps, Traefik TLS
  annotations on ingress, depends_on on verify step

Component templates:
- Fix container name in deploy steps (PROJECT_NAME-COMPONENT_NAME)
- Replace kubectl scale with kubectl patch for replicas
- Add post-deploy image verification and rollout status checks
- Applied consistently across all 5 component templates

Adapters:
- gitea: Add HTTP client timeout (30s), context cancellation checks,
  handle 404 on GetRepo/DeleteRepo
- zot: Add retry with exponential backoff (doWithRetry), limit response
  body reads to 10MB
- cockroach: Use net.JoinHostPort for IPv6-safe DSN construction
- woodpecker: Fix error wrapping (%v -> %w)
- redis: Fix error wrapping (%v -> %w)
- deployer: Add context cancellation checks

Services:
- apikey_service: Fix error wrapping (%v -> %w)
- component_deploy: Fix error wrapping (%v -> %w)
- project_infra: Fix error wrapping (%v -> %w)
- webhook/dispatcher: Fix error wrapping (%v -> %w)

Other:
- CLAUDE.md: Add guide links for Gitea, Go 1.25, Woodpecker v3,
  Traefik v3, Zot registry
- circuitbreaker: Add test for error wrapping
- docs: Update deployment, troubleshooting, and runbook docs
- health: Fix error wrapping (%v -> %w)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 23:16:56 -07:00
jordan
3c9876a678 fix(worker): increase SSE scanner buffer to 1MB in claudebox client
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The HTTP claudebox client's ExecuteStream method used a bare
bufio.NewScanner with the default 64KB max token size. When Claude Code
produces tool results > 64KB (e.g., reading large files), the SSE event
exceeds the scanner limit and fails with "token too long".

Every other scanner in the codebase (claudecode adapter, claudebox
executor, kubernetes executor) already uses scanner.Buffer(buf, 1MB).
This was the only one missed.

Fixes: "agent execution failed: read stream: bufio.Scanner: token too long"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 23:14:20 -07:00
jordan
b6e778d5ab fix(git): harden git flow for concurrent SDLC stress test failures
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
5 fixes from stress test analysis:

1. CRITICAL: Add pull-before-push to claudebox GitOperations.CommitAndPush,
   matching the fix already in PodGitOperations (prevents push rejections
   when concurrent builds advance the remote).

2. HIGH: Extract ResetToMain into PodGitOperations as a shared public method.
   Wire into BuildExecutor after CloneRepo and update SDLCTaskExecutor to
   use the shared method. Prevents builds from running on wrong branch when
   worker pods are reused across tasks.

3. HIGH: Make branch create push failure fatal with retry+rollback in
   cmd/sdlc/cmd_branch.go. Prevents orphaned .sdlc/ state that causes
   merge failures after completing all 10 SDLC phases.

4. MEDIUM: Shell-escape token in credential helpers (both PodGitOperations
   and claudebox GitOperations) to prevent shell injection via tokens
   containing special characters.

5. MEDIUM: Add GitResetToMain to claudebox sidecar (git.go implementation,
   server.go endpoint, client.go HTTP method) and wire into
   HTTPSDLCTaskExecutor for the HTTP sidecar path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 20:57:27 -07:00
jordan
cefc15aa7d fix(worker): include stdout in error messages when Claude command fails
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Auth errors like "OAuth token has expired" were lost because Claude writes
them to stdout, not stderr. The error message only showed kubectl's generic
"command terminated with exit code 1". Now includes both stdout and stderr
in the error, making failures immediately diagnosable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 17:55:46 -07:00
jordan
b7d0e84946 fix(deploy): create component deployments with 0 replicas to prevent ImagePullBackOff
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Components are scaffolded before CI builds their images. Previously deployments
started with 1 replica, causing ImagePullBackOff until the first build completed.
Now deployments start at 0 replicas; CI deploy steps scale to 1 after verifying
the image exists in the registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 10:16:14 -07:00
jordan
9f957d6e75 fix(templates): harden component CI steps and compile regexes
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add --connect-timeout 10 and --max-time 15 to all verify step curl
  calls to prevent hanging on registry health checks
- Fix cli template: depends_on [deps] -> [preflight] for consistency
- Add cross-reference comment to service template about verify logic
  being replicated across all 5 component templates
- Document component CI step rules in composable-monorepo.md
- Compile regexes at package level instead of per-call in
  component_updates.go
- Add component_updates_test.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 19:36:23 -07:00
jordan
9226454b85 feat: label-based undeploy, GC reconciliation, checkout/sessions, pool status
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add UndeployAll() using label selectors to clean up monorepo components
  on project deletion (replaces name-based Undeploy in DeleteProject and
  the direct undeploy handler)
- Add ResourceGC background worker that periodically finds K8s resources
  whose project label has no matching DB record, deletes after 1h safety
  window
- Widen deployer client type from *kubernetes.Clientset to
  kubernetes.Interface for testability
- UndeployAll accumulates errors via errors.Join instead of failing fast
- Add checkout/checkin sidecar dev flow: temporary git tokens, branch
  checkout, review on checkin with cleanup workers
- Add interactive sessions: pod binding, command execution, SSE streaming,
  ephemeral preview URLs with session cleanup workers
- Add GET /workers/pool endpoint for aggregate capacity and queue depth
- Add sessions:read and sessions:execute auth scopes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 19:11:28 -07:00
jordan
a69eb7e587 feat(foundary): implement complete backend for conversational project design
Implements all 5 phases of Foundary Studio backend:

Phase 1: Chat Persistence (8 API endpoints)
- Conversations and messages with proper cascading deletes
- PostgreSQL schema with auto-update triggers
- Full CRUD operations with structured logging

Phase 2: Blueprint Entity (5 API endpoints)
- JSONB spec storage with GIN indexes
- Flexible structured data for project specifications
- Version-controlled blueprint management

Phase 3: Architect Service (3 API endpoints)
- Conversational AI orchestration with Claude
- Multi-turn dialogue with context building
- Blueprint spec extraction from conversations

Phase 4: Work Queue Integration
- Verified existing endpoint compatibility

Phase 5: Structured Questions (6 API endpoints)
- Four question types: text, choice, multichoice, yesno
- Answer validation with proper constraints
- Conversation-linked Q&A flow

Architecture:
- Textbook hexagonal architecture (domain → port → adapter → service → handler)
- Zero external dependencies in domain layer
- Consistent error handling with proper wrapping
- Auth scopes on all routes (projects:read, projects:execute)
- Structured logging with operation context and duration tracking
- NULL-safe DTO converters throughout

Database:
- 3 new migrations (019, 020, 021)
- UUIDs for all primary keys
- Proper foreign key constraints with ON DELETE CASCADE
- Optimized indexes including partial index for unanswered questions
- Auto-update triggers for timestamps

OpenAPI Documentation:
- Complete API documentation under 'Foundary' tag
- 22 new endpoints documented with examples
- Request/response schemas for all operations

Logging Improvements:
- Added operation field to all service logs
- Added duration_ms tracking for performance monitoring
- Log response_length instead of full response content
- Consistent use of logging field constants
- Execute-then-log pattern for delete operations

Files: 32 changed, 2800+ lines added
- 7 domain models
- 3 database migrations
- 3 port interfaces
- 3 postgres adapters
- 4 services (conversation, blueprint, question, architect)
- 4 handlers with DTOs
- OpenAPI documentation
- Integration in main.go

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-09 00:50:46 -07:00
jordan
adcea2fc1f fix(templates): upgrade Go to 1.25 and fix Woodpecker syntax
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
## Template Version Alignment
- Go: 1.23 → 1.25 across all templates (go.work, go.mod, Dockerfiles, CI)
- Alpine: latest → 3.19 (explicit version pinning)
- Woodpecker: failure:retry → failure:ignore (invalid syntax fix)

## SDLC Tree Fixes (slackpath-5-full-lifecycle)
Fixed merge failures by correcting lifecycle flow:

1. **Branch Creation**: Added missing create-branch step (planned → ready)
   - Bug: Merge command requires feature.Branch field to be set
   - Fix: POST /projects/{id}/sdlc/features/{slug}/branch

2. **Artifact Status**: Changed approval to pass for execution artifacts
   - Bug: Review/audit/QA need status="passed" not "approved"
   - Fix: /artifacts/{type}/approve → /artifacts/{type}/pass
   - Added: pass-qa step after wait-qa

3. **Phase Transition Order**: Reordered merge phase transition
   - Bug: Merge command checks if phase == "merge" first
   - Fix: transition-to-merge BEFORE merge-feature (not after)

## GCS Provisioner Fix
- Replaced deprecated option.WithCredentialsFile with env var approach
- Now uses GOOGLE_APPLICATION_CREDENTIALS for ADC (Application Default Credentials)
- Avoids security risk from deprecated credential options
- Fixed test: Added ComponentTypeGCS to ValidComponentTypes test

## Critical Rules Added
- Version alignment: All template versions must stay in sync
- When updating versions, grep entire templates/ tree

## Files Changed
- 27 template files: Go version + Woodpecker syntax
- 1 tree file: SDLC lifecycle flow corrections
- 1 CLAUDE.md: Version alignment rule
- 1 GCS provisioner: Deprecated API fix
- 1 test file: Added missing component type

Root cause: Skeleton templates lagged behind Go 1.25 release and had
invalid Woodpecker syntax. SDLC tree skipped required branch creation
and used wrong artifact approval endpoints.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 23:57:38 -07:00
jordan
4486042155 fix(registry): delete container images on project teardown
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Root cause of DIGEST_INVALID errors was registry disk exhaustion.
Project teardown wasn't cleaning up container images, causing the
registry PVC to fill up over time.

Changes:
- Add RegistryProvider port interface for registry operations
- Extend zot.Client with DeleteProjectRepositories method
- Wire registry provider into ProjectInfraService
- Delete images during DeleteProject cleanup (step 4)

The zot client uses the OCI distribution API:
- Lists all repos, filters by project prefix
- Gets manifest digests via HEAD request
- Deletes manifests by digest to trigger GC

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 02:56:18 -07:00
jordan
f20fc6c51c feat(saga): implement enterprise-grade resilience architecture
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Fixes issues from code review of resilience implementation:

- Wire saga system in main.go (SagaRepository, SagaExecutor, SagaHandler)
- Fix CompletedSteps() to include skipped steps for dependency resolution
- Fix reverse loop bug in saga compensation (use standard swap pattern)
- Add circuit breaker state change callbacks for Prometheus metrics

Phase 1 (Build Resilience):
- Add failure:retry to all component Kaniko build steps
- Add preflight registry health check before builds
- Add services-deployed sync point to decouple docs from critical path

Phase 2 (API Resilience):
- Add pipeline retry endpoint (POST /projects/{id}/pipelines/{number}/retry)
- Wire circuit breakers with metrics callbacks
- Add /health/circuits endpoint for circuit breaker status

Phase 3 (Saga Engine):
- Full domain model (Saga, SagaStep, RetryPolicy, BackoffType)
- PostgreSQL saga repository with CRUD and step management
- Saga executor with retry, compensation, skip step support
- Saga API handlers with CRUD and control operations

Phase 4 (Observability):
- Add saga metrics (total, step_duration, retry, circuit_breaker_state)
- Add logging fields (saga_id, saga_name, step_name)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 01:58:02 -07:00
jordan
9085965864 fix(skeleton): enforce chi {param} URL syntax in agent guidance
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Agents were generating `:id` (Echo/Gin style) instead of `{id}` (chi style),
causing routes to not match. Updated api-designer, go-specialist agents and
skeleton CLAUDE.md with explicit CRITICAL notes about brace syntax.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 20:44:52 -07:00
jordan
863dfd3214 fix: skip root deployment for empty template (defaults to skeleton)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When req.Template is empty, it defaults to 'skeleton' but the check
in createInitialDeployment only matched 'skeleton' explicitly, not
empty string. This caused a broken deployment to be created for
monorepo projects with a non-existent image.

Root cause: slackpath-5 creates project with empty template, which
defaults to skeleton, but createInitialDeployment was still creating
a root deployment that references registry.threesix.ai/{project}:latest
which never gets built (skeleton has no root Dockerfile).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 19:32:19 -07:00
jordan
bcf9f28bb9 fix: add failure:ignore to docs build steps
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When docs infrastructure doesn't exist, the docs build steps should
gracefully skip without failing the entire pipeline.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 18:26:00 -07:00
jordan
2a25a161cb fix: use plugin-kaniko for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The raw gcr.io/kaniko-project/executor with commands: doesn't work
properly in Woodpecker. Switch to woodpeckerci/plugin-kaniko with
settings: to match other component builds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 18:08:31 -07:00
jordan
bed72961fe fix: add --insecure flag to kaniko for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The registry.threesix.ai uses a self-signed certificate.
Service builds use plugin-kaniko with skip-tls-verify, but docs
build used raw kaniko executor without TLS bypass, causing exit 128.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 17:50:38 -07:00
jordan
be80fd2d4a fix: correct kaniko dockerfile path for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When --context=docs is set, the --dockerfile path should be relative
to the context directory. Changed from docs/Dockerfile.nginx to
Dockerfile.nginx since kaniko already looks in the docs/ directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 17:35:54 -07:00
jordan
caf0990ceb fix: downgrade rouge to 3.x for middleman-syntax compatibility
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
middleman-syntax ~> 3.2 requires rouge ~> 3.2, but Gemfile had rouge ~> 4.0
causing bundle install to fail with version resolution error.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:48:49 -07:00
jordan
af91bad0ff feat: add Slate documentation templates to skeleton
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Adds complete Slate documentation infrastructure to generated projects:
- docs/ directory with Gemfile, config.rb, and source templates
- Dockerfile for building docs site
- Dockerfile.nginx for serving static docs
- generate-docs.sh script for CI integration
- Claude command for AI-assisted docs generation
- OpenAPI → Slate markdown conversion via widdershins

Also includes:
- --export-openapi flag for service binaries
- DNS provisioning for docs.{domain} subdomain
- Updated project_infra for docs DNS records

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:06:36 -07:00
jordan
f64377116a fix: add build-complete sync point for docs pipeline ordering
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The export-openapi step was running in parallel with component builds
because it had no explicit dependency. This could cause docs generation
to run before component services were fully built.

Changes:
- Add build-complete step with NO depends_on (waits for ALL prior steps)
- Make export-openapi depend on build-complete
- Complete docs pipeline: export-openapi → generate-docs → build-docs →
  build-docs-image → deploy-docs
- Update verify step label selector to use project= instead of app=

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:02:17 -07:00
jordan
59aa173384 fix: clear stale error when dequeuing work tasks
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When a task is retried (dequeued again after failure), the previous
error message was persisting in the work_queue table. This caused the
API to return confusing responses with status="running" but also
containing an error message from the previous attempt.

Now clears error and completed_at when claiming a task, matching the
fix already applied to build_audit.UpdateStatus.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 08:51:34 -07:00
jordan
9833725f31 fix: preserve work on build retry, clear stale audit data
Two critical fixes for build retry behavior:

1. pod_git_operations.go: Normalize remote URL before comparison
   - Clone stores URL with token (https://token:x@host/...)
   - Subsequent retry compares against URL without token
   - Without normalization, URLs never match, so workspace is always
     cleared and re-cloned, losing all code from previous attempt

2. build_audit.go: Clear stale result data when task transitions to running
   - When a failed task is retried, UpdateStatus only updated status/worker_id
   - Result and completed_at from previous failure remained, causing
     API to return stale failure data even while retry was running
   - Now clears result, completed_at and resets started_at when
     status is set to "running"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 08:40:36 -07:00
jordan
e58d679e67 fix: add go mod download to component Dockerfiles
Empty go.sum files were causing Docker builds to fail because
Go couldn't verify dependencies. Added go mod download steps
for both pkg and component directories before building.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:35:02 -07:00
jordan
d74efb75ff fix: wire workService to WorkersHandler and add /work/tasks endpoint
Critical fix: WorkersHandler was missing workService dependency, causing
500 errors when workers tried to fail tasks. This caused tasks to get
stuck in "running" state permanently.

Also adds:
- /work/tasks endpoint for debugging all tasks across projects
- List method to WorkQueue interface for admin views
- HTTP client tests for api_client.go and claudebox/client.go (48 tests)
- Split work.go DTOs into work_dto.go to stay under 500 lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 10:35:39 -07:00
jordan
d7a6f37593 fix: worker graceful shutdown and RWO PVC compatibility
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add WaitGroup for graceful shutdown of in-flight tasks
- Change replicas to 1 with Recreate strategy (RWO PVC limitation)
- Optimize Dockerfile: combine RUN commands for smaller layers
- Add compiled binaries to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:35:00 -07:00
jordan
f6a2b61b16 fix: add skeleton settings.local.json (was globally gitignored)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 22:55:17 -07:00
jordan
3b35900a2d feat: enterprise worker pool with HTTP sidecar pattern
Implements horizontally-scalable worker pool architecture:
- claudebox-sidecar: HTTP server for Claude Code, git, and SDLC ops
- rdev-worker: standalone worker binary polling rdev-api for tasks
- HTTP client adapter for sidecar communication
- HPA with custom Prometheus metrics for autoscaling
- ServiceMonitor for metrics scraping

Code review fixes applied:
- URL-encode query parameters in GitStatus (Critical #1)
- Remove unused shellQuote function (Critical #2)
- Use stdlib strings.Split/TrimSpace (Critical #3)
- Add version injection via ldflags (Warning #4)
- Add debug logging for swallowed git/sdlc errors (Warning #5, #6)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:21:11 -07:00
jordan
3b0779fbe8 fix: slackpath trees use batch endpoint for atomic multi-component adds
Updates slackpath-2 and slackpath-4 to use POST /projects/{id}/components/batch
for adding multiple Go components atomically in a single git commit. This
prevents the go.work race condition where individual commits reference modules
that don't exist yet.

Also adds on_error: continue for infrastructure provisioning steps that may
already exist from skeleton (redis, postgres).

Verified:
- slackpath-1:  Complete (wait_build polled 5 times, detected success)
- slackpath-2:  Complete (wait_build polled 111 times, detected success)
- slackpath-3:  Infrastructure passed (worker capacity limited testing)
- slackpath-4:  Infrastructure passed (worker capacity limited testing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 14:44:53 -07:00
jordan
853ec4cf81 fix: go.work race condition with batch components and idempotent provisioning
Three coordinated fixes for CI pipeline race conditions:

1. Woodpecker step dependencies: Added depends_on: [deps] to all 6 component
   templates (service, worker, cli, app-astro, app-react, app-nextjs) so build
   steps wait for go work sync to complete.

2. Idempotent resource provisioning: Modified provisionResources() to check
   for existing database/cache before creating, preventing "already exists"
   errors on component re-adds.

3. Batch component endpoint: POST /projects/{id}/components/batch enables
   atomic multi-component additions in a single git commit. Validates all
   components upfront, provisions infra sequentially, commits code components
   atomically.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:31:40 -07:00
jordan
53862c773b fix: resolve systemic debt in worker and skeleton templates
Worker template fixes:
- Replace panic() with logger.Error() + os.Exit(1) for config errors
- Remove double-timeout application (context + middleware)
- Add error message truncation to prevent log bloat
- Use named constants for shutdown grace period and stale check interval

Skeleton pkg/auth fixes:
- Fix error wrapping to use %w consistently in jwt.go
- Add GetUserOrError() as safe alternative to MustGetUser() panic

Skeleton pkg/queue fixes:
- Check RowsAffected() errors instead of ignoring them
- Add input validation to EnqueueWithOptions (require job type, cap retries)
- Add log truncation for error messages
- Fix inaccurate doc comment claiming exponential backoff

Worker timeout consolidation:
- Add internal/worker/timeouts.go with named constants
- Migrate all workers to use timeout constants

Cleanup:
- Remove obsolete slack-preparation-thoughts.md files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 23:44:55 -07:00
jordan
d69da6d627 feat: add structured logging infrastructure and SDLC extensions
Major changes:
- Add internal/logging package with field constants, context propagation,
  sensitive data auto-redaction, and per-component log levels
- Add worker timeout constants (TimeoutQuickOp, TimeoutHealthCheck, etc.)
- Extend SDLC with callback handlers, generate endpoints, and executor
- Add new cookbook trees for aeries and slackpath progression
- Add skeleton templates for queue, realtime, and microservices
- Add worker component template with async job processing
- Refactor services and handlers to use new logging infrastructure
- Split component.go into component_infra.go and component_listing.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:56:04 -07:00
jordan
1790afd0ee feat: add path-based ingress management for component lifecycle
Adds AddIngressPath and RemoveIngressPath to the Deployer interface
for managing per-component ingress rules in monorepo projects.

- Implement conflict retry logic for concurrent ingress updates
- Add K8s client interface for testability
- Add comprehensive unit tests for ingress path operations
- Add component deployment and teardown methods to ComponentService
- Update service templates with OpenAPI spec improvements
- Add evolving-app cookbook tree for reference
- Split resources.go into resources_ingress.go for path-based routing
- Split component.go into component_deploy.go for deployment helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:31:50 -07:00
jordan
196e3d96e8 fix: make go.work.sum optional in Dockerfiles
Use glob pattern go.work.su[m] instead of go.work.sum to allow
the COPY to succeed even when go.work.sum doesn't exist yet.
This happens on fresh monorepos before dependencies are synced.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:58:46 -07:00
jordan
210064d490 feat: add diagnostics endpoint and external health monitoring
- Add /diagnostics endpoint for system health overview
- Add external health worker for monitoring Gitea, Woodpecker, Registry
- Add health check methods to Gitea and Woodpecker clients
- Remove hardcoded fallback projects (pantheon, aeries)
- Add diagnostics domain types and service layer
- Add comprehensive tests for diagnostics handler and service
- Fix tests to use registered test project instead of hardcoded one

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:10:56 -07:00
jordan
9a1309a0c5 feat: fix composable monorepo CI builds + health endpoint improvements
Composable monorepo CI fixes:
- Add empty go.sum.tmpl files for pkg, service, worker, and cli components
- Fix Dockerfile.tmpl glob patterns (COPY go.work.sum* is invalid in Kaniko)
- Add deps step to CI that runs go work sync and go mod tidy before builds
- Fix scalar-go dependency version (v0.1.2 doesn't exist, use v0.13.0)

Health endpoint improvements:
- Add registry health check (zot OCI /v2/ endpoint)
- Add health metrics for CI, registry, and Git
- Add /health/ci endpoint for Woodpecker health

Visual verification scaffolding:
- Add Playwright pod and scripts ConfigMap
- Add vision.md and implementation breakdown plan

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:46:51 -07:00
jordan
cfba724f8a feat: add work task error classification and user-facing error codes
- Add WorkErrorCode type with RATE_LIMITED, AUTH_FAILED, TIMEOUT, STALE_WORKER, AGENT_ERROR, INVALID_SPEC
- Add ClassifyAgentError function to detect error patterns from stderr
- Add error_code column to work_queue table (migration 016)
- Add FailWithCode method to WorkQueue interface and implementations
- Update RequeueStaleWithIDs to mark permanently failed tasks with STALE_WORKER
- Add ErrorCode to BuildResult for API responses
- Update work executor to classify errors before failing tasks

This enables users to see actual failure reasons (e.g., "RATE_LIMITED") instead of
builds stuck in "running" state forever when Claude hits rate limits.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 00:07:34 -07:00
jordan
6e8f5821af feat: add artifact pass/fail/needs-fix lifecycle for SDLC execution phases
- Add pass/fail/needs-fix CLI commands to cmd/sdlc/cmd_artifact.go
- Add 3 new methods to SDLCExecutor interface in internal/port
- Implement methods in kubernetes adapter
- Add service methods to SDLCService
- Add HTTP handlers for POST .../artifacts/{type}/pass|fail|needs-fix
- Update 6 skeleton commands to evaluate and set artifact status
- Update test mocks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 22:14:53 -07:00
jordan
56e3f83955 feat: add auth scopes, OpenAPI docs, SDLC guides, and code quality improvements
- Add auth.RequireScope() to all handler routes for proper authorization
- Add SDLC OpenAPI endpoint documentation (state, features, tasks, branches, merge, archive, orchestrator)
- Add SDLC documentation guides (getting-started, cli-reference, api-reference, command-catalog)
- Add artifact_test.go for SDLC artifact coverage
- Add CLAUDE.md rules: auth scopes requirement, error wrapping with %w
- Fix error wrapping to use %w instead of %v throughout codebase
- Improve CLI merge command with conflict detection and resolution
- Fix handler tests to include auth middleware for RequireScope
- Add cookbook tree runner scripts for automated testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 13:55:50 -07:00
jordan
f22b220c6d feat: add SDLC branch management, merge, archive, and orchestrator APIs
Add branch lifecycle commands (branch, merge, archive) to the SDLC CLI.
Introduce orchestrator handler and service for multi-step SDLC workflows.
Expand skeleton template with 15 Claude commands covering the full feature
lifecycle. Extend classifier rules, error types, and executor port for
branch operations. Split rules.go and classifier_test.go to stay within
500-line limit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 12:30:03 -07:00
jordan
425ef0f806 feat: add SDLC orchestration - library, CLI, and API integration
Implements deterministic feature lifecycle management for agent-driven
development. Agents use the CLI in pods; operators control via REST API.

Library (internal/sdlc/):
- Feature lifecycle with 10 phases (draft → released)
- Classifier engine with priority-ordered rules
- Artifact tracking with approval workflow
- Task management within features
- YAML-based state persistence

CLI (cmd/sdlc/):
- init, state, next, feature, artifact, task, query commands
- --json flag for machine-readable output
- Runs inside project pods

API (21 endpoints under /projects/{id}/sdlc/):
- State: GET /state, GET /next
- Features: CRUD + transition/block/unblock
- Artifacts: approve/reject per type
- Tasks: add/start/complete/block
- Queries: blocked/ready/needs-approval

Architecture:
- Port: SDLCExecutor interface (internal/port/)
- Adapter: kubectl exec into pods (internal/adapter/kubernetes/)
- Service: pod resolution + logging (internal/service/)
- Handlers: 5 files under 500-line limit (internal/handlers/)

Also includes template upgrades (chassis framework, UI components,
OpenAPI helpers, backend/frontend guides) and component improvements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 09:57:05 -07:00
jordan
62460bf098 feat: complete template upgrade - chassis framework, UI library, auth, app-nextjs, OpenAPI, and cookbook
Weeks 1-7 of the template upgrade plan:
- pkg/api: typed HTTPError with sentinels, Wrap/WrapMiddleware, Bind, health probes, OpenAPI schema/param builders
- skeleton/packages: ui (design tokens, components), layout (DashboardShell), auth (AuthProvider, ProtectedRoute), api-client
- skeleton/pkg: httperror, app/handler, app/bind, app/health, auth (JWT/API key middleware)
- components/app-nextjs: Next.js 14 App Router template with dashboard, server actions, auth
- cookbooks/feature-development.md with test and validation scripts
- Handler tests for components, project management, and woodpecker webhook
- 3 rounds of code review fixes applied

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 00:46:51 -07:00
jordan
c280a92012 feat: add operations audit system and template improvements
Operations Audit (new feature):
- Add Operation domain model with status tracking (pending, running, completed, failed, cancelled)
- Add OperationRepository with PostgreSQL implementation
- Add OperationService for CRUD and lifecycle management
- Add operations handlers (list, get, cancel endpoints)
- Add migration 015_operations.sql for operations table
- Add operation cleanup worker for stale operation handling
- Add ErrOperationNotFound to domain errors

Template Improvements:
- Add CLAUDE.md configuration files to astro-landing, default, and go-api templates
- Fix PORT template variable usage in nginx configs for app templates
- Add replace directives for local pkg module in Go templates
- Simplify Go service/worker Dockerfiles for workspace builds
- Fix TypeScript error in logger template

Other:
- Refactor landing-test.sh cookbook script
- Update CLAUDE.md version reference

Note: Some files exceed 500-line limit (pre-existing debt + new feature)
- component.go: 550 lines (unchanged, pre-existing)
- main.go: 522 lines (added operations wiring)
- operation_repo.go: 569 lines (new, needs splitting)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:08:57 -07:00
jordan
b3d47abd7c feat: add curated skills, commands, and agents to skeleton template
Add best-of-best Claude Code configuration from local setup to the
composable monorepo skeleton template, giving new projects a powerful
starting configuration.

Commands added (4):
- do-parallel: Execute tasks in parallel waves with agent selection
- remember: Store learnings as institutional memory
- prepare: Pre-implementation readiness assessment
- root-cause: Root cause analysis with parallel investigation

Skills added (5):
- orchestrated-execution: Task pipelines with implementation → review → fix
- root-cause-analyst: Systematic diagnosis with confidence scoring
- knowledge-librarian: Organize learnings in ai-lookup/ structure
- feature-verifier: Verify features work with evidence matrix
- prepare: Binary outcome readiness assessment (brief or gap list)

Agents added (1):
- quality-engineer: Code quality, test coverage, error handling reviewer

All Citadel-specific references genericized to use skeleton's existing
agents (go-specialist, testing-strategist, security-architect, etc).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 15:33:25 -07:00
jordan
e26bb28b61 feat: add pipeline steps API with debugging diagnostics
- Add GET /projects/{id}/pipelines/{number}/steps endpoint
- Return step name, status, duration, exit_code for all steps
- Include last 50 lines of log for failed steps
- Enhance test script with automatic diagnostics on failure
- Add diagnose subcommand for deep pipeline analysis
- Show K8s pod state on site accessibility failures
- Split woodpecker adapter into client.go and pipelines.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 12:44:26 -07:00
jordan
05a64c51e7 release: v0.10.27 - fix: woodpecker step YAML multi-line command syntax 2026-02-01 12:42:18 -07:00