Commit Graph

164 Commits

Author SHA1 Message Date
jordan
b648a52265 fix(cookbook): don't block slackpath-5 on slow docs builds
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The wait-init step was timing out because it waited for the entire pipeline
including docs build steps. The service (preferences-api) deploys successfully
before docs. Added on_error: continue so the tree proceeds after service deploy.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 20:59:52 -07:00
jordan
9085965864 fix(skeleton): enforce chi {param} URL syntax in agent guidance
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Agents were generating `:id` (Echo/Gin style) instead of `{id}` (chi style),
causing routes to not match. Updated api-designer, go-specialist agents and
skeleton CLAUDE.md with explicit CRITICAL notes about brace syntax.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 20:44:52 -07:00
jordan
863dfd3214 fix: skip root deployment for empty template (defaults to skeleton)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When req.Template is empty, it defaults to 'skeleton' but the check
in createInitialDeployment only matched 'skeleton' explicitly, not
empty string. This caused a broken deployment to be created for
monorepo projects with a non-existent image.

Root cause: slackpath-5 creates project with empty template, which
defaults to skeleton, but createInitialDeployment was still creating
a root deployment that references registry.threesix.ai/{project}:latest
which never gets built (skeleton has no root Dockerfile).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 19:32:19 -07:00
jordan
bcf9f28bb9 fix: add failure:ignore to docs build steps
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When docs infrastructure doesn't exist, the docs build steps should
gracefully skip without failing the entire pipeline.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 18:26:00 -07:00
jordan
2a25a161cb fix: use plugin-kaniko for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The raw gcr.io/kaniko-project/executor with commands: doesn't work
properly in Woodpecker. Switch to woodpeckerci/plugin-kaniko with
settings: to match other component builds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 18:08:31 -07:00
jordan
bed72961fe fix: add --insecure flag to kaniko for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The registry.threesix.ai uses a self-signed certificate.
Service builds use plugin-kaniko with skip-tls-verify, but docs
build used raw kaniko executor without TLS bypass, causing exit 128.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 17:50:38 -07:00
jordan
be80fd2d4a fix: correct kaniko dockerfile path for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When --context=docs is set, the --dockerfile path should be relative
to the context directory. Changed from docs/Dockerfile.nginx to
Dockerfile.nginx since kaniko already looks in the docs/ directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 17:35:54 -07:00
jordan
7f0dd8cc8b chore: trigger CI rebuild for template fixes
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
2026-02-07 16:55:06 -07:00
jordan
caf0990ceb fix: downgrade rouge to 3.x for middleman-syntax compatibility
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
middleman-syntax ~> 3.2 requires rouge ~> 3.2, but Gemfile had rouge ~> 4.0
causing bundle install to fail with version resolution error.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:48:49 -07:00
jordan
b41e0dfbf9 fix: use raw JSON responses in claudebox server
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The claudebox sidecar was using api.WriteJSON which wraps responses in
{data: ..., meta: ...} format. The claudebox HTTP client expects raw
JSON responses without wrapping.

This caused git clone to appear to fail - the HTTP request succeeded
and returned {data: {success: true, cloned: true}, meta: {...}}, but
the client decoded success=false because it couldn't find the fields
at the top level.

Added writeRawJSON helper and replaced all api.WriteJSON calls with it
for actual responses. Error responses still use api.WriteBadRequest
which returns proper error format.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:41:21 -07:00
jordan
af91bad0ff feat: add Slate documentation templates to skeleton
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Adds complete Slate documentation infrastructure to generated projects:
- docs/ directory with Gemfile, config.rb, and source templates
- Dockerfile for building docs site
- Dockerfile.nginx for serving static docs
- generate-docs.sh script for CI integration
- Claude command for AI-assisted docs generation
- OpenAPI → Slate markdown conversion via widdershins

Also includes:
- --export-openapi flag for service binaries
- DNS provisioning for docs.{domain} subdomain
- Updated project_infra for docs DNS records

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:06:36 -07:00
jordan
f64377116a fix: add build-complete sync point for docs pipeline ordering
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The export-openapi step was running in parallel with component builds
because it had no explicit dependency. This could cause docs generation
to run before component services were fully built.

Changes:
- Add build-complete step with NO depends_on (waits for ALL prior steps)
- Make export-openapi depend on build-complete
- Complete docs pipeline: export-openapi → generate-docs → build-docs →
  build-docs-image → deploy-docs
- Update verify step label selector to use project= instead of app=

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:02:17 -07:00
jordan
ff4e31e289 chore: trigger CI rebuild
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 13:32:31 -07:00
jordan
02825666fb chore: swap remotes so origin=gitea (CI), github=backup
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 13:03:47 -07:00
jordan
59aa173384 fix: clear stale error when dequeuing work tasks
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When a task is retried (dequeued again after failure), the previous
error message was persisting in the work_queue table. This caused the
API to return confusing responses with status="running" but also
containing an error message from the previous attempt.

Now clears error and completed_at when claiming a task, matching the
fix already applied to build_audit.UpdateStatus.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 08:51:34 -07:00
jordan
9833725f31 fix: preserve work on build retry, clear stale audit data
Two critical fixes for build retry behavior:

1. pod_git_operations.go: Normalize remote URL before comparison
   - Clone stores URL with token (https://token:x@host/...)
   - Subsequent retry compares against URL without token
   - Without normalization, URLs never match, so workspace is always
     cleared and re-cloned, losing all code from previous attempt

2. build_audit.go: Clear stale result data when task transitions to running
   - When a failed task is retried, UpdateStatus only updated status/worker_id
   - Result and completed_at from previous failure remained, causing
     API to return stale failure data even while retry was running
   - Now clears result, completed_at and resets started_at when
     status is set to "running"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 08:40:36 -07:00
jordan
9cca5cc41b fix: add proper instrumentation to git clone for debugging
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Log clone request with work_dir, URL, and token presence
- Log workspace state (is_git_repo, existing remote)
- Log all decision points (pull vs clone, clear workspace)
- Detect and clear non-empty non-git directories before clone
- Capture both stdout and stderr for clone failures
- Include exit code in error messages
2026-02-07 07:59:53 -07:00
jordan
83b5d1ebb4 ci: trigger rebuild for workService fix
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 00:38:12 -07:00
jordan
e58d679e67 fix: add go mod download to component Dockerfiles
Empty go.sum files were causing Docker builds to fail because
Go couldn't verify dependencies. Added go mod download steps
for both pkg and component directories before building.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:35:02 -07:00
jordan
5d86bb7c57 feat: enable Claude Code OTEL telemetry in claudebox containers
Add OpenTelemetry environment variables to export Claude Code logs
and metrics to the existing OTEL collector. Provides visibility into
long-running builds.

- claudebox-worker: sidecar in rdev-worker deployment
- claudebox-standalone: StatefulSet for direct access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 19:43:47 -07:00
jordan
bc010c4746 feat: add RWX storage class and full SDLC lifecycle cookbook
- Add longhorn-rwx StorageClass for RWX volume support
- Add slackpath-5-full-lifecycle.yaml cookbook tree (all 10 SDLC phases)
- Update worker-pool.md documentation
- Consolidate PVC configuration, remove separate pvc-shared-claude.yaml
- Update rdev-worker and kustomization for new PVC structure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:37:57 -07:00
jordan
d74efb75ff fix: wire workService to WorkersHandler and add /work/tasks endpoint
Critical fix: WorkersHandler was missing workService dependency, causing
500 errors when workers tried to fail tasks. This caused tasks to get
stuck in "running" state permanently.

Also adds:
- /work/tasks endpoint for debugging all tasks across projects
- List method to WorkQueue interface for admin views
- HTTP client tests for api_client.go and claudebox/client.go (48 tests)
- Split work.go DTOs into work_dto.go to stay under 500 lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 10:35:39 -07:00
jordan
d7a6f37593 fix: worker graceful shutdown and RWO PVC compatibility
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add WaitGroup for graceful shutdown of in-flight tasks
- Change replicas to 1 with Recreate strategy (RWO PVC limitation)
- Optimize Dockerfile: combine RUN commands for smaller layers
- Add compiled binaries to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:35:00 -07:00
jordan
bc3b9b9e42 docs: remove stale ghcr.io references from releasing.md
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:12:51 -07:00
jordan
96c9389c97 docs: update build/deploy docs for Woodpecker CI
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- deploying.md: Add Woodpecker CI section, update constraints
- releasing.md: Add automated releases via Woodpecker, Zot registry
- RELEASE_CHECKLIST.md: Update build/deploy commands
- CLAUDE.md: Update quick reference for automated deploys

Images now at registry.threesix.ai/rdev/* instead of ghcr.io

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 23:54:00 -07:00
jordan
2fd52dcfed ci: fix container names in deploy step
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Use 'worker' and 'claudebox' container names for rdev-worker deployment
- Update both containers in rdev-worker deployment in single command

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 23:44:05 -07:00
jordan
60f98f3c18 ci: use woodpecker kaniko plugin instead of direct executor
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 23:26:39 -07:00
jordan
f6a2b61b16 fix: add skeleton settings.local.json (was globally gitignored)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 22:55:17 -07:00
jordan
219ccf23d0 ci: fix Go version to 1.25 for tests
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 22:37:11 -07:00
jordan
e76567d84d ci: trigger initial Woodpecker build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 20:05:27 -07:00
jordan
dc00921703 ci: add Woodpecker CI for self-hosted builds
- Add .woodpecker.yml with build steps for api, worker, claudebox
- Update K8s manifests to use registry.threesix.ai/rdev/*
- Remove ghcr-secret imagePullSecrets (Zot is unauthenticated)

Builds will run on Woodpecker using kaniko, pushing to our internal
Zot registry. This eliminates the QEMU cross-compilation issues on
Apple Silicon.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 19:26:44 -07:00
jordan
3b35900a2d feat: enterprise worker pool with HTTP sidecar pattern
Implements horizontally-scalable worker pool architecture:
- claudebox-sidecar: HTTP server for Claude Code, git, and SDLC ops
- rdev-worker: standalone worker binary polling rdev-api for tasks
- HTTP client adapter for sidecar communication
- HPA with custom Prometheus metrics for autoscaling
- ServiceMonitor for metrics scraping

Code review fixes applied:
- URL-encode query parameters in GitStatus (Critical #1)
- Remove unused shellQuote function (Critical #2)
- Use stdlib strings.Split/TrimSpace (Critical #3)
- Add version injection via ldflags (Warning #4)
- Add debug logging for swallowed git/sdlc errors (Warning #5, #6)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:21:11 -07:00
jordan
3b0779fbe8 fix: slackpath trees use batch endpoint for atomic multi-component adds
Updates slackpath-2 and slackpath-4 to use POST /projects/{id}/components/batch
for adding multiple Go components atomically in a single git commit. This
prevents the go.work race condition where individual commits reference modules
that don't exist yet.

Also adds on_error: continue for infrastructure provisioning steps that may
already exist from skeleton (redis, postgres).

Verified:
- slackpath-1:  Complete (wait_build polled 5 times, detected success)
- slackpath-2:  Complete (wait_build polled 111 times, detected success)
- slackpath-3:  Infrastructure passed (worker capacity limited testing)
- slackpath-4:  Infrastructure passed (worker capacity limited testing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 14:44:53 -07:00
jordan
da482b48b4 release: v0.10.56 - fix: worker template unused pkg/config import 2026-02-05 13:46:45 -07:00
jordan
0c7282b9eb release: v0.10.55 - fix: Dockerfile templates use GOWORK=off for independent component builds 2026-02-05 13:09:35 -07:00
jordan
a7fcba3587 release: v0.10.54 - fix: go.work race condition with batch components 2026-02-05 12:46:22 -07:00
jordan
853ec4cf81 fix: go.work race condition with batch components and idempotent provisioning
Three coordinated fixes for CI pipeline race conditions:

1. Woodpecker step dependencies: Added depends_on: [deps] to all 6 component
   templates (service, worker, cli, app-astro, app-react, app-nextjs) so build
   steps wait for go work sync to complete.

2. Idempotent resource provisioning: Modified provisionResources() to check
   for existing database/cache before creating, preventing "already exists"
   errors on component re-adds.

3. Batch component endpoint: POST /projects/{id}/components/batch enables
   atomic multi-component additions in a single git commit. Validates all
   components upfront, provisions infra sequentially, commits code components
   atomically.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:31:40 -07:00
jordan
19837f7251 release: v0.10.53 - fix: shell-quote SDLC command args to handle spaces in titles 2026-02-05 00:44:34 -07:00
jordan
022184ef6a chore: update claudebox to v0.4.0 (includes sdlc binary) 2026-02-05 00:18:02 -07:00
jordan
4766a54314 release: v0.10.52 - feat: SDLC worker routing for skeleton projects with auto-init 2026-02-05 00:16:29 -07:00
jordan
46c8bfeec2 release: v0.10.51 - feat: inject provisioned credentials into component deployments 2026-02-05 00:09:43 -07:00
jordan
1e853980e4 feat: inject provisioned credentials into component deployments
Components now automatically receive DATABASE_URL, REDIS_URL, and other
infrastructure credentials when deployed. Previously, credentials were
provisioned and stored but never injected into K8s deployments.

Changes:
- Add fetchProjectCredentials() to component_deploy.go
- Populate spec.Secrets before calling deployer.Deploy()
- Fix slackpath-4 to provision postgres + redis before services
- Add terminology docs to clarify platform vs skeleton code

This completes the infrastructure provisioning flow:
1. add-db → provisions CockroachDB, stores DATABASE_URL
2. add-service → deploys with DATABASE_URL in environment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:09:15 -07:00
jordan
34e12ff3d5 release: v0.10.50 - fix: resolve systemic debt in worker and skeleton templates 2026-02-04 23:57:55 -07:00
jordan
53862c773b fix: resolve systemic debt in worker and skeleton templates
Worker template fixes:
- Replace panic() with logger.Error() + os.Exit(1) for config errors
- Remove double-timeout application (context + middleware)
- Add error message truncation to prevent log bloat
- Use named constants for shutdown grace period and stale check interval

Skeleton pkg/auth fixes:
- Fix error wrapping to use %w consistently in jwt.go
- Add GetUserOrError() as safe alternative to MustGetUser() panic

Skeleton pkg/queue fixes:
- Check RowsAffected() errors instead of ignoring them
- Add input validation to EnqueueWithOptions (require job type, cap retries)
- Add log truncation for error messages
- Fix inaccurate doc comment claiming exponential backoff

Worker timeout consolidation:
- Add internal/worker/timeouts.go with named constants
- Migrate all workers to use timeout constants

Cleanup:
- Remove obsolete slack-preparation-thoughts.md files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 23:44:55 -07:00
jordan
d69da6d627 feat: add structured logging infrastructure and SDLC extensions
Major changes:
- Add internal/logging package with field constants, context propagation,
  sensitive data auto-redaction, and per-component log levels
- Add worker timeout constants (TimeoutQuickOp, TimeoutHealthCheck, etc.)
- Extend SDLC with callback handlers, generate endpoints, and executor
- Add new cookbook trees for aeries and slackpath progression
- Add skeleton templates for queue, realtime, and microservices
- Add worker component template with async job processing
- Refactor services and handlers to use new logging infrastructure
- Split component.go into component_infra.go and component_listing.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:56:04 -07:00
jordan
1790afd0ee feat: add path-based ingress management for component lifecycle
Adds AddIngressPath and RemoveIngressPath to the Deployer interface
for managing per-component ingress rules in monorepo projects.

- Implement conflict retry logic for concurrent ingress updates
- Add K8s client interface for testability
- Add comprehensive unit tests for ingress path operations
- Add component deployment and teardown methods to ComponentService
- Update service templates with OpenAPI spec improvements
- Add evolving-app cookbook tree for reference
- Split resources.go into resources_ingress.go for path-based routing
- Split component.go into component_deploy.go for deployment helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:31:50 -07:00
jordan
619a57c240 release: v0.10.49 - fix: add workspace package paths to app-react and app-astro tailwind configs 2026-02-04 01:03:14 -07:00
jordan
78e8eb5f27 release: v0.10.48 - feat: multi-component ingress routing with path-based routing 2026-02-03 23:03:22 -07:00
jordan
f8433a1d16 release: v0.10.47 - fix: make go.work.sum optional in component Dockerfiles 2026-02-03 19:58:49 -07:00
jordan
196e3d96e8 fix: make go.work.sum optional in Dockerfiles
Use glob pattern go.work.su[m] instead of go.work.sum to allow
the COPY to succeed even when go.work.sum doesn't exist yet.
This happens on fresh monorepos before dependencies are synced.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:58:46 -07:00