Commit Graph

91 Commits

Author SHA1 Message Date
jordan
ee1c214b7e fix: correct Resend DNS record type, name, and MX priority
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Three bugs in the notify provisioner DNS record upsert:

1. rec.Record ("DKIM"/"SPF") was used as the DNS record type — Cloudflare
   doesn't know those labels. Fix: use rec.DNSType ("TXT"/"MX") from the
   resendDNSRecord.type JSON field, which is the actual DNS record type.

2. rec.Name from Resend is already relative to the zone apex
   (e.g., "resend._domainkey.mail.project-name"), not relative to the
   registered domain. Code was doing rec.Name + "." + host which produced
   a doubled subdomain. Fix: pass rec.Name directly — Cloudflare's
   normalizeName appends ".baseDomain" to build the correct FQDN.

3. MX records have priority 10 in Resend's response but DNSRecord had no
   Priority field and Cloudflare CreateRecord/UpdateRecord didn't send it.
   Fix: add Priority int to domain.DNSRecord and include it in the body
   for both Create and Update when non-zero.

These bugs caused DKIM/SPF DNS records to never be created for any project.
Re-provision affected projects using POST /projects/{id}/notify/provision
after clearing NOTIFY_RESEND_DOMAIN_ID from the credential store.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 19:52:11 -07:00
jordan
c54664b751 feat: add POST /projects/{id}/notify/provision to repair Resend domain
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Repairs projects where notify account was created but Resend domain
provisioning (steps 7-9) failed — e.g., RESEND_API_KEY not yet
configured at project creation time.

- ProvisionNotifyDomain in port.NotifyProvisioner interface
- Provisioner.ProvisionNotifyDomain: creates Resend domain for existing
  notify host, upserts DKIM/SPF DNS records in Cloudflare, kicks off
  async verification via verifyWithRetry
- POST /projects/{id}/notify/provision handler:
  - reads NOTIFY_HOST from credential store (fails if not set)
  - rejects if NOTIFY_RESEND_DOMAIN_ID already present (use /verify)
  - stores returned domain ID as NOTIFY_RESEND_DOMAIN_ID credential

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 17:52:03 -07:00
jordan
fa0d030def feat: improve notify domain verification reliability and add status endpoints
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add verifyWithRetry to provisioner: 60s initial DNS propagation delay,
  5 retries with 30s backoff before marking verification as failed
- Add GetNotifyDomainStatus: polls Resend API for domain verification status,
  returns "not_configured" when Resend not set up
- Add VerifyProjectNotify: synchronous re-verification for handler use
- Add getDomainStatus to resendAPI interface + resendClient implementation
- Add NotifyDomainStatus domain struct (host, resend_domain_id, status)
- Guard NOTIFY_RESEND_DOMAIN_ID storage against empty string writes
- New handler: GET /projects/{id}/notify/status (returns verification state)
- New handler: POST /projects/{id}/notify/verify (triggers re-verification)
- Add verify-notify-domain cookbook step to persona-community,
  slackpath-1, and slackpath-4 trees (polls status for up to 6 min)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 16:25:55 -07:00
jordan
a843fd7ff4 fix: make NOTIFY_API_KEY optional — fall back to log-only email mode
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
NOTIFY_URL is a global platform credential; NOTIFY_API_KEY is project-
scoped and may not be provisioned if notify setup failed or the
notifyProvisioner wasn't configured. Previously the service would crash
on startup with "invalid configuration: API key is required" when
NOTIFY_URL was set but NOTIFY_API_KEY was missing.

Now the condition checks both: only initialize the notify client when
both NOTIFY_URL and NOTIFY_API_KEY are set. When either is absent, fall
back to log-only mode with a warning (instead of os.Exit(1)).

This is the correct behavior: email not delivered is survivable, but a
service crash on startup breaks the entire application.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 05:52:58 -07:00
jordan
d91bfc50fa fix: handle missing Redis credentials and redis.Nil in provisioner
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Two bugs fixed:

1. redis.Nil not handled in GetProjectCache: When ACL GETUSER returns nil
   (user doesn't exist), go-redis represents this as redis.Nil error. The
   provisioner only checked for err.Contains("User") which didn't match,
   causing spurious "get ACL user: redis: nil" errors on re-provision.

2. provisionRedis returns 409 even when REDIS_URL not in credential store:
   If the Redis ACL user exists but REDIS_URL was never stored (e.g., due
   to a failed previous run or lost state), the service would permanently
   refuse to provision, leaving the project without usable Redis credentials.
   Now checks the credential store: if REDIS_URL exists → true 409 duplicate;
   if REDIS_URL missing → re-provision (CreateProjectCache resets the password).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 05:19:00 -07:00
jordan
ad1f19739d fix: call config.MustInit() before config.Load() so Viper reads env vars
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Without MustInit(), viper.AutomaticEnv() is never called, so Viper
cannot read environment variables injected by K8s secrets (envFrom).
This caused DATABASE_URL to always appear empty in deployed services,
forcing them into standalone/in-memory mode even when a database was
provisioned.

os.Getenv() calls like JWT_SECRET worked fine (direct syscall).
Viper-backed reads like DATABASE_URL did not (require AutomaticEnv).

Added pkgconfig.MustInit() call at the top of main() in both the
service component template and the full-monorepo example-api.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 04:25:38 -07:00
jordan
4d9203eddc fix: commit usePersonaGeneration.ts skeleton template
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
The usePersonaGeneration hook was created on disk but never committed
to git, so rendered projects had a broken import in index.ts causing
TypeScript build failures in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 03:59:31 -07:00
jordan
9be5c7d81b fix: address code review issues in album and generation skeleton packages
- Add ErrAnchorRequired sentinel to pkg/album — replaces fragile string equality check
  used for 422 detection; callers now use errors.Is()
- Extract parseShotIndex() helper in album handler — replaces fmt.Sscanf which silently
  accepted partial parses like "12abc"; strconv.Atoi requires the full string to be numeric
- Restructure caption saves in album/handler — captions now written outside the
  len(img.Data) > 0 gate, so URL-only providers (no bytes returned) still get captions
- Add storage.FetchURL() shared utility — removes fetchBytes/downloadURL duplication
  across album and generation packages; callers control timeout via their http.Client
- Add video captions to VideoHandler — same caption sidecar pattern applied to videos
- Add persona generation event types to realtime package — persona_spec_*, persona_image_*,
  persona_video_* events added to EventType union and usePersonaGeneration hook exported

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 03:01:37 -07:00
jordan
062a828a00 feat: save prompt caption alongside every generated image
After each successful image upload to storage, a sidecar `.caption`
file is uploaded at the same path with `.caption` extension containing
the exact prompt used to generate the image.

Coverage:
- generation/handlers.go: ImageHandler → media/{userID}/images/{jobID}_{i}.caption
- album/handler.go: AnchorHandler → albums/{userID}/{albumID}/anchor.caption
- album/handler.go: ShotHandler → albums/{userID}/{albumID}/shots/{shotIndex}.caption
- personagen/service.go: generatePosition → personas/{specID}/images/{pos:02d}.caption

Caption failures are logged at warn level and never abort the job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 02:38:08 -07:00
jordan
3979ef2d08 feat: wire mixed-heritage through Stage 4 and fix pronoun support
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- specgen: extend dnaLLMResponse with heritage fields; conditionally extend
  Stage 4 prompt for EthnicityMixed to ask LLM for primary_heritage,
  secondary_heritage, and mix_percentage; populate IdentityDNA fields from
  response so mixed personas get a real heritage breakdown
- imagegen: buildIdentitySection() produces "East Asian and Latina/Hispanic
  heritage" description for mixed personas instead of generic "mixed-race"
- videogen: add genderPronouns() helper; replace hardcoded she/her with
  pronoun set across all 4 video prompts; generateVideo() returns raw bytes
  so caller can upload to storage
- service: GenerateVideo() uploads video to storage and sets VideoSpec.URL;
  anchor ordering ensures position 1 is generated first; emit
  persona_video_failed SSE event on non-fatal video failures; replace manual
  fold helpers with strings.ToLower + strings.Contains
- worker/main: register persona_generate handler when both AI managers ready
- docs: add persona_video_failed to SSE events reference in personagen.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 01:21:59 -07:00
jordan
002c32aedb feat: add album generation system to skeleton
Adds anchor-based image album generation across docs, skeleton, and rendered
full-monorepo. One subject description + one anchor image + N directed shots,
covering personas, products, characters, and brand assets out of the box.

## What ships

**Skeleton packages:**
- pkg/album/types.go — Album, Shot, ShotStatus, ShotTemplate, AlbumUpdater
- pkg/album/templates.go — PortraitSession, ProductShoot, CharacterSheet built-ins
- pkg/album/handler.go — AnchorHandler + ShotHandler queue job handlers
- packages/realtime/src/useAlbumGeneration.ts — SSE hook owning all album state
- packages/ui/src/components/AlbumGrid.tsx — responsive shot grid with shimmer
- packages/ui/src/components/ShotCard.tsx — pending/generating/complete/failed states
- packages/ui/src/components/AnchorPreview.tsx — anchor CTA + image with controls

**Component service template:**
- internal/port/album.go — AlbumRepository interface
- internal/adapter/memory/album.go — in-memory repo for standalone dev
- internal/service/album.go — create, list, get, generateAnchor, generateAllShots
- internal/api/handlers/album.go — HTTP handlers (CRUD + 202 generation endpoints)
- Routes: GET/POST /albums, GET/DELETE /albums/{id}, POST /albums/{id}/anchor,
  POST/DELETE /albums/{id}/shots, POST /albums/{id}/shots/{index}

**Documentation:**
- .claude/guides/album.md — full guide with API, SSE events, frontend usage

**Key architecture decisions:**
- Anchor bytes never stored in queue payload — workers fetch AnchorURL at runtime
- Generation order enforced: POST /shots returns 422 if no anchor exists
- All album SSE events on existing user:<userId> channel (no new channel)
- AlbumUpdater interface lets job handlers update repo from inside queue workers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 23:57:21 -07:00
jordan
4603402b84 feat: OTP supports unified register+login flow
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Previously SendOTP silently dropped requests for unknown emails, so new
users had no passwordless path in. Now:

- SendOTP: if REGISTRATION_ENABLED and email unknown, generates and
  sends the code anyway (UserID nil until verify)
- VerifyOTP: if email unknown after valid code, auto-registers the user
  (emailVerified=true — OTP delivery proves ownership, name defaults to
  email local-part) then creates a session

REGISTRATION_ENABLED=false continues to block unknown emails at SendOTP,
preserving invite-only / closed-beta behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:17:42 -07:00
jordan
5ac9af018a fix: always log OTP codes to stdout in standalone dev mode
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
In-memory auth codes are ephemeral — they're wiped on server restart.
Previously, codes were only visible via email delivery. If the server
restarted between OTP send and OTP verify, the code would be lost.

Now memory.AuthCodeRepository.Create() always logs the code to stdout
with a [DEV] prefix. This gives developers a reliable fallback regardless
of whether NOTIFY_URL is set. Updated CLAUDE.md to document this behavior
and the DEV_USER_EMAIL env var.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 00:13:12 -07:00
jordan
5f66eb0e7b fix: seed dev user from DEV_USER_EMAIL env var so auth survives restarts
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
In standalone mode (no DATABASE_URL), the in-memory user store only had
hardcoded demo accounts. Any real email the developer used was lost on every
server restart, causing OTP requests to silently fail with "unknown email".

NewUserRepository now accepts devEmail + devPassword. If DEV_USER_EMAIL is
set, that account is seeded on every startup alongside the demo users. The
developer's email is always registered, OTPs route to notify (or log to
console), and re-renders/restarts no longer break the auth flow.

New config fields: DevUserEmail (DEV_USER_EMAIL) / DevUserPassword
(DEV_USER_PASSWORD, default: "DevPassword1").

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 23:46:12 -07:00
jordan
27e6cfd42b feat: add HTML email template system to skeleton service component
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Every project generated from the skeleton now ships with styled,
production-ready transactional emails out of the box.

New pkg/email package:
- Renderer: loads templates from caller-provided embed.FS, inlines CSS via
  douceur at startup, derives plain text via goquery for multipart delivery
- DevHandler: live browser preview at GET /dev/emails and /dev/emails/{purpose}
  (development only, never mounted in production)
- CSSInlineErr field on RenderedEmail so callers can log degraded renders

New service component templates:
- internal/email/embed.go.tmpl — embeds template FS (uses all: prefix for _*.html)
- internal/email/renderer_test.go.tmpl — 9 tests covering all purposes + brand injection
- internal/email/templates/ — 5 HTML email types (login_otp, email_verify,
  magic_link, password_reset, welcome) + 5 shared partials (_layout, _header,
  _footer, _button, _code_box)

Updated service component templates:
- config.go.tmpl — brand fields: AppName, AppURL, SupportEmail, LogoURL, BrandColor
- main.go.tmpl — wires renderer at startup, logs template count
- routes.go.tmpl — mounts /dev/emails in development; EmailRenderer in Dependencies
- notify.go.tmpl — renders HTML before sending; warns on CSS inlining failure
- go.mod.tmpl — adds douceur, goquery, gorilla/css, andybalholm/cascadia

Deleted: internal/adapter/email/helpers.go.tmpl (replaced by meta.yaml + renderer)

Fix: template directory named email_verify (matching domain.PurposeEmailVerify)
rather than verify_email — the mismatch caused all verification emails to fail
with "unknown email purpose" at send time while tests passed (tests called
Render directly with the wrong name).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 22:44:59 -07:00
jordan
4f01015132 feat: implement project access enforcement and management API
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Fix no-op RequireProjectAccess middleware to enforce project_ids
- Apply project access middleware to all project-scoped routes
- Filter GET /projects by allowed project IDs for restricted keys
- Add GET /me endpoint with key identity, scopes, and project access info
- Add PATCH /keys/{id} for partial key updates (name, scopes, project_ids, allowed_ips, expires_in)
- Add GET/POST/DELETE /projects/{id}/access for project-centric access management
- Auto-grant creating key access when using POST /project/create-and-build
- Accept grant_to_key_ids in create-and-build to grant multiple keys on project creation
- Move newProvisionerWithDeps test helper from production code to test file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 15:38:37 -07:00
jordan
0f25bd8dbe feat: hook in notify service for per-project email delivery
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add NotifyProvisioner (port + adapter) using real notify admin API
- Create notify account + send key + host grant per project
- Inject NOTIFY_API_KEY/HOST/FROM into component deployments
- Store NOTIFY_URL, NOTIFY_ADMIN_KEY, RESEND_API_KEY in credential store
- Add setup-notify.sh for one-time host/provider/domain setup
- Add NOTIFY_ADMIN_KEY constant to domain/credential.go
- Wire provisioner in main.go with connection test guard
- Add .claude/guides/services/notify.md and CLAUDE.md entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 00:30:32 -07:00
jordan
bc77504b35 fix: add 'use client' directive to MediaLibrary and MediaUploader components
These components use useState/useRef hooks but lacked the Next.js 'use client'
directive, causing the Next.js app build to fail with Server Component errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 00:32:24 -07:00
jordan
592b2d5ec0 fix: clarify database types across docs and fix video storage persistence
Two distinct fixes:

1. Database terminology: Make it crystal clear that generated projects use
   CockroachDB in production and PostgreSQL for local dev, while the rdev
   platform itself uses PostgreSQL. Updated 15 files across skeleton agents,
   component templates, cookbook trees, and platform docs.

2. Video storage: VideoHandler was ignoring vid.Data bytes (already downloaded
   by the Gemini adapter with auth) and re-downloading from the provider URL
   with a plain GET — which fails because Gemini URLs require API key auth.
   Now uses vid.Data first, falls back to downloadURL only for public URLs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 23:13:21 -07:00
jordan
a8c8a0a14d feat: add GCS-based persistent media storage, AI generation pipeline, and composable skeleton packages
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Adds complete media storage pipeline with GCS presigned uploads, AI image/video/text generation
via queue-based workers, realtime SSE event streaming, and comprehensive skeleton packages
(storage, mediagen, textgen, generation, realtime, persona, routing, ai-client). Includes
security fixes for media delete authorization, nil pointer guards in handlers, video persistence
via download-then-upload, consistent signed URLs, and Image→ImageIcon rename to avoid DOM collision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:29:09 -07:00
jordan
7249575dea feat(sessions): add command execution endpoint and activity tracking
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add POST /sessions/:id/exec endpoint for executing commands in sessions
- Add session activity tracking (last_activity_at timestamp)
- Add database migration 024 for session activity column
- Add comprehensive tests for session handlers and service layer
- Add wildcard TLS certificate for preview.threesix.ai subdomain
- Add infrastructure mocks for testing preview service
- Refactor preview cleanup logic to remove unused methods
- Add AIOS core documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-13 08:41:05 -07:00
jordan
84af398d85 refactor: add timeout constants for agent execution tiers
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Add TimeoutAgentExecution (22m) to handlers for synchronous SDLC
execution, and TimeoutAgent{Default,Medium,Heavy} (12/22/47m) to
workers for tiered agent task execution. Aligns with SDLC action
complexity tiers and prevents inline duration literals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-11 10:48:24 -07:00
jordan
542bc722ab fix(architect): handle missing projects in repo, add cookbook hooks/validation
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
The architect API returned "failed to start conversation" because
projectRepo.Get() failed — the in-memory K8s repo watches the rdev
namespace but projects deploy to the projects namespace. Made project
lookup non-fatal with fallback to default pod. Added error logging to
all architect handler methods (were silently swallowing errors).

Also adds setup-hooks, commit-after-qa, and pre-merge-validate steps
to the foundary cookbook tree for git hooks and code quality gates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 02:25:40 -07:00
jordan
a9ad3d8304 chore: accumulated platform hardening and CI fixes
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
CI / Woodpecker:
- Add explicit depends_on to all .woodpecker.yml steps (rdev + templates)
- Fix skip_tls_verify -> skip-tls-verify (correct Kaniko flag name)
- Add replicasets get/list to deployer RBAC for rollout status
- Skeleton template: add failure:ignore on docs steps, Traefik TLS
  annotations on ingress, depends_on on verify step

Component templates:
- Fix container name in deploy steps (PROJECT_NAME-COMPONENT_NAME)
- Replace kubectl scale with kubectl patch for replicas
- Add post-deploy image verification and rollout status checks
- Applied consistently across all 5 component templates

Adapters:
- gitea: Add HTTP client timeout (30s), context cancellation checks,
  handle 404 on GetRepo/DeleteRepo
- zot: Add retry with exponential backoff (doWithRetry), limit response
  body reads to 10MB
- cockroach: Use net.JoinHostPort for IPv6-safe DSN construction
- woodpecker: Fix error wrapping (%v -> %w)
- redis: Fix error wrapping (%v -> %w)
- deployer: Add context cancellation checks

Services:
- apikey_service: Fix error wrapping (%v -> %w)
- component_deploy: Fix error wrapping (%v -> %w)
- project_infra: Fix error wrapping (%v -> %w)
- webhook/dispatcher: Fix error wrapping (%v -> %w)

Other:
- CLAUDE.md: Add guide links for Gitea, Go 1.25, Woodpecker v3,
  Traefik v3, Zot registry
- circuitbreaker: Add test for error wrapping
- docs: Update deployment, troubleshooting, and runbook docs
- health: Fix error wrapping (%v -> %w)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 23:16:56 -07:00
jordan
3c9876a678 fix(worker): increase SSE scanner buffer to 1MB in claudebox client
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The HTTP claudebox client's ExecuteStream method used a bare
bufio.NewScanner with the default 64KB max token size. When Claude Code
produces tool results > 64KB (e.g., reading large files), the SSE event
exceeds the scanner limit and fails with "token too long".

Every other scanner in the codebase (claudecode adapter, claudebox
executor, kubernetes executor) already uses scanner.Buffer(buf, 1MB).
This was the only one missed.

Fixes: "agent execution failed: read stream: bufio.Scanner: token too long"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 23:14:20 -07:00
jordan
b6e778d5ab fix(git): harden git flow for concurrent SDLC stress test failures
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
5 fixes from stress test analysis:

1. CRITICAL: Add pull-before-push to claudebox GitOperations.CommitAndPush,
   matching the fix already in PodGitOperations (prevents push rejections
   when concurrent builds advance the remote).

2. HIGH: Extract ResetToMain into PodGitOperations as a shared public method.
   Wire into BuildExecutor after CloneRepo and update SDLCTaskExecutor to
   use the shared method. Prevents builds from running on wrong branch when
   worker pods are reused across tasks.

3. HIGH: Make branch create push failure fatal with retry+rollback in
   cmd/sdlc/cmd_branch.go. Prevents orphaned .sdlc/ state that causes
   merge failures after completing all 10 SDLC phases.

4. MEDIUM: Shell-escape token in credential helpers (both PodGitOperations
   and claudebox GitOperations) to prevent shell injection via tokens
   containing special characters.

5. MEDIUM: Add GitResetToMain to claudebox sidecar (git.go implementation,
   server.go endpoint, client.go HTTP method) and wire into
   HTTPSDLCTaskExecutor for the HTTP sidecar path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 20:57:27 -07:00
jordan
cefc15aa7d fix(worker): include stdout in error messages when Claude command fails
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Auth errors like "OAuth token has expired" were lost because Claude writes
them to stdout, not stderr. The error message only showed kubectl's generic
"command terminated with exit code 1". Now includes both stdout and stderr
in the error, making failures immediately diagnosable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 17:55:46 -07:00
jordan
b7d0e84946 fix(deploy): create component deployments with 0 replicas to prevent ImagePullBackOff
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Components are scaffolded before CI builds their images. Previously deployments
started with 1 replica, causing ImagePullBackOff until the first build completed.
Now deployments start at 0 replicas; CI deploy steps scale to 1 after verifying
the image exists in the registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 10:16:14 -07:00
jordan
9f957d6e75 fix(templates): harden component CI steps and compile regexes
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add --connect-timeout 10 and --max-time 15 to all verify step curl
  calls to prevent hanging on registry health checks
- Fix cli template: depends_on [deps] -> [preflight] for consistency
- Add cross-reference comment to service template about verify logic
  being replicated across all 5 component templates
- Document component CI step rules in composable-monorepo.md
- Compile regexes at package level instead of per-call in
  component_updates.go
- Add component_updates_test.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 19:36:23 -07:00
jordan
9226454b85 feat: label-based undeploy, GC reconciliation, checkout/sessions, pool status
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add UndeployAll() using label selectors to clean up monorepo components
  on project deletion (replaces name-based Undeploy in DeleteProject and
  the direct undeploy handler)
- Add ResourceGC background worker that periodically finds K8s resources
  whose project label has no matching DB record, deletes after 1h safety
  window
- Widen deployer client type from *kubernetes.Clientset to
  kubernetes.Interface for testability
- UndeployAll accumulates errors via errors.Join instead of failing fast
- Add checkout/checkin sidecar dev flow: temporary git tokens, branch
  checkout, review on checkin with cleanup workers
- Add interactive sessions: pod binding, command execution, SSE streaming,
  ephemeral preview URLs with session cleanup workers
- Add GET /workers/pool endpoint for aggregate capacity and queue depth
- Add sessions:read and sessions:execute auth scopes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 19:11:28 -07:00
jordan
a69eb7e587 feat(foundary): implement complete backend for conversational project design
Implements all 5 phases of Foundary Studio backend:

Phase 1: Chat Persistence (8 API endpoints)
- Conversations and messages with proper cascading deletes
- PostgreSQL schema with auto-update triggers
- Full CRUD operations with structured logging

Phase 2: Blueprint Entity (5 API endpoints)
- JSONB spec storage with GIN indexes
- Flexible structured data for project specifications
- Version-controlled blueprint management

Phase 3: Architect Service (3 API endpoints)
- Conversational AI orchestration with Claude
- Multi-turn dialogue with context building
- Blueprint spec extraction from conversations

Phase 4: Work Queue Integration
- Verified existing endpoint compatibility

Phase 5: Structured Questions (6 API endpoints)
- Four question types: text, choice, multichoice, yesno
- Answer validation with proper constraints
- Conversation-linked Q&A flow

Architecture:
- Textbook hexagonal architecture (domain → port → adapter → service → handler)
- Zero external dependencies in domain layer
- Consistent error handling with proper wrapping
- Auth scopes on all routes (projects:read, projects:execute)
- Structured logging with operation context and duration tracking
- NULL-safe DTO converters throughout

Database:
- 3 new migrations (019, 020, 021)
- UUIDs for all primary keys
- Proper foreign key constraints with ON DELETE CASCADE
- Optimized indexes including partial index for unanswered questions
- Auto-update triggers for timestamps

OpenAPI Documentation:
- Complete API documentation under 'Foundary' tag
- 22 new endpoints documented with examples
- Request/response schemas for all operations

Logging Improvements:
- Added operation field to all service logs
- Added duration_ms tracking for performance monitoring
- Log response_length instead of full response content
- Consistent use of logging field constants
- Execute-then-log pattern for delete operations

Files: 32 changed, 2800+ lines added
- 7 domain models
- 3 database migrations
- 3 port interfaces
- 3 postgres adapters
- 4 services (conversation, blueprint, question, architect)
- 4 handlers with DTOs
- OpenAPI documentation
- Integration in main.go

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-09 00:50:46 -07:00
jordan
adcea2fc1f fix(templates): upgrade Go to 1.25 and fix Woodpecker syntax
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
## Template Version Alignment
- Go: 1.23 → 1.25 across all templates (go.work, go.mod, Dockerfiles, CI)
- Alpine: latest → 3.19 (explicit version pinning)
- Woodpecker: failure:retry → failure:ignore (invalid syntax fix)

## SDLC Tree Fixes (slackpath-5-full-lifecycle)
Fixed merge failures by correcting lifecycle flow:

1. **Branch Creation**: Added missing create-branch step (planned → ready)
   - Bug: Merge command requires feature.Branch field to be set
   - Fix: POST /projects/{id}/sdlc/features/{slug}/branch

2. **Artifact Status**: Changed approval to pass for execution artifacts
   - Bug: Review/audit/QA need status="passed" not "approved"
   - Fix: /artifacts/{type}/approve → /artifacts/{type}/pass
   - Added: pass-qa step after wait-qa

3. **Phase Transition Order**: Reordered merge phase transition
   - Bug: Merge command checks if phase == "merge" first
   - Fix: transition-to-merge BEFORE merge-feature (not after)

## GCS Provisioner Fix
- Replaced deprecated option.WithCredentialsFile with env var approach
- Now uses GOOGLE_APPLICATION_CREDENTIALS for ADC (Application Default Credentials)
- Avoids security risk from deprecated credential options
- Fixed test: Added ComponentTypeGCS to ValidComponentTypes test

## Critical Rules Added
- Version alignment: All template versions must stay in sync
- When updating versions, grep entire templates/ tree

## Files Changed
- 27 template files: Go version + Woodpecker syntax
- 1 tree file: SDLC lifecycle flow corrections
- 1 CLAUDE.md: Version alignment rule
- 1 GCS provisioner: Deprecated API fix
- 1 test file: Added missing component type

Root cause: Skeleton templates lagged behind Go 1.25 release and had
invalid Woodpecker syntax. SDLC tree skipped required branch creation
and used wrong artifact approval endpoints.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 23:57:38 -07:00
jordan
4486042155 fix(registry): delete container images on project teardown
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Root cause of DIGEST_INVALID errors was registry disk exhaustion.
Project teardown wasn't cleaning up container images, causing the
registry PVC to fill up over time.

Changes:
- Add RegistryProvider port interface for registry operations
- Extend zot.Client with DeleteProjectRepositories method
- Wire registry provider into ProjectInfraService
- Delete images during DeleteProject cleanup (step 4)

The zot client uses the OCI distribution API:
- Lists all repos, filters by project prefix
- Gets manifest digests via HEAD request
- Deletes manifests by digest to trigger GC

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 02:56:18 -07:00
jordan
f20fc6c51c feat(saga): implement enterprise-grade resilience architecture
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Fixes issues from code review of resilience implementation:

- Wire saga system in main.go (SagaRepository, SagaExecutor, SagaHandler)
- Fix CompletedSteps() to include skipped steps for dependency resolution
- Fix reverse loop bug in saga compensation (use standard swap pattern)
- Add circuit breaker state change callbacks for Prometheus metrics

Phase 1 (Build Resilience):
- Add failure:retry to all component Kaniko build steps
- Add preflight registry health check before builds
- Add services-deployed sync point to decouple docs from critical path

Phase 2 (API Resilience):
- Add pipeline retry endpoint (POST /projects/{id}/pipelines/{number}/retry)
- Wire circuit breakers with metrics callbacks
- Add /health/circuits endpoint for circuit breaker status

Phase 3 (Saga Engine):
- Full domain model (Saga, SagaStep, RetryPolicy, BackoffType)
- PostgreSQL saga repository with CRUD and step management
- Saga executor with retry, compensation, skip step support
- Saga API handlers with CRUD and control operations

Phase 4 (Observability):
- Add saga metrics (total, step_duration, retry, circuit_breaker_state)
- Add logging fields (saga_id, saga_name, step_name)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 01:58:02 -07:00
jordan
9085965864 fix(skeleton): enforce chi {param} URL syntax in agent guidance
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Agents were generating `:id` (Echo/Gin style) instead of `{id}` (chi style),
causing routes to not match. Updated api-designer, go-specialist agents and
skeleton CLAUDE.md with explicit CRITICAL notes about brace syntax.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 20:44:52 -07:00
jordan
863dfd3214 fix: skip root deployment for empty template (defaults to skeleton)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When req.Template is empty, it defaults to 'skeleton' but the check
in createInitialDeployment only matched 'skeleton' explicitly, not
empty string. This caused a broken deployment to be created for
monorepo projects with a non-existent image.

Root cause: slackpath-5 creates project with empty template, which
defaults to skeleton, but createInitialDeployment was still creating
a root deployment that references registry.threesix.ai/{project}:latest
which never gets built (skeleton has no root Dockerfile).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 19:32:19 -07:00
jordan
bcf9f28bb9 fix: add failure:ignore to docs build steps
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When docs infrastructure doesn't exist, the docs build steps should
gracefully skip without failing the entire pipeline.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 18:26:00 -07:00
jordan
2a25a161cb fix: use plugin-kaniko for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The raw gcr.io/kaniko-project/executor with commands: doesn't work
properly in Woodpecker. Switch to woodpeckerci/plugin-kaniko with
settings: to match other component builds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 18:08:31 -07:00
jordan
bed72961fe fix: add --insecure flag to kaniko for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The registry.threesix.ai uses a self-signed certificate.
Service builds use plugin-kaniko with skip-tls-verify, but docs
build used raw kaniko executor without TLS bypass, causing exit 128.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 17:50:38 -07:00
jordan
be80fd2d4a fix: correct kaniko dockerfile path for docs image build
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When --context=docs is set, the --dockerfile path should be relative
to the context directory. Changed from docs/Dockerfile.nginx to
Dockerfile.nginx since kaniko already looks in the docs/ directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 17:35:54 -07:00
jordan
caf0990ceb fix: downgrade rouge to 3.x for middleman-syntax compatibility
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
middleman-syntax ~> 3.2 requires rouge ~> 3.2, but Gemfile had rouge ~> 4.0
causing bundle install to fail with version resolution error.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:48:49 -07:00
jordan
af91bad0ff feat: add Slate documentation templates to skeleton
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Adds complete Slate documentation infrastructure to generated projects:
- docs/ directory with Gemfile, config.rb, and source templates
- Dockerfile for building docs site
- Dockerfile.nginx for serving static docs
- generate-docs.sh script for CI integration
- Claude command for AI-assisted docs generation
- OpenAPI → Slate markdown conversion via widdershins

Also includes:
- --export-openapi flag for service binaries
- DNS provisioning for docs.{domain} subdomain
- Updated project_infra for docs DNS records

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:06:36 -07:00
jordan
f64377116a fix: add build-complete sync point for docs pipeline ordering
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
The export-openapi step was running in parallel with component builds
because it had no explicit dependency. This could cause docs generation
to run before component services were fully built.

Changes:
- Add build-complete step with NO depends_on (waits for ALL prior steps)
- Make export-openapi depend on build-complete
- Complete docs pipeline: export-openapi → generate-docs → build-docs →
  build-docs-image → deploy-docs
- Update verify step label selector to use project= instead of app=

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 16:02:17 -07:00
jordan
59aa173384 fix: clear stale error when dequeuing work tasks
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
When a task is retried (dequeued again after failure), the previous
error message was persisting in the work_queue table. This caused the
API to return confusing responses with status="running" but also
containing an error message from the previous attempt.

Now clears error and completed_at when claiming a task, matching the
fix already applied to build_audit.UpdateStatus.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 08:51:34 -07:00
jordan
9833725f31 fix: preserve work on build retry, clear stale audit data
Two critical fixes for build retry behavior:

1. pod_git_operations.go: Normalize remote URL before comparison
   - Clone stores URL with token (https://token:x@host/...)
   - Subsequent retry compares against URL without token
   - Without normalization, URLs never match, so workspace is always
     cleared and re-cloned, losing all code from previous attempt

2. build_audit.go: Clear stale result data when task transitions to running
   - When a failed task is retried, UpdateStatus only updated status/worker_id
   - Result and completed_at from previous failure remained, causing
     API to return stale failure data even while retry was running
   - Now clears result, completed_at and resets started_at when
     status is set to "running"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-07 08:40:36 -07:00
jordan
e58d679e67 fix: add go mod download to component Dockerfiles
Empty go.sum files were causing Docker builds to fail because
Go couldn't verify dependencies. Added go mod download steps
for both pkg and component directories before building.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:35:02 -07:00
jordan
d74efb75ff fix: wire workService to WorkersHandler and add /work/tasks endpoint
Critical fix: WorkersHandler was missing workService dependency, causing
500 errors when workers tried to fail tasks. This caused tasks to get
stuck in "running" state permanently.

Also adds:
- /work/tasks endpoint for debugging all tasks across projects
- List method to WorkQueue interface for admin views
- HTTP client tests for api_client.go and claudebox/client.go (48 tests)
- Split work.go DTOs into work_dto.go to stay under 500 lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 10:35:39 -07:00
jordan
d7a6f37593 fix: worker graceful shutdown and RWO PVC compatibility
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add WaitGroup for graceful shutdown of in-flight tasks
- Change replicas to 1 with Recreate strategy (RWO PVC limitation)
- Optimize Dockerfile: combine RUN commands for smaller layers
- Add compiled binaries to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:35:00 -07:00
jordan
f6a2b61b16 fix: add skeleton settings.local.json (was globally gitignored)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 22:55:17 -07:00
jordan
3b35900a2d feat: enterprise worker pool with HTTP sidecar pattern
Implements horizontally-scalable worker pool architecture:
- claudebox-sidecar: HTTP server for Claude Code, git, and SDLC ops
- rdev-worker: standalone worker binary polling rdev-api for tasks
- HTTP client adapter for sidecar communication
- HPA with custom Prometheus metrics for autoscaling
- ServiceMonitor for metrics scraping

Code review fixes applied:
- URL-encode query parameters in GitStatus (Critical #1)
- Remove unused shellQuote function (Critical #2)
- Use stdlib strings.Split/TrimSpace (Critical #3)
- Add version injection via ldflags (Warning #4)
- Add debug logging for swallowed git/sdlc errors (Warning #5, #6)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:21:11 -07:00