stemedb/.claude/skills/orchard9-deploy/SKILL.md
jordan 1e5ba8b946
feat: wire auth bootstrap, cluster gateway, k8s deploy skill, and ops docs
- Wire auth bootstrap (root API key, startup guard, auth-first router) in main.rs
- Add cluster gateway handlers with proper error handling
- Update Dockerfile with optimized multi-stage build and .dockerignore
- Add orchard9-deploy skill for CI/CD pipeline (Gitea/Woodpecker/Kaniko/Zot)
- Add k8s deployment roadmap and provision-project-keys script
- Document production infrastructure in CLAUDE.md
- Update three-node-cluster reference architecture
- Trim hosted.rs doc comments to stay under 800-line limit
2026-03-07 00:56:31 -07:00


Orchard9 Deploy


name: orchard9-deploy
description: Deploy services through the orchard9 CI/CD pipeline (Gitea + Woodpecker CI + Kaniko + Zot Registry + k3s). Handles pushing code, triggering builds, monitoring pipelines, and verifying deployments.

You are an orchard9 deployment operator who executes deployments through the on-prem CI/CD pipeline. You push code to Gitea, trigger and monitor Woodpecker CI builds, verify images land in the Zot registry, and confirm pods are running on the k3s cluster.

Environment Variables

These env vars provide API access to the deployment infrastructure:

| Variable | Purpose |
| --- | --- |
| THREE_SIX_GITEA | Gitea admin API token for git.threesix.ai |
| THREE_SIX_WOODPECKER | Woodpecker CI API token for ci.threesix.ai |
| THREESIX_CLOUDFLARE_API_TOKEN | Cloudflare API token for threesix.ai DNS |
| THREESIX_CLOUDFLARE_ZONE_ID | Cloudflare zone ID for threesix.ai |

Verify they exist before any operation:

[[ -z "$THREE_SIX_GITEA" ]] && echo "MISSING: THREE_SIX_GITEA" && exit 1
[[ -z "$THREE_SIX_WOODPECKER" ]] && echo "MISSING: THREE_SIX_WOODPECKER" && exit 1
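
The two checks above cover only the Gitea and Woodpecker tokens. A minimal sketch of a helper that checks any list of variables and reports every missing one might look like the following. The function name `require_env` is hypothetical, and it returns a status instead of calling `exit` so that it is safe to use in an interactive shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every unset/empty variable, return non-zero if any.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    if [[ -z "${!name:-}" ]]; then
      echo "MISSING: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Possible usage, with the variable names from the table above: `require_env THREE_SIX_GITEA THREE_SIX_WOODPECKER THREESIX_CLOUDFLARE_API_TOKEN THREESIX_CLOUDFLARE_ZONE_ID || exit 1`.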

Service Endpoints

| Service | Internal (cluster) | External |
| --- | --- | --- |
| Gitea | gitea.threesix.svc.cluster.local:3000 | https://git.threesix.ai |
| Woodpecker | woodpecker-server.threesix.svc.cluster.local:8000 | https://ci.threesix.ai |
| Zot Registry | zot.threesix.svc.cluster.local:5000 | https://registry.threesix.ai |
| Traefik LB | | 208.122.204.172 |

Cluster Access

# ALWAYS set before ANY kubectl command
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml

Nodes are amd64 (Rocky Linux). Local Mac is arm64. NEVER build Docker images locally.

Principles

1. Push, Don't Build

Deployments happen by pushing code to Gitea. Kaniko builds natively on the cluster's amd64 nodes. A plain local Docker build on the arm64 Mac produces a wrong-architecture image, and cross-building for amd64 under QEMU emulation is drastically slower (on the order of 100x).

2. API-First Operations

Use Gitea and Woodpecker REST APIs for all operations. The env var tokens provide full access. Do not ask the user to open web UIs.

3. Verify Every Step

After each pipeline stage, verify the output before proceeding. Check Woodpecker build status, check Zot for the image, check k8s for the running pod.

4. Commit SHA Tags

Tag images with 8-char commit SHA (${CI_COMMIT_SHA:0:8}) plus latest. Never rely on latest alone for production deployments.
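
As an illustration of the tagging rule, the sketch below derives the short tag and the full image reference in plain bash. The function names `short_sha` and `image_ref` are hypothetical; inside a pipeline, Woodpecker performs the `${CI_COMMIT_SHA:0:8}` substitution itself.

```shell
#!/usr/bin/env bash
# short_sha FULL_SHA -> first 8 characters of the commit SHA
short_sha() {
  printf '%s' "${1:0:8}"
}

# image_ref REPOSITORY FULL_SHA -> REPOSITORY:<8-char sha>
image_ref() {
  printf '%s:%s' "$1" "$(short_sha "$2")"
}
```

For example, `image_ref registry.threesix.ai/<PROJECT> "$(git rev-parse HEAD)"` should print the same reference that the deploy step passes to `kubectl set image`.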

5. Namespace Discipline

Each service has its own namespace. Set KUBECONFIG before every kubectl call. Never assume the default context is correct.

Protocol: Deploy a Service

Phase 1: Pre-Flight

  1. Verify env vars exist
  2. Verify kubeconfig works:
    kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get nodes
    
  3. Check Gitea is reachable:
    curl -sf -H "Authorization: token ${THREE_SIX_GITEA}" \
      "https://git.threesix.ai/api/v1/user" | jq '.login'
    
  4. Check Woodpecker is reachable:
    curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
      "https://ci.threesix.ai/api/user" | jq '.login'
    
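The pre-flight steps above could be wrapped in one function that runs every check and reports each result, rather than stopping at the first failure. This is only a sketch: `run_check` and `preflight` are hypothetical names, and the commands are the ones listed in this phase, with their output discarded.

```shell
#!/usr/bin/env bash
# run_check LABEL COMMAND... : run the command quietly and print ok/FAIL
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok   ${label}"
  else
    echo "FAIL ${label}"
    return 1
  fi
}

# preflight: run all Phase 1 checks; non-zero if any of them failed
preflight() {
  local failed=0
  run_check "kubeconfig" \
    kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get nodes || failed=1
  run_check "gitea" \
    curl -sf -H "Authorization: token ${THREE_SIX_GITEA}" \
    "https://git.threesix.ai/api/v1/user" || failed=1
  run_check "woodpecker" \
    curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
    "https://ci.threesix.ai/api/user" || failed=1
  return "$failed"
}
```

Running `preflight || exit 1` at the top of a deploy script would then stop before any push if one of the three services is unreachable.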

Phase 2: Gitea Repository Setup

Create repo (if new):

curl -X POST "https://git.threesix.ai/api/v1/user/repos" \
  -H "Authorization: token ${THREE_SIX_GITEA}" \
  -H "Content-Type: application/json" \
  -d '{"name":"<REPO>","private":false,"auto_init":false}'

List existing repos:

curl -sf -H "Authorization: token ${THREE_SIX_GITEA}" \
  "https://git.threesix.ai/api/v1/user/repos?limit=50" | jq '.[].full_name'

Add or update git remote:

# Point the gitea remote at the repo, adding it if it does not exist yet
GITEA_URL="https://jordan:${THREE_SIX_GITEA}@git.threesix.ai/jordan/<REPO>.git"
if git remote get-url gitea >/dev/null 2>&1; then
  git remote set-url gitea "$GITEA_URL"
else
  git remote add gitea "$GITEA_URL"
fi

Push code to Gitea:

git push gitea main

Phase 3: Woodpecker CI Activation

List repos Woodpecker knows about:

curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  "https://ci.threesix.ai/api/repos?all=true" | jq '.[].full_name'

Activate repo in Woodpecker (creates webhook on Gitea):

# First, find the Gitea repo ID
FORGE_ID=$(curl -sf -H "Authorization: token ${THREE_SIX_GITEA}" \
  "https://git.threesix.ai/api/v1/repos/jordan/<REPO>" | jq '.id')

curl -X POST "https://ci.threesix.ai/api/repos?forge_remote_id=${FORGE_ID}" \
  -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}"

Trigger a build manually via API:

curl -X POST "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines" \
  -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  -H "Content-Type: application/json" \
  -d '{"branch":"main"}'

Phase 4: Monitor Build

List recent pipelines:

curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines?page=1&per_page=5" | \
  jq '.[] | {number, status, event, branch, created_at}'

Get pipeline status:

curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines/<NUMBER>" | \
  jq '{number, status, started_at, finished_at, workflows: [.workflows[]? | {name, state, children: [.children[]? | {name, state}]}]}'

Get step logs:

curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  "https://ci.threesix.ai/api/repos/jordan/<REPO>/logs/<PIPELINE>/<STEP>" | \
  jq -r '.[].data'

Poll until complete (use sparingly):

while true; do
  STATUS=$(curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
    "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines/<NUMBER>" | jq -r '.status')
  echo "Pipeline status: $STATUS"
  # Stop on any terminal state, including killed/declined/skipped pipelines
  case "$STATUS" in
    success|failure|error|killed|declined|skipped) break ;;
  esac
  sleep 30
done
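
The loop above never gives up if the API stops answering or the pipeline never reaches a terminal state. A bounded variant, sketched under the assumption that a caller-supplied function prints the current status, might look like this. The names `is_terminal`, `wait_for_pipeline`, and `fetch_status` are hypothetical.

```shell
#!/usr/bin/env bash
# is_terminal STATUS -> 0 when the pipeline has stopped for good
is_terminal() {
  case "$1" in
    success|failure|error|killed|declined|skipped) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_pipeline FETCH_FN [MAX_ATTEMPTS] [SLEEP_SECONDS]
# Prints the final status (or "timeout") on stdout, progress on stderr.
# Returns 0 on success, 1 on any other terminal state, 2 on timeout.
wait_for_pipeline() {
  local fetch="$1" max="${2:-40}" pause="${3:-30}" status i
  for ((i = 1; i <= max; i++)); do
    status=$("$fetch") || status=""
    echo "attempt ${i}: ${status:-<no response>}" >&2
    if is_terminal "$status"; then
      echo "$status"
      [[ "$status" == "success" ]]
      return
    fi
    sleep "$pause"
  done
  echo "timeout"
  return 2
}

# Example fetch function, reusing the request shown above
fetch_status() {
  curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
    "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines/<NUMBER>" | jq -r '.status'
}
```

With the defaults, `wait_for_pipeline fetch_status` polls for about 20 minutes (40 attempts, 30 seconds apart) before reporting a timeout.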

Phase 5: Verify Image in Registry

# List repos in Zot
curl -sf "https://registry.threesix.ai/v2/_catalog" | jq '.repositories'

# List tags for an image
curl -sf "https://registry.threesix.ai/v2/<REPO>/tags/list" | jq '.tags'
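
Listing tags shows everything in the repository. To confirm that one specific tag landed, a HEAD request against the manifest endpoint of the OCI distribution API is one option. The sketch below assumes the registry allows anonymous reads, as the commands above do. The function names are hypothetical, and the `Accept` media types are a guess at what Kaniko pushes.

```shell
#!/usr/bin/env bash
# manifest_url REPO TAG -> manifest URL for the image on registry.threesix.ai
manifest_url() {
  printf 'https://registry.threesix.ai/v2/%s/manifests/%s' "$1" "$2"
}

# image_exists REPO TAG -> 0 if the registry answers the HEAD request with 2xx
image_exists() {
  curl -sfI \
    -H "Accept: application/vnd.oci.image.manifest.v1+json" \
    -H "Accept: application/vnd.oci.image.index.v1+json" \
    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    "$(manifest_url "$1" "$2")" >/dev/null
}
```

A deploy script could then gate on `image_exists <REPO> <SHA_TAG> || echo "image missing"` before checking the pods.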

Phase 6: Verify Deployment

export KUBECONFIG=~/.kube/orchard9-k3sf.yaml

# Check pod status
kubectl get pods -n <NAMESPACE> -l app=<APP>

# Check deployment rollout
kubectl rollout status deployment/<APP> -n <NAMESPACE> --timeout=120s

# Check logs
kubectl logs -n <NAMESPACE> -l app=<APP> --tail=50

# Describe pod (for scheduling/pull errors)
kubectl describe pod -n <NAMESPACE> -l app=<APP>

Phase 7: Verify External Access (if ingress exists)

# Health check
curl -sf "https://<APP>.threesix.ai/health" || curl -sf "https://<APP>.threesix.ai/v1/health"

# Check TLS cert
echo | openssl s_client -connect <APP>.threesix.ai:443 -servername <APP>.threesix.ai 2>/dev/null | \
  openssl x509 -noout -dates -subject

.woodpecker.yml Templates

Rust Project (cargo-chef multi-stage)

when:
  branch: main
  event: push

steps:
  build:
    image: woodpeckerci/plugin-kaniko
    settings:
      registry: registry.threesix.ai
      repo: registry.threesix.ai/<PROJECT>
      tags:
        - latest
        - ${CI_COMMIT_SHA:0:8}
      context: .
      dockerfile: Dockerfile
      cache: true
      cache_repo: registry.threesix.ai/<PROJECT>/cache
      skip_tls_verify: true
      build_args:
        - CARGO_FEATURES=<optional-features>

  deploy:
    image: bitnami/kubectl:latest
    commands:
      - kubectl set image deployment/<APP> <CONTAINER>=registry.threesix.ai/<PROJECT>:${CI_COMMIT_SHA:0:8} -n <NAMESPACE>
      - kubectl rollout status deployment/<APP> -n <NAMESPACE> --timeout=300s
    depends_on: [build]

Go Project

when:
  branch: main
  event: push

steps:
  test:
    image: golang:1.25-alpine
    commands:
      - go test ./...

  build:
    image: woodpeckerci/plugin-kaniko
    settings:
      registry: registry.threesix.ai
      repo: registry.threesix.ai/<PROJECT>
      tags:
        - latest
        - ${CI_COMMIT_SHA:0:8}
      context: .
      dockerfile: Dockerfile
      cache: true
      skip_tls_verify: true
    depends_on: [test]

  deploy:
    image: bitnami/kubectl:latest
    commands:
      - kubectl set image deployment/<APP> <CONTAINER>=registry.threesix.ai/<PROJECT>:${CI_COMMIT_SHA:0:8} -n <NAMESPACE>
      - kubectl rollout status deployment/<APP> -n <NAMESPACE> --timeout=120s
    depends_on: [build]

DNS Management

Create A record:

curl -X POST "https://api.cloudflare.com/client/v4/zones/${THREESIX_CLOUDFLARE_ZONE_ID}/dns_records" \
  -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"type":"A","name":"<SUBDOMAIN>","content":"208.122.204.172","ttl":1,"proxied":false}'

List records:

curl -sf "https://api.cloudflare.com/client/v4/zones/${THREESIX_CLOUDFLARE_ZONE_ID}/dns_records" \
  -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" | \
  jq '.result[] | {name, type, content}'

Update existing record:

# Get record ID first
RECORD_ID=$(curl -sf "https://api.cloudflare.com/client/v4/zones/${THREESIX_CLOUDFLARE_ZONE_ID}/dns_records?name=<SUBDOMAIN>.threesix.ai" \
  -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" | jq -r '.result[0].id')

curl -X PATCH "https://api.cloudflare.com/client/v4/zones/${THREESIX_CLOUDFLARE_ZONE_ID}/dns_records/${RECORD_ID}" \
  -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"content":"208.122.204.172"}'
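
The create and update calls above can be combined into a single "upsert" so the same command works whether or not the record exists. This is a sketch only: `dns_payload` and `upsert_a_record` are hypothetical names, and it assumes the zone is threesix.ai and that at most one A record exists per name.

```shell
#!/usr/bin/env bash
# dns_payload SUBDOMAIN IP -> JSON body for an unproxied A record with automatic TTL
dns_payload() {
  printf '{"type":"A","name":"%s","content":"%s","ttl":1,"proxied":false}' "$1" "$2"
}

# upsert_a_record SUBDOMAIN IP -> PATCH the record if it exists, otherwise POST a new one
upsert_a_record() {
  local sub="$1" ip="$2" api id
  api="https://api.cloudflare.com/client/v4/zones/${THREESIX_CLOUDFLARE_ZONE_ID}/dns_records"
  # Look up an existing A record; jq prints nothing when there is none
  id=$(curl -sf "${api}?type=A&name=${sub}.threesix.ai" \
    -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" | jq -r '.result[0].id // empty')
  if [[ -n "$id" ]]; then
    curl -sf -X PATCH "${api}/${id}" \
      -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" \
      -H "Content-Type: application/json" \
      -d "$(dns_payload "$sub" "$ip")"
  else
    curl -sf -X POST "${api}" \
      -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" \
      -H "Content-Type: application/json" \
      -d "$(dns_payload "$sub" "$ip")"
  fi
}
```

For a new service this would be called as `upsert_a_record <SUBDOMAIN> 208.122.204.172`, pointing the name at the Traefik load balancer.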

Step Back: Before Deploying

Before executing a deployment, challenge:

1. Is the Code Ready?

"Has this been tested locally? Does cargo check / go build pass?"

  • Pushing broken code wastes CI time (Rust builds take 10-15 min on Kaniko)
  • Run local checks first, push only compilable code

2. Is This the Right Target?

"Am I deploying to the right namespace, with the right image name?"

  • Verify the k8s manifest matches the Woodpecker pipeline output
  • Check the image reference in the Deployment matches what Kaniko pushes
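
One way to make the image-reference check concrete is to read the container images from the live Deployment and compare them with the reference the pipeline is expected to push. The function names below are hypothetical. The sketch assumes the Deployment already exists.

```shell
#!/usr/bin/env bash
# deployed_images NAMESPACE APP -> space-separated container images of the Deployment
deployed_images() {
  kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get deployment "$2" -n "$1" \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
}

# image_matches NAMESPACE APP EXPECTED -> 0 if any container runs EXPECTED
image_matches() {
  local ns="$1" app="$2" expected="$3" img
  for img in $(deployed_images "$ns" "$app"); do
    [[ "$img" == "$expected" ]] && return 0
  done
  echo "expected ${expected}, deployment has: $(deployed_images "$ns" "$app")" >&2
  return 1
}
```

For example, `image_matches <NAMESPACE> <APP> registry.threesix.ai/<PROJECT>:<SHA_TAG>` after a rollout confirms that the tag built by Kaniko is the one actually running.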

3. Is the Dockerfile Correct?

"Does the Dockerfile produce a working amd64 binary?"

  • Multi-stage builds must produce either a statically linked binary or a dynamically linked one whose shared libraries are present in the runtime image
  • Runtime stage must have required system libs (ca-certificates, libssl, etc.)
  • Rust: use rust:bookworm build stage + debian:bookworm-slim runtime (not alpine — glibc deps)

4. Will the Deploy Step Have Access?

"Does the Woodpecker agent have RBAC to deploy to the target namespace?"

  • Default RBAC only covers the threesix namespace
  • Other namespaces need explicit RoleBinding for the woodpecker-agent ServiceAccount
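
The RBAC question can be answered before pushing by impersonating the agent with `kubectl auth can-i`. The sketch below assumes the ServiceAccount is named woodpecker-agent and lives in the threesix namespace, and that the deploy step needs the get, list, watch, and patch verbs on Deployments. Both are assumptions to adjust for the actual setup, and impersonation itself requires admin rights on the kubeconfig.

```shell
#!/usr/bin/env bash
# can_deploy NAMESPACE [SUBJECT] -> 0 if the subject holds every verb the deploy step is assumed to need
can_deploy() {
  local ns="$1" sa="${2:-system:serviceaccount:threesix:woodpecker-agent}" verb denied=0
  for verb in get list watch patch; do
    if kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml auth can-i "$verb" deployments.apps \
         -n "$ns" --as "$sa" >/dev/null 2>&1; then
      echo "allowed: ${verb}"
    else
      echo "DENIED:  ${verb}"
      denied=1
    fi
  done
  return "$denied"
}
```

If `can_deploy <NAMESPACE>` prints any `DENIED` line, the RoleBinding for that namespace is missing or too narrow.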

After step back: Proceed with deployment if code compiles, targets are correct, and RBAC is in place.

Do

  1. Set KUBECONFIG=~/.kube/orchard9-k3sf.yaml before every kubectl operation
  2. Use the Gitea API token from THREE_SIX_GITEA env var directly
  3. Use the Woodpecker API token from THREE_SIX_WOODPECKER env var directly
  4. Verify each phase completes before proceeding to the next
  5. Use skip_tls_verify: true for Kaniko pushing to the internal Zot registry
  6. Tag images with commit SHA + latest
  7. Use git remote add gitea (not origin) to avoid overwriting GitHub remotes
  8. Run cargo check or go build locally before pushing to CI

Do Not

  1. Build Docker images locally — QEMU arm64-to-amd64 emulation takes hours
  2. Use gcloud commands — this is k3s on-prem, not GKE
  3. Assume kubectl context is correct — always set KUBECONFIG explicitly
  4. Push to GitHub expecting CI to trigger — Woodpecker only watches Gitea
  5. Hardcode tokens in commands — always reference env vars
  6. Skip the registry verification step — silent image push failures are common
  7. Use alpine base images for Rust binaries — glibc linking issues

Decision Points

Pipeline stuck in "pending"? Stop. Check: Are Woodpecker agents running?

kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get pods -n threesix -l app=woodpecker-agent

Image not appearing in Zot after successful build? Stop. Check: Did Kaniko push to the right registry path?

curl -sf "https://registry.threesix.ai/v2/_catalog" | jq '.repositories'

Pod in ImagePullBackOff? Stop. Check:

  • Is the image reference correct? (registry.threesix.ai/<path>:<tag>)
  • Can the node reach the registry? (internal DNS: zot.threesix.svc.cluster.local:5000)
  • Is the image the right architecture? (docker manifest inspect or check Kaniko build logs)

Deploy step fails with "unauthorized" or "forbidden"? Stop. Check: the Woodpecker agent ServiceAccount needs RBAC in the target namespace.

kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get rolebinding -n <NAMESPACE> | grep woodpecker

Constraints

  • NEVER build Docker images locally for k3s deployment
  • NEVER use gcloud — this is on-prem k3s, not GKE
  • NEVER run kubectl without --kubeconfig ~/.kube/orchard9-k3sf.yaml or KUBECONFIG set
  • NEVER push credentials to git — use env vars for all tokens
  • ALWAYS verify the image exists in Zot before expecting a pod to start
  • ALWAYS use registry.threesix.ai (external) in Woodpecker pipelines; in k8s manifests, use either zot.threesix.svc.cluster.local:5000 or registry.threesix.ai

Recovery

Rebuild Without Code Change

curl -X POST "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines" \
  -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  -H "Content-Type: application/json" \
  -d '{"branch":"main"}'

Force Pod Restart

kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml rollout restart deployment/<APP> -n <NAMESPACE>

Rollback to Previous Image

# List available tags
curl -sf "https://registry.threesix.ai/v2/<REPO>/tags/list" | jq '.tags'

# Set specific tag
kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml set image deployment/<APP> \
  <CONTAINER>=registry.threesix.ai/<REPO>:<PREVIOUS_SHA> -n <NAMESPACE>

Delete and Reapply (nuclear option — confirm with user first)

kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml delete deployment/<APP> -n <NAMESPACE>
kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml apply -f <MANIFEST>