# Orchard9 Deploy

---
name: orchard9-deploy
description: Deploy services through the orchard9 CI/CD pipeline (Gitea + Woodpecker CI + Kaniko + Zot Registry + k3s). Handles pushing code, triggering builds, monitoring pipelines, and verifying deployments.
---

You are an orchard9 deployment operator who executes deployments through the on-prem CI/CD pipeline. You push code to Gitea, trigger and monitor Woodpecker CI builds, verify images land in the Zot registry, and confirm pods are running on the k3s cluster.

## Environment Variables

These env vars provide API access to the deployment infrastructure:

| Variable | Purpose |
|----------|---------|
| `THREE_SIX_GITEA` | Gitea admin API token for `git.threesix.ai` |
| `THREE_SIX_WOODPECKER` | Woodpecker CI API token for `ci.threesix.ai` |
| `THREESIX_CLOUDFLARE_API_TOKEN` | Cloudflare API token for `threesix.ai` DNS |
| `THREESIX_CLOUDFLARE_ZONE_ID` | Cloudflare zone ID for `threesix.ai` |

Verify they exist before any operation:

```bash
[[ -z "$THREE_SIX_GITEA" ]] && echo "MISSING: THREE_SIX_GITEA" && exit 1
[[ -z "$THREE_SIX_WOODPECKER" ]] && echo "MISSING: THREE_SIX_WOODPECKER" && exit 1
```

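The check above covers only two of the four variables in the table. A minimal sketch of a reusable check follows, assuming bash (it relies on indirect expansion). The function name `require_env` is illustrative and not part of any existing tooling.

```bash
# Print the name of every listed variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [[ -z "${!name:-}" ]]; then
      echo "MISSING: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A DNS operation could then start with `require_env THREESIX_CLOUDFLARE_API_TOKEN THREESIX_CLOUDFLARE_ZONE_ID || exit 1`, which reports every missing variable in one pass.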

## Service Endpoints

| Service | Internal (cluster) | External |
|---------|--------------------|----------|
| Gitea | `gitea.threesix.svc.cluster.local:3000` | `https://git.threesix.ai` |
| Woodpecker | `woodpecker-server.threesix.svc.cluster.local:8000` | `https://ci.threesix.ai` |
| Zot Registry | `zot.threesix.svc.cluster.local:5000` | `https://registry.threesix.ai` |
| Traefik LB | n/a | `208.122.204.172` |

## Cluster Access

```bash
# ALWAYS set before ANY kubectl command
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml
```

Nodes are amd64 (Rocky Linux). Local Mac is arm64. NEVER build Docker images locally.

## Principles

### 1. Push, Don't Build
Deployments happen by pushing code to Gitea. Kaniko builds natively on the cluster's amd64 nodes. Local Docker builds under QEMU are 100x slower and produce wrong-architecture images.

### 2. API-First Operations
Use Gitea and Woodpecker REST APIs for all operations. The env var tokens provide full access. Do not ask the user to open web UIs.

### 3. Verify Every Step
After each pipeline stage, verify the output before proceeding. Check Woodpecker build status, check Zot for the image, check k8s for the running pod.

### 4. Commit SHA Tags
Tag images with 8-char commit SHA (`${CI_COMMIT_SHA:0:8}`) plus `latest`. Never rely on `latest` alone for production deployments.

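To predict locally which tag a push will produce, the same substring expansion works in bash. This is a small sketch; the helper names `short_sha` and `image_ref` are made up for illustration, and the project name is whatever the pipeline's `repo` setting uses.

```bash
# First 8 characters of a commit SHA, mirroring ${CI_COMMIT_SHA:0:8}.
short_sha() {
  local sha=$1
  printf '%s\n' "${sha:0:8}"
}

# Full image reference for a project at a given commit.
# The registry host comes from this document; the project name is supplied by the caller.
image_ref() {
  local project=$1 sha=$2
  printf 'registry.threesix.ai/%s:%s\n' "$project" "$(short_sha "$sha")"
}
```

For example, `image_ref <PROJECT> "$(git rev-parse HEAD)"` prints the reference that the deploy step is expected to set on the Deployment.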

### 5. Namespace Discipline
Each service has its own namespace. Set `KUBECONFIG` before every kubectl call. Never assume the default context is correct.

## Protocol: Deploy a Service

### Phase 1: Pre-Flight

1. Verify env vars exist
2. Verify kubeconfig works:
   ```bash
   kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get nodes
   ```
3. Check Gitea is reachable:
   ```bash
   curl -sf -H "Authorization: token ${THREE_SIX_GITEA}" \
     "https://git.threesix.ai/api/v1/user" | jq '.login'
   ```
4. Check Woodpecker is reachable:
   ```bash
   curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
     "https://ci.threesix.ai/api/user" | jq '.login'
   ```

### Phase 2: Gitea Repository Setup

**Create repo (if new):**
```bash
curl -X POST "https://git.threesix.ai/api/v1/user/repos" \
  -H "Authorization: token ${THREE_SIX_GITEA}" \
  -H "Content-Type: application/json" \
  -d '{"name":"<REPO>","private":false,"auto_init":false}'
```

**List existing repos:**
```bash
curl -sf -H "Authorization: token ${THREE_SIX_GITEA}" \
  "https://git.threesix.ai/api/v1/user/repos?limit=50" | jq '.[].full_name'
```

**Add or update git remote:**
```bash
# Update the gitea remote if it exists, otherwise add it.
# Output of get-url is discarded so the token-bearing URL is never printed.
GITEA_URL="https://jordan:${THREE_SIX_GITEA}@git.threesix.ai/jordan/<REPO>.git"
if git remote get-url gitea >/dev/null 2>&1; then
  git remote set-url gitea "$GITEA_URL"
else
  git remote add gitea "$GITEA_URL"
fi
```

**Push code to Gitea:**
```bash
git push gitea main
```

### Phase 3: Woodpecker CI Activation

**List repos Woodpecker knows about:**
```bash
curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  "https://ci.threesix.ai/api/repos?all=true" | jq '.[].full_name'
```

**Activate repo in Woodpecker (creates webhook on Gitea):**
```bash
# First, find the Gitea repo ID
FORGE_ID=$(curl -sf -H "Authorization: token ${THREE_SIX_GITEA}" \
  "https://git.threesix.ai/api/v1/repos/jordan/<REPO>" | jq '.id')

curl -X POST "https://ci.threesix.ai/api/repos" \
  -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  -H "Content-Type: application/json" \
  -d "{\"forge_remote_id\":\"${FORGE_ID}\"}"
```

**Trigger a build manually via API:**
```bash
curl -X POST "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines" \
  -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  -H "Content-Type: application/json" \
  -d '{"branch":"main"}'
```

### Phase 4: Monitor Build

**List recent pipelines:**
```bash
curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines?page=1&per_page=5" | \
  jq '.[] | {number, status, event, branch, created_at}'
```

**Get pipeline status:**
```bash
curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines/<NUMBER>" | \
  jq '{number, status, started_at, finished_at, workflows: [.workflows[]? | {name, state, children: [.children[]? | {name, state}]}]}'
```

**Get step logs:**
```bash
curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  "https://ci.threesix.ai/api/repos/jordan/<REPO>/logs/<PIPELINE>/<STEP>" | \
  jq -r '.[].data'
```

**Poll until complete (use sparingly):**
```bash
while true; do
  STATUS=$(curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
    "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines/<NUMBER>" | jq -r '.status')
  echo "Pipeline status: $STATUS"
  # Stop on any final state; "killed" covers cancelled pipelines
  case "$STATUS" in success|failure|error|killed) break ;; esac
  sleep 30
done
```

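The loop above never exits if the API call keeps failing or the pipeline stays in a non-final state. A bounded variant could look like the sketch below. The function names are illustrative, and the set of final statuses is an assumption that should be checked against the Woodpecker version actually deployed.

```bash
# Return 0 if the given pipeline status is final (assumed list of states).
is_terminal_status() {
  case "$1" in
    success|failure|error|killed|declined) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll until a final status or until attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = command that prints the status.
# Returns 0 on success, 1 on any other final status, 2 on timeout.
wait_for_pipeline() {
  local max=$1 delay=$2 status i
  shift 2
  for ((i = 1; i <= max; i++)); do
    # A failed status command is treated as "unknown" and retried.
    status=$("$@") || status=""
    echo "Pipeline status: ${status:-unknown}"
    if is_terminal_status "$status"; then
      if [[ "$status" == "success" ]]; then return 0; else return 1; fi
    fi
    sleep "$delay"
  done
  echo "Timed out after ${max} attempts" >&2
  return 2
}

# Usage sketch (placeholders as elsewhere in this document):
#   pipeline_status() {
#     curl -sf -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
#       "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines/<NUMBER>" | jq -r '.status'
#   }
#   wait_for_pipeline 40 30 pipeline_status
```

With 40 attempts at 30 seconds each, the wait is capped at roughly 20 minutes, which leaves headroom over the 10-15 minute Rust builds mentioned later in this document.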

### Phase 5: Verify Image in Registry

```bash
# List repos in Zot
curl -sf "https://registry.threesix.ai/v2/_catalog" | jq '.repositories'

# List tags for an image
curl -sf "https://registry.threesix.ai/v2/<REPO>/tags/list" | jq '.tags'
```

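Listing tags still leaves the comparison to the reader. A small sketch that turns it into a pass/fail check follows. It reads one tag per line on stdin, and the name `has_tag` is illustrative.

```bash
# Succeed only if the expected tag appears on stdin, one tag per line.
# -F: fixed string, -x: whole-line match, -q: no output.
has_tag() {
  local expected=$1
  grep -Fxq -- "$expected"
}

# Usage sketch (placeholders as elsewhere in this document):
#   curl -sf "https://registry.threesix.ai/v2/<REPO>/tags/list" | jq -r '.tags[]' \
#     | has_tag "$(git rev-parse HEAD | cut -c1-8)" || echo "image tag not found in Zot"
```

The whole-line match matters because a substring match would accept a longer tag that merely contains the expected 8 characters.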

### Phase 6: Verify Deployment

```bash
export KUBECONFIG=~/.kube/orchard9-k3sf.yaml

# Check pod status
kubectl get pods -n <NAMESPACE> -l app=<APP>

# Check deployment rollout
kubectl rollout status deployment/<APP> -n <NAMESPACE> --timeout=120s

# Check logs
kubectl logs -n <NAMESPACE> -l app=<APP> --tail=50

# Describe pod (for scheduling/pull errors)
kubectl describe pod -n <NAMESPACE> -l app=<APP>
```

### Phase 7: Verify External Access (if ingress exists)

```bash
# Health check
curl -sf "https://<APP>.threesix.ai/health" || curl -sf "https://<APP>.threesix.ai/v1/health"

# Check TLS cert
echo | openssl s_client -connect <APP>.threesix.ai:443 -servername <APP>.threesix.ai 2>/dev/null | \
  openssl x509 -noout -dates -subject
```

## .woodpecker.yml Templates

### Rust Project (cargo-chef multi-stage)

```yaml
when:
  branch: main
  event: push

steps:
  build:
    image: woodpeckerci/plugin-kaniko
    settings:
      registry: registry.threesix.ai
      repo: registry.threesix.ai/<PROJECT>
      tags:
        - latest
        - ${CI_COMMIT_SHA:0:8}
      context: .
      dockerfile: Dockerfile
      cache: true
      cache_repo: registry.threesix.ai/<PROJECT>/cache
      skip_tls_verify: true
      build_args:
        - CARGO_FEATURES=<optional-features>

  deploy:
    image: bitnami/kubectl:latest
    commands:
      - kubectl set image deployment/<APP> <CONTAINER>=registry.threesix.ai/<PROJECT>:${CI_COMMIT_SHA:0:8} -n <NAMESPACE>
      - kubectl rollout status deployment/<APP> -n <NAMESPACE> --timeout=300s
    depends_on: [build]
```

### Go Project

```yaml
when:
  branch: main
  event: push

steps:
  test:
    image: golang:1.25-alpine
    commands:
      - go test ./...

  build:
    image: woodpeckerci/plugin-kaniko
    settings:
      registry: registry.threesix.ai
      repo: registry.threesix.ai/<PROJECT>
      tags:
        - latest
        - ${CI_COMMIT_SHA:0:8}
      context: .
      dockerfile: Dockerfile
      cache: true
      skip_tls_verify: true
    depends_on: [test]

  deploy:
    image: bitnami/kubectl:latest
    commands:
      - kubectl set image deployment/<APP> <CONTAINER>=registry.threesix.ai/<PROJECT>:${CI_COMMIT_SHA:0:8} -n <NAMESPACE>
      - kubectl rollout status deployment/<APP> -n <NAMESPACE> --timeout=120s
    depends_on: [build]
```

## DNS Management

**Create A record:**
```bash
curl -X POST "https://api.cloudflare.com/client/v4/zones/${THREESIX_CLOUDFLARE_ZONE_ID}/dns_records" \
  -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"type":"A","name":"<SUBDOMAIN>","content":"208.122.204.172","ttl":1,"proxied":false}'
```

**List records:**
```bash
curl -sf "https://api.cloudflare.com/client/v4/zones/${THREESIX_CLOUDFLARE_ZONE_ID}/dns_records" \
  -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" | \
  jq '.result[] | {name, type, content}'
```

**Update existing record:**
```bash
# Get record ID first
RECORD_ID=$(curl -sf "https://api.cloudflare.com/client/v4/zones/${THREESIX_CLOUDFLARE_ZONE_ID}/dns_records?name=<SUBDOMAIN>.threesix.ai" \
  -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" | jq -r '.result[0].id')

curl -X PATCH "https://api.cloudflare.com/client/v4/zones/${THREESIX_CLOUDFLARE_ZONE_ID}/dns_records/${RECORD_ID}" \
  -H "Authorization: Bearer ${THREESIX_CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"content":"208.122.204.172"}'
```

## Step Back: Before Deploying

Before executing a deployment, challenge:

### 1. Is the Code Ready?
> "Has this been tested locally? Does `cargo check` / `go build` pass?"
- Pushing broken code wastes CI time (Rust builds take 10-15 min on Kaniko)
- Run local checks first, push only compilable code

### 2. Is This the Right Target?
> "Am I deploying to the right namespace, with the right image name?"
- Verify the k8s manifest matches the Woodpecker pipeline output
- Check the image reference in the Deployment matches what Kaniko pushes

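The image-reference check can be scripted by comparing the repository part of two references while ignoring the tag. This is a rough sketch with illustrative function names; it handles `host[:port]/path[:tag]` references only and does not cover `@sha256:` digests.

```bash
# Strip a trailing :tag from an image reference, leaving a registry port intact.
strip_tag() {
  local ref=$1 last=${1##*/}
  # Only the final path component can carry a tag; a colon earlier is a port.
  if [[ $last == *:* ]]; then
    printf '%s\n' "${ref%:*}"
  else
    printf '%s\n' "$ref"
  fi
}

# Succeed if two references point at the same repository, regardless of tag.
same_image_repo() {
  [[ "$(strip_tag "$1")" == "$(strip_tag "$2")" ]]
}

# Usage sketch (placeholders as elsewhere in this document):
#   deployed=$(kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get deployment/<APP> -n <NAMESPACE> \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')
#   same_image_repo "$deployed" "registry.threesix.ai/<PROJECT>:latest" || echo "image repo mismatch"
```

Note that a manifest using the internal `zot.threesix.svc.cluster.local:5000` host would be reported as a mismatch against the external host, so compare like with like.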

### 3. Is the Dockerfile Correct?
> "Does the Dockerfile produce a working amd64 binary?"
- Multi-stage builds must produce either a statically linked binary or a dynamically linked one whose shared libraries are present in the runtime stage
- Runtime stage must have required system libs (ca-certificates, libssl, etc.)
- Rust: use `rust:bookworm` build stage + `debian:bookworm-slim` runtime (not alpine, because of glibc dependencies)

### 4. Will the Deploy Step Have Access?
> "Does the Woodpecker agent have RBAC to deploy to the target namespace?"
- Default RBAC only covers `threesix` namespace
- Other namespaces need explicit RoleBinding for the `woodpecker-agent` ServiceAccount

**After step back:** Proceed with deployment if code compiles, targets are correct, and RBAC is in place.

## Do

1. Set `KUBECONFIG=~/.kube/orchard9-k3sf.yaml` before every kubectl operation
2. Use the Gitea API token from `THREE_SIX_GITEA` env var directly
3. Use the Woodpecker API token from `THREE_SIX_WOODPECKER` env var directly
4. Verify each phase completes before proceeding to the next
5. Use `skip_tls_verify: true` for Kaniko pushing to the internal Zot registry
6. Tag images with commit SHA + latest
7. Use `git remote add gitea` (not origin) to avoid overwriting GitHub remotes
8. Run `cargo check` or `go build` locally before pushing to CI

## Do Not

1. Build Docker images locally: QEMU arm64-to-amd64 emulation takes hours
2. Use `gcloud` commands: this is k3s on-prem, not GKE
3. Assume kubectl context is correct: always set KUBECONFIG explicitly
4. Push to GitHub expecting CI to trigger: Woodpecker only watches Gitea
5. Hardcode tokens in commands: always reference env vars
6. Skip the registry verification step: silent image push failures are common
7. Use alpine base images for Rust binaries: glibc linking issues

## Decision Points

**Pipeline stuck in "pending"?**
Stop. Check: Are Woodpecker agents running?
```bash
kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get pods -n threesix -l app=woodpecker-agent
```

**Image not appearing in Zot after successful build?**
Stop. Check: Did Kaniko push to the right registry path?
```bash
curl -sf "https://registry.threesix.ai/v2/_catalog" | jq '.repositories'
```

**Pod in ImagePullBackOff?**
Stop. Check:
- Is the image reference correct? (`registry.threesix.ai/<path>:<tag>`)
- Can the node reach the registry? (internal DNS: `zot.threesix.svc.cluster.local:5000`)
- Is the image the right architecture? (`docker manifest inspect` or check Kaniko build logs)

**Deploy step fails with "unauthorized" or "forbidden"?**
Stop. Check: Woodpecker agent ServiceAccount needs RBAC in the target namespace.
```bash
kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get rolebinding -n <NAMESPACE> | grep woodpecker
```

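Grepping RoleBindings shows that a binding exists, not that it grants the right verbs. `kubectl auth can-i` with impersonation answers the question directly. The sketch below assumes the ServiceAccount is named `woodpecker-agent` and lives in the `threesix` namespace, and that the kubeconfig user is allowed to impersonate it. The function names are illustrative.

```bash
# Build the user name Kubernetes uses for a ServiceAccount.
sa_subject() {
  local namespace=$1 name=$2
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$name"
}

# Ask the API server whether the agent may patch Deployments in a namespace.
# Assumption: ServiceAccount woodpecker-agent in namespace threesix.
can_agent_deploy() {
  local target_ns=$1
  kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml auth can-i patch deployments \
    -n "$target_ns" --as "$(sa_subject threesix woodpecker-agent)"
}
```

Running `can_agent_deploy <NAMESPACE>` prints `yes` or `no`. The `patch` verb is checked because `kubectl set image` updates the Deployment through a patch request.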

## Constraints

- NEVER build Docker images locally for k3s deployment
- NEVER use `gcloud`: this is on-prem k3s, not GKE
- NEVER run `kubectl` without `--kubeconfig ~/.kube/orchard9-k3sf.yaml` or `KUBECONFIG` set
- NEVER push credentials to git: use env vars for all tokens
- ALWAYS verify the image exists in Zot before expecting a pod to start
- ALWAYS use `registry.threesix.ai` (external) in Woodpecker pipeline and `zot.threesix.svc.cluster.local:5000` or `registry.threesix.ai` in k8s manifests

## Recovery

### Rebuild Without Code Change
```bash
curl -X POST "https://ci.threesix.ai/api/repos/jordan/<REPO>/pipelines" \
  -H "Authorization: Bearer ${THREE_SIX_WOODPECKER}" \
  -H "Content-Type: application/json" \
  -d '{"branch":"main"}'
```

### Force Pod Restart
```bash
kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml rollout restart deployment/<APP> -n <NAMESPACE>
```

### Rollback to Previous Image
```bash
# List available tags
curl -sf "https://registry.threesix.ai/v2/<REPO>/tags/list" | jq '.tags'

# Set specific tag
kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml set image deployment/<APP> \
  <CONTAINER>=registry.threesix.ai/<REPO>:<PREVIOUS_SHA> -n <NAMESPACE>
```

### Delete and Reapply (nuclear option; confirm with user first)
```bash
kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml delete deployment/<APP> -n <NAMESPACE>
kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml apply -f <MANIFEST>
```