diff --git a/CLAUDE.md b/CLAUDE.md
index 20ad2ef..84a5e3f 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -450,39 +450,42 @@ Python CLI tools for adverse event signal detection. Different rules from Rust c
 
 ## Production Infrastructure
 
-All production infra is under the `jordan@roamrhino.com` Google account, GCP project `orchard9`.
+### Git Remotes
 
-### GCP / Google Artifact Registry
+Three remotes configured — `all` pushes to both:
 
-- **Account:** `jordan@roamrhino.com`
-- **Project:** `orchard9`
-- **Docker registry:** `us-central1-docker.pkg.dev/orchard9/docker-images/`
-- **Auth:** `gcloud auth configure-docker us-central1-docker.pkg.dev` (one-time per machine)
-- **Secret Manager:** all production secrets live here under project `orchard9`
-  - StemeDB root API key secret name: `stemedb-root-api-key`
-  - Per-project keys follow pattern: `stemedb-key-`
+| Remote | Target | Purpose |
+|--------|--------|---------|
+| `origin` | `github.com:orchard9/stemedb` | Source of truth |
+| `gitea` | `git.threesix.ai/jordan/stemedb` | Triggers Woodpecker CI |
+| `all` | Both GitHub + Gitea | **Use this for deploys** |
+
+**NEVER build Docker images locally.** Mac is ARM, cluster is amd64. Always push to Gitea to trigger Kaniko (native amd64 build on-cluster).
+
+### Deployment
+
+```bash
+# Deploy: push to both remotes (triggers Woodpecker → Kaniko → Zot → kubectl rollout)
+git push all main
+
+# Pipeline: git push → Gitea webhook → Woodpecker CI → Kaniko build → registry.threesix.ai (Zot) → kubectl set image
+# Image tags: latest + ${CI_COMMIT_SHA:0:8}
+# Build time: ~15-20 min cold, ~2-5 min warm (cargo-chef caches deps)
+```
+
+Pipeline config: `.woodpecker.yml` — Kaniko builds `stemedb-api` with `--features cluster`, deploys via `kubectl set image statefulset/stemedb`.
 
 ### k3s Cluster
 
-- **Kubeconfig:** `~/.kube/orchard9-k3sf.yaml` (separate from GKE contexts — use `--kubeconfig` flag)
+- **Kubeconfig:** `~/.kube/orchard9-k3sf.yaml` (use `--kubeconfig` flag)
 - **Fleet repo:** `/Users/jordanwashburn/Workspace/orchard9/k3s-fleet`
-- **Nodes:** 3-node cluster (2 servers + 1 agent), architecture: `amd64`
-- **Docker builds:** Must use `--platform linux/amd64` (Mac is ARM)
-- **Kustomize base:** `deployments/k8s/base/` — apply with `kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml apply -k deployments/k8s/base//`
-- **ClusterSecretStore:** `gcp-secret-manager` (ExternalSecrets Operator, reads from GCP SM above)
-- **imagePullSecrets:** `gcr-secret` (pre-configured on cluster nodes)
-- **Storage class:** `longhorn` (Longhorn CSI, RWO volumes)
-- **Ingress:** Traefik — `ingressClassName: traefik`, entrypoint `websecure`
+- **Nodes:** 3-node cluster (2 servers + 1 agent), amd64
+- **Registry:** Zot at `registry.threesix.ai` (in-cluster, namespace `threesix`)
+- **Storage:** Longhorn CSI (`storageClassName: longhorn`, RWO)
+- **Ingress:** Traefik (`ingressClassName: traefik`)
 - **TLS:** cert-manager, `ClusterIssuer: letsencrypt-prod`
-
-### Cloudflare DNS (threesix.ai)
-
-- **Domain:** `threesix.ai` — all services live at `*.threesix.ai`
-- **API token env var:** `THREESIX_CLOUDFLARE_API_TOKEN`
-- **Zone ID env var:** `THREESIX_CLOUDFLARE_ZONE_ID`
-- **DNS API:** `https://api.cloudflare.com/client/v4/zones/$THREESIX_CLOUDFLARE_ZONE_ID/dns_records`
-- To add/update a record, POST/PATCH to that endpoint with `Authorization: Bearer $THREESIX_CLOUDFLARE_API_TOKEN`
-- To find Traefik LB IP: `kubectl get svc -n kube-system` (look for Traefik LoadBalancer EXTERNAL-IP)
+- **Secrets:** ExternalSecrets Operator → GCP Secret Manager (project `orchard9`)
+- **k8s manifests:** `k3s-fleet/deployments/k8s/base/stemedb/`
 
 ### Service URLs
 
@@ -492,35 +495,20 @@ All production infra is under the `jordan@roamrhino.com` Google account, GCP pro
 | StemeDB Gateway (internal) | `http://stemedb-gateway.stemedb.svc:18181` |
 | StemeDB API (per-pod) | `http://stemedb-{0,1,2}.stemedb-headless.stemedb.svc:18180` |
 
-### Deployment Workflow
+### GCP
 
-**Automated (normal):** Push to `main` → Woodpecker CI → Kaniko build → Zot registry → `kubectl set image statefulset/stemedb`
+- **Account:** `jordan@roamrhino.com`, project `orchard9`
+- **Secret Manager:** `stemedb-root-api-key`, per-project: `stemedb-key-`
+
+### DNS (Cloudflare)
+
+- **Domain:** `threesix.ai` — env vars: `THREESIX_CLOUDFLARE_API_TOKEN`, `THREESIX_CLOUDFLARE_ZONE_ID`
+
+### Verify Deployment
 
 ```bash
-# Automated deploy: just push to main
-git push origin main
-# Woodpecker pipeline handles: build → registry.threesix.ai → kubectl rollout
-
-# Manual deploy (if needed):
-docker build --platform linux/amd64 -t registry.threesix.ai/stemedb-api:latest .
-docker push registry.threesix.ai/stemedb-api:latest
-kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml set image statefulset/stemedb \
-  stemedb=registry.threesix.ai/stemedb-api:latest -n stemedb
-kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml rollout status statefulset/stemedb -n stemedb --timeout=300s
-
-# First-time setup only:
-# 1. Create GCP secret
-echo -n "steme_live_$(openssl rand -hex 24)" | \
-  gcloud secrets create stemedb-root-api-key --project=orchard9 \
-  --replication-policy=automatic --data-file=-
-
-# 2. Apply k8s manifests
-kubectl apply -k /Users/jordanwashburn/Workspace/orchard9/k3s-fleet/deployments/k8s/base/stemedb/
-
-# 3. Add DNS A record (Cloudflare)
-TRAEFIK_IP=$(kubectl get svc -n kube-system traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
-curl -X POST "https://api.cloudflare.com/client/v4/zones/$THREESIX_CLOUDFLARE_ZONE_ID/dns_records" \
-  -H "Authorization: Bearer $THREESIX_CLOUDFLARE_API_TOKEN" \
-  -H "Content-Type: application/json" \
-  -d "{\"type\":\"A\",\"name\":\"stemedb\",\"content\":\"$TRAEFIK_IP\",\"ttl\":1,\"proxied\":false}"
+kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get pods -n stemedb
+curl -s https://stemedb.threesix.ai/v1/health
+curl -s https://stemedb.threesix.ai/v1/cluster/status
+curl -s https://stemedb.threesix.ai/metrics | head -5
 ```