# threesix.ai Infrastructure Implementation Plan
> Self-hosted git, CI/CD, and deployment infrastructure for agent-driven development.
## Overview
Replace GitHub dependency with self-hosted infrastructure on k3s:
- **soft-serve** - Git server (SSH-based, minimal)
- **Zot** - Container registry (OCI-native)
- **Woodpecker** - CI/CD pipelines
- **rdev-api** - Orchestration layer with DNS management
## Architecture
```
┌──────────────────────────────────────────────────────────────────────┐
│  threesix.ai                                                         │
│                                                                      │
│  git.threesix.ai ──────▶ soft-serve (SSH :22)                        │
│  registry.threesix.ai ─▶ zot (internal only, HTTPS for UI)           │
│  ci.threesix.ai ───────▶ woodpecker (web UI)                         │
│  *.threesix.ai ────────▶ project deployments                         │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│  k3s cluster                                                         │
│                                                                      │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐            │
│  │  soft-serve  │───▶│  woodpecker  │───▶│     zot      │            │
│  │ (git repos)  │    │   (CI/CD)    │    │  (registry)  │            │
│  └──────────────┘    └──────────────┘    └──────────────┘            │
│         │                   │                   │                    │
│         │                   ▼                   │                    │
│         │            ┌──────────────┐           │                    │
│         └───────────▶│   rdev-api   │◀──────────┘                    │
│                      │              │                                │
│                      │- Create repos│                                │
│                      │- Deploy apps │                                │
│                      │- Manage DNS  │                                │
│                      └──────────────┘                                │
│                             │                                        │
│                             ▼                                        │
│                      ┌──────────────┐                                │
│                      │  Cloudflare  │                                │
│                      │   DNS API    │                                │
│                      └──────────────┘                                │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```
## Configuration
### Credentials (from .secrets)
| Key | Value | Purpose |
|-----|-------|---------|
| CLOUDFLARE_API_TOKEN | `nGoDhG6Za...` | DNS management |
| CLOUDFLARE_ZONE_ID | `e0bc8d51...` | threesix.ai zone |
### Network
| Resource | Value |
|----------|-------|
| External IP | 208.122.204.172 |
| Let's Encrypt Email | jordan@threesix.ai |
| Domain | threesix.ai |
### Admin Access
```
SSH Public Key: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDZwQF0Ro0E0foFo0oro/NrfUb5abEec/A0OP2qO8dVn jordanwashburn@jordanmacstudio.lan
```
---
## Phase 1: Foundation (K8s Infrastructure)
### 1.1 Create Namespace and Secrets
```yaml
# deployments/k8s/base/threesix/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: threesix
---
# Cloudflare API secret for cert-manager and rdev-api
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api
  namespace: threesix
type: Opaque
stringData:
  api-token: "${CLOUDFLARE_API_TOKEN}"
  zone-id: "${CLOUDFLARE_ZONE_ID}"
```
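The `${CLOUDFLARE_API_TOKEN}` and `${CLOUDFLARE_ZONE_ID}` placeholders have to be filled in before the manifest is applied. The plan does not say which tool does this, so the following is only a minimal Go sketch of one option, built on the standard library's `os.Expand`. The helper name `renderManifest` and the demo values are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// renderManifest substitutes ${VAR} placeholders via os.Expand and fails
// loudly when a variable is unset or empty, so an empty token never reaches
// the cluster. Caveat: os.Expand also rewrites any other "$" in the input.
func renderManifest(tmpl string, lookup func(string) (string, bool)) (string, error) {
	var missing []string
	out := os.Expand(tmpl, func(key string) string {
		val, ok := lookup(key)
		if !ok || val == "" {
			missing = append(missing, key)
		}
		return val
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unset variables: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	// Demo values only; pass os.LookupEnv to read the real environment.
	demo := map[string]string{
		"CLOUDFLARE_API_TOKEN": "example-token",
		"CLOUDFLARE_ZONE_ID":   "example-zone",
	}
	lookup := func(key string) (string, bool) {
		val, ok := demo[key]
		return val, ok
	}
	tmpl := "api-token: \"${CLOUDFLARE_API_TOKEN}\"\nzone-id: \"${CLOUDFLARE_ZONE_ID}\""
	out, err := renderManifest(tmpl, lookup)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Rejecting unset variables up front is a deliberate choice here: a Secret rendered with an empty token would apply cleanly and only fail later, during the DNS-01 challenge.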
### 1.2 Configure cert-manager for Wildcard Certs
```yaml
# deployments/k8s/base/threesix/cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-threesix
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: jordan@threesix.ai
    privateKeySecretRef:
      name: letsencrypt-threesix-account
    solvers:
      - dns01:
          cloudflare:
            # A ClusterIssuer reads this Secret from cert-manager's cluster
            # resource namespace (`cert-manager` by default), not `threesix`,
            # so the cloudflare-api Secret must also exist there.
            apiTokenSecretRef:
              name: cloudflare-api
              key: api-token
        selector:
          dnsZones:
            - "threesix.ai"
---
# Wildcard certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: threesix-wildcard
  namespace: threesix
spec:
  secretName: threesix-wildcard-tls
  issuerRef:
    name: letsencrypt-threesix
    kind: ClusterIssuer
  dnsNames:
    - "threesix.ai"
    - "*.threesix.ai"
```
### 1.3 Deploy soft-serve
```yaml
# deployments/k8s/base/threesix/soft-serve.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: soft-serve-config
  namespace: threesix
data:
  config.yaml: |
    name: threesix
    log_format: text
    ssh:
      listen_addr: :22
      public_url: ssh://git.threesix.ai
      max_timeout: 30
      idle_timeout: 120
    http:
      listen_addr: :23231
      public_url: https://git.threesix.ai
    stats:
      listen_addr: :23233
    initial_admin_keys:
      - "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDZwQF0Ro0E0foFo0oro/NrfUb5abEec/A0OP2qO8dVn jordanwashburn"
    # Allow anyone to read public repos, admins can create
    anon_access: read-only
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: soft-serve
  namespace: threesix
spec:
  serviceName: soft-serve
  replicas: 1
  selector:
    matchLabels:
      app: soft-serve
  template:
    metadata:
      labels:
        app: soft-serve
    spec:
      containers:
        - name: soft-serve
          image: charmcli/soft-serve:latest
          ports:
            - containerPort: 22
              name: ssh
            - containerPort: 23231
              name: http
            - containerPort: 23233
              name: stats
          volumeMounts:
            - name: data
              mountPath: /soft-serve
            - name: config
              mountPath: /soft-serve/config.yaml
              subPath: config.yaml
          resources:
            requests:
              memory: "64Mi"
              cpu: "50m"
            limits:
              memory: "256Mi"
              cpu: "500m"
      volumes:
        - name: config
          configMap:
            name: soft-serve-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: soft-serve
  namespace: threesix
spec:
  selector:
    app: soft-serve
  ports:
    - name: ssh
      port: 22
      targetPort: 22
    - name: http
      port: 80
      targetPort: 23231
    - name: stats
      port: 23233
      targetPort: 23233
---
# External SSH access via LoadBalancer
apiVersion: v1
kind: Service
metadata:
  name: soft-serve-ssh
  namespace: threesix
spec:
  type: LoadBalancer
  selector:
    app: soft-serve
  ports:
    - name: ssh
      port: 22
      targetPort: 22
---
# HTTP access via Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: soft-serve
  namespace: threesix
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-threesix
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - git.threesix.ai
      secretName: git-threesix-tls
  rules:
    - host: git.threesix.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: soft-serve
                port:
                  number: 80
```
### 1.4 Deploy Zot Registry
```yaml
# deployments/k8s/base/threesix/zot.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: zot-config
  namespace: threesix
data:
  config.json: |
    {
      "distSpecVersion": "1.1.0",
      "storage": {
        "rootDirectory": "/var/lib/zot",
        "gc": true,
        "gcDelay": "1h"
      },
      "http": {
        "address": "0.0.0.0",
        "port": "5000"
      },
      "log": {
        "level": "info"
      },
      "extensions": {
        "search": {
          "enable": true
        },
        "ui": {
          "enable": true
        }
      }
    }
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zot
  namespace: threesix
spec:
  serviceName: zot
  replicas: 1
  selector:
    matchLabels:
      app: zot
  template:
    metadata:
      labels:
        app: zot
    spec:
      containers:
        - name: zot
          image: ghcr.io/project-zot/zot-linux-amd64:latest
          ports:
            - containerPort: 5000
          volumeMounts:
            - name: data
              mountPath: /var/lib/zot
            - name: config
              mountPath: /etc/zot/config.json
              subPath: config.json
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
      volumes:
        - name: config
          configMap:
            name: zot-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
  name: zot
  namespace: threesix
spec:
  selector:
    app: zot
  ports:
    - port: 5000
      targetPort: 5000
---
# Internal DNS name for cluster access
# Pods can pull from: zot.threesix.svc.cluster.local:5000/image:tag
---
# Optional: External UI access
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: zot
  namespace: threesix
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-threesix
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - registry.threesix.ai
      secretName: registry-threesix-tls
  rules:
    - host: registry.threesix.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: zot
                port:
                  number: 5000
```
### 1.5 Initial DNS Records
Create via Cloudflare API or dashboard:
| Type | Name | Value | Proxy |
|------|------|-------|-------|
| A | git | 208.122.204.172 | No (SSH needs direct) |
| A | registry | 208.122.204.172 | No |
| A | ci | 208.122.204.172 | Yes (optional) |
| A | * | 208.122.204.172 | Yes (optional) |
---
## Phase 2: CI/CD (Woodpecker)
### 2.1 Deploy Woodpecker Server
```yaml
# deployments/k8s/base/threesix/woodpecker-server.yaml
apiVersion: v1
kind: Secret
metadata:
  name: woodpecker-secrets
  namespace: threesix
type: Opaque
stringData:
  # Generate with: openssl rand -hex 32
  WOODPECKER_AGENT_SECRET: "${WOODPECKER_AGENT_SECRET}"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: woodpecker-server
  namespace: threesix
spec:
  replicas: 1
  selector:
    matchLabels:
      app: woodpecker-server
  template:
    metadata:
      labels:
        app: woodpecker-server
    spec:
      containers:
        - name: woodpecker
          image: woodpeckerci/woodpecker-server:latest
          ports:
            - containerPort: 8000
          env:
            - name: WOODPECKER_HOST
              value: "https://ci.threesix.ai"
            - name: WOODPECKER_OPEN
              value: "false"
            - name: WOODPECKER_ADMIN
              value: "jordan"
            # Soft-serve / generic git integration
            - name: WOODPECKER_GITEA
              value: "false"
            - name: WOODPECKER_WEBHOOK_HOST
              value: "http://woodpecker-server.threesix.svc:8000"
          envFrom:
            - secretRef:
                name: woodpecker-secrets
          volumeMounts:
            - name: data
              mountPath: /var/lib/woodpecker
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: woodpecker-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: woodpecker-data
  namespace: threesix
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: woodpecker-server
  namespace: threesix
spec:
  selector:
    app: woodpecker-server
  ports:
    - name: http
      port: 8000
      targetPort: 8000
    - name: grpc
      port: 9000
      targetPort: 9000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: woodpecker
  namespace: threesix
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-threesix
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - ci.threesix.ai
      secretName: ci-threesix-tls
  rules:
    - host: ci.threesix.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: woodpecker-server
                port:
                  number: 8000
```
### 2.2 Deploy Woodpecker Agent (with Kaniko)
```yaml
# deployments/k8s/base/threesix/woodpecker-agent.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: woodpecker-agent
  namespace: threesix
spec:
  replicas: 2
  selector:
    matchLabels:
      app: woodpecker-agent
  template:
    metadata:
      labels:
        app: woodpecker-agent
    spec:
      containers:
        - name: agent
          image: woodpeckerci/woodpecker-agent:latest
          env:
            - name: WOODPECKER_SERVER
              value: "woodpecker-server.threesix.svc:9000"
            - name: WOODPECKER_BACKEND
              value: "kubernetes"
            - name: WOODPECKER_BACKEND_K8S_NAMESPACE
              value: "threesix"
            - name: WOODPECKER_BACKEND_K8S_STORAGE_CLASS
              value: "longhorn"
            - name: WOODPECKER_BACKEND_K8S_VOLUME_SIZE
              value: "10Gi"
          envFrom:
            - secretRef:
                name: woodpecker-secrets
      serviceAccountName: woodpecker-agent
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: woodpecker-agent
  namespace: threesix
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: woodpecker-agent
  namespace: threesix
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "secrets", "configmaps", "persistentvolumeclaims"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: woodpecker-agent
  namespace: threesix
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: woodpecker-agent
subjects:
  - kind: ServiceAccount
    name: woodpecker-agent
    namespace: threesix
```
---
## Phase 3: rdev-api Extensions
### 3.1 New Port Interfaces
```go
// internal/port/git.go
package port

import (
	"context"
	"time" // needed for Repo.CreatedAt
)

// GitRepository manages git repositories.
type GitRepository interface {
	// CreateRepo creates a new git repository.
	CreateRepo(ctx context.Context, name, description string) (*Repo, error)
	// DeleteRepo deletes a repository.
	DeleteRepo(ctx context.Context, name string) error
	// ListRepos returns all repositories.
	ListRepos(ctx context.Context) ([]*Repo, error)
	// GetRepo returns a single repository.
	GetRepo(ctx context.Context, name string) (*Repo, error)
	// AddCollaborator adds a user's SSH key to a repo.
	AddCollaborator(ctx context.Context, repo, keyName, publicKey string) error
	// AddWebhook adds a webhook to trigger on push.
	AddWebhook(ctx context.Context, repo, url, secret string) error
}

// Repo represents a git repository.
type Repo struct {
	Name        string
	Description string
	CloneSSH    string // ssh://git@git.threesix.ai/name.git
	CloneHTTP   string // https://git.threesix.ai/name.git
	CreatedAt   time.Time
}
```
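The `CloneSSH` and `CloneHTTP` comments fix the URL shapes, so an adapter can derive both from the repository name. The following is a small sketch under that assumption. The helper `cloneURLs` and the name pattern `repoNameRe` are hypothetical, and the real naming rules would come from soft-serve and project policy.

```go
package main

import (
	"fmt"
	"regexp"
)

// gitHost comes from the plan; the name rule below is a hypothetical policy
// that keeps path separators and shell metacharacters out of repo names.
const gitHost = "git.threesix.ai"

var repoNameRe = regexp.MustCompile(`^[a-z0-9][a-z0-9._-]{0,62}$`)

// cloneURLs returns the SSH and HTTPS clone URLs for a repository name,
// following the formats documented on the Repo struct.
func cloneURLs(name string) (sshURL, httpURL string, err error) {
	if !repoNameRe.MatchString(name) {
		return "", "", fmt.Errorf("invalid repo name %q", name)
	}
	sshURL = fmt.Sprintf("ssh://git@%s/%s.git", gitHost, name)
	httpURL = fmt.Sprintf("https://%s/%s.git", gitHost, name)
	return sshURL, httpURL, nil
}

func main() {
	sshURL, httpURL, err := cloneURLs("myapp")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sshURL)
	fmt.Println(httpURL)
}
```

Validating the name before building URLs matters because the same string later ends up in SSH commands, DNS labels, and database rows.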
```go
// internal/port/dns.go
package port

import "context"

// DNSProvider manages DNS records.
type DNSProvider interface {
	// CreateRecord creates a DNS record.
	CreateRecord(ctx context.Context, record DNSRecord) error
	// DeleteRecord removes a DNS record.
	DeleteRecord(ctx context.Context, recordType, name string) error
	// ListRecords returns all records for the zone.
	ListRecords(ctx context.Context) ([]*DNSRecord, error)
}

// DNSRecord represents a DNS record.
type DNSRecord struct {
	Type    string // A, CNAME, TXT
	Name    string // subdomain or @ for root
	Content string // IP or target
	TTL     int    // seconds, 1 = auto
	Proxied bool   // Cloudflare proxy
}
```
```go
// internal/port/deployer.go
package port

import "context"

// Deployer manages application deployments.
type Deployer interface {
	// Deploy creates or updates a deployment.
	Deploy(ctx context.Context, spec DeploySpec) error
	// Undeploy removes a deployment.
	Undeploy(ctx context.Context, projectName string) error
	// GetStatus returns deployment status.
	GetStatus(ctx context.Context, projectName string) (*DeployStatus, error)
}

// DeploySpec defines a deployment.
type DeploySpec struct {
	ProjectName string
	Image       string
	Domain      string // e.g., "myapp.threesix.ai"
	Port        int    // container port
	Replicas    int
	EnvVars     map[string]string
	Secrets     map[string]string
}

// DeployStatus represents current deployment state.
type DeployStatus struct {
	ProjectName   string
	Image         string
	Replicas      int
	ReadyReplicas int
	URL           string
	Status        string // "running", "pending", "failed"
}
```
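Because a `DeploySpec` eventually becomes Kubernetes object names, ports, and hostnames, it is worth rejecting bad input before touching the cluster. The sketch below shows one possible validation pass. The function `validateSpec` and its specific rules are assumptions for illustration; only the DNS-label pattern reflects what Kubernetes itself enforces for most resource names. It needs Go 1.20 or newer for `errors.Join`.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// DeploySpec is copied from the port so this file stands alone.
type DeploySpec struct {
	ProjectName string
	Image       string
	Domain      string
	Port        int
	Replicas    int
	EnvVars     map[string]string
	Secrets     map[string]string
}

// dnsLabelRe matches an RFC 1123 DNS label (lowercase alphanumerics and "-",
// starting and ending with an alphanumeric); the 63-char cap is checked below.
var dnsLabelRe = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validateSpec collects every problem instead of stopping at the first one.
func validateSpec(s DeploySpec) error {
	var errs []error
	if len(s.ProjectName) > 63 || !dnsLabelRe.MatchString(s.ProjectName) {
		errs = append(errs, fmt.Errorf("project name %q is not a valid DNS label", s.ProjectName))
	}
	if s.Image == "" {
		errs = append(errs, errors.New("image is required"))
	}
	if s.Domain == "" {
		errs = append(errs, errors.New("domain is required"))
	}
	if s.Port < 1 || s.Port > 65535 {
		errs = append(errs, fmt.Errorf("port %d is out of range", s.Port))
	}
	if s.Replicas < 0 {
		errs = append(errs, fmt.Errorf("replicas %d must not be negative", s.Replicas))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	good := DeploySpec{
		ProjectName: "myapp",
		Image:       "zot.threesix.svc.cluster.local:5000/myapp:latest",
		Domain:      "myapp.threesix.ai",
		Port:        8080,
		Replicas:    1,
	}
	fmt.Println("good spec error:", validateSpec(good))

	bad := DeploySpec{ProjectName: "My_App", Port: 0}
	fmt.Println("bad spec errors:")
	fmt.Println(validateSpec(bad))
}
```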
### 3.2 New Adapters
```
internal/adapter/
├── softserve/          # soft-serve SSH/API client
│   └── client.go
├── cloudflare/         # Cloudflare DNS API client
│   └── client.go
├── deployer/           # K8s deployment manager
│   └── deployer.go
└── registry/           # Zot registry client (optional)
    └── client.go
```
### 3.3 New Handlers
```go
// internal/handlers/projects_git.go
// POST /projects/{id}/repo - Create git repo for project
// DELETE /projects/{id}/repo - Delete git repo
// GET /projects/{id}/repo - Get repo info
// POST /projects/{id}/deploy - Deploy project
// DELETE /projects/{id}/deploy - Undeploy project
// GET /projects/{id}/deploy/status - Get deployment status
// POST /projects/{id}/domain - Set custom domain
// DELETE /projects/{id}/domain - Remove custom domain
```
### 3.4 New API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| POST | `/projects/{id}/repo` | Create git repo |
| DELETE | `/projects/{id}/repo` | Delete git repo |
| GET | `/projects/{id}/repo` | Get repo info (clone URLs) |
| POST | `/projects/{id}/deploy` | Deploy from image |
| DELETE | `/projects/{id}/deploy` | Remove deployment |
| GET | `/projects/{id}/deploy/status` | Deployment status |
| POST | `/projects/{id}/domain` | Add custom domain |
| DELETE | `/projects/{id}/domain` | Remove custom domain |
---
## Phase 4: Database Schema
### 4.1 Migration: Add Git and Deployment Fields
```sql
-- migrations/010_project_infrastructure.up.sql

-- Add infrastructure fields to projects
ALTER TABLE projects ADD COLUMN IF NOT EXISTS git_repo_name VARCHAR(255);
ALTER TABLE projects ADD COLUMN IF NOT EXISTS git_clone_ssh VARCHAR(512);
ALTER TABLE projects ADD COLUMN IF NOT EXISTS git_clone_http VARCHAR(512);
ALTER TABLE projects ADD COLUMN IF NOT EXISTS domain VARCHAR(255);
ALTER TABLE projects ADD COLUMN IF NOT EXISTS custom_domain VARCHAR(255);
ALTER TABLE projects ADD COLUMN IF NOT EXISTS deployment_image VARCHAR(512);
ALTER TABLE projects ADD COLUMN IF NOT EXISTS deployment_status VARCHAR(50) DEFAULT 'none';
ALTER TABLE projects ADD COLUMN IF NOT EXISTS deployment_replicas INTEGER DEFAULT 1;

-- Index for domain lookups
CREATE INDEX IF NOT EXISTS idx_projects_domain ON projects(domain);
CREATE INDEX IF NOT EXISTS idx_projects_custom_domain ON projects(custom_domain);
```
---
## Phase 5: Pantheon Integration
### 5.1 New Commands for Agents
```
/project create <name>
  → Creates project in DB
  → Creates git repo in soft-serve
  → Creates DNS record (<name>.threesix.ai)
  → Returns clone URL

/project deploy <name>
  → Triggers build from latest commit
  → Deploys to k8s
  → Returns live URL

/project status <name>
  → Shows git repo, deployment status, URLs

/project domain <name> <custom-domain>
  → Adds custom domain to project
  → Instructions for DNS pointing
```
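The four commands share a `/project <verb> <name>` shape, with `domain` taking one extra argument. A parser for that grammar might look like the sketch below. The type `projectCommand` and the function `parseProjectCommand` are hypothetical names, and how Pantheon actually delivers command text is not covered by this plan.

```go
package main

import (
	"fmt"
	"strings"
)

// projectCommand is the parsed form of a "/project ..." line.
type projectCommand struct {
	Verb   string // create, deploy, status, or domain
	Name   string // project name
	Domain string // only set for the "domain" verb
}

// parseProjectCommand splits on whitespace and checks the argument count
// expected by each verb.
func parseProjectCommand(line string) (projectCommand, error) {
	fields := strings.Fields(line)
	if len(fields) < 3 || fields[0] != "/project" {
		return projectCommand{}, fmt.Errorf("usage: /project <verb> <name> [args]")
	}
	cmd := projectCommand{Verb: fields[1], Name: fields[2]}
	switch cmd.Verb {
	case "create", "deploy", "status":
		if len(fields) != 3 {
			return projectCommand{}, fmt.Errorf("%s takes exactly one argument", cmd.Verb)
		}
	case "domain":
		if len(fields) != 4 {
			return projectCommand{}, fmt.Errorf("domain takes <name> <custom-domain>")
		}
		cmd.Domain = fields[3]
	default:
		return projectCommand{}, fmt.Errorf("unknown verb %q", cmd.Verb)
	}
	return cmd, nil
}

func main() {
	for _, line := range []string{
		"/project create myapp",
		"/project domain myapp example.com",
		"/project destroy myapp",
	} {
		cmd, err := parseProjectCommand(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", cmd)
	}
}
```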
### 5.2 Webhook Flow
```
Agent pushes code
soft-serve receives push
Webhook fires to Woodpecker
Woodpecker reads .woodpecker.yml
Kaniko builds image, pushes to zot
Woodpecker calls rdev-api: POST /projects/{id}/deploy
rdev-api creates/updates K8s resources
Project live at https://{name}.threesix.ai
```
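Each hop in this flow is an HTTP call triggered by an earlier step, and `AddWebhook` already carries a shared `secret`, so the receiver should check that a request really came from the expected sender. The sketch below shows the common HMAC-SHA256 approach. The hex encoding and the function names `sign` and `verify` are assumptions; the actual signature header and format used by soft-serve or Woodpecker should be taken from their documentation.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time, which avoids
// leaking how many leading bytes of a forged signature were correct.
func verify(secret, body []byte, signatureHex string) bool {
	got, err := hex.DecodeString(signatureHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-webhook-secret") // placeholder value
	body := []byte(`{"event":"push","repo":"myapp"}`)

	sig := sign(secret, body)
	fmt.Println("valid signature accepted:", verify(secret, body, sig))
	fmt.Println("tampered body accepted:", verify(secret, []byte(`{"event":"push","repo":"other"}`), sig))
}
```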
---
## Implementation Checklist
### Phase 1: Foundation
- [ ] Create `threesix` namespace
- [ ] Create Cloudflare API secret
- [ ] Configure ClusterIssuer for DNS-01 challenge
- [ ] Request wildcard certificate
- [ ] Deploy soft-serve StatefulSet
- [ ] Configure soft-serve LoadBalancer for SSH
- [ ] Deploy Zot registry
- [ ] Create initial DNS records (git, registry, ci, wildcard)
- [ ] Test: `ssh git@git.threesix.ai` works
- [ ] Test: `https://registry.threesix.ai` shows Zot UI
### Phase 2: CI/CD
- [ ] Generate Woodpecker agent secret
- [ ] Deploy Woodpecker server
- [ ] Deploy Woodpecker agents
- [ ] Configure soft-serve webhook to Woodpecker
- [ ] Test: push triggers build
- [ ] Test: Kaniko builds and pushes to Zot
### Phase 3: rdev-api
- [ ] Add GitRepository port interface
- [ ] Add DNSProvider port interface
- [ ] Add Deployer port interface
- [ ] Implement soft-serve adapter
- [ ] Implement Cloudflare adapter
- [ ] Implement K8s deployer adapter
- [ ] Add database migration
- [ ] Add new handlers
- [ ] Test: API can create repos
- [ ] Test: API can manage DNS
- [ ] Test: API can deploy apps
### Phase 4: Integration
- [ ] Wire up webhook: build → deploy
- [ ] Add project commands to Pantheon
- [ ] Test: end-to-end "create project" → "push code" → "live site"
### Phase 5: Polish
- [ ] Custom domain support
- [ ] Build notifications to Pantheon
- [ ] Deployment logs streaming
- [ ] Resource limits per project
- [ ] Usage metrics
---
## Resource Estimates
| Component | CPU Request | Memory Request | Storage |
|-----------|-------------|----------------|---------|
| soft-serve | 50m | 64Mi | 10Gi |
| Zot | 100m | 128Mi | 50Gi |
| Woodpecker Server | 100m | 128Mi | 5Gi |
| Woodpecker Agent (x2) | 200m each | 256Mi each | - |
| **Total** | ~650m | ~832Mi | 65Gi |
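
The totals can be re-derived from the per-component rows, counting the agent twice. The short sketch below does that arithmetic; the `component` type and the helper names are illustrative only.

```go
package main

import "fmt"

// component holds per-replica requests taken from the table above.
type component struct {
	name      string
	replicas  int
	cpuMilli  int // CPU request in millicores
	memoryMi  int // memory request in MiB
	storageGi int // persistent storage in GiB
}

// planComponents lists the rows of the resource table.
func planComponents() []component {
	return []component{
		{name: "soft-serve", replicas: 1, cpuMilli: 50, memoryMi: 64, storageGi: 10},
		{name: "Zot", replicas: 1, cpuMilli: 100, memoryMi: 128, storageGi: 50},
		{name: "Woodpecker Server", replicas: 1, cpuMilli: 100, memoryMi: 128, storageGi: 5},
		{name: "Woodpecker Agent", replicas: 2, cpuMilli: 200, memoryMi: 256, storageGi: 0},
	}
}

// totals sums requests across all replicas of every component.
func totals(components []component) (cpuMilli, memoryMi, storageGi int) {
	for _, c := range components {
		cpuMilli += c.replicas * c.cpuMilli
		memoryMi += c.replicas * c.memoryMi
		storageGi += c.replicas * c.storageGi
	}
	return cpuMilli, memoryMi, storageGi
}

func main() {
	cpu, mem, storage := totals(planComponents())
	fmt.Printf("%dm CPU, %dMi memory, %dGi storage\n", cpu, mem, storage)
	// prints "650m CPU, 832Mi memory, 65Gi storage"
}
```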
---
## Security Considerations
1. **soft-serve admin key** - Only jordan's key is admin initially
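2. **Registry access** - No auth, which is safe only while Zot is reachable solely through its ClusterIP Service. The optional Ingress in 1.4 routes all of `registry.threesix.ai` to port 5000, exposing unauthenticated push/pull as well as the UI, so enable auth or restrict that Ingress before turning it on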
3. **Woodpecker** - Closed registration, admin-only access
4. **Cloudflare token** - Scoped to DNS edit only
5. **Deploy permissions** - rdev-api ServiceAccount limited to `threesix` and `projects` namespaces
---
## Next Steps
1. Review this plan
2. Deploy Phase 1 infrastructure
3. Test git and registry
4. Deploy Phase 2 CI/CD
5. Implement Phase 3 rdev-api changes
6. Integration testing
7. Pantheon integration