Runbook: Authentication Failures
Alert
RdevAPIAuthFailures: High rate of authentication failures
Impact
- Legitimate users unable to access API
- Potential security incident (brute force)
- Service degradation
Investigation
1. Confirm the Issue
# Check auth failure metrics
curl -s http://rdev-api:8080/metrics | grep auth_failures
# Check auth logs
kubectl -n rdev logs -l app=rdev-api --since=10m | grep -E "(UNAUTHORIZED|KEY_REVOKED|KEY_EXPIRED|IP_NOT_ALLOWED)"
2. Identify Failure Type
# Count by failure reason
kubectl -n rdev logs -l app=rdev-api --since=10m | \
grep -oE '"code":"[^"]+' | sort | uniq -c | sort -rn
Common reasons:
- UNAUTHORIZED: Invalid or missing key
- KEY_REVOKED: Key was revoked
- KEY_EXPIRED: Key has expired
- IP_NOT_ALLOWED: IP not in allowlist
3. Check for Attack Patterns
# Check unique IPs making failed requests
kubectl -n rdev logs -l app=rdev-api --since=10m | \
grep UNAUTHORIZED | grep -oE '"client_ip":"[^"]+' | sort | uniq -c | sort -rn
# Check request patterns
kubectl -n rdev logs -l app=rdev-api --since=10m | \
grep UNAUTHORIZED | grep -oE '"path":"[^"]+' | sort | uniq -c | sort -rn
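The per-IP counts above can be turned into a quick brute-force check by listing only the addresses that cross a failure threshold. This is a minimal sketch, assuming the log lines carry a `client_ip` field as in the commands above; the function name `flag_noisy_ips` and the default threshold of 20 are illustrative and not part of rdev-api.

```shell
# flag_noisy_ips THRESHOLD
# Reads rdev-api log lines on stdin and prints "<count> <ip>" for every
# client IP with at least THRESHOLD UNAUTHORIZED entries, busiest first.
# Hypothetical helper; assumes each log line contains "client_ip":"<ip>".
flag_noisy_ips() {
  threshold="${1:-20}"
  grep UNAUTHORIZED \
    | grep -oE '"client_ip":"[^"]+' \
    | cut -d'"' -f4 \
    | sort | uniq -c | sort -rn \
    | awk -v t="$threshold" '$1 >= t { print $1, $2 }'
}

# Usage (same log source as above):
#   kubectl -n rdev logs -l app=rdev-api --since=10m | flag_noisy_ips 20
```

A single address dominating the output suggests a brute-force attempt; many addresses with a few failures each more likely points to a misconfigured client rollout or a revoked shared key.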
Remediation
If Keys Are Invalid (UNAUTHORIZED)
- Verify keys exist in database:
  kubectl -n rdev exec -it deployment/rdev-api -- sh
  psql $DATABASE_URL -c "SELECT id, name, key_prefix, revoked_at FROM api_keys;"
- Help users create new keys if needed
- If brute force detected:
  - Block offending IPs at ingress level
  - Increase rate limiting
If Keys Are Revoked (KEY_REVOKED)
- Check who revoked and when:
  SELECT id, name, revoked_at, revoked_by FROM api_keys WHERE revoked_at IS NOT NULL;
- Determine if revocation was intentional
- Issue new keys to affected users if legitimate
If Keys Are Expired (KEY_EXPIRED)
- Check which keys expired:
  SELECT id, name, expires_at FROM api_keys WHERE expires_at < NOW();
- Issue new keys to affected users
- Consider extending default expiration if too short
If IP Not Allowed (IP_NOT_ALLOWED)
- Check which keys have IP restrictions:
  SELECT id, name, allowed_ips FROM api_keys WHERE allowed_ips IS NOT NULL;
- Verify client IPs match allowlist
- Update allowlist if legitimate IPs changed:
  - Cloud provider IP ranges change
  - User moved networks
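Checking a client IP against an allowlist entry by eye is error-prone for anything other than /8, /16, or /24 prefixes. The sketch below does the comparison arithmetically. It assumes `allowed_ips` entries are IPv4 addresses or IPv4 CIDR ranges, which this runbook does not confirm; `ip_to_int` and `ip_in_cidr` are hypothetical helper names.

```shell
# ip_to_int A.B.C.D
# Prints the dotted-quad IPv4 address as a 32-bit integer.
# Octets must not have leading zeros (they would be read as octal).
ip_to_int() {
  old_ifs="$IFS"; IFS=.
  set -- $1                      # split the address into four octets
  IFS="$old_ifs"
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr IP CIDR
# Returns 0 if IP falls inside CIDR, 1 otherwise.
# A bare address without "/" is treated as a /32.
ip_in_cidr() {
  ip_int="$(ip_to_int "$1")"
  net_int="$(ip_to_int "${2%/*}")"
  case "$2" in
    */*) bits="${2#*/}" ;;
    *)   bits=32 ;;
  esac
  # Build the netmask; a /0 prefix matches every address.
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  fi
  [ $(( $ip_int & $mask )) -eq $(( $net_int & $mask )) ]
}

# Usage:
#   ip_in_cidr 10.1.2.3 10.0.0.0/8 && echo "allowed" || echo "not in range"
```

If the allowlist also holds IPv6 entries, this check does not apply and a dedicated tool is the safer choice.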
If Under Attack
- Immediate: Block at ingress by allowing only trusted source ranges
  # Add to ingress annotations (only the listed ranges can reach the API)
  nginx.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/8,192.168.0.0/16"
- Short-term: Tighten rate limits
  kubectl -n rdev set env deployment/rdev-api RATE_LIMIT_RPS=2
- Long-term:
  - Implement IP-based blocking
  - Add fail2ban-style lockout
  - Review API key issuance process
Verification
# Check auth success rate
curl -s http://rdev-api:8080/metrics | grep -E "auth_(requests|failures)"
# Test authentication
curl -H "X-API-Key: $VALID_KEY" http://rdev-api:8080/projects
# Check logs for successful auths
kubectl -n rdev logs -l app=rdev-api --since=5m | grep "request completed" | head -5
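To judge recovery by a single number rather than by reading raw counters, the failure share can be computed from the metrics output. This is a rough sketch: the counter names `auth_requests_total` and `auth_failures_total` are assumptions, so substitute whatever names the grep above actually returns. The helper `auth_failure_pct` is hypothetical and expects Prometheus text format without timestamps.

```shell
# auth_failure_pct
# Reads Prometheus text-format metrics on stdin and prints the share of
# auth requests that failed, as a percentage with one decimal place.
# ASSUMPTION: counters are named auth_requests_total and
# auth_failures_total; label sets (if any) are summed together.
auth_failure_pct() {
  awk '
    /^#/ { next }                                   # skip HELP/TYPE lines
    $1 ~ /^auth_requests_total([{]|$)/ { req  += $NF }
    $1 ~ /^auth_failures_total([{]|$)/ { fail += $NF }
    END {
      if (req == 0) { print "n/a"; exit }           # no requests recorded
      printf "%.1f\n", 100 * fail / req
    }'
}

# Usage:
#   curl -s http://rdev-api:8080/metrics | auth_failure_pct
```

Because these are cumulative counters, the figure reflects the whole process lifetime; comparing two readings a few minutes apart gives a better picture of whether the failure rate is falling.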
Post-Incident
- Review auth failure patterns
- Update IP allowlists if needed
- Communicate with affected users
- Consider additional security measures:
- API key rotation policy
- Automated key expiration alerts
- IP-based anomaly detection