# Runbook: Authentication Failures

## Alert

**RdevAPIAuthFailures**: High rate of authentication failures

## Impact

- Legitimate users unable to access the API
- Potential security incident (brute force)
- Service degradation

## Investigation

### 1. Confirm the Issue

```bash
# Check auth failure metrics
curl -s http://rdev-api:8080/metrics | grep auth_failures

# Check auth logs. --tail=-1 is needed because, with a label selector,
# kubectl logs otherwise returns only the last 10 lines per container.
kubectl -n rdev logs -l app=rdev-api --since=10m --tail=-1 | \
  grep -E "(UNAUTHORIZED|KEY_REVOKED|KEY_EXPIRED|IP_NOT_ALLOWED)"
```

### 2. Identify Failure Type

```bash
# Count failures by reason
kubectl -n rdev logs -l app=rdev-api --since=10m --tail=-1 | \
  grep -oE '"code":"[^"]+' | sort | uniq -c | sort -rn
```

Common reasons:

- `UNAUTHORIZED` - Invalid or missing key
- `KEY_REVOKED` - Key was revoked
- `KEY_EXPIRED` - Key has expired
- `IP_NOT_ALLOWED` - Client IP is not in the key's allowlist

### 3. Check for Attack Patterns

```bash
# Count failed requests per client IP
kubectl -n rdev logs -l app=rdev-api --since=10m --tail=-1 | \
  grep UNAUTHORIZED | grep -oE '"client_ip":"[^"]+' | sort | uniq -c | sort -rn

# Count failed requests per path
kubectl -n rdev logs -l app=rdev-api --since=10m --tail=-1 | \
  grep UNAUTHORIZED | grep -oE '"path":"[^"]+' | sort | uniq -c | sort -rn
```

## Remediation

### If Keys Are Invalid (UNAUTHORIZED)

1. Verify that the keys exist in the database. Run `psql` inside the pod so that `$DATABASE_URL` is expanded from the pod's environment, not your local shell:

   ```bash
   kubectl -n rdev exec deployment/rdev-api -- \
     sh -c 'psql "$DATABASE_URL" -c "SELECT id, name, key_prefix, revoked_at FROM api_keys;"'
   ```

2. Help users create new keys if needed.
3. If brute force is detected:
   - Block offending IPs at the ingress level
   - Tighten rate limiting

### If Keys Are Revoked (KEY_REVOKED)

1. Check who revoked each key and when:

   ```sql
   SELECT id, name, revoked_at, revoked_by
   FROM api_keys
   WHERE revoked_at IS NOT NULL
   ORDER BY revoked_at DESC;
   ```

2. Determine whether the revocation was intentional.
3. If the affected users are legitimate, issue them new keys.

### If Keys Are Expired (KEY_EXPIRED)

1. Check which keys have expired:

   ```sql
   SELECT id, name, expires_at
   FROM api_keys
   WHERE expires_at < NOW();
   ```

2. Issue new keys to affected users.
3. If keys are expiring too soon, consider extending the default expiration period.

### If IP Not Allowed (IP_NOT_ALLOWED)

1. Check which keys have IP restrictions:

   ```sql
   SELECT id, name, allowed_ips
   FROM api_keys
   WHERE allowed_ips IS NOT NULL;
   ```

2. Verify that the client IPs match the allowlist.
3. Update the allowlist if legitimate client IPs have changed, for example:
   - The client's cloud provider IP ranges changed
   - The user moved to a different network

### If Under Attack

1. **Immediate**: Restrict access at the ingress with a Traefik `ipAllowList` Middleware. It admits only the listed source ranges and rejects everything else, so set `sourceRange` to the networks that must keep access:

   ```yaml
   # Traefik Middleware CRD: only requests from these source ranges are allowed.
   # Apply with: kubectl apply -f internal-only.yaml
   apiVersion: traefik.io/v1alpha1
   kind: Middleware
   metadata:
     name: internal-only
     namespace: rdev
   spec:
     ipAllowList:
       sourceRange:
         - "10.0.0.0/8"
         - "192.168.0.0/16"
   ```

   The Middleware has no effect until the route serving rdev-api references it, either in an IngressRoute's `middlewares` list or through the `traefik.ingress.kubernetes.io/router.middlewares: rdev-internal-only@kubernetescrd` annotation on a standard Ingress.

2. **Short-term**: Tighten rate limits by lowering the allowed requests per second. Changing the environment updates the pod template, so this triggers a rolling restart:

   ```bash
   kubectl -n rdev set env deployment/rdev-api RATE_LIMIT_RPS=2
   ```

3. **Long-term**:
   - Implement IP-based blocking
   - Add fail2ban-style lockout
   - Review the API key issuance process

## Verification

```bash
# Compare auth request and failure counters to gauge the success rate
curl -s http://rdev-api:8080/metrics | grep -E "auth_(requests|failures)"

# Test authentication with a known-good key (prints the HTTP status code)
curl -s -o /dev/null -w '%{http_code}\n' -H "X-API-Key: $VALID_KEY" http://rdev-api:8080/projects

# Check logs for successfully completed requests
kubectl -n rdev logs -l app=rdev-api --since=5m --tail=-1 | grep "request completed" | head -5
```

## Post-Incident

1. Review auth failure patterns
2. Update IP allowlists if needed
3. Communicate with affected users
4. Consider additional security measures:
   - API key rotation policy
   - Automated key expiration alerts
   - IP-based anomaly detection
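For the post-incident review of auth failure patterns, the one-off greps from the Investigation section can be wrapped in a small helper and rerun over a saved log dump. This is a minimal sketch, not part of rdev-api: `auth_failure_summary` is a hypothetical helper name, and it assumes each log line is a single JSON object whose `code` and `client_ip` fields are written with no space after the colon, as the grep patterns in this runbook already assume.

```bash
#!/usr/bin/env bash
# Hypothetical helper for post-incident review (not an rdev-api tool).
# Assumes one JSON object per log line with "code" and "client_ip" fields
# formatted like "code":"UNAUTHORIZED" (no space after the colon).

# auth_failure_summary [TOP]: reads log lines on stdin, prints the count of
# each auth failure code, then the TOP client IPs (default 10) among
# UNAUTHORIZED failures.
auth_failure_summary() {
  local top="${1:-10}"
  local logs
  logs="$(cat)"  # buffer stdin so it can be scanned twice

  echo "== failures by code =="
  printf '%s\n' "$logs" \
    | grep -oE '"code":"(UNAUTHORIZED|KEY_REVOKED|KEY_EXPIRED|IP_NOT_ALLOWED)"' \
    | sed -E 's/"code":"([^"]+)"/\1/' \
    | sort | uniq -c | sort -rn

  echo "== top client IPs (UNAUTHORIZED) =="
  printf '%s\n' "$logs" \
    | grep '"code":"UNAUTHORIZED"' \
    | grep -oE '"client_ip":"[^"]+"' \
    | sed -E 's/"client_ip":"([^"]+)"/\1/' \
    | sort | uniq -c | sort -rn | head -n "$top"
}
```

Usage: save the logs once (`kubectl -n rdev logs -l app=rdev-api --since=1h --tail=-1 > auth.log`), `source` the file containing the function, then run `auth_failure_summary 20 < auth.log`. Working from a saved file means the review uses a fixed snapshot rather than a moving `--since` window.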