# Runbook: Pod Not Found ## Alert **RdevAPIProjectNotFound**: Project pod not found errors increasing ## Impact - Users cannot execute commands on their projects - API returns 404 for valid project IDs ## Investigation ### 1. Confirm the Issue ```bash # Check for NOT_FOUND errors in logs kubectl -n rdev logs -l app=rdev-api --since=10m | grep "project not found" # Check metrics curl -s http://rdev-api:8080/metrics | grep 'http_requests_total.*status="404"' ``` ### 2. Verify Target Pods Exist ```bash # List all project pods kubectl -n rdev get pods -l rdev.orchard9.ai/project=true # Check specific project kubectl -n rdev get pods -l rdev.orchard9.ai/project-id= ``` ### 3. Check Pod Discovery ```bash # Verify API can see pods kubectl -n rdev exec -it deployment/rdev-api -- sh curl localhost:8080/projects # Check RBAC permissions kubectl auth can-i list pods -n rdev --as=system:serviceaccount:rdev:rdev-api ``` ### 4. Common Causes - **Pod terminated**: Project pod was deleted or crashed - **Wrong namespace**: API looking in wrong namespace - **Missing labels**: Pod missing required labels - **RBAC issues**: API can't list pods - **Cache stale**: Project list cache is outdated ## Remediation ### If Pod Is Missing 1. Check if pod should exist: ```bash kubectl -n rdev get deployments ``` 2. Recreate if needed: ```bash kubectl -n rdev apply -f ``` ### If Labels Are Wrong 1. Check current labels: ```bash kubectl -n rdev get pod --show-labels ``` 2. Add required labels: ```bash kubectl -n rdev label pod rdev.orchard9.ai/project=true kubectl -n rdev label pod rdev.orchard9.ai/project-id= ``` ### If RBAC Is Broken 1. Verify ServiceAccount: ```bash kubectl -n rdev get serviceaccount rdev-api ``` 2. Check RoleBinding: ```bash kubectl -n rdev get rolebinding rdev-api-binding -o yaml ``` 3. Reapply RBAC: ```bash kubectl apply -f deployments/k8s/base/rdev-api.yaml ``` ### If Cache Is Stale 1. Force cache refresh by restarting: ```bash kubectl -n rdev rollout restart deployment/rdev-api ``` 2. Or reduce cache TTL: ```bash kubectl -n rdev set env deployment/rdev-api CACHE_TTL=5s ``` ### If Wrong Namespace 1. Check configured namespace: ```bash kubectl -n rdev get deployment rdev-api -o jsonpath='{.spec.template.spec.containers[0].env}' | jq ``` 2. Update if wrong: ```bash kubectl -n rdev set env deployment/rdev-api RDEV_NAMESPACE=rdev ``` ## Verification ```bash # List projects from API curl -H "X-API-Key: $API_KEY" http://rdev-api:8080/projects # Get specific project curl -H "X-API-Key: $API_KEY" http://rdev-api:8080/projects/ # Execute test command curl -X POST -H "X-API-Key: $API_KEY" -H "Content-Type: application/json" \ http://rdev-api:8080/projects//shell \ -d '{"command": "echo hello"}' ``` ## Post-Incident 1. Review pod lifecycle management 2. Consider adding pod status monitoring 3. Review label conventions 4. Add alerts for project pod terminations