fix(ci): prevent Woodpecker PVC false failures
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Woodpecker's K8s backend creates a PVC per pipeline for workspace sharing.
If the agent misses cleanup, stale PVCs cause "already exists" errors that
mark pipelines as failed despite all steps succeeding.

Two-part fix:
1. Scale woodpecker-agent from 2 to 1 replica (eliminates PVC name race
   between agents processing the same repo)
2. Add CronJob that runs every 5 minutes and garbage-collects wp-* PVCs
   older than 30 minutes (handles crash/restart edge cases)

Includes dedicated ServiceAccount and least-privilege RBAC (PVC
get/list/delete only, in the threesix namespace).

Ref: https://github.com/woodpecker-ci/woodpecker/issues/1594

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent f85fa181cf
commit f8554a5e6f
104
deployments/k8s/base/woodpecker-pvc-cleanup.yaml
Normal file
@@ -0,0 +1,104 @@
# CronJob to garbage-collect stale Woodpecker pipeline PVCs.
#
# Woodpecker's Kubernetes backend creates a PVC per pipeline for workspace
# sharing between step pods. If the agent crashes or restarts, PVCs can leak.
# A subsequent pipeline with a colliding name gets "already exists" and is
# marked as failed even though all steps succeed.
#
# This CronJob runs every 5 minutes and deletes wp-* PVCs older than 30 minutes.
# Normal pipelines finish in ~12 minutes, so 30 minutes is a safe threshold.
#
# See: https://github.com/woodpecker-ci/woodpecker/issues/1594
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: woodpecker-pvc-cleanup
  namespace: threesix
  labels:
    app.kubernetes.io/name: woodpecker-pvc-cleanup
    app.kubernetes.io/part-of: woodpecker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: woodpecker-pvc-cleanup
  namespace: threesix
  labels:
    app.kubernetes.io/name: woodpecker-pvc-cleanup
    app.kubernetes.io/part-of: woodpecker
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: woodpecker-pvc-cleanup
  namespace: threesix
  labels:
    app.kubernetes.io/name: woodpecker-pvc-cleanup
    app.kubernetes.io/part-of: woodpecker
subjects:
  - kind: ServiceAccount
    name: woodpecker-pvc-cleanup
    namespace: threesix
roleRef:
  kind: Role
  name: woodpecker-pvc-cleanup
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: woodpecker-pvc-cleanup
  namespace: threesix
  labels:
    app.kubernetes.io/name: woodpecker-pvc-cleanup
    app.kubernetes.io/part-of: woodpecker
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      activeDeadlineSeconds: 60
      template:
        spec:
          serviceAccountName: woodpecker-pvc-cleanup
          restartPolicy: Never
          containers:
            - name: cleanup
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  set -e
                  echo "Checking for stale Woodpecker pipeline PVCs..."
                  NOW=$(date +%s)
                  THRESHOLD=1800  # 30 minutes in seconds

                  # Get wp-* PVCs as "name creationTimestamp" pairs via jsonpath
                  kubectl get pvc -n threesix \
                    -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.creationTimestamp}{"\n"}{end}' \
                    | grep '^wp-' | while read -r NAME TS; do
                    # Parse ISO 8601 timestamp to epoch; if date -d cannot parse it, use NOW (age 0) so the PVC is kept
                    CREATED=$(date -d "$TS" +%s 2>/dev/null || echo "$NOW")
                    AGE=$((NOW - CREATED))
                    if [ "$AGE" -gt "$THRESHOLD" ]; then
                      echo "Deleting stale PVC: $NAME (age: ${AGE}s)"
                      kubectl delete pvc -n threesix "$NAME" --wait=false
                    fi
                  done

                  echo "Cleanup complete."
              resources:
                requests:
                  cpu: 10m
                  memory: 32Mi
                limits:
                  cpu: 100m
                  memory: 64Mi
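The age check inside the CronJob's loop is the part most worth testing on its own, because a timestamp that fails to parse must never lead to a deletion. The following is a minimal sketch of that check as a standalone function, not code from this commit: `pvc_is_stale` is a hypothetical helper name, and the sketch assumes a `date` that accepts ISO 8601 input with `-d` (GNU date does; other implementations may not).

```shell
#!/bin/sh
# Sketch only: pvc_is_stale is a hypothetical helper, not part of the commit.
# Assumes `date -d` accepts ISO 8601 timestamps (true for GNU date).
#
# Usage: pvc_is_stale <creationTimestamp> <now_epoch> <threshold_seconds>
# Returns 0 if the PVC is older than the threshold.
# Returns 1 if it is not, or if the timestamp is empty or cannot be parsed,
# so an unreadable timestamp never causes a deletion.
pvc_is_stale() {
  ts=$1
  now=$2
  threshold=$3

  # An empty timestamp is treated as unparseable (keep the PVC).
  [ -n "$ts" ] || return 1

  # Convert the ISO 8601 timestamp to epoch seconds; bail out on failure.
  created=$(date -d "$ts" +%s 2>/dev/null) || return 1

  age=$((now - created))
  [ "$age" -gt "$threshold" ]
}
```

Under those assumptions, the loop body could call `pvc_is_stale "$TS" "$NOW" "$THRESHOLD"` and run `kubectl delete pvc` only when it returns 0. The comparison is strict, matching the manifest's `-gt`: a PVC exactly 1800 seconds old is kept.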