fix(ci): prevent Woodpecker PVC false failures
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Woodpecker's K8s backend creates a PVC per pipeline for workspace sharing.
If the agent misses cleanup, stale PVCs cause "already exists" errors that
mark pipelines as failed despite all steps succeeding.

Two-part fix:
1. Scale woodpecker-agent from 2 replicas to 1 (eliminates the PVC name race
   between agents processing the same repo)
2. Add a CronJob that runs every 5 minutes and garbage-collects wp-* PVCs
   older than 30 minutes (handles crash/restart edge cases)

Includes a dedicated ServiceAccount and least-privilege RBAC (PVC
get/list/delete only, in the threesix namespace).

Ref: https://github.com/woodpecker-ci/woodpecker/issues/1594

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent f85fa181cf
commit f8554a5e6f
104
deployments/k8s/base/woodpecker-pvc-cleanup.yaml
Normal file
@@ -0,0 +1,104 @@
# CronJob to garbage-collect stale Woodpecker pipeline PVCs.
#
# Woodpecker's Kubernetes backend creates a PVC per pipeline for workspace
# sharing between step pods. If the agent crashes or restarts, PVCs can leak.
# A subsequent pipeline with a colliding name gets "already exists" and is
# marked as failed even though all steps succeed.
#
# This CronJob runs every 5 minutes and deletes wp-* PVCs older than 30 minutes.
# Normal pipelines finish in ~12 minutes, so 30 minutes is a safe threshold.
#
# See: https://github.com/woodpecker-ci/woodpecker/issues/1594
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: woodpecker-pvc-cleanup
  namespace: threesix
  labels:
    app.kubernetes.io/name: woodpecker-pvc-cleanup
    app.kubernetes.io/part-of: woodpecker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: woodpecker-pvc-cleanup
  namespace: threesix
  labels:
    app.kubernetes.io/name: woodpecker-pvc-cleanup
    app.kubernetes.io/part-of: woodpecker
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: woodpecker-pvc-cleanup
  namespace: threesix
  labels:
    app.kubernetes.io/name: woodpecker-pvc-cleanup
    app.kubernetes.io/part-of: woodpecker
subjects:
  - kind: ServiceAccount
    name: woodpecker-pvc-cleanup
    namespace: threesix
roleRef:
  kind: Role
  name: woodpecker-pvc-cleanup
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: woodpecker-pvc-cleanup
  namespace: threesix
  labels:
    app.kubernetes.io/name: woodpecker-pvc-cleanup
    app.kubernetes.io/part-of: woodpecker
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      activeDeadlineSeconds: 60
      template:
        spec:
          serviceAccountName: woodpecker-pvc-cleanup
          restartPolicy: Never
          containers:
            - name: cleanup
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  set -e
                  echo "Checking for stale Woodpecker pipeline PVCs..."
                  NOW=$(date +%s)
                  THRESHOLD=1800  # 30 minutes in seconds

                  # Get wp-* PVCs as "name creationTimestamp" pairs via jsonpath
                  kubectl get pvc -n threesix \
                    -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.creationTimestamp}{"\n"}{end}' \
                    | grep '^wp-' | while read -r NAME TS; do
                    # Parse ISO 8601 timestamp to epoch; on parse failure use NOW (age 0) so the PVC is kept
                    CREATED=$(date -d "$TS" +%s 2>/dev/null || echo "$NOW")
                    AGE=$((NOW - CREATED))
                    if [ "$AGE" -gt "$THRESHOLD" ]; then
                      echo "Deleting stale PVC: $NAME (age: ${AGE}s)"
                      kubectl delete pvc -n threesix "$NAME" --wait=false
                    fi
                  done

                  echo "Cleanup complete."
              resources:
                requests:
                  cpu: 10m
                  memory: 32Mi
                limits:
                  cpu: 100m
                  memory: 64Mi
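
Review note: the age check in the cleanup script can be exercised without a cluster. The sketch below is not part of this commit. It pulls the staleness test into a hypothetical helper, `is_stale`, and feeds it fixed timestamps, so the result does not depend on the wall clock. It assumes a `date` that accepts ISO 8601 input with `-d` (GNU coreutils does; other implementations may not). An unparseable timestamp is treated as "not stale" so that a parsing problem never leads to a deletion.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): decide whether a PVC is stale.
# Usage: is_stale <creationTimestamp> <now-epoch> <threshold-seconds>
# Returns 0 if older than the threshold, 1 if not (or if parsing fails).
is_stale() {
  ts=$1
  now=$2
  threshold=$3
  # Assumes `date -d` understands ISO 8601 such as 2024-01-01T00:00:00Z.
  created=$(date -u -d "$ts" +%s 2>/dev/null) || return 1
  [ $((now - created)) -gt "$threshold" ]
}

# Fixed "now" so the example is reproducible.
NOW=$(date -u -d "2024-01-01T01:00:00Z" +%s)

# 40 minutes old, 15 minutes old, and a value that cannot be parsed.
for ts in 2024-01-01T00:20:00Z 2024-01-01T00:45:00Z not-a-timestamp; do
  if is_stale "$ts" "$NOW" 1800; then
    echo "$ts stale"
  else
    echo "$ts keep"
  fi
done
```

With a 1800-second threshold, only the 40-minute-old timestamp is reported as stale; the 15-minute-old one and the unparseable one are kept.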