rdev/internal
jordan f20fc6c51c
ci/woodpecker/push/woodpecker Pipeline failed
feat(saga): implement enterprise-grade resilience architecture

Fixes issues from code review of resilience implementation:

- Wire saga system in main.go (SagaRepository, SagaExecutor, SagaHandler)
- Fix CompletedSteps() to include skipped steps for dependency resolution
- Fix reverse loop bug in saga compensation (use standard swap pattern)
- Add circuit breaker state change callbacks for Prometheus metrics
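The two saga fixes above can be illustrated with a minimal sketch. The `SagaStep` fields, the `StepStatus` values, and the `reversed` helper are assumptions made for illustration; the actual domain model in this commit may differ. The sketch treats skipped steps as satisfied for dependency resolution and reverses a slice with the two-index swap loop.

```go
package main

import "fmt"

// StepStatus and its values are hypothetical; the real domain model may differ.
type StepStatus string

const (
	StepPending   StepStatus = "pending"
	StepCompleted StepStatus = "completed"
	StepSkipped   StepStatus = "skipped"
	StepFailed    StepStatus = "failed"
)

// SagaStep is a simplified stand-in for the domain type of the same name.
type SagaStep struct {
	Name   string
	Status StepStatus
}

// Saga holds steps in execution order.
type Saga struct {
	Steps []SagaStep
}

// CompletedSteps returns the names of steps that count as done for
// dependency resolution. Skipped steps are included so that steps
// depending on them are not blocked forever.
func (s *Saga) CompletedSteps() []string {
	var done []string
	for _, st := range s.Steps {
		if st.Status == StepCompleted || st.Status == StepSkipped {
			done = append(done, st.Name)
		}
	}
	return done
}

// reversed returns a reversed copy of in, using the standard swap pattern:
// two indices walk toward each other and stop when they meet.
func reversed(in []string) []string {
	out := make([]string, len(in))
	copy(out, in)
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return out
}

func main() {
	s := &Saga{Steps: []SagaStep{
		{Name: "provision", Status: StepCompleted},
		{Name: "docs", Status: StepSkipped},
		{Name: "deploy", Status: StepFailed},
	}}
	fmt.Println(s.CompletedSteps())
	fmt.Println(reversed(s.CompletedSteps()))
}
```

Reversing a copy leaves the saga's own step order untouched, which matters if the same slice is read again after compensation.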

Phase 1 (Build Resilience):
- Add failure:retry to all component Kaniko build steps
- Add preflight registry health check before builds
- Add services-deployed sync point to decouple docs from critical path

Phase 2 (API Resilience):
- Add pipeline retry endpoint (POST /projects/{id}/pipelines/{number}/retry)
- Wire circuit breakers with metrics callbacks
- Add /health/circuits endpoint for circuit breaker status
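As a rough sketch of how a circuit breaker can report state changes through a callback: the `Breaker` type, its fields, and `NewBreaker` below are hypothetical and are not taken from the `circuitbreaker` package in this repository. The sketch omits the half-open state and reset timeouts, and a plain map stands in for a Prometheus gauge.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// State is a simplified breaker state; half-open is omitted for brevity.
type State int

const (
	Closed State = iota
	Open
)

func (s State) String() string {
	if s == Open {
		return "open"
	}
	return "closed"
}

// ErrOpen is returned when a call is rejected without being attempted.
var ErrOpen = errors.New("circuit breaker is open")

// Breaker is a hypothetical, minimal circuit breaker.
type Breaker struct {
	mu        sync.Mutex
	name      string
	state     State
	failures  int
	threshold int
	onChange  func(name string, from, to State)
}

// NewBreaker opens the circuit after threshold consecutive failures and
// reports every transition to onChange (which may be nil).
func NewBreaker(name string, threshold int, onChange func(name string, from, to State)) *Breaker {
	return &Breaker{name: name, threshold: threshold, onChange: onChange}
}

// State returns the current state.
func (b *Breaker) State() State {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.state
}

// Call runs fn unless the circuit is open.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.state == Open {
		b.mu.Unlock()
		return ErrOpen
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.failures = 0
		return nil
	}
	b.failures++
	if b.state == Closed && b.failures >= b.threshold {
		// The callback runs while the lock is held, so it must not
		// call back into the breaker.
		from := b.state
		b.state = Open
		if b.onChange != nil {
			b.onChange(b.name, from, Open)
		}
	}
	return err
}

func main() {
	gauge := map[string]float64{} // stand-in for a metrics gauge keyed by breaker name
	b := NewBreaker("registry", 2, func(name string, from, to State) {
		gauge[name] = float64(to)
		fmt.Printf("%s: %s -> %s\n", name, from, to)
	})
	fail := func() error { return errors.New("registry unreachable") }
	_ = b.Call(fail)
	_ = b.Call(fail)
	fmt.Println(b.Call(fail), gauge["registry"])
}
```

Reporting only on transitions, rather than on every call, keeps the metric cheap and makes the gauge reflect the breaker's current state directly.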

Phase 3 (Saga Engine):
- Full domain model (Saga, SagaStep, RetryPolicy, BackoffType)
- PostgreSQL saga repository with CRUD and step management
- Saga executor with retry, compensation, skip step support
- Saga API handlers with CRUD and control operations

Phase 4 (Observability):
- Add saga metrics (total, step_duration, retry, circuit_breaker_state)
- Add logging fields (saga_id, saga_name, step_name)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 01:58:02 -07:00
Name            Last updated                Last commit
adapter         2026-02-08 01:58:02 -07:00  feat(saga): implement enterprise-grade resilience architecture
auth            2026-02-03 19:29:40 -07:00  feat: implement Visual Verification API layer (Week 2)
circuitbreaker  2026-02-08 01:58:02 -07:00  feat(saga): implement enterprise-grade resilience architecture
claudebox       2026-02-07 16:41:21 -07:00  fix: use raw JSON responses in claudebox server
cmdlimit        2026-01-27 21:05:28 -07:00  feat: Add CI pipeline proxy, DNS alias management, and worker executor system
db              2026-02-08 01:58:02 -07:00  feat(saga): implement enterprise-grade resilience architecture
domain          2026-02-08 01:58:02 -07:00  feat(saga): implement enterprise-grade resilience architecture
envutil         2026-01-31 19:11:42 -07:00  feat: implement composable monorepo template system with component architecture
handlers        2026-02-08 01:58:02 -07:00  feat(saga): implement enterprise-grade resilience architecture
logging         2026-02-08 01:58:02 -07:00  feat(saga): implement enterprise-grade resilience architecture
metrics         2026-02-08 01:58:02 -07:00  feat(saga): implement enterprise-grade resilience architecture
middleware      2026-01-27 21:05:28 -07:00  feat: Add CI pipeline proxy, DNS alias management, and worker executor system
port            2026-02-08 01:58:02 -07:00  feat(saga): implement enterprise-grade resilience architecture
ratelimit       2026-01-27 21:05:28 -07:00  feat: Add CI pipeline proxy, DNS alias management, and worker executor system
sanitize        2026-01-25 01:29:13 -07:00  feat: Add claude-config API, security hardening, and testing infrastructure
sdlc            2026-02-02 15:15:02 -07:00  fix: cookbook tree runner stdout/stderr separation and bash brace expansion
service         2026-02-08 01:58:02 -07:00  feat(saga): implement enterprise-grade resilience architecture
telemetry       2026-01-31 20:46:04 -07:00  fix: Use FQDN for k8s service hostnames and remove broken commonLabels
testutil        2026-01-25 19:57:46 -07:00  feat: Implement hexagonal architecture with services, webhooks, queue, and telemetry
validate        2026-01-27 21:05:28 -07:00  feat: Add CI pipeline proxy, DNS alias management, and worker executor system
webhook         2026-02-05 12:31:40 -07:00  fix: go.work race condition with batch components and idempotent provisioning
worker          2026-02-07 08:40:36 -07:00  fix: preserve work on build retry, clear stale audit data