Commit Graph

3 Commits

Author SHA1 Message Date
jordan
a9ad3d8304 chore: accumulated platform hardening and CI fixes
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
CI / Woodpecker:
- Add explicit depends_on to all .woodpecker.yml steps (rdev + templates)
- Fix skip_tls_verify -> skip-tls-verify (correct Kaniko flag name)
- Add replicasets get/list to deployer RBAC for rollout status
- Skeleton template: add failure:ignore on docs steps, Traefik TLS
  annotations on ingress, depends_on on verify step

Component templates:
- Fix container name in deploy steps (PROJECT_NAME-COMPONENT_NAME)
- Replace kubectl scale with kubectl patch for replicas
- Add post-deploy image verification and rollout status checks
- Applied consistently across all 5 component templates

Adapters:
- gitea: Add HTTP client timeout (30s), context cancellation checks,
  handle 404 on GetRepo/DeleteRepo
- zot: Add retry with exponential backoff (doWithRetry), limit response
  body reads to 10MB
- cockroach: Use net.JoinHostPort for IPv6-safe DSN construction
- woodpecker: Fix error wrapping (%v -> %w)
- redis: Fix error wrapping (%v -> %w)
- deployer: Add context cancellation checks

Services:
- apikey_service: Fix error wrapping (%v -> %w)
- component_deploy: Fix error wrapping (%v -> %w)
- project_infra: Fix error wrapping (%v -> %w)
- webhook/dispatcher: Fix error wrapping (%v -> %w)

Other:
- CLAUDE.md: Add guide links for Gitea, Go 1.25, Woodpecker v3,
  Traefik v3, Zot registry
- circuitbreaker: Add test for error wrapping
- docs: Update deployment, troubleshooting, and runbook docs
- health: Fix error wrapping (%v -> %w)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 23:16:56 -07:00
jordan
f20fc6c51c feat(saga): implement enterprise-grade resilience architecture
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Fixes issues from code review of resilience implementation:

- Wire saga system in main.go (SagaRepository, SagaExecutor, SagaHandler)
- Fix CompletedSteps() to include skipped steps for dependency resolution
- Fix reverse loop bug in saga compensation (use standard swap pattern)
- Add circuit breaker state change callbacks for Prometheus metrics

Phase 1 (Build Resilience):
- Add failure:retry to all component Kaniko build steps
- Add preflight registry health check before builds
- Add services-deployed sync point to decouple docs from critical path

Phase 2 (API Resilience):
- Add pipeline retry endpoint (POST /projects/{id}/pipelines/{number}/retry)
- Wire circuit breakers with metrics callbacks
- Add /health/circuits endpoint for circuit breaker status

Phase 3 (Saga Engine):
- Full domain model (Saga, SagaStep, RetryPolicy, BackoffType)
- PostgreSQL saga repository with CRUD and step management
- Saga executor with retry, compensation, skip step support
- Saga API handlers with CRUD and control operations

Phase 4 (Observability):
- Add saga metrics (total, step_duration, retry, circuit_breaker_state)
- Add logging fields (saga_id, saga_name, step_name)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 01:58:02 -07:00
jordan
72d16929ca feat: Implement hexagonal architecture with services, webhooks, queue, and telemetry
Major refactoring to hexagonal (ports & adapters) architecture:

- Add service layer (apikey_service, project_service) for business logic
- Add webhook system with dispatcher and delivery tracking
- Add command queue with priority-based processing
- Add rate limiting with sliding window algorithm
- Add audit logging for command execution
- Add OpenTelemetry integration (traces, metrics, spans)
- Add circuit breaker for fault tolerance
- Add cached repository wrapper for performance
- Add comprehensive validation package
- Add Kubernetes client integration for pod management
- Add database migrations (allowed_ips, audit_log, rate_limiting, queue, webhooks)
- Add network policy and PodDisruptionBudget for k8s
- Remove legacy executor and projects/registry packages
- Untrack secrets.yaml (now managed via envault)
- Add coverage.out to .gitignore
- Add e2e test infrastructure with docker-compose
- Add comprehensive documentation (API, architecture, operations, plans)
- Add golangci-lint config and pre-commit hook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 19:57:46 -07:00