stemedb/ai-lookup/features/circuit-breakers.md
jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
2026-02-04 01:24:14 -07:00


Circuit Breakers

Last Updated: 2026-02-03
Confidence: High

Summary

Per-agent circuit breakers temporarily ban misbehaving agents to protect system integrity. They are part of "The Shield" (Phase 7D), the last line of defense after admission control, EigenTrust, and content defense.

Key Facts:

  • State machine: Closed (normal) → Open (banned) → HalfOpen (testing) → Closed
  • 5 failures within a 60-second window trip the circuit to Open
  • The Open state lasts 30 seconds, then transitions to HalfOpen
  • 1 success in HalfOpen closes the circuit (back to normal)
  • 1 failure in HalfOpen re-trips the circuit (back to Open)
  • Middleware runs FIRST (outermost layer) to block before resource consumption

File Pointers:

  • crates/stemedb-storage/src/circuit_breaker_store/ - Store trait and implementation
  • crates/stemedb-api/src/middleware/circuit_breaker.rs - Tower layer
  • crates/stemedb-api/src/handlers/circuit_breaker.rs - Admin endpoints
  • crates/stemedb-api/src/dto/circuit_breaker.rs - API types

Failure Types

| Type | Trigger | Description |
|------|---------|-------------|
| InvalidSignature | IngestError::InvalidSignature | Cryptographic signature verification failed |
| InputValidation | IngestError::InputValidation | Malformed JSON, missing fields, invalid values |
| PowError | AdmissionLayer | Invalid proof-of-work solution |
| QuotaExceeded | MeterLayer | Agent exceeded quota limit |
| ApplicationError | Handler errors | General application errors attributed to agent |

State Machine

                                      ┌───────────┐
                                      │           │
                                      ▼           │
    ┌─────────┐    5 failures    ┌─────────┐      │
    │ CLOSED  │ ───────────────► │  OPEN   │      │
    │ (normal)│                  │ (banned)│      │
    └─────────┘                  └────┬────┘      │
         ▲                            │           │
         │                     30 sec timeout     │
         │                            │           │
         │                            ▼           │
         │   1 success         ┌───────────┐      │ 1 failure
         └─────────────────────│ HALF_OPEN │──────┘
                               │ (testing) │
                               └───────────┘
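
The transitions above can be sketched as a small in-memory state machine. This is an illustration only, not the code in circuit_breaker_store: the `Breaker` type, its method names, and the use of plain `u64` seconds for time are assumptions made so the example stays self-contained. It also assumes a failure leaves the window once it is 60 or more seconds old, which the source does not specify.

```rust
use std::collections::VecDeque;

/// Circuit states, named as in the diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

// Values taken from the documented configuration.
const FAILURE_THRESHOLD: usize = 5;
const FAILURE_WINDOW_SECS: u64 = 60;
const OPEN_DURATION_SECS: u64 = 30;

/// Hypothetical per-agent breaker; timestamps are plain seconds.
#[derive(Debug)]
pub struct Breaker {
    state: CircuitState,
    failures: VecDeque<u64>, // failure timestamps recorded while Closed
    opened_at: u64,          // time of the most recent trip
}

impl Breaker {
    pub fn new() -> Self {
        Breaker { state: CircuitState::Closed, failures: VecDeque::new(), opened_at: 0 }
    }

    /// State at time `now`, applying the Open -> HalfOpen timeout lazily.
    pub fn state_at(&mut self, now: u64) -> CircuitState {
        if self.state == CircuitState::Open && now >= self.opened_at + OPEN_DURATION_SECS {
            self.state = CircuitState::HalfOpen;
        }
        self.state
    }

    pub fn record_failure(&mut self, now: u64) {
        match self.state_at(now) {
            CircuitState::Closed => {
                // Forget failures that have aged out of the window.
                while let Some(&t) = self.failures.front() {
                    if now.saturating_sub(t) >= FAILURE_WINDOW_SECS {
                        self.failures.pop_front();
                    } else {
                        break;
                    }
                }
                self.failures.push_back(now);
                if self.failures.len() >= FAILURE_THRESHOLD {
                    self.trip(now);
                }
            }
            // A single failure while testing re-trips the circuit.
            CircuitState::HalfOpen => self.trip(now),
            // Already banned; nothing to count.
            CircuitState::Open => {}
        }
    }

    pub fn record_success(&mut self, now: u64) {
        // Only a success in HalfOpen changes state.
        if self.state_at(now) == CircuitState::HalfOpen {
            self.state = CircuitState::Closed;
            self.failures.clear();
        }
    }

    fn trip(&mut self, now: u64) {
        self.state = CircuitState::Open;
        self.opened_at = now;
        self.failures.clear();
    }
}
```

Evaluating the timeout lazily inside `state_at` avoids a background timer; whether the real store does the same is not stated in the source.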

API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | /v1/admin/circuit-breaker/{agent_id} | Get circuit status for agent |
| POST | /v1/admin/circuit-breaker/reset | Manually reset a circuit |
| GET | /v1/admin/circuit-breakers/tripped | List all Open/HalfOpen circuits |

Response When Blocked

  • HTTP Status: 503 Service Unavailable
  • Headers:
    • X-Circuit-Breaker-State: open
    • X-Circuit-Breaker-Retry-After: 25 (seconds)
    • X-Circuit-Breaker-Failures: 5
    • Retry-After: 25 (standard HTTP header)
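
As a rough illustration of how the header values above could be derived, the sketch below computes the remaining ban time and assembles the status and headers as plain pairs. The function names and the `(u16, Vec<...>)` return shape are hypothetical. The real middleware is a Tower layer and presumably builds a proper HTTP response instead.

```rust
/// Seconds left before an Open circuit moves to HalfOpen (0 if already due).
pub fn retry_after_secs(opened_at: u64, open_duration_secs: u64, now: u64) -> u64 {
    (opened_at + open_duration_secs).saturating_sub(now)
}

/// Status code and headers for a blocked request, as (name, value) pairs.
pub fn blocked_response(retry_after: u64, failures: u32) -> (u16, Vec<(&'static str, String)>) {
    let headers = vec![
        ("X-Circuit-Breaker-State", "open".to_string()),
        ("X-Circuit-Breaker-Retry-After", retry_after.to_string()),
        ("X-Circuit-Breaker-Failures", failures.to_string()),
        // Standard HTTP header carrying the same delay in seconds.
        ("Retry-After", retry_after.to_string()),
    ];
    (503, headers)
}
```

With a circuit opened at t=100 and a 30-second open duration, a request at t=105 would see a retry delay of 25 seconds, matching the example values listed above.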

Configuration

CircuitBreakerConfig {
    failure_threshold: 5,           // Failures to trip
    open_duration_secs: 30,         // Time in Open state
    failure_window_secs: 60,        // Window for counting failures
    half_open_success_threshold: 1, // Successes to close
}
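
For context, a struct definition consistent with the literal above might look like the following sketch. The field names come from the source. The integer types, the `Default` impl, and the `validate` helper are assumptions added for illustration.

```rust
/// Sketch of the config type; field types are assumed, not confirmed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CircuitBreakerConfig {
    pub failure_threshold: u32,           // failures to trip
    pub open_duration_secs: u64,          // time in Open state
    pub failure_window_secs: u64,         // window for counting failures
    pub half_open_success_threshold: u32, // successes to close
}

impl Default for CircuitBreakerConfig {
    /// Defaults mirror the documented values.
    fn default() -> Self {
        CircuitBreakerConfig {
            failure_threshold: 5,
            open_duration_secs: 30,
            failure_window_secs: 60,
            half_open_success_threshold: 1,
        }
    }
}

impl CircuitBreakerConfig {
    /// Hypothetical sanity check: zero thresholds or durations make no sense.
    pub fn validate(&self) -> Result<(), String> {
        if self.failure_threshold == 0 || self.half_open_success_threshold == 0 {
            return Err("thresholds must be at least 1".to_string());
        }
        if self.open_duration_secs == 0 || self.failure_window_secs == 0 {
            return Err("durations must be at least 1 second".to_string());
        }
        Ok(())
    }
}
```

A stricter profile could then override a single field, for example `CircuitBreakerConfig { failure_threshold: 3, ..CircuitBreakerConfig::default() }`.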

Middleware Stack Order

Circuit breaker runs FIRST (outermost) to block banned agents before any resource consumption:

Router::new()
    .layer(MeterLayer)           // Inner: runs third (quota check)
    .layer(AdmissionLayer)       // Middle: runs second (PoW check)
    .layer(CircuitBreakerLayer)  // Outer: runs FIRST (ban check)
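
The ordering can look backwards at first, because the layer added last wraps everything added before it. The toy model below uses plain closures instead of real Tower layers to show why. `Handler`, `layer`, and `execution_order` are illustrative names, not part of the codebase.

```rust
/// A request handler that appends its name to a log when it runs.
type Handler = Box<dyn Fn(&mut Vec<&'static str>)>;

/// Wrap `inner` so that `name` runs (is logged) before the inner handler.
fn layer(name: &'static str, inner: Handler) -> Handler {
    Box::new(move |log: &mut Vec<&'static str>| {
        log.push(name);
        inner(log);
    })
}

/// Build the stack in the same order as the `.layer(...)` calls above
/// and return the order in which each piece runs for one request.
pub fn execution_order() -> Vec<&'static str> {
    let mut stack: Handler = Box::new(|log: &mut Vec<&'static str>| log.push("handler"));
    // Each step wraps what came before, so the last one ends up outermost.
    for name in ["MeterLayer", "AdmissionLayer", "CircuitBreakerLayer"] {
        stack = layer(name, stack);
    }
    let mut log = Vec::new();
    stack(&mut log);
    log
}
```

In this model a request passes through CircuitBreakerLayer, then AdmissionLayer, then MeterLayer, and only then reaches the handler, so a banned agent is rejected before the proof-of-work or quota checks run.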

What Does NOT Trip Circuit

Infrastructure faults do NOT count as agent misbehavior:

  • StorageError::Backend - Database issues
  • StorageError::Io - Disk issues
  • IngestError::Wal - WAL issues

These are system problems, not agent problems.
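
One way to express this distinction is a classifier that returns a failure type only for agent-attributable errors. The sketch below is hypothetical. The enum names follow the ones mentioned in this document, but the variant payloads, the `IngestError::Storage` wrapper, and the `classify` function are assumptions rather than the real definitions.

```rust
/// Failure types that count against an agent (subset relevant to ingest errors).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureType {
    InvalidSignature,
    InputValidation,
}

/// Assumed shape of the storage error type.
#[derive(Debug)]
pub enum StorageError {
    Backend(String), // database issues
    Io(String),      // disk issues
}

/// Assumed shape of the ingest error type.
#[derive(Debug)]
pub enum IngestError {
    InvalidSignature,
    InputValidation(String),
    Wal(String),           // WAL issues
    Storage(StorageError), // hypothetical wrapper for storage faults
}

/// Returns the failure to record against the agent, or `None` when the
/// error is an infrastructure fault that must not trip the circuit.
pub fn classify(err: &IngestError) -> Option<FailureType> {
    match err {
        IngestError::InvalidSignature => Some(FailureType::InvalidSignature),
        IngestError::InputValidation(_) => Some(FailureType::InputValidation),
        IngestError::Wal(_) | IngestError::Storage(_) => None,
    }
}
```

Returning `Option` makes the "do not count" case explicit at the call site, so infrastructure faults cannot be recorded against an agent by accident.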