# Circuit Breakers

**Last Updated:** 2026-02-03
**Confidence:** High

## Summary

Per-agent circuit breakers temporarily ban misbehaving agents to protect system integrity. Part of "The Shield" (Phase 7D) - the last line of defense after admission control, EigenTrust, and content defense.

**Key Facts:**
- State machine: Closed (normal) → Open (banned) → HalfOpen (testing) → Closed
- 5 failures within 60-second window trips circuit to Open
- Open state lasts 30 seconds, then transitions to HalfOpen
- 1 success in HalfOpen closes circuit (back to normal)
- 1 failure in HalfOpen re-trips circuit
- Middleware runs FIRST (outermost layer) to block before resource consumption

**File Pointers:**
- `crates/stemedb-storage/src/circuit_breaker_store/` - Store trait and implementation
- `crates/stemedb-api/src/middleware/circuit_breaker.rs` - Tower layer
- `crates/stemedb-api/src/handlers/circuit_breaker.rs` - Admin endpoints
- `crates/stemedb-api/src/dto/circuit_breaker.rs` - API types

## Failure Types

| Type | Trigger | Description |
|------|---------|-------------|
| `InvalidSignature` | `IngestError::InvalidSignature` | Cryptographic signature verification failed |
| `InputValidation` | `IngestError::InputValidation` | Malformed JSON, missing fields, invalid values |
| `PowError` | `AdmissionLayer` | Invalid proof-of-work solution |
| `QuotaExceeded` | `MeterLayer` | Agent exceeded quota limit |
| `ApplicationError` | Handler errors | General application errors attributed to agent |

## State Machine

```
                                  ┌─────────────────────────┐
                                  │                         │
                                  ▼                         │
┌──────────┐   5 failures    ┌──────────┐                   │
│  CLOSED  │ ──────────────► │   OPEN   │                   │
│ (normal) │                 │ (banned) │                   │
└──────────┘                 └────┬─────┘                   │
     ▲                            │                         │
     │                     30 sec timeout                   │
     │                            │                         │
     │                            ▼                         │
     │      1 success       ┌───────────┐     1 failure     │
     └───────────────────── │ HALF_OPEN │ ──────────────────┘
                            │ (testing) │
                            └───────────┘
```

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/v1/admin/circuit-breaker/{agent_id}` | Get circuit status for agent |
| `POST` | `/v1/admin/circuit-breaker/reset` | Manually reset a circuit |
| `GET` | `/v1/admin/circuit-breakers/tripped` | List all Open/HalfOpen circuits |

## Response When Blocked

- **HTTP Status:** 503 Service Unavailable
- **Headers:**
  - `X-Circuit-Breaker-State: open`
  - `X-Circuit-Breaker-Retry-After: 25` (seconds)
  - `X-Circuit-Breaker-Failures: 5`
  - `Retry-After: 25` (standard HTTP header)

## Configuration

```rust
CircuitBreakerConfig {
    failure_threshold: 5,           // Failures to trip
    open_duration_secs: 30,         // Time in Open state
    failure_window_secs: 60,        // Window for counting failures
    half_open_success_threshold: 1, // Successes to close
}
```

## Middleware Stack Order

Circuit breaker runs FIRST (outermost) to block banned agents before any resource consumption:

```rust
Router::new()
    .layer(MeterLayer)          // Inner: runs third (quota check)
    .layer(AdmissionLayer)      // Middle: runs second (PoW check)
    .layer(CircuitBreakerLayer) // Outer: runs FIRST (ban check)
```

## What Does NOT Trip Circuit

Infrastructure faults do NOT count as agent misbehavior:
- `StorageError::Backend` - Database issues
- `StorageError::Io` - Disk issues
- `IngestError::Wal` - WAL issues

These are system problems, not agent problems.

## Related Topics

- [Admission Control](./admission-control.md) - PoW-based spam protection
- [Content Defense](./content-defense.md) - Similarity and quality checks
- [TrustRank](./trust-rank.md) - Agent reputation system
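
## Example: State Machine Sketch

The transitions listed under Key Facts can be sketched as a small, in-memory state machine. This is a minimal illustration, not the code in `circuit_breaker_store`: the `Breaker` type, its method names, and the use of plain `u64` second timestamps are assumptions made for the example. Only the four config fields and their default values come from the Configuration section. The sketch also assumes that a failure exactly `failure_window_secs` old has already left the window, and that tripping clears the failure history.

```rust
use std::collections::VecDeque;

/// Field names and defaults mirror the Configuration section.
#[derive(Debug, Clone, Copy)]
pub struct CircuitBreakerConfig {
    pub failure_threshold: usize,
    pub open_duration_secs: u64,
    pub failure_window_secs: u64,
    pub half_open_success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            open_duration_secs: 30,
            failure_window_secs: 60,
            half_open_success_threshold: 1,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open { since: u64 },
    HalfOpen { successes: u32 },
}

/// Hypothetical per-agent breaker; timestamps are seconds supplied by the caller.
pub struct Breaker {
    config: CircuitBreakerConfig,
    state: CircuitState,
    failures: VecDeque<u64>, // timestamps of recent failures while Closed
}

impl Breaker {
    pub fn new(config: CircuitBreakerConfig) -> Self {
        Self { config, state: CircuitState::Closed, failures: VecDeque::new() }
    }

    /// Current state at `now`, applying the Open -> HalfOpen timeout lazily.
    pub fn state(&mut self, now: u64) -> CircuitState {
        if let CircuitState::Open { since } = self.state {
            if now.saturating_sub(since) >= self.config.open_duration_secs {
                self.state = CircuitState::HalfOpen { successes: 0 };
            }
        }
        self.state
    }

    /// Requests pass in Closed and HalfOpen, and are blocked in Open.
    pub fn allows_request(&mut self, now: u64) -> bool {
        !matches!(self.state(now), CircuitState::Open { .. })
    }

    /// Seconds until the Open state expires, or None if not Open.
    pub fn retry_after_secs(&mut self, now: u64) -> Option<u64> {
        match self.state(now) {
            CircuitState::Open { since } => {
                Some((since + self.config.open_duration_secs).saturating_sub(now))
            }
            _ => None,
        }
    }

    pub fn record_failure(&mut self, now: u64) {
        match self.state(now) {
            CircuitState::Closed => {
                self.failures.push_back(now);
                // Drop failures that have aged out of the counting window.
                while let Some(&t) = self.failures.front() {
                    if now.saturating_sub(t) >= self.config.failure_window_secs {
                        self.failures.pop_front();
                    } else {
                        break;
                    }
                }
                if self.failures.len() >= self.config.failure_threshold {
                    self.trip(now);
                }
            }
            // Any failure while testing re-trips the circuit.
            CircuitState::HalfOpen { .. } => self.trip(now),
            CircuitState::Open { .. } => {}
        }
    }

    pub fn record_success(&mut self, now: u64) {
        if let CircuitState::HalfOpen { successes } = self.state(now) {
            let successes = successes + 1;
            if successes >= self.config.half_open_success_threshold {
                self.state = CircuitState::Closed;
                self.failures.clear();
            } else {
                self.state = CircuitState::HalfOpen { successes };
            }
        }
    }

    fn trip(&mut self, now: u64) {
        self.state = CircuitState::Open { since: now };
        self.failures.clear();
    }
}
```

With the default config, five calls to `record_failure` at seconds 0 through 4 trip the breaker at second 4, and `retry_after_secs(9)` then yields `Some(25)`, matching the `Retry-After: 25` example in the blocked response. Passing timestamps in explicitly, rather than reading a clock inside the breaker, keeps the transitions deterministic and easy to test.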