# Circuit Breakers

**Last Updated:** 2026-02-03
**Confidence:** High

## Summary

Per-agent circuit breakers temporarily ban misbehaving agents to protect system integrity. They are part of "The Shield" (Phase 7D), the last line of defense after admission control, EigenTrust, and content defense.

**Key Facts:**
- State machine: Closed (normal) → Open (banned) → HalfOpen (testing) → Closed
- 5 failures within a 60-second window trip the circuit to Open
- The Open state lasts 30 seconds, then transitions to HalfOpen
- 1 success in HalfOpen closes the circuit (back to normal)
- 1 failure in HalfOpen re-trips the circuit to Open
- The middleware runs FIRST (outermost layer) so banned agents are blocked before they consume resources

**File Pointers:**
- `crates/stemedb-storage/src/circuit_breaker_store/` - Store trait and implementation
- `crates/stemedb-api/src/middleware/circuit_breaker.rs` - Tower layer
- `crates/stemedb-api/src/handlers/circuit_breaker.rs` - Admin endpoints
- `crates/stemedb-api/src/dto/circuit_breaker.rs` - API types

## Failure Types

| Type | Trigger | Description |
|------|---------|-------------|
| `InvalidSignature` | `IngestError::InvalidSignature` | Cryptographic signature verification failed |
| `InputValidation` | `IngestError::InputValidation` | Malformed JSON, missing fields, invalid values |
| `PowError` | `AdmissionLayer` | Invalid proof-of-work solution |
| `QuotaExceeded` | `MeterLayer` | Agent exceeded quota limit |
| `ApplicationError` | Handler errors | General application errors attributed to agent |

## State Machine

```
                                  ┌─────────────┐
                                  │             │
                                  ▼             │
┌─────────┐    5 failures    ┌─────────┐        │
│ CLOSED  │ ───────────────► │  OPEN   │        │
│ (normal)│                  │ (banned)│        │
└─────────┘                  └────┬────┘        │
     ▲                            │             │
     │                     30 sec timeout       │
     │                            │             │
     │                            ▼             │
     │      1 success       ┌───────────┐       │ 1 failure
     └──────────────────────│ HALF_OPEN │───────┘
                            │ (testing) │
                            └───────────┘
```

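The transitions in the diagram can be sketched as a small state machine. This is a minimal illustration, not the code in `circuit_breaker_store`: the `Circuit` type, its method names, the field types, and the use of plain second-resolution timestamps are all assumptions made to keep the example deterministic. The thresholds mirror the values documented on this page.

```rust
use std::collections::VecDeque;

/// Thresholds as documented on this page; the field types are assumed.
#[derive(Debug, Clone, Copy)]
pub struct CircuitBreakerConfig {
    pub failure_threshold: u32,
    pub open_duration_secs: u64,
    pub failure_window_secs: u64,
    pub half_open_success_threshold: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Open { since: u64 },
    HalfOpen { successes: u32 },
}

/// Hypothetical per-agent circuit. Timestamps are seconds on one clock.
pub struct Circuit {
    config: CircuitBreakerConfig,
    state: State,
    failures: VecDeque<u64>, // failure timestamps inside the window
}

impl Circuit {
    pub fn new(config: CircuitBreakerConfig) -> Self {
        Circuit { config, state: State::Closed, failures: VecDeque::new() }
    }

    /// State at `now`, applying the Open -> HalfOpen timeout lazily.
    pub fn state(&mut self, now: u64) -> State {
        if let State::Open { since } = self.state {
            if now.saturating_sub(since) >= self.config.open_duration_secs {
                self.state = State::HalfOpen { successes: 0 };
            }
        }
        self.state
    }

    pub fn record_failure(&mut self, now: u64) {
        match self.state(now) {
            State::Closed => {
                self.failures.push_back(now);
                // Drop failures that have aged out of the sliding window.
                while let Some(&oldest) = self.failures.front() {
                    if now.saturating_sub(oldest) >= self.config.failure_window_secs {
                        self.failures.pop_front();
                    } else {
                        break;
                    }
                }
                if self.failures.len() >= self.config.failure_threshold as usize {
                    self.trip(now);
                }
            }
            // A single failure while testing re-trips the circuit.
            State::HalfOpen { .. } => self.trip(now),
            // Already banned: nothing to update.
            State::Open { .. } => {}
        }
    }

    pub fn record_success(&mut self, now: u64) {
        if let State::HalfOpen { successes } = self.state(now) {
            let successes = successes + 1;
            self.state = if successes >= self.config.half_open_success_threshold {
                State::Closed
            } else {
                State::HalfOpen { successes }
            };
        }
    }

    fn trip(&mut self, now: u64) {
        self.failures.clear();
        self.state = State::Open { since: now };
    }
}
```

With the documented thresholds, five calls to `record_failure` within 60 seconds move the circuit to `Open`, a `state` query 30 seconds later reports `HalfOpen`, and a single `record_success` returns it to `Closed`. Evaluating the timeout lazily on read is a design choice of this sketch; the real store may use timers or explicit expiry instead.
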
## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/v1/admin/circuit-breaker/{agent_id}` | Get circuit status for agent |
| `POST` | `/v1/admin/circuit-breaker/reset` | Manually reset a circuit |
| `GET` | `/v1/admin/circuit-breakers/tripped` | List all Open/HalfOpen circuits |

## Response When Blocked

- **HTTP Status:** 503 Service Unavailable
- **Headers:**
  - `X-Circuit-Breaker-State: open`
  - `X-Circuit-Breaker-Retry-After: 25` (seconds)
  - `X-Circuit-Breaker-Failures: 5`
  - `Retry-After: 25` (standard HTTP header)

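The example values above are consistent with a circuit that opened 5 seconds ago under a 30-second open duration. The sketch below shows one way such headers could be derived; it assumes the retry value is the time remaining in the Open state, which this page does not state explicitly. The function name `blocked_headers` and its parameters are hypothetical.

```rust
/// Builds the header pairs for a blocked request.
/// `opened_at` and `now` are seconds on the same clock (an assumption).
pub fn blocked_headers(
    opened_at: u64,
    now: u64,
    open_duration_secs: u64,
    failures: u32,
) -> Vec<(&'static str, String)> {
    // Seconds left before the circuit would move to HalfOpen, floored at 0.
    let elapsed = now.saturating_sub(opened_at);
    let retry_after = open_duration_secs.saturating_sub(elapsed);
    vec![
        ("X-Circuit-Breaker-State", "open".to_string()),
        ("X-Circuit-Breaker-Retry-After", retry_after.to_string()),
        ("X-Circuit-Breaker-Failures", failures.to_string()),
        ("Retry-After", retry_after.to_string()),
    ]
}
```

Calling `blocked_headers(100, 105, 30, 5)` yields a `Retry-After` of `25`, matching the example values listed above.
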
## Configuration

```rust
CircuitBreakerConfig {
    failure_threshold: 5,            // Failures to trip
    open_duration_secs: 30,          // Time in Open state
    failure_window_secs: 60,         // Window for counting failures
    half_open_success_threshold: 1,  // Successes to close
}
```

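If these values act as defaults, a stricter profile can override only the fields that differ. The sketch below assumes integer field types and a `Default` implementation, neither of which is confirmed by this page; `strict_config` is a hypothetical helper.

```rust
/// Stand-in for the real config struct; field types are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CircuitBreakerConfig {
    pub failure_threshold: u32,
    pub open_duration_secs: u64,
    pub failure_window_secs: u64,
    pub half_open_success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    /// Uses the values documented on this page.
    fn default() -> Self {
        CircuitBreakerConfig {
            failure_threshold: 5,
            open_duration_secs: 30,
            failure_window_secs: 60,
            half_open_success_threshold: 1,
        }
    }
}

/// Hypothetical stricter profile: trips sooner and bans for longer.
pub fn strict_config() -> CircuitBreakerConfig {
    CircuitBreakerConfig {
        failure_threshold: 3,
        open_duration_secs: 120,
        // Struct update syntax keeps the remaining documented values.
        ..CircuitBreakerConfig::default()
    }
}
```
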
## Middleware Stack Order

The circuit breaker runs FIRST (outermost) to block banned agents before any resource consumption:

```rust
Router::new()
    .layer(MeterLayer)           // Inner: runs third (quota check)
    .layer(AdmissionLayer)       // Middle: runs second (PoW check)
    .layer(CircuitBreakerLayer)  // Outer: runs FIRST (ban check)
```

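The ordering follows from how layers wrap: each `.layer(...)` call wraps everything added before it, so the last call becomes the outermost wrapper and sees the request first. The toy model below illustrates this with plain closures; `Stack` is a hypothetical stand-in and does not use the real `Router` or tower types.

```rust
/// A handler that records which stage ran, in order.
type Handler = Box<dyn Fn(&mut Vec<&'static str>)>;

/// Toy stand-in for a router; not the real `Router` or tower API.
pub struct Stack {
    handler: Handler,
}

impl Stack {
    pub fn new() -> Self {
        Stack {
            handler: Box::new(|trace: &mut Vec<&'static str>| trace.push("handler")),
        }
    }

    /// Wraps everything added so far, so the newest layer is outermost.
    pub fn layer(self, name: &'static str) -> Self {
        let inner = self.handler;
        Stack {
            handler: Box::new(move |trace: &mut Vec<&'static str>| {
                trace.push(name); // this layer sees the request first
                inner(trace); // then hands it to whatever it wraps
            }),
        }
    }

    /// Runs one request through the stack and returns the visit order.
    pub fn call(&self) -> Vec<&'static str> {
        let mut trace = Vec::new();
        (self.handler)(&mut trace);
        trace
    }
}
```

Building the stack in the same order as the snippet above (`MeterLayer`, then `AdmissionLayer`, then `CircuitBreakerLayer`) and invoking `call` visits `CircuitBreakerLayer` first and the handler last.
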
## What Does NOT Trip Circuit

Infrastructure faults do NOT count as agent misbehavior:
- `StorageError::Backend` - Database issues
- `StorageError::Io` - Disk issues
- `IngestError::Wal` - WAL issues

These are system problems, not agent problems.

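The split between agent failures and infrastructure faults can be expressed as a classifier that returns `None` for anything that should not count against the agent. This is a sketch under assumptions: the enums below are simplified stand-ins that only reuse the variant names from this page (the real types likely carry payloads), and the `IngestError::Storage` wrapper and the `classify` function are hypothetical.

```rust
/// Failure categories from the Failure Types table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureType {
    InvalidSignature,
    InputValidation,
    PowError,
    QuotaExceeded,
    ApplicationError,
}

/// Simplified stand-in; only the variants named on this page.
#[derive(Debug)]
pub enum StorageError {
    Backend,
    Io,
}

/// Simplified stand-in; `Storage` is a hypothetical wrapper variant.
#[derive(Debug)]
pub enum IngestError {
    InvalidSignature,
    InputValidation,
    Wal,
    Storage(StorageError),
}

/// Returns the failure to record against the agent, or `None` when the
/// error is an infrastructure fault that must not trip the circuit.
pub fn classify(err: &IngestError) -> Option<FailureType> {
    match err {
        IngestError::InvalidSignature => Some(FailureType::InvalidSignature),
        IngestError::InputValidation => Some(FailureType::InputValidation),
        // System problems, not agent problems.
        IngestError::Wal => None,
        IngestError::Storage(StorageError::Backend | StorageError::Io) => None,
    }
}
```

Matching exhaustively, without a wildcard arm, means that adding a new error variant forces an explicit decision about whether it counts as agent misbehavior.
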
## Related Topics

- [Admission Control](./admission-control.md) - PoW-based spam protection
- [Content Defense](./content-defense.md) - Similarity and quality checks
- [TrustRank](./trust-rank.md) - Agent reputation system