stemedb/ai-lookup/features/circuit-breakers.md

# Circuit Breakers
**Last Updated:** 2026-02-03

**Confidence:** High
## Summary
Per-agent circuit breakers temporarily ban misbehaving agents to protect system integrity. They are part of "The Shield" (Phase 7D), the last line of defense after admission control, EigenTrust, and content defense.
**Key Facts:**
- State machine: Closed (normal) → Open (banned) → HalfOpen (testing) → Closed
- 5 failures within 60-second window trips circuit to Open
- Open state lasts 30 seconds, then transitions to HalfOpen
- 1 success in HalfOpen closes circuit (back to normal)
- 1 failure in HalfOpen re-trips circuit
- Middleware runs FIRST (outermost layer), so banned agents are blocked before they consume any resources
**File Pointers:**
- `crates/stemedb-storage/src/circuit_breaker_store/` - Store trait and implementation
- `crates/stemedb-api/src/middleware/circuit_breaker.rs` - Tower layer
- `crates/stemedb-api/src/handlers/circuit_breaker.rs` - Admin endpoints
- `crates/stemedb-api/src/dto/circuit_breaker.rs` - API types
## Failure Types
| Type | Trigger | Description |
|------|---------|-------------|
| `InvalidSignature` | `IngestError::InvalidSignature` | Cryptographic signature verification failed |
| `InputValidation` | `IngestError::InputValidation` | Malformed JSON, missing fields, invalid values |
| `PowError` | `AdmissionLayer` | Invalid proof-of-work solution |
| `QuotaExceeded` | `MeterLayer` | Agent exceeded quota limit |
| `ApplicationError` | Handler errors | General application errors attributed to agent |
## State Machine
```
                                  ┌─────────────────┐
                                  │                 │
                                  ▼                 │
┌─────────┐    5 failures    ┌─────────┐            │
│ CLOSED  │ ───────────────► │  OPEN   │            │
│ (normal)│                  │ (banned)│            │
└─────────┘                  └────┬────┘            │
     ▲                            │                 │
     │                     30 sec timeout           │
     │                            │                 │
     │                            ▼                 │
     │      1 success       ┌───────────┐           │ 1 failure
     └──────────────────────│ HALF_OPEN │───────────┘
                            │ (testing) │
                            └───────────┘
```
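
The transitions in the diagram can be sketched as a pure function over states and events. This is a minimal illustration, not the code in `circuit_breaker_store`: the `Event` enum and `next_state` function are hypothetical names introduced here, and the real store presumably tracks timestamps and counters alongside the state.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical event type, introduced only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    /// The failure threshold was reached inside the counting window.
    ThresholdReached,
    /// The Open duration has elapsed.
    OpenTimeoutElapsed,
    /// A request succeeded.
    Success,
    /// A request failed.
    Failure,
}

/// Returns the state that follows `state` when `event` occurs.
pub fn next_state(state: CircuitState, event: Event) -> CircuitState {
    use CircuitState::*;
    use Event::*;
    match (state, event) {
        (Closed, ThresholdReached) => Open,
        (Open, OpenTimeoutElapsed) => HalfOpen,
        (HalfOpen, Success) => Closed,
        (HalfOpen, Failure) => Open,
        // Every other combination leaves the state unchanged.
        (unchanged, _) => unchanged,
    }
}
```

Keeping the transition table free of clocks and storage makes it easy to check each arrow of the diagram in isolation.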
## API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/v1/admin/circuit-breaker/{agent_id}` | Get circuit status for agent |
| `POST` | `/v1/admin/circuit-breaker/reset` | Manually reset a circuit |
| `GET` | `/v1/admin/circuit-breakers/tripped` | List all Open/HalfOpen circuits |
## Response When Blocked
- **HTTP Status:** 503 Service Unavailable
- **Headers:**
  - `X-Circuit-Breaker-State: open`
  - `X-Circuit-Breaker-Retry-After: 25` (seconds)
  - `X-Circuit-Breaker-Failures: 5`
  - `Retry-After: 25` (standard HTTP header)
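
As a rough sketch of how these headers might be assembled, the helper below builds the name/value pairs from the time remaining in the Open state. The function name `blocked_headers`, the `BLOCKED_STATUS` constant, and the choice to round the remaining time up to whole seconds are assumptions made for illustration; the actual middleware may compute the values differently.

```rust
use std::time::Duration;

/// Status code returned for blocked requests (503 Service Unavailable).
pub const BLOCKED_STATUS: u16 = 503;

/// Hypothetical helper: header pairs for a request rejected by an open circuit.
pub fn blocked_headers(remaining: Duration, failures: u32) -> Vec<(&'static str, String)> {
    // Assumption: round up, so a client that waits exactly this long
    // does not retry while the circuit is still Open.
    let secs = remaining.as_secs() + u64::from(remaining.subsec_nanos() > 0);
    vec![
        ("X-Circuit-Breaker-State", "open".to_string()),
        ("X-Circuit-Breaker-Retry-After", secs.to_string()),
        ("X-Circuit-Breaker-Failures", failures.to_string()),
        ("Retry-After", secs.to_string()),
    ]
}
```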
## Configuration
```rust
CircuitBreakerConfig {
    failure_threshold: 5,           // Failures to trip
    open_duration_secs: 30,         // Time in Open state
    failure_window_secs: 60,        // Window for counting failures
    half_open_success_threshold: 1, // Successes to close
}
```
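
To show how `failure_threshold` and `failure_window_secs` interact, here is a minimal sliding-window counter. The field types, the `Default` impl, and the `FailureWindow` helper are assumptions for this sketch, and treating a failure exactly `failure_window_secs` old as still inside the window is an arbitrary choice; the real store may draw that boundary differently.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Mirrors the fields shown above; the field types are assumptions.
#[derive(Debug, Clone)]
pub struct CircuitBreakerConfig {
    pub failure_threshold: u32,
    pub open_duration_secs: u64,
    pub failure_window_secs: u64,
    pub half_open_success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            open_duration_secs: 30,
            failure_window_secs: 60,
            half_open_success_threshold: 1,
        }
    }
}

/// Hypothetical helper: timestamps of one agent's recent failures.
#[derive(Debug, Default)]
pub struct FailureWindow {
    timestamps: VecDeque<Instant>,
}

impl FailureWindow {
    /// Records a failure at `now` and reports whether the circuit should trip.
    pub fn record(&mut self, now: Instant, cfg: &CircuitBreakerConfig) -> bool {
        let window = Duration::from_secs(cfg.failure_window_secs);
        self.timestamps.push_back(now);
        // Evict failures that have fallen out of the counting window.
        while let Some(&oldest) = self.timestamps.front() {
            if now.duration_since(oldest) > window {
                self.timestamps.pop_front();
            } else {
                break;
            }
        }
        self.timestamps.len() >= cfg.failure_threshold as usize
    }
}
```

With the defaults, five failures spaced ten seconds apart trip the circuit, while one failure every twenty seconds never does, because at most four of them fit inside a 60-second window.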
## Middleware Stack Order
Circuit breaker runs FIRST (outermost) to block banned agents before any resource consumption:
```rust
Router::new()
    .layer(MeterLayer)          // Inner: runs third (quota check)
    .layer(AdmissionLayer)      // Middle: runs second (PoW check)
    .layer(CircuitBreakerLayer) // Outer: runs FIRST (ban check)
```
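
The reason the last `.layer(...)` call runs first is that each call wraps everything added before it. The standard-library-only sketch below imitates that wrapping with boxed closures; `Handler`, `layer`, and `build_stack` are hypothetical names used for illustration and are not the Tower or Axum API.

```rust
/// A handler appends the name of each step it runs to a log.
type Handler = Box<dyn Fn(&mut Vec<&'static str>)>;

/// Wraps `inner` so that `name` is recorded before the inner handler runs.
fn layer(name: &'static str, inner: Handler) -> Handler {
    Box::new(move |log: &mut Vec<&'static str>| {
        log.push(name);
        inner(log);
    })
}

/// Builds the stack in the same order as the `.layer(...)` calls above.
pub fn build_stack() -> Handler {
    let handler: Handler = Box::new(|log: &mut Vec<&'static str>| log.push("handler"));
    let stack = layer("meter", handler); // added first: innermost
    let stack = layer("admission", stack); // wraps the meter
    layer("circuit_breaker", stack) // added last: outermost, runs first
}
```

Calling the result of `build_stack()` with an empty log records the steps in the order circuit breaker, admission, meter, handler, which matches the comments in the router snippet.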
## What Does NOT Trip Circuit
Infrastructure faults do NOT count as agent misbehavior:
- `StorageError::Backend` - Database issues
- `StorageError::Io` - Disk issues
- `IngestError::Wal` - WAL issues

These are system problems, not agent problems.
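
One way to picture the split between agent failures and infrastructure faults is a classifier that returns `None` for anything the agent is not responsible for. The enums below are simplified stand-ins written for this sketch: the real `IngestError` and `StorageError` carry more variants and payloads, and the `IngestError::Storage` wrapper and the `classify` function are assumptions.

```rust
/// Failure types recorded against an agent.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureType {
    InvalidSignature,
    InputValidation,
    PowError,
    QuotaExceeded,
    ApplicationError,
}

/// Simplified stand-in for the real storage error type.
#[derive(Debug)]
pub enum StorageError {
    Backend,
    Io,
}

/// Simplified stand-in for the real ingest error type.
#[derive(Debug)]
pub enum IngestError {
    InvalidSignature,
    InputValidation,
    Wal,
    /// Assumption: storage errors surface wrapped in an ingest error.
    Storage(StorageError),
}

/// Returns the failure to record against the agent, or `None` when the
/// error is an infrastructure fault that must not trip the circuit.
pub fn classify(err: &IngestError) -> Option<FailureType> {
    match err {
        IngestError::InvalidSignature => Some(FailureType::InvalidSignature),
        IngestError::InputValidation => Some(FailureType::InputValidation),
        // System problems: never counted against the agent.
        IngestError::Wal
        | IngestError::Storage(StorageError::Backend)
        | IngestError::Storage(StorageError::Io) => None,
    }
}
```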
## Related Topics
- [Admission Control](./admission-control.md) - PoW-based spam protection
- [Content Defense](./content-defense.md) - Similarity and quality checks
- [TrustRank](./trust-rank.md) - Agent reputation system