stemedb/ai-lookup/features/content-defense.md
jordan a734be3a0d feat: Phase 7 Content Defense + code structure refactoring
Content Defense (Phase 7):
- Add SimilarityIndex with MinHash/LSH for near-duplicate detection
- Add QuarantineStore for flagged assertions awaiting admin review
- Add CircuitBreakerStore for per-agent circuit breaker state
- Add ContentDefenseLayer for ingestion pipeline integration
- Add API endpoints for quarantine and circuit breaker management
- Add research module with gap detection and documentation fetching

Code Structure Improvements:
- Extract research CLI commands to research_commands.rs
- Extract API routers to routers.rs module
- Extract key_codec extraction functions to separate module
- Extract test modules to separate files across multiple crates
- All files now under 500 line limit per pre-commit hook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:44:05 -07:00


# Content Defense (The Shield)
Phase 7C introduces content defense mechanisms to detect spam, near-duplicates, and suspicious assertions before they enter the knowledge graph.
## Overview
Content Defense provides three layers of protection:
1. **MinHash + LSH**: Near-duplicate detection with O(1) average-case lookup
2. **Quality Scoring**: Heuristic-based spam detection (entropy, length, structure)
3. **Quarantine Store**: Suspicious assertions held for admin review
Assertions that fail these checks are quarantined rather than indexed, keeping the knowledge graph clean while preserving the data for manual review.
## Key Concepts
### Quarantine Reasons
| Reason | Description | Trigger |
|--------|-------------|---------|
| `LowQuality` | Content failed quality checks | score < 0.4 |
| `Duplicate` | Near-duplicate detected | Jaccard >= 0.9 |
| `UntrustedHighConfidence` | Suspicious pattern | trust < 0.5 AND confidence > 0.8 |
| `PatternMatch` | Known spam pattern | Pattern match |
### Quality Scoring
The quality score is computed from multiple signals:
| Component | Weight | Description |
|-----------|--------|-------------|
| Entropy | 40% | Shannon entropy in bits/char (low entropy indicates repetitive content) |
| Length | 20% | Subject/predicate length (min 3 chars each) |
| Structure | 20% | Bonus for structured data (JSON, URLs, numbers) |
| Trust Pattern | 20% | Penalty for untrusted + high confidence |
Threshold: `score < 0.4` triggers quarantine.
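The weighting above can be sketched as a plain weighted sum. This is a minimal illustration, not the actual `ContentQualityScorer`: the `Components` struct, the function names, and the assumption that every component is already normalized to `[0.0, 1.0]` (with `trust_pattern = 1.0` meaning "no penalty") are hypothetical. How the real scorer normalizes entropy is not specified here.

```rust
use std::collections::HashMap;

/// Shannon entropy of `text` in bits per character.
fn shannon_entropy(text: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in text.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical per-component scores, each assumed normalized to [0.0, 1.0].
struct Components {
    entropy: f64,
    length: f64,
    structure: f64,
    trust_pattern: f64, // assumption: 1.0 = no penalty, 0.0 = full penalty
}

/// Weighted sum using the 40/20/20/20 weights from the table.
fn composite_score(c: &Components) -> f64 {
    let s = 0.4 * c.entropy + 0.2 * c.length + 0.2 * c.structure + 0.2 * c.trust_pattern;
    s.clamp(0.0, 1.0)
}

/// Applies the documented threshold: scores below 0.4 are quarantined.
fn should_quarantine(score: f64) -> bool {
    score < 0.4
}
```

For instance, a repetitive string such as `"aaaa"` has zero entropy, so even with full marks on the other three components its score in this sketch tops out at 0.6.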
### Similarity Detection
MinHash + LSH parameters:
- **MinHash k=128**: 128 hash functions per signature
- **LSH 16 bands x 8 rows**: about 99.99% expected recall at 0.9 Jaccard (candidate probability `1 - (1 - s^8)^16`)
- **Bloom filter**: Fast "definitely not duplicate" pre-check
- **Shingle size**: 3 characters (language-agnostic)
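To make these parameters concrete, the sketch below builds 3-character shingles, computes exact Jaccard similarity, and evaluates the standard LSH banding formula `1 - (1 - s^r)^b`. The function names are hypothetical, and treating inputs shorter than the shingle width as a single shingle is an assumption, not documented behavior.

```rust
use std::collections::HashSet;

/// Overlapping character shingles of width `k` (the doc uses k = 3).
/// Assumption: inputs shorter than `k` become one shingle.
fn shingles(text: &str, k: usize) -> HashSet<String> {
    assert!(k > 0, "shingle width must be positive");
    let chars: Vec<char> = text.chars().collect();
    if chars.len() < k {
        let mut set = HashSet::new();
        if !chars.is_empty() {
            set.insert(text.to_string());
        }
        return set;
    }
    chars
        .windows(k)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Exact Jaccard similarity |A ∩ B| / |A ∪ B|.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    inter / union
}

/// Probability that two items with Jaccard similarity `s` share at
/// least one LSH bucket, given `bands` bands of `rows` rows each.
fn lsh_candidate_probability(s: f64, bands: i32, rows: i32) -> f64 {
    1.0 - (1.0 - s.powi(rows)).powi(bands)
}
```

With 16 bands of 8 rows, the formula gives roughly 0.9999 at `s = 0.9` and about 0.06 at `s = 0.5`, which is the steep S-curve that lets LSH surface near-duplicates while skipping most dissimilar pairs.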
## HTTP API
### GET /v1/admin/quarantine
List pending quarantined assertions.
**Query Parameters:**
- `limit` (optional): Maximum events to return (default: 100)
- `include_reviewed` (optional): Include reviewed events (default: false)
**Response:**
```json
{
"quarantined": [
{
"hash": "abc123...",
"reason": "duplicate",
"reason_description": "Near-duplicate of existing assertion detected.",
"quality": {
"score": 0.35,
"entropy": 2.1,
"structured": false,
"duplicate": true
},
"timestamp": 1706918400000000000,
"reviewed": false,
"similar_to": "def456..."
}
],
"count": 1,
"pending_count": 1
}
```
### GET /v1/admin/quarantine/{hash}
Get a single quarantine event with assertion bytes.
**Response:**
```json
{
"event": {
"hash": "abc123...",
"assertion_bytes_hex": "...",
"assertion_bytes_base64": "...",
"reason": "low_quality",
"reason_description": "Content failed quality checks.",
"quality": { ... },
"timestamp": 1706918400000000000,
"reviewed": false
}
}
```
### POST /v1/admin/quarantine/{hash}/approve
Approve a quarantined assertion for indexing.
**Response:**
```json
{
"hash": "abc123...",
"message": "Assertion approved and ready for indexing",
"assertion_bytes_hex": "..."
}
```
### POST /v1/admin/quarantine/{hash}/reject
Reject a quarantined assertion permanently.
**Response:**
```json
{
"hash": "abc123...",
"message": "Assertion rejected"
}
```
## Implementation Details
### Core Types
**ContentQuality** (`stemedb-core/src/types/content_defense.rs`):
- `score`: Overall quality [0.0, 1.0]
- `entropy`: Shannon entropy (bits/char)
- `structured`: Has structured data
- `duplicate`: Is near-duplicate
**QuarantineReason** (`stemedb-core/src/types/content_defense.rs`):
- Enum: LowQuality, Duplicate, UntrustedHighConfidence, PatternMatch
- Method: `description()` returns human-readable string
**QuarantineEvent** (`stemedb-core/src/types/content_defense.rs`):
- `hash`: BLAKE3 hash of assertion
- `assertion_bytes`: Original serialized assertion
- `reason`: Why quarantined
- `quality`: Quality metrics at quarantine time
- `reviewed`/`approved`: Admin review status
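As a rough picture of how these types might fit together, here is a sketch in Rust. Field types, the `timestamp` and `similar_to` fields (taken from the HTTP responses), and the description strings for `UntrustedHighConfidence` and `PatternMatch` are assumptions. Only the `LowQuality` and `Duplicate` descriptions appear in the API examples.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuarantineReason {
    LowQuality,
    Duplicate,
    UntrustedHighConfidence,
    PatternMatch,
}

impl QuarantineReason {
    /// Human-readable explanation of the reason.
    fn description(&self) -> &'static str {
        match self {
            QuarantineReason::LowQuality => "Content failed quality checks.",
            QuarantineReason::Duplicate => "Near-duplicate of existing assertion detected.",
            // The two strings below are placeholders, not the real wording.
            QuarantineReason::UntrustedHighConfidence => {
                "Untrusted agent submitted a high-confidence assertion."
            }
            QuarantineReason::PatternMatch => "Content matched a known spam pattern.",
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct ContentQuality {
    score: f64,       // overall quality in [0.0, 1.0]
    entropy: f64,     // Shannon entropy, bits/char
    structured: bool, // has structured data
    duplicate: bool,  // is a near-duplicate
}

#[derive(Debug, Clone, PartialEq)]
struct QuarantineEvent {
    hash: [u8; 32],               // BLAKE3 produces 32 bytes by default
    assertion_bytes: Vec<u8>,     // original serialized assertion
    reason: QuarantineReason,
    quality: ContentQuality,
    timestamp: u64,               // assumed nanoseconds since the Unix epoch
    reviewed: bool,
    approved: bool,
    similar_to: Option<[u8; 32]>, // assumed: set only for duplicates
}
```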
### Storage
**QuarantineStore** (`stemedb-storage/src/quarantine_store.rs`):
- Primary key: `QUAR:{timestamp}:{hash_hex}` (time-ordered scan)
- Index key: `QUAR_IDX:{hash_hex}` → timestamp (O(1) hash lookup)
- Methods: `write_quarantine()`, `get_quarantine()`, `list_pending()`, `approve()`, `reject()`
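The key layout can be illustrated with two small helpers. This is a sketch only: the function names are hypothetical, and zero-padding the timestamp to 20 digits is an assumption made here so that lexicographic key order matches numeric time order. The actual encoding used by `QuarantineStore` may differ (for example, a binary big-endian timestamp).

```rust
/// Lowercase hex encoding of a byte slice.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Primary key `QUAR:{timestamp}:{hash_hex}`.
/// Assumption: the timestamp is zero-padded to 20 digits (the width of
/// u64::MAX) so a prefix scan returns events in time order.
fn quarantine_key(timestamp_nanos: u64, hash: &[u8]) -> String {
    format!("QUAR:{:020}:{}", timestamp_nanos, to_hex(hash))
}

/// Index key `QUAR_IDX:{hash_hex}`, whose value would hold the timestamp
/// needed to reconstruct the primary key.
fn quarantine_index_key(hash: &[u8]) -> String {
    format!("QUAR_IDX:{}", to_hex(hash))
}
```

Looking up an event by hash then takes two reads: the index key yields the timestamp, and the timestamp plus hash yields the primary key.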
**SimilarityIndex** (`stemedb-storage/src/similarity_index/`):
- MinHash signature: `MH:{content_hash_hex}` → 1KB signature
- LSH bucket: `LSH:{band:02}:{bucket_hash_hex}` → member list
- Bloom filter: In-memory, rebuilt from `MH:` scan on startup
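The following sketch shows how a 128-value MinHash signature and the 16 band keys could be derived. It is illustrative only: `seeded_hash`, `minhash_signature`, and `lsh_bucket_keys` are hypothetical names, and `DefaultHasher` stands in for whatever hash the real index uses. Its algorithm is not guaranteed stable across Rust releases, so a persisted index would need a fixed hash function instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const NUM_HASHES: usize = 128;
const BANDS: usize = 16;
const ROWS: usize = 8;

/// Hash `item` under a given seed, simulating a family of hash functions.
fn seeded_hash<T: Hash + ?Sized>(seed: u64, item: &T) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// MinHash signature over 3-character shingles: for each of the 128
/// hash functions, keep the minimum hash seen across all shingles.
/// 128 x 8 bytes = 1024 bytes, matching the 1KB signature size.
/// Inputs shorter than 3 characters yield an all-`u64::MAX` signature.
fn minhash_signature(text: &str) -> [u64; NUM_HASHES] {
    let chars: Vec<char> = text.chars().collect();
    let mut sig = [u64::MAX; NUM_HASHES];
    for shingle in chars.windows(3) {
        for (i, slot) in sig.iter_mut().enumerate() {
            let h = seeded_hash(i as u64, shingle);
            if h < *slot {
                *slot = h;
            }
        }
    }
    sig
}

/// One bucket key per band, shaped like `LSH:{band:02}:{bucket_hash_hex}`.
fn lsh_bucket_keys(sig: &[u64; NUM_HASHES]) -> Vec<String> {
    assert_eq!(BANDS * ROWS, NUM_HASHES);
    sig.chunks(ROWS)
        .enumerate()
        .map(|(band, rows)| format!("LSH:{:02}:{:016x}", band, seeded_hash(band as u64, rows)))
        .collect()
}

/// Fraction of matching signature slots, an estimate of Jaccard similarity.
fn estimated_jaccard(a: &[u64; NUM_HASHES], b: &[u64; NUM_HASHES]) -> f64 {
    let matches = a.iter().zip(b.iter()).filter(|(x, y)| x == y).count();
    matches as f64 / NUM_HASHES as f64
}
```

Two assertions become duplicate candidates when at least one of their 16 band keys collides. The candidate's stored signature can then be compared slot by slot to estimate Jaccard similarity against the 0.9 threshold.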
### Ingestion Integration
**ContentDefenseLayer** (`stemedb-ingest/src/content_defense.rs`):
- Orchestrates Bloom filter → LSH → Quality scoring
- Returns `QuarantineDecision::Pass` or `QuarantineDecision::Quarantine(reason)`
- Hooks into `process_record()` after signature verification
### Quality Scoring
**ContentQualityScorer** (`stemedb-storage/src/content_defense/quality.rs`):
- `score()` computes composite quality metric
- Configurable thresholds via `QualityScoringConfig`
- Default thresholds:
  - Min subject length: 3
  - Min predicate length: 3
  - Min entropy: 1.5 bits/char
  - Quality threshold: 0.4
## Flow Diagram
```
[Assertion arrives]
        |
        v
[Signature verification] ──── FAIL ────> [Reject]
        |
      PASS
        |
        v
[Bloom filter check] ──── "definitely not seen" ────> [Quality scoring]
        |                                                    |
  "maybe seen"                                               |
        |                                                    |
        v                                                    |
[MinHash + LSH lookup] ────> [Jaccard >= 0.9?]               |
        |                           |                        |
        |              YES: Quarantine(Duplicate)            |
        |                           |                        |
       NO                           |                        |
        |                           |                        |
        v <─────────────────────────+────────────────────────+
[Quality scoring]
        |
        v
[Score < 0.4?] ────> YES: Quarantine(LowQuality)
        |
       NO
        |
        v
[Untrusted + confidence > 0.8?] ────> YES: Quarantine(UntrustedHighConfidence)
        |
       NO
        |
        v
[Pass] ────> [Store, Index, Broadcast]
```
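The decision order in the diagram can be expressed as a single function over precomputed signals. This is a sketch under stated assumptions: the `Signals` struct and `decide` function are hypothetical, and `PatternMatch` is omitted because the diagram does not show where that check runs.

```rust
#[derive(Debug, PartialEq)]
enum QuarantineReason {
    LowQuality,
    Duplicate,
    UntrustedHighConfidence,
}

#[derive(Debug, PartialEq)]
enum QuarantineDecision {
    Pass,
    Quarantine(QuarantineReason),
}

/// Hypothetical bundle of inputs gathered after signature verification.
struct Signals {
    bloom_maybe_seen: bool,   // false means "definitely not seen"
    max_jaccard: Option<f64>, // best LSH candidate similarity, if any
    quality_score: f64,
    trust: f64,
    confidence: f64,
}

/// Applies the checks in diagram order: duplicate, quality, trust pattern.
fn decide(s: &Signals) -> QuarantineDecision {
    // The LSH result is consulted only when the Bloom filter says "maybe seen".
    if s.bloom_maybe_seen {
        if let Some(j) = s.max_jaccard {
            if j >= 0.9 {
                return QuarantineDecision::Quarantine(QuarantineReason::Duplicate);
            }
        }
    }
    if s.quality_score < 0.4 {
        return QuarantineDecision::Quarantine(QuarantineReason::LowQuality);
    }
    if s.trust < 0.5 && s.confidence > 0.8 {
        return QuarantineDecision::Quarantine(QuarantineReason::UntrustedHighConfidence);
    }
    QuarantineDecision::Pass
}
```

Because the checks short-circuit, an assertion that is both a near-duplicate and low quality is reported as `Duplicate` in this sketch.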
## Security Properties
- **Probabilistic Dedup**: The Bloom filter can report false positives (never false negatives); LSH can produce both false positives and false negatives
- **No False Rejections**: Quarantine preserves data for admin review
- **Rebuild on Startup**: Bloom filter rebuilt from persisted MinHash signatures
- **O(1) Lookups**: LSH buckets and the `QUAR_IDX:` hash index are direct key lookups, giving expected constant-time checks
- **Separate from Trust**: Content defense is orthogonal to EigenTrust
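The Bloom filter properties listed above (no false negatives, rebuildable from persisted keys) can be seen in a minimal implementation. This sketch is not the project's filter: the `BloomFilter` type, its sizing parameters, and the use of `DefaultHasher` with integer seeds are all assumptions chosen to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Rebuild from an iterator of persisted keys, as a startup scan might do.
    fn rebuild<'a, I>(num_bits: usize, num_hashes: u64, keys: I) -> Self
    where
        I: IntoIterator<Item = &'a [u8]>,
    {
        let mut filter = Self::new(num_bits, num_hashes);
        for key in keys {
            filter.insert(key);
        }
        filter
    }

    /// Bit positions for `item`, one per seeded hash.
    fn positions(&self, item: &[u8]) -> Vec<usize> {
        (0..self.num_hashes)
            .map(|seed| {
                let mut h = DefaultHasher::new();
                seed.hash(&mut h);
                item.hash(&mut h);
                (h.finish() % self.bits.len() as u64) as usize
            })
            .collect()
    }

    fn insert(&mut self, item: &[u8]) {
        for p in self.positions(item) {
            self.bits[p] = true;
        }
    }

    /// `false` means "definitely not seen"; `true` means "maybe seen".
    fn maybe_contains(&self, item: &[u8]) -> bool {
        self.positions(item).into_iter().all(|p| self.bits[p])
    }
}
```

Every inserted item sets all of its bits, so a later query for it always returns `true`. A `true` for an item that was never inserted is possible when other items happen to cover the same bits, which is why a positive result still goes on to the LSH lookup.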
## Admin Workflow
1. Agent submits assertion
2. Content defense flags it as duplicate
3. Assertion stored at `QUAR:{ts}:{hash}`, NOT indexed
4. Admin lists pending: `GET /v1/admin/quarantine`
5. Admin reviews: `GET /v1/admin/quarantine/{hash}` (includes bytes)
6. Admin approves: `POST .../approve` → returns bytes for indexing
7. Or admin rejects: `POST .../reject` → remains quarantined, logged
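The review lifecycle can be modeled with an in-memory stand-in for the store. This is a sketch, not the real `QuarantineStore`: the method names mirror those listed for the store, but the signatures, the `Review` enum, and the rule that only pending entries can be approved or rejected are assumptions.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Review {
    Pending,
    Approved,
    Rejected,
}

struct Entry {
    assertion_bytes: Vec<u8>,
    review: Review,
}

/// In-memory model: a time-ordered primary map plus a hash -> timestamp index.
#[derive(Default)]
struct MemoryQuarantine {
    by_time: BTreeMap<(u64, String), Entry>,
    index: BTreeMap<String, u64>,
}

impl MemoryQuarantine {
    fn write_quarantine(&mut self, ts: u64, hash_hex: &str, bytes: Vec<u8>) {
        let entry = Entry { assertion_bytes: bytes, review: Review::Pending };
        self.by_time.insert((ts, hash_hex.to_string()), entry);
        self.index.insert(hash_hex.to_string(), ts);
    }

    /// Hashes of unreviewed entries, oldest first.
    fn list_pending(&self, limit: usize) -> Vec<String> {
        self.by_time
            .iter()
            .filter(|(_, e)| e.review == Review::Pending)
            .take(limit)
            .map(|((_, h), _)| h.clone())
            .collect()
    }

    /// Marks the entry approved and returns its bytes for indexing.
    fn approve(&mut self, hash_hex: &str) -> Option<Vec<u8>> {
        let ts = *self.index.get(hash_hex)?;
        let entry = self.by_time.get_mut(&(ts, hash_hex.to_string()))?;
        if entry.review != Review::Pending {
            return None;
        }
        entry.review = Review::Approved;
        Some(entry.assertion_bytes.clone())
    }

    /// Marks the entry rejected; the data stays in quarantine.
    fn reject(&mut self, hash_hex: &str) -> bool {
        let Some(&ts) = self.index.get(hash_hex) else { return false };
        match self.by_time.get_mut(&(ts, hash_hex.to_string())) {
            Some(e) if e.review == Review::Pending => {
                e.review = Review::Rejected;
                true
            }
            _ => false,
        }
    }
}
```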
## Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `assertions_quarantined` | Counter | Total quarantined assertions |
| `assertions_approved` | Counter | Admin-approved assertions |
| `assertions_rejected` | Counter | Admin-rejected assertions |
| `content_defense_check_duration_seconds` | Histogram | Check latency |
| `similarity_index_size` | Gauge | Number of MinHash signatures |
## Future: Phase 7D (Circuit Breakers)
Phase 7D will build on this foundation:
- Per-agent circuit breakers for repeated bad behavior
- Automatic recovery with exponential backoff
- Integration with quarantine triggers