# Content Defense (The Shield) Phase 7C introduces content defense mechanisms to detect spam, near-duplicates, and suspicious assertions before they enter the knowledge graph. ## Overview Content Defense provides three layers of protection: 1. **MinHash + LSH**: Near-duplicate detection with O(1) average-case lookup 2. **Quality Scoring**: Heuristic-based spam detection (entropy, length, structure) 3. **Quarantine Store**: Suspicious assertions held for admin review Assertions that fail these checks are quarantined rather than indexed, keeping the knowledge graph clean while preserving the data for manual review. ## Key Concepts ### Quarantine Reasons | Reason | Description | Trigger | |--------|-------------|---------| | `LowQuality` | Content failed quality checks | score < 0.4 | | `Duplicate` | Near-duplicate detected | Jaccard >= 0.9 | | `UntrustedHighConfidence` | Suspicious pattern | trust < 0.5 AND confidence > 0.8 | | `PatternMatch` | Known spam pattern | Pattern match | ### Quality Scoring The quality score is computed from multiple signals: | Component | Weight | Description | |-----------|--------|-------------| | Entropy | 40% | Shannon entropy (low = repetitive/random noise) | | Length | 20% | Subject/predicate length (min 3 chars each) | | Structure | 20% | Bonus for structured data (JSON, URLs, numbers) | | Trust Pattern | 20% | Penalty for untrusted + high confidence | Threshold: `score < 0.4` triggers quarantine. ### Similarity Detection MinHash + LSH parameters: - **MinHash k=128**: Hash functions for signature - **LSH 16 bands x 8 rows**: 99.96% recall at 0.9 Jaccard - **Bloom filter**: Fast "definitely not duplicate" pre-check - **Shingle size**: 3 characters (language-agnostic) ## HTTP API ### GET /v1/admin/quarantine List pending quarantined assertions. **Query Parameters:** - `limit` (optional): Maximum events to return (default: 100) - `include_reviewed` (optional): Include reviewed events (default: false) **Response:** ```json { "quarantined": [ { "hash": "abc123...", "reason": "duplicate", "reason_description": "Near-duplicate of existing assertion detected.", "quality": { "score": 0.35, "entropy": 2.1, "structured": false, "duplicate": true }, "timestamp": 1706918400000000000, "reviewed": false, "similar_to": "def456..." } ], "count": 1, "pending_count": 1 } ``` ### GET /v1/admin/quarantine/{hash} Get a single quarantine event with assertion bytes. **Response:** ```json { "event": { "hash": "abc123...", "assertion_bytes_hex": "...", "assertion_bytes_base64": "...", "reason": "low_quality", "reason_description": "Content failed quality checks.", "quality": { ... }, "timestamp": 1706918400000000000, "reviewed": false } } ``` ### POST /v1/admin/quarantine/{hash}/approve Approve a quarantined assertion for indexing. **Response:** ```json { "hash": "abc123...", "message": "Assertion approved and ready for indexing", "assertion_bytes_hex": "..." } ``` ### POST /v1/admin/quarantine/{hash}/reject Reject a quarantined assertion permanently. **Response:** ```json { "hash": "abc123...", "message": "Assertion rejected" } ``` ## Implementation Details ### Core Types **ContentQuality** (`stemedb-core/src/types/content_defense.rs`): - `score`: Overall quality [0.0, 1.0] - `entropy`: Shannon entropy (bits/char) - `structured`: Has structured data - `duplicate`: Is near-duplicate **QuarantineReason** (`stemedb-core/src/types/content_defense.rs`): - Enum: LowQuality, Duplicate, UntrustedHighConfidence, PatternMatch - Method: `description()` returns human-readable string **QuarantineEvent** (`stemedb-core/src/types/content_defense.rs`): - `hash`: BLAKE3 hash of assertion - `assertion_bytes`: Original serialized assertion - `reason`: Why quarantined - `quality`: Quality metrics at quarantine time - `reviewed`/`approved`: Admin review status ### Storage **QuarantineStore** (`stemedb-storage/src/quarantine_store.rs`): - Primary key: `QUAR:{timestamp}:{hash_hex}` (time-ordered scan) - Index key: `QUAR_IDX:{hash_hex}` → timestamp (O(1) hash lookup) - Methods: `write_quarantine()`, `get_quarantine()`, `list_pending()`, `approve()`, `reject()` **SimilarityIndex** (`stemedb-storage/src/similarity_index/`): - MinHash signature: `MH:{content_hash_hex}` → 1KB signature - LSH bucket: `LSH:{band:02}:{bucket_hash_hex}` → member list - Bloom filter: In-memory, rebuilt from `MH:` scan on startup ### Ingestion Integration **ContentDefenseLayer** (`stemedb-ingest/src/content_defense.rs`): - Orchestrates Bloom filter → LSH → Quality scoring - Returns `QuarantineDecision::Pass` or `QuarantineDecision::Quarantine(reason)` - Hooks into `process_record()` after signature verification ### Quality Scoring **ContentQualityScorer** (`stemedb-storage/src/content_defense/quality.rs`): - `score()` computes composite quality metric - Configurable thresholds via `QualityScoringConfig` - Default thresholds: - Min subject length: 3 - Min predicate length: 3 - Min entropy: 1.5 bits/char - Quality threshold: 0.4 ## Flow Diagram ``` [Assertion arrives] | v [Signature verification] ──── FAIL ────> [Reject] | PASS | v [Bloom filter check] ──── "definitely not seen" ────> [Quality scoring] | | "maybe seen" | | | v | [MinHash + LSH lookup] ────> [Jaccard >= 0.9?] | | | | | YES: Quarantine(Duplicate) | | | | NO | | | | | v <─────────────────────────+────────────────────────+ [Quality scoring] | v [Score < 0.4?] ────> YES: Quarantine(LowQuality) | NO | v [Untrusted + confidence > 0.8?] ────> YES: Quarantine(UntrustedHighConfidence) | NO | v [Pass] ────> [Store, Index, Broadcast] ``` ## Security Properties - **Probabilistic Dedup**: Bloom filter + LSH have false positive/negative rates - **No False Rejections**: Quarantine preserves data for admin review - **Rebuild on Startup**: Bloom filter rebuilt from persisted MinHash signatures - **O(1) Lookups**: LSH buckets and hash index enable constant-time checks - **Separate from Trust**: Content defense is orthogonal to EigenTrust ## Admin Workflow 1. Agent submits assertion 2. Content defense flags it as duplicate 3. Assertion stored at `QUAR:{ts}:{hash}`, NOT indexed 4. Admin lists pending: `GET /v1/admin/quarantine` 5. Admin reviews: `GET /v1/admin/quarantine/{hash}` (includes bytes) 6. Admin approves: `POST .../approve` → returns bytes for indexing 7. Or admin rejects: `POST .../reject` → remains quarantined, logged ## Metrics | Metric | Type | Description | |--------|------|-------------| | `assertions_quarantined` | Counter | Total quarantined assertions | | `assertions_approved` | Counter | Admin-approved assertions | | `assertions_rejected` | Counter | Admin-rejected assertions | | `content_defense_check_duration_seconds` | Histogram | Check latency | | `similarity_index_size` | Gauge | Number of MinHash signatures | ## Future: Phase 7D (Circuit Breakers) Phase 7D will build on this foundation: - Per-agent circuit breakers for repeated bad behavior - Automatic recovery with exponential backoff - Integration with quarantine triggers