Content Defense (The Shield)
Phase 7C introduces content defense mechanisms to detect spam, near-duplicates, and suspicious assertions before they enter the knowledge graph.
Overview
Content Defense provides three layers of protection:
- MinHash + LSH: Near-duplicate detection with O(1) average-case lookup
- Quality Scoring: Heuristic-based spam detection (entropy, length, structure)
- Quarantine Store: Suspicious assertions held for admin review
Assertions that fail these checks are quarantined rather than indexed, keeping the knowledge graph clean while preserving the data for manual review.
Key Concepts
Quarantine Reasons
| Reason | Description | Trigger |
|---|---|---|
| LowQuality | Content failed quality checks | score < 0.4 |
| Duplicate | Near-duplicate detected | Jaccard >= 0.9 |
| UntrustedHighConfidence | Suspicious pattern | trust < 0.5 AND confidence > 0.8 |
| PatternMatch | Known spam pattern | Pattern match |
Quality Scoring
The quality score is computed from multiple signals:
| Component | Weight | Description |
|---|---|---|
| Entropy | 40% | Shannon entropy (low = repetitive, low-information content) |
| Length | 20% | Subject/predicate length (min 3 chars each) |
| Structure | 20% | Bonus for structured data (JSON, URLs, numbers) |
| Trust Pattern | 20% | Penalty for untrusted + high confidence |
Threshold: score < 0.4 triggers quarantine.
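The weighting above can be sketched as follows. This is a minimal illustration, not the project's scorer: the `Signals` struct, the function names, and the choice to normalize entropy by capping it at 4 bits/char are all assumptions made for the example. Only the 40/20/20/20 weights and the 0.4 threshold come from the table.

```rust
use std::collections::HashMap;

/// Shannon entropy of `text` in bits per character.
fn shannon_entropy(text: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in text.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    let total: usize = counts.values().sum();
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Assumed normalization: entropy saturates at 4 bits/char.
fn normalize_entropy(bits_per_char: f64) -> f64 {
    (bits_per_char / 4.0).clamp(0.0, 1.0)
}

/// Hypothetical per-signal inputs, each already normalized to [0.0, 1.0].
struct Signals {
    entropy: f64,
    length: f64,
    structure: f64,
    trust_pattern: f64,
}

/// Weighted sum using the 40/20/20/20 split from the table.
fn quality_score(s: &Signals) -> f64 {
    let score = 0.4 * s.entropy + 0.2 * s.length + 0.2 * s.structure + 0.2 * s.trust_pattern;
    score.clamp(0.0, 1.0)
}
```

Under these assumptions, an assertion with zero normalized entropy can score at most 0.6, so it needs good marks on the other three signals to stay above the 0.4 threshold.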
Similarity Detection
MinHash + LSH parameters:
- MinHash k=128: Hash functions for signature
- LSH 16 bands × 8 rows: about 99.99% candidate recall at 0.9 Jaccard, from 1 - (1 - 0.9^8)^16
- Bloom filter: Fast "definitely not duplicate" pre-check
- Shingle size: 3 characters (language-agnostic)
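To make these parameters concrete, here is a small sketch of 3-character shingling, a 128-slot MinHash signature, and 16 × 8 banding. It assumes a seeded `DefaultHasher` as a stand-in for the hash family, since the actual hash functions are not stated here, and every function name is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

const NUM_HASHES: usize = 128; // MinHash k
const BANDS: usize = 16;
const ROWS: usize = 8; // BANDS * ROWS == NUM_HASHES

/// Split text into overlapping 3-character shingles.
fn shingles(text: &str) -> HashSet<String> {
    let chars: Vec<char> = text.chars().collect();
    if chars.len() < 3 {
        // Short inputs become a single shingle.
        return std::iter::once(text.to_string()).collect();
    }
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// Hash a value together with a seed (stand-in for a hash family).
fn seeded_hash<T: Hash>(seed: u64, value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    value.hash(&mut h);
    h.finish()
}

/// MinHash signature: for each seed, keep the minimum shingle hash.
fn minhash(text: &str) -> Vec<u64> {
    let set = shingles(text);
    (0..NUM_HASHES as u64)
        .map(|seed| set.iter().map(|s| seeded_hash(seed, s)).min().unwrap_or(u64::MAX))
        .collect()
}

/// Estimated Jaccard similarity: fraction of matching signature slots.
fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}

/// One bucket key per band; two items are candidates if any band key matches.
fn band_keys(sig: &[u64]) -> Vec<u64> {
    sig.chunks(ROWS)
        .enumerate()
        .map(|(band, rows)| seeded_hash(band as u64, &rows))
        .collect()
}

/// Probability that two items with Jaccard similarity `s` share at least one band.
fn candidate_probability(s: f64) -> f64 {
    1.0 - (1.0 - s.powi(ROWS as i32)).powi(BANDS as i32)
}
```

`candidate_probability` is the standard banding formula; evaluating it at different similarities shows how sharply the 16 × 8 split separates near-duplicates from unrelated content.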
HTTP API
GET /v1/admin/quarantine
List pending quarantined assertions.
Query Parameters:
- limit (optional): Maximum events to return (default: 100)
- include_reviewed (optional): Include reviewed events (default: false)
Response:
{
  "quarantined": [
    {
      "hash": "abc123...",
      "reason": "duplicate",
      "reason_description": "Near-duplicate of existing assertion detected.",
      "quality": {
        "score": 0.35,
        "entropy": 2.1,
        "structured": false,
        "duplicate": true
      },
      "timestamp": 1706918400000000000,
      "reviewed": false,
      "similar_to": "def456..."
    }
  ],
  "count": 1,
  "pending_count": 1
}
GET /v1/admin/quarantine/{hash}
Get a single quarantine event with assertion bytes.
Response:
{
  "event": {
    "hash": "abc123...",
    "assertion_bytes_hex": "...",
    "assertion_bytes_base64": "...",
    "reason": "low_quality",
    "reason_description": "Content failed quality checks.",
    "quality": { ... },
    "timestamp": 1706918400000000000,
    "reviewed": false
  }
}
POST /v1/admin/quarantine/{hash}/approve
Approve a quarantined assertion for indexing.
Response:
{
  "hash": "abc123...",
  "message": "Assertion approved and ready for indexing",
  "assertion_bytes_hex": "..."
}
POST /v1/admin/quarantine/{hash}/reject
Reject a quarantined assertion permanently.
Response:
{
  "hash": "abc123...",
  "message": "Assertion rejected"
}
Implementation Details
Core Types
ContentQuality (stemedb-core/src/types/content_defense.rs):
- score: Overall quality [0.0, 1.0]
- entropy: Shannon entropy (bits/char)
- structured: Has structured data
- duplicate: Is near-duplicate
QuarantineReason (stemedb-core/src/types/content_defense.rs):
- Enum: LowQuality, Duplicate, UntrustedHighConfidence, PatternMatch
- Method: description() returns human-readable string
QuarantineEvent (stemedb-core/src/types/content_defense.rs):
- hash: BLAKE3 hash of assertion
- assertion_bytes: Original serialized assertion
- reason: Why quarantined
- quality: Quality metrics at quarantine time
- reviewed/approved: Admin review status
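As a rough picture of how these types fit together, the sketch below declares them in plain Rust. The field types, the `timestamp` and `similar_to` fields (taken from the JSON responses shown earlier), and the `api_name()` helper are assumptions. The two description strings for `LowQuality` and `Duplicate` match the API examples; the other two are placeholders borrowed from the Quarantine Reasons table.

```rust
/// Quality metrics captured when an assertion is evaluated.
#[derive(Debug, Clone, PartialEq)]
pub struct ContentQuality {
    pub score: f64,       // overall quality in [0.0, 1.0]
    pub entropy: f64,     // Shannon entropy, bits/char
    pub structured: bool, // has structured data
    pub duplicate: bool,  // is near-duplicate
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuarantineReason {
    LowQuality,
    Duplicate,
    UntrustedHighConfidence,
    PatternMatch,
}

impl QuarantineReason {
    /// Human-readable explanation of the reason.
    pub fn description(&self) -> &'static str {
        match self {
            QuarantineReason::LowQuality => "Content failed quality checks.",
            QuarantineReason::Duplicate => "Near-duplicate of existing assertion detected.",
            // Placeholder wording, not confirmed by the API examples.
            QuarantineReason::UntrustedHighConfidence => "Suspicious pattern.",
            QuarantineReason::PatternMatch => "Known spam pattern.",
        }
    }

    /// Hypothetical helper: snake_case name as seen in the JSON `reason` field.
    pub fn api_name(&self) -> &'static str {
        match self {
            QuarantineReason::LowQuality => "low_quality",
            QuarantineReason::Duplicate => "duplicate",
            QuarantineReason::UntrustedHighConfidence => "untrusted_high_confidence",
            QuarantineReason::PatternMatch => "pattern_match",
        }
    }
}

/// A quarantined assertion awaiting admin review (field types are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct QuarantineEvent {
    pub hash: [u8; 32],            // BLAKE3 digests are 32 bytes
    pub assertion_bytes: Vec<u8>,  // original serialized assertion
    pub reason: QuarantineReason,
    pub quality: ContentQuality,
    pub timestamp: u64,            // assumed nanoseconds since the Unix epoch
    pub reviewed: bool,
    pub approved: bool,
    pub similar_to: Option<[u8; 32]>, // assumed: set for Duplicate events
}
```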
Storage
QuarantineStore (stemedb-storage/src/quarantine_store.rs):
- Primary key: QUAR:{timestamp}:{hash_hex} (time-ordered scan)
- Index key: QUAR_IDX:{hash_hex} → timestamp (O(1) hash lookup)
- Methods: write_quarantine(), get_quarantine(), list_pending(), approve(), reject()
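The two-key layout can be illustrated with an in-memory `BTreeMap` standing in for the real key-value store. This is only a sketch: `MemQuarantineStore`, the zero-padded timestamp, and the decimal encoding of the index value are assumptions chosen so that lexicographic key order matches chronological order.

```rust
use std::collections::BTreeMap;

/// Primary key; the timestamp is zero-padded (an assumption) so that
/// lexicographic order equals chronological order.
fn primary_key(timestamp: u64, hash_hex: &str) -> String {
    format!("QUAR:{:020}:{}", timestamp, hash_hex)
}

/// Index key mapping a hash back to its timestamp.
fn index_key(hash_hex: &str) -> String {
    format!("QUAR_IDX:{}", hash_hex)
}

/// Hypothetical in-memory stand-in for the persistent store.
#[derive(Default)]
struct MemQuarantineStore {
    kv: BTreeMap<String, Vec<u8>>,
}

impl MemQuarantineStore {
    /// Write the event under the primary key and record hash -> timestamp.
    fn write_quarantine(&mut self, timestamp: u64, hash_hex: &str, event: Vec<u8>) {
        self.kv.insert(primary_key(timestamp, hash_hex), event);
        self.kv.insert(index_key(hash_hex), timestamp.to_string().into_bytes());
    }

    /// Point lookup by hash: read the index, then the primary record.
    fn get_quarantine(&self, hash_hex: &str) -> Option<&Vec<u8>> {
        let ts_bytes = self.kv.get(&index_key(hash_hex))?;
        let ts: u64 = std::str::from_utf8(ts_bytes).ok()?.parse().ok()?;
        self.kv.get(&primary_key(ts, hash_hex))
    }

    /// Time-ordered scan over the `QUAR:` prefix (excludes `QUAR_IDX:` keys).
    fn scan(&self) -> Vec<&String> {
        self.kv
            .range("QUAR:".to_string().."QUAR;".to_string())
            .map(|(k, _)| k)
            .collect()
    }
}
```

The index key is what keeps `get_quarantine()` a point lookup: without it, finding an event by hash would require scanning every `QUAR:` entry.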
SimilarityIndex (stemedb-storage/src/similarity_index/):
- MinHash signature: MH:{content_hash_hex} → 1 KB signature
- LSH bucket: LSH:{band:02}:{bucket_hash_hex} → member list
- Bloom filter: In-memory, rebuilt from MH: scan on startup
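A short sketch of how these keys and the 1 KB value might be built. The helper names are hypothetical, and the little-endian `u64` encoding is an assumption; it is chosen because 128 hash values at 8 bytes each give exactly 1024 bytes.

```rust
/// Lowercase hex encoding of a byte slice.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Key under which a MinHash signature is stored.
fn minhash_key(content_hash: &[u8]) -> String {
    format!("MH:{}", to_hex(content_hash))
}

/// Key for one LSH bucket; the band number is zero-padded to two digits.
fn lsh_bucket_key(band: usize, bucket_hash: &[u8]) -> String {
    format!("LSH:{:02}:{}", band, to_hex(bucket_hash))
}

/// Serialize a signature as little-endian u64 values (assumed encoding).
fn encode_signature(sig: &[u64]) -> Vec<u8> {
    sig.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Inverse of `encode_signature`; trailing partial chunks are ignored.
fn decode_signature(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            u64::from_le_bytes(buf)
        })
        .collect()
}
```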
Ingestion Integration
ContentDefenseLayer (stemedb-ingest/src/content_defense.rs):
- Orchestrates Bloom filter → LSH → Quality scoring
- Returns QuarantineDecision::Pass or QuarantineDecision::Quarantine(reason)
- Hooks into process_record() after signature verification
Quality Scoring
ContentQualityScorer (stemedb-storage/src/content_defense/quality.rs):
- score() computes composite quality metric
- Configurable thresholds via QualityScoringConfig
- Default thresholds:
  - Min subject length: 3
  - Min predicate length: 3
  - Min entropy: 1.5 bits/char
  - Quality threshold: 0.4
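The defaults listed above map naturally onto a config struct with a `Default` impl. The sketch below assumes the field names and the two helper methods; only the four default values are taken from the list.

```rust
/// Thresholds for quality scoring (field names are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct QualityScoringConfig {
    pub min_subject_len: usize,
    pub min_predicate_len: usize,
    pub min_entropy: f64, // bits/char
    pub quality_threshold: f64,
}

impl Default for QualityScoringConfig {
    fn default() -> Self {
        QualityScoringConfig {
            min_subject_len: 3,
            min_predicate_len: 3,
            min_entropy: 1.5,
            quality_threshold: 0.4,
        }
    }
}

impl QualityScoringConfig {
    /// Hypothetical helper: do subject and predicate meet the minimum lengths?
    pub fn length_ok(&self, subject: &str, predicate: &str) -> bool {
        subject.chars().count() >= self.min_subject_len
            && predicate.chars().count() >= self.min_predicate_len
    }

    /// Hypothetical helper: a score strictly below the threshold is quarantined.
    pub fn should_quarantine(&self, score: f64) -> bool {
        score < self.quality_threshold
    }
}
```

A stricter deployment could override a single field with struct update syntax, for example `QualityScoringConfig { quality_threshold: 0.5, ..Default::default() }`.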
Flow Diagram
[Assertion arrives]
        |
        v
[Signature verification] ──── FAIL ────> [Reject]
        |
      PASS
        |
        v
[Bloom filter check] ──── "definitely not seen" ────> [Quality scoring]
        |                                                     |
  "maybe seen"                                                |
        |                                                     |
        v                                                     |
[MinHash + LSH lookup] ────> [Jaccard >= 0.9?]                |
        |                            |                        |
        |               YES: Quarantine(Duplicate)            |
        |                                                     |
       NO                                                     |
        |                                                     |
        v <───────────────────────────────────────────────────+
[Quality scoring]
        |
        v
[Score < 0.4?] ────> YES: Quarantine(LowQuality)
        |
       NO
        |
        v
[Untrusted + confidence > 0.8?] ────> YES: Quarantine(UntrustedHighConfidence)
        |
       NO
        |
        v
[Pass] ────> [Store, Index, Broadcast]
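The decision order in the diagram can be written as a single function. This is a sketch under assumptions: `CheckInput` and `decide()` are hypothetical names, the inputs are presumed to be computed earlier in the pipeline, and "untrusted" is read as trust < 0.5 per the Quarantine Reasons table. `PatternMatch` does not appear in the flow, so the sketch never returns it.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuarantineReason {
    LowQuality,
    Duplicate,
    UntrustedHighConfidence,
    PatternMatch, // not part of the flow above; listed for completeness
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuarantineDecision {
    Pass,
    Quarantine(QuarantineReason),
}

/// Hypothetical bundle of precomputed check results for one assertion.
struct CheckInput {
    bloom_maybe_seen: bool,   // false means "definitely not seen"
    max_jaccard: Option<f64>, // best similarity among LSH candidates, if any
    quality_score: f64,
    trust: f64,
    confidence: f64,
}

/// Apply the checks in the order the flow diagram shows.
fn decide(input: &CheckInput) -> QuarantineDecision {
    // The LSH lookup only runs when the Bloom filter says "maybe seen".
    if input.bloom_maybe_seen {
        if let Some(jaccard) = input.max_jaccard {
            if jaccard >= 0.9 {
                return QuarantineDecision::Quarantine(QuarantineReason::Duplicate);
            }
        }
    }
    if input.quality_score < 0.4 {
        return QuarantineDecision::Quarantine(QuarantineReason::LowQuality);
    }
    if input.trust < 0.5 && input.confidence > 0.8 {
        return QuarantineDecision::Quarantine(QuarantineReason::UntrustedHighConfidence);
    }
    QuarantineDecision::Pass
}
```

Because the duplicate check runs first, an assertion that is both a near-duplicate and low quality is reported as `Duplicate`.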
Security Properties
- Probabilistic Dedup: The Bloom filter can return false positives (never false negatives), and LSH can return both, so duplicate detection is approximate
- No False Rejections: Quarantine preserves data for admin review
- Rebuild on Startup: Bloom filter rebuilt from persisted MinHash signatures
- O(1) Lookups: LSH buckets and hash index enable constant-time checks
- Separate from Trust: Content defense is orthogonal to EigenTrust
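To illustrate the rebuild-on-startup property, here is a minimal Bloom filter that is repopulated from a scan of stored keys. The filter size, the number of hash functions, the seeded `DefaultHasher`, and the `rebuild()` signature are all assumptions for the example. The only behavior taken from the text above is that the filter is in-memory and rebuilt from `MH:` keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter; `num_bits` must be greater than zero.
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        BloomFilter { bits: vec![false; num_bits], num_hashes }
    }

    /// Bit position for `item` under the hash function identified by `seed`.
    fn bit_index(&self, item: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        item.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, item: &str) {
        for seed in 0..self.num_hashes {
            let i = self.bit_index(item, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means "definitely not seen"; `true` means "maybe seen".
    fn might_contain(&self, item: &str) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.bit_index(item, seed)])
    }

    /// Rebuild from a key scan, keeping only keys with the `MH:` prefix.
    fn rebuild<'a, I>(keys: I, num_bits: usize, num_hashes: u64) -> Self
    where
        I: IntoIterator<Item = &'a str>,
    {
        let mut filter = BloomFilter::new(num_bits, num_hashes);
        for key in keys {
            if let Some(hash_hex) = key.strip_prefix("MH:") {
                filter.insert(hash_hex);
            }
        }
        filter
    }
}
```

Since the filter holds no state that cannot be derived from the persisted signatures, losing it on restart costs only the rebuild scan.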
Admin Workflow
- Agent submits assertion
- Content defense flags it as duplicate
- Assertion stored at QUAR:{ts}:{hash}, NOT indexed
- Admin lists pending: GET /v1/admin/quarantine
- Admin reviews: GET /v1/admin/quarantine/{hash} (includes bytes)
- Admin approves: POST .../approve → returns bytes for indexing
- Or admin rejects: POST .../reject → remains quarantined, logged
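The review steps above amount to a small state machine over each quarantined entry. The sketch below models it in memory. `ReviewQueue` and `Entry` are hypothetical, and the sketch assumes that approval returns the stored bytes while rejection leaves the entry in place and marks it reviewed, which is how the workflow reads.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Entry {
    assertion_bytes: Vec<u8>,
    reviewed: bool,
    approved: bool,
}

/// Hypothetical in-memory model of the admin review queue.
#[derive(Default)]
struct ReviewQueue {
    entries: HashMap<String, Entry>,
}

impl ReviewQueue {
    /// Hold an assertion for review; it is stored but not indexed.
    fn quarantine(&mut self, hash: &str, bytes: Vec<u8>) {
        let entry = Entry { assertion_bytes: bytes, reviewed: false, approved: false };
        self.entries.insert(hash.to_string(), entry);
    }

    /// Hashes that have not been reviewed yet, sorted for stable output.
    fn list_pending(&self) -> Vec<&str> {
        let mut pending: Vec<&str> = self
            .entries
            .iter()
            .filter(|(_, e)| !e.reviewed)
            .map(|(h, _)| h.as_str())
            .collect();
        pending.sort();
        pending
    }

    /// Approve: mark reviewed and hand the bytes back for indexing.
    fn approve(&mut self, hash: &str) -> Option<Vec<u8>> {
        let entry = self.entries.get_mut(hash)?;
        entry.reviewed = true;
        entry.approved = true;
        Some(entry.assertion_bytes.clone())
    }

    /// Reject: mark reviewed; the entry stays quarantined. Returns false if unknown.
    fn reject(&mut self, hash: &str) -> bool {
        match self.entries.get_mut(hash) {
            Some(entry) => {
                entry.reviewed = true;
                entry.approved = false;
                true
            }
            None => false,
        }
    }
}
```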
Metrics
| Metric | Type | Description |
|---|---|---|
| assertions_quarantined | Counter | Total quarantined assertions |
| assertions_approved | Counter | Admin-approved assertions |
| assertions_rejected | Counter | Admin-rejected assertions |
| content_defense_check_duration_seconds | Histogram | Check latency |
| similarity_index_size | Gauge | Number of MinHash signatures |
Future: Phase 7D (Circuit Breakers)
Phase 7D will build on this foundation:
- Per-agent circuit breakers for repeated bad behavior
- Automatic recovery with exponential backoff
- Integration with quarantine triggers