# Content Defense (The Shield)

Phase 7C introduces content defense mechanisms to detect spam, near-duplicates, and suspicious assertions before they enter the knowledge graph.

## Overview

Content Defense provides three layers of protection:

1. **MinHash + LSH**: Near-duplicate detection with O(1) average-case lookup
2. **Quality Scoring**: Heuristic-based spam detection (entropy, length, structure)
3. **Quarantine Store**: Suspicious assertions held for admin review

Assertions that fail these checks are quarantined rather than indexed, keeping the knowledge graph clean while preserving the data for manual review.

## Key Concepts

### Quarantine Reasons

| Reason | Description | Trigger |
|--------|-------------|---------|
| `LowQuality` | Content failed quality checks | score < 0.4 |
| `Duplicate` | Near-duplicate detected | Jaccard >= 0.9 |
| `UntrustedHighConfidence` | Suspicious pattern | trust < 0.5 AND confidence > 0.8 |
| `PatternMatch` | Known spam pattern | Pattern match |

### Quality Scoring

The quality score is computed from multiple signals:

| Component | Weight | Description |
|-----------|--------|-------------|
| Entropy | 40% | Shannon entropy in bits/char (low values indicate repetitive, low-information content) |
| Length | 20% | Subject/predicate length (min 3 chars each) |
| Structure | 20% | Bonus for structured data (JSON, URLs, numbers) |
| Trust Pattern | 20% | Penalty for untrusted + high confidence |

Threshold: `score < 0.4` triggers quarantine.

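
The weighting above can be sketched as a plain weighted sum. This is a minimal illustration, not the project's `ContentQualityScorer`: the `Signals` struct, the function names, and the assumption that every sub-score is already normalized to `[0.0, 1.0]` are hypothetical. Only the weights and the 0.4 threshold come from the table.

```rust
use std::collections::HashMap;

/// Shannon entropy of `text` in bits per character (0.0 for empty input).
pub fn shannon_entropy(text: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for ch in text.chars() {
        *counts.entry(ch).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical container for the four sub-scores.
/// Assumption: each value is already normalized to [0.0, 1.0].
pub struct Signals {
    pub entropy: f64,
    pub length: f64,
    pub structure: f64,
    pub trust_pattern: f64,
}

/// Quarantine threshold from the document.
pub const QUALITY_THRESHOLD: f64 = 0.4;

/// Weighted sum using the documented weights (40/20/20/20).
pub fn composite_score(s: &Signals) -> f64 {
    let score =
        0.4 * s.entropy + 0.2 * s.length + 0.2 * s.structure + 0.2 * s.trust_pattern;
    score.clamp(0.0, 1.0)
}

/// True when the score falls strictly below the threshold.
pub fn is_low_quality(score: f64) -> bool {
    score < QUALITY_THRESHOLD
}
```

How raw entropy in bits/char maps onto the normalized entropy sub-score is not specified here, so the sketch leaves that step to the caller.
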
### Similarity Detection

MinHash + LSH parameters:

- **MinHash k=128**: Number of hash functions per signature
- **LSH 16 bands x 8 rows**: About 99.99% theoretical recall at 0.9 Jaccard, from `1 - (1 - s^8)^16` with `s = 0.9`
- **Bloom filter**: Fast "definitely not duplicate" pre-check
- **Shingle size**: 3 characters (language-agnostic)

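
To make these parameters concrete, here is a small sketch of 3-character shingling, exact Jaccard similarity, and the standard LSH banding probability `1 - (1 - s^r)^b`. The function names are hypothetical and the code is independent of the real `SimilarityIndex`.

```rust
use std::collections::HashSet;

/// Split `text` into overlapping shingles of `k` characters.
/// Returns an empty set when the text is shorter than `k`.
pub fn char_shingles(text: &str, k: usize) -> HashSet<String> {
    let chars: Vec<char> = text.chars().collect();
    if k == 0 || chars.len() < k {
        return HashSet::new();
    }
    chars
        .windows(k)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two shingle sets.
pub fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    inter / union
}

/// Probability that two items with Jaccard similarity `s` collide in at
/// least one of `bands` LSH bands, each made of `rows` MinHash values.
pub fn lsh_candidate_probability(s: f64, bands: u32, rows: u32) -> f64 {
    let band_match = s.powi(rows as i32);
    1.0 - (1.0 - band_match).powi(bands as i32)
}
```

With 16 bands of 8 rows, `lsh_candidate_probability(0.9, 16, 8)` comes out just under 0.9999, while pairs at 0.5 similarity become candidates only about 6% of the time. That steep curve is the reason for banding.
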
## HTTP API

### GET /v1/admin/quarantine

List pending quarantined assertions.

**Query Parameters:**

- `limit` (optional): Maximum events to return (default: 100)
- `include_reviewed` (optional): Include reviewed events (default: false)

**Response:**

```json
{
  "quarantined": [
    {
      "hash": "abc123...",
      "reason": "duplicate",
      "reason_description": "Near-duplicate of existing assertion detected.",
      "quality": {
        "score": 0.35,
        "entropy": 2.1,
        "structured": false,
        "duplicate": true
      },
      "timestamp": 1706918400000000000,
      "reviewed": false,
      "similar_to": "def456..."
    }
  ],
  "count": 1,
  "pending_count": 1
}
```

### GET /v1/admin/quarantine/{hash}

Get a single quarantine event with assertion bytes.

**Response:**

```json
{
  "event": {
    "hash": "abc123...",
    "assertion_bytes_hex": "...",
    "assertion_bytes_base64": "...",
    "reason": "low_quality",
    "reason_description": "Content failed quality checks.",
    "quality": { ... },
    "timestamp": 1706918400000000000,
    "reviewed": false
  }
}
```

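
A client consuming this response needs to turn `assertion_bytes_hex` back into bytes and interpret `timestamp`. The sketch below assumes the timestamp is nanoseconds since the Unix epoch, which the magnitude of the example value suggests but the document does not state. Both helper names are hypothetical.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Decode a hex string such as `assertion_bytes_hex` into raw bytes.
/// Returns `None` on odd length or any non-hex character.
pub fn decode_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Convert a quarantine timestamp to `SystemTime`.
/// Assumption: the value counts nanoseconds since the Unix epoch.
pub fn timestamp_to_system_time(nanos: u64) -> SystemTime {
    UNIX_EPOCH + Duration::from_nanos(nanos)
}
```

Under that assumption, the example value `1706918400000000000` corresponds to 1,706,918,400 seconds after the epoch.
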
### POST /v1/admin/quarantine/{hash}/approve

Approve a quarantined assertion for indexing.

**Response:**

```json
{
  "hash": "abc123...",
  "message": "Assertion approved and ready for indexing",
  "assertion_bytes_hex": "..."
}
```

### POST /v1/admin/quarantine/{hash}/reject

Reject a quarantined assertion permanently.

**Response:**

```json
{
  "hash": "abc123...",
  "message": "Assertion rejected"
}
```

## Implementation Details

### Core Types

**ContentQuality** (`stemedb-core/src/types/content_defense.rs`):

- `score`: Overall quality [0.0, 1.0]
- `entropy`: Shannon entropy (bits/char)
- `structured`: Has structured data
- `duplicate`: Is near-duplicate

**QuarantineReason** (`stemedb-core/src/types/content_defense.rs`):

- Enum: LowQuality, Duplicate, UntrustedHighConfidence, PatternMatch
- Method: `description()` returns human-readable string

**QuarantineEvent** (`stemedb-core/src/types/content_defense.rs`):

- `hash`: BLAKE3 hash of assertion
- `assertion_bytes`: Original serialized assertion
- `reason`: Why quarantined
- `quality`: Quality metrics at quarantine time
- `reviewed`/`approved`: Admin review status

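
The three types described above might look roughly like the following. The field types (`f64`, a 32-byte array for the BLAKE3 hash, `u64` for the timestamp) are assumptions. The `timestamp` and `similar_to` fields are inferred from the API responses rather than from this list. The description strings for `UntrustedHighConfidence` and `PatternMatch` are placeholders, since only the other two appear in this document.

```rust
/// Why an assertion was quarantined.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuarantineReason {
    LowQuality,
    Duplicate,
    UntrustedHighConfidence,
    PatternMatch,
}

impl QuarantineReason {
    /// Human-readable description of the reason.
    pub fn description(&self) -> &'static str {
        match self {
            // These two strings appear in the API examples.
            Self::LowQuality => "Content failed quality checks.",
            Self::Duplicate => "Near-duplicate of existing assertion detected.",
            // Placeholder wording (assumption), not taken from the source.
            Self::UntrustedHighConfidence => "Untrusted source with high confidence.",
            Self::PatternMatch => "Matched a known spam pattern.",
        }
    }
}

/// Quality metrics captured at quarantine time.
#[derive(Debug, Clone, PartialEq)]
pub struct ContentQuality {
    pub score: f64,       // overall quality in [0.0, 1.0]
    pub entropy: f64,     // Shannon entropy in bits/char
    pub structured: bool, // has structured data
    pub duplicate: bool,  // is a near-duplicate
}

/// One quarantined assertion plus its review status.
#[derive(Debug, Clone, PartialEq)]
pub struct QuarantineEvent {
    pub hash: [u8; 32],               // BLAKE3 digest (32 bytes by default)
    pub assertion_bytes: Vec<u8>,     // original serialized assertion
    pub reason: QuarantineReason,
    pub quality: ContentQuality,
    pub timestamp: u64,               // assumed: nanoseconds since Unix epoch
    pub reviewed: bool,
    pub approved: bool,
    pub similar_to: Option<[u8; 32]>, // assumed: set for duplicates only
}
```
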
### Storage

**QuarantineStore** (`stemedb-storage/src/quarantine_store.rs`):

- Primary key: `QUAR:{timestamp}:{hash_hex}` (time-ordered scan)
- Index key: `QUAR_IDX:{hash_hex}` → timestamp (O(1) hash lookup)
- Methods: `write_quarantine()`, `get_quarantine()`, `list_pending()`, `approve()`, `reject()`

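
The two-key layout can be illustrated with an in-memory stand-in. This is a sketch only: `MemQuarantine` and its methods are hypothetical, a `BTreeMap` plays the role of the ordered key-value store, and zero-padding the timestamp to 20 digits is an assumption made here so that lexicographic key order matches numeric time order.

```rust
use std::collections::{BTreeMap, HashMap};

/// Primary key `QUAR:{timestamp}:{hash_hex}`.
/// Assumption: the timestamp is zero-padded so string order equals time order.
pub fn primary_key(timestamp: u64, hash_hex: &str) -> String {
    format!("QUAR:{:020}:{}", timestamp, hash_hex)
}

/// Index key `QUAR_IDX:{hash_hex}`, which maps a hash to its timestamp.
pub fn index_key(hash_hex: &str) -> String {
    format!("QUAR_IDX:{}", hash_hex)
}

/// Hypothetical in-memory stand-in for the persistent store.
#[derive(Default)]
pub struct MemQuarantine {
    primary: BTreeMap<String, Vec<u8>>, // ordered, supports time-ordered scans
    index: HashMap<String, u64>,        // hash -> timestamp
}

impl MemQuarantine {
    /// Write the record under both keys.
    pub fn write(&mut self, timestamp: u64, hash_hex: &str, bytes: Vec<u8>) {
        self.primary.insert(primary_key(timestamp, hash_hex), bytes);
        self.index.insert(index_key(hash_hex), timestamp);
    }

    /// Look up by hash: the index yields the timestamp, which completes
    /// the primary key without scanning.
    pub fn get(&self, hash_hex: &str) -> Option<&Vec<u8>> {
        let ts = *self.index.get(&index_key(hash_hex))?;
        self.primary.get(&primary_key(ts, hash_hex))
    }

    /// Oldest-first scan over primary keys, up to `limit` entries.
    pub fn scan(&self, limit: usize) -> Vec<&str> {
        self.primary.keys().take(limit).map(|k| k.as_str()).collect()
    }
}
```

Without fixed-width timestamps, a key for time `1000` would sort before one for time `200`, so any real implementation needs some order-preserving encoding, whether or not it is this one.
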
**SimilarityIndex** (`stemedb-storage/src/similarity_index/`):

- MinHash signature: `MH:{content_hash_hex}` → 1KB signature
- LSH bucket: `LSH:{band:02}:{bucket_hash_hex}` → member list
- Bloom filter: In-memory, rebuilt from `MH:` scan on startup

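
As an illustration of how a 128-value signature maps onto these keys, the sketch below computes a MinHash signature over 3-character shingles and derives one LSH bucket key per band. It uses the standard library's `DefaultHasher` with a seed prefix as a stand-in hash family. The actual hash functions, the width of `bucket_hash_hex`, and all function names here are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub const NUM_HASHES: usize = 128;
pub const BANDS: usize = 16;
pub const ROWS: usize = 8;

/// Stand-in for a family of hash functions: mix a seed into the hasher.
fn seeded_hash<T: Hash + ?Sized>(seed: u64, item: &T) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// MinHash signature over 3-character shingles.
/// 128 values of 8 bytes each give the 1 KB signature size.
/// Text shorter than 3 characters yields a signature of all `u64::MAX`.
pub fn minhash(text: &str) -> [u64; NUM_HASHES] {
    let chars: Vec<char> = text.chars().collect();
    let mut sig = [u64::MAX; NUM_HASHES];
    for shingle in chars.windows(3) {
        for (i, slot) in sig.iter_mut().enumerate() {
            let h = seeded_hash(i as u64, shingle);
            if h < *slot {
                *slot = h;
            }
        }
    }
    sig
}

/// One bucket key per band, shaped like `LSH:{band:02}:{bucket_hash_hex}`.
pub fn lsh_keys(sig: &[u64; NUM_HASHES]) -> Vec<String> {
    sig.chunks(ROWS)
        .enumerate()
        .map(|(band, rows)| {
            format!("LSH:{:02}:{:016x}", band, seeded_hash(band as u64, rows))
        })
        .collect()
}

/// Fraction of matching signature positions, an estimate of Jaccard similarity.
pub fn estimated_jaccard(a: &[u64; NUM_HASHES], b: &[u64; NUM_HASHES]) -> f64 {
    let same = a.iter().zip(b.iter()).filter(|(x, y)| x == y).count();
    same as f64 / NUM_HASHES as f64
}
```

Two assertions become duplicate candidates when any of their 16 bucket keys coincide. The candidate is then confirmed against the 0.9 threshold using the signature estimate.
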
### Ingestion Integration

**ContentDefenseLayer** (`stemedb-ingest/src/content_defense.rs`):

- Orchestrates Bloom filter → LSH → Quality scoring
- Returns `QuarantineDecision::Pass` or `QuarantineDecision::Quarantine(reason)`
- Hooks into `process_record()` after signature verification

### Quality Scoring

**ContentQualityScorer** (`stemedb-storage/src/content_defense/quality.rs`):

- `score()` computes composite quality metric
- Configurable thresholds via `QualityScoringConfig`
- Default thresholds:
  - Min subject length: 3
  - Min predicate length: 3
  - Min entropy: 1.5 bits/char
  - Quality threshold: 0.4

## Flow Diagram

```
[Assertion arrives]
        |
        v
[Signature verification] ──── FAIL ────> [Reject]
        |
       PASS
        |
        v
[Bloom filter check] ──── "definitely not seen" ────> [Quality scoring]
        |                                                    |
   "maybe seen"                                              |
        |                                                    |
        v                                                    |
[MinHash + LSH lookup] ────> [Jaccard >= 0.9?]               |
        |                           |                        |
        |              YES: Quarantine(Duplicate)            |
        |                           |                        |
       NO                           |                        |
        |                           |                        |
        v <─────────────────────────+────────────────────────+
[Quality scoring]
        |
        v
[Score < 0.4?] ────> YES: Quarantine(LowQuality)
        |
       NO
        |
        v
[Untrusted + confidence > 0.8?] ────> YES: Quarantine(UntrustedHighConfidence)
        |
       NO
        |
        v
[Pass] ────> [Store, Index, Broadcast]
```

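
The decision order in the diagram can be written as a single function. This is a sketch under stated assumptions: the `Checks` struct and `decide` function are hypothetical, the inputs are taken as already computed, a quarantine outcome is treated as ending the pipeline, and "untrusted" is read as `trust < 0.5` per the Quarantine Reasons table. Signature verification is assumed to have passed before this point.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuarantineReason {
    LowQuality,
    Duplicate,
    UntrustedHighConfidence,
    PatternMatch, // listed in the reasons table; not part of this flow
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuarantineDecision {
    Pass,
    Quarantine(QuarantineReason),
}

/// Hypothetical bundle of precomputed inputs to the decision.
pub struct Checks {
    /// Bloom filter answer: false means "definitely not seen".
    pub bloom_maybe_seen: bool,
    /// Best Jaccard estimate among LSH candidates, if any were found.
    pub best_jaccard: Option<f64>,
    pub quality_score: f64,
    pub trust: f64,
    pub confidence: f64,
}

pub fn decide(c: &Checks) -> QuarantineDecision {
    use QuarantineDecision::{Pass, Quarantine};

    // The LSH lookup only runs when the Bloom filter cannot rule the item out.
    if c.bloom_maybe_seen {
        if let Some(j) = c.best_jaccard {
            if j >= 0.9 {
                return Quarantine(QuarantineReason::Duplicate);
            }
        }
    }
    // Quality scoring runs on every non-duplicate assertion.
    if c.quality_score < 0.4 {
        return Quarantine(QuarantineReason::LowQuality);
    }
    // Low trust combined with high self-reported confidence is suspicious.
    if c.trust < 0.5 && c.confidence > 0.8 {
        return Quarantine(QuarantineReason::UntrustedHighConfidence);
    }
    Pass
}
```

Putting the cheap Bloom check first means most novel assertions skip the LSH lookup entirely.
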
## Security Properties

- **Probabilistic Dedup**: The Bloom filter can report false positives but never false negatives; LSH can produce both false positives and false negatives
- **No False Rejections**: Quarantine preserves data for admin review
- **Rebuild on Startup**: Bloom filter rebuilt from persisted MinHash signatures
- **O(1) Lookups**: LSH buckets and hash index enable average-case constant-time checks
- **Separate from Trust**: Content defense is orthogonal to EigenTrust

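
The Bloom filter's one-sided error and the rebuild-on-startup step can be sketched as follows. This is a generic Bloom filter using double hashing over the standard library's `DefaultHasher`, not the project's implementation. The sizing parameters and the choice of which bytes get inserted (here, opaque keys) are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn seeded_hash(seed: u64, item: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// Minimal Bloom filter: no false negatives, tunable false-positive rate.
pub struct Bloom {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl Bloom {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        let num_bits = num_bits.max(64);
        let words = ((num_bits + 63) / 64) as usize;
        Bloom { bits: vec![0; words], num_bits, num_hashes: num_hashes.max(1) }
    }

    /// Double hashing: position_i = h1 + i * h2 (mod num_bits).
    fn bit_index(&self, item: &[u8], i: u32) -> usize {
        let h1 = seeded_hash(0, item);
        let h2 = seeded_hash(1, item) | 1;
        (h1.wrapping_add((i as u64).wrapping_mul(h2)) % self.num_bits) as usize
    }

    pub fn insert(&mut self, item: &[u8]) {
        for i in 0..self.num_hashes {
            let idx = self.bit_index(item, i);
            self.bits[idx / 64] |= 1u64 << (idx % 64);
        }
    }

    /// `false` means "definitely not seen"; `true` means "maybe seen".
    pub fn maybe_contains(&self, item: &[u8]) -> bool {
        (0..self.num_hashes).all(|i| {
            let idx = self.bit_index(item, i);
            (self.bits[idx / 64] & (1u64 << (idx % 64))) != 0
        })
    }

    /// Rebuild from persisted keys, as done on startup from the `MH:` scan.
    pub fn rebuild<'a, I>(keys: I, num_bits: u64, num_hashes: u32) -> Self
    where
        I: IntoIterator<Item = &'a [u8]>,
    {
        let mut bloom = Bloom::new(num_bits, num_hashes);
        for key in keys {
            bloom.insert(key);
        }
        bloom
    }
}
```

Because the filter lives only in memory, rebuilding it from persisted signatures restores the "definitely not seen" guarantee after a restart without storing the bit array itself.
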
## Admin Workflow

1. Agent submits assertion
2. Content defense flags it as duplicate
3. Assertion stored at `QUAR:{ts}:{hash}`, NOT indexed
4. Admin lists pending: `GET /v1/admin/quarantine`
5. Admin reviews: `GET /v1/admin/quarantine/{hash}` (includes bytes)
6. Admin approves: `POST .../approve` → returns bytes for indexing
7. Or admin rejects: `POST .../reject` → remains quarantined, logged

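
The review steps above amount to a small state machine over the `reviewed` and `approved` flags. The sketch below is hypothetical: `QuarantineEntry`, `ReviewError`, and the rule that an already-reviewed entry cannot be reviewed again are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ReviewError {
    /// Assumed rule: an entry can be reviewed only once.
    AlreadyReviewed,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct QuarantineEntry {
    pub assertion_bytes: Vec<u8>,
    pub reviewed: bool,
    pub approved: bool,
}

impl QuarantineEntry {
    pub fn new(assertion_bytes: Vec<u8>) -> Self {
        QuarantineEntry { assertion_bytes, reviewed: false, approved: false }
    }

    /// Pending entries are the ones listed by default.
    pub fn is_pending(&self) -> bool {
        !self.reviewed
    }

    /// Approve: mark as reviewed and hand the bytes back for indexing.
    pub fn approve(&mut self) -> Result<Vec<u8>, ReviewError> {
        if self.reviewed {
            return Err(ReviewError::AlreadyReviewed);
        }
        self.reviewed = true;
        self.approved = true;
        Ok(self.assertion_bytes.clone())
    }

    /// Reject: mark as reviewed; the bytes stay in quarantine.
    pub fn reject(&mut self) -> Result<(), ReviewError> {
        if self.reviewed {
            return Err(ReviewError::AlreadyReviewed);
        }
        self.reviewed = true;
        self.approved = false;
        Ok(())
    }
}
```

Keeping the bytes after rejection matches step 7, where a rejected assertion remains quarantined rather than being deleted.
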
## Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `assertions_quarantined` | Counter | Total quarantined assertions |
| `assertions_approved` | Counter | Admin-approved assertions |
| `assertions_rejected` | Counter | Admin-rejected assertions |
| `content_defense_check_duration_seconds` | Histogram | Check latency |
| `similarity_index_size` | Gauge | Number of MinHash signatures |

## Future: Phase 7D (Circuit Breakers)

Phase 7D will build on this foundation:

- Per-agent circuit breakers for repeated bad behavior
- Automatic recovery with exponential backoff
- Integration with quarantine triggers