stemedb/ai-lookup/features/content-defense.md

# Content Defense (The Shield)

Phase 7C introduces content defense mechanisms to detect spam, near-duplicates, and suspicious assertions before they enter the knowledge graph.

## Overview

Content Defense provides three layers of protection:
1. **MinHash + LSH**: Near-duplicate detection with O(1) average-case lookup
2. **Quality Scoring**: Heuristic-based spam detection (entropy, length, structure)
3. **Quarantine Store**: Suspicious assertions held for admin review

Assertions that fail these checks are quarantined rather than indexed, keeping the knowledge graph clean while preserving the data for manual review.

## Key Concepts

### Quarantine Reasons

| Reason | Description | Trigger |
|--------|-------------|---------|
| `LowQuality` | Content failed quality checks | score < 0.4 |
| `Duplicate` | Near-duplicate detected | Jaccard >= 0.9 |
| `UntrustedHighConfidence` | Untrusted agent asserts high confidence | trust < 0.5 AND confidence > 0.8 |
| `PatternMatch` | Known spam pattern | Pattern match |
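
The reasons above map naturally onto an enum. The following is a minimal sketch, not the actual `stemedb-core` definition: the `wire_name()` helper is hypothetical, and the description and wire strings for `UntrustedHighConfidence` and `PatternMatch` are assumed (only the `duplicate` and `low_quality` strings appear in the API examples in this document).

```rust
/// Illustrative mirror of the quarantine reasons in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuarantineReason {
    LowQuality,
    Duplicate,
    UntrustedHighConfidence,
    PatternMatch,
}

impl QuarantineReason {
    /// Human-readable description. The first two strings follow the API
    /// examples in this document; the last two are assumed wording.
    pub fn description(&self) -> &'static str {
        match *self {
            QuarantineReason::LowQuality => "Content failed quality checks.",
            QuarantineReason::Duplicate => "Near-duplicate of existing assertion detected.",
            QuarantineReason::UntrustedHighConfidence => "Untrusted agent asserted high confidence.",
            QuarantineReason::PatternMatch => "Content matched a known spam pattern.",
        }
    }

    /// Hypothetical helper: snake_case name for JSON output.
    /// The last two names are extrapolated, not taken from the source.
    pub fn wire_name(&self) -> &'static str {
        match *self {
            QuarantineReason::LowQuality => "low_quality",
            QuarantineReason::Duplicate => "duplicate",
            QuarantineReason::UntrustedHighConfidence => "untrusted_high_confidence",
            QuarantineReason::PatternMatch => "pattern_match",
        }
    }
}
```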

### Quality Scoring

The quality score is computed from multiple signals:

| Component | Weight | Description |
|-----------|--------|-------------|
| Entropy | 40% | Shannon entropy in bits/char (low = repetitive content) |
| Length | 20% | Subject/predicate length (min 3 chars each) |
| Structure | 20% | Bonus for structured data (JSON, URLs, numbers) |
| Trust Pattern | 20% | Penalty for untrusted + high confidence |

Threshold: `score < 0.4` triggers quarantine.
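
As a rough illustration of how the weights combine, the sketch below computes Shannon entropy per character and a weighted composite. It assumes each component has already been normalised to [0.0, 1.0]; the normalisation itself, the `Components` struct, and the function names are hypothetical and not taken from `ContentQualityScorer`.

```rust
use std::collections::HashMap;

/// Shannon entropy of `text` in bits per character.
pub fn shannon_entropy(text: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in text.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Per-component sub-scores, each assumed to be normalised to [0.0, 1.0].
pub struct Components {
    pub entropy: f64,
    pub length: f64,
    pub structure: f64,
    pub trust_pattern: f64,
}

/// Weighted sum using the 40/20/20/20 weights from the table above.
pub fn composite_score(c: &Components) -> f64 {
    let s = 0.4 * c.entropy + 0.2 * c.length + 0.2 * c.structure + 0.2 * c.trust_pattern;
    s.clamp(0.0, 1.0)
}

/// Quarantine trigger from this document: score < 0.4.
pub fn is_low_quality(score: f64) -> bool {
    score < 0.4
}
```

A string such as `"aaaa"` has zero entropy because it contains a single repeated symbol, which is the kind of content the entropy component is meant to penalise.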

### Similarity Detection

MinHash + LSH parameters:
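- **MinHash k=128**: 128 hash functions per signature
- **LSH 16 bands x 8 rows**: 16 x 8 = 128 signature values; about 99.99% candidate recall at 0.9 Jaccard under the standard banding formula `1 - (1 - s^8)^16`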
- **Bloom filter**: Fast "definitely not duplicate" pre-check
- **Shingle size**: 3 characters (language-agnostic)
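
The parameters above can be illustrated with a small, self-contained MinHash sketch. It is not the `SimilarityIndex` implementation: the seeded use of the standard library's `DefaultHasher` stands in for whatever hash family the real code uses, and all function names are hypothetical. The last function evaluates the textbook banding formula 1 - (1 - s^r)^b, which for 16 bands of 8 rows gives a candidate probability of just under 99.99% at s = 0.9.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

pub const NUM_HASHES: usize = 128;
pub const SHINGLE_SIZE: usize = 3;

/// Character-level shingles (by Unicode scalar value, not by byte).
pub fn shingles(text: &str) -> HashSet<String> {
    let chars: Vec<char> = text.chars().collect();
    if chars.len() < SHINGLE_SIZE {
        // Assumption: short inputs become a single shingle.
        return std::iter::once(text.to_string()).collect();
    }
    chars
        .windows(SHINGLE_SIZE)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// One member of a hash family, selected by `seed` (illustrative choice).
fn seeded_hash(seed: u64, shingle: &str) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    shingle.hash(&mut h);
    h.finish()
}

/// MinHash signature: the minimum hash over all shingles, per hash function.
pub fn minhash_signature(text: &str) -> Vec<u64> {
    let sh = shingles(text);
    (0..NUM_HASHES as u64)
        .map(|seed| {
            sh.iter()
                .map(|s| seeded_hash(seed, s))
                .min()
                .unwrap_or(u64::MAX)
        })
        .collect()
}

/// Fraction of matching signature positions, an estimate of Jaccard similarity.
pub fn estimated_jaccard(a: &[u64], b: &[u64]) -> f64 {
    let matches = a.iter().zip(b.iter()).filter(|(x, y)| x == y).count();
    matches as f64 / a.len().max(1) as f64
}

/// Probability that two items with Jaccard similarity `s` share at least
/// one LSH bucket, given `bands` bands of `rows` rows each.
pub fn candidate_probability(s: f64, bands: i32, rows: i32) -> f64 {
    1.0 - (1.0 - s.powi(rows)).powi(bands)
}
```

Character shingles avoid tokenisation, which is why the approach works across languages; the trade-off is that very short strings produce few shingles and therefore noisy estimates.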

## HTTP API

### GET /v1/admin/quarantine

List pending quarantined assertions.

**Query Parameters:**
- `limit` (optional): Maximum events to return (default: 100)
- `include_reviewed` (optional): Include reviewed events (default: false)

**Response:**
```json
{
  "quarantined": [
    {
      "hash": "abc123...",
      "reason": "duplicate",
      "reason_description": "Near-duplicate of existing assertion detected.",
      "quality": {
        "score": 0.35,
        "entropy": 2.1,
        "structured": false,
        "duplicate": true
      },
      "timestamp": 1706918400000000000,
      "reviewed": false,
      "similar_to": "def456..."
    }
  ],
  "count": 1,
  "pending_count": 1
}
```

### GET /v1/admin/quarantine/{hash}

Get a single quarantine event with assertion bytes.

**Response:**
```json
{
  "event": {
    "hash": "abc123...",
    "assertion_bytes_hex": "...",
    "assertion_bytes_base64": "...",
    "reason": "low_quality",
    "reason_description": "Content failed quality checks.",
    "quality": { ... },
    "timestamp": 1706918400000000000,
    "reviewed": false
  }
}
```

### POST /v1/admin/quarantine/{hash}/approve

Approve a quarantined assertion for indexing.

**Response:**
```json
{
  "hash": "abc123...",
  "message": "Assertion approved and ready for indexing",
  "assertion_bytes_hex": "..."
}
```

### POST /v1/admin/quarantine/{hash}/reject

Reject a quarantined assertion permanently.

**Response:**
```json
{
  "hash": "abc123...",
  "message": "Assertion rejected"
}
```

## Implementation Details

### Core Types

**ContentQuality** (`stemedb-core/src/types/content_defense.rs`):
- `score`: Overall quality [0.0, 1.0]
- `entropy`: Shannon entropy (bits/char)
- `structured`: Has structured data
- `duplicate`: Is near-duplicate

**QuarantineReason** (`stemedb-core/src/types/content_defense.rs`):
- Enum: LowQuality, Duplicate, UntrustedHighConfidence, PatternMatch
- Method: `description()` returns human-readable string

**QuarantineEvent** (`stemedb-core/src/types/content_defense.rs`):
- `hash`: BLAKE3 hash of assertion
- `assertion_bytes`: Original serialized assertion
- `reason`: Why quarantined
- `quality`: Quality metrics at quarantine time
- `reviewed`/`approved`: Admin review status

### Storage

**QuarantineStore** (`stemedb-storage/src/quarantine_store.rs`):
- Primary key: `QUAR:{timestamp}:{hash_hex}` (time-ordered scan)
- Index key: `QUAR_IDX:{hash_hex}` → timestamp (O(1) hash lookup)
- Methods: `write_quarantine()`, `get_quarantine()`, `list_pending()`, `approve()`, `reject()`
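
The two-key layout can be sketched with an in-memory `BTreeMap` standing in for the real key-value store. This is an assumption-laden illustration: the `Event` struct is trimmed down, the method signatures are guesses, and the timestamp is zero-padded to 20 digits so that lexicographic key order matches time order (the source does not say how the timestamp is encoded).

```rust
use std::collections::BTreeMap;

/// Trimmed-down stand-in for `QuarantineEvent` (fields are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub hash_hex: String,
    pub timestamp: u64,
    pub reviewed: bool,
    pub approved: bool,
}

/// Primary key. Zero-padding is an assumption made for sortable keys.
pub fn primary_key(timestamp: u64, hash_hex: &str) -> String {
    format!("QUAR:{:020}:{}", timestamp, hash_hex)
}

/// Index key mapping a hash to its timestamp.
pub fn index_key(hash_hex: &str) -> String {
    format!("QUAR_IDX:{}", hash_hex)
}

#[derive(Default)]
pub struct MemQuarantineStore {
    events: BTreeMap<String, Event>, // primary records, time-ordered
    index: BTreeMap<String, u64>,    // hash -> timestamp
}

impl MemQuarantineStore {
    pub fn write_quarantine(&mut self, hash_hex: &str, timestamp: u64) {
        let ev = Event {
            hash_hex: hash_hex.to_string(),
            timestamp,
            reviewed: false,
            approved: false,
        };
        self.events.insert(primary_key(timestamp, hash_hex), ev);
        self.index.insert(index_key(hash_hex), timestamp);
    }

    /// Hash lookup: index key first, then the primary record.
    pub fn get_quarantine(&self, hash_hex: &str) -> Option<&Event> {
        let ts = *self.index.get(&index_key(hash_hex))?;
        self.events.get(&primary_key(ts, hash_hex))
    }

    /// Oldest-first scan over unreviewed events.
    pub fn list_pending(&self, limit: usize) -> Vec<&Event> {
        self.events.values().filter(|e| !e.reviewed).take(limit).collect()
    }

    fn review(&mut self, hash_hex: &str, approved: bool) -> bool {
        let ts = match self.index.get(&index_key(hash_hex)).copied() {
            Some(ts) => ts,
            None => return false,
        };
        match self.events.get_mut(&primary_key(ts, hash_hex)) {
            Some(ev) => {
                ev.reviewed = true;
                ev.approved = approved;
                true
            }
            None => false,
        }
    }

    pub fn approve(&mut self, hash_hex: &str) -> bool {
        self.review(hash_hex, true)
    }

    pub fn reject(&mut self, hash_hex: &str) -> bool {
        self.review(hash_hex, false)
    }
}
```

Keeping the timestamp first in the primary key makes "list pending" a simple ordered scan, while the index key avoids scanning when an admin looks up a single hash.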

**SimilarityIndex** (`stemedb-storage/src/similarity_index/`):
- MinHash signature: `MH:{content_hash_hex}` → 1KB signature
- LSH bucket: `LSH:{band:02}:{bucket_hash_hex}` → member list
- Bloom filter: In-memory, rebuilt from `MH:` scan on startup
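
To show how a 128-value signature might turn into `LSH:` bucket keys, here is a minimal sketch that splits the signature into 16 bands of 8 rows and hashes each band. The use of `DefaultHasher` and the 16-hex-digit bucket encoding are assumptions for illustration; the real index may hash and encode buckets differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub const BANDS: usize = 16;
pub const ROWS: usize = 8;

/// One bucket key per band, following the `LSH:{band:02}:{bucket_hash_hex}` shape.
pub fn lsh_bucket_keys(signature: &[u64]) -> Vec<String> {
    assert_eq!(signature.len(), BANDS * ROWS, "signature must have 128 values");
    signature
        .chunks(ROWS)
        .enumerate()
        .map(|(band, rows)| {
            // Hash the 8 values of this band into a single bucket id.
            let mut h = DefaultHasher::new();
            rows.hash(&mut h);
            format!("LSH:{:02}:{:016x}", band, h.finish())
        })
        .collect()
}

/// Two signatures are candidates if they land in the same bucket for
/// at least one band.
pub fn shares_bucket(a: &[u64], b: &[u64]) -> bool {
    let ka = lsh_bucket_keys(a);
    let kb = lsh_bucket_keys(b);
    ka.iter().zip(kb.iter()).any(|(x, y)| x == y)
}
```

Because the band number is part of the key, a match in band 3 of one signature can never collide with band 7 of another, which keeps the per-band comparison semantics intact.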

### Ingestion Integration

**ContentDefenseLayer** (`stemedb-ingest/src/content_defense.rs`):
- Orchestrates Bloom filter → LSH → Quality scoring
- Returns `QuarantineDecision::Pass` or `QuarantineDecision::Quarantine(reason)`
- Hooks into `process_record()` after signature verification

### Quality Scoring

**ContentQualityScorer** (`stemedb-storage/src/content_defense/quality.rs`):
- `score()` computes composite quality metric
- Configurable thresholds via `QualityScoringConfig`
- Default thresholds:
  - Min subject length: 3
  - Min predicate length: 3
  - Min entropy: 1.5 bits/char
  - Quality threshold: 0.4
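
The defaults listed above could be expressed as a `Default` implementation. The field names and the `lengths_ok()` helper in this sketch are hypothetical; only the numeric values come from this document.

```rust
/// Illustrative shape for `QualityScoringConfig`; field names are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct QualityScoringConfig {
    pub min_subject_len: usize,
    pub min_predicate_len: usize,
    pub min_entropy_bits_per_char: f64,
    pub quality_threshold: f64,
}

impl Default for QualityScoringConfig {
    fn default() -> Self {
        QualityScoringConfig {
            min_subject_len: 3,
            min_predicate_len: 3,
            min_entropy_bits_per_char: 1.5,
            quality_threshold: 0.4,
        }
    }
}

impl QualityScoringConfig {
    /// Hypothetical helper: checks both minimum lengths, counted in characters.
    pub fn lengths_ok(&self, subject: &str, predicate: &str) -> bool {
        subject.chars().count() >= self.min_subject_len
            && predicate.chars().count() >= self.min_predicate_len
    }
}
```

A caller wanting a stricter threshold could override a single field with struct update syntax and keep the remaining defaults.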

## Flow Diagram

```
[Assertion arrives]
        |
        v
[Signature verification] ──── FAIL ────> [Reject]
        |
        PASS
        |
        v
[Bloom filter check] ──── "definitely not seen" ─────────────+
        |                                                    |
        "maybe seen"                                         |
        |                                                    |
        v                                                    |
[MinHash + LSH lookup] ────> [Jaccard >= 0.9?]               |
        |                           |                        |
        NO                     YES: Quarantine(Duplicate)    |
        |                                                    |
        v <──────────────────────────────────────────────────+
[Quality scoring]
        |
        v
[Score < 0.4?] ────> YES: Quarantine(LowQuality)
        |
        NO
        |
        v
[Untrusted + confidence > 0.8?] ────> YES: Quarantine(UntrustedHighConfidence)
        |
        NO
        |
        v
[Pass] ────> [Store, Index, Broadcast]
```
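
The order of checks in the diagram can be sketched as a single decision function. This is a simplified illustration rather than the `ContentDefenseLayer` code: the `Signals` struct is hypothetical and assumes the Bloom filter, LSH lookup, and quality scorer have already produced their outputs. `PatternMatch` is left out because the diagram does not show where that check runs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuarantineReason {
    LowQuality,
    Duplicate,
    UntrustedHighConfidence,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuarantineDecision {
    Pass,
    Quarantine(QuarantineReason),
}

/// Hypothetical bundle of pre-computed inputs to the decision.
pub struct Signals {
    /// Bloom filter answer: false means "definitely not seen".
    pub bloom_maybe_seen: bool,
    /// Highest estimated Jaccard among LSH candidates, if any.
    pub best_jaccard: Option<f64>,
    pub quality_score: f64,
    pub trust: f64,
    pub confidence: f64,
}

pub fn decide(s: &Signals) -> QuarantineDecision {
    // The similarity lookup only runs when the Bloom filter says "maybe seen".
    if s.bloom_maybe_seen {
        if let Some(j) = s.best_jaccard {
            if j >= 0.9 {
                return QuarantineDecision::Quarantine(QuarantineReason::Duplicate);
            }
        }
    }
    if s.quality_score < 0.4 {
        return QuarantineDecision::Quarantine(QuarantineReason::LowQuality);
    }
    if s.trust < 0.5 && s.confidence > 0.8 {
        return QuarantineDecision::Quarantine(QuarantineReason::UntrustedHighConfidence);
    }
    QuarantineDecision::Pass
}
```

Because the duplicate check comes first, an assertion that is both a near-duplicate and low quality is reported as `Duplicate`, which matches the first API example where the reason is `duplicate` even though the score is 0.35.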

## Security Properties

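- **Probabilistic Dedup**: The Bloom filter can report false positives but never false negatives; LSH can both miss true near-duplicates and surface non-duplicate candidates
- **No False Rejections**: Flagged assertions are quarantined rather than discarded, so an admin can approve a false positive
- **Rebuild on Startup**: Bloom filter rebuilt from persisted MinHash signatures
- **O(1) Lookups**: LSH buckets and the hash index enable average-case constant-time checks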
- **Separate from Trust**: Content defense is orthogonal to EigenTrust
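
The Bloom filter's role as a pre-check rests on one property: an item that was inserted is always reported as possibly present. The sketch below demonstrates that with a deliberately simple filter. The bit-vector layout, the seeded `DefaultHasher`, and the `rebuild()` signature are assumptions for illustration, not the actual in-memory structure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        BloomFilter {
            bits: vec![false; num_bits.max(1)],
            num_hashes,
        }
    }

    /// Bit position for `item` under the hash function selected by `seed`.
    fn index(&self, item: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        item.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    pub fn insert(&mut self, item: &str) {
        for seed in 0..self.num_hashes {
            let i = self.index(item, seed);
            self.bits[i] = true;
        }
    }

    /// false => definitely never inserted; true => possibly inserted.
    pub fn maybe_contains(&self, item: &str) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(item, seed)])
    }

    /// Rebuild from persisted keys, as would happen on startup.
    pub fn rebuild<'a, I>(num_bits: usize, num_hashes: u64, keys: I) -> Self
    where
        I: IntoIterator<Item = &'a str>,
    {
        let mut bf = BloomFilter::new(num_bits, num_hashes);
        for k in keys {
            bf.insert(k);
        }
        bf
    }
}
```

A "definitely not seen" answer lets ingestion skip the LSH lookup entirely, while a "maybe seen" answer only costs the extra lookup and never causes a duplicate to be missed at this stage.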

## Admin Workflow

1. Agent submits assertion
2. Content defense flags it (for example, as a duplicate)
3. Assertion stored at `QUAR:{timestamp}:{hash_hex}`, NOT indexed
4. Admin lists pending: `GET /v1/admin/quarantine`
5. Admin reviews: `GET /v1/admin/quarantine/{hash}` (includes bytes)
6. Admin approves: `POST .../approve` → returns bytes for indexing
7. Or admin rejects: `POST .../reject` → remains quarantined, logged
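
For tooling that drives this workflow, the four admin endpoints can be captured in a small request type. This is a client-side sketch only: the `AdminRequest` enum and `method_and_path()` are hypothetical names, and the code assumes the hash is plain hex so no URL escaping is needed.

```rust
/// Hypothetical client-side description of the admin endpoints.
pub enum AdminRequest<'a> {
    ListPending { limit: u32, include_reviewed: bool },
    Get { hash: &'a str },
    Approve { hash: &'a str },
    Reject { hash: &'a str },
}

impl<'a> AdminRequest<'a> {
    /// HTTP method and path (with query string) for this request.
    pub fn method_and_path(&self) -> (&'static str, String) {
        match *self {
            AdminRequest::ListPending { limit, include_reviewed } => (
                "GET",
                format!(
                    "/v1/admin/quarantine?limit={}&include_reviewed={}",
                    limit, include_reviewed
                ),
            ),
            AdminRequest::Get { hash } => {
                ("GET", format!("/v1/admin/quarantine/{}", hash))
            }
            AdminRequest::Approve { hash } => {
                ("POST", format!("/v1/admin/quarantine/{}/approve", hash))
            }
            AdminRequest::Reject { hash } => {
                ("POST", format!("/v1/admin/quarantine/{}/reject", hash))
            }
        }
    }
}
```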

## Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `assertions_quarantined` | Counter | Total quarantined assertions |
| `assertions_approved` | Counter | Admin-approved assertions |
| `assertions_rejected` | Counter | Admin-rejected assertions |
| `content_defense_check_duration_seconds` | Histogram | Check latency |
| `similarity_index_size` | Gauge | Number of MinHash signatures |

## Future: Phase 7D (Circuit Breakers)

Phase 7D will build on this foundation:
- Per-agent circuit breakers for repeated bad behavior
- Automatic recovery with exponential backoff
- Integration with quarantine triggers