stemedb/ai-lookup/features/content-defense.md
jordan a734be3a0d feat: Phase 7 Content Defense + code structure refactoring
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:44:05 -07:00

Content Defense (The Shield)

Phase 7C introduces content defense mechanisms to detect spam, near-duplicates, and suspicious assertions before they enter the knowledge graph.

Overview

Content Defense provides three layers of protection:

  1. MinHash + LSH: Near-duplicate detection with O(1) average-case lookup
  2. Quality Scoring: Heuristic-based spam detection (entropy, length, structure)
  3. Quarantine Store: Suspicious assertions held for admin review

Assertions that fail these checks are quarantined rather than indexed, keeping the knowledge graph clean while preserving the data for manual review.

Key Concepts

Quarantine Reasons

| Reason | Description | Trigger |
|---|---|---|
| LowQuality | Content failed quality checks | score < 0.4 |
| Duplicate | Near-duplicate detected | Jaccard >= 0.9 |
| UntrustedHighConfidence | Suspicious pattern | trust < 0.5 AND confidence > 0.8 |
| PatternMatch | Known spam pattern | Pattern match |
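
The triggers in the table can be read as a chain of threshold checks. The following is a minimal sketch, not the actual implementation: the `Signals` struct and its field names are hypothetical, and the check order (duplicate, then quality, then trust, then pattern) is an assumption based on the flow diagram later in this document.

```rust
/// Quarantine reasons; variant names follow the table above.
#[derive(Debug, PartialEq)]
enum QuarantineReason {
    LowQuality,
    Duplicate,
    UntrustedHighConfidence,
    PatternMatch,
}

/// Hypothetical per-assertion inputs (field names are assumptions).
struct Signals {
    quality_score: f64,         // composite quality in [0.0, 1.0]
    max_jaccard: f64,           // best similarity against indexed content
    agent_trust: f64,           // trust of the submitting agent
    confidence: f64,            // confidence claimed by the assertion
    matches_spam_pattern: bool, // result of a separate pattern matcher
}

/// Returns the first matching reason, or None if the assertion passes.
fn classify(s: &Signals) -> Option<QuarantineReason> {
    if s.max_jaccard >= 0.9 {
        return Some(QuarantineReason::Duplicate);
    }
    if s.quality_score < 0.4 {
        return Some(QuarantineReason::LowQuality);
    }
    if s.agent_trust < 0.5 && s.confidence > 0.8 {
        return Some(QuarantineReason::UntrustedHighConfidence);
    }
    if s.matches_spam_pattern {
        return Some(QuarantineReason::PatternMatch);
    }
    None
}
```

Returning `Option` keeps the "pass" case explicit; an assertion that matches several triggers is reported under the first one only.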

Quality Scoring

The quality score is computed from multiple signals:

| Component | Weight | Description |
|---|---|---|
| Entropy | 40% | Shannon entropy in bits/char (low entropy indicates repetitive content) |
| Length | 20% | Subject/predicate length (min 3 chars each) |
| Structure | 20% | Bonus for structured data (JSON, URLs, numbers) |
| Trust Pattern | 20% | Penalty for untrusted + high confidence |

Threshold: score < 0.4 triggers quarantine.
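
To make the weighting concrete, here is a small sketch of a Shannon entropy function and a weighted composite. It assumes each component has already been normalized to a sub-score in [0.0, 1.0], where 1.0 means "no concern"; how the real scorer normalizes its components is not stated here, so the `Components` struct and `composite_score` function are illustrative only.

```rust
use std::collections::HashMap;

/// Shannon entropy of `text` in bits per character.
fn shannon_entropy(text: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in text.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical sub-scores, each assumed to be normalized to [0.0, 1.0].
struct Components {
    entropy: f64,
    length: f64,
    structure: f64,
    trust_pattern: f64,
}

/// Weighted sum using the 40/20/20/20 split from the table.
fn composite_score(c: &Components) -> f64 {
    let score = 0.4 * c.entropy + 0.2 * c.length + 0.2 * c.structure + 0.2 * c.trust_pattern;
    score.clamp(0.0, 1.0)
}
```

With these weights, an assertion whose entropy sub-score is zero can still reach at most 0.6, so entropy alone does not force a quarantine; it takes a second weak component to drop below 0.4.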

Similarity Detection

MinHash + LSH parameters:

  • MinHash k=128: 128 hash functions per signature
  • LSH 16 bands x 8 rows: a pair at 0.9 Jaccard shares at least one band with probability 1 - (1 - 0.9^8)^16, about 99.99% recall
  • Bloom filter: Fast "definitely not duplicate" pre-check
  • Shingle size: 3 characters (language-agnostic)
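
The parameters above can be illustrated with a compact MinHash sketch. This is not the project's `SimilarityIndex`: it uses the standard library's `DefaultHasher` with a seed prefix as a stand-in for 128 independent hash functions, and all function names are hypothetical. `candidate_probability` evaluates the standard banding formula 1 - (1 - s^r)^b for r = 8 rows and b = 16 bands.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

const NUM_HASHES: usize = 128;
const BANDS: usize = 16;
const ROWS: usize = 8;

/// Overlapping 3-character shingles; shorter inputs become a single shingle.
fn shingles(text: &str) -> HashSet<String> {
    let chars: Vec<char> = text.chars().collect();
    if chars.len() < 3 {
        return std::iter::once(text.to_string()).collect();
    }
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// One member of a hash family, selected by `seed` (a stand-in for independent hashes).
fn seeded_hash(seed: u64, item: &str) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// MinHash signature: the minimum hash of the set under each of the 128 functions.
fn minhash(set: &HashSet<String>) -> Vec<u64> {
    (0..NUM_HASHES as u64)
        .map(|seed| set.iter().map(|s| seeded_hash(seed, s)).min().unwrap_or(u64::MAX))
        .collect()
}

/// Fraction of matching signature positions, an estimate of Jaccard similarity.
fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}

/// Probability that two items with Jaccard similarity `s` share at least one LSH band.
fn candidate_probability(s: f64) -> f64 {
    1.0 - (1.0 - s.powi(ROWS as i32)).powi(BANDS as i32)
}
```

The banding formula produces a steep S-curve: pairs near the 0.9 threshold almost always collide in some band, while pairs at 0.5 similarity rarely do, which keeps the candidate lists short.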

HTTP API

GET /v1/admin/quarantine

List pending quarantined assertions.

Query Parameters:

  • limit (optional): Maximum events to return (default: 100)
  • include_reviewed (optional): Include reviewed events (default: false)

Response:

{
  "quarantined": [
    {
      "hash": "abc123...",
      "reason": "duplicate",
      "reason_description": "Near-duplicate of existing assertion detected.",
      "quality": {
        "score": 0.35,
        "entropy": 2.1,
        "structured": false,
        "duplicate": true
      },
      "timestamp": 1706918400000000000,
      "reviewed": false,
      "similar_to": "def456..."
    }
  ],
  "count": 1,
  "pending_count": 1
}

GET /v1/admin/quarantine/{hash}

Get a single quarantine event with assertion bytes.

Response:

{
  "event": {
    "hash": "abc123...",
    "assertion_bytes_hex": "...",
    "assertion_bytes_base64": "...",
    "reason": "low_quality",
    "reason_description": "Content failed quality checks.",
    "quality": { ... },
    "timestamp": 1706918400000000000,
    "reviewed": false
  }
}

POST /v1/admin/quarantine/{hash}/approve

Approve a quarantined assertion for indexing.

Response:

{
  "hash": "abc123...",
  "message": "Assertion approved and ready for indexing",
  "assertion_bytes_hex": "..."
}

POST /v1/admin/quarantine/{hash}/reject

Reject a quarantined assertion permanently.

Response:

{
  "hash": "abc123...",
  "message": "Assertion rejected"
}

Implementation Details

Core Types

ContentQuality (stemedb-core/src/types/content_defense.rs):

  • score: Overall quality [0.0, 1.0]
  • entropy: Shannon entropy (bits/char)
  • structured: Has structured data
  • duplicate: Is near-duplicate

QuarantineReason (stemedb-core/src/types/content_defense.rs):

  • Enum: LowQuality, Duplicate, UntrustedHighConfidence, PatternMatch
  • Method: description() returns human-readable string

QuarantineEvent (stemedb-core/src/types/content_defense.rs):

  • hash: BLAKE3 hash of assertion
  • assertion_bytes: Original serialized assertion
  • reason: Why quarantined
  • quality: Quality metrics at quarantine time
  • reviewed/approved: Admin review status

Storage

QuarantineStore (stemedb-storage/src/quarantine_store.rs):

  • Primary key: QUAR:{timestamp}:{hash_hex} (time-ordered scan)
  • Index key: QUAR_IDX:{hash_hex} → timestamp (O(1) hash lookup)
  • Methods: write_quarantine(), get_quarantine(), list_pending(), approve(), reject()
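
The two-key layout can be sketched with an in-memory `BTreeMap` standing in for the real key-value store. This is illustrative only: `MemQuarantine` and its methods are hypothetical, and the zero-padded 20-digit timestamp is an assumption made so that lexicographic key order matches time order.

```rust
use std::collections::BTreeMap;

/// Primary key: QUAR:{timestamp}:{hash_hex}; timestamp zero-padded (assumption).
fn primary_key(timestamp_ns: u64, hash_hex: &str) -> String {
    format!("QUAR:{:020}:{}", timestamp_ns, hash_hex)
}

/// Index key: QUAR_IDX:{hash_hex}, whose value is the timestamp.
fn index_key(hash_hex: &str) -> String {
    format!("QUAR_IDX:{}", hash_hex)
}

/// In-memory stand-in for the ordered key-value store.
struct MemQuarantine {
    kv: BTreeMap<String, Vec<u8>>,
}

impl MemQuarantine {
    fn new() -> Self {
        MemQuarantine { kv: BTreeMap::new() }
    }

    /// Writes the event under its primary key and records hash -> timestamp.
    fn write(&mut self, timestamp_ns: u64, hash_hex: &str, assertion_bytes: &[u8]) {
        self.kv.insert(primary_key(timestamp_ns, hash_hex), assertion_bytes.to_vec());
        self.kv.insert(index_key(hash_hex), timestamp_ns.to_be_bytes().to_vec());
    }

    /// Hash lookup: read the timestamp from the index, then fetch the primary row.
    fn get(&self, hash_hex: &str) -> Option<Vec<u8>> {
        let ts_bytes = self.kv.get(&index_key(hash_hex))?;
        if ts_bytes.len() != 8 {
            return None;
        }
        let mut buf = [0u8; 8];
        buf.copy_from_slice(ts_bytes);
        self.kv.get(&primary_key(u64::from_be_bytes(buf), hash_hex)).cloned()
    }

    /// Time-ordered scan over the QUAR: prefix only (';' is the byte after ':').
    fn scan_oldest_first(&self) -> Vec<String> {
        self.kv
            .range("QUAR:".to_string().."QUAR;".to_string())
            .map(|(k, _)| k.clone())
            .collect()
    }
}
```

Because `_` sorts after `;`, the `QUAR_IDX:` entries fall outside the scanned range, so the index never pollutes the time-ordered listing.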

SimilarityIndex (stemedb-storage/src/similarity_index/):

  • MinHash signature: MH:{content_hash_hex} → 1KB signature
  • LSH bucket: LSH:{band:02}:{bucket_hash_hex} → member list
  • Bloom filter: In-memory, rebuilt from MH: scan on startup
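
As a rough illustration of these key formats, the sketch below serializes a 128-value signature and derives one LSH bucket key per band. It assumes 64-bit hash values (128 x 8 bytes matches the 1KB figure above) and uses `DefaultHasher` to hash each band; the real byte order and bucket hash function are not specified here, and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const ROWS: usize = 8; // rows per band; 128 values / 8 rows = 16 bands

/// Serializes a signature as big-endian u64 values (byte order is an assumption).
fn signature_bytes(signature: &[u64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(signature.len() * 8);
    for value in signature {
        out.extend_from_slice(&value.to_be_bytes());
    }
    out
}

/// MH:{content_hash_hex}
fn mh_key(content_hash_hex: &str) -> String {
    format!("MH:{}", content_hash_hex)
}

/// One LSH:{band:02}:{bucket_hash_hex} key per band of 8 signature rows.
fn lsh_keys(signature: &[u64]) -> Vec<String> {
    signature
        .chunks(ROWS)
        .enumerate()
        .map(|(band, rows)| {
            let mut h = DefaultHasher::new();
            rows.hash(&mut h);
            format!("LSH:{:02}:{:016x}", band, h.finish())
        })
        .collect()
}
```

Two signatures that agree on all 8 rows of any band produce the same key for that band, which is how near-duplicates end up in a shared member list.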

Ingestion Integration

ContentDefenseLayer (stemedb-ingest/src/content_defense.rs):

  • Orchestrates Bloom filter → LSH → Quality scoring
  • Returns QuarantineDecision::Pass or QuarantineDecision::Quarantine(reason)
  • Hooks into process_record() after signature verification
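
The first stage of that orchestration relies on a Bloom filter property: a miss is definitive, a hit is only a "maybe". The sketch below shows a toy filter and the resulting branch. It is an illustration under assumptions: the `BloomFilter` and `NextStep` types are hypothetical, and what the real filter is keyed on is not specified here, so an opaque string key is used.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter: `num_hashes` seeded hashes over a fixed bit array.
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        BloomFilter { bits: vec![false; num_bits.max(1)], num_hashes }
    }

    fn slot(&self, seed: u64, item: &str) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        item.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, item: &str) {
        for seed in 0..self.num_hashes {
            let i = self.slot(seed, item);
            self.bits[i] = true;
        }
    }

    /// false means "definitely not seen"; true means "maybe seen".
    fn might_contain(&self, item: &str) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.slot(seed, item)])
    }
}

/// Which stage runs after the Bloom pre-check.
#[derive(Debug, PartialEq)]
enum NextStep {
    LshLookup,      // possible duplicate: confirm with MinHash + LSH
    QualityScoring, // definitely new: skip the similarity lookup
}

fn after_bloom(filter: &BloomFilter, key: &str) -> NextStep {
    if filter.might_contain(key) {
        NextStep::LshLookup
    } else {
        NextStep::QualityScoring
    }
}
```

Since inserted keys always test positive, the pre-check can only waste an LSH lookup on a false positive; it never hides a real duplicate from the later stages.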

Quality Scoring

ContentQualityScorer (stemedb-storage/src/content_defense/quality.rs):

  • score() computes composite quality metric
  • Configurable thresholds via QualityScoringConfig
  • Default thresholds:
    • Min subject length: 3
    • Min predicate length: 3
    • Min entropy: 1.5 bits/char
    • Quality threshold: 0.4
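
The defaults listed above map naturally onto a `Default` implementation. The struct below reuses the `QualityScoringConfig` name for readability, but its field and method names are assumptions for illustration and may not match the real type.

```rust
/// Illustrative stand-in for the scorer configuration (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct QualityScoringConfig {
    min_subject_len: usize,
    min_predicate_len: usize,
    min_entropy_bits_per_char: f64,
    quality_threshold: f64,
}

impl Default for QualityScoringConfig {
    fn default() -> Self {
        QualityScoringConfig {
            min_subject_len: 3,
            min_predicate_len: 3,
            min_entropy_bits_per_char: 1.5,
            quality_threshold: 0.4,
        }
    }
}

impl QualityScoringConfig {
    /// True when both fields meet their minimum length, counted in characters.
    fn length_ok(&self, subject: &str, predicate: &str) -> bool {
        subject.chars().count() >= self.min_subject_len
            && predicate.chars().count() >= self.min_predicate_len
    }

    /// Scores strictly below the threshold are quarantined.
    fn should_quarantine(&self, score: f64) -> bool {
        score < self.quality_threshold
    }
}
```

A deployment that wants a stricter gate could override a single field with struct update syntax, for example `QualityScoringConfig { quality_threshold: 0.5, ..Default::default() }`.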

Flow Diagram

[Assertion arrives]
        |
        v
[Signature verification] ──── FAIL ────> [Reject]
        |
        PASS
        |
        v
[Bloom filter check] ──── "definitely not seen" ─────────────+
        |                                                    |
        "maybe seen"                                         |
        |                                                    |
        v                                                    |
[MinHash + LSH lookup]                                       |
        |                                                    |
        v                                                    |
[Jaccard >= 0.9?] ────> YES: Quarantine(Duplicate)           |
        |                                                    |
        NO                                                   |
        |                                                    |
        v                                                    |
[Quality scoring] <──────────────────────────────────────────+
        |
        v
[Score < 0.4?] ────> YES: Quarantine(LowQuality)
        |
        NO
        |
        v
[Untrusted + confidence > 0.8?] ────> YES: Quarantine(UntrustedHighConfidence)
        |
        NO
        |
        v
[Pass] ────> [Store, Index, Broadcast]

Security Properties

  • Probabilistic Dedup: The Bloom filter can return false positives but never false negatives; LSH can both miss true near-duplicates and surface non-duplicates as candidates
  • No False Rejections: Quarantine preserves data for admin review
  • Rebuild on Startup: Bloom filter rebuilt from persisted MinHash signatures
  • O(1) Lookups: LSH buckets and hash index enable constant-time checks
  • Separate from Trust: Content defense is orthogonal to EigenTrust

Admin Workflow

  1. Agent submits assertion
  2. Content defense flags it as duplicate
  3. Assertion stored at QUAR:{ts}:{hash}, NOT indexed
  4. Admin lists pending: GET /v1/admin/quarantine
  5. Admin reviews: GET /v1/admin/quarantine/{hash} (includes bytes)
  6. Admin approves: POST .../approve → returns bytes for indexing
  7. Or admin rejects: POST .../reject → remains quarantined, logged
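
The workflow above amounts to a small state machine per quarantined assertion. The following in-memory sketch models it under assumptions: the `ReviewQueue` type is hypothetical, and the rule that only pending entries can be approved or rejected is an assumption, not something this document states.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ReviewState {
    Pending,
    Approved,
    Rejected,
}

struct Entry {
    assertion_bytes: Vec<u8>,
    state: ReviewState,
}

/// In-memory stand-in for the quarantine review queue.
struct ReviewQueue {
    entries: HashMap<String, Entry>,
}

impl ReviewQueue {
    fn new() -> Self {
        ReviewQueue { entries: HashMap::new() }
    }

    /// Step 3: store the flagged assertion without indexing it.
    fn quarantine(&mut self, hash_hex: &str, assertion_bytes: Vec<u8>) {
        let entry = Entry { assertion_bytes, state: ReviewState::Pending };
        self.entries.insert(hash_hex.to_string(), entry);
    }

    /// Step 4: hashes still awaiting review, sorted for stable output.
    fn list_pending(&self) -> Vec<String> {
        let mut pending: Vec<String> = self
            .entries
            .iter()
            .filter(|(_, e)| e.state == ReviewState::Pending)
            .map(|(k, _)| k.clone())
            .collect();
        pending.sort();
        pending
    }

    /// Step 6: approve and hand back the bytes so the caller can index them.
    fn approve(&mut self, hash_hex: &str) -> Option<Vec<u8>> {
        let entry = self.entries.get_mut(hash_hex)?;
        if entry.state != ReviewState::Pending {
            return None;
        }
        entry.state = ReviewState::Approved;
        Some(entry.assertion_bytes.clone())
    }

    /// Step 7: reject; the entry stays stored but leaves the pending list.
    fn reject(&mut self, hash_hex: &str) -> bool {
        match self.entries.get_mut(hash_hex) {
            Some(entry) if entry.state == ReviewState::Pending => {
                entry.state = ReviewState::Rejected;
                true
            }
            _ => false,
        }
    }
}
```

Keeping rejected entries in the map mirrors the "remains quarantined" behavior in step 7: the data is still available for audit even though it will never be indexed.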

Metrics

| Metric | Type | Description |
|---|---|---|
| assertions_quarantined | Counter | Total quarantined assertions |
| assertions_approved | Counter | Admin-approved assertions |
| assertions_rejected | Counter | Admin-rejected assertions |
| content_defense_check_duration_seconds | Histogram | Check latency |
| similarity_index_size | Gauge | Number of MinHash signatures |

Future: Phase 7D (Circuit Breakers)

Phase 7D will build on this foundation:

  • Per-agent circuit breakers for repeated bad behavior
  • Automatic recovery with exponential backoff
  • Integration with quarantine triggers