jordan a734be3a0d feat: Phase 7 Content Defense + code structure refactoring

Content Defense (Phase 7):
- Add SimilarityIndex with MinHash/LSH for near-duplicate detection
- Add QuarantineStore for flagged assertions awaiting admin review
- Add CircuitBreakerStore for per-agent circuit breaker state
- Add ContentDefenseLayer for ingestion pipeline integration
- Add API endpoints for quarantine and circuit breaker management
- Add research module with gap detection and documentation fetching

Code Structure Improvements:
- Extract research CLI commands to research_commands.rs
- Extract API routers to routers.rs module
- Extract key_codec extraction functions to separate module
- Extract test modules to separate files across multiple crates
- All files now under 500 line limit per pre-commit hook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-03 12:44:05 -07:00

6.8 KiB

Raw Blame History

Phase 7 UAT: The Shield

Status: Ready for Testing Target Date: 2026-02-03 Confidence: High (7A, 7B complete; 7C core complete)

Summary

Phase 7 (The Shield) defends against spam, Sybil attacks, and knowledge poisoning. This UAT validates the trust-at-scale infrastructure for opening Episteme to millions of agents.

Scope:

7A Admission Control: PoW-based spam protection, trust tiers, graduated quotas
7B EigenTrust: Sybil-resistant global trust propagation
7C Content Defense: Quality scoring, quarantine store, admin API (partial - MinHash/LSH pending)
7D Circuit Breakers: NOT included (pending implementation)

Test Coverage (Verified)

Area	Tests	Status
Trust Graph Store	23	PASS
Trust Rank Store	22	PASS
Domain Trust Store	18	PASS
Admission Store	16	PASS
PoW types	19	PASS
Content Defense (quality)	13	PASS
Quarantine Store	9	PASS
Trust Tier types	8	PASS
API Admission integration	6	PASS
Content Defense Layer	5	PASS
Total Phase 7	139	ALL PASS

Realistic Usage Scenarios

Scenario 1: New Agent Onboarding

Goal: Verify graduated difficulty protects against spam bots while not blocking legitimate agents.

# 1. New agent with no history should require PoW
curl -X GET http://localhost:3000/v1/admission/status \
  -H "X-Agent-Id: 0000000000000000000000000000000000000000000000000000000000000001"
# Expected: 200 with pow_required: true, difficulty: 16

# 2. Submit first assertions with PoW proof
# Agent must solve: BLAKE3(nonce || agent_id || timestamp) has 16 leading zero bits
# This takes ~16 seconds on average

# 3. After 10 assertions, difficulty drops to 1 bit (trivial)
# 4. After 50 assertions OR trust > 0.6, PoW exempt

Acceptance Criteria:

New agents see pow_required: true, difficulty: 16
HTTP 428 returned when PoW missing/invalid
Difficulty graduates: 16 bits (1-10) → 1 bit (11-50) → 0 (51+)
Trusted agents (>0.6) are exempt regardless of assertion count

Scenario 2: Trust Tier Quotas

Goal: Verify rate limiting scales with trust level.

Tier	Trust Range	Quota Multiplier	Hourly Limit
Untrusted	0.0-0.3	0.1x	1,000/hr
Limited	0.3-0.5	0.5x	5,000/hr
Verified	0.5-0.7	1.0x	10,000/hr
Trusted	0.7-0.9	2.0x	20,000/hr
Authority	0.9-1.0	10.0x	100,000/hr

Acceptance Criteria:

Quota headers present in responses (X-RateLimit-*)
Untrusted agents limited to 0.1x base quota
Authority agents get 10x quota
HTTP 429 returned when quota exceeded

Scenario 3: EigenTrust Sybil Resistance

Goal: Verify isolated trust rings get near-zero global trust.

Legitimate Network:          Sybil Ring:
   Seed ─────> A              X ──> Y
     │         │              │     │
     v         v              v     v
     B ──────> C              Z <── W

Acceptance Criteria:

Seed-connected agents (A, B, C) accumulate positive global trust
Isolated ring (X, Y, Z, W) converges to near-zero trust
Power iteration converges in <100 iterations (ε = 1e-4)
Domain-specific trust factors applied correctly

Scenario 4: Content Quality Filtering

Goal: Verify spam/noise detection without blocking legitimate content.

Content Type	Expected Quality	Should Quarantine?
Normal assertion: "Aspirin:treats:Headache"	>0.6	No
Low entropy: "aaaa:bbbb:cccc"	<0.4	Yes
Structured data with JSON	>0.7 (bonus)	No
Untrusted agent + high confidence	<0.5 (penalty)	Yes

Acceptance Criteria:

Shannon entropy check flags random noise (< 1.5 bits/char)
Minimum subject/predicate length enforced (default 3 chars)
Structured data (JSON, URLs, dates) gets +0.1 bonus
Untrusted + high confidence gets -0.5 penalty
Quality < 0.4 triggers quarantine

Scenario 5: Quarantine Admin Workflow

Goal: Verify suspicious content can be reviewed and processed.

# 1. List pending quarantine events
curl http://localhost:3000/v1/admin/quarantine?limit=20

# 2. Review specific event
curl http://localhost:3000/v1/admin/quarantine/{hash}

# 3. Approve or reject
curl -X POST http://localhost:3000/v1/admin/quarantine/{hash}/approve
curl -X POST http://localhost:3000/v1/admin/quarantine/{hash}/reject

Acceptance Criteria:

GET /v1/admin/quarantine lists pending events with reasons
GET /v1/admin/quarantine/{hash} returns full assertion bytes
POST .../approve moves assertion to main index
POST .../reject marks as reviewed but keeps quarantined
Quarantine reasons clearly indicate why flagged

Integration Points to Verify

Ingestion Pipeline Integration
- Content defense layer called before indexing
- Quarantine bypasses normal index path
- Bloom filter restored on restart
Trust Store Interplay
- EigenTrust feeds into TrustTier calculation
- Domain trust factors into Authority lens weights
- Trust decay applies to computed scores
API Middleware Chain
- AdmissionLayer checks PoW before rate limiting
- MeterLayer applies tier-based quotas
- Headers reflect current trust state

Known Limitations

7C Incomplete: MinHash/LSH bucketing not implemented
- Duplicate detection uses Bloom filter only (no near-duplicate)
- Jaccard similarity threshold (0.9) not yet enforced
7D Not Started: Circuit breakers pending
- No automatic agent banning
- No half-open recovery states
Performance Untested:
- EigenTrust computation on large graphs (>10k agents)
- Bloom filter memory at scale
- Quarantine store scan performance

Commands to Run

# Full test suite
cargo test --workspace

# Phase 7 specific crates
cargo test -p stemedb-storage -- trust_graph
cargo test -p stemedb-storage -- domain_trust
cargo test -p stemedb-storage -- admission
cargo test -p stemedb-storage -- quarantine
cargo test -p stemedb-storage -- content_defense
cargo test -p stemedb-ingest -- content_defense
cargo test -p stemedb-api --test admission_integration
cargo test -p stemedb-core -- trust_tier
cargo test -p stemedb-core -- pow

# Clippy must pass
cargo clippy --workspace -- -D warnings

# Go SDK examples
cd sdk/go && go test ./...

Success Criteria

Phase 7 UAT passes when:

All ~139 Phase 7 tests pass
All 5 usage scenarios verified manually
Clippy clean with no warnings
Go SDK examples pass
API endpoints return correct responses
Quarantine workflow complete end-to-end

6.8 KiB Raw Blame History

Phase 7 UAT: The Shield

Summary

Test Coverage (Verified)

Realistic Usage Scenarios

Scenario 1: New Agent Onboarding

Scenario 2: Trust Tier Quotas

Scenario 3: EigenTrust Sybil Resistance

Scenario 4: Content Quality Filtering

Scenario 5: Quarantine Admin Workflow

Integration Points to Verify

Known Limitations

Commands to Run

Success Criteria

Related Documentation

6.8 KiB

Raw Blame History