jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command
  injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:24:14 -07:00


# Episteme (StemeDB) Roadmap
> **Goal:** Build the "Git for Truth" substrate for autonomous AI research.
> **Current Phase:** Phase 7-8 (The Shield + The Swarm) — Phase 6 complete ✅
> **Target Vertical:** BioTech/Pharma ("The Living Review")
> **Endgame:** Distributed multi-writer cluster for millions of concurrent agents
---
## High-Level Timeline
| Phase | Codename | Focus | Key Deliverable |
| :--- | :--- | :--- | :--- |
| **1** | **The Spine** | Storage & Safety | Append-only WAL + KV Store |
| **2** | **The Lattice** | Indexing & Async | Materialized Views + Ballot Box |
| **2.5** | **Hardening** | Camp 2 Fixes | MV staleness, epoch behavior, lens cleanup |
| **3** | **The Pilot** | Vertical Integration | Pharma Ingestion + Living Review Agent |
| **4** | **The Hive** | Trust & Learning | TrustRank, metadata indexing, change tracking |
| **5** | **The Forge** | Foundation Hardening | Replace sled, fix WAL, persist indices, concept hierarchy |
| **6** | **The Mesh** | Distributed Writes | CRDT replication, Raft coordination, cluster membership |
| **7** | **The Shield** | Trust at Scale | EigenTrust, PoW admission, anti-spam, quarantine |
| **8** | **The Swarm** | Production Cluster | Chaos testing, observability, geo-distribution |
| **9** | **The Bunker** | Disaster Planning | Backup/restore, corruption recovery, GDPR compliance |
---
## Detailed Milestones
### Phase 1: The Spine (Foundation)
*Goal: Securely ingest assertions and persist them without data loss.*
- [x] **Project Scaffold**: Initialize Rust workspace, set up linting/CI (clippy, fmt).
- [x] **Assertion Schema**: Define the `Assertion` struct with `rkyv` serialization.
- [x] Add dependencies: `rkyv`, `blake3`, `ed25519-dalek`, `image_hasher`.
- [x] Define `Assertion` struct (Subject, Predicate, Object, Confidence, SourceHash).
- [x] **Multi-Sig Expansion**: Implement `SignatureEntry` struct and `signatures: Vec<SignatureEntry>` field.
- [x] **Visual Expansion**: Add `visual_hash: Option<PHash>` field for image provenance.
- [x] Test serialization round-trips.
- [x] **Ballot Schema**: Define the `Vote` struct for multi-agent consensus.
- [x] Add `Vote` struct: `assertion_hash`, `agent_id`, `weight`, `signature`.
- [x] Test serialization round-trips.
- [x] **Paradigm Schema (Epochs)**: Define the `Epoch` and `SupersessionType` structs.
- [x] Add `epoch: Option<EpochId>` to `Assertion`.
- [x] Implement `Epoch` struct with `supersedes` and `SupersessionType`.
- [x] Test serialization round-trips.
- [x] **WAL Integration**: Implement the Quarantine Pattern for write-ahead logging.
- [x] Create `stemedb-wal` crate.
- [x] Port `FsyncGuard` and `Record` logic from established durability patterns.
- [x] Implement Record format with BLAKE3 checksums and Headers.
- [x] Verify `fsync` behavior with tests.
- [x] **Storage Engine**: Implement the `Store` trait using `sled` (embedded KV).
- [x] Add `sled` dependency.
- [x] Define `KVStore` trait (put, get, delete, scan_prefix, flush).
- [x] Implement `SledStore` wrapper.
- [x] **Basic Ingestor**: Background worker that tails WAL and writes to KV.
- [x] Implement async loop reading from WAL.
- [x] Write deserialized assertions, votes, and epochs to `sled`.
- [x] Ed25519 signature verification during ingestion.
- [x] Maintains S: and SP: indexes on ingest.
- [x] Persistent cursor/checkpoint (resumes from `__CURSOR__:ingest` in KV store).
- [x] **Verification**: Crash recovery tests (write -> crash -> restart -> read).
- [x] Single and multi-record crash recovery.
- [x] Multiple crash cycles tested.
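
The resume-from-cursor behaviour of the Basic Ingestor can be illustrated with a minimal, std-only sketch. It assumes the WAL can be read as an ordered slice of `(key, value)` records and uses a `BTreeMap` as a stand-in for the KV store. `MemKv`, `load_cursor`, and `ingest` are hypothetical names, not the `stemedb-ingest` API; only the `__CURSOR__:ingest` key comes from this roadmap. Signature verification and index maintenance are omitted.

```rust
use std::collections::BTreeMap;

/// Cursor key named in the roadmap; the value is the next WAL offset to apply.
const CURSOR_KEY: &str = "__CURSOR__:ingest";

/// Hypothetical in-memory stand-in for the KV store.
#[derive(Default)]
pub struct MemKv {
    data: BTreeMap<String, Vec<u8>>,
}

impl MemKv {
    pub fn put(&mut self, key: &str, value: Vec<u8>) {
        self.data.insert(key.to_string(), value);
    }

    pub fn get(&self, key: &str) -> Option<&Vec<u8>> {
        self.data.get(key)
    }
}

/// Read the persisted cursor, defaulting to 0 when it is absent or malformed.
pub fn load_cursor(kv: &MemKv) -> u64 {
    match kv.get(CURSOR_KEY) {
        Some(bytes) if bytes.len() == 8 => {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(bytes);
            u64::from_le_bytes(buf)
        }
        _ => 0,
    }
}

/// Apply every WAL record at or after the persisted cursor; returns how many were applied.
/// The cursor is advanced after each write, so a crash between the two `put` calls
/// replays that record on restart. Replay is safe here because rewriting the same
/// key with the same value is idempotent.
pub fn ingest(wal: &[(String, Vec<u8>)], kv: &mut MemKv) -> usize {
    let start = load_cursor(kv) as usize;
    let mut applied = 0;
    for (offset, (key, value)) in wal.iter().enumerate().skip(start) {
        kv.put(key, value.clone());
        kv.put(CURSOR_KEY, ((offset + 1) as u64).to_le_bytes().to_vec());
        applied += 1;
    }
    applied
}
```

Running `ingest` on a prefix of the log and then on the full log applies only the records written after the first run, which is the property the crash-recovery tests check.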
### Phase 2: The Lattice (Connectivity)
*Goal: Query data with sub-millisecond latency using Materialized Views.*
- [x] **Lifecycle Schema**: Add `LifecycleStage` to Assertion.
- [x] Define enum: `Proposed`, `UnderReview`, `Approved`, `Deprecated`, `Rejected`.
- [x] Update `Assertion` struct and serialization tests.
- [x] **The Ballot Box**: Implement high-velocity vote ingestion.
- [x] `VoteStore` trait and implementation.
- [x] `VoteAwareConsensusLens` for real vote-based resolution.
- [x] **Index Infrastructure**: Compound indexes for O(1) queries.
- [x] `IndexStore` trait with S: and SP: indexes.
- [x] `QueryEngine` smart routing (SP -> S -> scan).
- [x] **Materializer**: Background worker for O(1) read performance.
- [x] `MaterializedView` type in `stemedb-core`.
- [x] `Materializer` worker in `stemedb-query` with `step()` and `run()`.
- [x] Aggregates Votes via `VoteAwareConsensusLens` (or any `AsyncLens`).
- [x] Updates `MV:{Subject}:{Predicate}` with the winning Assertion + metadata.
- [x] Event-driven mode via `run_notified()` with `tokio::sync::Notify`.
- [x] Fast-path MV lookup in `QueryEngine::try_fast_path()`.
- [x] **The Meter**: Implement Economic Throttling (TAN).
- [x] `QuotaStore` trait and `GenericQuotaStore` implementation.
- [x] Token Bucket algorithm with per-agent per-hour quotas.
- [x] `MeterLayer` tower middleware for request cost tracking.
- [x] Cost model: Assert=10, Vote=1, Query=5+lens, +1/KB payload.
- [x] `GET /v1/meter/quota` endpoint to check remaining quota.
- [x] `POST /v1/meter/quota/limit` admin endpoint to set custom limits.
- [x] **API Surface**: `axum` HTTP server with OpenAPI (utoipa).
- [x] `POST /v1/assert` -> Accepts JSON, writes to WAL.
- [x] `POST /v1/vote` -> High-throughput vote endpoint.
- [x] `POST /v1/epoch` -> Create epoch with optional supersession.
- [x] `GET /v1/query` -> Subject/Predicate/Lens/Lifecycle/Epoch filtering.
- [x] `GET /v1/health` -> Health check with assertion count.
- [x] `GET /swagger-ui` -> Interactive API docs.
- [x] 5 lens types available: Recency, Consensus, Authority, VoteAwareConsensus, TrustAwareAuthority.
- [x] **Query Audit**: Log every read with provenance.
- [x] Define `QueryAudit` struct: query_id, agent_id, timestamp, params, result_hash, contributing_assertions.
- [x] Storage at `AUD:{query_id}` with agent index at `AUDA:{agent_id}:{timestamp}:{query_id}`.
- [x] `GET /v1/audit/queries` -> Returns history of agent decisions.
- [x] `GET /v1/audit/query/{id}` -> Full reasoning trace for a single query.
- [x] Auto-logging on every query via `X-Agent-Id` header.
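
The Meter's per-agent token bucket and cost model can be sketched as follows. This illustrates the idea under stated assumptions and is not the `GenericQuotaStore`/`MeterLayer` implementation: `Op`, `request_cost`, `TokenBucket`, and `Meter` are hypothetical names, `lens_cost` is a placeholder for the unspecified per-lens surcharge, "+1/KB" is read as one unit per full KiB, and time is passed in explicitly as seconds so the logic stays testable.

```rust
use std::collections::HashMap;

/// Request kinds from the roadmap's cost model (hypothetical enum).
pub enum Op {
    Assert,
    Vote,
    /// `lens_cost` is a placeholder for the per-lens surcharge in "Query=5+lens".
    Query { lens_cost: u64 },
}

/// Base cost plus 1 unit per full KiB of payload (rounding down is an assumption).
pub fn request_cost(op: &Op, payload_bytes: usize) -> u64 {
    let base = match op {
        Op::Assert => 10,
        Op::Vote => 1,
        Op::Query { lens_cost } => 5 + *lens_cost,
    };
    base + (payload_bytes / 1024) as u64
}

/// Bucket holding up to one hour of quota that refills continuously.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: f64,
}

impl TokenBucket {
    pub fn per_hour(quota: u64, now_secs: f64) -> Self {
        let capacity = quota as f64;
        TokenBucket { capacity, tokens: capacity, refill_per_sec: capacity / 3600.0, last_refill: now_secs }
    }

    /// Refill for the time elapsed since the last call, then deduct `cost` if affordable.
    pub fn try_spend(&mut self, cost: u64, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_refill).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = self.last_refill.max(now_secs);
        let cost = cost as f64;
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}

/// One bucket per agent, created lazily with a default hourly quota.
pub struct Meter {
    default_quota: u64,
    buckets: HashMap<String, TokenBucket>,
}

impl Meter {
    pub fn new(default_quota: u64) -> Self {
        Meter { default_quota, buckets: HashMap::new() }
    }

    /// Returns `true` if the agent can afford the request at `now_secs`.
    pub fn charge(&mut self, agent_id: &str, cost: u64, now_secs: f64) -> bool {
        let quota = self.default_quota;
        self.buckets
            .entry(agent_id.to_string())
            .or_insert_with(|| TokenBucket::per_hour(quota, now_secs))
            .try_spend(cost, now_secs)
    }
}
```

Because the bucket refills continuously rather than resetting on the hour, an agent that exhausts its budget regains capacity gradually.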
### Phase 2.5: Hardening (Camp 2 Fixes)
*Goal: Close the gaps between "built" and "works right." Every item here addresses a feature that exists but doesn't fully deliver on its promise.*
- [x] **2.1 MV Staleness Detection**: Make the fast-path aware of stale materialized views.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] Added `max_stale: Option<u64>` to `Query` struct in `crates/stemedb-query/src/query.rs`.
- [x] Added `.max_stale(secs)` builder method to `QueryBuilder`.
- [x] In `try_fast_path()`: if `query.max_stale` is set and MV age exceeds threshold, falls through to slow path with `debug!` log.
- [x] Added `max_stale` to API `QueryParams` DTO in `crates/stemedb-api/src/dto.rs`.
- [x] Wired through query handler in `crates/stemedb-api/src/handlers/query.rs`.
- **Tests:**
- [x] `test_fast_path_stale_view_falls_back`: MV 1000 seconds old, `max_stale = 60` → slow path used.
- [x] `test_fast_path_fresh_view_used`: Fresh MV, `max_stale = 300` → fast path used.
- [x] `test_fast_path_no_max_stale_always_uses_mv`: No `max_stale` → any MV age accepted (backward compatible).
- [x] `test_fast_path_max_stale_zero_rejects_old_mv`: `max_stale = 0`, MV 1 second old → slow path.
- [x] `test_fast_path_max_stale_zero_accepts_brand_new_mv`: `max_stale = 0`, brand new MV → fast path.
- [x] **2.2 AuthorityLens -> ConfidenceLens Rename**: Eliminate the misleading name.
- **Problem:** `AuthorityLens` selects by `confidence` field, not by agent reputation. `TrustAwareAuthorityLens` is the real authority lens. The name creates confusion about what "Authority" means.
- **Solution implemented:**
- [x] Renamed `authority.rs` → `confidence.rs`, `AuthorityLens` → `ConfidenceLens`
- [x] Added `LensDto::Confidence` for the confidence-field selector
- [x] Changed `LensDto::Authority` to route to `TrustAwareAuthorityLens` (the real authority lens)
- [x] Updated query handler routing
- [x] Updated ai-lookup/services/lens.md and skill documentation
- [x] **2.3 EpochAwareLens**: Give epoch supersession runtime behavior.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `EpochAwareLens` in `crates/stemedb-lens/src/epoch_aware.rs`
- [x] Decorator pattern wrapping any inner lens (default: RecencyLens)
- [x] Walks supersession chain from `E:{epoch_id}` keys
- [x] Cycle detection + max depth guard (100)
- [x] Fail-open on missing epochs
- [x] `LensDto::EpochAware` added to API
- [x] 11 tests: excludes_superseded, chain_supersession, no_epochs_passes_all, missing_epoch_includes, cycle_detection, consensus_lens_inner, mixed_epochs, etc.
- [x] Documentation updated in `ai-lookup/services/lens.md`
- **Known Limitation:** Filtering only occurs when assertions from the superseding epoch are present in candidates. If all candidates are from old epoch (no new epoch assertions), they pass through (fail-open behavior).
- [x] **2.4 Visual Hash Query Support**: Make the stored `visual_hash` queryable.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `hamming_distance(a: &PHash, b: &PHash) -> u32` in `crates/stemedb-query/src/query.rs` (lines 26-28)
- [x] `visual_near: Option<String>` and `visual_threshold: Option<u32>` in `Query` struct (lines 84-90)
- [x] `.visual_near(hash, threshold)` builder method
- [x] `Query::matches()` computes hamming distance when `visual_near` is set
- [x] API `QueryParams` DTO has `visual_near` and `visual_threshold`
- [x] 10+ tests: exact_match, within_threshold, exceeds_threshold, skips_without_hash, invalid_hex, wrong_length, combines_with_subject, default_threshold, max_threshold, threshold_63_rejects
- **Note:** Brute-force O(N) scan. VP-tree/BK-tree index is Phase 3+.
- [x] **2.5 Vector Field**: No changes needed. Already roadmapped for Phase 3.
- **Status:** ✅ N/A (No Phase 2 work required)
- **Current state:** `vector: Option<Vec<f32>>` on `Assertion`. Stored and returned by API. No index, no search.
- **Phase 3 plan:** Integrate `hnsw-rs` or `lance` for k-NN search.
- [x] **2.6 E2E Integration Test (Write -> Materialize -> Read)**: Prove the full pipeline works end-to-end.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `crates/stemedb-query/tests/e2e_pipeline.rs` with 5 comprehensive tests:
- `test_e2e_write_materialize_read` - Basic happy path
- `test_e2e_vote_consensus` - Vote-weighted resolution
- `test_e2e_update_winner` - Winner changes on re-materialize
- `test_e2e_cursor_persistence` - Cursor survives worker restart
- `test_e2e_notify_integration` - Event-driven notification channel
- [x] `stemedb-wal` and `stemedb-ingest` added as dev-dependencies
- [x] Helper functions: `create_signed_assertion()`, `compute_assertion_hash()`, `create_vote()`
- [x] Uses Ed25519 signing for authentic signature verification
- [x] Also: `crates/stemedb-api/tests/e2e_flow_test.rs` tests the HTTP API layer end-to-end.
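
The EpochAwareLens filtering rule (2.3) amounts to a bounded walk along `supersedes` links. The sketch below assumes that epochs are keyed by `u64` ids in an in-memory map instead of being read from `E:{epoch_id}` keys. `Epoch`, `superseded_epochs`, and `filter_by_epoch` are hypothetical names. The sketch also reproduces the Known Limitation: an old epoch is filtered out only when an assertion from a superseding epoch is among the candidates.

```rust
use std::collections::{HashMap, HashSet};

/// Depth guard matching the roadmap's max of 100 hops.
const MAX_DEPTH: usize = 100;

/// Hypothetical epoch record: `supersedes` names the epoch this one replaces.
pub struct Epoch {
    pub supersedes: Option<u64>,
}

/// Every epoch reachable by following `supersedes` from the given epochs.
/// A missing epoch ends the walk (fail-open), a per-walk visited set stops
/// cycles, and MAX_DEPTH bounds pathological chains.
pub fn superseded_epochs(present: &[u64], epochs: &HashMap<u64, Epoch>) -> HashSet<u64> {
    let mut superseded = HashSet::new();
    for &start in present {
        let mut visited = HashSet::new();
        visited.insert(start);
        let mut current = start;
        for _ in 0..MAX_DEPTH {
            let next = match epochs.get(&current).and_then(|e| e.supersedes) {
                Some(id) => id,
                None => break, // chain end or unknown epoch: fail open
            };
            if !visited.insert(next) {
                break; // cycle detected
            }
            superseded.insert(next);
            current = next;
        }
    }
    superseded
}

/// Keep candidates whose epoch is not superseded by an epoch present in the
/// same candidate set; assertions without an epoch always pass.
pub fn filter_by_epoch<'a>(
    candidates: &[(&'a str, Option<u64>)],
    epochs: &HashMap<u64, Epoch>,
) -> Vec<&'a str> {
    let present: Vec<u64> = candidates.iter().filter_map(|(_, epoch)| *epoch).collect();
    let superseded = superseded_epochs(&present, epochs);
    candidates
        .iter()
        .filter(|(_, epoch)| match epoch {
            Some(id) => !superseded.contains(id),
            None => true,
        })
        .map(|(name, _)| *name)
        .collect()
}
```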
### Phase 3: The Pilot (BioTech/Pharma)
*Goal: Prove value in the "High-Liability" beachhead. Close every Camp 4 gap that blocks a credible demo.*
#### 3A. Schema Expansion (Prerequisite for everything below)
- [x] **3A.1 Source-Class Field**: Add `source_class: SourceClass` to Assertion.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `SourceClass` enum in `crates/stemedb-core/src/types.rs` (lines 68-88).
- [x] 6-tier system: `Regulatory` (0), `Clinical` (1), `Observational` (2), `Expert` (3), `Community` (4), `Anecdotal` (5).
- [x] `tier()` method returns tier number for ordering.
- [x] `default_decay_days()` method for tier-specific confidence decay.
- [x] `authority_weight()` method for conflict resolution weighting.
- [x] Field on `Assertion` struct at line 152.
- [x] Full serialization and indexing support.
- [x] **3A.2 Conflict Score on Resolution**: Add `conflict_score: f32` to Resolution.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] Added `conflict_score: f32` field to `Resolution` in `crates/stemedb-lens/src/traits.rs`.
- [x] Updated `Resolution::empty()` to set `conflict_score: 0.0`.
- [x] Updated `Resolution::with_winner()` to accept `conflict_score` parameter.
- [x] Added `compute_conflict_score(candidates: &[Assertion]) -> f32` utility function:
- Uses normalized variance of confidence values.
- 0 or 1 candidates → 0.0 (no conflict possible).
- All same confidence → 0.0 (unanimous).
- Max variance (0.0 vs 1.0) → 1.0 (maximum conflict).
- Defensive NaN handling (returns 0.0 for malformed data).
- [x] Updated all lens implementations to compute and pass conflict score:
- `crates/stemedb-lens/src/recency.rs`
- `crates/stemedb-lens/src/consensus.rs`
- `crates/stemedb-lens/src/confidence.rs`
- `crates/stemedb-lens/src/vote_aware_consensus.rs`
- `crates/stemedb-lens/src/trust_aware_authority.rs`
- [x] Added `conflict_score: f32` to `MaterializedView` in `crates/stemedb-core/src/types.rs`.
- [x] Updated `Materializer::materialize_pair()` to write `conflict_score` from resolution.
- [x] Added `conflict_score: Option<f32>` and `resolution_confidence: Option<f32>` to `QueryResponse` DTO in `crates/stemedb-api/src/dto.rs` (only present when lens is applied).
- [x] Wired through query handler in `crates/stemedb-api/src/handlers/query.rs`.
- **Tests:**
- [x] `test_conflict_score_zero_for_empty`: Empty candidates → 0.0.
- [x] `test_conflict_score_zero_for_single`: 1 candidate → 0.0.
- [x] `test_conflict_score_zero_for_agreement`: All same confidence → near 0.0.
- [x] `test_conflict_score_high_for_disagreement`: Candidates at 0.1, 0.5, 0.9 → score > 0.3.
- [x] `test_conflict_score_max_for_extremes`: 0.0 vs 1.0 → score ≈ 1.0.
- [x] `test_conflict_score_handles_nan_defensively`: NaN confidences → 0.0 (fail-safe).
- [x] **3A.3 Rich Source Metadata**: Add structured provenance beyond `source_hash`.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] Added `source_metadata: Option<Vec<u8>>` field to `Assertion` in `crates/stemedb-core/src/types.rs` (after `epoch`, before `lifecycle`).
- [x] Uses `Vec<u8>` (not `String`) for rkyv zero-copy compatibility. Callers encode/decode JSON on their side.
- [x] Added `source_metadata: Option<Vec<u8>>` to `AssertionBuilder` in `crates/stemedb-core/src/testing.rs`.
- [x] Added `.source_metadata_json(json: &str)` and `.source_metadata(bytes)` builder methods.
- [x] Added `source_metadata: Option<String>` to `CreateAssertionRequest` DTO (JSON string in API, converted to bytes internally).
- [x] Added `source_metadata: Option<String>` to `AssertionResponse` DTO (bytes converted to JSON string with defensive UTF-8 handling).
- [x] Wired through create handler (`dto_to_assertion()`) and query handler (`assertion_to_dto()`).
- **Tests:**
- [x] `test_serialize_deserialize_assertion_with_metadata`: Serialization roundtrip with metadata present.
- [x] `test_serialize_deserialize_assertion_without_metadata`: Serialization roundtrip with metadata absent.
- **Note:** Metadata is stored but NOT indexed in Phase 3. Indexing individual metadata fields is Phase 4+.
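
The sketch below shows one way to implement "normalized variance of confidence values" from 3A.2. It takes raw confidences rather than `&[Assertion]`. It also assumes the normalizer is 0.25, the largest population variance attainable for values in [0, 1]. With that choice, a 0.0-vs-1.0 split scores exactly 1.0 and 0.1/0.5/0.9 scores about 0.43, which is consistent with the listed tests. The actual normalizer in `stemedb-lens` may differ.

```rust
/// Normalized-variance conflict score over candidate confidences (hypothetical signature).
/// Assumption: variance is divided by 0.25, the maximum population variance for
/// values in [0, 1], so the score is 1.0 for an even 0.0-vs-1.0 split.
pub fn compute_conflict_score(confidences: &[f32]) -> f32 {
    if confidences.len() < 2 {
        return 0.0; // no conflict possible with 0 or 1 candidates
    }
    if confidences.iter().any(|c| !c.is_finite()) {
        return 0.0; // fail-safe on malformed data (NaN or infinite)
    }
    let n = confidences.len() as f32;
    let mean = confidences.iter().sum::<f32>() / n;
    let variance = confidences.iter().map(|c| (c - mean).powi(2)).sum::<f32>() / n;
    (variance / 0.25).clamp(0.0, 1.0)
}
```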
#### 3B. Time & Decay (Core Query Features)
- [x] **3B.1 Time-Travel Engine**: `as_of` parameter for historical queries.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] Added `as_of: Option<u64>` to `Query` struct in `crates/stemedb-query/src/query.rs:92-99`.
- [x] Added `.as_of(timestamp: u64)` to `QueryBuilder`.
- [x] In `Query::matches()`: if `as_of` is `Some(ts)`, check `assertion.timestamp <= ts`. Assertions created after `as_of` are excluded.
- [x] In `QueryEngine::execute()`: if `query.as_of` is set, **skip the fast path entirely** (MVs reflect current state, not historical).
- [x] Added `as_of: Option<u64>` to `QueryParams` DTO in `crates/stemedb-api/src/dto.rs`.
- [x] Wired through query handler.
- **Tests:**
- [x] `test_as_of_excludes_future_assertions`: Assertions filtered by timestamp.
- [x] `test_as_of_bypasses_fast_path`: MV exists, but `as_of` is set. Slow path used.
- [x] `test_as_of_none_uses_fast_path`: Normal query still uses fast path (backwards-compatible).
- [x] `test_as_of_with_lens_resolves_among_historical_candidates`: Time-travel + lens = resolve only among pre-as_of candidates.
- [x] `test_as_of_returns_empty_when_all_assertions_are_future`: All assertions are future, returns empty.
- [x] `test_as_of_with_exact_timestamp_match`: Edge case where assertion.timestamp == as_of.
- [x] **3B.2 Semantic Decay**: Confidence Half-Life at query time.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] Added `decay_halflife: Option<u64>` to `Query` struct in `crates/stemedb-query/src/query.rs`.
- [x] Added `.decay_halflife(seconds: u64)` and `.source_class_decay(enabled: bool)` to `QueryBuilder`.
- [x] Added `decay_halflife` and `source_class_decay` to `QueryParams` DTO.
- [x] Created new `crates/stemedb-query/src/decay.rs` module with:
- `apply_decay()`: Uniform decay using formula `confidence * 2^(-(age / halflife))`.
- `apply_source_class_decay()`: Tier-specific decay (Regulatory=none, Clinical=2yr, Anecdotal=30d).
- `compute_decayed_confidence()`: Core decay calculation with clamping.
- [x] Integrated in `QueryEngine::execute()`: decay applied after filtering, before lens resolution.
- [x] Time-travel compatible: uses `as_of` timestamp if set, otherwise current time.
- [x] Source-class-aware decay fully implemented using `SourceClass::default_decay_days()`.
- **Tests:** (11 unit tests in decay.rs + 1 E2E test)
- [x] `test_decay_reduces_old_assertion_confidence`: 1yr old, 1yr halflife → ~50% confidence.
- [x] `test_decay_preserves_fresh_assertions`: 1hr old, 1yr halflife → ~100% confidence.
- [x] `test_decay_interacts_with_lens`: Older high-confidence loses to newer low-confidence after decay.
- [x] `test_source_aware_decay_tier0_no_decay`: Regulatory never decays.
- [x] `test_source_aware_decay_tier5_rapid_decay`: Anecdotal decays rapidly (30-day halflife).
- [x] `test_source_aware_decay_mixed_tiers`: Clinical vs Anecdotal tier comparison.
- [x] `test_decay_zero_halflife_no_change`: Zero halflife skips decay (avoids div-by-zero).
- [x] `test_decay_future_assertion_no_change`: Future assertions don't decay.
- [x] `test_decay_empty_assertions`: Empty input returns empty output.
- [x] `test_decay_confidence_clamps_to_valid_range`: Very old assertions clamp to [0.0, 1.0].
- [x] `test_decay_preserves_other_fields`: Only confidence changes; other fields preserved.
- [x] `test_e2e_decay_reduces_old_confidence`: Full pipeline E2E test in e2e_pipeline.rs.
- **Note:** When decay is enabled, materialized views (fast path) are bypassed because MVs store pre-computed winners without decay applied.
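
The uniform decay formula from 3B.2, `confidence * 2^(-(age / halflife))`, is sketched below together with the edge cases the tests describe. A zero half-life or a future timestamp leaves confidence unchanged, and results are clamped to [0.0, 1.0]. The signature is an assumption, not the `decay.rs` API: times are seconds as `u64`, and `now` stands in for either the current time or the `as_of` timestamp. Source-class decay would pass a per-tier half-life instead (for example 30 days for Anecdotal) and skip the call entirely for Regulatory.

```rust
/// Uniform semantic decay: `confidence * 2^(-(age / halflife))`.
/// `now` is the current time, or the `as_of` timestamp for time-travel queries.
pub fn compute_decayed_confidence(confidence: f32, timestamp: u64, now: u64, halflife_secs: u64) -> f32 {
    // A zero half-life would divide by zero, and future (or same-instant)
    // assertions have no age, so both leave confidence unchanged.
    if halflife_secs == 0 || timestamp >= now {
        return confidence;
    }
    let age = (now - timestamp) as f64;
    let factor = (-(age / halflife_secs as f64)).exp2();
    ((confidence as f64) * factor).clamp(0.0, 1.0) as f32
}
```

For example, an assertion with confidence 0.8 that is one year old, queried with a one-year half-life, decays to 0.4.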
#### 3C. New Lenses
- [x] **3C.1 Skeptic Lens**: Surface disagreement, not winners.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `crates/stemedb-lens/src/skeptic.rs` - Full implementation.
- [x] `AnalysisLens` trait for lenses that analyze conflict instead of resolving it.
- [x] `SkepticLens` uses normalized Shannon entropy for conflict scoring.
- [x] Returns `ConflictAnalysis` with:
- `conflict_score: f32` (0.0 = unanimous, 1.0 = chaos)
- `status: ResolutionStatus` (Unanimous, Agreed, Contested)
- `claims: Vec<ClaimSummary>` - all claims ranked by weight
- [x] `SkepticResolver` + `SkepticView` in `stemedb-query/src/skeptic.rs`.
- [x] `GET /v1/skeptic?subject=X&predicate=Y` API endpoint.
- [x] Core types in `stemedb-core/src/types.rs`:
- `ResolutionStatus` enum
- `ConflictAnalysis` struct
- `ClaimSummary`, `SourceSummary`, `AgentSummary`
- [x] Comprehensive test coverage (21 test cases).
- [x] **3C.2 Layered Consensus Lens**: Per-source-class consensus.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `crates/stemedb-lens/src/layered_consensus.rs` - Full implementation.
- [x] `TierResolution` struct: per-tier result with tier, source_class, winner, candidates_count, conflict_score, resolution_confidence.
- [x] `LayeredResolution` struct: multi-tier result with tiers vec, overall_winner, overall_conflict_score, total_candidates.
- [x] `LayeredLens` trait: `resolve_layered(&[Assertion]) -> LayeredResolution`, `name() -> &'static str`.
- [x] `LayeredConsensusLens` implements both `LayeredLens` and `Lens` traits.
- [x] Cross-tier conflict score uses normalized Shannon entropy of tier winner object values.
- [x] `LensDto::LayeredConsensus` variant (redirects to `/v1/layered` endpoint).
- [x] `GET /v1/layered?subject=X&predicate=Y` API endpoint with `LayeredQueryResponse`.
- [x] Exported from `crates/stemedb-lens/src/lib.rs`.
- **Tests:**
- [x] `test_layered_empty_candidates`: Empty input returns empty resolution.
- [x] `test_layered_single_tier`: All same source_class, returns one tier result.
- [x] `test_layered_multi_tier_agreement`: Tier 0 and Tier 5 agree, low cross-tier conflict.
- [x] `test_layered_multi_tier_disagreement`: Tier 1 vs Tier 5 disagree, high conflict, Tier 1 wins.
- [x] `test_layered_overall_winner_from_highest_authority`: Tier 0 wins despite fewer assertions.
- [x] `test_layered_lens_trait_compatibility`: Standard Lens trait works.
- [x] `test_layered_within_tier_conflict`: High internal conflict within a tier.
- [x] `test_layered_all_tiers_present`: One assertion from each tier.
- [x] `test_layered_lens_name`: Both trait names work.
- [x] `test_layered_numeric_values`: Works with numeric object values.
- [x] **3C.3 Constraints Lens**: Pre-flight check for must_use/forbidden.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `crates/stemedb-lens/src/constraints.rs` - Full implementation.
- [x] `ConstraintSet` struct: holds categorized assertions (must_use, forbidden, prefer) with conflict_score.
- [x] `ConstraintsLens` struct with `resolve_constraints(&[Assertion]) -> ConstraintSet` method.
- [x] Categorizes by predicate pattern: `must_use:*`, `forbidden:*`, `prefer:*`.
- [x] Implements `Lens` trait for compatibility (priority: must_use > forbidden > prefer).
- [x] Sorted by confidence (highest first), with timestamp as tiebreaker.
- [x] `LensDto::Constraints` added (redirects to `/v1/constraints` endpoint).
- [x] `GET /v1/constraints?subject=X` API endpoint with `ConstraintsResponse`.
- [x] DTOs: `ConstraintsQueryParams`, `ConstraintEntryDto`, `ConstraintsResponse`.
- [x] Exported from `crates/stemedb-lens/src/lib.rs`.
- **Tests:** (16 test cases)
- [x] `test_constraints_categorizes_by_predicate`: Mixed predicates sorted into must_use/forbidden/prefer.
- [x] `test_constraints_empty_categories`: Only prefer, no must_use/forbidden.
- [x] `test_constraints_non_constraint_predicates_ignored`: Regular predicates filtered out.
- [x] `test_constraints_sorted_by_confidence`: Within-category confidence ordering.
- [x] `test_constraints_empty_candidates`: Empty input returns empty set.
- [x] `test_constraints_has_constraints_true`: Helper method works.
- [x] `test_constraints_all_regular_predicates`: All non-constraint predicates returns no constraints.
- [x] `test_lens_trait_picks_must_use_winner`: Standard Lens trait picks must_use first.
- [x] `test_lens_trait_falls_back_to_forbidden`: Falls back to forbidden when no must_use.
- [x] `test_lens_trait_falls_back_to_prefer`: Falls back to prefer when no must_use/forbidden.
- [x] `test_lens_trait_empty_for_no_constraints`: Returns empty when no constraint predicates.
- [x] `test_lens_name`: Name returns "Constraints".
- [x] `test_lens_empty_candidates`: Empty input to Lens trait returns empty resolution.
- [x] `test_multiple_must_use_picks_highest_confidence`: Multiple must_use picks highest confidence.
- [x] `test_confidence_tiebreaker_uses_timestamp`: Same confidence uses newer timestamp.
- [x] `test_predicate_pattern_exact_prefix`: `must_use_something` not matched (only `must_use:*`).
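
The normalized Shannon entropy score used by the Skeptic Lens (3C.1) is sketched below. The sketch assumes that each distinct claim's weight is the sum of the weights behind it, and that entropy is normalized by ln(k) for k distinct claims. Under those assumptions, a single claim scores 0.0 and an even split scores 1.0. `rank_claims` and `entropy_conflict_score` are hypothetical helpers. The thresholds that map a score to `Unanimous`/`Agreed`/`Contested` are not specified here, so they are omitted.

```rust
use std::collections::BTreeMap;
use std::cmp::Ordering;

/// Aggregate weight per distinct claim value, ranked highest first.
/// Non-finite or non-positive weights are ignored.
pub fn rank_claims<'a>(claims: &[(&'a str, f64)]) -> Vec<(&'a str, f64)> {
    let mut totals: BTreeMap<&'a str, f64> = BTreeMap::new();
    for &(value, weight) in claims {
        if weight.is_finite() && weight > 0.0 {
            *totals.entry(value).or_insert(0.0) += weight;
        }
    }
    let mut ranked: Vec<(&'a str, f64)> = totals.into_iter().collect();
    // Stable sort: ties keep the BTreeMap's alphabetical order.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    ranked
}

/// Shannon entropy of the claim-weight distribution divided by ln(k).
pub fn entropy_conflict_score(ranked: &[(&str, f64)]) -> f64 {
    let k = ranked.len();
    if k < 2 {
        return 0.0; // a single claim (or none) is unanimous
    }
    let total: f64 = ranked.iter().map(|(_, w)| w).sum();
    let entropy: f64 = ranked
        .iter()
        .map(|(_, w)| {
            let p = w / total;
            -p * p.ln()
        })
        .sum();
    (entropy / (k as f64).ln()).clamp(0.0, 1.0)
}
```

The same measure applied to tier winners instead of individual claims gives the cross-tier score described for the Layered Consensus Lens.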
#### 3D. Epoch Enhancement
- [x] **3D.1 Epoch Cascade Logic** (enhancement of Phase 2.5 EpochAwareLens):
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `write_supersession_cascade()` in `crates/stemedb-ingest/src/worker.rs`:
- Writes `SUPERSEDED:{old_epoch_id}` markers for full transitive closure.
- All markers point to the LATEST superseding epoch.
- Max depth guard (100 levels) and cycle detection via visited set.
- [x] `is_epoch_superseded()` in `crates/stemedb-lens/src/epoch_aware.rs`:
- O(1) marker lookup instead of O(chain_length) chain walks.
- Fail-open semantics: missing marker = not superseded.
- [x] `compute_superseded_epochs()` uses marker lookups for filtering.
- **Tests:**
- [x] `test_cascade_writes_superseded_marker`: Epoch B supersedes A → `SUPERSEDED:A` exists.
- [x] `test_cascade_transitive`: C→B→A chain → both `SUPERSEDED:A` and `SUPERSEDED:B` point to C.
- [x] `test_cascade_cycle_detection`: Mutual supersession handled gracefully.
- [x] `test_epoch_aware_uses_marker`: EpochAwareLens uses O(1) marker lookup.
- [x] `test_superseded_epoch_filtered_even_without_new_assertions`: Marker-based filtering works.
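
The cascade in 3D.1 materializes the transitive closure once at ingest, so a read needs only a single marker lookup. The sketch below is minimal and assumes two in-memory stand-ins: a `HashMap<String, u64>` in place of the KV store, and a map from each epoch to the epoch it superseded. The function names mirror the roadmap, but the signatures are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Depth guard matching the roadmap's max of 100 levels.
const MAX_DEPTH: usize = 100;

/// When `new_epoch` supersedes `old_epoch`, write a `SUPERSEDED:{id}` marker for
/// `old_epoch` and for every epoch it transitively superseded, all pointing at
/// `new_epoch` (the latest). Returns the number of markers written.
pub fn write_supersession_cascade(
    new_epoch: u64,
    old_epoch: u64,
    supersedes: &HashMap<u64, u64>, // epoch -> epoch it superseded
    kv: &mut HashMap<String, u64>,
) -> usize {
    let mut visited = HashSet::new();
    visited.insert(new_epoch);
    let mut current = Some(old_epoch);
    let mut written = 0;
    while let Some(id) = current {
        // Stop on the depth guard or when an epoch repeats (cycle).
        if written >= MAX_DEPTH || !visited.insert(id) {
            break;
        }
        kv.insert(format!("SUPERSEDED:{id}"), new_epoch);
        written += 1;
        current = supersedes.get(&id).copied();
    }
    written
}

/// O(1) read path: a missing marker means "not superseded" (fail-open).
pub fn is_epoch_superseded(epoch: u64, kv: &HashMap<String, u64>) -> bool {
    kv.contains_key(&format!("SUPERSEDED:{epoch}"))
}
```

Because every marker is rewritten to point at the newest epoch, readers never follow a chain. The cost is paid once per supersession at ingest.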
#### 3E. Similarity Search
- [x] **3E.1 Vector Search**: Semantic k-NN queries via embeddings.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] Added `hnsw_rs = "0.3"` and `parking_lot = "0.12"` to `stemedb-storage/Cargo.toml`.
- [x] New module: `crates/stemedb-storage/src/vector_index.rs`.
- [x] `VectorIndex` trait: `insert(hash: &Hash, vector: &[f32])`, `search(query: &[f32], k: usize) -> Vec<(Hash, f32)>`, `dimension()`, `len()`, `is_empty()`.
- [x] `HnswVectorIndex` implementation with HNSW graph, RwLock protection, hash↔ID mappings.
- [x] Input validation: dimension mismatch, NaN, Infinite values rejected.
- [x] Idempotent insert (same hash twice = no-op).
- [x] `Arc<dyn VectorIndex>` trait object support for sharing.
- [x] `IngestWorker::with_vector_index()` builder method for index attachment.
- [x] IngestWorker: if assertion has `vector`, inserts into vector index after KV write.
- [x] Added `vector_near: Option<Vec<f32>>` and `k: Option<usize>` to `Query` struct in `crates/stemedb-query/src/query.rs`.
- [x] Added `.vector_near(vector, k)` builder method to `QueryBuilder`.
- [x] Added `vector_near` and `k` to API `QueryParams` DTO.
- [x] `QueryEngine::with_vector_index()` builder method for index attachment.
- [x] QueryEngine: if `vector_near` is set and index configured, uses O(log N) HNSW lookup for candidates.
- [x] Falls back to standard path if no index configured (with debug log).
- **Tests:** (12 unit tests for VectorIndex + 4 integration tests in engine.rs)
- [x] `test_create_index`, `test_insert_and_search`, `test_dimension_mismatch`.
- [x] `test_idempotent_insert`, `test_search_empty_index`, `test_search_k_zero`.
- [x] `test_nan_rejection`, `test_infinite_rejection`, `test_contains`.
- [x] `test_larger_scale` (100 vectors, exact match first), `test_custom_params`, `test_zero_dimension_panics`.
- [x] `test_vector_search_returns_nearest_neighbors`, `test_vector_search_with_subject_filter`.
- [x] `test_vector_search_without_index_falls_back`, `test_vector_search_with_as_of_filter`.
- **Note:** Index is in-memory only. Persistence is Phase 4+.
- [x] **3E.2 Visual Hash Index**: BK-tree for O(log N) visual similarity.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] New module: `crates/stemedb-storage/src/visual_index.rs`.
- [x] `VisualIndex` trait: `insert(hash: &Hash, phash: &PHash)`, `search(query: &PHash, threshold: u32) -> Vec<(Hash, u32)>`, `len()`, `is_empty()`.
- [x] `BkTreeVisualIndex` implementation using BK-tree over hamming distance.
- [x] `hamming_distance(a: &PHash, b: &PHash) -> u32` utility function.
- [x] Threshold clamped to 0-64 range (max 64 bits).
- [x] Results sorted by distance ascending.
- [x] Idempotent insert (same hash twice = no-op).
- [x] `Arc<dyn VisualIndex>` trait object support for sharing.
- [x] `IngestWorker::with_visual_index()` builder method for index attachment.
- [x] IngestWorker: if assertion has `visual_hash`, inserts into BK-tree after KV write.
- [x] `QueryEngine::with_visual_index()` builder method for index attachment.
- [x] QueryEngine: if `visual_near` is set and index configured, uses O(log N) BK-tree lookup.
- [x] Falls back to brute-force scan (via `query.matches()`) if no index configured.
- [x] Invalid hex input returns `QueryError::InvalidInput` with clear message.
- **Tests:** (14 unit tests for VisualIndex + 6 integration tests in engine.rs)
- [x] `test_hamming_distance_zero`, `test_hamming_distance_max`, `test_hamming_distance_partial`.
- [x] `test_create_index`, `test_insert_and_search_exact`, `test_search_within_threshold`.
- [x] `test_search_no_matches`, `test_search_empty_index`, `test_idempotent_insert`.
- [x] `test_contains`, `test_results_sorted_by_distance`, `test_threshold_clamped_to_64`.
- [x] `test_larger_scale` (1000 hashes), `test_default_impl`.
- [x] `test_visual_search_returns_similar_images`, `test_visual_search_with_lifecycle_filter`.
- [x] `test_visual_search_invalid_hex_returns_error`, `test_visual_search_without_index_uses_brute_force`.
- [x] `test_visual_search_with_limit`, `test_vector_search_empty_index`.
- **Note:** Index is in-memory only. Persistence is Phase 4+.
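
The sketch below shows a compact BK-tree over 64-bit perceptual hashes in the spirit of `BkTreeVisualIndex` (3E.2). It makes three assumptions: pHashes are `u64` values, assertion hashes are replaced by `u64` ids, and all names are hypothetical. Search descends only into children whose edge distance lies within `[d - t, d + t]`. The triangle inequality for Hamming distance guarantees this pruning never drops a match. How much it prunes depends on the threshold, and large thresholds visit most of the tree.

```rust
use std::collections::HashMap;

/// Hamming distance between two 64-bit perceptual hashes.
pub fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

struct Node {
    phash: u64,
    ids: Vec<u64>,                 // ids sharing this exact pHash
    children: HashMap<u32, usize>, // edge distance -> child index in `nodes`
}

/// Minimal BK-tree: each child hangs off the edge labelled with its distance to the parent.
#[derive(Default)]
pub struct BkTree {
    nodes: Vec<Node>,
}

impl BkTree {
    pub fn insert(&mut self, id: u64, phash: u64) {
        if self.nodes.is_empty() {
            self.nodes.push(Node { phash, ids: vec![id], children: HashMap::new() });
            return;
        }
        let mut current = 0;
        loop {
            let d = hamming_distance(self.nodes[current].phash, phash);
            if d == 0 {
                // Same pHash: record the id once, so repeated inserts are no-ops.
                if !self.nodes[current].ids.contains(&id) {
                    self.nodes[current].ids.push(id);
                }
                return;
            }
            let next = self.nodes[current].children.get(&d).copied();
            match next {
                Some(child) => current = child,
                None => {
                    let new_index = self.nodes.len();
                    self.nodes.push(Node { phash, ids: vec![id], children: HashMap::new() });
                    self.nodes[current].children.insert(d, new_index);
                    return;
                }
            }
        }
    }

    /// (id, distance) pairs within `threshold` (clamped to 64), sorted by distance.
    pub fn search(&self, query: u64, threshold: u32) -> Vec<(u64, u32)> {
        let threshold = threshold.min(64);
        let mut results = Vec::new();
        if self.nodes.is_empty() {
            return results;
        }
        let mut stack = vec![0usize];
        while let Some(index) = stack.pop() {
            let node = &self.nodes[index];
            let d = hamming_distance(node.phash, query);
            if d <= threshold {
                results.extend(node.ids.iter().map(|&id| (id, d)));
            }
            // Only edges in [d - threshold, d + threshold] can lead to matches.
            let lo = d.saturating_sub(threshold);
            let hi = d + threshold;
            for (&edge, &child) in &node.children {
                if edge >= lo && edge <= hi {
                    stack.push(child);
                }
            }
        }
        results.sort_by_key(|&(id, dist)| (dist, id));
        results
    }
}
```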
#### 3F. Provenance
- [x] **3F.1 Source Document Storage & Provenance Lookup**: Enable 100% citation recall.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `POST /v1/source` endpoint to store source documents by BLAKE3 content hash.
- [x] `GET /v1/provenance/{hash}` endpoint to retrieve source documents by hash.
- [x] Source storage at `SRC:{hash}` keys with format: `[content_type_len:4][content_type][content]`.
- [x] Base64 encoding for binary-safe JSON transport.
- [x] 10MB size limit per document.
- [x] Content-addressed storage: same content → same hash (idempotent uploads).
- [x] DTOs: `StoreSourceRequest`, `StoreSourceResponse`, `ProvenanceResponse`.
- [x] OpenAPI documentation under "provenance" tag.
- **Tests:** (5 test cases)
- [x] `test_store_and_retrieve_source`: Happy path store + retrieve.
- [x] `test_store_source_invalid_base64`: Bad base64 → 400.
- [x] `test_get_provenance_not_found`: Unknown hash → 404.
- [x] `test_get_provenance_invalid_hash`: Bad hash format → 400.
- [x] `test_store_source_idempotent`: Same content twice → same hash.
- **Note:** Benchmark utility for verifying all assertions have retrievable sources is future work.
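
The `SRC:{hash}` value layout `[content_type_len:4][content_type][content]` from 3F.1 can be encoded and decoded as sketched below. The roadmap does not state the byte order of the 4-byte length, nor whether 10MB means 10^7 bytes or 10 MiB. This sketch assumes a big-endian `u32` and 10 MiB. BLAKE3 key derivation and the base64 transport layer are omitted, and `encode_source`/`decode_source` are hypothetical names.

```rust
/// Assumed size cap: "10MB" interpreted as 10 MiB.
pub const MAX_SOURCE_BYTES: usize = 10 * 1024 * 1024;

#[derive(Debug, PartialEq)]
pub enum SourceError {
    TooLarge,
    Truncated,
    InvalidContentType,
}

/// Encode `[content_type_len:4][content_type][content]` with a big-endian u32 length.
pub fn encode_source(content_type: &str, content: &[u8]) -> Result<Vec<u8>, SourceError> {
    if content.len() > MAX_SOURCE_BYTES {
        return Err(SourceError::TooLarge);
    }
    let ct = content_type.as_bytes();
    if ct.len() > u32::MAX as usize {
        return Err(SourceError::InvalidContentType);
    }
    let mut out = Vec::with_capacity(4 + ct.len() + content.len());
    out.extend_from_slice(&(ct.len() as u32).to_be_bytes());
    out.extend_from_slice(ct);
    out.extend_from_slice(content);
    Ok(out)
}

/// Decode a stored record back into (content_type, content).
pub fn decode_source(record: &[u8]) -> Result<(String, Vec<u8>), SourceError> {
    if record.len() < 4 {
        return Err(SourceError::Truncated);
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&record[..4]);
    let ct_len = u32::from_be_bytes(len_bytes) as usize;
    let rest = &record[4..];
    if rest.len() < ct_len {
        return Err(SourceError::Truncated);
    }
    let content_type = std::str::from_utf8(&rest[..ct_len])
        .map_err(|_| SourceError::InvalidContentType)?
        .to_string();
    Ok((content_type, rest[ct_len..].to_vec()))
}
```

Prefixing the content type with its length keeps the record self-describing without any delimiter that binary content could collide with.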
#### 3G. API Cleanup
- [x] **3G.1 Document epoch supersession via existing endpoint**: No new `/epoch/supersede` endpoint needed.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] Updated use case docs (consumer-health-intelligence.md, glp1-living-review.md) to use `POST /v1/epoch` with `supersedes` field.
- [x] Added OpenAPI examples showing both new epoch and supersession flows in handlers/epoch.rs.
- [x] Documented all 5 supersession types: Invalidate, Temporal, Refinement, RequiresReview, Additive.
- **No code change.** Documentation fix only.
### Phase 4: The Hive (Trust & Scale)
*Goal: Change tracking, metadata indexing, and the database primitives for training pipelines.*
- [x] **TrustRank Engine**: Foundation for trust-based resolution.
- [x] `TrustRankStore` for per-agent reputation storage.
- [x] `TrustAwareAuthorityLens` for reputation-weighted resolution.
- [x] **Confidence Half-Life**: Implement decay calculation engine.
- [x] Learning loop: `record_outcome()` for accuracy tracking.
- [x] **4.1 "Since" Parameter**: Change tracking for returning consumers.
- **Status:** ✅ COMPLETE
- **Problem:** Consumer Health shows `GET /query?since=2023-10-01` returning `changes_since` with dated change entries. The "returning consumer" story: "What changed since I last looked?"
- **Depends on:** Time-Travel (3B.1) and Materializer.
- **Implementation:**
- [x] Added `since: Option<u64>` to `Query` struct and `QueryBuilder` in `crates/stemedb-query/src/query.rs`.
- [x] Added `since` to `QueryParams` DTO in `crates/stemedb-api/src/dto.rs`.
- [x] **MV Changelog**: Track when materialized views change.
- [x] New key pattern: `MVC:{subject}:{predicate}:{timestamp_nanos}` using nanosecond precision to prevent collisions.
- [x] `ChangeEntry` struct in `crates/stemedb-core/src/types.rs` with: `timestamp`, `previous_winner_hash`, `new_winner_hash`, `subject`, `predicate`, `lens_name`.
- [x] In `Materializer::materialize_pair()`: before overwriting MV, read existing MV. If winner changed (different hash), write changelog entry.
- [x] `get_previous_winner_hash()` and `write_changelog_entry()` helper methods.
- [x] In QueryEngine: if `since` is set, skip fast path (MVs reflect current state). `fetch_changes_since()` scans `MVC:` keys and returns entries >= since timestamp.
- [x] `ChangeEntryDto` in `crates/stemedb-api/src/dto.rs` with hex-encoded hashes.
- [x] `changes_since: Option<Vec<ChangeEntryDto>>` added to `QueryResponse` DTO.
- [x] `fetch_changelog_if_needed()` helper in query handler.
- [x] Re-exported `ChangeEntry` from `stemedb-core` crate root.
- **Tests:**
- [x] `test_changelog_written_on_first_materialization`: First MV creates changelog with `previous_winner_hash: None`.
- [x] `test_changelog_written_on_winner_change`: Winner change creates changelog with previous hash.
- [x] `test_no_changelog_when_winner_unchanged`: Re-materialize same winner = no new changelog.
- [x] `test_fetch_changes_since_returns_entries`: Fetch filtered by timestamp, sorted ascending.
- [x] `test_fetch_changes_since_empty_on_no_entries`: No entries returns empty vec.
- [x] `test_since_bypasses_fast_path`: Query with `since` uses slow path, returns all assertions.
- [x] **4.2 Source Metadata Indexing** (extension of 3A.3): Index key metadata fields.
- **Status:** ✅ COMPLETE
- **Problem:** Phase 3 stores `source_metadata` as an opaque blob. Phase 4 makes key fields queryable.
- **Depends on:** Rich Source Metadata (3A.3).
- **Implementation:**
- [x] Defined indexed metadata keys: `journal`, `doi`, `platform`, `study_design` in `INDEXED_METADATA_FIELDS`.
- [x] New module: `crates/stemedb-storage/src/source_metadata_index.rs`.
- [x] Key pattern: `SMV:{field}:{value}` storing `Vec<Hash>` (rkyv-serialized) for efficient batch lookup.
- [x] `SourceMetadataIndexStore` trait with `add_to_metadata_indexes()` and `get_by_metadata_field()`.
- [x] `GenericSourceMetadataIndexStore` implementation with case-insensitive value normalization.
- [x] IngestWorker: on ingestion, if `source_metadata` is present, parse JSON, extract indexed fields, write index entries.
- [x] Added `source_journal`, `source_doi`, `source_platform`, `source_study_design` to `Query` struct.
- [x] Added `.source_journal()`, `.source_doi()`, `.source_platform()`, `.source_study_design()` builder methods.
- [x] Added `matches_metadata_filters()` helper with AND semantics and case-insensitive matching.
- [x] Added metadata field filters to `QueryParams` DTO (e.g., `?source_journal=NEJM`).
- [x] Wired through query handler in `crates/stemedb-api/src/handlers/query.rs`.
- **Tests:** (9 unit tests in source_metadata_index.rs + 9 unit tests in query.rs)
- [x] `test_add_and_get_by_metadata_field`: Basic round-trip.
- [x] `test_case_insensitive_indexing`: "NEJM" stored, "nejm" query matches.
- [x] `test_multiple_assertions_same_field`: Vec accumulates correctly.
- [x] `test_malformed_json_gracefully_skipped`: Invalid JSON logs warning, doesn't error.
- [x] `test_missing_field_not_indexed`: Only indexed fields processed.
- [x] `test_idempotent_insert`: Same assertion twice = no duplicates.
- [x] `test_all_indexed_fields`: All 4 fields indexed correctly.
- [x] `test_empty_index_returns_empty_vec`: Non-existent field returns empty.
- [x] `test_non_string_values_ignored`: Non-string JSON values skipped.
- [x] **4.3 Batch TrustRank Decay API**: Expose scheduled decay for external orchestration.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] New module: `crates/stemedb-api/src/handlers/admin.rs`
- [x] `POST /v1/admin/decay-trust-ranks` endpoint.
- [x] Request accepts `now: Option<u64>` (defaults to current time) and `half_life_seconds: Option<u64>` (defaults to 30 days).
- [x] Response includes `decayed_count`, `timestamp_used`, `half_life_used`, `status`.
- [x] OpenAPI docs with "admin" tag.
- [x] Thin handler delegates to `TrustRankStore::decay_trust_ranks()`.
- **Tests:** (3 integration tests in http_integration.rs)
- [x] `test_decay_trust_ranks_empty_store`: Empty store returns 0 decayed.
- [x] `test_decay_trust_ranks_with_custom_params`: Custom timestamp and half-life honored.
- [x] `test_decay_trust_ranks_response_structure`: All 4 fields present in response.
- **Note:** The Gardener (Camp 5.2, app layer) calls this endpoint on a schedule. The database just exposes the primitive.
- [x] **4.4 Vote Provenance Witness**: Transform votes from opinions into cryptographic witnesses.
- **Status:** ✅ COMPLETE
- **Problem:** Votes record "I agree" but the browser extension product needs "I saw this exact text at this URL at this time." Without provenance, votes are Tier 5 noise.
- **Implementation:**
- [x] Added `source_url: Option<String>` to Vote struct — URL where claim was observed.
- [x] Added `observed_context: Option<Vec<u8>>` to Vote struct — page context bytes (rkyv zero-copy pattern).
- [x] API DTOs: `source_url` and `observed_context` on CreateVoteRequest/VoteResponse.
- [x] Input validation: source_url max 2048 chars (non-empty if provided), observed_context max 64KB.
- [x] Backward compatible: existing votes without provenance remain valid.
- **Tests:** Serialization roundtrip (with/without provenance, mixed fields), storage roundtrip, API integration (validation).
- [x] **4.5 Conflict Score Filtering**: Query-time filtering by conflict score.
- **Status:** ✅ COMPLETE
- **Problem:** The browser extension needs "only show me claims where conflict > 0.7" to implement contradiction-only overlays. Conflict score was computed but not filterable.
- **Implementation:**
- [x] Added `min_conflict_score: Option<f32>` and `max_conflict_score: Option<f32>` to Query struct.
- [x] Builder methods: `.min_conflict_score()`, `.max_conflict_score()`.
- [x] Fast-path filtering: checks MV conflict_score against thresholds.
- [x] API validation: scores must be 0.0-1.0, finite (rejects NaN/Inf).
- [x] QueryParams DTO wired through query handler.
- **Tests:** 13 engine tests (min/max thresholds, exact boundary, range, combination with lifecycle), API validation tests.
- **Note:** Filtering works on fast path (MV reads) only. Slow path returns raw assertions without conflict scores.
- [x] **4.6 Escalation Triggers**: Active safety system for high-conflict assertions.
- **Status:** ✅ COMPLETE
- **Problem:** High conflict was invisible. Episteme should be an active safety system that fires escalations when disagreement exceeds thresholds.
- **Implementation:**
- [x] `EscalationLevel` enum: Low, Medium, High, Critical.
- [x] `EscalationEvent` struct: content-addressed (BLAKE3 ID), subject, predicate, conflict_score, level, reason, timestamp, resolved.
- [x] `EscalationPolicy` struct: configurable threshold + level + optional predicate pattern.
- [x] `EscalationStore` trait + `GenericEscalationStore`: write, query since, resolve, get pending.
- [x] Key pattern: `ESC:{timestamp_nanos}:{id_hex}`.
- [x] Materializer integration: after computing conflict_score, checks policies and writes events.
- [x] `with_escalation(store, policies)` builder on Materializer.
- [x] `GET /v1/admin/escalations` and `POST /v1/admin/escalations/{id}/resolve` endpoints.
- **Tests:** Core type tests, storage roundtrip, materializer integration (high/low conflict, predicate matching, no store configured).
- [x] **4.7 Gold Standard Verification**: Sybil defense via proof of knowledge.
- **Status:** ✅ COMPLETE
- **Problem:** New agents need a way to earn TrustRank. Gold standards are admin-verified assertions that agents can be tested against.
- **Implementation:**
- [x] `GoldStandard` struct with rkyv serialization: assertion_hash, subject, predicate, expected_object, created_at, created_by.
- [x] `GoldStandardStore` trait + `GenericGoldStandardStore`: set, get, list, remove.
- [x] Key pattern: `GS:{subject}:{predicate}`.
- [x] `TrustAdjustment` enum: Rewarded(+0.05), Penalized(-0.1), AlreadyVerified.
- [x] `verify_agent_against_gold_standard()` on TrustRankStore with deduplication (agents can only verify each gold standard once).
- [x] Verification markers at `GS_VERIFIED:{agent_id}:{subject}:{predicate}` prevent gaming.
- [x] API: `POST/GET/DELETE /v1/admin/gold-standards`, `POST /v1/admin/verify-agent`.
- **Tests:** Core (creation, matching, case-sensitivity), storage CRUD, trust adjustment (reward, penalty, clamping, dedup), API integration (5 tests).
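
The TrustRank arithmetic in 4.3 and 4.7 can be sketched as two pure functions. This assumes scores live in [0.0, 1.0], that half-life decay pulls a score toward 0.0, and that gold-standard adjustments are clamped to that range. The actual decay target of `TrustRankStore::decay_trust_ranks()` is not specified here. `decayed_score` and `apply_gold_standard` are hypothetical names.

```rust
/// Default half-life from 4.3: 30 days, in seconds.
pub const DEFAULT_HALF_LIFE_SECS: u64 = 30 * 24 * 60 * 60;

/// Exponential half-life decay: score * 0.5^(elapsed / half_life).
/// Decaying toward 0.0 is an assumption.
pub fn decayed_score(score: f64, last_updated: u64, now: u64, half_life_secs: u64) -> f64 {
    if half_life_secs == 0 {
        return score; // treat a zero half-life as "decay disabled" (assumption)
    }
    let elapsed = now.saturating_sub(last_updated) as f64;
    score * 0.5_f64.powf(elapsed / half_life_secs as f64)
}

/// Mirrors the three outcomes listed in 4.7.
#[derive(Debug, PartialEq)]
pub enum TrustAdjustment {
    Rewarded(f64),
    Penalized(f64),
    AlreadyVerified,
}

/// Applies +0.05 / -0.1 once per (agent, gold standard), clamped to [0, 1].
pub fn apply_gold_standard(score: f64, matches: bool, already_verified: bool) -> (f64, TrustAdjustment) {
    if already_verified {
        // The GS_VERIFIED marker exists: no second reward or penalty.
        return (score, TrustAdjustment::AlreadyVerified);
    }
    if matches {
        ((score + 0.05).clamp(0.0, 1.0), TrustAdjustment::Rewarded(0.05))
    } else {
        ((score - 0.1).clamp(0.0, 1.0), TrustAdjustment::Penalized(0.1))
    }
}
```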
> **Note:** The following items were reclassified as **Application Layer** responsibilities (see `tmp/ambition-vs-reality.md`, Camp 5). They are not Episteme database features. They consume the Episteme API and are built by integrators or vertical-specific teams.
>
> - **The Simulator** (Training Data Pipeline) -> Camp 5.3
> - **The Super Curator** (Reviewer Agent swarm) -> Camp 5.4
> - **Background Gardener** (Cluster detection, signal processing) -> Camp 5.2
> - **Agent Wallet** (Key management sidecar) -> Camp 5.1
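
The policy check that 4.6 adds to the Materializer might look like the sketch below. It makes three assumptions: a policy fires when `conflict_score >= threshold`, `predicate_pattern` is a plain prefix match, and the most severe firing level wins. `evaluate` and `escalation_key` are hypothetical names. The key zero-pads the nanosecond timestamp so that lexicographic order of `ESC:` keys matches time order, which a "query since" scan relies on. The padding is also an assumption.

```rust
/// Declaration order gives Low < Medium < High < Critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EscalationLevel {
    Low,
    Medium,
    High,
    Critical,
}

pub struct EscalationPolicy {
    pub threshold: f32,
    pub level: EscalationLevel,
    pub predicate_pattern: Option<String>,
}

/// Returns the most severe level among policies that fire, if any.
pub fn evaluate(policies: &[EscalationPolicy], predicate: &str, conflict_score: f32) -> Option<EscalationLevel> {
    policies
        .iter()
        .filter(|p| conflict_score >= p.threshold)
        // No pattern means the policy applies to every predicate.
        .filter(|p| p.predicate_pattern.as_deref().map_or(true, |pat| predicate.starts_with(pat)))
        .map(|p| p.level)
        .max()
}

/// `ESC:{timestamp_nanos}:{id_hex}` with the timestamp padded to 20 digits (u64 max width).
pub fn escalation_key(timestamp_nanos: u64, id: &[u8]) -> String {
    let id_hex: String = id.iter().map(|b| format!("{:02x}", b)).collect();
    format!("ESC:{:020}:{}", timestamp_nanos, id_hex)
}
```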
### Phase 5: The Forge (Foundation Hardening)
*Goal: Replace abandoned dependencies, fix WAL gaps, persist indices. Prerequisite for distribution.*
> **Research:** [docs/research/wal-crash-recovery-research.md](docs/research/wal-crash-recovery-research.md)
#### 5A. Storage Engine Replacement
- [x] **5A.1 Replace sled with redb + fjall**: sled is abandoned (author recommends alternatives).
- **Problem:** sled is alpha-stage with known performance regressions and no active development. Our entire storage layer depends on it.
- **Solution:** HybridStore routes keys by prefix — **fjall** (LSM) for write-heavy paths (`H:`, `V:`, `VC:`, `VW:`, `E:`, `SUPERSEDED:`, `__CURSOR__:`) and **redb** (B-tree) for read-heavy paths (`S:`, `SP:`, `MV:`, `TR:`, `QA:`, `QT:`, `TP:`, `GS:`, `ESC:`).
- **Tasks:**
- [x] Generalize `StorageError::Sled` to `StorageError::Backend(String)`.
- [x] Implement `FjallStore` backend with DashMap per-key locks for atomics.
- [x] Implement `RedbStore` backend with ACID transactions.
- [x] Implement `HybridStore` routing layer with prefix-based dispatch.
- [x] Migrate all ~500 tests from `SledStore` to `HybridStore`.
- [x] Remove sled dependency entirely.
- [x] Add criterion benchmarks (sequential put, random get, prefix scan, atomic increment, mixed workload).
- **Crates:** `redb = "2"`, `fjall = "2"`, `dashmap = "6"`
- [x] **5A.2 Key Layout Redesign**: Prepare keys for subject-prefix range sharding.
- **Problem:** Current keys (`H:{hash}`, `S:{subject}`, `MV:{subject}:{predicate}`) scatter related data across the keyspace. Distributed sharding needs co-location.
- **Solution:** Subject-prefix key layout with `\x00` separator for subject-scoped keys, `\x00` prefix for global keys (sort-first):
```
Subject-prefixed (co-located):
{subject}\x00H:{hash} → Assertion data
{subject}\x00S:{hash_list} → Subject index
{subject}\x00SP:{predicate} → Compound index
{subject}\x00MV:{predicate} → Materialized view
{subject}\x00V:{hash}:{vh} → Votes
{subject}\x00VC:{hash} → Vote count cache
{subject}\x00VW:{hash} → Vote weight cache
{subject}\x00GS:{predicate} → Gold standards
Global (sort first via \x00 prefix):
\x00TRUST:{agent_id} → Trust ranks
\x00QUOTA:{agent_id}:{win} → Quota records
\x00QLIMIT:{agent_id} → Quota limits
\x00E:{epoch_id} → Epochs
\x00SUPERSEDED:{epoch_id} → Supersession markers
\x00SUP:{hash} → Supersession records
\x00AUD:{query_id} → Audit records
\x00ESC:{ts}:{id} → Escalation events
\x00TP:{pack_id} → Trust packs
\x00META:{key} → System metadata
\x00HASH_SUBJECT:{hash} → Reverse lookup index
\x00SUBJECTS:{subject} → Known subjects index
\x00GS_LIST:{subj}:{pred} → Gold standard listing
```
- **Implementation:**
- [x] `key_codec.rs` (573 lines): 40+ key builder functions, subject validation, extraction utilities, 30+ unit tests.
- [x] All stores migrated to `key_codec::` functions (91 call sites across 10 store files, zero hardcoded key patterns).
- [x] Ingestion pipeline uses `key_codec` (11 usages across 3 files).
- [x] Query engine uses `key_codec` (34 usages across 7 files).
- [x] Subject co-location verified by `test_subject_colocation` test.
- [x] Global key sort-first verified by `test_global_keys_sort_first` test.
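
A minimal sketch of the subject-prefixed layout. It assumes keys are raw bytes and that a subject may not contain NUL, the separator. `subject_key`, `global_key`, `subject_prefix`, and `extract_subject` are hypothetical stand-ins for the builders in `key_codec.rs`.

```rust
/// Separator between subject and key kind; also the global-key prefix.
const SEP: u8 = 0x00;

/// `{subject}\x00{kind}:{suffix}`. Rejects empty subjects and subjects containing NUL,
/// since either would make the subject boundary ambiguous.
pub fn subject_key(subject: &str, kind: &str, suffix: &str) -> Option<Vec<u8>> {
    if subject.is_empty() || subject.as_bytes().contains(&SEP) {
        return None;
    }
    let mut key = Vec::with_capacity(subject.len() + kind.len() + suffix.len() + 2);
    key.extend_from_slice(subject.as_bytes());
    key.push(SEP);
    key.extend_from_slice(kind.as_bytes());
    key.push(b':');
    key.extend_from_slice(suffix.as_bytes());
    Some(key)
}

/// `\x00{kind}:{suffix}`. The leading NUL sorts before any non-empty subject.
pub fn global_key(kind: &str, suffix: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(kind.len() + suffix.len() + 2);
    key.push(SEP);
    key.extend_from_slice(kind.as_bytes());
    key.push(b':');
    key.extend_from_slice(suffix.as_bytes());
    key
}

/// Prefix that covers every key of one subject (and no other subject).
pub fn subject_prefix(subject: &str) -> Vec<u8> {
    let mut prefix = subject.as_bytes().to_vec();
    prefix.push(SEP);
    prefix
}

/// Recovers the subject from a subject-scoped key; None for global keys.
pub fn extract_subject(key: &[u8]) -> Option<&str> {
    let end = key.iter().position(|&b| b == SEP)?;
    if end == 0 {
        return None;
    }
    std::str::from_utf8(&key[..end]).ok()
}
```

A single range scan over `subject_prefix(s)` returns every assertion, index entry, MV, and vote for `s`. That is the co-location subject-prefix range sharding needs. Including the separator in the prefix keeps subject `auth` from matching keys of subject `authx`.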
#### 5B. WAL Hardening
- [x] **5B.1 CRC32C Checksums**: Add hardware-accelerated torn write detection.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] Added `crc32c = "0.6"` crate dependency.
- [x] Updated WAL record format with CRC32C in `format.rs`.
- [x] CRC32C verified on read before deserializing.
- [x] Hardware-accelerated via SSE 4.2 on supported CPUs.
- [x] **5B.2 Crash Recovery Implementation**: Replace recovery stub with production recovery.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `recovery/mod.rs` (236 lines): Full sequential record scan with CRC32C validation.
- [x] Truncates file at first corrupted/incomplete record.
- [x] `RecoveryMetrics` struct tracks: valid_records, invalid_records, bytes_truncated, recovery_duration.
- [x] Comprehensive test suite in `recovery/tests.rs`.
- [x] **5B.3 Group Commit**: Batch fsync for throughput.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `group_commit.rs` (342 lines): `GroupCommitBuffer` with configurable `max_writes` and `max_duration`.
- [x] Writers append to buffer and wait on `Notify`.
- [x] Background flusher calls fsync and notifies all waiters.
- [x] `GroupCommitConfig` for tuning batch size and flush interval.
- [x] **5B.4 Log Rotation**: Bounded WAL disk usage.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `segment.rs` (368 lines): Full segment management.
- [x] Segment naming: `{seq:08x}.wal`.
- [x] Rotation when segment exceeds configurable threshold.
- [x] `SegmentManager` tracks active and archived segments.
- [x] Safe deletion of segments after cursor passes them.
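
The CRC32C framing (5B.1) and the recovery scan (5B.2) fit together as follows. This sketch assumes a hypothetical record layout of `[len:u32 LE][crc32c:u32 LE][payload]`, not the actual `format.rs` layout. It uses a bitwise software CRC-32C in place of the SSE 4.2-accelerated `crc32c` crate so it runs with no dependencies.

```rust
/// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Frames one record as [len][crc][payload] (hypothetical layout).
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

#[derive(Debug, PartialEq)]
pub struct ScanResult {
    pub valid_records: usize,
    /// Byte offset to truncate the segment to.
    pub valid_len: usize,
}

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Scans records in order and stops at the first torn or corrupt one.
pub fn scan_wal(buf: &[u8]) -> ScanResult {
    let mut pos = 0;
    let mut valid_records = 0;
    loop {
        // Incomplete header: torn write at the tail.
        let header = match buf.get(pos..pos + 8) {
            Some(h) => h,
            None => break,
        };
        let len = read_u32(&header[0..4]) as usize;
        let expected_crc = read_u32(&header[4..8]);
        // Incomplete payload: torn write.
        let payload = match buf.get(pos + 8..pos + 8 + len) {
            Some(p) => p,
            None => break,
        };
        // Checksum verified before anything is deserialized.
        if crc32c(payload) != expected_crc {
            break;
        }
        pos += 8 + len;
        valid_records += 1;
    }
    ScanResult { valid_records, valid_len: pos }
}
```

`scan_wal` mirrors the truncate-at-first-bad-record policy. A short header, a short payload, or a checksum mismatch each end the scan, and the caller truncates the segment to `valid_len`.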
#### 5C. Index Persistence
- [x] **5C.1 Persistent Vector Index**: Move HNSW from in-memory to disk-backed.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `PersistentVectorIndex` in `crates/stemedb-storage/src/vector_index/persistent.rs`.
- [x] Hybrid hot/cold architecture:
- **Hot:** In-memory HNSW for recent vectors.
- **Cold:** Disk-backed HNSW loaded from checkpoint files.
- [x] `persistence/format.rs`: `SnapshotMetadata`, `IdMappingTable` with rkyv serialization.
- [x] `hot_cold.rs`: `merge_search_results()` for combining hot and cold query results.
- [x] Background checkpoint task with atomic write pattern.
- [x] CRC32C integrity verification on load.
- [x] MAX_PAYLOAD_SIZE (1GB) validation to prevent memory exhaustion.
- **Crates:** `memmap2 = "0.9"`, `crc32c = "0.6"`, `byteorder = "1.5"`
- **Known Limitation:** Cold index currently stores ID mappings only; full vector persistence with mmap'd HNSW graphs planned for future phase.
- [x] **5C.2 Persistent Visual Index**: Persist BK-tree to disk.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `PersistentVisualIndex` in `crates/stemedb-storage/src/visual_index.rs`.
- [x] `BkTreeSnapshot` with rkyv serialization for BK-tree state.
- [x] Checkpoint file format: `[MAGIC:4][VERSION:1][RESERVED:3][PAYLOAD_LEN:u64][CRC32C:u32][PAYLOAD]`.
- [x] Atomic write pattern: temp file → fsync → rename → fsync parent.
- [x] Background checkpoint task with configurable interval.
- [x] CRC32C integrity verification on load.
- [x] Shared `checkpoint_format.rs` module for common read/write utilities.
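
The atomic write pattern from 5C.2 can be sketched with the standard library alone. The header encoder follows the stated 20-byte layout but assumes little-endian integers and a caller-supplied CRC. `write_checkpoint_atomic` and `encode_header` are hypothetical names. The parent-directory fsync as written works on Unix but not on Windows.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// `[MAGIC:4][VERSION:1][RESERVED:3][PAYLOAD_LEN:u64][CRC32C:u32]` = 20 bytes.
/// Little-endian integers are an assumption.
pub fn encode_header(magic: [u8; 4], version: u8, payload_len: u64, crc: u32) -> [u8; 20] {
    let mut h = [0u8; 20];
    h[0..4].copy_from_slice(&magic);
    h[4] = version;
    // Bytes 5..8 are reserved and stay zero.
    h[8..16].copy_from_slice(&payload_len.to_le_bytes());
    h[16..20].copy_from_slice(&crc.to_le_bytes());
    h
}

/// temp file -> fsync -> rename -> fsync parent directory.
pub fn write_checkpoint_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let dir = match path.parent() {
        Some(d) if !d.as_os_str().is_empty() => d,
        _ => Path::new("."),
    };
    let tmp = path.with_extension("tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?; // contents durable before the rename
    }
    fs::rename(&tmp, path)?; // atomic replace on POSIX filesystems
    File::open(dir)?.sync_all()?; // make the new directory entry durable (Unix)
    Ok(())
}
```

The rename makes the update atomic. A crash before it leaves the previous checkpoint intact, and the parent fsync makes the new directory entry survive a crash after it.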
#### 5D. Concept Hierarchy ✅ COMPLETE
> **Spec:** [docs/specs/concept-hierarchy.md](docs/specs/concept-hierarchy.md)
> **Purpose:** Hierarchical, scheme-qualified subject identifiers with cross-scheme alias resolution. Enables applications like Aphoria that need to connect `code://` paths to `rfc://` paths.
- [x] **5D.1 ConceptPath Type**: Structured subject identifiers. ✅
- **Tasks:**
- [x] Add `ConceptPath` struct to `stemedb-core/src/types/concept.rs`.
- [x] Wire format: `{scheme}://{segment_0}/{segment_1}/.../{leaf}`.
- [x] `parse()`, `to_wire_format()`, `leaf()`, `parent()`, `is_prefix_of()`.
- [x] Backward-compatible: bare strings parse as `custom://{string}`.
- [x] Unit tests for parsing, round-trip, prefix matching (Battery 8).
- **Crate:** `stemedb-core`
- [x] **5D.2 Source Scheme Registry**: Map schemes to default source tiers. ✅
- **Tasks:**
- [x] Add `SourceScheme` enum to `stemedb-core`.
- [x] Scheme → default `SourceClass` mapping (e.g., `rfc://` → Tier 0, `code://` → Tier 3).
- [x] `ConceptPath::default_source_class()` method.
- **Crate:** `stemedb-core`
- [x] **5D.3 Alias Store**: Cross-scheme entity resolution. ✅
- **Tasks:**
- [x] Add `ConceptAlias` struct to `stemedb-core`.
- [x] Add `AliasStore` trait to `stemedb-storage`.
- [x] Key prefixes: `CA:{alias_path}` → canonical, `CAR:{canonical}` → all aliases.
- [x] Transitive alias resolution with cycle detection.
- [x] `GenericAliasStore` implementation over `KVStore`.
- **Crates:** `stemedb-core`, `stemedb-storage`
- [x] **5D.4 Hierarchical Query**: Prefix-based subject queries. ✅
- **Tasks:**
- [x] `fetch_by_subject_prefix()` using `scan_prefix` in query engine (already implemented in Battery 5).
- [x] Trailing `/` handling to prevent `auth` matching `authentication`.
- **Crate:** `stemedb-query`
- **Note:** Hierarchical prefix scanning was already working; Battery 5 validates it.
- [x] **5D.5 Alias Resolution in Queries**: Expand queries to aliased paths. ✅
- **Tasks:**
- [x] `AliasStore::resolve_all()` for transitive alias expansion.
- [x] API endpoint `GET /v1/concepts/resolve?path=...&transitive=true`.
- **Crate:** `stemedb-query`, `stemedb-api`
- **Note:** Resolution available via API; QueryEngine integration is future work.
- [x] **5D.6 Source Class Inference**: Infer tier from scheme. ✅
- **Tasks:**
- [x] `ConceptPath::default_source_class()` returns tier based on scheme.
- [x] `SourceScheme::parse()` maps scheme strings to enum variants.
- **Crate:** `stemedb-core`
- **Note:** Inference at ingestion time would break content-addressing (signature verification). Inference is available at query time or before signing.
- [x] **5D.7 Concept API Endpoints**: CRUD for aliases and hierarchy browsing. ✅
- **Tasks:**
- [x] `POST /v1/concepts/alias` — Create alias.
- [x] `GET /v1/concepts/aliases` — List all aliases (with optional canonical filter).
- [x] `DELETE /v1/concepts/alias` — Remove alias.
- [x] `GET /v1/concepts/resolve` — Resolve path to canonical/transitive aliases.
- [x] `GET /v1/concepts/suggest` — Suggested aliases (shared leaf detection).
- [x] `GET /v1/concepts/parse` — Parse path and return ConceptPath info.
- **Crate:** `stemedb-api`
- [x] **5D.8 Battery Tests**: Validate concept hierarchy end-to-end. ✅
- **Tests:**
- [x] Battery 8 (7 tests): ConceptPath parsing, round-trip, prefix matching, source class inference.
- [x] Battery 9 (8 tests): Alias resolution, transitive resolution, cycle detection, bidirectional lookup, delete, suggestions.
- **Crate:** `stemedb-query/tests/battery_pre_sentinel.rs`
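
A sketch of the `ConceptPath` operations from 5D.1 and the trailing-`/` rule from 5D.4. It assumes empty segments from doubled or trailing slashes are dropped during parsing, which the real parser may not do. `scan_prefix` is a hypothetical helper.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ConceptPath {
    pub scheme: String,
    pub segments: Vec<String>,
}

impl ConceptPath {
    /// Parses `{scheme}://{seg}/.../{leaf}`; bare strings become `custom://{string}`.
    pub fn parse(input: &str) -> Option<Self> {
        let (scheme, rest) = input.split_once("://").unwrap_or(("custom", input));
        if scheme.is_empty() {
            return None;
        }
        let segments: Vec<String> = rest
            .split('/')
            .filter(|s| !s.is_empty()) // drop empty segments (assumption)
            .map(String::from)
            .collect();
        if segments.is_empty() {
            return None;
        }
        Some(Self { scheme: scheme.to_string(), segments })
    }

    pub fn to_wire_format(&self) -> String {
        format!("{}://{}", self.scheme, self.segments.join("/"))
    }

    pub fn leaf(&self) -> &str {
        self.segments.last().map(String::as_str).unwrap_or("")
    }

    pub fn parent(&self) -> Option<Self> {
        if self.segments.len() < 2 {
            return None;
        }
        let segments = self.segments[..self.segments.len() - 1].to_vec();
        Some(Self { scheme: self.scheme.clone(), segments })
    }

    /// Segment-wise prefix test, so `code://auth` never matches `code://authentication`.
    pub fn is_prefix_of(&self, other: &Self) -> bool {
        self.scheme == other.scheme && other.segments.starts_with(&self.segments)
    }

    /// Byte prefix for a raw `scan_prefix`; the trailing `/` enforces the segment boundary.
    pub fn scan_prefix(&self) -> String {
        format!("{}/", self.to_wire_format())
    }
}
```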
### Phase 6: The Mesh (Distributed Writes)
*Goal: Multi-node cluster with CRDT replication and Raft coordination. The endgame.*
> **Research:** [docs/research/distributed-write-path.md](docs/research/distributed-write-path.md)
> **Agent:** `distributed-systems-engineer`
> **Key Insight:** Episteme's append-only model eliminates ~75% of CockroachDB complexity. Assertions are a G-Set CRDT. Votes are G-Counters. No distributed transactions needed.
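
The G-Set and G-Counter semantics named above reduce to two merge rules: set union for assertions and a per-node maximum for vote counters. This is a minimal sketch that assumes 32-byte assertion hashes and string node IDs. These are illustrative types, not `CrdtAssertionStore` or `CrdtVoteStore`.

```rust
use std::collections::{BTreeSet, HashMap};

/// Grow-only set of assertion hashes: merge is set union.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GSet {
    items: BTreeSet<[u8; 32]>,
}

impl GSet {
    pub fn insert(&mut self, hash: [u8; 32]) {
        self.items.insert(hash);
    }
    pub fn merge(&mut self, other: &GSet) {
        self.items.extend(other.items.iter().copied());
    }
    pub fn len(&self) -> usize {
        self.items.len()
    }
}

/// Grow-only counter: each node only increments its own slot.
/// Merge takes the per-node maximum; the value is the sum of slots.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GCounter {
    slots: HashMap<String, u64>,
}

impl GCounter {
    pub fn increment(&mut self, node: &str, by: u64) {
        *self.slots.entry(node.to_string()).or_insert(0) += by;
    }
    pub fn merge(&mut self, other: &GCounter) {
        for (node, &count) in &other.slots {
            let slot = self.slots.entry(node.clone()).or_insert(0);
            *slot = (*slot).max(count);
        }
    }
    pub fn value(&self) -> u64 {
        self.slots.values().sum()
    }
}
```

Both merges are commutative, associative, and idempotent. Replicas therefore converge regardless of delivery order or duplication, which is why these types need no distributed transactions.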
#### 6A. CRDT Foundation (Single-Node Validation) ✅ COMPLETE
- [x] **6A.1 Integrate CRDT Crate**: Wrap assertion storage in G-Set semantics.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `CrdtAssertionStore` in `crates/stemedb-storage/src/crdt/assertion_store.rs` — G-Set semantics for assertions.
- [x] `CrdtVoteStore` in `crates/stemedb-storage/src/crdt/vote_store.rs` — G-Counter semantics for votes.
- [x] `CrdtMerge` trait in `crates/stemedb-storage/src/crdt/traits.rs` for generic merge operations.
- [x] Property tests: commutativity, associativity, idempotence (proptest-based).
- [x] `AssertionTransfer` type for efficient cross-node data transfer.
- **Tests:** 9 unit tests + 3 property tests (assertion_store), 6 unit tests (vote_store).
- **Note:** Did not use external `crdts` crate — implemented native CRDT semantics over existing storage.
- [x] **6A.2 Hybrid Logical Clocks**: Add causal ordering to supersessions.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `HlcTimestamp` in `crates/stemedb-core/src/types/hlc.rs` — serializable HLC with `uhlc` integration.
- [x] Added `uhlc = "0.8"` dependency to `stemedb-core`.
- [x] `HlcTimestamp::from_uhlc()`, `to_uhlc()`, `now()` for clock management.
- [x] Total ordering via NTP64 time + node_id tiebreaker.
- [x] `detect_clock_skew()` utility for monitoring clock drift between nodes.
- [x] `millis()`, `is_before()`, `is_concurrent_with()` helper methods.
- **Tests:** 10 unit tests covering ordering, equality, concurrency, serialization, clock skew detection.
- **Crate:** `uhlc = "0.8"`
- [x] **6A.3 Merkle Tree Over Assertions**: Efficient diff detection.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] New `stemedb-merkle` crate with BLAKE3-based Merkle tree.
- [x] `MerkleTree` struct: O(log N) insert, O(1) root, O(d log N) diff for d differing leaves.
- [x] `DiffResult::diff()` for computing missing hashes between trees.
- [x] `roots_equal()` for O(1) identity check.
- [x] Zero-copy serialization via rkyv for network transfer.
- [x] `MerkleTreeManager` in `stemedb-sync` for persistence and coordination.
- **Crate:** `crates/stemedb-merkle/`
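
The send and receive rules of a hybrid logical clock, written out so the ordering in 6A.2 is concrete. This is a generic HLC: physical milliseconds plus a logical counter, with node ID as the final tiebreaker. It does not reproduce `uhlc`'s NTP64 encoding. `Hlc` and `Clock` are hypothetical names. A production clock would also reject remote timestamps too far ahead of local time, which this sketch omits.

```rust
/// Derived Ord compares wall_ms, then counter, then node_id: a total order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub wall_ms: u64,
    pub counter: u32,
    pub node_id: u64,
}

pub struct Clock {
    last: Hlc,
}

impl Clock {
    pub fn new(node_id: u64) -> Self {
        Self { last: Hlc { wall_ms: 0, counter: 0, node_id } }
    }

    /// Local or send event: follow physical time, else bump the counter.
    pub fn now(&mut self, physical_ms: u64) -> Hlc {
        if physical_ms > self.last.wall_ms {
            self.last.wall_ms = physical_ms;
            self.last.counter = 0;
        } else {
            self.last.counter += 1;
        }
        self.last
    }

    /// Receive event: the result is greater than both the local and remote stamps.
    pub fn update(&mut self, remote: Hlc, physical_ms: u64) -> Hlc {
        let max_wall = physical_ms.max(self.last.wall_ms).max(remote.wall_ms);
        let counter = if max_wall == self.last.wall_ms && max_wall == remote.wall_ms {
            self.last.counter.max(remote.counter) + 1
        } else if max_wall == self.last.wall_ms {
            self.last.counter + 1
        } else if max_wall == remote.wall_ms {
            remote.counter + 1
        } else {
            0 // physical time is ahead of both: restart the counter
        };
        self.last.wall_ms = max_wall;
        self.last.counter = counter;
        self.last
    }
}
```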
#### 6B. Two-Node Replication (Proof of Concept) ✅ COMPLETE
> **Why "Proof of Concept":** All primitives are implemented and unit/integration tested. The PoC validates that CRDT merge, HLC ordering, Merkle diff, gossip broadcast, and anti-entropy sync work correctly in isolation. Full network tests (two running gRPC servers, partition tolerance, concurrent writes) are deferred to 6C where cluster infrastructure provides a natural testing environment.
- [x] **6B.1 RPC Layer**: Node-to-node communication.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] New `stemedb-rpc` crate with tonic gRPC.
- [x] `proto/sync.proto` defines: `GossipRequest/Response`, `RootExchangeRequest/Response`, `FetchRequest/Response`, `PingRequest/Response`, `GetLeavesRequest/Response`.
- [x] `SyncClient` in `src/client.rs` with `RetryConfig` for exponential backoff.
- [x] `SyncServiceHandler` in `src/server.rs` implementing `SyncService` trait.
- [x] `SyncStorage` trait for pluggable storage backends.
- **Crates:** `tonic = "0.12"`, `prost = "0.13"`
- **Crate:** `crates/stemedb-rpc/`
- [x] **6B.2 Gossip Broadcast**: Push new assertions to peers.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `GossipBroadcaster` in `crates/stemedb-sync/src/gossip.rs`.
- [x] Configurable fanout (default: 3 peers).
- [x] Token bucket rate limiting via `with_rate_limit()`.
- [x] Enable/disable support for maintenance windows.
- [x] Metrics: `messages_sent`, `send_failures`, `rate_limited`.
- [x] Best-effort delivery: failures logged but don't block ingestion.
- [x] `GossipBroadcast` trait in `stemedb-ingest` for dependency injection.
- **Tests:** 3 unit tests (noop, no peers, enable/disable).
- [x] **6B.3 Merkle Anti-Entropy Sync**: Background convergence.
- **Status:** ✅ COMPLETE
- **Implementation:**
- [x] `AntiEntropyWorker` in `crates/stemedb-sync/src/anti_entropy.rs`.
- [x] Periodic root exchange via `RootExchangeRequest`.
- [x] `compute_missing_hashes()` compares local and remote leaf sets.
- [x] `FetchRequest` retrieves missing assertion data by hash.
- [x] Merge via `CrdtAssertionStore::merge_with_data()`.
- [x] Merkle tree update after merge.
- [x] Configurable interval via `SyncConfig`.
- [x] Metrics: `sync_cycles`, `sync_failures`, `assertions_synced`.
- [x] Graceful shutdown support.
- **Tests:** 1 unit test (SyncResult variants).
- [x] **6B.4 Integration Test: Two-Node Convergence**:
- **Status:** ✅ COMPLETE (component-level validation)
- **Implementation:**
- [x] `battery11_replication.rs` with 8 tests validating replication primitives:
- `test_identical_trees_same_root` — Merkle root equality.
- `test_different_trees_different_roots` — Merkle root divergence.
- `test_merkle_diff_finds_missing` — Diff algorithm correctness.
- `test_gossip_enable_disable` — Gossip control.
- `test_merkle_checkpoint_restore` — Persistence roundtrip.
- `test_content_addressed_idempotent` — Idempotent storage.
- `test_crdt_merge_with_data` — CRDT merge semantics.
- `test_sync_config_builder` — Configuration validation.
- **Note:** Tests validate primitives in isolation. Live network tests (real gRPC servers, partition healing, concurrent writes) deferred to 6C cluster testing.
- **Crate:** `crates/stemedb-query/tests/battery/battery11_replication.rs`
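
`with_rate_limit()` is described as a token bucket. A minimal generic token bucket might look like this, with a millisecond clock injected for testability. The struct and its parameters are hypothetical, not the `GossipBroadcaster` API.

```rust
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_ms: u64,
}

impl TokenBucket {
    /// Starts full, so an initial burst of `capacity` sends is allowed.
    pub fn new(capacity: f64, refill_per_sec: f64, now_ms: u64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_ms: now_ms }
    }

    /// Refills for elapsed time (capped at capacity), then spends one token if available.
    pub fn try_acquire(&mut self, now_ms: u64) -> bool {
        let elapsed_secs = now_ms.saturating_sub(self.last_ms) as f64 / 1000.0;
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
        self.last_ms = self.last_ms.max(now_ms);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A broadcast that finds the bucket empty would increment `rate_limited` and skip the send. Gossip is best-effort and anti-entropy (6B.3) repairs anything it drops, so shedding load here delays convergence without losing data.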
#### 6C. Multi-Node Cluster ✅
- [x] **6C.1 Cluster Membership (SWIM Gossip)**: Node discovery and failure detection.
- **Tasks:**
- [x] Implement `SwimMembership` with SWIM-like protocol in `stemedb-cluster`.
- [x] `NodeId` (UUID-based), `NodeInfo`, `NodeState`, `MembershipEvent` types.
- [x] Seed-node based discovery (bootstrap nodes in config).
- [x] Failure detection: ping, indirect probe, suspicion with timeouts.
- [x] Membership change events via `tokio::broadcast` channel.
- [x] Gossip queue for piggybacked membership propagation.
- [x] `ClusterConfig` with `SwimConfig` (tunable intervals, timeouts).
- **Crate:** `stemedb-cluster`
- [x] **6C.2 Subject-Prefix Range Sharding**: Distribute data across nodes.
- **Tasks:**
- [x] Implement `RangeRouter`: map subject → shard via BLAKE3 + jump hash.
- [x] `RangeDescriptor`: start key, end key, replicas, size, generation.
- [x] `MetaRange`: collection of descriptors with version and merge logic.
- [x] Automatic range split when size exceeds threshold (configurable, default 64MB).
- [x] Range merge when adjacent ranges shrink below threshold (configurable, default 20MB).
- [x] Meta-range gossip merge for cluster-wide propagation.
- [x] `ShardingConfig` with tunable shard count, replication factor, thresholds.
- **Crate:** `stemedb-cluster`
- [ ] **6C.3 Raft for MV Coordination (Optional)**: DEFERRED.
- **Decision:** Skipped for this delivery. MVs are eventually consistent (converge once assertions sync via anti-entropy). Lenses are deterministic: same inputs produce same output. Can add Raft later if strong MV consistency becomes a requirement.
- [x] **6C.4 Gateway**: Stateless request routing.
- **Tasks:**
- [x] Implement `Gateway` HTTP service (axum) with full routing.
- [x] Route writes by subject hash → shard → leader node.
- [x] Route reads to nearest replica (prefer local).
- [x] Health check endpoint (`/v1/health`).
- [x] Cluster status endpoint (`/v1/cluster/status`).
- [x] Shard info and route test endpoints.
- [x] CORS and tracing middleware.
- **Crate:** `stemedb-cluster`
- [x] **6C.5 Integration Tests**: 82 tests covering membership, sharding, and gateway.
- Membership: 3-node discovery, failure detection, rejoin, gossip propagation.
- Sharding: routing consistency, distribution, split/merge, meta-range gossip.
- Gateway: HTTP endpoint testing via axum `oneshot` for all routes.
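
The subject → shard step of `RangeRouter` can be sketched with jump consistent hash (Lamping and Veach). FNV-1a stands in for BLAKE3 so the example needs no dependencies. `shard_for_subject` is a hypothetical name. Range descriptors, splits, and merges are not modeled.

```rust
/// 64-bit FNV-1a, a dependency-free stand-in for BLAKE3 truncated to 64 bits.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets).
pub fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "need at least one bucket");
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < num_buckets as i64 {
        b = j;
        // 64-bit LCG step drives the pseudo-random sequence of jumps.
        key = key.wrapping_mul(2_862_933_555_777_941_757).wrapping_add(1);
        j = (((b + 1) as f64) * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}

pub fn shard_for_subject(subject: &str, shards: u32) -> u32 {
    jump_hash(fnv1a64(subject.as_bytes()), shards)
}
```

The property that makes jump hash attractive here is minimal movement. Growing from `n` to `n + 1` shards only moves keys onto the new shard, so roughly `1/(n + 1)` of subjects relocate.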
#### 6D. Consistency Guarantees
| Property | Guarantee | Mechanism |
|----------|-----------|-----------|
| **Convergence** | Eventually consistent | G-Set merge (CRDT) |
| **Causality** | Supersessions ordered | HLC timestamps |
| **Partition Tolerance** | Writes never blocked | Any node accepts via CRDT |
| **Availability** | Reads/writes always succeed | Every node is master for CRDTs |
| **Durability** | WAL + fsync per node | Existing WAL infra |
| **Conflict Resolution** | Deterministic | Lens algorithms (same inputs → same output) |
### Phase 7: The Shield (Trust at Scale)
*Goal: Defend against spam, Sybil attacks, and knowledge poisoning when open to millions of agents.*
> **Key Insight:** Open agent access requires layered defense. PoW for admission, EigenTrust for reputation, circuit breakers for misbehavior, quarantine for suspicious content.
#### 7A. Admission Control ✅
- [x] **7A.1 Proof-of-Work Admission**: BLAKE3 hashcash for new agents.
- New agents (trust < 0.3) must solve PoW puzzle before first N assertions accepted.
- Graduated difficulty: first 10 assertions = 16 bits (~16 sec), 11-50 = 1 bit (trivial), 50+ or trust > 0.6 = exempt.
- Puzzle: `BLAKE3(nonce || agent_id || timestamp)` must have `difficulty` leading zero bits.
- Implemented: `PowProof` struct, `AdmissionLayer` middleware, HTTP 428 responses.
- [x] **7A.2 Graduated Trust Tiers**: Privilege escalation based on reputation.
- Untrusted (0.0-0.3): Quarantine mode, assertions hidden by default, 0.1x quota.
- Limited (0.3-0.5): 0.5x quota, PoW required.
- Verified (0.5-0.7): Standard quota, normal privileges.
- Trusted (0.7-0.9): 2x quota, skip PoW.
- Authority (0.9-1.0): 10x quota, no limits.
- Implemented: `TrustTier` enum, `AdmissionStore` trait, `/v1/admission/status` endpoint.
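
A sketch of the 7A.1 puzzle and graduated difficulty. `DefaultHasher` (SipHash) substitutes for BLAKE3 only so the example runs without dependencies. Its output is not guaranteed stable across Rust releases, so a real protocol must use a fixed hash such as BLAKE3. How `difficulty_for` handles the boundary where "11-50" meets "50+" is an assumption, and all names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in for BLAKE3(nonce || agent_id || timestamp), truncated to 64 bits.
fn puzzle_hash(nonce: u64, agent_id: &[u8], timestamp: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(&nonce.to_le_bytes());
    h.write(agent_id);
    h.write(&timestamp.to_le_bytes());
    h.finish()
}

/// A proof is valid when the hash has at least `difficulty` leading zero bits.
pub fn verify(nonce: u64, agent_id: &[u8], timestamp: u64, difficulty: u32) -> bool {
    puzzle_hash(nonce, agent_id, timestamp).leading_zeros() >= difficulty
}

/// Brute-force search; expected cost is about 2^difficulty hashes.
pub fn solve(agent_id: &[u8], timestamp: u64, difficulty: u32) -> u64 {
    (0u64..)
        .find(|&n| verify(n, agent_id, timestamp, difficulty))
        .expect("nonce space exhausted")
}

/// Assertions 1-10 need 16 bits, 11-50 need 1 bit, 51+ (or trust > 0.6) are exempt.
pub fn difficulty_for(assertions_so_far: u64, trust: f64) -> u32 {
    if trust > 0.6 || assertions_so_far >= 50 {
        0
    } else if assertions_so_far < 10 {
        16
    } else {
        1
    }
}
```

The asymmetry is the point of hashcash. The agent pays about 2^difficulty hash evaluations, while the server verifies with one.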
#### 7B. EigenTrust ✅
- [x] **7B.1 Trust Graph Store**: Store direct trust relationships for propagation.
- Key pattern: `TG:{from}:{to}` → TrustEdge, `TGR:{to}:{from}` → reverse index.
- Methods: `add_trust_edge()`, `get_trusts()`, `get_trusted_by()`, `compute_eigentrust()`.
- Seed trust: `ET:seed:{agent}` for pre-trusted agents (P vector).
- [x] **7B.2 EigenTrust Computation**: Global trust via power iteration (daily batch).
- Formula: `T = (1-α)C^T*T + αP` where C = normalized trust matrix, P = seed trust, α = 0.1.
- Convergence: 10-100 iterations, ε = 1e-4 threshold.
- Sybil resistance: isolated rings get near-zero trust (not connected to seeds).
- Dangling node handling: redistribute to seed vector.
- [x] **7B.3 Domain-Specific Trust**: Per-predicate-namespace reputation.
- `DomainTrust` tracks accuracy by domain (medicine, finance, technology, etc.).
- `extract_domain()` maps predicates to domains.
- `domain_factor = 0.5 + (score × 0.5)` scales weight by expertise.
- `EigenTrustAuthorityLens`: `weight = confidence × eigentrust × domain_factor`.
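
The 7B.2 power iteration `T = (1-α)C^T*T + αP`, with dangling rows redistributed to the seed vector. This is a dense-matrix sketch with hypothetical names. A daily batch over many agents would iterate over the sparse `TG:` edges instead. `lens_weight` applies the 7B.3 formula.

```rust
/// c[i][j] = non-negative direct trust agent i places in agent j.
/// p = seed (pre-trusted) distribution summing to 1.
pub fn eigentrust(c: &[Vec<f64>], p: &[f64], alpha: f64, eps: f64, max_iter: usize) -> Vec<f64> {
    let n = p.len();
    debug_assert_eq!(c.len(), n);
    // Row-normalize into C; dangling rows (no outgoing trust) fall back to P.
    let rows: Vec<Vec<f64>> = c
        .iter()
        .map(|row| {
            let sum: f64 = row.iter().sum();
            if sum > 0.0 { row.iter().map(|x| x / sum).collect() } else { p.to_vec() }
        })
        .collect();
    let mut t = p.to_vec();
    for _ in 0..max_iter {
        let mut next = vec![0.0; n];
        for i in 0..n {
            for j in 0..n {
                next[j] += rows[i][j] * t[i]; // (C^T t)_j = sum_i c_ij * t_i
            }
        }
        for j in 0..n {
            next[j] = (1.0 - alpha) * next[j] + alpha * p[j];
        }
        // L1 change between iterations decides convergence.
        let delta: f64 = next.iter().zip(&t).map(|(a, b)| (a - b).abs()).sum();
        t = next;
        if delta < eps {
            break;
        }
    }
    t
}

/// weight = confidence * eigentrust * (0.5 + domain_score * 0.5)
pub fn lens_weight(confidence: f64, eigentrust: f64, domain_score: f64) -> f64 {
    let domain_factor = 0.5 + domain_score * 0.5;
    confidence * eigentrust * domain_factor
}
```

This is where the Sybil resistance comes from. Trust enters an agent only through `C^T` from agents that already hold trust, or through `αP`. A ring of agents that trust only each other, with no edges in from seed-connected agents and no seed entry, therefore stays at zero however densely it is connected internally.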
#### 7C. Content Defense ✅ COMPLETE
- [x] **7C.1 MinHash Deduplication**: Near-duplicate detection with LSH bucketing.
- `SimilarityIndex` trait with `GenericSimilarityIndex<S: KVStore>` implementation.
- MinHash signatures (k=128) for `{subject}:{predicate}` pairs.
- LSH buckets (16 bands × 8 rows) for O(1) average-case lookup.
- Bloom filter pre-check for fast "definitely not duplicate" path.
- Threshold: 0.9 Jaccard similarity = duplicate.
- Implemented: `similarity_index/` module in `stemedb-storage`.
- [x] **7C.2 Content Quality Scoring**: Heuristic-based spam detection.
- `ContentQualityScorer` with configurable thresholds.
- Shannon entropy check (low entropy = likely random noise/repetitive).
- Minimum subject/predicate length (3 chars default).
- Structured data bonus (JSON objects, numbers, URLs).
- Untrusted agent + high confidence (>0.8) = suspicious.
- Implemented: `content_defense/quality.rs` in `stemedb-storage`.
- [x] **7C.3 Quarantine Store**: Suspicious assertions held for review.
- `QuarantineStore` trait with `GenericQuarantineStore<S: KVStore>` implementation.
- Key pattern: `QUAR:{timestamp}:{hash}` → assertion data (time-ordered).
- Secondary index: `QUAR_IDX:{hash}` → timestamp for O(1) hash lookups.
- Quarantined assertions NOT indexed (invisible to queries).
- Triggers: quality < 0.4, duplicate, untrusted high-confidence.
- Admin API: `GET /v1/admin/quarantine`, `POST .../approve`, `POST .../reject`.
- `ContentDefenseLayer` integration in `stemedb-ingest`.
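
A compact sketch of the 7C.1 MinHash + LSH path, using the same parameters (k = 128, 16 bands × 8 rows, 0.9 threshold). Assumptions: character 3-shingles of the `{subject}:{predicate}` string, Rust's `DefaultHasher` seeded per hash function as a stand-in for a proper hash family, and hypothetical names (`LshIndex`, `check_and_insert`, `Outcome`). The Bloom filter pre-check and KV persistence are omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

const K: usize = 128; // MinHash signature length
const BANDS: usize = 16; // LSH bands
const ROWS: usize = 8; // rows per band; BANDS * ROWS == K

/// Seeded hash; DefaultHasher stands in for a real hash family.
fn seeded_hash<T: Hash + ?Sized>(seed: u64, value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    value.hash(&mut h);
    h.finish()
}

/// Character 3-shingles (the shingle size is an assumption).
fn shingles(text: &str) -> HashSet<String> {
    let chars: Vec<char> = text.chars().collect();
    if chars.len() < 3 {
        return HashSet::from([text.to_string()]);
    }
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// Slot i holds the minimum of hash function i over all shingles.
pub fn minhash(text: &str) -> [u64; K] {
    let mut sig = [u64::MAX; K];
    for s in shingles(text) {
        for (i, slot) in sig.iter_mut().enumerate() {
            *slot = (*slot).min(seeded_hash(i as u64, s.as_str()));
        }
    }
    sig
}

/// The fraction of matching slots estimates Jaccard similarity.
pub fn estimated_jaccard(a: &[u64; K], b: &[u64; K]) -> f64 {
    a.iter().zip(b.iter()).filter(|(x, y)| x == y).count() as f64 / K as f64
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    New(usize),
    Duplicate(usize),
}

#[derive(Default)]
pub struct LshIndex {
    buckets: HashMap<(usize, u64), Vec<usize>>, // (band, band hash) -> ids
    signatures: Vec<[u64; K]>,
}

impl LshIndex {
    fn band_keys(sig: &[u64; K]) -> Vec<(usize, u64)> {
        (0..BANDS)
            .map(|b| (b, seeded_hash(b as u64, &sig[b * ROWS..(b + 1) * ROWS])))
            .collect()
    }

    /// Report a near-duplicate at or above `threshold`, otherwise index the text.
    pub fn check_and_insert(&mut self, text: &str, threshold: f64) -> Outcome {
        let sig = minhash(text);
        let keys = Self::band_keys(&sig);
        // Only entries sharing at least one identical band are compared.
        let candidates: HashSet<usize> = keys
            .iter()
            .filter_map(|k| self.buckets.get(k))
            .flatten()
            .copied()
            .collect();
        if let Some(&dup) = candidates
            .iter()
            .find(|&&id| estimated_jaccard(&sig, &self.signatures[id]) >= threshold)
        {
            return Outcome::Duplicate(dup);
        }
        let id = self.signatures.len();
        for key in keys {
            self.buckets.entry(key).or_default().push(id);
        }
        self.signatures.push(sig);
        Outcome::New(id)
    }
}
```

The 16 × 8 banding determines which pairs get compared at all. A pair at Jaccard 0.9 matches a given band with probability 0.9^8 ≈ 0.43, so across 16 bands it becomes a candidate with probability about 1 - 0.57^16 ≈ 0.9999, while a pair at 0.5 becomes a candidate only about 6% of the time.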
#### 7D. Circuit Breakers ✅
- [x] **7D.1 Per-Agent Circuit Breakers**: Ban misbehaving agents temporarily.
- States: Closed (normal), Open (banned), HalfOpen (testing recovery).
- 5 failures → Open for 30 seconds → HalfOpen → 1 success → Closed.
- Failure types: InvalidSignature, InputValidation, PowError, QuotaExceeded, ApplicationError.
- `CircuitBreakerStore` trait + `GenericCircuitBreakerStore<S: KVStore>`.
- `CircuitBreakerLayer` middleware (outermost in stack, runs first).
- Admin API: `GET /v1/admin/circuit-breaker/{agent_id}`, `POST .../reset`, `GET .../tripped`.
- 25 unit tests covering full state machine lifecycle.
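
The 7D.1 state machine is small enough to sketch directly. This is an illustration, not `GenericCircuitBreakerStore`: time is passed in as seconds so the logic is deterministic, failures are assumed to be consecutive (a success while Closed resets the count), a failure in HalfOpen is assumed to reopen the breaker, and all names (`CircuitBreaker`, `allow`, `retry_after`) are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed { failures: u32 },
    Open { since: u64 },
    HalfOpen,
}

pub struct CircuitBreaker {
    state: BreakerState,
    failure_threshold: u32, // 5 in 7D.1
    open_secs: u64,         // 30 in 7D.1
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, open_secs: u64) -> Self {
        CircuitBreaker { state: BreakerState::Closed { failures: 0 }, failure_threshold, open_secs }
    }

    pub fn state(&self) -> BreakerState {
        self.state
    }

    /// Gate a request arriving at `now` (seconds). An Open breaker whose
    /// cool-down has elapsed moves to HalfOpen; the next recorded outcome
    /// either closes it (success) or reopens it (failure).
    pub fn allow(&mut self, now: u64) -> bool {
        let state = self.state;
        match state {
            BreakerState::Open { since } if now.saturating_sub(since) >= self.open_secs => {
                self.state = BreakerState::HalfOpen;
                true
            }
            BreakerState::Open { .. } => false,
            BreakerState::Closed { .. } | BreakerState::HalfOpen => true,
        }
    }

    pub fn record_success(&mut self) {
        self.state = BreakerState::Closed { failures: 0 };
    }

    pub fn record_failure(&mut self, now: u64) {
        self.state = match self.state {
            BreakerState::Closed { failures } if failures + 1 >= self.failure_threshold => {
                BreakerState::Open { since: now }
            }
            BreakerState::Closed { failures } => BreakerState::Closed { failures: failures + 1 },
            BreakerState::HalfOpen => BreakerState::Open { since: now },
            BreakerState::Open { since } => BreakerState::Open { since },
        };
    }

    /// Remaining cool-down while Open, e.g. for a `Retry-After` header.
    pub fn retry_after(&self, now: u64) -> Option<u64> {
        match self.state {
            BreakerState::Open { since } => Some(self.open_secs.saturating_sub(now.saturating_sub(since))),
            _ => None,
        }
    }
}
```

In this sketch, a middleware would call `allow` first, answer 503 with `retry_after` as the header value when it returns `false`, and report the request's outcome via `record_success` or `record_failure`.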
### Phase 8: The Swarm (Production Cluster)
*Goal: Production-ready distributed deployment with chaos testing and observability.*
#### 8A. Chaos Testing ✅
- [x] **8A.1 Partition Testing**: Verify convergence after network partitions.
- 5-node cluster, kill 2 nodes, verify remaining nodes converge.
- Network partition between groups, verify both sides accept writes, verify convergence after healing.
- Message reordering and duplication tests.
- Cascading failure recovery.
- SWIM suspicion handling (false positive prevention).
- Asymmetric partition testing.
- Write availability during partition.
- Crate: `stemedb-chaos` with `TestCluster`, `ChaosNode`, `NetworkController`.
- [x] **8A.2 Jepsen-Style Consistency Testing**: Formal verification.
- CRDT eventual consistency (1000 concurrent writes across 5 nodes).
- CRDT commutativity, associativity, idempotence verification.
- Clock skew injection (+5s/-5s), verify HLC handles drift.
- HLC monotonicity under partition.
- Supersession ordering with clock skew.
- Concurrent writes to same subject under partition (both survive).
- Large Merkle diff convergence (1500 vs 500 assertions).
- Crate: `stemedb-chaos::crdt_properties` with property verification functions.
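
The three algebraic properties checked in 8A.2 are easy to state for the G-Set model that assertions use (merge = set union, per the Key Architectural Decisions section). This is a minimal sketch with hypothetical names (`GSet`, `is_commutative`, `converges`, ...) and `u64` standing in for assertion hashes; it is not the `stemedb-chaos::crdt_properties` API.

```rust
use std::collections::BTreeSet;

/// Grow-only set: the only operation is add, and merge is set union.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct GSet(BTreeSet<u64>);

impl GSet {
    pub fn from_items(items: &[u64]) -> Self {
        GSet(items.iter().copied().collect())
    }

    pub fn merge(&self, other: &GSet) -> GSet {
        GSet(self.0.union(&other.0).copied().collect())
    }
}

/// a ⊔ b == b ⊔ a: gossip direction does not matter.
pub fn is_commutative(a: &GSet, b: &GSet) -> bool {
    a.merge(b) == b.merge(a)
}

/// (a ⊔ b) ⊔ c == a ⊔ (b ⊔ c): grouping of sync rounds does not matter.
pub fn is_associative(a: &GSet, b: &GSet, c: &GSet) -> bool {
    a.merge(b).merge(c) == a.merge(&b.merge(c))
}

/// a ⊔ a == a: duplicated messages are harmless.
pub fn is_idempotent(a: &GSet) -> bool {
    a.merge(a) == *a
}

/// Replicas converge if merging every state gives the same result regardless
/// of order; this sketch compares forward and reverse merge orders.
pub fn converges(replicas: &[GSet]) -> bool {
    let forward = replicas.iter().fold(GSet::default(), |acc, r| acc.merge(r));
    let backward = replicas.iter().rev().fold(GSet::default(), |acc, r| acc.merge(r));
    forward == backward
}
```

Together, these three properties are what let reordering, duplication, and partition tests expect a single final state: any delivery order of the same set of updates yields the same union.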
#### 8B. Observability
- [ ] **8B.1 Distributed Metrics**: Per-node, per-range, per-agent metrics.
- `sync_lag_seconds{peer}`, `merkle_diff_size{peer}`, `convergence_latency_p99`.
- `raft_leader_changes{range}`, `raft_commit_latency{range}`.
- `assertions_total{node}`, `writes_per_second{node}`.
- Crate: `metrics` + `metrics-exporter-prometheus`.
- [ ] **8B.2 Admin Dashboard**: Cluster health visibility.
- `GET /v1/admin/cluster` → node list, range assignments, leader locations.
- `GET /v1/admin/ranges` → range sizes, split/merge history.
- `POST /v1/admin/sync` → force anti-entropy sync.
#### 8C. Production Hardening
- [ ] **8C.1 Snapshot/Restore**: Fast replica bootstrap.
- Serialize full node state as snapshot.
- New nodes join by restoring snapshot + replaying recent WAL.
- Faster than full Merkle sync for new nodes.
- [ ] **8C.2 Backpressure**: Don't overwhelm slow nodes.
- Track per-peer sync queue depth.
- Throttle gossip to slow peers.
- Alert when peer is consistently behind.
- [ ] **8C.3 Geo-Distribution**: Multi-region deployment.
- Regional clusters with CRDT federation between regions.
- Lazy cross-region sync (background, low priority).
- Locality-aware reads (query nearest replica).
- Regional compliance (GDPR data residency).
### Phase 9: The Bunker (Disaster Planning)
*Goal: Survive the worst. Backup, restore, recover from corruption, comply with regulations, and plan for unbounded growth.*
> **Key Insight:** Append-only CRDTs are a double-edged sword. They provide partition tolerance and conflict-free merge, but once bad data is merged, it's everywhere forever. Phase 9 addresses the failure modes that Phases 6-8 introduce.
#### 9A. Backup & Cold Storage
- [ ] **9A.1 Full Cluster Backup**: Point-in-time snapshot to cold storage.
- **Problem:** 8C.1 snapshots are for node bootstrap, not disaster recovery. Need immutable backups to S3/GCS.
- **Tasks:**
- [ ] `BackupCoordinator`: elect leader, pause writes, snapshot all nodes, upload to object storage.
- [ ] Incremental backups: WAL segments since last full backup.
- [ ] Backup manifest: cluster topology, Merkle roots, HLC high-water mark.
- [ ] Retention policy: 7 daily, 4 weekly, 12 monthly.
- [ ] `POST /v1/admin/backup/trigger`, `GET /v1/admin/backup/status`.
- [ ] **9A.2 Point-in-Time Recovery (PITR)**: Restore to any timestamp.
- **Problem:** "Restore yesterday's backup" isn't enough. Need "restore to 3:47pm yesterday."
- **Tasks:**
- [ ] WAL archiving to object storage (continuous).
- [ ] Restore = snapshot + replay WAL until target HLC timestamp.
- [ ] `POST /v1/admin/restore?target_hlc=<timestamp>`.
- [ ] Validation: Merkle root matches expected state after restore.
- [ ] **9A.3 Backup Verification**: Prove backups actually work.
- **Problem:** Backups that can't restore are useless. Verify automatically.
- **Tasks:**
- [ ] Weekly "fire drill": restore backup to ephemeral cluster, run integrity checks.
- [ ] Merkle root comparison: restored cluster root == source cluster root at backup time.
- [ ] Alert on verification failure.
- [ ] `GET /v1/admin/backup/verification-history`.
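
The 9A.2 restore rule (snapshot + replay WAL until the target HLC) can be sketched as a filter over archived WAL entries. Assumptions: state is modelled as a G-Set of assertion hashes; `Hlc`, `WalEntry`, and `restore_to` are hypothetical names; and archived WAL from different nodes is not assumed to be globally HLC-ordered, so the sketch filters every entry instead of stopping at the first one past the target.

```rust
use std::collections::BTreeSet;

/// HLC timestamp; derived Ord compares physical time first, then the counter.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub physical_ms: u64,
    pub logical: u32,
}

#[derive(Debug, Clone)]
pub struct WalEntry {
    pub hlc: Hlc,
    pub assertion_hash: u64,
}

/// Restore = snapshot + every archived WAL entry at or before `target`.
pub fn restore_to(
    snapshot: &BTreeSet<u64>,
    snapshot_hlc: Hlc,
    archived_wal: &[WalEntry],
    target: Hlc,
) -> Result<BTreeSet<u64>, String> {
    if target < snapshot_hlc {
        return Err(format!(
            "target {:?} precedes snapshot at {:?}; use an older snapshot",
            target, snapshot_hlc
        ));
    }
    let mut state = snapshot.clone();
    for entry in archived_wal.iter().filter(|e| e.hlc <= target) {
        // G-Set insert is idempotent, so entries already in the snapshot are harmless.
        state.insert(entry.assertion_hash);
    }
    Ok(state)
}
```

After restoring, the Merkle root of `state` would be compared with the expected root, as the validation task above requires; that step is outside this sketch.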
#### 9B. Data Corruption & Rollback
- [ ] **9B.1 Corruption Detection**: Catch bad data before it spreads.
- **Problem:** Malformed assertions, invalid signatures, or logical corruption can poison the cluster via CRDT merge.
- **Tasks:**
- [ ] `IngestionValidator`: deep validation before accepting gossip (beyond signature check).
- [ ] Schema validation: required fields, type constraints, value ranges.
- [ ] Semantic validation: subject/predicate format, confidence bounds, timestamp sanity.
- [x] `QuarantineStore`: Implemented in Phase 7C. Extend with new `QuarantineReason` variants.
- [ ] Metrics: `assertions_quarantined`, `assertions_rejected`.
- [ ] **9B.2 Assertion Tombstones**: "Delete" in an append-only world.
- **Problem:** Can't actually delete from a G-Set. Need a way to mark assertions as invalid.
- **Tasks:**
- [ ] `TombstoneAssertion`: special assertion type that marks another assertion as dead.
- [ ] Tombstones propagate via CRDT like regular assertions.
- [ ] Lenses skip tombstoned assertions during resolution.
- [ ] `POST /v1/admin/tombstone/{assertion_hash}` (admin only).
- [ ] Tombstone reasons: `Corrupted`, `Malicious`, `Legal`, `Retracted`.
- [ ] **9B.3 Cluster Rollback**: "Undo" a time range across all nodes.
- **Problem:** If bad data got merged cluster-wide, need to roll back the entire cluster.
- **Tasks:**
- [ ] `RollbackCoordinator`: elect leader, compute affected assertions, generate tombstones.
- [ ] Input: time range (HLC from/to) or list of assertion hashes.
- [ ] Output: batch of `TombstoneAssertion` propagated cluster-wide.
- [ ] Audit log: who triggered rollback, why, what was affected.
- [ ] `POST /v1/admin/rollback?from_hlc=X&to_hlc=Y&reason=...`.
- [ ] **9B.4 Fork Recovery**: Heal split-brain after extended partition.
- **Problem:** Two clusters evolve independently during partition. After healing, they have divergent state that technically "merges" but may have semantic conflicts.
- **Tasks:**
- [ ] `ForkDetector`: identify assertions created during partition on each side.
- [ ] `ConflictReport`: list all subject/predicate pairs with divergent winners.
- [ ] Manual resolution: admin reviews conflicts, chooses winners, tombstones losers.
- [ ] `GET /v1/admin/fork-analysis`, `POST /v1/admin/fork-resolve`.
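
9B.2 and 9B.3 fit together: tombstones are ordinary append-only records, lenses filter out their targets at resolution time, and a rollback is simply a batch of tombstones for every assertion in an HLC window. This sketch uses hypothetical types (`Tombstone` rather than the planned `TombstoneAssertion`, plus `Assertion`, `resolve`, and `rollback_range`), simplifies HLC timestamps to a single `u64`, and uses "highest confidence wins" as a stand-in for whatever lens is configured.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TombstoneReason {
    Corrupted,
    Malicious,
    Legal,
    Retracted,
}

/// A tombstone is itself an append-only record naming its target.
#[derive(Debug, Clone)]
pub struct Tombstone {
    pub target: u64,
    pub reason: TombstoneReason,
}

#[derive(Debug, Clone)]
pub struct Assertion {
    pub hash: u64,
    pub hlc: u64, // simplified HLC
    pub value: String,
    pub confidence: f64,
}

/// Skip tombstoned assertions, then pick the highest-confidence survivor.
pub fn resolve<'a>(candidates: &'a [Assertion], tombstones: &[Tombstone]) -> Option<&'a Assertion> {
    let dead: HashSet<u64> = tombstones.iter().map(|t| t.target).collect();
    candidates
        .iter()
        .filter(|a| !dead.contains(&a.hash))
        .max_by(|a, b| a.confidence.total_cmp(&b.confidence))
}

/// Rollback over an HLC window: one tombstone per assertion written in [from, to].
pub fn rollback_range(
    assertions: &[Assertion],
    from_hlc: u64,
    to_hlc: u64,
    reason: TombstoneReason,
) -> Vec<Tombstone> {
    assertions
        .iter()
        .filter(|a| (from_hlc..=to_hlc).contains(&a.hlc))
        .map(|a| Tombstone { target: a.hash, reason })
        .collect()
}
```

Because tombstones only ever add records, they propagate and merge like any other G-Set element; the previous winner stays in storage until 9D.1 compaction.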
#### 9C. Compliance & Legal
- [ ] **9C.1 GDPR Right to Erasure**: Handle deletion requests in append-only system.
- **Problem:** GDPR requires "right to be forgotten." Append-only means data exists forever. Legal conflict.
- **Strategy:** Cryptographic erasure. Encrypt agent data with a per-agent key; delete the key to "erase."
- **Tasks:**
- [ ] Agent data encrypted with per-agent key (AES-256-GCM).
- [ ] Key stored in `AgentKeyStore` (separate from assertion data).
- [ ] "Erasure" = delete the agent's key → their data becomes unreadable garbage.
- [ ] Tombstones for their assertions (semantically dead).
- [ ] `DELETE /v1/agents/{agent_id}` triggers erasure workflow.
- [ ] Audit log: erasure requests, completion timestamp, affected assertion count.
- [ ] **9C.2 Data Retention Policies**: Don't keep data forever.
- **Problem:** Append-only doesn't mean keep-forever. Old data has storage cost and legal liability.
- **Tasks:**
- [ ] `RetentionPolicy`: per-subject or per-predicate retention rules.
- [ ] Default: 7 years (financial), configurable per use case.
- [ ] `RetentionWorker`: background job generates tombstones for expired assertions.
- [ ] "Archive tier": cold storage for expired-but-not-deleted assertions.
- [ ] `GET/PUT /v1/admin/retention-policies`.
- [ ] **9C.3 Audit Trail for Compliance**: Prove what happened when.
- **Problem:** Regulators ask "who changed what when." Need immutable audit log.
- **Tasks:**
- [ ] `AuditStore`: immutable log of admin actions (separate from assertions).
- [ ] Events: backup, restore, rollback, tombstone, erasure, policy change.
- [ ] Tamper-evident: Merkle chain over audit entries.
- [ ] `GET /v1/admin/audit?from=X&to=Y`.
- [ ] Export to external SIEM (Splunk, DataDog, etc.).
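
9C.3 calls for a tamper-evident chain over audit entries. The simplest structure of this kind is a linear hash chain, where each entry commits to the previous entry's hash. This sketch is illustrative only: `AuditLog`, `append`, `verify`, and `entries_mut` are hypothetical names, and `DefaultHasher` is a non-cryptographic placeholder; a real audit log would use a cryptographic hash such as the BLAKE3 the project already uses elsewhere.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct AuditEntry {
    pub seq: u64,
    pub actor: String,
    pub action: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Placeholder for BLAKE3: DefaultHasher is NOT collision resistant.
fn entry_hash(seq: u64, actor: &str, action: &str, prev_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (seq, actor, action, prev_hash).hash(&mut h);
    h.finish()
}

#[derive(Default)]
pub struct AuditLog {
    entries: Vec<AuditEntry>,
}

impl AuditLog {
    /// Append an event; its hash covers the previous entry's hash.
    pub fn append(&mut self, actor: &str, action: &str) -> u64 {
        let seq = self.entries.len() as u64;
        let prev_hash = self.entries.last().map_or(0, |e| e.hash);
        let hash = entry_hash(seq, actor, action, prev_hash);
        self.entries.push(AuditEntry {
            seq,
            actor: actor.to_string(),
            action: action.to_string(),
            prev_hash,
            hash,
        });
        hash
    }

    /// Recompute every link; an edited, reordered, or removed (non-tail)
    /// entry breaks the chain.
    pub fn verify(&self) -> bool {
        let mut prev = 0;
        for (i, e) in self.entries.iter().enumerate() {
            let expected = entry_hash(e.seq, &e.actor, &e.action, e.prev_hash);
            if e.seq != i as u64 || e.prev_hash != prev || e.hash != expected {
                return false;
            }
            prev = e.hash;
        }
        true
    }

    /// Test-only access used to simulate tampering.
    pub fn entries_mut(&mut self) -> &mut Vec<AuditEntry> {
        &mut self.entries
    }
}
```

A chain alone cannot detect truncation of the newest entries; detecting that requires anchoring the latest head hash outside the log, for example in the SIEM export.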
#### 9D. Storage Management
- [ ] **9D.1 Compaction**: Reclaim space from tombstoned data.
- **Problem:** Tombstones don't free storage. Need compaction to actually reclaim space.
- **Tasks:**
- [ ] `CompactionWorker`: background job removes tombstoned assertions from storage.
- [ ] Compaction delay: wait N days after tombstone before physical deletion.
- [ ] Update Merkle tree after compaction (tree shrinks).
- [ ] Compaction manifest: what was removed, when.
- [ ] Metrics: `storage_reclaimed_bytes`, `assertions_compacted`.
- [ ] **9D.2 Tiered Storage**: Hot/warm/cold based on access patterns.
- **Problem:** Most queries hit recent data. Old assertions waste fast storage.
- **Tasks:**
- [ ] Hot tier: NVMe (< 30 days old, frequently accessed).
- [ ] Warm tier: SSD (30-365 days, occasionally accessed).
- [ ] Cold tier: Object storage (> 365 days, rarely accessed).
- [ ] Transparent access: queries fetch from appropriate tier.
- [ ] Migration worker: move data between tiers based on age/access.
- [ ] Metrics: `tier_hot_bytes`, `tier_warm_bytes`, `tier_cold_bytes`.
- [ ] **9D.3 Storage Quotas**: Prevent runaway growth.
- **Problem:** Open agent access + append-only = potential unbounded growth.
- **Tasks:**
- [ ] Per-agent storage quota (in bytes or assertion count).
- [ ] Per-subject storage quota (prevent subject stuffing).
- [ ] Cluster-wide storage limit with alerting.
- [ ] Rejection when quota exceeded: HTTP 429 with `Retry-After`.
- [ ] `GET /v1/admin/storage/usage`, `PUT /v1/admin/storage/quotas`.
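
The 9D.2 thresholds translate into a small tier-selection rule that a migration worker could apply. Assumptions: "age" means time since the assertion was last read (or since creation if it has never been read), which folds the age and access criteria into one number; timestamps are Unix seconds; and `Tier`, `StoredItem`, `tier_for`, and `plan_migrations` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Hot,  // NVMe
    Warm, // SSD
    Cold, // object storage
}

pub const DAY_SECS: u64 = 86_400;

/// Choose a tier from days since the item was last touched.
pub fn tier_for(now: u64, created_at: u64, last_access: Option<u64>) -> Tier {
    let touched = last_access.map_or(created_at, |a| a.max(created_at));
    let idle_days = now.saturating_sub(touched) / DAY_SECS;
    match idle_days {
        0..=29 => Tier::Hot,    // < 30 days
        30..=365 => Tier::Warm, // 30-365 days
        _ => Tier::Cold,        // > 365 days
    }
}

pub struct StoredItem {
    pub id: u64,
    pub tier: Tier,
    pub created_at: u64,
    pub last_access: Option<u64>,
}

/// Migration plan: (id, current tier, target tier) for items that should move.
pub fn plan_migrations(now: u64, items: &[StoredItem]) -> Vec<(u64, Tier, Tier)> {
    items
        .iter()
        .filter_map(|it| {
            let target = tier_for(now, it.created_at, it.last_access);
            (target != it.tier).then(|| (it.id, it.tier, target))
        })
        .collect()
}
```

Because reads update the effective age in this sketch, an old but frequently queried assertion stays hot, while untouched data drifts to cold storage.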
#### 9E. Incident Response
- [ ] **9E.1 Alerting & Escalation**: Know when things break.
- **Tasks:**
- [ ] Alert definitions: sync lag > 5min, Merkle divergence, node unreachable, storage > 80%.
- [ ] Escalation tiers: P1 (page immediately), P2 (Slack + 15min), P3 (email).
- [ ] Integration: PagerDuty, OpsGenie, Slack, email.
- [ ] Runbook links in alerts (what to do when this fires).
- [ ] **9E.2 Operational Runbooks**: Documented procedures for common failures.
- **Runbooks to write:**
- [ ] Node won't start (WAL corruption, disk full, config error).
- [ ] Node behind on sync (network, slow disk, backpressure).
- [ ] Cluster split-brain (partition detection, resolution).
- [ ] Restore from backup (step-by-step with validation).
- [ ] Emergency rollback (bad data merged, need to undo).
- [ ] Capacity expansion (add nodes, rebalance ranges).
- [ ] Security incident (compromised node, leaked keys).
- [ ] **9E.3 Chaos Engineering**: Break things on purpose.
- **Problem:** Can't trust disaster recovery you've never tested.
- **Tasks:**
- [ ] Scheduled chaos: monthly "game days" with controlled failures.
- [ ] Scenarios: node death, network partition, disk corruption, clock skew.
- [ ] Automated chaos: `chaos-monkey` style random failures in staging.
- [ ] Post-mortem template and review process.
#### 9F. Security Hardening
- [ ] **9F.1 TLS Everywhere**: Encrypt all node-to-node traffic.
- **Tasks:**
- [ ] mTLS for gRPC (SyncService, gossip, anti-entropy).
- [ ] Certificate rotation without downtime.
- [ ] CA management: internal CA or external (Vault, ACME).
- [ ] Reject unencrypted connections.
- [ ] **9F.2 Encryption at Rest**: Protect stored data.
- **Tasks:**
- [ ] WAL encryption (AES-256-GCM).
- [ ] KV store encryption (fjall supports this).
- [ ] Key management: external KMS (AWS KMS, Vault) or local.
- [ ] Key rotation without full re-encryption.
- [ ] **9F.3 Node Authentication**: Verify cluster membership.
- **Tasks:**
- [ ] Node identity via Ed25519 keypair.
- [ ] Cluster join requires signed invitation from existing member.
- [ ] Revocation: remove compromised node's key, propagate via gossip.
- [ ] Audit: log all join/leave/revoke events.
---
## Tracking
### Active Tasks
* [x] **Phase 3 (The Pilot)**: Consumer Health vertical integration. ✅ COMPLETE
* [x] **Phase 4 (The Hive)**: Trust & Scale + Extension Primitives. ✅ COMPLETE
* [x] **Phase 5 (The Forge)**: Foundation hardening (replace sled, fix WAL, persist indices). ✅ COMPLETE
* [x] **5A**: Replace sled with redb/fjall (HybridStore), key layout redesign. ✅ COMPLETE
* [x] **5B**: WAL hardening — CRC32C, crash recovery, group commit, log rotation. ✅ COMPLETE
* [x] **5C**: Index persistence — vector hot/cold, visual checkpoint. ✅ COMPLETE
* [x] **5D**: Concept hierarchy — ConceptPath, AliasStore, scheme-based inference. ✅ COMPLETE
### Phase 6 Progress
* [x] **6A**: CRDT Foundation — G-Set/G-Counter stores, HLC timestamps, Merkle tree. ✅ COMPLETE
* [x] **6B**: Two-Node Replication (PoC) — RPC layer, gossip, anti-entropy. ✅ COMPLETE
* [x] **6C**: Multi-Node Cluster — SWIM membership, range sharding, gateway. ✅ COMPLETE
### Phase 7 Progress
* [x] **7A**: Admission Control — TrustTier, PowProof, AdmissionLayer, /v1/admission/status. ✅ COMPLETE
* [x] **7B**: EigenTrust — TrustGraphStore, DomainTrustStore, EigenTrustAuthorityLens. ✅ COMPLETE
* [x] **7C**: Content Defense — SimilarityIndex, ContentQualityScorer, QuarantineStore, Admin API. ✅ COMPLETE
* [x] **7D**: Circuit Breakers — CircuitBreakerStore, CircuitBreakerLayer middleware, Admin API. ✅ COMPLETE
### Next Up
* **Phase 8** (The Swarm): Observability (8B) and production hardening, including geo-distribution (8C). Chaos testing (8A) is complete.
### App Layer (External)
* **Browser Extension Phase 1** (Read-Only Overlay) -> All DB dependencies complete. Extension is app layer.
* **Browser Extension Phase 2** (Active Layer / Vote-to-See) -> ✅ All blockers resolved. 7A PoW + 7B EigenTrust complete.
* **The Simulator** (Training Data Pipeline) -> App layer, consumes Episteme API.
* **The Super Curator** (Reviewer Agent swarm) -> App layer.
* **Background Gardener** (Cluster detection, signal processing) -> App layer.
* **Agent Wallet** (Key management sidecar) -> App layer.
### Recently Completed
* [x] **Phase 7D Circuit Breakers** (The Shield): Per-agent misbehavior isolation.
* State machine: Closed → Open (5 failures) → HalfOpen (30 sec timeout) → Closed (1 success).
* `CircuitBreakerStore` trait with `GenericCircuitBreakerStore<S: KVStore>` implementation.
* Failure types: InvalidSignature, InputValidation, PowError, QuotaExceeded, ApplicationError.
* `CircuitBreakerLayer` Tower middleware (outermost layer, runs first).
* Admin API: `GET /v1/admin/circuit-breaker/{agent_id}`, `POST .../reset`, `GET .../tripped`.
* 503 Service Unavailable with `X-Circuit-Breaker-State`, `Retry-After` headers.
* 25 unit tests covering full state machine lifecycle and edge cases.
* [x] **Phase 7C Content Defense** (The Shield): Spam and duplicate detection with quarantine workflow.
* `SimilarityIndex` trait with MinHash (k=128) + LSH (16 bands × 8 rows) for near-duplicate detection.
* Bloom filter pre-check for O(1) "definitely not duplicate" fast path.
* `ContentQualityScorer` with Shannon entropy, length checks, structured data detection.
* `QuarantineStore` with time-ordered keys + O(1) hash index for admin lookups.
* `ContentDefenseLayer` in `stemedb-ingest` orchestrating all checks.
* Admin API: `GET /v1/admin/quarantine`, `POST .../approve`, `POST .../reject`.
* Triggers: quality < 0.4, 0.9+ Jaccard similarity, untrusted + confidence > 0.8.
* [x] **Phase 6C Multi-Node Cluster** (The Mesh): Distributed cluster infrastructure.
* `SwimMembership` with SWIM gossip protocol for node discovery and failure detection.
* `RangeRouter` with BLAKE3 + jump hash for subject-prefix range sharding.
* `Gateway` HTTP service with routing, health checks, and read-your-writes.
* 82 integration tests covering membership, sharding, availability, partition tolerance.
* [x] **Phase 7B EigenTrust** (The Shield): Sybil-resistant global trust propagation.
* `TrustGraphStore` trait with edge CRUD, seed trust management, EigenTrust computation.
* Power iteration: `T = (1-α)C^T*T + αP` with dangling node handling.
* `DomainTrustStore` for per-domain expertise tracking (medicine, finance, etc.).
* `EigenTrustAuthorityLens`: `weight = confidence × eigentrust × domain_factor`.
* Sybil resistance: isolated rings get near-zero trust (not connected to seeds).
* [x] **Phase 7A Admission Control** (The Shield): PoW-based spam protection for new agents.
* `TrustTier` enum with 5 tiers, quota multipliers, PoW requirements.
* `PowProof` struct with BLAKE3 verification, graduated difficulty (16→1→0 bits).
* `AdmissionStore` trait + `AdmissionLayer` middleware + `/v1/admission/status` endpoint.
* Fail-open design for availability, milestone tracking for client UX.
* [x] **Phase 5D Concept Hierarchy**: Hierarchical subjects with cross-scheme alias resolution.
* `ConceptPath` struct with scheme://segments/leaf format, backward-compatible parsing.
* `SourceScheme` enum mapping schemes to source tiers (rfc→Regulatory, code→Expert, etc.).
* `AliasStore` trait with transitive resolution and cycle detection.
* API: `POST/DELETE /v1/concepts/alias`, `GET /v1/concepts/resolve|aliases|suggest|parse`.
* Battery 8 (7 tests) + Battery 9 (8 tests).
* [x] **Phase 5C Index Persistence**: Vector hot/cold tiering, visual checkpoint.
* [x] **Phase 5B WAL Hardening**: CRC32C checksums, crash recovery, group commit, log rotation.
* [x] **Gold Standard Verification** (4.7): Sybil defense via proof of knowledge.
* `GoldStandard` struct with rkyv serialization, `GoldStandardStore` trait + implementation.
* `TrustAdjustment` enum: Rewarded(+0.05), Penalized(-0.1), AlreadyVerified.
* Anti-gaming: agents can only verify each gold standard once (dedup markers).
* API: `POST/GET/DELETE /v1/admin/gold-standards`, `POST /v1/admin/verify-agent`.
* 21 trust rank tests including 4 new verification tests.
* [x] **Escalation Triggers** (4.6): Active safety system for high-conflict assertions.
* `EscalationPolicy` with configurable conflict_score thresholds and predicate matching.
* `EscalationEvent` (content-addressed BLAKE3 ID) written by Materializer when policies fire.
* `EscalationStore` at `ESC:{timestamp_nanos}:{id_hex}` with resolve workflow.
* `GET /v1/admin/escalations`, `POST /v1/admin/escalations/{id}/resolve`.
* [x] **Conflict Score Filtering** (4.5): Query-time filtering by conflict score.
* `min_conflict_score` and `max_conflict_score` on Query + QueryParams.
* Fast-path MV filtering with 0.0-1.0 validation.
* 13 engine tests covering thresholds, boundaries, ranges, combinations.
* [x] **Vote Provenance Witness** (4.4): Transform votes into cryptographic witnesses.
* `source_url: Option<String>` and `observed_context: Option<Vec<u8>>` on Vote.
* Input validation: URL max 2048 chars, context max 64KB.
* Backward compatible with existing votes.
* [x] **"Since" Parameter** (4.1): Change tracking for returning consumers.
* `since: Option<u64>` on Query for incremental sync.
* MV Changelog at `MVC:{subject}:{predicate}:{timestamp_nanos}` with nanosecond precision.
* `ChangeEntry` struct with `previous_winner_hash`, `new_winner_hash`, `lens_name`.
* `fetch_changes_since()` in QueryEngine with timestamp filtering.
* `changes_since: Option<Vec<ChangeEntryDto>>` on QueryResponse.
* 6 tests covering changelog creation, fetching, and fast path bypass.
* [x] **Batch TrustRank Decay API** (4.3): Admin endpoint for scheduled trust decay.
* `POST /v1/admin/decay-trust-ranks` with optional `now` and `half_life_seconds`.
* Returns `decayed_count`, `timestamp_used`, `half_life_used`, `status`.
* 3 integration tests.
* [x] **Source Metadata Indexing** (4.2): Index key source metadata fields for filtered queries.
* New module: `crates/stemedb-storage/src/source_metadata_index.rs`.
* Indexed fields: `journal`, `doi`, `platform`, `study_design`.
* Key pattern: `SMV:{field}:{value}` -> `Vec<Hash>` (case-insensitive).
* Query filters: `?source_journal=NEJM&source_platform=PubMed`.
* AND semantics for multiple metadata filters.
* 18 unit tests covering indexing and query filtering.
* [x] **Source Document Storage** (3F.1): Provenance lookup for 100% citation recall.
* `POST /v1/source` stores source documents by BLAKE3 content hash.
* `GET /v1/provenance/{hash}` retrieves source documents.
* Content-addressed storage at `SRC:{hash}` keys.
* Base64 encoding, 10MB limit, idempotent uploads.
* 5 unit tests covering happy path and error cases.
* [x] **Epoch Cascade Logic** (3D.1): O(1) supersession lookup via pre-computed markers.
* `write_supersession_cascade()` writes `SUPERSEDED:` markers for full transitive closure at ingest time.
* `is_epoch_superseded()` uses O(1) marker lookup instead of chain walking.
* Cycle detection and max depth guard (100 levels).
* 5 tests covering markers, transitive closure, cycles, and marker-based filtering.
* [x] **Semantic Decay** (3B.2): Confidence half-life at query time.
* `decay_halflife: Option<u64>` and `source_class_decay: bool` on Query.
* New `decay.rs` module with `apply_decay()` and `apply_source_class_decay()`.
* Formula: `effective_confidence = confidence * 2^(-(age / halflife))`.
* Tier-specific decay: Regulatory=none, Clinical=2yr, Anecdotal=30d.
* 11 unit tests + 1 E2E integration test.
* [x] **Layered Consensus Lens** (3C.2): Per-source-class consensus with tier-by-tier visibility.
* `LayeredConsensusLens` with `LayeredLens` trait.
* `TierResolution` and `LayeredResolution` types.
* `GET /v1/layered?subject=X&predicate=Y` endpoint.
* Cross-tier conflict score using Shannon entropy.
* 10 comprehensive tests.
* [x] **Time-Travel Engine** (3B.1): `as_of` parameter for historical queries.
* `as_of: Option<u64>` field on `Query` for querying historical state.
* Bypasses fast path (MVs reflect current state).
* `Query::matches()` filters by `assertion.timestamp <= as_of`.
* 6 tests covering edge cases.
* [x] **Rich Source Metadata** (3A.3): Structured provenance beyond `source_hash`.
* `source_metadata: Option<Vec<u8>>` field on `Assertion`.
* `Vec<u8>` for rkyv zero-copy compatibility, callers handle JSON encoding.
* Builder methods: `.source_metadata_json()` and `.source_metadata()`.
* API exposes as `Option<String>` with defensive UTF-8 handling.
* 2 serialization tests.
* [x] **Conflict Score on Resolution** (3A.2): Numeric disagreement metric across all lenses.
* `conflict_score: f32` field on `Resolution` (0.0 = unanimous, 1.0 = max conflict).
* `compute_conflict_score()` utility using normalized variance.
* Updated all 5 lens implementations to compute and propagate conflict score.
* `MaterializedView` now stores conflict score.
* API `QueryResponse` exposes `conflict_score` and `resolution_confidence` when lens is applied.
* 7 unit tests including NaN handling.
* [x] **SkepticLens + SkepticView** (3C.1): "Trust but Verify" conflict analysis that surfaces all claims with conflict scores.
* `AnalysisLens` trait for lenses that map conflict instead of resolving it.
* `SkepticLens` using normalized Shannon entropy for conflict scoring.
* `SkepticResolver` + `SkepticView` in stemedb-query.
* `GET /v1/skeptic?subject=X&predicate=Y` API endpoint.
* Types: `ResolutionStatus`, `ConflictAnalysis`, `ClaimSummary`, `SourceSummary`, `AgentSummary`.
* [x] **Source-Class Field** (3A.1): 6-tier `SourceClass` enum with authority weighting and decay rates.
* `SourceClass` enum: Regulatory, Clinical, Observational, Expert, Community, Anecdotal.
* `tier()`, `default_decay_days()`, `authority_weight()` methods.
* Field on Assertion struct with full serialization support.
* [x] **The Meter**: Token bucket quota system with MeterLayer middleware (10K tokens/agent/hour).
* [x] **Query Audit Trail**: Every query logged with provenance at `AUD:{query_id}`. `X-Agent-Id` header for attribution.
* [x] **Event-Driven Materialization**: `run_notified()` + IngestWorker Notify integration.
* [x] **Fast-Path MV Lookup**: `QueryEngine::try_fast_path()` for O(1) reads.
* [x] **Materializer**: Background worker for O(1) MV reads via `AsyncLens`.
* [x] **VoteAwareConsensusLens**: Real vote-based consensus resolution.
* [x] **Compound SP Index**: O(1) subject+predicate lookups.
* [x] **TrustRank System**: Agent reputation with decay and learning loop.
* [x] **API Surface**: axum HTTP server with 7 endpoints + OpenAPI docs.
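
The semantic decay formula from 3B.2 (`effective_confidence = confidence * 2^(-(age / halflife))`) is easy to check numerically. This is a sketch under stated assumptions: `apply_decay` here is a standalone function, not the `decay.rs` signature; ages are in seconds; `None` means no decay (as for Regulatory); a zero half-life is treated as no decay rather than dividing by zero; and the "2yr" Clinical half-life is approximated as 730 days.

```rust
pub const DAY_SECS: u64 = 86_400;

/// Half-lives from the tier list (2yr approximated as 730 days).
pub const REGULATORY_HALFLIFE: Option<u64> = None;
pub const CLINICAL_HALFLIFE: Option<u64> = Some(730 * DAY_SECS);
pub const ANECDOTAL_HALFLIFE: Option<u64> = Some(30 * DAY_SECS);

/// effective_confidence = confidence * 2^(-(age / halflife))
pub fn apply_decay(confidence: f64, age_secs: u64, halflife_secs: Option<u64>) -> f64 {
    match halflife_secs {
        // No half-life (or a degenerate zero) means the claim does not decay.
        None | Some(0) => confidence,
        Some(h) => confidence * (-(age_secs as f64) / h as f64).exp2(),
    }
}
```

After one half-life, an anecdotal claim at 0.8 drops to 0.4, and after two half-lives it drops to 0.2; a Regulatory claim keeps its confidence indefinitely.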
### Research Documents
* [docs/research/wal-crash-recovery-research.md](docs/research/wal-crash-recovery-research.md) — WAL patterns from CockroachDB, TiKV, FoundationDB, SQLite.
* [docs/research/distributed-write-path.md](docs/research/distributed-write-path.md) — Spanner/CockroachDB-style distributed writes adapted for append-only model.
### Key Architectural Decisions
* **sled → redb/fjall**: sled is abandoned. HybridStore routes by key prefix: redb for reads, fjall for writes. ✅ COMPLETE
* **Raft log = WAL**: TiKV eliminated duplicate WAL in v5.4. We should too.
* **CRDT for data, Raft for coordination**: Assertions are a G-Set CRDT (merge = set union). Only cluster metadata needs Raft.
* **Subject-prefix ranges**: Co-locate all data for a subject on one shard. Split hot subjects via range split.
* **HLC over TrueTime**: Hybrid Logical Clocks work on commodity hardware. No GPS/atomic clocks needed.
* **AP model**: Writes never blocked during partitions. Eventual consistency via CRDT convergence.
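
The HLC decision pairs a physical timestamp with a logical counter, so causally related events stay ordered even when node clocks disagree. Below is a minimal sketch of the published HLC send/receive rules (Kulkarni et al.), not the crate's implementation: physical time is passed in explicitly for testability, `Timestamp` and `Hlc` are hypothetical names, and counter overflow is not handled.

```rust
/// Derived Ord compares wall time first, then the logical counter.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp {
    pub wall_ms: u64,
    pub logical: u32,
}

#[derive(Debug, Default)]
pub struct Hlc {
    last: Timestamp,
}

impl Hlc {
    /// Local event or send; `pt` is this node's physical clock in ms.
    pub fn now(&mut self, pt: u64) -> Timestamp {
        let l = self.last.wall_ms.max(pt);
        let c = if l == self.last.wall_ms { self.last.logical + 1 } else { 0 };
        self.last = Timestamp { wall_ms: l, logical: c };
        self.last
    }

    /// Receive: produce a timestamp greater than both the local and remote clocks.
    pub fn update(&mut self, remote: Timestamp, pt: u64) -> Timestamp {
        let old = self.last;
        let l = old.wall_ms.max(remote.wall_ms).max(pt);
        let c = if l == old.wall_ms && l == remote.wall_ms {
            old.logical.max(remote.logical) + 1
        } else if l == old.wall_ms {
            old.logical + 1
        } else if l == remote.wall_ms {
            remote.logical + 1
        } else {
            0 // physical time moved past both; reset the counter
        };
        self.last = Timestamp { wall_ms: l, logical: c };
        self.last
    }
}
```

With a sender 5 s ahead (the skew injected in 8A.2), the receiver adopts the sender's wall time and advances only the logical counter, so its next local event still orders after the received message.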
### Blockers
* **Phase 5**: ✅ COMPLETE — All foundation hardening done.
* **Phase 6**: ✅ COMPLETE — CRDT foundation, two-node replication, multi-node cluster.
* **Phase 7**: ✅ COMPLETE — The Shield (Admission control, EigenTrust, Content Defense, Circuit Breakers).
* **Phase 8**: Unblocked. Chaos testing (8A) is complete; observability (8B) and production hardening (8C) can proceed.
* **Phase 9**: Partially blocked. 9A-9B need Phase 8 (can't back up what doesn't exist). 9C-9F can start earlier (compliance planning, security design).
---
## Dependency Graph
```
Phase 2.5 (Hardening) Phase 3 (The Pilot) Phase 4 (The Hive)
======================== ======================== ==================
[2.1 MV Staleness] ---------> [3B.1 Time-Travel] ✅ --+
| |
[2.2 Confidence Rename] -----> (API clarity for all) +----------> [4.1 "Since" Param] ✅
|
[2.3 EpochAwareLens] --------> [3D.1 Epoch Cascade] ✅ |----------> Invalidation Cascades pillar
|
[2.4 Visual Hash Query] -----> [3E.2 Visual Hash Index] ✅
|
[2.6 E2E Integration] -------> (pipeline confidence) |
|
[3A.1 Source-Class] ✅ --+----------> [3C.2 Layered Consensus] ✅
| [3B.2 Semantic Decay] ✅
+-----------------------------[4.2 Metadata Indexing] ✅
|
[3A.2 Conflict Score] ✅ --> (enhance Resolution)
|
[3A.3 Source Metadata] ✅ --> [4.2 Metadata Indexing] ✅
|
[3C.1 Skeptic Lens] ✅ (standalone, COMPLETE)
[3C.3 Constraints Lens] ✅ (standalone, COMPLETE)
[3E.1 Vector Search] ✅ (standalone, COMPLETE)
[3E.2 Visual Hash Index] ✅ (standalone, COMPLETE)
[3F.1 Provenance] ✅ (standalone, COMPLETE)
```
### Critical Path for Consumer Health Demo
```
[3A.1 Source-Class] ✅ --> [3A.2 Conflict Score] ✅ --> [3C.2 Layered Consensus] ✅
|
+----> CONSUMER HEALTH MVP ✅
|
[3B.1 Time-Travel] ✅ ---------------------------------------+
|
[3A.3 Source Metadata] ✅ -----------------------------------+
|
[3C.1 Skeptic Lens] ✅ --------------------------------------+
```
### Critical Path for Financial DD Demo
```
[3A.2 Conflict Score] ✅ --> [3C.1 Skeptic Lens] ✅ ---+
|
[3B.1 Time-Travel] ✅ ----------------------------------+----> FINANCIAL DD MVP
|
[2.3 EpochAwareLens] --> [3D.1 Epoch Cascade] ✅ -------+
|
[3B.2 Semantic Decay] ✅ -------------------------------+
```
### Critical Path for Agile Agent Team Demo
```
[3C.3 Constraints Lens] (standalone) ------+
|
[3B.1 Time-Travel] ✅ --------------------+----> AGENT TEAM MVP
|
[2.3 EpochAwareLens] ✅ ------------------+
|
[Query Audit (Phase 2)] ✅ ----------------+
```
### Critical Path for Browser Extension
```
Phase 3 (Data Foundation) Phase 4 (Extension Primitives) Extension Product
========================= ============================== =================
[3A.1 Source-Class] ✅ ──────────> [4.5 Conflict Filtering] ✅ ──┐
|
[3A.2 Conflict Score] ✅ ─────────────────────────────────────────┤
|
[3C.1 Skeptic Lens] ✅ ──────────────────────────────────────────-┤
├──> PHASE 1: Read-Only Overlay
[3B.2 Semantic Decay] ✅ ────────────────────────────────────────-┤ (Benign Layer)
|
[3C.2 Layered Consensus] ✅ ─────────────────────────────────────-┘
[Phase 2 Vote System] ✅ ────────> [4.4 Vote Provenance] ✅ ─────┐
|
[TrustRank Engine] ✅ ───────────> [4.7 Gold Standards] ✅ ───────┤
├──> PHASE 2: Active Layer
[3A.2 Conflict Score] ✅ ────────> [4.6 Escalation Triggers] ✅ ──┤ (Vote to See)
|
[7A PoW Admission] ✅ ───────────┤
[7B EigenTrust] ✅ ──────────────┘
```
**Phase 1 (Read-Only)** requires: Source tiers, conflict scores, conflict filtering, skeptic lens, decay, layered consensus. **All complete.**
**Phase 2 (Active)** requires: Vote provenance, gold standards, escalation triggers, PLUS Phase 7 Sybil defense. **✅ All complete. Ready to build.**
### Critical Path to Distributed Cluster
```
Phase 5 (The Forge) ✅ Phase 6 (The Mesh) ✅ Phase 7+8
======================= ======================= ==================
[5A.1 Replace sled ✅] ───────────> [6A.1 CRDT Foundation ✅] ──┐
| |
[5A.2 Key Layout ✅] ────────────> [6C.2 Range Sharding ✅] ───> |
|
[5B.1 CRC32C Checksums ✅] ──┐ |
[5B.2 Crash Recovery ✅] ────┼───> [6B.1 RPC Layer ✅] ─────────┤
[5B.3 Group Commit ✅] ──────┘ | |
v |
[5C.1 Persistent Vector ✅] ─── (independent, no blocker) |
[5C.2 Persistent Visual ✅] ─── (independent, no blocker) |
|
[6A.2 HLC Timestamps ✅] ────┤
[6A.3 Merkle Tree ✅] ───────┤
| |
v v
[6B.2 Gossip ✅] ──> [6B.3 Anti-Entropy ✅] ──> [6B.4 PoC Tests ✅]
|
v
[6C.1 SWIM Membership ✅] ───> [6C.3 Raft MV Coord] (DEFERRED)
[6C.4 Gateway ✅] ───────────> │
v
DISTRIBUTED CLUSTER ✅
|
[7A PoW Admission ✅] ┐
[7B EigenTrust ✅] ──┤──> THE SHIELD ✅
[7C Content Defense ✅]┤
[7D Circuit Breakers ✅]┘
|
[8A Chaos Testing] ──┐
[8B Observability] ──┤──> THE SWARM
[8C Geo-Distribution]┘
|
[9A Backup/PITR] ─────┐
[9B Corruption/Rollback]┤
[9C GDPR/Retention] ──┤──> THE BUNKER
[9D Storage Mgmt] ────┤
[9E Incident Response]┤
[9F Security Hardening]┘
```
### New Crates (Phases 5-9)
```
stemedb-merkle (Phase 6A) ── BLAKE3 Merkle tree for diff detection ✅ IMPLEMENTED
stemedb-rpc (Phase 6B) ── gRPC services for node-to-node communication ✅ IMPLEMENTED
stemedb-sync (Phase 6B) ── Merkle sync, gossip broadcast, anti-entropy ✅ IMPLEMENTED
stemedb-cluster (Phase 6C) ── Cluster membership, range routing, gateway ✅ IMPLEMENTED
stemedb-backup (Phase 9A) ── Backup coordination, PITR, verification (PLANNED)
stemedb-admin (Phase 9B) ── Tombstones, rollback, fork recovery, compliance (PLANNED)
```