Episteme (StemeDB)
A probabilistic knowledge graph database that stores Claims, not Facts. Append-only Merkle DAG with read-time resolution via Lenses.
Core Concept: "Git for Truth" - conflicting assertions coexist, resolved at query time through Consensus, Recency, Authority, or custom Lenses.
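To make read-time resolution concrete, here is a minimal sketch of two lenses over conflicting claims. The `Claim` struct, its fields, and both function names are hypothetical illustrations, not the `stemedb-lens` API, and the tie-breaking rule is an assumption made only to keep the sketch deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified claim. The real Assertion type lives in stemedb-core.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub subject: String,
    pub value: String,
    pub timestamp: u64, // assumed logical time of the assertion
    pub author: String,
}

/// Recency lens (sketch): the claim with the highest timestamp wins.
pub fn recency_lens(claims: &[Claim]) -> Option<&Claim> {
    claims.iter().max_by_key(|c| c.timestamp)
}

/// Consensus lens (sketch): the value asserted by the most claims wins.
/// Ties go to the lexicographically smallest value (an assumption).
pub fn consensus_lens(claims: &[Claim]) -> Option<&str> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for c in claims {
        *counts.entry(c.value.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(value, _)| value)
}
```

Both lenses read the same stored claims and can return different answers. Nothing is overwritten at write time; the choice of lens decides the result at query time.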
Find Your Guide
| If you need to... | Read this |
|---|---|
| Get started fast | quickstart.md |
| Understand what Episteme is | what-is-episteme.md |
| Understand the technical vision | vision.md |
| See use cases | use-cases/README.md |
| Understand architecture | architecture.md |
| Learn data structures | docs/data-structures.md |
| See the roadmap | roadmap.md |
| Build apps on Episteme | docs/app-concepts/index.md |
| Consumer Health vertical | docs/app-concepts/consumer-health.md |
| Use Go SDK | ai-lookup/services/sdk.md |
| Write Rust code | .claude/guides/backend/rust-guidelines.md |
| Set up local dev | .claude/guides/local/setup.md |
| Run tests | .claude/guides/local/testing.md |
| Understand quality checks | .claude/guides/local/quality-checks.md |
| Learn about simulation | ai-lookup/features/simulation.md |
| Advance the simulator | arena-roadmap.md |
| Work on storage/DAG | Load skill: stemedb-core |
| Implement a Lens | Load skill: stemedb-lens |
| Plan a milestone | /plan-milestone command |
| Analyze use case gaps | /analyze-gaps command |
| Add an API endpoint | .claude/guides/backend/api-endpoints.md |
| Integrate with AI tools | .claude/guides/integrations/ai-coding-assistant-integration.md |
| ADK-Go + Episteme | .claude/guides/integrations/adk-go-episteme.md |
| Distributed architecture | docs/research/distributed-write-path.md |
| Write UAT reports | .claude/guides/local/uat-reports.md |
| Phase 6 UAT results | ai-lookup/features/phase6-uat.md |
Critical Rules
- Append-Only: NEVER mutate existing Assertions. Create new ones.
- Content-Addressed: Assertion ID = BLAKE3 hash of content.
- No Unwrap: NEVER use `unwrap()` or `expect()` in production code. CI enforces via `clippy::unwrap_used` and `clippy::expect_used` at deny level.
- Defensive Writes: All writes go through WAL with fsync.
- Zero-Copy: Use `rkyv` for serialization. ALWAYS use `stemedb_core::serde::{serialize, deserialize}`; NEVER use raw `AllocSerializer` in production code.
- Instrument Critical Paths: Use `#[instrument]` on public methods in WAL, storage, ingestion, and lens code. Include meaningful fields (key_len, payload_len, offset, candidates_count, lens).
- Structured Logging: Use `tracing` (info!, warn!, error!) instead of `println!`/`eprintln!`. Clippy enforces via `print_stdout`/`print_stderr` at warn level. CLI binaries (e.g., `stemedb-sim`) may use `#![allow()]` for user-facing output.
- Document Changes: Update `ai-lookup/` when adding new types/concepts. Keep skills in sync with code.
- No Git Operations: NEVER use git stash, git branch, git checkout, or any git operations unless the user explicitly tells you to.
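The first three rules can be illustrated together. The following is a minimal sketch that uses only the standard library, with some loud assumptions: `AppendOnlyStore`, `StoreError`, and `content_id` are hypothetical names, and `DefaultHasher` is a stand-in for BLAKE3 so that the example compiles without extra crates. It is not the `stemedb-storage` implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fmt;
use std::hash::{Hash, Hasher};

/// Stand-in content ID. The real ID is a BLAKE3 hash; a 64-bit DefaultHasher
/// digest is used here only to keep the sketch dependency-free.
pub type ContentId = u64;

#[derive(Debug, PartialEq)]
pub enum StoreError {
    EmptyPayload,
    NotFound(ContentId),
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StoreError::EmptyPayload => write!(f, "payload is empty"),
            StoreError::NotFound(id) => write!(f, "no entry with id {id}"),
        }
    }
}

impl std::error::Error for StoreError {}

/// Content-Addressed: the ID is derived from the bytes alone.
pub fn content_id(payload: &[u8]) -> ContentId {
    let mut hasher = DefaultHasher::new();
    payload.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
pub struct AppendOnlyStore {
    entries: HashMap<ContentId, Vec<u8>>,
}

impl AppendOnlyStore {
    /// Append-Only: identical content maps to the same ID and is stored once;
    /// existing entries are never overwritten.
    pub fn append(&mut self, payload: &[u8]) -> Result<ContentId, StoreError> {
        if payload.is_empty() {
            return Err(StoreError::EmptyPayload);
        }
        let id = content_id(payload);
        self.entries.entry(id).or_insert_with(|| payload.to_vec());
        Ok(id)
    }

    /// No Unwrap: a missing entry is reported as an error, not a panic.
    pub fn get(&self, id: ContentId) -> Result<&[u8], StoreError> {
        self.entries
            .get(&id)
            .map(Vec::as_slice)
            .ok_or(StoreError::NotFound(id))
    }

    /// A correction is a new entry; the superseded one stays readable.
    pub fn supersede(&mut self, old: ContentId, payload: &[u8]) -> Result<ContentId, StoreError> {
        self.get(old)?; // propagate NotFound with `?` instead of unwrapping
        self.append(payload)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The store deliberately has no update or delete method, so the only way to change what a reader sees is to append a new entry and let a lens choose between the two.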
Quick Reference
# Build
cargo build --workspace
# Test
cargo test --workspace
# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check
Specialized Agents
| Domain | Agent | When to use |
|---|---|---|
| Product Vision | `episteme-product-visionary` | Use cases, "why not Postgres?", product-market fit |
| General Rust | `primary-developer` | Feature implementation, refactoring |
| Code Quality | `rust-quality-engineer` | Reviews, test coverage, clippy |
| Storage | `storage-engine-architect` | WAL, LSM, crash recovery |
| Graph Engine | `rust-graph-engine-architect` | Lock-free structures, cache optimization |
| Defensive | `defensive-systems-architect` | Rate limiting, circuit breakers, hostile input |
| Distributed | `distributed-systems-engineer` | CRDT replication, Raft coordination, Merkle sync, clustering |
| Lenses | `stemedb-lens-architect` | Query resolution, ranking algorithms |
| Planning | `stemedb-planner` | Milestone planning, roadmap |
Architecture Overview
    Write Path (Spine):            Read Path (Cortex):
    [Agent] -> [Ingestion]         [Agent] <- [Lens Engine]
                    |                               |
                    v                               |
               [WAL/Fsync]                   [Index Lookup]
                    |                               |
                    v                               |
               [KV Store] <-------------------------+
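The two paths in the diagram can be sketched with the standard library alone. This is an illustration under stated assumptions, not the `stemedb-wal` or `stemedb-query` code: `Spine`, `replay`, and `recency` are hypothetical names, the WAL is a tab-separated text file (so keys and values must not contain tabs or newlines), and the "lens" is a plain function pointer.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical node: a WAL file plus an in-memory KV map rebuilt from it.
pub struct Spine {
    wal: File,
    kv: HashMap<String, Vec<String>>, // key -> every value ever asserted
}

impl Spine {
    /// Crash recovery: replay the WAL (if any) before accepting writes.
    pub fn open(path: &Path) -> io::Result<Self> {
        let kv = replay(path)?;
        let wal = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { wal, kv })
    }

    /// Write path: log, fsync, and only then make the value visible.
    pub fn ingest(&mut self, key: &str, value: &str) -> io::Result<()> {
        let line = format!("{key}\t{value}\n");
        self.wal.write_all(line.as_bytes())?;
        self.wal.sync_all()?; // durable on disk before the write is acknowledged
        self.kv.entry(key.to_owned()).or_default().push(value.to_owned());
        Ok(())
    }

    /// Read path: index lookup first, then the lens picks among candidates.
    pub fn query(&self, key: &str, lens: fn(&[String]) -> Option<&String>) -> Option<&String> {
        self.kv.get(key).and_then(|candidates| lens(candidates))
    }
}

/// Example lens for this sketch: the most recently appended value wins.
pub fn recency(candidates: &[String]) -> Option<&String> {
    candidates.last()
}

/// Rebuilds the KV map from the WAL; a missing file means an empty store.
fn replay(path: &Path) -> io::Result<HashMap<String, Vec<String>>> {
    let mut kv: HashMap<String, Vec<String>> = HashMap::new();
    let mut text = String::new();
    match File::open(path) {
        Ok(mut file) => {
            file.read_to_string(&mut text)?;
        }
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(kv),
        Err(e) => return Err(e),
    }
    for line in text.lines() {
        if let Some((key, value)) = line.split_once('\t') {
            kv.entry(key.to_owned()).or_default().push(value.to_owned());
        }
    }
    Ok(kv)
}
```

Because the KV map is updated only after `sync_all` returns, anything a reader can see is already in the log, and reopening the same path reproduces the same state.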
Crates
| Crate | Purpose | Status |
|---|---|---|
| `stemedb-core` | Assertion, LifecycleStage, MaterializedView, types | ✅ Implemented |
| `stemedb-wal` | Write-ahead log with crash recovery | ✅ Implemented |
| `stemedb-storage` | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore, SimilarityIndex | ✅ Implemented |
| `stemedb-ingest` | Ingestion pipeline, signature verification, ContentDefenseLayer | ✅ Implemented |
| `stemedb-query` | Query engine, Materializer for O(1) MV: reads | ✅ Implemented |
| `stemedb-lens` | Lenses (Recency, Consensus, Authority, Vote/Trust-aware) | ✅ Implemented |
| `stemedb-api` | HTTP API with axum + utoipa OpenAPI docs | ✅ Implemented |
| `stemedb-sim` | Simulation for testing the pipeline | ✅ Implemented |
| `stemedb-merkle` | BLAKE3 Merkle tree for diff detection | ✅ Implemented |
| `stemedb-rpc` | gRPC services for node-to-node communication | ✅ Implemented |
| `stemedb-sync` | Merkle sync, gossip broadcast, anti-entropy | ✅ Implemented |
| `stemedb-cluster` | Cluster membership (SWIM), sharding, gateway | ✅ Implemented |
SDKs
| SDK | Purpose | Status |
|---|---|---|
| `sdk/go/steme` | Go HTTP client with Ed25519 signing and fluent builders | ✅ Implemented |
| `sdk/go/adk` | ADK-Go tools and callbacks for AI agents | ✅ Implemented |