stemedb/CLAUDE.md
Episteme (StemeDB)

A probabilistic knowledge graph database that stores Claims, not Facts. Append-only Merkle DAG with read-time resolution via Lenses.

Core Concept: "Git for Truth" - conflicting assertions coexist, resolved at query time through Consensus, Recency, Authority, or custom Lenses.
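
To make read-time resolution concrete, here is a minimal sketch of two lenses applied to the same set of conflicting claims. The `Claim` struct, the `Lens` enum, and `resolve` are hypothetical names invented for this illustration; they are not the actual stemedb-lens API, and a real Consensus lens likely weighs more than a raw count.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified claim shape used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub subject: String,
    pub value: String,
    pub timestamp: u64, // logical time at which the claim was asserted
}

/// Hypothetical read-time strategies, named after two of the lenses above.
pub enum Lens {
    Recency,
    Consensus,
}

/// Pick one claim for `subject` without modifying or deleting any stored claim.
pub fn resolve<'a>(claims: &'a [Claim], subject: &str, lens: &Lens) -> Option<&'a Claim> {
    let candidates: Vec<&'a Claim> = claims.iter().filter(|c| c.subject == subject).collect();
    match lens {
        // Recency: the newest assertion wins.
        Lens::Recency => candidates.into_iter().max_by_key(|c| c.timestamp),
        // Consensus: the most frequently asserted value wins; ties go to the newer claim.
        Lens::Consensus => {
            let mut counts: HashMap<&str, usize> = HashMap::new();
            for c in candidates.iter().copied() {
                *counts.entry(c.value.as_str()).or_insert(0) += 1;
            }
            candidates
                .into_iter()
                .max_by_key(|c| (counts[c.value.as_str()], c.timestamp))
        }
    }
}
```

The same stored claims can yield different answers depending on the lens, which is the point of keeping conflicting assertions side by side instead of overwriting them.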

Find Your Guide

| If you need to... | Read this |
|---|---|
| Get started fast | quickstart.md |
| Understand what Episteme is | what-is-episteme.md |
| Understand the technical vision | vision.md |
| See use cases | use-cases/README.md |
| Understand architecture | architecture.md |
| Learn data structures | docs/data-structures.md |
| Understand governance models | docs/specs/governance-models.md |
| See the roadmap | roadmap.md |
| See completed phases | roadmap-archive.md |
| Build apps on Episteme | docs/app-concepts/index.md |
| Consumer Health vertical | docs/app-concepts/consumer-health.md |
| Use Go SDK | ai-lookup/services/sdk.md |
| Write Rust code | .claude/guides/backend/rust-guidelines.md |
| Set up local dev | .claude/guides/local/setup.md |
| Run tests | .claude/guides/local/testing.md |
| Understand quality checks | .claude/guides/local/quality-checks.md |
| Learn about simulation | ai-lookup/features/simulation.md |
| Advance the simulator | arena-roadmap.md |
| Work on storage/DAG | Load skill: stemedb-core |
| Implement a Lens | Load skill: stemedb-lens |
| Work on domain ontology | crates/stemedb-ontology/ |
| Consumer Health UAT | uat/consumer-health/README.md |
| Verify production readiness | uat/production-readiness/README.md |
| Plan a milestone | /plan-milestone command |
| Analyze use case gaps | /analyze-gaps command |
| Add an API endpoint | .claude/guides/backend/api-endpoints.md |
| Integrate with AI tools | .claude/guides/integrations/ai-coding-assistant-integration.md |
| ADK-Go + Episteme | .claude/guides/integrations/adk-go-episteme.md |
| Distributed architecture | docs/research/distributed-write-path.md |
| Write UAT reports | .claude/guides/local/uat-reports.md |
| Phase 6 UAT results | ai-lookup/features/phase6-uat.md |
| Configure Aphoria hosted mode | .claude/guides/services/aphoria-hosted-mode.md |
| Aphoria config reference | ai-lookup/features/aphoria-config.md |
| Work on Admin Dashboard | applications/stemedb-dashboard/ (Next.js + shadcn/ui) |
| Work on Disputed app | applications/disputed/ |
| Understand repo structure | ai-lookup/repo-structure.md |
| Aphoria LLM eval | Load skill: aphoria-llm-optimization |
| General LLM optimization | Load skill: llm-optimization |
| Install Aphoria | Load skill: aphoria-install |
| Run Aphoria self-review | Load skill: aphoria-self-review |

Roadmap Maintenance

Two files, strict separation:

| File | Contains | When to modify |
|---|---|---|
| roadmap.md | Current + future work only | Add new phases, update task status |
| roadmap-archive.md | Completed phases (1-7, 8A, MVP) | Move items when phase completes |

Rules:

  • When a phase completes: Move entire phase section to archive, update status table in both files
  • When adding tasks: Add to current phase in roadmap.md with - [ ] checkbox format
  • When completing tasks: Change - [ ] to - [x], add brief implementation notes
  • Keep roadmap.md under 500 lines — if it grows, archive more aggressively
  • Current phase always has "🎯" marker in status table

Task format:

- [ ] **P1.2 Feature Name**: Brief description
    - [ ] Subtask one
    - [ ] Subtask two

Phase completion checklist:

  1. All tasks marked [x] in roadmap.md
  2. Cut entire phase section, paste into roadmap-archive.md
  3. Update status tables in both files
  4. Update "Current Focus" in roadmap.md header

Critical Rules

  • Append-Only: NEVER mutate existing Assertions. Create new ones.
  • Content-Addressed: Assertion ID = BLAKE3 hash of content.
  • No Unwrap: NEVER use unwrap() or expect() in production code. CI enforces via clippy::unwrap_used and clippy::expect_used at deny level.
  • Defensive Writes: All writes go through WAL with fsync.
  • Zero-Copy: Use rkyv for serialization. ALWAYS use stemedb_core::serde::{serialize, deserialize} — NEVER use raw AllocSerializer in production code.
  • Instrument Critical Paths: Use #[instrument] on public methods in WAL, storage, ingestion, and lens code. Include meaningful fields (key_len, payload_len, offset, candidates_count, lens).
  • Structured Logging: Use tracing (info!, warn!, error!) instead of println!/eprintln!. Clippy enforces via print_stdout/print_stderr at warn level. CLI binaries (e.g., stemedb-sim) may use #![allow()] for user-facing output.
  • Document Changes: Update ai-lookup/ when adding new types/concepts. Keep skills in sync with code.
  • No Git Operations: NEVER use git stash, git branch, git checkout, or any git operations unless the user explicitly tells you to.
  • No GitHub Workflows: We use pre-commit hooks, not GitHub Actions CI.
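
The first three rules (append-only, content-addressed, no unwrap) can be sketched together as follows. This is an illustration under stated assumptions, not the stemedb-core implementation: `AppendOnlyStore`, `StoreError`, and `content_id` are hypothetical names, and the standard library's `DefaultHasher` stands in for BLAKE3 only so the example has no external dependencies. A 64-bit non-cryptographic hash is not suitable for real content addressing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::fmt;
use std::hash::{Hash, Hasher};

/// Stand-in ID type; the real ID is a BLAKE3 hash of the content.
pub type AssertionId = u64;

/// Hypothetical error type: failures are returned, never unwrapped.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    EmptyContent,
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            StoreError::EmptyContent => write!(f, "assertion content must not be empty"),
        }
    }
}

impl std::error::Error for StoreError {}

/// Derive the ID from the content itself (DefaultHasher is a stand-in for BLAKE3).
fn content_id(content: &[u8]) -> AssertionId {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
pub struct AppendOnlyStore {
    entries: BTreeMap<AssertionId, Vec<u8>>,
}

impl AppendOnlyStore {
    /// Append content. An existing entry is never mutated: identical content
    /// maps to the same ID, and changed content produces a new entry.
    pub fn append(&mut self, content: &[u8]) -> Result<AssertionId, StoreError> {
        if content.is_empty() {
            return Err(StoreError::EmptyContent);
        }
        let id = content_id(content);
        self.entries.entry(id).or_insert_with(|| content.to_vec());
        Ok(id)
    }

    pub fn get(&self, id: AssertionId) -> Option<&[u8]> {
        self.entries.get(&id).map(Vec::as_slice)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }
}
```

Note that there is deliberately no `update` or `delete` method: a correction is expressed by appending a new assertion, and callers handle `Result` and `Option` explicitly instead of calling `unwrap()`.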

Quick Reference

# Build
cargo build --workspace

# Test (choose based on need)
cargo test -p stemedb-core        # Fast: single crate (~30s)
cargo test --workspace --lib      # Medium: all unit tests (~3min)
cargo nextest run                 # Full: parallel runner (~5min)
cargo test --workspace            # Legacy: sequential (~15min)

# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check

Port Scheme (181XX)

| Offset | Service | Default | Env Var |
|---|---|---|---|
| +0 | HTTP API | 18180 | STEMEDB_BIND_ADDR |
| +1 | Cluster Gateway | 18181 | STEMEDB_NODE_API_ADDR |
| +2 | Cluster RPC | 18182 | STEMEDB_NODE_RPC_ADDR |
| +3 | SWIM Gossip | 18183 | via SwimConfig |
| +4 | Metrics | 18184 | (reserved) |
| +5 | Admin | 18185 | (reserved) |
| +6 | Latent Signal | 18186 | |
| +7 | Community App | 18187 | |
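
The scheme is a fixed base port plus a per-service offset. The sketch below shows one way to encode it; the `Service` enum and `bind_addr` helper are hypothetical, and both the `127.0.0.1` fallback host and the `host:port` format of the environment variable value are assumptions, not something the table specifies.

```rust
/// Base of the 181XX port scheme (offset +0 is the HTTP API).
pub const BASE_PORT: u16 = 18180;

/// Hypothetical enum mirroring the rows of the port table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Service {
    HttpApi,
    ClusterGateway,
    ClusterRpc,
    SwimGossip,
    Metrics,
    Admin,
    LatentSignal,
    CommunityApp,
}

impl Service {
    /// Offset from the base port, as listed in the table.
    pub fn offset(self) -> u16 {
        match self {
            Service::HttpApi => 0,
            Service::ClusterGateway => 1,
            Service::ClusterRpc => 2,
            Service::SwimGossip => 3,
            Service::Metrics => 4,
            Service::Admin => 5,
            Service::LatentSignal => 6,
            Service::CommunityApp => 7,
        }
    }

    pub fn default_port(self) -> u16 {
        BASE_PORT + self.offset()
    }
}

/// Use the env-provided address if present and non-empty,
/// otherwise fall back to the default port on localhost (assumed host).
pub fn bind_addr(env_value: Option<&str>, service: Service) -> String {
    match env_value.map(str::trim) {
        Some(v) if !v.is_empty() => v.to_string(),
        _ => format!("127.0.0.1:{}", service.default_port()),
    }
}
```

A caller would typically pass `std::env::var("STEMEDB_BIND_ADDR").ok().as_deref()` as the first argument, so an unset variable falls through to the default.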

Specialized Agents

| Domain | Agent | When to use |
|---|---|---|
| Product Vision | episteme-product-visionary | Use cases, "why not Postgres?", product-market fit |
| Pilot Prep | enterprise-skeptic-buyer | Pressure-test demos, find gaps, prepare for tough questions |
| Aphoria Pitch | aphoria-skeptic-buyer | Pressure-test Aphoria demos, security tool buyer objections |
| Aphoria Phase 7 | declarative-extractor-skeptic | Pressure-test declarative extractors, LLM extraction, pattern learning |
| Aphoria Phase 9 | autonomous-learning-skeptic | Pressure-test autonomous promotion, shadow mode, cross-project learning |
| General Rust | primary-developer | Feature implementation, refactoring |
| Code Quality | rust-quality-engineer | Reviews, test coverage, clippy |
| Storage | storage-engine-architect | WAL, LSM, crash recovery |
| Graph Engine | rust-graph-engine-architect | Lock-free structures, cache optimization |
| Defensive | defensive-systems-architect | Rate limiting, circuit breakers, hostile input |
| Distributed | distributed-systems-engineer | CRDT replication, Raft coordination, Merkle sync, clustering |
| Lenses | stemedb-lens-architect | Query resolution, ranking algorithms |
| Planning | stemedb-planner | Milestone planning, roadmap |

Architecture Overview

Write Path (Spine):           Read Path (Cortex):
[Agent] -> [Ingestion]        [Agent] <- [Lens Engine]
              |                              |
              v                              |
         [WAL/Fsync]                  [Index Lookup]
              |                              |
              v                              |
         [KV Store] <--------------------+
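
The ordering on the write path (durable WAL record first, KV store second) can be sketched as below. `Spine` here is a hypothetical struct for illustration, the length-prefixed record layout is an assumption, and an in-memory `HashMap` stands in for the real KV store; none of this reflects the actual stemedb-wal or stemedb-storage code.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical write path: WAL file plus an in-memory stand-in for the KV store.
pub struct Spine {
    wal: File,
    kv: HashMap<String, Vec<u8>>,
}

impl Spine {
    /// Open (or create) the WAL in append mode.
    pub fn open(wal_path: &Path) -> io::Result<Self> {
        let wal = OpenOptions::new().create(true).append(true).open(wal_path)?;
        Ok(Self { wal, kv: HashMap::new() })
    }

    /// Log the record and fsync before touching the KV store, so a crash
    /// after this point can be recovered by replaying the WAL.
    pub fn write(&mut self, key: &str, payload: &[u8]) -> io::Result<()> {
        // Assumed record layout: key_len (u64 LE), payload_len (u64 LE), key, payload.
        self.wal.write_all(&(key.len() as u64).to_le_bytes())?;
        self.wal.write_all(&(payload.len() as u64).to_le_bytes())?;
        self.wal.write_all(key.as_bytes())?;
        self.wal.write_all(payload)?;
        self.wal.sync_all()?; // flush data and metadata to disk
        // Only now apply the write. The key is assumed to be a content-derived
        // ID, so re-inserting the same key stores the same bytes.
        self.kv.insert(key.to_string(), payload.to_vec());
        Ok(())
    }

    /// Read side of the stand-in KV store.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.kv.get(key).map(Vec::as_slice)
    }
}
```

If any WAL step fails, the `?` operator returns early and the KV store is left untouched, which keeps the two in a recoverable order.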

Crates

| Crate | Purpose | Status |
|---|---|---|
| stemedb-core | Assertion, LifecycleStage, MaterializedView, types, signing utilities | Implemented |
| stemedb-wal | Write-ahead log with crash recovery | Implemented |
| stemedb-storage | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore, SimilarityIndex | Implemented |
| stemedb-ingest | Ingestion pipeline, signature verification, ContentDefenseLayer | Implemented |
| stemedb-query | Query engine, Materializer for O(1) MV: reads | Implemented |
| stemedb-lens | Lenses (Recency, Consensus, Authority, Vote/Trust-aware) | Implemented |
| stemedb-api | HTTP API with axum + utoipa OpenAPI docs | Implemented |
| stemedb-sim | Simulation for testing the pipeline | Implemented |
| stemedb-merkle | BLAKE3 Merkle tree for diff detection | Implemented |
| stemedb-rpc | gRPC services for node-to-node communication | Implemented |
| stemedb-sync | Merkle sync, gossip broadcast, anti-entropy | Implemented |
| stemedb-cluster | Cluster membership (SWIM), sharding, gateway | Implemented |
| stemedb-ontology | Domain definitions (Pharma), subject builders, medical extractors | Implemented |

SDKs

| SDK | Purpose | Status |
|---|---|---|
| sdk/go/steme | Go HTTP client with Ed25519 signing and fluent builders | Implemented |
| sdk/go/adk | ADK-Go tools and callbacks for AI agents | Implemented |

Latent Signal (latent/)

Python CLI tools for adverse event signal detection. Different rules from Rust crates:

Allowed:

  • print() for user-facing CLI output (these are scripts, not libraries)
  • except Exception as e: for CLI error handling (log and continue)

Required:

  • Environment Variables for URLs: NEVER hardcode localhost URLs without env fallback
    • Use os.getenv("VAR", "http://localhost:...") in Python
    • Use process.env.VAR || 'http://localhost:...' in TypeScript
  • StemeDB Integration: New ingestors should use StemeDBClient pattern from adk-agent/, not write to JSONL files