# Episteme (StemeDB)

A probabilistic knowledge graph database that stores Claims, not Facts. Append-only Merkle DAG with read-time resolution via Lenses.

**Core Concept:** "Git for Truth" - conflicting assertions coexist, resolved at query time through Consensus, Recency, Authority, or custom Lenses.

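As a rough illustration of read-time resolution, the sketch below keeps conflicting assertions for one subject side by side and resolves them with two lenses. The `Assertion` fields and the `recency` / `consensus` functions are hypothetical simplifications for this example, not the `stemedb-lens` API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified assertion; the real type lives in `stemedb-core`.
#[derive(Debug, Clone, PartialEq)]
struct Assertion {
    subject: String,
    value: String,
    timestamp: u64,
}

/// Recency lens: the newest assertion for the subject wins.
fn recency<'a>(log: &'a [Assertion], subject: &str) -> Option<&'a Assertion> {
    log.iter()
        .filter(|a| a.subject == subject)
        .max_by_key(|a| a.timestamp)
}

/// Consensus lens: the value asserted most often wins.
/// Ties go to the lexicographically smaller value so the result is deterministic.
fn consensus(log: &[Assertion], subject: &str) -> Option<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for a in log.iter().filter(|a| a.subject == subject) {
        *counts.entry(a.value.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by(|(va, ca), (vb, cb)| ca.cmp(cb).then(vb.cmp(va)))
        .map(|(v, _)| v.to_string())
}

/// Builds a small log in which two values for the same subject coexist.
fn sample_log() -> Vec<Assertion> {
    let mk = |value: &str, timestamp: u64| Assertion {
        subject: "drug-x/max-dose".to_string(),
        value: value.to_string(),
        timestamp,
    };
    vec![mk("10mg", 1), mk("10mg", 2), mk("20mg", 3)]
}
```

Nothing is overwritten in the log: the two lenses simply give different answers over the same stored assertions, which is the point of resolving at query time.
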
## Find Your Guide

| If you need to... | Read this |
|-------------------|-----------|
| **Get started fast** | [quickstart.md](./quickstart.md) |
| **Understand what Episteme is** | [what-is-episteme.md](./what-is-episteme.md) |
| **Understand the technical vision** | [vision.md](./vision.md) |
| **See use cases** | [use-cases/README.md](./use-cases/README.md) |
| **Understand architecture** | [architecture.md](./architecture.md) |
| **Learn data structures** | [docs/data-structures.md](./docs/data-structures.md) |
| **Understand governance models** | [docs/specs/governance-models.md](./docs/specs/governance-models.md) |
| **See the roadmap** | [roadmap.md](./roadmap.md) |
| **See completed phases** | [roadmap-archive.md](./roadmap-archive.md) |
| **Build apps on Episteme** | [docs/app-concepts/index.md](./docs/app-concepts/index.md) |
| **Consumer Health vertical** | [docs/app-concepts/consumer-health.md](./docs/app-concepts/consumer-health.md) |
| **Use Go SDK** | [ai-lookup/services/sdk.md](ai-lookup/services/sdk.md) |
| **Write Rust code** | [.claude/guides/backend/rust-guidelines.md](.claude/guides/backend/rust-guidelines.md) |
| **Set up local dev** | [.claude/guides/local/setup.md](.claude/guides/local/setup.md) |
| **Run tests** | [.claude/guides/local/testing.md](.claude/guides/local/testing.md) |
| **Understand quality checks** | [.claude/guides/local/quality-checks.md](.claude/guides/local/quality-checks.md) |
| **Learn about simulation** | [ai-lookup/features/simulation.md](ai-lookup/features/simulation.md) |
| **Advance the simulator** | [arena-roadmap.md](./arena-roadmap.md) |
| **Work on storage/DAG** | Load skill: `stemedb-core` |
| **Implement a Lens** | Load skill: `stemedb-lens` |
| **Work on domain ontology** | `crates/stemedb-ontology/` |
| **Consumer Health UAT** | [uat/consumer-health/README.md](./uat/consumer-health/README.md) |
| **Verify production readiness** | [uat/production-readiness/README.md](./uat/production-readiness/README.md) |
| **Plan a milestone** | `/plan-milestone` command |
| **Analyze use case gaps** | `/analyze-gaps` command |
| **Add an API endpoint** | [.claude/guides/backend/api-endpoints.md](.claude/guides/backend/api-endpoints.md) |
| **Integrate with AI tools** | [.claude/guides/integrations/ai-coding-assistant-integration.md](.claude/guides/integrations/ai-coding-assistant-integration.md) |
| **ADK-Go + Episteme** | [.claude/guides/integrations/adk-go-episteme.md](.claude/guides/integrations/adk-go-episteme.md) |
| **Distributed architecture** | [docs/research/distributed-write-path.md](docs/research/distributed-write-path.md) |
| **Write UAT reports** | [.claude/guides/local/uat-reports.md](.claude/guides/local/uat-reports.md) |
| **Phase 6 UAT results** | [ai-lookup/features/phase6-uat.md](ai-lookup/features/phase6-uat.md) |
| **Configure Aphoria hosted mode** | [.claude/guides/services/aphoria-hosted-mode.md](.claude/guides/services/aphoria-hosted-mode.md) |
| **Aphoria config reference** | [ai-lookup/features/aphoria-config.md](ai-lookup/features/aphoria-config.md) |
| **Work on Admin Dashboard** | `applications/stemedb-dashboard/` (Next.js + shadcn/ui) |
| **Work on Disputed app** | `applications/disputed/` |
| **Understand repo structure** | [ai-lookup/repo-structure.md](ai-lookup/repo-structure.md) |
| **Aphoria LLM eval** | Load skill: `aphoria-llm-optimization` |
| **General LLM optimization** | Load skill: `llm-optimization` |
| **Install Aphoria** | Load skill: `aphoria-install` |
| **Run Aphoria self-review** | Load skill: `aphoria-self-review` |
| **Author claims from diffs** | Load skill: `aphoria-claims` |
| **Suggest new claims** | Load skill: `aphoria-suggest` |

## Roadmap Maintenance

Two files, strict separation:

| File | Contains | When to modify |
|------|----------|----------------|
| `roadmap.md` | Current + future work only | Add new phases, update task status |
| `roadmap-archive.md` | Completed phases (1-7, 8A, MVP) | Move items when phase completes |

**Rules:**
- When a phase completes: Move entire phase section to archive, update status table in both files
- When adding tasks: Add to current phase in `roadmap.md` with `- [ ]` checkbox format
- When completing tasks: Change `- [ ]` to `- [x]`, add brief implementation notes
- Keep `roadmap.md` under 500 lines; if it grows, archive more aggressively
- Current phase always has "🎯" marker in status table

**Task format:**
```markdown
- [ ] **P1.2 Feature Name**: Brief description
  - [ ] Subtask one
  - [ ] Subtask two
```

**Phase completion checklist:**
1. All tasks marked `[x]` in `roadmap.md`
2. Cut entire phase section, paste into `roadmap-archive.md`
3. Update status tables in both files
4. Update "Current Focus" in `roadmap.md` header

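The first checklist step and the 500-line budget are mechanical enough to check in code. The following is a minimal sketch, assuming a phase section is available as a Markdown string; `count_tasks`, `phase_complete`, and `within_line_budget` are hypothetical helpers, not existing tooling in this repo.

```rust
/// Counts of checkbox tasks found in a roadmap section.
#[derive(Debug, PartialEq)]
struct TaskCounts {
    open: usize,
    done: usize,
}

/// Counts `- [ ]` and `- [x]` items, including indented subtasks.
fn count_tasks(markdown: &str) -> TaskCounts {
    let mut counts = TaskCounts { open: 0, done: 0 };
    for line in markdown.lines() {
        let item = line.trim_start();
        if item.starts_with("- [ ]") {
            counts.open += 1;
        } else if item.starts_with("- [x]") {
            counts.done += 1;
        }
    }
    counts
}

/// Checklist step 1: a phase is ready to archive only when no task is open.
/// A section with no tasks at all is treated as not complete (an assumption).
fn phase_complete(markdown: &str) -> bool {
    let counts = count_tasks(markdown);
    counts.open == 0 && counts.done > 0
}

/// The "under 500 lines" budget for `roadmap.md`.
fn within_line_budget(markdown: &str) -> bool {
    markdown.lines().count() < 500
}
```

A check like this could run from a pre-commit hook before a phase section is moved to `roadmap-archive.md`.
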
## Aphoria: What Is a Claim?

A **claim** is a human-authored statement about what code MUST do and WHY, with provenance and consequences.

### Claims vs Observations

| Type | What it is | Who creates it | Example |
|------|-----------|----------------|---------|
| **Observation** | Grep result: "this code does X" | Extractors (automated) | `imports/tokio: true` |
| **Claim** | Rule: "code MUST do X because Y, or Z breaks" | Humans (via skill) | "Core MUST NOT import tokio because it creates runtime coupling. If tokio appears in core imports, the library becomes async-only and breaks sync users." |

**Observations are garbage.** They're indexed facts with no meaning. Nobody cares that `imports/format: true` - that's just grep output.

**Claims are the product.** They encode architectural decisions, safety invariants, and spec compliance with full context: provenance (where the rule came from), invariant (what must stay true), and consequence (what breaks if violated).

### Structure of a Claim

```toml
[[claim]]
id = "core-no-tokio-001"
concept_path = "stemedb/core/imports/tokio"
predicate = "imported"
value = false
comparison = "absent"  # Code MUST NOT have this
provenance = "Architecture decision by jml 2024-12-15"
invariant = "Core modules MUST remain sync-only"
consequence = "Importing tokio makes core async-only, breaking sync library users"
authority_tier = "expert"
category = "architecture"
evidence = ["ADR-003", "design review notes"]
status = "active"
```

### Aphoria Workflows (Primary Use Cases)

**Day-to-day (commit-time claim authoring):**
1. Look at the entire diff
2. Use `aphoria-claims` skill to identify "claimable" patterns (spec constants, ordering changes, boundary violations, derive changes on wire types)
3. Skill does lookups: `aphoria claims list` to check what exists
4. If alignment is needed, skill uses `aphoria claims update` or `supersede`
5. Skill crafts and submits new claims via `aphoria claims create`
6. If needed for audit, create a paired extractor

**Audit (scan-time claim verification):**
1. **Direction 1**: `aphoria scan` runs extractors → observations, compares against authored claims → PASS/CONFLICT/MISSING
2. **Direction 2**: `aphoria verify run` walks all claims, verifies each one's pattern exists in code → PASS/CONFLICT/MISSING

The skill drives the CLI. The CLI doesn't know about the skill. The two connect when the skill calls `aphoria claims` commands in a loop.

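Direction 1 of the audit can be pictured as a lookup of each claim against the extractor observations. The sketch below is illustrative only: the `Claim` struct is trimmed to three fields, the `Comparison::Equals` mode is hypothetical (only `"absent"` appears in the claim example), and the exact meaning of PASS/CONFLICT/MISSING is an assumption spelled out in the comments rather than Aphoria's actual logic.

```rust
use std::collections::HashMap;

/// Outcome labels taken from the audit description above.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Conflict,
    Missing,
}

/// Hypothetical subset of comparison modes.
enum Comparison {
    Absent,
    Equals,
}

/// Hypothetical, trimmed-down claim: just the fields needed for the comparison.
struct Claim {
    concept_path: String,
    value: bool,
    comparison: Comparison,
}

/// Compares one claim against observations keyed by concept path.
/// Assumed semantics: an `Equals` claim with no matching observation is MISSING,
/// while an `Absent` claim passes unless the pattern is observed as `true`.
fn check(claim: &Claim, observations: &HashMap<String, bool>) -> Verdict {
    let observed = observations.get(&claim.concept_path).copied();
    match (&claim.comparison, observed) {
        (Comparison::Absent, Some(true)) => Verdict::Conflict,
        (Comparison::Absent, _) => Verdict::Pass,
        (Comparison::Equals, None) => Verdict::Missing,
        (Comparison::Equals, Some(v)) if v == claim.value => Verdict::Pass,
        (Comparison::Equals, Some(_)) => Verdict::Conflict,
    }
}

/// The claim from the TOML example: core must not import tokio.
fn no_tokio_claim() -> Claim {
    Claim {
        concept_path: "stemedb/core/imports/tokio".to_string(),
        value: false,
        comparison: Comparison::Absent,
    }
}
```

With an empty observation map the tokio claim passes; once an extractor reports `stemedb/core/imports/tokio: true`, the same claim yields a conflict.
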
### Inline Claim Markers (`@aphoria:claim`)

Capture claim intent while writing code with inline markers:

**1. Add marker in comment:**
```rust
// @aphoria:claim[safety] Pool size MUST NOT exceed 50 -- OOM under sustained load
const MAX_POOL_SIZE: u32 = 50;
```

**2. Enable in config** (`.aphoria/config.toml`):
```toml
[extractors.inline_markers]
enabled = true
sync_to_pending = true  # Auto-sync during scan (default)
```

**3. Scan detects markers:**
```bash
aphoria scan
# Output: ℹ Detected 1 new claim marker(s). Run 'aphoria claims list-markers' to review.
```

**4. Review pending markers:**
```bash
aphoria claims list-markers --format table
# Shows: ID, file, line, category, invariant

aphoria claims list-markers --format json
# JSON output for skills to process
```

**5. Formalize via CLI:**
```bash
aphoria claims formalize-marker marker-abc123 \
  --id myapp-pool-max-001 \
  --tier expert \
  --evidence "tests/pool_tests.rs load test" \
  --by jml
# Creates full claim in .aphoria/claims.toml
# Updates marker status to "formalized"
```

**Or reject if not worth a claim:**
```bash
aphoria claims reject-marker marker-abc123 --reason "Implementation detail, not architecture"
```

**6. Update comment after formalization:**
```rust
// @aphoria:claimed myapp-pool-max-001
const MAX_POOL_SIZE: u32 = 50;
```

**Supported comment styles:**
- `// @aphoria:claim` (Rust, Go, C, TypeScript, JavaScript)
- `# @aphoria:claim` (Python, Ruby, Shell, YAML)
- `-- @aphoria:claim` (SQL)
- `/* @aphoria:claim */` (CSS, C-style blocks)
- `<!-- @aphoria:claim -->` (HTML, XML)

**Optional fields:**
- Category in brackets: `@aphoria:claim[category]`
- Consequence after ` -- `: `invariant -- consequence`

**Storage:**
- Detected markers → `.aphoria/pending_markers.toml` (auto-synced during scan)
- Formalized claims → `.aphoria/claims.toml`
- Already formalized → `@aphoria:claimed <claim-id>` (skipped by extractor)

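The marker grammar described above (optional `[category]`, optional consequence after ` -- `, and `@aphoria:claimed` lines being skipped) can be sketched as a small line parser. This is a hedged illustration, not the extractor's implementation: `Marker`, its field names, and `parse_marker` are hypothetical, and rejecting a marker with no invariant text is an assumption.

```rust
/// Hypothetical parsed form of an inline marker; field names are illustrative.
#[derive(Debug, PartialEq)]
struct Marker {
    category: Option<String>,
    invariant: String,
    consequence: Option<String>,
}

const TAG: &str = "@aphoria:claim";

/// Parses one source line. Returns `None` when the line has no marker or
/// carries `@aphoria:claimed <claim-id>`, which is already formalized.
fn parse_marker(line: &str) -> Option<Marker> {
    let start = line.find(TAG)?;
    let mut rest = &line[start + TAG.len()..];

    // `@aphoria:claimed` shares the prefix, so reject it explicitly.
    if rest.starts_with("ed") {
        return None;
    }

    // Optional `[category]` directly after the tag.
    let mut category = None;
    if let Some(after_bracket) = rest.strip_prefix('[') {
        let end = after_bracket.find(']')?;
        category = Some(after_bracket[..end].trim().to_string());
        rest = &after_bracket[end + 1..];
    }

    // Drop block-comment closers such as `*/` and `-->`.
    let body = rest
        .trim()
        .trim_end_matches("*/")
        .trim_end_matches("-->")
        .trim();
    if body.is_empty() {
        return None;
    }

    // Optional consequence after the ` -- ` separator.
    let (invariant, consequence) = match body.split_once(" -- ") {
        Some((inv, cons)) => (inv.trim(), Some(cons.trim().to_string())),
        None => (body, None),
    };
    Some(Marker {
        category,
        invariant: invariant.to_string(),
        consequence,
    })
}
```

Because the tag is located with `find`, the same function handles every comment style in the list above without knowing the host language.
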
## Critical Rules

- **Append-Only:** NEVER mutate existing Assertions. Create new ones.
- **Content-Addressed:** Assertion ID = BLAKE3 hash of content.
- **No Unwrap:** NEVER use `unwrap()` or `expect()` in production code. CI enforces via `clippy::unwrap_used` and `clippy::expect_used` at deny level.
- **Defensive Writes:** All writes go through WAL with fsync.
- **Zero-Copy:** Use `rkyv` for serialization. ALWAYS use `stemedb_core::serde::{serialize, deserialize}`; NEVER use raw `AllocSerializer` in production code.
- **Instrument Critical Paths:** Use `#[instrument]` on public methods in WAL, storage, ingestion, and lens code. Include meaningful fields (key_len, payload_len, offset, candidates_count, lens).
- **Structured Logging:** Use `tracing` (info!, warn!, error!) instead of `println!`/`eprintln!`. Clippy enforces via `print_stdout`/`print_stderr` at warn level. CLI binaries (e.g., `stemedb-sim`) may use `#![allow()]` for user-facing output.
- **Document Changes:** Update `ai-lookup/` when adding new types/concepts. Keep skills in sync with code.
- **No Git Operations:** NEVER use git stash, git branch, git checkout, or any git operations unless the user explicitly tells you to.
- **No GitHub Workflows:** We use pre-commit hooks, not GitHub Actions CI.

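The first three rules fit together naturally. The sketch below shows one way they might look in miniature: an in-memory store that only ever adds entries, derives IDs from content, and returns `Result` instead of calling `unwrap()`. Note the loud assumption: the standard library's `DefaultHasher` stands in for BLAKE3 purely so the example needs no external crate, and `AssertionStore`, `StoreError`, and `correct` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash. The real rule is BLAKE3; `DefaultHasher` is used here
/// only to keep the sketch dependency-free.
fn content_id(content: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
enum StoreError {
    NotFound(u64),
}

/// Hypothetical in-memory store: entries are added, never mutated or removed.
#[derive(Default)]
struct AssertionStore {
    entries: HashMap<u64, Vec<u8>>,
}

impl AssertionStore {
    /// Appends content and returns its ID. Re-appending identical content is a
    /// no-op because the ID is derived from the bytes themselves.
    fn append(&mut self, content: &[u8]) -> u64 {
        let id = content_id(content);
        self.entries.entry(id).or_insert_with(|| content.to_vec());
        id
    }

    /// Returns a `Result` so callers propagate errors with `?` instead of `unwrap()`.
    fn get(&self, id: u64) -> Result<&[u8], StoreError> {
        self.entries
            .get(&id)
            .map(Vec::as_slice)
            .ok_or(StoreError::NotFound(id))
    }
}

/// A "correction" is a new assertion; the earlier one stays readable.
fn correct(store: &mut AssertionStore, old_id: u64, new_content: &[u8]) -> Result<u64, StoreError> {
    store.get(old_id)?; // fail loudly if the original is unknown
    Ok(store.append(new_content))
}
```

After a correction, both the old and the new ID resolve, so history is preserved and any conflict between the two is left for a Lens to resolve at read time.
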
## Quick Reference

```bash
# Build
cargo build --workspace

# Test (choose based on need)
cargo test -p stemedb-core        # Fast: single crate (~30s)
cargo test --workspace --lib      # Medium: all unit tests (~3min)
cargo nextest run                 # Full: parallel runner (~5min)
cargo test --workspace            # Legacy: sequential (~15min)

# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check
```

## Port Scheme (181XX)

| Offset | Service | Default | Env Var |
|--------|---------|---------|---------|
| +0 | HTTP API | 18180 | `STEMEDB_BIND_ADDR` |
| +1 | Cluster Gateway | 18181 | `STEMEDB_NODE_API_ADDR` |
| +2 | Cluster RPC | 18182 | `STEMEDB_NODE_RPC_ADDR` |
| +3 | SWIM Gossip | 18183 | via `SwimConfig` |
| +4 | Metrics | 18184 | (reserved) |
| +5 | Admin | 18185 | (reserved) |
| +6 | Latent Signal | 18186 | - |
| +7 | Community App | 18187 | - |
| +8 | StemeDB Dashboard | 18188 | - |
| +9 | Aphoria Dashboard | 18189 | - |

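The table follows a simple base-plus-offset rule, which the sketch below encodes. The `Service` enum and its methods are hypothetical names for illustration; only the port numbers and the three environment variable names come from the table.

```rust
/// Base of the 181XX port block.
const BASE_PORT: u16 = 18180;

/// Services in offset order, mirroring the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Service {
    HttpApi,
    ClusterGateway,
    ClusterRpc,
    SwimGossip,
    Metrics,
    Admin,
    LatentSignal,
    CommunityApp,
    StemeDbDashboard,
    AphoriaDashboard,
}

impl Service {
    /// Offset from `BASE_PORT`; the enum discriminant doubles as the offset.
    fn offset(self) -> u16 {
        self as u16
    }

    /// Default port for the service.
    fn default_port(self) -> u16 {
        BASE_PORT + self.offset()
    }

    /// Environment variable that overrides the address, where the table names one.
    fn env_var(self) -> Option<&'static str> {
        match self {
            Service::HttpApi => Some("STEMEDB_BIND_ADDR"),
            Service::ClusterGateway => Some("STEMEDB_NODE_API_ADDR"),
            Service::ClusterRpc => Some("STEMEDB_NODE_RPC_ADDR"),
            _ => None,
        }
    }
}
```

Keeping the offsets in one enum means a new service only needs a new variant at the end, and its default port follows from its position.
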
## Specialized Agents

| Domain | Agent | When to use |
|--------|-------|-------------|
| **Product Vision** | `episteme-product-visionary` | Use cases, "why not Postgres?", product-market fit |
| **Pilot Prep** | `enterprise-skeptic-buyer` | Pressure-test demos, find gaps, prepare for tough questions |
| **Aphoria Pitch** | `aphoria-skeptic-buyer` | Pressure-test Aphoria demos, security tool buyer objections |
| **Aphoria Phase 7** | `declarative-extractor-skeptic` | Pressure-test declarative extractors, LLM extraction, pattern learning |
| **Aphoria Phase 9** | `autonomous-learning-skeptic` | Pressure-test autonomous promotion, shadow mode, cross-project learning |
| General Rust | `primary-developer` | Feature implementation, refactoring |
| Code Quality | `rust-quality-engineer` | Reviews, test coverage, clippy |
| Storage | `storage-engine-architect` | WAL, LSM, crash recovery |
| Graph Engine | `rust-graph-engine-architect` | Lock-free structures, cache optimization |
| Defensive | `defensive-systems-architect` | Rate limiting, circuit breakers, hostile input |
| Distributed | `distributed-systems-engineer` | CRDT replication, Raft coordination, Merkle sync, clustering |
| Lenses | `stemedb-lens-architect` | Query resolution, ranking algorithms |
| Planning | `stemedb-planner` | Milestone planning, roadmap |

## Architecture Overview

```
Write Path (Spine):          Read Path (Cortex):
[Agent] -> [Ingestion]       [Agent] <- [Lens Engine]
               |                             |
               v                             |
          [WAL/Fsync]                 [Index Lookup]
               |                             |
               v                             |
          [KV Store] <-----------------------+
```

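To make the ordering in the diagram concrete, here is a toy sketch of the write path (WAL first, then KV store) and of rebuilding state by replaying the log. Everything here is a hypothetical in-memory stand-in: `Node`, `ingest`, `read`, and `recover` are illustrative names, keys are assumed to be unique assertion IDs, and no real fsync or index is involved.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-ins for the WAL and the KV store.
#[derive(Default)]
struct Node {
    wal: Vec<(String, String)>,
    kv: HashMap<String, String>,
}

impl Node {
    /// Write path: record the entry in the WAL first, then apply it to the KV store.
    fn ingest(&mut self, id: &str, payload: &str) {
        self.wal.push((id.to_string(), payload.to_string()));
        // A real WAL would fsync here before acknowledging the write.
        self.kv.insert(id.to_string(), payload.to_string());
    }

    /// Read path, reduced to a direct KV lookup (no lens or index in this sketch).
    fn read(&self, id: &str) -> Option<&str> {
        self.kv.get(id).map(String::as_str)
    }

    /// Crash recovery: rebuild the KV store by replaying the WAL in order.
    fn recover(wal: Vec<(String, String)>) -> Node {
        let kv = wal.iter().cloned().collect();
        Node { wal, kv }
    }
}
```

Because every write reaches the log before the store, a node that loses its KV state can reconstruct it from the WAL alone.
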
## Crates

| Crate | Purpose | Status |
|-------|---------|--------|
| `stemedb-core` | Assertion, LifecycleStage, MaterializedView, types, signing utilities | ✅ Implemented |
| `stemedb-wal` | Write-ahead log with crash recovery | ✅ Implemented |
| `stemedb-storage` | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore, SimilarityIndex | ✅ Implemented |
| `stemedb-ingest` | Ingestion pipeline, signature verification, ContentDefenseLayer | ✅ Implemented |
| `stemedb-query` | Query engine, Materializer for O(1) MV: reads | ✅ Implemented |
| `stemedb-lens` | Lenses (Recency, Consensus, Authority, Vote/Trust-aware) | ✅ Implemented |
| `stemedb-api` | HTTP API with axum + utoipa OpenAPI docs | ✅ Implemented |
| `stemedb-sim` | Simulation for testing the pipeline | ✅ Implemented |
| `stemedb-merkle` | BLAKE3 Merkle tree for diff detection | ✅ Implemented |
| `stemedb-rpc` | gRPC services for node-to-node communication | ✅ Implemented |
| `stemedb-sync` | Merkle sync, gossip broadcast, anti-entropy | ✅ Implemented |
| `stemedb-cluster` | Cluster membership (SWIM), sharding, gateway | ✅ Implemented |
| `stemedb-ontology` | Domain definitions (Pharma), subject builders, medical extractors | ✅ Implemented |

## SDKs

| SDK | Purpose | Status |
|-----|---------|--------|
| `sdk/go/steme` | Go HTTP client with Ed25519 signing and fluent builders | ✅ Implemented |
| `sdk/go/adk` | ADK-Go tools and callbacks for AI agents | ✅ Implemented |

## Latent Signal (latent/)

Python CLI tools for adverse event signal detection. Different rules from Rust crates:

**Allowed:**
- `print()` for user-facing CLI output (these are scripts, not libraries)
- `except Exception as e:` for CLI error handling (log and continue)

**Required:**
- **Environment Variables for URLs:** NEVER hardcode `localhost` URLs without env fallback
  - Use `os.getenv("VAR", "http://localhost:...")` in Python
  - Use `process.env.VAR || 'http://localhost:...'` in TypeScript
- **StemeDB Integration:** New ingestors should use `StemeDBClient` pattern from `adk-agent/`, not write to JSONL files