## Problem

Dashboard sends URL-encoded query parameters:
  ?sources%5B%5D=rfc&sources%5B%5D=owasp (%5B = '[', %5D = ']')

But QsQuery extractor used strict mode, which rejects encoded brackets:
  Error: "Invalid field contains an encoded bracket"

Result: All corpus filters in the dashboard failed silently.

## Solution

Changed QsQuery to use serde_qs non-strict mode:
  Config::new(5, false) // false = non-strict

Now accepts BOTH:
- Literal brackets: ?sources[]=rfc
- Encoded brackets: ?sources%5B%5D=rfc (browsers)

## Verification

✅ URL-encoded query: ?sources%5B%5D=rfc&sources%5B%5D=community
   Returns: 24 items (was: error)
   Logs: sources=Some(["rfc", "community"]) ✅
✅ Literal brackets: ?sources[]=rfc (still works)
✅ All 4 extractor tests pass (added encoded brackets test)
✅ Clippy clean (0 warnings)

## Files Changed

- crates/stemedb-api/src/extractors.rs: Use non-strict Config
- crates/stemedb-api/README.md: Document QsQuery usage
- .claude/guides/backend/api-endpoints.md: Add best practices
- CLAUDE.md: Reference extractors documentation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Episteme (StemeDB)
A probabilistic knowledge graph database that stores Claims, not Facts. Append-only Merkle DAG with read-time resolution via Lenses.
Core Concept: "Git for Truth" - conflicting assertions coexist, resolved at query time through Consensus, Recency, Authority, or custom Lenses.
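To make read-time resolution concrete, here is a minimal sketch of two lenses applied to the same append-only log. The `Claim` fields, the `Lens` enum, and the `resolve` function are hypothetical names chosen for illustration; they are not the types in `stemedb-core` or `stemedb-lens`, and the tie-breaking rule for consensus is an assumption.

```rust
use std::collections::HashMap;

/// A stored claim: who asserted which value for a subject, and when.
/// All field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub subject: String,
    pub value: String,
    pub source: String,
    pub timestamp: u64,
}

/// Simplified stand-ins for two read-time resolution strategies.
pub enum Lens {
    /// The newest claim wins.
    Recency,
    /// The value asserted by the most claims wins; ties go to the value
    /// with the newest supporting claim (an assumed tie-break).
    Consensus,
}

/// Resolve conflicting claims about `subject` without modifying the log.
pub fn resolve<'a>(log: &'a [Claim], subject: &str, lens: &Lens) -> Option<&'a str> {
    let candidates: Vec<&'a Claim> = log.iter().filter(|c| c.subject == subject).collect();
    match lens {
        Lens::Recency => candidates
            .into_iter()
            .max_by_key(|c| c.timestamp)
            .map(|c| c.value.as_str()),
        Lens::Consensus => {
            // value -> (number of supporting claims, newest timestamp)
            let mut tally: HashMap<&'a str, (usize, u64)> = HashMap::new();
            for c in candidates {
                let entry = tally.entry(c.value.as_str()).or_insert((0, 0));
                entry.0 += 1;
                entry.1 = entry.1.max(c.timestamp);
            }
            tally
                .into_iter()
                .max_by_key(|(_, score)| *score)
                .map(|(value, _)| value)
        }
    }
}
```

Both lenses read the same log and can return different answers; neither deletes or rewrites the claims that lost.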
Find Your Guide
| If you need to... | Read this |
|---|---|
| Get started fast | quickstart.md |
| Understand what Episteme is | what-is-episteme.md |
| Understand the technical vision | vision.md |
| See use cases | use-cases/README.md |
| Understand architecture | architecture.md |
| Learn data structures | docs/data-structures.md |
| Understand governance models | docs/specs/governance-models.md |
| See the roadmap | roadmap.md |
| See completed phases | roadmap-archive.md |
| Build apps on Episteme | docs/app-concepts/index.md |
| Consumer Health vertical | docs/app-concepts/consumer-health.md |
| Use Go SDK | ai-lookup/services/sdk.md |
| Write Rust code | .claude/guides/backend/rust-guidelines.md |
| Set up local dev | .claude/guides/local/setup.md |
| Run tests | .claude/guides/local/testing.md |
| Understand quality checks | .claude/guides/local/quality-checks.md |
| Learn about simulation | ai-lookup/features/simulation.md |
| Advance the simulator | arena-roadmap.md |
| Work on storage/DAG | Load skill: stemedb-core |
| Implement a Lens | Load skill: stemedb-lens |
| Work on domain ontology | crates/stemedb-ontology/ |
| Consumer Health UAT | uat/consumer-health/README.md |
| Verify production readiness | uat/production-readiness/README.md |
| Plan a milestone | /plan-milestone command |
| Analyze use case gaps | /analyze-gaps command |
| Add an API endpoint | .claude/guides/backend/api-endpoints.md |
| Integrate with AI tools | .claude/guides/integrations/ai-coding-assistant-integration.md |
| ADK-Go + Episteme | .claude/guides/integrations/adk-go-episteme.md |
| Distributed architecture | docs/research/distributed-write-path.md |
| Write UAT reports | .claude/guides/local/uat-reports.md |
| Phase 6 UAT results | ai-lookup/features/phase6-uat.md |
| Configure Aphoria hosted mode | .claude/guides/services/aphoria-hosted-mode.md |
| Aphoria config reference | ai-lookup/features/aphoria-config.md |
| Work on Admin Dashboard | applications/stemedb-dashboard/ (Next.js + shadcn/ui) |
| Work on Disputed app | applications/disputed/ |
| Understand repo structure | ai-lookup/repo-structure.md |
| Aphoria LLM eval | Load skill: aphoria-llm-optimization |
| General LLM optimization | Load skill: llm-optimization |
| Install Aphoria | Load skill: aphoria-install |
| Run Aphoria self-review | Load skill: aphoria-self-review |
| Author claims from diffs | Load skill: aphoria-claims |
| Suggest new claims | Load skill: aphoria-suggest |
Roadmap Maintenance
Two files, strict separation:
| File | Contains | When to modify |
|---|---|---|
| roadmap.md | Current + future work only | Add new phases, update task status |
| roadmap-archive.md | Completed phases (1-7, 8A, MVP) | Move items when phase completes |
Rules:
- When a phase completes: Move entire phase section to archive, update status table in both files
- When adding tasks: Add to current phase in `roadmap.md` with `- [ ]` checkbox format
- When completing tasks: Change `- [ ]` to `- [x]`, add brief implementation notes
- Keep `roadmap.md` under 500 lines; if it grows, archive more aggressively
- Current phase always has "🎯" marker in status table
Task format:
- [ ] **P1.2 Feature Name**: Brief description
  - [ ] Subtask one
  - [ ] Subtask two
Phase completion checklist:
- All tasks marked `[x]` in `roadmap.md`
- Cut entire phase section, paste into `roadmap-archive.md`
- Update status tables in both files
- Update "Current Focus" in `roadmap.md` header
Aphoria: What Is a Claim?
A claim is a human-authored statement about what code MUST do and WHY, with provenance and consequences.
Claims vs Observations
| Type | What it is | Who creates it | Example |
|---|---|---|---|
| Observation | Grep result: "this code does X" | Extractors (automated) | imports/tokio: true |
| Claim | Rule: "code MUST do X because Y, or Z breaks" | Humans (via skill) | "Core MUST NOT import tokio because it creates runtime coupling. If tokio appears in core imports, the library becomes async-only and breaks sync users." |
Observations are garbage. They're indexed facts with no meaning. Nobody cares that imports/format: true — that's just grep output.
Claims are the product. They encode architectural decisions, safety invariants, and spec compliance with full context: provenance (where the rule came from), invariant (what must stay true), and consequence (what breaks if violated).
Structure of a Claim
[[claim]]
id = "core-no-tokio-001"
concept_path = "stemedb/core/imports/tokio"
predicate = "imported"
value = false
comparison = "absent" # Code MUST NOT have this
provenance = "Architecture decision by jml 2024-12-15"
invariant = "Core modules MUST remain sync-only"
consequence = "Importing tokio makes core async-only, breaking sync library users"
authority_tier = "expert"
category = "architecture"
evidence = ["ADR-003", "design review notes"]
status = "active"
Aphoria Workflows (Primary Use Cases)
Day-to-day (commit-time claim authoring):
- Look at the entire diff
- Use `aphoria-claims` skill to identify "claimable" patterns (spec constants, ordering changes, boundary violations, derive changes on wire types)
- Skill does lookups: `aphoria claims list` to check what exists
- If alignment needed, skill uses `aphoria claims update` or `supersede`
- Skill crafts and submits new claims via `aphoria claims create`
- If needed for audit, create paired extractor
Audit (scan-time claim verification):
- Direction 1: `aphoria scan` runs extractors → observations, compares against authored claims → PASS/CONFLICT/MISSING
- Direction 2: `aphoria verify run` walks all claims, verifies each one's pattern exists in code → PASS/CONFLICT/MISSING
The skill drives the CLI. The CLI doesn't know about the skill. They connect through the skill calling `aphoria claims` commands in a loop.
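As a rough illustration of the scan-time comparison, the sketch below checks one claim against a map of observations and returns a verdict. The `Comparison::Equals` variant, the boolean observation map, and the exact rules for when a result counts as PASS, CONFLICT, or MISSING are assumptions made for this example; the real `aphoria scan` logic may differ.

```rust
use std::collections::HashMap;

/// How a claim's value is compared to what the code contains.
/// `Absent` mirrors `comparison = "absent"`; `Equals` is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Comparison {
    Equals,
    Absent,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Conflict,
    Missing,
}

/// A trimmed-down claim with only the fields this check needs.
pub struct Claim {
    pub id: String,
    pub concept_path: String,
    pub value: bool,
    pub comparison: Comparison,
}

/// Compare one claim against extractor output (concept_path -> observed value).
pub fn check(claim: &Claim, observations: &HashMap<String, bool>) -> Verdict {
    let observed = observations.get(&claim.concept_path).copied();
    match (claim.comparison, observed) {
        // "absent": the code must not exhibit the pattern at all.
        (Comparison::Absent, None) | (Comparison::Absent, Some(false)) => Verdict::Pass,
        (Comparison::Absent, Some(true)) => Verdict::Conflict,
        // "equals": an observation is required and must match the claimed value.
        (Comparison::Equals, None) => Verdict::Missing,
        (Comparison::Equals, Some(v)) => {
            if v == claim.value {
                Verdict::Pass
            } else {
                Verdict::Conflict
            }
        }
    }
}
```

With the `core-no-tokio-001` claim from the previous section, an observation of `stemedb/core/imports/tokio: true` would yield a conflict under these assumed rules, and no observation at all would pass.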
Inline Claim Markers (@aphoria:claim)
Capture claim intent while writing code with inline markers:
1. Add marker in comment:
// @aphoria:claim[safety] Pool size MUST NOT exceed 50 -- OOM under sustained load
const MAX_POOL_SIZE: u32 = 50;
2. Enable in config (.aphoria/config.toml):
[extractors.inline_markers]
enabled = true
sync_to_pending = true # Auto-sync during scan (default)
3. Scan detects markers:
aphoria scan
# Output: ℹ Detected 1 new claim marker(s). Run 'aphoria claims list-markers' to review.
4. Review pending markers:
aphoria claims list-markers --format table
# Shows: ID, file, line, category, invariant
aphoria claims list-markers --format json
# JSON output for skills to process
5. Formalize via CLI:
aphoria claims formalize-marker marker-abc123 \
--id myapp-pool-max-001 \
--tier expert \
--evidence "tests/pool_tests.rs load test" \
--by jml
# Creates full claim in .aphoria/claims.toml
# Updates marker status to "formalized"
Or reject if not worth a claim:
aphoria claims reject-marker marker-abc123 --reason "Implementation detail, not architecture"
6. Update comment after formalization:
// @aphoria:claimed myapp-pool-max-001
const MAX_POOL_SIZE: u32 = 50;
Supported comment styles:
- `// @aphoria:claim` (Rust, Go, C, TypeScript, JavaScript)
- `# @aphoria:claim` (Python, Ruby, Shell, YAML)
- `-- @aphoria:claim` (SQL)
- `/* @aphoria:claim */` (CSS, C-style blocks)
- `<!-- @aphoria:claim -->` (HTML, XML)
Optional fields:
- Category in brackets: `@aphoria:claim[category]`
- Consequence after `--`: `invariant -- consequence`
Storage:
- Detected markers → `.aphoria/pending_markers.toml` (auto-synced during scan)
- Formalized claims → `.aphoria/claims.toml`
- Already formalized → `@aphoria:claimed <claim-id>` (skipped by extractor)
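The marker syntax described above is simple enough to parse by hand. The following is a minimal sketch of such a parser, not the actual inline-marker extractor: `Marker` and `parse_marker` are hypothetical names, and the handling of block-comment closers is an assumption.

```rust
/// Parsed form of an inline `@aphoria:claim` marker (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Marker {
    pub category: Option<String>,
    pub invariant: String,
    pub consequence: Option<String>,
}

const TAG: &str = "@aphoria:claim";

/// Parse one source line. Returns `None` when the line has no marker,
/// or when it carries `@aphoria:claimed <id>` (already formalized).
pub fn parse_marker(line: &str) -> Option<Marker> {
    // Searching for the tag first means the comment prefix (`//`, `#`, `--`, ...)
    // never needs to be recognized explicitly.
    let start = line.find(TAG)?;
    let mut rest = &line[start + TAG.len()..];
    if rest.starts_with("ed") {
        return None;
    }
    // Optional `[category]` directly after the tag.
    let mut category = None;
    if let Some(after) = rest.strip_prefix('[') {
        let end = after.find(']')?;
        category = Some(after[..end].trim().to_string());
        rest = &after[end + 1..];
    }
    // Drop block-comment closers so `/* ... */` and `<!-- ... -->` styles work.
    let rest = rest.trim();
    let rest = rest
        .strip_suffix("*/")
        .or_else(|| rest.strip_suffix("-->"))
        .unwrap_or(rest)
        .trim();
    // Optional consequence after ` -- `.
    let (invariant, consequence) = match rest.split_once(" -- ") {
        Some((inv, cons)) => (inv.trim(), Some(cons.trim().to_string())),
        None => (rest, None),
    };
    if invariant.is_empty() {
        return None;
    }
    Some(Marker {
        category,
        invariant: invariant.to_string(),
        consequence,
    })
}
```

Because the SQL comment prefix `--` is also the consequence separator, the sketch locates the tag before splitting, and only splits on ` -- ` in the text that follows it.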
Critical Rules
- Append-Only: NEVER mutate existing Assertions. Create new ones.
- Content-Addressed: Assertion ID = BLAKE3 hash of content.
- No Unwrap: NEVER use `unwrap()` or `expect()` in production code. CI enforces via `clippy::unwrap_used` and `clippy::expect_used` at deny level.
- Defensive Writes: All writes go through WAL with fsync.
- Zero-Copy: Use `rkyv` for serialization. ALWAYS use `stemedb_core::serde::{serialize, deserialize}`; NEVER use raw `AllocSerializer` in production code.
- Instrument Critical Paths: Use `#[instrument]` on public methods in WAL, storage, ingestion, and lens code. Include meaningful fields (key_len, payload_len, offset, candidates_count, lens).
- Structured Logging: Use `tracing` (info!, warn!, error!) instead of `println!`/`eprintln!`. Clippy enforces via `print_stdout`/`print_stderr` at warn level. CLI binaries (e.g., `stemedb-sim`) may use `#![allow()]` for user-facing output.
- Query Parameter Arrays: In API handlers, use `QsQuery` extractor (not standard `Query`) for any DTO with `Vec<T>` or `Option<Vec<T>>` fields. Dashboard uses bracket notation (`?sources[]=a&sources[]=b`) which requires `serde_qs`. Standard `Query` silently fails on array params. See `crates/stemedb-api/src/extractors.rs` for details.
- Document Changes: Update `ai-lookup/` when adding new types/concepts. Keep skills in sync with code.
- No Git Operations: NEVER use git stash, git branch, git checkout, or any git operations unless the user explicitly tells you to.
- No GitHub Workflows: We use pre-commit hooks, not GitHub Actions CI.
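The first three rules (append-only, content-addressed, no unwrap) can be shown together in a small sketch. This is not the `stemedb-core` implementation: `AssertionStore` and `StoreError` are hypothetical, and the standard library's `DefaultHasher` stands in for BLAKE3 purely to keep the example dependency-free. It is not a cryptographic hash and must not be used this way in real code.

```rust
use std::collections::hash_map::{DefaultHasher, Entry};
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
pub enum StoreError {
    EmptyContent,
}

/// Stand-in for a BLAKE3 digest (assumption: real IDs are 32-byte hashes).
pub type AssertionId = u64;

/// Derive the ID from the content alone, so equal content yields equal IDs.
fn content_id(content: &str) -> AssertionId {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// An append-only store: there is no update or delete method by design.
#[derive(Default)]
pub struct AssertionStore {
    by_id: HashMap<AssertionId, String>,
    order: Vec<AssertionId>,
}

impl AssertionStore {
    /// Append an assertion. Re-appending identical content is a no-op that
    /// returns the existing ID. Errors are returned, never unwrapped.
    pub fn append(&mut self, content: &str) -> Result<AssertionId, StoreError> {
        if content.is_empty() {
            return Err(StoreError::EmptyContent);
        }
        let id = content_id(content);
        if let Entry::Vacant(slot) = self.by_id.entry(id) {
            slot.insert(content.to_string());
            self.order.push(id);
        }
        Ok(id)
    }

    pub fn get(&self, id: AssertionId) -> Option<&str> {
        self.by_id.get(&id).map(String::as_str)
    }

    pub fn len(&self) -> usize {
        self.order.len()
    }

    pub fn is_empty(&self) -> bool {
        self.order.is_empty()
    }
}
```

A correction is expressed by appending a new assertion with different content, which gets a different ID; the earlier assertion stays readable.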
Quick Reference
# Build
cargo build --workspace
# Test (choose based on need)
cargo test -p stemedb-core # Fast: single crate (~30s)
cargo test --workspace --lib # Medium: all unit tests (~3min)
cargo nextest run # Full: parallel runner (~5min)
cargo test --workspace # Legacy: sequential (~15min)
# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check
Port Scheme (181XX)
| Offset | Service | Default | Env Var |
|---|---|---|---|
| +0 | HTTP API | 18180 | STEMEDB_BIND_ADDR |
| +1 | Cluster Gateway | 18181 | STEMEDB_NODE_API_ADDR |
| +2 | Cluster RPC | 18182 | STEMEDB_NODE_RPC_ADDR |
| +3 | SWIM Gossip | 18183 | via SwimConfig |
| +4 | Metrics | 18184 | (reserved) |
| +5 | Admin | 18185 | (reserved) |
| +6 | Latent Signal | 18186 | — |
| +7 | Community App | 18187 | — |
| +8 | StemeDB Dashboard | 18188 | — |
| +9 | Aphoria Dashboard | 18189 | — |
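The table follows a base-plus-offset pattern, which a small helper can capture. This is a sketch only: the `Service` enum, `default_port`, and `bind_addr` are hypothetical names, and both the `127.0.0.1` default host and the `host:port` format of the override value are assumptions.

```rust
/// Base of the 181XX port scheme.
pub const BASE_PORT: u16 = 18180;

/// Services in table order; the discriminant is the offset from `BASE_PORT`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Service {
    HttpApi = 0,
    ClusterGateway = 1,
    ClusterRpc = 2,
    SwimGossip = 3,
    Metrics = 4,
    Admin = 5,
    LatentSignal = 6,
    CommunityApp = 7,
    StemeDbDashboard = 8,
    AphoriaDashboard = 9,
}

/// Default port for a service: base plus offset.
pub fn default_port(service: Service) -> u16 {
    BASE_PORT + service as u16
}

/// Use the override (for example the value of `STEMEDB_BIND_ADDR`) when it is
/// set and non-empty; otherwise fall back to the default port on localhost.
pub fn bind_addr(env_value: Option<&str>, service: Service) -> String {
    match env_value.map(str::trim) {
        Some(value) if !value.is_empty() => value.to_string(),
        _ => format!("127.0.0.1:{}", default_port(service)),
    }
}
```

A caller would pass `std::env::var("STEMEDB_BIND_ADDR").ok().as_deref()` as the first argument; taking the value as a parameter keeps the helper easy to test.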
Specialized Agents
| Domain | Agent | When to use |
|---|---|---|
| Product Vision | episteme-product-visionary | Use cases, "why not Postgres?", product-market fit |
| Pilot Prep | enterprise-skeptic-buyer | Pressure-test demos, find gaps, prepare for tough questions |
| Aphoria Pitch | aphoria-skeptic-buyer | Pressure-test Aphoria demos, security tool buyer objections |
| Aphoria Phase 7 | declarative-extractor-skeptic | Pressure-test declarative extractors, LLM extraction, pattern learning |
| Aphoria Phase 9 | autonomous-learning-skeptic | Pressure-test autonomous promotion, shadow mode, cross-project learning |
| General Rust | primary-developer | Feature implementation, refactoring |
| Code Quality | rust-quality-engineer | Reviews, test coverage, clippy |
| Storage | storage-engine-architect | WAL, LSM, crash recovery |
| Graph Engine | rust-graph-engine-architect | Lock-free structures, cache optimization |
| Defensive | defensive-systems-architect | Rate limiting, circuit breakers, hostile input |
| Distributed | distributed-systems-engineer | CRDT replication, Raft coordination, Merkle sync, clustering |
| Lenses | stemedb-lens-architect | Query resolution, ranking algorithms |
| Planning | stemedb-planner | Milestone planning, roadmap |
Architecture Overview
    Write Path (Spine):              Read Path (Cortex):
    [Agent] -> [Ingestion]           [Agent] <- [Lens Engine]
                    |                                 |
                    v                                 |
              [WAL/Fsync]                      [Index Lookup]
                    |                                 |
                    v                                 |
               [KV Store] <---------------------------+
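The write path in the diagram (ingest, then WAL with fsync, then KV store) can be sketched with the standard library alone. Everything here is illustrative: `Spine` is a hypothetical type, the length-prefixed record layout is an assumption, and an in-memory `HashMap` stands in for the real KV store.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Minimal write path: append to a log, fsync, then expose the value to reads.
pub struct Spine {
    wal: File,
    kv: HashMap<String, Vec<u8>>,
}

impl Spine {
    /// Open (or create) the log file in append mode.
    pub fn open(wal_path: &Path) -> io::Result<Self> {
        let wal = OpenOptions::new().create(true).append(true).open(wal_path)?;
        Ok(Self { wal, kv: HashMap::new() })
    }

    /// Durably record the write before making it visible.
    pub fn ingest(&mut self, key: &str, payload: &[u8]) -> io::Result<()> {
        if key.len() > u32::MAX as usize || payload.len() > u32::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "record too large"));
        }
        // Assumed record layout: key_len (u32 LE), payload_len (u32 LE), key, payload.
        let mut record = Vec::with_capacity(8 + key.len() + payload.len());
        record.extend_from_slice(&(key.len() as u32).to_le_bytes());
        record.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        record.extend_from_slice(key.as_bytes());
        record.extend_from_slice(payload);

        self.wal.write_all(&record)?;
        // fsync: do not acknowledge the write until it has reached stable storage.
        self.wal.sync_all()?;

        // Only after the log is durable does the value become readable.
        self.kv.insert(key.to_string(), payload.to_vec());
        Ok(())
    }

    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.kv.get(key).map(Vec::as_slice)
    }
}
```

The ordering is the point of the sketch: if the process crashes between the fsync and the map insert, the record is still in the log and can be replayed on restart.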
Crates
| Crate | Purpose | Status |
|---|---|---|
| stemedb-core | Assertion, LifecycleStage, MaterializedView, types, signing utilities | ✅ Implemented |
| stemedb-wal | Write-ahead log with crash recovery | ✅ Implemented |
| stemedb-storage | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore, SimilarityIndex | ✅ Implemented |
| stemedb-ingest | Ingestion pipeline, signature verification, ContentDefenseLayer | ✅ Implemented |
| stemedb-query | Query engine, Materializer for O(1) MV: reads | ✅ Implemented |
| stemedb-lens | Lenses (Recency, Consensus, Authority, Vote/Trust-aware) | ✅ Implemented |
| stemedb-api | HTTP API with axum + utoipa OpenAPI docs | ✅ Implemented |
| stemedb-sim | Simulation for testing the pipeline | ✅ Implemented |
| stemedb-merkle | BLAKE3 Merkle tree for diff detection | ✅ Implemented |
| stemedb-rpc | gRPC services for node-to-node communication | ✅ Implemented |
| stemedb-sync | Merkle sync, gossip broadcast, anti-entropy | ✅ Implemented |
| stemedb-cluster | Cluster membership (SWIM), sharding, gateway | ✅ Implemented |
| stemedb-ontology | Domain definitions (Pharma), subject builders, medical extractors | ✅ Implemented |
SDKs
| SDK | Purpose | Status |
|---|---|---|
| sdk/go/steme | Go HTTP client with Ed25519 signing and fluent builders | ✅ Implemented |
| sdk/go/adk | ADK-Go tools and callbacks for AI agents | ✅ Implemented |
Latent Signal (latent/)
Python CLI tools for adverse event signal detection. Different rules from Rust crates:
Allowed:
- `print()` for user-facing CLI output (these are scripts, not libraries)
- `except Exception as e:` for CLI error handling (log and continue)
Required:
- Environment Variables for URLs: NEVER hardcode `localhost` URLs without env fallback
  - Use `os.getenv("VAR", "http://localhost:...")` in Python
  - Use `process.env.VAR || 'http://localhost:...'` in TypeScript
- StemeDB Integration: New ingestors should use `StemeDBClient` pattern from `adk-agent/`, not write to JSONL files