stemedb/CLAUDE.md
jordan 9c36a8e3b3
docs: update CLAUDE.md deployment section — dual-remote workflow, no local Docker builds
Replace verbose deployment docs with token-efficient version reflecting
actual workflow: git push all main → Gitea → Woodpecker → Kaniko → Zot → k3s.
2026-03-07 18:23:01 -07:00


Episteme (StemeDB)

A probabilistic knowledge graph database that stores Claims, not Facts. Append-only Merkle DAG with read-time resolution via Lenses.

Core Concept: "Git for Truth" - conflicting assertions coexist, resolved at query time through Consensus, Recency, Authority, or custom Lenses.
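As a rough illustration of read-time resolution, the sketch below keeps conflicting claims side by side and only picks a winner when a lens is applied. The `Claim` struct, its fields, and both resolver functions are hypothetical simplifications for this example; they are not the stemedb-core or stemedb-lens API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified claim (not the real Assertion type).
#[derive(Debug, Clone, PartialEq)]
struct Claim {
    subject: String,
    value: String,
    timestamp: u64, // illustrative logical time
    source: String,
}

/// Recency lens sketch: the most recently asserted claim wins.
fn resolve_recency(claims: &[Claim]) -> Option<&Claim> {
    claims.iter().max_by_key(|c| c.timestamp)
}

/// Consensus lens sketch: the value asserted by the most claims wins.
/// Ties go to the lexicographically smallest value so the result is deterministic.
fn resolve_consensus(claims: &[Claim]) -> Option<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for c in claims {
        *counts.entry(c.value.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(value, _)| value.to_string())
}
```

Nothing is overwritten when sources disagree: the same slice of claims can yield one answer under the recency resolver and a different one under the consensus resolver.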

ZERO TOLERANCE FOR MEDIOCRITY: We build enterprise-grade products that must survive in production. Panics are UNACCEPTABLE. Broken pipe errors are UNACCEPTABLE. Sloppy testing is UNACCEPTABLE. Every line of code ships to paying customers who depend on it. Test everything. Handle every error. No shortcuts. No excuses.

API Docs: OpenAPI spec includes conceptual guide (crates/stemedb-api/docs/api-intro.md) with semaglutide examples showing claims vs facts, authority tiers, time-travel queries, and conflict resolution.

Find Your Guide

| If you need to... | Read this |
|---|---|
| Get started fast | quickstart.md |
| Understand what Episteme is | what-is-episteme.md |
| Understand the technical vision | vision.md |
| See use cases | use-cases/README.md |
| Understand architecture | architecture.md |
| Learn data structures | docs/data-structures.md |
| Understand governance models | docs/specs/governance-models.md |
| See the roadmap | roadmap.md |
| See completed phases | roadmap-archive.md |
| Build apps on Episteme | docs/app-concepts/index.md |
| Consumer Health vertical | docs/app-concepts/consumer-health.md |
| Use Go SDK | ai-lookup/services/sdk.md |
| Write Rust code | .claude/guides/backend/rust-guidelines.md |
| Set up local dev | .claude/guides/local/setup.md |
| Run tests | .claude/guides/local/testing.md |
| Understand quality checks | .claude/guides/local/quality-checks.md |
| Learn about simulation | ai-lookup/features/simulation.md |
| Advance the simulator | roadmap.md#arena-simulation-roadmap |
| Work on storage/DAG | Load skill: stemedb-core |
| Implement a Lens | Load skill: stemedb-lens |
| Work on domain ontology | crates/stemedb-ontology/ |
| Consumer Health UAT | uat/consumer-health/README.md |
| Verify production readiness | uat/production-readiness/README.md |
| Deploy to production | docs/operations/README.md |
| Manage cluster nodes | docs/operations/node-lifecycle.md |
| Install admin CLI | docs/operations/deployment/install-admin-cli.md |
| Troubleshoot incidents | docs/operations/runbooks/ |
| Size your deployment | docs/operations/reference-architecture/resource-sizing.md |
| Validate pilot success | docs/operations/pilot-success-criteria.md |
| Plan a milestone | /plan-milestone command |
| Analyze use case gaps | /analyze-gaps command |
| Add an API endpoint | .claude/guides/backend/api-endpoints.md |
| Integrate with AI tools | .claude/guides/integrations/ai-coding-assistant-integration.md |
| ADK-Go + Episteme | .claude/guides/integrations/adk-go-episteme.md |
| Distributed architecture | docs/research/distributed-write-path.md |
| Write UAT reports | .claude/guides/local/uat-reports.md |
| Phase 6 UAT results | ai-lookup/features/phase6-uat.md |
| Configure Aphoria hosted mode | .claude/guides/services/aphoria-hosted-mode.md |
| Aphoria config reference | ai-lookup/features/aphoria-config.md |
| Work on Admin Dashboard | applications/stemedb-dashboard/ (Next.js + shadcn/ui) |
| Work on Disputed app | applications/disputed/ |
| Understand repo structure | ai-lookup/repo-structure.md |
| Understand Aphoria flywheel | ai-lookup/features/aphoria-flywheel.md |
| Aphoria LLM eval | Load skill: aphoria-llm-optimization |
| General LLM optimization | Load skill: llm-optimization |
| Install Aphoria | Load skill: aphoria-install |
| Run Aphoria self-review | Load skill: aphoria-self-review |
| Author claims from diffs | Load skill: aphoria-claims |
| Suggest new claims | Load skill: aphoria-suggest |
| Automate post-commit analysis | Load skill: aphoria-post-commit-hook |
| Set up CI/CD automation | Load skill: aphoria-ci-setup |
| Create declarative extractors | applications/aphoria/docs/extractors/declarative-extractors.md |
| Learn extractor examples | applications/aphoria/docs/examples/extractors/ |
| Avoid dogfooding mistakes | applications/aphoria/docs/dogfooding-common-mistakes.md |

Roadmap Maintenance

Two files, strict separation:

| File | Contains | When to modify |
|---|---|---|
| roadmap.md | Current + future work only | Add new phases, update task status |
| roadmap-archive.md | Completed phases (1-7, 8A, MVP) | Move items when phase completes |

Rules:

  • When a phase completes: Move entire phase section to archive, update status table in both files
  • When adding tasks: Add to current phase in roadmap.md with - [ ] checkbox format
  • When completing tasks: Change - [ ] to - [x], add brief implementation notes
  • Keep roadmap.md under 500 lines — if it grows, archive more aggressively
  • Current phase always has "🎯" marker in status table

Task format:

- [ ] **P1.2 Feature Name**: Brief description
    - [ ] Subtask one
    - [ ] Subtask two

Phase completion checklist:

  1. All tasks marked [x] in roadmap.md
  2. Cut entire phase section, paste into roadmap-archive.md
  3. Update status tables in both files
  4. Update "Current Focus" in roadmap.md header

Aphoria: The Autonomous Flywheel

Aphoria is a continuous learning system that runs on EVERY commit, NOT a CLI tool you invoke manually.

The Commit-Time Loop (Runs Automatically):

Developer commits code
    ↓
1. SCAN: Extractors → observations
    ↓
2. CHECK: Compare observations against claims → violations
    ↓
3. FIX: Developer fixes violations
    ↓
4. GET REMAINING CLAIMS: Identify claims without extractors
    ↓
5. CREATE EXTRACTORS: Dynamically generate extractors for uncovered claims
    ↓
6. SUGGEST NEW CLAIMS: LLM analyzes patterns → suggests new claims
    ↓
7. CREATE NEW EXTRACTORS: Generate extractors for new claims
    ↓
(Loop repeats, knowledge compounds)

Knowledge Compounding: Each commit benefits from all previous commits' learning - not through ML training, but through accumulated structured decisions.

Remote vs Local: In remote mode, all claims are stored in the remote StemeDB instance (no local TOML files). Developers query remote claims to discover org patterns (specs at Tier 1, popular patterns at Tier 3), then manually decide whether to align their code. Convergence is inspection-driven, not automatic. Promotion to higher tiers is manual.

LLM Workflows ARE the Core Product

CRITICAL: Aphoria's autonomous operation REQUIRES LLM-driven automation:

  • Claude Code skills (/aphoria-claims, /aphoria-suggest, /aphoria-custom-extractor-creator)
  • Go ADK agents (custom agent implementations)
  • Other LLM-driven methods (API-driven workflows)

The manual CLI (aphoria scan, aphoria claims create) is a debug interface for when LLM automation is unavailable. It is NOT the primary workflow.

Manual fallbacks to CLI operations are unacceptable in production workflows — if LLM automation is unavailable, the system is broken, not in "fallback mode."

Three Main Workflows:

  1. Commit-time (PRIMARY): Developer commits → Aphoria scans → checks policies → dynamically creates extractors for uncovered existing claims → LLM suggests new claims from patterns → LLM creates extractors for new claims
  2. Onboarding: New dev codes → Aphoria guides with team conventions + linked context (who, why, when)
  3. Graduation: Patterns with frequency + authority → auto-promote to conventions (shadow mode → promotion)

Critical: The commit-time workflow has TWO extractor creation phases:

  • Phase 1: Dynamic creation for existing claims without extractors (ensures all authored claims are verifiable)
  • Phase 2: Creation for new claims suggested by pattern analysis (expands coverage)

Skills That Drive the Flywheel:

| Skill | Purpose | When Used |
|---|---|---|
| /aphoria-claims | Analyze diffs, author/update claims | Every commit with code changes |
| /aphoria-suggest | Suggest new claims from patterns | When growing coverage |
| /aphoria-custom-extractor-creator | Generate extractors (for both existing uncovered claims AND new claims) | Continuous - both phases of loop |
| /aphoria-corpus-import | Import docs → create claims + extractors | Bootstrap from external sources |
| /aphoria-post-commit-hook | Automate all loop steps with post-commit hooks | One-time setup per project |
| /aphoria-ci-setup | Automate via CI/CD instead of local hooks | One-time setup per repo |

Dogfooding Day 3: The Extractor Creation Phase

Day 3 is where the flywheel validates. This is the step that separates Aphoria from static linters.

Why Day 3 is Critical:

  • Day 3 IS Steps 4-5 of the commit-time loop (identify gaps → create extractors)
  • Without Day 3 extractor creation, NO knowledge is captured
  • This is the CORE validation of autonomous learning

Workflow:

  1. Baseline scan → Record the baseline detection rate X% (often 0-20% on new domains)
  2. Gap analysis → Identify claims with no extractors (MISSING verdicts)
  3. Extractor creation → Use /aphoria-custom-extractor-creator to generate extractors (REQUIRED)
  4. Verification scan → Record the new detection rate Y% (target: ≥90%)
  5. Document → Record detection rate improvement (X% → Y%)

Success Criteria:

  • Detection rate ≥90% after extractor creation
  • All extractors produce correct observations (concept_path matches claim)
  • Learning documented (which patterns were added to corpus)
  • Time ≤2 hours (including all 5 workflow steps)
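The gap analysis and the ≥90% target can be sketched as two small helpers. This is only an illustration: it assumes the detection rate means detected violations divided by known violations, and the function names and the use of plain concept-path strings are hypothetical, not Aphoria's actual data model.

```rust
use std::collections::HashSet;

/// Gap analysis sketch: claim concept paths that no extractor covers yet.
fn uncovered_claims<'a>(claim_paths: &[&'a str], covered: &HashSet<&str>) -> Vec<&'a str> {
    claim_paths
        .iter()
        .copied()
        .filter(|path| !covered.contains(*path))
        .collect()
}

/// Detection rate as a percentage of known violations that a scan detected.
/// Returns None when there are no known violations, since the rate is undefined.
fn detection_rate(detected: usize, known: usize) -> Option<f64> {
    if known == 0 {
        None
    } else {
        Some(100.0 * detected as f64 / known as f64)
    }
}

/// Success criterion from the list above: at least 90% after extractor creation.
fn meets_target(rate_percent: f64) -> bool {
    rate_percent >= 90.0
}
```

Running `detection_rate` on the baseline scan and again on the verification scan gives the X% → Y% pair to record in the Day 3 write-up.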

Evidence of Correct Execution:

ls .aphoria/extractors/*.toml | wc -l  # Should be: 8+ (number of violations)
ls scan-v2.json                        # Must exist (verification scan)
ls DAY3-SUMMARY.md                     # Must exist (daily summary)

If ANY of these are missing, Day 3 was NOT completed correctly.

Common Mistake: Running scan once, seeing low detection rate, and moving on without creating extractors. This breaks the entire flywheel. See applications/aphoria/docs/dogfooding-common-mistakes.md for full details.


CRITICAL PROHIBITION:

NEVER describe Aphoria as:

  • "CLI tool with LLM features"
  • "Static scanner with optional automation"
  • "Tool you run when you want"

ALWAYS describe Aphoria as:

  • "Autonomous continuous learning system"
  • "LLM-driven commit-time flywheel"
  • "System that runs on every commit"

For questions about "what is the flywheel?" or "main use cases", read: /home/jml/Workspace/stemedb/applications/aphoria/vision.md


Aphoria: What Is a Claim?

A claim is a human-authored statement about what code MUST do and WHY, with provenance and consequences.

Storage today: Claims live in .aphoria/claims.toml (a flat mutable file), NOT in StemeDB. Observations flow through StemeDB (WAL, append-only, content-addressed). Claims do not. Closing this gap is tracked in tmp/aphoria-stemedb-gap-closure.md.

Claims vs Observations

| Type | What it is | Who creates it | Example |
|---|---|---|---|
| Observation | Grep result: "this code does X" | Extractors (automated) | imports/tokio: true |
| Claim | Rule: "code MUST do X because Y, or Z breaks" | Humans (via skill) | "Core MUST NOT import tokio because it creates runtime coupling. If tokio appears in core imports, the library becomes async-only and breaks sync users." |

Observations are garbage. They're indexed facts with no meaning. Nobody cares that imports/format: true — that's just grep output.

Claims are the product. They encode architectural decisions, safety invariants, and spec compliance with full context: provenance (where the rule came from), invariant (what must stay true), and consequence (what breaks if violated).

Structure of a Claim

[[claim]]
id = "core-no-tokio-001"
concept_path = "stemedb/core/imports/tokio"
predicate = "imported"
value = false
comparison = "absent"  # Code MUST NOT have this
provenance = "Architecture decision by jml 2024-12-15"
invariant = "Core modules MUST remain sync-only"
consequence = "Importing tokio makes core async-only, breaking sync library users"
authority_tier = "expert"
category = "architecture"
evidence = ["ADR-003", "design review notes"]
status = "active"

Aphoria Workflows (Primary Use Cases)

Day-to-day (commit-time claim authoring):

  1. Look at the entire diff
  2. Use aphoria-claims skill to identify "claimable" patterns (spec constants, ordering changes, boundary violations, derive changes on wire types)
  3. Skill does lookups: aphoria claims list to check what exists
  4. If alignment needed, skill uses aphoria claims update or supersede
  5. Skill crafts and submits new claims via aphoria claims create
  6. If needed for audit, create paired extractor

Audit (scan-time claim verification):

  1. Direction 1: aphoria scan runs extractors → observations, compares against authored claims → PASS/CONFLICT/MISSING
  2. Direction 2: aphoria verify run walks all claims, verifies each one's pattern exists in code → PASS/CONFLICT/MISSING

The skill drives the CLI. The CLI doesn't know about the skill. They connect through the skill calling aphoria claims commands in a loop.
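A minimal sketch of how a scan might turn observations into the three verdicts. It assumes boolean observations keyed by concept path and treats a claim with no matching observation as MISSING; `AuthoredClaim`, `check`, and `describe` are hypothetical names, not the Aphoria implementation.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Conflict,
    Missing,
}

/// Hypothetical, trimmed-down claim: only the fields the check needs.
struct AuthoredClaim {
    id: String,
    concept_path: String,
    expected: bool,
}

/// Compare one claim against extractor output keyed by concept path.
fn check(claim: &AuthoredClaim, observations: &HashMap<String, bool>) -> Verdict {
    match observations.get(&claim.concept_path) {
        None => Verdict::Missing, // no extractor produced an observation for this path
        Some(observed) if *observed == claim.expected => Verdict::Pass,
        Some(_) => Verdict::Conflict,
    }
}

/// One line per claim, suitable for a scan summary.
fn describe(claim: &AuthoredClaim, verdict: &Verdict) -> String {
    format!("{}: {:?}", claim.id, verdict)
}
```

With the `core-no-tokio-001` claim from the example above (`expected` false), an observation of `true` for `stemedb/core/imports/tokio` would yield a conflict, and no observation at all would yield MISSING.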

Inline Claim Markers (@aphoria:claim)

Capture claim intent while writing code with inline markers:

1. Add marker in comment:

// @aphoria:claim[safety] Pool size MUST NOT exceed 50 -- OOM under sustained load
const MAX_POOL_SIZE: u32 = 50;

2. Enable in config (.aphoria/config.toml):

[extractors.inline_markers]
enabled = true
sync_to_pending = true  # Auto-sync during scan (default)

3. Scan detects markers:

aphoria scan
# Output:  Detected 1 new claim marker(s). Run 'aphoria claims list-markers' to review.

4. Review pending markers:

aphoria claims list-markers --format table
# Shows: ID, file, line, category, invariant

aphoria claims list-markers --format json
# JSON output for skills to process

5. Formalize via CLI:

aphoria claims formalize-marker marker-abc123 \
  --id myapp-pool-max-001 \
  --tier expert \
  --evidence "tests/pool_tests.rs load test" \
  --by jml
# Creates full claim in .aphoria/claims.toml
# Updates marker status to "formalized"

Or reject if not worth a claim:

aphoria claims reject-marker marker-abc123 --reason "Implementation detail, not architecture"

6. Update comment after formalization:

// @aphoria:claimed myapp-pool-max-001
const MAX_POOL_SIZE: u32 = 50;

Supported comment styles:

  • // @aphoria:claim (Rust, Go, C, TypeScript, JavaScript)
  • # @aphoria:claim (Python, Ruby, Shell, YAML)
  • -- @aphoria:claim (SQL)
  • /* @aphoria:claim */ (CSS, C-style blocks)
  • <!-- @aphoria:claim --> (HTML, XML)

Optional fields:

  • Category in brackets: @aphoria:claim[category]
  • Consequence after --: invariant -- consequence

Storage:

  • Detected markers → .aphoria/pending_markers.toml (auto-synced during scan)
  • Formalized claims → .aphoria/claims.toml
  • Already formalized → @aphoria:claimed <claim-id> (skipped by extractor)
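To make the marker syntax concrete, here is a hedged sketch of a parser for a single comment line. It follows the format described above (optional `[category]`, optional ` -- consequence`, `@aphoria:claimed` skipped), but `Marker` and `parse_marker` are hypothetical and this is not the real inline_markers extractor.

```rust
#[derive(Debug, PartialEq)]
struct Marker {
    category: Option<String>,
    invariant: String,
    consequence: Option<String>,
}

/// Parse one source line; returns None when the line carries no pending marker.
fn parse_marker(line: &str) -> Option<Marker> {
    const TAG: &str = "@aphoria:claim";
    let start = line.find(TAG)?;
    let rest = &line[start + TAG.len()..];
    // "@aphoria:claimed <id>" marks an already formalized claim, so skip it.
    if rest.starts_with("ed") {
        return None;
    }
    // Optional "[category]" directly after the tag.
    let (category, rest) = match rest.strip_prefix('[') {
        Some(after) => {
            let end = after.find(']')?;
            let cat = after[..end].trim();
            ((!cat.is_empty()).then(|| cat.to_string()), &after[end + 1..])
        }
        None => (None, rest),
    };
    // Drop closing delimiters of block-style comments (/* ... */ and <!-- ... -->).
    let body = rest
        .trim()
        .trim_end_matches("*/")
        .trim_end_matches("-->")
        .trim();
    // Optional consequence after " -- ".
    let (invariant, consequence) = match body.split_once(" -- ") {
        Some((inv, cons)) => (inv.trim(), Some(cons.trim().to_string())),
        None => (body, None),
    };
    if invariant.is_empty() {
        return None;
    }
    Some(Marker {
        category,
        invariant: invariant.to_string(),
        consequence,
    })
}
```

Applied to the `MAX_POOL_SIZE` comment from step 1, this would produce category `safety`, the pool-size invariant, and the OOM consequence; the `@aphoria:claimed` form from step 6 is ignored.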

Critical Rules

  • No Random Summaries: Do not create summary documents (like *-SUMMARY.md) unless explicitly requested.
  • Append-Only: NEVER mutate existing Assertions. Create new ones.
  • Content-Addressed: Assertion ID = BLAKE3 hash of content.
  • No Unwrap: NEVER use unwrap() or expect() in production code. CI enforces via clippy::unwrap_used and clippy::expect_used at deny level.
  • Defensive Writes: All writes go through WAL with fsync.
  • Zero-Copy: Use rkyv for serialization. ALWAYS use stemedb_core::serde::{serialize, deserialize} — NEVER use raw AllocSerializer in production code.
  • Instrument Critical Paths: Use #[instrument] on public methods in WAL, storage, ingestion, and lens code. Include meaningful fields (key_len, payload_len, offset, candidates_count, lens).
  • Structured Logging: Use tracing (info!, warn!, error!) instead of println!/eprintln!. Clippy enforces via print_stdout/print_stderr at warn level. CLI binaries (e.g., stemedb-sim) may use #![allow()] for user-facing output.
  • Query Parameter Arrays: In API handlers, use QsQuery extractor (not standard Query) for any DTO with Vec<T> or Option<Vec<T>> fields. Dashboard uses bracket notation (?sources[]=a&sources[]=b) which requires serde_qs. Standard Query silently fails on array params. See crates/stemedb-api/src/extractors.rs for details.
  • Document Changes: Update ai-lookup/ when adding new types/concepts. Keep skills in sync with code.
  • No Git Operations: NEVER use git stash, git branch, git checkout, or any git operations unless the user explicitly tells you to.
  • No GitHub Workflows: We use pre-commit hooks for local checks and Woodpecker CI (triggered via Gitea) for builds and deploys, not GitHub Actions.
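The Append-Only, Content-Addressed, and No Unwrap rules can be illustrated together in a small sketch. Two loud assumptions: the ID uses std's `DefaultHasher` purely as a stand-in for BLAKE3 (it is not cryptographic and only 64 bits wide), and `AppendOnlyStore` and `StoreError` are hypothetical types, not part of stemedb-storage.

```rust
use std::collections::hash_map::{DefaultHasher, Entry};
use std::collections::HashMap;
use std::fmt;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
enum StoreError {
    EmptyContent,
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StoreError::EmptyContent => write!(f, "refusing to store empty content"),
        }
    }
}

impl std::error::Error for StoreError {}

/// Stand-in for a BLAKE3 content hash (assumption: real IDs are BLAKE3 digests).
fn content_id(content: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
struct AppendOnlyStore {
    entries: HashMap<u64, Vec<u8>>,
    log: Vec<u64>, // insertion order; entries are never removed or rewritten
}

impl AppendOnlyStore {
    /// Appends content and returns its ID. Existing entries are never mutated:
    /// re-appending identical content returns the same ID and changes nothing.
    fn append(&mut self, content: &[u8]) -> Result<u64, StoreError> {
        if content.is_empty() {
            return Err(StoreError::EmptyContent);
        }
        let id = content_id(content);
        if let Entry::Vacant(slot) = self.entries.entry(id) {
            slot.insert(content.to_vec());
            self.log.push(id);
        }
        Ok(id)
    }

    fn get(&self, id: u64) -> Option<&[u8]> {
        self.entries.get(&id).map(Vec::as_slice)
    }

    fn len(&self) -> usize {
        self.log.len()
    }
}
```

Every fallible path returns a `Result` or `Option` that the caller propagates with `?`, which is the pattern the clippy `unwrap_used` and `expect_used` lints push toward.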

Quick Reference

# Build
cargo build --workspace

# Test (choose based on need)
cargo test -p stemedb-core        # Fast: single crate (~30s)
cargo test --workspace --lib      # Medium: all unit tests (~3min)
cargo nextest run                 # Full: parallel runner (~5min)
cargo test --workspace            # Legacy: sequential (~15min)

# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check

Port Scheme (181XX)

| Offset | Service | Default | Env Var |
|---|---|---|---|
| +0 | HTTP API | 18180 | STEMEDB_BIND_ADDR |
| +1 | Cluster Gateway | 18181 | STEMEDB_NODE_API_ADDR |
| +2 | Cluster RPC | 18182 | STEMEDB_NODE_RPC_ADDR |
| +3 | SWIM Gossip | 18183 | via SwimConfig |
| +4 | Metrics | 18184 | (reserved) |
| +5 | Admin | 18185 | (reserved) |
| +6 | Latent Signal | 18186 | |
| +7 | Community App | 18187 | |
| +8 | StemeDB Dashboard | 18188 | |
| +9 | Aphoria Dashboard | 18189 | |
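The scheme is base port plus offset, which a short sketch makes explicit. The `Service` enum and `default_port` function are hypothetical names for illustration; only the base (18180) and the offsets come from the table.

```rust
/// Base of the 181XX port scheme.
const BASE_PORT: u16 = 18180;

/// Each variant's discriminant is its offset from BASE_PORT.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Service {
    HttpApi = 0,
    ClusterGateway = 1,
    ClusterRpc = 2,
    SwimGossip = 3,
    Metrics = 4,
    Admin = 5,
    LatentSignal = 6,
    CommunityApp = 7,
    StemeDbDashboard = 8,
    AphoriaDashboard = 9,
}

/// Default port for a service: base plus offset.
fn default_port(service: Service) -> u16 {
    BASE_PORT + service as u16
}
```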

Specialized Agents

| Domain | Agent | When to use |
|---|---|---|
| Product Vision | episteme-product-visionary | Use cases, "why not Postgres?", product-market fit |
| Pilot Prep | enterprise-skeptic-buyer | Pressure-test demos, find gaps, prepare for tough questions |
| Aphoria Pitch | aphoria-skeptic-buyer | Pressure-test Aphoria demos, security tool buyer objections |
| Aphoria Phase 7 | declarative-extractor-skeptic | Pressure-test declarative extractors, LLM extraction, pattern learning |
| Aphoria Phase 9 | autonomous-learning-skeptic | Pressure-test autonomous promotion, shadow mode, cross-project learning |
| General Rust | primary-developer | Feature implementation, refactoring |
| Code Quality | rust-quality-engineer | Reviews, test coverage, clippy |
| Storage | storage-engine-architect | WAL, LSM, crash recovery |
| Graph Engine | rust-graph-engine-architect | Lock-free structures, cache optimization |
| Defensive | defensive-systems-architect | Rate limiting, circuit breakers, hostile input |
| Distributed | distributed-systems-engineer | CRDT replication, Raft coordination, Merkle sync, clustering |
| Lenses | stemedb-lens-architect | Query resolution, ranking algorithms |
| Planning | stemedb-planner | Milestone planning, roadmap |

Architecture Overview

Write Path (Spine):           Read Path (Cortex):
[Agent] -> [Ingestion]        [Agent] <- [Lens Engine]
              |                              |
              v                              |
         [WAL/Fsync]                  [Index Lookup]
              |                              |
              v                              |
         [KV Store] <--------------------+
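The ordering on the write path (log, fsync, then apply) can be sketched with std alone. This is an assumption-heavy illustration: the `Spine` type, the length-prefixed record layout, and the in-memory map standing in for the KV store are all hypothetical and say nothing about the real stemedb-wal format.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical write path: a WAL file plus an in-memory map as the KV store.
struct Spine {
    wal: File,
    kv: HashMap<Vec<u8>, Vec<u8>>,
}

/// Encode a length as a 4-byte little-endian prefix, rejecting oversized fields.
fn len_prefix(len: usize) -> io::Result<[u8; 4]> {
    u32::try_from(len)
        .map(u32::to_le_bytes)
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "record field too large"))
}

impl Spine {
    fn open(path: &Path) -> io::Result<Self> {
        let wal = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { wal, kv: HashMap::new() })
    }

    /// Durability first: the record is appended and fsynced before the KV store sees it.
    fn write(&mut self, key: &[u8], value: &[u8]) -> io::Result<()> {
        let key_len = len_prefix(key.len())?;
        let value_len = len_prefix(value.len())?;
        self.wal.write_all(&key_len)?;
        self.wal.write_all(&value_len)?;
        self.wal.write_all(key)?;
        self.wal.write_all(value)?;
        self.wal.sync_all()?; // fsync: the record survives a crash from here on
        self.kv.insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.kv.get(key).map(Vec::as_slice)
    }
}
```

If the process dies between the fsync and the map insert, the record is still in the log, so a recovery pass could replay it; the sketch leaves replay out.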

Crates

| Crate | Purpose | Status |
|---|---|---|
| stemedb-core | Assertion, LifecycleStage, MaterializedView, types, signing utilities | Implemented |
| stemedb-wal | Write-ahead log with crash recovery | Implemented |
| stemedb-storage | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore, SimilarityIndex | Implemented |
| stemedb-ingest | Ingestion pipeline, signature verification, ContentDefenseLayer | Implemented |
| stemedb-query | Query engine, Materializer for O(1) MV: reads | Implemented |
| stemedb-lens | Lenses (Recency, Consensus, Authority, Vote/Trust-aware) | Implemented |
| stemedb-api | HTTP API with axum + utoipa OpenAPI docs | Implemented |
| stemedb-sim | Simulation for testing the pipeline | Implemented |
| stemedb-merkle | BLAKE3 Merkle tree for diff detection | Implemented |
| stemedb-rpc | gRPC services for node-to-node communication | Implemented |
| stemedb-sync | Merkle sync, gossip broadcast, anti-entropy | Implemented |
| stemedb-cluster | Cluster membership (SWIM), sharding, gateway | Implemented |
| stemedb-ontology | Domain definitions (Pharma), subject builders, medical extractors | Implemented |

SDKs

| SDK | Purpose | Status |
|---|---|---|
| sdk/go/steme | Go HTTP client with Ed25519 signing and fluent builders | Implemented |
| sdk/go/adk | ADK-Go tools and callbacks for AI agents | Implemented |

Latent Signal (latent/)

Python CLI tools for adverse event signal detection. Different rules from Rust crates:

Allowed:

  • print() for user-facing CLI output (these are scripts, not libraries)
  • except Exception as e: for CLI error handling (log and continue)

Required:

  • Environment Variables for URLs: NEVER hardcode localhost URLs without env fallback
    • Use os.getenv("VAR", "http://localhost:...") in Python
    • Use process.env.VAR || 'http://localhost:...' in TypeScript
  • StemeDB Integration: New ingestors should use StemeDBClient pattern from adk-agent/, not write to JSONL files

Production Infrastructure

Git Remotes

Three remotes are configured; the all remote pushes to both GitHub and Gitea:

| Remote | Target | Purpose |
|---|---|---|
| origin | github.com:orchard9/stemedb | Source of truth |
| gitea | git.threesix.ai/jordan/stemedb | Triggers Woodpecker CI |
| all | Both GitHub + Gitea | Use this for deploys |

NEVER build Docker images locally. Mac is ARM, cluster is amd64. Always push to Gitea to trigger Kaniko (native amd64 build on-cluster).

Deployment

# Deploy: push to both remotes (triggers Woodpecker → Kaniko → Zot → kubectl rollout)
git push all main

# Pipeline: git push → Gitea webhook → Woodpecker CI → Kaniko build → registry.threesix.ai (Zot) → kubectl set image
# Image tags: latest + ${CI_COMMIT_SHA:0:8}
# Build time: ~15-20 min cold, ~2-5 min warm (cargo-chef caches deps)

Pipeline config: .woodpecker.yml — Kaniko builds stemedb-api with --features cluster, deploys via kubectl set image statefulset/stemedb.

k3s Cluster

  • Kubeconfig: ~/.kube/orchard9-k3sf.yaml (use --kubeconfig flag)
  • Fleet repo: /Users/jordanwashburn/Workspace/orchard9/k3s-fleet
  • Nodes: 3-node cluster (2 servers + 1 agent), amd64
  • Registry: Zot at registry.threesix.ai (in-cluster, namespace threesix)
  • Storage: Longhorn CSI (storageClassName: longhorn, RWO)
  • Ingress: Traefik (ingressClassName: traefik)
  • TLS: cert-manager, ClusterIssuer: letsencrypt-prod
  • Secrets: ExternalSecrets Operator → GCP Secret Manager (project orchard9)
  • k8s manifests: k3s-fleet/deployments/k8s/base/stemedb/

Service URLs

| Service | URL |
|---|---|
| StemeDB API (external) | https://stemedb.threesix.ai (→ Gateway :18181) |
| StemeDB Gateway (internal) | http://stemedb-gateway.stemedb.svc:18181 |
| StemeDB API (per-pod) | http://stemedb-{0,1,2}.stemedb-headless.stemedb.svc:18180 |

GCP

  • Account: jordan@roamrhino.com, project orchard9
  • Secret Manager: stemedb-root-api-key, per-project: stemedb-key-<slug>

DNS (Cloudflare)

  • Domain: threesix.ai — env vars: THREESIX_CLOUDFLARE_API_TOKEN, THREESIX_CLOUDFLARE_ZONE_ID

Verify Deployment

kubectl --kubeconfig ~/.kube/orchard9-k3sf.yaml get pods -n stemedb
curl -s https://stemedb.threesix.ai/v1/health
curl -s https://stemedb.threesix.ai/v1/cluster/status
curl -s https://stemedb.threesix.ai/metrics | head -5