stemedb/quickstart.md
jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command
  injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:24:14 -07:00


Quick Start

Get StemeDB running and validated in under 5 minutes.

Prerequisites

  • Rust 1.75+ (rustup update stable)
  • Go toolchain (for the SDK examples in steps 4 and 6)
  • make (for make validate)
  • curl (for validation and API calls)

1. Validate It Works

# Clone and enter
git clone <repo-url>
cd stemedb

# Run end-to-end validation (builds, starts server, asserts, queries, shuts down)
make validate

Expected output:

==========================================
  StemeDB Validation
==========================================

[PASS] Build complete
[PASS] Server is healthy
[PASS] Health check passed
[PASS] Assertion created: abc123...
[PASS] Query returned correct data
[PASS] Lens query (Recency) works

==========================================
  All validation checks passed!
==========================================

If you see "All validation checks passed!", StemeDB is working correctly.

2. Start the Server

cargo run --package stemedb-api

The server starts on http://localhost:18180.

3. Explore the API

Open the Swagger UI for interactive documentation:

http://localhost:18180/swagger-ui

Or check health via curl:

curl http://localhost:18180/v1/health
# {"status":"healthy","version":"0.1.0","assertions_count":0}
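If you script this check rather than reading it by eye, a client only needs the fields in the sample response. The Go sketch below is a minimal example. It assumes the JSON shape in the comment above (`status`, `version`, `assertions_count`) is stable. `HealthResponse`, `parseHealth`, and `checkHealth` are hypothetical names, not part of the StemeDB SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// HealthResponse mirrors the sample /v1/health body.
// Hypothetical type: field names come from the example output only.
type HealthResponse struct {
	Status          string `json:"status"`
	Version         string `json:"version"`
	AssertionsCount int    `json:"assertions_count"`
}

// parseHealth decodes a /v1/health body and reports whether status is "healthy".
func parseHealth(body []byte) (HealthResponse, bool, error) {
	var h HealthResponse
	if err := json.Unmarshal(body, &h); err != nil {
		return h, false, err
	}
	return h, h.Status == "healthy", nil
}

// checkHealth fetches /v1/health from baseURL and hands the body to parseHealth.
func checkHealth(baseURL string) (HealthResponse, bool, error) {
	resp, err := http.Get(baseURL + "/v1/health")
	if err != nil {
		return HealthResponse{}, false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return HealthResponse{}, false, err
	}
	return parseHealth(body)
}

func main() {
	h, ok, err := checkHealth("http://localhost:18180")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	fmt.Printf("healthy=%v version=%s assertions=%d\n", ok, h.Version, h.AssertionsCount)
}
```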

4. Create Your First Assertion

Using the Go SDK (recommended):

cd sdk/go/examples/basic
go run main.go

Or via curl (assertions must be signed with Ed25519, so generate a signed one first):

# Generate a signed assertion
cargo run --package stemedb-api --example gen_test_assertion > /tmp/assertion.json

# Submit it
curl -X POST http://localhost:18180/v1/assert \
  -H "Content-Type: application/json" \
  -d @/tmp/assertion.json
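The signing step is what makes raw curl awkward. To show the idea conceptually, the Go sketch below signs and verifies a payload with the standard library's `crypto/ed25519`. The payload bytes and hex encoding are illustrative assumptions. StemeDB's canonical serialization and signature field names are not described in this guide, so use `gen_test_assertion` or the Go SDK for real submissions. `signPayload`, `verifyPayload`, and `keyFromSeed` are hypothetical helpers.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// keyFromSeed derives a deterministic key, handy for reproducible tests.
// It panics unless len(seed) == ed25519.SeedSize (32 bytes).
func keyFromSeed(seed []byte) ed25519.PrivateKey {
	return ed25519.NewKeyFromSeed(seed)
}

// signPayload signs arbitrary bytes and returns hex-encoded public key and signature.
// Hypothetical: the real assertion encoding is defined by StemeDB, not here.
func signPayload(priv ed25519.PrivateKey, payload []byte) (pubHex, sigHex string) {
	pub := priv.Public().(ed25519.PublicKey)
	sig := ed25519.Sign(priv, payload)
	return hex.EncodeToString(pub), hex.EncodeToString(sig)
}

// verifyPayload checks a hex-encoded signature against a hex-encoded public key.
func verifyPayload(pubHex, sigHex string, payload []byte) bool {
	pub, err := hex.DecodeString(pubHex)
	if err != nil || len(pub) != ed25519.PublicKeySize {
		return false // ed25519.Verify panics on a wrong-length key, so guard first
	}
	sig, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	return ed25519.Verify(ed25519.PublicKey(pub), payload, sig)
}

func main() {
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	// Illustrative payload only; not StemeDB's wire format.
	payload := []byte(`{"subject":"StemeDB_Validation","predicate":"test_status"}`)
	pubHex, sigHex := signPayload(priv, payload)
	fmt.Println("public key:", pubHex)
	fmt.Println("signature: ", sigHex)
	fmt.Println("verifies:  ", verifyPayload(pubHex, sigHex, payload))
}
```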

5. Query It Back

# Query by subject and predicate
curl "http://localhost:18180/v1/query?subject=StemeDB_Validation&predicate=test_status"

# Query with a lens (conflict resolution)
curl "http://localhost:18180/v1/query?subject=StemeDB_Validation&predicate=test_status&lens=Recency"
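Subjects and predicates travel as query-string parameters, so any value containing spaces or `&` must be URL-encoded. The Go sketch below builds these URLs with the standard `net/url` package. `queryURL` is a hypothetical helper, and the parameter names (`subject`, `predicate`, `lens`) are taken from the curl examples above.

```go
package main

import (
	"fmt"
	"net/url"
)

// queryURL builds a /v1/query URL. An empty lens omits the parameter.
func queryURL(base, subject, predicate, lens string) string {
	q := url.Values{}
	q.Set("subject", subject)
	q.Set("predicate", predicate)
	if lens != "" {
		q.Set("lens", lens)
	}
	// Encode sorts keys alphabetically and percent-encodes each value.
	return base + "/v1/query?" + q.Encode()
}

func main() {
	fmt.Println(queryURL("http://localhost:18180", "StemeDB_Validation", "test_status", ""))
	fmt.Println(queryURL("http://localhost:18180", "StemeDB_Validation", "test_status", "Recency"))
}
```

Because `Encode` sorts keys, the parameter order differs from the hand-written curl URLs. The server should treat the two forms the same.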

6. See Conflict in Action (The "Git for Truth" Moment)

Episteme stores Claims, not Facts. When multiple agents assert conflicting values, the Skeptic endpoint shows you all competing claims instead of picking a winner.

Create Conflicting Assertions

Using the Go SDK, create assertions with different claims about the same subject:

cd sdk/go/examples/conflict
go run main.go

Query with Skeptic

The Skeptic endpoint reveals disagreement instead of hiding it:

curl "http://localhost:18180/v1/skeptic?subject=GLP1_Agonists&predicate=cardiovascular_benefit"

Response shows all competing claims:

{
  "status": "Contested",
  "conflict_score": 0.72,
  "claims": [
    {"value": {"type": "Boolean", "value": true}, "weight_share": 0.48, "assertion_count": 1},
    {"value": {"type": "Boolean", "value": false}, "weight_share": 0.52, "assertion_count": 1}
  ],
  "candidates_count": 2
}

Key insight: Instead of silently picking a winner, you see the disagreement. This is critical for health/finance domains where hiding conflict is dangerous.
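A client that wants to surface disagreement rather than hide it can decode the Skeptic response directly. This Go sketch assumes the field names in the sample response above (`status`, `conflict_score`, `claims`, `weight_share`, `assertion_count`, `candidates_count`). It keeps `value` as raw JSON because the value's shape depends on its type. `SkepticResponse`, `Claim`, and `rankClaims` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Claim is one competing value, as shown in the sample Skeptic response.
type Claim struct {
	Value          json.RawMessage `json:"value"`
	WeightShare    float64         `json:"weight_share"`
	AssertionCount int             `json:"assertion_count"`
}

// SkepticResponse mirrors the sample /v1/skeptic body (hypothetical type).
type SkepticResponse struct {
	Status          string  `json:"status"`
	ConflictScore   float64 `json:"conflict_score"`
	Claims          []Claim `json:"claims"`
	CandidatesCount int     `json:"candidates_count"`
}

// rankClaims decodes a Skeptic body and orders claims by descending weight share.
// Every claim is kept so the caller can display the full disagreement.
func rankClaims(body []byte) (SkepticResponse, error) {
	var r SkepticResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return r, err
	}
	sort.SliceStable(r.Claims, func(i, j int) bool {
		return r.Claims[i].WeightShare > r.Claims[j].WeightShare
	})
	return r, nil
}

func main() {
	body := []byte(`{"status":"Contested","conflict_score":0.72,"claims":[
	  {"value":{"type":"Boolean","value":true},"weight_share":0.48,"assertion_count":1},
	  {"value":{"type":"Boolean","value":false},"weight_share":0.52,"assertion_count":1}],
	  "candidates_count":2}`)
	r, err := rankClaims(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s (conflict %.2f)\n", r.Status, r.ConflictScore)
	for _, c := range r.Claims {
		fmt.Printf("  %s  share=%.2f  assertions=%d\n", c.Value, c.WeightShare, c.AssertionCount)
	}
}
```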

7. Authority Tiers (Source-Class Resolution)

Different sources have different authority. A regulatory filing (FDA) outweighs an anecdotal tweet. The Layered endpoint shows per-tier consensus.

Query with Layered Consensus

The conflict example creates assertions with different source_class values (Clinical vs Anecdotal). The Layered endpoint shows how each tier resolves independently:

curl "http://localhost:18180/v1/layered?subject=GLP1_Agonists&predicate=cardiovascular_benefit"

Response shows tier-by-tier resolution:

{
  "tiers": [
    {"tier": 1, "source_class": "Clinical", "winner": {"object": {"type": "Boolean", "value": true}}, "conflict_score": 0.0},
    {"tier": 5, "source_class": "Anecdotal", "winner": {"object": {"type": "Boolean", "value": false}}, "conflict_score": 0.0}
  ],
  "overall_winner": {"object": {"type": "Boolean", "value": true}},
  "overall_conflict_score": 0.85
}

Key insight: The Clinical tier (peer-reviewed research) wins even though the Anecdotal tier (social media) disagrees. The high overall_conflict_score (0.85) tells you that the tiers disagree.
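To show the authoritative answer alongside the dissent, a client can read both from the Layered response. The Go sketch below decodes the sample shape above. It assumes, as the example suggests, that a lower `tier` number means higher authority. It then returns the most authoritative tier plus every tier whose winner differs from it. `LayeredResponse`, `Tier`, `Winner`, `splitByAuthority`, and `sameJSON` are hypothetical names. This client-side reading is not necessarily how StemeDB computes `overall_winner`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

type Winner struct {
	Object json.RawMessage `json:"object"`
}

type Tier struct {
	Tier          int     `json:"tier"`
	SourceClass   string  `json:"source_class"`
	Winner        Winner  `json:"winner"`
	ConflictScore float64 `json:"conflict_score"`
}

type LayeredResponse struct {
	Tiers                []Tier  `json:"tiers"`
	OverallWinner        Winner  `json:"overall_winner"`
	OverallConflictScore float64 `json:"overall_conflict_score"`
}

// splitByAuthority returns the lowest-numbered tier and the tiers that disagree with it.
// Assumption: lower tier number = higher authority (tier 1 Clinical vs tier 5 Anecdotal).
func splitByAuthority(r LayeredResponse) (top Tier, dissent []Tier, ok bool) {
	if len(r.Tiers) == 0 {
		return Tier{}, nil, false
	}
	tiers := append([]Tier(nil), r.Tiers...) // copy so the caller's slice is not reordered
	sort.Slice(tiers, func(i, j int) bool { return tiers[i].Tier < tiers[j].Tier })
	top = tiers[0]
	for _, t := range tiers[1:] {
		if !sameJSON(t.Winner.Object, top.Winner.Object) {
			dissent = append(dissent, t)
		}
	}
	return top, dissent, true
}

// sameJSON compares two JSON values after removing insignificant whitespace.
// Key order must still match; this is not a full semantic comparison.
func sameJSON(a, b json.RawMessage) bool {
	var ca, cb bytes.Buffer
	if json.Compact(&ca, a) != nil || json.Compact(&cb, b) != nil {
		return bytes.Equal(a, b)
	}
	return bytes.Equal(ca.Bytes(), cb.Bytes())
}

func main() {
	body := []byte(`{"tiers":[
	  {"tier":1,"source_class":"Clinical","winner":{"object":{"type":"Boolean","value":true}},"conflict_score":0.0},
	  {"tier":5,"source_class":"Anecdotal","winner":{"object":{"type":"Boolean","value":false}},"conflict_score":0.0}],
	  "overall_winner":{"object":{"type":"Boolean","value":true}},"overall_conflict_score":0.85}`)
	var r LayeredResponse
	if err := json.Unmarshal(body, &r); err != nil {
		panic(err)
	}
	top, dissent, ok := splitByAuthority(r)
	if !ok {
		return
	}
	fmt.Printf("authoritative: tier %d (%s) -> %s\n", top.Tier, top.SourceClass, top.Winner.Object)
	for _, t := range dissent {
		fmt.Printf("dissenting:    tier %d (%s) -> %s\n", t.Tier, t.SourceClass, t.Winner.Object)
	}
}
```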

8. Distributed Mode (Cluster Node)

StemeDB supports horizontal scaling across multiple nodes. Each node runs SWIM membership for discovery, range sharding for data distribution, and a Gateway for request routing.

Start a Cluster Node

cargo run --package stemedb-cluster --bin stemedb-node

The node starts on http://localhost:18181 (Gateway API) and 127.0.0.1:18182 (RPC).

Check Cluster Health

curl http://localhost:18181/v1/health
# {"healthy":true,"reachable_nodes":0,"joined":true}

See Cluster Topology

curl http://localhost:18181/v1/cluster/status

Response shows shards and nodes:

{
  "node_count": 0,
  "shard_count": 4,
  "meta_version": 1,
  "nodes": []
}

Test Subject Routing

See which shard a subject maps to:

curl "http://localhost:18181/v1/route?subject=Tesla_Inc"
# {"subject":"Tesla_Inc","shard_id":0,"replicas":["abc12345"]}

curl "http://localhost:18181/v1/route?subject=Bitcoin"
# {"subject":"Bitcoin","shard_id":3,"replicas":["abc12345"]}

Different subjects hash to different shards for load distribution.
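One way to picture this routing: hash the subject to a 32-bit value, then split the hash space into `num_shards` equal contiguous ranges. This is an illustrative assumption, not StemeDB's implementation. FNV-1a from Go's `hash/fnv` stands in for whatever hash StemeDB actually uses, so the shard IDs below will not match the `/v1/route` output above. Use that endpoint for the real mapping. `shardFor` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a subject to a shard by hashing it and locating the hash in one
// of numShards equal-width ranges of the 32-bit hash space. numShards must be > 0.
// Hypothetical: FNV-1a is an illustrative choice, not StemeDB's hash.
func shardFor(subject string, numShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(subject)) // Write on a hash.Hash never returns an error
	// Multiply-shift maps [0, 2^32) onto [0, numShards) as contiguous ranges.
	return uint32((uint64(h.Sum32()) * uint64(numShards)) >> 32)
}

func main() {
	for _, s := range []string{"Tesla_Inc", "Bitcoin", "GLP1_Agonists"} {
		fmt.Printf("%-14s -> shard %d of 4\n", s, shardFor(s, 4))
	}
}
```

With a power-of-two shard count such as the default of 4, this range lookup reduces to reading the top bits of the hash (h >> 30 for 4 shards).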

Inspect a Shard

curl http://localhost:18181/v1/shards/0

Response shows shard metadata:

{
  "shard_id": 0,
  "replicas": ["abc12345"],
  "size_bytes": 0,
  "assertion_count": 0,
  "generation": 1
}

Note: The cluster node currently demonstrates routing topology only. Storing assertions requires stemedb-api nodes running as backends; that integration is in progress.

What's Next?

| Goal | Resource |
|------|----------|
| Understand the vision | vision.md |
| See real use cases | use-cases/README.md |
| Use the Go SDK | sdk/go/steme/README.md |
| Build AI agents | sdk/go/adk/README.md |
| Understand architecture | architecture.md |
| API reference | crates/stemedb-api/README.md |
| Distributed architecture | docs/research/distributed-write-path.md |

Common Issues

Build fails

rustup update stable
cargo clean
cargo build --workspace

Server won't start (port in use)

# Use a different port
STEMEDB_BIND_ADDR=127.0.0.1:18190 cargo run --package stemedb-api

Validation script fails

Check the server log under the tmp/ directory in the repository root:

cat tmp/validate-*/server.log

Query returns empty results

The ingestion worker runs asynchronously. If you're writing directly to the WAL (not via API), wait ~500ms before querying.
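Rather than sleeping a fixed 500 ms, a test harness can poll until the data appears or a deadline passes. The Go sketch below is generic. `pollUntil` is a hypothetical helper. The `check` function you pass in would wrap your own query, for example a GET on `/v1/query` that reports whether any results came back.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when check never succeeds before the deadline.
var errTimeout = errors.New("condition not met before timeout")

// pollUntil calls check every interval until it returns true, returns an error,
// or timeout elapses. It always makes at least one attempt.
func pollUntil(check func() (bool, error), interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		ok, err := check()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	start := time.Now()
	// Stand-in for a real query: pretend ingestion finishes after ~300ms.
	check := func() (bool, error) { return time.Since(start) > 300*time.Millisecond, nil }
	err := pollUntil(check, 50*time.Millisecond, 2*time.Second)
	fmt.Println("result:", err, "after", time.Since(start).Round(10*time.Millisecond))
}
```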

Environment Variables

Single-Node API (stemedb-api)

| Variable | Default | Description |
|----------|---------|-------------|
| STEMEDB_BIND_ADDR | 127.0.0.1:18180 | HTTP server address |
| STEMEDB_WAL_DIR | data/wal | Write-ahead log directory |
| STEMEDB_DB_DIR | data/db | KV store directory |
| STEMEDB_METER_ENABLED | true | Enable economic throttling |

Cluster Node (stemedb-node)

| Variable | Default | Description |
|----------|---------|-------------|
| STEMEDB_NODE_API_ADDR | 127.0.0.1:18181 | Gateway HTTP address |
| STEMEDB_NODE_RPC_ADDR | 127.0.0.1:18182 | gRPC sync address |
| STEMEDB_SEED_NODES | (empty) | Comma-separated seed node RPC addresses |
| STEMEDB_NUM_SHARDS | 4 | Number of shards (power of 2 recommended) |
| STEMEDB_REPLICATION_FACTOR | 1 | Replica count per shard |
| STEMEDB_DATACENTER | (empty) | Datacenter/region label |
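When scripting against either binary, it can help to resolve these variables on the client side with the same defaults, for example to know which address to curl. The Go sketch below reads a few of the variables from the tables above. `envOr`, `parseSeedList`, `isPowerOfTwo`, and `numShards` are hypothetical helpers, and the defaults are copied from the tables rather than from StemeDB's source.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// envOr returns the variable's value, or def when it is unset or empty.
func envOr(key, def string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return def
}

// parseSeedList splits a comma-separated address list, dropping blank entries.
func parseSeedList(raw string) []string {
	var out []string
	for _, s := range strings.Split(raw, ",") {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// isPowerOfTwo reports whether n is a positive power of two.
func isPowerOfTwo(n int) bool { return n > 0 && n&(n-1) == 0 }

// numShards reads STEMEDB_NUM_SHARDS (table default: 4) and warns if it is not a power of 2.
func numShards() (int, error) {
	raw := envOr("STEMEDB_NUM_SHARDS", "4")
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid STEMEDB_NUM_SHARDS %q", raw)
	}
	if !isPowerOfTwo(n) {
		fmt.Fprintf(os.Stderr, "warning: %d shards is not a power of 2\n", n)
	}
	return n, nil
}

func main() {
	fmt.Println("api addr: ", envOr("STEMEDB_BIND_ADDR", "127.0.0.1:18180"))
	fmt.Println("node addr:", envOr("STEMEDB_NODE_API_ADDR", "127.0.0.1:18181"))
	fmt.Println("seeds:    ", parseSeedList(os.Getenv("STEMEDB_SEED_NODES")))
	n, err := numShards()
	fmt.Println("shards:   ", n, err)
}
```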