jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI

Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command
  injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-04 01:24:14 -07:00

7.5 KiB


Quick Start

Get StemeDB running and validated in under 5 minutes.

Prerequisites

Rust 1.75+ (rustup update stable)
curl (for validation and the API examples)
make (for make validate)
Go toolchain (for the SDK examples in steps 4 and 6)

1. Validate It Works

# Clone and enter
git clone <repo-url>
cd stemedb

# Run end-to-end validation (builds, starts server, asserts, queries, shuts down)
make validate

Expected output:

==========================================
  StemeDB Validation
==========================================

[PASS] Build complete
[PASS] Server is healthy
[PASS] Health check passed
[PASS] Assertion created: abc123...
[PASS] Query returned correct data
[PASS] Lens query (Recency) works

==========================================
  All validation checks passed!
==========================================

If you see "All validation checks passed!", StemeDB is working correctly.

2. Start the Server

cargo run --package stemedb-api

The server starts on http://localhost:18180.

3. Explore the API

Open the Swagger UI for interactive documentation:

http://localhost:18180/swagger-ui

Or check health via curl:

curl http://localhost:18180/v1/health
# {"status":"healthy","version":"0.1.0","assertions_count":0}

4. Create Your First Assertion

Using the Go SDK (recommended):

cd sdk/go/examples/basic
go run main.go

Or via curl (requires generating Ed25519 signatures):

# Generate a signed assertion
cargo run --package stemedb-api --example gen_test_assertion > /tmp/assertion.json

# Submit it
curl -X POST http://localhost:18180/v1/assert \
  -H "Content-Type: application/json" \
  -d @/tmp/assertion.json

5. Query It Back

# Query by subject and predicate
curl "http://localhost:18180/v1/query?subject=StemeDB_Validation&predicate=test_status"

# Query with a lens (conflict resolution)
curl "http://localhost:18180/v1/query?subject=StemeDB_Validation&predicate=test_status&lens=Recency"
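Building the query string by hand breaks when a subject contains spaces or `&`. A small Go helper using the standard net/url package avoids that. The parameter names come from the curl examples above; Recency is the only lens name this guide shows, and `queryURL` is a hypothetical helper, not an SDK function.

```go
package main

import (
	"fmt"
	"net/url"
)

// queryURL builds a GET /v1/query URL. Pass lens == "" to omit the lens.
// url.Values escapes reserved characters, so subjects like "A B&C" are safe.
func queryURL(base, subject, predicate, lens string) string {
	q := url.Values{}
	q.Set("subject", subject)
	q.Set("predicate", predicate)
	if lens != "" {
		q.Set("lens", lens)
	}
	// Encode sorts keys alphabetically, so the order differs from the curl
	// examples; query parameter order is normally insignificant.
	return base + "/v1/query?" + q.Encode()
}

func main() {
	base := "http://localhost:18180"
	fmt.Println(queryURL(base, "StemeDB_Validation", "test_status", ""))
	fmt.Println(queryURL(base, "StemeDB_Validation", "test_status", "Recency"))
}
```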

6. See Conflict in Action (The "Git for Truth" Moment)

Episteme stores Claims, not Facts. When multiple agents assert conflicting values, the Skeptic endpoint shows you all competing claims instead of picking a winner.

Create Conflicting Assertions

Using the Go SDK, create assertions with different claims about the same subject:

cd sdk/go/examples/conflict
go run main.go

Query with Skeptic

The Skeptic endpoint reveals disagreement instead of hiding it:

curl "http://localhost:18180/v1/skeptic?subject=GLP1_Agonists&predicate=cardiovascular_benefit"

Response shows all competing claims:

{
  "status": "Contested",
  "conflict_score": 0.72,
  "claims": [
    {"value": {"type": "Boolean", "value": true}, "weight_share": 0.48, "assertion_count": 1},
    {"value": {"type": "Boolean", "value": false}, "weight_share": 0.52, "assertion_count": 1}
  ],
  "candidates_count": 2
}

Key insight: Instead of silently picking a winner, you see the disagreement. This is critical for health/finance domains where hiding conflict is dangerous.

7. Authority Tiers (Source-Class Resolution)

Different sources have different authority. A regulatory filing (FDA) outweighs an anecdotal tweet. The Layered endpoint shows per-tier consensus.

Query with Layered Consensus

The conflict example creates assertions with different source_class values (Clinical vs Anecdotal). The Layered endpoint shows how each tier resolves independently:

curl "http://localhost:18180/v1/layered?subject=GLP1_Agonists&predicate=cardiovascular_benefit"

Response shows tier-by-tier resolution:

{
  "tiers": [
    {"tier": 1, "source_class": "Clinical", "winner": {"object": {"type": "Boolean", "value": true}}, "conflict_score": 0.0},
    {"tier": 5, "source_class": "Anecdotal", "winner": {"object": {"type": "Boolean", "value": false}}, "conflict_score": 0.0}
  ],
  "overall_winner": {"object": {"type": "Boolean", "value": true}},
  "overall_conflict_score": 0.85
}

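Key insight: The Clinical tier (peer-reviewed research) wins even though the Anecdotal tier (social media) disagrees. The high overall_conflict_score (0.85) tells you that the tiers disagree with each other, even though each tier is internally unanimous (conflict_score 0.0).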

8. Distributed Mode (Cluster Node)

StemeDB supports horizontal scaling across multiple nodes. Each node runs SWIM membership for discovery, range sharding for data distribution, and a Gateway for request routing.

Start a Cluster Node

cargo run --package stemedb-cluster --bin stemedb-node

The node starts on http://localhost:18181 (Gateway API) and 127.0.0.1:18182 (RPC).

Check Cluster Health

curl http://localhost:18181/v1/health
# {"healthy":true,"reachable_nodes":0,"joined":true}

See Cluster Topology

curl http://localhost:18181/v1/cluster/status

Response shows shards and nodes:

{
  "node_count": 0,
  "shard_count": 4,
  "meta_version": 1,
  "nodes": []
}

Test Subject Routing

See which shard a subject maps to:

curl "http://localhost:18181/v1/route?subject=Tesla_Inc"
# {"subject":"Tesla_Inc","shard_id":0,"replicas":["abc12345"]}

curl "http://localhost:18181/v1/route?subject=Bitcoin"
# {"subject":"Bitcoin","shard_id":3,"replicas":["abc12345"]}

Different subjects hash to different shards for load distribution.

Inspect a Shard

curl http://localhost:18181/v1/shards/0

Response shows shard metadata:

{
  "shard_id": 0,
  "replicas": ["abc12345"],
  "size_bytes": 0,
  "assertion_count": 0,
  "generation": 1
}

Note: The cluster node demonstrates routing topology. Full assertion storage requires running stemedb-api nodes as backends (integration in progress).

What's Next?

| Goal | Resource |
| --- | --- |
| Understand the vision | vision.md |
| See real use cases | use-cases/README.md |
| Use the Go SDK | sdk/go/steme/README.md |
| Build AI agents | sdk/go/adk/README.md |
| Understand architecture | architecture.md |
| API reference | crates/stemedb-api/README.md |
| Distributed architecture | docs/research/distributed-write-path.md |

Common Issues

Build fails

rustup update stable
cargo clean
cargo build --workspace

Server won't start (port in use)

# Use a different port
STEMEDB_BIND_ADDR=127.0.0.1:18190 cargo run --package stemedb-api

Validation script fails

Check the server log in the validation run's directory under tmp/:

cat tmp/validate-*/server.log

Query returns empty results

The ingestion worker runs asynchronously. If you write directly to the WAL instead of going through the API, wait about 500ms before querying.

Environment Variables

Single-Node API (`stemedb-api`)

| Variable | Default | Description |
| --- | --- | --- |
| `STEMEDB_BIND_ADDR` | `127.0.0.1:18180` | HTTP server address |
| `STEMEDB_WAL_DIR` | `data/wal` | Write-ahead log directory |
| `STEMEDB_DB_DIR` | `data/db` | KV store directory |
| `STEMEDB_METER_ENABLED` | `true` | Enable economic throttling |

Cluster Node (`stemedb-node`)

| Variable | Default | Description |
| --- | --- | --- |
| `STEMEDB_NODE_API_ADDR` | `127.0.0.1:18181` | Gateway HTTP address |
| `STEMEDB_NODE_RPC_ADDR` | `127.0.0.1:18182` | gRPC sync address |
| `STEMEDB_SEED_NODES` | (empty) | Comma-separated seed node RPC addresses |
| `STEMEDB_NUM_SHARDS` | `4` | Number of shards (power of 2 recommended) |
| `STEMEDB_REPLICATION_FACTOR` | `1` | Replica count per shard |
| `STEMEDB_DATACENTER` | (empty) | Datacenter/region label |
