# stemedb-api
HTTP API for Episteme (StemeDB) - a probabilistic knowledge graph database.
## Architecture
The API follows the standard axum pattern:
- **DTOs** (`dto.rs`) - JSON request/response types with hex-encoded binary data
- **Handlers** (`handlers/`) - Thin HTTP handlers that delegate to underlying engines
- **State** (`state.rs`) - Shared application state (Journal, Store)
- **Router** (`lib.rs`) - axum router with OpenAPI support via utoipa
## Write Path
```
POST /v1/assert → DTO → Assertion → serialize → append to WAL → return hash
```
## Read Path
```
GET /v1/query → QueryParams → Query → QueryEngine → Lens (optional) → DTOs
```
## Running the Server
```bash
# Start the API server (defaults to http://127.0.0.1:3000)
cargo run --package stemedb-api
# With custom configuration
STEMEDB_WAL_DIR=./my-wal STEMEDB_DB_DIR=./my-db STEMEDB_BIND_ADDR=0.0.0.0:8080 cargo run --package stemedb-api
```
The server automatically:
1. Opens Journal (WAL) and HybridStore (KV storage)
2. Spawns IngestWorker background task to tail WAL
3. Starts HTTP server with OpenAPI documentation
## API Documentation
Once the server is running, visit:
```
http://127.0.0.1:3000/swagger-ui
```
This provides interactive OpenAPI documentation for all endpoints.
## Endpoints
### POST /v1/assert
Create a new assertion.
**Request:**
```json
{
  "subject": "Tesla_Inc",
  "predicate": "has_revenue",
  "object": {
    "type": "Number",
    "value": 96.7
  },
  "confidence": 0.95,
  "signatures": [{
    "agent_id": "0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20",
    "signature": "0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40",
    "timestamp": 1706745600
  }],
  "source_hash": "0000000000000000000000000000000000000000000000000000000000000000"
}
```
**Response:**
```json
{
  "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "status": "created"
}
```
### POST /v1/vote
Create a vote on an existing assertion.
**Request:**
```json
{
  "assertion_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "agent_id": "0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20",
  "weight": 0.8,
  "signature": "0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40"
}
```
**Response:**
```json
{
  "hash": "f3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "status": "created"
}
```
### GET /v1/query
Query assertions, optionally narrowing the results with filters and resolving conflicts with a lens.
**Query Parameters:**
- `subject` (optional) - Filter by subject entity
- `predicate` (optional) - Filter by predicate/relation
- `lifecycle` (optional) - Filter by lifecycle stage (Proposed, UnderReview, Approved, Deprecated, Rejected)
- `epoch` (optional) - Filter by epoch (hex-encoded)
- `lens` (optional) - Apply lens for conflict resolution (Recency, Consensus, Authority, VoteAwareConsensus, TrustAwareAuthority)
- `limit` (optional) - Maximum results (default: 100)
**Example:**
```
GET /v1/query?subject=Tesla_Inc&predicate=has_revenue&lifecycle=Approved&lens=Recency
```
**Response:**
```json
{
  "assertions": [{
    "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "subject": "Tesla_Inc",
    "predicate": "has_revenue",
    "object": {
      "type": "Number",
      "value": 96.7
    },
    "confidence": 0.95,
    "lifecycle": "Approved",
    "signatures": [...],
    "timestamp": 1706745600,
    "source_hash": "0000000000000000000000000000000000000000000000000000000000000000"
  }],
  "total_count": 1,
  "has_more": false
}
```
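As a client-side illustration, the query parameters above can be assembled into a request URL as sketched below. `build_query_url` and `percent_encode` are hypothetical helpers, not part of this crate, and checking `lens` and `lifecycle` against the listed names before sending is an assumption about what a careful client might do, not documented server behavior.

```rust
/// Lens names listed for the `lens` parameter.
const LENSES: [&str; 5] = [
    "Recency",
    "Consensus",
    "Authority",
    "VoteAwareConsensus",
    "TrustAwareAuthority",
];

/// Lifecycle stages listed for the `lifecycle` parameter.
const LIFECYCLES: [&str; 5] = ["Proposed", "UnderReview", "Approved", "Deprecated", "Rejected"];

/// Percent-encode every byte that is not an RFC 3986 unreserved character.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Hypothetical helper: build a `/v1/query` URL from (name, value) pairs,
/// rejecting lens or lifecycle values that are not in the documented lists.
fn build_query_url(base: &str, params: &[(&str, &str)]) -> Result<String, String> {
    let mut pairs = Vec::new();
    for (key, value) in params {
        match *key {
            "lens" if !LENSES.contains(value) => {
                return Err(format!("unknown lens: {}", value));
            }
            "lifecycle" if !LIFECYCLES.contains(value) => {
                return Err(format!("unknown lifecycle: {}", value));
            }
            _ => {}
        }
        pairs.push(format!("{}={}", key, percent_encode(value)));
    }
    let mut url = format!("{}/v1/query", base.trim_end_matches('/'));
    if !pairs.is_empty() {
        url.push('?');
        url.push_str(&pairs.join("&"));
    }
    Ok(url)
}
```

Calling `build_query_url("http://127.0.0.1:3000", &[("subject", "Tesla_Inc"), ("lens", "Recency")])` yields a URL in the same shape as the example request above.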
### GET /v1/health
Health check endpoint.
**Response:**
```json
{
  "status": "healthy",
  "version": "0.1.0",
  "assertions_count": 42
}
```
## Environment Variables
- `STEMEDB_WAL_DIR` - Directory for WAL files (default: `data/wal`)
- `STEMEDB_DB_DIR` - Directory for KV store (default: `data/db`)
- `STEMEDB_BIND_ADDR` - HTTP server bind address (default: `127.0.0.1:3000`)
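A minimal sketch of how these variables and their defaults might be resolved at startup. The `Config` struct and both functions are hypothetical names chosen for illustration; the server's actual configuration code may be organized differently.

```rust
use std::env;
use std::net::SocketAddr;
use std::path::PathBuf;

/// Hypothetical configuration holder; not the crate's actual type.
#[derive(Debug)]
struct Config {
    wal_dir: PathBuf,
    db_dir: PathBuf,
    bind_addr: SocketAddr,
}

/// Resolve settings through `lookup`, falling back to the documented defaults.
/// Taking the lookup as a parameter keeps the logic testable without touching
/// the process environment.
fn load_config_from(lookup: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    let get = |name: &str, default: &str| lookup(name).unwrap_or_else(|| default.to_string());

    let bind = get("STEMEDB_BIND_ADDR", "127.0.0.1:3000");
    let bind_addr = bind
        .parse::<SocketAddr>()
        .map_err(|e| format!("invalid STEMEDB_BIND_ADDR {:?}: {}", bind, e))?;

    Ok(Config {
        wal_dir: PathBuf::from(get("STEMEDB_WAL_DIR", "data/wal")),
        db_dir: PathBuf::from(get("STEMEDB_DB_DIR", "data/db")),
        bind_addr,
    })
}

/// Read the same settings from the real process environment.
fn load_config() -> Result<Config, String> {
    load_config_from(|name| env::var(name).ok())
}
```

Parsing the bind address up front means a malformed `STEMEDB_BIND_ADDR` is reported with its offending value instead of surfacing later as a bind failure.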
## Binary Data Encoding
All binary data (hashes, signatures, agent IDs) is hex-encoded in JSON:
- Assertion hash: 32 bytes (64 hex characters)
- Agent ID (public key): 32 bytes (64 hex characters)
- Signature: 64 bytes (128 hex characters)
- Source hash: 32 bytes (64 hex characters)
- Visual hash (optional): 8 bytes (16 hex characters)
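The fixed lengths above can be checked at the API boundary. The following is a dependency-free sketch; `to_hex` and `from_hex` are hypothetical helpers, and the crate itself may rely on a hex library instead.

```rust
/// Encode bytes as lowercase hex, two characters per byte.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Decode a hex string into exactly `N` bytes, e.g. `N = 32` for a hash or
/// agent ID and `N = 64` for a signature.
fn from_hex<const N: usize>(s: &str) -> Result<[u8; N], String> {
    if s.len() != N * 2 {
        return Err(format!("expected {} hex characters, got {}", N * 2, s.len()));
    }
    // Reject anything outside 0-9, a-f, A-F up front; this also guarantees the
    // string is ASCII, so the two-byte slices below fall on character boundaries.
    if !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("input contains non-hex characters".to_string());
    }
    let mut out = [0u8; N];
    for (i, byte) in out.iter_mut().enumerate() {
        let pair = &s[i * 2..i * 2 + 2];
        *byte = u8::from_str_radix(pair, 16)
            .map_err(|e| format!("invalid hex at offset {}: {}", i * 2, e))?;
    }
    Ok(out)
}
```

For example, `from_hex::<32>(agent_id)` accepts the 64-character agent ID from the request examples and rejects a 128-character signature passed in its place.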
## Critical Rules
- **Append-Only**: The API never mutates existing assertions. To revise a claim, create a new assertion.
- **Content-Addressed**: Assertion ID = BLAKE3 hash of content.
- **No Unwrap**: All error handling uses `?` with context (enforced by clippy).
- **Defensive Writes**: All writes go through WAL with fsync.
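
To make the append-only, no-unwrap, and fsync rules concrete, here is a minimal sketch using only the standard library. It is not the Journal implementation: the `append_record` function and its length-prefixed record layout are assumptions made for illustration, and `Result<_, String>` stands in for whatever error-context type the crate actually uses.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Append one record (hypothetical layout: u32 little-endian length, then the
/// payload) and fsync before returning. Returns the offset where the record
/// starts. Every fallible step uses `?` with added context; nothing unwraps.
fn append_record(path: &Path, payload: &[u8]) -> Result<u64, String> {
    if payload.len() > u32::MAX as usize {
        return Err(format!("payload too large: {} bytes", payload.len()));
    }
    let len = payload.len() as u32;

    // Append mode: existing bytes are never overwritten.
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(path)
        .map_err(|e| format!("open {}: {}", path.display(), e))?;

    let offset = file
        .metadata()
        .map_err(|e| format!("stat {}: {}", path.display(), e))?
        .len();

    file.write_all(&len.to_le_bytes())
        .map_err(|e| format!("write length prefix: {}", e))?;
    file.write_all(payload)
        .map_err(|e| format!("write payload: {}", e))?;

    // Flush file data and metadata to disk before acknowledging the write.
    file.sync_all()
        .map_err(|e| format!("fsync {}: {}", path.display(), e))?;

    Ok(offset)
}
```

Returning only after `sync_all` succeeds is what lets a caller treat a successful response as durable; the cost is one fsync per call.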