stemedb/ai-lookup/services/ingestor.md
jordan 3cfaa1e1d3 feat: Complete Phase 1 (The Spine) - storage foundation
Phase 1 delivers the complete durability and storage layer:

- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
  fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
  aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
  and multi-cycle durability

New crates: stemedb-wal, stemedb-storage, stemedb-ingest

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 14:15:34 -07:00

# Ingestor Service
> **Crate:** `stemedb-ingest`
> **Status:** Implemented (Phase 1)
## Purpose
The Ingestor is the background worker that bridges the Write-Ahead Log (WAL) to the KV storage engine. It continuously tails the WAL and persists records to sled using content-addressed keys.
## Architecture
```
[WAL Journal] ---> [IngestWorker] ---> [KVStore (sled)]
                         |
                         v
                   [Subject Index]
```
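The flow in the diagram can be approximated with in-memory stand-ins. This is a rough sketch under stated assumptions, not the crate's implementation: `ingest_pass`, `Kv`, and `stand_in_hash` are hypothetical names, a `BTreeMap` replaces sled, a slice of byte vectors replaces the WAL, and `DefaultHasher` replaces BLAKE3. The subject index and the two-part vote key are omitted for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// In-memory stand-in for the sled-backed KV store (assumption).
type Kv = BTreeMap<String, Vec<u8>>;

/// Length of the record header that precedes the payload.
const HEADER_LEN: usize = 8;

/// Stand-in for BLAKE3 so the sketch needs only the standard library.
fn stand_in_hash(payload: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    payload.hash(&mut hasher);
    hasher.finish()
}

/// Processes every WAL record at or after `cursor` and returns the new cursor.
fn ingest_pass(wal: &[Vec<u8>], cursor: usize, kv: &mut Kv) -> usize {
    for record in wal.iter().skip(cursor) {
        // Skip records too short to contain the header.
        if record.len() < HEADER_LEN {
            continue;
        }
        // The first header byte selects the key prefix; unknown types are skipped.
        let prefix = match record[0] {
            0 => "H", // Assertion
            1 => "V", // Vote (simplified: the documented key also includes the assertion hash)
            2 => "E", // Epoch
            _ => continue,
        };
        let payload = &record[HEADER_LEN..];
        // Content-addressed key: identical payloads map to the same key.
        let key = format!("{}:{:016x}", prefix, stand_in_hash(payload));
        kv.insert(key, payload.to_vec());
    }
    wal.len()
}
```

Because the key is derived from the payload, replaying the same record overwrites the same entry rather than creating a duplicate, which is what makes re-ingestion after a crash harmless in this sketch.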
## Key Components
### RecordType
Discriminator for WAL payloads, stored in the first byte of the 8-byte aligned header:
- `Assertion = 0` - Knowledge claims
- `Vote = 1` - Consensus votes
- `Epoch = 2` - Paradigm definitions
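As a minimal sketch, the discriminator could be modeled as a `#[repr(u8)]` enum with a checked conversion from the header byte. The `from_byte` helper is hypothetical, and the real definition in `stemedb-ingest` may look different.

```rust
/// Hypothetical mirror of the `RecordType` discriminator described above.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecordType {
    /// Knowledge claims
    Assertion = 0,
    /// Consensus votes
    Vote = 1,
    /// Paradigm definitions
    Epoch = 2,
}

impl RecordType {
    /// Maps a header byte to a record type, returning `None` for unknown values.
    pub fn from_byte(byte: u8) -> Option<RecordType> {
        match byte {
            0 => Some(RecordType::Assertion),
            1 => Some(RecordType::Vote),
            2 => Some(RecordType::Epoch),
            _ => None,
        }
    }
}
```

Returning `None` for unknown bytes lets a reader skip or reject a record instead of misinterpreting its payload.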
### Storage Layout
| Key Pattern | Value | Description |
|-------------|-------|-------------|
| `H:{blake3_hash}` | Serialized Assertion | Content-addressed assertion store |
| `V:{assertion_hash}:{vote_hash}` | Serialized Vote | Votes on assertions |
| `E:{epoch_id_hex}` | Serialized Epoch | Epoch definitions |
| `S:{subject}` | BLAKE3 hash bytes | Subject adjacency index |
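To make the key patterns concrete, here is a hedged sketch of helpers that build each key as a string. The function names, the string (rather than raw-byte) key representation, and the hex encoding of hashes are assumptions for illustration; only the `H:`, `V:`, `E:`, and `S:` patterns come from the table.

```rust
/// Lowercase hex encoding using only the standard library.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// `H:{blake3_hash}`: content-addressed assertion key.
fn assertion_key(assertion_hash: &[u8]) -> String {
    format!("H:{}", to_hex(assertion_hash))
}

/// `V:{assertion_hash}:{vote_hash}`: a vote stored under the assertion it targets.
fn vote_key(assertion_hash: &[u8], vote_hash: &[u8]) -> String {
    format!("V:{}:{}", to_hex(assertion_hash), to_hex(vote_hash))
}

/// Prefix shared by every vote key for one assertion.
fn vote_prefix(assertion_hash: &[u8]) -> String {
    format!("V:{}:", to_hex(assertion_hash))
}

/// `E:{epoch_id_hex}`: epoch definition key.
fn epoch_key(epoch_id: &[u8]) -> String {
    format!("E:{}", to_hex(epoch_id))
}

/// `S:{subject}`: subject adjacency index key.
fn subject_key(subject: &str) -> String {
    format!("S:{}", subject)
}
```

One plausible reason for nesting the vote hash under the assertion hash is that a prefix scan over `vote_prefix(...)` would return all votes on a single assertion.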
## Usage
```rust
use std::sync::Arc;

use stemedb_ingest::{Ingestor, serialize_assertion};
use stemedb_storage::SledStore;
use stemedb_wal::Journal;
// Assumption: an async mutex such as tokio's, since `lock()` is awaited below
use tokio::sync::Mutex;

// Open the WAL journal and the sled-backed store
let journal = Arc::new(Mutex::new(Journal::open("./wal")?));
let store = Arc::new(SledStore::open("./db")?);

// Create and start the ingestor
let mut ingestor = Ingestor::new(journal.clone(), store);
ingestor.start(); // Spawns a background task

// Write to the WAL; the ingestor picks up the record automatically
let assertion = Assertion { ... };
let payload = serialize_assertion(&assertion)?;
journal.lock().await.append(payload)?;
```
## Serialization
Records are serialized with an 8-byte header to maintain rkyv alignment:
```
[type: u8][padding: 7 bytes][rkyv payload...]
```
Helper functions:
- `serialize_assertion(&Assertion) -> Result<Vec<u8>>`
- `serialize_vote(&Vote) -> Result<Vec<u8>>`
- `serialize_epoch(&Epoch) -> Result<Vec<u8>>`
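The header layout above can be sketched as a pair of framing helpers. This is an illustration under assumptions, not the crate's code: `frame_record` and `split_record` are hypothetical names, the padding bytes are assumed to be zero, and the payload is treated as opaque bytes rather than real rkyv output.

```rust
/// Header size: one type byte plus seven padding bytes.
const HEADER_LEN: usize = 8;

/// Prepends the 8-byte header so the payload starts at offset 8.
fn frame_record(type_byte: u8, payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(HEADER_LEN + payload.len());
    buf.push(type_byte);
    // Zero padding (assumed) keeps the payload offset a multiple of 8.
    buf.extend_from_slice(&[0u8; HEADER_LEN - 1]);
    buf.extend_from_slice(payload);
    buf
}

/// Splits a framed record into its type byte and payload slice.
/// Returns `None` if the record is shorter than the header.
fn split_record(record: &[u8]) -> Option<(u8, &[u8])> {
    if record.len() < HEADER_LEN {
        return None;
    }
    Some((record[0], &record[HEADER_LEN..]))
}
```

A one-byte header would push the payload to an odd offset. Padding to 8 bytes keeps the payload at the same 8-byte alignment as the start of the buffer, so the guarantee only holds if the buffer itself is suitably aligned.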
## Testing
The ingestor has integration tests covering:
- Single assertion ingestion
- Vote ingestion
- Epoch ingestion
- Multiple record processing
- Subject index creation
## Related
- [Storage Service](./storage.md) - KVStore trait and SledStore
- [Content Addressing](../patterns/content-addressing.md) - BLAKE3 hashing