stemedb/ai-lookup/services/ingestor.md
jordan 3cfaa1e1d3 feat: Complete Phase 1 (The Spine) - storage foundation
Phase 1 delivers the complete durability and storage layer:

- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
  fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
  aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
  and multi-cycle durability

New crates: stemedb-wal, stemedb-storage, stemedb-ingest

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 14:15:34 -07:00

Ingestor Service

Crate: stemedb-ingest
Status: Implemented (Phase 1)

Purpose

The Ingestor is the background worker that bridges the Write-Ahead Log (WAL) to the KV storage engine. It continuously tails the WAL and persists records to sled using content-addressed keys.

Architecture

[WAL Journal] ---> [IngestWorker] ---> [KVStore (sled)]
                         |
                         v
                   [Subject Index]
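
The tailing behaviour in the diagram can be pictured with in-memory stand-ins. This is only a minimal sketch: `MemJournal`, `MemStore`, `TailCursor`, and `placeholder_key` are hypothetical names that are not part of stemedb-ingest, and the placeholder key is not a real BLAKE3 content address.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the WAL: an append-only list of record payloads.
struct MemJournal {
    records: Vec<Vec<u8>>,
}

/// Hypothetical stand-in for the KV store.
type MemStore = BTreeMap<Vec<u8>, Vec<u8>>;

/// Placeholder key derivation. The real ingestor uses content-addressed keys
/// (BLAKE3); here the payload itself is reused so the sketch stays std-only.
fn placeholder_key(payload: &[u8]) -> Vec<u8> {
    let mut key = b"H:".to_vec();
    key.extend_from_slice(payload);
    key
}

/// Remembers how far into the journal the worker has already read.
struct TailCursor {
    next: usize,
}

impl TailCursor {
    /// Persist every record appended since the previous call and
    /// return how many records were ingested.
    fn poll(&mut self, journal: &MemJournal, store: &mut MemStore) -> usize {
        let pending = &journal.records[self.next..];
        for payload in pending {
            store.insert(placeholder_key(payload), payload.clone());
        }
        self.next = journal.records.len();
        pending.len()
    }
}
```

Calling `poll` repeatedly picks up only records appended since the last call. Because keys are derived from record content, ingesting the same record twice writes the same key again instead of creating a duplicate entry.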

Key Components

RecordType

Discriminator for WAL payloads, stored in the first byte of the 8-byte aligned header:

  • Assertion = 0 - Knowledge claims
  • Vote = 1 - Consensus votes
  • Epoch = 2 - Paradigm definitions
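
A discriminator like this is commonly modelled as a `#[repr(u8)]` enum with a checked conversion from the raw byte. The following is a sketch under that assumption, not the crate's actual definition; the derives and the `TryFrom` error type are illustrative choices.

```rust
use std::convert::TryFrom;

/// Sketch of the discriminator; the values match the list above.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecordType {
    Assertion = 0,
    Vote = 1,
    Epoch = 2,
}

impl TryFrom<u8> for RecordType {
    /// An unknown discriminator byte is handed back to the caller as the error.
    type Error = u8;

    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            0 => Ok(RecordType::Assertion),
            1 => Ok(RecordType::Vote),
            2 => Ok(RecordType::Epoch),
            other => Err(other),
        }
    }
}
```

Rejecting unknown bytes explicitly lets a reader of the WAL distinguish a corrupt or newer-format record from a valid one instead of misinterpreting the payload.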

Storage Layout

Key Pattern                       Value                  Description
H:{blake3_hash}                   Serialized Assertion   Content-addressed assertion store
V:{assertion_hash}:{vote_hash}    Serialized Vote        Votes on assertions
E:{epoch_id_hex}                  Serialized Epoch       Epoch definitions
S:{subject}                       BLAKE3 hash bytes      Subject adjacency index
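
One consequence of this layout is that all votes for an assertion share the prefix `V:{assertion_hash}:`, so they can be collected with a single prefix scan. The sketch below illustrates this with a `BTreeMap` standing in for the sled-backed store. The helper names are hypothetical, and hex-encoding the hashes inside keys is an assumption (the table only states hex for epoch IDs).

```rust
use std::collections::BTreeMap;

/// Lowercase hex encoding (assumed key encoding, see lead-in).
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Hypothetical helper: key for an assertion, `H:{hash}`.
fn assertion_key(hash: &[u8]) -> String {
    format!("H:{}", to_hex(hash))
}

/// Hypothetical helper: key for a vote, `V:{assertion_hash}:{vote_hash}`.
fn vote_key(assertion_hash: &[u8], vote_hash: &[u8]) -> String {
    format!("V:{}:{}", to_hex(assertion_hash), to_hex(vote_hash))
}

/// Collect all vote values for one assertion by scanning the shared prefix.
/// The BTreeMap is a stand-in for an ordered KV store with prefix scans.
fn votes_for<'a>(
    store: &'a BTreeMap<String, Vec<u8>>,
    assertion_hash: &[u8],
) -> Vec<&'a Vec<u8>> {
    let prefix = format!("V:{}:", to_hex(assertion_hash));
    store
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .map(|(_, value)| value)
        .collect()
}
```

The trailing `:` in the prefix matters: without it, a scan for one hash could also match keys of another hash that merely begins with the same characters.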

Usage

use std::sync::Arc;

use stemedb_ingest::{Ingestor, serialize_assertion};
use stemedb_storage::SledStore;
use stemedb_wal::Journal;
// Assumption: an async mutex such as tokio's, inferred from `.lock().await` below.
use tokio::sync::Mutex;

// Open the WAL and the KV store. The `?` operators require an enclosing
// function that returns a Result.
let journal = Arc::new(Mutex::new(Journal::open("./wal")?));
let store = Arc::new(SledStore::open("./db")?);

// Create and start the ingestor
let mut ingestor = Ingestor::new(journal.clone(), store);
ingestor.start(); // Spawns background task

// Write to the WAL (records will be ingested automatically)
let assertion = Assertion { ... };
let payload = serialize_assertion(&assertion)?;
journal.lock().await.append(payload)?;

Serialization

Each record is prefixed with an 8-byte header so that the rkyv payload that follows starts at an 8-byte offset, which rkyv needs for zero-copy access:

[type: u8][padding: 7 bytes][rkyv payload...]

Helper functions:

  • serialize_assertion(&Assertion) -> Result<Vec<u8>>
  • serialize_vote(&Vote) -> Result<Vec<u8>>
  • serialize_epoch(&Epoch) -> Result<Vec<u8>>
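
The header layout described above can be sketched without rkyv by treating the payload as opaque bytes. `frame` and `split_frame` are hypothetical names for illustration, and zero-filled padding is an assumption; the real helpers additionally perform the rkyv serialization.

```rust
/// Size of the record header: 1 type byte + 7 padding bytes.
const HEADER_LEN: usize = 8;

/// Prepend the 8-byte header to an already-serialized payload.
fn frame(record_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(HEADER_LEN + payload.len());
    buf.push(record_type);
    // Padding keeps the payload at an offset that is a multiple of 8.
    buf.extend_from_slice(&[0u8; HEADER_LEN - 1]);
    buf.extend_from_slice(payload);
    buf
}

/// Split a framed record into its type byte and payload.
/// Returns None if the record is too short to contain a header.
fn split_frame(record: &[u8]) -> Option<(u8, &[u8])> {
    if record.len() < HEADER_LEN {
        return None;
    }
    Some((record[0], &record[HEADER_LEN..]))
}
```

The padding only fixes the payload's offset within the record. For zero-copy reads the buffer holding the record must itself start at a suitably aligned address, which a plain `Vec<u8>` does not guarantee.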

Testing

The ingestor has integration tests covering:

  • Single assertion ingestion
  • Vote ingestion
  • Epoch ingestion
  • Multiple record processing
  • Subject index creation