stemedb/ai-lookup/services/materializer.md
jordan 1ce4004807 feat: Complete Phase 2 (The Cortex) - query, lens, and API layers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 13:22:44 -07:00


Materializer

Last Updated: 2026-01-31 | Confidence: High | Status: Implemented

Summary

The Materializer is a background worker that pre-computes the winning assertion for each subject+predicate pair. It stores the result at MV:{subject}:{predicate}, so a read becomes a single O(1) key lookup. This bridges the gap between lens resolution, which is O(N) in the number of candidate assertions for a pair, and sub-millisecond query latency.

Key Facts:

  • Scans all SP: compound indexes to discover pairs
  • Resolves each pair through an AsyncLens (default: VoteAwareConsensus)
  • Stores MaterializedView with winner + provenance metadata
  • step() runs a single pass, run() runs a polling loop, and run_notified() runs an event-driven loop
  • Event-driven: IngestWorker signals tokio::sync::Notify on new data; Materializer reacts immediately
  • QueryEngine uses a fast path: it checks the MV: key before falling back to the SP: index
  • Error-resilient: individual pair failures are logged and skipped

File Pointers:

  • crates/stemedb-query/src/materializer.rs - Materializer worker
  • crates/stemedb-core/src/types.rs - MaterializedView type

Storage Layout

| Key Pattern | Value | Purpose |
|---|---|---|
| MV:{subject}:{predicate} | Serialized MaterializedView | Pre-computed winner + metadata |
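The following is a minimal sketch of this layout. It assumes a std HashMap stands in for the KVStore and that views are kept as opaque serialized bytes. The helpers mv_key, sp_key and read_view are hypothetical names used only for illustration. The point is that the MV key is derived directly from the subject and predicate, so a read is one point lookup with no index scan or candidate resolution.

```rust
use std::collections::HashMap;

/// Hypothetical helper: the materialized-view key for a subject+predicate pair.
pub fn mv_key(subject: &str, predicate: &str) -> String {
    format!("MV:{subject}:{predicate}")
}

/// Hypothetical helper: the SP: compound-index key for the same pair (slow path).
pub fn sp_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

/// A HashMap stands in for the KVStore; values are serialized views (opaque bytes here).
pub fn read_view<'a>(kv: &'a HashMap<String, Vec<u8>>, subject: &str, predicate: &str) -> Option<&'a [u8]> {
    // A single hashed point lookup: the key is computed, not searched for.
    kv.get(&mv_key(subject, predicate)).map(|v| v.as_slice())
}
```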

MaterializedView Type

```rust
pub struct MaterializedView {
    pub winner: Assertion,              // The resolved winner
    pub lens_name: String,              // Which lens produced this (e.g., "VoteAwareConsensus")
    pub resolution_confidence: f32,     // Confidence in the resolution [0.0, 1.0]
    pub candidates_count: usize,        // How many candidates were considered
    pub materialized_at: u64,           // When this view was last computed
}
```

Materializer Interface

```rust
// Signatures only; method bodies are omitted.
pub struct Materializer<S> {
    store: Arc<S>,
    index_store: GenericIndexStore<Arc<S>>,
    lens: Box<dyn AsyncLens>,
}

impl<S: KVStore + 'static> Materializer<S> {
    /// Create with any AsyncLens implementation
    pub fn new(store: Arc<S>, lens: Box<dyn AsyncLens>) -> Self;

    /// One full materialization pass over all SP: pairs
    pub async fn step(&self) -> Result<MaterializeReport>;

    /// Materialize a single subject+predicate pair
    pub async fn materialize_pair(&self, subject: &str, predicate: &str)
        -> Result<Option<MaterializedView>>;

    /// Read a pre-computed view (O(1))
    pub async fn get_materialized_view(&self, subject: &str, predicate: &str)
        -> Result<Option<MaterializedView>>;

    /// Run continuously with configurable interval (polling mode)
    pub async fn run(&self, interval: Duration);

    /// Run in event-driven mode, triggered by IngestWorker notifications
    pub async fn run_notified(&self, notify: Arc<Notify>, max_interval: Duration);
}
```
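The following is a simplified, synchronous sketch of the pass that step() performs. It is an illustration, not the crate's implementation. The Lens trait stands in for AsyncLens, and the toy MostVotes lens stands in for VoteAwareConsensus. A BTreeMap of candidates per pair stands in for the SP: index plus its H: lookups. The Assertion, MaterializedView and MaterializeReport shapes are cut down to the fields needed here. The sketch shows the error-resilient loop: a pair whose resolution fails is logged and counted, and the pass moves on to the next pair.

```rust
use std::collections::BTreeMap;

// Hypothetical, cut-down stand-ins for the real stemedb types.
#[derive(Clone, Debug, PartialEq)]
pub struct Assertion {
    pub value: String,
    pub votes: u32,
}

#[derive(Clone, Debug, PartialEq)]
pub struct MaterializedView {
    pub winner: Assertion,
    pub lens_name: String,
    pub candidates_count: usize,
}

#[derive(Debug, Default, PartialEq)]
pub struct MaterializeReport {
    pub materialized: usize,
    pub failed: usize,
}

/// Synchronous stand-in for AsyncLens.
pub trait Lens {
    fn name(&self) -> &str;
    fn resolve(&self, candidates: &[Assertion]) -> Result<Option<Assertion>, String>;
}

/// Toy lens: the candidate with the most votes wins.
/// An empty candidate list is treated as an error here, only to exercise the skip path.
pub struct MostVotes;

impl Lens for MostVotes {
    fn name(&self) -> &str {
        "MostVotes"
    }
    fn resolve(&self, candidates: &[Assertion]) -> Result<Option<Assertion>, String> {
        if candidates.is_empty() {
            return Err("no candidates".to_string());
        }
        Ok(candidates.iter().max_by_key(|a| a.votes).cloned())
    }
}

/// One materialization pass over every (subject, predicate) pair.
pub fn step(
    sp_index: &BTreeMap<(String, String), Vec<Assertion>>,
    views: &mut BTreeMap<(String, String), MaterializedView>,
    lens: &dyn Lens,
) -> MaterializeReport {
    let mut report = MaterializeReport::default();
    for (pair, candidates) in sp_index {
        match lens.resolve(candidates) {
            Ok(Some(winner)) => {
                let view = MaterializedView {
                    winner,
                    lens_name: lens.name().to_string(),
                    candidates_count: candidates.len(),
                };
                views.insert(pair.clone(), view);
                report.materialized += 1;
            }
            // The lens declined to pick a winner: nothing to store.
            Ok(None) => {}
            // Error resilience: log, count, and continue with the next pair.
            Err(e) => {
                eprintln!("materialize {:?} failed: {}", pair, e);
                report.failed += 1;
            }
        }
    }
    report
}
```

Storing lens_name and candidates_count alongside the winner mirrors the provenance fields of the real MaterializedView.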

Read Path Integration

The QueryEngine automatically uses the fast path when both subject and predicate are specified.

Fast Path (O(1)):

```text
QueryEngine::execute() -> MV:{subject}:{predicate} -> MaterializedView.winner -> QueryResult
```

Slow Path (O(N)):

```text
QueryEngine::execute() -> SP:{subject}:{predicate} -> [H:{hash}...] -> candidates -> filter -> QueryResult
```

The fast path is used when a materialized view exists and the winner matches query filters (lifecycle, epoch). The slow path is the fallback when no MV exists, the winner doesn't match filters, or only a subject is specified.

File: crates/stemedb-query/src/engine.rs - try_fast_path() method
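The following sketch shows the fast-path decision described above. The function name mirrors try_fast_path(), but its signature, the QueryFilters type and the exact-match treatment of lifecycle and epoch are assumptions made for illustration. The views map also stores only the winner instead of a full MaterializedView. A return value of None tells the caller to fall back to the SP: slow path.

```rust
use std::collections::HashMap;

// Hypothetical, simplified types; the real Assertion and filter types differ.
#[derive(Clone, Debug, PartialEq)]
pub struct Assertion {
    pub value: String,
    pub lifecycle: String,
    pub epoch: u64,
}

/// Hypothetical filters; `None` means "don't filter on this field".
#[derive(Default)]
pub struct QueryFilters {
    pub lifecycle: Option<String>,
    pub epoch: Option<u64>,
}

impl QueryFilters {
    fn matches(&self, a: &Assertion) -> bool {
        self.lifecycle.as_ref().map_or(true, |l| *l == a.lifecycle)
            && self.epoch.map_or(true, |e| e == a.epoch)
    }
}

/// Returns the pre-computed winner only if an MV exists AND the winner passes the filters.
/// `None` means: use the slow path.
pub fn try_fast_path(
    views: &HashMap<String, Assertion>,
    subject: &str,
    predicate: Option<&str>,
    filters: &QueryFilters,
) -> Option<Assertion> {
    // Subject-only queries have no single MV key, so they always take the slow path.
    let predicate = predicate?;
    // No materialized view yet: slow path.
    let winner = views.get(&format!("MV:{subject}:{predicate}"))?;
    // A winner that fails the filters is not the answer; let the slow path re-filter candidates.
    filters.matches(winner).then(|| winner.clone())
}
```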

Event-Driven Mode

The Materializer supports two operating modes:

Polling mode (run(interval)): Fixed-interval passes. This is simple, but it wastes cycles when idle, and a new write can wait up to a full interval before it is materialized.

Event-driven mode (run_notified(notify, max_interval)): The IngestWorker signals a tokio::sync::Notify after each successful record ingestion. The Materializer awaits this signal, running a pass immediately when new data arrives. A max_interval timeout acts as a safety net for missed notifications.

```text
IngestWorker::step() -> notify.notify_one() -> Materializer::run_notified() wakes -> step()
```

Wiring:

```rust
use std::{sync::Arc, time::Duration};
use tokio::sync::Notify;

let notify = Arc::new(Notify::new());
// The worker keeps one handle and calls notify_one() after each ingested record.
let worker = IngestWorker::new(journal, store.clone())
    .await?
    .with_notify(Arc::clone(&notify));
let materializer = Materializer::new(store, Box::new(lens));
// In separate tasks:
tokio::spawn(async move { worker.run().await });
tokio::spawn(async move { materializer.run_notified(notify, Duration::from_secs(30)).await });
```

File: crates/stemedb-ingest/src/worker.rs - with_notify() method
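The event-driven loop can be approximated without tokio by using std::sync::mpsc and Receiver::recv_timeout as a blocking stand-in for Notify plus a timeout. This substitutes a different primitive. Tokio's Notify::notify_one() stores at most one permit when no task is waiting, so a signal sent mid-pass is not lost and a burst of signals collapses into a single wakeup. A channel queues every message, so the sketch drains queued signals before each pass to get the same coalescing. The real run_notified() presumably loops indefinitely. This hypothetical version returns once every sender has been dropped, so it can terminate.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Blocking analogue of run_notified(): wait for a signal or max_interval, then run one pass.
/// Returns the number of passes run once every sender has been dropped.
pub fn run_notified<F: FnMut()>(signals: &Receiver<()>, max_interval: Duration, mut step: F) -> usize {
    let mut passes = 0;
    loop {
        match signals.recv_timeout(max_interval) {
            // New data: drain further queued signals so a burst costs one pass
            // (mirrors Notify holding at most one permit).
            Ok(()) => {
                while signals.try_recv().is_ok() {}
            }
            // Safety net: no signal within max_interval, run a pass anyway.
            Err(RecvTimeoutError::Timeout) => {}
            // Every producer handle is gone: stop.
            Err(RecvTimeoutError::Disconnected) => return passes,
        }
        step();
        passes += 1;
    }
}
```

In this analogue the producer side calls tx.send(()) where the IngestWorker would call notify.notify_one().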

Design Rationale

Why a Background Worker?

Inline materialization (on every write) would:

  1. Add latency to the write path
  2. Create contention when many agents write simultaneously
  3. Couple write and read concerns

The background worker approach:

  1. Keeps the write path fast (append-only)
  2. Batches resolution work: many writes to the same pair between passes cost a single resolution
  3. Tolerates temporary staleness (eventual consistency)

Why Store Metadata?

The MaterializedView includes lens_name, confidence, candidates_count, and materialized_at because:

  1. Provenance: Agents can verify how truth was determined
  2. Debugging: "Why does the system think Tesla's revenue is X?"
  3. Staleness detection: Readers can check materialized_at to decide if a slow-path re-resolution is needed
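A reader-side staleness check like the one below is one way to use materialized_at. The sketch assumes materialized_at holds seconds since the Unix epoch, because the unit is not specified above. ViewMeta, is_stale and now_secs are hypothetical names, and the max_age policy is left to the caller.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Minimal stand-in holding only the field the check needs.
pub struct ViewMeta {
    /// ASSUMPTION: seconds since the Unix epoch.
    pub materialized_at: u64,
}

/// True if the view is older than `max_age` at time `now_secs`,
/// i.e. the reader may prefer a slow-path re-resolution.
pub fn is_stale(view: &ViewMeta, now_secs: u64, max_age: Duration) -> bool {
    // saturating_sub guards against clock skew making materialized_at > now.
    now_secs.saturating_sub(view.materialized_at) > max_age.as_secs()
}

/// Current wall-clock time in seconds since the Unix epoch (0 if the clock is before it).
pub fn now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```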

Related

  • Ballot Box - Vote data consumed by the Materializer
  • Storage - KV layout and key patterns
  • Architecture - Section 3 (Write Path) and Section 4 (Read Path)