stemedb/ai-lookup/services/materializer.md
jordan 1ce4004807 feat: Complete Phase 2 (The Cortex) - query, lens, and API layers
This commit adds the read path (Cortex) to complement the write path (Spine):

## Crates
- stemedb-api: HTTP API with axum + utoipa OpenAPI
  - /v1/assert, /v1/query, /v1/epoch, /v1/skeptic, /v1/trace, /v1/audit
  - Metered endpoints with quota enforcement
  - Ed25519 signature verification
- stemedb-lens: Truth resolution lenses
  - RecencyLens, ConsensusLens, ConfidenceLens
  - VoteAwareConsensusLens (Ballot Box pattern)
  - TrustAwareAuthorityLens (The Hive pattern)
  - SkepticLens (conflict analysis)
  - EpochAwareLens (paradigm-safe queries)
- stemedb-query: Query engine with materialized views

## Storage Extensions
- VoteStore: Vote aggregation with cached counts
- TrustRankStore: Agent reputation with decay
- AuditStore: Query audit trail
- IndexStore: SP/P/S index structures
- SupersessionStore: Epoch supersession chains

## SDKs
- sdk/go/steme: Go HTTP client with Ed25519 signing
- sdk/go/adk: ADK-Go tools for AI agents

## Documentation
- Updated CLAUDE.md, architecture.md, roadmap.md
- New ai-lookup entries for all services
- Use case docs for consumer health intelligence
- Arena roadmap for simulation advancement

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 13:22:44 -07:00

# Materializer
**Last Updated:** 2026-01-31
**Confidence:** High
**Status:** Implemented
## Summary
The Materializer is a background worker that pre-computes winning assertions for each subject+predicate pair, storing them at `MV:{subject}:{predicate}` for O(1) reads. It bridges the gap between O(N) lens resolution and sub-millisecond query latency.
**Key Facts:**
- Scans all `SP:` compound indexes to discover pairs
- Resolves each pair through an `AsyncLens` (default: VoteAwareConsensus)
- Stores `MaterializedView` with winner + provenance metadata
- `step()` for one-shot, `run()` for polling loop, `run_notified()` for event-driven mode
- Event-driven: IngestWorker signals `tokio::sync::Notify` on new data; Materializer reacts immediately
- QueryEngine uses fast-path: checks `MV:` key before falling back to `SP:` index
- Error-resilient: individual pair failures are logged and skipped
**File Pointers:**
- `crates/stemedb-query/src/materializer.rs` - Materializer worker
- `crates/stemedb-core/src/types.rs` - MaterializedView type
## Storage Layout
| Key Pattern | Value | Purpose |
|-------------|-------|---------|
| `MV:{subject}:{predicate}` | Serialized `MaterializedView` | Pre-computed winner + metadata |
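
As a minimal sketch of the key layout above, the helpers below build the `MV:` key and the matching `SP:` index key for a pair. The function names `mv_key` and `sp_key` are hypothetical and do not come from the codebase; the sketch also assumes subject and predicate are joined verbatim.

```rust
/// Hypothetical helper: builds the key under which a materialized view is stored.
/// Assumes subject and predicate are joined verbatim after the `MV:` prefix.
pub fn mv_key(subject: &str, predicate: &str) -> String {
    format!("MV:{subject}:{predicate}")
}

/// Hypothetical helper: builds the compound-index key for the same pair.
pub fn sp_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}
```

If a subject or predicate can itself contain `:`, keys built this way become ambiguous. Whether the real implementation escapes or restricts such values is not stated here.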
## MaterializedView Type
```rust
pub struct MaterializedView {
    pub winner: Assertion,          // The resolved winner
    pub lens_name: String,          // Which lens produced this (e.g., "VoteAwareConsensus")
    pub resolution_confidence: f32, // Confidence in the resolution [0.0, 1.0]
    pub candidates_count: usize,    // How many candidates were considered
    pub materialized_at: u64,       // When this view was last computed
}
```
## Materializer Interface
```rust
pub struct Materializer<S> {
    store: Arc<S>,
    index_store: GenericIndexStore<Arc<S>>,
    lens: Box<dyn AsyncLens>,
}

// Signatures only; method bodies are omitted.
impl<S: KVStore + 'static> Materializer<S> {
    /// Create with any AsyncLens implementation
    pub fn new(store: Arc<S>, lens: Box<dyn AsyncLens>) -> Self;

    /// One full materialization pass over all SP: pairs
    pub async fn step(&self) -> Result<MaterializeReport>;

    /// Materialize a single subject+predicate pair
    pub async fn materialize_pair(&self, subject: &str, predicate: &str)
        -> Result<Option<MaterializedView>>;

    /// Read a pre-computed view (O(1))
    pub async fn get_materialized_view(&self, subject: &str, predicate: &str)
        -> Result<Option<MaterializedView>>;

    /// Run continuously with configurable interval (polling mode)
    pub async fn run(&self, interval: Duration);

    /// Run in event-driven mode, triggered by IngestWorker notifications
    pub async fn run_notified(&self, notify: Arc<Notify>, max_interval: Duration);
}
```
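
The error-resilient behaviour of a pass (a failing pair is logged and skipped rather than aborting the whole pass) can be sketched roughly as follows. This is a synchronous, std-only illustration, not the actual `step()` implementation. `PassReport`, `run_pass`, and the string-valued winner are hypothetical stand-ins, and the real `MaterializeReport` fields are not shown in this document.

```rust
use std::collections::HashMap;

/// Hypothetical summary of one pass (stand-in for `MaterializeReport`).
#[derive(Debug, Default, PartialEq)]
pub struct PassReport {
    pub materialized: usize, // pairs that produced a winner
    pub empty: usize,        // pairs with no candidates
    pub failed: usize,       // pairs whose resolution returned an error
}

/// Runs `resolve` for every pair and stores each winner under its `MV:` key.
/// A failing pair is logged and counted, then the loop moves on.
pub fn run_pass<F>(
    pairs: &[(String, String)],
    views: &mut HashMap<String, String>,
    mut resolve: F,
) -> PassReport
where
    F: FnMut(&str, &str) -> Result<Option<String>, String>,
{
    let mut report = PassReport::default();
    for (subject, predicate) in pairs {
        match resolve(subject.as_str(), predicate.as_str()) {
            Ok(Some(winner)) => {
                views.insert(format!("MV:{subject}:{predicate}"), winner);
                report.materialized += 1;
            }
            Ok(None) => report.empty += 1,
            Err(err) => {
                eprintln!("skipping {subject}:{predicate}: {err}");
                report.failed += 1;
            }
        }
    }
    report
}
```

Counting failures in the report, rather than returning the first error, lets one bad pair surface in monitoring without delaying every other view.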
## Read Path Integration
The `QueryEngine` automatically uses the fast path when both subject and predicate are specified.
**Fast Path (O(1)):**
```
QueryEngine::execute() -> MV:{subject}:{predicate} -> MaterializedView.winner -> QueryResult
```
**Slow Path (O(N)):**
```
QueryEngine::execute() -> SP:{subject}:{predicate} -> [H:{hash}...] -> candidates -> filter -> QueryResult
```
The fast path is used when a materialized view exists and the winner matches query filters (lifecycle, epoch). The slow path is the fallback when no MV exists, the winner doesn't match filters, or only a subject is specified.
**File:** `crates/stemedb-query/src/engine.rs` - `try_fast_path()` method
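
The fast-path/slow-path decision can be sketched with an in-memory stand-in for the KV store. Everything below is hypothetical and is not the code in `try_fast_path()`. `Candidate`, `Store`, `Path`, and `lookup` are illustrative names, and a single `min_epoch` threshold stands in for the real lifecycle and epoch filters.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an assertion; only the field this sketch filters on.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub value: String,
    pub epoch: u64,
}

/// Which path answered the query.
#[derive(Debug, PartialEq)]
pub enum Path {
    Fast,
    Slow,
}

/// Hypothetical in-memory stand-in for the KV store.
#[derive(Default)]
pub struct Store {
    pub views: HashMap<String, Candidate>,      // MV:{subject}:{predicate}
    pub index: HashMap<String, Vec<Candidate>>, // SP:{subject}:{predicate}
}

/// Returns the stored winner when one exists and passes the filter;
/// otherwise falls back to filtering every candidate in the SP index.
pub fn lookup(
    store: &Store,
    subject: &str,
    predicate: &str,
    min_epoch: u64,
) -> (Vec<Candidate>, Path) {
    let mv_key = format!("MV:{subject}:{predicate}");
    if let Some(winner) = store.views.get(&mv_key) {
        if winner.epoch >= min_epoch {
            return (vec![winner.clone()], Path::Fast);
        }
    }
    let sp_key = format!("SP:{subject}:{predicate}");
    let candidates: Vec<Candidate> = store
        .index
        .get(&sp_key)
        .map(|all| all.iter().filter(|c| c.epoch >= min_epoch).cloned().collect())
        .unwrap_or_default();
    (candidates, Path::Slow)
}
```

Note that a winner which fails the filter does not produce an empty result. The query falls through to the slow path, because another candidate may still match.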
## Event-Driven Mode
The Materializer supports two operating modes:
**Polling mode** (`run(interval)`): Fixed-interval passes. Simple, but it wastes cycles when idle and can delay a fresh view by up to one interval after a write.
**Event-driven mode** (`run_notified(notify, max_interval)`): The IngestWorker signals a `tokio::sync::Notify` after each successful record ingestion. The Materializer awaits this signal, running a pass immediately when new data arrives. A `max_interval` timeout acts as a safety net for missed notifications.
```
IngestWorker::step() -> notify.notify_one() -> Materializer::run_notified() wakes -> step()
```
**Wiring:**
```rust
use std::{sync::Arc, time::Duration};
// `journal`, `store` (an `Arc<S>`), and `lens` (any `AsyncLens`) are assumed to be in scope.
let notify = Arc::new(tokio::sync::Notify::new());
let worker = IngestWorker::new(journal, store.clone())
    .await?
    .with_notify(Arc::clone(&notify));
let materializer = Materializer::new(store, Box::new(lens));
// In separate tasks:
tokio::spawn(async move { worker.run().await });
tokio::spawn(async move { materializer.run_notified(notify, Duration::from_secs(30)).await });
```
**File:** `crates/stemedb-ingest/src/worker.rs` - `with_notify()` method
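
The "wake on signal, or after `max_interval` at the latest" behaviour can be illustrated without an async runtime. The sketch below deliberately swaps `tokio::sync::Notify` for a standard-library channel so that it compiles with no external crates. `Wake` and `wait_for_wake` are hypothetical names, and the real `run_notified()` may be structured differently.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why the materializer loop woke up.
#[derive(Debug, PartialEq)]
pub enum Wake {
    Notified, // the writer signalled new data
    Timeout,  // safety net: max_interval elapsed with no signal
    Shutdown, // every sender was dropped
}

/// Blocks until a signal arrives or `max_interval` elapses.
/// Queued extra signals are drained so a burst of writes triggers one pass.
pub fn wait_for_wake(signals: &Receiver<()>, max_interval: Duration) -> Wake {
    match signals.recv_timeout(max_interval) {
        Ok(()) => {
            while signals.try_recv().is_ok() {}
            Wake::Notified
        }
        Err(RecvTimeoutError::Timeout) => Wake::Timeout,
        Err(RecvTimeoutError::Disconnected) => Wake::Shutdown,
    }
}
```

A loop built on this would run a pass on both `Notified` and `Timeout` and exit on `Shutdown`. Draining the channel loosely mirrors how `Notify` stores at most one permit, so repeated `notify_one()` calls made before the waiter wakes collapse into a single wake-up.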
## Design Rationale
### Why a Background Worker?
Inline materialization (on every write) would:
1. Add latency to the write path
2. Create contention when many agents write simultaneously
3. Couple write and read concerns
The background worker approach:
1. Keeps the write path fast (append-only)
2. Batches resolution work efficiently
3. Tolerates temporary staleness (eventual consistency)
### Why Store Metadata?
The `MaterializedView` includes `lens_name`, `resolution_confidence`, `candidates_count`, and `materialized_at` because:
1. **Provenance:** Agents can verify how truth was determined
2. **Debugging:** "Why does the system think Tesla's revenue is X?"
3. **Staleness detection:** Readers can check `materialized_at` to decide if a slow-path re-resolution is needed
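
A reader-side staleness check along the lines of point 3 might look like the sketch below. `is_stale` and `max_age_secs` are hypothetical, and the sketch assumes `materialized_at` is a Unix timestamp in seconds. This document does not state the unit.

```rust
/// Returns true when a view computed at `materialized_at` is older than `max_age_secs`.
/// Assumes both timestamps are Unix seconds (an assumption; the unit is not documented here).
pub fn is_stale(materialized_at: u64, now: u64, max_age_secs: u64) -> bool {
    // saturating_sub treats a view stamped in the future (clock skew) as fresh
    // instead of underflowing.
    now.saturating_sub(materialized_at) > max_age_secs
}
```

A reader that finds a view stale could fall back to slow-path resolution for that query, accepting the O(N) cost in exchange for fresher data.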
## Related Topics
- [Ballot Box](./ballot-box.md) - Vote data consumed by the Materializer
- [Storage](./storage.md) - KV layout and key patterns
- [Architecture](../../architecture.md) - Section 3 (Write Path) and Section 4 (Read Path)