feat: WAL hardening (Phase 5B) - CRC32C, crash recovery, group commit, log rotation

Add CRC32C checksums to WAL record format (v2), implement crash recovery
with automatic truncation of corrupt records, add feature-gated group commit
buffer for batched fsync under concurrent load, and implement log rotation
via segment files with global offset addressing.
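
As an illustration of global offset addressing across segment files, here is a minimal sketch. The `Segment` struct, `resolve`, and `cleanup` are hypothetical names, not the actual `SegmentManager` API, and the sketch assumes segments are sorted by base offset and that the last segment is the active one.

```rust
/// Hypothetical descriptor for one WAL segment file. It covers the global
/// offsets `base_offset .. base_offset + len`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Segment {
    pub base_offset: u64,
    pub len: u64,
}

/// Resolve a global offset to (segment index, offset inside that segment).
/// `segments` must be sorted by `base_offset`.
pub fn resolve(segments: &[Segment], global: u64) -> Option<(usize, u64)> {
    // Binary search: count the segments whose base offset is <= `global`.
    let idx = segments.partition_point(|s| s.base_offset <= global);
    if idx == 0 {
        return None; // offset is older than the oldest retained segment
    }
    let i = idx - 1;
    let local = global - segments[i].base_offset;
    if local < segments[i].len {
        Some((i, local))
    } else {
        None // offset is past the end of the candidate segment
    }
}

/// Drop whole segments that end at or before `cursor` (for example the
/// position a reader has fully consumed). The last segment is always kept
/// because it is assumed to be the one still being written.
pub fn cleanup(segments: &mut Vec<Segment>, cursor: u64) -> usize {
    let fully_consumed = segments.partition_point(|s| s.base_offset + s.len <= cursor);
    let n = fully_consumed.min(segments.len().saturating_sub(1));
    segments.drain(..n);
    n
}
```

Because offsets stay global, removing old segments does not change how the remaining offsets resolve; only the index of the containing segment shifts.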

Key changes:
- Record format v2: [len:u32][crc32c:u32][blake3:32][payload:N]
- recover_file() scans and truncates corrupt tail records
- GroupCommitBuffer batches fsync via MPSC channel (tokio feature gate)
- SegmentManager with binary search resolution and cursor-based cleanup
- Journal::read() auto-refreshes segments on miss for writer/reader split
- Split recovery.rs and key_codec.rs into directory modules to stay within the 500-line-per-file maximum
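
To make the v2 record layout and tail truncation concrete, here is a rough sketch rather than the code in this commit. It assumes little-endian integers, that `len` counts payload bytes only, and that the CRC32C covers the BLAKE3 hash plus the payload; none of these details are stated above. `encode_record` and `valid_prefix_len` are hypothetical names, and the 32-byte hash is treated as an opaque value computed elsewhere.

```rust
/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// [len:u32][crc32c:u32][blake3:32] precede the payload.
pub const HEADER_LEN: usize = 4 + 4 + 32;

/// Encode one record. Assumption: the CRC covers hash + payload.
pub fn encode_record(hash: &[u8; 32], payload: &[u8]) -> Vec<u8> {
    let mut body = Vec::with_capacity(32 + payload.len());
    body.extend_from_slice(hash);
    body.extend_from_slice(payload);

    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32c(&body).to_le_bytes());
    out.extend_from_slice(&body);
    out
}

fn read_u32_le(buf: &[u8]) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[..4]);
    u32::from_le_bytes(b)
}

/// Scan records from the start and return the byte length of the longest
/// valid prefix. A recovery pass would truncate the file to this length.
pub fn valid_prefix_len(log: &[u8]) -> usize {
    let mut pos = 0;
    while log.len() - pos >= HEADER_LEN {
        let len = read_u32_le(&log[pos..]) as usize;
        let crc = read_u32_le(&log[pos + 4..]);
        let end = match (pos + HEADER_LEN).checked_add(len) {
            Some(e) if e <= log.len() => e,
            _ => break, // record claims more bytes than the file holds
        };
        if crc32c(&log[pos + 8..end]) != crc {
            break; // checksum mismatch: treat this and everything after as corrupt
        }
        pos = end;
    }
    pos
}
```

Stopping at the first bad record means a torn write at the tail costs only the unfinished record, at the price of also discarding anything that follows mid-file corruption.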

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jordan 2026-02-02 12:36:35 -07:00
parent 55349845d0
commit 3320c24afa
100 changed files with 5034 additions and 1629 deletions

@ -30,7 +30,7 @@ The Arena simulation tests fundamental write→read paths through the system, bu
``` ```
Agent.sign_assertion() → write_assertion_to_wal() → Journal.append() Agent.sign_assertion() → write_assertion_to_wal() → Journal.append()
→ IngestWorker.step() → IngestWorker.ingest_assertion() → IngestWorker.step() → IngestWorker.ingest_assertion()
SledStore.put() → IndexStore.add_to_indexes() HybridStore.put() → IndexStore.add_to_indexes()
``` ```
**What Works:** **What Works:**

@ -55,10 +55,10 @@ trust_store.decay_trust_ranks(current_timestamp, Some(custom_half_life)).await?;
```rust ```rust
use stemedb_lens::TrustAwareAuthorityLens; use stemedb_lens::TrustAwareAuthorityLens;
use stemedb_storage::{SledStore, GenericTrustRankStore}; use stemedb_storage::{HybridStore, GenericTrustRankStore};
use std::sync::Arc; use std::sync::Arc;
let store = SledStore::open("./data")?; let store = HybridStore::open("./data")?;
let trust_store = Arc::new(GenericTrustRankStore::new(store)); let trust_store = Arc::new(GenericTrustRankStore::new(store));
let lens = TrustAwareAuthorityLens::new(trust_store); let lens = TrustAwareAuthorityLens::new(trust_store);

@ -36,7 +36,7 @@ pub enum StemeError {
InvalidSignature { agent: AgentId }, InvalidSignature { agent: AgentId },
#[error("storage error: {0}")] #[error("storage error: {0}")]
Storage(#[from] sled::Error), Storage(String),
#[error("serialization error: {0}")] #[error("serialization error: {0}")]
Serialization(String), Serialization(String),

@ -53,11 +53,11 @@ pub trait VoteStore: Send + Sync {
## Usage Example ## Usage Example
```rust ```rust
use stemedb_storage::{SledStore, GenericVoteStore, VoteStore}; use stemedb_storage::{HybridStore, GenericVoteStore, VoteStore};
use stemedb_core::types::Vote; use stemedb_core::types::Vote;
// Create vote store backed by sled // Create vote store backed by HybridStore (fjall + redb)
let kv_store = SledStore::open("./data")?; let kv_store = HybridStore::open("./data")?;
let vote_store = GenericVoteStore::new(kv_store); let vote_store = GenericVoteStore::new(kv_store);
// High-velocity vote ingestion // High-velocity vote ingestion

@ -5,12 +5,12 @@
## Purpose ## Purpose
The Ingestor is the background worker that bridges the Write-Ahead Log (WAL) to the KV storage engine. It continuously tails the WAL and persists records to sled using content-addressed keys. The Ingestor is the background worker that bridges the Write-Ahead Log (WAL) to the KV storage engine. It continuously tails the WAL and persists records to the HybridStore (fjall + redb) using content-addressed keys.
## Architecture ## Architecture
``` ```
[WAL Journal] ---> [IngestWorker] ---> [KVStore (sled)] [WAL Journal] ---> [IngestWorker] ---> [KVStore (HybridStore)]
| |
v v
[Subject Index] [Subject Index]
@ -39,11 +39,11 @@ Discriminator for WAL payloads (8-byte aligned header):
```rust ```rust
use stemedb_ingest::{Ingestor, serialize_assertion}; use stemedb_ingest::{Ingestor, serialize_assertion};
use stemedb_wal::Journal; use stemedb_wal::Journal;
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
// Create components // Create components
let journal = Arc::new(Mutex::new(Journal::open("./wal")?)); let journal = Arc::new(Mutex::new(Journal::open("./wal")?));
let store = Arc::new(SledStore::open("./db")?); let store = Arc::new(HybridStore::open("./db")?);
// Create and start ingestor // Create and start ingestor
let mut ingestor = Ingestor::new(journal.clone(), store); let mut ingestor = Ingestor::new(journal.clone(), store);
@ -79,5 +79,5 @@ The ingestor has integration tests covering:
## Related ## Related
- [Storage Service](./storage.md) - KVStore trait and SledStore - [Storage Service](./storage.md) - KVStore trait and HybridStore (fjall + redb)
- [Content Addressing](../patterns/content-addressing.md) - BLAKE3 hashing - [Content Addressing](../patterns/content-addressing.md) - BLAKE3 hashing

@ -74,10 +74,10 @@ confidence = winner_weight / total_weight_across_all_candidates
**Example:** **Example:**
```rust ```rust
use stemedb_lens::VoteAwareConsensusLens; use stemedb_lens::VoteAwareConsensusLens;
use stemedb_storage::{SledStore, GenericVoteStore}; use stemedb_storage::{HybridStore, GenericVoteStore};
use std::sync::Arc; use std::sync::Arc;
let store = SledStore::open("./data").await?; let store = HybridStore::open("./data").await?;
let vote_store = Arc::new(GenericVoteStore::new(store)); let vote_store = Arc::new(GenericVoteStore::new(store));
let lens = VoteAwareConsensusLens::new(vote_store); let lens = VoteAwareConsensusLens::new(vote_store);
@ -112,10 +112,10 @@ confidence = weighted_score // Direct weighted score
**Example:** **Example:**
```rust ```rust
use stemedb_lens::TrustAwareAuthorityLens; use stemedb_lens::TrustAwareAuthorityLens;
use stemedb_storage::{SledStore, GenericTrustRankStore}; use stemedb_storage::{HybridStore, GenericTrustRankStore};
use std::sync::Arc; use std::sync::Arc;
let store = SledStore::open("./data").await?; let store = HybridStore::open("./data").await?;
let trust_store = Arc::new(GenericTrustRankStore::new(store)); let trust_store = Arc::new(GenericTrustRankStore::new(store));
let lens = TrustAwareAuthorityLens::new(trust_store); let lens = TrustAwareAuthorityLens::new(trust_store);
@ -189,10 +189,10 @@ GET /v1/query?subject=Acme&predicate=lease_liability&lens=EpochAware
**Example:** **Example:**
```rust ```rust
use stemedb_lens::EpochAwareLens; use stemedb_lens::EpochAwareLens;
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
use std::sync::Arc; use std::sync::Arc;
let store = Arc::new(SledStore::open("./data").expect("store")); let store = Arc::new(HybridStore::open("./data").expect("store"));
// Default: filter superseded epochs, then pick most recent // Default: filter superseded epochs, then pick most recent
let lens = EpochAwareLens::with_recency(store.clone()); let lens = EpochAwareLens::with_recency(store.clone());
@ -250,10 +250,10 @@ GET /v1/skeptic?subject=Semaglutide&predicate=muscle_effect
**Example:** **Example:**
```rust ```rust
use stemedb_lens::SkepticLens; use stemedb_lens::SkepticLens;
use stemedb_storage::{SledStore, GenericVoteStore, GenericTrustRankStore}; use stemedb_storage::{HybridStore, GenericVoteStore, GenericTrustRankStore};
use std::sync::Arc; use std::sync::Arc;
let store = SledStore::open("./data").await?; let store = HybridStore::open("./data").await?;
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let trust_store = Arc::new(GenericTrustRankStore::new(store)); let trust_store = Arc::new(GenericTrustRankStore::new(store));
let lens = SkepticLens::new(vote_store, trust_store); let lens = SkepticLens::new(vote_store, trust_store);

@ -10,12 +10,14 @@ Episteme uses a Log-Structured, Content-Addressed storage model. Writes append t
**Key Facts:** **Key Facts:**
- Append-only (never mutate) - Append-only (never mutate)
- WAL for durability (fsync on write) - WAL for durability (fsync on write)
- KV store for indexes (sled MVP, trait-abstracted) - KV store: HybridStore (fjall for writes, redb for reads)
- Content-addressed by BLAKE3 hash - Content-addressed by BLAKE3 hash
**File Pointers:** **File Pointers:**
- `crates/stemedb-storage/src/traits.rs` - KVStore trait - `crates/stemedb-storage/src/traits.rs` - KVStore trait
- `crates/stemedb-storage/src/sled_backend.rs` - Sled implementation - `crates/stemedb-storage/src/hybrid_backend.rs` - HybridStore (routes to fjall or redb)
- `crates/stemedb-storage/src/fjall_backend.rs` - FjallStore (write-heavy keys)
- `crates/stemedb-storage/src/redb_backend.rs` - RedbStore (read-heavy keys)
- `crates/stemedb-storage/src/serde_helpers.rs` - Storage-layer serialize/deserialize helpers - `crates/stemedb-storage/src/serde_helpers.rs` - Storage-layer serialize/deserialize helpers
- `crates/stemedb-storage/src/vote_store.rs` - VoteStore (Ballot Box) - `crates/stemedb-storage/src/vote_store.rs` - VoteStore (Ballot Box)
- `crates/stemedb-storage/src/index_store.rs` - IndexStore (S: and SP: indexes) - `crates/stemedb-storage/src/index_store.rs` - IndexStore (S: and SP: indexes)

Binary file not shown.

@ -1,10 +1,10 @@
# Sentinel Roadmap # Aphoria Roadmap
--- ---
## Phase 0: StemeDB Foundation ## Phase 0: StemeDB Foundation
Changes to the core database that Sentinel depends on. These ship before the CLI. Changes to the core database that Aphoria depends on. These ship before the CLI.
### 0.1 ConceptPath Type ### 0.1 ConceptPath Type
@ -53,7 +53,7 @@ GET /v1/concepts/suggest Suggested aliases (shared leaf detection)
## Phase 1: Authoritative Corpus ## Phase 1: Authoritative Corpus
Before Sentinel can find conflicts, Episteme needs the authoritative sources to conflict against. Before Aphoria can find conflicts, Episteme needs the authoritative sources to conflict against.
### 1.1 RFC Ingester ### 1.1 RFC Ingester
@ -94,13 +94,13 @@ For v1, manually curate a small set of vendor doc claims:
These are `vendor://{product}/{topic}/{claim}` at Tier 2. These are `vendor://{product}/{topic}/{claim}` at Tier 2.
This doesn't need to be exhaustive. It needs to cover the claims that Sentinel's extractors will actually find in code. This doesn't need to be exhaustive. It needs to cover the claims that Aphoria's extractors will actually find in code.
--- ---
## Phase 2: CLI Core ## Phase 2: CLI Core
The Sentinel binary itself. The Aphoria binary itself.
### 2.1 Project Walker ### 2.1 Project Walker
@ -174,7 +174,7 @@ The bridge handles:
- ConceptPath construction from extractor output - ConceptPath construction from extractor output
- Source hash computation (BLAKE3 of the file at scan time) - Source hash computation (BLAKE3 of the file at scan time)
- Source metadata encoding (file path, line number, extraction method) - Source metadata encoding (file path, line number, extraction method)
- Signing with the Sentinel agent's keypair - Signing with the Aphoria agent's keypair
### 2.4 Conflict Query ### 2.4 Conflict Query
@ -201,10 +201,10 @@ The Skeptic lens returns all claims for the concept across all aliased paths, wi
### 2.5 Report Output ### 2.5 Report Output
``` ```
$ sentinel scan ./citadeldb --format table $ aphoria scan ./citadeldb --format table
┌──────────────────────────────────────────────────────────────────────┐ ┌──────────────────────────────────────────────────────────────────────┐
Sentinel Report: citadeldb Aphoria Report: citadeldb │
│ Scanned: 142 files │ Claims: 23 │ Conflicts: 3 │ │ Scanned: 142 files │ Claims: 23 │ Conflicts: 3 │
├──────────┬───────────────────────────────────────┬──────────┬───────┤ ├──────────┬───────────────────────────────────────┬──────────┬───────┤
│ Verdict │ Concept │ Score │ Tier │ │ Verdict │ Concept │ Score │ Tier │
@ -219,12 +219,12 @@ Details:
BLOCK code://rust/citadeldb/auth/jwt/audience_validation BLOCK code://rust/citadeldb/auth/jwt/audience_validation
Your code: aud validation disabled (src/auth/jwt.rs:47) Your code: aud validation disabled (src/auth/jwt.rs:47)
RFC 7519: aud validation MUST be enabled (Tier 0) RFC 7519: aud validation MUST be enabled (Tier 0)
Action: Fix or acknowledge with: sentinel ack <path> --reason "..." Action: Fix or acknowledge with: aphoria ack <path> --reason "..."
BLOCK code://rust/citadeldb/net/tls/cert_verification BLOCK code://rust/citadeldb/net/tls/cert_verification
Your code: verify = false (src/net/client.rs:23) Your code: verify = false (src/net/client.rs:23)
OWASP: verification required (Tier 1) OWASP: verification required (Tier 1)
Action: Fix or acknowledge with: sentinel ack <path> --reason "..." Action: Fix or acknowledge with: aphoria ack <path> --reason "..."
FLAG code://rust/citadeldb/http/timeout FLAG code://rust/citadeldb/http/timeout
Your code: timeout = 0 (infinite) (config/production.yaml:8) Your code: timeout = 0 (infinite) (config/production.yaml:8)
@ -237,7 +237,7 @@ Output formats: `table` (default), `json`, `sarif` (for CI integration), `markdo
### 2.6 Acknowledge Command ### 2.6 Acknowledge Command
``` ```
$ sentinel ack code://rust/citadeldb/auth/jwt/audience_validation \ $ aphoria ack code://rust/citadeldb/auth/jwt/audience_validation \
--reason "Internal service, no external JWT consumers. Accepted risk per SEC-2024-003." --reason "Internal service, no external JWT consumers. Accepted risk per SEC-2024-003."
``` ```
@ -256,27 +256,27 @@ The conflict still exists in Episteme, but the acknowledgment is recorded. Next
### 3.1 Claude Code Skill ### 3.1 Claude Code Skill
A `/sentinel` skill that wraps the CLI: A `/aphoria` skill that wraps the CLI:
``` ```
/sentinel scan Scan current project, report conflicts /aphoria scan Scan current project, report conflicts
/sentinel scan --fix Scan and offer to fix each conflict /aphoria scan --fix Scan and offer to fix each conflict
/sentinel ack <path> Acknowledge a conflict with a reason /aphoria ack <path> Acknowledge a conflict with a reason
/sentinel status Show current conflict summary /aphoria status Show current conflict summary
/sentinel diff Show new conflicts since last scan /aphoria diff Show new conflicts since last scan
``` ```
The skill runs the CLI binary, parses the JSON output, and presents results inline in the Claude Code session. The skill runs the CLI binary, parses the JSON output, and presents results inline in the Claude Code session.
### 3.2 Agent Pre-Flight Hook ### 3.2 Agent Pre-Flight Hook
A Claude Code hook that runs Sentinel before certain operations: A Claude Code hook that runs Aphoria before certain operations:
```json ```json
{ {
"hooks": { "hooks": {
"pre-commit": "sentinel scan --format sarif --exit-code", "pre-commit": "aphoria scan --format sarif --exit-code",
"pre-deploy": "sentinel scan --strict --exit-code" "pre-deploy": "aphoria scan --strict --exit-code"
} }
} }
``` ```
@ -285,7 +285,7 @@ A Claude Code hook that runs Sentinel before certain operations:
### 3.3 Alias Suggestion Workflow ### 3.3 Alias Suggestion Workflow
When Sentinel scans a new project and finds concepts that share leaf names with existing authoritative paths, it prompts: When Aphoria scans a new project and finds concepts that share leaf names with existing authoritative paths, it prompts:
``` ```
New concept detected: code://rust/newproject/auth/jwt/audience_validation New concept detected: code://rust/newproject/auth/jwt/audience_validation
@ -305,8 +305,8 @@ Accepting creates the alias. Deferring flags it for later review. Rejecting reco
### 4.1 GitHub Action ### 4.1 GitHub Action
```yaml ```yaml
- name: Sentinel Scan - name: Aphoria Scan
uses: orchard9/sentinel-action@v1 uses: orchard9/aphoria-action@v1
with: with:
episteme-url: ${{ secrets.EPISTEME_URL }} episteme-url: ${{ secrets.EPISTEME_URL }}
fail-on: block fail-on: block
@ -317,10 +317,10 @@ Publishes SARIF results to GitHub Security tab. BLOCK verdicts fail the check. F
### 4.2 PR Comment Bot ### 4.2 PR Comment Bot
On pull request, Sentinel scans the diff (not the whole project) and comments: On pull request, Aphoria scans the diff (not the whole project) and comments:
``` ```
## Sentinel Report ## Aphoria Report
This PR introduces 1 new conflict: This PR introduces 1 new conflict:
@ -328,15 +328,15 @@ This PR introduces 1 new conflict:
|------|----------|-------| |------|----------|-------|
| src/auth/jwt.rs:47 | Disables aud validation (RFC 7519 requires it) | 0.92 | | src/auth/jwt.rs:47 | Disables aud validation (RFC 7519 requires it) | 0.92 |
Run `sentinel ack` to acknowledge, or fix before merge. Run `aphoria ack` to acknowledge, or fix before merge.
``` ```
### 4.3 Baseline Mode ### 4.3 Baseline Mode
For existing projects with many conflicts, `sentinel baseline` records the current state. Subsequent scans only report *new* conflicts. This prevents the "500 warnings so we ignore all of them" problem. For existing projects with many conflicts, `aphoria baseline` records the current state. Subsequent scans only report *new* conflicts. This prevents the "500 warnings so we ignore all of them" problem.
``` ```
$ sentinel baseline $ aphoria baseline
Baseline recorded: 12 existing conflicts frozen. Baseline recorded: 12 existing conflicts frozen.
Future scans will only report new conflicts. Future scans will only report new conflicts.
``` ```
@ -347,7 +347,7 @@ Future scans will only report new conflicts.
### 5.1 Gap Detection ### 5.1 Gap Detection
When Sentinel extracts a claim and no authoritative source exists for that concept, log it as a gap: When Aphoria extracts a claim and no authoritative source exists for that concept, log it as a gap:
``` ```
GAP: code://rust/citadeldb/cache/redis/max_memory_policy GAP: code://rust/citadeldb/cache/redis/max_memory_policy
@ -363,11 +363,11 @@ When a gap is seen across N projects (configurable, default 3), dispatch a resea
2. Finds Redis official docs 2. Finds Redis official docs
3. Extracts normative claims: "default is `noeviction`, recommended `allkeys-lru` for cache use cases" 3. Extracts normative claims: "default is `noeviction`, recommended `allkeys-lru` for cache use cases"
4. Ingests as `vendor://redis/cache/max_memory_policy` at Tier 2 4. Ingests as `vendor://redis/cache/max_memory_policy` at Tier 2
5. Future Sentinel scans now have something to conflict against 5. Future Aphoria scans now have something to conflict against
### 5.3 Community Corpus Contributions ### 5.3 Community Corpus Contributions
Users who run Sentinel can opt in to contribute their alias mappings and acknowledgment patterns (anonymized) to a shared corpus. Common patterns propagate: Users who run Aphoria can opt in to contribute their alias mappings and acknowledgment patterns (anonymized) to a shared corpus. Common patterns propagate:
- "Every Rust project has this JWT pattern" → pre-built alias set for Rust JWT libraries - "Every Rust project has this JWT pattern" → pre-built alias set for Rust JWT libraries
- "This Redis config is always flagged and always acknowledged" → lower the default threshold for that concept - "This Redis config is always flagged and always acknowledged" → lower the default threshold for that concept
@ -381,7 +381,7 @@ Users who run Sentinel can opt in to contribute their alias mappings and acknowl
|-------|-------------|------------| |-------|-------------|------------|
| 0 | ConceptPath in StemeDB | concept-hierarchy spec | | 0 | ConceptPath in StemeDB | concept-hierarchy spec |
| 1 | Authoritative corpus (RFCs, OWASP) | Phase 0 | | 1 | Authoritative corpus (RFCs, OWASP) | Phase 0 |
| 2 | Sentinel CLI (scan, report, ack) | Phase 0, Phase 1 | | 2 | Aphoria CLI (scan, report, ack) | Phase 0, Phase 1 |
| 3 | Claude Code skill + hooks | Phase 2 | | 3 | Claude Code skill + hooks | Phase 2 |
| 4 | CI integration (GitHub Action, PR bot) | Phase 2 | | 4 | CI integration (GitHub Action, PR bot) | Phase 2 |
| 5 | Research agent loop | Phase 2, Phase 4 (gap data) | | 5 | Research agent loop | Phase 2, Phase 4 (gap data) |

@ -1,4 +1,4 @@
# Sentinel Technical Spec # Aphoria Technical Spec
**Status:** Draft **Status:** Draft
**Date:** 2026-02-02 **Date:** 2026-02-02
@ -7,14 +7,14 @@
## Overview ## Overview
Sentinel is a CLI binary that scans codebases, extracts implicit claims from config and code, ingests them into a local Episteme instance, and reports conflicts against authoritative sources. Aphoria is a CLI binary that scans codebases, extracts implicit claims from config and code, ingests them into a local Episteme instance, and reports conflicts against authoritative sources.
``` ```
sentinel scan <project-root> [--config sentinel.toml] [--format table|json|sarif|markdown] aphoria scan <project-root> [--config aphoria.toml] [--format table|json|sarif|markdown]
sentinel ack <concept-path> --reason "..." aphoria ack <concept-path> --reason "..."
sentinel baseline aphoria baseline
sentinel diff aphoria diff
sentinel status aphoria status
``` ```
--- ---
@ -23,7 +23,7 @@ sentinel status
``` ```
┌──────────────────────────────────────────────────────────────┐ ┌──────────────────────────────────────────────────────────────┐
sentinel CLI │ aphoria CLI │
│ │ │ │
│ ┌──────────┐ ┌────────────┐ ┌──────────┐ ┌────────┐ │ │ ┌──────────┐ ┌────────────┐ ┌──────────┐ ┌────────┐ │
│ │ Walker │──▶│ Extractors │──▶│ Ingester │──▶│ Report │ │ │ │ Walker │──▶│ Extractors │──▶│ Ingester │──▶│ Report │ │
@ -46,13 +46,13 @@ sentinel status
└──────────────────────────────────────────────────────────────┘ └──────────────────────────────────────────────────────────────┘
``` ```
Sentinel depends on: Aphoria depends on:
- `stemedb-core` (types: ConceptPath, Assertion, SourceClass) - `stemedb-core` (types: ConceptPath, Assertion, SourceClass)
- `stemedb-storage` (KVStore, IndexStore, AliasStore) - `stemedb-storage` (KVStore, IndexStore, AliasStore)
- `stemedb-ingest` (ingestion pipeline) - `stemedb-ingest` (ingestion pipeline)
- `stemedb-query` (query engine, lenses) - `stemedb-query` (query engine, lenses)
It does **not** depend on `stemedb-api`. Sentinel talks to Episteme directly through the Rust crate APIs, not over HTTP. This makes it fast (no network round-trip) and self-contained (no server process needed). It does **not** depend on `stemedb-api`. Aphoria talks to Episteme directly through the Rust crate APIs, not over HTTP. This makes it fast (no network round-trip) and self-contained (no server process needed).
--- ---
@ -60,11 +60,11 @@ It does **not** depend on `stemedb-api`. Sentinel talks to Episteme directly thr
``` ```
crates/ crates/
sentinel/ aphoria/
Cargo.toml Cargo.toml
src/ src/
main.rs CLI entrypoint (clap) main.rs CLI entrypoint (clap)
config.rs sentinel.toml parsing config.rs aphoria.toml parsing
walker/ walker/
mod.rs Project walker orchestration mod.rs Project walker orchestration
language.rs Language detection language.rs Language detection
@ -96,7 +96,7 @@ crates/
## Configuration ## Configuration
`sentinel.toml` at project root (optional, sensible defaults): `aphoria.toml` at project root (optional, sensible defaults):
```toml ```toml
[project] [project]
@ -104,7 +104,7 @@ name = "citadeldb"
language = "rust" # auto-detected if omitted language = "rust" # auto-detected if omitted
[episteme] [episteme]
data_dir = "~/.sentinel/db" # local Episteme instance data_dir = "~/.aphoria/db" # local Episteme instance
# url = "http://localhost:3000" # future: remote instance # url = "http://localhost:3000" # future: remote instance
[thresholds] [thresholds]
@ -121,7 +121,7 @@ min_reasonable_ms = 1000 # flag timeouts below this
max_reasonable_ms = 300000 # flag timeouts above this max_reasonable_ms = 300000 # flag timeouts above this
[extractors.dep_versions] [extractors.dep_versions]
advisory_db = "~/.sentinel/advisory-db" # rustsec/advisory-db clone advisory_db = "~/.aphoria/advisory-db" # rustsec/advisory-db clone
[scan] [scan]
exclude = ["target/", "node_modules/", ".git/", "vendor/"] exclude = ["target/", "node_modules/", ".git/", "vendor/"]
@ -139,7 +139,7 @@ auto_accept_tier0 = true # auto-accept alias suggestions to Tier 0 sources
### Language Detection ### Language Detection
Priority order: Priority order:
1. Explicit `language` in `sentinel.toml` 1. Explicit `language` in `aphoria.toml`
2. Dominant language heuristic (count files by extension) 2. Dominant language heuristic (count files by extension)
3. Per-file extension mapping 3. Per-file extension mapping
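
The priority order above could be sketched as follows. This is only an illustration: the extension table and the function names `file_language` and `project_language` are hypothetical, and the alphabetical tie-break is an assumption the spec does not state.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical extension table; the real mapping is not shown in the spec.
fn language_for_extension(ext: &str) -> Option<&'static str> {
    match ext {
        "rs" => Some("rust"),
        "go" => Some("go"),
        "py" => Some("python"),
        "ts" | "tsx" => Some("typescript"),
        "js" | "jsx" => Some("javascript"),
        _ => None,
    }
}

/// Step 3: per-file mapping from the file extension.
pub fn file_language(path: &Path) -> Option<&'static str> {
    path.extension()
        .and_then(|e| e.to_str())
        .and_then(language_for_extension)
}

/// Steps 1 and 2: an explicit setting wins; otherwise pick the language
/// with the most recognized files.
pub fn project_language(explicit: Option<&str>, files: &[&Path]) -> Option<String> {
    if let Some(lang) = explicit {
        return Some(lang.to_string());
    }
    let mut counts: HashMap<&'static str, usize> = HashMap::new();
    for file in files {
        if let Some(lang) = file_language(file) {
            *counts.entry(lang).or_insert(0) += 1;
        }
    }
    counts
        .into_iter()
        // Highest count wins; ties go to the alphabetically first name so
        // the result does not depend on hash map iteration order.
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(lang, _)| lang.to_string())
}
```

Files with unrecognized extensions are ignored when counting, so a project with no recognized files yields `None`.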
@ -207,7 +207,7 @@ docker-compose.yml
``` ```
The project name comes from: The project name comes from:
1. `sentinel.toml` `project.name` 1. `aphoria.toml` `project.name`
2. `Cargo.toml` `[package] name` 2. `Cargo.toml` `[package] name`
3. `go.mod` module name (last segment) 3. `go.mod` module name (last segment)
4. `package.json` `name` 4. `package.json` `name`
@ -456,7 +456,7 @@ Value: Text("1.0.2")
Description: "openssl 1.0.2 has known vulnerability CVE-2024-XXXX" Description: "openssl 1.0.2 has known vulnerability CVE-2024-XXXX"
``` ```
The advisory databases are downloaded locally and refreshed periodically. Sentinel doesn't call external APIs during scan. The advisory databases are downloaded locally and refreshed periodically. Aphoria doesn't call external APIs during scan.
### Extractor: cors_config ### Extractor: cors_config
@ -519,7 +519,7 @@ fn to_assertion(
"line": claim.line, "line": claim.line,
"matched_text": claim.matched_text, "matched_text": claim.matched_text,
"extractor": claim.concept_path.leaf(), "extractor": claim.concept_path.leaf(),
"scan_tool": "sentinel", "scan_tool": "aphoria",
"scan_version": env!("CARGO_PKG_VERSION"), "scan_version": env!("CARGO_PKG_VERSION"),
})); }));
@ -557,7 +557,7 @@ When code changes between scans, new assertions are created. Old assertions rema
Each scan is recorded as an assertion about itself: Each scan is recorded as an assertion about itself:
``` ```
Subject: sentinel://scan/{project_name}/{scan_id} Subject: aphoria://scan/{project_name}/{scan_id}
Predicate: completed Predicate: completed
Object: Text(json!({ Object: Text(json!({
"project": "citadeldb", "project": "citadeldb",
@ -570,7 +570,7 @@ Object: Text(json!({
})) }))
``` ```
This enables `sentinel diff` — compare two scan records and their associated assertions. This enables `aphoria diff` — compare two scan records and their associated assertions.
--- ---
@ -662,7 +662,7 @@ async fn check_conflict(
### Acknowledged Conflicts ### Acknowledged Conflicts
When a conflict has been acknowledged (via `sentinel ack`), the acknowledgment assertion exists in Episteme. The conflict still has a score, but the report marks it as ACK instead of BLOCK/FLAG: When a conflict has been acknowledged (via `aphoria ack`), the acknowledgment assertion exists in Episteme. The conflict still has a score, but the report marks it as ACK instead of BLOCK/FLAG:
``` ```
ACK code://rust/citadeldb/auth/jwt/audience_validation ACK code://rust/citadeldb/auth/jwt/audience_validation
@ -689,9 +689,9 @@ SARIF (Static Analysis Results Interchange Format) is the standard for CI securi
"runs": [{ "runs": [{
"tool": { "tool": {
"driver": { "driver": {
"name": "sentinel", "name": "aphoria",
"version": "0.1.0", "version": "0.1.0",
"informationUri": "https://github.com/orchard9/sentinel" "informationUri": "https://github.com/orchard9/aphoria"
} }
}, },
"results": [{ "results": [{
@ -754,26 +754,26 @@ SARIF (Static Analysis Results Interchange Format) is the standard for CI securi
### Baseline ### Baseline
`sentinel baseline` records the current scan as the baseline. Subsequent scans only report *new* conflicts. `aphoria baseline` records the current scan as the baseline. Subsequent scans only report *new* conflicts.
Implementation: store the baseline scan ID in `.sentinel/baseline` in the project root. The `diff` logic compares the current scan's conflict set against the baseline's. Implementation: store the baseline scan ID in `.aphoria/baseline` in the project root. The `diff` logic compares the current scan's conflict set against the baseline's.
``` ```
.sentinel/ .aphoria/
baseline # scan ID of the baseline baseline # scan ID of the baseline
config.toml # symlink or copy of sentinel.toml config.toml # symlink or copy of aphoria.toml
agent.key # Ed25519 keypair for this project's Sentinel agent agent.key # Ed25519 keypair for this project's Aphoria agent
``` ```
### Diff ### Diff
`sentinel diff` shows: `aphoria diff` shows:
- New conflicts (in current scan but not baseline) - New conflicts (in current scan but not baseline)
- Resolved conflicts (in baseline but not current scan) - Resolved conflicts (in baseline but not current scan)
- Changed conflicts (same concept, different score or verdict) - Changed conflicts (same concept, different score or verdict)
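
The three categories above amount to a keyed set comparison. The sketch below assumes each scan's conflicts can be loaded into a map keyed by concept path; `Conflict`, `Change`, and `diff_scans` are hypothetical names, not types from the Aphoria crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical view of one reported conflict.
#[derive(Debug, Clone, PartialEq)]
pub struct Conflict {
    pub score: f64,
    pub verdict: String,
}

#[derive(Debug, PartialEq)]
pub enum Change {
    New,
    Resolved,
    Changed,
}

/// Compare two scans keyed by concept path. Unchanged conflicts are omitted.
pub fn diff_scans(
    baseline: &BTreeMap<String, Conflict>,
    current: &BTreeMap<String, Conflict>,
) -> Vec<(String, Change)> {
    let mut out = Vec::new();
    for (path, cur) in current {
        match baseline.get(path) {
            // In the current scan but not in the baseline.
            None => out.push((path.clone(), Change::New)),
            // Same concept, different score or verdict.
            Some(old) if old != cur => out.push((path.clone(), Change::Changed)),
            Some(_) => {}
        }
    }
    for path in baseline.keys() {
        // In the baseline but no longer reported.
        if !current.contains_key(path) {
            out.push((path.clone(), Change::Resolved));
        }
    }
    out
}
```

Scores are compared exactly here; a real implementation might prefer a tolerance so that floating-point noise does not show up as a change.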
``` ```
$ sentinel diff $ aphoria diff
NEW code://rust/citadeldb/cache/redis/max_connections NEW code://rust/citadeldb/cache/redis/max_connections
Your code: max_connections = 10000 (config/redis.yaml:5) Your code: max_connections = 10000 (config/redis.yaml:5)
@ -791,12 +791,12 @@ $ sentinel diff
## Agent Keypair ## Agent Keypair
Sentinel signs assertions with a per-project Ed25519 keypair stored in `.sentinel/agent.key`. Generated on first `sentinel scan` if it doesn't exist. Aphoria signs assertions with a per-project Ed25519 keypair stored in `.aphoria/agent.key`. Generated on first `aphoria scan` if it doesn't exist.
The keypair identifies "Sentinel scanning project X" as a distinct agent in Episteme's trust system. Multiple projects have different keypairs. This enables: The keypair identifies "Aphoria scanning project X" as a distinct agent in Episteme's trust system. Multiple projects have different keypairs. This enables:
- Per-project audit trails ("which Sentinel agent found this?") - Per-project audit trails ("which Aphoria agent found this?")
- TrustRank per Sentinel instance (a well-calibrated Sentinel gains reputation) - TrustRank per Aphoria instance (a well-calibrated Aphoria gains reputation)
- Distinguishing human-authored assertions from Sentinel-extracted ones - Distinguishing human-authored assertions from Aphoria-extracted ones
--- ---
@ -804,15 +804,15 @@ The keypair identifies "Sentinel scanning project X" as a distinct agent in Epis
### Local Mode (Default) ### Local Mode (Default)
Sentinel ships with an embedded Episteme instance. No server needed. The database lives at `~/.sentinel/db/` (configurable). Multiple projects share the same local instance — their assertions are namespaced by ConceptPath (`code://rust/citadeldb/...` vs `code://go/other-project/...`). Aphoria ships with an embedded Episteme instance. No server needed. The database lives at `~/.aphoria/db/` (configurable). Multiple projects share the same local instance — their assertions are namespaced by ConceptPath (`code://rust/citadeldb/...` vs `code://go/other-project/...`).
The authoritative corpus (RFCs, OWASP) is also in the local instance. `sentinel init` bootstraps it. The authoritative corpus (RFCs, OWASP) is also in the local instance. `aphoria init` bootstraps it.
``` ```
$ sentinel init $ aphoria init
Downloading RFC corpus (auth, crypto, TLS) ... 127 assertions ingested. Downloading RFC corpus (auth, crypto, TLS) ... 127 assertions ingested.
Downloading OWASP cheat sheets ... 89 assertions ingested. Downloading OWASP cheat sheets ... 89 assertions ingested.
Ready. Run `sentinel scan <project>` to begin. Ready. Run `aphoria scan <project>` to begin.
``` ```
### Remote Mode (Future) ### Remote Mode (Future)
@ -820,12 +820,12 @@ Ready. Run `sentinel scan <project>` to begin.
```toml ```toml
[episteme] [episteme]
url = "https://episteme.example.com" url = "https://episteme.example.com"
api_key = "${SENTINEL_API_KEY}" api_key = "${APHORIA_API_KEY}"
``` ```
In remote mode, Sentinel ingests into and queries from a shared Episteme instance. This enables: In remote mode, Aphoria ingests into and queries from a shared Episteme instance. This enables:
- Cross-project conflict detection ("same JWT misconfiguration in 12 repos") - Cross-project conflict detection ("same JWT misconfiguration in 12 repos")
- Shared authoritative corpus (ingested once, used by all Sentinel agents) - Shared authoritative corpus (ingested once, used by all Aphoria agents)
- Centralized acknowledgment management - Centralized acknowledgment management
--- ---
@ -839,7 +839,7 @@ In remote mode, Sentinel ingests into and queries from a shared Episteme instanc
| 2 | BLOCK-level conflicts found (with `--exit-code`) | | 2 | BLOCK-level conflicts found (with `--exit-code`) |
| 3 | Scan error (file access, Episteme connection, etc.) | | 3 | Scan error (file access, Episteme connection, etc.) |
`--exit-code` enables non-zero exits. Without it, Sentinel always exits 0 (for interactive use where the report is the output, not the exit code). `--exit-code` enables non-zero exits. Without it, Aphoria always exits 0 (for interactive use where the report is the output, not the exit code).
--- ---
@ -866,7 +866,7 @@ The performance bottleneck is I/O (reading files), not extraction (regex matchin
| `ignore` | File walking (respects .gitignore, fast) | | `ignore` | File walking (respects .gitignore, fast) |
| `regex` | Pattern matching in extractors | | `regex` | Pattern matching in extractors |
| `serde` + `serde_json` | Config parsing, JSON output | | `serde` + `serde_json` | Config parsing, JSON output |
| `toml` | sentinel.toml parsing | | `toml` | aphoria.toml parsing |
| `comfy-table` | Terminal table output | | `comfy-table` | Terminal table output |
| `stemedb-core` | Types | | `stemedb-core` | Types |
| `stemedb-storage` | Local KV store | | `stemedb-storage` | Local KV store |

View File

@ -1,8 +1,8 @@
# Sentinel # Aphoria
**A code-level truth linter powered by Episteme.** **A code-level truth linter powered by Episteme.**
Sentinel scans a codebase, extracts the decisions embedded in config and code, and checks them against authoritative sources. It finds the places where what your code *does* contradicts what the specs *say*. Aphoria scans a codebase, extracts the decisions embedded in config and code, and checks them against authoritative sources. It finds the places where what your code *does* contradicts what the specs *say*.
--- ---
@ -24,10 +24,10 @@ AI agents make this worse. An agent deploying code doesn't read the RFC. It pick
## The Solution ## The Solution
Sentinel gives codebases an epistemic audit trail. Aphoria gives codebases an epistemic audit trail.
``` ```
$ sentinel scan ./citadeldb $ aphoria scan ./citadeldb
Scanning citadeldb (rust) ... Scanning citadeldb (rust) ...
@ -49,7 +49,7 @@ Scanning citadeldb (rust) ...
3 conflicts found (2 BLOCK, 1 FLAG) 3 conflicts found (2 BLOCK, 1 FLAG)
``` ```
Sentinel doesn't lint syntax. It lints *epistemic drift* — the gap between what your code asserts and what authoritative sources say. Aphoria doesn't lint syntax. It lints *epistemic drift* — the gap between what your code asserts and what authoritative sources say.
## How It Works ## How It Works
@ -65,34 +65,34 @@ The concept hierarchy is the backbone. `code://rust/citadeldb/auth/jwt/audience_
**Engineering leads** who deploy AI agents and need a pre-flight check. "Before the agent merges this PR, did it contradict any RFCs?" **Engineering leads** who deploy AI agents and need a pre-flight check. "Before the agent merges this PR, did it contradict any RFCs?"
**Platform teams** building internal developer tooling. Sentinel integrates into CI as a step between lint and deploy. **Platform teams** building internal developer tooling. Aphoria integrates into CI as a step between lint and deploy.
**Security teams** who audit configs across multiple services. "Across all our projects, which ones skip certificate verification?" **Security teams** who audit configs across multiple services. "Across all our projects, which ones skip certificate verification?"
## What This Is Not ## What This Is Not
- **Not a linter.** Linters check syntax rules. Sentinel checks claims against external authoritative sources. - **Not a linter.** Linters check syntax rules. Aphoria checks claims against external authoritative sources.
- **Not a SAST tool.** SAST finds vulnerability patterns. Sentinel finds where code decisions contradict standards, which is a superset. - **Not a SAST tool.** SAST finds vulnerability patterns. Aphoria finds where code decisions contradict standards, which is a superset.
- **Not a replacement for code review.** It augments review by surfacing conflicts that humans miss because they haven't memorized every RFC. - **Not a replacement for code review.** It augments review by surfacing conflicts that humans miss because they haven't memorized every RFC.
## The Skill Integration ## The Skill Integration
Sentinel ships as both a CLI and a Claude Code skill. When working in a project: Aphoria ships as both a CLI and a Claude Code skill. When working in a project:
``` ```
/sentinel scan /aphoria scan
``` ```
The skill runs the CLI, ingests claims, queries for conflicts, and reports inline. The developer fixes the conflict or explicitly acknowledges it — which creates a new assertion: "engineering team decided to skip aud validation for internal services" (Tier 3, Expert). Now the disagreement is structured, documented, and visible next time anyone touches that code. The skill runs the CLI, ingests claims, queries for conflicts, and reports inline. The developer fixes the conflict or explicitly acknowledges it — which creates a new assertion: "engineering team decided to skip aud validation for internal services" (Tier 3, Expert). Now the disagreement is structured, documented, and visible next time anyone touches that code.
The acknowledge flow is important. Not every conflict is a bug. Sometimes the code is right and the RFC is too strict for the context. Sentinel doesn't force compliance. It forces *visibility*. The decision to deviate from a standard becomes a recorded, auditable, queryable fact — not an invisible default. The acknowledge flow is important. Not every conflict is a bug. Sometimes the code is right and the RFC is too strict for the context. Aphoria doesn't force compliance. It forces *visibility*. The decision to deviate from a standard becomes a recorded, auditable, queryable fact — not an invisible default.
## The Flywheel ## The Flywheel
Every project Sentinel scans adds claims to Episteme. Every acknowledged deviation adds structured context. Over time: Every project Aphoria scans adds claims to Episteme. Every acknowledged deviation adds structured context. Over time:
- Common false positives get suppressed (the alias "internal services can skip aud" gets registered across projects) - Common false positives get suppressed (the alias "internal services can skip aud" gets registered across projects)
- Common true positives get elevated (the same JWT misconfiguration across 50 projects becomes a systemic signal) - Common true positives get elevated (the same JWT misconfiguration across 50 projects becomes a systemic signal)
- The authoritative source corpus grows (new RFCs, new OWASP entries, new vendor docs get ingested by research agents triggered by gaps) - The authoritative source corpus grows (new RFCs, new OWASP entries, new vendor docs get ingested by research agents triggered by gaps)
The more projects Sentinel scans, the smarter it gets — not through ML, but through accumulated structured disagreement. The more projects Aphoria scans, the smarter it gets — not through ML, but through accumulated structured disagreement.

407
batteries/pre-aphoria.md Normal file
View File

@ -0,0 +1,407 @@
# Pre-Aphoria Validation Battery
**Purpose:** Verify stemedb behaves as documented before building ConceptPath and Aphoria on top of it. Every test maps to a claim the product makes or a code path Aphoria depends on.
**Test file:** `crates/stemedb-query/tests/battery_pre_aphoria.rs`
---
## Battery 1: The Semaglutide Scenario
Reproduces the exact example from `what-is-episteme.md`. Four sources, four tiers, one subject, conflicting claims. If this doesn't work, the product demo fails.
### 1.1 `test_semaglutide_four_sources_ingest_and_query`
Setup:
- Agent A signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("gastroparesis_warning")`, source_class=Regulatory, confidence=1.0, timestamp=T
- Agent B signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("no_gastroparesis_signal")`, source_class=Clinical, confidence=0.9, timestamp=T+1
- Agent C signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("gastroparesis")`, source_class=Anecdotal, confidence=0.2, timestamp=T+2
- Agent D signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("no_gastroparesis_signal")`, source_class=Clinical, confidence=0.9, timestamp=T+3
Ingest all four through WAL + IngestWorker.
Assert:
- All four assertions are stored (query with no lens returns 4 results)
- Authority lens (TrustAwareAuthority) winner is the Regulatory assertion (FDA)
- Recency lens winner is Agent D (most recent)
- Consensus lens groups by object value: "no_gastroparesis_signal" has 2 assertions; "gastroparesis_warning" and "gastroparesis" are distinct object values with 1 assertion each (2 combined)
### 1.2 `test_semaglutide_skeptic_analysis`
Using the same four assertions from 1.1:
Assert:
- Skeptic lens `analyze()` returns `ConflictAnalysis` with:
- `candidates_count` = 4
- `claims.len()` >= 2 (at least two distinct object values)
- `status` = `Contested` (conflict_score >= 0.4)
- `conflict_score` > 0.3 (there is real disagreement between object values)
- The claim with object `"no_gastroparesis_signal"` has `assertion_count` = 2
- Claims are sorted descending by `weight_share`
### 1.3 `test_semaglutide_source_class_decay`
Using the same four assertions, all with timestamp 6 months ago:
Query with `source_class_decay: true`:
- Regulatory assertion (Tier 0): confidence unchanged (no half-life)
- Clinical assertions (Tier 1, 730-day half-life): confidence decayed slightly (~0.9 * 2^(-180/730) ~ 0.76)
- Anecdotal assertion (Tier 5, 30-day half-life): confidence decayed to near zero (~0.2 * 2^(-180/30) ~ 0.003)
Assert:
- After decay, the Anecdotal assertion's effective confidence is < 0.01
- After decay, the Regulatory assertion's confidence is exactly 1.0
- After decay, Clinical assertions' confidence is between 0.7 and 0.85
- Authority lens after decay still picks Regulatory as winner
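
The expected values above follow from a plain half-life formula. The sketch below is an illustration only: `decayed_confidence` is a hypothetical helper, not the stemedb implementation, and it assumes age and half-life are both expressed in days, with `None` standing in for a tier that never decays.

```rust
/// Exponential half-life decay: effective = confidence * 2^(-age / half_life).
/// `half_life_days = None` models a tier with no half-life (assumed: Regulatory).
/// Hypothetical helper for illustration; not the stemedb API.
fn decayed_confidence(confidence: f32, age_days: f32, half_life_days: Option<f32>) -> f32 {
    match half_life_days {
        None => confidence,
        Some(half_life) => confidence * 2.0_f32.powf(-age_days / half_life),
    }
}

fn main() {
    let age_days = 180.0; // roughly six months
    let regulatory = decayed_confidence(1.0, age_days, None);
    let clinical = decayed_confidence(0.9, age_days, Some(730.0));
    let anecdotal = decayed_confidence(0.2, age_days, Some(30.0));
    println!("regulatory={regulatory:.3} clinical={clinical:.3} anecdotal={anecdotal:.4}");
}
```

The same function covers the exact half-life cases in Battery 3: one half-life halves the confidence, two quarter it.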
### 1.4 `test_semaglutide_time_travel`
Using the same four assertions with staggered timestamps (T, T+100, T+200, T+300):
Query with `as_of: T+150`:
- Only assertions at T and T+100 are included
- Assert exactly 2 candidates
- Conflict landscape is different from the full query (only Agent A, the Regulatory/FDA assertion, and Agent B, the Clinical/NEJM assertion, are visible)
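
The `as_of` filter can be pictured as a simple timestamp cutoff. This is a minimal sketch under two assumptions: timestamps are plain integers, and the cutoff is inclusive (an assertion stamped exactly at `as_of` is visible). `visible_as_of` is a made-up name for illustration.

```rust
/// Keep only assertions whose timestamp is at or before `as_of`.
/// The inclusive boundary is an assumption, not confirmed by the source.
fn visible_as_of(timestamps: &[u64], as_of: u64) -> Vec<u64> {
    timestamps.iter().copied().filter(|&ts| ts <= as_of).collect()
}

fn main() {
    let t = 1_000;
    let stamps = [t, t + 100, t + 200, t + 300];
    let visible = visible_as_of(&stamps, t + 150);
    println!("visible at T+150: {visible:?}");
}
```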
---
## Battery 2: The JWT Conflict Scenario
Reproduces the JWT outage story. Validates escalation — the claim that Episteme is an "active safety system."
### 2.1 `test_jwt_conflict_escalation_fires`
Setup:
- RFC 7519 (Tier 0, confidence 1.0): predicate=`aud_validation`, object=`Boolean(true)`
- Internal wiki (Tier 3, confidence 0.8): predicate=`aud_validation`, object=`Boolean(false)`
- Stack Overflow (Tier 5, confidence 0.6): predicate=`aud_validation`, object=`Boolean(false)`
- Approved runbook (Tier 2, confidence 0.95): predicate=`aud_validation`, object=`Boolean(true)`
Configure escalation policy:
```
name: "security-config"
min_conflict_score: 0.5
level: High
predicate_pattern: None
```
Ingest all four. Run materializer with escalation policies.
Assert:
- Escalation event is created (query `ESC:` prefix, find at least one)
- Event has `level` = `High`
- Event has `conflict_score` >= 0.5
- Event has correct subject and predicate
- Event `resolved` = false
### 2.2 `test_jwt_escalation_predicate_filter`
Same four assertions as 2.1. Two policies:
- Policy A: `predicate_pattern: Some("aud")`, `min_conflict_score: 0.3`, `level: Critical`
- Policy B: `predicate_pattern: Some("revenue")`, `min_conflict_score: 0.3`, `level: Medium`
Assert:
- Policy A fires (predicate `aud_validation` contains "aud")
- Policy B does NOT fire (predicate doesn't contain "revenue")
- Only one escalation event exists, with level `Critical`
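
The policy matching exercised by 2.1 and 2.2 can be sketched as follows. The field names mirror the policy block above, but the struct, the `fires` method, and the substring semantics of `predicate_pattern` are assumptions drawn from the wording "contains"; the real materializer may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Level {
    Medium,
    High,
    Critical,
}

/// Hypothetical mirror of an escalation policy, for illustration only.
#[derive(Debug)]
struct EscalationPolicy {
    name: &'static str,
    min_conflict_score: f32,
    level: Level,
    predicate_pattern: Option<&'static str>,
}

impl EscalationPolicy {
    /// A policy fires when the predicate matches (no pattern matches everything)
    /// and the conflict score reaches the policy's threshold.
    fn fires(&self, predicate: &str, conflict_score: f32) -> bool {
        let pattern_ok = match self.predicate_pattern {
            Some(pattern) => predicate.contains(pattern),
            None => true,
        };
        pattern_ok && conflict_score >= self.min_conflict_score
    }
}

fn sample_policies() -> Vec<EscalationPolicy> {
    vec![
        EscalationPolicy { name: "policy-a", min_conflict_score: 0.3, level: Level::Critical, predicate_pattern: Some("aud") },
        EscalationPolicy { name: "policy-b", min_conflict_score: 0.3, level: Level::Medium, predicate_pattern: Some("revenue") },
        EscalationPolicy { name: "security-config", min_conflict_score: 0.5, level: Level::High, predicate_pattern: None },
    ]
}

/// Levels of every policy that fires for the given predicate and score.
fn fired_levels(policies: &[EscalationPolicy], predicate: &str, score: f32) -> Vec<Level> {
    policies.iter().filter(|p| p.fires(predicate, score)).map(|p| p.level).collect()
}

fn main() {
    let policies = sample_policies();
    // 0.45 is a made-up score chosen to sit between the two thresholds.
    for policy in policies.iter().filter(|p| p.fires("aud_validation", 0.45)) {
        println!("{} fired at level {:?}", policy.name, policy.level);
    }
}
```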
### 2.3 `test_jwt_layered_lens_tier_agreement`
Same four assertions. Query with Layered Consensus lens.
Assert:
- Tier 0 result: winner object = `Boolean(true)` (RFC says validate)
- Tier 2 result: winner object = `Boolean(true)` (Runbook agrees)
- Tier 3 result: winner object = `Boolean(false)` (Wiki says skip)
- Tier 5 result: winner object = `Boolean(false)` (SO says skip)
- `overall_conflict_score` > 0.5 (cross-tier disagreement: Tiers 0 and 2 say `true`, Tiers 3 and 5 say `false`)
- `overall_winner` comes from Tier 0 (highest authority)
---
## Battery 3: Decay Math Precision
Aphoria computes conflict scores after decay. If decay is wrong, every conflict score is wrong.
### 3.1 `test_decay_tier0_never_decays`
Regulatory assertion, confidence 0.95, timestamp 10 years ago.
Query with `source_class_decay: true`.
Assert: effective confidence is exactly 0.95 (unchanged).
### 3.2 `test_decay_tier1_exact_halflife`
Clinical assertion, confidence 1.0, timestamp exactly 730 days ago.
Query with `source_class_decay: true`.
Assert: effective confidence is 0.5 (within tolerance of 0.02).
### 3.3 `test_decay_tier1_two_halflives`
Clinical assertion, confidence 1.0, timestamp exactly 1460 days ago.
Query with `source_class_decay: true`.
Assert: effective confidence is 0.25 (within tolerance of 0.02).
### 3.4 `test_decay_tier5_exact_halflife`
Anecdotal assertion, confidence 1.0, timestamp exactly 30 days ago.
Query with `source_class_decay: true`.
Assert: effective confidence is 0.5 (within tolerance of 0.02).
### 3.5 `test_decay_tier5_three_halflives`
Anecdotal assertion, confidence 1.0, timestamp exactly 90 days ago.
Query with `source_class_decay: true`.
Assert: effective confidence is 0.125 (within tolerance of 0.02).
### 3.6 `test_decay_zero_confidence_stays_zero`
Assertion with confidence 0.0, any tier, any age.
Assert: effective confidence is 0.0 after decay (0 * anything = 0).
### 3.7 `test_decay_never_goes_negative`
Anecdotal assertion, confidence 0.01, timestamp 365 days ago (12+ half-lives).
Assert: effective confidence >= 0.0.
### 3.8 `test_decay_uses_as_of_for_age_calculation`
Two assertions, both at timestamp T=1000:
- Assertion A: Clinical, confidence 0.9
- Assertion B: Anecdotal, confidence 0.9
Query with `as_of: T + 730*86400` (exactly 730 days after assertions) and `source_class_decay: true`.
Assert:
- A's effective confidence ~ 0.45 (Clinical, one half-life)
- B's effective confidence is near zero (Anecdotal, 24+ half-lives at 30-day rate)
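
To see why the reference time matters, the sketch below computes the same decay against `as_of` and against a later wall clock. It assumes timestamps are Unix seconds (consistent with the `730*86400` arithmetic above); `decayed_at` and the one-year-later `now` are hypothetical.

```rust
const SECONDS_PER_DAY: f32 = 86_400.0;

/// Decay `confidence` using the age between `timestamp` and `reference` (both in seconds).
/// Hypothetical helper; the reference should be `as_of` when it is set.
fn decayed_at(confidence: f32, timestamp: u64, reference: u64, half_life_days: f32) -> f32 {
    let age_days = reference.saturating_sub(timestamp) as f32 / SECONDS_PER_DAY;
    confidence * 2.0_f32.powf(-age_days / half_life_days)
}

fn main() {
    let t: u64 = 1_000;
    let as_of = t + 730 * 86_400;
    let now = as_of + 365 * 86_400; // made-up wall clock, one year after as_of

    let correct = decayed_at(0.9, t, as_of, 730.0);
    let wrong = decayed_at(0.9, t, now, 730.0);
    println!("clinical with as_of reference: {correct:.3}");
    println!("clinical with now reference:   {wrong:.3}");
}
```

Using `now` as the reference over-decays every historical query, which is the failure mode this test is meant to catch.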
---
## Battery 4: Conflict Score Calibration
Two conflict score implementations exist. `compute_conflict_score` in `traits.rs` uses confidence variance. `calculate_conflict_score` in `skeptic/analysis.rs` uses Shannon entropy over object value groups. Both need validation.
### 4.1 `test_variance_conflict_score_unanimous`
5 assertions, all confidence 0.8.
`compute_conflict_score()` returns 0.0 (no variance).
### 4.2 `test_variance_conflict_score_maximum`
2 assertions, confidence 0.0 and 1.0.
`compute_conflict_score()` returns 1.0 (maximum variance).
### 4.3 `test_variance_conflict_score_moderate`
3 assertions, confidence 0.2, 0.5, 0.8.
`compute_conflict_score()` returns a value between 0.2 and 0.8.
### 4.4 `test_variance_conflict_score_single`
1 assertion. Returns 0.0.
### 4.5 `test_variance_conflict_score_empty`
0 assertions. Returns 0.0.
### 4.6 `test_skeptic_entropy_same_confidence_different_objects` [POTENTIAL BUG DETECTOR]
Three assertions, ALL with confidence 0.9:
- Object A: `Text("yes")`, confidence 0.9
- Object B: `Text("no")`, confidence 0.9
- Object C: `Text("no")`, confidence 0.9
Skeptic lens `analyze()`:
- Groups into 2 claims: "yes" (weight 0.9) and "no" (weight 1.8)
- Entropy is non-zero because there are two groups with different weights
- `conflict_score` > 0.0
- `status` is NOT `Unanimous`
**Note:** The variance-based `compute_conflict_score` would return 0.0 for these candidates (all same confidence). The Skeptic entropy-based score correctly detects the disagreement. This test validates the Skeptic lens is the correct tool for Aphoria's conflict detection, NOT the variance-based score.
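
The difference between the two scores can be reproduced with a small sketch. Both functions are illustrative stand-ins, not the code in `traits.rs` or `skeptic/analysis.rs`: the variance score is assumed to be population variance divided by 0.25 (the maximum for values in [0, 1]), and the entropy score is assumed to be Shannon entropy of per-object weight shares normalized by log2 of the group count.

```rust
use std::collections::BTreeMap;

/// Variance-based score over confidences only. Scaling by 0.25 is an assumption
/// chosen so that confidences {0.0, 1.0} yield 1.0. NaN input yields 0.0.
fn variance_score(confidences: &[f32]) -> f32 {
    if confidences.len() < 2 {
        return 0.0;
    }
    let n = confidences.len() as f32;
    let mean = confidences.iter().sum::<f32>() / n;
    let variance = confidences.iter().map(|c| (c - mean).powi(2)).sum::<f32>() / n;
    let score = variance / 0.25;
    if score.is_nan() { 0.0 } else { score.clamp(0.0, 1.0) }
}

/// Entropy-based score over object-value groups, weighted by summed confidence.
/// Normalization by log2(group count) is an assumption.
fn entropy_score(assertions: &[(&str, f32)]) -> f32 {
    let mut weights: BTreeMap<&str, f32> = BTreeMap::new();
    for &(object, confidence) in assertions {
        *weights.entry(object).or_insert(0.0) += confidence;
    }
    let total: f32 = weights.values().sum();
    if weights.len() < 2 || total <= 0.0 {
        return 0.0;
    }
    let entropy: f32 = weights
        .values()
        .map(|w| w / total)
        .filter(|p| *p > 0.0)
        .map(|p| -p * p.log2())
        .sum();
    entropy / (weights.len() as f32).log2()
}

fn main() {
    let assertions = [("yes", 0.9), ("no", 0.9), ("no", 0.9)];
    let confidences: Vec<f32> = assertions.iter().map(|a| a.1).collect();
    println!("variance score: {:.3}", variance_score(&confidences));
    println!("entropy score:  {:.3}", entropy_score(&assertions));
}
```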
### 4.7 `test_skeptic_entropy_unanimous_different_confidence`
Three assertions, all same object `Text("yes")`, but different confidences (0.3, 0.6, 0.9):
Skeptic lens `analyze()`:
- Groups into 1 claim (all same object)
- `conflict_score` = 0.0 (unanimous — no disagreement on the value)
- `status` = `Unanimous`
**Note:** Even though confidences differ, there's no actual conflict — all sources agree. The Skeptic lens correctly identifies this as unanimous.
### 4.8 `test_variance_score_nan_defensive`
2 assertions with confidence `f32::NAN`.
`compute_conflict_score()` returns 0.0 (defensive, not NaN propagation).
---
## Battery 5: scan_prefix with ConceptPath-shaped Keys
Storage foundation for hierarchical queries.
### 5.1 `test_prefix_scan_concept_path_keys`
Store via IndexStore:
```
S:code://rust/citadeldb/auth/jwt/aud_validation → [hash_a]
S:code://rust/citadeldb/auth/jwt/expiry → [hash_b]
S:code://rust/citadeldb/net/tls/verify → [hash_c]
S:code://rust/citadeldb/auth/oauth/scopes → [hash_d]
```
Assert:
- `scan_prefix("S:code://rust/citadeldb/auth/jwt/")` → 2 keys (aud_validation, expiry)
- `scan_prefix("S:code://rust/citadeldb/auth/")` → 3 keys (jwt/aud_validation, jwt/expiry, oauth/scopes)
- `scan_prefix("S:code://rust/citadeldb/")` → 4 keys (all)
- `scan_prefix("S:code://")` → 4 keys (all)
- `scan_prefix("S:rfc://")` → 0 keys (different scheme)
### 5.2 `test_prefix_scan_no_false_positives`
Store:
```
S:code://rust/citadeldb/auth → [hash_a]
S:code://rust/citadeldb/authentication → [hash_b]
```
Assert:
- `scan_prefix("S:code://rust/citadeldb/auth/")` → 0 keys (trailing slash prevents matching "auth" without children)
- `scan_prefix("S:code://rust/citadeldb/auth")` → 2 keys (both match the prefix "auth")
This validates that the trailing `/` in hierarchical queries is necessary to prevent `auth` from matching `authentication`.
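
The trailing-slash behavior can be demonstrated on any ordered key set. The sketch below uses a standard-library `BTreeSet` as a stand-in for the KV store; this `scan_prefix` is a toy with the same name, not the stemedb function, and the stored hash lists are omitted.

```rust
use std::collections::BTreeSet;

/// Return every key that starts with `prefix`, using ordered iteration:
/// seek to the first key >= prefix, then stop at the first non-matching key.
fn scan_prefix<'a>(keys: &'a BTreeSet<String>, prefix: &str) -> Vec<&'a str> {
    keys.range(prefix.to_string()..)
        .take_while(|key| key.starts_with(prefix))
        .map(|key| key.as_str())
        .collect()
}

fn build_keys(raw: &[&str]) -> BTreeSet<String> {
    raw.iter().map(|k| k.to_string()).collect()
}

fn main() {
    let keys = build_keys(&[
        "S:code://rust/citadeldb/auth",
        "S:code://rust/citadeldb/authentication",
    ]);
    let with_slash = scan_prefix(&keys, "S:code://rust/citadeldb/auth/");
    let without_slash = scan_prefix(&keys, "S:code://rust/citadeldb/auth");
    println!("with trailing slash: {}", with_slash.len());
    println!("without trailing slash: {}", without_slash.len());
}
```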
### 5.3 `test_prefix_scan_sp_keys_with_concept_paths`
Store via IndexStore (using SP: compound keys):
```
SP:code://rust/citadeldb/auth/jwt/aud_validation:config_value → [hash_a]
SP:code://rust/citadeldb/auth/jwt/expiry:config_value → [hash_b]
```
Assert:
- `scan_prefix("SP:code://rust/citadeldb/auth/jwt/")` → 2 keys
- The parsed SP key for hash_a correctly splits into subject=`code://rust/citadeldb/auth/jwt/aud_validation` and predicate=`config_value` (validates the rfind fix)
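
The reason a last-colon split is needed is that ConceptPath subjects contain `:` in their scheme (`code://`). A minimal sketch, assuming the key layout `SP:{subject}:{predicate}` shown above and assuming predicates never contain `:`; `parse_sp_key` is a hypothetical name, not the function in `key_codec`.

```rust
/// Split an `SP:{subject}:{predicate}` key at the LAST colon, so that the
/// colon inside a scheme such as `code://` stays part of the subject.
/// Assumes the predicate itself contains no ':'.
fn parse_sp_key(key: &str) -> Option<(&str, &str)> {
    let rest = key.strip_prefix("SP:")?;
    let split = rest.rfind(':')?;
    Some((&rest[..split], &rest[split + 1..]))
}

fn main() {
    let key = "SP:code://rust/citadeldb/auth/jwt/aud_validation:config_value";
    match parse_sp_key(key) {
        Some((subject, predicate)) => println!("subject={subject} predicate={predicate}"),
        None => println!("not an SP key"),
    }
}
```

Splitting at the first colon instead would yield the subject `code`, which is the bug this test guards against.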
---
## Battery 6: Signature Tamper Detection
Aphoria ingests signed assertions. If signature verification has gaps, tampered claims enter the graph.
### 6.1 `test_valid_signature_accepted`
Agent A signs an assertion. Ingest through IngestWorker.
Assert: assertion is stored, index entries exist.
### 6.2 `test_tampered_confidence_rejected`
Agent A signs assertion with confidence=0.8. Modify the serialized assertion bytes to change confidence to 1.0. Attempt to ingest.
Assert: `IngestError::InvalidSignature`. Assertion is NOT stored.
### 6.3 `test_tampered_subject_rejected`
Agent A signs assertion with subject="X". Clone the assertion, change subject to "Y", keep original signature.
Assert: ingestion fails with invalid signature.
### 6.4 `test_wrong_agent_id_rejected`
Agent A signs assertion. Replace `agent_id` in the `SignatureEntry` with Agent B's public key (but keep Agent A's signature bytes).
Assert: ingestion fails — the signature was made by A's private key but claims to be from B's public key.
### 6.5 `test_multi_sig_all_valid_accepted`
Agent A and Agent B both sign the same assertion (two valid SignatureEntries).
Assert: ingestion succeeds.
### 6.6 `test_multi_sig_one_invalid_rejected`
Agent A signs validly, Agent B's signature is invalid (tampered).
Assert: ingestion fails. ALL signatures must be valid.
---
## Battery 7: Materialized View Consistency
Aphoria queries MVs for fast conflict checks. Stale or inconsistent MVs produce wrong verdicts.
### 7.1 `test_mv_initial_materialization`
Ingest assertion A (confidence 0.9) for subject=S, predicate=P.
Run materializer `step()`.
Assert:
- MV exists at `MV:{S}:{P}`
- MV winner_hash matches A's content hash
- MV confidence = 0.9
- Changelog entry exists (first materialization)
### 7.2 `test_mv_winner_changes_on_update`
Ingest A (confidence 0.9), materialize. Then ingest B (same S/P, confidence 0.95), materialize again.
Assert:
- MV winner changes to B
- Changelog has 2 entries: initial (winner=A), update (previous=A, new=B)
### 7.3 `test_mv_no_changelog_when_winner_unchanged`
Ingest A (confidence 0.9), materialize. Ingest B (same S/P, confidence 0.5), materialize again.
Assert:
- MV winner stays A (B has lower confidence)
- No new changelog entry after second materialization
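
The changelog behavior in 7.1 through 7.3 amounts to "record an entry only when the winner changes". The sketch below is a toy model under an assumption inferred from these tests: the winner is the candidate with the highest confidence. `MaterializedView` and `ChangelogEntry` here are illustrative types, not the stemedb ones.

```rust
#[derive(Debug, Clone, PartialEq)]
struct ChangelogEntry {
    previous: Option<String>,
    new: String,
}

#[derive(Debug, Default)]
struct MaterializedView {
    winner: Option<(String, f32)>, // (content hash, confidence)
    changelog: Vec<ChangelogEntry>,
}

impl MaterializedView {
    /// Recompute the winner from all candidates and log only real changes.
    fn materialize(&mut self, candidates: &[(&str, f32)]) {
        let best = candidates.iter().copied().max_by(|a, b| a.1.total_cmp(&b.1));
        if let Some((hash, confidence)) = best {
            let previous = self.winner.as_ref().map(|(h, _)| h.clone());
            if previous.as_deref() != Some(hash) {
                self.changelog.push(ChangelogEntry { previous, new: hash.to_string() });
            }
            self.winner = Some((hash.to_string(), confidence));
        }
    }
}

fn main() {
    let mut mv = MaterializedView::default();
    mv.materialize(&[("A", 0.9)]);
    mv.materialize(&[("A", 0.9), ("B", 0.5)]); // winner unchanged, no new entry
    mv.materialize(&[("A", 0.9), ("B", 0.5), ("C", 0.95)]); // winner changes to C
    println!("winner: {:?}", mv.winner);
    println!("changelog entries: {}", mv.changelog.len());
}
```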
### 7.4 `test_mv_since_query_returns_changelog`
Ingest A at T=1000, materialize at T=1001. Ingest B at T=2000, materialize at T=2001.
Query with `since: 1500`:
- Returns changelog entries only from after T=1500
- Should include the B materialization but not the A materialization
### 7.5 `test_mv_max_stale_fast_path`
Ingest A, materialize. Query immediately with `max_stale: 60`.
Assert: fast path is used (MV is fresh).
### 7.6 `test_mv_max_stale_slow_path`
Ingest A, materialize. Wait (or mock time) so MV is 120 seconds old. Query with `max_stale: 60`.
Assert: slow path is used (MV is stale, falls through to index lookup).
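
The fast/slow decision in 7.5 and 7.6 reduces to comparing the MV's age with `max_stale`. A minimal sketch, assuming both values are in seconds and that an MV exactly `max_stale` seconds old still counts as fresh; `QueryPath` and `choose_path` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum QueryPath {
    Fast, // serve from the materialized view
    Slow, // fall through to the index lookup
}

/// Pick the query path from the MV's age. The inclusive boundary is an assumption.
fn choose_path(materialized_at: u64, now: u64, max_stale_secs: u64) -> QueryPath {
    let age = now.saturating_sub(materialized_at);
    if age <= max_stale_secs {
        QueryPath::Fast
    } else {
        QueryPath::Slow
    }
}

fn main() {
    println!("{:?}", choose_path(1_000, 1_000, 60));
    println!("{:?}", choose_path(1_000, 1_120, 60));
}
```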
---
## Findings to Watch For
### Known Risk: Two Conflict Score Implementations
`compute_conflict_score` in `traits.rs` (line 89) uses **confidence variance**. It measures how much confidence values disagree, not how much object values disagree. Three sources saying "yes" at 0.9 and two sources saying "no" at 0.9 produce a conflict score of **0.0** because all confidences are identical.
`calculate_conflict_score` in `skeptic/analysis.rs` (line 36) uses **Shannon entropy over object value groups**. It correctly detects that "yes" vs "no" is a real conflict regardless of confidence values.
**Aphoria must use the Skeptic lens for conflict detection, not the standard lens conflict score.** Battery 4.6 validates this distinction explicitly. If Aphoria were to use `compute_conflict_score` from standard lenses, it would miss conflicts where sources disagree on values but agree on confidence levels.
### Known Risk: Decay + Time-Travel Interaction
When both `source_class_decay` and `as_of` are set, the age calculation must use `as_of` as the reference time, not `now`. Battery 3.8 validates this. If the implementation uses `now` for age but filters by `as_of` for inclusion, the decay amounts will be wrong for historical queries.
### ConceptPath Readiness
Battery 5 validates the storage layer works with ConceptPath-shaped keys before any type changes. If these tests pass, the `scan_prefix` foundation is solid and ConceptPath implementation can proceed with confidence.

View File

@ -9,7 +9,7 @@ workspace = true
[dependencies] [dependencies]
stemedb-core = { path = "../stemedb-core" } stemedb-core = { path = "../stemedb-core" }
stemedb-wal = { path = "../stemedb-wal" } stemedb-wal = { path = "../stemedb-wal", features = ["group-commit"] }
stemedb-storage = { path = "../stemedb-storage" } stemedb-storage = { path = "../stemedb-storage" }
stemedb-ingest = { path = "../stemedb-ingest" } stemedb-ingest = { path = "../stemedb-ingest" }
stemedb-query = { path = "../stemedb-query" } stemedb-query = { path = "../stemedb-query" }

View File

@ -34,7 +34,7 @@ STEMEDB_WAL_DIR=./my-wal STEMEDB_DB_DIR=./my-db STEMEDB_BIND_ADDR=0.0.0.0:8080 c
``` ```
The server automatically: The server automatically:
1. Opens Journal (WAL) and SledStore (KV storage) 1. Opens Journal (WAL) and HybridStore (KV storage)
2. Spawns IngestWorker background task to tail WAL 2. Spawns IngestWorker background task to tail WAL
3. Starts HTTP server with OpenAPI documentation 3. Starts HTTP server with OpenAPI documentation

View File

@ -45,7 +45,7 @@ pub async fn decay_trust_ranks(
let half_life = req.half_life_seconds.unwrap_or(DEFAULT_HALF_LIFE_SECONDS); let half_life = req.half_life_seconds.unwrap_or(DEFAULT_HALF_LIFE_SECONDS);
// Create TrustRankStore from the shared store // Create TrustRankStore from the shared store
let trust_store = GenericTrustRankStore::new((*state.store).clone()); let trust_store = GenericTrustRankStore::new(state.store.clone());
// Apply decay to all trust ranks // Apply decay to all trust ranks
let decayed_count = trust_store.decay_trust_ranks(timestamp, Some(half_life)).await?; let decayed_count = trust_store.decay_trust_ranks(timestamp, Some(half_life)).await?;

View File

@ -52,9 +52,8 @@ pub async fn create_assertion(
.map_err(|e| ApiError::Serialization(format!("Failed to serialize for hash: {}", e)))?; .map_err(|e| ApiError::Serialization(format!("Failed to serialize for hash: {}", e)))?;
let hash = blake3::hash(&serialized_assertion); let hash = blake3::hash(&serialized_assertion);
// Append to WAL // Append to WAL via group commit buffer
let mut journal = state.journal.lock().await; state.commit_buffer.append(payload).await?;
journal.append(payload)?;
let response = let response =
CreateResponse { hash: hash.to_hex().to_string(), status: "created".to_string() }; CreateResponse { hash: hash.to_hex().to_string(), status: "created".to_string() };

View File

@ -89,9 +89,8 @@ pub async fn create_epoch(
// For the response, we return this same ID as a hex string // For the response, we return this same ID as a hex string
let epoch_id_hex = ::hex::encode(epoch.id); let epoch_id_hex = ::hex::encode(epoch.id);
// Append to WAL // Append to WAL via group commit buffer
let mut journal = state.journal.lock().await; state.commit_buffer.append(payload).await?;
journal.append(payload)?;
let response = CreateResponse { hash: epoch_id_hex, status: "created".to_string() }; let response = CreateResponse { hash: epoch_id_hex, status: "created".to_string() };

View File

@ -227,7 +227,7 @@ pub async fn verify_agent(
.unwrap_or(0); .unwrap_or(0);
// Verify the agent // Verify the agent
let trust_store = GenericTrustRankStore::new((*state.store).clone()); let trust_store = GenericTrustRankStore::new(state.store.clone());
let adjustment = trust_store let adjustment = trust_store
.verify_agent_against_gold_standard(&agent_id, &req.agent_object, &gs, timestamp) .verify_agent_against_gold_standard(&agent_id, &req.agent_object, &gs, timestamp)
.await?; .await?;

View File

@ -4,7 +4,7 @@ use axum::{extract::State, Json};
use tracing::instrument; use tracing::instrument;
use crate::{dto::HealthResponse, error::Result, state::AppState}; use crate::{dto::HealthResponse, error::Result, state::AppState};
use stemedb_storage::KVStore; use stemedb_storage::{key_codec, KVStore};
/// Health check endpoint. /// Health check endpoint.
/// ///
@ -32,7 +32,12 @@ pub async fn health_check(State(state): State<AppState>) -> Result<Json<HealthRe
/// Count the number of assertions in the database. /// Count the number of assertions in the database.
async fn count_assertions(state: &AppState) -> Result<u64> { async fn count_assertions(state: &AppState) -> Result<u64> {
// Scan all assertion keys (H: prefix) // Read the atomic assertion count maintained by the ingestion pipeline
let keys = state.store.scan_prefix(b"H:").await?; let count_key = key_codec::assertion_count_key();
Ok(keys.len() as u64) match state.store.get(&count_key).await? {
Some(bytes) if bytes.len() == 8 => {
Ok(u64::from_le_bytes(bytes.try_into().unwrap_or([0u8; 8])))
}
_ => Ok(0),
}
} }

View File

@ -344,7 +344,7 @@ fn build_contributing_from_metadata(
async fn apply_lens_with_confidence( async fn apply_lens_with_confidence(
lens_dto: LensDto, lens_dto: LensDto,
assertions: Vec<Assertion>, assertions: Vec<Assertion>,
store: std::sync::Arc<stemedb_storage::SledStore>, store: std::sync::Arc<stemedb_storage::HybridStore>,
) -> Result<(Vec<Assertion>, f32, f32)> { ) -> Result<(Vec<Assertion>, f32, f32)> {
let assertion_count = assertions.len(); let assertion_count = assertions.len();

View File

@ -59,8 +59,8 @@ pub async fn skeptic_query(
AxumQuery(params): AxumQuery<SkepticQueryParams>, AxumQuery(params): AxumQuery<SkepticQueryParams>,
) -> Result<Json<SkepticResponse>> { ) -> Result<Json<SkepticResponse>> {
// Create the resolver with vote and trust stores // Create the resolver with vote and trust stores
let vote_store = std::sync::Arc::new(GenericVoteStore::new((*state.store).clone())); let vote_store = std::sync::Arc::new(GenericVoteStore::new(state.store.clone()));
let trust_store = std::sync::Arc::new(GenericTrustRankStore::new((*state.store).clone())); let trust_store = std::sync::Arc::new(GenericTrustRankStore::new(state.store.clone()));
let resolver = SkepticResolver::new(state.store.clone(), vote_store, trust_store); let resolver = SkepticResolver::new(state.store.clone(), vote_store, trust_store);
// Execute the skeptic resolution // Execute the skeptic resolution

View File

@ -182,7 +182,7 @@ mod tests {
http::{Method, Request}, http::{Method, Request},
}; };
use serde_json::json; use serde_json::json;
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
use stemedb_wal::Journal; use stemedb_wal::Journal;
use tower::ServiceExt; use tower::ServiceExt;
@ -199,10 +199,12 @@ mod tests {
let wal_path = temp_dir.path().join("wal"); let wal_path = temp_dir.path().join("wal");
let store_path = temp_dir.path().join("store"); let store_path = temp_dir.path().join("store");
let journal = Journal::open(&wal_path).expect("failed to open journal"); let write_journal = Journal::open(&wal_path).expect("failed to open write journal");
let store = SledStore::open(&store_path).expect("failed to open store"); let read_journal = Journal::open(&wal_path).expect("failed to open read journal");
let store =
std::sync::Arc::new(HybridStore::open(&store_path).expect("failed to open store"));
let state = AppState::new(journal, store); let state = AppState::new(write_journal, read_journal, store);
let app = axum::Router::new() let app = axum::Router::new()
.route("/v1/source", axum::routing::post(store_source)) .route("/v1/source", axum::routing::post(store_source))

View File

@ -50,9 +50,8 @@ pub async fn create_vote(
.map_err(|e| ApiError::Serialization(format!("Failed to serialize for hash: {}", e)))?; .map_err(|e| ApiError::Serialization(format!("Failed to serialize for hash: {}", e)))?;
let hash = blake3::hash(&serialized_vote); let hash = blake3::hash(&serialized_vote);
// Append to WAL // Append to WAL via group commit buffer
let mut journal = state.journal.lock().await; state.commit_buffer.append(payload).await?;
journal.append(payload)?;
let response = let response =
CreateResponse { hash: hash.to_hex().to_string(), status: "created".to_string() }; CreateResponse { hash: hash.to_hex().to_string(), status: "created".to_string() };

View File

@ -23,7 +23,7 @@
//! ```ignore //! ```ignore
//! use stemedb_api::{create_router, AppState}; //! use stemedb_api::{create_router, AppState};
//! //!
//! let state = AppState::new(journal, store); //! let state = AppState::new(write_journal, read_journal, store);
//! let app = create_router(state); //! let app = create_router(state);
//! //!
//! axum::Server::bind(&addr).serve(app.into_make_service()).await?; //! axum::Server::bind(&addr).serve(app.into_make_service()).await?;

View File

@ -1,10 +1,11 @@
//! Episteme (StemeDB) API server binary. //! Episteme (StemeDB) API server binary.
//! //!
//! This starts the HTTP API server with the following components: //! This starts the HTTP API server with the following components:
//! 1. Opens Journal (WAL) and SledStore (KV storage) //! 1. Opens Journal (WAL) for writes (via GroupCommitBuffer) and reads
//! 2. Spawns IngestWorker background task to tail WAL //! 2. Opens HybridStore (KV storage)
//! 3. Starts axum HTTP server with OpenAPI documentation //! 3. Spawns IngestWorker background task to tail WAL
//! 4. Optionally enables The Meter (economic throttling) //! 4. Starts axum HTTP server with OpenAPI documentation
//! 5. Optionally enables The Meter (economic throttling)
//! //!
//! # Environment Variables //! # Environment Variables
//! //!
@ -22,7 +23,7 @@ use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};
use stemedb_api::{create_router, create_router_with_meter, AppState}; use stemedb_api::{create_router, create_router_with_meter, AppState};
use stemedb_ingest::worker::IngestWorker; use stemedb_ingest::worker::IngestWorker;
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
use stemedb_wal::Journal; use stemedb_wal::Journal;
/// Server configuration. /// Server configuration.
@ -96,20 +97,24 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
std::fs::create_dir_all(&config.wal_dir)?; std::fs::create_dir_all(&config.wal_dir)?;
std::fs::create_dir_all(&config.db_dir)?; std::fs::create_dir_all(&config.db_dir)?;
// Open Journal and Store // Open write Journal (owned by GroupCommitBuffer)
info!("Opening Journal at {:?}", config.wal_dir); info!("Opening write Journal at {:?}", config.wal_dir);
let journal = Journal::open(&config.wal_dir)?; let write_journal = Journal::open(&config.wal_dir)?;
info!("Opening SledStore at {:?}", config.db_dir); // Open read Journal (for IngestWorker to tail)
let store = SledStore::open(&config.db_dir)?; info!("Opening read Journal at {:?}", config.wal_dir);
let read_journal = Journal::open(&config.wal_dir)?;
// Create application state info!("Opening HybridStore at {:?}", config.db_dir);
let state = AppState::new(journal, store.clone()); let store = Arc::new(HybridStore::open(&config.db_dir)?);
// Spawn IngestWorker background task // Create application state (initializes GroupCommitBuffer)
let state = AppState::new(write_journal, read_journal, Arc::clone(&store));
// Spawn IngestWorker background task (uses read journal)
info!("Spawning IngestWorker background task"); info!("Spawning IngestWorker background task");
let worker_journal = state.journal.clone(); let worker_journal = state.journal.clone();
let worker_store = Arc::new(store); let worker_store = store;
tokio::spawn(async move { tokio::spawn(async move {
let worker_result = IngestWorker::new(worker_journal, worker_store).await; let worker_result = IngestWorker::new(worker_journal, worker_store).await;
match worker_result { match worker_result {


@ -4,25 +4,29 @@ use std::sync::Arc;
use tokio::sync::Mutex; use tokio::sync::Mutex;
use stemedb_query::QueryEngine; use stemedb_query::QueryEngine;
use stemedb_storage::{GenericEscalationStore, GenericQuotaStore, SledStore}; use stemedb_storage::{GenericEscalationStore, GenericQuotaStore, HybridStore};
use stemedb_wal::group_commit::{GroupCommitBuffer, GroupCommitConfig};
use stemedb_wal::Journal; use stemedb_wal::Journal;
/// Quota store type alias for convenience. /// Quota store type alias for convenience.
pub type QuotaStoreImpl = GenericQuotaStore<Arc<SledStore>>; pub type QuotaStoreImpl = GenericQuotaStore<Arc<HybridStore>>;
/// Escalation store type alias for convenience. /// Escalation store type alias for convenience.
pub type EscalationStoreImpl = GenericEscalationStore<SledStore>; pub type EscalationStoreImpl = GenericEscalationStore<HybridStore>;
/// Application state shared across all HTTP handlers. /// Application state shared across all HTTP handlers.
/// ///
/// This is passed to every request via axum's `State` extractor. /// This is passed to every request via axum's `State` extractor.
#[derive(Clone)] #[derive(Clone)]
pub struct AppState { pub struct AppState {
/// Write-ahead log for appending new assertions/votes/epochs /// Group commit buffer for batched WAL writes (used by write handlers)
pub commit_buffer: GroupCommitBuffer,
/// Write-ahead log for reading records (IngestWorker uses this)
pub journal: Arc<Mutex<Journal>>, pub journal: Arc<Mutex<Journal>>,
/// Key-value store for reading assertions /// Key-value store for reading assertions
pub store: Arc<SledStore>, pub store: Arc<HybridStore>,
/// Quota store for economic throttling (The Meter) /// Quota store for economic throttling (The Meter)
pub quota_store: Arc<QuotaStoreImpl>, pub quota_store: Arc<QuotaStoreImpl>,
@ -33,9 +37,13 @@ pub struct AppState {
impl AppState { impl AppState {
/// Create a new application state. /// Create a new application state.
pub fn new(journal: Journal, store: SledStore) -> Self { ///
let journal = Arc::new(Mutex::new(journal)); /// Takes two journals: one for the group commit buffer (writes) and
let store = Arc::new(store); /// one for reading (used by IngestWorker). Both should be opened on
/// the same directory.
pub fn new(write_journal: Journal, read_journal: Journal, store: Arc<HybridStore>) -> Self {
let commit_buffer = GroupCommitBuffer::new(write_journal, GroupCommitConfig::default());
let journal = Arc::new(Mutex::new(read_journal));
// Create quota store backed by the same KV store // Create quota store backed by the same KV store
let quota_store = Arc::new(GenericQuotaStore::new(Arc::clone(&store))); let quota_store = Arc::new(GenericQuotaStore::new(Arc::clone(&store)));
@ -43,13 +51,13 @@ impl AppState {
// Create escalation store backed by the same KV store // Create escalation store backed by the same KV store
let escalation_store = Arc::new(GenericEscalationStore::new(Arc::clone(&store))); let escalation_store = Arc::new(GenericEscalationStore::new(Arc::clone(&store)));
Self { journal, store, quota_store, escalation_store } Self { commit_buffer, journal, store, quota_store, escalation_store }
} }
/// Get a QueryEngine for this state. /// Get a QueryEngine for this state.
/// ///
/// Creates a new QueryEngine each time since it cannot be cloned. /// Creates a new QueryEngine each time since it cannot be cloned.
pub fn query_engine(&self) -> QueryEngine<SledStore> { pub fn query_engine(&self) -> QueryEngine<HybridStore> {
QueryEngine::new(self.store.clone()) QueryEngine::new(self.store.clone())
} }
} }


@ -9,7 +9,7 @@ use serde_json::json;
use std::sync::Arc; use std::sync::Arc;
use stemedb_api::AppState; use stemedb_api::AppState;
use stemedb_ingest::Ingestor; use stemedb_ingest::Ingestor;
use stemedb_storage::{GenericEscalationStore, GenericQuotaStore, SledStore}; use stemedb_storage::HybridStore;
use stemedb_wal::Journal; use stemedb_wal::Journal;
use tokio::sync::Mutex; use tokio::sync::Mutex;
@ -23,7 +23,7 @@ pub struct TestEnvironment {
pub struct TestEnvironmentWithIngestor { pub struct TestEnvironmentWithIngestor {
pub _temp_dir: tempfile::TempDir, pub _temp_dir: tempfile::TempDir,
pub state: AppState, pub state: AppState,
pub ingestor: Ingestor<SledStore>, pub ingestor: Ingestor<HybridStore>,
} }
/// Helper to create a test environment with temporary directories. /// Helper to create a test environment with temporary directories.
@ -35,10 +35,11 @@ pub async fn create_test_env() -> TestEnvironment {
std::fs::create_dir_all(&wal_dir).expect("failed to create wal dir"); std::fs::create_dir_all(&wal_dir).expect("failed to create wal dir");
std::fs::create_dir_all(&db_dir).expect("failed to create db dir"); std::fs::create_dir_all(&db_dir).expect("failed to create db dir");
let journal = Journal::open(&wal_dir).expect("failed to open journal"); let write_journal = Journal::open(&wal_dir).expect("failed to open write journal");
let store = SledStore::open(&db_dir).expect("failed to open store"); let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
let store = Arc::new(HybridStore::open(&db_dir).expect("failed to open store"));
let state = AppState::new(journal, store); let state = AppState::new(write_journal, read_journal, store);
TestEnvironment { _temp_dir: temp_dir, state } TestEnvironment { _temp_dir: temp_dir, state }
} }
@ -46,8 +47,6 @@ pub async fn create_test_env() -> TestEnvironment {
/// Helper to create a test environment with a running ingestor for roundtrip tests. /// Helper to create a test environment with a running ingestor for roundtrip tests.
/// ///
/// Note: We need to share the same store between AppState and Ingestor. /// Note: We need to share the same store between AppState and Ingestor.
/// AppState::new() takes ownership, so we need a different approach:
/// we'll create the ingestor first, then construct AppState manually.
pub async fn create_test_env_with_ingestor() -> TestEnvironmentWithIngestor { pub async fn create_test_env_with_ingestor() -> TestEnvironmentWithIngestor {
let temp_dir = tempfile::tempdir().expect("failed to create temp dir"); let temp_dir = tempfile::tempdir().expect("failed to create temp dir");
let wal_dir = temp_dir.path().join("wal"); let wal_dir = temp_dir.path().join("wal");
@ -57,11 +56,7 @@ pub async fn create_test_env_with_ingestor() -> TestEnvironmentWithIngestor {
std::fs::create_dir_all(&db_dir).expect("failed to create db dir"); std::fs::create_dir_all(&db_dir).expect("failed to create db dir");
// Create shared store // Create shared store
let store = Arc::new(SledStore::open(&db_dir).expect("failed to open store")); let store = Arc::new(HybridStore::open(&db_dir).expect("failed to open store"));
// Journal for API (writing)
let journal_for_api =
Arc::new(Mutex::new(Journal::open(&wal_dir).expect("failed to open journal for API")));
// Journal for ingestor (reading) - WAL allows multiple readers // Journal for ingestor (reading) - WAL allows multiple readers
let journal_for_ingestor = let journal_for_ingestor =
@ -72,14 +67,10 @@ pub async fn create_test_env_with_ingestor() -> TestEnvironmentWithIngestor {
.await .await
.expect("failed to create ingestor"); .expect("failed to create ingestor");
// Create quota store for AppState // Create AppState with write and read journals
let quota_store = Arc::new(GenericQuotaStore::new(store.clone())); let write_journal = Journal::open(&wal_dir).expect("failed to open write journal");
let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
// Create escalation store for AppState let state = AppState::new(write_journal, read_journal, store);
let escalation_store = Arc::new(GenericEscalationStore::new(store.clone()));
// Construct AppState manually to share the store
let state = AppState { journal: journal_for_api, store, quota_store, escalation_store };
TestEnvironmentWithIngestor { _temp_dir: temp_dir, state, ingestor } TestEnvironmentWithIngestor { _temp_dir: temp_dir, state, ingestor }
} }


@ -28,7 +28,7 @@ use stemedb_api::create_router;
use stemedb_ingest::worker::IngestWorker; use stemedb_ingest::worker::IngestWorker;
use stemedb_lens::VoteAwareConsensusLens; use stemedb_lens::VoteAwareConsensusLens;
use stemedb_query::Materializer; use stemedb_query::Materializer;
use stemedb_storage::{GenericVoteStore, SledStore}; use stemedb_storage::{GenericVoteStore, HybridStore};
use stemedb_wal::Journal; use stemedb_wal::Journal;
// Test configuration constants // Test configuration constants
@ -44,7 +44,7 @@ const POLLING_INTERVAL_MS: u64 = 50;
struct TestEnvironment { struct TestEnvironment {
_temp_dir: tempfile::TempDir, _temp_dir: tempfile::TempDir,
state: stemedb_api::AppState, state: stemedb_api::AppState,
store: Arc<SledStore>, store: Arc<HybridStore>,
journal: Arc<Mutex<Journal>>, journal: Arc<Mutex<Journal>>,
} }
@ -57,15 +57,15 @@ async fn create_test_environment() -> TestEnvironment {
std::fs::create_dir_all(&wal_dir).expect("Failed to create WAL dir"); std::fs::create_dir_all(&wal_dir).expect("Failed to create WAL dir");
std::fs::create_dir_all(&db_dir).expect("Failed to create DB dir"); std::fs::create_dir_all(&db_dir).expect("Failed to create DB dir");
let journal = Journal::open(&wal_dir).expect("Failed to open journal"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let store = SledStore::open(&db_dir).expect("Failed to open store");
let journal_arc = Arc::new(Mutex::new(journal));
let store_arc = Arc::new(store); let store_arc = Arc::new(store);
// Open a second journal handle for AppState (WAL supports multiple readers) // Open journals: one for IngestWorker reads, one for AppState (write + read)
let journal_for_state = Journal::open(&wal_dir).expect("Failed to open second journal handle"); let journal_arc =
let state = stemedb_api::AppState::new(journal_for_state, (*store_arc).clone()); Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
let state = stemedb_api::AppState::new(write_journal, read_journal, Arc::clone(&store_arc));
TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc } TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
} }


@ -22,7 +22,7 @@ use tower::ServiceExt;
use stemedb_api::{create_router, AppState}; use stemedb_api::{create_router, AppState};
use stemedb_ingest::worker::IngestWorker; use stemedb_ingest::worker::IngestWorker;
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
use stemedb_wal::Journal; use stemedb_wal::Journal;
// Test configuration constants // Test configuration constants
@ -32,7 +32,7 @@ const INGEST_ITERATIONS: usize = 10;
struct TestEnvironment { struct TestEnvironment {
_temp_dir: tempfile::TempDir, _temp_dir: tempfile::TempDir,
state: AppState, state: AppState,
store: Arc<SledStore>, store: Arc<HybridStore>,
journal: Arc<Mutex<Journal>>, journal: Arc<Mutex<Journal>>,
} }
@ -45,15 +45,15 @@ async fn create_test_environment() -> TestEnvironment {
std::fs::create_dir_all(&wal_dir).expect("Failed to create WAL dir"); std::fs::create_dir_all(&wal_dir).expect("Failed to create WAL dir");
std::fs::create_dir_all(&db_dir).expect("Failed to create DB dir"); std::fs::create_dir_all(&db_dir).expect("Failed to create DB dir");
let journal = Journal::open(&wal_dir).expect("Failed to open journal"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let store = SledStore::open(&db_dir).expect("Failed to open store");
let journal_arc = Arc::new(Mutex::new(journal));
let store_arc = Arc::new(store); let store_arc = Arc::new(store);
// Open a second journal handle for AppState (WAL supports multiple readers) // Open journals: one for IngestWorker reads, one for AppState (write + read)
let journal_for_state = Journal::open(&wal_dir).expect("Failed to open second journal handle"); let journal_arc =
let state = AppState::new(journal_for_state, (*store_arc).clone()); Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
let state = AppState::new(write_journal, read_journal, Arc::clone(&store_arc));
TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc } TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
} }


@ -20,7 +20,7 @@ use std::sync::Arc;
use tower::ServiceExt; use tower::ServiceExt;
use stemedb_api::{create_router, create_router_with_meter, AppState}; use stemedb_api::{create_router, create_router_with_meter, AppState};
use stemedb_storage::{GenericEscalationStore, GenericQuotaStore, QuotaStore, SledStore}; use stemedb_storage::{HybridStore, QuotaStore};
use stemedb_wal::Journal; use stemedb_wal::Journal;
// ============================================================================ // ============================================================================
@ -148,7 +148,7 @@ async fn test_decay_trust_ranks_actually_decays() {
let agent_id = [42u8; 32]; let agent_id = [42u8; 32];
let mut trust_rank = TrustRank::new(agent_id, 1000); let mut trust_rank = TrustRank::new(agent_id, 1000);
trust_rank.score = 0.8; trust_rank.score = 0.8;
let trust_store = GenericTrustRankStore::new((*env.state.store).clone()); let trust_store = GenericTrustRankStore::new(env.state.store.clone());
trust_store.put_trust_rank(&trust_rank).await.expect("put trust rank"); trust_store.put_trust_rank(&trust_rank).await.expect("put trust rank");
let app = create_router(env.state.clone()); let app = create_router(env.state.clone());
@ -198,18 +198,12 @@ async fn test_quota_consumption_with_meter() {
std::fs::create_dir_all(&wal_dir).expect("wal dir"); std::fs::create_dir_all(&wal_dir).expect("wal dir");
std::fs::create_dir_all(&db_dir).expect("db dir"); std::fs::create_dir_all(&db_dir).expect("db dir");
let journal = Journal::open(&wal_dir).expect("journal"); let write_journal = Journal::open(&wal_dir).expect("write journal");
let store = Arc::new(SledStore::open(&db_dir).expect("store")); let read_journal = Journal::open(&wal_dir).expect("read journal");
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
// Create AppState manually to share quota_store let state = AppState::new(write_journal, read_journal, store.clone());
let quota_store = Arc::new(GenericQuotaStore::new(store.clone())); let quota_store = state.quota_store.clone();
let escalation_store = Arc::new(GenericEscalationStore::new(store.clone()));
let state = AppState {
journal: Arc::new(tokio::sync::Mutex::new(journal)),
store: store.clone(),
quota_store: quota_store.clone(),
escalation_store,
};
let app = create_router_with_meter(state); let app = create_router_with_meter(state);
@ -260,17 +254,12 @@ async fn test_quota_exceeded_response() {
std::fs::create_dir_all(&wal_dir).expect("wal dir"); std::fs::create_dir_all(&wal_dir).expect("wal dir");
std::fs::create_dir_all(&db_dir).expect("db dir"); std::fs::create_dir_all(&db_dir).expect("db dir");
let journal = Journal::open(&wal_dir).expect("journal"); let write_journal = Journal::open(&wal_dir).expect("write journal");
let store = Arc::new(SledStore::open(&db_dir).expect("store")); let read_journal = Journal::open(&wal_dir).expect("read journal");
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
let quota_store = Arc::new(GenericQuotaStore::new(store.clone())); let state = AppState::new(write_journal, read_journal, store.clone());
let escalation_store = Arc::new(GenericEscalationStore::new(store.clone())); let quota_store = state.quota_store.clone();
let state = AppState {
journal: Arc::new(tokio::sync::Mutex::new(journal)),
store: store.clone(),
quota_store: quota_store.clone(),
escalation_store,
};
let app = create_router_with_meter(state); let app = create_router_with_meter(state);
@ -311,17 +300,12 @@ async fn test_quota_headers_format() {
std::fs::create_dir_all(&wal_dir).expect("wal dir"); std::fs::create_dir_all(&wal_dir).expect("wal dir");
std::fs::create_dir_all(&db_dir).expect("db dir"); std::fs::create_dir_all(&db_dir).expect("db dir");
let journal = Journal::open(&wal_dir).expect("journal"); let write_journal = Journal::open(&wal_dir).expect("write journal");
let store = Arc::new(SledStore::open(&db_dir).expect("store")); let read_journal = Journal::open(&wal_dir).expect("read journal");
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
let quota_store = Arc::new(GenericQuotaStore::new(store.clone())); let state = AppState::new(write_journal, read_journal, store.clone());
let escalation_store = Arc::new(GenericEscalationStore::new(store.clone())); let quota_store = state.quota_store.clone();
let state = AppState {
journal: Arc::new(tokio::sync::Mutex::new(journal)),
store: store.clone(),
quota_store: quota_store.clone(),
escalation_store,
};
let app = create_router_with_meter(state); let app = create_router_with_meter(state);


@ -32,11 +32,6 @@ mod tests;
// Re-exports // Re-exports
pub use record_types::{serialize_assertion, serialize_epoch, serialize_vote, RecordType}; pub use record_types::{serialize_assertion, serialize_epoch, serialize_vote, RecordType};
/// The cursor tracks how far into the WAL the ingestor has processed,
/// allowing recovery to resume from the last checkpoint instead of
/// replaying the entire log.
const CURSOR_KEY: &[u8] = b"__CURSOR__:ingest";
/// Background worker that tails the WAL and updates the KV store. /// Background worker that tails the WAL and updates the KV store.
pub struct IngestWorker<S> { pub struct IngestWorker<S> {
journal: Arc<Mutex<Journal>>, journal: Arc<Mutex<Journal>>,
@ -68,7 +63,8 @@ impl<S: KVStore + 'static> IngestWorker<S> {
pub async fn new(journal: Arc<Mutex<Journal>>, store: Arc<S>) -> Result<Self> { pub async fn new(journal: Arc<Mutex<Journal>>, store: Arc<S>) -> Result<Self> {
let index_store = GenericIndexStore::new(store.clone()); let index_store = GenericIndexStore::new(store.clone());
let vote_store = GenericVoteStore::new(store.clone()); let vote_store = GenericVoteStore::new(store.clone());
let current_offset = match store.get(CURSOR_KEY).await? { let cursor_key = stemedb_storage::key_codec::cursor_key();
let current_offset = match store.get(&cursor_key).await? {
Some(bytes) if bytes.len() == 8 => { Some(bytes) if bytes.len() == 8 => {
let offset = let offset =
u64::from_le_bytes(bytes.try_into().map_err(|_| { u64::from_le_bytes(bytes.try_into().map_err(|_| {
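
The hunk above reads the ingest cursor as an 8-byte little-endian `u64`. A minimal sketch of that round trip follows. The helper names are hypothetical, and treating a missing cursor as offset 0 is an assumption, since the hunk is cut off before that branch.

```rust
/// Encode a WAL offset as the 8-byte little-endian value stored under the cursor key.
fn encode_cursor(offset: u64) -> [u8; 8] {
    offset.to_le_bytes()
}

/// Decode a stored cursor. Returns None unless the value is exactly 8 bytes.
fn decode_cursor(bytes: &[u8]) -> Option<u64> {
    if bytes.len() != 8 {
        return None;
    }
    let mut arr = [0u8; 8];
    arr.copy_from_slice(bytes);
    Some(u64::from_le_bytes(arr))
}

/// Pick the offset to resume from.
/// Assumption: a missing cursor means "start from the beginning of the log".
fn resume_offset(stored: Option<&[u8]>) -> Result<u64, String> {
    match stored {
        None => Ok(0),
        Some(b) => decode_cursor(b)
            .ok_or_else(|| format!("cursor has {} bytes, expected 8", b.len())),
    }
}
```

Rejecting values of the wrong length, rather than padding or truncating them, keeps a corrupt cursor from silently skipping or replaying part of the log.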


@ -4,11 +4,12 @@
//! including validation and signature verification. //! including validation and signature verification.
use super::record_types::RECORD_HEADER_SIZE; use super::record_types::RECORD_HEADER_SIZE;
use super::{IngestWorker, RecordType, CURSOR_KEY}; use super::{IngestWorker, RecordType};
use crate::error::{IngestError, Result}; use crate::error::{IngestError, Result};
use ed25519_dalek::{Signature, Verifier, VerifyingKey}; use ed25519_dalek::{Signature, Verifier, VerifyingKey};
use stemedb_core::serde::deserialize; use stemedb_core::serde::deserialize;
use stemedb_core::types::{Assertion, Epoch, Hash, Vote}; use stemedb_core::types::{Assertion, Epoch, Hash, Vote};
use stemedb_storage::key_codec;
use stemedb_storage::{IndexStore, KVStore, VoteStore}; use stemedb_storage::{IndexStore, KVStore, VoteStore};
use tracing::{debug, info, warn}; use tracing::{debug, info, warn};
@ -16,7 +17,7 @@ impl<S: KVStore + 'static> IngestWorker<S> {
/// Process the next record from the WAL, returning bytes read (0 = no data). /// Process the next record from the WAL, returning bytes read (0 = no data).
pub async fn step(&mut self) -> Result<u64> { pub async fn step(&mut self) -> Result<u64> {
let record = { let record = {
let journal = self.journal.lock().await; let mut journal = self.journal.lock().await;
match journal.read(self.current_offset) { match journal.read(self.current_offset) {
Ok(record) => record, Ok(record) => record,
Err(stemedb_wal::QuarantineError::Io { source, .. }) Err(stemedb_wal::QuarantineError::Io { source, .. })
@ -80,7 +81,8 @@ impl<S: KVStore + 'static> IngestWorker<S> {
// Persist the cursor so recovery can skip already-processed records. // Persist the cursor so recovery can skip already-processed records.
// This is safe even if it fails: the write path is idempotent // This is safe even if it fails: the write path is idempotent
// (content-addressed keys), so re-processing a record is a no-op. // (content-addressed keys), so re-processing a record is a no-op.
self.store.put(CURSOR_KEY, &self.current_offset.to_le_bytes()).await?; let cursor_key = key_codec::cursor_key();
self.store.put(&cursor_key, &self.current_offset.to_le_bytes()).await?;
info!( info!(
record_type = ?record_type, record_type = ?record_type,
@ -116,14 +118,15 @@ impl<S: KVStore + 'static> IngestWorker<S> {
// Verify all signatures before storing // Verify all signatures before storing
self.verify_assertion_signatures(&assertion)?; self.verify_assertion_signatures(&assertion)?;
// Content-addressed key: H:{BLAKE3_hash} // Content-addressed key: {subject}\x00H:{BLAKE3_hash}
let hash = blake3::hash(data); let hash = blake3::hash(data);
let key = format!("H:{}", hash.to_hex()).into_bytes(); let hash_hex = hash.to_hex().to_string();
let key = key_codec::assertion_key(&assertion.subject, &hash_hex);
debug!( debug!(
subject = %assertion.subject, subject = %assertion.subject,
predicate = %assertion.predicate, predicate = %assertion.predicate,
hash = %hash.to_hex(), hash = %hash_hex,
signature_count = assertion.signatures.len(), signature_count = assertion.signatures.len(),
"Ingesting verified assertion" "Ingesting verified assertion"
); );
@ -131,7 +134,19 @@ impl<S: KVStore + 'static> IngestWorker<S> {
// Store the assertion // Store the assertion
self.store.put(&key, data).await?; self.store.put(&key, data).await?;
// Update indexes: S:{subject} and SP:{subject}:{predicate} // Write reverse index: \x00HASH_SUBJECT:{hash_hex} -> subject
let reverse_key = key_codec::hash_subject_key(&hash_hex);
self.store.put(&reverse_key, assertion.subject.as_bytes()).await?;
// Write subject discovery index: \x00SUBJECTS:{subject} -> empty
let subjects_key = key_codec::subjects_index_key(&assertion.subject);
self.store.put(&subjects_key, &[]).await?;
// Increment assertion count: \x00META:assertion_count
let count_key = key_codec::assertion_count_key();
self.store.fetch_and_add_u64(&count_key, 1).await?;
// Update indexes: {subject}\x00S: and {subject}\x00SP:{predicate}
let assertion_hash: Hash = *hash.as_bytes(); let assertion_hash: Hash = *hash.as_bytes();
self.index_store self.index_store
.add_to_indexes(&assertion.subject, &assertion.predicate, &assertion_hash) .add_to_indexes(&assertion.subject, &assertion.predicate, &assertion_hash)
@ -143,13 +158,13 @@ impl<S: KVStore + 'static> IngestWorker<S> {
if let Err(e) = vector_index.insert(&assertion_hash, vector) { if let Err(e) = vector_index.insert(&assertion_hash, vector) {
// Log but don't fail the ingestion - vector index is supplementary // Log but don't fail the ingestion - vector index is supplementary
warn!( warn!(
hash = %hash.to_hex(), hash = %hash_hex,
error = %e, error = %e,
"Failed to insert into vector index" "Failed to insert into vector index"
); );
} else { } else {
debug!( debug!(
hash = %hash.to_hex(), hash = %hash_hex,
dim = vector.len(), dim = vector.len(),
"Inserted into vector index" "Inserted into vector index"
); );
@ -163,13 +178,13 @@ impl<S: KVStore + 'static> IngestWorker<S> {
if let Err(e) = visual_index.insert(&assertion_hash, phash) { if let Err(e) = visual_index.insert(&assertion_hash, phash) {
// Log but don't fail the ingestion - visual index is supplementary // Log but don't fail the ingestion - visual index is supplementary
warn!( warn!(
hash = %hash.to_hex(), hash = %hash_hex,
error = %e, error = %e,
"Failed to insert into visual index" "Failed to insert into visual index"
); );
} else { } else {
debug!( debug!(
hash = %hash.to_hex(), hash = %hash_hex,
phash = %hex::encode(phash), phash = %hex::encode(phash),
"Inserted into visual index" "Inserted into visual index"
); );
@ -185,8 +200,13 @@ impl<S: KVStore + 'static> IngestWorker<S> {
/// - Confidence outside [0.0, 1.0] or NaN/Inf /// - Confidence outside [0.0, 1.0] or NaN/Inf
/// - Subject exceeding MAX_SUBJECT_LEN bytes /// - Subject exceeding MAX_SUBJECT_LEN bytes
/// - Predicate exceeding MAX_PREDICATE_LEN bytes /// - Predicate exceeding MAX_PREDICATE_LEN bytes
/// - Subject containing null byte (would corrupt key boundaries)
/// - Timestamp more than 1 hour in the future (clock skew protection) /// - Timestamp more than 1 hour in the future (clock skew protection)
fn validate_assertion(&self, assertion: &Assertion) -> Result<()> { fn validate_assertion(&self, assertion: &Assertion) -> Result<()> {
// Validate subject does not contain separator byte
key_codec::validate_subject(&assertion.subject)
.map_err(|e| IngestError::InputValidation(format!("invalid subject: {}", e)))?;
// Validate confidence: must be finite and in [0.0, 1.0] // Validate confidence: must be finite and in [0.0, 1.0]
if assertion.confidence.is_nan() { if assertion.confidence.is_nan() {
return Err(IngestError::InputValidation( return Err(IngestError::InputValidation(
@ -295,6 +315,9 @@ impl<S: KVStore + 'static> IngestWorker<S> {
/// Validates vote weight bounds (0.0 to 1.0, no NaN/Inf) and uses VoteStore /// Validates vote weight bounds (0.0 to 1.0, no NaN/Inf) and uses VoteStore
/// to maintain vote count and aggregate weight caches automatically. /// to maintain vote count and aggregate weight caches automatically.
/// This ensures VoteAwareConsensusLens has accurate data. /// This ensures VoteAwareConsensusLens has accurate data.
///
/// Looks up the assertion's subject from the reverse index to co-locate
/// vote data with the assertion for range sharding.
async fn ingest_vote(&self, data: &[u8]) -> Result<()> { async fn ingest_vote(&self, data: &[u8]) -> Result<()> {
let vote: Vote = let vote: Vote =
deserialize(data).map_err(|e| IngestError::Serialization(e.to_string()))?; deserialize(data).map_err(|e| IngestError::Serialization(e.to_string()))?;
@ -318,34 +341,54 @@ impl<S: KVStore + 'static> IngestWorker<S> {
))); )));
} }
// Look up the subject from the reverse index
let hash_hex = hex::encode(vote.assertion_hash);
let reverse_key = key_codec::hash_subject_key(&hash_hex);
let subject = match self.store.get(&reverse_key).await? {
Some(bytes) => String::from_utf8(bytes).map_err(|e| {
IngestError::Serialization(format!("Invalid subject in reverse index: {}", e))
})?,
None => {
warn!(
assertion_hash = %hash_hex,
"Vote references unknown assertion (no reverse index entry)"
);
return Err(IngestError::InputValidation(format!(
"vote references unknown assertion {}",
hash_hex
)));
}
};
debug!( debug!(
assertion_hash = %hex::encode(vote.assertion_hash), assertion_hash = %hash_hex,
subject = %subject,
weight = vote.weight, weight = vote.weight,
"Ingesting vote via VoteStore" "Ingesting vote via VoteStore"
); );
// Delegate to VoteStore which handles: // Delegate to VoteStore which handles:
// 1. Content-addressed storage at V:{assertion_hash}:{vote_hash} // 1. Content-addressed storage at {subject}\x00V:{assertion_hex}:{vote_hex}
// 2. Vote count cache at VC:{assertion_hash} // 2. Vote count cache at {subject}\x00VC:{assertion_hex}
// 3. Aggregate weight cache at VW:{assertion_hash} // 3. Aggregate weight cache at {subject}\x00VW:{assertion_hex}
self.vote_store.put_vote(&vote).await?; self.vote_store.put_vote(&vote, &subject).await?;
Ok(()) Ok(())
} }
/// Ingest an epoch into the KV store. /// Ingest an epoch into the KV store.
/// ///
/// In addition to storing the epoch at `E:{epoch_id}`, this method writes /// Stores the epoch at `\x00E:{epoch_id}` and writes
/// `SUPERSEDED:{old_epoch_id}` marker keys for the full transitive closure /// `\x00SUPERSEDED:{old_epoch_id}` marker keys for the full transitive closure
/// of superseded epochs. This enables O(1) "is superseded?" lookups at /// of superseded epochs. This enables O(1) "is superseded?" lookups at
/// query time instead of O(chain_length) chain walks. /// query time instead of O(chain_length) chain walks.
async fn ingest_epoch(&self, data: &[u8]) -> Result<()> { async fn ingest_epoch(&self, data: &[u8]) -> Result<()> {
let epoch: Epoch = let epoch: Epoch =
deserialize(data).map_err(|e| IngestError::Serialization(e.to_string()))?; deserialize(data).map_err(|e| IngestError::Serialization(e.to_string()))?;
// Epoch key: E:{epoch_id_hash} // Epoch key: \x00E:{epoch_id_hash}
let epoch_id_hex = hex::encode(epoch.id); let epoch_id_hex = hex::encode(epoch.id);
let key = format!("E:{}", epoch_id_hex).into_bytes(); let key = key_codec::epoch_key(&epoch_id_hex);
debug!( debug!(
epoch_id = %epoch_id_hex, epoch_id = %epoch_id_hex,
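
The comments in this file describe two key shapes: subject-scoped keys such as `{subject}\x00H:{hash}` and global keys such as `\x00E:{epoch_id}`. The sketch below shows how such a layout could be built and why a null byte in the subject has to be rejected. It is an illustration only: the real `key_codec` module is not shown in this diff, so these function bodies and the `subject_of` helper are assumptions.

```rust
/// Separator between the subject prefix and the rest of the key.
const SEP: u8 = 0x00;

/// Reject subjects containing the separator, which would corrupt key boundaries.
fn validate_subject(subject: &str) -> Result<(), String> {
    if subject.as_bytes().contains(&SEP) {
        Err("subject contains a null byte".to_string())
    } else {
        Ok(())
    }
}

/// Subject-scoped key: {subject}\x00H:{hash_hex}
fn assertion_key(subject: &str, hash_hex: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(subject.len() + 3 + hash_hex.len());
    key.extend_from_slice(subject.as_bytes());
    key.push(SEP);
    key.extend_from_slice(b"H:");
    key.extend_from_slice(hash_hex.as_bytes());
    key
}

/// Global key with an empty subject prefix: \x00E:{epoch_id_hex}
fn epoch_key(epoch_id_hex: &str) -> Vec<u8> {
    let mut key = vec![SEP];
    key.extend_from_slice(b"E:");
    key.extend_from_slice(epoch_id_hex.as_bytes());
    key
}

/// Hypothetical helper: recover the subject prefix, or None for global keys.
fn subject_of(key: &[u8]) -> Option<&[u8]> {
    let pos = key.iter().position(|&b| b == SEP)?;
    if pos == 0 {
        None
    } else {
        Some(&key[..pos])
    }
}
```

With this layout all keys for one subject share a byte prefix, so they sort next to each other, and global keys (which start with the separator) sort before any key with a non-empty subject.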


@ -7,6 +7,7 @@ use super::IngestWorker;
use crate::error::Result; use crate::error::Result;
use stemedb_core::serde::deserialize; use stemedb_core::serde::deserialize;
use stemedb_core::types::Epoch; use stemedb_core::types::Epoch;
use stemedb_storage::key_codec;
use stemedb_storage::KVStore; use stemedb_storage::KVStore;
use tracing::{debug, warn}; use tracing::{debug, warn};
@ -14,13 +15,13 @@ impl<S: KVStore + 'static> IngestWorker<S> {
/// Maximum depth for walking supersession chains at write time. /// Maximum depth for walking supersession chains at write time.
pub(super) const MAX_CASCADE_DEPTH: usize = 100; pub(super) const MAX_CASCADE_DEPTH: usize = 100;
/// Write `SUPERSEDED:` markers for the full transitive closure of superseded epochs. /// Write `\x00SUPERSEDED:` markers for the full transitive closure of superseded epochs.
/// ///
/// All markers point to the LATEST superseding epoch (`new_epoch_id`). /// All markers point to the LATEST superseding epoch (`new_epoch_id`).
/// For chain C→B→A: writes `SUPERSEDED:B→C` and `SUPERSEDED:A→C`. /// For chain C→B→A: writes `SUPERSEDED:B→C` and `SUPERSEDED:A→C`.
/// ///
/// This enables O(1) "is this epoch superseded?" checks at query time: /// This enables O(1) "is this epoch superseded?" checks at query time:
/// just look for `SUPERSEDED:{epoch_id}` key existence. /// just look for `\x00SUPERSEDED:{epoch_id}` key existence.
/// ///
/// # Algorithm /// # Algorithm
/// ///
@ -63,8 +64,8 @@ impl<S: KVStore + 'static> IngestWorker<S> {
break; break;
} }
// Write marker: SUPERSEDED:{current_id} → new_epoch_id (always the LATEST) // Write marker: \x00SUPERSEDED:{current_id} → new_epoch_id (always the LATEST)
let marker_key = Self::superseded_key(&current_id); let marker_key = key_codec::superseded_key(&hex::encode(current_id));
self.store.put(&marker_key, new_epoch_id).await?; self.store.put(&marker_key, new_epoch_id).await?;
debug!( debug!(
@ -75,7 +76,7 @@ impl<S: KVStore + 'static> IngestWorker<S> {
); );
// Check if current_id also superseded something (transitive closure) // Check if current_id also superseded something (transitive closure)
let epoch_key = format!("E:{}", hex::encode(current_id)).into_bytes(); let epoch_key = key_codec::epoch_key(&hex::encode(current_id));
let ancestor_epoch = match self.store.get(&epoch_key).await? { let ancestor_epoch = match self.store.get(&epoch_key).await? {
Some(bytes) => match deserialize::<Epoch>(&bytes) { Some(bytes) => match deserialize::<Epoch>(&bytes) {
Ok(e) => e, Ok(e) => e,
@ -108,12 +109,4 @@ impl<S: KVStore + 'static> IngestWorker<S> {
Ok(()) Ok(())
} }
/// Build key for superseded epoch marker.
///
/// Format: `SUPERSEDED:{epoch_id_hex}`
/// Value: The 32-byte ID of the epoch that superseded this one.
pub(super) fn superseded_key(epoch_id: &[u8; 32]) -> Vec<u8> {
format!("SUPERSEDED:{}", hex::encode(epoch_id)).into_bytes()
}
} }
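
The cascade described in the doc comments above (every superseded ancestor gets a marker pointing at the latest epoch) can be sketched without the store. This is a minimal illustration under stated assumptions, not the worker's actual code: the two `HashMap`s stand in for the `\x00E:` records and `\x00SUPERSEDED:` markers, `write_superseded_markers` is a hypothetical name, and the `visited` set is one assumed way to stop on cycles alongside the `MAX_CASCADE_DEPTH` bound.

```rust
use std::collections::{HashMap, HashSet};

type EpochId = [u8; 32];

/// Same bound as the diff's MAX_CASCADE_DEPTH.
const MAX_CASCADE_DEPTH: usize = 100;

/// Hypothetical sketch: `supersedes` maps an epoch to the epoch it directly
/// supersedes (a stand-in for reading epoch records from the store), and
/// `markers` maps a superseded epoch to the LATEST superseding epoch
/// (a stand-in for the marker keys).
fn write_superseded_markers(
    supersedes: &HashMap<EpochId, EpochId>,
    markers: &mut HashMap<EpochId, EpochId>,
    new_epoch_id: EpochId,
) {
    // Assumption: a visited set guards against cycles such as A -> B -> A.
    let mut visited: HashSet<EpochId> = HashSet::new();
    visited.insert(new_epoch_id);

    let mut current = supersedes.get(&new_epoch_id).copied();
    let mut depth = 0;

    while let Some(id) = current {
        // Stop on over-long chains or when we loop back to a seen epoch.
        if depth >= MAX_CASCADE_DEPTH || !visited.insert(id) {
            break;
        }
        // Every ancestor points at the newest epoch, not its direct successor.
        markers.insert(id, new_epoch_id);
        // Follow the chain: did this ancestor supersede something too?
        current = supersedes.get(&id).copied();
        depth += 1;
    }
}
```

For the chain C→B→A, calling this with C writes markers for B and A that both point to C, so a later "is superseded?" check is a single key lookup.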


@ -10,7 +10,7 @@ async fn test_ingest_assertion() {
// Create journal and store // Create journal and store
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
// Write assertion to WAL // Write assertion to WAL
let assertion = create_test_assertion(); let assertion = create_test_assertion();
@ -45,7 +45,7 @@ async fn test_ingest_vote() {
let db_dir = dir.path().join("db"); let db_dir = dir.path().join("db");
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let vote = create_test_vote(); let vote = create_test_vote();
let payload = serialize_vote(&vote).expect("Failed to serialize"); let payload = serialize_vote(&vote).expect("Failed to serialize");
@ -71,7 +71,7 @@ async fn test_ingest_epoch() {
let db_dir = dir.path().join("db"); let db_dir = dir.path().join("db");
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let epoch = create_test_epoch(); let epoch = create_test_epoch();
let payload = serialize_epoch(&epoch).expect("Failed to serialize"); let payload = serialize_epoch(&epoch).expect("Failed to serialize");
@ -97,7 +97,7 @@ async fn test_ingest_multiple_records() {
let db_dir = dir.path().join("db"); let db_dir = dir.path().join("db");
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
// Write multiple records // Write multiple records
let assertion = create_test_assertion(); let assertion = create_test_assertion();


@ -42,7 +42,7 @@ use super::*;
// Phase 1: Write and ingest 2 records // Phase 1: Write and ingest 2 records
{ {
let mut journal = Journal::open(&wal_dir).expect("open journal"); let mut journal = Journal::open(&wal_dir).expect("open journal");
let store = SledStore::open(&db_dir).expect("open store"); let store = HybridStore::open(&db_dir).expect("open store");
let a1 = create_signed_assertion("Phase1_A", "prop"); let a1 = create_signed_assertion("Phase1_A", "prop");
let a2 = create_signed_assertion("Phase1_B", "prop"); let a2 = create_signed_assertion("Phase1_B", "prop");
@ -65,7 +65,7 @@ use super::*;
let a3 = create_signed_assertion("Phase2_C", "prop"); let a3 = create_signed_assertion("Phase2_C", "prop");
journal.append(serialize_assertion(&a3).expect("ser")).expect("append"); journal.append(serialize_assertion(&a3).expect("ser")).expect("append");
let store = SledStore::open(&db_dir).expect("reopen store"); let store = HybridStore::open(&db_dir).expect("reopen store");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(store); let store = Arc::new(store);
@ -103,7 +103,7 @@ use super::*;
let db_dir = dir.path().join("db"); let db_dir = dir.path().join("db");
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
// Write 5 assertions to the WAL // Write 5 assertions to the WAL
let assertions: Vec<Assertion> = let assertions: Vec<Assertion> =
@ -182,7 +182,7 @@ use super::*;
let db_dir = dir.path().join("db"); let db_dir = dir.path().join("db");
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let assertion = create_test_assertion(); let assertion = create_test_assertion();
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");


@ -33,7 +33,7 @@ use stemedb_core::serde::deserialize;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -89,7 +89,7 @@ use stemedb_core::serde::deserialize;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -126,7 +126,7 @@ use stemedb_core::serde::deserialize;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_vote(&vote).expect("ser")).expect("append"); journal.append(serialize_vote(&vote).expect("ser")).expect("append");
@ -163,7 +163,7 @@ use stemedb_core::serde::deserialize;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_vote(&vote).expect("ser")).expect("append"); journal.append(serialize_vote(&vote).expect("ser")).expect("append");
@ -226,7 +226,7 @@ use stemedb_core::serde::deserialize;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -289,7 +289,7 @@ use stemedb_core::serde::deserialize;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");


@ -40,7 +40,7 @@ use stemedb_core::serde::deserialize;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -88,7 +88,7 @@ use stemedb_core::serde::deserialize;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -116,7 +116,7 @@ use stemedb_core::serde::deserialize;
let db_dir = dir.path().join("db"); let db_dir = dir.path().join("db");
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
// Create epochs: B supersedes A // Create epochs: B supersedes A
// Epoch A has no supersession (base epoch) // Epoch A has no supersession (base epoch)
@ -172,7 +172,7 @@ use stemedb_core::serde::deserialize;
let db_dir = dir.path().join("db"); let db_dir = dir.path().join("db");
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
// Create chain: C → B → A // Create chain: C → B → A
let epoch_a = stemedb_core::types::Epoch { let epoch_a = stemedb_core::types::Epoch {
@ -243,7 +243,7 @@ use stemedb_core::serde::deserialize;
let db_dir = dir.path().join("db"); let db_dir = dir.path().join("db");
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
// Create a cycle: A supersedes B, B supersedes A // Create a cycle: A supersedes B, B supersedes A
// This is pathological but we must not hang // This is pathological but we must not hang


@ -31,7 +31,7 @@ use tracing::info;
// PHASE 2: Partial ingestion, then "crash" // PHASE 2: Partial ingestion, then "crash"
let cursor_before_crash = { let cursor_before_crash = {
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(store); let store = Arc::new(store);
@ -70,7 +70,7 @@ use tracing::info;
// PHASE 3: Recovery - reopen everything and verify cursor restoration // PHASE 3: Recovery - reopen everything and verify cursor restoration
{ {
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal"); let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
let store = SledStore::open(&db_dir).expect("Failed to reopen store"); let store = HybridStore::open(&db_dir).expect("Failed to reopen store");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(store); let store = Arc::new(store);


@ -17,7 +17,7 @@ use stemedb_core::testing::{self, AssertionBuilder};
use stemedb_core::types::{ use stemedb_core::types::{
Assertion, Epoch, LifecycleStage, ObjectValue, SignatureEntry, SourceClass, Vote, Assertion, Epoch, LifecycleStage, ObjectValue, SignatureEntry, SourceClass, Vote,
}; };
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
use stemedb_wal::Journal; use stemedb_wal::Journal;
use tempfile::tempdir; use tempfile::tempdir;
use tokio::sync::Mutex; use tokio::sync::Mutex;


@ -12,7 +12,7 @@ use super::*;
// Phase 2: Recovery - reopen everything and run ingestor // Phase 2: Recovery - reopen everything and run ingestor
{ {
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal"); let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(store); let store = Arc::new(store);
@ -62,7 +62,7 @@ use super::*;
// Phase 2: Recovery // Phase 2: Recovery
{ {
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal"); let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(store); let store = Arc::new(store);
@ -108,7 +108,7 @@ use super::*;
// Recover and ingest // Recover and ingest
{ {
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal"); let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(store); let store = Arc::new(store);
@ -132,7 +132,7 @@ use super::*;
// Final verification: all data from all cycles present // Final verification: all data from all cycles present
{ {
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let assertions = store.scan_prefix(b"H:").await.expect("scan"); let assertions = store.scan_prefix(b"H:").await.expect("scan");
assert_eq!( assert_eq!(
assertions.len(), assertions.len(),
@ -144,7 +144,7 @@ use super::*;
/// Test: KV store persists across restarts. /// Test: KV store persists across restarts.
/// ///
/// Verifies that once data is ingested to sled, it survives store restarts. /// Verifies that once data is ingested to storage, it survives store restarts.
#[tokio::test] #[tokio::test]
async fn test_kv_store_persistence() { async fn test_kv_store_persistence() {
let dir = tempdir().expect("Failed to create temp dir"); let dir = tempdir().expect("Failed to create temp dir");
@ -154,7 +154,7 @@ use super::*;
// Phase 1: Write, ingest, and close everything // Phase 1: Write, ingest, and close everything
{ {
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
let assertion = create_test_assertion(); let assertion = create_test_assertion();
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -172,7 +172,7 @@ use super::*;
// Phase 2: Reopen only the KV store and verify data persists // Phase 2: Reopen only the KV store and verify data persists
{ {
let store = SledStore::open(&db_dir).expect("Failed to reopen store"); let store = HybridStore::open(&db_dir).expect("Failed to reopen store");
let assertions = store.scan_prefix(b"H:").await.expect("scan"); let assertions = store.scan_prefix(b"H:").await.expect("scan");
assert_eq!(assertions.len(), 1, "Assertion should persist in KV store across restarts"); assert_eq!(assertions.len(), 1, "Assertion should persist in KV store across restarts");
} }


@ -17,7 +17,7 @@ use crate::error::IngestError;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -68,7 +68,7 @@ use crate::error::IngestError;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -129,7 +129,7 @@ use crate::error::IngestError;
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -157,6 +157,6 @@ use crate::error::IngestError;
let db_dir = dir.path().join("db"); let db_dir = dir.path().join("db");
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
// Write two assertions to the WAL // Write two assertions to the WAL


@ -41,7 +41,7 @@ async fn test_rejects_high_confidence() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -96,7 +96,7 @@ async fn test_rejects_negative_confidence() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -130,7 +130,7 @@ async fn test_rejects_invalid_vote_weight() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_vote(&vote).expect("ser")).expect("append"); journal.append(serialize_vote(&vote).expect("ser")).expect("append");
@ -168,7 +168,7 @@ async fn test_rejects_negative_vote_weight() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_vote(&vote).expect("ser")).expect("append"); journal.append(serialize_vote(&vote).expect("ser")).expect("append");
@ -221,7 +221,7 @@ async fn test_rejects_oversized_subject() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -279,7 +279,7 @@ async fn test_rejects_oversized_predicate() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -339,7 +339,7 @@ async fn test_accepts_exact_max_subject_length() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -395,7 +395,7 @@ async fn test_accepts_exact_max_predicate_length() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -446,7 +446,7 @@ async fn test_rejects_nan_confidence() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
@ -479,7 +479,7 @@ async fn test_rejects_nan_vote_weight() {
}; };
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal"); let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
let store = SledStore::open(&db_dir).expect("Failed to open store"); let store = HybridStore::open(&db_dir).expect("Failed to open store");
journal.append(serialize_vote(&vote).expect("ser")).expect("append"); journal.append(serialize_vote(&vote).expect("ser")).expect("append");


@ -32,10 +32,10 @@
//! //!
//! ```ignore //! ```ignore
//! use stemedb_lens::{EpochAwareLens, RecencyLens}; //! use stemedb_lens::{EpochAwareLens, RecencyLens};
//! use stemedb_storage::SledStore; //! use stemedb_storage::HybridStore;
//! use std::sync::Arc; //! use std::sync::Arc;
//! //!
//! let store = Arc::new(SledStore::open("./data").expect("store")); //! let store = Arc::new(HybridStore::open("./data").expect("store"));
//! let lens = EpochAwareLens::with_recency(store); //! let lens = EpochAwareLens::with_recency(store);
//! //!
//! let resolution = lens.resolve_async(&candidates).await; //! let resolution = lens.resolve_async(&candidates).await;
@ -49,7 +49,7 @@ use std::collections::HashSet;
use std::sync::Arc; use std::sync::Arc;
use stemedb_core::serde::deserialize; use stemedb_core::serde::deserialize;
use stemedb_core::types::{Assertion, Epoch, EpochId}; use stemedb_core::types::{Assertion, Epoch, EpochId};
use stemedb_storage::KVStore; use stemedb_storage::{key_codec, KVStore};
use tracing::{debug, instrument, warn}; use tracing::{debug, instrument, warn};
/// Wrapper to use a sync Lens with EpochAwareLens. /// Wrapper to use a sync Lens with EpochAwareLens.
@ -111,18 +111,11 @@ impl<S: KVStore, L> EpochAwareLens<S, L> {
Self { store, inner } Self { store, inner }
} }
/// Build the key for reading an epoch record.
///
/// Format: `E:{epoch_id_hex}`
fn epoch_key(epoch_id: &EpochId) -> Vec<u8> {
format!("E:{}", hex::encode(epoch_id)).into_bytes()
}
/// Read an epoch record from the store. /// Read an epoch record from the store.
/// ///
/// Returns `None` if the epoch doesn't exist or fails to deserialize. /// Returns `None` if the epoch doesn't exist or fails to deserialize.
async fn read_epoch(&self, epoch_id: &EpochId) -> Option<Epoch> { async fn read_epoch(&self, epoch_id: &EpochId) -> Option<Epoch> {
let key = Self::epoch_key(epoch_id); let key = key_codec::epoch_key(&hex::encode(epoch_id));
match self.store.get(&key).await { match self.store.get(&key).await {
Ok(Some(bytes)) => match deserialize::<Epoch>(&bytes) { Ok(Some(bytes)) => match deserialize::<Epoch>(&bytes) {
@ -154,14 +147,6 @@ impl<S: KVStore, L> EpochAwareLens<S, L> {
} }
} }
/// Build the key for checking if an epoch is superseded.
///
/// Format: `SUPERSEDED:{epoch_id_hex}`
/// These markers are written by the IngestWorker when epochs are ingested.
fn superseded_key(epoch_id: &EpochId) -> Vec<u8> {
format!("SUPERSEDED:{}", hex::encode(epoch_id)).into_bytes()
}
/// Check if an epoch is superseded using O(1) marker lookup. /// Check if an epoch is superseded using O(1) marker lookup.
/// ///
/// The IngestWorker writes `SUPERSEDED:{epoch_id}` markers at epoch ingestion /// The IngestWorker writes `SUPERSEDED:{epoch_id}` markers at epoch ingestion
@ -174,7 +159,7 @@ impl<S: KVStore, L> EpochAwareLens<S, L> {
/// - Marker doesn't exist → epoch is NOT superseded (return false) /// - Marker doesn't exist → epoch is NOT superseded (return false)
/// - Storage error → treat as NOT superseded (fail-open) /// - Storage error → treat as NOT superseded (fail-open)
async fn is_epoch_superseded(&self, epoch_id: &EpochId) -> bool { async fn is_epoch_superseded(&self, epoch_id: &EpochId) -> bool {
let key = Self::superseded_key(epoch_id); let key = key_codec::superseded_key(&hex::encode(epoch_id));
match self.store.get(&key).await { match self.store.get(&key).await {
Ok(Some(_)) => { Ok(Some(_)) => {
debug!(epoch_id = %hex::encode(epoch_id), "Epoch is superseded (marker found)"); debug!(epoch_id = %hex::encode(epoch_id), "Epoch is superseded (marker found)");
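
The marker lookup and its fail-open behavior can be illustrated in isolation. The sketch below is an assumption-laden stand-in, not the crate's code: `MemStore`, `StoreError`, and `to_hex` are hypothetical, the store is synchronous for brevity, and `superseded_key` only mirrors the `\x00SUPERSEDED:{epoch_id_hex}` layout named in the ingest comments rather than the real `key_codec` implementation.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for a storage failure.
#[derive(Debug)]
struct StoreError;

/// Std-only hex encoder (the diff uses the `hex` crate instead).
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Assumed key layout: a NUL byte, then `SUPERSEDED:`, then the hex epoch id.
fn superseded_key(epoch_id_hex: &str) -> Vec<u8> {
    let mut key = vec![0u8];
    key.extend_from_slice(b"SUPERSEDED:");
    key.extend_from_slice(epoch_id_hex.as_bytes());
    key
}

/// Hypothetical in-memory store; `fail` simulates a storage error.
struct MemStore {
    data: HashMap<Vec<u8>, Vec<u8>>,
    fail: bool,
}

impl MemStore {
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StoreError> {
        if self.fail {
            return Err(StoreError);
        }
        Ok(self.data.get(key).cloned())
    }
}

/// One key lookup decides supersession; errors are treated as "not superseded".
fn is_epoch_superseded(store: &MemStore, epoch_id: &[u8; 32]) -> bool {
    let key = superseded_key(&to_hex(epoch_id));
    match store.get(&key) {
        Ok(Some(_)) => true, // marker found
        Ok(None) => false,   // no marker
        Err(_) => false,     // fail-open on storage error
    }
}
```

Failing open means a storage error can let a superseded epoch's assertions through rather than hiding valid ones; whether that trade-off suits a given deployment is a design decision the diff's comments leave to the caller.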


@ -3,14 +3,14 @@ use crate::consensus::ConsensusLens;
use stemedb_core::serde::serialize; use stemedb_core::serde::serialize;
use stemedb_core::testing::{test_epoch_with_supersession, AssertionBuilder}; use stemedb_core::testing::{test_epoch_with_supersession, AssertionBuilder};
use stemedb_core::types::SupersessionType; use stemedb_core::types::SupersessionType;
use stemedb_storage::SledStore; use stemedb_storage::{key_codec, HybridStore};
/// Store an epoch in the KV store and write SUPERSEDED markers. /// Store an epoch in the KV store and write SUPERSEDED markers.
/// ///
/// This simulates what the IngestWorker does: store the epoch AND write /// This simulates what the IngestWorker does: store the epoch AND write
/// cascade markers for the transitive closure of superseded epochs. /// cascade markers for the transitive closure of superseded epochs.
async fn store_epoch(store: &SledStore, epoch: &Epoch) { async fn store_epoch(store: &HybridStore, epoch: &Epoch) {
let key = format!("E:{}", hex::encode(epoch.id)).into_bytes(); let key = key_codec::epoch_key(&hex::encode(epoch.id));
let bytes = serialize(epoch).expect("serialize epoch"); let bytes = serialize(epoch).expect("serialize epoch");
store.put(&key, &bytes).await.expect("put epoch"); store.put(&key, &bytes).await.expect("put epoch");
@ -24,7 +24,7 @@ async fn store_epoch(store: &SledStore, epoch: &Epoch) {
/// ///
/// Mirrors the IngestWorker's cascade logic for test setup. /// Mirrors the IngestWorker's cascade logic for test setup.
async fn write_supersession_cascade( async fn write_supersession_cascade(
store: &SledStore, store: &HybridStore,
new_epoch_id: &[u8; 32], new_epoch_id: &[u8; 32],
superseded_id: &[u8; 32], superseded_id: &[u8; 32],
) { ) {
@ -41,11 +41,11 @@ async fn write_supersession_cascade(
} }
// Write marker // Write marker
let marker_key = format!("SUPERSEDED:{}", hex::encode(current_id)).into_bytes(); let marker_key = key_codec::superseded_key(&hex::encode(current_id));
store.put(&marker_key, new_epoch_id).await.expect("put marker"); store.put(&marker_key, new_epoch_id).await.expect("put marker");
// Check for ancestor // Check for ancestor
let epoch_key = format!("E:{}", hex::encode(current_id)).into_bytes(); let epoch_key = key_codec::epoch_key(&hex::encode(current_id));
let ancestor = match store.get(&epoch_key).await.expect("get") { let ancestor = match store.get(&epoch_key).await.expect("get") {
Some(bytes) => stemedb_core::serde::deserialize::<Epoch>(&bytes).ok(), Some(bytes) => stemedb_core::serde::deserialize::<Epoch>(&bytes).ok(),
None => None, None => None,
@@ -75,7 +75,7 @@ fn create_epoch(id: [u8; 32], name: &str) -> Epoch {
 #[tokio::test]
 async fn test_empty_candidates() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let lens = EpochAwareLens::with_recency(store);
 let resolution = lens.resolve_async(&[]).await;
@@ -86,7 +86,7 @@ async fn test_empty_candidates() {
 #[tokio::test]
 async fn test_single_candidate() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let lens = EpochAwareLens::with_recency(store);
 let assertion = AssertionBuilder::new().subject("Tesla").timestamp(1000).build();
@@ -99,7 +99,7 @@ async fn test_single_candidate() {
 #[tokio::test]
 async fn test_epoch_aware_no_epochs_passes_all() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let lens = EpochAwareLens::with_recency(store);
 // Create assertions without epochs
@@ -116,7 +116,7 @@ async fn test_epoch_aware_no_epochs_passes_all() {
 #[tokio::test]
 async fn test_epoch_aware_excludes_superseded() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 // Create epochs: B supersedes A
 let epoch_a = create_epoch([1u8; 32], "Epoch A");
@@ -149,7 +149,7 @@ async fn test_epoch_aware_excludes_superseded() {
 #[tokio::test]
 async fn test_epoch_aware_chain_supersession() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 // Create chain: C supersedes B, B supersedes A
 let epoch_a = create_epoch([1u8; 32], "Epoch A");
@@ -191,7 +191,7 @@ async fn test_epoch_aware_chain_supersession() {
 #[tokio::test]
 async fn test_epoch_aware_missing_epoch_record_includes() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 // Only store epoch B which supersedes A, but DON'T store epoch A
 let epoch_b = test_epoch_with_supersession(
@@ -224,7 +224,7 @@ async fn test_epoch_aware_missing_epoch_record_includes() {
 #[tokio::test]
 async fn test_epoch_aware_cycle_detection() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 // Create circular supersession: A supersedes B, B supersedes A
 let epoch_a = Epoch {
@@ -275,7 +275,7 @@ async fn test_epoch_aware_cycle_detection() {
 #[tokio::test]
 async fn test_epoch_aware_with_consensus_lens() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 // Create epochs: B supersedes A
 let epoch_a = create_epoch([1u8; 32], "Epoch A");
@@ -323,7 +323,7 @@ async fn test_epoch_aware_with_consensus_lens() {
 #[tokio::test]
 async fn test_epoch_aware_mixed_epochs_and_no_epochs() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 // Create epochs: B supersedes A
 let epoch_a = create_epoch([1u8; 32], "Epoch A");
@@ -357,7 +357,7 @@ async fn test_superseded_epoch_filtered_even_without_new_assertions() {
 // With the O(1) marker-based approach, epochs are filtered based on
 // SUPERSEDED: markers, not based on what's in the candidate set.
 // If an epoch has a SUPERSEDED marker, its assertions are filtered.
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 // Create epochs: B supersedes A
 let epoch_a = create_epoch([1u8; 32], "Epoch A");
@@ -388,12 +388,12 @@ async fn test_epoch_without_marker_passes_through() {
 // This test documents fail-open behavior:
 // If an epoch doesn't have a SUPERSEDED marker (e.g., data from before
 // cascade logic was added), assertions pass through.
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 // Manually store epoch A WITHOUT writing cascade markers
 // (simulating old data before the cascade feature)
 let epoch_a = create_epoch([1u8; 32], "Epoch A");
-let key = format!("E:{}", hex::encode(epoch_a.id)).into_bytes();
+let key = key_codec::epoch_key(&hex::encode(epoch_a.id));
 let bytes = serialize(&epoch_a).expect("serialize epoch");
 store.put(&key, &bytes).await.expect("put epoch");
@@ -411,7 +411,7 @@ async fn test_epoch_without_marker_passes_through() {
 #[tokio::test]
 async fn test_lens_name() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let lens = EpochAwareLens::with_recency(store);
 assert_eq!(lens.name(), "EpochAware");
@@ -423,11 +423,11 @@ async fn test_lens_name() {
 /// we don't need to read E:{epoch_id} records to determine supersession.
 #[tokio::test]
 async fn test_epoch_aware_uses_marker_not_epoch_record() {
-let store = Arc::new(SledStore::open_temp().expect("store"));
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 // Write ONLY the SUPERSEDED marker, NOT the epoch records themselves
 // This tests that we use the marker for filtering, not the epoch record
-let marker_key = format!("SUPERSEDED:{}", hex::encode([1u8; 32])).into_bytes();
+let marker_key = key_codec::superseded_key(&hex::encode([1u8; 32]));
 store.put(&marker_key, &[2u8; 32]).await.expect("put marker");
 let lens = EpochAwareLens::with_recency(Arc::clone(&store));
@@ -447,6 +447,6 @@ async fn test_epoch_aware_uses_marker_not_epoch_record() {
 // Verify we didn't need to read E:{epoch_id} records at all
 // (they don't exist in this test)
-let epochs = store.scan_prefix(b"E:").await.expect("scan");
+let epochs = store.scan_prefix(b"\x00E:").await.expect("scan");
 assert_eq!(epochs.len(), 0, "No epoch records should exist - test uses marker only");
 }
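
The tests above rely on a marker-based scheme: when an epoch is stored, a `SUPERSEDED` marker is written for every epoch in its transitive supersession chain, and filtering then only checks whether a marker exists. A minimal in-memory sketch of that idea follows. `MemStore`, its fields, and the `supersedes` link are hypothetical stand-ins for illustration only; they are not the `HybridStore` or `key_codec` API touched by this commit, and the real cascade logic may differ.

```rust
use std::collections::{HashMap, HashSet};

type EpochId = [u8; 32];

/// Hypothetical in-memory stand-in for the key-value store used in the tests.
#[derive(Default)]
struct MemStore {
    /// epoch id -> the epoch it directly supersedes, if any (assumed field)
    supersedes: HashMap<EpochId, Option<EpochId>>,
    /// superseded epoch id -> id of the epoch that replaced it
    markers: HashMap<EpochId, EpochId>,
}

impl MemStore {
    /// Store an epoch and write markers for its whole supersession chain.
    fn store_epoch(&mut self, id: EpochId, supersedes: Option<EpochId>) {
        self.supersedes.insert(id, supersedes);
        let mut visited = HashSet::new();
        let mut current = supersedes;
        while let Some(cur) = current {
            // Stop on a cycle: either back at the new epoch or at a visited id.
            if cur == id || !visited.insert(cur) {
                break;
            }
            self.markers.insert(cur, id);
            // Follow the ancestor link if the epoch record exists.
            current = self.supersedes.get(&cur).copied().flatten();
        }
    }

    /// O(1) check: only the marker is consulted, never the epoch record.
    fn is_superseded(&self, id: &EpochId) -> bool {
        self.markers.contains_key(id)
    }
}
```

Because the lookup consults only the marker map, an epoch with no marker passes through, which matches the fail-open behavior documented in `test_epoch_without_marker_passes_through`.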

@@ -68,7 +68,7 @@ impl<V: VoteStore, T: TrustRankStore> SkepticLens<V, T> {
 /// If no votes exist, falls back to the assertion's own confidence score.
 async fn get_assertion_weight(&self, assertion: &Assertion) -> f32 {
 let hash = Self::compute_assertion_hash(assertion);
-match self.vote_store.get_aggregate_weight(&hash).await {
+match self.vote_store.get_aggregate_weight(&hash, &assertion.subject).await {
 Ok(weight) if weight > 0.0 => weight,
 Ok(_) => {
 // No votes exist, fall back to assertion confidence

@@ -21,10 +21,10 @@
 //!
 //! ```ignore
 //! use stemedb_lens::SkepticLens;
-//! use stemedb_storage::{SledStore, GenericVoteStore};
+//! use stemedb_storage::{HybridStore, GenericVoteStore};
 //! use std::sync::Arc;
 //!
-//! let store = SledStore::open("./data").await?;
+//! let store = HybridStore::open("./data").await?;
 //! let vote_store = Arc::new(GenericVoteStore::new(store));
 //! let lens = SkepticLens::new(vote_store);
 //!
@@ -74,9 +74,10 @@ impl<V: VoteStore, T: TrustRankStore> SkepticLens<V, T> {
 mod tests {
 use super::*;
 use crate::traits::AnalysisLens;
+use std::sync::Arc;
 use stemedb_core::testing::AssertionBuilder;
 use stemedb_core::types::{Assertion, ObjectValue, ResolutionStatus, Vote};
-use stemedb_storage::{GenericTrustRankStore, GenericVoteStore, SledStore};
+use stemedb_storage::{GenericTrustRankStore, GenericVoteStore, HybridStore};
 fn create_assertion(subject: &str, value: f64, confidence: f32) -> Assertion {
 AssertionBuilder::new()
AssertionBuilder::new() AssertionBuilder::new()
@@ -98,7 +99,7 @@ mod tests {
 #[tokio::test]
 async fn test_empty_candidates() {
-let store = SledStore::open_temp().expect("store");
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = SkepticLens::new(vote_store, trust_store);
@@ -113,7 +114,7 @@ mod tests {
 #[tokio::test]
 async fn test_single_candidate_is_unanimous() {
-let store = SledStore::open_temp().expect("store");
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = SkepticLens::new(vote_store, trust_store);
@@ -129,7 +130,7 @@ mod tests {
 #[tokio::test]
 async fn test_same_value_is_unanimous() {
-let store = SledStore::open_temp().expect("store");
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = SkepticLens::new(vote_store, trust_store);
@@ -148,7 +149,7 @@ mod tests {
 #[tokio::test]
 async fn test_50_50_split_is_contested() {
-let store = SledStore::open_temp().expect("store");
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = SkepticLens::new(vote_store, trust_store);
@@ -166,7 +167,7 @@ mod tests {
 #[tokio::test]
 async fn test_strong_majority_is_agreed() {
-let store = SledStore::open_temp().expect("store");
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = SkepticLens::new(vote_store, trust_store);
@@ -186,7 +187,7 @@ mod tests {
 #[tokio::test]
 async fn test_claims_sorted_by_weight() {
-let store = SledStore::open_temp().expect("store");
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = SkepticLens::new(vote_store, trust_store);
@@ -204,7 +205,7 @@ mod tests {
 #[tokio::test]
 async fn test_text_value_grouping() {
-let store = SledStore::open_temp().expect("store");
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = SkepticLens::new(vote_store, trust_store);
@@ -232,24 +233,24 @@ mod tests {
 // Test the entropy calculation directly
 // 50/50 split: max entropy for 2 options = 1.0
 let weights_equal = vec![(ObjectValue::Number(1.0), 0.5), (ObjectValue::Number(2.0), 0.5)];
-let score = SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::calculate_conflict_score(&weights_equal);
+let score = SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::calculate_conflict_score(&weights_equal);
 assert!((score - 1.0).abs() < 0.01);
 // 100/0 split: zero entropy
 let weights_unanimous =
 vec![(ObjectValue::Number(1.0), 1.0), (ObjectValue::Number(2.0), 0.0)];
-let score = SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::calculate_conflict_score(&weights_unanimous);
+let score = SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::calculate_conflict_score(&weights_unanimous);
 assert!((score - 0.0).abs() < 0.01);
 // Single claim: unanimous
 let weights_single = vec![(ObjectValue::Number(1.0), 1.0)];
-let score = SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::calculate_conflict_score(&weights_single);
+let score = SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::calculate_conflict_score(&weights_single);
 assert!((score - 0.0).abs() < 0.01);
 }
 #[tokio::test]
 async fn test_lens_name() {
-let store = SledStore::open_temp().expect("store");
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = SkepticLens::new(vote_store, trust_store);
@@ -259,7 +260,7 @@ mod tests {
 #[tokio::test]
 async fn test_with_votes() {
-let store = SledStore::open_temp().expect("store");
+let store = Arc::new(HybridStore::open_temp().expect("store"));
 let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = SkepticLens::new(Arc::clone(&vote_store), trust_store);
@@ -270,9 +271,9 @@ mod tests {
 // Add votes to make a1 a strong winner
 let hash1 =
-SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::compute_assertion_hash(&a1);
+SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::compute_assertion_hash(&a1);
 let hash2 =
-SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::compute_assertion_hash(&a2);
+SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::compute_assertion_hash(&a2);
 // a1 gets 10 votes totaling 9.0 weight (strong majority)
 for i in 0..10 {
@@ -285,7 +286,7 @@ mod tests {
 source_url: None,
 observed_context: None,
 };
-vote_store.put_vote(&vote).await.expect("put vote");
+vote_store.put_vote(&vote, "Tesla").await.expect("put vote");
 }
 // a2 gets 1 vote with 0.5 weight (minority)
@@ -298,7 +299,7 @@ mod tests {
 source_url: None,
 observed_context: None,
 };
-vote_store.put_vote(&vote).await.expect("put vote");
+vote_store.put_vote(&vote, "Tesla").await.expect("put vote");
 let analysis = lens.analyze(&[a1, a2]).await;
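
The entropy test in this file expects a conflict score of 1.0 for a 50/50 split and 0.0 for a unanimous or single claim. One way to obtain those values is Shannon entropy over the normalized weights, divided by the maximum entropy for the number of claims. The sketch below assumes that normalization; `conflict_score` is a hypothetical free function, not the body of `SkepticLens::calculate_conflict_score`, which this diff does not show.

```rust
/// Hypothetical sketch of an entropy-based conflict score in [0, 1].
/// Assumption: the score is Shannon entropy (base 2) of the weight
/// distribution, divided by log2(number of claims).
fn conflict_score(weights: &[f32]) -> f32 {
    // Only positive weights contribute probability mass.
    let total: f32 = weights.iter().copied().filter(|w| *w > 0.0).sum();
    // Fewer than two claims, or no weight at all, means no conflict.
    if weights.len() < 2 || total <= 0.0 {
        return 0.0;
    }
    let entropy: f32 = weights
        .iter()
        .filter(|w| **w > 0.0)
        .map(|w| {
            let p = w / total;
            -p * p.log2()
        })
        .sum();
    // Normalize by the maximum possible entropy for this many claims.
    entropy / (weights.len() as f32).log2()
}
```

Zero-weight claims are skipped inside the sum because `p * log2(p)` is taken as 0 in the limit, which avoids a NaN from `0.0 * f32::NEG_INFINITY`.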

@@ -46,10 +46,10 @@ pub use crate::vote_aware_consensus::AsyncLens;
 ///
 /// ```ignore
 /// use stemedb_lens::TrustAwareAuthorityLens;
-/// use stemedb_storage::{SledStore, GenericTrustRankStore};
+/// use stemedb_storage::{HybridStore, GenericTrustRankStore};
 /// use std::sync::Arc;
 ///
-/// let store = SledStore::open("./data").await?;
+/// let store = HybridStore::open("./data").await?;
 /// let trust_store = Arc::new(GenericTrustRankStore::new(store));
 /// let lens = TrustAwareAuthorityLens::new(trust_store);
 ///
@@ -209,7 +209,7 @@ mod tests {
 use super::*;
 use std::sync::Arc;
 use stemedb_core::testing::AssertionBuilder;
-use stemedb_storage::{GenericTrustRankStore, SledStore, TrustRank, TrustRankStore};
+use stemedb_storage::{GenericTrustRankStore, HybridStore, TrustRank, TrustRankStore};
 fn create_assertion(
 subject: &str,
@@ -227,7 +227,7 @@ mod tests {
 #[tokio::test]
 async fn test_empty_candidates() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(trust_store);
@@ -239,7 +239,7 @@ mod tests {
 #[tokio::test]
 async fn test_single_candidate() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(trust_store);
@@ -254,7 +254,7 @@ mod tests {
 #[tokio::test]
 async fn test_selects_highest_weighted_score() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
@@ -288,7 +288,7 @@ mod tests {
 #[tokio::test]
 async fn test_high_confidence_low_trust_vs_low_confidence_high_trust() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
@@ -319,7 +319,7 @@ mod tests {
 #[tokio::test]
 async fn test_default_trust_for_new_agent() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(trust_store);
@@ -336,7 +336,7 @@ mod tests {
 #[tokio::test]
 async fn test_no_signatures_treated_as_untrusted() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
@@ -360,7 +360,7 @@ mod tests {
 #[tokio::test]
 async fn test_tie_breaking_by_timestamp() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
@@ -383,7 +383,7 @@ mod tests {
 #[tokio::test]
 async fn test_lens_name() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(trust_store);
@@ -392,7 +392,7 @@ mod tests {
 #[tokio::test]
 async fn test_multiple_candidates_different_trust_levels() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
@@ -430,7 +430,7 @@ mod tests {
 #[tokio::test]
 async fn test_zero_trust_agent() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
@@ -460,7 +460,7 @@ mod tests {
 #[tokio::test]
 async fn test_perfect_trust_agent() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let trust_store = Arc::new(GenericTrustRankStore::new(store));
 let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));

@@ -67,10 +67,10 @@ pub trait AsyncLens: Send + Sync {
 ///
 /// ```ignore
 /// use stemedb_lens::VoteAwareConsensusLens;
-/// use stemedb_storage::{SledStore, GenericVoteStore};
+/// use stemedb_storage::{HybridStore, GenericVoteStore};
 /// use std::sync::Arc;
 ///
-/// let store = SledStore::open("./data").await?;
+/// let store = HybridStore::open("./data").await?;
 /// let vote_store = Arc::new(GenericVoteStore::new(store));
 /// let lens = VoteAwareConsensusLens::new(vote_store);
 ///
@@ -147,7 +147,8 @@ impl<V: VoteStore + 'static> AsyncLens for VoteAwareConsensusLens<V> {
 // Lookup vote count and aggregate weight from VoteStore
 // These are O(1) operations thanks to VoteStore's cached counters
-let vote_count = match self.vote_store.get_vote_count(&assertion_hash).await {
+let vote_count =
+match self.vote_store.get_vote_count(&assertion_hash, &assertion.subject).await {
 Ok(count) => count,
 Err(e) => {
 debug!(
@@ -159,7 +160,10 @@ impl<V: VoteStore + 'static> AsyncLens for VoteAwareConsensusLens<V> {
 }
 };
-let aggregate_weight = match self.vote_store.get_aggregate_weight(&assertion_hash).await
+let aggregate_weight = match self
+.vote_store
+.get_aggregate_weight(&assertion_hash, &assertion.subject)
+.await
 {
 Ok(weight) => weight,
 Err(e) => {
@@ -228,7 +232,7 @@ mod tests {
 use std::sync::Arc;
 use stemedb_core::testing::{self, AssertionBuilder};
 use stemedb_core::types::Vote;
-use stemedb_storage::{GenericVoteStore, SledStore};
+use stemedb_storage::{GenericVoteStore, HybridStore};
 fn create_assertion(subject: &str, value: f64, timestamp: u64) -> Assertion {
 AssertionBuilder::new().subject(subject).object_number(value).timestamp(timestamp).build()
@@ -240,7 +244,7 @@ mod tests {
 #[tokio::test]
 async fn test_empty_candidates() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let vote_store = Arc::new(GenericVoteStore::new(store));
 let lens = VoteAwareConsensusLens::new(vote_store);
@@ -252,7 +256,7 @@ mod tests {
 #[tokio::test]
 async fn test_single_candidate() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let vote_store = Arc::new(GenericVoteStore::new(store));
 let lens = VoteAwareConsensusLens::new(vote_store);
@@ -266,7 +270,7 @@ mod tests {
 #[tokio::test]
 async fn test_selects_highest_vote_weight() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let vote_store = Arc::new(GenericVoteStore::new(store));
 let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
@@ -277,19 +281,31 @@ mod tests {
 // Add votes: a1 gets 0.5 weight, a2 gets 1.5 weight (winner), a3 gets 0.3 weight
 let hash1 =
-VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&a1)
+VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&a1)
 .unwrap();
 let hash2 =
-VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&a2)
+VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&a2)
 .unwrap();
 let hash3 =
-VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&a3)
+VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&a3)
 .unwrap();
-vote_store.put_vote(&create_vote(hash1, [1u8; 32], 0.5, 2000)).await.expect("put");
-vote_store.put_vote(&create_vote(hash2, [2u8; 32], 0.8, 2001)).await.expect("put");
-vote_store.put_vote(&create_vote(hash2, [3u8; 32], 0.7, 2002)).await.expect("put");
-vote_store.put_vote(&create_vote(hash3, [4u8; 32], 0.3, 2003)).await.expect("put");
+vote_store
+.put_vote(&create_vote(hash1, [1u8; 32], 0.5, 2000), "Agent1")
+.await
+.expect("put");
+vote_store
+.put_vote(&create_vote(hash2, [2u8; 32], 0.8, 2001), "Agent2")
+.await
+.expect("put");
+vote_store
+.put_vote(&create_vote(hash2, [3u8; 32], 0.7, 2002), "Agent2")
+.await
+.expect("put");
+vote_store
+.put_vote(&create_vote(hash3, [4u8; 32], 0.3, 2003), "Agent3")
+.await
+.expect("put");
 let resolution = lens.resolve_async(&[a1, a2.clone(), a3]).await;
@@ -305,7 +321,7 @@ mod tests {
 #[tokio::test]
 async fn test_no_votes_returns_most_recent() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let vote_store = Arc::new(GenericVoteStore::new(store));
 let lens = VoteAwareConsensusLens::new(vote_store);
@@ -324,7 +340,7 @@ mod tests {
 #[tokio::test]
 async fn test_tie_breaking_by_timestamp() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let vote_store = Arc::new(GenericVoteStore::new(store));
 let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
@@ -333,14 +349,20 @@ mod tests {
 // Give both the same vote weight
 let hash_old =
-VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&old)
+VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&old)
 .unwrap();
 let hash_new =
-VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&new)
+VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&new)
.unwrap(); .unwrap();
vote_store.put_vote(&create_vote(hash_old, [1u8; 32], 0.5, 3000)).await.expect("put"); vote_store
vote_store.put_vote(&create_vote(hash_new, [2u8; 32], 0.5, 3001)).await.expect("put"); .put_vote(&create_vote(hash_old, [1u8; 32], 0.5, 3000), "Old")
.await
.expect("put");
vote_store
.put_vote(&create_vote(hash_new, [2u8; 32], 0.5, 3001), "New")
.await
.expect("put");
let resolution = lens.resolve_async(&[old, new.clone()]).await; let resolution = lens.resolve_async(&[old, new.clone()]).await;
@ -351,7 +373,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_mixed_votes_and_no_votes() { async fn test_mixed_votes_and_no_votes() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = Arc::new(GenericVoteStore::new(store)); let vote_store = Arc::new(GenericVoteStore::new(store));
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store)); let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
@ -359,12 +381,15 @@ mod tests {
let without_votes = create_assertion("NoVotes", 200.0, 2000); let without_votes = create_assertion("NoVotes", 200.0, 2000);
let hash_with = let hash_with =
VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash( VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(
&with_votes, &with_votes,
) )
.unwrap(); .unwrap();
vote_store.put_vote(&create_vote(hash_with, [1u8; 32], 0.8, 3000)).await.expect("put"); vote_store
.put_vote(&create_vote(hash_with, [1u8; 32], 0.8, 3000), "WithVotes")
.await
.expect("put");
let resolution = lens.resolve_async(&[with_votes.clone(), without_votes]).await; let resolution = lens.resolve_async(&[with_votes.clone(), without_votes]).await;
@ -378,7 +403,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_lens_name() { async fn test_lens_name() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = Arc::new(GenericVoteStore::new(store)); let vote_store = Arc::new(GenericVoteStore::new(store));
let lens = VoteAwareConsensusLens::new(vote_store); let lens = VoteAwareConsensusLens::new(vote_store);
@ -387,7 +412,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_many_votes_on_single_assertion() { async fn test_many_votes_on_single_assertion() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = Arc::new(GenericVoteStore::new(store)); let vote_store = Arc::new(GenericVoteStore::new(store));
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store)); let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
@ -395,10 +420,12 @@ mod tests {
let unpopular = create_assertion("Unpopular", 200.0, 1100); let unpopular = create_assertion("Unpopular", 200.0, 1100);
let hash_popular = let hash_popular =
VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&popular) VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(
&popular,
)
.unwrap(); .unwrap();
let hash_unpopular = let hash_unpopular =
VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash( VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(
&unpopular, &unpopular,
) )
.unwrap(); .unwrap();
@ -411,14 +438,14 @@ mod tests {
id id
}; };
vote_store vote_store
.put_vote(&create_vote(hash_popular, agent_id, 0.5, 2000 + i as u64)) .put_vote(&create_vote(hash_popular, agent_id, 0.5, 2000 + i as u64), "Popular")
.await .await
.expect("put"); .expect("put");
} }
// Unpopular gets 1 vote // Unpopular gets 1 vote
vote_store vote_store
.put_vote(&create_vote(hash_unpopular, [99u8; 32], 0.5, 2100)) .put_vote(&create_vote(hash_unpopular, [99u8; 32], 0.5, 2100), "Unpopular")
.await .await
.expect("put"); .expect("put");

@ -10,7 +10,7 @@
use std::sync::Arc; use std::sync::Arc;
use stemedb_core::types::Assertion; use stemedb_core::types::Assertion;
use stemedb_storage::{IndexStore, KVStore, VectorIndex, VisualIndex}; use stemedb_storage::{key_codec, IndexStore, KVStore, VectorIndex, VisualIndex};
use tracing::debug; use tracing::debug;
use crate::error::{QueryError, Result}; use crate::error::{QueryError, Result};
@ -25,7 +25,7 @@ impl<S: KVStore + 'static> QueryEngine<S> {
let mut results = Vec::with_capacity(hash_list.len()); let mut results = Vec::with_capacity(hash_list.len());
for hash in hash_list { for hash in hash_list {
let assertion_key = format!("H:{}", hex::encode(hash)).into_bytes(); let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
if let Some(data) = self.store.get(&assertion_key).await? { if let Some(data) = self.store.get(&assertion_key).await? {
match self.deserialize_assertion(&data) { match self.deserialize_assertion(&data) {
Ok(assertion) => results.push(assertion), Ok(assertion) => results.push(assertion),
@ -49,7 +49,7 @@ impl<S: KVStore + 'static> QueryEngine<S> {
let mut results = Vec::with_capacity(hash_list.len()); let mut results = Vec::with_capacity(hash_list.len());
for hash in hash_list { for hash in hash_list {
let assertion_key = format!("H:{}", hex::encode(hash)).into_bytes(); let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
if let Some(data) = self.store.get(&assertion_key).await? { if let Some(data) = self.store.get(&assertion_key).await? {
match self.deserialize_assertion(&data) { match self.deserialize_assertion(&data) {
Ok(assertion) => results.push(assertion), Ok(assertion) => results.push(assertion),
@ -63,20 +63,34 @@ impl<S: KVStore + 'static> QueryEngine<S> {
Ok(results) Ok(results)
} }
/// Fetch all assertions (full scan of H: prefix). /// Fetch all assertions by scanning the subjects discovery index.
/// ///
/// This is O(n) and should be avoided for large databases. /// This scans `\x00SUBJECTS:` to discover all known subjects, then fetches
/// all assertions per subject. This is O(n) and should be avoided for large databases.
/// Use subject/predicate indexes when possible. /// Use subject/predicate indexes when possible.
pub(super) async fn fetch_all_assertions(&self) -> Result<Vec<Assertion>> { pub(super) async fn fetch_all_assertions(&self) -> Result<Vec<Assertion>> {
let entries = self.store.scan_prefix(b"H:").await?; // Discover all subjects via the subjects index
let subject_entries = self.store.scan_prefix(&key_codec::subjects_scan_prefix()).await?;
let mut assertions = Vec::with_capacity(entries.len()); let mut assertions = Vec::new();
for (_key, data) in entries { for (key, _) in subject_entries {
// Extract subject from key: \x00SUBJECTS:{subject}
let subject = match key_codec::extract_subject_from_subjects_key(&key) {
Some(s) => s,
None => continue,
};
// Fetch all assertions for this subject via the subject index
let hash_list = self.index_store.get_by_subject(&subject).await?;
for hash in hash_list {
let assertion_key = key_codec::assertion_key(&subject, &hex::encode(hash));
if let Some(data) = self.store.get(&assertion_key).await? {
match self.deserialize_assertion(&data) { match self.deserialize_assertion(&data) {
Ok(assertion) => assertions.push(assertion), Ok(assertion) => assertions.push(assertion),
Err(e) => { Err(e) => {
debug!("Skipping malformed assertion: {:?}", e); debug!("Skipping malformed assertion: {:?}", e);
// Skip malformed entries rather than failing the whole query }
}
} }
} }
} }
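Review note: the new `fetch_all_assertions` first discovers subjects by scanning the `\x00SUBJECTS:` prefix and only then fetches assertions per subject. The following is a minimal sketch of that discovery step, assuming a sorted in-memory map in place of the real store. `subjects_key`, `extract_subject`, and `discover_subjects` are hypothetical stand-ins for illustration, not the actual `key_codec` API; only the prefix itself is taken from the doc comment above.

```rust
use std::collections::BTreeMap;

// Prefix taken from the doc comment in the diff; everything else is assumed.
const SUBJECTS_PREFIX: &[u8] = b"\x00SUBJECTS:";

/// Hypothetical helper: build the discovery key for one subject.
fn subjects_key(subject: &str) -> Vec<u8> {
    let mut key = SUBJECTS_PREFIX.to_vec();
    key.extend_from_slice(subject.as_bytes());
    key
}

/// Hypothetical helper: recover the subject from a discovery key.
/// Returns None if the prefix is missing or the rest is not valid UTF-8.
fn extract_subject(key: &[u8]) -> Option<String> {
    let rest = key.strip_prefix(SUBJECTS_PREFIX)?;
    String::from_utf8(rest.to_vec()).ok()
}

/// Prefix scan over a sorted map: start at the prefix and stop at the
/// first key that no longer carries it. Malformed keys are skipped.
fn discover_subjects(store: &BTreeMap<Vec<u8>, Vec<u8>>) -> Vec<String> {
    store
        .range(SUBJECTS_PREFIX.to_vec()..)
        .take_while(|(key, _)| key.starts_with(SUBJECTS_PREFIX))
        .filter_map(|(key, _)| extract_subject(key))
        .collect()
}
```

Because the scan touches one key per subject before any assertion is read, the total cost still grows with the number of stored assertions, which matches the O(n) warning in the doc comment.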
@ -99,22 +113,39 @@ impl<S: KVStore + 'static> QueryEngine<S> {
debug!(candidates_count = neighbors.len(), "Vector index returned candidates"); debug!(candidates_count = neighbors.len(), "Vector index returned candidates");
// Fetch assertions by their hashes // Fetch assertions by their hashes using reverse index for subject lookup
let mut results = Vec::with_capacity(neighbors.len()); let mut results = Vec::with_capacity(neighbors.len());
for (hash, distance) in neighbors { for (hash, distance) in neighbors {
let assertion_key = format!("H:{}", hex::encode(hash)).into_bytes(); let hash_hex = hex::encode(hash);
// Look up subject from reverse index
let reverse_key = key_codec::hash_subject_key(&hash_hex);
let subject = match self.store.get(&reverse_key).await? {
Some(bytes) => match String::from_utf8(bytes) {
Ok(s) => s,
Err(_) => {
debug!(hash = %hash_hex, "Invalid UTF-8 in reverse index, skipping");
continue;
}
},
None => {
debug!(hash = %hash_hex, "No reverse index entry, skipping");
continue;
}
};
let assertion_key = key_codec::assertion_key(&subject, &hash_hex);
if let Some(data) = self.store.get(&assertion_key).await? { if let Some(data) = self.store.get(&assertion_key).await? {
match self.deserialize_assertion(&data) { match self.deserialize_assertion(&data) {
Ok(assertion) => { Ok(assertion) => {
debug!( debug!(
hash = %hex::encode(hash), hash = %hash_hex,
distance, distance,
"Found assertion via vector index" "Found assertion via vector index"
); );
results.push(assertion); results.push(assertion);
} }
Err(e) => { Err(e) => {
debug!(hash = %hex::encode(hash), "Skipping malformed assertion: {:?}", e); debug!(hash = %hash_hex, "Skipping malformed assertion: {:?}", e);
} }
} }
} }
@ -147,22 +178,39 @@ impl<S: KVStore + 'static> QueryEngine<S> {
debug!(candidates_count = matches.len(), threshold, "Visual index returned candidates"); debug!(candidates_count = matches.len(), threshold, "Visual index returned candidates");
// Fetch assertions by their hashes // Fetch assertions by their hashes using reverse index for subject lookup
let mut results = Vec::with_capacity(matches.len()); let mut results = Vec::with_capacity(matches.len());
for (hash, distance) in matches { for (hash, distance) in matches {
let assertion_key = format!("H:{}", hex::encode(hash)).into_bytes(); let hash_hex = hex::encode(hash);
// Look up subject from reverse index
let reverse_key = key_codec::hash_subject_key(&hash_hex);
let subject = match self.store.get(&reverse_key).await? {
Some(bytes) => match String::from_utf8(bytes) {
Ok(s) => s,
Err(_) => {
debug!(hash = %hash_hex, "Invalid UTF-8 in reverse index, skipping");
continue;
}
},
None => {
debug!(hash = %hash_hex, "No reverse index entry, skipping");
continue;
}
};
let assertion_key = key_codec::assertion_key(&subject, &hash_hex);
if let Some(data) = self.store.get(&assertion_key).await? { if let Some(data) = self.store.get(&assertion_key).await? {
match self.deserialize_assertion(&data) { match self.deserialize_assertion(&data) {
Ok(assertion) => { Ok(assertion) => {
debug!( debug!(
hash = %hex::encode(hash), hash = %hash_hex,
distance, distance,
"Found assertion via visual index" "Found assertion via visual index"
); );
results.push(assertion); results.push(assertion);
} }
Err(e) => { Err(e) => {
debug!(hash = %hex::encode(hash), "Skipping malformed assertion: {:?}", e); debug!(hash = %hash_hex, "Skipping malformed assertion: {:?}", e);
} }
} }
} }
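Review note: the vector and visual paths above repeat the same two-step lookup (hash to subject through the reverse index, then subject plus hash to the assertion). A minimal sketch of that lookup as one shared helper follows, assuming a plain `HashMap` in place of the real store. The key layouts in `hash_subject_key` and `assertion_key` are invented for illustration and are not the actual `key_codec` encodings.

```rust
use std::collections::HashMap;

type Store = HashMap<Vec<u8>, Vec<u8>>;

/// Hypothetical reverse-index key layout (assumption, not the real codec).
fn hash_subject_key(hash_hex: &str) -> Vec<u8> {
    format!("HS:{}", hash_hex).into_bytes()
}

/// Hypothetical subject-scoped assertion key layout (assumption).
fn assertion_key(subject: &str, hash_hex: &str) -> Vec<u8> {
    format!("{}:H:{}", subject, hash_hex).into_bytes()
}

/// Step 1: resolve the subject for a hash via the reverse index.
/// Returns None when the entry is missing or is not valid UTF-8,
/// mirroring the two `continue` branches in the diff.
fn subject_for_hash(store: &Store, hash_hex: &str) -> Option<String> {
    let bytes = store.get(&hash_subject_key(hash_hex))?;
    String::from_utf8(bytes.clone()).ok()
}

/// Step 2: fetch the raw assertion bytes using the resolved subject.
fn fetch_by_hash<'a>(store: &'a Store, hash_hex: &str) -> Option<&'a Vec<u8>> {
    let subject = subject_for_hash(store, hash_hex)?;
    let key = assertion_key(&subject, hash_hex);
    store.get(&key)
}
```

With a helper of this shape, each search path would reduce to one call per candidate hash, and the skip-on-miss behaviour would live in a single place.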

@ -9,7 +9,7 @@
use std::time::{SystemTime, UNIX_EPOCH}; use std::time::{SystemTime, UNIX_EPOCH};
use stemedb_core::types::{Assertion, MaterializedView}; use stemedb_core::types::{Assertion, MaterializedView};
use stemedb_storage::KVStore; use stemedb_storage::{key_codec, KVStore};
use tracing::debug; use tracing::debug;
use crate::decay::{apply_decay, apply_source_class_decay}; use crate::decay::{apply_decay, apply_source_class_decay};
@ -35,7 +35,7 @@ impl<S: KVStore + 'static> QueryEngine<S> {
predicate: &str, predicate: &str,
query: &Query, query: &Query,
) -> Result<Option<QueryResult>> { ) -> Result<Option<QueryResult>> {
let mv_key = format!("MV:{}:{}", subject, predicate).into_bytes(); let mv_key = key_codec::mv_key(subject, predicate);
let data = match self.store.get(&mv_key).await? { let data = match self.store.get(&mv_key).await? {
Some(data) => data, Some(data) => data,

@ -2,14 +2,14 @@
use std::sync::Arc; use std::sync::Arc;
use stemedb_core::types::LifecycleStage; use stemedb_core::types::LifecycleStage;
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
use super::{create_test_assertion, store_assertion, QueryEngine}; use super::{create_test_assertion, store_assertion, QueryEngine};
use crate::query::Query; use crate::query::Query;
#[tokio::test] #[tokio::test]
async fn test_query_empty_store() { async fn test_query_empty_store() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let engine = QueryEngine::new(Arc::new(store)); let engine = QueryEngine::new(Arc::new(store));
let query = Query::builder().subject("Tesla").build(); let query = Query::builder().subject("Tesla").build();
@ -21,7 +21,7 @@ async fn test_query_empty_store() {
#[tokio::test] #[tokio::test]
async fn test_query_by_subject() { async fn test_query_by_subject() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let tesla = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let tesla = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
let apple = create_test_assertion("Apple", "revenue", LifecycleStage::Approved); let apple = create_test_assertion("Apple", "revenue", LifecycleStage::Approved);
@ -40,7 +40,7 @@ async fn test_query_by_subject() {
#[tokio::test] #[tokio::test]
async fn test_query_by_lifecycle() { async fn test_query_by_lifecycle() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let approved = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let approved = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
let proposed = create_test_assertion("Tesla", "profit", LifecycleStage::Proposed); let proposed = create_test_assertion("Tesla", "profit", LifecycleStage::Proposed);
@ -60,7 +60,7 @@ async fn test_query_by_lifecycle() {
#[tokio::test] #[tokio::test]
async fn test_query_with_limit() { async fn test_query_with_limit() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store multiple assertions // Store multiple assertions
for i in 0..5 { for i in 0..5 {
@ -81,7 +81,7 @@ async fn test_query_with_limit() {
#[tokio::test] #[tokio::test]
async fn test_query_all_filters() { async fn test_query_all_filters() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let target = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let target = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
let wrong_subject = create_test_assertion("Apple", "revenue", LifecycleStage::Approved); let wrong_subject = create_test_assertion("Apple", "revenue", LifecycleStage::Approved);

@ -2,14 +2,14 @@
use std::sync::Arc; use std::sync::Arc;
use stemedb_core::types::{LifecycleStage, MaterializedView}; use stemedb_core::types::{LifecycleStage, MaterializedView};
use stemedb_storage::{KVStore, SledStore}; use stemedb_storage::{key_codec, HybridStore, KVStore};
use super::{create_test_assertion, store_assertion, QueryEngine}; use super::{create_test_assertion, store_assertion, QueryEngine};
use crate::query::Query; use crate::query::Query;
/// Helper to store a materialized view with a custom conflict score. /// Helper to store a materialized view with a custom conflict score.
async fn store_mv_with_conflict( async fn store_mv_with_conflict(
store: &SledStore, store: &Arc<HybridStore>,
subject: &str, subject: &str,
predicate: &str, predicate: &str,
conflict_score: f32, conflict_score: f32,
@ -26,14 +26,14 @@ async fn store_mv_with_conflict(
conflict_score, conflict_score,
}; };
let key = format!("MV:{}:{}", subject, predicate).into_bytes(); let key = key_codec::mv_key(subject, predicate);
let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV"); let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV");
store.put(&key, &bytes).await.expect("put MV"); store.put(&key, &bytes).await.expect("put MV");
} }
#[tokio::test] #[tokio::test]
async fn test_min_conflict_score_returns_empty_when_below_threshold() { async fn test_min_conflict_score_returns_empty_when_below_threshold() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MV with low conflict (agreement) // Store MV with low conflict (agreement)
store_mv_with_conflict(&store, "Aspirin", "cardiovascular_benefit", 0.15).await; store_mv_with_conflict(&store, "Aspirin", "cardiovascular_benefit", 0.15).await;
@ -55,7 +55,7 @@ async fn test_min_conflict_score_returns_empty_when_below_threshold() {
#[tokio::test] #[tokio::test]
async fn test_min_conflict_score_returns_result_when_above_threshold() { async fn test_min_conflict_score_returns_result_when_above_threshold() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MV with high conflict (disagreement) // Store MV with high conflict (disagreement)
store_mv_with_conflict(&store, "Semaglutide", "muscle_effect", 0.85).await; store_mv_with_conflict(&store, "Semaglutide", "muscle_effect", 0.85).await;
@ -77,7 +77,7 @@ async fn test_min_conflict_score_returns_result_when_above_threshold() {
#[tokio::test] #[tokio::test]
async fn test_min_conflict_score_edge_case_exact_match() { async fn test_min_conflict_score_edge_case_exact_match() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MV with conflict score exactly at threshold // Store MV with conflict score exactly at threshold
store_mv_with_conflict(&store, "Drug", "effect", 0.5).await; store_mv_with_conflict(&store, "Drug", "effect", 0.5).await;
@ -98,7 +98,7 @@ async fn test_min_conflict_score_edge_case_exact_match() {
#[tokio::test] #[tokio::test]
async fn test_max_conflict_score_returns_result_when_below_threshold() { async fn test_max_conflict_score_returns_result_when_below_threshold() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MV with low conflict (agreement) // Store MV with low conflict (agreement)
store_mv_with_conflict(&store, "Aspirin", "cardiovascular_benefit", 0.15).await; store_mv_with_conflict(&store, "Aspirin", "cardiovascular_benefit", 0.15).await;
@ -120,7 +120,7 @@ async fn test_max_conflict_score_returns_result_when_below_threshold() {
#[tokio::test] #[tokio::test]
async fn test_max_conflict_score_returns_empty_when_above_threshold() { async fn test_max_conflict_score_returns_empty_when_above_threshold() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MV with high conflict (disagreement) // Store MV with high conflict (disagreement)
store_mv_with_conflict(&store, "Semaglutide", "muscle_effect", 0.85).await; store_mv_with_conflict(&store, "Semaglutide", "muscle_effect", 0.85).await;
@ -142,7 +142,7 @@ async fn test_max_conflict_score_returns_empty_when_above_threshold() {
#[tokio::test] #[tokio::test]
async fn test_max_conflict_score_edge_case_exact_match() { async fn test_max_conflict_score_edge_case_exact_match() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MV with conflict score exactly at threshold // Store MV with conflict score exactly at threshold
store_mv_with_conflict(&store, "Drug", "effect", 0.5).await; store_mv_with_conflict(&store, "Drug", "effect", 0.5).await;
@ -163,7 +163,7 @@ async fn test_max_conflict_score_edge_case_exact_match() {
#[tokio::test] #[tokio::test]
async fn test_both_conflict_scores_filters_range() { async fn test_both_conflict_scores_filters_range() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MVs with different conflict scores // Store MVs with different conflict scores
store_mv_with_conflict(&store, "Drug_A", "effect", 0.1).await; // Too low store_mv_with_conflict(&store, "Drug_A", "effect", 0.1).await; // Too low
@ -213,7 +213,7 @@ async fn test_both_conflict_scores_filters_range() {
#[tokio::test] #[tokio::test]
async fn test_no_conflict_filters_returns_all() { async fn test_no_conflict_filters_returns_all() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MVs with different conflict scores // Store MVs with different conflict scores
store_mv_with_conflict(&store, "Drug_A", "effect", 0.1).await; store_mv_with_conflict(&store, "Drug_A", "effect", 0.1).await;
@ -233,7 +233,7 @@ async fn test_no_conflict_filters_returns_all() {
#[tokio::test] #[tokio::test]
async fn test_conflict_filters_combine_with_lifecycle() { async fn test_conflict_filters_combine_with_lifecycle() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MV with high conflict and Approved lifecycle // Store MV with high conflict and Approved lifecycle
let approved = create_test_assertion("Drug", "effect", LifecycleStage::Approved); let approved = create_test_assertion("Drug", "effect", LifecycleStage::Approved);
@ -248,7 +248,7 @@ async fn test_conflict_filters_combine_with_lifecycle() {
conflict_score: 0.8, conflict_score: 0.8,
}; };
let key = b"MV:Drug:effect".to_vec(); let key = key_codec::mv_key("Drug", "effect");
let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV"); let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV");
store.put(&key, &bytes).await.expect("put MV"); store.put(&key, &bytes).await.expect("put MV");
@ -271,7 +271,7 @@ async fn test_conflict_filters_combine_with_lifecycle() {
#[tokio::test] #[tokio::test]
async fn test_conflict_filters_with_wrong_lifecycle_returns_empty() { async fn test_conflict_filters_with_wrong_lifecycle_returns_empty() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store MV with high conflict but Approved lifecycle // Store MV with high conflict but Approved lifecycle
let approved = create_test_assertion("Drug", "effect", LifecycleStage::Approved); let approved = create_test_assertion("Drug", "effect", LifecycleStage::Approved);
@ -286,7 +286,7 @@ async fn test_conflict_filters_with_wrong_lifecycle_returns_empty() {
conflict_score: 0.8, conflict_score: 0.8,
}; };
let key = b"MV:Drug:effect".to_vec(); let key = key_codec::mv_key("Drug", "effect");
let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV"); let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV");
store.put(&key, &bytes).await.expect("put MV"); store.put(&key, &bytes).await.expect("put MV");

@ -2,14 +2,14 @@
use std::sync::Arc; use std::sync::Arc;
use stemedb_core::types::{LifecycleStage, ObjectValue}; use stemedb_core::types::{LifecycleStage, ObjectValue};
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
use super::{create_test_assertion, store_assertion, QueryEngine}; use super::{create_test_assertion, store_assertion, QueryEngine};
use crate::query::Query; use crate::query::Query;
#[tokio::test] #[tokio::test]
async fn test_compound_index_lookup() { async fn test_compound_index_lookup() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Create multiple assertions with different subject/predicate combinations // Create multiple assertions with different subject/predicate combinations
let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
@ -34,7 +34,7 @@ async fn test_compound_index_lookup() {
#[tokio::test] #[tokio::test]
async fn test_compound_index_multiple_assertions() { async fn test_compound_index_multiple_assertions() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store multiple assertions with same subject+predicate but different values/timestamps // Store multiple assertions with same subject+predicate but different values/timestamps
let mut assertion1 = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed); let mut assertion1 = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
@ -66,7 +66,7 @@ async fn test_compound_index_multiple_assertions() {
#[tokio::test] #[tokio::test]
async fn test_subject_only_index_still_works() { async fn test_subject_only_index_still_works() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
let tesla_profit = create_test_assertion("Tesla", "profit", LifecycleStage::Approved); let tesla_profit = create_test_assertion("Tesla", "profit", LifecycleStage::Approved);
@ -89,7 +89,7 @@ async fn test_subject_only_index_still_works() {
#[tokio::test] #[tokio::test]
async fn test_compound_index_empty_result() { async fn test_compound_index_empty_result() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);

@ -3,7 +3,7 @@
use std::sync::Arc; use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH}; use std::time::{SystemTime, UNIX_EPOCH};
use stemedb_core::types::LifecycleStage; use stemedb_core::types::LifecycleStage;
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
use super::{ use super::{
create_test_assertion, store_assertion, store_materialized_view, create_test_assertion, store_assertion, store_materialized_view,
@ -13,7 +13,7 @@ use crate::query::Query;
#[tokio::test] #[tokio::test]
async fn test_fast_path_returns_materialized_view() { async fn test_fast_path_returns_materialized_view() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
store_assertion(&store, &assertion).await; store_assertion(&store, &assertion).await;
@ -32,7 +32,7 @@ async fn test_fast_path_returns_materialized_view() {
#[tokio::test] #[tokio::test]
async fn test_fast_path_falls_back_when_no_mv() { async fn test_fast_path_falls_back_when_no_mv() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store assertion but NO materialized view // Store assertion but NO materialized view
let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
@ -50,7 +50,7 @@ async fn test_fast_path_falls_back_when_no_mv() {
#[tokio::test] #[tokio::test]
async fn test_fast_path_respects_lifecycle_filter() { async fn test_fast_path_respects_lifecycle_filter() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// MV winner is Approved // MV winner is Approved
let approved = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let approved = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
@ -78,7 +78,7 @@ async fn test_fast_path_respects_lifecycle_filter() {
#[tokio::test] #[tokio::test]
async fn test_fast_path_not_used_for_subject_only() { async fn test_fast_path_not_used_for_subject_only() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
store_assertion(&store, &assertion).await; store_assertion(&store, &assertion).await;
@ -95,7 +95,7 @@ async fn test_fast_path_not_used_for_subject_only() {
#[tokio::test] #[tokio::test]
async fn test_query_strategy_selection() { async fn test_query_strategy_selection() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
let apple_profit = create_test_assertion("Apple", "profit", LifecycleStage::Approved); let apple_profit = create_test_assertion("Apple", "profit", LifecycleStage::Approved);
@ -127,7 +127,7 @@ async fn test_query_strategy_selection() {
#[tokio::test] #[tokio::test]
async fn test_fast_path_stale_view_falls_back() { async fn test_fast_path_stale_view_falls_back() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
// Store an assertion and multiple MVs with different timestamps // Store an assertion and multiple MVs with different timestamps
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
@ -159,7 +159,7 @@ async fn test_fast_path_stale_view_falls_back() {
#[tokio::test] #[tokio::test]
async fn test_fast_path_fresh_view_used() { async fn test_fast_path_fresh_view_used() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
store_assertion(&store, &assertion).await; store_assertion(&store, &assertion).await;
@ -186,7 +186,7 @@ async fn test_fast_path_fresh_view_used() {
#[tokio::test] #[tokio::test]
async fn test_fast_path_no_max_stale_always_uses_mv() { async fn test_fast_path_no_max_stale_always_uses_mv() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed); let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
@ -212,7 +212,7 @@ async fn test_fast_path_no_max_stale_always_uses_mv() {
#[tokio::test] #[tokio::test]
async fn test_fast_path_max_stale_zero_rejects_old_mv() { async fn test_fast_path_max_stale_zero_rejects_old_mv() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed); let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
@ -239,7 +239,7 @@ async fn test_fast_path_max_stale_zero_rejects_old_mv() {
#[tokio::test] #[tokio::test]
async fn test_fast_path_max_stale_zero_accepts_brand_new_mv() { async fn test_fast_path_max_stale_zero_accepts_brand_new_mv() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved); let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed); let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);

@ -1,11 +1,10 @@
//! Test suite for QueryEngine. //! Test suite for QueryEngine.
use rkyv::ser::serializers::AllocSerializer; use std::sync::Arc;
use rkyv::ser::Serializer;
use stemedb_core::testing::AssertionBuilder; use stemedb_core::testing::AssertionBuilder;
use stemedb_core::types::{Assertion, LifecycleStage, MaterializedView}; use stemedb_core::types::{Assertion, LifecycleStage, MaterializedView};
use stemedb_storage::{GenericIndexStore, IndexStore, KVStore, SledStore}; use stemedb_storage::{key_codec, GenericIndexStore, HybridStore, IndexStore, KVStore};
use super::QueryEngine; use super::QueryEngine;
@ -32,13 +31,11 @@ pub(super) fn create_test_assertion(
} }
/// Helper to store an assertion in the KV store and update indexes. /// Helper to store an assertion in the KV store and update indexes.
pub(super) async fn store_assertion(store: &SledStore, assertion: &Assertion) { pub(super) async fn store_assertion(store: &Arc<HybridStore>, assertion: &Assertion) {
let mut serializer = AllocSerializer::<4096>::default(); let bytes = stemedb_core::serde::serialize(assertion).expect("serialize");
serializer.serialize_value(assertion).expect("serialize");
let bytes = serializer.into_serializer().into_inner();
let hash = blake3::hash(&bytes); let hash = blake3::hash(&bytes);
let key = format!("H:{}", hash.to_hex()).into_bytes(); let key = key_codec::assertion_key(&assertion.subject, &hash.to_hex());
store.put(&key, &bytes).await.expect("put"); store.put(&key, &bytes).await.expect("put");
// Update indexes using IndexStore // Update indexes using IndexStore
@ -52,7 +49,7 @@ pub(super) async fn store_assertion(store: &SledStore, assertion: &Assertion) {
/// Helper to store a materialized view directly in the KV store. /// Helper to store a materialized view directly in the KV store.
pub(super) async fn store_materialized_view( pub(super) async fn store_materialized_view(
store: &SledStore, store: &Arc<HybridStore>,
subject: &str, subject: &str,
predicate: &str, predicate: &str,
winner: &Assertion, winner: &Assertion,
@ -62,7 +59,7 @@ pub(super) async fn store_materialized_view(
/// Helper to store a materialized view with a custom materialized_at timestamp. /// Helper to store a materialized view with a custom materialized_at timestamp.
pub(super) async fn store_materialized_view_with_time( pub(super) async fn store_materialized_view_with_time(
store: &SledStore, store: &Arc<HybridStore>,
subject: &str, subject: &str,
predicate: &str, predicate: &str,
winner: &Assertion, winner: &Assertion,
@ -77,7 +74,7 @@ pub(super) async fn store_materialized_view_with_time(
conflict_score: 0.1, conflict_score: 0.1,
}; };
let key = format!("MV:{}:{}", subject, predicate).into_bytes(); let key = key_codec::mv_key(subject, predicate);
let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV"); let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV");
store.put(&key, &bytes).await.expect("put MV"); store.put(&key, &bytes).await.expect("put MV");
} }

@ -29,16 +29,10 @@ use crate::error::{QueryError, Result};
use std::sync::Arc; use std::sync::Arc;
use stemedb_core::types::{Assertion, EscalationEvent, EscalationPolicy, MaterializedView}; use stemedb_core::types::{Assertion, EscalationEvent, EscalationPolicy, MaterializedView};
use stemedb_lens::AsyncLens; use stemedb_lens::AsyncLens;
use stemedb_storage::{EscalationStore, GenericIndexStore, KVStore}; use stemedb_storage::{key_codec, EscalationStore, GenericIndexStore, KVStore};
use tokio::sync::Notify; use tokio::sync::Notify;
use tracing::{debug, error, info, instrument, warn}; use tracing::{debug, error, info, instrument, warn};
/// Key prefix for materialized views.
const MV_PREFIX: &str = "MV:";
/// Key prefix for compound indexes (used to discover subject+predicate pairs).
const SP_PREFIX: &[u8] = b"SP:";
/// Report from a single materialization pass. /// Report from a single materialization pass.
#[derive(Debug, Default)] #[derive(Debug, Default)]
pub struct MaterializeReport { pub struct MaterializeReport {
@ -64,11 +58,11 @@ pub struct MaterializeReport {
/// ///
/// ```ignore /// ```ignore
/// use stemedb_query::Materializer; /// use stemedb_query::Materializer;
/// use stemedb_storage::{SledStore, GenericVoteStore}; /// use stemedb_storage::{HybridStore, GenericVoteStore};
/// use stemedb_lens::VoteAwareConsensusLens; /// use stemedb_lens::VoteAwareConsensusLens;
/// use std::sync::Arc; /// use std::sync::Arc;
/// ///
/// let store = Arc::new(SledStore::open("./data")?); /// let store = Arc::new(HybridStore::open("./data")?);
/// let vote_store = Arc::new(GenericVoteStore::new(store.clone())); /// let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
/// let lens = VoteAwareConsensusLens::new(vote_store); /// let lens = VoteAwareConsensusLens::new(vote_store);
/// ///
@ -138,24 +132,32 @@ impl<S: KVStore + 'static> Materializer<S> {
pub async fn step(&self) -> Result<MaterializeReport> { pub async fn step(&self) -> Result<MaterializeReport> {
let mut report = MaterializeReport::default(); let mut report = MaterializeReport::default();
// Discover all subject+predicate pairs from SP: index // Discover all subject+predicate pairs from subject-prefixed SP: keys
let sp_entries = self.store.scan_prefix(SP_PREFIX).await?; // We scan all subjects first, then fetch their SP: keys
let subject_entries = self.store.scan_prefix(&key_codec::subjects_scan_prefix()).await?;
let mut sp_pairs: Vec<(String, String)> = Vec::new();
for (key, _value) in &sp_entries { for (key, _) in &subject_entries {
report.pairs_scanned += 1; let subject = match key_codec::extract_subject_from_subjects_key(key) {
Some(s) => s,
// Parse the SP:{subject}:{predicate} key None => continue,
let (subject, predicate) = match Self::parse_sp_key(key) {
Some(pair) => pair,
None => {
warn!(key = %String::from_utf8_lossy(key), "Skipping malformed SP: key");
report.errors += 1;
continue;
}
}; };
// Scan this subject's SP: keys
let sp_prefix = key_codec::subject_predicate_scan_prefix(&subject);
let sp_entries = self.store.scan_prefix(&sp_prefix).await?;
for (sp_key, _) in sp_entries {
if let Some((s, p)) = key_codec::extract_sp_key(&sp_key) {
sp_pairs.push((s, p));
}
}
}
for (subject, predicate) in &sp_pairs {
report.pairs_scanned += 1;
// Materialize this subject+predicate pair // Materialize this subject+predicate pair
match self.materialize_pair(&subject, &predicate).await { match self.materialize_pair(subject, predicate).await {
Ok(Some(view)) => { Ok(Some(view)) => {
report.views_updated += 1; report.views_updated += 1;
// Check escalation policies // Check escalation policies
@ -244,8 +246,8 @@ impl<S: KVStore + 'static> Materializer<S> {
materialized_at: now, materialized_at: now,
}; };
// Write to MV:{subject}:{predicate} // Write to {subject}\x00MV:{predicate}
let mv_key = Self::mv_key(subject, predicate); let mv_key = key_codec::mv_key(subject, predicate);
let serialized = stemedb_core::serde::serialize(&view) let serialized = stemedb_core::serde::serialize(&view)
.map_err(|e| QueryError::Deserialization(e.to_string()))?; .map_err(|e| QueryError::Deserialization(e.to_string()))?;
self.store.put(&mv_key, &serialized).await?; self.store.put(&mv_key, &serialized).await?;
@ -271,7 +273,7 @@ impl<S: KVStore + 'static> Materializer<S> {
subject: &str, subject: &str,
predicate: &str, predicate: &str,
) -> Result<Option<MaterializedView>> { ) -> Result<Option<MaterializedView>> {
let key = Self::mv_key(subject, predicate); let key = key_codec::mv_key(subject, predicate);
match self.store.get(&key).await? { match self.store.get(&key).await? {
Some(data) => { Some(data) => {
let view: MaterializedView = stemedb_core::serde::deserialize(&data) let view: MaterializedView = stemedb_core::serde::deserialize(&data)
@ -358,7 +360,7 @@ impl<S: KVStore + 'static> Materializer<S> {
let mut candidates = Vec::with_capacity(hash_list.len()); let mut candidates = Vec::with_capacity(hash_list.len());
for hash in hash_list { for hash in hash_list {
let key = format!("H:{}", hex::encode(hash)).into_bytes(); let key = key_codec::assertion_key(subject, &hex::encode(hash));
if let Some(data) = self.store.get(&key).await? { if let Some(data) = self.store.get(&key).await? {
match stemedb_core::serde::deserialize::<Assertion>(&data) { match stemedb_core::serde::deserialize::<Assertion>(&data) {
Ok(assertion) => candidates.push(assertion), Ok(assertion) => candidates.push(assertion),
@ -376,34 +378,6 @@ impl<S: KVStore + 'static> Materializer<S> {
Ok(candidates) Ok(candidates)
} }
/// Parse a `SP:{subject}:{predicate}` key into its components.
///
/// Uses `rfind(':')` to split on the **last** colon, because ConceptPath
/// subjects contain `://` (e.g., `code://rust/citadeldb/auth/jwt`).
/// Predicates never contain `://`, so the last colon is always the separator.
///
/// Returns `None` if the key is malformed.
fn parse_sp_key(key: &[u8]) -> Option<(String, String)> {
let key_str = std::str::from_utf8(key).ok()?;
let without_prefix = key_str.strip_prefix("SP:")?;
// Split on the LAST colon — subjects may contain colons (e.g., scheme://)
let colon_pos = without_prefix.rfind(':')?;
if colon_pos == 0 || colon_pos == without_prefix.len() - 1 {
return None;
}
let subject = &without_prefix[..colon_pos];
let predicate = &without_prefix[colon_pos + 1..];
Some((subject.to_string(), predicate.to_string()))
}
/// Construct the MV key for a subject+predicate pair.
fn mv_key(subject: &str, predicate: &str) -> Vec<u8> {
format!("{}{}:{}", MV_PREFIX, subject, predicate).into_bytes()
}
/// Check if any escalation policies should trigger for this materialized view. /// Check if any escalation policies should trigger for this materialized view.
/// ///
/// If a policy triggers, write an escalation event to storage. /// If a policy triggers, write an escalation event to storage.
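
The hunks above replace the colon-delimited `SP:{subject}:{predicate}` and `MV:{subject}:{predicate}` keys with subject-prefixed keys of the form `{subject}\x00SP:{predicate}` and `{subject}\x00MV:{predicate}`. The following is a minimal sketch of that layout, not the actual `key_codec` implementation: the `sketch_*` names, the `SEP` constant, and the handling of empty components are assumptions made for illustration. It also assumes subjects never contain a NUL byte.

```rust
/// Assumed separator between the subject and the typed suffix (hypothetical name).
const SEP: u8 = 0x00;

/// Build `{subject}\x00{tag}{rest}` as raw bytes.
fn tagged_key(subject: &str, tag: &str, rest: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(subject.len() + 1 + tag.len() + rest.len());
    key.extend_from_slice(subject.as_bytes());
    key.push(SEP);
    key.extend_from_slice(tag.as_bytes());
    key.extend_from_slice(rest.as_bytes());
    key
}

/// Sketch of a materialized-view key: `{subject}\x00MV:{predicate}`.
fn sketch_mv_key(subject: &str, predicate: &str) -> Vec<u8> {
    tagged_key(subject, "MV:", predicate)
}

/// Sketch of a subject+predicate index key: `{subject}\x00SP:{predicate}`.
fn sketch_sp_key(subject: &str, predicate: &str) -> Vec<u8> {
    tagged_key(subject, "SP:", predicate)
}

/// Split an SP key back into (subject, predicate).
///
/// Splitting on the first NUL byte means colons inside the subject
/// (e.g. `code://...`) no longer matter, so no `rfind(':')` is needed.
/// Rejecting empty components is this sketch's choice, not a documented
/// property of the real codec.
fn sketch_extract_sp(key: &[u8]) -> Option<(String, String)> {
    let sep = key.iter().position(|&b| b == SEP)?;
    let subject = std::str::from_utf8(&key[..sep]).ok()?;
    let suffix = std::str::from_utf8(&key[sep + 1..]).ok()?;
    let predicate = suffix.strip_prefix("SP:")?;
    if subject.is_empty() || predicate.is_empty() {
        return None;
    }
    Some((subject.to_string(), predicate.to_string()))
}
```

Because the subject is the leading component, every key belonging to one subject shares a common byte prefix. That fits the way `step()` first enumerates subjects and then prefix-scans each subject's `SP:` keys.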

@ -3,7 +3,7 @@ use stemedb_core::testing::{self, AssertionBuilder};
use stemedb_core::types::{EscalationLevel, EscalationPolicy, ObjectValue, Vote}; use stemedb_core::types::{EscalationLevel, EscalationPolicy, ObjectValue, Vote};
use stemedb_lens::VoteAwareConsensusLens; use stemedb_lens::VoteAwareConsensusLens;
use stemedb_storage::{ use stemedb_storage::{
EscalationStore, GenericEscalationStore, GenericVoteStore, SledStore, VoteStore, key_codec, EscalationStore, GenericEscalationStore, GenericVoteStore, HybridStore, VoteStore,
}; };
use tokio::sync::Notify; use tokio::sync::Notify;
@ -16,17 +16,17 @@ fn create_assertion(subject: &str, predicate: &str, value: f64, timestamp: u64)
.build() .build()
} }
/// Store an assertion at H:{hash} and update indexes. /// Store an assertion at {subject}\x00H:{hash} and update indexes.
async fn store_assertion(store: &Arc<SledStore>, assertion: &Assertion) -> [u8; 32] { async fn store_assertion(store: &Arc<HybridStore>, assertion: &Assertion) -> [u8; 32] {
use stemedb_storage::IndexStore; use stemedb_storage::IndexStore;
let bytes = stemedb_core::serde::serialize(assertion).expect("serialize"); let bytes = stemedb_core::serde::serialize(assertion).expect("serialize");
let hash = blake3::hash(&bytes); let hash = blake3::hash(&bytes);
let key = format!("H:{}", hash.to_hex()).into_bytes(); let assertion_hash: [u8; 32] = *hash.as_bytes();
let key = key_codec::assertion_key(&assertion.subject, &hash.to_hex());
store.put(&key, &bytes).await.expect("put"); store.put(&key, &bytes).await.expect("put");
let index_store = GenericIndexStore::new(store.clone()); let index_store = GenericIndexStore::new(store.clone());
let assertion_hash: [u8; 32] = *hash.as_bytes();
index_store index_store
.add_to_indexes(&assertion.subject, &assertion.predicate, &assertion_hash) .add_to_indexes(&assertion.subject, &assertion.predicate, &assertion_hash)
.await .await
@ -41,7 +41,7 @@ fn create_vote(assertion_hash: [u8; 32], agent_id: [u8; 32], weight: f32, timest
#[tokio::test] #[tokio::test]
async fn test_empty_store() { async fn test_empty_store() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(vote_store); let lens = VoteAwareConsensusLens::new(vote_store);
let materializer = Materializer::new(store, Box::new(lens)); let materializer = Materializer::new(store, Box::new(lens));
@ -55,7 +55,7 @@ async fn test_empty_store() {
#[tokio::test] #[tokio::test]
async fn test_single_assertion_materialized() { async fn test_single_assertion_materialized() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(vote_store); let lens = VoteAwareConsensusLens::new(vote_store);
let materializer = Materializer::new(store.clone(), Box::new(lens)); let materializer = Materializer::new(store.clone(), Box::new(lens));
@ -86,7 +86,7 @@ async fn test_single_assertion_materialized() {
#[tokio::test] #[tokio::test]
async fn test_vote_weighted_winner() { async fn test_vote_weighted_winner() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store)); let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
let materializer = Materializer::new(store.clone(), Box::new(lens)); let materializer = Materializer::new(store.clone(), Box::new(lens));
@ -98,9 +98,9 @@ async fn test_vote_weighted_winner() {
let hash2 = store_assertion(&store, &a2).await; let hash2 = store_assertion(&store, &a2).await;
// Give a2 more votes // Give a2 more votes
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.3, 2000)).await.expect("put"); vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.3, 2000), "Tesla").await.expect("put");
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.8, 2001)).await.expect("put"); vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.8, 2001), "Tesla").await.expect("put");
vote_store.put_vote(&create_vote(hash2, [30u8; 32], 0.7, 2002)).await.expect("put"); vote_store.put_vote(&create_vote(hash2, [30u8; 32], 0.7, 2002), "Tesla").await.expect("put");
// Materialize // Materialize
let report = materializer.step().await.expect("step"); let report = materializer.step().await.expect("step");
@ -119,7 +119,7 @@ async fn test_vote_weighted_winner() {
#[tokio::test] #[tokio::test]
async fn test_multiple_pairs_materialized() { async fn test_multiple_pairs_materialized() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(vote_store); let lens = VoteAwareConsensusLens::new(vote_store);
let materializer = Materializer::new(store.clone(), Box::new(lens)); let materializer = Materializer::new(store.clone(), Box::new(lens));
@ -145,7 +145,7 @@ async fn test_multiple_pairs_materialized() {
#[tokio::test] #[tokio::test]
async fn test_idempotent_materialization() { async fn test_idempotent_materialization() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(vote_store); let lens = VoteAwareConsensusLens::new(vote_store);
let materializer = Materializer::new(store.clone(), Box::new(lens)); let materializer = Materializer::new(store.clone(), Box::new(lens));
@ -163,7 +163,7 @@ async fn test_idempotent_materialization() {
#[tokio::test] #[tokio::test]
async fn test_no_mv_for_nonexistent_pair() { async fn test_no_mv_for_nonexistent_pair() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(vote_store); let lens = VoteAwareConsensusLens::new(vote_store);
let materializer = Materializer::new(store, Box::new(lens)); let materializer = Materializer::new(store, Box::new(lens));
@ -174,26 +174,21 @@ async fn test_no_mv_for_nonexistent_pair() {
#[tokio::test] #[tokio::test]
async fn test_parse_sp_key() { async fn test_parse_sp_key() {
// Valid key // Valid key (key_codec format: {subject}\x00SP:{predicate})
let key = b"SP:Tesla:revenue"; let key = key_codec::subject_predicate_key("Tesla", "revenue");
let result = Materializer::<SledStore>::parse_sp_key(key); let result = key_codec::extract_sp_key(&key);
assert_eq!(result, Some(("Tesla".to_string(), "revenue".to_string()))); assert_eq!(result, Some(("Tesla".to_string(), "revenue".to_string())));
// Missing predicate // Wrong prefix (subject index key, not SP: key)
let key = b"SP:Tesla"; let key = key_codec::subject_index_key("Tesla");
assert!(Materializer::<SledStore>::parse_sp_key(key).is_none()); assert!(key_codec::extract_sp_key(&key).is_none());
// Empty subject // ConceptPath subject with :// in scheme
let key = b"SP::revenue"; let key = key_codec::subject_predicate_key(
assert!(Materializer::<SledStore>::parse_sp_key(key).is_none()); "code://rust/citadeldb/auth/jwt/audience_validation",
"config_value",
// Wrong prefix );
let key = b"S:Tesla"; let result = key_codec::extract_sp_key(&key);
assert!(Materializer::<SledStore>::parse_sp_key(key).is_none());
// ConceptPath subject with :// in scheme — must split on LAST colon
let key = b"SP:code://rust/citadeldb/auth/jwt/audience_validation:config_value";
let result = Materializer::<SledStore>::parse_sp_key(key);
assert_eq!( assert_eq!(
result, result,
Some(( Some((
@ -203,21 +198,18 @@ async fn test_parse_sp_key() {
); );
// ConceptPath with multiple scheme-like colons // ConceptPath with multiple scheme-like colons
let key = b"SP:rfc://7519/jwt/audience_validation:must_validate"; let key =
let result = Materializer::<SledStore>::parse_sp_key(key); key_codec::subject_predicate_key("rfc://7519/jwt/audience_validation", "must_validate");
let result = key_codec::extract_sp_key(&key);
assert_eq!( assert_eq!(
result, result,
Some(("rfc://7519/jwt/audience_validation".to_string(), "must_validate".to_string(),)) Some(("rfc://7519/jwt/audience_validation".to_string(), "must_validate".to_string(),))
); );
// Empty predicate after ConceptPath subject
let key = b"SP:code://rust/citadeldb/auth/jwt:";
assert!(Materializer::<SledStore>::parse_sp_key(key).is_none());
} }
#[tokio::test] #[tokio::test]
async fn test_materialize_pair_directly() { async fn test_materialize_pair_directly() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(vote_store); let lens = VoteAwareConsensusLens::new(vote_store);
let materializer = Materializer::new(store.clone(), Box::new(lens)); let materializer = Materializer::new(store.clone(), Box::new(lens));
@ -242,13 +234,17 @@ async fn test_materialize_pair_directly() {
#[tokio::test] #[tokio::test]
async fn test_mv_key_construction() { async fn test_mv_key_construction() {
let key = Materializer::<SledStore>::mv_key("Tesla", "revenue"); let key = key_codec::mv_key("Tesla", "revenue");
assert_eq!(key, b"MV:Tesla:revenue"); // key_codec format: {subject}\x00MV:{predicate}
let mut expected = b"Tesla".to_vec();
expected.push(0x00);
expected.extend_from_slice(b"MV:revenue");
assert_eq!(key, expected);
} }
#[tokio::test] #[tokio::test]
async fn test_run_notified_triggers_on_notify() { async fn test_run_notified_triggers_on_notify() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(vote_store); let lens = VoteAwareConsensusLens::new(vote_store);
let materializer = Arc::new(Materializer::new(store.clone(), Box::new(lens))); let materializer = Arc::new(Materializer::new(store.clone(), Box::new(lens)));
@ -285,7 +281,7 @@ async fn test_run_notified_triggers_on_notify() {
#[tokio::test] #[tokio::test]
async fn test_escalation_triggers_on_high_conflict() { async fn test_escalation_triggers_on_high_conflict() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store)); let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
@ -311,8 +307,8 @@ async fn test_escalation_triggers_on_high_conflict() {
let hash2 = store_assertion(&store, &a2).await; let hash2 = store_assertion(&store, &a2).await;
// Give both some votes (not relevant for conflict, but for resolution) // Give both some votes (not relevant for conflict, but for resolution)
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000)).await.expect("put"); vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000), "Tesla").await.expect("put");
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001)).await.expect("put"); vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001), "Tesla").await.expect("put");
// Materialize // Materialize
let report = materializer.step().await.expect("step"); let report = materializer.step().await.expect("step");
@ -340,7 +336,7 @@ async fn test_escalation_triggers_on_high_conflict() {
#[tokio::test] #[tokio::test]
async fn test_escalation_does_not_trigger_on_low_conflict() { async fn test_escalation_does_not_trigger_on_low_conflict() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store)); let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
@ -363,8 +359,8 @@ async fn test_escalation_does_not_trigger_on_low_conflict() {
let hash2 = store_assertion(&store, &a2).await; let hash2 = store_assertion(&store, &a2).await;
// Skewed votes create low conflict // Skewed votes create low conflict
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.2, 2000)).await.expect("put"); vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.2, 2000), "Tesla").await.expect("put");
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.9, 2001)).await.expect("put"); vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.9, 2001), "Tesla").await.expect("put");
// Materialize // Materialize
let report = materializer.step().await.expect("step"); let report = materializer.step().await.expect("step");
@ -379,7 +375,7 @@ async fn test_escalation_does_not_trigger_on_low_conflict() {
#[tokio::test] #[tokio::test]
async fn test_escalation_predicate_pattern_matching() { async fn test_escalation_predicate_pattern_matching() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store)); let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
@ -403,8 +399,14 @@ async fn test_escalation_predicate_pattern_matching() {
a2.confidence = 0.9; // High confidence a2.confidence = 0.9; // High confidence
let hash1 = store_assertion(&store, &a1).await; let hash1 = store_assertion(&store, &a1).await;
let hash2 = store_assertion(&store, &a2).await; let hash2 = store_assertion(&store, &a2).await;
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000)).await.expect("put"); vote_store
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001)).await.expect("put"); .put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000), "Semaglutide")
.await
.expect("put");
vote_store
.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001), "Semaglutide")
.await
.expect("put");
let mut a3 = create_assertion("Tesla", "revenue", 96.7, 1000); let mut a3 = create_assertion("Tesla", "revenue", 96.7, 1000);
a3.confidence = 0.3; // Low confidence a3.confidence = 0.3; // Low confidence
@ -412,8 +414,8 @@ async fn test_escalation_predicate_pattern_matching() {
a4.confidence = 1.0; // High confidence a4.confidence = 1.0; // High confidence
let hash3 = store_assertion(&store, &a3).await; let hash3 = store_assertion(&store, &a3).await;
let hash4 = store_assertion(&store, &a4).await; let hash4 = store_assertion(&store, &a4).await;
vote_store.put_vote(&create_vote(hash3, [30u8; 32], 0.5, 2002)).await.expect("put"); vote_store.put_vote(&create_vote(hash3, [30u8; 32], 0.5, 2002), "Tesla").await.expect("put");
vote_store.put_vote(&create_vote(hash4, [40u8; 32], 0.5, 2003)).await.expect("put"); vote_store.put_vote(&create_vote(hash4, [40u8; 32], 0.5, 2003), "Tesla").await.expect("put");
// Materialize // Materialize
let report = materializer.step().await.expect("step"); let report = materializer.step().await.expect("step");
@ -429,7 +431,7 @@ async fn test_escalation_predicate_pattern_matching() {
#[tokio::test] #[tokio::test]
async fn test_no_escalation_without_store() { async fn test_no_escalation_without_store() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store)); let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
@ -441,8 +443,8 @@ async fn test_no_escalation_without_store() {
let a2 = create_assertion("Tesla", "revenue", 100.0, 1100); let a2 = create_assertion("Tesla", "revenue", 100.0, 1100);
let hash1 = store_assertion(&store, &a1).await; let hash1 = store_assertion(&store, &a1).await;
let hash2 = store_assertion(&store, &a2).await; let hash2 = store_assertion(&store, &a2).await;
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000)).await.expect("put"); vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000), "Tesla").await.expect("put");
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001)).await.expect("put"); vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001), "Tesla").await.expect("put");
// Materialize // Materialize
let report = materializer.step().await.expect("step"); let report = materializer.step().await.expect("step");

@ -37,7 +37,7 @@ use stemedb_core::types::{ConflictAnalysis, EntityId, RelationId};
use stemedb_lens::{AnalysisLens, SkepticLens}; use stemedb_lens::{AnalysisLens, SkepticLens};
use stemedb_storage::trust_rank_store::TrustRankStore; use stemedb_storage::trust_rank_store::TrustRankStore;
use stemedb_storage::vote_store::VoteStore; use stemedb_storage::vote_store::VoteStore;
use stemedb_storage::{GenericIndexStore, IndexStore, KVStore}; use stemedb_storage::{key_codec, GenericIndexStore, IndexStore, KVStore};
use tracing::instrument; use tracing::instrument;
/// A "Trust but Verify" view that shows disagreement instead of hiding it. /// A "Trust but Verify" view that shows disagreement instead of hiding it.
@ -96,7 +96,7 @@ where
// Load all assertions // Load all assertions
let mut candidates = Vec::with_capacity(hash_list.len()); let mut candidates = Vec::with_capacity(hash_list.len());
for hash in hash_list { for hash in hash_list {
let key = format!("H:{}", hex::encode(hash)).into_bytes(); let key = key_codec::assertion_key(subject, &hex::encode(hash));
if let Some(data) = self.store.get(&key).await? { if let Some(data) = self.store.get(&key).await? {
if let Ok(assertion) = stemedb_core::serde::deserialize(&data) { if let Ok(assertion) = stemedb_core::serde::deserialize(&data) {
candidates.push(assertion); candidates.push(assertion);
@ -129,11 +129,11 @@ mod tests {
use super::*; use super::*;
use stemedb_core::testing::AssertionBuilder; use stemedb_core::testing::AssertionBuilder;
use stemedb_core::types::ResolutionStatus; use stemedb_core::types::ResolutionStatus;
use stemedb_storage::{GenericTrustRankStore, GenericVoteStore, SledStore}; use stemedb_storage::{GenericTrustRankStore, GenericVoteStore, HybridStore};
async fn store_assertion( async fn store_assertion(
store: &Arc<SledStore>, store: &Arc<HybridStore>,
index_store: &GenericIndexStore<Arc<SledStore>>, index_store: &GenericIndexStore<Arc<HybridStore>>,
subject: &str, subject: &str,
predicate: &str, predicate: &str,
value: f64, value: f64,
@ -148,7 +148,7 @@ mod tests {
let bytes = stemedb_core::serde::serialize(&assertion).expect("serialize"); let bytes = stemedb_core::serde::serialize(&assertion).expect("serialize");
let hash = blake3::hash(&bytes); let hash = blake3::hash(&bytes);
let key = format!("H:{}", hash.to_hex()).into_bytes(); let key = key_codec::assertion_key(subject, &hash.to_hex());
store.put(&key, &bytes).await.expect("put"); store.put(&key, &bytes).await.expect("put");
let assertion_hash: [u8; 32] = *hash.as_bytes(); let assertion_hash: [u8; 32] = *hash.as_bytes();
@ -157,9 +157,9 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_resolve_empty() { async fn test_resolve_empty() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let vote_store = Arc::new(GenericVoteStore::new((*store).clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let trust_store = Arc::new(GenericTrustRankStore::new((*store).clone())); let trust_store = Arc::new(GenericTrustRankStore::new(store.clone()));
let resolver = SkepticResolver::new(store, vote_store, trust_store); let resolver = SkepticResolver::new(store, vote_store, trust_store);
let result = resolver.resolve("NonExistent", "predicate").await.expect("resolve"); let result = resolver.resolve("NonExistent", "predicate").await.expect("resolve");
@ -168,13 +168,13 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_resolve_single_claim() { async fn test_resolve_single_claim() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let index_store = GenericIndexStore::new(store.clone()); let index_store = GenericIndexStore::new(store.clone());
store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.9).await; store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.9).await;
let vote_store = Arc::new(GenericVoteStore::new((*store).clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let trust_store = Arc::new(GenericTrustRankStore::new((*store).clone())); let trust_store = Arc::new(GenericTrustRankStore::new(store.clone()));
let resolver = SkepticResolver::new(store, vote_store, trust_store); let resolver = SkepticResolver::new(store, vote_store, trust_store);
let result = resolver.resolve("Drug", "effect").await.expect("resolve"); let result = resolver.resolve("Drug", "effect").await.expect("resolve");
@ -189,15 +189,15 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_resolve_contested_claims() { async fn test_resolve_contested_claims() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let index_store = GenericIndexStore::new(store.clone()); let index_store = GenericIndexStore::new(store.clone());
// Add two conflicting claims with equal weight // Add two conflicting claims with equal weight
store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.5).await; store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.5).await;
store_assertion(&store, &index_store, "Drug", "effect", 200.0, 0.5).await; store_assertion(&store, &index_store, "Drug", "effect", 200.0, 0.5).await;
let vote_store = Arc::new(GenericVoteStore::new((*store).clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let trust_store = Arc::new(GenericTrustRankStore::new((*store).clone())); let trust_store = Arc::new(GenericTrustRankStore::new(store.clone()));
let resolver = SkepticResolver::new(store, vote_store, trust_store); let resolver = SkepticResolver::new(store, vote_store, trust_store);
let result = resolver.resolve("Drug", "effect").await.expect("resolve"); let result = resolver.resolve("Drug", "effect").await.expect("resolve");
@ -211,13 +211,13 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_resolve_includes_computed_at() { async fn test_resolve_includes_computed_at() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let index_store = GenericIndexStore::new(store.clone()); let index_store = GenericIndexStore::new(store.clone());
store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.9).await; store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.9).await;
let vote_store = Arc::new(GenericVoteStore::new((*store).clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let trust_store = Arc::new(GenericTrustRankStore::new((*store).clone())); let trust_store = Arc::new(GenericTrustRankStore::new(store.clone()));
let resolver = SkepticResolver::new(store, vote_store, trust_store); let resolver = SkepticResolver::new(store, vote_store, trust_store);
let result = resolver.resolve("Drug", "effect").await.expect("resolve"); let result = resolver.resolve("Drug", "effect").await.expect("resolve");
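
The test setup in this file now passes `store.clone()` (a clone of the `Arc` handle) where it previously passed `(*store).clone()` (a clone of the value behind the `Arc`). The sketch below illustrates the difference between the two calls. `FakeStore` is a hypothetical stand-in, not the real `HybridStore`, and the diff does not state why the change was made.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a store type; not the real `HybridStore`.
#[derive(Clone)]
struct FakeStore {
    name: String,
}

/// Strong count observed while a clone of the `Arc` handle is alive.
fn count_after_handle_clone(store: &Arc<FakeStore>) -> usize {
    let _handle = Arc::clone(store); // same allocation, refcount + 1
    Arc::strong_count(store)
}

/// Strong count observed while a clone of the inner value is alive.
fn count_after_value_clone(store: &Arc<FakeStore>) -> usize {
    let _copy: FakeStore = (**store).clone(); // new, independent value
    Arc::strong_count(store)
}

fn main() {
    let store = Arc::new(FakeStore { name: "demo".to_string() });
    println!("store name: {}", store.name);
    println!("handle clone count: {}", count_after_handle_clone(&store));
    println!("value clone count: {}", count_after_value_clone(&store));
}
```

Cloning the handle keeps every component pointed at one shared store, while cloning the value only shares state if the store type is itself internally reference-counted.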


@ -17,7 +17,7 @@ use stemedb_core::testing::AssertionBuilder;
use stemedb_core::types::{Assertion, LifecycleStage, ObjectValue, SignatureEntry}; use stemedb_core::types::{Assertion, LifecycleStage, ObjectValue, SignatureEntry};
use stemedb_ingest::worker::{serialize_assertion, IngestWorker}; use stemedb_ingest::worker::{serialize_assertion, IngestWorker};
use stemedb_query::{Query, QueryEngine}; use stemedb_query::{Query, QueryEngine};
use stemedb_storage::{KVStore, SledStore}; use stemedb_storage::{key_codec, HybridStore, KVStore};
use stemedb_wal::Journal; use stemedb_wal::Journal;
use tempfile::tempdir; use tempfile::tempdir;
use tokio::sync::Mutex; use tokio::sync::Mutex;
@ -100,15 +100,27 @@ async fn test_e2e_decay_reduces_old_confidence() {
journal.append(serialize_assertion(&new_assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&new_assertion).expect("ser")).expect("append");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(SledStore::open(&db_dir).expect("open store")); let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
let mut worker = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker"); let mut worker = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker");
worker.step().await.expect("step 1"); worker.step().await.expect("step 1");
worker.step().await.expect("step 2"); worker.step().await.expect("step 2");
// Verify both assertions are stored // Verify both assertions are stored (check via subject-scoped assertion keys)
let h_entries = store.scan_prefix(b"H:").await.expect("scan"); let old_hash =
assert_eq!(h_entries.len(), 2, "should have two assertions"); *blake3::hash(&stemedb_core::serde::serialize(&old_assertion).expect("ser")).as_bytes();
let new_hash =
*blake3::hash(&stemedb_core::serde::serialize(&new_assertion).expect("ser")).as_bytes();
let old_key = key_codec::assertion_key("Semaglutide", &hex::encode(old_hash));
let new_key = key_codec::assertion_key("Semaglutide", &hex::encode(new_hash));
assert!(
store.get(&old_key).await.expect("get old").is_some(),
"old assertion should be stored"
);
assert!(
store.get(&new_key).await.expect("get new").is_some(),
"new assertion should be stored"
);
// Query WITHOUT decay: old assertion wins (0.95 > 0.6) // Query WITHOUT decay: old assertion wins (0.95 > 0.6)
let engine = QueryEngine::new(store.clone()); let engine = QueryEngine::new(store.clone());


@ -23,7 +23,7 @@ use stemedb_core::types::{Assertion, LifecycleStage, ObjectValue, SignatureEntry
use stemedb_ingest::worker::{serialize_assertion, IngestWorker}; use stemedb_ingest::worker::{serialize_assertion, IngestWorker};
use stemedb_lens::{RecencyLens, SyncLensWrapper, VoteAwareConsensusLens}; use stemedb_lens::{RecencyLens, SyncLensWrapper, VoteAwareConsensusLens};
use stemedb_query::{Materializer, Query, QueryEngine}; use stemedb_query::{Materializer, Query, QueryEngine};
use stemedb_storage::{GenericVoteStore, KVStore, SledStore, VoteStore}; use stemedb_storage::{key_codec, GenericVoteStore, HybridStore, KVStore, VoteStore};
use stemedb_wal::Journal; use stemedb_wal::Journal;
use tempfile::tempdir; use tempfile::tempdir;
use tokio::sync::{Mutex, Notify}; use tokio::sync::{Mutex, Notify};
@ -114,7 +114,7 @@ async fn test_e2e_write_materialize_read() {
// === Step 2: Run IngestWorker to process WAL === // === Step 2: Run IngestWorker to process WAL ===
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(SledStore::open(&db_dir).expect("open store")); let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
let notify = Arc::new(Notify::new()); let notify = Arc::new(Notify::new());
let mut worker = IngestWorker::new(journal.clone(), store.clone()) let mut worker = IngestWorker::new(journal.clone(), store.clone())
@ -125,15 +125,15 @@ async fn test_e2e_write_materialize_read() {
let bytes_processed = worker.step().await.expect("ingest step"); let bytes_processed = worker.step().await.expect("ingest step");
assert!(bytes_processed > 0, "should have processed data from WAL"); assert!(bytes_processed > 0, "should have processed data from WAL");
// Verify assertion stored at H:{hash} // Verify assertion stored at {subject}\x00H:{hash}
let assertion_hash = compute_assertion_hash(&assertion); let assertion_hash = compute_assertion_hash(&assertion);
let h_key = format!("H:{}", hex::encode(assertion_hash)).into_bytes(); let h_key = key_codec::assertion_key("Tesla_Inc", &hex::encode(assertion_hash));
let stored = store.get(&h_key).await.expect("get assertion"); let stored = store.get(&h_key).await.expect("get assertion");
assert!(stored.is_some(), "assertion should be stored at H: key"); assert!(stored.is_some(), "assertion should be stored at H: key");
// Verify compound index SP:{subject}:{predicate} created // Verify compound index {subject}\x00SP:{predicate} created
let sp_key = b"SP:Tesla_Inc:has_revenue"; let sp_prefix = key_codec::subject_predicate_scan_prefix("Tesla_Inc");
let sp_entries = store.scan_prefix(sp_key).await.expect("scan SP: prefix"); let sp_entries = store.scan_prefix(&sp_prefix).await.expect("scan SP: prefix");
assert_eq!(sp_entries.len(), 1, "should have one SP: index entry"); assert_eq!(sp_entries.len(), 1, "should have one SP: index entry");
// === Step 3: Run Materializer === // === Step 3: Run Materializer ===
@ -145,9 +145,9 @@ async fn test_e2e_write_materialize_read() {
assert_eq!(report.pairs_scanned, 1, "should scan one subject+predicate pair"); assert_eq!(report.pairs_scanned, 1, "should scan one subject+predicate pair");
assert_eq!(report.views_updated, 1, "should update one materialized view"); assert_eq!(report.views_updated, 1, "should update one materialized view");
// Verify MV:{subject}:{predicate} written // Verify {subject}\x00MV:{predicate} written
let mv_key = b"MV:Tesla_Inc:has_revenue"; let mv_key = key_codec::mv_key("Tesla_Inc", "has_revenue");
let mv_data = store.get(mv_key).await.expect("get MV"); let mv_data = store.get(&mv_key).await.expect("get MV");
assert!(mv_data.is_some(), "materialized view should exist"); assert!(mv_data.is_some(), "materialized view should exist");
// === Step 4: Query via QueryEngine === // === Step 4: Query via QueryEngine ===
@ -186,7 +186,7 @@ async fn test_e2e_vote_consensus() {
// Ingest both // Ingest both
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(SledStore::open(&db_dir).expect("open store")); let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
let mut worker = let mut worker =
IngestWorker::new(journal.clone(), store.clone()).await.expect("create worker"); IngestWorker::new(journal.clone(), store.clone()).await.expect("create worker");
@ -197,25 +197,28 @@ async fn test_e2e_vote_consensus() {
let bytes2 = worker.step().await.expect("step 2"); let bytes2 = worker.step().await.expect("step 2");
assert!(bytes2 > 0, "should ingest second assertion"); assert!(bytes2 > 0, "should ingest second assertion");
// Compute hashes for both assertions
let hash_a = compute_assertion_hash(&assertion_a);
let hash_b = compute_assertion_hash(&assertion_b);
// Verify both are stored // Verify both are stored
let h_entries = store.scan_prefix(b"H:").await.expect("scan H:"); let h_key_a = key_codec::assertion_key("Semaglutide", &hex::encode(hash_a));
assert_eq!(h_entries.len(), 2, "should have two assertions"); let h_key_b = key_codec::assertion_key("Semaglutide", &hex::encode(hash_b));
assert!(store.get(&h_key_a).await.expect("get a").is_some(), "assertion_a should be stored");
assert!(store.get(&h_key_b).await.expect("get b").is_some(), "assertion_b should be stored");
// Add votes via VoteStore // Add votes via VoteStore
let vote_store = Arc::new(GenericVoteStore::new(store.clone())); let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let hash_a = compute_assertion_hash(&assertion_a);
let hash_b = compute_assertion_hash(&assertion_b);
// assertion_a gets 3 votes (total weight = 2.7) // assertion_a gets 3 votes (total weight = 2.7)
for i in 0..3 { for i in 0..3 {
let vote = create_vote(hash_a, i, 0.9, 2000 + i as u64); let vote = create_vote(hash_a, i, 0.9, 2000 + i as u64);
vote_store.put_vote(&vote).await.expect("put vote for a"); vote_store.put_vote(&vote, "Semaglutide").await.expect("put vote for a");
} }
// assertion_b gets 1 vote (total weight = 0.2) // assertion_b gets 1 vote (total weight = 0.2)
let vote_b = create_vote(hash_b, 10, 0.2, 2100); let vote_b = create_vote(hash_b, 10, 0.2, 2100);
vote_store.put_vote(&vote_b).await.expect("put vote for b"); vote_store.put_vote(&vote_b, "Semaglutide").await.expect("put vote for b");
// Materialize with VoteAwareConsensusLens // Materialize with VoteAwareConsensusLens
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store)); let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
@ -258,7 +261,7 @@ async fn test_e2e_update_winner() {
journal.append(serialize_assertion(&assertion_v1).expect("ser")).expect("append v1"); journal.append(serialize_assertion(&assertion_v1).expect("ser")).expect("append v1");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(SledStore::open(&db_dir).expect("open store")); let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
// Ingest v1 // Ingest v1
let mut worker = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker"); let mut worker = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker");
@ -294,8 +297,12 @@ async fn test_e2e_update_winner() {
assert!(bytes2 > 0, "should process new assertion"); assert!(bytes2 > 0, "should process new assertion");
// Verify both assertions are now stored // Verify both assertions are now stored
let h_entries = store.scan_prefix(b"H:").await.expect("scan"); let hash_v1 = compute_assertion_hash(&assertion_v1);
assert_eq!(h_entries.len(), 2, "should have two assertions"); let hash_v2 = compute_assertion_hash(&assertion_v2);
let key_v1 = key_codec::assertion_key("Apple_Inc", &hex::encode(hash_v1));
let key_v2 = key_codec::assertion_key("Apple_Inc", &hex::encode(hash_v2));
assert!(store.get(&key_v1).await.expect("get v1").is_some(), "v1 should be stored");
assert!(store.get(&key_v2).await.expect("get v2").is_some(), "v2 should be stored");
// Re-materialize // Re-materialize
let lens2 = SyncLensWrapper(RecencyLens); let lens2 = SyncLensWrapper(RecencyLens);
@ -334,7 +341,7 @@ async fn test_e2e_cursor_persistence() {
journal.append(serialize_assertion(&a3).expect("ser")).expect("append"); journal.append(serialize_assertion(&a3).expect("ser")).expect("append");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(SledStore::open(&db_dir).expect("open store")); let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
// Worker 1: Process first 2 assertions // Worker 1: Process first 2 assertions
let mut worker1 = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker1"); let mut worker1 = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker1");
@ -342,8 +349,12 @@ async fn test_e2e_cursor_persistence() {
worker1.step().await.expect("step 2"); worker1.step().await.expect("step 2");
// Verify 2 assertions stored // Verify 2 assertions stored
let h_entries = store.scan_prefix(b"H:").await.expect("scan"); let hash1 = compute_assertion_hash(&a1);
assert_eq!(h_entries.len(), 2, "worker1 should have processed 2 assertions"); let hash2 = compute_assertion_hash(&a2);
let key1 = key_codec::assertion_key("Entity_A", &hex::encode(hash1));
let key2 = key_codec::assertion_key("Entity_B", &hex::encode(hash2));
assert!(store.get(&key1).await.expect("get a1").is_some(), "a1 should be stored");
assert!(store.get(&key2).await.expect("get a2").is_some(), "a2 should be stored");
// Drop worker1, simulate restart // Drop worker1, simulate restart
drop(worker1); drop(worker1);
@ -358,8 +369,9 @@ async fn test_e2e_cursor_persistence() {
assert_eq!(steps, 1, "worker2 should only process 1 new assertion"); assert_eq!(steps, 1, "worker2 should only process 1 new assertion");
// Verify all 3 assertions now stored // Verify all 3 assertions now stored
let h_entries = store.scan_prefix(b"H:").await.expect("scan"); let hash3 = compute_assertion_hash(&a3);
assert_eq!(h_entries.len(), 3, "should have all 3 assertions"); let key3 = key_codec::assertion_key("Entity_C", &hex::encode(hash3));
assert!(store.get(&key3).await.expect("get a3").is_some(), "a3 should be stored");
} }
/// Test: Event-driven materialization via Notify. /// Test: Event-driven materialization via Notify.
@ -378,7 +390,7 @@ async fn test_e2e_notify_integration() {
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append"); journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
let journal = Arc::new(Mutex::new(journal)); let journal = Arc::new(Mutex::new(journal));
let store = Arc::new(SledStore::open(&db_dir).expect("open store")); let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
let notify = Arc::new(Notify::new()); let notify = Arc::new(Notify::new());
// Track if notification was received // Track if notification was received
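
The comments in these tests describe subject-scoped keys of the form `{subject}\x00H:{hash}`, so the tests fetch individual keys instead of scanning a global `H:` prefix. A minimal sketch of that layout over an ordered map follows. The encoders here are hypothetical re-implementations written from those comments, not the real `key_codec` functions, and a `BTreeMap` stands in for the store.

```rust
use std::collections::BTreeMap;

/// Hypothetical encoder for `{subject}\x00H:{hash}` (assumed layout).
fn assertion_key(subject: &str, hash_hex: &str) -> Vec<u8> {
    let mut key = subject.as_bytes().to_vec();
    key.push(0x00);
    key.extend_from_slice(b"H:");
    key.extend_from_slice(hash_hex.as_bytes());
    key
}

/// Hypothetical prefix covering every key that belongs to `subject`.
fn subject_prefix(subject: &str) -> Vec<u8> {
    let mut prefix = subject.as_bytes().to_vec();
    prefix.push(0x00);
    prefix
}

/// Collects all keys that start with `prefix` from an ordered map.
fn scan_prefix(map: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<Vec<u8>> {
    map.range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k.clone())
        .collect()
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(assertion_key("Tesla", "aa"), b"a".to_vec());
    map.insert(assertion_key("Tesla_Inc", "bb"), b"b".to_vec());
    map.insert(assertion_key("Tesla_Inc", "cc"), b"c".to_vec());
    let hits = scan_prefix(&map, &subject_prefix("Tesla"));
    println!("{} key(s) under Tesla", hits.len());
}
```

Because the `\x00` separator is part of the prefix, a scan for `Tesla` does not pick up keys that belong to `Tesla_Inc`.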


@ -11,7 +11,7 @@ use tracing::debug;
use crate::agent::Agent; use crate::agent::Agent;
use crate::helpers::{ use crate::helpers::{
verify_assertion_text, wait_until_ingested, write_assertion_to_wal, CURSOR_KEY, cursor_key, verify_assertion_text, wait_until_ingested, write_assertion_to_wal,
}; };
use crate::types::{ErrorKind, SimulationError, SimulationResult}; use crate::types::{ErrorKind, SimulationError, SimulationResult};
@ -48,7 +48,7 @@ pub(crate) async fn run_mv_integration_test<S: KVStore + 'static>(
); );
// Check cursor state before writing // Check cursor state before writing
let cursor_before = match store.get(CURSOR_KEY).await { let cursor_before = match store.get(&cursor_key()).await {
Ok(Some(bytes)) => { Ok(Some(bytes)) => {
if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) { if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) {
u64::from_le_bytes(arr) u64::from_le_bytes(arr)


@ -5,7 +5,7 @@ use std::time::{Duration, Instant};
use stemedb_core::serde::serialize; use stemedb_core::serde::serialize;
use stemedb_core::types::{Assertion, Hash, Vote}; use stemedb_core::types::{Assertion, Hash, Vote};
use stemedb_ingest::{serialize_assertion, serialize_vote}; use stemedb_ingest::{serialize_assertion, serialize_vote};
use stemedb_storage::KVStore; use stemedb_storage::{key_codec, KVStore};
use stemedb_wal::Journal; use stemedb_wal::Journal;
use tokio::sync::Mutex; use tokio::sync::Mutex;
use tracing::debug; use tracing::debug;
@ -68,7 +68,10 @@ pub(crate) fn compute_assertion_hash(assertion: &Assertion) -> Hash {
} }
/// The cursor key used by the ingestor to track its progress. /// The cursor key used by the ingestor to track its progress.
pub(crate) const CURSOR_KEY: &[u8] = b"__CURSOR__:ingest"; /// Uses key_codec format: `\x00META:cursor:ingest`
pub(crate) fn cursor_key() -> Vec<u8> {
key_codec::cursor_key()
}
/// Wait until the ingestor cursor reaches or exceeds the target offset. /// Wait until the ingestor cursor reaches or exceeds the target offset.
/// ///
@ -96,7 +99,7 @@ pub(crate) async fn wait_until_ingested<S: KVStore>(
loop { loop {
// Read current cursor position // Read current cursor position
if let Ok(Some(bytes)) = store.get(CURSOR_KEY).await { if let Ok(Some(bytes)) = store.get(&cursor_key()).await {
if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) { if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) {
let cursor = u64::from_le_bytes(arr); let cursor = u64::from_le_bytes(arr);
// Use > (strictly greater) because journal.append() returns the START offset // Use > (strictly greater) because journal.append() returns the START offset
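
The cursor handling above can be sketched in isolation: the stored value is read as 8 little-endian bytes, and the comparison is strict because the target is the START offset of the appended record. The function names below are hypothetical helpers for illustration, not part of the crate.

```rust
use std::convert::TryFrom;

/// Decodes an 8-byte little-endian cursor; any other length yields `None`.
fn decode_cursor(bytes: &[u8]) -> Option<u64> {
    let arr = <[u8; 8]>::try_from(bytes).ok()?;
    Some(u64::from_le_bytes(arr))
}

/// Assumes `start_offset` is where the record begins, so the record counts
/// as ingested only once the cursor has moved strictly past that offset.
fn is_ingested(cursor: u64, start_offset: u64) -> bool {
    cursor > start_offset
}

fn main() {
    let stored = 128u64.to_le_bytes();
    match decode_cursor(&stored) {
        Some(cursor) => println!("ingested: {}", is_ingested(cursor, 100)),
        None => println!("cursor value has unexpected length"),
    }
}
```

A cursor equal to the start offset means the ingestor has reached the record but not yet consumed it, which is why `>=` would report success too early.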


@ -5,7 +5,7 @@ use std::sync::Arc;
use stemedb_core::types::{LifecycleStage, ObjectValue}; use stemedb_core::types::{LifecycleStage, ObjectValue};
use stemedb_ingest::Ingestor; use stemedb_ingest::Ingestor;
use stemedb_query::{Query, QueryEngine}; use stemedb_query::{Query, QueryEngine};
use stemedb_storage::SledStore; use stemedb_storage::HybridStore;
use stemedb_wal::Journal; use stemedb_wal::Journal;
use tokio::sync::Mutex; use tokio::sync::Mutex;
use tracing::{debug, info, warn}; use tracing::{debug, info, warn};
@ -61,7 +61,7 @@ pub async fn run_simulation(
.map_err(|e| SimulationSetupError::JournalOpen(e.to_string()))?, .map_err(|e| SimulationSetupError::JournalOpen(e.to_string()))?,
)); ));
let store = Arc::new( let store = Arc::new(
SledStore::open(temp_db_dir.path()) HybridStore::open(temp_db_dir.path())
.map_err(|e| SimulationSetupError::StoreOpen(e.to_string()))?, .map_err(|e| SimulationSetupError::StoreOpen(e.to_string()))?,
); );


@ -10,18 +10,27 @@ workspace = true
[dependencies] [dependencies]
stemedb-core = { path = "../stemedb-core" } stemedb-core = { path = "../stemedb-core" }
sled = "0.34" fjall = "2"
redb = "2"
dashmap = "6"
tempfile = "3.10"
thiserror = "1.0" thiserror = "1.0"
tracing = "0.1" tracing = "0.1"
async-trait = "0.1" async-trait = "0.1"
blake3 = "1.5" blake3 = "1.5"
hex = "0.4" hex = "0.4"
memchr = "2"
rkyv = { version = "0.7", features = ["validation"] } rkyv = { version = "0.7", features = ["validation"] }
# HNSW vector index for k-NN similarity search # HNSW vector index for k-NN similarity search
hnsw_rs = "0.3" hnsw_rs = "0.3"
# Thread-safe read-write locks for index access # Thread-safe read-write locks for index access
parking_lot = "0.12" parking_lot = "0.12"
tokio = { version = "1", features = ["sync", "rt"] }
[dev-dependencies] [dev-dependencies]
tokio = { version = "1", features = ["macros", "rt"] } tokio = { version = "1", features = ["macros", "rt", "rt-multi-thread"] }
tempfile = "3.10" criterion = { version = "0.5", features = ["html_reports", "async_tokio"] }
[[bench]]
name = "kv_store"
harness = false


@ -0,0 +1,145 @@
#![allow(missing_docs, clippy::unwrap_used, clippy::expect_used)]
use criterion::{criterion_group, criterion_main, Criterion};
use stemedb_storage::key_codec;
use stemedb_storage::{HybridStore, KVStore};
use tokio::runtime::Runtime;
fn sequential_put(c: &mut Criterion) {
let rt = Runtime::new().expect("runtime");
let store = HybridStore::open_temp().expect("store");
c.bench_function("sequential_put_10k", |b| {
b.iter(|| {
rt.block_on(async {
for i in 0..10_000u64 {
let hash_hex = format!("bench_{}", i);
let key = key_codec::assertion_key("Bench", &hash_hex);
let value = format!("value_{}", i);
store.put(&key, value.as_bytes()).await.unwrap();
}
})
})
});
}
fn random_get(c: &mut Criterion) {
let rt = Runtime::new().expect("runtime");
let store = HybridStore::open_temp().expect("store");
// Pre-populate (read-heavy keys → redb via S: tag)
rt.block_on(async {
for i in 0..10_000u64 {
let key = key_codec::subject_predicate_key("Bench", &format!("pred_{}", i));
let value = format!("value_{}", i);
store.put(&key, value.as_bytes()).await.unwrap();
}
});
c.bench_function("random_get_10k", |b| {
b.iter(|| {
rt.block_on(async {
for i in 0..10_000u64 {
let key = key_codec::subject_predicate_key("Bench", &format!("pred_{}", i));
let _ = store.get(&key).await.unwrap();
}
})
})
});
}
fn prefix_scan(c: &mut Criterion) {
let rt = Runtime::new().expect("runtime");
let store = HybridStore::open_temp().expect("store");
// Pre-populate: 1K keys under "target", 9K under "other"
rt.block_on(async {
for i in 0..1_000u64 {
let key = key_codec::subject_predicate_key("target", &format!("pred_{}", i));
store.put(&key, b"matching").await.unwrap();
}
for i in 0..9_000u64 {
let key = key_codec::subject_predicate_key("other", &format!("pred_{}", i));
store.put(&key, b"non_matching").await.unwrap();
}
});
c.bench_function("prefix_scan_1k_of_10k", |b| {
b.iter(|| {
rt.block_on(async {
let prefix = key_codec::subject_scan_prefix("target");
let results = store.scan_prefix(&prefix).await.unwrap();
assert_eq!(results.len(), 1_000);
})
})
});
}
fn atomic_increment(c: &mut Criterion) {
let rt = Runtime::new().expect("runtime");
let store = HybridStore::open_temp().expect("store");
c.bench_function("atomic_increment_10k", |b| {
b.iter(|| {
rt.block_on(async {
for i in 0..10_000u64 {
let hash_hex = format!("counter_{}", i % 100);
let key = key_codec::vote_count_key("Bench", &hash_hex);
store.fetch_and_add_u64(&key, 1).await.unwrap();
}
})
})
});
}
fn mixed_workload(c: &mut Criterion) {
let rt = Runtime::new().expect("runtime");
let store = HybridStore::open_temp().expect("store");
// Pre-populate read-heavy keys
rt.block_on(async {
for i in 0..1_000u64 {
let key = key_codec::subject_predicate_key("mixed", &format!("pred_{}", i));
store.put(&key, b"initial_value").await.unwrap();
}
});
c.bench_function("mixed_70r_20w_10s", |b| {
b.iter(|| {
rt.block_on(async {
for i in 0..1_000u64 {
match i % 10 {
// 70% reads (redb path)
0..=6 => {
let key = key_codec::subject_predicate_key(
"mixed",
&format!("pred_{}", i % 1000),
);
let _ = store.get(&key).await.unwrap();
}
// 20% writes (fjall path)
7 | 8 => {
let key = key_codec::assertion_key("mixed", &format!("write_{}", i));
store.put(&key, b"new_value").await.unwrap();
}
// 10% scans (redb path)
_ => {
let prefix = key_codec::subject_scan_prefix("mixed");
let _ = store.scan_prefix(&prefix).await.unwrap();
}
}
}
})
})
});
}
criterion_group!(
benches,
sequential_put,
random_get,
prefix_scan,
atomic_increment,
mixed_workload
);
criterion_main!(benches);
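
The `mixed_70r_20w_10s` benchmark picks an operation from `i % 10`. The sketch below reproduces only that dispatch rule and counts the resulting mix over 1,000 iterations. `Op`, `pick_op`, and `count_ops` are hypothetical names for illustration and do not touch the store.

```rust
/// Hypothetical label for the three operation kinds in the mixed benchmark.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Op {
    Read,
    Write,
    Scan,
}

/// Same dispatch rule as the benchmark: residues 0..=6 read, 7 and 8 write, 9 scans.
fn pick_op(i: u64) -> Op {
    match i % 10 {
        0..=6 => Op::Read,
        7 | 8 => Op::Write,
        _ => Op::Scan,
    }
}

/// Counts (reads, writes, scans) over iterations `0..n`.
fn count_ops(n: u64) -> (u64, u64, u64) {
    let mut counts = (0u64, 0u64, 0u64);
    for i in 0..n {
        match pick_op(i) {
            Op::Read => counts.0 += 1,
            Op::Write => counts.1 += 1,
            Op::Scan => counts.2 += 1,
        }
    }
    counts
}

fn main() {
    let (reads, writes, scans) = count_ops(1_000);
    println!("reads={} writes={} scans={}", reads, writes, scans);
}
```

A modulo-based split is deterministic, so every benchmark iteration issues the same sequence of operations; a randomized mix would need a seeded generator to stay reproducible.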


@ -8,8 +8,8 @@
//! //!
//! | Key Pattern | Value | Purpose | //! | Key Pattern | Value | Purpose |
//! |-------------|-------|---------| //! |-------------|-------|---------|
//! | `AUD:{query_id}` | Serialized QueryAudit | Individual audit records | //! | `\x00AUD:{query_id}` | Serialized QueryAudit | Individual audit records |
//! | `AUDA:{agent_id}:{timestamp}:{query_id}` | Empty | Agent index for temporal queries | //! | `\x00AUDA:{agent_id}:{timestamp}:{query_id}` | Empty | Agent index for temporal queries |
//! //!
//! # Design Philosophy //! # Design Philosophy
//! //!
@ -54,8 +54,8 @@ pub trait AuditStore: Send + Sync {
/// ///
/// This operation: /// This operation:
/// 1. Serializes the audit using rkyv /// 1. Serializes the audit using rkyv
/// 2. Stores at `AUD:{query_id}` /// 2. Stores at `\x00AUD:{query_id}`
/// 3. Creates agent index entry at `AUDA:{agent_id}:{timestamp}:{query_id}` /// 3. Creates agent index entry at `\x00AUDA:{agent_id}:{timestamp}:{query_id}`
/// ///
/// # Returns /// # Returns
/// The query_id for reference. /// The query_id for reference.
@ -89,7 +89,7 @@ pub trait AuditStore: Send + Sync {
/// List recent audit records across all agents. /// List recent audit records across all agents.
/// ///
/// Scans all `AUD:` keys and returns the most recent audits. /// Scans all `\x00AUD:` keys and returns the most recent audits.
/// ///
/// # Arguments /// # Arguments
/// * `limit` - Maximum number of records to return /// * `limit` - Maximum number of records to return
@ -105,7 +105,8 @@ pub trait AuditStore: Send + Sync {
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
use crate::SledStore; use crate::HybridStore;
use std::sync::Arc;
use stemedb_core::types::{ContributingAssertion, LifecycleStage}; use stemedb_core::types::{ContributingAssertion, LifecycleStage};
fn create_test_audit( fn create_test_audit(
@ -137,7 +138,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_put_and_get_audit() { async fn test_put_and_get_audit() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let audit_store = GenericAuditStore::new(store); let audit_store = GenericAuditStore::new(store);
let query_id = [10u8; 32]; let query_id = [10u8; 32];
@ -161,7 +162,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_get_audits_for_agent() { async fn test_get_audits_for_agent() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let audit_store = GenericAuditStore::new(store); let audit_store = GenericAuditStore::new(store);
let agent1 = [1u8; 32]; let agent1 = [1u8; 32];
@ -201,7 +202,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_list_recent_audits() { async fn test_list_recent_audits() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let audit_store = GenericAuditStore::new(store); let audit_store = GenericAuditStore::new(store);
// Create audits with different timestamps // Create audits with different timestamps
@ -224,7 +225,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_audit_without_agent() { async fn test_audit_without_agent() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let audit_store = GenericAuditStore::new(store); let audit_store = GenericAuditStore::new(store);
// Audit without agent_id (anonymous query) // Audit without agent_id (anonymous query)
@ -241,7 +242,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_has_audits_for_agent() { async fn test_has_audits_for_agent() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let audit_store = GenericAuditStore::new(store); let audit_store = GenericAuditStore::new(store);
let agent1 = [1u8; 32]; let agent1 = [1u8; 32];
@ -262,7 +263,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_get_nonexistent_audit() { async fn test_get_nonexistent_audit() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let audit_store = GenericAuditStore::new(store); let audit_store = GenericAuditStore::new(store);
let nonexistent = [99u8; 32]; let nonexistent = [99u8; 32];
@ -273,7 +274,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_empty_agent_audits() { async fn test_empty_agent_audits() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let audit_store = GenericAuditStore::new(store); let audit_store = GenericAuditStore::new(store);
let agent = [1u8; 32]; let agent = [1u8; 32];
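
The agent index pattern `\x00AUDA:{agent_id}:{timestamp}:{query_id}` supports temporal queries only if keys sort in timestamp order. A minimal sketch, assuming the timestamp is written as zero-padded 16-digit hex (the `{:016x}` format used when the key is built): `agent_index_key` is a hypothetical builder written from the documented pattern, not the real `key_codec::audit_agent_index_key`.

```rust
/// Hypothetical builder following the documented key pattern (assumed layout).
fn agent_index_key(agent_hex: &str, timestamp: u64, query_hex: &str) -> Vec<u8> {
    format!("\x00AUDA:{}:{:016x}:{}", agent_hex, timestamp, query_hex).into_bytes()
}

/// Returns the keys for `timestamps` in byte-wise (lexicographic) order.
fn sorted_keys(agent_hex: &str, timestamps: &[u64]) -> Vec<Vec<u8>> {
    let mut keys: Vec<Vec<u8>> = timestamps
        .iter()
        .map(|&ts| agent_index_key(agent_hex, ts, "q"))
        .collect();
    keys.sort();
    keys
}

fn main() {
    // Unpadded hex breaks ordering: 256 is "100", which sorts before "9".
    let unpadded_256 = format!("{:x}", 256u64);
    let unpadded_9 = format!("{:x}", 9u64);
    println!("unpadded 256 sorts before 9: {}", unpadded_256 < unpadded_9);

    for key in sorted_keys("aa", &[256, 9, 10]) {
        println!("{}", String::from_utf8_lossy(&key[1..]));
    }
}
```

With fixed-width padding, byte-wise key order matches numeric timestamp order, so a prefix scan over one agent returns audits oldest first.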


@ -1,6 +1,7 @@
//! AuditStore implementation backed by a generic KVStore. //! AuditStore implementation backed by a generic KVStore.
use crate::error::Result; use crate::error::Result;
use crate::key_codec;
use crate::traits::KVStore; use crate::traits::KVStore;
use async_trait::async_trait; use async_trait::async_trait;
use stemedb_core::types::{QueryAudit, QueryId}; use stemedb_core::types::{QueryAudit, QueryId};
@ -8,12 +9,6 @@ use tracing::{debug, instrument};
use super::AuditStore; use super::AuditStore;
/// Key prefix for individual audit records.
pub(crate) const AUDIT_PREFIX: &[u8] = b"AUD:";
/// Key prefix for agent-based temporal index.
#[allow(dead_code)] // Documented for reference; actual key construction uses format!()
pub(crate) const AGENT_AUDIT_PREFIX: &[u8] = b"AUDA:";
/// AuditStore implementation backed by a generic KVStore. /// AuditStore implementation backed by a generic KVStore.
/// ///
/// This implementation maintains an agent index for efficient temporal queries. /// This implementation maintains an agent index for efficient temporal queries.
@ -29,9 +24,8 @@ impl<S: KVStore> GenericAuditStore<S> {
/// Construct the key for an individual audit record. /// Construct the key for an individual audit record.
pub(crate) fn audit_key(query_id: &QueryId) -> Vec<u8> { pub(crate) fn audit_key(query_id: &QueryId) -> Vec<u8> {
let mut key = AUDIT_PREFIX.to_vec(); let query_hex = hex::encode(query_id);
key.extend_from_slice(&hex::encode(query_id).into_bytes()); key_codec::audit_key(&query_hex)
key
} }
/// Construct the agent index key. /// Construct the agent index key.
@ -40,26 +34,16 @@ impl<S: KVStore> GenericAuditStore<S> {
timestamp: u64, timestamp: u64,
query_id: &QueryId, query_id: &QueryId,
) -> Vec<u8> { ) -> Vec<u8> {
// Format: AUDA:{agent_hex}:{timestamp_be}:{query_hex}
// Using big-endian timestamp for lexicographic ordering
let agent_hex = hex::encode(agent_id); let agent_hex = hex::encode(agent_id);
let timestamp_hex = format!("{:016x}", timestamp); // Zero-padded hex for sorting let timestamp_hex = format!("{:016x}", timestamp);
let query_hex = hex::encode(query_id); let query_hex = hex::encode(query_id);
format!("AUDA:{}:{}:{}", agent_hex, timestamp_hex, query_hex).into_bytes() key_codec::audit_agent_index_key(&agent_hex, &timestamp_hex, &query_hex)
}
/// Construct the prefix for scanning an agent's audits from a timestamp.
#[allow(dead_code)] // Reserved for future optimized range queries
pub(crate) fn agent_scan_prefix(agent_id: &[u8; 32], from_timestamp: u64) -> Vec<u8> {
let agent_hex = hex::encode(agent_id);
let timestamp_hex = format!("{:016x}", from_timestamp);
format!("AUDA:{}:{}", agent_hex, timestamp_hex).into_bytes()
} }
/// Construct the prefix for scanning all audits for an agent. /// Construct the prefix for scanning all audits for an agent.
pub(crate) fn agent_full_prefix(agent_id: &[u8; 32]) -> Vec<u8> { pub(crate) fn agent_full_prefix(agent_id: &[u8; 32]) -> Vec<u8> {
let agent_hex = hex::encode(agent_id); let agent_hex = hex::encode(agent_id);
format!("AUDA:{}:", agent_hex).into_bytes() key_codec::audit_agent_prefix(&agent_hex)
} }
/// Serialize an audit using the canonical serde helpers. /// Serialize an audit using the canonical serde helpers.
@ -74,9 +58,13 @@ impl<S: KVStore> GenericAuditStore<S> {
/// Extract query_id from an agent index key. /// Extract query_id from an agent index key.
pub(crate) fn extract_query_id_from_key(key: &[u8]) -> Option<QueryId> { pub(crate) fn extract_query_id_from_key(key: &[u8]) -> Option<QueryId> {
// Key format: AUDA:{agent_hex}:{timestamp_hex}:{query_hex} // Key format: \x00AUDA:{agent_hex}:{timestamp_hex}:{query_hex}
let key_str = std::str::from_utf8(key).ok()?; let key_str = std::str::from_utf8(key).ok()?;
let parts: Vec<&str> = key_str.split(':').collect();
// Skip the leading \x00 if present
let key_content = key_str.strip_prefix('\x00').unwrap_or(key_str);
let parts: Vec<&str> = key_content.split(':').collect();
if parts.len() != 4 { if parts.len() != 4 {
return None; return None;
} }
@ -92,8 +80,13 @@ impl<S: KVStore> GenericAuditStore<S> {
/// Extract timestamp from an agent index key. /// Extract timestamp from an agent index key.
pub(crate) fn extract_timestamp_from_key(key: &[u8]) -> Option<u64> { pub(crate) fn extract_timestamp_from_key(key: &[u8]) -> Option<u64> {
// Key format: \x00AUDA:{agent_hex}:{timestamp_hex}:{query_hex}
let key_str = std::str::from_utf8(key).ok()?; let key_str = std::str::from_utf8(key).ok()?;
let parts: Vec<&str> = key_str.split(':').collect();
// Skip the leading \x00 if present
let key_content = key_str.strip_prefix('\x00').unwrap_or(key_str);
let parts: Vec<&str> = key_content.split(':').collect();
if parts.len() != 4 { if parts.len() != 4 {
return None; return None;
} }
@ -202,7 +195,8 @@ impl<S: KVStore + 'static> AuditStore for GenericAuditStore<S> {
#[instrument(skip(self), fields(limit))] #[instrument(skip(self), fields(limit))]
async fn list_recent_audits(&self, limit: usize) -> Result<Vec<QueryAudit>> { async fn list_recent_audits(&self, limit: usize) -> Result<Vec<QueryAudit>> {
let entries = self.store.scan_prefix(AUDIT_PREFIX).await?; let prefix = key_codec::audit_scan_prefix();
let entries = self.store.scan_prefix(&prefix).await?;
let mut audits = Vec::with_capacity(entries.len().min(limit)); let mut audits = Vec::with_capacity(entries.len().min(limit));
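
The agent index key handling in this hunk can be sketched as follows. This is a minimal illustration, assuming `key_codec::audit_agent_index_key` produces the `\x00AUDA:{agent_hex}:{timestamp_hex}:{query_hex}` layout named in the comments; the real `key_codec` source is not shown here, and both function names below are hypothetical stand-ins.

```rust
/// Hypothetical stand-in for `key_codec::audit_agent_index_key`.
/// The layout is taken from the comments in this diff, not from key_codec itself.
fn agent_index_key(agent_hex: &str, timestamp: u64, query_hex: &str) -> Vec<u8> {
    // A zero-padded 16-digit hex timestamp makes byte order match numeric order.
    format!("\x00AUDA:{}:{:016x}:{}", agent_hex, timestamp, query_hex).into_bytes()
}

/// Strip an optional leading NUL, split on ':', and parse the third field as hex.
fn extract_timestamp(key: &[u8]) -> Option<u64> {
    let key_str = std::str::from_utf8(key).ok()?;
    // Keys written before the codec change may lack the NUL, so accept both forms.
    let content = key_str.strip_prefix('\x00').unwrap_or(key_str);
    let parts: Vec<&str> = content.split(':').collect();
    if parts.len() != 4 {
        return None;
    }
    u64::from_str_radix(parts[2], 16).ok()
}
```

Because hex-encoded agent and query IDs never contain `:`, the four-part split is unambiguous for this key family.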


@ -10,9 +10,9 @@ pub enum StorageError {
#[error("Storage IO error: {0}")] #[error("Storage IO error: {0}")]
Io(#[from] std::io::Error), Io(#[from] std::io::Error),
/// Error specific to the sled backend. /// Error from the underlying storage backend (fjall, redb, etc.).
#[error("Sled error: {0}")] #[error("Backend error: {0}")]
Sled(#[from] sled::Error), Backend(String),
/// Serialization/Deserialization error. /// Serialization/Deserialization error.
#[error("Serialization error: {0}")] #[error("Serialization error: {0}")]


@ -4,15 +4,13 @@
//! time-range queries. External systems can poll for pending escalations and //! time-range queries. External systems can poll for pending escalations and
//! resolve them after review. //! resolve them after review.
use crate::key_codec;
use crate::{KVStore, Result, StorageError}; use crate::{KVStore, Result, StorageError};
use async_trait::async_trait; use async_trait::async_trait;
use std::sync::Arc; use std::sync::Arc;
use stemedb_core::types::EscalationEvent; use stemedb_core::types::EscalationEvent;
use tracing::{debug, instrument}; use tracing::{debug, instrument};
/// Key prefix for escalation events.
const ESC_PREFIX: &[u8] = b"ESC:";
/// Storage trait for escalation events. /// Storage trait for escalation events.
/// ///
/// Provides operations for writing, reading, and resolving escalations triggered /// Provides operations for writing, reading, and resolving escalations triggered
@ -56,16 +54,13 @@ impl<S: KVStore> GenericEscalationStore<S> {
Self { store } Self { store }
} }
/// Construct the storage key for an escalation event.
///
/// Format: `ESC:{timestamp_nanos}:{id_hex}`
fn escalation_key(event: &EscalationEvent) -> Vec<u8> {
format!("ESC:{}:{}", event.timestamp, hex::encode(event.id)).into_bytes()
}
/// Parse a key into (timestamp, id). /// Parse a key into (timestamp, id).
///
/// Key format: `\x00ESC:{timestamp}:{id_hex}`
fn parse_key(key: &[u8]) -> Option<(u64, [u8; 32])> { fn parse_key(key: &[u8]) -> Option<(u64, [u8; 32])> {
let key_str = std::str::from_utf8(key).ok()?; let key_str = std::str::from_utf8(key).ok()?;
// Remove the leading \x00 if present
let key_str = key_str.strip_prefix('\x00').unwrap_or(key_str);
let parts: Vec<&str> = key_str.split(':').collect(); let parts: Vec<&str> = key_str.split(':').collect();
if parts.len() != 3 || parts[0] != "ESC" { if parts.len() != 3 || parts[0] != "ESC" {
return None; return None;
@ -88,7 +83,7 @@ impl<S: KVStore> GenericEscalationStore<S> {
impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> { impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
#[instrument(skip(self, event), fields(id = %hex::encode(event.id), subject = %event.subject, predicate = %event.predicate))] #[instrument(skip(self, event), fields(id = %hex::encode(event.id), subject = %event.subject, predicate = %event.predicate))]
async fn write_escalation(&self, event: &EscalationEvent) -> Result<()> { async fn write_escalation(&self, event: &EscalationEvent) -> Result<()> {
let key = Self::escalation_key(event); let key = key_codec::escalation_key(event.timestamp, &hex::encode(event.id));
let serialized = stemedb_core::serde::serialize(event) let serialized = stemedb_core::serde::serialize(event)
.map_err(|e| StorageError::Serialization(e.to_string()))?; .map_err(|e| StorageError::Serialization(e.to_string()))?;
@ -109,7 +104,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
#[instrument(skip(self))] #[instrument(skip(self))]
async fn get_escalations_since(&self, since: u64) -> Result<Vec<EscalationEvent>> { async fn get_escalations_since(&self, since: u64) -> Result<Vec<EscalationEvent>> {
// Scan all escalation keys and filter by timestamp // Scan all escalation keys and filter by timestamp
let entries = self.store.scan_prefix(ESC_PREFIX).await?; let entries = self.store.scan_prefix(&key_codec::escalation_scan_prefix()).await?;
let mut events = Vec::new(); let mut events = Vec::new();
for (key, data) in entries { for (key, data) in entries {
@ -138,7 +133,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
#[instrument(skip(self), fields(id = %hex::encode(id)))] #[instrument(skip(self), fields(id = %hex::encode(id)))]
async fn resolve_escalation(&self, id: &[u8; 32]) -> Result<bool> { async fn resolve_escalation(&self, id: &[u8; 32]) -> Result<bool> {
// Scan for the event with this ID // Scan for the event with this ID
let entries = self.store.scan_prefix(ESC_PREFIX).await?; let entries = self.store.scan_prefix(&key_codec::escalation_scan_prefix()).await?;
for (key, data) in entries { for (key, data) in entries {
if let Some((_timestamp, found_id)) = Self::parse_key(&key) { if let Some((_timestamp, found_id)) = Self::parse_key(&key) {
@ -176,7 +171,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
#[instrument(skip(self))] #[instrument(skip(self))]
async fn get_pending_escalations(&self) -> Result<Vec<EscalationEvent>> { async fn get_pending_escalations(&self) -> Result<Vec<EscalationEvent>> {
let entries = self.store.scan_prefix(ESC_PREFIX).await?; let entries = self.store.scan_prefix(&key_codec::escalation_scan_prefix()).await?;
let mut events = Vec::new(); let mut events = Vec::new();
for (_key, data) in entries { for (_key, data) in entries {
@ -199,7 +194,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
#[instrument(skip(self), fields(id = %hex::encode(id)))] #[instrument(skip(self), fields(id = %hex::encode(id)))]
async fn get_escalation(&self, id: &[u8; 32]) -> Result<Option<EscalationEvent>> { async fn get_escalation(&self, id: &[u8; 32]) -> Result<Option<EscalationEvent>> {
let entries = self.store.scan_prefix(ESC_PREFIX).await?; let entries = self.store.scan_prefix(&key_codec::escalation_scan_prefix()).await?;
for (key, data) in entries { for (key, data) in entries {
if let Some((_timestamp, found_id)) = Self::parse_key(&key) { if let Some((_timestamp, found_id)) = Self::parse_key(&key) {
@ -218,7 +213,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
use crate::SledStore; use crate::HybridStore;
use stemedb_core::types::{EscalationEvent, EscalationLevel}; use stemedb_core::types::{EscalationEvent, EscalationLevel};
fn create_event( fn create_event(
@ -243,7 +238,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_write_and_get_escalation() { async fn test_write_and_get_escalation() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let esc_store = GenericEscalationStore::new(store); let esc_store = GenericEscalationStore::new(store);
let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High); let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
@ -256,7 +251,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_get_escalations_since() { async fn test_get_escalations_since() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let esc_store = GenericEscalationStore::new(store); let esc_store = GenericEscalationStore::new(store);
let e1 = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High); let e1 = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
@ -277,7 +272,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_resolve_escalation() { async fn test_resolve_escalation() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let esc_store = GenericEscalationStore::new(store); let esc_store = GenericEscalationStore::new(store);
let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High); let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
@ -303,7 +298,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_get_pending_escalations() { async fn test_get_pending_escalations() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let esc_store = GenericEscalationStore::new(store); let esc_store = GenericEscalationStore::new(store);
let e1 = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High); let e1 = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
@ -327,7 +322,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_resolve_nonexistent_escalation() { async fn test_resolve_nonexistent_escalation() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let esc_store = GenericEscalationStore::new(store); let esc_store = GenericEscalationStore::new(store);
let nonexistent_id = [42u8; 32]; let nonexistent_id = [42u8; 32];
@ -338,10 +333,10 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_parse_key() { async fn test_parse_key() {
let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High); let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
let key = GenericEscalationStore::<SledStore>::escalation_key(&event); let key = key_codec::escalation_key(event.timestamp, &hex::encode(event.id));
let (timestamp, id) = let (timestamp, id) =
GenericEscalationStore::<SledStore>::parse_key(&key).expect("parse should succeed"); GenericEscalationStore::<HybridStore>::parse_key(&key).expect("parse should succeed");
assert_eq!(timestamp, 1000); assert_eq!(timestamp, 1000);
assert_eq!(id, event.id); assert_eq!(id, event.id);
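
The removed `escalation_key` helper wrote the timestamp as plain decimal. Assuming `key_codec::escalation_key` keeps that format behind the new `\x00` marker (this diff does not show it), keys do not sort by time, which fits the scan-and-filter approach in `get_escalations_since`. The sketch below uses hypothetical stand-in functions, keeps the ID as a hex string for brevity, and assumes an inclusive `since` bound.

```rust
/// Hypothetical stand-in for `key_codec::escalation_key` (decimal timestamp assumed).
fn escalation_key(timestamp: u64, id_hex: &str) -> Vec<u8> {
    format!("\x00ESC:{}:{}", timestamp, id_hex).into_bytes()
}

/// Same shape as `parse_key` in this file, returning the ID as a hex string.
fn parse_key(key: &[u8]) -> Option<(u64, String)> {
    let s = std::str::from_utf8(key).ok()?;
    let s = s.strip_prefix('\x00').unwrap_or(s);
    let parts: Vec<&str> = s.split(':').collect();
    if parts.len() != 3 || parts[0] != "ESC" {
        return None;
    }
    let timestamp = parts[1].parse::<u64>().ok()?;
    Some((timestamp, parts[2].to_string()))
}

/// Filter after parsing instead of relying on key order (inclusive bound is an assumption).
fn timestamps_since(keys: &[Vec<u8>], since: u64) -> Vec<u64> {
    keys.iter()
        .filter_map(|k| parse_key(k))
        .map(|(timestamp, _)| timestamp)
        .filter(|timestamp| *timestamp >= since)
        .collect()
}
```

With unpadded decimal, the key for timestamp 1000 sorts before the key for 999, so a range scan over these keys would need a fixed-width encoding first.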


@ -0,0 +1,213 @@
use crate::error::{Result, StorageError};
use crate::traits::KVStore;
use async_trait::async_trait;
use dashmap::DashMap;
use std::path::Path;
use std::sync::Arc;
use tracing::instrument;
fn fjall_err(e: fjall::Error) -> StorageError {
StorageError::Backend(e.to_string())
}
/// Fjall (LSM-tree) implementation of the KVStore trait.
///
/// Used for write-heavy key prefixes: assertions (`H:`), votes (`V:`, `VC:`, `VW:`),
/// epochs (`E:`), supersession markers (`SUPERSEDED:`), and ingestion cursors (`__CURSOR__:`).
pub struct FjallStore {
keyspace: fjall::Keyspace,
partition: fjall::PartitionHandle,
atomic_locks: Arc<DashMap<Vec<u8>, Arc<tokio::sync::Mutex<()>>>>,
_temp_dir: Option<tempfile::TempDir>,
}
impl std::fmt::Debug for FjallStore {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("FjallStore").finish()
}
}
impl FjallStore {
/// Open or create a Fjall database at the given path.
#[instrument(skip_all)]
pub fn open(path: impl AsRef<Path>) -> Result<Self> {
let keyspace = fjall::Config::new(path.as_ref()).open().map_err(fjall_err)?;
let partition = keyspace
.open_partition("default", fjall::PartitionCreateOptions::default())
.map_err(fjall_err)?;
Ok(Self { keyspace, partition, atomic_locks: Arc::new(DashMap::new()), _temp_dir: None })
}
/// Open a temporary Fjall database for testing.
///
/// The database will be automatically deleted when the returned store is dropped.
pub fn open_temp() -> Result<Self> {
let temp_dir = tempfile::tempdir().map_err(StorageError::Io)?;
let keyspace = fjall::Config::new(temp_dir.path()).open().map_err(fjall_err)?;
let partition = keyspace
.open_partition("default", fjall::PartitionCreateOptions::default())
.map_err(fjall_err)?;
Ok(Self {
keyspace,
partition,
atomic_locks: Arc::new(DashMap::new()),
_temp_dir: Some(temp_dir),
})
}
}
#[async_trait]
impl KVStore for FjallStore {
#[instrument(skip_all, fields(key_len = key.len()))]
async fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> {
let result = self.partition.get(key).map_err(fjall_err)?;
Ok(result.map(|slice| slice.to_vec()))
}
#[instrument(skip_all, fields(key_len = key.len(), value_len = value.len()))]
async fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
self.partition.insert(key, value).map_err(fjall_err)?;
Ok(())
}
#[instrument(skip_all, fields(key_len = key.len()))]
async fn delete(&self, key: &[u8]) -> Result<()> {
self.partition.remove(key).map_err(fjall_err)?;
Ok(())
}
#[instrument(skip_all, fields(prefix_len = prefix.len()))]
async fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
let mut results = Vec::new();
for item in self.partition.prefix(prefix) {
let (k, v) = item.map_err(fjall_err)?;
results.push((k.to_vec(), v.to_vec()));
}
Ok(results)
}
#[instrument(skip_all)]
async fn flush(&self) -> Result<()> {
self.keyspace.persist(fjall::PersistMode::SyncAll).map_err(fjall_err)?;
Ok(())
}
#[instrument(skip_all, fields(key_len = key.len(), delta))]
async fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64> {
let lock = self
.atomic_locks
.entry(key.to_vec())
.or_insert_with(|| Arc::new(tokio::sync::Mutex::new(())))
.clone();
let _guard = lock.lock().await;
let current = match self.partition.get(key).map_err(fjall_err)? {
Some(bytes) => {
let arr: [u8; 8] = bytes.as_ref().try_into().map_err(|_| {
StorageError::Serialization(format!(
"Corrupted u64 counter: expected 8 bytes, got {}",
bytes.len()
))
})?;
u64::from_le_bytes(arr)
}
None => 0,
};
let new_val = current.saturating_add(delta);
self.partition.insert(key, new_val.to_le_bytes()).map_err(fjall_err)?;
Ok(new_val)
}
#[instrument(skip_all, fields(key_len = key.len()))]
async fn compare_and_swap_f32<F>(&self, key: &[u8], update_fn: F) -> Result<f32>
where
F: Fn(f32) -> f32 + Send + Sync,
{
let lock = self
.atomic_locks
.entry(key.to_vec())
.or_insert_with(|| Arc::new(tokio::sync::Mutex::new(())))
.clone();
let _guard = lock.lock().await;
let current = match self.partition.get(key).map_err(fjall_err)? {
Some(bytes) => {
let arr: [u8; 4] = bytes.as_ref().try_into().map_err(|_| {
StorageError::Serialization(format!(
"Corrupted f32 value: expected 4 bytes, got {}",
bytes.len()
))
})?;
f32::from_le_bytes(arr)
}
None => 0.0,
};
let new_val = update_fn(current);
self.partition.insert(key, new_val.to_le_bytes()).map_err(fjall_err)?;
Ok(new_val)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn test_fjall_store_roundtrip() {
let store = FjallStore::open_temp().expect("Failed to create temp DB");
let key = b"test_key";
let value = b"test_value";
store.put(key, value).await.expect("Put failed");
let retrieved = store.get(key).await.expect("Get failed");
assert_eq!(retrieved, Some(value.to_vec()));
store.delete(key).await.expect("Delete failed");
let deleted = store.get(key).await.expect("Get failed");
assert_eq!(deleted, None);
}
#[tokio::test]
async fn test_fjall_scan_prefix() {
let store = FjallStore::open_temp().expect("Failed to create temp DB");
store.put(b"prefix:1", b"val1").await.unwrap();
store.put(b"prefix:2", b"val2").await.unwrap();
store.put(b"other:3", b"val3").await.unwrap();
let results = store.scan_prefix(b"prefix:").await.unwrap();
assert_eq!(results.len(), 2);
assert_eq!(results[0], (b"prefix:1".to_vec(), b"val1".to_vec()));
assert_eq!(results[1], (b"prefix:2".to_vec(), b"val2".to_vec()));
}
#[tokio::test]
async fn test_fjall_fetch_and_add() {
let store = FjallStore::open_temp().expect("Failed to create temp DB");
let key = b"counter";
let val = store.fetch_and_add_u64(key, 5).await.unwrap();
assert_eq!(val, 5);
let val = store.fetch_and_add_u64(key, 3).await.unwrap();
assert_eq!(val, 8);
}
#[tokio::test]
async fn test_fjall_compare_and_swap_f32() {
let store = FjallStore::open_temp().expect("Failed to create temp DB");
let key = b"weight";
let val = store.compare_and_swap_f32(key, |current| current + 1.5).await.unwrap();
assert!((val - 1.5).abs() < f32::EPSILON);
let val = store.compare_and_swap_f32(key, |current| current + 2.0).await.unwrap();
assert!((val - 3.5).abs() < f32::EPSILON);
}
#[tokio::test]
async fn test_fjall_flush() {
let store = FjallStore::open_temp().expect("Failed to create temp DB");
store.put(b"key", b"value").await.unwrap();
store.flush().await.expect("Flush should succeed");
}
}
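
The per-key locking used by `fetch_and_add_u64` can be sketched with the standard library alone. This is an illustration of the pattern, not the fjall API: `CounterStore` is a hypothetical in-memory stand-in, and `std::sync::Mutex` with `HashMap` replaces the `DashMap` and `tokio::sync::Mutex` used in the real code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical in-memory store; stands in for the fjall partition.
#[derive(Default)]
struct CounterStore {
    data: Mutex<HashMap<Vec<u8>, Vec<u8>>>,
    locks: Mutex<HashMap<Vec<u8>, Arc<Mutex<()>>>>,
}

impl CounterStore {
    fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64, String> {
        // Get or create the lock for this key, releasing the lock table right away.
        let lock = {
            let mut locks = self.locks.lock().unwrap();
            locks.entry(key.to_vec()).or_default().clone()
        };
        // Hold the per-key lock across the whole read-modify-write.
        let _guard = lock.lock().unwrap();

        let current = match self.data.lock().unwrap().get(key) {
            Some(bytes) => {
                if bytes.len() != 8 {
                    return Err(format!("expected 8 bytes, got {}", bytes.len()));
                }
                let mut arr = [0u8; 8];
                arr.copy_from_slice(bytes);
                u64::from_le_bytes(arr)
            }
            None => 0,
        };
        // Saturate instead of wrapping, as the diff does.
        let new_val = current.saturating_add(delta);
        self.data
            .lock()
            .unwrap()
            .insert(key.to_vec(), new_val.to_le_bytes().to_vec());
        Ok(new_val)
    }
}
```

Writers to different keys only contend briefly on the lock table, while writers to the same key are serialized. In this sketch, as in the diff, lock entries are never removed, so the table grows with the number of distinct counter keys.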


@ -1,17 +1,14 @@
//! Storage for gold standard assertions. //! Storage for gold standard assertions.
//! //!
//! Gold standards are stored at `GS:{subject}:{predicate}` to enable efficient //! Gold standards are stored at `{subject}\x00GS:{predicate}` with a secondary
//! lookups when verifying agent submissions against known truths. //! index at `\x00GS_LIST:{subject}:{predicate}` for listing all gold standards.
use crate::{KVStore, Result, StorageError}; use crate::{key_codec, KVStore, Result, StorageError};
use async_trait::async_trait; use async_trait::async_trait;
use std::sync::Arc; use std::sync::Arc;
use stemedb_core::types::GoldStandard; use stemedb_core::types::GoldStandard;
use tracing::{debug, instrument}; use tracing::{debug, instrument};
/// Key prefix for gold standard entries.
const GS_PREFIX: &[u8] = b"GS:";
/// Storage trait for gold standard operations. /// Storage trait for gold standard operations.
/// ///
/// Provides operations for creating, reading, listing, and removing gold standards /// Provides operations for creating, reading, listing, and removing gold standards
@ -71,25 +68,23 @@ impl<S: KVStore> GenericGoldStandardStore<S> {
pub fn new(store: Arc<S>) -> Self { pub fn new(store: Arc<S>) -> Self {
Self { store } Self { store }
} }
/// Construct the storage key for a gold standard.
///
/// Format: `GS:{subject}:{predicate}`
fn gold_standard_key(subject: &str, predicate: &str) -> Vec<u8> {
format!("GS:{}:{}", subject, predicate).into_bytes()
}
} }
#[async_trait] #[async_trait]
impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> { impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {
#[instrument(skip(self, gs), fields(subject = %gs.subject, predicate = %gs.predicate))] #[instrument(skip(self, gs), fields(subject = %gs.subject, predicate = %gs.predicate))]
async fn set_gold_standard(&self, gs: &GoldStandard) -> Result<()> { async fn set_gold_standard(&self, gs: &GoldStandard) -> Result<()> {
let key = Self::gold_standard_key(&gs.subject, &gs.predicate); let key = key_codec::gold_standard_key(&gs.subject, &gs.predicate);
let list_key = key_codec::gs_list_key(&gs.subject, &gs.predicate);
let serialized = stemedb_core::serde::serialize(gs) let serialized = stemedb_core::serde::serialize(gs)
.map_err(|e| StorageError::Serialization(e.to_string()))?; .map_err(|e| StorageError::Serialization(e.to_string()))?;
// Write primary key
self.store.put(&key, &serialized).await?; self.store.put(&key, &serialized).await?;
// Write secondary index for listing (empty value, just presence matters)
self.store.put(&list_key, &[]).await?;
debug!( debug!(
subject = %gs.subject, subject = %gs.subject,
predicate = %gs.predicate, predicate = %gs.predicate,
@ -106,7 +101,7 @@ impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {
subject: &str, subject: &str,
predicate: &str, predicate: &str,
) -> Result<Option<GoldStandard>> { ) -> Result<Option<GoldStandard>> {
let key = Self::gold_standard_key(subject, predicate); let key = key_codec::gold_standard_key(subject, predicate);
match self.store.get(&key).await? { match self.store.get(&key).await? {
Some(data) => { Some(data) => {
@ -135,14 +130,31 @@ impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {
#[instrument(skip(self))] #[instrument(skip(self))]
async fn list_gold_standards(&self) -> Result<Vec<GoldStandard>> { async fn list_gold_standards(&self) -> Result<Vec<GoldStandard>> {
let entries = self.store.scan_prefix(GS_PREFIX).await?; // Scan the GS_LIST secondary index
let list_entries = self.store.scan_prefix(&key_codec::gs_list_scan_prefix()).await?;
let mut gold_standards = Vec::new(); let mut gold_standards = Vec::new();
for (_key, data) in entries { for (list_key, _) in list_entries {
// Extract subject and predicate from GS_LIST key: \x00GS_LIST:{subject}:{predicate}
let tag = key_codec::extract_tag(&list_key);
if let Some(suffix) = tag.strip_prefix(b"GS_LIST:") {
if let Ok(suffix_str) = std::str::from_utf8(suffix) {
// Split by first colon to get subject and predicate
if let Some(colon_pos) = suffix_str.find(':') {
let subject = &suffix_str[..colon_pos];
let predicate = &suffix_str[colon_pos + 1..];
// Fetch the actual gold standard from the primary key
let key = key_codec::gold_standard_key(subject, predicate);
if let Some(data) = self.store.get(&key).await? {
match stemedb_core::serde::deserialize::<GoldStandard>(&data) { match stemedb_core::serde::deserialize::<GoldStandard>(&data) {
Ok(gs) => gold_standards.push(gs), Ok(gs) => gold_standards.push(gs),
Err(e) => { Err(e) => {
debug!(error = %e, "Skipping malformed gold standard"); debug!(error = %e, subject = %subject, predicate = %predicate, "Skipping malformed gold standard");
}
}
}
}
} }
} }
} }
@ -158,13 +170,16 @@ impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {
#[instrument(skip(self), fields(subject = %subject, predicate = %predicate))] #[instrument(skip(self), fields(subject = %subject, predicate = %predicate))]
async fn remove_gold_standard(&self, subject: &str, predicate: &str) -> Result<bool> { async fn remove_gold_standard(&self, subject: &str, predicate: &str) -> Result<bool> {
let key = Self::gold_standard_key(subject, predicate); let key = key_codec::gold_standard_key(subject, predicate);
let list_key = key_codec::gs_list_key(subject, predicate);
// Check if it exists first // Check if it exists first
let exists = self.store.get(&key).await?.is_some(); let exists = self.store.get(&key).await?.is_some();
if exists { if exists {
// Delete both primary key and secondary index
self.store.delete(&key).await?; self.store.delete(&key).await?;
self.store.delete(&list_key).await?;
debug!( debug!(
subject = %subject, subject = %subject,
predicate = %predicate, predicate = %predicate,
@ -185,7 +200,7 @@ impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
use crate::SledStore; use crate::HybridStore;
use stemedb_core::types::GoldStandard; use stemedb_core::types::GoldStandard;
fn create_gold_standard(subject: &str, predicate: &str, expected_object: &str) -> GoldStandard { fn create_gold_standard(subject: &str, predicate: &str, expected_object: &str) -> GoldStandard {
@ -201,7 +216,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_set_and_get_gold_standard() { async fn test_set_and_get_gold_standard() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let gs_store = GenericGoldStandardStore::new(store); let gs_store = GenericGoldStandardStore::new(store);
let gs = create_gold_standard("Earth", "has_shape", "oblate_spheroid"); let gs = create_gold_standard("Earth", "has_shape", "oblate_spheroid");
@ -218,7 +233,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_get_nonexistent_gold_standard() { async fn test_get_nonexistent_gold_standard() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let gs_store = GenericGoldStandardStore::new(store); let gs_store = GenericGoldStandardStore::new(store);
let result = gs_store.get_gold_standard("NonExistent", "predicate").await.expect("get"); let result = gs_store.get_gold_standard("NonExistent", "predicate").await.expect("get");
@ -228,7 +243,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_list_gold_standards() { async fn test_list_gold_standards() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let gs_store = GenericGoldStandardStore::new(store); let gs_store = GenericGoldStandardStore::new(store);
let gs1 = create_gold_standard("Earth", "has_shape", "oblate_spheroid"); let gs1 = create_gold_standard("Earth", "has_shape", "oblate_spheroid");
@ -253,7 +268,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_remove_gold_standard() { async fn test_remove_gold_standard() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let gs_store = GenericGoldStandardStore::new(store); let gs_store = GenericGoldStandardStore::new(store);
let gs = create_gold_standard("Earth", "has_shape", "oblate_spheroid"); let gs = create_gold_standard("Earth", "has_shape", "oblate_spheroid");
@ -274,7 +289,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_remove_nonexistent_gold_standard() { async fn test_remove_nonexistent_gold_standard() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let gs_store = GenericGoldStandardStore::new(store); let gs_store = GenericGoldStandardStore::new(store);
let removed = let removed =
@ -284,7 +299,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_overwrite_gold_standard() { async fn test_overwrite_gold_standard() {
let store = Arc::new(SledStore::open_temp().expect("store")); let store = Arc::new(HybridStore::open_temp().expect("store"));
let gs_store = GenericGoldStandardStore::new(store); let gs_store = GenericGoldStandardStore::new(store);
let gs1 = create_gold_standard("Earth", "has_shape", "sphere"); let gs1 = create_gold_standard("Earth", "has_shape", "sphere");
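
`list_gold_standards` recovers the subject and predicate by splitting the `GS_LIST` suffix at the first colon. The sketch below reproduces only that split, with hypothetical helper names, and assumes `key_codec::gs_list_key` joins the two parts with a bare `:` and no escaping (the diff does not show whether it escapes).

```rust
/// Hypothetical stand-in for the tag suffix written by `key_codec::gs_list_key`.
fn gs_list_suffix(subject: &str, predicate: &str) -> String {
    format!("{}:{}", subject, predicate)
}

/// Same split as `list_gold_standards`: text before the first ':' is the subject.
fn split_suffix(suffix: &str) -> Option<(&str, &str)> {
    let colon_pos = suffix.find(':')?;
    Some((&suffix[..colon_pos], &suffix[colon_pos + 1..]))
}
```

Under that assumption, a subject that itself contains a colon (for example `urn:earth`) would come back as subject `urn` and predicate `earth:has_shape`, so the primary-key lookup would miss and the entry would be left out of the listing.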


@ -0,0 +1,352 @@
use crate::error::{Result, StorageError};
use crate::fjall_backend::FjallStore;
use crate::key_codec;
use crate::redb_backend::RedbStore;
use crate::traits::KVStore;
use async_trait::async_trait;
use std::path::Path;
use tracing::instrument;
/// Which backend handles a given key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
/// Fjall (LSM) — optimized for write-heavy workloads.
Fjall,
/// Redb (B-tree) — optimized for read-heavy workloads.
Redb,
}
/// Hybrid storage backend that routes keys to fjall (write-heavy) or redb (read-heavy).
///
/// Keys follow the `key_codec` format:
/// - Subject-prefixed: `{subject}\x00{TAG}:{suffix}`
/// - Global: `\x00{TAG}:{suffix}`
///
/// Routing extracts the TAG and dispatches:
/// - **Fjall**: `H:` (assertions), `V:` (votes), `VC:` (vote counts), `VW:` (vote weights),
/// `E:` (epochs), `SUPERSEDED:`, `META:` (cursors, counters)
/// - **Redb**: `S:` (subject index), `SP:` (compound index), `MV:` (materialized views),
/// `TRUST:` (trust ranks), `AUD:` (audits), `QUOTA:` (quotas), `TP:` (trust packs),
/// `GS:` (gold standards), `ESC:` (escalations), and everything else
pub struct HybridStore {
fjall: FjallStore,
redb: RedbStore,
_temp_dir: Option<tempfile::TempDir>,
}
impl std::fmt::Debug for HybridStore {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("HybridStore").finish()
}
}
/// Route a key to the appropriate backend based on its tag.
///
/// Uses `key_codec::extract_tag` to parse the tag portion from keys in
/// `{subject}\x00{TAG}:{suffix}` or `\x00{TAG}:{suffix}` format.
fn route(key: &[u8]) -> Backend {
let tag = key_codec::extract_tag(key);
if tag.starts_with(b"H:")
|| tag.starts_with(b"V:")
|| tag.starts_with(b"VC:")
|| tag.starts_with(b"VW:")
|| tag.starts_with(b"E:")
|| tag.starts_with(b"SUPERSEDED:")
|| tag.starts_with(b"META:")
{
Backend::Fjall
} else {
Backend::Redb
}
}
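/// Check if a prefix is ambiguous, i.e. it could match keys in both backends.
///
/// This happens when the prefix ends at the separator with no tag after it:
/// a subject-only scan (`{subject}\x00`) or the bare global prefix (`\x00`).
/// Either can have keys in both fjall (assertions, votes, epochs) and redb
/// (indexes, views, trust ranks).
///
/// Not detected: prefixes that stop before the separator (a partial subject) or
/// partway through a tag (e.g. `{subject}\x00V`). Those fall through to `route`,
/// which sends anything whose tag portion does not start with a complete fjall
/// tag to redb only.
fn is_cross_backend_prefix(prefix: &[u8]) -> bool {
// An empty prefix is treated as unambiguous; `route` then sends it to redb.
if prefix.is_empty() {
return false;
}
let tag = key_codec::extract_tag(prefix);
// An empty tag means the prefix ends at the separator, so it doesn't specify a backend.
tag.is_empty()
}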
impl HybridStore {
/// Open or create a HybridStore at the given path.
///
/// Creates `fjall/` and `redb/` subdirectories under the given path.
#[instrument(skip_all)]
pub fn open(path: impl AsRef<Path>) -> Result<Self> {
let base = path.as_ref();
let fjall_path = base.join("fjall");
let redb_path = base.join("redb");
std::fs::create_dir_all(&fjall_path).map_err(StorageError::Io)?;
std::fs::create_dir_all(&redb_path).map_err(StorageError::Io)?;
let fjall = FjallStore::open(&fjall_path)?;
let redb = RedbStore::open(redb_path.join("data.redb"))?;
Ok(Self { fjall, redb, _temp_dir: None })
}
/// Open a temporary HybridStore for testing.
///
/// Both backends share one temp directory with `fjall/` and `redb/` subdirectories.
pub fn open_temp() -> Result<Self> {
let temp_dir = tempfile::tempdir().map_err(StorageError::Io)?;
let redb_dir = temp_dir.path().join("redb");
std::fs::create_dir_all(&redb_dir).map_err(StorageError::Io)?;
let fjall = FjallStore::open(temp_dir.path().join("fjall"))?;
let redb = RedbStore::open(redb_dir.join("data.redb"))?;
Ok(Self { fjall, redb, _temp_dir: Some(temp_dir) })
}
}
#[async_trait]
impl KVStore for HybridStore {
#[instrument(skip_all, fields(key_len = key.len()))]
async fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> {
match route(key) {
Backend::Fjall => self.fjall.get(key).await,
Backend::Redb => self.redb.get(key).await,
}
}
#[instrument(skip_all, fields(key_len = key.len(), value_len = value.len()))]
async fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
match route(key) {
Backend::Fjall => self.fjall.put(key, value).await,
Backend::Redb => self.redb.put(key, value).await,
}
}
#[instrument(skip_all, fields(key_len = key.len()))]
async fn delete(&self, key: &[u8]) -> Result<()> {
match route(key) {
Backend::Fjall => self.fjall.delete(key).await,
Backend::Redb => self.redb.delete(key).await,
}
}
#[instrument(skip_all, fields(prefix_len = prefix.len()))]
async fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
if is_cross_backend_prefix(prefix) {
// Subject-only prefix — scan both backends and merge
let mut results = self.fjall.scan_prefix(prefix).await?;
results.extend(self.redb.scan_prefix(prefix).await?);
results.sort_by(|a, b| a.0.cmp(&b.0));
return Ok(results);
}
match route(prefix) {
Backend::Fjall => self.fjall.scan_prefix(prefix).await,
Backend::Redb => self.redb.scan_prefix(prefix).await,
}
}
#[instrument(skip_all)]
async fn flush(&self) -> Result<()> {
// Flush fjall first (write-heavy, most critical for durability),
// then redb (always durable after commit, so this is a no-op).
self.fjall.flush().await?;
self.redb.flush().await?;
Ok(())
}
#[instrument(skip_all, fields(key_len = key.len(), delta))]
async fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64> {
match route(key) {
Backend::Fjall => self.fjall.fetch_and_add_u64(key, delta).await,
Backend::Redb => self.redb.fetch_and_add_u64(key, delta).await,
}
}
#[instrument(skip_all, fields(key_len = key.len()))]
async fn compare_and_swap_f32<F>(&self, key: &[u8], update_fn: F) -> Result<f32>
where
F: Fn(f32) -> f32 + Send + Sync,
{
match route(key) {
Backend::Fjall => self.fjall.compare_and_swap_f32(key, update_fn).await,
Backend::Redb => self.redb.compare_and_swap_f32(key, update_fn).await,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::key_codec;
// ── Basic KVStore contract tests ──
#[tokio::test]
async fn test_hybrid_store_roundtrip() {
let store = HybridStore::open_temp().expect("Failed to create temp DB");
let key = b"test_key";
let value = b"test_value";
store.put(key, value).await.expect("Put failed");
let retrieved = store.get(key).await.expect("Get failed");
assert_eq!(retrieved, Some(value.to_vec()));
store.delete(key).await.expect("Delete failed");
let deleted = store.get(key).await.expect("Get failed");
assert_eq!(deleted, None);
}
#[tokio::test]
async fn test_hybrid_scan_prefix() {
let store = HybridStore::open_temp().expect("Failed to create temp DB");
let k1 = key_codec::subject_index_key("subject1");
let k2 = key_codec::subject_predicate_key("subject1", "pred");
let k3 = key_codec::subject_index_key("subject2");
store.put(&k1, b"val1").await.unwrap();
store.put(&k2, b"val2").await.unwrap();
store.put(&k3, b"val3").await.unwrap();
let prefix = key_codec::subject_scan_prefix("subject1");
let results = store.scan_prefix(&prefix).await.unwrap();
assert_eq!(results.len(), 2);
}
#[tokio::test]
async fn test_hybrid_fetch_and_add() {
let store = HybridStore::open_temp().expect("Failed to create temp DB");
// Vote count (fjall path — subject-prefixed)
let vc_key = key_codec::vote_count_key("Tesla", "abc123");
let val = store.fetch_and_add_u64(&vc_key, 5).await.unwrap();
assert_eq!(val, 5);
let val = store.fetch_and_add_u64(&vc_key, 3).await.unwrap();
assert_eq!(val, 8);
// Quota counter (redb path — global)
let qt_key = key_codec::quota_key("agent1", 1000);
let val = store.fetch_and_add_u64(&qt_key, 10).await.unwrap();
assert_eq!(val, 10);
}
#[tokio::test]
async fn test_hybrid_compare_and_swap_f32() {
let store = HybridStore::open_temp().expect("Failed to create temp DB");
// Vote weight (fjall path — subject-prefixed)
let vw_key = key_codec::vote_weight_key("Tesla", "abc123");
let val = store.compare_and_swap_f32(&vw_key, |c| c + 1.5).await.unwrap();
assert!((val - 1.5).abs() < f32::EPSILON);
// Trust rank (redb path — global)
let tr_key = key_codec::trust_rank_key("agent1");
let val = store.compare_and_swap_f32(&tr_key, |c| c + 0.8).await.unwrap();
assert!((val - 0.8).abs() < f32::EPSILON);
}
#[tokio::test]
async fn test_hybrid_flush() {
let store = HybridStore::open_temp().expect("Failed to create temp DB");
let h_key = key_codec::assertion_key("Tesla", "hash1");
let s_key = key_codec::subject_index_key("Tesla");
store.put(&h_key, b"assertion_data").await.unwrap();
store.put(&s_key, b"index_data").await.unwrap();
store.flush().await.expect("Flush should succeed");
}
// ── Routing tests with key_codec keys ──
#[test]
fn test_routing_fjall_subject_prefixed() {
// Subject-prefixed write-heavy keys → Fjall
assert_eq!(route(&key_codec::assertion_key("Tesla", "abc")), Backend::Fjall);
assert_eq!(route(&key_codec::vote_key("Tesla", "abc", "def")), Backend::Fjall);
assert_eq!(route(&key_codec::vote_count_key("Tesla", "abc")), Backend::Fjall);
assert_eq!(route(&key_codec::vote_weight_key("Tesla", "abc")), Backend::Fjall);
}
#[test]
fn test_routing_fjall_global() {
// Global write-heavy keys → Fjall
assert_eq!(route(&key_codec::epoch_key("deadbeef")), Backend::Fjall);
assert_eq!(route(&key_codec::superseded_key("deadbeef")), Backend::Fjall);
assert_eq!(route(&key_codec::cursor_key()), Backend::Fjall);
assert_eq!(route(&key_codec::assertion_count_key()), Backend::Fjall);
}
#[test]
fn test_routing_redb_subject_prefixed() {
// Subject-prefixed read-heavy keys → Redb
assert_eq!(route(&key_codec::subject_index_key("Tesla")), Backend::Redb);
assert_eq!(route(&key_codec::subject_predicate_key("Tesla", "rev")), Backend::Redb);
assert_eq!(route(&key_codec::mv_key("Tesla", "revenue")), Backend::Redb);
assert_eq!(route(&key_codec::gold_standard_key("Earth", "shape")), Backend::Redb);
}
#[test]
fn test_routing_redb_global() {
// Global read-heavy keys → Redb
assert_eq!(route(&key_codec::trust_rank_key("agent1")), Backend::Redb);
assert_eq!(route(&key_codec::quota_key("agent1", 1000)), Backend::Redb);
assert_eq!(route(&key_codec::audit_key("query1")), Backend::Redb);
assert_eq!(route(&key_codec::escalation_key(1000, "hash1")), Backend::Redb);
assert_eq!(route(&key_codec::trust_pack_key(&[1u8; 32])), Backend::Redb);
}
#[test]
fn test_routing_default_to_redb() {
assert_eq!(route(b"unknown:key"), Backend::Redb);
assert_eq!(route(b""), Backend::Redb);
}
#[tokio::test]
async fn test_cross_backend_isolation() {
let store = HybridStore::open_temp().expect("Failed to create temp DB");
// Write to fjall (assertion — subject-prefixed)
let h_key = key_codec::assertion_key("Tesla", "hash1");
store.put(&h_key, b"assertion").await.unwrap();
// Write to redb (index — subject-prefixed)
let s_key = key_codec::subject_index_key("Tesla");
store.put(&s_key, b"index").await.unwrap();
// Both should be retrievable
assert_eq!(store.get(&h_key).await.unwrap(), Some(b"assertion".to_vec()));
assert_eq!(store.get(&s_key).await.unwrap(), Some(b"index".to_vec()));
// Delete from one backend shouldn't affect the other
store.delete(&h_key).await.unwrap();
assert_eq!(store.get(&h_key).await.unwrap(), None);
assert_eq!(store.get(&s_key).await.unwrap(), Some(b"index".to_vec()));
}
#[tokio::test]
async fn test_prefix_scan_within_backend() {
let store = HybridStore::open_temp().expect("Failed to create temp DB");
// Write assertion hashes (fjall — subject-prefixed)
let h1 = key_codec::assertion_key("Earth", "aaa");
let h2 = key_codec::assertion_key("Earth", "bbb");
store.put(&h1, b"val1").await.unwrap();
store.put(&h2, b"val2").await.unwrap();
// Write trust ranks (redb, global)
let tr1 = key_codec::trust_rank_key("agent_a");
let tr2 = key_codec::trust_rank_key("agent_b");
store.put(&tr1, b"rank1").await.unwrap();
store.put(&tr2, b"rank2").await.unwrap();
// Scan by subject prefix: this is a cross-backend scan, but only fjall holds keys for "Earth"
let earth_prefix = key_codec::subject_scan_prefix("Earth");
let h_results = store.scan_prefix(&earth_prefix).await.unwrap();
assert_eq!(h_results.len(), 2);
// Scan redb (global prefix)
let trust_prefix = key_codec::trust_rank_scan_prefix();
let tr_results = store.scan_prefix(&trust_prefix).await.unwrap();
assert_eq!(tr_results.len(), 2);
}
}
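
The tag-based routing in this file can be sketched without either backend. The block below is a minimal, standalone approximation: `Backend`, `extract_tag`, `route`, and `is_cross_backend_prefix` mirror the names in the diff, but `FJALL_TAGS` is a hypothetical helper table and `extract_tag` uses a plain iterator search instead of `memchr`. It also exercises one edge case worth keeping in mind: a prefix that ends partway through a tag is routed to redb even though matching keys may live in fjall.

```rust
// Standalone sketch of the routing rules; not the crate's actual module.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Fjall,
    Redb,
}

const SEPARATOR: u8 = 0x00;

// Hypothetical table of write-heavy tags (taken from the doc comment on HybridStore).
const FJALL_TAGS: [&[u8]; 7] = [b"H:", b"V:", b"VC:", b"VW:", b"E:", b"SUPERSEDED:", b"META:"];

/// Bytes after the first separator, or the whole key when there is none.
fn extract_tag(key: &[u8]) -> &[u8] {
    match key.iter().position(|&b| b == SEPARATOR) {
        Some(pos) => &key[pos + 1..],
        None => key,
    }
}

/// Fjall for known write-heavy tags, redb for everything else.
fn route(key: &[u8]) -> Backend {
    let tag = extract_tag(key);
    if FJALL_TAGS.iter().any(|&t| tag.starts_with(t)) {
        Backend::Fjall
    } else {
        Backend::Redb
    }
}

/// True when the prefix ends at the separator and names no tag.
fn is_cross_backend_prefix(prefix: &[u8]) -> bool {
    !prefix.is_empty() && extract_tag(prefix).is_empty()
}

fn main() {
    assert_eq!(route(b"Tesla\x00H:abc"), Backend::Fjall);
    assert_eq!(route(b"Tesla\x00S:"), Backend::Redb);
    assert_eq!(route(b"\x00META:cursor:ingest"), Backend::Fjall);
    // `SUP:` is not `SUPERSEDED:`, so supersession records land in redb.
    assert_eq!(route(b"\x00SUP:deadbeef"), Backend::Redb);

    assert!(is_cross_backend_prefix(b"Tesla\x00"));
    assert!(is_cross_backend_prefix(b"\x00"));
    assert!(!is_cross_backend_prefix(b"Tesla\x00V:"));

    // Edge case: a partial tag is neither cross-backend nor a fjall tag.
    assert!(!is_cross_backend_prefix(b"Tesla\x00V"));
    assert_eq!(route(b"Tesla\x00V"), Backend::Redb);
}
```

Callers that scan by prefix would therefore want to pass either a bare `{subject}\x00` prefix or one that includes a complete tag.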


@ -21,17 +21,12 @@
 //! All operations are append-only and content-addressed.
 use crate::error::Result;
+use crate::key_codec;
 use crate::traits::KVStore;
 use async_trait::async_trait;
 use stemedb_core::types::Hash;
 use tracing::{debug, instrument};
-/// Key prefix for subject-only index.
-const SUBJECT_PREFIX: &[u8] = b"S:";
-/// Key prefix for compound subject+predicate index.
-const SUBJECT_PREDICATE_PREFIX: &[u8] = b"SP:";
 /// Specialized storage trait for assertion index operations.
 ///
 /// This trait provides index-specific operations on top of a generic KVStore,
@ -108,22 +103,6 @@ impl<S: KVStore> GenericIndexStore<S> {
 Self { store }
 }
-/// Construct the key for the subject index.
-fn subject_key(subject: &str) -> Vec<u8> {
-let mut key = SUBJECT_PREFIX.to_vec();
-key.extend_from_slice(subject.as_bytes());
-key
-}
-/// Construct the key for the compound subject+predicate index.
-fn subject_predicate_key(subject: &str, predicate: &str) -> Vec<u8> {
-let mut key = SUBJECT_PREDICATE_PREFIX.to_vec();
-key.extend_from_slice(subject.as_bytes());
-key.push(b':');
-key.extend_from_slice(predicate.as_bytes());
-key
-}
 /// Serialize a hash list using the canonical serde helpers.
 fn serialize_hash_list(hashes: &Vec<Hash>) -> Result<Vec<u8>> {
 crate::serde_helpers::serialize(hashes)
@ -164,14 +143,18 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
 predicate: &str,
 assertion_hash: &Hash,
 ) -> Result<()> {
-// Update subject index: S:{subject}
-let subject_key = Self::subject_key(subject);
+// Update subject index
+let subject_key = key_codec::subject_index_key(subject);
 self.append_to_index(subject_key, assertion_hash).await?;
-// Update compound index: SP:{subject}:{predicate}
-let sp_key = Self::subject_predicate_key(subject, predicate);
+// Update compound index
+let sp_key = key_codec::subject_predicate_key(subject, predicate);
 self.append_to_index(sp_key, assertion_hash).await?;
+// Update subjects discovery index
+let subjects_index_key = key_codec::subjects_index_key(subject);
+self.store.put(&subjects_index_key, &[]).await?;
 debug!(
 subject,
 predicate,
@ -184,7 +167,7 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
 #[instrument(skip(self), fields(subject = %subject))]
 async fn get_by_subject(&self, subject: &str) -> Result<Vec<Hash>> {
-let key = Self::subject_key(subject);
+let key = key_codec::subject_index_key(subject);
 match self.store.get(&key).await? {
 Some(data) => {
 let hashes = Self::deserialize_hash_list(&data)?;
@ -200,7 +183,7 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
 #[instrument(skip(self), fields(subject = %subject, predicate = %predicate))]
 async fn get_by_subject_predicate(&self, subject: &str, predicate: &str) -> Result<Vec<Hash>> {
-let key = Self::subject_predicate_key(subject, predicate);
+let key = key_codec::subject_predicate_key(subject, predicate);
 match self.store.get(&key).await? {
 Some(data) => {
 let hashes = Self::deserialize_hash_list(&data)?;
@ -216,13 +199,13 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
 #[instrument(skip(self), fields(subject = %subject))]
 async fn has_subject(&self, subject: &str) -> Result<bool> {
-let key = Self::subject_key(subject);
+let key = key_codec::subject_index_key(subject);
 Ok(self.store.get(&key).await?.is_some())
 }
 #[instrument(skip(self), fields(subject = %subject, predicate = %predicate))]
 async fn has_subject_predicate(&self, subject: &str, predicate: &str) -> Result<bool> {
-let key = Self::subject_predicate_key(subject, predicate);
+let key = key_codec::subject_predicate_key(subject, predicate);
 Ok(self.store.get(&key).await?.is_some())
 }
 }
@ -230,11 +213,12 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
 #[cfg(test)]
 mod tests {
 use super::*;
-use crate::SledStore;
+use crate::HybridStore;
+use std::sync::Arc;
 #[tokio::test]
 async fn test_add_and_get_by_subject() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let index_store = GenericIndexStore::new(store);
 let subject = "Tesla";
@ -255,7 +239,7 @@ mod tests {
 #[tokio::test]
 async fn test_add_and_get_by_subject_predicate() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let index_store = GenericIndexStore::new(store);
 let subject = "Tesla";
@ -279,7 +263,7 @@ mod tests {
 #[tokio::test]
 async fn test_idempotent_insert() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let index_store = GenericIndexStore::new(store);
 let subject = "Tesla";
@ -299,7 +283,7 @@ mod tests {
 #[tokio::test]
 async fn test_empty_index_returns_empty_vec() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let index_store = GenericIndexStore::new(store);
 let hashes = index_store.get_by_subject("Nonexistent").await.expect("get");
@ -312,7 +296,7 @@ mod tests {
 #[tokio::test]
 async fn test_has_subject() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let index_store = GenericIndexStore::new(store);
 let subject = "Tesla";
@ -331,7 +315,7 @@ mod tests {
 #[tokio::test]
 async fn test_has_subject_predicate() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let index_store = GenericIndexStore::new(store);
 let subject = "Tesla";
@ -353,7 +337,7 @@ mod tests {
 #[tokio::test]
 async fn test_multiple_subjects_isolated() {
-let store = SledStore::open_temp().expect("Failed to create store");
+let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
 let index_store = GenericIndexStore::new(store);
 let hash1 = [1u8; 32];
@ -378,8 +362,8 @@ mod tests {
 let hashes = vec![[1u8; 32], [2u8; 32], [3u8; 32]];
 let serialized =
-GenericIndexStore::<SledStore>::serialize_hash_list(&hashes).expect("serialize");
-let deserialized = GenericIndexStore::<SledStore>::deserialize_hash_list(&serialized)
+GenericIndexStore::<HybridStore>::serialize_hash_list(&hashes).expect("serialize");
+let deserialized = GenericIndexStore::<HybridStore>::deserialize_hash_list(&serialized)
 .expect("deserialize");
 assert_eq!(hashes, deserialized);
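
After this change `add_to_indexes` touches three keys per call: the subject list (`S:`), the compound list (`SP:`), and an empty-valued discovery marker (`SUBJECTS:`). A rough in-memory sketch of that write pattern follows. `MemStore` is a hypothetical stand-in for the KV store, the marker is modelled as an empty hash list rather than an empty byte slice, and the de-duplication in the loop is an assumption inferred from `test_idempotent_insert`, since the body of `append_to_index` is not shown in this diff.

```rust
use std::collections::BTreeMap;

type Hash = [u8; 32];
/// Hypothetical in-memory stand-in for the KV store: key -> hash list.
type MemStore = BTreeMap<Vec<u8>, Vec<Hash>>;

const SEPARATOR: u8 = 0x00;

/// `{subject}\x00{tag}{suffix}`
fn subject_key(subject: &str, tag: &[u8], suffix: &[u8]) -> Vec<u8> {
    let mut key = subject.as_bytes().to_vec();
    key.push(SEPARATOR);
    key.extend_from_slice(tag);
    key.extend_from_slice(suffix);
    key
}

/// `\x00{tag}{suffix}`
fn global_key(tag: &[u8], suffix: &[u8]) -> Vec<u8> {
    let mut key = vec![SEPARATOR];
    key.extend_from_slice(tag);
    key.extend_from_slice(suffix);
    key
}

fn add_to_indexes(store: &mut MemStore, subject: &str, predicate: &str, hash: Hash) {
    let list_keys = [
        subject_key(subject, b"S:", b""),
        subject_key(subject, b"SP:", predicate.as_bytes()),
    ];
    for key in list_keys {
        let list = store.entry(key).or_default();
        // Assumed behaviour: inserting the same hash twice is a no-op.
        if !list.contains(&hash) {
            list.push(hash);
        }
    }
    // Discovery marker: only the key matters, the value stays empty.
    store
        .entry(global_key(b"SUBJECTS:", subject.as_bytes()))
        .or_default();
}

fn main() {
    let mut store = MemStore::new();
    add_to_indexes(&mut store, "Tesla", "revenue", [1u8; 32]);
    add_to_indexes(&mut store, "Tesla", "revenue", [1u8; 32]);
    add_to_indexes(&mut store, "Tesla", "ceo", [2u8; 32]);

    assert_eq!(store[&subject_key("Tesla", b"S:", b"")].len(), 2);
    assert_eq!(store[&subject_key("Tesla", b"SP:", b"revenue")].len(), 1);
    assert!(store.contains_key(&global_key(b"SUBJECTS:", b"Tesla")));
    assert_eq!(store.len(), 4);
}
```

Under the routing rules in `HybridStore`, all three tags fall through to redb, so one indexing call stays within a single backend.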


@ -0,0 +1,340 @@
//! Central key encoding/decoding for subject-prefix range sharding.
//!
//! ALL storage keys flow through this module. Keys are partitioned into two families:
//!
//! **Subject-prefixed keys** — co-located by subject for range sharding:
//! ```text
//! {subject}\x00{TAG}:{suffix}
//! ```
//!
//! **Global keys** — metadata, trust, quotas, epochs (sort first under `\x00`):
//! ```text
//! \x00{TAG}:{suffix}
//! ```
//!
//! A prefix scan on `{subject}\x00` returns ALL data for that subject.
//! A prefix scan on `\x00` returns ALL global metadata.
use crate::error::{Result, StorageError};
/// Separator byte between subject and tag. Also serves as global key prefix.
pub const SEPARATOR: u8 = 0x00;
// ── Subject validation ──────────────────────────────────────────────
/// Validate that a subject string does not contain the separator byte.
///
/// Subjects containing `\x00` would corrupt key boundaries. This MUST be
/// called on all inbound subjects at the ingestion boundary.
pub fn validate_subject(subject: &str) -> Result<()> {
if subject.as_bytes().contains(&SEPARATOR) {
return Err(StorageError::InputValidation(
"Subject must not contain null byte (\\x00)".to_string(),
));
}
if subject.is_empty() {
return Err(StorageError::InputValidation("Subject must not be empty".to_string()));
}
Ok(())
}
// ── Key builders ────────────────────────────────────────────────────
/// Build a subject-prefixed key: `{subject}\x00{tag}{suffix}`.
fn subject_key(subject: &str, tag: &[u8], suffix: &[u8]) -> Vec<u8> {
let mut key = Vec::with_capacity(subject.len() + 1 + tag.len() + suffix.len());
key.extend_from_slice(subject.as_bytes());
key.push(SEPARATOR);
key.extend_from_slice(tag);
key.extend_from_slice(suffix);
key
}
/// Build a global key: `\x00{tag}{suffix}`.
fn global_key(tag: &[u8], suffix: &[u8]) -> Vec<u8> {
let mut key = Vec::with_capacity(1 + tag.len() + suffix.len());
key.push(SEPARATOR);
key.extend_from_slice(tag);
key.extend_from_slice(suffix);
key
}
// ── Subject-prefixed keys ───────────────────────────────────────────
/// Assertion key: `{subject}\x00H:{hash_hex}`
pub fn assertion_key(subject: &str, hash_hex: &str) -> Vec<u8> {
subject_key(subject, b"H:", hash_hex.as_bytes())
}
/// Subject index key: `{subject}\x00S:`
pub fn subject_index_key(subject: &str) -> Vec<u8> {
subject_key(subject, b"S:", b"")
}
/// Subject+predicate index key: `{subject}\x00SP:{predicate}`
pub fn subject_predicate_key(subject: &str, predicate: &str) -> Vec<u8> {
subject_key(subject, b"SP:", predicate.as_bytes())
}
/// Materialized view key: `{subject}\x00MV:{predicate}`
pub fn mv_key(subject: &str, predicate: &str) -> Vec<u8> {
subject_key(subject, b"MV:", predicate.as_bytes())
}
/// Vote key: `{subject}\x00V:{assert_hex}:{vote_hex}`
pub fn vote_key(subject: &str, assertion_hex: &str, vote_hex: &str) -> Vec<u8> {
let suffix = format!("{}:{}", assertion_hex, vote_hex);
subject_key(subject, b"V:", suffix.as_bytes())
}
/// Vote scan prefix: `{subject}\x00V:{assert_hex}:`
pub fn vote_scan_prefix(subject: &str, assertion_hex: &str) -> Vec<u8> {
let suffix = format!("{}:", assertion_hex);
subject_key(subject, b"V:", suffix.as_bytes())
}
/// Vote count cache key: `{subject}\x00VC:{assert_hex}`
pub fn vote_count_key(subject: &str, assertion_hex: &str) -> Vec<u8> {
subject_key(subject, b"VC:", assertion_hex.as_bytes())
}
/// Vote weight cache key: `{subject}\x00VW:{assert_hex}`
pub fn vote_weight_key(subject: &str, assertion_hex: &str) -> Vec<u8> {
subject_key(subject, b"VW:", assertion_hex.as_bytes())
}
/// Gold standard key: `{subject}\x00GS:{predicate}`
pub fn gold_standard_key(subject: &str, predicate: &str) -> Vec<u8> {
subject_key(subject, b"GS:", predicate.as_bytes())
}
/// Subject+predicate scan prefix: `{subject}\x00SP:` — returns all SP keys for a subject.
pub fn subject_predicate_scan_prefix(subject: &str) -> Vec<u8> {
subject_key(subject, b"SP:", b"")
}
/// Subject scan prefix: `{subject}\x00` — returns ALL data for a subject.
pub fn subject_scan_prefix(subject: &str) -> Vec<u8> {
let mut key = Vec::with_capacity(subject.len() + 1);
key.extend_from_slice(subject.as_bytes());
key.push(SEPARATOR);
key
}
// ── Global keys ─────────────────────────────────────────────────────
/// Trust rank key: `\x00TRUST:{agent_id_hex}`
pub fn trust_rank_key(agent_id_hex: &str) -> Vec<u8> {
global_key(b"TRUST:", agent_id_hex.as_bytes())
}
/// Quota record key: `\x00QUOTA:{agent_hex}:{window}`
pub fn quota_key(agent_hex: &str, window: u64) -> Vec<u8> {
let suffix = format!("{}:{}", agent_hex, window);
global_key(b"QUOTA:", suffix.as_bytes())
}
/// Quota limit key: `\x00QLIMIT:{agent_id_hex}`
pub fn quota_limit_key(agent_id_hex: &str) -> Vec<u8> {
global_key(b"QLIMIT:", agent_id_hex.as_bytes())
}
/// Epoch key: `\x00E:{epoch_id_hex}`
pub fn epoch_key(epoch_id_hex: &str) -> Vec<u8> {
global_key(b"E:", epoch_id_hex.as_bytes())
}
/// Superseded marker key: `\x00SUPERSEDED:{epoch_id_hex}`
pub fn superseded_key(epoch_id_hex: &str) -> Vec<u8> {
global_key(b"SUPERSEDED:", epoch_id_hex.as_bytes())
}
/// Supersession record key: `\x00SUP:{target_hash_hex}`
pub fn supersession_key(target_hash_hex: &str) -> Vec<u8> {
global_key(b"SUP:", target_hash_hex.as_bytes())
}
/// Supersession agent index key: `\x00SUP:IDX:{agent_hex}:{ts_be_bytes}`
pub fn supersession_index_key(agent_hex: &str, timestamp_be_bytes: &[u8]) -> Vec<u8> {
let mut suffix = Vec::with_capacity(agent_hex.len() + 1 + timestamp_be_bytes.len());
suffix.extend_from_slice(agent_hex.as_bytes());
suffix.push(b':');
suffix.extend_from_slice(timestamp_be_bytes);
global_key(b"SUP:IDX:", &suffix)
}
/// Supersession agent scan prefix: `\x00SUP:IDX:{agent_hex}:`
pub fn supersession_index_prefix(agent_hex: &str) -> Vec<u8> {
let suffix = format!("{}:", agent_hex);
global_key(b"SUP:IDX:", suffix.as_bytes())
}
/// Audit record key: `\x00AUD:{query_id_hex}`
pub fn audit_key(query_id_hex: &str) -> Vec<u8> {
global_key(b"AUD:", query_id_hex.as_bytes())
}
/// Audit agent index key: `\x00AUDA:{agent_hex}:{timestamp_hex}:{query_hex}`
pub fn audit_agent_index_key(agent_hex: &str, timestamp_hex: &str, query_hex: &str) -> Vec<u8> {
let suffix = format!("{}:{}:{}", agent_hex, timestamp_hex, query_hex);
global_key(b"AUDA:", suffix.as_bytes())
}
/// Audit agent scan prefix: `\x00AUDA:{agent_hex}:`
pub fn audit_agent_prefix(agent_hex: &str) -> Vec<u8> {
let suffix = format!("{}:", agent_hex);
global_key(b"AUDA:", suffix.as_bytes())
}
/// Audit listing prefix: `\x00AUD:`
pub fn audit_scan_prefix() -> Vec<u8> {
global_key(b"AUD:", b"")
}
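/// Escalation key: `\x00ESC:{timestamp}:{id_hex}`
///
/// The timestamp is written as unpadded decimal, so byte-wise key order matches
/// chronological order only among timestamps with the same number of digits.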
pub fn escalation_key(timestamp: u64, id_hex: &str) -> Vec<u8> {
let suffix = format!("{}:{}", timestamp, id_hex);
global_key(b"ESC:", suffix.as_bytes())
}
/// Escalation scan prefix: `\x00ESC:`
pub fn escalation_scan_prefix() -> Vec<u8> {
global_key(b"ESC:", b"")
}
/// Trust pack key: `\x00TP:{pack_id_bytes}`
pub fn trust_pack_key(pack_id: &[u8]) -> Vec<u8> {
global_key(b"TP:", pack_id)
}
/// Trust pack scan prefix: `\x00TP:`
pub fn trust_pack_scan_prefix() -> Vec<u8> {
global_key(b"TP:", b"")
}
/// Gold standard verified key: `\x00GS_VERIFIED:{agent_hex}:{subject}:{predicate}`
pub fn gs_verified_key(agent_hex: &str, subject: &str, predicate: &str) -> Vec<u8> {
let suffix = format!("{}:{}:{}", agent_hex, subject, predicate);
global_key(b"GS_VERIFIED:", suffix.as_bytes())
}
/// Cursor key: `\x00META:cursor:ingest`
pub fn cursor_key() -> Vec<u8> {
global_key(b"META:cursor:ingest", b"")
}
/// Assertion count key: `\x00META:assertion_count`
pub fn assertion_count_key() -> Vec<u8> {
global_key(b"META:assertion_count", b"")
}
/// Trust rank scan prefix for decay: `\x00TRUST:`
pub fn trust_rank_scan_prefix() -> Vec<u8> {
global_key(b"TRUST:", b"")
}
// ── Secondary indexes ───────────────────────────────────────────────
/// Known subjects index key: `\x00SUBJECTS:{subject}`
pub fn subjects_index_key(subject: &str) -> Vec<u8> {
global_key(b"SUBJECTS:", subject.as_bytes())
}
/// Known subjects scan prefix: `\x00SUBJECTS:`
pub fn subjects_scan_prefix() -> Vec<u8> {
global_key(b"SUBJECTS:", b"")
}
/// Gold standard listing index: `\x00GS_LIST:{subject}:{predicate}`
pub fn gs_list_key(subject: &str, predicate: &str) -> Vec<u8> {
let suffix = format!("{}:{}", subject, predicate);
global_key(b"GS_LIST:", suffix.as_bytes())
}
/// Gold standard listing scan prefix: `\x00GS_LIST:`
pub fn gs_list_scan_prefix() -> Vec<u8> {
global_key(b"GS_LIST:", b"")
}
/// Hash-to-subject reverse index: `\x00HASH_SUBJECT:{hash_hex}`
pub fn hash_subject_key(hash_hex: &str) -> Vec<u8> {
global_key(b"HASH_SUBJECT:", hash_hex.as_bytes())
}
// ── Key extraction / parsing ────────────────────────────────────────
/// Extract subject from a `\x00SUBJECTS:{subject}` key.
///
/// Returns the subject string, or `None` if the key doesn't match the expected format.
pub fn extract_subject_from_subjects_key(key: &[u8]) -> Option<String> {
let prefix = b"\x00SUBJECTS:";
if key.starts_with(prefix) {
std::str::from_utf8(&key[prefix.len()..]).ok().map(|s| s.to_string())
} else {
None
}
}
/// Extract subject and predicate from a `{subject}\x00SP:{predicate}` key.
///
/// Returns `(subject, predicate)` or `None` if the key doesn't match.
pub fn extract_sp_key(key: &[u8]) -> Option<(String, String)> {
// Find the \x00 separator
let sep_pos = memchr::memchr(SEPARATOR, key)?;
if sep_pos == 0 {
return None; // Global key, not subject-prefixed
}
let subject = std::str::from_utf8(&key[..sep_pos]).ok()?;
let after_sep = &key[sep_pos + 1..];
// Check for SP: tag
if !after_sep.starts_with(b"SP:") {
return None;
}
let predicate = std::str::from_utf8(&after_sep[3..]).ok()?;
if subject.is_empty() || predicate.is_empty() {
return None;
}
Some((subject.to_string(), predicate.to_string()))
}
/// Extract the tag portion from a key (the part after the separator).
///
/// For subject-prefixed keys: returns bytes after `{subject}\x00`
/// For global keys: returns bytes after `\x00`
/// For keys without a separator: returns the whole key unchanged
pub fn extract_tag(key: &[u8]) -> &[u8] {
if key.first() == Some(&SEPARATOR) {
// Global key: \x00TAG:rest
&key[1..]
} else if let Some(pos) = memchr::memchr(SEPARATOR, key) {
// Subject-prefixed: subject\x00TAG:rest
&key[pos + 1..]
} else {
key
}
}
/// Check if a key is a global key (starts with `\x00`).
pub fn is_global_key(key: &[u8]) -> bool {
key.first() == Some(&SEPARATOR)
}
/// Extract the subject from a subject-prefixed key.
///
/// Returns `None` for global keys or keys without a separator.
pub fn extract_subject(key: &[u8]) -> Option<&str> {
if is_global_key(key) {
return None;
}
if let Some(pos) = memchr::memchr(SEPARATOR, key) {
std::str::from_utf8(&key[..pos]).ok()
} else {
None
}
}
#[cfg(test)]
mod tests;
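
The module docs claim that a scan on `{subject}\x00` returns all data for one subject and that global keys sort first. The sketch below checks both properties against an ordered map. It is a simplified illustration: the `BTreeMap` stands in for an ordered KV backend, `scan_prefix` is a hypothetical helper, and `extract_subject` uses an iterator search rather than `memchr`.

```rust
use std::collections::BTreeMap;

const SEPARATOR: u8 = 0x00;

/// `{subject}\x00{tag}{suffix}`
fn subject_key(subject: &str, tag: &[u8], suffix: &[u8]) -> Vec<u8> {
    let mut key = subject.as_bytes().to_vec();
    key.push(SEPARATOR);
    key.extend_from_slice(tag);
    key.extend_from_slice(suffix);
    key
}

/// `\x00{tag}{suffix}`
fn global_key(tag: &[u8], suffix: &[u8]) -> Vec<u8> {
    let mut key = vec![SEPARATOR];
    key.extend_from_slice(tag);
    key.extend_from_slice(suffix);
    key
}

/// Subject of a subject-prefixed key; `None` for global keys or keys with no separator.
fn extract_subject(key: &[u8]) -> Option<&str> {
    let pos = key.iter().position(|&b| b == SEPARATOR)?;
    if pos == 0 {
        return None;
    }
    std::str::from_utf8(&key[..pos]).ok()
}

/// Hypothetical prefix scan over an ordered map: seek to the prefix, stop at the first mismatch.
fn scan_prefix<'a>(store: &'a BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<&'a [u8]> {
    store
        .range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k.as_slice())
        .collect()
}

fn main() {
    let mut store = BTreeMap::new();
    for key in [
        subject_key("Tesla", b"H:", b"abc"),
        subject_key("Tesla", b"SP:", b"revenue"),
        subject_key("TeslaX", b"H:", b"def"),
        global_key(b"TRUST:", b"agent1"),
    ] {
        store.insert(key, Vec::new());
    }

    // The separator keeps "Tesla" and "TeslaX" apart.
    let tesla = scan_prefix(&store, b"Tesla\x00");
    assert_eq!(tesla.len(), 2);
    assert_eq!(extract_subject(tesla[0]), Some("Tesla"));

    // Global keys start with 0x00 and therefore sort before every subject.
    let first = store.keys().next().unwrap();
    assert_eq!(first.as_slice(), b"\x00TRUST:agent1");
    assert_eq!(extract_subject(first), None);
}
```

The separator byte is what makes the subject boundary unambiguous, which is why `validate_subject` has to reject `\x00` at ingestion.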


@ -0,0 +1,231 @@
//! Tests for key encoding/decoding.
use super::*;
#[test]
fn test_validate_subject_rejects_null() {
let result = validate_subject("has\x00null");
assert!(result.is_err());
}
#[test]
fn test_validate_subject_rejects_empty() {
let result = validate_subject("");
assert!(result.is_err());
}
#[test]
fn test_validate_subject_accepts_normal() {
validate_subject("Tesla").expect("normal subject should be valid");
validate_subject("AAPL").expect("ticker should be valid");
validate_subject("Earth::Moon").expect("colons should be valid");
}
#[test]
fn test_assertion_key() {
let key = assertion_key("Tesla", "abc123");
assert_eq!(key, b"Tesla\x00H:abc123");
}
#[test]
fn test_subject_index_key() {
let key = subject_index_key("Tesla");
assert_eq!(key, b"Tesla\x00S:");
}
#[test]
fn test_subject_predicate_key() {
let key = subject_predicate_key("Tesla", "revenue");
assert_eq!(key, b"Tesla\x00SP:revenue");
}
#[test]
fn test_mv_key() {
let key = mv_key("Tesla", "revenue");
assert_eq!(key, b"Tesla\x00MV:revenue");
}
#[test]
fn test_vote_key() {
let key = vote_key("Tesla", "aaa", "bbb");
assert_eq!(key, b"Tesla\x00V:aaa:bbb");
}
#[test]
fn test_vote_scan_prefix() {
let key = vote_scan_prefix("Tesla", "aaa");
assert_eq!(key, b"Tesla\x00V:aaa:");
}
#[test]
fn test_vote_count_key() {
let key = vote_count_key("Tesla", "aaa");
assert_eq!(key, b"Tesla\x00VC:aaa");
}
#[test]
fn test_vote_weight_key() {
let key = vote_weight_key("Tesla", "aaa");
assert_eq!(key, b"Tesla\x00VW:aaa");
}
#[test]
fn test_gold_standard_key() {
let key = gold_standard_key("Earth", "has_shape");
assert_eq!(key, b"Earth\x00GS:has_shape");
}
#[test]
fn test_trust_rank_key() {
let key = trust_rank_key("abc123");
assert_eq!(key, b"\x00TRUST:abc123");
}
#[test]
fn test_quota_key() {
let key = quota_key("abc", 1705314000);
assert_eq!(key, b"\x00QUOTA:abc:1705314000");
}
#[test]
fn test_quota_limit_key() {
let key = quota_limit_key("abc");
assert_eq!(key, b"\x00QLIMIT:abc");
}
#[test]
fn test_epoch_key() {
let key = epoch_key("deadbeef");
assert_eq!(key, b"\x00E:deadbeef");
}
#[test]
fn test_superseded_key() {
let key = superseded_key("deadbeef");
assert_eq!(key, b"\x00SUPERSEDED:deadbeef");
}
#[test]
fn test_supersession_key() {
let key = supersession_key("deadbeef");
assert_eq!(key, b"\x00SUP:deadbeef");
}
#[test]
fn test_audit_key() {
let key = audit_key("abc123");
assert_eq!(key, b"\x00AUD:abc123");
}
#[test]
fn test_escalation_key() {
let key = escalation_key(1000, "abc123");
assert_eq!(key, b"\x00ESC:1000:abc123");
}
#[test]
fn test_trust_pack_key() {
let key = trust_pack_key(&[1u8; 32]);
assert_eq!(&key[..4], b"\x00TP:");
assert_eq!(&key[4..], &[1u8; 32]);
}
#[test]
fn test_cursor_key() {
let key = cursor_key();
assert_eq!(key, b"\x00META:cursor:ingest");
}
#[test]
fn test_assertion_count_key() {
let key = assertion_count_key();
assert_eq!(key, b"\x00META:assertion_count");
}
#[test]
fn test_subjects_index_key() {
let key = subjects_index_key("Tesla");
assert_eq!(key, b"\x00SUBJECTS:Tesla");
}
#[test]
fn test_gs_list_key() {
let key = gs_list_key("Earth", "has_shape");
assert_eq!(key, b"\x00GS_LIST:Earth:has_shape");
}
#[test]
fn test_hash_subject_key() {
let key = hash_subject_key("abc123");
assert_eq!(key, b"\x00HASH_SUBJECT:abc123");
}
#[test]
fn test_gs_verified_key() {
let key = gs_verified_key("abc", "Earth", "has_shape");
assert_eq!(key, b"\x00GS_VERIFIED:abc:Earth:has_shape");
}
#[test]
fn test_subject_scan_prefix() {
let prefix = subject_scan_prefix("Tesla");
assert_eq!(prefix, b"Tesla\x00");
}
#[test]
fn test_extract_tag_global() {
let key = b"\x00TRUST:abc123";
assert_eq!(extract_tag(key), b"TRUST:abc123");
}
#[test]
fn test_extract_tag_subject() {
let key = b"Tesla\x00H:abc123";
assert_eq!(extract_tag(key), b"H:abc123");
}
#[test]
fn test_is_global_key() {
assert!(is_global_key(b"\x00TRUST:abc"));
assert!(!is_global_key(b"Tesla\x00H:abc"));
assert!(!is_global_key(b"plain_key"));
}
#[test]
fn test_extract_subject() {
assert_eq!(extract_subject(b"Tesla\x00H:abc"), Some("Tesla"));
assert_eq!(extract_subject(b"Earth\x00GS:pred"), Some("Earth"));
assert_eq!(extract_subject(b"\x00TRUST:abc"), None);
assert_eq!(extract_subject(b"no_separator"), None);
}
#[test]
fn test_subject_colocation() {
// All Tesla keys should share the same prefix for range sharding
let h = assertion_key("Tesla", "abc");
let s = subject_index_key("Tesla");
let sp = subject_predicate_key("Tesla", "revenue");
let mv = mv_key("Tesla", "revenue");
let v = vote_key("Tesla", "abc", "def");
let vc = vote_count_key("Tesla", "abc");
let vw = vote_weight_key("Tesla", "abc");
let gs = gold_standard_key("Tesla", "stock_price");
let prefix = b"Tesla\x00";
assert!(h.starts_with(prefix));
assert!(s.starts_with(prefix));
assert!(sp.starts_with(prefix));
assert!(mv.starts_with(prefix));
assert!(v.starts_with(prefix));
assert!(vc.starts_with(prefix));
assert!(vw.starts_with(prefix));
assert!(gs.starts_with(prefix));
}
#[test]
fn test_global_keys_sort_first() {
// Global keys (\x00...) sort before subject keys (a-z...)
let global = trust_rank_key("abc");
let subject = assertion_key("Apple", "abc");
assert!(global < subject, "Global keys should sort before subject keys");
}

@ -1,7 +1,7 @@
//! Storage engine abstractions and implementations for Episteme. //! Storage engine abstractions and implementations for Episteme.
//! //!
//! This crate provides the `KVStore` trait for pluggable storage backends //! This crate provides the `KVStore` trait for pluggable storage backends
//! and a concrete implementation using `sled`. //! and a concrete `HybridStore` that routes keys to fjall (write-heavy) or redb (read-heavy).
//! //!
//! # The Ballot Box //! # The Ballot Box
//! //!
@ -10,13 +10,13 @@
//! votes from assertions to enable thousands of agents to vote simultaneously. //! votes from assertions to enable thousands of agents to vote simultaneously.
//! //!
//! ```ignore //! ```ignore
//! use stemedb_storage::{SledStore, GenericVoteStore, VoteStore}; //! use stemedb_storage::{HybridStore, GenericVoteStore, VoteStore};
//! //!
//! let kv_store = SledStore::open("./data")?; //! let kv_store = HybridStore::open("./data")?;
//! let vote_store = GenericVoteStore::new(kv_store); //! let vote_store = GenericVoteStore::new(kv_store);
//! //!
//! // High-velocity vote ingestion //! // High-velocity vote ingestion
//! let vote_hash = vote_store.put_vote(&vote).await?; //! let vote_hash = vote_store.put_vote(&vote, "subject").await?;
//! //!
//! // O(1) aggregation via caches //! // O(1) aggregation via caches
//! let count = vote_store.get_vote_count(&assertion_hash).await?; //! let count = vote_store.get_vote_count(&assertion_hash).await?;
@ -30,9 +30,9 @@
//! weighted in the Authority lens. //! weighted in the Authority lens.
//! //!
//! ```ignore //! ```ignore
//! use stemedb_storage::{SledStore, GenericTrustRankStore, TrustRankStore}; //! use stemedb_storage::{HybridStore, GenericTrustRankStore, TrustRankStore};
//! //!
//! let kv_store = SledStore::open("./data")?; //! let kv_store = HybridStore::open("./data")?;
//! let trust_store = GenericTrustRankStore::new(kv_store); //! let trust_store = GenericTrustRankStore::new(kv_store);
//! //!
//! // Get agent's current reputation //! // Get agent's current reputation
@ -51,9 +51,9 @@
//! Every query is logged with provenance to enable "Why did you think that?" debugging. //! Every query is logged with provenance to enable "Why did you think that?" debugging.
//! //!
//! ```ignore //! ```ignore
//! use stemedb_storage::{SledStore, GenericAuditStore, AuditStore}; //! use stemedb_storage::{HybridStore, GenericAuditStore, AuditStore};
//! //!
//! let kv_store = SledStore::open("./data")?; //! let kv_store = HybridStore::open("./data")?;
//! let audit_store = GenericAuditStore::new(kv_store); //! let audit_store = GenericAuditStore::new(kv_store);
//! //!
//! // Log a query audit //! // Log a query audit
@ -72,9 +72,9 @@
//! Users subscribe to domain expert packs to see reality through trusted lenses. //! Users subscribe to domain expert packs to see reality through trusted lenses.
//! //!
//! ```ignore //! ```ignore
//! use stemedb_storage::{SledStore, GenericTrustPackStore, TrustPackStore}; //! use stemedb_storage::{HybridStore, GenericTrustPackStore, TrustPackStore};
//! //!
//! let kv_store = SledStore::open("./data")?; //! let kv_store = HybridStore::open("./data")?;
//! let pack_store = GenericTrustPackStore::new(kv_store); //! let pack_store = GenericTrustPackStore::new(kv_store);
//! //!
//! // Create and store a pack //! // Create and store a pack
@ -94,9 +94,9 @@
//! runaway agents from exhausting system resources. //! runaway agents from exhausting system resources.
//! //!
//! ```ignore //! ```ignore
//! use stemedb_storage::{SledStore, GenericQuotaStore, QuotaStore, OperationType}; //! use stemedb_storage::{HybridStore, GenericQuotaStore, QuotaStore, OperationType};
//! //!
//! let kv_store = SledStore::open("./data")?; //! let kv_store = HybridStore::open("./data")?;
//! let quota_store = GenericQuotaStore::new(kv_store); //! let quota_store = GenericQuotaStore::new(kv_store);
//! //!
//! // Check and record cost for an operation //! // Check and record cost for an operation
@ -123,9 +123,9 @@
//! earn TrustRank and unlock premium features. //! earn TrustRank and unlock premium features.
//! //!
//! ```ignore //! ```ignore
//! use stemedb_storage::{SledStore, GenericGoldStandardStore, GoldStandardStore}; //! use stemedb_storage::{HybridStore, GenericGoldStandardStore, GoldStandardStore};
//! //!
//! let kv_store = SledStore::open("./data")?; //! let kv_store = HybridStore::open("./data")?;
//! let gs_store = GenericGoldStandardStore::new(kv_store); //! let gs_store = GenericGoldStandardStore::new(kv_store);
//! //!
//! // Create and store a gold standard //! // Create and store a gold standard
@ -141,22 +141,29 @@
//! } //! }
//! ``` //! ```
/// Central key encoding/decoding for subject-prefix range sharding.
pub mod key_codec;
/// Query audit trail storage for incident investigation. /// Query audit trail storage for incident investigation.
pub mod audit_store; pub mod audit_store;
/// Error types and Result wrapper for storage operations. /// Error types and Result wrapper for storage operations.
pub mod error; pub mod error;
/// Escalation event storage for high-conflict assertions. /// Escalation event storage for high-conflict assertions.
pub mod escalation_store; pub mod escalation_store;
/// Fjall (LSM-tree) backend for write-heavy key prefixes.
pub mod fjall_backend;
/// Gold standard assertions for agent verification. /// Gold standard assertions for agent verification.
pub mod gold_standard_store; pub mod gold_standard_store;
/// Hybrid storage backend: routes keys to fjall (write-heavy) or redb (read-heavy).
pub mod hybrid_backend;
/// Specialized storage for assertion indexes. /// Specialized storage for assertion indexes.
pub mod index_store; pub mod index_store;
/// Economic throttling via Token Bucket quotas (The Meter). /// Economic throttling via Token Bucket quotas (The Meter).
pub mod quota_store; pub mod quota_store;
/// Redb (B-tree) backend for read-heavy key prefixes.
pub mod redb_backend;
/// Storage-layer serialization helpers. /// Storage-layer serialization helpers.
pub(crate) mod serde_helpers; pub(crate) mod serde_helpers;
/// Sled implementation of the storage backend.
pub mod sled_backend;
/// Assertion supersession storage (Error Correction). /// Assertion supersession storage (Error Correction).
pub mod supersession_store; pub mod supersession_store;
/// Core traits for key-value storage. /// Core traits for key-value storage.
@ -176,12 +183,12 @@ pub use audit_store::{AuditStore, GenericAuditStore};
pub use error::{Result, StorageError}; pub use error::{Result, StorageError};
pub use escalation_store::{EscalationStore, GenericEscalationStore}; pub use escalation_store::{EscalationStore, GenericEscalationStore};
pub use gold_standard_store::{GenericGoldStandardStore, GoldStandardStore}; pub use gold_standard_store::{GenericGoldStandardStore, GoldStandardStore};
pub use hybrid_backend::HybridStore;
pub use index_store::{GenericIndexStore, IndexStore}; pub use index_store::{GenericIndexStore, IndexStore};
pub use quota_store::{ pub use quota_store::{
CostConfig, GenericQuotaStore, OperationType, QuotaCheckResult, QuotaRecord, QuotaStore, CostConfig, GenericQuotaStore, OperationType, QuotaCheckResult, QuotaRecord, QuotaStore,
DEFAULT_QUOTA_LIMIT, DEFAULT_QUOTA_LIMIT,
}; };
pub use sled_backend::SledStore;
pub use supersession_store::{GenericSupersessionStore, SupersessionStore}; pub use supersession_store::{GenericSupersessionStore, SupersessionStore};
pub use traits::KVStore; pub use traits::KVStore;
pub use trust_pack_store::{GenericTrustPackStore, TrustPackStore}; pub use trust_pack_store::{GenericTrustPackStore, TrustPackStore};

@ -37,9 +37,6 @@ use async_trait::async_trait;
#[allow(dead_code)] // Documented for reference; actual key construction uses format!() #[allow(dead_code)] // Documented for reference; actual key construction uses format!()
const QUOTA_PREFIX: &[u8] = b"QT:"; const QUOTA_PREFIX: &[u8] = b"QT:";
/// Key prefix for per-agent quota limit overrides.
const QUOTA_LIMIT_PREFIX: &[u8] = b"QL:";
/// Default quota limit per agent per hour (10,000 tokens). /// Default quota limit per agent per hour (10,000 tokens).
pub const DEFAULT_QUOTA_LIMIT: u64 = 10_000; pub const DEFAULT_QUOTA_LIMIT: u64 = 10_000;
@ -108,7 +105,8 @@ pub trait QuotaStore: Send + Sync {
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
use crate::SledStore; use crate::HybridStore;
use std::sync::Arc;
fn test_agent() -> [u8; 32] { fn test_agent() -> [u8; 32] {
[1u8; 32] [1u8; 32]
@ -157,7 +155,7 @@ mod tests {
fn test_hour_window() { fn test_hour_window() {
// 2024-01-15 10:50:00 UTC = 1705315800 // 2024-01-15 10:50:00 UTC = 1705315800
let timestamp = 1705315800; let timestamp = 1705315800;
let window = GenericQuotaStore::<SledStore>::hour_window(timestamp); let window = GenericQuotaStore::<HybridStore>::hour_window(timestamp);
// 1705315800 / 3600 = 473698.833... -> 473698 * 3600 = 1705312800 // 1705315800 / 3600 = 473698.833... -> 473698 * 3600 = 1705312800
assert_eq!(window, 1705312800); assert_eq!(window, 1705312800);
@ -167,7 +165,7 @@ mod tests {
#[test] #[test]
fn test_reset_timestamp() { fn test_reset_timestamp() {
let timestamp = 1705315800; // :50 past the hour let timestamp = 1705315800; // :50 past the hour
let reset = GenericQuotaStore::<SledStore>::reset_timestamp(timestamp); let reset = GenericQuotaStore::<HybridStore>::reset_timestamp(timestamp);
// Window is 1705312800, next hour is +3600 = 1705316400 // Window is 1705312800, next hour is +3600 = 1705316400
assert_eq!(reset, 1705316400); assert_eq!(reset, 1705316400);
@ -203,7 +201,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_check_and_record_basic() { async fn test_check_and_record_basic() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let quota_store = GenericQuotaStore::new(store); let quota_store = GenericQuotaStore::new(store);
let agent_id = test_agent(); let agent_id = test_agent();
@ -223,7 +221,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_quota_enforcement() { async fn test_quota_enforcement() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let quota_store = GenericQuotaStore::with_config( let quota_store = GenericQuotaStore::with_config(
store, store,
CostConfig::default(), CostConfig::default(),
@ -260,7 +258,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_quota_resets_each_hour() { async fn test_quota_resets_each_hour() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let quota_store = GenericQuotaStore::with_config(store, CostConfig::default(), 50); let quota_store = GenericQuotaStore::with_config(store, CostConfig::default(), 50);
let agent_id = test_agent(); let agent_id = test_agent();
@ -294,7 +292,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_custom_quota_limit() { async fn test_custom_quota_limit() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let quota_store = GenericQuotaStore::new(store); let quota_store = GenericQuotaStore::new(store);
let agent_id = test_agent(); let agent_id = test_agent();
@ -312,7 +310,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_get_quota_status() { async fn test_get_quota_status() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let quota_store = GenericQuotaStore::new(store); let quota_store = GenericQuotaStore::new(store);
let agent_id = test_agent(); let agent_id = test_agent();
@ -337,7 +335,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_different_operation_types() { async fn test_different_operation_types() {
let store = SledStore::open_temp().expect("store"); let store = Arc::new(HybridStore::open_temp().expect("store"));
let quota_store = GenericQuotaStore::new(store); let quota_store = GenericQuotaStore::new(store);
let agent_id = test_agent(); let agent_id = test_agent();
@ -376,9 +374,9 @@ mod tests {
}; };
let serialized = let serialized =
GenericQuotaStore::<SledStore>::serialize_record(&record).expect("serialize"); GenericQuotaStore::<HybridStore>::serialize_record(&record).expect("serialize");
let deserialized = let deserialized =
GenericQuotaStore::<SledStore>::deserialize_record(&serialized).expect("deserialize"); GenericQuotaStore::<HybridStore>::deserialize_record(&serialized).expect("deserialize");
assert_eq!(record, deserialized); assert_eq!(record, deserialized);
} }
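
Review note: the window arithmetic exercised by `test_hour_window` and `test_reset_timestamp` is plain integer truncation. The following is a hedged, in-memory sketch of a fixed-window meter. `Meter` and its `check_and_record` are hypothetical and only mirror the idea of the quota store. The real store is async, keyed per agent, and persists records through `KVStore`.

```rust
use std::collections::HashMap;

/// Length of one quota window in seconds (one hour).
const WINDOW_SECS: u64 = 3600;

/// Start of the hour window containing `timestamp`.
fn hour_window(timestamp: u64) -> u64 {
    timestamp - timestamp % WINDOW_SECS
}

/// First second of the next window, i.e. when usage resets.
fn reset_timestamp(timestamp: u64) -> u64 {
    hour_window(timestamp) + WINDOW_SECS
}

/// Hypothetical single-agent meter: usage is tracked per window start.
struct Meter {
    limit: u64,
    used: HashMap<u64, u64>,
}

impl Meter {
    fn new(limit: u64) -> Self {
        Meter { limit, used: HashMap::new() }
    }

    /// Record `cost` in the window containing `timestamp` if it still fits
    /// under the limit. Returns whether the operation was allowed.
    fn check_and_record(&mut self, timestamp: u64, cost: u64) -> bool {
        let used = self.used.entry(hour_window(timestamp)).or_insert(0);
        match used.checked_add(cost) {
            Some(total) if total <= self.limit => {
                *used = total;
                true
            }
            _ => false,
        }
    }
}
```

Keying usage by window start means no explicit reset job is needed: a request in a new hour simply lands on a fresh record.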

@ -2,9 +2,9 @@
use super::{ use super::{
CostConfig, OperationType, QuotaCheckResult, QuotaRecord, QuotaStore, DEFAULT_QUOTA_LIMIT, CostConfig, OperationType, QuotaCheckResult, QuotaRecord, QuotaStore, DEFAULT_QUOTA_LIMIT,
QUOTA_LIMIT_PREFIX,
}; };
use crate::error::{Result, StorageError}; use crate::error::{Result, StorageError};
use crate::key_codec;
use crate::traits::KVStore; use crate::traits::KVStore;
use async_trait::async_trait; use async_trait::async_trait;
use tracing::{debug, instrument, warn}; use tracing::{debug, instrument, warn};
@ -40,19 +40,6 @@ impl<S: KVStore> GenericQuotaStore<S> {
Self::hour_window(timestamp) + 3600 Self::hour_window(timestamp) + 3600
} }
/// Construct the key for a quota record.
fn quota_key(agent_id: &[u8; 32], window_start: u64) -> Vec<u8> {
let agent_hex = hex::encode(agent_id);
format!("QT:{}:{}", agent_hex, window_start).into_bytes()
}
/// Construct the key for a quota limit override.
fn limit_key(agent_id: &[u8; 32]) -> Vec<u8> {
let mut key = QUOTA_LIMIT_PREFIX.to_vec();
key.extend_from_slice(agent_id);
key
}
/// Serialize a QuotaRecord using the canonical serde helpers. /// Serialize a QuotaRecord using the canonical serde helpers.
pub(crate) fn serialize_record(record: &QuotaRecord) -> Result<Vec<u8>> { pub(crate) fn serialize_record(record: &QuotaRecord) -> Result<Vec<u8>> {
crate::serde_helpers::serialize(record) crate::serde_helpers::serialize(record)
@ -91,7 +78,7 @@ impl<S: KVStore + 'static> QuotaStore for GenericQuotaStore<S> {
let limit = self.get_quota_limit(agent_id).await?; let limit = self.get_quota_limit(agent_id).await?;
// Get or create quota record for this window // Get or create quota record for this window
let key = Self::quota_key(agent_id, window_start); let key = key_codec::quota_key(&hex::encode(agent_id), window_start);
let mut record = match self.store.get(&key).await? { let mut record = match self.store.get(&key).await? {
Some(data) => Self::deserialize_record(&data)?, Some(data) => Self::deserialize_record(&data)?,
None => QuotaRecord::new(*agent_id, window_start), None => QuotaRecord::new(*agent_id, window_start),
@ -138,7 +125,7 @@ impl<S: KVStore + 'static> QuotaStore for GenericQuotaStore<S> {
let reset_at = Self::reset_timestamp(timestamp); let reset_at = Self::reset_timestamp(timestamp);
let limit = self.get_quota_limit(agent_id).await?; let limit = self.get_quota_limit(agent_id).await?;
let key = Self::quota_key(agent_id, window_start); let key = key_codec::quota_key(&hex::encode(agent_id), window_start);
let record = match self.store.get(&key).await? { let record = match self.store.get(&key).await? {
Some(data) => Self::deserialize_record(&data)?, Some(data) => Self::deserialize_record(&data)?,
None => QuotaRecord::new(*agent_id, window_start), None => QuotaRecord::new(*agent_id, window_start),
@ -155,7 +142,7 @@ impl<S: KVStore + 'static> QuotaStore for GenericQuotaStore<S> {
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id), limit))] #[instrument(skip(self), fields(agent_id = %hex::encode(agent_id), limit))]
async fn set_quota_limit(&self, agent_id: &[u8; 32], limit: u64) -> Result<()> { async fn set_quota_limit(&self, agent_id: &[u8; 32], limit: u64) -> Result<()> {
let key = Self::limit_key(agent_id); let key = key_codec::quota_limit_key(&hex::encode(agent_id));
self.store.put(&key, &limit.to_le_bytes()).await?; self.store.put(&key, &limit.to_le_bytes()).await?;
debug!("Set custom quota limit"); debug!("Set custom quota limit");
Ok(()) Ok(())
@ -163,7 +150,7 @@ impl<S: KVStore + 'static> QuotaStore for GenericQuotaStore<S> {
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))] #[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))]
async fn get_quota_limit(&self, agent_id: &[u8; 32]) -> Result<u64> { async fn get_quota_limit(&self, agent_id: &[u8; 32]) -> Result<u64> {
let key = Self::limit_key(agent_id); let key = key_codec::quota_limit_key(&hex::encode(agent_id));
match self.store.get(&key).await? { match self.store.get(&key).await? {
Some(bytes) if bytes.len() == 8 => { Some(bytes) if bytes.len() == 8 => {
let arr: [u8; 8] = bytes let arr: [u8; 8] = bytes

@ -0,0 +1,280 @@
use crate::error::{Result, StorageError};
use crate::traits::KVStore;
use async_trait::async_trait;
use redb::ReadableTable;
use std::path::Path;
use std::sync::Arc;
use tracing::instrument;
const DATA_TABLE: redb::TableDefinition<&[u8], &[u8]> = redb::TableDefinition::new("data");
fn redb_err(e: impl std::fmt::Display) -> StorageError {
StorageError::Backend(e.to_string())
}
/// Compute the lexicographic successor of a byte prefix.
///
/// Returns `None` if the prefix is all `0xFF` (no successor possible).
fn prefix_successor(prefix: &[u8]) -> Option<Vec<u8>> {
let mut end = prefix.to_vec();
while let Some(last) = end.last_mut() {
if *last < 0xFF {
*last += 1;
return Some(end);
}
end.pop();
}
None
}
/// Redb (B-tree) implementation of the KVStore trait.
///
/// Used for read-heavy key prefixes: indexes (`S:`, `SP:`), materialized views (`MV:`),
/// trust ranks (`TRUST:`), audits (`AUD:`), quotas (`QUOTA:`), trust packs (`TP:`),
/// gold standards (`GS:`), and escalations (`ESC:`).
pub struct RedbStore {
db: Arc<redb::Database>,
_temp_dir: Option<tempfile::TempDir>,
}
impl std::fmt::Debug for RedbStore {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("RedbStore").finish()
}
}
impl RedbStore {
/// Open or create a Redb database at the given path.
#[instrument(skip_all)]
pub fn open(path: impl AsRef<Path>) -> Result<Self> {
let db = redb::Database::create(path.as_ref()).map_err(redb_err)?;
Ok(Self { db: Arc::new(db), _temp_dir: None })
}
/// Open a temporary Redb database for testing.
///
/// The database will be automatically deleted when the returned store is dropped.
pub fn open_temp() -> Result<Self> {
let temp_dir = tempfile::tempdir().map_err(StorageError::Io)?;
let db_path = temp_dir.path().join("data.redb");
let db = redb::Database::create(&db_path).map_err(redb_err)?;
Ok(Self { db: Arc::new(db), _temp_dir: Some(temp_dir) })
}
}
#[async_trait]
impl KVStore for RedbStore {
#[instrument(skip_all, fields(key_len = key.len()))]
async fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> {
let read_txn = self.db.begin_read().map_err(redb_err)?;
let table = match read_txn.open_table(DATA_TABLE) {
Ok(t) => t,
Err(redb::TableError::TableDoesNotExist(_)) => return Ok(None),
Err(e) => return Err(redb_err(e)),
};
match table.get(key).map_err(redb_err)? {
Some(guard) => Ok(Some(guard.value().to_vec())),
None => Ok(None),
}
}
#[instrument(skip_all, fields(key_len = key.len(), value_len = value.len()))]
async fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
let write_txn = self.db.begin_write().map_err(redb_err)?;
{
let mut table = write_txn.open_table(DATA_TABLE).map_err(redb_err)?;
table.insert(key, value).map_err(redb_err)?;
}
write_txn.commit().map_err(redb_err)?;
Ok(())
}
#[instrument(skip_all, fields(key_len = key.len()))]
async fn delete(&self, key: &[u8]) -> Result<()> {
let write_txn = self.db.begin_write().map_err(redb_err)?;
{
let mut table = write_txn.open_table(DATA_TABLE).map_err(redb_err)?;
table.remove(key).map_err(redb_err)?;
}
write_txn.commit().map_err(redb_err)?;
Ok(())
}
#[instrument(skip_all, fields(prefix_len = prefix.len()))]
async fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
let read_txn = self.db.begin_read().map_err(redb_err)?;
let table = match read_txn.open_table(DATA_TABLE) {
Ok(t) => t,
Err(redb::TableError::TableDoesNotExist(_)) => return Ok(Vec::new()),
Err(e) => return Err(redb_err(e)),
};
let mut results = Vec::new();
match prefix_successor(prefix) {
Some(end_key) => {
let range = table.range(prefix..end_key.as_slice()).map_err(redb_err)?;
for entry in range {
let (k, v) = entry.map_err(redb_err)?;
results.push((k.value().to_vec(), v.value().to_vec()));
}
}
None => {
// prefix is all 0xFF — scan from prefix to end
let range = table.range(prefix..).map_err(redb_err)?;
for entry in range {
let (k, v) = entry.map_err(redb_err)?;
results.push((k.value().to_vec(), v.value().to_vec()));
}
}
}
Ok(results)
}
#[instrument(skip_all)]
async fn flush(&self) -> Result<()> {
// redb is always durable after commit — flush is a no-op
Ok(())
}
#[instrument(skip_all, fields(key_len = key.len(), delta))]
async fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64> {
let write_txn = self.db.begin_write().map_err(redb_err)?;
let new_val = {
let mut table = write_txn.open_table(DATA_TABLE).map_err(redb_err)?;
let current = match table.get(key).map_err(redb_err)? {
Some(guard) => {
let arr: [u8; 8] = guard.value().try_into().map_err(|_| {
StorageError::Serialization(format!(
"Corrupted u64 counter: expected 8 bytes, got {}",
guard.value().len()
))
})?;
u64::from_le_bytes(arr)
}
None => 0,
};
let new_val = current.saturating_add(delta);
table.insert(key, new_val.to_le_bytes().as_slice()).map_err(redb_err)?;
new_val
};
write_txn.commit().map_err(redb_err)?;
Ok(new_val)
}
#[instrument(skip_all, fields(key_len = key.len()))]
async fn compare_and_swap_f32<F>(&self, key: &[u8], update_fn: F) -> Result<f32>
where
F: Fn(f32) -> f32 + Send + Sync,
{
let write_txn = self.db.begin_write().map_err(redb_err)?;
let new_val = {
let mut table = write_txn.open_table(DATA_TABLE).map_err(redb_err)?;
let current = match table.get(key).map_err(redb_err)? {
Some(guard) => {
let arr: [u8; 4] = guard.value().try_into().map_err(|_| {
StorageError::Serialization(format!(
"Corrupted f32 value: expected 4 bytes, got {}",
guard.value().len()
))
})?;
f32::from_le_bytes(arr)
}
None => 0.0,
};
let new_val = update_fn(current);
table.insert(key, new_val.to_le_bytes().as_slice()).map_err(redb_err)?;
new_val
};
write_txn.commit().map_err(redb_err)?;
Ok(new_val)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn test_redb_store_roundtrip() {
let store = RedbStore::open_temp().expect("Failed to create temp DB");
let key = b"test_key";
let value = b"test_value";
store.put(key, value).await.expect("Put failed");
let retrieved = store.get(key).await.expect("Get failed");
assert_eq!(retrieved, Some(value.to_vec()));
store.delete(key).await.expect("Delete failed");
let deleted = store.get(key).await.expect("Get failed");
assert_eq!(deleted, None);
}
#[tokio::test]
async fn test_redb_scan_prefix() {
let store = RedbStore::open_temp().expect("Failed to create temp DB");
store.put(b"prefix:1", b"val1").await.unwrap();
store.put(b"prefix:2", b"val2").await.unwrap();
store.put(b"other:3", b"val3").await.unwrap();
let results = store.scan_prefix(b"prefix:").await.unwrap();
assert_eq!(results.len(), 2);
assert_eq!(results[0], (b"prefix:1".to_vec(), b"val1".to_vec()));
assert_eq!(results[1], (b"prefix:2".to_vec(), b"val2".to_vec()));
}
#[tokio::test]
async fn test_redb_fetch_and_add() {
let store = RedbStore::open_temp().expect("Failed to create temp DB");
let key = b"counter";
let val = store.fetch_and_add_u64(key, 5).await.unwrap();
assert_eq!(val, 5);
let val = store.fetch_and_add_u64(key, 3).await.unwrap();
assert_eq!(val, 8);
}
#[tokio::test]
async fn test_redb_compare_and_swap_f32() {
let store = RedbStore::open_temp().expect("Failed to create temp DB");
let key = b"weight";
let val = store.compare_and_swap_f32(key, |current| current + 1.5).await.unwrap();
assert!((val - 1.5).abs() < f32::EPSILON);
let val = store.compare_and_swap_f32(key, |current| current + 2.0).await.unwrap();
assert!((val - 3.5).abs() < f32::EPSILON);
}
#[tokio::test]
async fn test_redb_flush() {
let store = RedbStore::open_temp().expect("Failed to create temp DB");
store.put(b"key", b"value").await.unwrap();
store.flush().await.expect("Flush should succeed");
}
#[tokio::test]
async fn test_redb_get_nonexistent_table() {
let store = RedbStore::open_temp().expect("Failed to create temp DB");
// Get from empty database (table doesn't exist yet)
let result = store.get(b"missing").await.unwrap();
assert_eq!(result, None);
}
#[tokio::test]
async fn test_redb_scan_prefix_empty_table() {
let store = RedbStore::open_temp().expect("Failed to create temp DB");
// Scan from empty database
let results = store.scan_prefix(b"prefix:").await.unwrap();
assert!(results.is_empty());
}
#[test]
fn test_prefix_successor() {
assert_eq!(prefix_successor(b"abc"), Some(b"abd".to_vec()));
assert_eq!(prefix_successor(b"ab\xff"), Some(b"ac".to_vec()));
assert_eq!(prefix_successor(b"\xff\xff\xff"), None);
assert_eq!(prefix_successor(b""), None);
assert_eq!(prefix_successor(b"a\xff\xff"), Some(b"b".to_vec()));
}
}
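
Review note: `scan_prefix` turns a prefix query into a half-open range `[prefix, successor(prefix))`. The same idea can be tried without redb on a standard `BTreeMap`. This is a sketch under that substitution, not the backend's code. The `scan_prefix` function here is a hypothetical free function over an in-memory map.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Smallest byte string greater than every string starting with `prefix`.
/// Returns `None` when no such bound exists (empty or all-0xFF prefix).
fn prefix_successor(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    // Drop trailing 0xFF bytes, then bump the first byte that can grow.
    while let Some(last) = end.pop() {
        if last < 0xFF {
            end.push(last + 1);
            return Some(end);
        }
    }
    None
}

/// Collect all entries whose key starts with `prefix`, in key order.
fn scan_prefix(
    map: &BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<(Vec<u8>, Vec<u8>)> {
    let end = prefix_successor(prefix);
    // No successor means the range is open-ended on the right.
    let upper: Bound<&[u8]> = match end.as_deref() {
        Some(e) => Bound::Excluded(e),
        None => Bound::Unbounded,
    };
    map.range::<[u8], _>((Bound::Included(prefix), upper))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```

The exclusive upper bound avoids a per-key `starts_with` check during iteration. An empty prefix has no successor and therefore scans the whole map.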

@ -1,156 +0,0 @@
use crate::error::{Result, StorageError};
use crate::traits::KVStore;
use async_trait::async_trait;
use sled::Db;
use std::path::Path;
/// Sled-based implementation of the KVStore trait.
#[derive(Debug, Clone)]
pub struct SledStore {
db: Db,
}
impl SledStore {
/// Open or create a new Sled database at the given path.
pub fn open(path: impl AsRef<Path>) -> Result<Self> {
let db = sled::open(path).map_err(StorageError::Sled)?;
Ok(Self { db })
}
/// Open a temporary Sled database for testing.
///
/// The database will be automatically deleted when dropped.
/// Useful for unit tests in this and other crates.
pub fn open_temp() -> Result<Self> {
let config = sled::Config::new().temporary(true);
let db = config.open().map_err(StorageError::Sled)?;
Ok(Self { db })
}
}
#[async_trait]
impl KVStore for SledStore {
async fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> {
let result = self.db.get(key).map_err(StorageError::Sled)?;
Ok(result.map(|ivec| ivec.to_vec()))
}
async fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
self.db.insert(key, value).map_err(StorageError::Sled)?;
Ok(())
}
async fn delete(&self, key: &[u8]) -> Result<()> {
self.db.remove(key).map_err(StorageError::Sled)?;
Ok(())
}
async fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
let iter = self.db.scan_prefix(prefix);
let mut results = Vec::new();
for item in iter {
let (k, v) = item.map_err(StorageError::Sled)?;
results.push((k.to_vec(), v.to_vec()));
}
Ok(results)
}
async fn flush(&self) -> Result<()> {
self.db.flush_async().await.map_err(StorageError::Sled)?;
Ok(())
}
async fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64> {
let result = self
.db
.update_and_fetch(key, |old| {
let current = match old {
Some(bytes) => match <[u8; 8]>::try_from(bytes) {
Ok(arr) => u64::from_le_bytes(arr),
Err(_) => 0, // Corrupted data, start fresh
},
None => 0, // Key doesn't exist, start at 0
};
Some(current.saturating_add(delta).to_le_bytes().to_vec())
})
.map_err(StorageError::Sled)?;
// Result is Some because our update_fn always returns Some
let bytes = result.ok_or_else(|| {
StorageError::Serialization("fetch_and_add_u64 returned None unexpectedly".to_string())
})?;
let arr: [u8; 8] = bytes.as_ref().try_into().map_err(|_| {
StorageError::Serialization("fetch_and_add_u64 returned wrong size".to_string())
})?;
Ok(u64::from_le_bytes(arr))
}
async fn compare_and_swap_f32<F>(&self, key: &[u8], update_fn: F) -> Result<f32>
where
F: Fn(f32) -> f32 + Send + Sync,
{
let result = self
.db
.update_and_fetch(key, |old| {
let current = match old {
Some(bytes) => match <[u8; 4]>::try_from(bytes) {
Ok(arr) => f32::from_le_bytes(arr),
Err(_) => 0.0, // Corrupted data, start fresh
},
None => 0.0, // Key doesn't exist, start at 0.0
};
let new_value = update_fn(current);
Some(new_value.to_le_bytes().to_vec())
})
.map_err(StorageError::Sled)?;
let bytes = result.ok_or_else(|| {
StorageError::Serialization(
"compare_and_swap_f32 returned None unexpectedly".to_string(),
)
})?;
let arr: [u8; 4] = bytes.as_ref().try_into().map_err(|_| {
StorageError::Serialization("compare_and_swap_f32 returned wrong size".to_string())
})?;
Ok(f32::from_le_bytes(arr))
}
}
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn test_sled_store_roundtrip() {
let store = SledStore::open_temp().expect("Failed to create temp DB");
let key = b"test_key";
let value = b"test_value";
// Put
store.put(key, value).await.expect("Put failed");
// Get
let retrieved = store.get(key).await.expect("Get failed");
assert_eq!(retrieved, Some(value.to_vec()));
// Delete
store.delete(key).await.expect("Delete failed");
// Get after delete
let deleted = store.get(key).await.expect("Get failed");
assert_eq!(deleted, None);
}
#[tokio::test]
async fn test_scan_prefix() {
let store = SledStore::open_temp().expect("Failed to create temp DB");
store.put(b"prefix:1", b"val1").await.unwrap();
store.put(b"prefix:2", b"val2").await.unwrap();
store.put(b"other:3", b"val3").await.unwrap();
let results = store.scan_prefix(b"prefix:").await.unwrap();
assert_eq!(results.len(), 2);
assert_eq!(results[0], (b"prefix:1".to_vec(), b"val1".to_vec()));
assert_eq!(results[1], (b"prefix:2".to_vec(), b"val2".to_vec()));
}
}
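The counter logic in `fetch_and_add_u64` above can be exercised without sled. The following is a minimal sketch, assuming a plain `BTreeMap` as a stand-in for the database; `decode_counter` and `fetch_and_add` are hypothetical names used only for illustration. It shows the same rules: a missing or wrongly sized value is treated as 0, the addition saturates, and the result is stored as 8 little-endian bytes.

```rust
use std::collections::BTreeMap;
use std::convert::TryFrom;

/// Decode a stored counter. A missing key or a value that is not exactly
/// 8 bytes is treated as 0, mirroring the fallback in `fetch_and_add_u64`.
fn decode_counter(old: Option<&[u8]>) -> u64 {
    match old {
        Some(bytes) => match <[u8; 8]>::try_from(bytes) {
            Ok(arr) => u64::from_le_bytes(arr),
            Err(_) => 0, // wrong size, start fresh
        },
        None => 0, // key absent, start at 0
    }
}

/// Hypothetical in-memory stand-in for the store (not the sled API):
/// add `delta` to the counter at `key` and return the new value.
fn fetch_and_add(map: &mut BTreeMap<Vec<u8>, Vec<u8>>, key: &[u8], delta: u64) -> u64 {
    let current = decode_counter(map.get(key).map(|v| v.as_slice()));
    // Saturate instead of wrapping so the counter never rolls over to 0.
    let next = current.saturating_add(delta);
    map.insert(key.to_vec(), next.to_le_bytes().to_vec());
    next
}
```

Unlike sled's `update_and_fetch`, this stand-in is not atomic across threads; it only illustrates the decode, add, and encode steps.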


@ -20,17 +20,13 @@
//! 5. Audit trail preserved: "Who fixed it? When? Why?" //! 5. Audit trail preserved: "Who fixed it? When? Why?"
use crate::error::{Result, StorageError}; use crate::error::{Result, StorageError};
use crate::key_codec;
use crate::traits::KVStore; use crate::traits::KVStore;
use async_trait::async_trait; use async_trait::async_trait;
use stemedb_core::serde::{deserialize, serialize}; use stemedb_core::serde::{deserialize, serialize};
use stemedb_core::types::{Hash, Supersession}; use stemedb_core::types::{Hash, Supersession};
use tracing::{debug, instrument}; use tracing::{debug, instrument};
/// Key prefix for supersession records.
const SUPERSESSION_PREFIX: &[u8] = b"SUP:";
/// Key prefix for agent supersession index.
const SUPERSESSION_INDEX_PREFIX: &[u8] = b"SUP:IDX:";
/// Specialized storage trait for supersession operations. /// Specialized storage trait for supersession operations.
/// ///
/// This trait provides supersession-specific operations on top of a generic KVStore, /// This trait provides supersession-specific operations on top of a generic KVStore,
@ -95,30 +91,6 @@ impl<S: KVStore> GenericSupersessionStore<S> {
pub fn new(store: S) -> Self { pub fn new(store: S) -> Self {
Self { store } Self { store }
} }
/// Build the key for a supersession record.
fn supersession_key(target_hash: &Hash) -> Vec<u8> {
let mut key = SUPERSESSION_PREFIX.to_vec();
key.extend_from_slice(target_hash);
key
}
/// Build the key for the agent supersession index.
fn index_key(agent_id: &[u8; 32], timestamp: u64) -> Vec<u8> {
let mut key = SUPERSESSION_INDEX_PREFIX.to_vec();
key.extend_from_slice(agent_id);
key.push(b':');
key.extend_from_slice(&timestamp.to_be_bytes());
key
}
/// Build the prefix for scanning an agent's supersessions.
fn index_prefix(agent_id: &[u8; 32]) -> Vec<u8> {
let mut key = SUPERSESSION_INDEX_PREFIX.to_vec();
key.extend_from_slice(agent_id);
key.push(b':');
key
}
} }
#[async_trait] #[async_trait]
@ -131,11 +103,15 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
})?; })?;
// Store at primary key // Store at primary key
let key = Self::supersession_key(&supersession.target_hash); let key = key_codec::supersession_key(&hex::encode(supersession.target_hash));
self.store.put(&key, &bytes).await?; self.store.put(&key, &bytes).await?;
// Store index entry (value is the target_hash for lookup) // Store index entry (value is the target_hash for lookup)
let index_key = Self::index_key(&supersession.agent_id, supersession.timestamp); let timestamp_bytes = supersession.timestamp.to_be_bytes();
let index_key = key_codec::supersession_index_key(
&hex::encode(supersession.agent_id),
&timestamp_bytes,
);
self.store.put(&index_key, &supersession.target_hash).await?; self.store.put(&index_key, &supersession.target_hash).await?;
debug!( debug!(
@ -149,7 +125,7 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
#[instrument(skip(self))] #[instrument(skip(self))]
async fn get_supersession(&self, target_hash: &Hash) -> Result<Option<Supersession>> { async fn get_supersession(&self, target_hash: &Hash) -> Result<Option<Supersession>> {
let key = Self::supersession_key(target_hash); let key = key_codec::supersession_key(&hex::encode(target_hash));
match self.store.get(&key).await? { match self.store.get(&key).await? {
Some(bytes) => { Some(bytes) => {
@ -167,7 +143,7 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
#[instrument(skip(self))] #[instrument(skip(self))]
async fn is_superseded(&self, target_hash: &Hash) -> Result<bool> { async fn is_superseded(&self, target_hash: &Hash) -> Result<bool> {
let key = Self::supersession_key(target_hash); let key = key_codec::supersession_key(&hex::encode(target_hash));
Ok(self.store.get(&key).await?.is_some()) Ok(self.store.get(&key).await?.is_some())
} }
@ -179,7 +155,7 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
to_timestamp: Option<u64>, to_timestamp: Option<u64>,
limit: Option<usize>, limit: Option<usize>,
) -> Result<Vec<Supersession>> { ) -> Result<Vec<Supersession>> {
let prefix = Self::index_prefix(agent_id); let prefix = key_codec::supersession_index_prefix(&hex::encode(agent_id));
let entries = self.store.scan_prefix(&prefix).await?; let entries = self.store.scan_prefix(&prefix).await?;
let to_ts = to_timestamp.unwrap_or(u64::MAX); let to_ts = to_timestamp.unwrap_or(u64::MAX);
@ -188,9 +164,11 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
let mut supersessions = Vec::new(); let mut supersessions = Vec::new();
for (key, target_hash_bytes) in entries { for (key, target_hash_bytes) in entries {
// Extract timestamp from key (last 8 bytes after the prefix + agent_id + colon) // Extract timestamp from key
// Key format: SUP:IDX:{agent_id}:{timestamp} // Key format: \x00SUP:IDX:{agent_hex}:{timestamp_be_bytes}
let timestamp_start = SUPERSESSION_INDEX_PREFIX.len() + 32 + 1; // prefix + agent_id + ':' // We need to find the last colon and extract the 8 bytes after it
if let Some(last_colon_pos) = key.iter().rposition(|&b| b == b':') {
let timestamp_start = last_colon_pos + 1;
if key.len() < timestamp_start + 8 { if key.len() < timestamp_start + 8 {
continue; // Malformed key continue; // Malformed key
} }
@ -223,6 +201,7 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
} }
} }
} }
}
// Sort by timestamp descending (most recent first) // Sort by timestamp descending (most recent first)
supersessions.sort_by(|a, b| b.timestamp.cmp(&a.timestamp)); supersessions.sort_by(|a, b| b.timestamp.cmp(&a.timestamp));
@ -234,13 +213,13 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
use crate::SledStore; use crate::HybridStore;
use stemedb_core::types::SupersessionType; use stemedb_core::types::SupersessionType;
use tempfile::tempdir; use tempfile::tempdir;
async fn create_test_store() -> GenericSupersessionStore<SledStore> { async fn create_test_store() -> GenericSupersessionStore<HybridStore> {
let dir = tempdir().expect("Failed to create temp dir"); let dir = tempdir().expect("Failed to create temp dir");
let store = SledStore::open(dir.path()).expect("Failed to open store"); let store = HybridStore::open(dir.path()).expect("Failed to open store");
GenericSupersessionStore::new(store) GenericSupersessionStore::new(store)
} }
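One caveat about locating the timestamp by searching for the last colon: a big-endian `u64` can itself contain the byte `0x3A` (`:`), for example the timestamp 58, so the search may land inside the timestamp rather than on the separator. The sketch below shows an alternative that relies on the fixed 8-byte suffix instead. It assumes the `\x00SUP:IDX:{agent_hex}:{timestamp_be_bytes}` layout from the comment in the diff; `INDEX_PREFIX`, `index_key`, and `timestamp_from_key` are hypothetical names, since the real `key_codec` functions are not shown here.

```rust
/// Assumed prefix, taken from the key-format comment; the real value
/// lives in `key_codec`, which is not part of this excerpt.
const INDEX_PREFIX: &[u8] = b"\x00SUP:IDX:";

/// Build `{prefix}{agent_hex}:{timestamp_be_bytes}` (hypothetical helper).
fn index_key(agent_hex: &str, timestamp: u64) -> Vec<u8> {
    let mut key = INDEX_PREFIX.to_vec();
    key.extend_from_slice(agent_hex.as_bytes());
    key.push(b':');
    key.extend_from_slice(&timestamp.to_be_bytes());
    key
}

/// Read the timestamp from the fixed-width 8-byte suffix and require the
/// separator ':' immediately before it. Returns None for malformed keys.
fn timestamp_from_key(key: &[u8]) -> Option<u64> {
    let split = key.len().checked_sub(8)?;
    if split == 0 || key[split - 1] != b':' {
        return None;
    }
    let mut arr = [0u8; 8];
    arr.copy_from_slice(&key[split..]); // slice is exactly 8 bytes long
    Some(u64::from_be_bytes(arr))
}
```

Because the suffix width is fixed, this parse does not depend on which byte values appear inside the timestamp.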


@ -4,7 +4,7 @@ use std::sync::Arc;
/// Abstract interface for Key-Value storage backends. /// Abstract interface for Key-Value storage backends.
/// ///
/// This trait allows us to swap the underlying storage engine (e.g., sled, RocksDB) /// This trait allows us to swap the underlying storage engine (e.g., fjall, redb)
/// without changing the core logic of the database. /// without changing the core logic of the database.
#[async_trait] #[async_trait]
pub trait KVStore: Send + Sync { pub trait KVStore: Send + Sync {


@ -8,7 +8,7 @@
//! //!
//! | Key Pattern | Value | Purpose | //! | Key Pattern | Value | Purpose |
//! |-------------|-------|---------| //! |-------------|-------|---------|
//! | `TP:{pack_id}` | Serialized TrustPack | Pack definition and agent membership | //! | `\x00TP:{pack_id}` | Serialized TrustPack | Pack definition and agent membership |
//! //!
//! # Design Philosophy //! # Design Philosophy
//! //!
@ -21,14 +21,12 @@
//! All operations are defensive against missing data (missing pack returns None). //! All operations are defensive against missing data (missing pack returns None).
use crate::error::{Result, StorageError}; use crate::error::{Result, StorageError};
use crate::key_codec;
use crate::traits::KVStore; use crate::traits::KVStore;
use async_trait::async_trait; use async_trait::async_trait;
use stemedb_core::types::{PackId, TrustPack}; use stemedb_core::types::{PackId, TrustPack};
use tracing::{debug, instrument}; use tracing::{debug, instrument};
/// Key prefix for TrustPack entries.
const TRUST_PACK_PREFIX: &[u8] = b"TP:";
/// Specialized storage trait for TrustPack operations. /// Specialized storage trait for TrustPack operations.
/// ///
/// This trait provides pack-specific operations on top of a generic KVStore, /// This trait provides pack-specific operations on top of a generic KVStore,
@ -127,13 +125,6 @@ impl<S: KVStore> GenericTrustPackStore<S> {
Self { store } Self { store }
} }
/// Construct the key for a TrustPack entry.
fn trust_pack_key(pack_id: &PackId) -> Vec<u8> {
let mut key = TRUST_PACK_PREFIX.to_vec();
key.extend_from_slice(pack_id);
key
}
/// Serialize a TrustPack using the canonical serde helpers. /// Serialize a TrustPack using the canonical serde helpers.
fn serialize_pack(pack: &TrustPack) -> Result<Vec<u8>> { fn serialize_pack(pack: &TrustPack) -> Result<Vec<u8>> {
crate::serde_helpers::serialize(pack) crate::serde_helpers::serialize(pack)
@ -150,7 +141,7 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
#[instrument(skip(self, pack), fields(pack_id = %hex::encode(pack.id), pack_name = %pack.name, agent_count = pack.agent_count()))] #[instrument(skip(self, pack), fields(pack_id = %hex::encode(pack.id), pack_name = %pack.name, agent_count = pack.agent_count()))]
async fn put_pack(&self, pack: &TrustPack) -> Result<PackId> { async fn put_pack(&self, pack: &TrustPack) -> Result<PackId> {
let serialized = Self::serialize_pack(pack)?; let serialized = Self::serialize_pack(pack)?;
let key = Self::trust_pack_key(&pack.id); let key = key_codec::trust_pack_key(&pack.id);
self.store.put(&key, &serialized).await?; self.store.put(&key, &serialized).await?;
debug!( debug!(
@ -163,7 +154,7 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
#[instrument(skip(self), fields(pack_id = %hex::encode(pack_id)))] #[instrument(skip(self), fields(pack_id = %hex::encode(pack_id)))]
async fn get_pack(&self, pack_id: &PackId) -> Result<Option<TrustPack>> { async fn get_pack(&self, pack_id: &PackId) -> Result<Option<TrustPack>> {
let key = Self::trust_pack_key(pack_id); let key = key_codec::trust_pack_key(pack_id);
match self.store.get(&key).await? { match self.store.get(&key).await? {
Some(data) => { Some(data) => {
let pack = Self::deserialize_pack(&data)?; let pack = Self::deserialize_pack(&data)?;
@ -192,7 +183,7 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
let new_count = pack.agent_count(); let new_count = pack.agent_count();
let serialized = Self::serialize_pack(&pack)?; let serialized = Self::serialize_pack(&pack)?;
let key = Self::trust_pack_key(pack_id); let key = key_codec::trust_pack_key(pack_id);
self.store.put(&key, &serialized).await?; self.store.put(&key, &serialized).await?;
debug!(old_count, new_count, added = new_count > old_count, "Updated pack membership"); debug!(old_count, new_count, added = new_count > old_count, "Updated pack membership");
@ -211,7 +202,7 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
let new_count = pack.agent_count(); let new_count = pack.agent_count();
let serialized = Self::serialize_pack(&pack)?; let serialized = Self::serialize_pack(&pack)?;
let key = Self::trust_pack_key(pack_id); let key = key_codec::trust_pack_key(pack_id);
self.store.put(&key, &serialized).await?; self.store.put(&key, &serialized).await?;
debug!(old_count, new_count, removed = new_count < old_count, "Updated pack membership"); debug!(old_count, new_count, removed = new_count < old_count, "Updated pack membership");
@ -236,16 +227,17 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
#[instrument(skip(self))] #[instrument(skip(self))]
async fn list_packs(&self) -> Result<Vec<PackId>> { async fn list_packs(&self) -> Result<Vec<PackId>> {
let prefix = TRUST_PACK_PREFIX.to_vec(); let prefix = key_codec::trust_pack_scan_prefix();
let entries = self.store.scan_prefix(&prefix).await?; let entries = self.store.scan_prefix(&prefix).await?;
let pack_ids: Vec<PackId> = entries let pack_ids: Vec<PackId> = entries
.into_iter() .into_iter()
.filter_map(|(key, _data)| { .filter_map(|(key, _data)| {
// Extract pack_id from key: "TP:{pack_id}" // Extract pack_id from key: "\x00TP:{pack_id}"
if key.len() == TRUST_PACK_PREFIX.len() + 32 { // Key format: \x00 (1 byte) + "TP:" (3 bytes) + pack_id (32 bytes) = 36 bytes
if key.len() == 36 {
let mut pack_id = [0u8; 32]; let mut pack_id = [0u8; 32];
pack_id.copy_from_slice(&key[TRUST_PACK_PREFIX.len()..]); pack_id.copy_from_slice(&key[4..]); // Skip \x00TP:
Some(pack_id) Some(pack_id)
} else { } else {
None None
@ -261,7 +253,8 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
use crate::SledStore; use crate::HybridStore;
use std::sync::Arc;
fn create_test_pack(id: PackId, name: &str, maintainer: [u8; 32]) -> TrustPack { fn create_test_pack(id: PackId, name: &str, maintainer: [u8; 32]) -> TrustPack {
TrustPack::new(id, name.to_string(), maintainer) TrustPack::new(id, name.to_string(), maintainer)
@ -269,7 +262,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_put_and_get_pack() { async fn test_put_and_get_pack() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let pack_store = GenericTrustPackStore::new(store); let pack_store = GenericTrustPackStore::new(store);
let pack_id = [1u8; 32]; let pack_id = [1u8; 32];
@ -292,7 +285,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_add_agent_idempotent() { async fn test_add_agent_idempotent() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let pack_store = GenericTrustPackStore::new(store); let pack_store = GenericTrustPackStore::new(store);
let pack_id = [2u8; 32]; let pack_id = [2u8; 32];
@ -315,7 +308,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_remove_agent() { async fn test_remove_agent() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let pack_store = GenericTrustPackStore::new(store); let pack_store = GenericTrustPackStore::new(store);
let pack_id = [3u8; 32]; let pack_id = [3u8; 32];
@ -345,7 +338,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_contains_agent() { async fn test_contains_agent() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let pack_store = GenericTrustPackStore::new(store); let pack_store = GenericTrustPackStore::new(store);
let pack_id = [4u8; 32]; let pack_id = [4u8; 32];
@ -367,7 +360,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_list_packs() { async fn test_list_packs() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let pack_store = GenericTrustPackStore::new(store); let pack_store = GenericTrustPackStore::new(store);
// Initially empty // Initially empty
@ -394,7 +387,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_missing_pack_returns_none() { async fn test_missing_pack_returns_none() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let pack_store = GenericTrustPackStore::new(store); let pack_store = GenericTrustPackStore::new(store);
let nonexistent_id = [99u8; 32]; let nonexistent_id = [99u8; 32];
@ -411,7 +404,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_add_to_missing_pack_errors() { async fn test_add_to_missing_pack_errors() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let pack_store = GenericTrustPackStore::new(store); let pack_store = GenericTrustPackStore::new(store);
let nonexistent_id = [98u8; 32]; let nonexistent_id = [98u8; 32];
@ -424,7 +417,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_remove_from_missing_pack_errors() { async fn test_remove_from_missing_pack_errors() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let pack_store = GenericTrustPackStore::new(store); let pack_store = GenericTrustPackStore::new(store);
let nonexistent_id = [97u8; 32]; let nonexistent_id = [97u8; 32];
@ -446,9 +439,9 @@ mod tests {
original.add_agent([3u8; 32]); original.add_agent([3u8; 32]);
let serialized = let serialized =
GenericTrustPackStore::<SledStore>::serialize_pack(&original).expect("serialize"); GenericTrustPackStore::<HybridStore>::serialize_pack(&original).expect("serialize");
let deserialized = let deserialized = GenericTrustPackStore::<HybridStore>::deserialize_pack(&serialized)
GenericTrustPackStore::<SledStore>::deserialize_pack(&serialized).expect("deserialize"); .expect("deserialize");
assert_eq!(original, deserialized); assert_eq!(original, deserialized);
assert_eq!(deserialized.agent_count(), 3); assert_eq!(deserialized.agent_count(), 3);
@ -456,7 +449,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_multiple_agents() { async fn test_multiple_agents() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let pack_store = GenericTrustPackStore::new(store); let pack_store = GenericTrustPackStore::new(store);
let pack_id = [5u8; 32]; let pack_id = [5u8; 32];
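The `list_packs` change hard-codes the key length (36) and the prefix offset (4). As a sketch of the same decoding without magic numbers, assuming the `\x00TP:{pack_id}` layout from the module docs, the prefix can be stripped and the remainder length-checked. `TRUST_PACK_PREFIX`, `trust_pack_key`, and `pack_id_from_key` here are illustrative stand-ins, not the actual `key_codec` API.

```rust
/// Assumed prefix, taken from the `\x00TP:{pack_id}` key pattern.
const TRUST_PACK_PREFIX: &[u8] = b"\x00TP:";

/// Build the key for a pack: prefix (4 bytes) followed by the 32-byte id.
fn trust_pack_key(pack_id: &[u8; 32]) -> Vec<u8> {
    let mut key = TRUST_PACK_PREFIX.to_vec();
    key.extend_from_slice(pack_id);
    key
}

/// Recover the pack id from a key, or None if the prefix or length is wrong.
fn pack_id_from_key(key: &[u8]) -> Option<[u8; 32]> {
    let rest = key.strip_prefix(TRUST_PACK_PREFIX)?;
    if rest.len() != 32 {
        return None;
    }
    let mut id = [0u8; 32];
    id.copy_from_slice(rest);
    Some(id)
}
```

Deriving the offsets from the prefix constant keeps the encoder and decoder in step if the prefix ever changes.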


@ -1,8 +1,9 @@
//! Basic tests for TrustRank model and CRUD operations. //! Basic tests for TrustRank model and CRUD operations.
use super::*; use super::*;
use crate::SledStore; use crate::HybridStore;
use model::{DEFAULT_HALF_LIFE_SECONDS, DEFAULT_TRUST_SCORE}; use model::{DEFAULT_HALF_LIFE_SECONDS, DEFAULT_TRUST_SCORE};
use std::sync::Arc;
#[test] #[test]
fn test_trust_rank_new() { fn test_trust_rank_new() {
@ -88,7 +89,7 @@ fn test_decay_no_time_elapsed() {
#[tokio::test] #[tokio::test]
async fn test_get_default_trust_rank() { async fn test_get_default_trust_rank() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [1u8; 32]; let agent_id = [1u8; 32];
@ -100,7 +101,7 @@ async fn test_get_default_trust_rank() {
#[tokio::test] #[tokio::test]
async fn test_put_and_get_trust_rank() { async fn test_put_and_get_trust_rank() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [2u8; 32]; let agent_id = [2u8; 32];
@ -117,7 +118,7 @@ async fn test_put_and_get_trust_rank() {
#[tokio::test] #[tokio::test]
async fn test_update_trust_rank() { async fn test_update_trust_rank() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [3u8; 32]; let agent_id = [3u8; 32];
@ -133,7 +134,7 @@ async fn test_update_trust_rank() {
#[tokio::test] #[tokio::test]
async fn test_record_outcome_updates_score() { async fn test_record_outcome_updates_score() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [4u8; 32]; let agent_id = [4u8; 32];
@ -155,7 +156,7 @@ async fn test_record_outcome_updates_score() {
#[tokio::test] #[tokio::test]
async fn test_decay_trust_ranks() { async fn test_decay_trust_ranks() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
// Create several agents with different scores // Create several agents with different scores
@ -184,7 +185,7 @@ async fn test_decay_trust_ranks() {
#[tokio::test] #[tokio::test]
async fn test_decay_no_change_skips_update() { async fn test_decay_no_change_skips_update() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [5u8; 32]; let agent_id = [5u8; 32];
@ -208,8 +209,8 @@ async fn test_serialization_roundtrip() {
}; };
let serialized = let serialized =
GenericTrustRankStore::<SledStore>::serialize_trust_rank(&original).expect("serialize"); GenericTrustRankStore::<HybridStore>::serialize_trust_rank(&original).expect("serialize");
let deserialized = GenericTrustRankStore::<SledStore>::deserialize_trust_rank(&serialized) let deserialized = GenericTrustRankStore::<HybridStore>::deserialize_trust_rank(&serialized)
.expect("deserialize"); .expect("deserialize");
assert_eq!(original, deserialized); assert_eq!(original, deserialized);


@ -1,12 +1,13 @@
//! Tests for gold standard verification and advanced TrustRank features. //! Tests for gold standard verification and advanced TrustRank features.
use super::*; use super::*;
use crate::SledStore; use crate::HybridStore;
use std::sync::Arc;
use stemedb_core::types::GoldStandard; use stemedb_core::types::GoldStandard;
#[tokio::test] #[tokio::test]
async fn test_custom_half_life() { async fn test_custom_half_life() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [6u8; 32]; let agent_id = [6u8; 32];
@ -33,7 +34,7 @@ async fn test_custom_half_life() {
#[tokio::test] #[tokio::test]
async fn test_verify_correct_answer() { async fn test_verify_correct_answer() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [7u8; 32]; let agent_id = [7u8; 32];
@ -65,7 +66,7 @@ async fn test_verify_correct_answer() {
#[tokio::test] #[tokio::test]
async fn test_verify_incorrect_answer() { async fn test_verify_incorrect_answer() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [8u8; 32]; let agent_id = [8u8; 32];
@ -97,7 +98,7 @@ async fn test_verify_incorrect_answer() {
#[tokio::test] #[tokio::test]
async fn test_verify_multiple_gold_standards() { async fn test_verify_multiple_gold_standards() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [9u8; 32]; let agent_id = [9u8; 32];
@ -142,7 +143,7 @@ async fn test_verify_multiple_gold_standards() {
#[tokio::test] #[tokio::test]
async fn test_verify_score_clamping() { async fn test_verify_score_clamping() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [10u8; 32]; let agent_id = [10u8; 32];
@ -173,7 +174,7 @@ async fn test_verify_score_clamping() {
#[tokio::test] #[tokio::test]
async fn test_first_gold_standard_verification_succeeds_with_reward() { async fn test_first_gold_standard_verification_succeeds_with_reward() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [11u8; 32]; let agent_id = [11u8; 32];
@ -198,7 +199,7 @@ async fn test_first_gold_standard_verification_succeeds_with_reward() {
#[tokio::test] #[tokio::test]
async fn test_second_gold_standard_verification_returns_already_verified() { async fn test_second_gold_standard_verification_returns_already_verified() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [12u8; 32]; let agent_id = [12u8; 32];
@ -232,7 +233,7 @@ async fn test_second_gold_standard_verification_returns_already_verified() {
#[tokio::test] #[tokio::test]
async fn test_different_gold_standard_verification_works_after_first() { async fn test_different_gold_standard_verification_works_after_first() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [13u8; 32]; let agent_id = [13u8; 32];
@ -277,7 +278,7 @@ async fn test_different_gold_standard_verification_works_after_first() {
#[tokio::test] #[tokio::test]
async fn test_incorrect_answer_penalizes_and_marks_verified() { async fn test_incorrect_answer_penalizes_and_marks_verified() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustRankStore::new(store); let trust_store = GenericTrustRankStore::new(store);
let agent_id = [14u8; 32]; let agent_id = [14u8; 32];


@ -4,6 +4,7 @@
//! including CRUD operations, decay mechanics, and learning loop integration. //! including CRUD operations, decay mechanics, and learning loop integration.
use crate::error::Result; use crate::error::Result;
use crate::key_codec;
use crate::traits::KVStore; use crate::traits::KVStore;
use async_trait::async_trait; use async_trait::async_trait;
use tracing::{debug, instrument}; use tracing::{debug, instrument};
@ -11,16 +12,9 @@ use tracing::{debug, instrument};
use super::model::{TrustRank, DEFAULT_HALF_LIFE_SECONDS}; use super::model::{TrustRank, DEFAULT_HALF_LIFE_SECONDS};
use super::TrustRankStore; use super::TrustRankStore;
/// Key prefix for TrustRank entries.
const TRUST_RANK_PREFIX: &[u8] = b"TR:";
/// Key prefix for gold standard verification markers.
/// Format: GS_VERIFIED:{agent_id_hex}:{subject}:{predicate}
const GS_VERIFIED_PREFIX: &[u8] = b"GS_VERIFIED:";
/// TrustRankStore implementation backed by a generic KVStore. /// TrustRankStore implementation backed by a generic KVStore.
/// ///
/// This implementation stores TrustRank data at `TR:{agent_id}` and provides /// This implementation stores TrustRank data at `\x00TRUST:{agent_id_hex}` and provides
/// all operations for reputation management. /// all operations for reputation management.
pub struct GenericTrustRankStore<S> { pub struct GenericTrustRankStore<S> {
store: S, store: S,
@ -32,25 +26,6 @@ impl<S: KVStore> GenericTrustRankStore<S> {
Self { store } Self { store }
} }
/// Construct the key for a TrustRank entry.
pub(crate) fn trust_rank_key(agent_id: &[u8; 32]) -> Vec<u8> {
let mut key = TRUST_RANK_PREFIX.to_vec();
key.extend_from_slice(agent_id);
key
}
/// Construct the key for a gold standard verification marker.
/// Format: GS_VERIFIED:{agent_id_hex}:{subject}:{predicate}
pub(crate) fn gs_verified_key(agent_id: &[u8; 32], subject: &str, predicate: &str) -> Vec<u8> {
let mut key = GS_VERIFIED_PREFIX.to_vec();
key.extend_from_slice(hex::encode(agent_id).as_bytes());
key.push(b':');
key.extend_from_slice(subject.as_bytes());
key.push(b':');
key.extend_from_slice(predicate.as_bytes());
key
}
/// Serialize a TrustRank using the canonical serde helpers. /// Serialize a TrustRank using the canonical serde helpers.
pub(crate) fn serialize_trust_rank(trust_rank: &TrustRank) -> Result<Vec<u8>> { pub(crate) fn serialize_trust_rank(trust_rank: &TrustRank) -> Result<Vec<u8>> {
crate::serde_helpers::serialize(trust_rank) crate::serde_helpers::serialize(trust_rank)
@ -66,7 +41,7 @@ impl<S: KVStore> GenericTrustRankStore<S> {
impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> { impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))] #[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))]
async fn get_trust_rank(&self, agent_id: &[u8; 32]) -> Result<TrustRank> { async fn get_trust_rank(&self, agent_id: &[u8; 32]) -> Result<TrustRank> {
let key = Self::trust_rank_key(agent_id); let key = key_codec::trust_rank_key(&hex::encode(agent_id));
match self.store.get(&key).await? { match self.store.get(&key).await? {
Some(data) => { Some(data) => {
let trust_rank = Self::deserialize_trust_rank(&data)?; let trust_rank = Self::deserialize_trust_rank(&data)?;
@ -97,7 +72,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
let new_score = trust_rank.adjust_score(delta, timestamp); let new_score = trust_rank.adjust_score(delta, timestamp);
let serialized = Self::serialize_trust_rank(&trust_rank)?; let serialized = Self::serialize_trust_rank(&trust_rank)?;
let key = Self::trust_rank_key(agent_id); let key = key_codec::trust_rank_key(&hex::encode(agent_id));
self.store.put(&key, &serialized).await?; self.store.put(&key, &serialized).await?;
debug!(new_score, "Updated TrustRank"); debug!(new_score, "Updated TrustRank");
@ -111,7 +86,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
half_life_seconds: Option<u64>, half_life_seconds: Option<u64>,
) -> Result<usize> { ) -> Result<usize> {
let half_life = half_life_seconds.unwrap_or(DEFAULT_HALF_LIFE_SECONDS); let half_life = half_life_seconds.unwrap_or(DEFAULT_HALF_LIFE_SECONDS);
let prefix = TRUST_RANK_PREFIX.to_vec(); let prefix = key_codec::trust_rank_scan_prefix();
let entries = self.store.scan_prefix(&prefix).await?; let entries = self.store.scan_prefix(&prefix).await?;
let mut decayed_count = 0; let mut decayed_count = 0;
@ -158,7 +133,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
let new_score = trust_rank.adjust_score(delta, timestamp); let new_score = trust_rank.adjust_score(delta, timestamp);
let serialized = Self::serialize_trust_rank(&trust_rank)?; let serialized = Self::serialize_trust_rank(&trust_rank)?;
let key = Self::trust_rank_key(agent_id); let key = key_codec::trust_rank_key(&hex::encode(agent_id));
self.store.put(&key, &serialized).await?; self.store.put(&key, &serialized).await?;
debug!( debug!(
@ -172,7 +147,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
#[instrument(skip(self, trust_rank), fields(agent_id = %hex::encode(trust_rank.agent_id)))] #[instrument(skip(self, trust_rank), fields(agent_id = %hex::encode(trust_rank.agent_id)))]
async fn put_trust_rank(&self, trust_rank: &TrustRank) -> Result<()> { async fn put_trust_rank(&self, trust_rank: &TrustRank) -> Result<()> {
let serialized = Self::serialize_trust_rank(trust_rank)?; let serialized = Self::serialize_trust_rank(trust_rank)?;
let key = Self::trust_rank_key(&trust_rank.agent_id); let key = key_codec::trust_rank_key(&hex::encode(trust_rank.agent_id));
self.store.put(&key, &serialized).await?; self.store.put(&key, &serialized).await?;
debug!(score = trust_rank.score, "Stored TrustRank"); debug!(score = trust_rank.score, "Stored TrustRank");
Ok(()) Ok(())
@ -194,8 +169,11 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
use super::model::TrustAdjustment; use super::model::TrustAdjustment;
// Check if the agent has already verified this gold standard // Check if the agent has already verified this gold standard
let verified_key = let verified_key = key_codec::gs_verified_key(
Self::gs_verified_key(agent_id, &gold_standard.subject, &gold_standard.predicate); &hex::encode(agent_id),
&gold_standard.subject,
&gold_standard.predicate,
);
if self.store.get(&verified_key).await?.is_some() { if self.store.get(&verified_key).await?.is_some() {
debug!("Agent has already verified this gold standard"); debug!("Agent has already verified this gold standard");
return Ok(TrustAdjustment::AlreadyVerified); return Ok(TrustAdjustment::AlreadyVerified);
@ -224,7 +202,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
// Store the updated trust rank // Store the updated trust rank
let serialized = Self::serialize_trust_rank(&trust_rank)?; let serialized = Self::serialize_trust_rank(&trust_rank)?;
let key = Self::trust_rank_key(agent_id); let key = key_codec::trust_rank_key(&hex::encode(agent_id));
self.store.put(&key, &serialized).await?; self.store.put(&key, &serialized).await?;
// Mark this gold standard as verified by this agent (value is just a timestamp) // Mark this gold standard as verified by this agent (value is just a timestamp)
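The gold-standard check in the hunk above follows a verify-once pattern: look up a marker key, return early if it already exists, otherwise adjust the score and write the marker with a timestamp as its value. The following is a minimal in-memory sketch of that flow, not the crate's implementation. It assumes a `HashMap` in place of the `KVStore`, and a hypothetical marker layout `GSV:{agent_hex}:{subject}:{predicate}` that is not the real `key_codec::gs_verified_key` encoding. `Outcome`, `marker_key`, `verify_once`, and the fixed reward are illustrative names only.

```rust
use std::collections::HashMap;

/// Hypothetical outcome type; the real code returns `TrustAdjustment`.
#[derive(Debug, PartialEq)]
enum Outcome {
    Adjusted(f32),
    AlreadyVerified,
}

/// Hypothetical marker key. The real layout lives in `key_codec`.
fn marker_key(agent_hex: &str, subject: &str, predicate: &str) -> Vec<u8> {
    format!("GSV:{agent_hex}:{subject}:{predicate}").into_bytes()
}

/// Apply `reward` at most once per (agent, subject, predicate).
fn verify_once(
    store: &mut HashMap<Vec<u8>, Vec<u8>>,
    scores: &mut HashMap<String, f32>,
    agent_hex: &str,
    subject: &str,
    predicate: &str,
    reward: f32,
    timestamp: u64,
) -> Outcome {
    let key = marker_key(agent_hex, subject, predicate);

    // Early return if the marker is already present.
    if store.contains_key(&key) {
        return Outcome::AlreadyVerified;
    }

    // Adjust the score first, then record the marker.
    let score = scores.entry(agent_hex.to_string()).or_insert(0.0);
    *score += reward;
    let new_score = *score;

    // The marker value is just the timestamp, little-endian.
    store.insert(key, timestamp.to_le_bytes().to_vec());
    Outcome::Adjusted(new_score)
}

fn main() {
    let mut store = HashMap::new();
    let mut scores = HashMap::new();

    let first = verify_once(&mut store, &mut scores, "ab12", "Tesla", "ceo", 0.5, 1_700_000_000);
    let second = verify_once(&mut store, &mut scores, "ab12", "Tesla", "ceo", 0.5, 1_700_000_100);
    println!("{first:?} then {second:?}");
}
```

The second call leaves the score untouched because the marker written by the first call short-circuits it.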


@ -7,9 +7,9 @@
//! //!
//! | Key Pattern | Value | Purpose | //! | Key Pattern | Value | Purpose |
//! |-------------|-------|---------| //! |-------------|-------|---------|
//! | `V:{assertion_hash}:{vote_hash}` | Serialized Vote | Individual votes | //! | `{subject}\x00V:{assertion_hex}:{vote_hex}` | Serialized Vote | Individual votes |
//! | `VC:{assertion_hash}` | u64 (LE) | Vote count cache | //! | `{subject}\x00VC:{assertion_hex}` | u64 (LE) | Vote count cache |
//! | `VW:{assertion_hash}` | f32 (LE) | Aggregate weight cache | //! | `{subject}\x00VW:{assertion_hex}` | f32 (LE) | Aggregate weight cache |
//! //!
//! # Design Philosophy //! # Design Philosophy
//! //!
@ -29,21 +29,20 @@ use crate::error::Result;
pub use store_impl::GenericVoteStore; pub use store_impl::GenericVoteStore;
/// Key prefix for individual votes.
#[allow(dead_code)] // Documented for reference; actual key construction uses format!()
const VOTE_PREFIX: &[u8] = b"V:";
/// Specialized storage trait for high-velocity vote operations. /// Specialized storage trait for high-velocity vote operations.
/// ///
/// This trait provides vote-specific operations on top of a generic KVStore, /// This trait provides vote-specific operations on top of a generic KVStore,
/// enabling efficient vote ingestion and aggregation for the Ballot Box pattern. /// enabling efficient vote ingestion and aggregation for the Ballot Box pattern.
/// ///
/// All methods require a `subject` parameter to co-locate vote data with the
/// assertion's subject for range sharding.
///
/// # Example /// # Example
/// ///
/// ```ignore /// ```ignore
/// let vote_store = SledVoteStore::new(kv_store); /// let vote_store = GenericVoteStore::new(kv_store);
/// let vote_hash = vote_store.put_vote(&vote).await?; /// let vote_hash = vote_store.put_vote(&vote, "Tesla").await?;
/// let votes = vote_store.get_votes_for_assertion(&assertion_hash).await?; /// let votes = vote_store.get_votes_for_assertion(&assertion_hash, "Tesla").await?;
/// ``` /// ```
#[async_trait] #[async_trait]
pub trait VoteStore: Send + Sync { pub trait VoteStore: Send + Sync {
@ -52,26 +51,32 @@ pub trait VoteStore: Send + Sync {
/// This operation: /// This operation:
/// 1. Serializes the vote using rkyv /// 1. Serializes the vote using rkyv
/// 2. Computes BLAKE3 hash for content addressing /// 2. Computes BLAKE3 hash for content addressing
/// 3. Stores at `V:{assertion_hash}:{vote_hash}` /// 3. Stores at `{subject}\x00V:{assertion_hex}:{vote_hex}`
/// 4. Updates vote count and aggregate weight caches /// 4. Updates vote count and aggregate weight caches
/// ///
/// # Returns /// # Returns
/// The BLAKE3 hash of the serialized vote (content address). /// The BLAKE3 hash of the serialized vote (content address).
async fn put_vote(&self, vote: &Vote) -> Result<Hash>; async fn put_vote(&self, vote: &Vote, subject: &str) -> Result<Hash>;
/// Get a specific vote by its hash. /// Get a specific vote by its hash.
/// ///
/// # Arguments /// # Arguments
/// * `assertion_hash` - The assertion this vote is for /// * `assertion_hash` - The assertion this vote is for
/// * `vote_hash` - The content-addressed hash of the vote /// * `vote_hash` - The content-addressed hash of the vote
/// * `subject` - The subject the assertion belongs to
/// ///
/// # Returns /// # Returns
/// The vote if found, None otherwise. /// The vote if found, None otherwise.
async fn get_vote(&self, assertion_hash: &Hash, vote_hash: &Hash) -> Result<Option<Vote>>; async fn get_vote(
&self,
assertion_hash: &Hash,
vote_hash: &Hash,
subject: &str,
) -> Result<Option<Vote>>;
/// Get all votes for a specific assertion. /// Get all votes for a specific assertion.
/// ///
/// Scans all keys with prefix `V:{assertion_hash}:` and deserializes. /// Scans all keys with prefix `{subject}\x00V:{assertion_hex}:` and deserializes.
/// ///
/// # Performance /// # Performance
/// O(n) where n is the number of votes. For high-cardinality assertions, /// O(n) where n is the number of votes. For high-cardinality assertions,
@ -79,37 +84,43 @@ pub trait VoteStore: Send + Sync {
/// ///
/// # Returns /// # Returns
/// Vector of votes, empty if no votes exist. /// Vector of votes, empty if no votes exist.
async fn get_votes_for_assertion(&self, assertion_hash: &Hash) -> Result<Vec<Vote>>; async fn get_votes_for_assertion(
&self,
assertion_hash: &Hash,
subject: &str,
) -> Result<Vec<Vote>>;
/// Get the number of votes for an assertion. /// Get the number of votes for an assertion.
/// ///
/// Uses cached counter at `VC:{assertion_hash}` for O(1) performance. /// Uses cached counter at `{subject}\x00VC:{assertion_hex}` for O(1) performance.
/// ///
/// # Returns /// # Returns
/// Vote count, 0 if no votes exist. /// Vote count, 0 if no votes exist.
async fn get_vote_count(&self, assertion_hash: &Hash) -> Result<u64>; async fn get_vote_count(&self, assertion_hash: &Hash, subject: &str) -> Result<u64>;
/// Get the aggregate weight (sum of all vote weights) for an assertion. /// Get the aggregate weight (sum of all vote weights) for an assertion.
/// ///
/// Uses cached value at `VW:{assertion_hash}` for O(1) performance. /// Uses cached value at `{subject}\x00VW:{assertion_hex}` for O(1) performance.
/// The weight is the sum of all `vote.weight` values. /// The weight is the sum of all `vote.weight` values.
/// ///
/// # Returns /// # Returns
/// Aggregate weight, 0.0 if no votes exist. /// Aggregate weight, 0.0 if no votes exist.
async fn get_aggregate_weight(&self, assertion_hash: &Hash) -> Result<f32>; async fn get_aggregate_weight(&self, assertion_hash: &Hash, subject: &str) -> Result<f32>;
/// Check if any votes exist for an assertion. /// Check if any votes exist for an assertion.
/// ///
/// More efficient than `get_vote_count() > 0` as it can short-circuit. /// More efficient than `get_vote_count() > 0` as it can short-circuit.
async fn has_votes(&self, assertion_hash: &Hash) -> Result<bool>; async fn has_votes(&self, assertion_hash: &Hash, subject: &str) -> Result<bool>;
} }
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
use crate::SledStore; use crate::HybridStore;
use std::sync::Arc; use std::sync::Arc;
const TEST_SUBJECT: &str = "TestSubject";
fn create_test_vote(assertion_hash: Hash, weight: f32) -> Vote { fn create_test_vote(assertion_hash: Hash, weight: f32) -> Vote {
Vote { Vote {
assertion_hash, assertion_hash,
@ -124,18 +135,20 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_put_and_get_vote() { async fn test_put_and_get_vote() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store); let vote_store = GenericVoteStore::new(store);
let assertion_hash = [0u8; 32]; let assertion_hash = [0u8; 32];
let vote = create_test_vote(assertion_hash, 0.8); let vote = create_test_vote(assertion_hash, 0.8);
// Put vote // Put vote
let vote_hash = vote_store.put_vote(&vote).await.expect("Failed to put vote"); let vote_hash = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("Failed to put vote");
// Get vote back // Get vote back
let retrieved = let retrieved = vote_store
vote_store.get_vote(&assertion_hash, &vote_hash).await.expect("Failed to get vote"); .get_vote(&assertion_hash, &vote_hash, TEST_SUBJECT)
.await
.expect("Failed to get vote");
assert!(retrieved.is_some()); assert!(retrieved.is_some());
let retrieved_vote = retrieved.expect("Vote should exist"); let retrieved_vote = retrieved.expect("Vote should exist");
@ -145,7 +158,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_get_votes_for_assertion() { async fn test_get_votes_for_assertion() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store); let vote_store = GenericVoteStore::new(store);
let assertion_hash = [1u8; 32]; let assertion_hash = [1u8; 32];
@ -159,51 +172,55 @@ mod tests {
}; };
let vote_other = create_test_vote(other_assertion, 0.9); let vote_other = create_test_vote(other_assertion, 0.9);
vote_store.put_vote(&vote1).await.expect("put"); vote_store.put_vote(&vote1, TEST_SUBJECT).await.expect("put");
vote_store.put_vote(&vote2).await.expect("put"); vote_store.put_vote(&vote2, TEST_SUBJECT).await.expect("put");
vote_store.put_vote(&vote_other).await.expect("put"); vote_store.put_vote(&vote_other, TEST_SUBJECT).await.expect("put");
// Get votes for assertion // Get votes for assertion
let votes = vote_store.get_votes_for_assertion(&assertion_hash).await.expect("get"); let votes =
vote_store.get_votes_for_assertion(&assertion_hash, TEST_SUBJECT).await.expect("get");
assert_eq!(votes.len(), 2); assert_eq!(votes.len(), 2);
// Get votes for other assertion // Get votes for other assertion
let other_votes = vote_store.get_votes_for_assertion(&other_assertion).await.expect("get"); let other_votes =
vote_store.get_votes_for_assertion(&other_assertion, TEST_SUBJECT).await.expect("get");
assert_eq!(other_votes.len(), 1); assert_eq!(other_votes.len(), 1);
} }
#[tokio::test] #[tokio::test]
async fn test_vote_count_cache() { async fn test_vote_count_cache() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store); let vote_store = GenericVoteStore::new(store);
let assertion_hash = [3u8; 32]; let assertion_hash = [3u8; 32];
// Initially zero // Initially zero
let count = vote_store.get_vote_count(&assertion_hash).await.expect("count"); let count = vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
assert_eq!(count, 0); assert_eq!(count, 0);
// Add votes and check count increments // Add votes and check count increments
for i in 0..5 { for i in 0..5 {
let vote = Vote { agent_id: [i; 32], ..create_test_vote(assertion_hash, 0.5) }; let vote = Vote { agent_id: [i; 32], ..create_test_vote(assertion_hash, 0.5) };
vote_store.put_vote(&vote).await.expect("put"); vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
let count = vote_store.get_vote_count(&assertion_hash).await.expect("count"); let count =
vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
assert_eq!(count, (i as u64) + 1); assert_eq!(count, (i as u64) + 1);
} }
} }
#[tokio::test] #[tokio::test]
async fn test_aggregate_weight_cache() { async fn test_aggregate_weight_cache() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store); let vote_store = GenericVoteStore::new(store);
let assertion_hash = [4u8; 32]; let assertion_hash = [4u8; 32];
// Initially zero // Initially zero
let weight = vote_store.get_aggregate_weight(&assertion_hash).await.expect("weight"); let weight =
vote_store.get_aggregate_weight(&assertion_hash, TEST_SUBJECT).await.expect("weight");
assert!((weight - 0.0).abs() < f32::EPSILON); assert!((weight - 0.0).abs() < f32::EPSILON);
// Add votes with known weights // Add votes with known weights
@ -213,10 +230,13 @@ mod tests {
for (i, &w) in weights.iter().enumerate() { for (i, &w) in weights.iter().enumerate() {
let vote = let vote =
Vote { agent_id: [i as u8; 32], weight: w, ..create_test_vote(assertion_hash, w) }; Vote { agent_id: [i as u8; 32], weight: w, ..create_test_vote(assertion_hash, w) };
vote_store.put_vote(&vote).await.expect("put"); vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
expected_total += w; expected_total += w;
let actual = vote_store.get_aggregate_weight(&assertion_hash).await.expect("weight"); let actual = vote_store
.get_aggregate_weight(&assertion_hash, TEST_SUBJECT)
.await
.expect("weight");
// Float comparison with tolerance // Float comparison with tolerance
assert!( assert!(
@ -230,57 +250,58 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_has_votes() { async fn test_has_votes() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store); let vote_store = GenericVoteStore::new(store);
let assertion_hash = [5u8; 32]; let assertion_hash = [5u8; 32];
// No votes initially // No votes initially
assert!(!vote_store.has_votes(&assertion_hash).await.expect("has")); assert!(!vote_store.has_votes(&assertion_hash, TEST_SUBJECT).await.expect("has"));
// Add a vote // Add a vote
let vote = create_test_vote(assertion_hash, 0.5); let vote = create_test_vote(assertion_hash, 0.5);
vote_store.put_vote(&vote).await.expect("put"); vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
// Now has votes // Now has votes
assert!(vote_store.has_votes(&assertion_hash).await.expect("has")); assert!(vote_store.has_votes(&assertion_hash, TEST_SUBJECT).await.expect("has"));
} }
#[tokio::test] #[tokio::test]
async fn test_empty_assertion_returns_empty_vec() { async fn test_empty_assertion_returns_empty_vec() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store); let vote_store = GenericVoteStore::new(store);
let nonexistent = [99u8; 32]; let nonexistent = [99u8; 32];
let votes = vote_store.get_votes_for_assertion(&nonexistent).await.expect("get"); let votes =
vote_store.get_votes_for_assertion(&nonexistent, TEST_SUBJECT).await.expect("get");
assert!(votes.is_empty(), "Should return empty vec, not error"); assert!(votes.is_empty(), "Should return empty vec, not error");
} }
#[tokio::test] #[tokio::test]
async fn test_content_addressing() { async fn test_content_addressing() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store); let vote_store = GenericVoteStore::new(store);
let assertion_hash = [6u8; 32]; let assertion_hash = [6u8; 32];
let vote = create_test_vote(assertion_hash, 0.5); let vote = create_test_vote(assertion_hash, 0.5);
// Same vote should produce same hash // Same vote should produce same hash
let hash1 = vote_store.put_vote(&vote).await.expect("put"); let hash1 = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
let hash2 = vote_store.put_vote(&vote).await.expect("put"); let hash2 = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
assert_eq!(hash1, hash2, "Same vote should produce same hash"); assert_eq!(hash1, hash2, "Same vote should produce same hash");
// Count should still increment (idempotent storage but not idempotent counting) // Count should still increment (idempotent storage but not idempotent counting)
// This is by design - duplicate vote detection is a higher-level concern // This is by design - duplicate vote detection is a higher-level concern
let count = vote_store.get_vote_count(&assertion_hash).await.expect("count"); let count = vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
assert_eq!(count, 2); assert_eq!(count, 2);
} }
#[tokio::test] #[tokio::test]
async fn test_high_velocity_simulation() { async fn test_high_velocity_simulation() {
// Simulate many agents voting on the same assertion // Simulate many agents voting on the same assertion
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = Arc::new(GenericVoteStore::new(store)); let vote_store = Arc::new(GenericVoteStore::new(store));
let assertion_hash = [7u8; 32]; let assertion_hash = [7u8; 32];
@ -302,15 +323,16 @@ mod tests {
source_url: None, source_url: None,
observed_context: None, observed_context: None,
}; };
vote_store.put_vote(&vote).await.expect("put"); vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
} }
// Verify counts // Verify counts
let count = vote_store.get_vote_count(&assertion_hash).await.expect("count"); let count = vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
assert_eq!(count, num_votes); assert_eq!(count, num_votes);
// Verify we can retrieve all votes // Verify we can retrieve all votes
let votes = vote_store.get_votes_for_assertion(&assertion_hash).await.expect("get"); let votes =
vote_store.get_votes_for_assertion(&assertion_hash, TEST_SUBJECT).await.expect("get");
assert_eq!(votes.len(), num_votes as usize); assert_eq!(votes.len(), num_votes as usize);
} }
@ -324,7 +346,7 @@ mod tests {
async fn test_concurrent_vote_ingestion() { async fn test_concurrent_vote_ingestion() {
use tokio::task::JoinSet; use tokio::task::JoinSet;
let store = Arc::new(SledStore::open_temp().expect("Failed to create store")); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = Arc::new(GenericVoteStore::new(store)); let vote_store = Arc::new(GenericVoteStore::new(store));
let assertion_hash = [8u8; 32]; let assertion_hash = [8u8; 32];
@ -352,7 +374,7 @@ mod tests {
source_url: None, source_url: None,
observed_context: None, observed_context: None,
}; };
vs.put_vote(&vote).await.expect("concurrent put should succeed"); vs.put_vote(&vote, TEST_SUBJECT).await.expect("concurrent put should succeed");
} }
}); });
} }
@ -364,7 +386,8 @@ mod tests {
// Verify final vote count is exactly num_concurrent_tasks * votes_per_task // Verify final vote count is exactly num_concurrent_tasks * votes_per_task
let expected_count = (num_concurrent_tasks * votes_per_task) as u64; let expected_count = (num_concurrent_tasks * votes_per_task) as u64;
let actual_count = vote_store.get_vote_count(&assertion_hash).await.expect("count"); let actual_count =
vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
assert_eq!( assert_eq!(
actual_count, expected_count, actual_count, expected_count,
"Vote count should be {} (got {}). Race condition detected!", "Vote count should be {} (got {}). Race condition detected!",
@ -374,7 +397,8 @@ mod tests {
// Verify aggregate weight is approximately correct // Verify aggregate weight is approximately correct
// (some float imprecision is expected with concurrent additions) // (some float imprecision is expected with concurrent additions)
let expected_weight = (num_concurrent_tasks * votes_per_task) as f32 * vote_weight; let expected_weight = (num_concurrent_tasks * votes_per_task) as f32 * vote_weight;
let actual_weight = vote_store.get_aggregate_weight(&assertion_hash).await.expect("weight"); let actual_weight =
vote_store.get_aggregate_weight(&assertion_hash, TEST_SUBJECT).await.expect("weight");
let tolerance = 0.01 * expected_weight; // 1% tolerance for float accumulation let tolerance = 0.01 * expected_weight; // 1% tolerance for float accumulation
assert!( assert!(
(actual_weight - expected_weight).abs() < tolerance, (actual_weight - expected_weight).abs() < tolerance,
@ -386,7 +410,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_vote_with_provenance_fields() { async fn test_vote_with_provenance_fields() {
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store); let vote_store = GenericVoteStore::new(store);
let assertion_hash = [10u8; 32]; let assertion_hash = [10u8; 32];
@ -401,11 +425,13 @@ mod tests {
}; };
// Put vote with provenance // Put vote with provenance
let vote_hash = vote_store.put_vote(&vote).await.expect("Failed to put vote"); let vote_hash = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("Failed to put vote");
// Get vote back and verify provenance fields // Get vote back and verify provenance fields
let retrieved = let retrieved = vote_store
vote_store.get_vote(&assertion_hash, &vote_hash).await.expect("Failed to get vote"); .get_vote(&assertion_hash, &vote_hash, TEST_SUBJECT)
.await
.expect("Failed to get vote");
assert!(retrieved.is_some()); assert!(retrieved.is_some());
let retrieved_vote = retrieved.expect("Vote should exist"); let retrieved_vote = retrieved.expect("Vote should exist");
@ -417,7 +443,7 @@ mod tests {
#[tokio::test] #[tokio::test]
async fn test_vote_backward_compatibility() { async fn test_vote_backward_compatibility() {
// Test that votes without provenance fields (None) work correctly // Test that votes without provenance fields (None) work correctly
let store = SledStore::open_temp().expect("Failed to create store"); let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store); let vote_store = GenericVoteStore::new(store);
let assertion_hash = [11u8; 32]; let assertion_hash = [11u8; 32];
@ -428,9 +454,11 @@ mod tests {
assert_eq!(vote.observed_context, None); assert_eq!(vote.observed_context, None);
// Put and retrieve // Put and retrieve
let vote_hash = vote_store.put_vote(&vote).await.expect("Failed to put vote"); let vote_hash = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("Failed to put vote");
let retrieved = let retrieved = vote_store
vote_store.get_vote(&assertion_hash, &vote_hash).await.expect("Failed to get vote"); .get_vote(&assertion_hash, &vote_hash, TEST_SUBJECT)
.await
.expect("Failed to get vote");
assert!(retrieved.is_some()); assert!(retrieved.is_some());
let retrieved_vote = retrieved.expect("Vote should exist"); let retrieved_vote = retrieved.expect("Vote should exist");
@ -438,4 +466,24 @@ mod tests {
assert_eq!(retrieved_vote.observed_context, None); assert_eq!(retrieved_vote.observed_context, None);
assert_eq!(retrieved_vote.weight, 0.7); assert_eq!(retrieved_vote.weight, 0.7);
} }
#[tokio::test]
async fn test_votes_isolated_by_subject() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let vote_store = GenericVoteStore::new(store);
let assertion_hash = [12u8; 32];
let vote = create_test_vote(assertion_hash, 0.5);
// Store vote under "Tesla"
vote_store.put_vote(&vote, "Tesla").await.expect("put");
// Should NOT be visible under "Apple"
let count = vote_store.get_vote_count(&assertion_hash, "Apple").await.expect("count");
assert_eq!(count, 0, "Votes should be isolated by subject");
// Should be visible under "Tesla"
let count = vote_store.get_vote_count(&assertion_hash, "Tesla").await.expect("count");
assert_eq!(count, 1);
}
} }
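The tests above depend on the subject-prefixed key layout from the module docs (`{subject}\x00V:{assertion_hex}:{vote_hex}`, `{subject}\x00VC:{assertion_hex}`, `{subject}\x00VW:{assertion_hex}`). The following is a minimal sketch of key builders that produce that layout, assuming the real `key_codec` functions behave equivalently. The function bodies are reconstructed from the documented patterns, not copied from `key_codec`. `hex_encode` stands in for `hex::encode`, and `scan_prefix` over a `BTreeMap` stands in for `KVStore::scan_prefix`.

```rust
use std::collections::BTreeMap;

/// Stand-in for `hex::encode`: lowercase hex, two digits per byte.
fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Build `{subject}\x00{suffix}`. The NUL byte separates subject from record type.
fn subject_prefixed(subject: &str, suffix: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(subject.len() + 1 + suffix.len());
    key.extend_from_slice(subject.as_bytes());
    key.push(0x00);
    key.extend_from_slice(suffix.as_bytes());
    key
}

fn vote_key(subject: &str, assertion_hex: &str, vote_hex: &str) -> Vec<u8> {
    subject_prefixed(subject, &format!("V:{assertion_hex}:{vote_hex}"))
}

fn vote_scan_prefix(subject: &str, assertion_hex: &str) -> Vec<u8> {
    subject_prefixed(subject, &format!("V:{assertion_hex}:"))
}

fn vote_count_key(subject: &str, assertion_hex: &str) -> Vec<u8> {
    subject_prefixed(subject, &format!("VC:{assertion_hex}"))
}

fn vote_weight_key(subject: &str, assertion_hex: &str) -> Vec<u8> {
    subject_prefixed(subject, &format!("VW:{assertion_hex}"))
}

/// Ordered prefix scan: start at the prefix, stop at the first non-matching key.
fn scan_prefix<'a>(map: &'a BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<&'a Vec<u8>> {
    map.range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k)
        .collect()
}

fn main() {
    let assertion_hex = hex_encode(&[12u8; 32]);
    let mut map = BTreeMap::new();

    // Same assertion hash stored under three different subjects.
    map.insert(vote_key("Tesla", &assertion_hex, "aa"), vec![1]);
    map.insert(vote_key("TeslaMotors", &assertion_hex, "bb"), vec![2]);
    map.insert(vote_key("Apple", &assertion_hex, "cc"), vec![3]);
    map.insert(vote_count_key("Tesla", &assertion_hex), 1u64.to_le_bytes().to_vec());
    map.insert(vote_weight_key("Tesla", &assertion_hex), 0.5f32.to_le_bytes().to_vec());

    let hits = scan_prefix(&map, &vote_scan_prefix("Tesla", &assertion_hex));
    println!("keys under Tesla: {}", hits.len());
}
```

Because the separator is a NUL byte rather than a printable character, a scan for subject `Tesla` does not pick up keys for `TeslaMotors`, and the `VC:`/`VW:` cache keys never match the `V:{assertion_hex}:` scan prefix.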


@ -1,6 +1,7 @@
//! GenericVoteStore implementation backed by a generic KVStore. //! GenericVoteStore implementation backed by a generic KVStore.
use crate::error::{Result, StorageError}; use crate::error::{Result, StorageError};
use crate::key_codec;
use crate::traits::KVStore; use crate::traits::KVStore;
use async_trait::async_trait; use async_trait::async_trait;
use stemedb_core::types::{Hash, Vote}; use stemedb_core::types::{Hash, Vote};
@ -8,11 +9,6 @@ use tracing::{debug, instrument};
use super::VoteStore; use super::VoteStore;
/// Key prefix for vote count cache.
const VOTE_COUNT_PREFIX: &[u8] = b"VC:";
/// Key prefix for aggregate weight cache.
const VOTE_WEIGHT_PREFIX: &[u8] = b"VW:";
/// VoteStore implementation backed by a generic KVStore. /// VoteStore implementation backed by a generic KVStore.
/// ///
/// This implementation maintains caches for vote counts and aggregate weights /// This implementation maintains caches for vote counts and aggregate weights
@ -27,33 +23,6 @@ impl<S: KVStore> GenericVoteStore<S> {
Self { store } Self { store }
} }
/// Construct the key for an individual vote.
fn vote_key(assertion_hash: &Hash, vote_hash: &Hash) -> Vec<u8> {
let assertion_hex = hex::encode(assertion_hash);
let vote_hex = hex::encode(vote_hash);
format!("V:{}:{}", assertion_hex, vote_hex).into_bytes()
}
/// Construct the prefix for scanning all votes on an assertion.
fn vote_scan_prefix(assertion_hash: &Hash) -> Vec<u8> {
let assertion_hex = hex::encode(assertion_hash);
format!("V:{}:", assertion_hex).into_bytes()
}
/// Construct the key for the vote count cache.
fn vote_count_key(assertion_hash: &Hash) -> Vec<u8> {
let mut key = VOTE_COUNT_PREFIX.to_vec();
key.extend_from_slice(assertion_hash);
key
}
/// Construct the key for the aggregate weight cache.
fn vote_weight_key(assertion_hash: &Hash) -> Vec<u8> {
let mut key = VOTE_WEIGHT_PREFIX.to_vec();
key.extend_from_slice(assertion_hash);
key
}
/// Serialize a vote using the canonical serde helpers. /// Serialize a vote using the canonical serde helpers.
fn serialize_vote(vote: &Vote) -> Result<Vec<u8>> { fn serialize_vote(vote: &Vote) -> Result<Vec<u8>> {
crate::serde_helpers::serialize(vote) crate::serde_helpers::serialize(vote)
@ -67,8 +36,8 @@ impl<S: KVStore> GenericVoteStore<S> {
#[async_trait] #[async_trait]
impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> { impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
#[instrument(skip(self, vote), fields(assertion_hash = %hex::encode(vote.assertion_hash), weight = vote.weight))] #[instrument(skip(self, vote), fields(assertion_hash = %hex::encode(vote.assertion_hash), weight = vote.weight, subject))]
async fn put_vote(&self, vote: &Vote) -> Result<Hash> { async fn put_vote(&self, vote: &Vote, subject: &str) -> Result<Hash> {
// Serialize the vote // Serialize the vote
let serialized = Self::serialize_vote(vote)?; let serialized = Self::serialize_vote(vote)?;
@ -76,21 +45,23 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
let vote_hash_bytes = blake3::hash(&serialized); let vote_hash_bytes = blake3::hash(&serialized);
let vote_hash: Hash = *vote_hash_bytes.as_bytes(); let vote_hash: Hash = *vote_hash_bytes.as_bytes();
// Store the vote // Store the vote using subject-prefixed key
let key = Self::vote_key(&vote.assertion_hash, &vote_hash); let assertion_hex = hex::encode(vote.assertion_hash);
let vote_hex = hex::encode(vote_hash);
let key = key_codec::vote_key(subject, &assertion_hex, &vote_hex);
self.store.put(&key, &serialized).await?; self.store.put(&key, &serialized).await?;
debug!( debug!(
vote_hash = %hex::encode(vote_hash), vote_hash = %vote_hex,
"Stored vote" "Stored vote"
); );
// Update vote count cache (atomic increment - prevents race conditions) // Update vote count cache (atomic increment - prevents race conditions)
let count_key = Self::vote_count_key(&vote.assertion_hash); let count_key = key_codec::vote_count_key(subject, &assertion_hex);
let new_count = self.store.fetch_and_add_u64(&count_key, 1).await?; let new_count = self.store.fetch_and_add_u64(&count_key, 1).await?;
// Update aggregate weight cache (atomic CAS - prevents race conditions) // Update aggregate weight cache (atomic CAS - prevents race conditions)
let weight_key = Self::vote_weight_key(&vote.assertion_hash); let weight_key = key_codec::vote_weight_key(subject, &assertion_hex);
let vote_weight = vote.weight; let vote_weight = vote.weight;
let new_weight = self let new_weight = self
.store .store
@ -102,9 +73,16 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
Ok(vote_hash) Ok(vote_hash)
} }
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), vote_hash = %hex::encode(vote_hash)))] #[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), vote_hash = %hex::encode(vote_hash), subject))]
async fn get_vote(&self, assertion_hash: &Hash, vote_hash: &Hash) -> Result<Option<Vote>> { async fn get_vote(
let key = Self::vote_key(assertion_hash, vote_hash); &self,
assertion_hash: &Hash,
vote_hash: &Hash,
subject: &str,
) -> Result<Option<Vote>> {
let assertion_hex = hex::encode(assertion_hash);
let vote_hex = hex::encode(vote_hash);
let key = key_codec::vote_key(subject, &assertion_hex, &vote_hex);
match self.store.get(&key).await? { match self.store.get(&key).await? {
Some(data) => { Some(data) => {
let vote = Self::deserialize_vote(&data)?; let vote = Self::deserialize_vote(&data)?;
@ -114,9 +92,14 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
} }
} }
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash)))] #[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), subject))]
async fn get_votes_for_assertion(&self, assertion_hash: &Hash) -> Result<Vec<Vote>> { async fn get_votes_for_assertion(
let prefix = Self::vote_scan_prefix(assertion_hash); &self,
assertion_hash: &Hash,
subject: &str,
) -> Result<Vec<Vote>> {
let assertion_hex = hex::encode(assertion_hash);
let prefix = key_codec::vote_scan_prefix(subject, &assertion_hex);
let entries = self.store.scan_prefix(&prefix).await?; let entries = self.store.scan_prefix(&prefix).await?;
let mut votes = Vec::with_capacity(entries.len()); let mut votes = Vec::with_capacity(entries.len());
@ -129,9 +112,10 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
Ok(votes) Ok(votes)
} }
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash)))] #[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), subject))]
async fn get_vote_count(&self, assertion_hash: &Hash) -> Result<u64> { async fn get_vote_count(&self, assertion_hash: &Hash, subject: &str) -> Result<u64> {
let key = Self::vote_count_key(assertion_hash); let assertion_hex = hex::encode(assertion_hash);
let key = key_codec::vote_count_key(subject, &assertion_hex);
match self.store.get(&key).await? { match self.store.get(&key).await? {
Some(bytes) if bytes.len() == 8 => { Some(bytes) if bytes.len() == 8 => {
let arr: [u8; 8] = bytes.try_into().map_err(|_| { let arr: [u8; 8] = bytes.try_into().map_err(|_| {
@ -143,9 +127,10 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
} }
} }
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash)))] #[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), subject))]
async fn get_aggregate_weight(&self, assertion_hash: &Hash) -> Result<f32> { async fn get_aggregate_weight(&self, assertion_hash: &Hash, subject: &str) -> Result<f32> {
let key = Self::vote_weight_key(assertion_hash); let assertion_hex = hex::encode(assertion_hash);
let key = key_codec::vote_weight_key(subject, &assertion_hex);
match self.store.get(&key).await? { match self.store.get(&key).await? {
Some(bytes) if bytes.len() == 4 => { Some(bytes) if bytes.len() == 4 => {
let arr: [u8; 4] = bytes let arr: [u8; 4] = bytes
@ -157,9 +142,9 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
} }
} }
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash)))] #[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), subject))]
async fn has_votes(&self, assertion_hash: &Hash) -> Result<bool> { async fn has_votes(&self, assertion_hash: &Hash, subject: &str) -> Result<bool> {
let count = self.get_vote_count(assertion_hash).await?; let count = self.get_vote_count(assertion_hash, subject).await?;
Ok(count > 0) Ok(count > 0)
} }
} }


@ -14,6 +14,12 @@ thiserror = "1.0"
tracing = "0.1" tracing = "0.1"
byteorder = "1.5" byteorder = "1.5"
blake3 = "1.5" blake3 = "1.5"
crc32c = "0.6"
tokio = { version = "1", features = ["sync", "time", "rt"], optional = true }
[features]
group-commit = ["tokio"]
[dev-dependencies] [dev-dependencies]
tempfile = "3.10" tempfile = "3.10"
tokio = { version = "1", features = ["sync", "time", "rt", "macros"] }


@ -33,6 +33,37 @@ pub enum QuarantineError {
path: PathBuf, path: PathBuf,
}, },
/// CRC32C checksum mismatch (fast integrity check, detects torn writes).
#[error(
"CRC32C mismatch at offset {offset}: expected {expected:#010x}, actual {actual:#010x}"
)]
Crc32cMismatch {
/// File offset where the corrupt record starts.
offset: u64,
/// Expected CRC32C value from the record header.
expected: u32,
/// Actual CRC32C computed from the data.
actual: u32,
},
/// Record length field is invalid (zero or exceeds MAX_RECORD_SIZE).
#[error("Invalid record length at offset {offset}: {length} bytes")]
InvalidRecordLength {
/// File offset where the record starts.
offset: u64,
/// The invalid length value read.
length: u32,
},
/// Generic record corruption with a descriptive reason.
#[error("Corrupt record at offset {offset}: {reason}")]
CorruptRecord {
/// File offset where corruption was detected.
offset: u64,
/// Human-readable description of the corruption.
reason: String,
},
/// Generic IO error. /// Generic IO error.
#[error(transparent)] #[error(transparent)]
IoGeneric(#[from] io::Error), IoGeneric(#[from] io::Error),


@ -6,12 +6,16 @@ use std::io::{Read, Write};
pub const MAGIC: &[u8; 4] = b"STEM"; pub const MAGIC: &[u8; 4] = b"STEM";
/// Current file format version. /// Current file format version.
pub const VERSION: u8 = 1; pub const VERSION: u8 = 2;
/// Size of the file header in bytes. /// Size of the file header in bytes.
/// Magic (4) + Version (1) + Reserved (3) /// Magic (4) + Version (1) + Reserved (3)
pub const HEADER_SIZE: usize = 8; pub const HEADER_SIZE: usize = 8;
/// Per-record overhead in bytes (v2 format).
/// payload_len (4) + crc32c (4) + blake3 (32) = 40
pub const RECORD_OVERHEAD: usize = 40;
/// Maximum record size (100 MB). /// Maximum record size (100 MB).
pub const MAX_RECORD_SIZE: usize = 100 * 1024 * 1024; pub const MAX_RECORD_SIZE: usize = 100 * 1024 * 1024;
@ -61,10 +65,13 @@ impl FileHeader {
let version = reader.read_u8().map_err(QuarantineError::IoGeneric)?; let version = reader.read_u8().map_err(QuarantineError::IoGeneric)?;
if version != VERSION { if version != VERSION {
return Err(QuarantineError::IoGeneric(std::io::Error::new( return Err(QuarantineError::CorruptRecord {
std::io::ErrorKind::InvalidData, offset: 0,
format!("Unsupported version: {}", version), reason: format!(
))); "Unsupported WAL version {} (expected {}). Delete the WAL and re-ingest.",
version, VERSION
),
});
} }
// Skip reserved bytes // Skip reserved bytes
@ -75,69 +82,102 @@ impl FileHeader {
} }
} }
/// Compute CRC32C over the concatenation of len_bytes, blake3, and payload.
///
/// The CRC covers everything except itself, providing fast integrity checking
/// that detects torn writes before the more expensive BLAKE3 verification.
pub fn compute_crc32c(len_bytes: &[u8; 4], blake3: &[u8; 32], payload: &[u8]) -> u32 {
let crc = crc32c::crc32c_append(0, len_bytes);
let crc = crc32c::crc32c_append(crc, blake3);
crc32c::crc32c_append(crc, payload)
}
/// A single log record in the WAL. /// A single log record in the WAL.
/// ///
/// Format: /// v2 Format: `[payload_len:u32_LE][crc32c:u32][blake3:32][payload:N]`
/// - Checksum (32 bytes, BLAKE3) ///
/// - Payload Length (4 bytes, u32 LE) /// - Length first: recovery scanner knows read size before touching checksums
/// - Payload (N bytes) /// - CRC32C second: fast integrity check, rejects torn writes
/// - BLAKE3 before payload: content-addressing hash in fixed 40-byte header
#[derive(Debug, Clone, PartialEq, Eq)] #[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record { pub struct Record {
/// BLAKE3 checksum of the payload. /// BLAKE3 checksum of the payload.
pub checksum: [u8; 32], pub checksum: [u8; 32],
/// CRC32C integrity check covering len + blake3 + payload.
pub crc: u32,
/// The actual data payload. /// The actual data payload.
pub payload: Vec<u8>, pub payload: Vec<u8>,
} }
impl Record { impl Record {
/// Create a new record from a payload, calculating the checksum. /// Create a new record from a payload, calculating both checksums.
pub fn new(payload: Vec<u8>) -> Self { pub fn new(payload: Vec<u8>) -> Self {
let checksum = blake3::hash(&payload).into(); let checksum: [u8; 32] = blake3::hash(&payload).into();
Self { checksum, payload } let len_bytes = (payload.len() as u32).to_le_bytes();
let crc = compute_crc32c(&len_bytes, &checksum, &payload);
Self { checksum, crc, payload }
} }
/// Calculate the on-disk size of this record. /// Calculate the on-disk size of this record.
pub fn disk_size(&self) -> u64 { pub fn disk_size(&self) -> u64 {
(32 + 4 + self.payload.len()) as u64 (RECORD_OVERHEAD + self.payload.len()) as u64
} }
/// Write the record to a writer. /// Write the record to a writer in v2 format.
///
/// Layout: `[payload_len:u32_LE][crc32c:u32][blake3:32][payload:N]`
pub fn write_to<W: Write>(&self, writer: &mut W) -> Result<()> { pub fn write_to<W: Write>(&self, writer: &mut W) -> Result<()> {
writer.write_all(&self.checksum).map_err(QuarantineError::IoGeneric)?;
writer writer
.write_u32::<LittleEndian>(self.payload.len() as u32) .write_u32::<LittleEndian>(self.payload.len() as u32)
.map_err(QuarantineError::IoGeneric)?; .map_err(QuarantineError::IoGeneric)?;
writer.write_u32::<LittleEndian>(self.crc).map_err(QuarantineError::IoGeneric)?;
writer.write_all(&self.checksum).map_err(QuarantineError::IoGeneric)?;
writer.write_all(&self.payload).map_err(QuarantineError::IoGeneric)?; writer.write_all(&self.payload).map_err(QuarantineError::IoGeneric)?;
Ok(()) Ok(())
} }
/// Read a record from a reader and verify its checksum. /// Read a record from a reader in v2 format and verify both checksums.
///
/// CRC32C is checked first (fast reject for torn writes), then BLAKE3.
pub fn read_from<R: Read>(reader: &mut R) -> Result<Self> { pub fn read_from<R: Read>(reader: &mut R) -> Result<Self> {
let mut checksum = [0u8; 32];
reader.read_exact(&mut checksum).map_err(QuarantineError::IoGeneric)?;
let len = reader.read_u32::<LittleEndian>().map_err(QuarantineError::IoGeneric)?; let len = reader.read_u32::<LittleEndian>().map_err(QuarantineError::IoGeneric)?;
if len as usize > MAX_RECORD_SIZE { if len == 0 || len as usize > MAX_RECORD_SIZE {
return Err(QuarantineError::IoGeneric(std::io::Error::new( return Err(QuarantineError::IoGeneric(std::io::Error::new(
std::io::ErrorKind::InvalidData, std::io::ErrorKind::InvalidData,
format!("Record too large: {} bytes", len), format!("Invalid record length: {} bytes", len),
))); )));
} }
let stored_crc = reader.read_u32::<LittleEndian>().map_err(QuarantineError::IoGeneric)?;
let mut checksum = [0u8; 32];
reader.read_exact(&mut checksum).map_err(QuarantineError::IoGeneric)?;
let mut payload = vec![0u8; len as usize]; let mut payload = vec![0u8; len as usize];
reader.read_exact(&mut payload).map_err(QuarantineError::IoGeneric)?; reader.read_exact(&mut payload).map_err(QuarantineError::IoGeneric)?;
// Verify checksum // Verify CRC32C first (fast reject for torn writes)
let len_bytes = len.to_le_bytes();
let computed_crc = compute_crc32c(&len_bytes, &checksum, &payload);
if stored_crc != computed_crc {
return Err(QuarantineError::Crc32cMismatch {
offset: 0, // caller should adjust
expected: stored_crc,
actual: computed_crc,
});
}
// Verify BLAKE3 (content-addressing integrity)
let calculated: [u8; 32] = blake3::hash(&payload).into(); let calculated: [u8; 32] = blake3::hash(&payload).into();
if checksum != calculated { if checksum != calculated {
return Err(QuarantineError::IoGeneric(std::io::Error::new( return Err(QuarantineError::IoGeneric(std::io::Error::new(
std::io::ErrorKind::InvalidData, std::io::ErrorKind::InvalidData,
"Checksum mismatch", "BLAKE3 checksum mismatch",
))); )));
} }
Ok(Self { checksum, payload }) Ok(Self { checksum, crc: stored_crc, payload })
} }
} }
@ -151,46 +191,166 @@ mod tests {
let header = FileHeader::new(); let header = FileHeader::new();
let mut buffer = Vec::new(); let mut buffer = Vec::new();
header.write_to(&mut buffer).unwrap(); header.write_to(&mut buffer).expect("write header");
assert_eq!(buffer.len(), HEADER_SIZE); assert_eq!(buffer.len(), HEADER_SIZE);
let mut reader = Cursor::new(buffer); let mut reader = Cursor::new(buffer);
let read_header = FileHeader::read_from(&mut reader).unwrap(); let read_header = FileHeader::read_from(&mut reader).expect("read header");
assert_eq!(header, read_header); assert_eq!(header, read_header);
} }
#[test] #[test]
fn test_record_roundtrip() { fn test_record_v2_roundtrip() {
let payload = b"test payload data".to_vec(); let payload = b"test payload data".to_vec();
let record = Record::new(payload.clone()); let record = Record::new(payload.clone());
let mut buffer = Vec::new(); let mut buffer = Vec::new();
record.write_to(&mut buffer).unwrap(); record.write_to(&mut buffer).expect("write record");
assert_eq!(buffer.len() as u64, record.disk_size()); assert_eq!(buffer.len() as u64, record.disk_size());
assert_eq!(buffer.len(), RECORD_OVERHEAD + payload.len());
let mut reader = Cursor::new(buffer); let mut reader = Cursor::new(buffer);
let read_record = Record::read_from(&mut reader).unwrap(); let read_record = Record::read_from(&mut reader).expect("read record");
assert_eq!(record, read_record); assert_eq!(record, read_record);
assert_eq!(read_record.payload, payload); assert_eq!(read_record.payload, payload);
} }
#[test] #[test]
fn test_record_checksum_validation() { fn test_crc32c_detects_payload_corruption() {
let payload = b"test data".to_vec(); let payload = b"test data".to_vec();
let record = Record::new(payload); let record = Record::new(payload);
let mut buffer = Vec::new(); let mut buffer = Vec::new();
record.write_to(&mut buffer).unwrap(); record.write_to(&mut buffer).expect("write record");
// Corrupt the payload in the buffer // Corrupt a byte in the payload region (after the 40-byte header)
let len = buffer.len(); let last = buffer.len() - 1;
buffer[len - 1] ^= 0xFF; // Flip bits in the last byte buffer[last] ^= 0xFF;
let mut reader = Cursor::new(buffer); let mut reader = Cursor::new(buffer);
let result = Record::read_from(&mut reader); let result = Record::read_from(&mut reader);
assert!(result.is_err()); assert!(result.is_err());
assert_eq!(result.unwrap_err().to_string(), "Checksum mismatch"); let err = result.unwrap_err();
assert!(
matches!(err, QuarantineError::Crc32cMismatch { .. }),
"Expected Crc32cMismatch, got: {}",
err
);
}
#[test]
fn test_crc32c_detects_length_corruption() {
let payload = b"test data".to_vec();
let record = Record::new(payload);
let mut buffer = Vec::new();
record.write_to(&mut buffer).expect("write record");
// Corrupt the length field (first 4 bytes) - change to a valid but wrong length
// Set length to payload.len() + 1 (still within bounds)
let corrupted_len = (record.payload.len() as u32 + 1).to_le_bytes();
buffer[0] = corrupted_len[0];
buffer[1] = corrupted_len[1];
buffer[2] = corrupted_len[2];
buffer[3] = corrupted_len[3];
let mut reader = Cursor::new(buffer);
let result = Record::read_from(&mut reader);
// Should fail - either EOF because length is too long, or CRC mismatch
assert!(result.is_err());
}
#[test]
fn test_blake3_still_verified() {
// Create a record, then manually construct a buffer with correct CRC
// but wrong BLAKE3 to verify the second check works
let payload = b"original data".to_vec();
let record = Record::new(payload);
let mut buffer = Vec::new();
record.write_to(&mut buffer).expect("write record");
// Tamper with blake3 hash bytes (bytes 8..40) AND the CRC to match
// This is contrived but tests that BLAKE3 is independently verified
let bad_payload = b"tampered data".to_vec();
let bad_checksum: [u8; 32] = blake3::hash(b"wrong data").into();
let len_bytes = (bad_payload.len() as u32).to_le_bytes();
let new_crc = compute_crc32c(&len_bytes, &bad_checksum, &bad_payload);
let mut tampered = Vec::new();
tampered.extend_from_slice(&len_bytes);
tampered.extend_from_slice(&new_crc.to_le_bytes());
tampered.extend_from_slice(&bad_checksum);
tampered.extend_from_slice(&bad_payload);
let mut reader = Cursor::new(tampered);
let result = Record::read_from(&mut reader);
assert!(result.is_err());
let err_msg = result.unwrap_err().to_string();
assert!(err_msg.contains("BLAKE3"), "Expected BLAKE3 error, got: {}", err_msg);
}
#[test]
fn test_header_rejects_v1() {
let mut buffer = Vec::new();
// Write magic
buffer.extend_from_slice(MAGIC);
// Write version 1
buffer.push(1);
// Write reserved
buffer.extend_from_slice(&[0u8; 3]);
let mut reader = Cursor::new(buffer);
let result = FileHeader::read_from(&mut reader);
assert!(result.is_err());
let err_msg = result.unwrap_err().to_string();
assert!(
err_msg.contains("Unsupported WAL version 1"),
"Expected version error, got: {}",
err_msg
);
assert!(
err_msg.contains("Delete the WAL"),
"Expected remediation advice, got: {}",
err_msg
);
}
#[test]
fn test_record_disk_size() {
let payload = vec![0u8; 100];
let record = Record::new(payload);
assert_eq!(record.disk_size(), (RECORD_OVERHEAD + 100) as u64);
}
#[test]
fn test_record_empty_payload_rejected() {
// Empty payload (len=0) should be rejected on read
let mut buffer = Vec::new();
buffer.extend_from_slice(&0u32.to_le_bytes()); // len = 0
buffer.extend_from_slice(&0u32.to_le_bytes()); // crc
buffer.extend_from_slice(&[0u8; 32]); // blake3
let mut reader = Cursor::new(buffer);
let result = Record::read_from(&mut reader);
assert!(result.is_err());
}
#[test]
fn test_compute_crc32c_deterministic() {
let len_bytes = [10, 0, 0, 0u8];
let blake3 = [0xABu8; 32];
let payload = b"hello world!";
let crc1 = compute_crc32c(&len_bytes, &blake3, payload);
let crc2 = compute_crc32c(&len_bytes, &blake3, payload);
assert_eq!(crc1, crc2);
// Different payload -> different CRC
let crc3 = compute_crc32c(&len_bytes, &blake3, b"different");
assert_ne!(crc1, crc3);
} }
} }
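
The v2 record layout described in this file, `[payload_len:u32_LE][crc32c:u32][blake3:32][payload:N]`, can be illustrated with a dependency-free sketch. Everything below is an assumption made for illustration: `crc32c_update`, `encode_record`, and `decode_record` are hypothetical names, the CRC is a slow bitwise version rather than the `crc32c` crate the commit uses, and the 32-byte digest is passed in by the caller instead of being computed with BLAKE3 (so the second, BLAKE3 verification step is not shown).

```rust
/// Per-record overhead: payload_len (4) + crc32c (4) + digest (32).
const RECORD_OVERHEAD: usize = 40;

/// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
/// Passing the previous return value as `crc` continues the checksum
/// over more data; start with 0.
fn crc32c_update(crc: u32, data: &[u8]) -> u32 {
    let mut crc = !crc;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Serialize one record. The CRC covers len + digest + payload,
/// i.e. everything except the CRC field itself.
fn encode_record(digest: &[u8; 32], payload: &[u8]) -> Vec<u8> {
    let len_bytes = (payload.len() as u32).to_le_bytes();
    let mut crc = crc32c_update(0, &len_bytes);
    crc = crc32c_update(crc, digest);
    crc = crc32c_update(crc, payload);

    let mut out = Vec::with_capacity(RECORD_OVERHEAD + payload.len());
    out.extend_from_slice(&len_bytes);
    out.extend_from_slice(&crc.to_le_bytes());
    out.extend_from_slice(digest);
    out.extend_from_slice(payload);
    out
}

/// Parse one record from the start of `buf`, checking length and CRC32C.
/// Returns the stored digest and the payload.
fn decode_record(buf: &[u8]) -> Result<([u8; 32], Vec<u8>), String> {
    if buf.len() < RECORD_OVERHEAD {
        return Err("truncated header".to_string());
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[0..4]);
    let len = u32::from_le_bytes(len_bytes) as usize;
    if len == 0 {
        return Err("invalid record length: 0".to_string());
    }
    let mut crc_bytes = [0u8; 4];
    crc_bytes.copy_from_slice(&buf[4..8]);
    let stored_crc = u32::from_le_bytes(crc_bytes);
    let mut digest = [0u8; 32];
    digest.copy_from_slice(&buf[8..40]);

    let payload = buf
        .get(RECORD_OVERHEAD..RECORD_OVERHEAD + len)
        .ok_or_else(|| "truncated payload".to_string())?;

    let mut crc = crc32c_update(0, &len_bytes);
    crc = crc32c_update(crc, &digest);
    crc = crc32c_update(crc, payload);
    if crc != stored_crc {
        return Err(format!(
            "CRC32C mismatch: expected {:#010x}, actual {:#010x}",
            stored_crc, crc
        ));
    }
    Ok((digest, payload.to_vec()))
}
```

Putting the length first lets a scanner know how many bytes to read before it touches either checksum, and checking the cheap CRC before the digest rejects a torn tail record without hashing the payload.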


@ -0,0 +1,342 @@
//! Group commit buffer for batching fsync operations.
//!
//! The `GroupCommitBuffer` wraps a `Journal` and batches writes so that
//! multiple concurrent appenders share a single fsync. This dramatically
//! reduces fsync overhead under concurrent load.
//!
//! # Architecture
//!
//! Writers send payloads through an MPSC channel. A background flusher task
//! collects up to `max_writes` payloads (or waits up to `max_duration`),
//! writes them all to the Journal with `DurabilityLevel::Eventual`, calls
//! `force_sync()` once, then responds to all waiting writers.
//!
//! # Feature Gate
//!
//! This module is only available with the `group-commit` feature enabled,
//! which brings in the `tokio` dependency.
use crate::error::QuarantineError;
use crate::journal::Journal;
use std::time::{Duration, Instant};
use tokio::sync::{mpsc, oneshot};
use tracing::{debug, error, info, instrument, warn};
/// Type alias for a flush batch entry: response sender + write result.
type FlushEntry = (oneshot::Sender<Result<u64, QuarantineError>>, Result<u64, QuarantineError>);
/// Configuration for the group commit buffer.
#[derive(Debug, Clone)]
pub struct GroupCommitConfig {
/// Maximum number of writes to batch before flushing.
pub max_writes: usize,
/// Maximum time to wait before flushing a partial batch.
pub max_duration: Duration,
/// Channel capacity for pending write requests.
pub channel_capacity: usize,
}
impl Default for GroupCommitConfig {
fn default() -> Self {
Self { max_writes: 100, max_duration: Duration::from_millis(10), channel_capacity: 10_000 }
}
}
/// A write request sent through the channel.
struct WriteRequest {
payload: Vec<u8>,
response: oneshot::Sender<Result<u64, QuarantineError>>,
}
/// Group commit buffer that batches fsync operations.
///
/// Owns the Journal internally and provides an async `append()` API.
/// Concurrent writers are coalesced into batches that share a single fsync.
///
/// This struct is cheaply cloneable (just clones the channel sender).
#[derive(Clone)]
pub struct GroupCommitBuffer {
sender: mpsc::Sender<WriteRequest>,
}
impl GroupCommitBuffer {
/// Create a new group commit buffer wrapping the given journal.
///
/// Spawns a background flusher task on the current tokio runtime.
/// The journal is moved into the flusher and is not accessible externally.
#[instrument(skip(journal), fields(max_writes = config.max_writes, max_duration_ms = config.max_duration.as_millis() as u64))]
pub fn new(journal: Journal, config: GroupCommitConfig) -> Self {
let (sender, receiver) = mpsc::channel(config.channel_capacity);
tokio::spawn(Self::flusher_loop(journal, receiver, config));
info!("GroupCommitBuffer started");
Self { sender }
}
/// Append a payload to the journal via the group commit buffer.
///
/// Returns the WAL offset of the written record once the batch
/// containing this write has been fsynced.
pub async fn append(&self, payload: Vec<u8>) -> Result<u64, QuarantineError> {
let (response_tx, response_rx) = oneshot::channel();
let request = WriteRequest { payload, response: response_tx };
self.sender.send(request).await.map_err(|_| {
QuarantineError::IoGeneric(std::io::Error::other(
"GroupCommitBuffer flusher has shut down",
))
})?;
response_rx.await.map_err(|_| {
QuarantineError::IoGeneric(std::io::Error::other(
"GroupCommitBuffer flusher dropped response channel",
))
})?
}
/// Background flusher loop.
///
/// Collects writes into batches, writes them all, then fsyncs once.
async fn flusher_loop(
mut journal: Journal,
mut receiver: mpsc::Receiver<WriteRequest>,
config: GroupCommitConfig,
) {
let mut batch: Vec<WriteRequest> = Vec::with_capacity(config.max_writes);
loop {
// Wait for the first write request
let first = match receiver.recv().await {
Some(req) => req,
None => {
info!("GroupCommitBuffer channel closed, flusher exiting");
return;
}
};
batch.push(first);
// Collect more requests up to max_writes or max_duration
let deadline = tokio::time::Instant::now() + config.max_duration;
while batch.len() < config.max_writes {
match tokio::time::timeout_at(deadline, receiver.recv()).await {
Ok(Some(req)) => batch.push(req),
Ok(None) => {
// Channel closed, flush what we have and exit
Self::flush_batch(&mut journal, &mut batch);
info!("GroupCommitBuffer channel closed during batch, flusher exiting");
return;
}
Err(_) => break, // Timeout reached, flush
}
}
debug!(batch_size = batch.len(), "Flushing batch");
Self::flush_batch(&mut journal, &mut batch);
}
}
/// Write all requests in the batch, fsync once, respond to all waiters.
fn flush_batch(journal: &mut Journal, batch: &mut Vec<WriteRequest>) {
let mut results: Vec<FlushEntry> = Vec::with_capacity(batch.len());
let mut any_error = false;
for request in batch.drain(..) {
if any_error {
// If a previous write in this batch failed, fail all subsequent
let err = QuarantineError::IoGeneric(std::io::Error::other(
"Previous write in batch failed",
));
results.push((request.response, Err(err)));
continue;
}
match journal.append(request.payload) {
Ok(offset) => {
results.push((request.response, Ok(offset)));
}
Err(e) => {
error!(error = %e, "Write failed in group commit batch");
any_error = true;
results.push((request.response, Err(e)));
}
}
}
// Single fsync for the entire batch
if !any_error {
let fsync_start = Instant::now();
if let Err(e) = journal.force_sync() {
error!(error = %e, "Fsync failed in group commit batch");
// Convert all Ok results to errors since fsync failed
for (_, result) in &mut results {
if result.is_ok() {
*result = Err(QuarantineError::IoGeneric(std::io::Error::other(
"Batch fsync failed",
)));
}
}
} else {
let fsync_ms = fsync_start.elapsed().as_millis();
if fsync_ms > 500 {
warn!(fsync_ms, batch_size = results.len(), "Slow fsync detected");
}
}
}
// Send all responses
for (sender, result) in results {
// Ignore send errors - the receiver may have been dropped (timeout)
let _ = sender.send(result);
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::durability::DurabilityLevel;
use tempfile::tempdir;
fn create_test_journal() -> (tempfile::TempDir, Journal) {
let dir = tempdir().expect("tempdir");
let wal_path = dir.path().join("wal");
let journal = Journal::open(&wal_path)
.expect("open journal")
.with_durability(DurabilityLevel::Eventual);
(dir, journal)
}
#[tokio::test]
async fn test_single_write_through_buffer() {
let (_dir, journal) = create_test_journal();
let config = GroupCommitConfig::default();
let buffer = GroupCommitBuffer::new(journal, config);
let offset = buffer.append(b"hello world".to_vec()).await.expect("append");
assert_eq!(offset, 8); // HEADER_SIZE
}
#[tokio::test]
async fn test_batch_coalesces_fsync() {
let (_dir, journal) = create_test_journal();
let config = GroupCommitConfig {
max_writes: 50,
max_duration: Duration::from_millis(100),
channel_capacity: 1000,
};
let buffer = GroupCommitBuffer::new(journal, config);
// Launch 50 concurrent writes
let mut handles = Vec::new();
for i in 0..50 {
let buf = buffer.clone();
handles.push(tokio::spawn(async move {
buf.append(format!("record {}", i).into_bytes()).await
}));
}
let mut offsets = Vec::new();
for handle in handles {
let offset = handle.await.expect("join").expect("append");
offsets.push(offset);
}
// All offsets should be unique
offsets.sort();
offsets.dedup();
assert_eq!(offsets.len(), 50);
}
#[tokio::test]
async fn test_flush_on_timeout() {
let (_dir, journal) = create_test_journal();
let config = GroupCommitConfig {
max_writes: 1000, // High threshold - won't trigger
max_duration: Duration::from_millis(50),
channel_capacity: 100,
};
let buffer = GroupCommitBuffer::new(journal, config);
// Single write should flush after timeout
let offset = buffer.append(b"timeout test".to_vec()).await.expect("append");
assert_eq!(offset, 8);
}
#[tokio::test]
async fn test_flush_on_max_writes() {
let (_dir, journal) = create_test_journal();
let config = GroupCommitConfig {
max_writes: 5,
max_duration: Duration::from_secs(60), // Long timeout - won't trigger
channel_capacity: 100,
};
let buffer = GroupCommitBuffer::new(journal, config);
// Write exactly max_writes records
let mut handles = Vec::new();
for i in 0..5 {
let buf = buffer.clone();
handles.push(tokio::spawn(async move {
buf.append(format!("rec {}", i).into_bytes()).await
}));
}
for handle in handles {
handle.await.expect("join").expect("append");
}
}
#[tokio::test]
async fn test_concurrent_writers_unique_offsets() {
let (_dir, journal) = create_test_journal();
let config = GroupCommitConfig {
max_writes: 50,
max_duration: Duration::from_millis(20),
channel_capacity: 10_000,
};
let buffer = GroupCommitBuffer::new(journal, config);
// 10 tasks x 100 writes
let mut handles = Vec::new();
for task_id in 0..10 {
for write_id in 0..100 {
let buf = buffer.clone();
let payload = format!("task {} write {}", task_id, write_id).into_bytes();
handles.push(tokio::spawn(async move { buf.append(payload).await }));
}
}
let mut offsets = Vec::new();
for handle in handles {
let offset = handle.await.expect("join").expect("append");
offsets.push(offset);
}
// All 1000 offsets must be unique
offsets.sort();
let unique_count = offsets.len();
offsets.dedup();
assert_eq!(offsets.len(), unique_count, "All offsets should be unique");
assert_eq!(offsets.len(), 1000);
}
#[tokio::test]
async fn test_error_propagation_to_waiters() {
// Dropping the buffer (and thus the sender) should cause pending
// appends to fail
let (_dir, journal) = create_test_journal();
let config = GroupCommitConfig::default();
let buffer = GroupCommitBuffer::new(journal, config);
// First write should succeed
buffer.append(b"ok".to_vec()).await.expect("first append");
// Drop the buffer to close the channel
drop(buffer);
// Can't send more since we dropped it - this is correct behavior
}
}
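
The batching idea behind `GroupCommitBuffer` can also be sketched with only the standard library. This is not the commit's implementation: it swaps tokio tasks and oneshot channels for OS threads and `std::sync::mpsc`, writes to an in-memory `Vec<u8>` instead of a `Journal`, and counts "syncs" with an atomic in place of a real fsync. `GroupCommitSketch` and all of its methods are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// One pending write plus the channel used to report its offset.
struct WriteRequest {
    payload: Vec<u8>,
    reply: Sender<u64>,
}

/// Cheaply cloneable handle; all clones feed one flusher thread.
#[derive(Clone)]
struct GroupCommitSketch {
    sender: Sender<WriteRequest>,
    syncs: Arc<AtomicUsize>,
}

impl GroupCommitSketch {
    fn start(max_writes: usize, max_wait: Duration) -> Self {
        let (sender, receiver) = mpsc::channel::<WriteRequest>();
        let syncs = Arc::new(AtomicUsize::new(0));
        let sync_counter = Arc::clone(&syncs);

        thread::spawn(move || {
            let mut log: Vec<u8> = Vec::new(); // stand-in for the WAL file
            // Block until the first request; exit once every sender is dropped.
            while let Ok(first) = receiver.recv() {
                let mut batch = vec![first];
                let deadline = Instant::now() + max_wait;
                // Gather more requests until the batch is full or time runs out.
                while batch.len() < max_writes {
                    let remaining = deadline.saturating_duration_since(Instant::now());
                    match receiver.recv_timeout(remaining) {
                        Ok(req) => batch.push(req),
                        Err(_) => break, // timeout or disconnect: flush what we have
                    }
                }
                // Write every payload, remembering where each one landed.
                let mut offsets = Vec::with_capacity(batch.len());
                for req in &batch {
                    offsets.push(log.len() as u64);
                    log.extend_from_slice(&req.payload);
                }
                // One "fsync" for the whole batch.
                sync_counter.fetch_add(1, Ordering::SeqCst);
                // Only now tell the writers their data is durable.
                for (req, offset) in batch.into_iter().zip(offsets) {
                    let _ = req.reply.send(offset); // writer may have gone away
                }
            }
        });

        Self { sender, syncs }
    }

    /// Blocks until the batch containing this write has been "synced".
    /// Returns `None` if the flusher has shut down.
    fn append(&self, payload: Vec<u8>) -> Option<u64> {
        let (reply, response) = mpsc::channel();
        self.sender.send(WriteRequest { payload, reply }).ok()?;
        response.recv().ok()
    }

    fn sync_count(&self) -> usize {
        self.syncs.load(Ordering::SeqCst)
    }
}
```

The property that matters is the ordering inside the flusher: replies are sent only after the single sync for the batch, so a writer never observes success for data that has not been flushed, while N concurrent writers pay for roughly N / `max_writes` syncs instead of N.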


@ -1,33 +1,47 @@
use crate::durability::{DurabilityLevel, FsyncGuard}; use crate::durability::{DurabilityLevel, FsyncGuard};
use crate::error::{QuarantineError, Result}; use crate::error::{QuarantineError, Result};
use crate::format::{FileHeader, Record, HEADER_SIZE}; use crate::format::{FileHeader, Record, HEADER_SIZE};
use std::fs::{self, File, OpenOptions}; use crate::recovery::{self, RecoveryReport};
use crate::segment::{SegmentManager, DEFAULT_MAX_SEGMENT_SIZE};
use std::fs::{File, OpenOptions};
use std::io::{BufReader, Seek, SeekFrom}; use std::io::{BufReader, Seek, SeekFrom};
use std::path::{Path, PathBuf}; use std::path::Path;
use tracing::{debug, info, instrument, warn}; use tracing::{debug, info, instrument, warn};
/// The main quarantine journal. /// The main quarantine journal.
/// ///
/// Provides append-only storage with crash recovery and fsync guarantees. /// Provides append-only storage with crash recovery, fsync guarantees,
/// and log rotation via segments.
pub struct Journal { pub struct Journal {
data_dir: PathBuf, segment_mgr: SegmentManager,
current_file: Option<FsyncGuard>, current_file: Option<FsyncGuard>,
current_offset: u64, current_offset: u64,
durability: DurabilityLevel, durability: DurabilityLevel,
last_recovery_report: Option<RecoveryReport>,
} }
impl Journal { impl Journal {
/// Open or create a journal in the specified directory. /// Open or create a journal in the specified directory.
#[instrument(skip_all, fields(data_dir = %data_dir.as_ref().display()))] #[instrument(skip_all, fields(data_dir = %data_dir.as_ref().display()))]
pub fn open(data_dir: impl AsRef<Path>) -> Result<Self> { pub fn open(data_dir: impl AsRef<Path>) -> Result<Self> {
Self::open_with_max_segment_size(data_dir, DEFAULT_MAX_SEGMENT_SIZE)
}
/// Open with a custom max segment size (useful for tests).
#[instrument(skip_all, fields(data_dir = %data_dir.as_ref().display(), max_segment_size))]
pub fn open_with_max_segment_size(
data_dir: impl AsRef<Path>,
max_segment_size: u64,
) -> Result<Self> {
let data_dir = data_dir.as_ref().to_path_buf(); let data_dir = data_dir.as_ref().to_path_buf();
fs::create_dir_all(&data_dir).map_err(|e| QuarantineError::io(&data_dir, e))?; let segment_mgr = SegmentManager::open(&data_dir, max_segment_size)?;
let mut journal = Self { let mut journal = Self {
data_dir, segment_mgr,
current_file: None, current_file: None,
current_offset: 0, current_offset: 0,
durability: DurabilityLevel::Immediate, durability: DurabilityLevel::Immediate,
last_recovery_report: None,
}; };
journal.recover()?; journal.recover()?;
@ -41,11 +55,32 @@ impl Journal {
self self
} }
/// Get the current write offset.
pub fn current_offset(&self) -> u64 {
self.current_offset
}
/// Get the recovery report from the last open/recover call.
pub fn recovery_report(&self) -> Option<&RecoveryReport> {
self.last_recovery_report.as_ref()
}
/// Append a record to the journal. /// Append a record to the journal.
///
/// Checks if rotation is needed before writing. Returns the global offset.
#[instrument(skip(self, payload), fields(payload_len = payload.len()))] #[instrument(skip(self, payload), fields(payload_len = payload.len()))]
pub fn append(&mut self, payload: Vec<u8>) -> Result<u64> { pub fn append(&mut self, payload: Vec<u8>) -> Result<u64> {
if self.current_file.is_none() { if self.current_file.is_none() {
self.open_current_file()?; self.ensure_current_segment()?;
}
// Check if rotation is needed
if let Some(guard) = &self.current_file {
let current_size =
guard.file().metadata().map_err(|e| QuarantineError::io(guard.path(), e))?.len();
if self.segment_mgr.needs_rotation(current_size) {
self.rotate()?;
}
} }
let record = Record::new(payload); let record = Record::new(payload);
@ -64,57 +99,172 @@ impl Journal {
Ok(offset) Ok(offset)
} }
/// Read a record at the given offset. /// Read a record at the given global offset.
///
/// Resolves the correct segment via binary search, then seeks within it.
/// If no segment is found, rescans the directory for new segments created
/// by a separate writer instance and retries once.
#[instrument(skip(self))] #[instrument(skip(self))]
pub fn read(&self, offset: u64) -> Result<Record> { pub fn read(&mut self, offset: u64) -> Result<Record> {
let path = self.current_file_path(); // Try to resolve the segment, refreshing once if not found
let mut file = File::open(&path).map_err(|e| QuarantineError::io(&path, e))?; let segment_info = match self.segment_mgr.resolve_segment(offset) {
file.seek(SeekFrom::Start(offset)).map_err(|e| QuarantineError::io(&path, e))?; Some(seg) => (seg.base_offset, seg.path.clone()),
None => {
// Segment not found - rescan directory for new segments
self.segment_mgr.refresh()?;
match self.segment_mgr.resolve_segment(offset) {
Some(seg) => (seg.base_offset, seg.path.clone()),
None => {
return Err(QuarantineError::IoGeneric(std::io::Error::new(
std::io::ErrorKind::UnexpectedEof,
format!("No segment contains offset {}", offset),
)));
}
}
}
};
let local_offset = offset - segment_info.0;
let mut file =
File::open(&segment_info.1).map_err(|e| QuarantineError::io(&segment_info.1, e))?;
file.seek(SeekFrom::Start(local_offset))
.map_err(|e| QuarantineError::io(&segment_info.1, e))?;
let mut reader = BufReader::new(file); let mut reader = BufReader::new(file);
Record::read_from(&mut reader) Record::read_from(&mut reader)
} }
/// Recover state from disk. /// Force sync any pending writes.
#[instrument(skip(self))]
pub fn force_sync(&mut self) -> Result<()> {
if let Some(ref mut guard) = self.current_file {
guard.force_sync()?;
}
Ok(())
}
/// Clean up old segments below the given minimum cursor.
///
/// Returns the number of bytes freed.
#[instrument(skip(self))]
pub fn cleanup(&mut self, min_cursor: u64) -> Result<u64> {
self.segment_mgr.cleanup(min_cursor)
}
/// Recover state from disk using full record scanning across all segments.
#[instrument(skip(self))] #[instrument(skip(self))]
fn recover(&mut self) -> Result<()> { fn recover(&mut self) -> Result<()> {
let path = self.current_file_path(); let segments = self.segment_mgr.segments().to_vec();
if !path.exists() {
debug!("No existing WAL file, starting fresh"); if segments.is_empty() {
debug!("No existing WAL segments, starting fresh");
return Ok(()); return Ok(());
} }
let file = File::open(&path).map_err(|e| QuarantineError::io(&path, e))?; // Recover every segment in order, truncating corrupt tails as they are found
let len = file.metadata().map_err(|e| QuarantineError::io(&path, e))?.len(); let mut total_valid = 0u64;
let mut final_offset = 0u64;
let mut last_report = None;
// Basic recovery: validate header and set offset to end for (i, segment) in segments.iter().enumerate() {
// TODO: Implement full scan and truncate of partial records let file_len = std::fs::metadata(&segment.path)
if len >= HEADER_SIZE as u64 { .map_err(|e| QuarantineError::io(&segment.path, e))?
let mut reader = BufReader::new(file); .len();
let _header = FileHeader::read_from(&mut reader)?;
self.current_offset = len; if file_len == 0 {
info!(file_size = len, "Recovered existing WAL"); debug!(base_offset = segment.base_offset, "Empty segment file, skipping");
} else { continue;
// Corrupt or empty, start over
warn!(file_size = len, "WAL file too small, resetting");
self.current_offset = 0;
} }
let report = recovery::recover_file(&segment.path)?;
total_valid += report.valid_records;
// The final_offset from recover_file is a local offset within the segment.
// Convert to global: segment.base_offset + local_offset
final_offset = segment.base_offset + report.final_offset;
if report.bytes_truncated > 0 {
warn!(
segment_index = i,
base_offset = segment.base_offset,
truncated = report.bytes_truncated,
"Segment had corrupt data, truncated"
);
}
last_report = Some(report);
}
self.current_offset = final_offset;
if let Some(report) = &last_report {
if report.bytes_truncated > 0 {
warn!(total_valid, final_offset, "Recovery truncated corrupt data");
}
}
info!(total_valid, final_offset, "Multi-segment recovery complete");
self.last_recovery_report = last_report;
Ok(()) Ok(())
} }
fn current_file_path(&self) -> PathBuf { /// Ensure there's a current segment open for writing.
self.data_dir.join("00000000.wal") fn ensure_current_segment(&mut self) -> Result<()> {
if self.segment_mgr.segments().is_empty() {
// First ever segment
self.segment_mgr.create_segment(0)?;
self.current_offset = HEADER_SIZE as u64;
} }
#[instrument(skip(self), fields(path = %self.current_file_path().display()))] self.open_current_file()
}
/// Rotate to a new segment.
#[instrument(skip(self))]
fn rotate(&mut self) -> Result<()> {
// Close current file
if let Some(mut guard) = self.current_file.take() {
guard.force_sync()?;
}
let new_base = self.current_offset;
self.segment_mgr.create_segment(new_base)?;
// The new segment starts with a header, so its first record sits at
// local offset HEADER_SIZE within the segment file. Reads compute
//   local_offset = global_offset - base_offset
// so the first record's global offset must be base_offset + HEADER_SIZE.
// Advance current_offset past the header accordingly.
self.current_offset = new_base + HEADER_SIZE as u64;
self.open_current_file()?;
info!(new_base, "Rotated to new segment");
Ok(())
}
#[instrument(skip(self))]
fn open_current_file(&mut self) -> Result<()> { fn open_current_file(&mut self) -> Result<()> {
let path = self.current_file_path(); let segment = self.segment_mgr.current_segment().ok_or_else(|| {
QuarantineError::IoGeneric(std::io::Error::other("No segments available"))
})?;
let path = segment.path.clone();
let file = OpenOptions::new() let file = OpenOptions::new()
.create(true) .create(true)
.read(true) .read(true)
.write(true) .write(true)
.truncate(false) // Never truncate existing WAL files on open .truncate(false)
.open(&path) .open(&path)
.map_err(|e| QuarantineError::io(&path, e))?; .map_err(|e| QuarantineError::io(&path, e))?;
@ -128,15 +278,12 @@ impl Journal {
let mut buf = Vec::with_capacity(HEADER_SIZE); let mut buf = Vec::with_capacity(HEADER_SIZE);
header.write_to(&mut buf)?; header.write_to(&mut buf)?;
guard.write(&buf)?; guard.write(&buf)?;
self.current_offset = HEADER_SIZE as u64; debug!("Wrote v2 header to new segment");
debug!("Created new WAL file with header");
} else {
// Seek to end of file for append operations
guard.file_mut().seek(SeekFrom::End(0)).map_err(|e| QuarantineError::io(&path, e))?;
self.current_offset = len;
debug!(file_size = len, "Opened existing WAL file");
} }
// Seek to end for appends
guard.file_mut().seek(SeekFrom::End(0)).map_err(|e| QuarantineError::io(&path, e))?;
self.current_file = Some(guard); self.current_file = Some(guard);
Ok(()) Ok(())
} }
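
The `read()` path in this file maps a global offset to a segment and to a local offset inside that segment. Below is a minimal sketch of that mapping, not the crate's `SegmentManager`: `Seg`, its `len` field, and `resolve` are hypothetical names introduced for illustration, and the bounds check against `len` is an added assumption.

```rust
/// Hypothetical stand-in for a segment entry (the real `Segment` also has a path).
struct Seg {
    /// Global offset at which this segment file begins.
    base_offset: u64,
    /// Size of the segment file in bytes, header included (assumed field).
    len: u64,
}

/// Map a global offset to (segment index, local offset within that segment).
/// Assumes `segs` is sorted by `base_offset` and segments do not overlap.
fn resolve(segs: &[Seg], global: u64) -> Option<(usize, u64)> {
    // Binary search: count the segments whose base_offset is <= global.
    let n = segs.partition_point(|s| s.base_offset <= global);
    if n == 0 {
        return None; // offset lies before the first segment
    }
    let idx = n - 1;
    let local = global - segs[idx].base_offset;
    if local < segs[idx].len {
        Some((idx, local))
    } else {
        None // offset lies past the end of the last candidate segment
    }
}
```

With `HEADER_SIZE = 8`, a segment rotated in at base offset 100 holds its first record at global offset 108, which `resolve` maps to local offset 8 in that segment.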


@ -3,13 +3,27 @@
 //! This crate provides the foundational durability layer, ensuring that
 //! assertions are safely persisted to disk before being acknowledged.
 //!
+//! # Record Format (v2)
+//!
+//! Each record is stored as: `[payload_len:u32_LE][crc32c:u32][blake3:32][payload:N]`
+//!
+//! - CRC32C provides fast integrity checking to detect torn writes
+//! - BLAKE3 provides content-addressed verification
+//!
 //! # Crash Recovery
 //!
 //! The WAL provides crash recovery guarantees via immediate fsync. When a
 //! record is appended with `DurabilityLevel::Immediate` (the default), it
 //! is guaranteed to survive process crashes or power failures.
 //!
-//! See the `recovery` module for integration tests proving these guarantees.
+//! On open, the journal scans all records across all segments, verifying
+//! CRC32C and BLAKE3. Any corrupt or partial records at the tail are truncated.
+//!
+//! # Log Rotation
+//!
+//! Segment files are named `{base_offset:016x}.wal`. When the current segment
+//! exceeds the configured max size, a new segment is created. Old segments
+//! can be cleaned up once all consumers have advanced past them.
 pub mod durability;
 /// Error types and Result wrapper for WAL operations.
@ -18,10 +32,18 @@ pub mod error;
 pub mod format;
 /// The main Journal API.
 pub mod journal;
-/// Crash recovery integration tests.
-mod recovery;
+/// Crash recovery: file scanning, validation, and truncation.
+pub mod recovery;
+/// Log rotation via segment files.
+pub mod segment;
+/// Group commit buffer for batching fsync operations.
+#[cfg(feature = "group-commit")]
+pub mod group_commit;
 pub use durability::{DurabilityLevel, FsyncGuard};
 pub use error::{QuarantineError, Result};
-pub use format::{FileHeader, Record, HEADER_SIZE};
+pub use format::{FileHeader, Record, HEADER_SIZE, RECORD_OVERHEAD};
 pub use journal::Journal;
+pub use recovery::RecoveryReport;
+pub use segment::{Segment, SegmentManager};
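
The crate docs above state that segment files are named `{base_offset:016x}.wal`. As a rough sketch of what that naming scheme implies, the helpers below build and parse such names; `segment_file_name` and `parse_segment_base` are hypothetical names for illustration and are not taken from the `segment` module.

```rust
use std::path::Path;

/// Build the file name for a segment whose first byte is at `base_offset`.
fn segment_file_name(base_offset: u64) -> String {
    // 16 zero-padded lowercase hex digits, then the `.wal` extension.
    format!("{:016x}.wal", base_offset)
}

/// Recover the base offset from a segment path, or `None` if the name
/// does not follow the `{base_offset:016x}.wal` pattern.
fn parse_segment_base(path: &Path) -> Option<u64> {
    if path.extension()?.to_str()? != "wal" {
        return None;
    }
    let stem = path.file_stem()?.to_str()?;
    // Require exactly 16 hex digits so stray files are ignored.
    if stem.len() != 16 || !stem.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    u64::from_str_radix(stem, 16).ok()
}
```

Because the names are fixed-width and zero-padded, sorting them lexicographically also sorts them by base offset, which is why a directory listing ordered by file name yields segments in log order.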


@ -1,198 +0,0 @@
//! Crash recovery integration tests for the WAL.
//!
//! These tests verify that the Write-Ahead Log survives crashes (simulated by
//! dropping the Journal and reopening it) without data loss.
//!
//! # Test Strategy
//!
//! We cannot truly simulate a power failure in a unit test, but we can:
//! 1. Write data with immediate fsync (ensuring it hits disk)
//! 2. Drop the Journal (simulating process termination)
//! 3. Reopen the Journal (simulating restart)
//! 4. Verify all data is present and readable
//!
//! This proves the durability guarantees of the WAL.
#[cfg(test)]
mod tests {
use crate::format::HEADER_SIZE;
use crate::journal::Journal;
use tempfile::tempdir;
/// Test: Single record survives Journal close and reopen.
///
/// This is the fundamental crash recovery guarantee:
/// After fsync completes, data is durable.
#[test]
fn test_single_record_crash_recovery() {
let dir = tempdir().expect("Failed to create temp dir");
let wal_path = dir.path().join("wal");
let payload = b"critical assertion data".to_vec();
let offset: u64;
// Phase 1: Write and "crash" (drop journal)
{
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
offset = journal.append(payload.clone()).expect("Failed to append");
// Journal dropped here - simulates crash/restart
}
// Phase 2: Recovery - reopen and verify
{
let journal = Journal::open(&wal_path).expect("Failed to reopen journal");
let record = journal.read(offset).expect("Failed to read after recovery");
assert_eq!(record.payload, payload, "Data should survive restart");
}
}
/// Test: Multiple records survive crash and are readable in order.
#[test]
fn test_multiple_records_crash_recovery() {
let dir = tempdir().expect("Failed to create temp dir");
let wal_path = dir.path().join("wal");
let records = vec![
b"assertion 1: Tesla revenue is $96.7B".to_vec(),
b"assertion 2: Apple revenue is $394B".to_vec(),
b"assertion 3: Microsoft revenue is $211B".to_vec(),
];
let mut offsets = Vec::new();
// Phase 1: Write multiple records and "crash"
{
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
for payload in &records {
let offset = journal.append(payload.clone()).expect("Failed to append");
offsets.push(offset);
}
// Journal dropped here
}
// Phase 2: Recovery - verify all records
{
let journal = Journal::open(&wal_path).expect("Failed to reopen journal");
for (i, offset) in offsets.iter().enumerate() {
let record = journal.read(*offset).expect("Failed to read");
assert_eq!(record.payload, records[i], "Record {} should match", i);
}
}
}
/// Test: Journal can continue appending after recovery.
///
/// This verifies that recovery properly sets the write offset.
#[test]
fn test_append_after_recovery() {
let dir = tempdir().expect("Failed to create temp dir");
let wal_path = dir.path().join("wal");
let first_payload = b"first record".to_vec();
let first_offset: u64;
// Phase 1: Write first record and "crash"
{
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
first_offset = journal.append(first_payload.clone()).expect("Failed to append");
}
// Phase 2: Recover and append more
let second_payload = b"second record after recovery".to_vec();
let second_offset: u64;
{
let mut journal = Journal::open(&wal_path).expect("Failed to reopen journal");
second_offset = journal.append(second_payload.clone()).expect("Failed to append");
// Verify second offset is after first
assert!(
second_offset > first_offset,
"New records should be appended after existing data"
);
}
// Phase 3: Verify both records after another "crash"
{
let journal = Journal::open(&wal_path).expect("Failed to reopen journal again");
let first = journal.read(first_offset).expect("Failed to read first");
let second = journal.read(second_offset).expect("Failed to read second");
assert_eq!(first.payload, first_payload);
assert_eq!(second.payload, second_payload);
}
}
/// Test: Large payloads survive crash recovery.
///
/// Ensures the WAL handles larger data correctly, not just small test payloads.
#[test]
fn test_large_payload_crash_recovery() {
let dir = tempdir().expect("Failed to create temp dir");
let wal_path = dir.path().join("wal");
// Create a 1MB payload (simulating a large assertion with embeddings)
let large_payload: Vec<u8> = (0..1024 * 1024).map(|i| (i % 256) as u8).collect();
let offset: u64;
// Write and "crash"
{
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
offset = journal.append(large_payload.clone()).expect("Failed to append large payload");
}
// Recover and verify
{
let journal = Journal::open(&wal_path).expect("Failed to reopen journal");
let record = journal.read(offset).expect("Failed to read large payload");
assert_eq!(record.payload.len(), large_payload.len());
assert_eq!(record.payload, large_payload, "Large payload should survive");
}
}
/// Test: Empty WAL directory is handled gracefully.
#[test]
fn test_fresh_start_no_existing_wal() {
let dir = tempdir().expect("Failed to create temp dir");
let wal_path = dir.path().join("fresh_wal");
// Opening a fresh directory should work
let mut journal = Journal::open(&wal_path).expect("Failed to open fresh journal");
// Should be able to write immediately
let offset = journal.append(b"first record".to_vec()).expect("Failed to append");
assert_eq!(offset, HEADER_SIZE as u64, "First record should start after header");
}
/// Test: Repeated crash-recovery cycles work correctly.
///
/// Simulates a flaky system that crashes and recovers multiple times.
#[test]
fn test_repeated_crash_recovery_cycles() {
let dir = tempdir().expect("Failed to create temp dir");
let wal_path = dir.path().join("wal");
let mut all_offsets = Vec::new();
let num_cycles = 5;
let records_per_cycle = 3;
for cycle in 0..num_cycles {
// Write some records
{
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
for i in 0..records_per_cycle {
let payload = format!("cycle {} record {}", cycle, i).into_bytes();
let offset = journal.append(payload).expect("Failed to append");
all_offsets.push((offset, cycle, i));
}
// "Crash" - drop journal
}
}
// Final verification - all records from all cycles should be present
{
let journal = Journal::open(&wal_path).expect("Failed to reopen journal");
for (offset, cycle, i) in &all_offsets {
let record = journal.read(*offset).expect("Failed to read");
let expected = format!("cycle {} record {}", cycle, i).into_bytes();
assert_eq!(record.payload, expected, "Record from cycle {} should survive", cycle);
}
}
}
}


@ -0,0 +1,236 @@
//! Crash recovery for WAL files.
//!
//! Provides `recover_file()` which scans a WAL file record-by-record,
//! verifying CRC32C and BLAKE3 checksums. On encountering corruption or
//! a partial record, it truncates the file to the last valid offset.
//!
//! Corrupt or partial records never produce an `Err`: recovery logs them and
//! truncates. Errors are returned only for I/O failures (disk error, permission
//! denied) and for an unreadable header or an unsupported WAL version.
use crate::error::{QuarantineError, Result};
use crate::format::{
compute_crc32c, FileHeader, HEADER_SIZE, MAX_RECORD_SIZE, RECORD_OVERHEAD, VERSION,
};
use byteorder::{LittleEndian, ReadBytesExt};
use std::fs::{File, OpenOptions};
use std::io::{BufReader, Read, Seek, SeekFrom};
use std::path::Path;
use std::time::{Duration, Instant};
use tracing::{info, instrument, warn};
/// Report from a recovery scan.
#[derive(Debug, Clone)]
pub struct RecoveryReport {
/// Number of valid records found.
pub valid_records: u64,
/// Number of invalid/corrupt records encountered (always 0 or 1,
/// since we stop at first corruption).
pub invalid_records: u64,
/// Bytes truncated from the end of the file.
pub bytes_truncated: u64,
/// Time spent during recovery.
pub recovery_duration: Duration,
/// Final valid offset (write position after recovery).
pub final_offset: u64,
}
/// Recover a single WAL file, returning a report.
///
/// Algorithm:
/// 1. Read and validate FileHeader (must be v2)
/// 2. Sequential scan from HEADER_SIZE:
/// - Read payload_len. EOF -> clean end.
/// - Validate length (> 0, <= MAX_RECORD_SIZE). Invalid -> truncate.
/// - Read crc32c, blake3, payload. EOF -> truncate at scan position.
/// - Verify CRC32C. Mismatch -> truncate.
/// - Verify BLAKE3. Mismatch -> truncate.
/// - Advance scan position.
/// 3. If truncation needed: set_len + fsync.
/// 4. Return RecoveryReport.
#[instrument(skip_all, fields(path = %path.as_ref().display()))]
pub fn recover_file(path: impl AsRef<Path>) -> Result<RecoveryReport> {
let path = path.as_ref();
let start = Instant::now();
let file = File::open(path).map_err(|e| QuarantineError::io(path, e))?;
let file_len = file.metadata().map_err(|e| QuarantineError::io(path, e))?.len();
// File too small for header
if file_len < HEADER_SIZE as u64 {
warn!(file_len, "WAL file smaller than header, truncating to 0");
let wfile =
OpenOptions::new().write(true).open(path).map_err(|e| QuarantineError::io(path, e))?;
wfile.set_len(0).map_err(|e| QuarantineError::io(path, e))?;
wfile.sync_all().map_err(|e| QuarantineError::io(path, e))?;
return Ok(RecoveryReport {
valid_records: 0,
invalid_records: 0,
bytes_truncated: file_len,
recovery_duration: start.elapsed(),
final_offset: 0,
});
}
let mut reader = BufReader::new(file);
// Validate header
let header = FileHeader::read_from(&mut reader).map_err(|e| {
warn!(error = %e, "WAL header invalid, cannot recover");
e
})?;
if header.version != VERSION {
return Err(QuarantineError::CorruptRecord {
offset: 0,
reason: format!(
"Unsupported WAL version {} (expected {}). Delete the WAL and re-ingest.",
header.version, VERSION
),
});
}
// Sequential scan
let mut scan_offset = HEADER_SIZE as u64;
let mut valid_records: u64 = 0;
let mut needs_truncation = false;
loop {
if scan_offset >= file_len {
break;
}
// Not enough room for even the fixed header portion of a record
let remaining = file_len - scan_offset;
if remaining < RECORD_OVERHEAD as u64 {
warn!(
offset = scan_offset,
remaining_bytes = remaining,
"Partial record header at end of file"
);
needs_truncation = true;
break;
}
reader.seek(SeekFrom::Start(scan_offset)).map_err(|e| QuarantineError::io(path, e))?;
// Read payload_len
let payload_len = match reader.read_u32::<LittleEndian>() {
Ok(len) => len,
Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => break,
Err(e) => return Err(QuarantineError::io(path, e)),
};
// Validate length
if payload_len == 0 || payload_len as usize > MAX_RECORD_SIZE {
warn!(offset = scan_offset, payload_len, "Invalid record length, truncating");
needs_truncation = true;
break;
}
// Check if enough bytes remain for the full record
let record_size = RECORD_OVERHEAD as u64 + payload_len as u64;
if scan_offset + record_size > file_len {
warn!(
offset = scan_offset,
expected_size = record_size,
file_len,
"Truncated record at end of file"
);
needs_truncation = true;
break;
}
// Read CRC32C
let stored_crc = match reader.read_u32::<LittleEndian>() {
Ok(crc) => crc,
Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => {
needs_truncation = true;
break;
}
Err(e) => return Err(QuarantineError::io(path, e)),
};
// Read BLAKE3
let mut blake3_hash = [0u8; 32];
if let Err(e) = reader.read_exact(&mut blake3_hash) {
if e.kind() == std::io::ErrorKind::UnexpectedEof {
needs_truncation = true;
break;
}
return Err(QuarantineError::io(path, e));
}
// Read payload
let mut payload = vec![0u8; payload_len as usize];
if let Err(e) = reader.read_exact(&mut payload) {
if e.kind() == std::io::ErrorKind::UnexpectedEof {
needs_truncation = true;
break;
}
return Err(QuarantineError::io(path, e));
}
// Verify CRC32C
let len_bytes = payload_len.to_le_bytes();
let computed_crc = compute_crc32c(&len_bytes, &blake3_hash, &payload);
if stored_crc != computed_crc {
warn!(
offset = scan_offset,
expected = stored_crc,
actual = computed_crc,
"CRC32C mismatch, truncating"
);
needs_truncation = true;
break;
}
// Verify BLAKE3
let computed_blake3: [u8; 32] = blake3::hash(&payload).into();
if blake3_hash != computed_blake3 {
warn!(offset = scan_offset, "BLAKE3 mismatch, truncating");
needs_truncation = true;
break;
}
// Record is valid
scan_offset += record_size;
valid_records += 1;
}
// Truncate if needed
let bytes_truncated = if needs_truncation && scan_offset < file_len {
let truncated = file_len - scan_offset;
let wfile =
OpenOptions::new().write(true).open(path).map_err(|e| QuarantineError::io(path, e))?;
wfile.set_len(scan_offset).map_err(|e| QuarantineError::io(path, e))?;
wfile.sync_all().map_err(|e| QuarantineError::io(path, e))?;
if let Some(parent) = path.parent() {
let _ = crate::durability::sync_directory(parent);
}
info!(truncated_bytes = truncated, final_offset = scan_offset, "Truncated corrupt tail");
truncated
} else {
0
};
let report = RecoveryReport {
valid_records,
invalid_records: u64::from(needs_truncation),
bytes_truncated,
recovery_duration: start.elapsed(),
final_offset: scan_offset,
};
info!(
valid_records = report.valid_records,
bytes_truncated = report.bytes_truncated,
final_offset = report.final_offset,
"Recovery complete"
);
Ok(report)
}
#[cfg(test)]
mod tests;
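
The scan-and-truncate loop in `recover_file()` can be illustrated on an in-memory buffer. The sketch below is a simplification under stated assumptions, not the crate's implementation: it uses a reduced record layout `[len:u32 LE][crc32c:u32 LE][payload]` with no file header, no BLAKE3 field, and no `MAX_RECORD_SIZE` check, and the CRC covers the length bytes plus the payload. `crc32c`, `encode`, and `scan` are hypothetical helpers; the CRC is a plain bitwise CRC-32C (Castagnoli) written out so the example needs only the standard library.

```rust
/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, else zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Fixed bytes per record in this simplified layout: len (4) + crc (4).
const OVERHEAD: usize = 8;

fn read_u32_le(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// CRC over the length bytes followed by the payload (sketch-only coverage).
fn record_crc(len_bytes: &[u8], payload: &[u8]) -> u32 {
    let mut covered = len_bytes.to_vec();
    covered.extend_from_slice(payload);
    crc32c(&covered)
}

/// Encode one record in the simplified layout.
fn encode(payload: &[u8]) -> Vec<u8> {
    let len = (payload.len() as u32).to_le_bytes();
    let mut out = Vec::with_capacity(OVERHEAD + payload.len());
    out.extend_from_slice(&len);
    out.extend_from_slice(&record_crc(&len, payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Scan records from the start of `buf`, stopping at the first partial or
/// corrupt one. Returns (valid record count, length of the valid prefix);
/// a caller would truncate the file to that length.
fn scan(buf: &[u8]) -> (u64, usize) {
    let mut pos = 0usize;
    let mut valid = 0u64;
    while buf.len() - pos >= OVERHEAD {
        let len = read_u32_le(buf, pos) as usize;
        let stored = read_u32_le(buf, pos + 4);
        // Zero-length or truncated payload: stop here.
        if len == 0 || buf.len() - pos - OVERHEAD < len {
            break;
        }
        let payload = &buf[pos + OVERHEAD..pos + OVERHEAD + len];
        if record_crc(&buf[pos..pos + 4], payload) != stored {
            break; // checksum mismatch: treat the rest as a corrupt tail
        }
        pos += OVERHEAD + len;
        valid += 1;
    }
    (valid, pos)
}
```

Stopping at the first bad record, rather than skipping it, matches the report's note that `invalid_records` is always 0 or 1: everything after the first failure is treated as an untrusted tail.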


@ -0,0 +1,413 @@
//! Tests for crash recovery and log rotation integration.
use super::*;
use crate::format::{FileHeader, Record, HEADER_SIZE, MAX_RECORD_SIZE, RECORD_OVERHEAD};
use crate::journal::Journal;
use std::io::Write;
use tempfile::tempdir;
/// Helper: write a raw WAL file with header + records for testing
fn write_test_wal(path: &Path, records: &[&[u8]]) -> Vec<u64> {
let mut file = File::create(path).expect("create file");
let header = FileHeader::new();
let mut buf = Vec::new();
header.write_to(&mut buf).expect("write header");
file.write_all(&buf).expect("write header bytes");
let mut offsets = Vec::new();
let mut offset = HEADER_SIZE as u64;
for payload in records {
offsets.push(offset);
let record = Record::new(payload.to_vec());
let mut rec_buf = Vec::new();
record.write_to(&mut rec_buf).expect("write record");
file.write_all(&rec_buf).expect("write record bytes");
offset += record.disk_size();
}
file.sync_all().expect("sync");
offsets
}
#[test]
fn test_recovery_truncates_partial_record() {
let dir = tempdir().expect("tempdir");
let wal_file = dir.path().join("test.wal");
write_test_wal(&wal_file, &[b"record 0", b"record 1", b"record 2"]);
// Append 5 trailing junk bytes
let mut file = OpenOptions::new().append(true).open(&wal_file).expect("open");
file.write_all(&[0xDE, 0xAD, 0xBE, 0xEF, 0x42]).expect("write junk");
file.sync_all().expect("sync");
let report = recover_file(&wal_file).expect("recover");
assert_eq!(report.valid_records, 3);
assert_eq!(report.bytes_truncated, 5);
assert_eq!(report.invalid_records, 1);
}
#[test]
fn test_recovery_truncates_corrupt_checksum() {
let dir = tempdir().expect("tempdir");
let wal_file = dir.path().join("test.wal");
write_test_wal(&wal_file, &[b"record 0", b"record 1", b"record 2"]);
// Corrupt a byte in record 2's payload area
let mut data = std::fs::read(&wal_file).expect("read file");
let corrupt_pos = data.len() - 2;
data[corrupt_pos] ^= 0xFF;
std::fs::write(&wal_file, &data).expect("write corrupted");
let report = recover_file(&wal_file).expect("recover");
assert_eq!(report.valid_records, 2);
assert_eq!(report.invalid_records, 1);
assert!(report.bytes_truncated > 0);
}
#[test]
fn test_recovery_handles_empty_after_header() {
let dir = tempdir().expect("tempdir");
let wal_file = dir.path().join("test.wal");
// Write just a header
let mut file = File::create(&wal_file).expect("create");
let header = FileHeader::new();
let mut buf = Vec::new();
header.write_to(&mut buf).expect("write header");
file.write_all(&buf).expect("write");
file.sync_all().expect("sync");
let report = recover_file(&wal_file).expect("recover");
assert_eq!(report.valid_records, 0);
assert_eq!(report.bytes_truncated, 0);
assert_eq!(report.final_offset, HEADER_SIZE as u64);
}
#[test]
fn test_recovery_handles_truncated_header() {
let dir = tempdir().expect("tempdir");
let wal_file = dir.path().join("test.wal");
// Write only 4 bytes (less than HEADER_SIZE = 8)
std::fs::write(&wal_file, [0x53, 0x54, 0x45, 0x4D]).expect("write");
let report = recover_file(&wal_file).expect("recover");
assert_eq!(report.valid_records, 0);
assert_eq!(report.bytes_truncated, 4);
assert_eq!(report.final_offset, 0);
}
#[test]
fn test_recovery_report_metrics() {
let dir = tempdir().expect("tempdir");
let wal_file = dir.path().join("test.wal");
write_test_wal(&wal_file, &[b"alpha", b"beta", b"gamma"]);
let report = recover_file(&wal_file).expect("recover");
assert_eq!(report.valid_records, 3);
assert_eq!(report.invalid_records, 0);
assert_eq!(report.bytes_truncated, 0);
assert!(report.recovery_duration < Duration::from_secs(5));
let expected_offset = HEADER_SIZE as u64 + 3 * RECORD_OVERHEAD as u64 + 5 + 4 + 5; // alpha=5, beta=4, gamma=5
assert_eq!(report.final_offset, expected_offset);
}
#[test]
fn test_recovery_preserves_valid_before_corruption() {
let dir = tempdir().expect("tempdir");
let wal_file = dir.path().join("test.wal");
let offsets = write_test_wal(&wal_file, &[b"keep me", b"keep me too", b"corrupt me"]);
// Corrupt record 2 by flipping a CRC byte (bytes 4..8 of that record)
let mut data = std::fs::read(&wal_file).expect("read");
let record2_crc_offset = offsets[2] as usize + 4; // skip payload_len, hit CRC
data[record2_crc_offset] ^= 0xFF;
std::fs::write(&wal_file, &data).expect("write");
let report = recover_file(&wal_file).expect("recover");
assert_eq!(report.valid_records, 2);
assert_eq!(report.final_offset, offsets[2]);
// Verify the file was actually truncated
let new_len = std::fs::metadata(&wal_file).expect("metadata").len();
assert_eq!(new_len, offsets[2]);
}
#[test]
fn test_recovery_handles_zero_length_record() {
let dir = tempdir().expect("tempdir");
let wal_file = dir.path().join("test.wal");
write_test_wal(&wal_file, &[b"good record"]);
// Append a record with payload_len = 0 (invalid)
let mut file = OpenOptions::new().append(true).open(&wal_file).expect("open");
file.write_all(&0u32.to_le_bytes()).expect("write zero len");
file.write_all(&[0u8; 36]).expect("write padding"); // crc + blake3
file.sync_all().expect("sync");
let report = recover_file(&wal_file).expect("recover");
assert_eq!(report.valid_records, 1);
assert_eq!(report.invalid_records, 1);
assert!(report.bytes_truncated > 0);
}
#[test]
fn test_recovery_handles_impossibly_large_length() {
let dir = tempdir().expect("tempdir");
let wal_file = dir.path().join("test.wal");
write_test_wal(&wal_file, &[b"good record"]);
// Append a record with payload_len = MAX + 1 (invalid)
let huge_len = (MAX_RECORD_SIZE as u32) + 1;
let mut file = OpenOptions::new().append(true).open(&wal_file).expect("open");
file.write_all(&huge_len.to_le_bytes()).expect("write huge len");
file.sync_all().expect("sync");
let report = recover_file(&wal_file).expect("recover");
assert_eq!(report.valid_records, 1);
assert_eq!(report.invalid_records, 1);
}
/// Integration test: Journal uses recover_file under the hood
#[test]
fn test_journal_recovery_integration() {
let dir = tempdir().expect("tempdir");
let wal_path = dir.path().join("wal");
let offsets: Vec<u64>;
// Write records via Journal
{
let mut journal = Journal::open(&wal_path).expect("open journal");
offsets = (0..5)
.map(|i| journal.append(format!("record {}", i).into_bytes()).expect("append"))
.collect();
}
// Append junk to simulate torn write
let wal_file = wal_path.join("0000000000000000.wal");
let mut file = OpenOptions::new().append(true).open(&wal_file).expect("open");
file.write_all(&[0xFF; 20]).expect("write junk");
file.sync_all().expect("sync");
// Journal should recover cleanly
{
let mut journal = Journal::open(&wal_path).expect("reopen journal");
for (i, offset) in offsets.iter().enumerate() {
let record = journal.read(*offset).expect("read record");
assert_eq!(record.payload, format!("record {}", i).into_bytes());
}
}
}
/// Performance: recovery of 10K records should be fast
#[test]
fn test_recovery_performance_10k_records() {
let dir = tempdir().expect("tempdir");
let wal_file = dir.path().join("test.wal");
let payloads: Vec<&[u8]> =
(0..10_000).map(|_| b"benchmark payload data here" as &[u8]).collect();
write_test_wal(&wal_file, &payloads);
// Corrupt the last record
let mut data = std::fs::read(&wal_file).expect("read");
let last = data.len() - 1;
data[last] ^= 0xFF;
std::fs::write(&wal_file, &data).expect("write");
let report = recover_file(&wal_file).expect("recover");
assert_eq!(report.valid_records, 9_999);
assert!(
report.recovery_duration < Duration::from_secs(10),
"Recovery took {:?}",
report.recovery_duration
);
}
// =========================================================================
// Wave 4: Log Rotation Integration Tests
// =========================================================================
/// Test: Rotation creates new segments at the configured threshold.
#[test]
fn test_rotation_creates_new_segment() {
let dir = tempdir().expect("tempdir");
let wal_path = dir.path().join("wal");
// Use a tiny max_segment_size (1KB) to trigger rotation quickly
let mut journal = Journal::open_with_max_segment_size(&wal_path, 1024).expect("open journal");
let mut offsets = Vec::new();
// Write enough records to trigger multiple rotations
for i in 0..50 {
let payload = format!("rotation test record {} with some padding data", i).into_bytes();
let offset = journal.append(payload).expect("append");
offsets.push(offset);
}
// Verify we created multiple segments
let segment_files: Vec<_> = std::fs::read_dir(&wal_path)
.expect("readdir")
.filter_map(|e| e.ok())
.filter(|e| e.path().extension().map(|ext| ext == "wal").unwrap_or(false))
.collect();
assert!(segment_files.len() > 1, "Expected multiple segments, got {}", segment_files.len());
}
/// Test: Records can be read across segment boundaries.
#[test]
fn test_read_across_segments() {
let dir = tempdir().expect("tempdir");
let wal_path = dir.path().join("wal");
// 512 byte threshold to force rotation
let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("open journal");
let mut records = Vec::new();
for i in 0..30 {
let payload = format!("cross-segment record {}", i).into_bytes();
let offset = journal.append(payload.clone()).expect("append");
records.push((offset, payload));
}
// Read all records back; their offsets resolve to different segments
for (offset, expected_payload) in &records {
let record = journal.read(*offset).expect("read across segments");
assert_eq!(&record.payload, expected_payload);
}
}
/// Test: Recovery works across multiple segments.
#[test]
fn test_recovery_across_segments() {
let dir = tempdir().expect("tempdir");
let wal_path = dir.path().join("wal");
let mut records = Vec::new();
// Write records with small segments
{
let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("open");
for i in 0..20 {
let payload = format!("recovery segment test {}", i).into_bytes();
let offset = journal.append(payload.clone()).expect("append");
records.push((offset, payload));
}
}
// Append junk to the last segment to simulate torn write
let last_segment = std::fs::read_dir(&wal_path)
.expect("readdir")
.filter_map(|e| e.ok())
.filter(|e| e.path().extension().map(|ext| ext == "wal").unwrap_or(false))
.max_by_key(|e| e.file_name())
.expect("at least one segment");
let mut file =
OpenOptions::new().append(true).open(last_segment.path()).expect("open last segment");
file.write_all(&[0xDE, 0xAD, 0xBE, 0xEF]).expect("write junk");
file.sync_all().expect("sync");
// Recovery should preserve all valid records
{
let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("reopen");
for (offset, expected) in &records {
let record = journal.read(*offset).expect("read after recovery");
assert_eq!(&record.payload, expected);
}
}
}
/// Test: Appending after recovery with rotation works.
#[test]
fn test_append_after_recovery_with_rotation() {
let dir = tempdir().expect("tempdir");
let wal_path = dir.path().join("wal");
let mut pre_recovery_records = Vec::new();
// Phase 1: Write some records
{
let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("open");
for i in 0..10 {
let payload = format!("before recovery {}", i).into_bytes();
let offset = journal.append(payload.clone()).expect("append");
pre_recovery_records.push((offset, payload));
}
}
// Phase 2: Recover and continue writing
let mut post_recovery_records = Vec::new();
{
let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("reopen");
// Verify pre-recovery records
for (offset, expected) in &pre_recovery_records {
let record = journal.read(*offset).expect("read pre-recovery");
assert_eq!(&record.payload, expected);
}
// Write more records
for i in 0..10 {
let payload = format!("after recovery {}", i).into_bytes();
let offset = journal.append(payload.clone()).expect("append");
post_recovery_records.push((offset, payload));
}
}
// Phase 3: Final verification
{
let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("final open");
for (offset, expected) in pre_recovery_records.iter().chain(&post_recovery_records) {
let record = journal.read(*offset).expect("read final");
assert_eq!(&record.payload, expected);
}
}
}
/// Test: Cleanup removes old segments after cursor advances.
#[test]
fn test_cleanup_removes_old_segments() {
let dir = tempdir().expect("tempdir");
let wal_path = dir.path().join("wal");
let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("open");
// Write enough to create multiple segments
let mut last_offset = 0;
for i in 0..30 {
let payload = format!("cleanup test record {}", i).into_bytes();
last_offset = journal.append(payload).expect("append");
}
let count_segments = || -> usize {
std::fs::read_dir(&wal_path)
.expect("readdir")
.filter_map(|e| e.ok())
.filter(|e| e.path().extension().map(|ext| ext == "wal").unwrap_or(false))
.count()
};
let initial_segments = count_segments();
assert!(initial_segments > 1, "Should have multiple segments");
// Cleanup with cursor at the last offset should remove old segments
let freed = journal.cleanup(last_offset).expect("cleanup");
assert!(freed > 0, "Should have freed some bytes");
let final_segments = count_segments();
assert!(
final_segments < initial_segments,
"Should have fewer segments after cleanup: {} -> {}",
initial_segments,
final_segments
);
}
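
The torn-write scenario exercised by `test_recovery_across_segments` can be sketched in isolation. This is a minimal illustration under stated assumptions, not the crate's `recover_file()`: it assumes little-endian `len` and `crc32c` fields, a CRC computed over the payload only, and it treats the 32-byte BLAKE3 field as opaque. The names `crc32c`, `read_u32_le`, `encode_record`, and `valid_prefix_len` are hypothetical, and the input is the segment body after the file header.

```rust
/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones if the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Size of [len:u32][crc32c:u32][blake3:32] in bytes.
const RECORD_HEADER: usize = 4 + 4 + 32;

fn read_u32_le(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Encode one record as [len][crc32c][blake3][payload].
/// ASSUMPTION: little-endian integers, CRC over the payload only,
/// and a zeroed placeholder where the BLAKE3 hash would go.
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(RECORD_HEADER + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    out.extend_from_slice(&[0u8; 32]);
    out.extend_from_slice(payload);
    out
}

/// Length of the longest prefix of `body` that consists of intact records.
/// A recovery pass would truncate the file to this length.
fn valid_prefix_len(body: &[u8]) -> usize {
    let mut pos = 0;
    while body.len() - pos >= RECORD_HEADER {
        let len = read_u32_le(&body[pos..]) as usize;
        let stored_crc = read_u32_le(&body[pos + 4..]);
        let start = pos + RECORD_HEADER;
        let end = match start.checked_add(len) {
            Some(end) if end <= body.len() => end,
            _ => break, // record claims more bytes than the file holds
        };
        if crc32c(&body[start..end]) != stored_crc {
            break; // checksum mismatch: treat as corrupt tail
        }
        pos = end;
    }
    pos
}
```

With two intact records followed by the four junk bytes used in the test, `valid_prefix_len` stops at the end of the second record, because the remaining bytes are too short to form a record header.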

@ -0,0 +1,368 @@
//! Log rotation via segment files with global offset addressing.
//!
//! Each segment file is named `{base_offset:016x}.wal` where `base_offset` is
//! the global WAL offset where that segment begins. Reads resolve the correct
//! segment via binary search, and writes rotate to a new segment once the
//! current one reaches the configured `max_segment_size`.
//!
//! # Cleanup
//!
//! `SegmentManager::cleanup(min_cursor)` deletes segments whose entire range
//! is below `min_cursor`, freeing disk space after consumers have advanced.
use crate::error::{QuarantineError, Result};
use crate::format::{FileHeader, HEADER_SIZE};
use std::fs;
use std::path::{Path, PathBuf};
use tracing::{debug, info, instrument, warn};
/// Default maximum segment size (1 GiB).
pub const DEFAULT_MAX_SEGMENT_SIZE: u64 = 1024 * 1024 * 1024;
/// A single WAL segment file.
#[derive(Debug, Clone)]
pub struct Segment {
/// Global WAL offset where this segment starts.
pub base_offset: u64,
/// Path to the segment file.
pub path: PathBuf,
/// Current file size in bytes.
pub size: u64,
}
impl Segment {
/// Format a segment filename from its base offset.
pub fn filename(base_offset: u64) -> String {
format!("{:016x}.wal", base_offset)
}
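/// Parse a base offset from a segment filename.
///
/// The stem must be exactly 16 ASCII hex digits. The explicit digit check
/// matters because `u64::from_str_radix` also accepts a leading `+`, so a
/// 16-character stem starting with `+` would otherwise be treated as a segment.
pub fn parse_filename(name: &str) -> Option<u64> {
let stem = name.strip_suffix(".wal")?;
if stem.len() != 16 || !stem.bytes().all(|b| b.is_ascii_hexdigit()) {
return None;
}
u64::from_str_radix(stem, 16).ok()
}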
}
/// Manages multiple WAL segment files.
pub struct SegmentManager {
/// Directory containing segment files.
data_dir: PathBuf,
/// Segments sorted by base_offset.
segments: Vec<Segment>,
/// Maximum size per segment before rotation.
max_segment_size: u64,
}
impl SegmentManager {
/// Open an existing segment directory, scanning for segment files.
#[instrument(skip_all, fields(data_dir = %data_dir.as_ref().display()))]
pub fn open(data_dir: impl AsRef<Path>, max_segment_size: u64) -> Result<Self> {
let data_dir = data_dir.as_ref().to_path_buf();
fs::create_dir_all(&data_dir).map_err(|e| QuarantineError::io(&data_dir, e))?;
let mut segments = Vec::new();
let entries = fs::read_dir(&data_dir).map_err(|e| QuarantineError::io(&data_dir, e))?;
for entry in entries {
let entry = entry.map_err(|e| QuarantineError::io(&data_dir, e))?;
let name = entry.file_name();
let name_str = name.to_string_lossy();
if let Some(base_offset) = Segment::parse_filename(&name_str) {
let meta = entry.metadata().map_err(|e| QuarantineError::io(entry.path(), e))?;
segments.push(Segment { base_offset, path: entry.path(), size: meta.len() });
}
}
segments.sort_by_key(|s| s.base_offset);
debug!(segment_count = segments.len(), "SegmentManager opened");
Ok(Self { data_dir, segments, max_segment_size })
}
/// Rescan the data directory for new segment files.
///
/// This is used by read-only journal instances that need to discover
/// segments created by a separate writer instance.
#[instrument(skip(self), fields(data_dir = %self.data_dir.display()))]
pub fn refresh(&mut self) -> Result<()> {
let mut segments = Vec::new();
let entries =
fs::read_dir(&self.data_dir).map_err(|e| QuarantineError::io(&self.data_dir, e))?;
for entry in entries {
let entry = entry.map_err(|e| QuarantineError::io(&self.data_dir, e))?;
let name = entry.file_name();
let name_str = name.to_string_lossy();
if let Some(base_offset) = Segment::parse_filename(&name_str) {
let meta = entry.metadata().map_err(|e| QuarantineError::io(entry.path(), e))?;
segments.push(Segment { base_offset, path: entry.path(), size: meta.len() });
}
}
segments.sort_by_key(|s| s.base_offset);
debug!(segment_count = segments.len(), "SegmentManager refreshed");
self.segments = segments;
Ok(())
}
/// Get all segments, sorted by base_offset.
pub fn segments(&self) -> &[Segment] {
&self.segments
}
/// Find the segment containing the given global offset.
///
/// Uses binary search: finds the last segment whose `base_offset <= offset`.
pub fn resolve_segment(&self, offset: u64) -> Option<&Segment> {
if self.segments.is_empty() {
return None;
}
// Binary search for the largest base_offset <= offset
let idx = match self.segments.binary_search_by_key(&offset, |s| s.base_offset) {
Ok(exact) => exact,
Err(insert) => {
if insert == 0 {
return None; // offset is before all segments
}
insert - 1
}
};
Some(&self.segments[idx])
}
/// Get the current (latest) segment, if any.
pub fn current_segment(&self) -> Option<&Segment> {
self.segments.last()
}
/// Check if the current segment needs rotation.
pub fn needs_rotation(&self, current_segment_size: u64) -> bool {
current_segment_size >= self.max_segment_size
}
/// Create a new segment with the given base offset.
///
/// Writes a v2 FileHeader to the new file and adds it to the segment list.
#[instrument(skip(self), fields(base_offset))]
pub fn create_segment(&mut self, base_offset: u64) -> Result<&Segment> {
let filename = Segment::filename(base_offset);
let path = self.data_dir.join(&filename);
// Write header
let header = FileHeader::new();
let mut buf = Vec::with_capacity(HEADER_SIZE);
header.write_to(&mut buf)?;
fs::write(&path, &buf).map_err(|e| QuarantineError::io(&path, e))?;
let segment = Segment { base_offset, path, size: HEADER_SIZE as u64 };
self.segments.push(segment);
info!(base_offset, filename, "Created new segment");
self.segments.last().ok_or_else(|| {
QuarantineError::IoGeneric(std::io::Error::other("segment list unexpectedly empty"))
})
}
/// Delete segments whose entire range is below `min_cursor`.
///
/// A segment can be deleted if the *next* segment's base_offset <= min_cursor,
/// meaning no reads will ever need the deleted segment.
///
/// Returns the number of bytes freed.
#[instrument(skip(self))]
pub fn cleanup(&mut self, min_cursor: u64) -> Result<u64> {
let mut freed = 0u64;
let mut to_remove = Vec::new();
for (i, _segment) in self.segments.iter().enumerate() {
// Can only delete if there's a next segment and it starts at or below min_cursor
if i + 1 < self.segments.len() && self.segments[i + 1].base_offset <= min_cursor {
to_remove.push(i);
}
}
// Remove in reverse order to preserve indices
for &idx in to_remove.iter().rev() {
let segment = &self.segments[idx];
info!(
base_offset = segment.base_offset,
size = segment.size,
path = %segment.path.display(),
"Deleting old segment"
);
match fs::remove_file(&segment.path) {
Ok(()) => {
freed += segment.size;
self.segments.remove(idx);
}
Err(e) => {
warn!(
error = %e,
path = %segment.path.display(),
"Failed to delete segment file, keeping in list"
);
}
}
}
if freed > 0 {
info!(
freed_bytes = freed,
remaining_segments = self.segments.len(),
"Cleanup complete"
);
}
Ok(freed)
}
/// Get the data directory path.
pub fn data_dir(&self) -> &Path {
&self.data_dir
}
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
#[test]
fn test_segment_name_roundtrip() {
let offsets = [0u64, 1, 255, 65536, 0xDEAD_BEEF, u64::MAX];
for offset in offsets {
let name = Segment::filename(offset);
let parsed = Segment::parse_filename(&name);
assert_eq!(parsed, Some(offset), "Roundtrip failed for offset {}", offset);
}
}
#[test]
fn test_parse_filename_rejects_invalid() {
assert_eq!(Segment::parse_filename("not_a_wal.txt"), None);
assert_eq!(Segment::parse_filename("short.wal"), None);
assert_eq!(Segment::parse_filename("0000000000000000.log"), None);
assert_eq!(Segment::parse_filename(""), None);
// Too many hex digits
assert_eq!(Segment::parse_filename("00000000000000000.wal"), None);
}
#[test]
fn test_resolve_segment_binary_search() {
let dir = tempdir().expect("tempdir");
let mut mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");
// Create segments at offsets 0, 1000, 2000
mgr.create_segment(0).expect("seg 0");
mgr.create_segment(1000).expect("seg 1000");
mgr.create_segment(2000).expect("seg 2000");
// Offset 0 -> segment 0
assert_eq!(mgr.resolve_segment(0).map(|s| s.base_offset), Some(0));
// Offset 500 -> segment 0
assert_eq!(mgr.resolve_segment(500).map(|s| s.base_offset), Some(0));
// Offset 999 -> segment 0
assert_eq!(mgr.resolve_segment(999).map(|s| s.base_offset), Some(0));
// Offset 1000 -> segment 1000
assert_eq!(mgr.resolve_segment(1000).map(|s| s.base_offset), Some(1000));
// Offset 1500 -> segment 1000
assert_eq!(mgr.resolve_segment(1500).map(|s| s.base_offset), Some(1000));
// Offset 2000 -> segment 2000
assert_eq!(mgr.resolve_segment(2000).map(|s| s.base_offset), Some(2000));
// Offset 99999 -> segment 2000
assert_eq!(mgr.resolve_segment(99999).map(|s| s.base_offset), Some(2000));
}
#[test]
fn test_resolve_segment_empty() {
let dir = tempdir().expect("tempdir");
let mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");
assert!(mgr.resolve_segment(0).is_none());
}
#[test]
fn test_rotation_creates_new_segment() {
let dir = tempdir().expect("tempdir");
// Small threshold for testing: 1KB
let mut mgr = SegmentManager::open(dir.path(), 1024).expect("open");
mgr.create_segment(0).expect("create seg 0");
assert_eq!(mgr.segments().len(), 1);
// Simulate that segment 0 grew beyond threshold
assert!(mgr.needs_rotation(2048));
assert!(!mgr.needs_rotation(512));
mgr.create_segment(2048).expect("create seg 2048");
assert_eq!(mgr.segments().len(), 2);
}
#[test]
fn test_cleanup_deletes_old_segments() {
let dir = tempdir().expect("tempdir");
let mut mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");
mgr.create_segment(0).expect("seg 0");
mgr.create_segment(1000).expect("seg 1000");
mgr.create_segment(2000).expect("seg 2000");
assert_eq!(mgr.segments().len(), 3);
// Cleanup with min_cursor=1500: can delete seg 0 (next seg starts at 1000 <= 1500)
let freed = mgr.cleanup(1500).expect("cleanup");
assert!(freed > 0);
assert_eq!(mgr.segments().len(), 2);
assert_eq!(mgr.segments()[0].base_offset, 1000);
// Cleanup with min_cursor=2500: can delete seg 1000 (next starts at 2000 <= 2500)
let freed = mgr.cleanup(2500).expect("cleanup");
assert!(freed > 0);
assert_eq!(mgr.segments().len(), 1);
assert_eq!(mgr.segments()[0].base_offset, 2000);
// Last segment is never deleted
let freed = mgr.cleanup(u64::MAX).expect("cleanup");
assert_eq!(freed, 0);
assert_eq!(mgr.segments().len(), 1);
}
#[test]
fn test_segment_manager_scans_existing_files() {
let dir = tempdir().expect("tempdir");
// Create segments through one manager instance, then reopen
{
let mut mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");
mgr.create_segment(0).expect("seg 0");
mgr.create_segment(5000).expect("seg 5000");
mgr.create_segment(10000).expect("seg 10000");
}
// Reopen and verify scan
let mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("reopen");
assert_eq!(mgr.segments().len(), 3);
assert_eq!(mgr.segments()[0].base_offset, 0);
assert_eq!(mgr.segments()[1].base_offset, 5000);
assert_eq!(mgr.segments()[2].base_offset, 10000);
}
#[test]
fn test_segment_file_has_valid_header() {
let dir = tempdir().expect("tempdir");
let mut mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");
mgr.create_segment(0).expect("seg 0");
// Read the file and verify header
let data = std::fs::read(&mgr.segments()[0].path).expect("read");
assert_eq!(data.len(), HEADER_SIZE);
assert_eq!(&data[0..4], b"STEM");
assert_eq!(data[4], 2); // version
}
}
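
The resolution and cleanup rules implemented by `SegmentManager` can be sketched over plain base offsets, without touching the filesystem. This is an illustration only: `resolve` and `deletable` are hypothetical helpers, and `partition_point` stands in for the `binary_search_by_key` call used in the source.

```rust
/// Index of the segment that owns `offset`: the last base offset <= `offset`.
/// `bases` must be sorted ascending. Returns `None` when `offset` precedes
/// every segment or when there are no segments.
fn resolve(bases: &[u64], offset: u64) -> Option<usize> {
    // partition_point returns how many leading elements satisfy the predicate.
    bases.partition_point(|&base| base <= offset).checked_sub(1)
}

/// Number of leading segments that cleanup may delete for `min_cursor`.
/// Segment `i` is deletable when segment `i + 1` starts at or below
/// `min_cursor`, so the last segment is never deletable.
fn deletable(bases: &[u64], min_cursor: u64) -> usize {
    bases
        .iter()
        .skip(1)
        .take_while(|&&next_base| next_base <= min_cursor)
        .count()
}
```

With base offsets `[0, 1000, 2000]`, offset 999 resolves to index 0 and offset 1000 to index 1, matching `test_resolve_segment_binary_search`. A cursor of 1500 makes only the first segment deletable, matching the first step of `test_cleanup_deletes_old_segments`.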

@ -603,17 +603,18 @@
 #### 5A. Storage Engine Replacement
-- [ ] **5A.1 Replace sled with redb + fjall**: sled is abandoned (author recommends alternatives).
+- [x] **5A.1 Replace sled with redb + fjall**: sled is abandoned (author recommends alternatives).
 - **Problem:** sled is alpha-stage with known performance regressions and no active development. Our entire storage layer depends on it.
-- **Solution:** Use **redb** (pure Rust B-tree, 1.0 stable since 2023) for read-heavy paths and **fjall** (Rust LSM engine v3.0, lowest write amplification) for write-heavy paths.
+- **Solution:** HybridStore routes keys by prefix: **fjall** (LSM) for write-heavy paths (`H:`, `V:`, `VC:`, `VW:`, `E:`, `SUPERSEDED:`, `__CURSOR__:`) and **redb** (B-tree) for read-heavy paths (`S:`, `SP:`, `MV:`, `TR:`, `QA:`, `QT:`, `TP:`, `GS:`, `ESC:`).
 - **Tasks:**
-- [ ] Abstract `KVStore` trait to be backend-agnostic (already trait-based, verify no sled-specific leakage).
-- [ ] Implement `RedbStore` backend with ACID transactions.
-- [ ] Implement `FjallStore` backend for high-throughput assertion writes.
-- [ ] Benchmark: redb vs fjall vs sled for our access patterns (bulk load, random read, prefix scan).
-- [ ] Migration tool: read all sled data, write to new backend.
-- [ ] Update all integration tests.
-- **Crates:** `redb = "2.0"`, `fjall = "3.0"`
+- [x] Generalize `StorageError::Sled` to `StorageError::Backend(String)`.
+- [x] Implement `FjallStore` backend with DashMap per-key locks for atomics.
+- [x] Implement `RedbStore` backend with ACID transactions.
+- [x] Implement `HybridStore` routing layer with prefix-based dispatch.
+- [x] Migrate all ~500 tests from `SledStore` to `HybridStore`.
+- [x] Remove sled dependency entirely.
+- [x] Add criterion benchmarks (sequential put, random get, prefix scan, atomic increment, mixed workload).
+- **Crates:** `redb = "2"`, `fjall = "2"`, `dashmap = "6"`
 - [ ] **5A.2 Key Layout Redesign**: Prepare keys for subject-prefix range sharding.
 - **Problem:** Current keys (`H:{hash}`, `S:{subject}`, `MV:{subject}:{predicate}`) scatter related data across the keyspace. Distributed sharding needs co-location.
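
The prefix-based dispatch described in the 5A.1 solution can be sketched as a pure routing function. This is a hypothetical illustration, not the `HybridStore` code: the `Backend` enum and `route` function are invented names, keys are assumed to be UTF-8 strings, and unlisted prefixes return `None` because the roadmap does not say how they are handled.

```rust
/// Which engine a key is sent to. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Backend {
    Fjall, // LSM engine, write-heavy paths
    Redb,  // B-tree engine, read-heavy paths
}

// Prefix lists copied from the roadmap entry.
const FJALL_PREFIXES: &[&str] = &["H:", "V:", "VC:", "VW:", "E:", "SUPERSEDED:", "__CURSOR__:"];
const REDB_PREFIXES: &[&str] = &["S:", "SP:", "MV:", "TR:", "QA:", "QT:", "TP:", "GS:", "ESC:"];

/// Pick a backend from the key's prefix.
/// Every prefix ends in ':', so "S:" cannot match "SP:..." or "SUPERSEDED:...".
fn route(key: &str) -> Option<Backend> {
    if FJALL_PREFIXES.iter().any(|p| key.starts_with(*p)) {
        Some(Backend::Fjall)
    } else if REDB_PREFIXES.iter().any(|p| key.starts_with(*p)) {
        Some(Backend::Redb)
    } else {
        None // ASSUMPTION: the fallback for unlisted prefixes is not documented
    }
}
```

Because each prefix includes its trailing colon, a plain `starts_with` check is unambiguous even for overlapping names such as `E:` and `ESC:`.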
@ -930,9 +931,9 @@
 * [x] **Phase 3 The Pilot**: Consumer Health vertical integration. ✅ COMPLETE
 * [x] **Phase 4 The Hive**: Trust & Scale + Extension Primitives. ✅ COMPLETE
 * [ ] **Phase 5 The Forge**: Foundation hardening: replace sled, fix WAL, persist indices.
+* [x] **5A.1**: Replace sled with redb/fjall (HybridStore). ✅ COMPLETE
 ### Next Up
-* **Phase 5A.1**: Replace sled with redb/fjall (critical: sled is abandoned).
 * **Phase 5B.2**: Implement real crash recovery (current recovery is a stub).
 * **Phase 5B.3**: Group commit for WAL throughput.
 * **Phase 5A.2**: Key layout redesign for subject-prefix sharding.
@ -1053,7 +1054,7 @@
 * [docs/research/distributed-write-path.md](docs/research/distributed-write-path.md): Spanner/CockroachDB-style distributed writes adapted for append-only model.
 ### Key Architectural Decisions
-* **sled → redb/fjall**: sled is abandoned. redb for reads, fjall for writes.
+* **sled → redb/fjall**: sled is abandoned. HybridStore routes by key prefix: redb for reads, fjall for writes. ✅ COMPLETE
 * **Raft log = WAL**: TiKV eliminated duplicate WAL in v5.4. We should too.
 * **CRDT for data, Raft for coordination**: Assertions are a G-Set CRDT (merge = set union). Only cluster metadata needs Raft.
 * **Subject-prefix ranges**: Co-locate all data for a subject on one shard. Split hot subjects via range split.
@ -1173,7 +1174,7 @@ Phase 3 (Data Foundation) Phase 4 (Extension Primitives) Extensio
 Phase 5 (The Forge) Phase 6 (The Mesh) Phase 7+8
 ======================= ======================= ==================
-[5A.1 Replace sled] ──────────────> [6A.1 CRDT Foundation] ──┐
+[5A.1 Replace sled] ───────────> [6A.1 CRDT Foundation] ──┐
 | |
 [5A.2 Key Layout] ───────────────> [6C.2 Range Sharding] ──> |
 |

@ -10,27 +10,12 @@ Think of it as **Git for Truth**: just as Git lets developers work on different
 ## The Problem We Solve
-### The Semaglutide Story
-A woman researching a weight-loss medication finds:
-| Source | Says |
-|--------|------|
-| Her doctor | "Generally well-tolerated" |
-| FDA label | "Thyroid warning, gastroparesis rare" |
-| Reddit (500+ posts) | "Stomach paralysis, can't eat, hospitalized" |
-| Clinical trials | "No gastroparesis signal in Phase III" |
-**What should she believe?**
-A traditional database would force someone to pick one answer. The Reddit signal gets ignored or the clinical trial gets overwritten. In January 2024, the FDA added a gastroparesis warning. The Reddit users were right. The system failed because it couldn't hold "clinical trials say X, patients report Y, and these disagree" as a structured fact.
 ### The M&A Story
 Three analyst teams assess an acquisition target. They find:
 | Team | Revenue Estimate |
-|------|------------------|
+| -------------------- | ---------------- |
 | SEC Filing Analysis | $47M |
 | Investor Deck | $62M |
 | Bank Statement Audit | $52M |
@ -42,7 +27,7 @@ The database forces "canonical truth." The acquirer picks the investor deck numb
 An AI agent is tasked with deploying a microservice update. It finds:
 | Source | Says |
-|--------|------|
+| ---------------------- | --------------------------------------------- |
 | RFC 7519 (JWT spec) | "Tokens MUST be validated with `aud` claim" |
 | Internal Wiki (2024) | "Skip `aud` validation for internal services" |
 | Approved Runbook v3.2 | "Validate all claims including `aud`" |
@ -52,10 +37,23 @@ The agent picks the Stack Overflow snippet—it's the most recent thing it found
 **The problem wasn't bad data. The problem was that the database erased the disagreement.**
-Episteme would have surfaced the conflict: "RFC 7519 (Tier 0, regulatory) contradicts Internal Wiki (Tier 3, expert). Conflict score: 0.9. The Approved Runbook agrees with the RFC." The agent, or a human reviewer, sees the disagreement *before* deployment, not after the breach.
+Episteme would have surfaced the conflict: "RFC 7519 (Tier 0, regulatory) contradicts Internal Wiki (Tier 3, expert). Conflict score: 0.9. The Approved Runbook agrees with the RFC." The agent, or a human reviewer, sees the disagreement _before_ deployment, not after the breach.
 **Episteme prevents AI agents from hallucinating production configs.**
+### The Pharmaceutical Safety Story
+A doctor reviews the safety profile of a newly prescribed medication and finds conflicting information across sources:
+| Source | Says |
+| ------------------- | -------------------------------------------- |
+| Prescribing info | "Generally well-tolerated" |
+| FDA label | "Thyroid warning, gastroparesis rare" |
+| Reddit (500+ posts) | "Stomach paralysis, can't eat, hospitalized" |
+| Clinical trials | "No gastroparesis signal in Phase III" |
+A traditional database would force someone to pick one answer. The patient reports get ignored or the clinical trial gets overwritten. When the FDA later adds a gastroparesis warning, it turns out the patient community was right. The system failed because it couldn't hold "clinical trials say X, patients report Y, and these disagree" as a structured fact.
 ### What These Stories Have in Common
 The problem wasn't bad data. In each case, the correct information existed. The problem was that the database erased the disagreement, and nobody automated the reconciliation.
@ -95,7 +93,7 @@ You can query for the **conflict score** and see exactly where sources agree and
 Every claim has a **source class** that affects how much weight it carries:
 | Tier | Source Type | Examples | Decay Rate |
-|------|-------------|----------|------------|
+| ---- | ------------- | -------------------- | ----------------- |
 | 0 | Regulatory | FDA, SEC, EMA | Never fades |
 | 1 | Clinical | Peer-reviewed trials | 2 year half-life |
 | 2 | Observational | Real-world studies | 1 year half-life |
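
The decay rates listed for these tiers can be read as a half-life rule: a claim's weight halves each time one half-life elapses. A minimal sketch, assuming plain exponential decay (the README gives the half-lives but not the formula) and using the hypothetical names `SourceTier` and `decayed_weight`:

```rust
/// The three tiers visible in the table above. Hypothetical type.
#[derive(Debug, Clone, Copy)]
enum SourceTier {
    Regulatory,
    Clinical,
    Observational,
}

impl SourceTier {
    /// Half-life in years, taken from the table; `None` means "never fades".
    fn half_life_years(self) -> Option<f64> {
        match self {
            SourceTier::Regulatory => None,
            SourceTier::Clinical => Some(2.0),
            SourceTier::Observational => Some(1.0),
        }
    }
}

/// ASSUMPTION: weight = base * 0.5^(age / half_life).
fn decayed_weight(tier: SourceTier, base_weight: f64, age_years: f64) -> f64 {
    match tier.half_life_years() {
        None => base_weight,
        Some(half_life) => base_weight * 0.5f64.powf(age_years / half_life),
    }
}
```

Under this rule a two-year-old clinical claim keeps half of its weight, a two-year-old observational claim keeps a quarter, and a regulatory claim keeps all of it.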
@ -146,7 +144,7 @@ Episteme preserves every historical state. You can query what was believed at an
 The same data can be queried with different **Lenses**:
 | Lens | Question | Answer Style |
-|------|----------|--------------|
+| ------------- | -------------------------------- | ------------------------------------- |
 | **Consensus** | "What do most sources agree on?" | The most common answer |
 | **Authority** | "What do trusted sources say?" | Weighted by source tier |
 | **Recency** | "What's the latest?" | Most recent claim wins |
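
How these three lenses can return different answers over the same claims is sketched below. The `Claim` struct and the three functions are hypothetical, and the authority lens is simplified to "lowest tier number wins", which is only one possible reading of "weighted by source tier".

```rust
use std::collections::HashMap;

/// One claim about a single subject and predicate. Hypothetical shape.
#[derive(Debug, Clone)]
struct Claim {
    value: &'static str,
    tier: u8,       // 0 = most authoritative
    timestamp: u64, // larger = more recent
}

/// Consensus lens: the value asserted by the most claims.
fn consensus(claims: &[Claim]) -> Option<&'static str> {
    let mut counts: HashMap<&'static str, usize> = HashMap::new();
    for claim in claims {
        *counts.entry(claim.value).or_insert(0) += 1;
    }
    // Ties are broken arbitrarily in this sketch.
    counts.into_iter().max_by_key(|&(_, n)| n).map(|(value, _)| value)
}

/// Authority lens (simplified): the value from the most authoritative tier.
fn authority(claims: &[Claim]) -> Option<&'static str> {
    claims.iter().min_by_key(|c| c.tier).map(|c| c.value)
}

/// Recency lens: the value of the newest claim.
fn recency(claims: &[Claim]) -> Option<&'static str> {
    claims.iter().max_by_key(|c| c.timestamp).map(|c| c.value)
}
```

Given one tier-0 claim, two agreeing tier-3 claims, and a newer tier-4 claim with a third value, the three lenses each pick a different value.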
@ -162,6 +160,7 @@ The **Skeptic** lens is particularly powerful: instead of hiding disagreement, i
 ### Consumer Health Intelligence
 **The Living Review:** A continuously updated assessment of a drug or treatment that:
+
 - Shows regulatory, clinical, and patient evidence separately
 - Surfaces emerging signals from patient communities before clinical confirmation
 - Time-travels to "what was known when you started treatment"
@ -172,6 +171,7 @@ The **Skeptic** lens is particularly powerful: instead of hiding disagreement, i
 ### Financial Due Diligence
 **The Contradiction Detector:** Multiple analyst teams assess a target. The system:
+
 - Holds all revenue/liability estimates without forcing resolution
 - Shows where teams agree (high confidence) vs. disagree (investigate further)
 - Tracks which sources informed which conclusions
@ -182,6 +182,7 @@ The **Skeptic** lens is particularly powerful: instead of hiding disagreement, i
 ### DevOps & Production Safety
 **The Config Guardian:** AI agents deploy infrastructure changes. The system:
+
 - Holds specs from RFCs, internal wikis, runbooks, and Stack Overflow with source tiers
 - Blocks deployments when high-tier sources (RFCs, approved runbooks) conflict with the agent's chosen config
 - Auto-escalates to human review when conflict score exceeds threshold
@ -192,6 +193,7 @@ The **Skeptic** lens is particularly powerful: instead of hiding disagreement, i
 ### AI Agent Collaboration
 **The Shared Memory:** Multiple AI research agents explore a topic. The system:
+
 - Lets each agent contribute observations with confidence scores
 - Resolves conflicts based on agent reputation (trust scores)
 - Maintains audit trail: "Agent A believed X because it read Y"
@ -254,6 +256,7 @@ This isn't a fact-checker. Fact-checkers pick a side. This shows you **all the s
 **Traditional databases optimize for consensus.** They want one answer.
 **Episteme optimizes for epistemic honesty.** It wants you to see:
+
 - What different sources believe
 - How confident they are
 - Where they disagree
@ -284,7 +287,7 @@ The result: a database that acts more like a **version control system for knowle
 ## When Is Episteme the Right Choice?
 | Scenario | Episteme? | Why |
-|----------|-----------|-----|
+| ---------------------------------------- | --------- | ------------------------ |
 | Multiple sources report different things | Yes | Core use case |
 | You need to weight sources by authority | Yes | Source class hierarchy |
 | You need to surface disagreement | Yes | Skeptic lens |