feat: WAL hardening (Phase 5B) - CRC32C, crash recovery, group commit, log rotation
Add CRC32C checksums to the WAL record format (v2), implement crash recovery with automatic truncation of corrupt records, add a feature-gated group commit buffer for batched fsync under concurrent load, and implement log rotation via segment files with global offset addressing.

Key changes:
- Record format v2: [len:u32][crc32c:u32][blake3:32][payload:N]
- recover_file() scans and truncates corrupt tail records
- GroupCommitBuffer batches fsync via MPSC channel (tokio feature gate)
- SegmentManager with binary search resolution and cursor-based cleanup
- Journal::read() auto-refreshes segments on miss for writer/reader split
- Split recovery.rs and key_codec.rs into directory modules for 500-line max

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
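The v2 record layout and the tail-truncation idea can be illustrated with a minimal sketch. It is not the code from this commit: the little-endian byte order, the assumption that the CRC covers only the payload, and the names `encode_record` and `valid_prefix_len` are all hypothetical, and the 32-byte BLAKE3 field is treated as an opaque value supplied by the caller.

```rust
// Assumed layout: [len:u32][crc32c:u32][blake3:32][payload:N], little-endian.
const HEADER_LEN: usize = 4 + 4 + 32;

/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Reads a little-endian u32 starting at `at`.
fn read_u32_le(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Hypothetical encoder; the CRC covering only the payload is an assumption.
fn encode_record(content_hash: &[u8; 32], payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    out.extend_from_slice(content_hash);
    out.extend_from_slice(payload);
    out
}

/// Scans records from the start and returns the byte length of the longest
/// prefix made only of complete records whose checksum matches.
fn valid_prefix_len(log: &[u8]) -> usize {
    let mut offset = 0;
    while log.len() - offset >= HEADER_LEN {
        let len = read_u32_le(log, offset) as usize;
        let stored_crc = read_u32_le(log, offset + 4);
        let start = offset + HEADER_LEN;
        // Stop at a record whose declared length runs past the end (torn write).
        let end = match start.checked_add(len) {
            Some(end) if end <= log.len() => end,
            _ => break,
        };
        if crc32c(&log[start..end]) != stored_crc {
            break;
        }
        offset = end;
    }
    offset
}
```

Under these assumptions, a recovery routine could truncate the file to the returned length (for example with `std::fs::File::set_len`) so that later appends start after the last intact record.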
parent 55349845d0
commit 3320c24afa
@ -30,7 +30,7 @@ The Arena simulation tests fundamental write→read paths through the system, bu
|
|||||||
```
|
```
|
||||||
Agent.sign_assertion() → write_assertion_to_wal() → Journal.append()
|
Agent.sign_assertion() → write_assertion_to_wal() → Journal.append()
|
||||||
→ IngestWorker.step() → IngestWorker.ingest_assertion()
|
→ IngestWorker.step() → IngestWorker.ingest_assertion()
|
||||||
→ SledStore.put() → IndexStore.add_to_indexes()
|
→ HybridStore.put() → IndexStore.add_to_indexes()
|
||||||
```
|
```
|
||||||
|
|
||||||
**What Works:**
|
**What Works:**
|
||||||
|
|||||||
@ -55,10 +55,10 @@ trust_store.decay_trust_ranks(current_timestamp, Some(custom_half_life)).await?;
|
|||||||
|
|
||||||
```rust
|
```rust
|
||||||
use stemedb_lens::TrustAwareAuthorityLens;
|
use stemedb_lens::TrustAwareAuthorityLens;
|
||||||
use stemedb_storage::{SledStore, GenericTrustRankStore};
|
use stemedb_storage::{HybridStore, GenericTrustRankStore};
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
|
|
||||||
let store = SledStore::open("./data")?;
|
let store = HybridStore::open("./data")?;
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(trust_store);
|
let lens = TrustAwareAuthorityLens::new(trust_store);
|
||||||
|
|
||||||
|
|||||||
@ -36,7 +36,7 @@ pub enum StemeError {
|
|||||||
InvalidSignature { agent: AgentId },
|
InvalidSignature { agent: AgentId },
|
||||||
|
|
||||||
#[error("storage error: {0}")]
|
#[error("storage error: {0}")]
|
||||||
Storage(#[from] sled::Error),
|
Storage(String),
|
||||||
|
|
||||||
#[error("serialization error: {0}")]
|
#[error("serialization error: {0}")]
|
||||||
Serialization(String),
|
Serialization(String),
|
||||||
|
|||||||
@ -53,11 +53,11 @@ pub trait VoteStore: Send + Sync {
|
|||||||
## Usage Example
|
## Usage Example
|
||||||
|
|
||||||
```rust
|
```rust
|
||||||
use stemedb_storage::{SledStore, GenericVoteStore, VoteStore};
|
use stemedb_storage::{HybridStore, GenericVoteStore, VoteStore};
|
||||||
use stemedb_core::types::Vote;
|
use stemedb_core::types::Vote;
|
||||||
|
|
||||||
// Create vote store backed by sled
|
// Create vote store backed by HybridStore (fjall + redb)
|
||||||
let kv_store = SledStore::open("./data")?;
|
let kv_store = HybridStore::open("./data")?;
|
||||||
let vote_store = GenericVoteStore::new(kv_store);
|
let vote_store = GenericVoteStore::new(kv_store);
|
||||||
|
|
||||||
// High-velocity vote ingestion
|
// High-velocity vote ingestion
|
||||||
|
|||||||
@ -5,12 +5,12 @@
|
|||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
|
|
||||||
The Ingestor is the background worker that bridges the Write-Ahead Log (WAL) to the KV storage engine. It continuously tails the WAL and persists records to sled using content-addressed keys.
|
The Ingestor is the background worker that bridges the Write-Ahead Log (WAL) to the KV storage engine. It continuously tails the WAL and persists records to the HybridStore (fjall + redb) using content-addressed keys.
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
```
|
```
|
||||||
[WAL Journal] ---> [IngestWorker] ---> [KVStore (sled)]
|
[WAL Journal] ---> [IngestWorker] ---> [KVStore (HybridStore)]
|
||||||
|
|
|
|
||||||
v
|
v
|
||||||
[Subject Index]
|
[Subject Index]
|
||||||
@ -39,11 +39,11 @@ Discriminator for WAL payloads (8-byte aligned header):
|
|||||||
```rust
|
```rust
|
||||||
use stemedb_ingest::{Ingestor, serialize_assertion};
|
use stemedb_ingest::{Ingestor, serialize_assertion};
|
||||||
use stemedb_wal::Journal;
|
use stemedb_wal::Journal;
|
||||||
use stemedb_storage::SledStore;
|
use stemedb_storage::HybridStore;
|
||||||
|
|
||||||
// Create components
|
// Create components
|
||||||
let journal = Arc::new(Mutex::new(Journal::open("./wal")?));
|
let journal = Arc::new(Mutex::new(Journal::open("./wal")?));
|
||||||
let store = Arc::new(SledStore::open("./db")?);
|
let store = Arc::new(HybridStore::open("./db")?);
|
||||||
|
|
||||||
// Create and start ingestor
|
// Create and start ingestor
|
||||||
let mut ingestor = Ingestor::new(journal.clone(), store);
|
let mut ingestor = Ingestor::new(journal.clone(), store);
|
||||||
@ -79,5 +79,5 @@ The ingestor has integration tests covering:
|
|||||||
|
|
||||||
## Related
|
## Related
|
||||||
|
|
||||||
- [Storage Service](./storage.md) - KVStore trait and SledStore
|
- [Storage Service](./storage.md) - KVStore trait and HybridStore (fjall + redb)
|
||||||
- [Content Addressing](../patterns/content-addressing.md) - BLAKE3 hashing
|
- [Content Addressing](../patterns/content-addressing.md) - BLAKE3 hashing
|
||||||
|
|||||||
@ -74,10 +74,10 @@ confidence = winner_weight / total_weight_across_all_candidates
|
|||||||
**Example:**
|
**Example:**
|
||||||
```rust
|
```rust
|
||||||
use stemedb_lens::VoteAwareConsensusLens;
|
use stemedb_lens::VoteAwareConsensusLens;
|
||||||
use stemedb_storage::{SledStore, GenericVoteStore};
|
use stemedb_storage::{HybridStore, GenericVoteStore};
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
|
|
||||||
let store = SledStore::open("./data").await?;
|
let store = HybridStore::open("./data").await?;
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store));
|
let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
|
|
||||||
@ -112,10 +112,10 @@ confidence = weighted_score // Direct weighted score
|
|||||||
**Example:**
|
**Example:**
|
||||||
```rust
|
```rust
|
||||||
use stemedb_lens::TrustAwareAuthorityLens;
|
use stemedb_lens::TrustAwareAuthorityLens;
|
||||||
use stemedb_storage::{SledStore, GenericTrustRankStore};
|
use stemedb_storage::{HybridStore, GenericTrustRankStore};
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
|
|
||||||
let store = SledStore::open("./data").await?;
|
let store = HybridStore::open("./data").await?;
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(trust_store);
|
let lens = TrustAwareAuthorityLens::new(trust_store);
|
||||||
|
|
||||||
@ -189,10 +189,10 @@ GET /v1/query?subject=Acme&predicate=lease_liability&lens=EpochAware
|
|||||||
**Example:**
|
**Example:**
|
||||||
```rust
|
```rust
|
||||||
use stemedb_lens::EpochAwareLens;
|
use stemedb_lens::EpochAwareLens;
|
||||||
use stemedb_storage::SledStore;
|
use stemedb_storage::HybridStore;
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
|
|
||||||
let store = Arc::new(SledStore::open("./data").expect("store"));
|
let store = Arc::new(HybridStore::open("./data").expect("store"));
|
||||||
|
|
||||||
// Default: filter superseded epochs, then pick most recent
|
// Default: filter superseded epochs, then pick most recent
|
||||||
let lens = EpochAwareLens::with_recency(store.clone());
|
let lens = EpochAwareLens::with_recency(store.clone());
|
||||||
@ -250,10 +250,10 @@ GET /v1/skeptic?subject=Semaglutide&predicate=muscle_effect
|
|||||||
**Example:**
|
**Example:**
|
||||||
```rust
|
```rust
|
||||||
use stemedb_lens::SkepticLens;
|
use stemedb_lens::SkepticLens;
|
||||||
use stemedb_storage::{SledStore, GenericVoteStore, GenericTrustRankStore};
|
use stemedb_storage::{HybridStore, GenericVoteStore, GenericTrustRankStore};
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
|
|
||||||
let store = SledStore::open("./data").await?;
|
let store = HybridStore::open("./data").await?;
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(vote_store, trust_store);
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
|
|||||||
@ -10,12 +10,14 @@ Episteme uses a Log-Structured, Content-Addressed storage model. Writes append t
|
|||||||
**Key Facts:**
|
**Key Facts:**
|
||||||
- Append-only (never mutate)
|
- Append-only (never mutate)
|
||||||
- WAL for durability (fsync on write)
|
- WAL for durability (fsync on write)
|
||||||
- KV store for indexes (sled MVP, trait-abstracted)
|
- KV store: HybridStore (fjall for writes, redb for reads)
|
||||||
- Content-addressed by BLAKE3 hash
|
- Content-addressed by BLAKE3 hash
|
||||||
|
|
||||||
**File Pointers:**
|
**File Pointers:**
|
||||||
- `crates/stemedb-storage/src/traits.rs` - KVStore trait
|
- `crates/stemedb-storage/src/traits.rs` - KVStore trait
|
||||||
- `crates/stemedb-storage/src/sled_backend.rs` - Sled implementation
|
- `crates/stemedb-storage/src/hybrid_backend.rs` - HybridStore (routes to fjall or redb)
|
||||||
|
- `crates/stemedb-storage/src/fjall_backend.rs` - FjallStore (write-heavy keys)
|
||||||
|
- `crates/stemedb-storage/src/redb_backend.rs` - RedbStore (read-heavy keys)
|
||||||
- `crates/stemedb-storage/src/serde_helpers.rs` - Storage-layer serialize/deserialize helpers
|
- `crates/stemedb-storage/src/serde_helpers.rs` - Storage-layer serialize/deserialize helpers
|
||||||
- `crates/stemedb-storage/src/vote_store.rs` - VoteStore (Ballot Box)
|
- `crates/stemedb-storage/src/vote_store.rs` - VoteStore (Ballot Box)
|
||||||
- `crates/stemedb-storage/src/index_store.rs` - IndexStore (S: and SP: indexes)
|
- `crates/stemedb-storage/src/index_store.rs` - IndexStore (S: and SP: indexes)
|
||||||
|
|||||||
BIN
applications/aphoria/aphoria-vision.pdf
Normal file
Binary file not shown.
@ -1,10 +1,10 @@
|
|||||||
# Sentinel Roadmap
|
# Aphoria Roadmap
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Phase 0: StemeDB Foundation
|
## Phase 0: StemeDB Foundation
|
||||||
|
|
||||||
Changes to the core database that Sentinel depends on. These ship before the CLI.
|
Changes to the core database that Aphoria depends on. These ship before the CLI.
|
||||||
|
|
||||||
### 0.1 ConceptPath Type
|
### 0.1 ConceptPath Type
|
||||||
|
|
||||||
@ -53,7 +53,7 @@ GET /v1/concepts/suggest Suggested aliases (shared leaf detection)
|
|||||||
|
|
||||||
## Phase 1: Authoritative Corpus
|
## Phase 1: Authoritative Corpus
|
||||||
|
|
||||||
Before Sentinel can find conflicts, Episteme needs the authoritative sources to conflict against.
|
Before Aphoria can find conflicts, Episteme needs the authoritative sources to conflict against.
|
||||||
|
|
||||||
### 1.1 RFC Ingester
|
### 1.1 RFC Ingester
|
||||||
|
|
||||||
@ -94,13 +94,13 @@ For v1, manually curate a small set of vendor doc claims:
|
|||||||
|
|
||||||
These are `vendor://{product}/{topic}/{claim}` at Tier 2.
|
These are `vendor://{product}/{topic}/{claim}` at Tier 2.
|
||||||
|
|
||||||
This doesn't need to be exhaustive. It needs to cover the claims that Sentinel's extractors will actually find in code.
|
This doesn't need to be exhaustive. It needs to cover the claims that Aphoria's extractors will actually find in code.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Phase 2: CLI Core
|
## Phase 2: CLI Core
|
||||||
|
|
||||||
The Sentinel binary itself.
|
The Aphoria binary itself.
|
||||||
|
|
||||||
### 2.1 Project Walker
|
### 2.1 Project Walker
|
||||||
|
|
||||||
@ -174,7 +174,7 @@ The bridge handles:
|
|||||||
- ConceptPath construction from extractor output
|
- ConceptPath construction from extractor output
|
||||||
- Source hash computation (BLAKE3 of the file at scan time)
|
- Source hash computation (BLAKE3 of the file at scan time)
|
||||||
- Source metadata encoding (file path, line number, extraction method)
|
- Source metadata encoding (file path, line number, extraction method)
|
||||||
- Signing with the Sentinel agent's keypair
|
- Signing with the Aphoria agent's keypair
|
||||||
|
|
||||||
### 2.4 Conflict Query
|
### 2.4 Conflict Query
|
||||||
|
|
||||||
@ -201,10 +201,10 @@ The Skeptic lens returns all claims for the concept across all aliased paths, wi
|
|||||||
### 2.5 Report Output
|
### 2.5 Report Output
|
||||||
|
|
||||||
```
|
```
|
||||||
$ sentinel scan ./citadeldb --format table
|
$ aphoria scan ./citadeldb --format table
|
||||||
|
|
||||||
┌──────────────────────────────────────────────────────────────────────┐
|
┌──────────────────────────────────────────────────────────────────────┐
|
||||||
│ Sentinel Report: citadeldb │
|
│ Aphoria Report: citadeldb │
|
||||||
│ Scanned: 142 files │ Claims: 23 │ Conflicts: 3 │
|
│ Scanned: 142 files │ Claims: 23 │ Conflicts: 3 │
|
||||||
├──────────┬───────────────────────────────────────┬──────────┬───────┤
|
├──────────┬───────────────────────────────────────┬──────────┬───────┤
|
||||||
│ Verdict │ Concept │ Score │ Tier │
|
│ Verdict │ Concept │ Score │ Tier │
|
||||||
@ -219,12 +219,12 @@ Details:
|
|||||||
BLOCK code://rust/citadeldb/auth/jwt/audience_validation
|
BLOCK code://rust/citadeldb/auth/jwt/audience_validation
|
||||||
Your code: aud validation disabled (src/auth/jwt.rs:47)
|
Your code: aud validation disabled (src/auth/jwt.rs:47)
|
||||||
RFC 7519: aud validation MUST be enabled (Tier 0)
|
RFC 7519: aud validation MUST be enabled (Tier 0)
|
||||||
Action: Fix or acknowledge with: sentinel ack <path> --reason "..."
|
Action: Fix or acknowledge with: aphoria ack <path> --reason "..."
|
||||||
|
|
||||||
BLOCK code://rust/citadeldb/net/tls/cert_verification
|
BLOCK code://rust/citadeldb/net/tls/cert_verification
|
||||||
Your code: verify = false (src/net/client.rs:23)
|
Your code: verify = false (src/net/client.rs:23)
|
||||||
OWASP: verification required (Tier 1)
|
OWASP: verification required (Tier 1)
|
||||||
Action: Fix or acknowledge with: sentinel ack <path> --reason "..."
|
Action: Fix or acknowledge with: aphoria ack <path> --reason "..."
|
||||||
|
|
||||||
FLAG code://rust/citadeldb/http/timeout
|
FLAG code://rust/citadeldb/http/timeout
|
||||||
Your code: timeout = 0 (infinite) (config/production.yaml:8)
|
Your code: timeout = 0 (infinite) (config/production.yaml:8)
|
||||||
@ -237,7 +237,7 @@ Output formats: `table` (default), `json`, `sarif` (for CI integration), `markdo
|
|||||||
### 2.6 Acknowledge Command
|
### 2.6 Acknowledge Command
|
||||||
|
|
||||||
```
|
```
|
||||||
$ sentinel ack code://rust/citadeldb/auth/jwt/audience_validation \
|
$ aphoria ack code://rust/citadeldb/auth/jwt/audience_validation \
|
||||||
--reason "Internal service, no external JWT consumers. Accepted risk per SEC-2024-003."
|
--reason "Internal service, no external JWT consumers. Accepted risk per SEC-2024-003."
|
||||||
```
|
```
|
||||||
|
|
||||||
@ -256,27 +256,27 @@ The conflict still exists in Episteme, but the acknowledgment is recorded. Next
|
|||||||
|
|
||||||
### 3.1 Claude Code Skill
|
### 3.1 Claude Code Skill
|
||||||
|
|
||||||
A `/sentinel` skill that wraps the CLI:
|
A `/aphoria` skill that wraps the CLI:
|
||||||
|
|
||||||
```
|
```
|
||||||
/sentinel scan Scan current project, report conflicts
|
/aphoria scan Scan current project, report conflicts
|
||||||
/sentinel scan --fix Scan and offer to fix each conflict
|
/aphoria scan --fix Scan and offer to fix each conflict
|
||||||
/sentinel ack <path> Acknowledge a conflict with a reason
|
/aphoria ack <path> Acknowledge a conflict with a reason
|
||||||
/sentinel status Show current conflict summary
|
/aphoria status Show current conflict summary
|
||||||
/sentinel diff Show new conflicts since last scan
|
/aphoria diff Show new conflicts since last scan
|
||||||
```
|
```
|
||||||
|
|
||||||
The skill runs the CLI binary, parses the JSON output, and presents results inline in the Claude Code session.
|
The skill runs the CLI binary, parses the JSON output, and presents results inline in the Claude Code session.
|
||||||
|
|
||||||
### 3.2 Agent Pre-Flight Hook
|
### 3.2 Agent Pre-Flight Hook
|
||||||
|
|
||||||
A Claude Code hook that runs Sentinel before certain operations:
|
A Claude Code hook that runs Aphoria before certain operations:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"hooks": {
|
"hooks": {
|
||||||
"pre-commit": "sentinel scan --format sarif --exit-code",
|
"pre-commit": "aphoria scan --format sarif --exit-code",
|
||||||
"pre-deploy": "sentinel scan --strict --exit-code"
|
"pre-deploy": "aphoria scan --strict --exit-code"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
@ -285,7 +285,7 @@ A Claude Code hook that runs Sentinel before certain operations:
|
|||||||
|
|
||||||
### 3.3 Alias Suggestion Workflow
|
### 3.3 Alias Suggestion Workflow
|
||||||
|
|
||||||
When Sentinel scans a new project and finds concepts that share leaf names with existing authoritative paths, it prompts:
|
When Aphoria scans a new project and finds concepts that share leaf names with existing authoritative paths, it prompts:
|
||||||
|
|
||||||
```
|
```
|
||||||
New concept detected: code://rust/newproject/auth/jwt/audience_validation
|
New concept detected: code://rust/newproject/auth/jwt/audience_validation
|
||||||
@ -305,8 +305,8 @@ Accepting creates the alias. Deferring flags it for later review. Rejecting reco
|
|||||||
### 4.1 GitHub Action
|
### 4.1 GitHub Action
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
- name: Sentinel Scan
|
- name: Aphoria Scan
|
||||||
uses: orchard9/sentinel-action@v1
|
uses: orchard9/aphoria-action@v1
|
||||||
with:
|
with:
|
||||||
episteme-url: ${{ secrets.EPISTEME_URL }}
|
episteme-url: ${{ secrets.EPISTEME_URL }}
|
||||||
fail-on: block
|
fail-on: block
|
||||||
@ -317,10 +317,10 @@ Publishes SARIF results to GitHub Security tab. BLOCK verdicts fail the check. F
|
|||||||
|
|
||||||
### 4.2 PR Comment Bot
|
### 4.2 PR Comment Bot
|
||||||
|
|
||||||
On pull request, Sentinel scans the diff (not the whole project) and comments:
|
On pull request, Aphoria scans the diff (not the whole project) and comments:
|
||||||
|
|
||||||
```
|
```
|
||||||
## Sentinel Report
|
## Aphoria Report
|
||||||
|
|
||||||
This PR introduces 1 new conflict:
|
This PR introduces 1 new conflict:
|
||||||
|
|
||||||
@ -328,15 +328,15 @@ This PR introduces 1 new conflict:
|
|||||||
|------|----------|-------|
|
|------|----------|-------|
|
||||||
| src/auth/jwt.rs:47 | Disables aud validation (RFC 7519 requires it) | 0.92 |
|
| src/auth/jwt.rs:47 | Disables aud validation (RFC 7519 requires it) | 0.92 |
|
||||||
|
|
||||||
Run `sentinel ack` to acknowledge, or fix before merge.
|
Run `aphoria ack` to acknowledge, or fix before merge.
|
||||||
```
|
```
|
||||||
|
|
||||||
### 4.3 Baseline Mode
|
### 4.3 Baseline Mode
|
||||||
|
|
||||||
For existing projects with many conflicts, `sentinel baseline` records the current state. Subsequent scans only report *new* conflicts. This prevents the "500 warnings so we ignore all of them" problem.
|
For existing projects with many conflicts, `aphoria baseline` records the current state. Subsequent scans only report *new* conflicts. This prevents the "500 warnings so we ignore all of them" problem.
|
||||||
|
|
||||||
```
|
```
|
||||||
$ sentinel baseline
|
$ aphoria baseline
|
||||||
Baseline recorded: 12 existing conflicts frozen.
|
Baseline recorded: 12 existing conflicts frozen.
|
||||||
Future scans will only report new conflicts.
|
Future scans will only report new conflicts.
|
||||||
```
|
```
|
||||||
@ -347,7 +347,7 @@ Future scans will only report new conflicts.
|
|||||||
|
|
||||||
### 5.1 Gap Detection
|
### 5.1 Gap Detection
|
||||||
|
|
||||||
When Sentinel extracts a claim and no authoritative source exists for that concept, log it as a gap:
|
When Aphoria extracts a claim and no authoritative source exists for that concept, log it as a gap:
|
||||||
|
|
||||||
```
|
```
|
||||||
GAP: code://rust/citadeldb/cache/redis/max_memory_policy
|
GAP: code://rust/citadeldb/cache/redis/max_memory_policy
|
||||||
@ -363,11 +363,11 @@ When a gap is seen across N projects (configurable, default 3), dispatch a resea
|
|||||||
2. Finds Redis official docs
|
2. Finds Redis official docs
|
||||||
3. Extracts normative claims: "default is `noeviction`, recommended `allkeys-lru` for cache use cases"
|
3. Extracts normative claims: "default is `noeviction`, recommended `allkeys-lru` for cache use cases"
|
||||||
4. Ingests as `vendor://redis/cache/max_memory_policy` at Tier 2
|
4. Ingests as `vendor://redis/cache/max_memory_policy` at Tier 2
|
||||||
5. Future Sentinel scans now have something to conflict against
|
5. Future Aphoria scans now have something to conflict against
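The cross-project threshold that triggers this loop (a gap seen across N projects, default 3) could be tracked roughly as follows. This is only a sketch: `GapTracker`, its fields, and the idea of keying gaps by a project-independent concept string are assumptions made for illustration, not part of the roadmap.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical tracker: counts distinct projects per gap concept.
struct GapTracker {
    threshold: usize,
    seen: BTreeMap<String, BTreeSet<String>>,
}

impl GapTracker {
    fn new(threshold: usize) -> Self {
        GapTracker { threshold, seen: BTreeMap::new() }
    }

    /// Records that `project` hit a gap for `concept`. Returns true only on
    /// the call where the number of distinct projects first reaches the
    /// threshold, so research is dispatched once per concept.
    fn record(&mut self, concept: &str, project: &str) -> bool {
        let projects = self.seen.entry(concept.to_string()).or_default();
        let newly_added = projects.insert(project.to_string());
        newly_added && projects.len() == self.threshold
    }
}
```

Repeated scans of the same project do not advance the count, which keeps one noisy repository from triggering research on its own.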
|
||||||
|
|
||||||
### 5.3 Community Corpus Contributions
|
### 5.3 Community Corpus Contributions
|
||||||
|
|
||||||
Users who run Sentinel can opt in to contribute their alias mappings and acknowledgment patterns (anonymized) to a shared corpus. Common patterns propagate:
|
Users who run Aphoria can opt in to contribute their alias mappings and acknowledgment patterns (anonymized) to a shared corpus. Common patterns propagate:
|
||||||
|
|
||||||
- "Every Rust project has this JWT pattern" → pre-built alias set for Rust JWT libraries
|
- "Every Rust project has this JWT pattern" → pre-built alias set for Rust JWT libraries
|
||||||
- "This Redis config is always flagged and always acknowledged" → lower the default threshold for that concept
|
- "This Redis config is always flagged and always acknowledged" → lower the default threshold for that concept
|
||||||
@ -381,7 +381,7 @@ Users who run Sentinel can opt in to contribute their alias mappings and acknowl
|
|||||||
|-------|-------------|------------|
|
|-------|-------------|------------|
|
||||||
| 0 | ConceptPath in StemeDB | concept-hierarchy spec |
|
| 0 | ConceptPath in StemeDB | concept-hierarchy spec |
|
||||||
| 1 | Authoritative corpus (RFCs, OWASP) | Phase 0 |
|
| 1 | Authoritative corpus (RFCs, OWASP) | Phase 0 |
|
||||||
| 2 | Sentinel CLI (scan, report, ack) | Phase 0, Phase 1 |
|
| 2 | Aphoria CLI (scan, report, ack) | Phase 0, Phase 1 |
|
||||||
| 3 | Claude Code skill + hooks | Phase 2 |
|
| 3 | Claude Code skill + hooks | Phase 2 |
|
||||||
| 4 | CI integration (GitHub Action, PR bot) | Phase 2 |
|
| 4 | CI integration (GitHub Action, PR bot) | Phase 2 |
|
||||||
| 5 | Research agent loop | Phase 2, Phase 4 (gap data) |
|
| 5 | Research agent loop | Phase 2, Phase 4 (gap data) |
|
||||||
@ -1,4 +1,4 @@
|
|||||||
# Sentinel Technical Spec
|
# Aphoria Technical Spec
|
||||||
|
|
||||||
**Status:** Draft
|
**Status:** Draft
|
||||||
**Date:** 2026-02-02
|
**Date:** 2026-02-02
|
||||||
@ -7,14 +7,14 @@
|
|||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
Sentinel is a CLI binary that scans codebases, extracts implicit claims from config and code, ingests them into a local Episteme instance, and reports conflicts against authoritative sources.
|
Aphoria is a CLI binary that scans codebases, extracts implicit claims from config and code, ingests them into a local Episteme instance, and reports conflicts against authoritative sources.
|
||||||
|
|
||||||
```
|
```
|
||||||
sentinel scan <project-root> [--config sentinel.toml] [--format table|json|sarif|markdown]
|
aphoria scan <project-root> [--config aphoria.toml] [--format table|json|sarif|markdown]
|
||||||
sentinel ack <concept-path> --reason "..."
|
aphoria ack <concept-path> --reason "..."
|
||||||
sentinel baseline
|
aphoria baseline
|
||||||
sentinel diff
|
aphoria diff
|
||||||
sentinel status
|
aphoria status
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
@ -23,7 +23,7 @@ sentinel status
|
|||||||
|
|
||||||
```
|
```
|
||||||
┌──────────────────────────────────────────────────────────────┐
|
┌──────────────────────────────────────────────────────────────┐
|
||||||
│ sentinel CLI │
|
│ aphoria CLI │
|
||||||
│ │
|
│ │
|
||||||
│ ┌──────────┐ ┌────────────┐ ┌──────────┐ ┌────────┐ │
|
│ ┌──────────┐ ┌────────────┐ ┌──────────┐ ┌────────┐ │
|
||||||
│ │ Walker │──▶│ Extractors │──▶│ Ingester │──▶│ Report │ │
|
│ │ Walker │──▶│ Extractors │──▶│ Ingester │──▶│ Report │ │
|
||||||
@ -46,13 +46,13 @@ sentinel status
|
|||||||
└──────────────────────────────────────────────────────────────┘
|
└──────────────────────────────────────────────────────────────┘
|
||||||
```
|
```
|
||||||
|
|
||||||
Sentinel depends on:
|
Aphoria depends on:
|
||||||
- `stemedb-core` (types: ConceptPath, Assertion, SourceClass)
|
- `stemedb-core` (types: ConceptPath, Assertion, SourceClass)
|
||||||
- `stemedb-storage` (KVStore, IndexStore, AliasStore)
|
- `stemedb-storage` (KVStore, IndexStore, AliasStore)
|
||||||
- `stemedb-ingest` (ingestion pipeline)
|
- `stemedb-ingest` (ingestion pipeline)
|
||||||
- `stemedb-query` (query engine, lenses)
|
- `stemedb-query` (query engine, lenses)
|
||||||
|
|
||||||
It does **not** depend on `stemedb-api`. Sentinel talks to Episteme directly through the Rust crate APIs, not over HTTP. This makes it fast (no network round-trip) and self-contained (no server process needed).
|
It does **not** depend on `stemedb-api`. Aphoria talks to Episteme directly through the Rust crate APIs, not over HTTP. This makes it fast (no network round-trip) and self-contained (no server process needed).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -60,11 +60,11 @@ It does **not** depend on `stemedb-api`. Sentinel talks to Episteme directly thr
|
|||||||
|
|
||||||
```
|
```
|
||||||
crates/
|
crates/
|
||||||
sentinel/
|
aphoria/
|
||||||
Cargo.toml
|
Cargo.toml
|
||||||
src/
|
src/
|
||||||
main.rs CLI entrypoint (clap)
|
main.rs CLI entrypoint (clap)
|
||||||
config.rs sentinel.toml parsing
|
config.rs aphoria.toml parsing
|
||||||
walker/
|
walker/
|
||||||
mod.rs Project walker orchestration
|
mod.rs Project walker orchestration
|
||||||
language.rs Language detection
|
language.rs Language detection
|
||||||
@ -96,7 +96,7 @@ crates/
|
|||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
`sentinel.toml` at project root (optional, sensible defaults):
|
`aphoria.toml` at project root (optional, sensible defaults):
|
||||||
|
|
||||||
```toml
|
```toml
|
||||||
[project]
|
[project]
|
||||||
@ -104,7 +104,7 @@ name = "citadeldb"
|
|||||||
language = "rust" # auto-detected if omitted
|
language = "rust" # auto-detected if omitted
|
||||||
|
|
||||||
[episteme]
|
[episteme]
|
||||||
data_dir = "~/.sentinel/db" # local Episteme instance
|
data_dir = "~/.aphoria/db" # local Episteme instance
|
||||||
# url = "http://localhost:3000" # future: remote instance
|
# url = "http://localhost:3000" # future: remote instance
|
||||||
|
|
||||||
[thresholds]
|
[thresholds]
|
||||||
@ -121,7 +121,7 @@ min_reasonable_ms = 1000 # flag timeouts below this
|
|||||||
max_reasonable_ms = 300000 # flag timeouts above this
|
max_reasonable_ms = 300000 # flag timeouts above this
|
||||||
|
|
||||||
[extractors.dep_versions]
|
[extractors.dep_versions]
|
||||||
advisory_db = "~/.sentinel/advisory-db" # rustsec/advisory-db clone
|
advisory_db = "~/.aphoria/advisory-db" # rustsec/advisory-db clone
|
||||||
|
|
||||||
[scan]
|
[scan]
|
||||||
exclude = ["target/", "node_modules/", ".git/", "vendor/"]
|
exclude = ["target/", "node_modules/", ".git/", "vendor/"]
|
||||||
@ -139,7 +139,7 @@ auto_accept_tier0 = true # auto-accept alias suggestions to Tier 0 sources
|
|||||||
### Language Detection
|
### Language Detection
|
||||||
|
|
||||||
Priority order:
|
Priority order:
|
||||||
1. Explicit `language` in `sentinel.toml`
|
1. Explicit `language` in `aphoria.toml`
|
||||||
2. Dominant language heuristic (count files by extension)
|
2. Dominant language heuristic (count files by extension)
|
||||||
3. Per-file extension mapping
|
3. Per-file extension mapping
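The priority order above can be sketched as a small function. The names `detect_language` and `language_for_extension`, the extension table, and the alphabetical tie-break are illustrative assumptions, not the spec's implementation.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Per-file extension mapping (step 3); the table is a small assumed sample.
fn language_for_extension(ext: &str) -> Option<&'static str> {
    match ext {
        "rs" => Some("rust"),
        "go" => Some("go"),
        "py" => Some("python"),
        "ts" | "tsx" => Some("typescript"),
        "js" | "jsx" => Some("javascript"),
        _ => None,
    }
}

/// Project-level detection: explicit setting first (step 1), otherwise the
/// language with the most recognized files (step 2).
fn detect_language(explicit: Option<&str>, files: &[&str]) -> Option<String> {
    if let Some(lang) = explicit {
        return Some(lang.to_string());
    }
    let mut counts: HashMap<&'static str, usize> = HashMap::new();
    for file in files {
        let ext = Path::new(file).extension().and_then(|e| e.to_str());
        if let Some(lang) = ext.and_then(|e| language_for_extension(e)) {
            *counts.entry(lang).or_insert(0) += 1;
        }
    }
    // Highest count wins; ties go to the alphabetically first name so the
    // result does not depend on HashMap iteration order.
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(lang, _)| lang.to_string())
}
```

In a mixed repository, individual files would still be routed through `language_for_extension`, while `detect_language` supplies the project-wide default.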
|
||||||
|
|
||||||
@ -207,7 +207,7 @@ docker-compose.yml
|
|||||||
```
|
```
|
||||||
|
|
||||||
The project name comes from:
|
The project name comes from:
|
||||||
1. `sentinel.toml` `project.name`
|
1. `aphoria.toml` `project.name`
|
||||||
2. `Cargo.toml` `[package] name`
|
2. `Cargo.toml` `[package] name`
|
||||||
3. `go.mod` module name (last segment)
|
3. `go.mod` module name (last segment)
|
||||||
4. `package.json` `name`
|
4. `package.json` `name`
|
||||||
@ -456,7 +456,7 @@ Value: Text("1.0.2")
|
|||||||
Description: "openssl 1.0.2 has known vulnerability CVE-2024-XXXX"
|
Description: "openssl 1.0.2 has known vulnerability CVE-2024-XXXX"
|
||||||
```
|
```
|
||||||
|
|
||||||
The advisory databases are downloaded locally and refreshed periodically. Sentinel doesn't call external APIs during scan.
|
The advisory databases are downloaded locally and refreshed periodically. Aphoria doesn't call external APIs during scan.
|
||||||
|
|
||||||
### Extractor: cors_config
|
### Extractor: cors_config
|
||||||
|
|
||||||
@ -519,7 +519,7 @@ fn to_assertion(
|
|||||||
"line": claim.line,
|
"line": claim.line,
|
||||||
"matched_text": claim.matched_text,
|
"matched_text": claim.matched_text,
|
||||||
"extractor": claim.concept_path.leaf(),
|
"extractor": claim.concept_path.leaf(),
|
||||||
"scan_tool": "sentinel",
|
"scan_tool": "aphoria",
|
||||||
"scan_version": env!("CARGO_PKG_VERSION"),
|
"scan_version": env!("CARGO_PKG_VERSION"),
|
||||||
}));
|
}));
|
||||||
|
|
||||||
@ -557,7 +557,7 @@ When code changes between scans, new assertions are created. Old assertions rema
|
|||||||
Each scan is recorded as an assertion about itself:
|
Each scan is recorded as an assertion about itself:
|
||||||
|
|
||||||
```
|
```
|
||||||
Subject: sentinel://scan/{project_name}/{scan_id}
|
Subject: aphoria://scan/{project_name}/{scan_id}
|
||||||
Predicate: completed
|
Predicate: completed
|
||||||
Object: Text(json!({
|
Object: Text(json!({
|
||||||
"project": "citadeldb",
|
"project": "citadeldb",
|
||||||
@ -570,7 +570,7 @@ Object: Text(json!({
|
|||||||
}))
|
}))
|
||||||
```
|
```
|
||||||
|
|
||||||
This enables `sentinel diff` — compare two scan records and their associated assertions.
|
This enables `aphoria diff` — compare two scan records and their associated assertions.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -662,7 +662,7 @@ async fn check_conflict(
|
|||||||
|
|
||||||
### Acknowledged Conflicts
|
### Acknowledged Conflicts
|
||||||
|
|
||||||
When a conflict has been acknowledged (via `sentinel ack`), the acknowledgment assertion exists in Episteme. The conflict still has a score, but the report marks it as ACK instead of BLOCK/FLAG:
|
When a conflict has been acknowledged (via `aphoria ack`), the acknowledgment assertion exists in Episteme. The conflict still has a score, but the report marks it as ACK instead of BLOCK/FLAG:
|
||||||
|
|
||||||
```
|
```
|
||||||
ACK code://rust/citadeldb/auth/jwt/audience_validation
|
ACK code://rust/citadeldb/auth/jwt/audience_validation
|
||||||
@ -689,9 +689,9 @@ SARIF (Static Analysis Results Interchange Format) is the standard for CI securi
|
|||||||
"runs": [{
|
"runs": [{
|
||||||
"tool": {
|
"tool": {
|
||||||
"driver": {
|
"driver": {
|
||||||
"name": "sentinel",
|
"name": "aphoria",
|
||||||
"version": "0.1.0",
|
"version": "0.1.0",
|
||||||
"informationUri": "https://github.com/orchard9/sentinel"
|
"informationUri": "https://github.com/orchard9/aphoria"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"results": [{
|
"results": [{
|
||||||
@ -754,26 +754,26 @@ SARIF (Static Analysis Results Interchange Format) is the standard for CI securi
|
|||||||
|
|
||||||
### Baseline
|
### Baseline
|
||||||
|
|
||||||
`sentinel baseline` records the current scan as the baseline. Subsequent scans only report *new* conflicts.
|
`aphoria baseline` records the current scan as the baseline. Subsequent scans only report *new* conflicts.
|
||||||
|
|
||||||
Implementation: store the baseline scan ID in `.sentinel/baseline` in the project root. The `diff` logic compares the current scan's conflict set against the baseline's.
|
Implementation: store the baseline scan ID in `.aphoria/baseline` in the project root. The `diff` logic compares the current scan's conflict set against the baseline's.
|
||||||
|
|
||||||
```
|
```
|
||||||
.sentinel/
|
.aphoria/
|
||||||
baseline # scan ID of the baseline
|
baseline # scan ID of the baseline
|
||||||
config.toml # symlink or copy of sentinel.toml
|
config.toml # symlink or copy of aphoria.toml
|
||||||
agent.key # Ed25519 keypair for this project's Sentinel agent
|
agent.key # Ed25519 keypair for this project's Aphoria agent
|
||||||
```
|
```
|
||||||
|
|
||||||
### Diff
|
### Diff
|
||||||
|
|
||||||
`sentinel diff` shows:
|
`aphoria diff` shows:
|
||||||
- New conflicts (in current scan but not baseline)
|
- New conflicts (in current scan but not baseline)
|
||||||
- Resolved conflicts (in baseline but not current scan)
|
- Resolved conflicts (in baseline but not current scan)
|
||||||
- Changed conflicts (same concept, different score or verdict)
|
- Changed conflicts (same concept, different score or verdict)
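The three categories above amount to a keyed comparison of two conflict sets. A minimal sketch, assuming each scan's conflicts can be represented as a map from concept path to score; the `Diff` type and `diff_conflicts` function are hypothetical names for illustration, not Aphoria's actual API:

```rust
use std::collections::BTreeMap;

/// Hypothetical result type: concept paths bucketed by how they changed.
#[derive(Debug, Default, PartialEq)]
pub struct Diff {
    pub new: Vec<String>,
    pub resolved: Vec<String>,
    pub changed: Vec<String>,
}

/// Compare a baseline conflict set against the current one.
/// Both maps go from concept path to conflict score (an assumed shape).
pub fn diff_conflicts(
    baseline: &BTreeMap<String, f32>,
    current: &BTreeMap<String, f32>,
) -> Diff {
    let mut diff = Diff::default();
    for (concept, score) in current {
        match baseline.get(concept) {
            // Present now, absent in the baseline.
            None => diff.new.push(concept.clone()),
            // Present in both, but the score moved.
            Some(old) if old != score => diff.changed.push(concept.clone()),
            Some(_) => {}
        }
    }
    for concept in baseline.keys() {
        // Present in the baseline, gone now.
        if !current.contains_key(concept) {
            diff.resolved.push(concept.clone());
        }
    }
    diff
}
```

A real implementation would presumably also compare verdicts (BLOCK/FLAG/ACK), which this sketch leaves out.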
|
||||||
|
|
||||||
```
|
```
|
||||||
$ sentinel diff
|
$ aphoria diff
|
||||||
|
|
||||||
NEW code://rust/citadeldb/cache/redis/max_connections
|
NEW code://rust/citadeldb/cache/redis/max_connections
|
||||||
Your code: max_connections = 10000 (config/redis.yaml:5)
|
Your code: max_connections = 10000 (config/redis.yaml:5)
|
||||||
@ -791,12 +791,12 @@ $ sentinel diff
|
|||||||
|
|
||||||
## Agent Keypair
|
## Agent Keypair
|
||||||
|
|
||||||
Sentinel signs assertions with a per-project Ed25519 keypair stored in `.sentinel/agent.key`. Generated on first `sentinel scan` if it doesn't exist.
|
Aphoria signs assertions with a per-project Ed25519 keypair stored in `.aphoria/agent.key`. Generated on first `aphoria scan` if it doesn't exist.
|
||||||
|
|
||||||
The keypair identifies "Sentinel scanning project X" as a distinct agent in Episteme's trust system. Multiple projects have different keypairs. This enables:
|
The keypair identifies "Aphoria scanning project X" as a distinct agent in Episteme's trust system. Multiple projects have different keypairs. This enables:
|
||||||
- Per-project audit trails ("which Sentinel agent found this?")
|
- Per-project audit trails ("which Aphoria agent found this?")
|
||||||
- TrustRank per Sentinel instance (a well-calibrated Sentinel gains reputation)
|
- TrustRank per Aphoria instance (a well-calibrated Aphoria gains reputation)
|
||||||
- Distinguishing human-authored assertions from Sentinel-extracted ones
|
- Distinguishing human-authored assertions from Aphoria-extracted ones
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -804,15 +804,15 @@ The keypair identifies "Sentinel scanning project X" as a distinct agent in Epis
|
|||||||
|
|
||||||
### Local Mode (Default)
|
### Local Mode (Default)
|
||||||
|
|
||||||
Sentinel ships with an embedded Episteme instance. No server needed. The database lives at `~/.sentinel/db/` (configurable). Multiple projects share the same local instance — their assertions are namespaced by ConceptPath (`code://rust/citadeldb/...` vs `code://go/other-project/...`).
|
Aphoria ships with an embedded Episteme instance. No server needed. The database lives at `~/.aphoria/db/` (configurable). Multiple projects share the same local instance — their assertions are namespaced by ConceptPath (`code://rust/citadeldb/...` vs `code://go/other-project/...`).
|
||||||
|
|
||||||
The authoritative corpus (RFCs, OWASP) is also in the local instance. `sentinel init` bootstraps it.
|
The authoritative corpus (RFCs, OWASP) is also in the local instance. `aphoria init` bootstraps it.
|
||||||
|
|
||||||
```
|
```
|
||||||
$ sentinel init
|
$ aphoria init
|
||||||
Downloading RFC corpus (auth, crypto, TLS) ... 127 assertions ingested.
|
Downloading RFC corpus (auth, crypto, TLS) ... 127 assertions ingested.
|
||||||
Downloading OWASP cheat sheets ... 89 assertions ingested.
|
Downloading OWASP cheat sheets ... 89 assertions ingested.
|
||||||
Ready. Run `sentinel scan <project>` to begin.
|
Ready. Run `aphoria scan <project>` to begin.
|
||||||
```
|
```
|
||||||
|
|
||||||
### Remote Mode (Future)
|
### Remote Mode (Future)
|
||||||
@ -820,12 +820,12 @@ Ready. Run `sentinel scan <project>` to begin.
|
|||||||
```toml
|
```toml
|
||||||
[episteme]
|
[episteme]
|
||||||
url = "https://episteme.example.com"
|
url = "https://episteme.example.com"
|
||||||
api_key = "${SENTINEL_API_KEY}"
|
api_key = "${APHORIA_API_KEY}"
|
||||||
```
|
```
|
||||||
|
|
||||||
In remote mode, Sentinel ingests into and queries from a shared Episteme instance. This enables:
|
In remote mode, Aphoria ingests into and queries from a shared Episteme instance. This enables:
|
||||||
- Cross-project conflict detection ("same JWT misconfiguration in 12 repos")
|
- Cross-project conflict detection ("same JWT misconfiguration in 12 repos")
|
||||||
- Shared authoritative corpus (ingested once, used by all Sentinel agents)
|
- Shared authoritative corpus (ingested once, used by all Aphoria agents)
|
||||||
- Centralized acknowledgment management
|
- Centralized acknowledgment management
|
||||||
|
|
||||||
---
|
---
|
||||||
@ -839,7 +839,7 @@ In remote mode, Sentinel ingests into and queries from a shared Episteme instanc
|
|||||||
| 2 | BLOCK-level conflicts found (with `--exit-code`) |
|
| 2 | BLOCK-level conflicts found (with `--exit-code`) |
|
||||||
| 3 | Scan error (file access, Episteme connection, etc.) |
|
| 3 | Scan error (file access, Episteme connection, etc.) |
|
||||||
|
|
||||||
`--exit-code` enables non-zero exits. Without it, Sentinel always exits 0 (for interactive use where the report is the output, not the exit code).
|
`--exit-code` enables non-zero exits. Without it, Aphoria always exits 0 (for interactive use where the report is the output, not the exit code).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -866,7 +866,7 @@ The performance bottleneck is I/O (reading files), not extraction (regex matchin
|
|||||||
| `ignore` | File walking (respects .gitignore, fast) |
|
| `ignore` | File walking (respects .gitignore, fast) |
|
||||||
| `regex` | Pattern matching in extractors |
|
| `regex` | Pattern matching in extractors |
|
||||||
| `serde` + `serde_json` | Config parsing, JSON output |
|
| `serde` + `serde_json` | Config parsing, JSON output |
|
||||||
| `toml` | sentinel.toml parsing |
|
| `toml` | aphoria.toml parsing |
|
||||||
| `comfy-table` | Terminal table output |
|
| `comfy-table` | Terminal table output |
|
||||||
| `stemedb-core` | Types |
|
| `stemedb-core` | Types |
|
||||||
| `stemedb-storage` | Local KV store |
|
| `stemedb-storage` | Local KV store |
|
||||||
@ -1,8 +1,8 @@
|
|||||||
# Sentinel
|
# Aphoria
|
||||||
|
|
||||||
**A code-level truth linter powered by Episteme.**
|
**A code-level truth linter powered by Episteme.**
|
||||||
|
|
||||||
Sentinel scans a codebase, extracts the decisions embedded in config and code, and checks them against authoritative sources. It finds the places where what your code *does* contradicts what the specs *say*.
|
Aphoria scans a codebase, extracts the decisions embedded in config and code, and checks them against authoritative sources. It finds the places where what your code *does* contradicts what the specs *say*.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -24,10 +24,10 @@ AI agents make this worse. An agent deploying code doesn't read the RFC. It pick
|
|||||||
|
|
||||||
## The Solution
|
## The Solution
|
||||||
|
|
||||||
Sentinel gives codebases an epistemic audit trail.
|
Aphoria gives codebases an epistemic audit trail.
|
||||||
|
|
||||||
```
|
```
|
||||||
$ sentinel scan ./citadeldb
|
$ aphoria scan ./citadeldb
|
||||||
|
|
||||||
Scanning citadeldb (rust) ...
|
Scanning citadeldb (rust) ...
|
||||||
|
|
||||||
@ -49,7 +49,7 @@ Scanning citadeldb (rust) ...
|
|||||||
3 conflicts found (2 BLOCK, 1 FLAG)
|
3 conflicts found (2 BLOCK, 1 FLAG)
|
||||||
```
|
```
|
||||||
|
|
||||||
Sentinel doesn't lint syntax. It lints *epistemic drift* — the gap between what your code asserts and what authoritative sources say.
|
Aphoria doesn't lint syntax. It lints *epistemic drift* — the gap between what your code asserts and what authoritative sources say.
|
||||||
|
|
||||||
## How It Works
|
## How It Works
|
||||||
|
|
||||||
@ -65,34 +65,34 @@ The concept hierarchy is the backbone. `code://rust/citadeldb/auth/jwt/audience_
|
|||||||
|
|
||||||
**Engineering leads** who deploy AI agents and need a pre-flight check. "Before the agent merges this PR, did it contradict any RFCs?"
|
**Engineering leads** who deploy AI agents and need a pre-flight check. "Before the agent merges this PR, did it contradict any RFCs?"
|
||||||
|
|
||||||
**Platform teams** building internal developer tooling. Sentinel integrates into CI as a step between lint and deploy.
|
**Platform teams** building internal developer tooling. Aphoria integrates into CI as a step between lint and deploy.
|
||||||
|
|
||||||
**Security teams** who audit configs across multiple services. "Across all our projects, which ones skip certificate verification?"
|
**Security teams** who audit configs across multiple services. "Across all our projects, which ones skip certificate verification?"
|
||||||
|
|
||||||
## What This Is Not
|
## What This Is Not
|
||||||
|
|
||||||
- **Not a linter.** Linters check syntax rules. Sentinel checks claims against external authoritative sources.
|
- **Not a linter.** Linters check syntax rules. Aphoria checks claims against external authoritative sources.
|
||||||
- **Not a SAST tool.** SAST finds vulnerability patterns. Sentinel finds where code decisions contradict standards, which is a superset.
|
- **Not a SAST tool.** SAST finds vulnerability patterns. Aphoria finds where code decisions contradict standards, which is a superset.
|
||||||
- **Not a replacement for code review.** It augments review by surfacing conflicts that humans miss because they haven't memorized every RFC.
|
- **Not a replacement for code review.** It augments review by surfacing conflicts that humans miss because they haven't memorized every RFC.
|
||||||
|
|
||||||
## The Skill Integration
|
## The Skill Integration
|
||||||
|
|
||||||
Sentinel ships as both a CLI and a Claude Code skill. When working in a project:
|
Aphoria ships as both a CLI and a Claude Code skill. When working in a project:
|
||||||
|
|
||||||
```
|
```
|
||||||
/sentinel scan
|
/aphoria scan
|
||||||
```
|
```
|
||||||
|
|
||||||
The skill runs the CLI, ingests claims, queries for conflicts, and reports inline. The developer fixes the conflict or explicitly acknowledges it — which creates a new assertion: "engineering team decided to skip aud validation for internal services" (Tier 3, Expert). Now the disagreement is structured, documented, and visible next time anyone touches that code.
|
The skill runs the CLI, ingests claims, queries for conflicts, and reports inline. The developer fixes the conflict or explicitly acknowledges it — which creates a new assertion: "engineering team decided to skip aud validation for internal services" (Tier 3, Expert). Now the disagreement is structured, documented, and visible next time anyone touches that code.
|
||||||
|
|
||||||
The acknowledge flow is important. Not every conflict is a bug. Sometimes the code is right and the RFC is too strict for the context. Sentinel doesn't force compliance. It forces *visibility*. The decision to deviate from a standard becomes a recorded, auditable, queryable fact — not an invisible default.
|
The acknowledge flow is important. Not every conflict is a bug. Sometimes the code is right and the RFC is too strict for the context. Aphoria doesn't force compliance. It forces *visibility*. The decision to deviate from a standard becomes a recorded, auditable, queryable fact — not an invisible default.
|
||||||
|
|
||||||
## The Flywheel
|
## The Flywheel
|
||||||
|
|
||||||
Every project Sentinel scans adds claims to Episteme. Every acknowledged deviation adds structured context. Over time:
|
Every project Aphoria scans adds claims to Episteme. Every acknowledged deviation adds structured context. Over time:
|
||||||
|
|
||||||
- Common false positives get suppressed (the alias "internal services can skip aud" gets registered across projects)
|
- Common false positives get suppressed (the alias "internal services can skip aud" gets registered across projects)
|
||||||
- Common true positives get elevated (the same JWT misconfiguration across 50 projects becomes a systemic signal)
|
- Common true positives get elevated (the same JWT misconfiguration across 50 projects becomes a systemic signal)
|
||||||
- The authoritative source corpus grows (new RFCs, new OWASP entries, new vendor docs get ingested by research agents triggered by gaps)
|
- The authoritative source corpus grows (new RFCs, new OWASP entries, new vendor docs get ingested by research agents triggered by gaps)
|
||||||
|
|
||||||
The more projects Sentinel scans, the smarter it gets — not through ML, but through accumulated structured disagreement.
|
The more projects Aphoria scans, the smarter it gets — not through ML, but through accumulated structured disagreement.
|
||||||
407
batteries/pre-aphoria.md
Normal file
@ -0,0 +1,407 @@
|
|||||||
|
# Pre-Aphoria Validation Battery
|
||||||
|
|
||||||
|
**Purpose:** Verify stemedb behaves as documented before building ConceptPath and Aphoria on top of it. Every test maps to a claim the product makes or a code path Aphoria depends on.
|
||||||
|
|
||||||
|
**Test file:** `crates/stemedb-query/tests/battery_pre_aphoria.rs`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Battery 1: The Semaglutide Scenario
|
||||||
|
|
||||||
|
Reproduces the exact example from `what-is-episteme.md`. Four sources, four tiers, one subject, conflicting claims. If this doesn't work, the product demo fails.
|
||||||
|
|
||||||
|
### 1.1 `test_semaglutide_four_sources_ingest_and_query`
|
||||||
|
|
||||||
|
Setup:
|
||||||
|
- Agent A signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("gastroparesis_warning")`, source_class=Regulatory, confidence=1.0, timestamp=T
|
||||||
|
- Agent B signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("no_gastroparesis_signal")`, source_class=Clinical, confidence=0.9, timestamp=T+1
|
||||||
|
- Agent C signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("gastroparesis")`, source_class=Anecdotal, confidence=0.2, timestamp=T+2
|
||||||
|
- Agent D signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("no_gastroparesis_signal")`, source_class=Clinical, confidence=0.9, timestamp=T+3
|
||||||
|
|
||||||
|
Ingest all four through WAL + IngestWorker.
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- All four assertions are stored (query with no lens returns 4 results)
|
||||||
|
- Authority lens (TrustAwareAuthority) winner is the Regulatory assertion (FDA)
|
||||||
|
- Recency lens winner is Agent D (most recent)
|
||||||
|
- Consensus lens groups by object value: "no_gastroparesis_signal" has 2 assertions, "gastroparesis" variants have 2
|
||||||
|
|
||||||
|
### 1.2 `test_semaglutide_skeptic_analysis`
|
||||||
|
|
||||||
|
Using the same four assertions from 1.1:
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- Skeptic lens `analyze()` returns `ConflictAnalysis` with:
|
||||||
|
- `candidates_count` = 4
|
||||||
|
- `claims.len()` >= 2 (at least two distinct object values)
|
||||||
|
- `status` = `Contested` (conflict_score >= 0.4)
|
||||||
|
- `conflict_score` > 0.3 (there is real disagreement between object values)
|
||||||
|
- The claim with object `"no_gastroparesis_signal"` has `assertion_count` = 2
|
||||||
|
- Claims are sorted descending by `weight_share`
|
||||||
|
|
||||||
|
### 1.3 `test_semaglutide_source_class_decay`
|
||||||
|
|
||||||
|
Using the same four assertions, all with timestamp 6 months ago:
|
||||||
|
|
||||||
|
Query with `source_class_decay: true`:
|
||||||
|
- Regulatory assertion (Tier 0): confidence unchanged (no half-life)
|
||||||
|
- Clinical assertions (Tier 1, 730-day half-life): confidence decayed slightly (~0.9 * 2^(-180/730) ~ 0.76)
|
||||||
|
- Anecdotal assertion (Tier 5, 30-day half-life): confidence decayed to near zero (~0.2 * 2^(-180/30) ~ 0.003)
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- After decay, the Anecdotal assertion's effective confidence is < 0.01
|
||||||
|
- After decay, the Regulatory assertion's confidence is exactly 1.0
|
||||||
|
- After decay, Clinical assertions' confidence is between 0.7 and 0.85
|
||||||
|
- Authority lens after decay still picks Regulatory as winner
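The expected values above follow from plain exponential half-life decay. A minimal sketch of that arithmetic, assuming a tier without decay is modeled as `None`; the function name and signature are illustrative, not stemedb's API:

```rust
/// Decay a confidence value by exponential half-life.
/// `half_life_days = None` models a tier that never decays (Regulatory above).
/// Hypothetical helper for illustration only.
pub fn decayed_confidence(confidence: f32, age_days: f32, half_life_days: Option<f32>) -> f32 {
    match half_life_days {
        None => confidence,
        // After one half-life the factor is 0.5, after two it is 0.25, and so on.
        Some(h) => confidence * 0.5f32.powf(age_days / h),
    }
}
```

With a 180-day age this gives roughly 0.76 for the Clinical case (0.9, 730-day half-life) and roughly 0.003 for the Anecdotal case (0.2, 30-day half-life), which is where the assertion bounds come from.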
|
||||||
|
|
||||||
|
### 1.4 `test_semaglutide_time_travel`
|
||||||
|
|
||||||
|
Using the same four assertions with staggered timestamps (T, T+100, T+200, T+300):
|
||||||
|
|
||||||
|
Query with `as_of: T+150`:
|
||||||
|
- Only assertions at T and T+100 are included
|
||||||
|
- Assert exactly 2 candidates
|
||||||
|
- Conflict landscape is different from the full query (only FDA + NEJM)
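The `as_of` filter can be pictured as a timestamp cut applied before any lens runs. A minimal sketch, assuming an inclusive cutoff; `Candidate` and `visible_as_of` are hypothetical stand-ins, not the query engine's types:

```rust
/// Minimal stand-in for a stored assertion; only the timestamp matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub source: &'static str,
    pub timestamp: u64,
}

/// Keep only the candidates that already existed at `as_of`.
/// Treating the bound as inclusive is an assumption of this sketch.
pub fn visible_as_of(candidates: &[Candidate], as_of: u64) -> Vec<Candidate> {
    candidates
        .iter()
        .filter(|c| c.timestamp <= as_of)
        .cloned()
        .collect()
}
```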
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Battery 2: The JWT Conflict Scenario
|
||||||
|
|
||||||
|
Reproduces the JWT outage story. Validates escalation — the claim that Episteme is an "active safety system."
|
||||||
|
|
||||||
|
### 2.1 `test_jwt_conflict_escalation_fires`
|
||||||
|
|
||||||
|
Setup:
|
||||||
|
- RFC 7519 (Tier 0, confidence 1.0): predicate=`aud_validation`, object=`Boolean(true)`
|
||||||
|
- Internal wiki (Tier 3, confidence 0.8): predicate=`aud_validation`, object=`Boolean(false)`
|
||||||
|
- Stack Overflow (Tier 5, confidence 0.6): predicate=`aud_validation`, object=`Boolean(false)`
|
||||||
|
- Approved runbook (Tier 2, confidence 0.95): predicate=`aud_validation`, object=`Boolean(true)`
|
||||||
|
|
||||||
|
Configure escalation policy:
|
||||||
|
```
|
||||||
|
name: "security-config"
|
||||||
|
min_conflict_score: 0.5
|
||||||
|
level: High
|
||||||
|
predicate_pattern: None
|
||||||
|
```
|
||||||
|
|
||||||
|
Ingest all four. Run materializer with escalation policies.
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- Escalation event is created (query `ESC:` prefix, find at least one)
|
||||||
|
- Event has `level` = `High`
|
||||||
|
- Event has `conflict_score` >= 0.5
|
||||||
|
- Event has correct subject and predicate
|
||||||
|
- Event `resolved` = false
|
||||||
|
|
||||||
|
### 2.2 `test_jwt_escalation_predicate_filter`
|
||||||
|
|
||||||
|
Same four assertions as 2.1. Two policies:
|
||||||
|
- Policy A: `predicate_pattern: Some("aud")`, `min_conflict_score: 0.3`, `level: Critical`
|
||||||
|
- Policy B: `predicate_pattern: Some("revenue")`, `min_conflict_score: 0.3`, `level: Medium`
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- Policy A fires (predicate `aud_validation` contains "aud")
|
||||||
|
- Policy B does NOT fire (predicate doesn't contain "revenue")
|
||||||
|
- Only one escalation event exists, with level `Critical`
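The policy matching exercised by 2.1 and 2.2 can be sketched as a threshold check plus an optional substring filter. The field names come from the policy shown in 2.1; the `fires` method, the `Level` variants listed, and the substring semantics are assumptions drawn from the test descriptions rather than the real implementation:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Level {
    Medium,
    High,
    Critical,
}

pub struct EscalationPolicy {
    pub name: String,
    pub min_conflict_score: f32,
    pub level: Level,
    pub predicate_pattern: Option<String>,
}

impl EscalationPolicy {
    /// A policy fires when the score meets its threshold and the predicate
    /// contains the pattern; a `None` pattern matches every predicate.
    pub fn fires(&self, predicate: &str, conflict_score: f32) -> bool {
        let pattern_ok = self
            .predicate_pattern
            .as_deref()
            .map_or(true, |p| predicate.contains(p));
        pattern_ok && conflict_score >= self.min_conflict_score
    }
}
```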
|
||||||
|
|
||||||
|
### 2.3 `test_jwt_layered_lens_tier_agreement`
|
||||||
|
|
||||||
|
Same four assertions. Query with Layered Consensus lens.
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- Tier 0 result: winner object = `Boolean(true)` (RFC says validate)
|
||||||
|
- Tier 2 result: winner object = `Boolean(true)` (Runbook agrees)
|
||||||
|
- Tier 3 result: winner object = `Boolean(false)` (Wiki says skip)
|
||||||
|
- Tier 5 result: winner object = `Boolean(false)` (SO says skip)
|
||||||
|
- `overall_conflict_score` > 0.5 (cross-tier disagreement between 0/2 and 3/5)
|
||||||
|
- `overall_winner` comes from Tier 0 (highest authority)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Battery 3: Decay Math Precision
|
||||||
|
|
||||||
|
Aphoria computes conflict scores after decay. If decay is wrong, every conflict score is wrong.
|
||||||
|
|
||||||
|
### 3.1 `test_decay_tier0_never_decays`
|
||||||
|
|
||||||
|
Regulatory assertion, confidence 0.95, timestamp 10 years ago.
|
||||||
|
Query with `source_class_decay: true`.
|
||||||
|
|
||||||
|
Assert: effective confidence is exactly 0.95 (unchanged).
|
||||||
|
|
||||||
|
### 3.2 `test_decay_tier1_exact_halflife`
|
||||||
|
|
||||||
|
Clinical assertion, confidence 1.0, timestamp exactly 730 days ago.
|
||||||
|
Query with `source_class_decay: true`.
|
||||||
|
|
||||||
|
Assert: effective confidence is 0.5 (within tolerance of 0.02).
|
||||||
|
|
||||||
|
### 3.3 `test_decay_tier1_two_halflives`
|
||||||
|
|
||||||
|
Clinical assertion, confidence 1.0, timestamp exactly 1460 days ago.
|
||||||
|
Query with `source_class_decay: true`.
|
||||||
|
|
||||||
|
Assert: effective confidence is 0.25 (within tolerance of 0.02).
|
||||||
|
|
||||||
|
### 3.4 `test_decay_tier5_exact_halflife`
|
||||||
|
|
||||||
|
Anecdotal assertion, confidence 1.0, timestamp exactly 30 days ago.
|
||||||
|
Query with `source_class_decay: true`.
|
||||||
|
|
||||||
|
Assert: effective confidence is 0.5 (within tolerance of 0.02).
|
||||||
|
|
||||||
|
### 3.5 `test_decay_tier5_three_halflives`
|
||||||
|
|
||||||
|
Anecdotal assertion, confidence 1.0, timestamp exactly 90 days ago.
|
||||||
|
Query with `source_class_decay: true`.
|
||||||
|
|
||||||
|
Assert: effective confidence is 0.125 (within tolerance of 0.02).
|
||||||
|
|
||||||
|
### 3.6 `test_decay_zero_confidence_stays_zero`
|
||||||
|
|
||||||
|
Assertion with confidence 0.0, any tier, any age.
|
||||||
|
|
||||||
|
Assert: effective confidence is 0.0 after decay (0 * anything = 0).
|
||||||
|
|
||||||
|
### 3.7 `test_decay_never_goes_negative`
|
||||||
|
|
||||||
|
Anecdotal assertion, confidence 0.01, timestamp 365 days ago (12+ half-lives).
|
||||||
|
|
||||||
|
Assert: effective confidence >= 0.0.
|
||||||
|
|
||||||
|
### 3.8 `test_decay_uses_as_of_for_age_calculation`
|
||||||
|
|
||||||
|
Two assertions, both at timestamp T=1000:
|
||||||
|
- Assertion A: Clinical, confidence 0.9
|
||||||
|
- Assertion B: Anecdotal, confidence 0.9
|
||||||
|
|
||||||
|
Query with `as_of: T + 730*86400` (exactly 730 days after assertions) and `source_class_decay: true`.
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- A's effective confidence ~ 0.45 (Clinical, one half-life)
|
||||||
|
- B's effective confidence is near zero (Anecdotal, 24+ half-lives at 30-day rate)
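What 3.8 pins down is the reference time used for the age calculation. A minimal sketch, assuming timestamps are in seconds (as the `730*86400` term suggests); `effective_confidence` is a hypothetical helper, and `f64` is used here only so that second-resolution timestamps convert without rounding:

```rust
const SECONDS_PER_DAY: f64 = 86_400.0;

/// Effective confidence at a reference time, with timestamps in seconds.
/// Passing `as_of` as `reference` (instead of the wall clock) is the
/// behavior 3.8 checks. Hypothetical helper for illustration only.
pub fn effective_confidence(
    confidence: f64,
    asserted_at: u64,
    reference: u64,
    half_life_days: f64,
) -> f64 {
    // Age is measured from the assertion to the reference time, never below zero.
    let age_days = reference.saturating_sub(asserted_at) as f64 / SECONDS_PER_DAY;
    confidence * 0.5f64.powf(age_days / half_life_days)
}
```

If the wall clock were passed as `reference` for a historical query, every assertion would look older than it was at `as_of`, and the decayed values would come out too low.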
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Battery 4: Conflict Score Calibration
|
||||||
|
|
||||||
|
Two conflict score implementations exist. `compute_conflict_score` in `traits.rs` uses confidence variance. `calculate_conflict_score` in `skeptic/analysis.rs` uses Shannon entropy over object value groups. Both need validation.
|
||||||
|
|
||||||
|
### 4.1 `test_variance_conflict_score_unanimous`
|
||||||
|
|
||||||
|
5 assertions, all confidence 0.8.
|
||||||
|
`compute_conflict_score()` returns 0.0 (no variance).
|
||||||
|
|
||||||
|
### 4.2 `test_variance_conflict_score_maximum`
|
||||||
|
|
||||||
|
2 assertions, confidence 0.0 and 1.0.
|
||||||
|
`compute_conflict_score()` returns 1.0 (maximum variance).
|
||||||
|
|
||||||
|
### 4.3 `test_variance_conflict_score_moderate`
|
||||||
|
|
||||||
|
3 assertions, confidence 0.2, 0.5, 0.8.
|
||||||
|
`compute_conflict_score()` returns a value between 0.2 and 0.8.
|
||||||
|
|
||||||
|
### 4.4 `test_variance_conflict_score_single`
|
||||||
|
|
||||||
|
1 assertion. Returns 0.0.
|
||||||
|
|
||||||
|
### 4.5 `test_variance_conflict_score_empty`
|
||||||
|
|
||||||
|
0 assertions. Returns 0.0.
|
||||||
|
|
||||||
|
### 4.6 `test_skeptic_entropy_same_confidence_different_objects` [POTENTIAL BUG DETECTOR]
|
||||||
|
|
||||||
|
Three assertions, ALL with confidence 0.9:
|
||||||
|
- Object A: `Text("yes")`, confidence 0.9
|
||||||
|
- Object B: `Text("no")`, confidence 0.9
|
||||||
|
- Object C: `Text("no")`, confidence 0.9
|
||||||
|
|
||||||
|
Skeptic lens `analyze()`:
|
||||||
|
- Groups into 2 claims: "yes" (weight 0.9) and "no" (weight 1.8)
|
||||||
|
- Entropy is non-zero because the total weight is split across two groups
|
||||||
|
- `conflict_score` > 0.0
|
||||||
|
- `status` is NOT `Unanimous`
|
||||||
|
|
||||||
|
**Note:** The variance-based `compute_conflict_score` would return 0.0 for these candidates (all same confidence). The Skeptic entropy-based score correctly detects the disagreement. This test validates the Skeptic lens is the correct tool for Aphoria's conflict detection, NOT the variance-based score.
|
||||||
|
|
||||||
|
### 4.7 `test_skeptic_entropy_unanimous_different_confidence`
|
||||||
|
|
||||||
|
Three assertions, all same object `Text("yes")`, but different confidences (0.3, 0.6, 0.9):
|
||||||
|
|
||||||
|
Skeptic lens `analyze()`:
|
||||||
|
- Groups into 1 claim (all same object)
|
||||||
|
- `conflict_score` = 0.0 (unanimous — no disagreement on the value)
|
||||||
|
- `status` = `Unanimous`
|
||||||
|
|
||||||
|
**Note:** Even though confidences differ, there's no actual conflict — all sources agree. The Skeptic lens correctly identifies this as unanimous.
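The behavior that 4.6 and 4.7 expect can be reproduced with a small entropy calculation over object-value groups. This is a sketch under stated assumptions: weights are summed confidences (matching the 0.9 and 1.8 figures in 4.6), and the entropy is normalized by `log2` of the group count, which is a guess and not necessarily what `calculate_conflict_score` does:

```rust
use std::collections::BTreeMap;

/// Normalized Shannon entropy over object-value groups, weighted by confidence.
/// Returns 0.0 for zero or one group and 1.0 when weight is split evenly.
/// Hypothetical sketch; the normalization is an assumption.
pub fn entropy_conflict_score(candidates: &[(&str, f64)]) -> f64 {
    // Sum confidence per distinct object value.
    let mut weights: BTreeMap<&str, f64> = BTreeMap::new();
    for (object, confidence) in candidates {
        *weights.entry(*object).or_insert(0.0) += *confidence;
    }
    let total: f64 = weights.values().sum();
    if weights.len() < 2 || total <= 0.0 {
        return 0.0;
    }
    // H = -sum(p * log2(p)) over groups with non-zero weight.
    let entropy: f64 = weights
        .values()
        .filter(|w| **w > 0.0)
        .map(|w| {
            let p = w / total;
            -p * p.log2()
        })
        .sum();
    entropy / (weights.len() as f64).log2()
}
```

For the 4.6 input the group shares are 1/3 and 2/3, so the score is about 0.92; for the 4.7 input there is a single group and the score is 0.0.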
|
||||||
|
|
||||||
|
### 4.8 `test_variance_score_nan_defensive`
|
||||||
|
|
||||||
|
2 assertions with confidence `f32::NAN`.
|
||||||
|
`compute_conflict_score()` returns 0.0 (defensive, not NaN propagation).
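For comparison, the variance-based cases (4.1 to 4.5 and 4.8) are consistent with a normalized population variance. The sketch below is one formula that satisfies all six expectations; the division by 0.25 (the largest variance possible for values in [0, 1]) is an assumption, and the function is not the code in `traits.rs`:

```rust
/// Variance-based conflict score over confidences in [0, 1].
/// Population variance is divided by 0.25 so that the 0.0/1.0 pair scores 1.0.
/// Hypothetical sketch; the normalization is an assumption.
pub fn variance_conflict_score(confidences: &[f32]) -> f32 {
    // Zero or one candidate cannot disagree.
    if confidences.len() < 2 {
        return 0.0;
    }
    let n = confidences.len() as f32;
    let mean = confidences.iter().sum::<f32>() / n;
    let variance = confidences.iter().map(|c| (c - mean).powi(2)).sum::<f32>() / n;
    let score = variance / 0.25;
    // Defensive: a NaN input poisons the mean, so report "no conflict" instead.
    if score.is_nan() {
        0.0
    } else {
        score.clamp(0.0, 1.0)
    }
}
```

Because the input is confidences only, this score cannot see object values at all, which is why it returns 0.0 for the 4.6 scenario.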
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Battery 5: scan_prefix with ConceptPath-shaped Keys
|
||||||
|
|
||||||
|
Storage foundation for hierarchical queries.
|
||||||
|
|
||||||
|
### 5.1 `test_prefix_scan_concept_path_keys`
|
||||||
|
|
||||||
|
Store via IndexStore:
|
||||||
|
```
|
||||||
|
S:code://rust/citadeldb/auth/jwt/aud_validation → [hash_a]
|
||||||
|
S:code://rust/citadeldb/auth/jwt/expiry → [hash_b]
|
||||||
|
S:code://rust/citadeldb/net/tls/verify → [hash_c]
|
||||||
|
S:code://rust/citadeldb/auth/oauth/scopes → [hash_d]
|
||||||
|
```
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- `scan_prefix("S:code://rust/citadeldb/auth/jwt/")` → 2 keys (aud_validation, expiry)
|
||||||
|
- `scan_prefix("S:code://rust/citadeldb/auth/")` → 3 keys (jwt/aud, jwt/expiry, oauth/scopes)
|
||||||
|
- `scan_prefix("S:code://rust/citadeldb/")` → 4 keys (all)
|
||||||
|
- `scan_prefix("S:code://")` → 4 keys (all)
|
||||||
|
- `scan_prefix("S:rfc://")` → 0 keys (different scheme)
|
||||||
|
|
||||||
|
### 5.2 `test_prefix_scan_no_false_positives`
|
||||||
|
|
||||||
|
Store:
|
||||||
|
```
|
||||||
|
S:code://rust/citadeldb/auth → [hash_a]
|
||||||
|
S:code://rust/citadeldb/authentication → [hash_b]
|
||||||
|
```
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- `scan_prefix("S:code://rust/citadeldb/auth/")` → 0 keys (with the trailing slash, neither the bare `auth` key nor `authentication` matches)
|
||||||
|
- `scan_prefix("S:code://rust/citadeldb/auth")` → 2 keys (both match the prefix "auth")
|
||||||
|
|
||||||
|
This validates that the trailing `/` in hierarchical queries is necessary to prevent `auth` from matching `authentication`.
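The prefix semantics in 5.1 and 5.2 can be illustrated with an ordered in-memory map. This is a stand-in for the real store, not the `IndexStore` implementation; `scan_prefix_in_memory` is a hypothetical name:

```rust
use std::collections::BTreeMap;

/// Return every key that starts with `prefix`, using the map's sort order
/// to jump to the first candidate and stop at the first non-match.
/// In-memory stand-in for illustration only.
pub fn scan_prefix_in_memory<'a>(
    index: &'a BTreeMap<String, Vec<u8>>,
    prefix: &str,
) -> Vec<&'a str> {
    index
        .range(prefix.to_string()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, _)| key.as_str())
        .collect()
}
```

Since matching is purely byte-prefix based, `.../auth` matches both `auth` and `authentication`, while `.../auth/` matches only keys that have a path segment below `auth`.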
|
||||||
|
|
||||||
|
### 5.3 `test_prefix_scan_sp_keys_with_concept_paths`
|
||||||
|
|
||||||
|
Store via IndexStore (using SP: compound keys):
|
||||||
|
```
|
||||||
|
SP:code://rust/citadeldb/auth/jwt/aud_validation:config_value → [hash_a]
|
||||||
|
SP:code://rust/citadeldb/auth/jwt/expiry:config_value → [hash_b]
|
||||||
|
```
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- `scan_prefix("SP:code://rust/citadeldb/auth/jwt/")` → 2 keys
|
||||||
|
- The parsed SP key for hash_a correctly splits into subject=`code://rust/citadeldb/auth/jwt/aud_validation` and predicate=`config_value` (validates the rfind fix)
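The split matters because a ConceptPath subject contains a colon of its own (`code://...`). A minimal sketch of splitting at the last colon, assuming predicates never contain `:`; `parse_sp_key` is a hypothetical helper, not the function in the key codec:

```rust
/// Split an `SP:{subject}:{predicate}` key into its parts.
/// The subject may itself contain `:` (as in `code://...`), so the split
/// happens at the last colon rather than the first.
/// Hypothetical helper; assumes predicates never contain `:`.
pub fn parse_sp_key(key: &str) -> Option<(&str, &str)> {
    let rest = key.strip_prefix("SP:")?;
    let split = rest.rfind(':')?;
    Some((&rest[..split], &rest[split + 1..]))
}
```

Splitting at the first colon instead would yield subject `code` and predicate `//rust/citadeldb/...`, which is the failure mode this test guards against.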
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Battery 6: Signature Tamper Detection
|
||||||
|
|
||||||
|
Aphoria ingests signed assertions. If signature verification has gaps, tampered claims enter the graph.
|
||||||
|
|
||||||
|
### 6.1 `test_valid_signature_accepted`
|
||||||
|
|
||||||
|
Agent A signs an assertion. Ingest through IngestWorker.
|
||||||
|
|
||||||
|
Assert: assertion is stored, index entries exist.
|
||||||
|
|
||||||
|
### 6.2 `test_tampered_confidence_rejected`
|
||||||
|
|
||||||
|
Agent A signs assertion with confidence=0.8. Modify the serialized assertion bytes to change confidence to 1.0. Attempt to ingest.
|
||||||
|
|
||||||
|
Assert: `IngestError::InvalidSignature`. Assertion is NOT stored.
|
||||||
|
|
||||||
|
### 6.3 `test_tampered_subject_rejected`
|
||||||
|
|
||||||
|
Agent A signs assertion with subject="X". Clone the assertion, change subject to "Y", keep original signature.
|
||||||
|
|
||||||
|
Assert: ingestion fails with invalid signature.
|
||||||
|
|
||||||
|
### 6.4 `test_wrong_agent_id_rejected`
|
||||||
|
|
||||||
|
Agent A signs assertion. Replace `agent_id` in the `SignatureEntry` with Agent B's public key (but keep Agent A's signature bytes).
|
||||||
|
|
||||||
|
Assert: ingestion fails — the signature was made by A's private key but claims to be from B's public key.
|
||||||
|
|
||||||
|
### 6.5 `test_multi_sig_all_valid_accepted`
|
||||||
|
|
||||||
|
Agent A and Agent B both sign the same assertion (two valid SignatureEntries).
|
||||||
|
|
||||||
|
Assert: ingestion succeeds.
|
||||||
|
|
||||||
|
### 6.6 `test_multi_sig_one_invalid_rejected`
|
||||||
|
|
||||||
|
Agent A signs validly, Agent B's signature is invalid (tampered).
|
||||||
|
|
||||||
|
Assert: ingestion fails. ALL signatures must be valid.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Battery 7: Materialized View Consistency
|
||||||
|
|
||||||
|
Aphoria queries MVs for fast conflict checks. Stale or inconsistent MVs produce wrong verdicts.
|
||||||
|
|
||||||
|
### 7.1 `test_mv_initial_materialization`
|
||||||
|
|
||||||
|
Ingest assertion A (confidence 0.9) for subject=S, predicate=P.
|
||||||
|
Run materializer `step()`.
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- MV exists at `MV:{S}:{P}`
|
||||||
|
- MV winner_hash matches A's content hash
|
||||||
|
- MV confidence = 0.9
|
||||||
|
- Changelog entry exists (first materialization)
|
||||||
|
|
||||||
|
### 7.2 `test_mv_winner_changes_on_update`
|
||||||
|
|
||||||
|
Ingest A (confidence 0.9), materialize. Then ingest B (same S/P, confidence 0.95), materialize again.
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- MV winner changes to B
|
||||||
|
- Changelog has 2 entries: initial (winner=A), update (previous=A, new=B)
|
||||||
|
|
||||||
|
### 7.3 `test_mv_no_changelog_when_winner_unchanged`
|
||||||
|
|
||||||
|
Ingest A (confidence 0.9), materialize. Ingest B (same S/P, confidence 0.5), materialize again.
|
||||||
|
|
||||||
|
Assert:
|
||||||
|
- MV winner stays A (B has lower confidence)
|
||||||
|
- No new changelog entry after second materialization
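The changelog rule in 7.1 to 7.3 is "record an entry only when the winner changes". A minimal sketch, assuming the winner is simply the highest-confidence candidate (which matches these three tests but is not necessarily how the materializer's lens picks it); all type and method names here are illustrative:

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ChangelogEntry {
    pub previous: Option<String>,
    pub new: String,
}

#[derive(Debug, Default)]
pub struct MaterializedView {
    pub winner_hash: Option<String>,
    pub confidence: f32,
    pub changelog: Vec<ChangelogEntry>,
}

impl MaterializedView {
    /// Re-pick the winner from `(hash, confidence)` candidates and record a
    /// changelog entry only when the winner differs from the stored one.
    pub fn materialize(&mut self, candidates: &[(&str, f32)]) {
        let best = candidates
            .iter()
            .copied()
            .max_by(|a, b| a.1.total_cmp(&b.1));
        if let Some((hash, confidence)) = best {
            if self.winner_hash.as_deref() != Some(hash) {
                self.changelog.push(ChangelogEntry {
                    previous: self.winner_hash.clone(),
                    new: hash.to_string(),
                });
            }
            self.winner_hash = Some(hash.to_string());
            self.confidence = confidence;
        }
    }
}
```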
|
||||||
|
|
||||||
|
### 7.4 `test_mv_since_query_returns_changelog`
|
||||||
|
|
||||||
|
Ingest A at T=1000, materialize at T=1001. Ingest B at T=2000, materialize at T=2001.
|
||||||
|
|
||||||
|
Query with `since: 1500`:
|
||||||
|
- Returns changelog entries only from after T=1500
|
||||||
|
- Should include the B materialization but not the A materialization
|
||||||
|
|
||||||
|
### 7.5 `test_mv_max_stale_fast_path`
|
||||||
|
|
||||||
|
Ingest A, materialize. Query immediately with `max_stale: 60`.
|
||||||
|
|
||||||
|
Assert: fast path is used (MV is fresh).

### 7.6 `test_mv_max_stale_slow_path`

Ingest A, materialize. Wait (or mock time) so MV is 120 seconds old. Query with `max_stale: 60`.

Assert: slow path is used (MV is stale, falls through to index lookup).

---

## Findings to Watch For

### Known Risk: Two Conflict Score Implementations

`compute_conflict_score` in `traits.rs` (line 89) uses **confidence variance**. It measures how much confidence values disagree, not how much object values disagree. Three sources saying "yes" at 0.9 and two sources saying "no" at 0.9 produce a conflict score of **0.0** because all confidences are identical.

`calculate_conflict_score` in `skeptic/analysis.rs` (line 36) uses **Shannon entropy over object value groups**. It correctly detects that "yes" vs "no" is a real conflict regardless of confidence values.

**Aphoria must use the Skeptic lens for conflict detection, not the standard lens conflict score.** Battery 4.6 validates this distinction explicitly. If Aphoria were to use `compute_conflict_score` from standard lenses, it would miss conflicts where sources disagree on values but agree on confidence levels.
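The gap between the two approaches is easy to reproduce in isolation. The functions below are illustrative stand-ins, not the code in `traits.rs` or `skeptic/analysis.rs`; the names `confidence_variance` and `object_entropy` are hypothetical, and the real functions may scale or normalise their results differently.

```rust
use std::collections::HashMap;

/// Population variance of the confidence values.
pub fn confidence_variance(confidences: &[f64]) -> f64 {
    if confidences.is_empty() {
        return 0.0;
    }
    let n = confidences.len() as f64;
    let mean = confidences.iter().sum::<f64>() / n;
    confidences.iter().map(|c| (c - mean).powi(2)).sum::<f64>() / n
}

/// Shannon entropy (in bits) over groups of identical object values.
pub fn object_entropy(objects: &[&str]) -> f64 {
    if objects.is_empty() {
        return 0.0;
    }
    // Count how many sources assert each distinct object value.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for o in objects {
        *counts.entry(*o).or_insert(0) += 1;
    }
    let n = objects.len() as f64;
    counts
        .values()
        .map(|&k| {
            let p = k as f64 / n;
            -p * p.log2()
        })
        .sum()
}
```

For three "yes" and two "no" assertions, all at confidence 0.9, the variance is effectively zero while the entropy is roughly 0.97 bits, so only the entropy-based score registers the disagreement.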

### Known Risk: Decay + Time-Travel Interaction

When both `source_class_decay` and `as_of` are set, the age calculation must use `as_of` as the reference time, not `now`. Battery 3.8 validates this. If the implementation uses `now` for age but filters by `as_of` for inclusion, the decay amounts will be wrong for historical queries.
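A small sketch of the reference-time rule follows. The exponential half-life form is an assumption made for illustration (the source does not say how `source_class_decay` is computed), and `reference_time` and `decayed_confidence` are hypothetical helpers.

```rust
/// Reference time for age calculations: `as_of` when the query sets it,
/// otherwise the current time.
pub fn reference_time(as_of: Option<u64>, now: u64) -> u64 {
    as_of.unwrap_or(now)
}

/// Assumed decay model: confidence halves every `half_life_secs`.
pub fn decayed_confidence(
    confidence: f64,
    asserted_at: u64,
    reference: u64,
    half_life_secs: f64,
) -> f64 {
    // Age is measured against the reference time, never against `now` directly.
    let age = reference.saturating_sub(asserted_at) as f64;
    confidence * 0.5_f64.powf(age / half_life_secs)
}
```

With an assertion made at T=1000, a half-life of 100 seconds, and `as_of = 1100`, the historical query sees one half-life of decay. Using `now = 5000` instead would apply forty half-lives and effectively erase the assertion from a query that is supposed to describe the past.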

### ConceptPath Readiness

Battery 5 validates the storage layer works with ConceptPath-shaped keys before any type changes. If these tests pass, the `scan_prefix` foundation is solid and ConceptPath implementation can proceed with confidence.
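To make the prefix-scan concern concrete, here is an in-memory model of a prefix scan over an ordered map. It is only a sketch: the store's real `scan_prefix` is async and its return type is not shown in this document, and the slash-separated keys used with it are hypothetical, since the ConceptPath key layout is not defined here.

```rust
use std::collections::BTreeMap;

/// Return every entry whose key starts with `prefix`, in key order.
pub fn scan_prefix<'a>(
    store: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<(&'a [u8], &'a [u8])> {
    store
        // Start at the first key >= prefix; matching keys are contiguous.
        .range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.as_slice(), v.as_slice()))
        .collect()
}
```

One thing such tests tend to surface: scanning for `code/rust` also matches a sibling like `code/rustc`, so hierarchical keys generally need the trailing delimiter (`code/rust/`) in the prefix.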
@@ -9,7 +9,7 @@ workspace = true

 [dependencies]
 stemedb-core = { path = "../stemedb-core" }
-stemedb-wal = { path = "../stemedb-wal" }
+stemedb-wal = { path = "../stemedb-wal", features = ["group-commit"] }
 stemedb-storage = { path = "../stemedb-storage" }
 stemedb-ingest = { path = "../stemedb-ingest" }
 stemedb-query = { path = "../stemedb-query" }
@ -34,7 +34,7 @@ STEMEDB_WAL_DIR=./my-wal STEMEDB_DB_DIR=./my-db STEMEDB_BIND_ADDR=0.0.0.0:8080 c
|
|||||||
```
|
```
|
||||||
|
|
||||||
The server automatically:
|
The server automatically:
|
||||||
1. Opens Journal (WAL) and SledStore (KV storage)
|
1. Opens Journal (WAL) and HybridStore (KV storage)
|
||||||
2. Spawns IngestWorker background task to tail WAL
|
2. Spawns IngestWorker background task to tail WAL
|
||||||
3. Starts HTTP server with OpenAPI documentation
|
3. Starts HTTP server with OpenAPI documentation
|
||||||
|
|
||||||
|
|||||||
@@ -45,7 +45,7 @@ pub async fn decay_trust_ranks(
     let half_life = req.half_life_seconds.unwrap_or(DEFAULT_HALF_LIFE_SECONDS);

     // Create TrustRankStore from the shared store
-    let trust_store = GenericTrustRankStore::new((*state.store).clone());
+    let trust_store = GenericTrustRankStore::new(state.store.clone());

     // Apply decay to all trust ranks
     let decayed_count = trust_store.decay_trust_ranks(timestamp, Some(half_life)).await?;
@@ -52,9 +52,8 @@ pub async fn create_assertion(
         .map_err(|e| ApiError::Serialization(format!("Failed to serialize for hash: {}", e)))?;
     let hash = blake3::hash(&serialized_assertion);

-    // Append to WAL
-    let mut journal = state.journal.lock().await;
-    journal.append(payload)?;
+    // Append to WAL via group commit buffer
+    state.commit_buffer.append(payload).await?;

     let response =
         CreateResponse { hash: hash.to_hex().to_string(), status: "created".to_string() };
@@ -89,9 +89,8 @@ pub async fn create_epoch(
     // For the response, we return this same ID as a hex string
     let epoch_id_hex = ::hex::encode(epoch.id);

-    // Append to WAL
-    let mut journal = state.journal.lock().await;
-    journal.append(payload)?;
+    // Append to WAL via group commit buffer
+    state.commit_buffer.append(payload).await?;

     let response = CreateResponse { hash: epoch_id_hex, status: "created".to_string() };

@@ -227,7 +227,7 @@ pub async fn verify_agent(
         .unwrap_or(0);

     // Verify the agent
-    let trust_store = GenericTrustRankStore::new((*state.store).clone());
+    let trust_store = GenericTrustRankStore::new(state.store.clone());
     let adjustment = trust_store
         .verify_agent_against_gold_standard(&agent_id, &req.agent_object, &gs, timestamp)
         .await?;
@@ -4,7 +4,7 @@ use axum::{extract::State, Json};
 use tracing::instrument;

 use crate::{dto::HealthResponse, error::Result, state::AppState};
-use stemedb_storage::KVStore;
+use stemedb_storage::{key_codec, KVStore};

 /// Health check endpoint.
 ///
@@ -32,7 +32,12 @@ pub async fn health_check(State(state): State<AppState>) -> Result<Json<HealthRe

 /// Count the number of assertions in the database.
 async fn count_assertions(state: &AppState) -> Result<u64> {
-    // Scan all assertion keys (H: prefix)
-    let keys = state.store.scan_prefix(b"H:").await?;
-    Ok(keys.len() as u64)
+    // Read the atomic assertion count maintained by the ingestion pipeline
+    let count_key = key_codec::assertion_count_key();
+    match state.store.get(&count_key).await? {
+        Some(bytes) if bytes.len() == 8 => {
+            Ok(u64::from_le_bytes(bytes.try_into().unwrap_or([0u8; 8])))
+        }
+        _ => Ok(0),
+    }
 }
@@ -344,7 +344,7 @@ fn build_contributing_from_metadata(
 async fn apply_lens_with_confidence(
     lens_dto: LensDto,
     assertions: Vec<Assertion>,
-    store: std::sync::Arc<stemedb_storage::SledStore>,
+    store: std::sync::Arc<stemedb_storage::HybridStore>,
 ) -> Result<(Vec<Assertion>, f32, f32)> {
     let assertion_count = assertions.len();

@@ -59,8 +59,8 @@ pub async fn skeptic_query(
     AxumQuery(params): AxumQuery<SkepticQueryParams>,
 ) -> Result<Json<SkepticResponse>> {
     // Create the resolver with vote and trust stores
-    let vote_store = std::sync::Arc::new(GenericVoteStore::new((*state.store).clone()));
-    let trust_store = std::sync::Arc::new(GenericTrustRankStore::new((*state.store).clone()));
+    let vote_store = std::sync::Arc::new(GenericVoteStore::new(state.store.clone()));
+    let trust_store = std::sync::Arc::new(GenericTrustRankStore::new(state.store.clone()));
     let resolver = SkepticResolver::new(state.store.clone(), vote_store, trust_store);

     // Execute the skeptic resolution
@@ -182,7 +182,7 @@ mod tests {
         http::{Method, Request},
     };
     use serde_json::json;
-    use stemedb_storage::SledStore;
+    use stemedb_storage::HybridStore;
     use stemedb_wal::Journal;
     use tower::ServiceExt;

@@ -199,10 +199,12 @@ mod tests {
         let wal_path = temp_dir.path().join("wal");
         let store_path = temp_dir.path().join("store");

-        let journal = Journal::open(&wal_path).expect("failed to open journal");
-        let store = SledStore::open(&store_path).expect("failed to open store");
+        let write_journal = Journal::open(&wal_path).expect("failed to open write journal");
+        let read_journal = Journal::open(&wal_path).expect("failed to open read journal");
+        let store =
+            std::sync::Arc::new(HybridStore::open(&store_path).expect("failed to open store"));

-        let state = AppState::new(journal, store);
+        let state = AppState::new(write_journal, read_journal, store);

         let app = axum::Router::new()
             .route("/v1/source", axum::routing::post(store_source))
@@ -50,9 +50,8 @@ pub async fn create_vote(
         .map_err(|e| ApiError::Serialization(format!("Failed to serialize for hash: {}", e)))?;
     let hash = blake3::hash(&serialized_vote);

-    // Append to WAL
-    let mut journal = state.journal.lock().await;
-    journal.append(payload)?;
+    // Append to WAL via group commit buffer
+    state.commit_buffer.append(payload).await?;

     let response =
         CreateResponse { hash: hash.to_hex().to_string(), status: "created".to_string() };
@@ -23,7 +23,7 @@
 //! ```ignore
 //! use stemedb_api::{create_router, AppState};
 //!
-//! let state = AppState::new(journal, store);
+//! let state = AppState::new(write_journal, read_journal, store);
 //! let app = create_router(state);
 //!
 //! axum::Server::bind(&addr).serve(app.into_make_service()).await?;
@@ -1,10 +1,11 @@
 //! Episteme (StemeDB) API server binary.
 //!
 //! This starts the HTTP API server with the following components:
-//! 1. Opens Journal (WAL) and SledStore (KV storage)
-//! 2. Spawns IngestWorker background task to tail WAL
-//! 3. Starts axum HTTP server with OpenAPI documentation
-//! 4. Optionally enables The Meter (economic throttling)
+//! 1. Opens Journal (WAL) for writes (via GroupCommitBuffer) and reads
+//! 2. Opens HybridStore (KV storage)
+//! 3. Spawns IngestWorker background task to tail WAL
+//! 4. Starts axum HTTP server with OpenAPI documentation
+//! 5. Optionally enables The Meter (economic throttling)
 //!
 //! # Environment Variables
 //!
@@ -22,7 +23,7 @@ use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

 use stemedb_api::{create_router, create_router_with_meter, AppState};
 use stemedb_ingest::worker::IngestWorker;
-use stemedb_storage::SledStore;
+use stemedb_storage::HybridStore;
 use stemedb_wal::Journal;

 /// Server configuration.
@@ -96,20 +97,24 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
     std::fs::create_dir_all(&config.wal_dir)?;
     std::fs::create_dir_all(&config.db_dir)?;

-    // Open Journal and Store
-    info!("Opening Journal at {:?}", config.wal_dir);
-    let journal = Journal::open(&config.wal_dir)?;
+    // Open write Journal (owned by GroupCommitBuffer)
+    info!("Opening write Journal at {:?}", config.wal_dir);
+    let write_journal = Journal::open(&config.wal_dir)?;

-    info!("Opening SledStore at {:?}", config.db_dir);
-    let store = SledStore::open(&config.db_dir)?;
+    // Open read Journal (for IngestWorker to tail)
+    info!("Opening read Journal at {:?}", config.wal_dir);
+    let read_journal = Journal::open(&config.wal_dir)?;

-    // Create application state
-    let state = AppState::new(journal, store.clone());
+    info!("Opening HybridStore at {:?}", config.db_dir);
+    let store = Arc::new(HybridStore::open(&config.db_dir)?);

-    // Spawn IngestWorker background task
+    // Create application state (initializes GroupCommitBuffer)
+    let state = AppState::new(write_journal, read_journal, Arc::clone(&store));
+
+    // Spawn IngestWorker background task (uses read journal)
     info!("Spawning IngestWorker background task");
     let worker_journal = state.journal.clone();
-    let worker_store = Arc::new(store);
+    let worker_store = store;
     tokio::spawn(async move {
         let worker_result = IngestWorker::new(worker_journal, worker_store).await;
         match worker_result {
@@ -4,25 +4,29 @@ use std::sync::Arc;
 use tokio::sync::Mutex;

 use stemedb_query::QueryEngine;
-use stemedb_storage::{GenericEscalationStore, GenericQuotaStore, SledStore};
+use stemedb_storage::{GenericEscalationStore, GenericQuotaStore, HybridStore};
+use stemedb_wal::group_commit::{GroupCommitBuffer, GroupCommitConfig};
 use stemedb_wal::Journal;

 /// Quota store type alias for convenience.
-pub type QuotaStoreImpl = GenericQuotaStore<Arc<SledStore>>;
+pub type QuotaStoreImpl = GenericQuotaStore<Arc<HybridStore>>;

 /// Escalation store type alias for convenience.
-pub type EscalationStoreImpl = GenericEscalationStore<SledStore>;
+pub type EscalationStoreImpl = GenericEscalationStore<HybridStore>;

 /// Application state shared across all HTTP handlers.
 ///
 /// This is passed to every request via axum's `State` extractor.
 #[derive(Clone)]
 pub struct AppState {
-    /// Write-ahead log for appending new assertions/votes/epochs
+    /// Group commit buffer for batched WAL writes (used by write handlers)
+    pub commit_buffer: GroupCommitBuffer,
+
+    /// Write-ahead log for reading records (IngestWorker uses this)
     pub journal: Arc<Mutex<Journal>>,

     /// Key-value store for reading assertions
-    pub store: Arc<SledStore>,
+    pub store: Arc<HybridStore>,

     /// Quota store for economic throttling (The Meter)
     pub quota_store: Arc<QuotaStoreImpl>,
@@ -33,9 +37,13 @@ pub struct AppState {

 impl AppState {
     /// Create a new application state.
-    pub fn new(journal: Journal, store: SledStore) -> Self {
-        let journal = Arc::new(Mutex::new(journal));
-        let store = Arc::new(store);
+    ///
+    /// Takes two journals: one for the group commit buffer (writes) and
+    /// one for reading (used by IngestWorker). Both should be opened on
+    /// the same directory.
+    pub fn new(write_journal: Journal, read_journal: Journal, store: Arc<HybridStore>) -> Self {
+        let commit_buffer = GroupCommitBuffer::new(write_journal, GroupCommitConfig::default());
+        let journal = Arc::new(Mutex::new(read_journal));

         // Create quota store backed by the same KV store
         let quota_store = Arc::new(GenericQuotaStore::new(Arc::clone(&store)));
@@ -43,13 +51,13 @@ impl AppState {
         // Create escalation store backed by the same KV store
         let escalation_store = Arc::new(GenericEscalationStore::new(Arc::clone(&store)));

-        Self { journal, store, quota_store, escalation_store }
+        Self { commit_buffer, journal, store, quota_store, escalation_store }
     }

     /// Get a QueryEngine for this state.
     ///
     /// Creates a new QueryEngine each time since it cannot be cloned.
-    pub fn query_engine(&self) -> QueryEngine<SledStore> {
+    pub fn query_engine(&self) -> QueryEngine<HybridStore> {
         QueryEngine::new(self.store.clone())
     }
 }
@@ -9,7 +9,7 @@ use serde_json::json;
 use std::sync::Arc;
 use stemedb_api::AppState;
 use stemedb_ingest::Ingestor;
-use stemedb_storage::{GenericEscalationStore, GenericQuotaStore, SledStore};
+use stemedb_storage::HybridStore;
 use stemedb_wal::Journal;
 use tokio::sync::Mutex;

@@ -23,7 +23,7 @@ pub struct TestEnvironment {
 pub struct TestEnvironmentWithIngestor {
     pub _temp_dir: tempfile::TempDir,
     pub state: AppState,
-    pub ingestor: Ingestor<SledStore>,
+    pub ingestor: Ingestor<HybridStore>,
 }

 /// Helper to create a test environment with temporary directories.
@@ -35,10 +35,11 @@ pub async fn create_test_env() -> TestEnvironment {
     std::fs::create_dir_all(&wal_dir).expect("failed to create wal dir");
     std::fs::create_dir_all(&db_dir).expect("failed to create db dir");

-    let journal = Journal::open(&wal_dir).expect("failed to open journal");
-    let store = SledStore::open(&db_dir).expect("failed to open store");
+    let write_journal = Journal::open(&wal_dir).expect("failed to open write journal");
+    let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
+    let store = Arc::new(HybridStore::open(&db_dir).expect("failed to open store"));

-    let state = AppState::new(journal, store);
+    let state = AppState::new(write_journal, read_journal, store);

     TestEnvironment { _temp_dir: temp_dir, state }
 }
@@ -46,8 +47,6 @@ pub async fn create_test_env() -> TestEnvironment {
 /// Helper to create a test environment with a running ingestor for roundtrip tests.
 ///
 /// Note: We need to share the same store between AppState and Ingestor.
-/// AppState::new() takes ownership, so we need a different approach:
-/// we'll create the ingestor first, then construct AppState manually.
 pub async fn create_test_env_with_ingestor() -> TestEnvironmentWithIngestor {
     let temp_dir = tempfile::tempdir().expect("failed to create temp dir");
     let wal_dir = temp_dir.path().join("wal");
@@ -57,11 +56,7 @@ pub async fn create_test_env_with_ingestor() -> TestEnvironmentWithIngestor {
     std::fs::create_dir_all(&db_dir).expect("failed to create db dir");

     // Create shared store
-    let store = Arc::new(SledStore::open(&db_dir).expect("failed to open store"));
+    let store = Arc::new(HybridStore::open(&db_dir).expect("failed to open store"));

-    // Journal for API (writing)
-    let journal_for_api =
-        Arc::new(Mutex::new(Journal::open(&wal_dir).expect("failed to open journal for API")));
-
     // Journal for ingestor (reading) - WAL allows multiple readers
     let journal_for_ingestor =
@@ -72,14 +67,10 @@ pub async fn create_test_env_with_ingestor() -> TestEnvironmentWithIngestor {
         .await
         .expect("failed to create ingestor");

-    // Create quota store for AppState
-    let quota_store = Arc::new(GenericQuotaStore::new(store.clone()));
-
-    // Create escalation store for AppState
-    let escalation_store = Arc::new(GenericEscalationStore::new(store.clone()));
-
-    // Construct AppState manually to share the store
-    let state = AppState { journal: journal_for_api, store, quota_store, escalation_store };
+    // Create AppState with write and read journals
+    let write_journal = Journal::open(&wal_dir).expect("failed to open write journal");
+    let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
+    let state = AppState::new(write_journal, read_journal, store);

     TestEnvironmentWithIngestor { _temp_dir: temp_dir, state, ingestor }
 }
@@ -28,7 +28,7 @@ use stemedb_api::create_router;
 use stemedb_ingest::worker::IngestWorker;
 use stemedb_lens::VoteAwareConsensusLens;
 use stemedb_query::Materializer;
-use stemedb_storage::{GenericVoteStore, SledStore};
+use stemedb_storage::{GenericVoteStore, HybridStore};
 use stemedb_wal::Journal;

 // Test configuration constants
@@ -44,7 +44,7 @@ const POLLING_INTERVAL_MS: u64 = 50;
 struct TestEnvironment {
     _temp_dir: tempfile::TempDir,
     state: stemedb_api::AppState,
-    store: Arc<SledStore>,
+    store: Arc<HybridStore>,
     journal: Arc<Mutex<Journal>>,
 }

@@ -57,15 +57,15 @@ async fn create_test_environment() -> TestEnvironment {
     std::fs::create_dir_all(&wal_dir).expect("Failed to create WAL dir");
     std::fs::create_dir_all(&db_dir).expect("Failed to create DB dir");

-    let journal = Journal::open(&wal_dir).expect("Failed to open journal");
-    let store = SledStore::open(&db_dir).expect("Failed to open store");
-
-    let journal_arc = Arc::new(Mutex::new(journal));
+    let store = HybridStore::open(&db_dir).expect("Failed to open store");
     let store_arc = Arc::new(store);

-    // Open a second journal handle for AppState (WAL supports multiple readers)
-    let journal_for_state = Journal::open(&wal_dir).expect("Failed to open second journal handle");
-    let state = stemedb_api::AppState::new(journal_for_state, (*store_arc).clone());
+    // Open journals: one for IngestWorker reads, one for AppState (write + read)
+    let journal_arc =
+        Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
+    let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
+    let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
+    let state = stemedb_api::AppState::new(write_journal, read_journal, Arc::clone(&store_arc));

     TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
 }
@@ -22,7 +22,7 @@ use tower::ServiceExt;

 use stemedb_api::{create_router, AppState};
 use stemedb_ingest::worker::IngestWorker;
-use stemedb_storage::SledStore;
+use stemedb_storage::HybridStore;
 use stemedb_wal::Journal;

 // Test configuration constants
@@ -32,7 +32,7 @@ const INGEST_ITERATIONS: usize = 10;
 struct TestEnvironment {
     _temp_dir: tempfile::TempDir,
     state: AppState,
-    store: Arc<SledStore>,
+    store: Arc<HybridStore>,
     journal: Arc<Mutex<Journal>>,
 }

@@ -45,15 +45,15 @@ async fn create_test_environment() -> TestEnvironment {
     std::fs::create_dir_all(&wal_dir).expect("Failed to create WAL dir");
     std::fs::create_dir_all(&db_dir).expect("Failed to create DB dir");

-    let journal = Journal::open(&wal_dir).expect("Failed to open journal");
-    let store = SledStore::open(&db_dir).expect("Failed to open store");
-
-    let journal_arc = Arc::new(Mutex::new(journal));
+    let store = HybridStore::open(&db_dir).expect("Failed to open store");
     let store_arc = Arc::new(store);

-    // Open a second journal handle for AppState (WAL supports multiple readers)
-    let journal_for_state = Journal::open(&wal_dir).expect("Failed to open second journal handle");
-    let state = AppState::new(journal_for_state, (*store_arc).clone());
+    // Open journals: one for IngestWorker reads, one for AppState (write + read)
+    let journal_arc =
+        Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
+    let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
+    let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
+    let state = AppState::new(write_journal, read_journal, Arc::clone(&store_arc));

     TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
 }
@@ -20,7 +20,7 @@ use std::sync::Arc;
 use tower::ServiceExt;

 use stemedb_api::{create_router, create_router_with_meter, AppState};
-use stemedb_storage::{GenericEscalationStore, GenericQuotaStore, QuotaStore, SledStore};
+use stemedb_storage::{HybridStore, QuotaStore};
 use stemedb_wal::Journal;

 // ============================================================================
@@ -148,7 +148,7 @@ async fn test_decay_trust_ranks_actually_decays() {
     let agent_id = [42u8; 32];
     let mut trust_rank = TrustRank::new(agent_id, 1000);
     trust_rank.score = 0.8;
-    let trust_store = GenericTrustRankStore::new((*env.state.store).clone());
+    let trust_store = GenericTrustRankStore::new(env.state.store.clone());
     trust_store.put_trust_rank(&trust_rank).await.expect("put trust rank");

     let app = create_router(env.state.clone());
@@ -198,18 +198,12 @@ async fn test_quota_consumption_with_meter() {
     std::fs::create_dir_all(&wal_dir).expect("wal dir");
     std::fs::create_dir_all(&db_dir).expect("db dir");

-    let journal = Journal::open(&wal_dir).expect("journal");
-    let store = Arc::new(SledStore::open(&db_dir).expect("store"));
+    let write_journal = Journal::open(&wal_dir).expect("write journal");
+    let read_journal = Journal::open(&wal_dir).expect("read journal");
+    let store = Arc::new(HybridStore::open(&db_dir).expect("store"));

-    // Create AppState manually to share quota_store
-    let quota_store = Arc::new(GenericQuotaStore::new(store.clone()));
-    let escalation_store = Arc::new(GenericEscalationStore::new(store.clone()));
-    let state = AppState {
-        journal: Arc::new(tokio::sync::Mutex::new(journal)),
-        store: store.clone(),
-        quota_store: quota_store.clone(),
-        escalation_store,
-    };
+    let state = AppState::new(write_journal, read_journal, store.clone());
+    let quota_store = state.quota_store.clone();

     let app = create_router_with_meter(state);

@ -260,17 +254,12 @@ async fn test_quota_exceeded_response() {
|
|||||||
std::fs::create_dir_all(&wal_dir).expect("wal dir");
|
std::fs::create_dir_all(&wal_dir).expect("wal dir");
|
||||||
std::fs::create_dir_all(&db_dir).expect("db dir");
|
std::fs::create_dir_all(&db_dir).expect("db dir");
|
||||||
|
|
||||||
let journal = Journal::open(&wal_dir).expect("journal");
|
let write_journal = Journal::open(&wal_dir).expect("write journal");
|
||||||
let store = Arc::new(SledStore::open(&db_dir).expect("store"));
|
let read_journal = Journal::open(&wal_dir).expect("read journal");
|
||||||
|
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
|
||||||
|
|
||||||
let quota_store = Arc::new(GenericQuotaStore::new(store.clone()));
|
let state = AppState::new(write_journal, read_journal, store.clone());
|
||||||
let escalation_store = Arc::new(GenericEscalationStore::new(store.clone()));
|
let quota_store = state.quota_store.clone();
|
||||||
let state = AppState {
|
|
||||||
journal: Arc::new(tokio::sync::Mutex::new(journal)),
|
|
||||||
store: store.clone(),
|
|
||||||
quota_store: quota_store.clone(),
|
|
||||||
escalation_store,
|
|
||||||
};
|
|
||||||
|
|
||||||
let app = create_router_with_meter(state);
|
let app = create_router_with_meter(state);
|
||||||
|
|
||||||
@ -311,17 +300,12 @@ async fn test_quota_headers_format() {
|
|||||||
std::fs::create_dir_all(&wal_dir).expect("wal dir");
|
std::fs::create_dir_all(&wal_dir).expect("wal dir");
|
||||||
std::fs::create_dir_all(&db_dir).expect("db dir");
|
std::fs::create_dir_all(&db_dir).expect("db dir");
|
||||||
|
|
||||||
let journal = Journal::open(&wal_dir).expect("journal");
|
let write_journal = Journal::open(&wal_dir).expect("write journal");
|
||||||
let store = Arc::new(SledStore::open(&db_dir).expect("store"));
|
let read_journal = Journal::open(&wal_dir).expect("read journal");
|
||||||
|
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
|
||||||
|
|
||||||
let quota_store = Arc::new(GenericQuotaStore::new(store.clone()));
|
let state = AppState::new(write_journal, read_journal, store.clone());
|
||||||
let escalation_store = Arc::new(GenericEscalationStore::new(store.clone()));
|
let quota_store = state.quota_store.clone();
|
||||||
let state = AppState {
|
|
||||||
journal: Arc::new(tokio::sync::Mutex::new(journal)),
|
|
||||||
store: store.clone(),
|
|
||||||
quota_store: quota_store.clone(),
|
|
||||||
escalation_store,
|
|
||||||
};
|
|
||||||
|
|
||||||
let app = create_router_with_meter(state);
|
let app = create_router_with_meter(state);
|
||||||
|
|
||||||
|
|||||||
@ -32,11 +32,6 @@ mod tests;
|
|||||||
// Re-exports
|
// Re-exports
|
||||||
pub use record_types::{serialize_assertion, serialize_epoch, serialize_vote, RecordType};
|
pub use record_types::{serialize_assertion, serialize_epoch, serialize_vote, RecordType};
|
||||||
|
|
||||||
/// The cursor tracks how far into the WAL the ingestor has processed,
|
|
||||||
/// allowing recovery to resume from the last checkpoint instead of
|
|
||||||
/// replaying the entire log.
|
|
||||||
const CURSOR_KEY: &[u8] = b"__CURSOR__:ingest";
|
|
||||||
|
|
||||||
/// Background worker that tails the WAL and updates the KV store.
|
/// Background worker that tails the WAL and updates the KV store.
|
||||||
pub struct IngestWorker<S> {
|
pub struct IngestWorker<S> {
|
||||||
journal: Arc<Mutex<Journal>>,
|
journal: Arc<Mutex<Journal>>,
|
||||||
@ -68,7 +63,8 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
pub async fn new(journal: Arc<Mutex<Journal>>, store: Arc<S>) -> Result<Self> {
|
pub async fn new(journal: Arc<Mutex<Journal>>, store: Arc<S>) -> Result<Self> {
|
||||||
let index_store = GenericIndexStore::new(store.clone());
|
let index_store = GenericIndexStore::new(store.clone());
|
||||||
let vote_store = GenericVoteStore::new(store.clone());
|
let vote_store = GenericVoteStore::new(store.clone());
|
||||||
let current_offset = match store.get(CURSOR_KEY).await? {
|
let cursor_key = stemedb_storage::key_codec::cursor_key();
|
||||||
|
let current_offset = match store.get(&cursor_key).await? {
|
||||||
Some(bytes) if bytes.len() == 8 => {
|
Some(bytes) if bytes.len() == 8 => {
|
||||||
let offset =
|
let offset =
|
||||||
u64::from_le_bytes(bytes.try_into().map_err(|_| {
|
u64::from_le_bytes(bytes.try_into().map_err(|_| {
|
||||||
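Note on the cursor format used in the hunk above: the cursor is read back as exactly 8 bytes and decoded with `u64::from_le_bytes`. A minimal sketch of that round trip follows. The helper names `encode_cursor` and `decode_cursor` are hypothetical, and two behaviours are assumptions the diff does not confirm: a missing cursor is treated as offset 0, and any other length is treated as an error.

```rust
use std::convert::TryFrom;

/// Encode a WAL offset as the 8-byte little-endian value stored under the cursor key.
fn encode_cursor(offset: u64) -> [u8; 8] {
    offset.to_le_bytes()
}

/// Decode a persisted cursor.
/// Assumption: `None` (no cursor yet) means "start from offset 0".
/// Assumption: a value that is not exactly 8 bytes is reported as an error.
fn decode_cursor(stored: Option<&[u8]>) -> Result<u64, String> {
    match stored {
        None => Ok(0),
        Some(bytes) => {
            // Fails unless the slice is exactly 8 bytes long.
            let arr = <[u8; 8]>::try_from(bytes)
                .map_err(|_| format!("cursor must be 8 bytes, got {}", bytes.len()))?;
            Ok(u64::from_le_bytes(arr))
        }
    }
}
```

A fixed little-endian width keeps the decoded offset the same on every platform, and the length check rejects a truncated value instead of silently decoding it.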
|
|||||||
@ -4,11 +4,12 @@
|
|||||||
//! including validation and signature verification.
|
//! including validation and signature verification.
|
||||||
|
|
||||||
use super::record_types::RECORD_HEADER_SIZE;
|
use super::record_types::RECORD_HEADER_SIZE;
|
||||||
use super::{IngestWorker, RecordType, CURSOR_KEY};
|
use super::{IngestWorker, RecordType};
|
||||||
use crate::error::{IngestError, Result};
|
use crate::error::{IngestError, Result};
|
||||||
use ed25519_dalek::{Signature, Verifier, VerifyingKey};
|
use ed25519_dalek::{Signature, Verifier, VerifyingKey};
|
||||||
use stemedb_core::serde::deserialize;
|
use stemedb_core::serde::deserialize;
|
||||||
use stemedb_core::types::{Assertion, Epoch, Hash, Vote};
|
use stemedb_core::types::{Assertion, Epoch, Hash, Vote};
|
||||||
|
use stemedb_storage::key_codec;
|
||||||
use stemedb_storage::{IndexStore, KVStore, VoteStore};
|
use stemedb_storage::{IndexStore, KVStore, VoteStore};
|
||||||
use tracing::{debug, info, warn};
|
use tracing::{debug, info, warn};
|
||||||
|
|
||||||
@ -16,7 +17,7 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
/// Process the next record from the WAL, returning bytes read (0 = no data).
|
/// Process the next record from the WAL, returning bytes read (0 = no data).
|
||||||
pub async fn step(&mut self) -> Result<u64> {
|
pub async fn step(&mut self) -> Result<u64> {
|
||||||
let record = {
|
let record = {
|
||||||
let journal = self.journal.lock().await;
|
let mut journal = self.journal.lock().await;
|
||||||
match journal.read(self.current_offset) {
|
match journal.read(self.current_offset) {
|
||||||
Ok(record) => record,
|
Ok(record) => record,
|
||||||
Err(stemedb_wal::QuarantineError::Io { source, .. })
|
Err(stemedb_wal::QuarantineError::Io { source, .. })
|
||||||
@ -80,7 +81,8 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
// Persist the cursor so recovery can skip already-processed records.
|
// Persist the cursor so recovery can skip already-processed records.
|
||||||
// This is safe even if it fails: the write path is idempotent
|
// This is safe even if it fails: the write path is idempotent
|
||||||
// (content-addressed keys), so re-processing a record is a no-op.
|
// (content-addressed keys), so re-processing a record is a no-op.
|
||||||
self.store.put(CURSOR_KEY, &self.current_offset.to_le_bytes()).await?;
|
let cursor_key = key_codec::cursor_key();
|
||||||
|
self.store.put(&cursor_key, &self.current_offset.to_le_bytes()).await?;
|
||||||
|
|
||||||
info!(
|
info!(
|
||||||
record_type = ?record_type,
|
record_type = ?record_type,
|
||||||
@ -116,14 +118,15 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
// Verify all signatures before storing
|
// Verify all signatures before storing
|
||||||
self.verify_assertion_signatures(&assertion)?;
|
self.verify_assertion_signatures(&assertion)?;
|
||||||
|
|
||||||
// Content-addressed key: H:{BLAKE3_hash}
|
// Content-addressed key: {subject}\x00H:{BLAKE3_hash}
|
||||||
let hash = blake3::hash(data);
|
let hash = blake3::hash(data);
|
||||||
let key = format!("H:{}", hash.to_hex()).into_bytes();
|
let hash_hex = hash.to_hex().to_string();
|
||||||
|
let key = key_codec::assertion_key(&assertion.subject, &hash_hex);
|
||||||
|
|
||||||
debug!(
|
debug!(
|
||||||
subject = %assertion.subject,
|
subject = %assertion.subject,
|
||||||
predicate = %assertion.predicate,
|
predicate = %assertion.predicate,
|
||||||
hash = %hash.to_hex(),
|
hash = %hash_hex,
|
||||||
signature_count = assertion.signatures.len(),
|
signature_count = assertion.signatures.len(),
|
||||||
"Ingesting verified assertion"
|
"Ingesting verified assertion"
|
||||||
);
|
);
|
||||||
@ -131,7 +134,19 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
// Store the assertion
|
// Store the assertion
|
||||||
self.store.put(&key, data).await?;
|
self.store.put(&key, data).await?;
|
||||||
|
|
||||||
// Update indexes: S:{subject} and SP:{subject}:{predicate}
|
// Write reverse index: \x00HASH_SUBJECT:{hash_hex} -> subject
|
||||||
|
let reverse_key = key_codec::hash_subject_key(&hash_hex);
|
||||||
|
self.store.put(&reverse_key, assertion.subject.as_bytes()).await?;
|
||||||
|
|
||||||
|
// Write subject discovery index: \x00SUBJECTS:{subject} -> empty
|
||||||
|
let subjects_key = key_codec::subjects_index_key(&assertion.subject);
|
||||||
|
self.store.put(&subjects_key, &[]).await?;
|
||||||
|
|
||||||
|
// Increment assertion count: \x00META:assertion_count
|
||||||
|
let count_key = key_codec::assertion_count_key();
|
||||||
|
self.store.fetch_and_add_u64(&count_key, 1).await?;
|
||||||
|
|
||||||
|
// Update indexes: {subject}\x00S: and {subject}\x00SP:{predicate}
|
||||||
let assertion_hash: Hash = *hash.as_bytes();
|
let assertion_hash: Hash = *hash.as_bytes();
|
||||||
self.index_store
|
self.index_store
|
||||||
.add_to_indexes(&assertion.subject, &assertion.predicate, &assertion_hash)
|
.add_to_indexes(&assertion.subject, &assertion.predicate, &assertion_hash)
|
||||||
@ -143,13 +158,13 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
if let Err(e) = vector_index.insert(&assertion_hash, vector) {
|
if let Err(e) = vector_index.insert(&assertion_hash, vector) {
|
||||||
// Log but don't fail the ingestion - vector index is supplementary
|
// Log but don't fail the ingestion - vector index is supplementary
|
||||||
warn!(
|
warn!(
|
||||||
hash = %hash.to_hex(),
|
hash = %hash_hex,
|
||||||
error = %e,
|
error = %e,
|
||||||
"Failed to insert into vector index"
|
"Failed to insert into vector index"
|
||||||
);
|
);
|
||||||
} else {
|
} else {
|
||||||
debug!(
|
debug!(
|
||||||
hash = %hash.to_hex(),
|
hash = %hash_hex,
|
||||||
dim = vector.len(),
|
dim = vector.len(),
|
||||||
"Inserted into vector index"
|
"Inserted into vector index"
|
||||||
);
|
);
|
||||||
@ -163,13 +178,13 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
if let Err(e) = visual_index.insert(&assertion_hash, phash) {
|
if let Err(e) = visual_index.insert(&assertion_hash, phash) {
|
||||||
// Log but don't fail the ingestion - visual index is supplementary
|
// Log but don't fail the ingestion - visual index is supplementary
|
||||||
warn!(
|
warn!(
|
||||||
hash = %hash.to_hex(),
|
hash = %hash_hex,
|
||||||
error = %e,
|
error = %e,
|
||||||
"Failed to insert into visual index"
|
"Failed to insert into visual index"
|
||||||
);
|
);
|
||||||
} else {
|
} else {
|
||||||
debug!(
|
debug!(
|
||||||
hash = %hash.to_hex(),
|
hash = %hash_hex,
|
||||||
phash = %hex::encode(phash),
|
phash = %hex::encode(phash),
|
||||||
"Inserted into visual index"
|
"Inserted into visual index"
|
||||||
);
|
);
|
||||||
@ -185,8 +200,13 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
/// - Confidence outside [0.0, 1.0] or NaN/Inf
|
/// - Confidence outside [0.0, 1.0] or NaN/Inf
|
||||||
/// - Subject exceeding MAX_SUBJECT_LEN bytes
|
/// - Subject exceeding MAX_SUBJECT_LEN bytes
|
||||||
/// - Predicate exceeding MAX_PREDICATE_LEN bytes
|
/// - Predicate exceeding MAX_PREDICATE_LEN bytes
|
||||||
|
/// - Subject containing null byte (would corrupt key boundaries)
|
||||||
/// - Timestamp more than 1 hour in the future (clock skew protection)
|
/// - Timestamp more than 1 hour in the future (clock skew protection)
|
||||||
fn validate_assertion(&self, assertion: &Assertion) -> Result<()> {
|
fn validate_assertion(&self, assertion: &Assertion) -> Result<()> {
|
||||||
|
// Validate subject does not contain separator byte
|
||||||
|
key_codec::validate_subject(&assertion.subject)
|
||||||
|
.map_err(|e| IngestError::InputValidation(format!("invalid subject: {}", e)))?;
|
||||||
|
|
||||||
// Validate confidence: must be finite and in [0.0, 1.0]
|
// Validate confidence: must be finite and in [0.0, 1.0]
|
||||||
if assertion.confidence.is_nan() {
|
if assertion.confidence.is_nan() {
|
||||||
return Err(IngestError::InputValidation(
|
return Err(IngestError::InputValidation(
|
||||||
@ -295,6 +315,9 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
/// Validates vote weight bounds (0.0 to 1.0, no NaN/Inf) and uses VoteStore
|
/// Validates vote weight bounds (0.0 to 1.0, no NaN/Inf) and uses VoteStore
|
||||||
/// to maintain vote count and aggregate weight caches automatically.
|
/// to maintain vote count and aggregate weight caches automatically.
|
||||||
/// This ensures VoteAwareConsensusLens has accurate data.
|
/// This ensures VoteAwareConsensusLens has accurate data.
|
||||||
|
///
|
||||||
|
/// Looks up the assertion's subject from the reverse index to co-locate
|
||||||
|
/// vote data with the assertion for range sharding.
|
||||||
async fn ingest_vote(&self, data: &[u8]) -> Result<()> {
|
async fn ingest_vote(&self, data: &[u8]) -> Result<()> {
|
||||||
let vote: Vote =
|
let vote: Vote =
|
||||||
deserialize(data).map_err(|e| IngestError::Serialization(e.to_string()))?;
|
deserialize(data).map_err(|e| IngestError::Serialization(e.to_string()))?;
|
||||||
@ -318,34 +341,54 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
)));
|
)));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Look up the subject from the reverse index
|
||||||
|
let hash_hex = hex::encode(vote.assertion_hash);
|
||||||
|
let reverse_key = key_codec::hash_subject_key(&hash_hex);
|
||||||
|
let subject = match self.store.get(&reverse_key).await? {
|
||||||
|
Some(bytes) => String::from_utf8(bytes).map_err(|e| {
|
||||||
|
IngestError::Serialization(format!("Invalid subject in reverse index: {}", e))
|
||||||
|
})?,
|
||||||
|
None => {
|
||||||
|
warn!(
|
||||||
|
assertion_hash = %hash_hex,
|
||||||
|
"Vote references unknown assertion (no reverse index entry)"
|
||||||
|
);
|
||||||
|
return Err(IngestError::InputValidation(format!(
|
||||||
|
"vote references unknown assertion {}",
|
||||||
|
hash_hex
|
||||||
|
)));
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
debug!(
|
debug!(
|
||||||
assertion_hash = %hex::encode(vote.assertion_hash),
|
assertion_hash = %hash_hex,
|
||||||
|
subject = %subject,
|
||||||
weight = vote.weight,
|
weight = vote.weight,
|
||||||
"Ingesting vote via VoteStore"
|
"Ingesting vote via VoteStore"
|
||||||
);
|
);
|
||||||
|
|
||||||
// Delegate to VoteStore which handles:
|
// Delegate to VoteStore which handles:
|
||||||
// 1. Content-addressed storage at V:{assertion_hash}:{vote_hash}
|
// 1. Content-addressed storage at {subject}\x00V:{assertion_hex}:{vote_hex}
|
||||||
// 2. Vote count cache at VC:{assertion_hash}
|
// 2. Vote count cache at {subject}\x00VC:{assertion_hex}
|
||||||
// 3. Aggregate weight cache at VW:{assertion_hash}
|
// 3. Aggregate weight cache at {subject}\x00VW:{assertion_hex}
|
||||||
self.vote_store.put_vote(&vote).await?;
|
self.vote_store.put_vote(&vote, &subject).await?;
|
||||||
|
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Ingest an epoch into the KV store.
|
/// Ingest an epoch into the KV store.
|
||||||
///
|
///
|
||||||
/// In addition to storing the epoch at `E:{epoch_id}`, this method writes
|
/// Stores the epoch at `\x00E:{epoch_id}` and writes
|
||||||
/// `SUPERSEDED:{old_epoch_id}` marker keys for the full transitive closure
|
/// `\x00SUPERSEDED:{old_epoch_id}` marker keys for the full transitive closure
|
||||||
/// of superseded epochs. This enables O(1) "is superseded?" lookups at
|
/// of superseded epochs. This enables O(1) "is superseded?" lookups at
|
||||||
/// query time instead of O(chain_length) chain walks.
|
/// query time instead of O(chain_length) chain walks.
|
||||||
async fn ingest_epoch(&self, data: &[u8]) -> Result<()> {
|
async fn ingest_epoch(&self, data: &[u8]) -> Result<()> {
|
||||||
let epoch: Epoch =
|
let epoch: Epoch =
|
||||||
deserialize(data).map_err(|e| IngestError::Serialization(e.to_string()))?;
|
deserialize(data).map_err(|e| IngestError::Serialization(e.to_string()))?;
|
||||||
|
|
||||||
// Epoch key: E:{epoch_id_hash}
|
// Epoch key: \x00E:{epoch_id_hash}
|
||||||
let epoch_id_hex = hex::encode(epoch.id);
|
let epoch_id_hex = hex::encode(epoch.id);
|
||||||
let key = format!("E:{}", epoch_id_hex).into_bytes();
|
let key = key_codec::epoch_key(&epoch_id_hex);
|
||||||
|
|
||||||
debug!(
|
debug!(
|
||||||
epoch_id = %epoch_id_hex,
|
epoch_id = %epoch_id_hex,
|
||||||
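Note on the key layout described in the comments above: per-subject keys take the form `{subject}\x00H:{hash}`, system keys start with `\x00`, and subjects containing a null byte are rejected. The sketch below reconstructs that layout from the comments alone. The function names mirror the `key_codec` calls in the diff, but the bodies are assumptions and not the actual `stemedb_storage` implementation. `subject_prefix` is a hypothetical helper added for illustration.

```rust
/// Separator byte between the subject and the record-type tag.
const SEP: u8 = 0x00;

/// Reject subjects containing the separator, which would blur key boundaries.
fn validate_subject(subject: &str) -> Result<(), String> {
    if subject.as_bytes().contains(&SEP) {
        return Err("subject contains null byte".to_string());
    }
    Ok(())
}

/// Hypothetical helper: the prefix shared by every key of one subject, `{subject}\x00`.
fn subject_prefix(subject: &str) -> Vec<u8> {
    let mut key = subject.as_bytes().to_vec();
    key.push(SEP);
    key
}

/// Assumed layout of the content-addressed key: `{subject}\x00H:{hash_hex}`.
fn assertion_key(subject: &str, hash_hex: &str) -> Vec<u8> {
    let mut key = subject_prefix(subject);
    key.extend_from_slice(b"H:");
    key.extend_from_slice(hash_hex.as_bytes());
    key
}

/// Assumed layout of the reverse-index key: `\x00HASH_SUBJECT:{hash_hex}`.
fn hash_subject_key(hash_hex: &str) -> Vec<u8> {
    let mut key = vec![SEP];
    key.extend_from_slice(b"HASH_SUBJECT:");
    key.extend_from_slice(hash_hex.as_bytes());
    key
}
```

Because a valid subject never contains `\x00`, a prefix scan over `a\x00` cannot match keys belonging to subject `ab`. This is what lets votes and assertions for one subject sit in one contiguous key range.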
|
|||||||
@ -7,6 +7,7 @@ use super::IngestWorker;
|
|||||||
use crate::error::Result;
|
use crate::error::Result;
|
||||||
use stemedb_core::serde::deserialize;
|
use stemedb_core::serde::deserialize;
|
||||||
use stemedb_core::types::Epoch;
|
use stemedb_core::types::Epoch;
|
||||||
|
use stemedb_storage::key_codec;
|
||||||
use stemedb_storage::KVStore;
|
use stemedb_storage::KVStore;
|
||||||
use tracing::{debug, warn};
|
use tracing::{debug, warn};
|
||||||
|
|
||||||
@ -14,13 +15,13 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
/// Maximum depth for walking supersession chains at write time.
|
/// Maximum depth for walking supersession chains at write time.
|
||||||
pub(super) const MAX_CASCADE_DEPTH: usize = 100;
|
pub(super) const MAX_CASCADE_DEPTH: usize = 100;
|
||||||
|
|
||||||
/// Write `SUPERSEDED:` markers for the full transitive closure of superseded epochs.
|
/// Write `\x00SUPERSEDED:` markers for the full transitive closure of superseded epochs.
|
||||||
///
|
///
|
||||||
/// All markers point to the LATEST superseding epoch (`new_epoch_id`).
|
/// All markers point to the LATEST superseding epoch (`new_epoch_id`).
|
||||||
/// For chain C→B→A: writes `SUPERSEDED:B→C` and `SUPERSEDED:A→C`.
|
/// For chain C→B→A: writes `SUPERSEDED:B→C` and `SUPERSEDED:A→C`.
|
||||||
///
|
///
|
||||||
/// This enables O(1) "is this epoch superseded?" checks at query time:
|
/// This enables O(1) "is this epoch superseded?" checks at query time:
|
||||||
/// just look for `SUPERSEDED:{epoch_id}` key existence.
|
/// just look for `\x00SUPERSEDED:{epoch_id}` key existence.
|
||||||
///
|
///
|
||||||
/// # Algorithm
|
/// # Algorithm
|
||||||
///
|
///
|
||||||
@ -63,8 +64,8 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Write marker: SUPERSEDED:{current_id} → new_epoch_id (always the LATEST)
|
// Write marker: \x00SUPERSEDED:{current_id} → new_epoch_id (always the LATEST)
|
||||||
let marker_key = Self::superseded_key(&current_id);
|
let marker_key = key_codec::superseded_key(&hex::encode(current_id));
|
||||||
self.store.put(&marker_key, new_epoch_id).await?;
|
self.store.put(&marker_key, new_epoch_id).await?;
|
||||||
|
|
||||||
debug!(
|
debug!(
|
||||||
@ -75,7 +76,7 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
);
|
);
|
||||||
|
|
||||||
// Check if current_id also superseded something (transitive closure)
|
// Check if current_id also superseded something (transitive closure)
|
||||||
let epoch_key = format!("E:{}", hex::encode(current_id)).into_bytes();
|
let epoch_key = key_codec::epoch_key(&hex::encode(current_id));
|
||||||
let ancestor_epoch = match self.store.get(&epoch_key).await? {
|
let ancestor_epoch = match self.store.get(&epoch_key).await? {
|
||||||
Some(bytes) => match deserialize::<Epoch>(&bytes) {
|
Some(bytes) => match deserialize::<Epoch>(&bytes) {
|
||||||
Ok(e) => e,
|
Ok(e) => e,
|
||||||
@ -108,12 +109,4 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
|
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Build key for superseded epoch marker.
|
|
||||||
///
|
|
||||||
/// Format: `SUPERSEDED:{epoch_id_hex}`
|
|
||||||
/// Value: The 32-byte ID of the epoch that superseded this one.
|
|
||||||
pub(super) fn superseded_key(epoch_id: &[u8; 32]) -> Vec<u8> {
|
|
||||||
format!("SUPERSEDED:{}", hex::encode(epoch_id)).into_bytes()
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
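Note on the supersession cascade described above: every superseded epoch in the chain gets a marker pointing at the latest epoch, and the walk is bounded by `MAX_CASCADE_DEPTH`. The sketch below models this with in-memory maps instead of the KV store. The function `superseded_markers` and the `u32` epoch ids are hypothetical. The visited-set cycle guard is also an assumption: the diff shows only the depth cap and a test that a cycle must not hang.

```rust
use std::collections::{HashMap, HashSet};

/// Depth cap taken from the diff; everything else here is illustrative.
const MAX_CASCADE_DEPTH: usize = 100;

/// `supersedes` maps an epoch id to the epoch it directly supersedes.
/// Returns markers: superseded epoch id -> latest superseding epoch id.
fn superseded_markers(supersedes: &HashMap<u32, u32>, new_epoch: u32) -> HashMap<u32, u32> {
    let mut markers = HashMap::new();
    let mut visited = HashSet::new();
    visited.insert(new_epoch);

    let mut current = supersedes.get(&new_epoch).copied();
    let mut depth = 0;
    while let Some(id) = current {
        // Stop on the depth cap or when an id repeats (a cycle).
        if depth >= MAX_CASCADE_DEPTH || !visited.insert(id) {
            break;
        }
        // Every marker points at the newest epoch, not the direct successor.
        markers.insert(id, new_epoch);
        current = supersedes.get(&id).copied();
        depth += 1;
    }
    markers
}
```

For the chain C→B→A this yields markers for B and A that both point at C, so the query-time check is a single key lookup. A two-epoch cycle ends after one marker.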
|
|||||||
@ -10,7 +10,7 @@ async fn test_ingest_assertion() {
|
|||||||
|
|
||||||
// Create journal and store
|
// Create journal and store
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
// Write assertion to WAL
|
// Write assertion to WAL
|
||||||
let assertion = create_test_assertion();
|
let assertion = create_test_assertion();
|
||||||
@ -45,7 +45,7 @@ async fn test_ingest_vote() {
|
|||||||
let db_dir = dir.path().join("db");
|
let db_dir = dir.path().join("db");
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
let vote = create_test_vote();
|
let vote = create_test_vote();
|
||||||
let payload = serialize_vote(&vote).expect("Failed to serialize");
|
let payload = serialize_vote(&vote).expect("Failed to serialize");
|
||||||
@ -71,7 +71,7 @@ async fn test_ingest_epoch() {
|
|||||||
let db_dir = dir.path().join("db");
|
let db_dir = dir.path().join("db");
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
let epoch = create_test_epoch();
|
let epoch = create_test_epoch();
|
||||||
let payload = serialize_epoch(&epoch).expect("Failed to serialize");
|
let payload = serialize_epoch(&epoch).expect("Failed to serialize");
|
||||||
@ -97,7 +97,7 @@ async fn test_ingest_multiple_records() {
|
|||||||
let db_dir = dir.path().join("db");
|
let db_dir = dir.path().join("db");
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
// Write multiple records
|
// Write multiple records
|
||||||
let assertion = create_test_assertion();
|
let assertion = create_test_assertion();
|
||||||
|
|||||||
@ -42,7 +42,7 @@ use super::*;
|
|||||||
// Phase 1: Write and ingest 2 records
|
// Phase 1: Write and ingest 2 records
|
||||||
{
|
{
|
||||||
let mut journal = Journal::open(&wal_dir).expect("open journal");
|
let mut journal = Journal::open(&wal_dir).expect("open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("open store");
|
let store = HybridStore::open(&db_dir).expect("open store");
|
||||||
|
|
||||||
let a1 = create_signed_assertion("Phase1_A", "prop");
|
let a1 = create_signed_assertion("Phase1_A", "prop");
|
||||||
let a2 = create_signed_assertion("Phase1_B", "prop");
|
let a2 = create_signed_assertion("Phase1_B", "prop");
|
||||||
@ -65,7 +65,7 @@ use super::*;
|
|||||||
let a3 = create_signed_assertion("Phase2_C", "prop");
|
let a3 = create_signed_assertion("Phase2_C", "prop");
|
||||||
journal.append(serialize_assertion(&a3).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&a3).expect("ser")).expect("append");
|
||||||
|
|
||||||
let store = SledStore::open(&db_dir).expect("reopen store");
|
let store = HybridStore::open(&db_dir).expect("reopen store");
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(store);
|
let store = Arc::new(store);
|
||||||
|
|
||||||
@ -103,7 +103,7 @@ use super::*;
|
|||||||
let db_dir = dir.path().join("db");
|
let db_dir = dir.path().join("db");
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
// Write 5 assertions to the WAL
|
// Write 5 assertions to the WAL
|
||||||
let assertions: Vec<Assertion> =
|
let assertions: Vec<Assertion> =
|
||||||
@ -182,7 +182,7 @@ use super::*;
|
|||||||
let db_dir = dir.path().join("db");
|
let db_dir = dir.path().join("db");
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
let assertion = create_test_assertion();
|
let assertion = create_test_assertion();
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|||||||
@ -33,7 +33,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -89,7 +89,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -126,7 +126,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -163,7 +163,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -226,7 +226,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -289,7 +289,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
|
|||||||
@ -40,7 +40,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -88,7 +88,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -116,7 +116,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
let db_dir = dir.path().join("db");
|
let db_dir = dir.path().join("db");
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
// Create epochs: B supersedes A
|
// Create epochs: B supersedes A
|
||||||
// Epoch A has no supersession (base epoch)
|
// Epoch A has no supersession (base epoch)
|
||||||
@ -172,7 +172,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
let db_dir = dir.path().join("db");
|
let db_dir = dir.path().join("db");
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
// Create chain: C → B → A
|
// Create chain: C → B → A
|
||||||
let epoch_a = stemedb_core::types::Epoch {
|
let epoch_a = stemedb_core::types::Epoch {
|
||||||
@ -243,7 +243,7 @@ use stemedb_core::serde::deserialize;
|
|||||||
let db_dir = dir.path().join("db");
|
let db_dir = dir.path().join("db");
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
// Create a cycle: A supersedes B, B supersedes A
|
// Create a cycle: A supersedes B, B supersedes A
|
||||||
// This is pathological but we must not hang
|
// This is pathological but we must not hang
|
||||||
|
|||||||
@ -31,7 +31,7 @@ use tracing::info;
|
|||||||
|
|
||||||
// PHASE 2: Partial ingestion, then "crash"
|
// PHASE 2: Partial ingestion, then "crash"
|
||||||
let cursor_before_crash = {
|
let cursor_before_crash = {
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(store);
|
let store = Arc::new(store);
|
||||||
|
|
||||||
@ -70,7 +70,7 @@ use tracing::info;
|
|||||||
// PHASE 3: Recovery - reopen everything and verify cursor restoration
|
// PHASE 3: Recovery - reopen everything and verify cursor restoration
|
||||||
{
|
{
|
||||||
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
|
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to reopen store");
|
let store = HybridStore::open(&db_dir).expect("Failed to reopen store");
|
||||||
|
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(store);
|
let store = Arc::new(store);
|
||||||
|
|||||||
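The crash-recovery test above ingests part of the WAL, records `cursor_before_crash`, then reopens the journal and store and checks that the cursor is restored. The idea can be sketched with in-memory stand-ins. Here `Store`, its `cursor` field, and `ingest` are hypothetical names for illustration only, not the `HybridStore` or `IngestWorker` APIs.

```rust
/// Hypothetical durable state: the applied records plus the WAL index of the
/// next record to ingest. In this sketch the struct itself plays the role of
/// the on-disk store that survives a crash.
#[derive(Default)]
struct Store {
    ingested: Vec<String>,
    cursor: usize,
}

/// Ingests at most `max_steps` records from `wal`, resuming at the persisted
/// cursor. Returns how many records were applied in this run.
fn ingest(wal: &[String], store: &mut Store, max_steps: usize) -> usize {
    let mut steps = 0;
    while steps < max_steps && store.cursor < wal.len() {
        store.ingested.push(wal[store.cursor].clone());
        // Advance the cursor only after the record has been applied, so a
        // restart never skips an unapplied record.
        store.cursor += 1;
        steps += 1;
    }
    steps
}
```

A worker that stops after two steps leaves `cursor == 2`; a second call with the same `Store` picks up at index 2 and applies only the remaining records.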
@ -17,7 +17,7 @@ use stemedb_core::testing::{self, AssertionBuilder};
|
|||||||
use stemedb_core::types::{
|
use stemedb_core::types::{
|
||||||
Assertion, Epoch, LifecycleStage, ObjectValue, SignatureEntry, SourceClass, Vote,
|
Assertion, Epoch, LifecycleStage, ObjectValue, SignatureEntry, SourceClass, Vote,
|
||||||
};
|
};
|
||||||
use stemedb_storage::SledStore;
|
use stemedb_storage::HybridStore;
|
||||||
use stemedb_wal::Journal;
|
use stemedb_wal::Journal;
|
||||||
use tempfile::tempdir;
|
use tempfile::tempdir;
|
||||||
use tokio::sync::Mutex;
|
use tokio::sync::Mutex;
|
||||||
|
|||||||
@ -12,7 +12,7 @@ use super::*;
|
|||||||
// Phase 2: Recovery - reopen everything and run ingestor
|
// Phase 2: Recovery - reopen everything and run ingestor
|
||||||
{
|
{
|
||||||
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
|
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(store);
|
let store = Arc::new(store);
|
||||||
@ -62,7 +62,7 @@ use super::*;
|
|||||||
// Phase 2: Recovery
|
// Phase 2: Recovery
|
||||||
{
|
{
|
||||||
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
|
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(store);
|
let store = Arc::new(store);
|
||||||
@ -108,7 +108,7 @@ use super::*;
|
|||||||
// Recover and ingest
|
// Recover and ingest
|
||||||
{
|
{
|
||||||
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
|
let journal = Journal::open(&wal_dir).expect("Failed to reopen journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(store);
|
let store = Arc::new(store);
|
||||||
@ -132,7 +132,7 @@ use super::*;
|
|||||||
|
|
||||||
// Final verification: all data from all cycles present
|
// Final verification: all data from all cycles present
|
||||||
{
|
{
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
let assertions = store.scan_prefix(b"H:").await.expect("scan");
|
let assertions = store.scan_prefix(b"H:").await.expect("scan");
|
||||||
assert_eq!(
|
assert_eq!(
|
||||||
assertions.len(),
|
assertions.len(),
|
||||||
@ -144,7 +144,7 @@ use super::*;
|
|||||||
|
|
||||||
/// Test: KV store persists across restarts.
|
/// Test: KV store persists across restarts.
|
||||||
///
|
///
|
||||||
/// Verifies that once data is ingested to sled, it survives store restarts.
|
/// Verifies that once data is ingested to storage, it survives store restarts.
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_kv_store_persistence() {
|
async fn test_kv_store_persistence() {
|
||||||
let dir = tempdir().expect("Failed to create temp dir");
|
let dir = tempdir().expect("Failed to create temp dir");
|
||||||
@ -154,7 +154,7 @@ use super::*;
|
|||||||
// Phase 1: Write, ingest, and close everything
|
// Phase 1: Write, ingest, and close everything
|
||||||
{
|
{
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
let assertion = create_test_assertion();
|
let assertion = create_test_assertion();
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
@ -172,7 +172,7 @@ use super::*;
|
|||||||
|
|
||||||
// Phase 2: Reopen only the KV store and verify data persists
|
// Phase 2: Reopen only the KV store and verify data persists
|
||||||
{
|
{
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to reopen store");
|
let store = HybridStore::open(&db_dir).expect("Failed to reopen store");
|
||||||
let assertions = store.scan_prefix(b"H:").await.expect("scan");
|
let assertions = store.scan_prefix(b"H:").await.expect("scan");
|
||||||
assert_eq!(assertions.len(), 1, "Assertion should persist in KV store across restarts");
|
assert_eq!(assertions.len(), 1, "Assertion should persist in KV store across restarts");
|
||||||
}
|
}
|
||||||
|
|||||||
@ -17,7 +17,7 @@ use crate::error::IngestError;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -68,7 +68,7 @@ use crate::error::IngestError;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -129,7 +129,7 @@ use crate::error::IngestError;
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -157,6 +157,6 @@ use crate::error::IngestError;
|
|||||||
let db_dir = dir.path().join("db");
|
let db_dir = dir.path().join("db");
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
// Write two assertions to the WAL
|
// Write two assertions to the WAL
|
||||||
|
|||||||
@ -41,7 +41,7 @@ async fn test_rejects_high_confidence() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -96,7 +96,7 @@ async fn test_rejects_negative_confidence() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -130,7 +130,7 @@ async fn test_rejects_invalid_vote_weight() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -168,7 +168,7 @@ async fn test_rejects_negative_vote_weight() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -221,7 +221,7 @@ async fn test_rejects_oversized_subject() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -279,7 +279,7 @@ async fn test_rejects_oversized_predicate() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -339,7 +339,7 @@ async fn test_accepts_exact_max_subject_length() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -395,7 +395,7 @@ async fn test_accepts_exact_max_predicate_length() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -446,7 +446,7 @@ async fn test_rejects_nan_confidence() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
@ -479,7 +479,7 @@ async fn test_rejects_nan_vote_weight() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
let mut journal = Journal::open(&wal_dir).expect("Failed to open journal");
|
||||||
let store = SledStore::open(&db_dir).expect("Failed to open store");
|
let store = HybridStore::open(&db_dir).expect("Failed to open store");
|
||||||
|
|
||||||
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
journal.append(serialize_vote(&vote).expect("ser")).expect("append");
|
||||||
|
|
||||||
|
|||||||
@ -32,10 +32,10 @@
|
|||||||
//!
|
//!
|
||||||
//! ```ignore
|
//! ```ignore
|
||||||
//! use stemedb_lens::{EpochAwareLens, RecencyLens};
|
//! use stemedb_lens::{EpochAwareLens, RecencyLens};
|
||||||
//! use stemedb_storage::SledStore;
|
//! use stemedb_storage::HybridStore;
|
||||||
//! use std::sync::Arc;
|
//! use std::sync::Arc;
|
||||||
//!
|
//!
|
||||||
//! let store = Arc::new(SledStore::open("./data").expect("store"));
|
//! let store = Arc::new(HybridStore::open("./data").expect("store"));
|
||||||
//! let lens = EpochAwareLens::with_recency(store);
|
//! let lens = EpochAwareLens::with_recency(store);
|
||||||
//!
|
//!
|
||||||
//! let resolution = lens.resolve_async(&candidates).await;
|
//! let resolution = lens.resolve_async(&candidates).await;
|
||||||
@ -49,7 +49,7 @@ use std::collections::HashSet;
|
|||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use stemedb_core::serde::deserialize;
|
use stemedb_core::serde::deserialize;
|
||||||
use stemedb_core::types::{Assertion, Epoch, EpochId};
|
use stemedb_core::types::{Assertion, Epoch, EpochId};
|
||||||
use stemedb_storage::KVStore;
|
use stemedb_storage::{key_codec, KVStore};
|
||||||
use tracing::{debug, instrument, warn};
|
use tracing::{debug, instrument, warn};
|
||||||
|
|
||||||
/// Wrapper to use a sync Lens with EpochAwareLens.
|
/// Wrapper to use a sync Lens with EpochAwareLens.
|
||||||
@ -111,18 +111,11 @@ impl<S: KVStore, L> EpochAwareLens<S, L> {
|
|||||||
Self { store, inner }
|
Self { store, inner }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Build the key for reading an epoch record.
|
|
||||||
///
|
|
||||||
/// Format: `E:{epoch_id_hex}`
|
|
||||||
fn epoch_key(epoch_id: &EpochId) -> Vec<u8> {
|
|
||||||
format!("E:{}", hex::encode(epoch_id)).into_bytes()
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Read an epoch record from the store.
|
/// Read an epoch record from the store.
|
||||||
///
|
///
|
||||||
/// Returns `None` if the epoch doesn't exist or fails to deserialize.
|
/// Returns `None` if the epoch doesn't exist or fails to deserialize.
|
||||||
async fn read_epoch(&self, epoch_id: &EpochId) -> Option<Epoch> {
|
async fn read_epoch(&self, epoch_id: &EpochId) -> Option<Epoch> {
|
||||||
let key = Self::epoch_key(epoch_id);
|
let key = key_codec::epoch_key(&hex::encode(epoch_id));
|
||||||
|
|
||||||
match self.store.get(&key).await {
|
match self.store.get(&key).await {
|
||||||
Ok(Some(bytes)) => match deserialize::<Epoch>(&bytes) {
|
Ok(Some(bytes)) => match deserialize::<Epoch>(&bytes) {
|
||||||
@ -154,14 +147,6 @@ impl<S: KVStore, L> EpochAwareLens<S, L> {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Build the key for checking if an epoch is superseded.
|
|
||||||
///
|
|
||||||
/// Format: `SUPERSEDED:{epoch_id_hex}`
|
|
||||||
/// These markers are written by the IngestWorker when epochs are ingested.
|
|
||||||
fn superseded_key(epoch_id: &EpochId) -> Vec<u8> {
|
|
||||||
format!("SUPERSEDED:{}", hex::encode(epoch_id)).into_bytes()
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Check if an epoch is superseded using O(1) marker lookup.
|
/// Check if an epoch is superseded using O(1) marker lookup.
|
||||||
///
|
///
|
||||||
/// The IngestWorker writes `SUPERSEDED:{epoch_id}` markers at epoch ingestion
|
/// The IngestWorker writes `SUPERSEDED:{epoch_id}` markers at epoch ingestion
|
||||||
@ -174,7 +159,7 @@ impl<S: KVStore, L> EpochAwareLens<S, L> {
|
|||||||
/// - Marker doesn't exist → epoch is NOT superseded (return false)
|
/// - Marker doesn't exist → epoch is NOT superseded (return false)
|
||||||
/// - Storage error → treat as NOT superseded (fail-open)
|
/// - Storage error → treat as NOT superseded (fail-open)
|
||||||
async fn is_epoch_superseded(&self, epoch_id: &EpochId) -> bool {
|
async fn is_epoch_superseded(&self, epoch_id: &EpochId) -> bool {
|
||||||
let key = Self::superseded_key(epoch_id);
|
let key = key_codec::superseded_key(&hex::encode(epoch_id));
|
||||||
match self.store.get(&key).await {
|
match self.store.get(&key).await {
|
||||||
Ok(Some(_)) => {
|
Ok(Some(_)) => {
|
||||||
debug!(epoch_id = %hex::encode(epoch_id), "Epoch is superseded (marker found)");
|
debug!(epoch_id = %hex::encode(epoch_id), "Epoch is superseded (marker found)");
|
||||||
|
|||||||
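The doc comment on `is_epoch_superseded` describes three outcomes: marker found, marker missing, and storage error treated as not superseded (fail-open). A minimal sketch of that decision table follows. `MemStore` and its `fail` flag are hypothetical stand-ins for a `KVStore`, and the key layout is the `SUPERSEDED:{epoch_id_hex}` format from the removed private helper; the layout that `key_codec::superseded_key` produces is not shown in this diff and may differ.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a KV store.
/// Setting `fail` makes every read return an error.
struct MemStore {
    data: HashMap<Vec<u8>, Vec<u8>>,
    fail: bool,
}

impl MemStore {
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, String> {
        if self.fail {
            return Err("simulated storage error".to_string());
        }
        Ok(self.data.get(key).cloned())
    }
}

/// Assumed key layout, for illustration only: `SUPERSEDED:{epoch_id_hex}`.
fn superseded_key(epoch_id_hex: &str) -> Vec<u8> {
    format!("SUPERSEDED:{}", epoch_id_hex).into_bytes()
}

/// A single point lookup decides the outcome:
/// marker present -> superseded; marker absent -> not superseded;
/// storage error -> not superseded (fail-open).
fn is_epoch_superseded(store: &MemStore, epoch_id_hex: &str) -> bool {
    match store.get(&superseded_key(epoch_id_hex)) {
        Ok(Some(_)) => true,
        Ok(None) => false,
        Err(_) => false,
    }
}
```

Failing open keeps reads available during a storage fault, at the cost of possibly returning assertions from an epoch that has in fact been superseded.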
@ -3,14 +3,14 @@ use crate::consensus::ConsensusLens;
|
|||||||
use stemedb_core::serde::serialize;
|
use stemedb_core::serde::serialize;
|
||||||
use stemedb_core::testing::{test_epoch_with_supersession, AssertionBuilder};
|
use stemedb_core::testing::{test_epoch_with_supersession, AssertionBuilder};
|
||||||
use stemedb_core::types::SupersessionType;
|
use stemedb_core::types::SupersessionType;
|
||||||
use stemedb_storage::SledStore;
|
use stemedb_storage::{key_codec, HybridStore};
|
||||||
|
|
||||||
/// Store an epoch in the KV store and write SUPERSEDED markers.
|
/// Store an epoch in the KV store and write SUPERSEDED markers.
|
||||||
///
|
///
|
||||||
/// This simulates what the IngestWorker does: store the epoch AND write
|
/// This simulates what the IngestWorker does: store the epoch AND write
|
||||||
/// cascade markers for the transitive closure of superseded epochs.
|
/// cascade markers for the transitive closure of superseded epochs.
|
||||||
async fn store_epoch(store: &SledStore, epoch: &Epoch) {
|
async fn store_epoch(store: &HybridStore, epoch: &Epoch) {
|
||||||
let key = format!("E:{}", hex::encode(epoch.id)).into_bytes();
|
let key = key_codec::epoch_key(&hex::encode(epoch.id));
|
||||||
let bytes = serialize(epoch).expect("serialize epoch");
|
let bytes = serialize(epoch).expect("serialize epoch");
|
||||||
store.put(&key, &bytes).await.expect("put epoch");
|
store.put(&key, &bytes).await.expect("put epoch");
|
||||||
|
|
||||||
@ -24,7 +24,7 @@ async fn store_epoch(store: &SledStore, epoch: &Epoch) {
|
|||||||
///
|
///
|
||||||
/// Mirrors the IngestWorker's cascade logic for test setup.
|
/// Mirrors the IngestWorker's cascade logic for test setup.
|
||||||
async fn write_supersession_cascade(
|
async fn write_supersession_cascade(
|
||||||
store: &SledStore,
|
store: &HybridStore,
|
||||||
new_epoch_id: &[u8; 32],
|
new_epoch_id: &[u8; 32],
|
||||||
superseded_id: &[u8; 32],
|
superseded_id: &[u8; 32],
|
||||||
) {
|
) {
|
||||||
@ -41,11 +41,11 @@ async fn write_supersession_cascade(
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Write marker
|
// Write marker
|
||||||
let marker_key = format!("SUPERSEDED:{}", hex::encode(current_id)).into_bytes();
|
let marker_key = key_codec::superseded_key(&hex::encode(current_id));
|
||||||
store.put(&marker_key, new_epoch_id).await.expect("put marker");
|
store.put(&marker_key, new_epoch_id).await.expect("put marker");
|
||||||
|
|
||||||
// Check for ancestor
|
// Check for ancestor
|
||||||
let epoch_key = format!("E:{}", hex::encode(current_id)).into_bytes();
|
let epoch_key = key_codec::epoch_key(&hex::encode(current_id));
|
||||||
let ancestor = match store.get(&epoch_key).await.expect("get") {
|
let ancestor = match store.get(&epoch_key).await.expect("get") {
|
||||||
Some(bytes) => stemedb_core::serde::deserialize::<Epoch>(&bytes).ok(),
|
Some(bytes) => stemedb_core::serde::deserialize::<Epoch>(&bytes).ok(),
|
||||||
None => None,
|
None => None,
|
||||||
@ -75,7 +75,7 @@ fn create_epoch(id: [u8; 32], name: &str) -> Epoch {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_empty_candidates() {
|
async fn test_empty_candidates() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let lens = EpochAwareLens::with_recency(store);
|
let lens = EpochAwareLens::with_recency(store);
|
||||||
|
|
||||||
let resolution = lens.resolve_async(&[]).await;
|
let resolution = lens.resolve_async(&[]).await;
|
||||||
@ -86,7 +86,7 @@ async fn test_empty_candidates() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_single_candidate() {
|
async fn test_single_candidate() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let lens = EpochAwareLens::with_recency(store);
|
let lens = EpochAwareLens::with_recency(store);
|
||||||
|
|
||||||
let assertion = AssertionBuilder::new().subject("Tesla").timestamp(1000).build();
|
let assertion = AssertionBuilder::new().subject("Tesla").timestamp(1000).build();
|
||||||
@ -99,7 +99,7 @@ async fn test_single_candidate() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_epoch_aware_no_epochs_passes_all() {
|
async fn test_epoch_aware_no_epochs_passes_all() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let lens = EpochAwareLens::with_recency(store);
|
let lens = EpochAwareLens::with_recency(store);
|
||||||
|
|
||||||
// Create assertions without epochs
|
// Create assertions without epochs
|
||||||
@ -116,7 +116,7 @@ async fn test_epoch_aware_no_epochs_passes_all() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_epoch_aware_excludes_superseded() {
|
async fn test_epoch_aware_excludes_superseded() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Create epochs: B supersedes A
|
// Create epochs: B supersedes A
|
||||||
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
||||||
@ -149,7 +149,7 @@ async fn test_epoch_aware_excludes_superseded() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_epoch_aware_chain_supersession() {
|
async fn test_epoch_aware_chain_supersession() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Create chain: C supersedes B, B supersedes A
|
// Create chain: C supersedes B, B supersedes A
|
||||||
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
||||||
@ -191,7 +191,7 @@ async fn test_epoch_aware_chain_supersession() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_epoch_aware_missing_epoch_record_includes() {
|
async fn test_epoch_aware_missing_epoch_record_includes() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Only store epoch B which supersedes A, but DON'T store epoch A
|
// Only store epoch B which supersedes A, but DON'T store epoch A
|
||||||
let epoch_b = test_epoch_with_supersession(
|
let epoch_b = test_epoch_with_supersession(
|
||||||
@ -224,7 +224,7 @@ async fn test_epoch_aware_missing_epoch_record_includes() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_epoch_aware_cycle_detection() {
|
async fn test_epoch_aware_cycle_detection() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Create circular supersession: A supersedes B, B supersedes A
|
// Create circular supersession: A supersedes B, B supersedes A
|
||||||
let epoch_a = Epoch {
|
let epoch_a = Epoch {
|
||||||
@ -275,7 +275,7 @@ async fn test_epoch_aware_cycle_detection() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_epoch_aware_with_consensus_lens() {
|
async fn test_epoch_aware_with_consensus_lens() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Create epochs: B supersedes A
|
// Create epochs: B supersedes A
|
||||||
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
||||||
@ -323,7 +323,7 @@ async fn test_epoch_aware_with_consensus_lens() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_epoch_aware_mixed_epochs_and_no_epochs() {
|
async fn test_epoch_aware_mixed_epochs_and_no_epochs() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Create epochs: B supersedes A
|
// Create epochs: B supersedes A
|
||||||
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
||||||
@ -357,7 +357,7 @@ async fn test_superseded_epoch_filtered_even_without_new_assertions() {
|
|||||||
// With the O(1) marker-based approach, epochs are filtered based on
|
// With the O(1) marker-based approach, epochs are filtered based on
|
||||||
// SUPERSEDED: markers, not based on what's in the candidate set.
|
// SUPERSEDED: markers, not based on what's in the candidate set.
|
||||||
// If an epoch has a SUPERSEDED marker, its assertions are filtered.
|
// If an epoch has a SUPERSEDED marker, its assertions are filtered.
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Create epochs: B supersedes A
|
// Create epochs: B supersedes A
|
||||||
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
||||||
@ -388,12 +388,12 @@ async fn test_epoch_without_marker_passes_through() {
|
|||||||
// This test documents fail-open behavior:
|
// This test documents fail-open behavior:
|
||||||
// If an epoch doesn't have a SUPERSEDED marker (e.g., data from before
|
// If an epoch doesn't have a SUPERSEDED marker (e.g., data from before
|
||||||
// cascade logic was added), assertions pass through.
|
// cascade logic was added), assertions pass through.
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Manually store epoch A WITHOUT writing cascade markers
|
// Manually store epoch A WITHOUT writing cascade markers
|
||||||
// (simulating old data before the cascade feature)
|
// (simulating old data before the cascade feature)
|
||||||
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
let epoch_a = create_epoch([1u8; 32], "Epoch A");
|
||||||
let key = format!("E:{}", hex::encode(epoch_a.id)).into_bytes();
|
let key = key_codec::epoch_key(&hex::encode(epoch_a.id));
|
||||||
let bytes = serialize(&epoch_a).expect("serialize epoch");
|
let bytes = serialize(&epoch_a).expect("serialize epoch");
|
||||||
store.put(&key, &bytes).await.expect("put epoch");
|
store.put(&key, &bytes).await.expect("put epoch");
|
||||||
|
|
||||||
@ -411,7 +411,7 @@ async fn test_epoch_without_marker_passes_through() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_lens_name() {
|
async fn test_lens_name() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let lens = EpochAwareLens::with_recency(store);
|
let lens = EpochAwareLens::with_recency(store);
|
||||||
|
|
||||||
assert_eq!(lens.name(), "EpochAware");
|
assert_eq!(lens.name(), "EpochAware");
|
||||||
@ -423,11 +423,11 @@ async fn test_lens_name() {
|
|||||||
/// we don't need to read E:{epoch_id} records to determine supersession.
|
/// we don't need to read E:{epoch_id} records to determine supersession.
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_epoch_aware_uses_marker_not_epoch_record() {
|
async fn test_epoch_aware_uses_marker_not_epoch_record() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Write ONLY the SUPERSEDED marker, NOT the epoch records themselves
|
// Write ONLY the SUPERSEDED marker, NOT the epoch records themselves
|
||||||
// This tests that we use the marker for filtering, not the epoch record
|
// This tests that we use the marker for filtering, not the epoch record
|
||||||
let marker_key = format!("SUPERSEDED:{}", hex::encode([1u8; 32])).into_bytes();
|
let marker_key = key_codec::superseded_key(&hex::encode([1u8; 32]));
|
||||||
store.put(&marker_key, &[2u8; 32]).await.expect("put marker");
|
store.put(&marker_key, &[2u8; 32]).await.expect("put marker");
|
||||||
|
|
||||||
let lens = EpochAwareLens::with_recency(Arc::clone(&store));
|
let lens = EpochAwareLens::with_recency(Arc::clone(&store));
|
||||||
@@ -447,6 +447,6 @@ async fn test_epoch_aware_uses_marker_not_epoch_record() {
|
|||||||
|
|
||||||
// Verify we didn't need to read E:{epoch_id} records at all
|
// Verify we didn't need to read E:{epoch_id} records at all
|
||||||
// (they don't exist in this test)
|
// (they don't exist in this test)
|
||||||
let epochs = store.scan_prefix(b"E:").await.expect("scan");
|
let epochs = store.scan_prefix(b"\x00E:").await.expect("scan");
|
||||||
assert_eq!(epochs.len(), 0, "No epoch records should exist - test uses marker only");
|
assert_eq!(epochs.len(), 0, "No epoch records should exist - test uses marker only");
|
||||||
}
|
}
|
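Review note: the scan prefix in this hunk changes from `b"E:"` to `b"\x00E:"` in the same commit that switches key construction to `key_codec::epoch_key`. A minimal sketch of why the two must move together, assuming the codec prepends a NUL byte to the `E:` prefix. The `epoch_key` and `scan_prefix` functions below are hypothetical stand-ins over a `BTreeMap`, not the crate's actual API:

```rust
use std::collections::BTreeMap;

// Hypothetical codec (assumption): a NUL byte, then "E:", then the hex-encoded id.
fn epoch_key(hex_id: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(3 + hex_id.len());
    key.extend_from_slice(b"\x00E:");
    key.extend_from_slice(hex_id.as_bytes());
    key
}

// Prefix scan over an ordered map: seek to the prefix, then stop at the
// first key that no longer starts with it.
fn scan_prefix<'a>(store: &'a BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<&'a Vec<u8>> {
    store
        .range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k)
        .collect()
}
```

Under that assumption, a scan with the old `b"E:"` prefix would return nothing even when epoch records exist, so the assertion `epochs.len() == 0` would pass vacuously unless the prefix is updated as well.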
||||||
|
|||||||
@@ -68,7 +68,7 @@ impl<V: VoteStore, T: TrustRankStore> SkepticLens<V, T> {
|
|||||||
/// If no votes exist, falls back to the assertion's own confidence score.
|
/// If no votes exist, falls back to the assertion's own confidence score.
|
||||||
async fn get_assertion_weight(&self, assertion: &Assertion) -> f32 {
|
async fn get_assertion_weight(&self, assertion: &Assertion) -> f32 {
|
||||||
let hash = Self::compute_assertion_hash(assertion);
|
let hash = Self::compute_assertion_hash(assertion);
|
||||||
match self.vote_store.get_aggregate_weight(&hash).await {
|
match self.vote_store.get_aggregate_weight(&hash, &assertion.subject).await {
|
||||||
Ok(weight) if weight > 0.0 => weight,
|
Ok(weight) if weight > 0.0 => weight,
|
||||||
Ok(_) => {
|
Ok(_) => {
|
||||||
// No votes exist, fall back to assertion confidence
|
// No votes exist, fall back to assertion confidence
|
||||||
|
|||||||
@@ -21,10 +21,10 @@
|
|||||||
//!
|
//!
|
||||||
//! ```ignore
|
//! ```ignore
|
||||||
//! use stemedb_lens::SkepticLens;
|
//! use stemedb_lens::SkepticLens;
|
||||||
//! use stemedb_storage::{SledStore, GenericVoteStore};
|
//! use stemedb_storage::{HybridStore, GenericVoteStore};
|
||||||
//! use std::sync::Arc;
|
//! use std::sync::Arc;
|
||||||
//!
|
//!
|
||||||
//! let store = SledStore::open("./data").await?;
|
//! let store = HybridStore::open("./data").await?;
|
||||||
//! let vote_store = Arc::new(GenericVoteStore::new(store));
|
//! let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
//! let lens = SkepticLens::new(vote_store);
|
//! let lens = SkepticLens::new(vote_store);
|
||||||
//!
|
//!
|
||||||
@@ -74,9 +74,10 @@ impl<V: VoteStore, T: TrustRankStore> SkepticLens<V, T> {
|
|||||||
mod tests {
|
mod tests {
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::traits::AnalysisLens;
|
use crate::traits::AnalysisLens;
|
||||||
|
use std::sync::Arc;
|
||||||
use stemedb_core::testing::AssertionBuilder;
|
use stemedb_core::testing::AssertionBuilder;
|
||||||
use stemedb_core::types::{Assertion, ObjectValue, ResolutionStatus, Vote};
|
use stemedb_core::types::{Assertion, ObjectValue, ResolutionStatus, Vote};
|
||||||
use stemedb_storage::{GenericTrustRankStore, GenericVoteStore, SledStore};
|
use stemedb_storage::{GenericTrustRankStore, GenericVoteStore, HybridStore};
|
||||||
|
|
||||||
fn create_assertion(subject: &str, value: f64, confidence: f32) -> Assertion {
|
fn create_assertion(subject: &str, value: f64, confidence: f32) -> Assertion {
|
||||||
AssertionBuilder::new()
|
AssertionBuilder::new()
|
||||||
@@ -98,7 +99,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_empty_candidates() {
|
async fn test_empty_candidates() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(vote_store, trust_store);
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
@@ -113,7 +114,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_single_candidate_is_unanimous() {
|
async fn test_single_candidate_is_unanimous() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(vote_store, trust_store);
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
@@ -129,7 +130,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_same_value_is_unanimous() {
|
async fn test_same_value_is_unanimous() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(vote_store, trust_store);
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
@@ -148,7 +149,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_50_50_split_is_contested() {
|
async fn test_50_50_split_is_contested() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(vote_store, trust_store);
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
@@ -166,7 +167,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_strong_majority_is_agreed() {
|
async fn test_strong_majority_is_agreed() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(vote_store, trust_store);
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
@@ -186,7 +187,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_claims_sorted_by_weight() {
|
async fn test_claims_sorted_by_weight() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(vote_store, trust_store);
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
@@ -204,7 +205,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_text_value_grouping() {
|
async fn test_text_value_grouping() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(vote_store, trust_store);
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
@@ -232,24 +233,24 @@ mod tests {
|
|||||||
// Test the entropy calculation directly
|
// Test the entropy calculation directly
|
||||||
// 50/50 split: max entropy for 2 options = 1.0
|
// 50/50 split: max entropy for 2 options = 1.0
|
||||||
let weights_equal = vec![(ObjectValue::Number(1.0), 0.5), (ObjectValue::Number(2.0), 0.5)];
|
let weights_equal = vec![(ObjectValue::Number(1.0), 0.5), (ObjectValue::Number(2.0), 0.5)];
|
||||||
let score = SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::calculate_conflict_score(&weights_equal);
|
let score = SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::calculate_conflict_score(&weights_equal);
|
||||||
assert!((score - 1.0).abs() < 0.01);
|
assert!((score - 1.0).abs() < 0.01);
|
||||||
|
|
||||||
// 100/0 split: zero entropy
|
// 100/0 split: zero entropy
|
||||||
let weights_unanimous =
|
let weights_unanimous =
|
||||||
vec![(ObjectValue::Number(1.0), 1.0), (ObjectValue::Number(2.0), 0.0)];
|
vec![(ObjectValue::Number(1.0), 1.0), (ObjectValue::Number(2.0), 0.0)];
|
||||||
let score = SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::calculate_conflict_score(&weights_unanimous);
|
let score = SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::calculate_conflict_score(&weights_unanimous);
|
||||||
assert!((score - 0.0).abs() < 0.01);
|
assert!((score - 0.0).abs() < 0.01);
|
||||||
|
|
||||||
// Single claim: unanimous
|
// Single claim: unanimous
|
||||||
let weights_single = vec![(ObjectValue::Number(1.0), 1.0)];
|
let weights_single = vec![(ObjectValue::Number(1.0), 1.0)];
|
||||||
let score = SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::calculate_conflict_score(&weights_single);
|
let score = SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::calculate_conflict_score(&weights_single);
|
||||||
assert!((score - 0.0).abs() < 0.01);
|
assert!((score - 0.0).abs() < 0.01);
|
||||||
}
|
}
|
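Review note: the three cases in this test (50/50 gives 1.0, 100/0 gives 0.0, a single claim gives 0.0) are consistent with a normalized Shannon entropy. The sketch below is one way such a score could be computed. It is an assumption about the technique, not the body of `calculate_conflict_score`, and `conflict_score` is a hypothetical name that takes bare weights rather than `(ObjectValue, weight)` pairs:

```rust
/// Normalized Shannon entropy of a weight distribution, in [0.0, 1.0].
/// Hypothetical stand-in for `calculate_conflict_score`; not the crate's code.
fn conflict_score(weights: &[f64]) -> f64 {
    let total: f64 = weights.iter().filter(|w| **w > 0.0).sum();
    // Fewer than two options, or no positive weight, means no disagreement.
    if weights.len() < 2 || total <= 0.0 {
        return 0.0;
    }
    let entropy: f64 = weights
        .iter()
        .filter(|w| **w > 0.0) // a zero-weight option contributes nothing
        .map(|w| {
            let p = w / total;
            -p * p.log2()
        })
        .sum();
    // Divide by log2(n), the maximum entropy for n options, so an even split scores 1.0.
    entropy / (weights.len() as f64).log2()
}
```

Normalizing by `log2(n)` keeps the score comparable across different numbers of competing claims, which is presumably why the test can compare against fixed constants with a small tolerance.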
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_lens_name() {
|
async fn test_lens_name() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(vote_store, trust_store);
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
@@ -259,7 +260,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_with_votes() {
|
async fn test_with_votes() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = SkepticLens::new(Arc::clone(&vote_store), trust_store);
|
let lens = SkepticLens::new(Arc::clone(&vote_store), trust_store);
|
||||||
@@ -270,9 +271,9 @@ mod tests {
|
|||||||
|
|
||||||
// Add votes to make a1 a strong winner
|
// Add votes to make a1 a strong winner
|
||||||
let hash1 =
|
let hash1 =
|
||||||
SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::compute_assertion_hash(&a1);
|
SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::compute_assertion_hash(&a1);
|
||||||
let hash2 =
|
let hash2 =
|
||||||
SkepticLens::<GenericVoteStore<SledStore>, GenericTrustRankStore<SledStore>>::compute_assertion_hash(&a2);
|
SkepticLens::<GenericVoteStore<HybridStore>, GenericTrustRankStore<HybridStore>>::compute_assertion_hash(&a2);
|
||||||
|
|
||||||
// a1 gets 10 votes totaling 9.0 weight (strong majority)
|
// a1 gets 10 votes totaling 9.0 weight (strong majority)
|
||||||
for i in 0..10 {
|
for i in 0..10 {
|
||||||
@@ -285,7 +286,7 @@ mod tests {
|
|||||||
source_url: None,
|
source_url: None,
|
||||||
observed_context: None,
|
observed_context: None,
|
||||||
};
|
};
|
||||||
vote_store.put_vote(&vote).await.expect("put vote");
|
vote_store.put_vote(&vote, "Tesla").await.expect("put vote");
|
||||||
}
|
}
|
||||||
|
|
||||||
// a2 gets 1 vote with 0.5 weight (minority)
|
// a2 gets 1 vote with 0.5 weight (minority)
|
||||||
@@ -298,7 +299,7 @@ mod tests {
|
|||||||
source_url: None,
|
source_url: None,
|
||||||
observed_context: None,
|
observed_context: None,
|
||||||
};
|
};
|
||||||
vote_store.put_vote(&vote).await.expect("put vote");
|
vote_store.put_vote(&vote, "Tesla").await.expect("put vote");
|
||||||
|
|
||||||
let analysis = lens.analyze(&[a1, a2]).await;
|
let analysis = lens.analyze(&[a1, a2]).await;
|
||||||
|
|
||||||
|
|||||||
@@ -46,10 +46,10 @@ pub use crate::vote_aware_consensus::AsyncLens;
|
|||||||
///
|
///
|
||||||
/// ```ignore
|
/// ```ignore
|
||||||
/// use stemedb_lens::TrustAwareAuthorityLens;
|
/// use stemedb_lens::TrustAwareAuthorityLens;
|
||||||
/// use stemedb_storage::{SledStore, GenericTrustRankStore};
|
/// use stemedb_storage::{HybridStore, GenericTrustRankStore};
|
||||||
/// use std::sync::Arc;
|
/// use std::sync::Arc;
|
||||||
///
|
///
|
||||||
/// let store = SledStore::open("./data").await?;
|
/// let store = HybridStore::open("./data").await?;
|
||||||
/// let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
/// let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
/// let lens = TrustAwareAuthorityLens::new(trust_store);
|
/// let lens = TrustAwareAuthorityLens::new(trust_store);
|
||||||
///
|
///
|
||||||
@@ -209,7 +209,7 @@ mod tests {
|
|||||||
use super::*;
|
use super::*;
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use stemedb_core::testing::AssertionBuilder;
|
use stemedb_core::testing::AssertionBuilder;
|
||||||
use stemedb_storage::{GenericTrustRankStore, SledStore, TrustRank, TrustRankStore};
|
use stemedb_storage::{GenericTrustRankStore, HybridStore, TrustRank, TrustRankStore};
|
||||||
|
|
||||||
fn create_assertion(
|
fn create_assertion(
|
||||||
subject: &str,
|
subject: &str,
|
||||||
@@ -227,7 +227,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_empty_candidates() {
|
async fn test_empty_candidates() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(trust_store);
|
let lens = TrustAwareAuthorityLens::new(trust_store);
|
||||||
|
|
||||||
@@ -239,7 +239,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_single_candidate() {
|
async fn test_single_candidate() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(trust_store);
|
let lens = TrustAwareAuthorityLens::new(trust_store);
|
||||||
|
|
||||||
@@ -254,7 +254,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_selects_highest_weighted_score() {
|
async fn test_selects_highest_weighted_score() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
||||||
|
|
||||||
@@ -288,7 +288,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_high_confidence_low_trust_vs_low_confidence_high_trust() {
|
async fn test_high_confidence_low_trust_vs_low_confidence_high_trust() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
||||||
|
|
||||||
@@ -319,7 +319,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_default_trust_for_new_agent() {
|
async fn test_default_trust_for_new_agent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(trust_store);
|
let lens = TrustAwareAuthorityLens::new(trust_store);
|
||||||
|
|
||||||
@@ -336,7 +336,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_no_signatures_treated_as_untrusted() {
|
async fn test_no_signatures_treated_as_untrusted() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
||||||
|
|
||||||
@@ -360,7 +360,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_tie_breaking_by_timestamp() {
|
async fn test_tie_breaking_by_timestamp() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
||||||
|
|
||||||
@@ -383,7 +383,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_lens_name() {
|
async fn test_lens_name() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(trust_store);
|
let lens = TrustAwareAuthorityLens::new(trust_store);
|
||||||
|
|
||||||
@@ -392,7 +392,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_multiple_candidates_different_trust_levels() {
|
async fn test_multiple_candidates_different_trust_levels() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
||||||
|
|
||||||
@@ -430,7 +430,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_zero_trust_agent() {
|
async fn test_zero_trust_agent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
||||||
|
|
||||||
@@ -460,7 +460,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_perfect_trust_agent() {
|
async fn test_perfect_trust_agent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store));
|
||||||
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
let lens = TrustAwareAuthorityLens::new(Arc::clone(&trust_store));
|
||||||
|
|
||||||
|
|||||||
@@ -67,10 +67,10 @@ pub trait AsyncLens: Send + Sync {
|
|||||||
///
|
///
|
||||||
/// ```ignore
|
/// ```ignore
|
||||||
/// use stemedb_lens::VoteAwareConsensusLens;
|
/// use stemedb_lens::VoteAwareConsensusLens;
|
||||||
/// use stemedb_storage::{SledStore, GenericVoteStore};
|
/// use stemedb_storage::{HybridStore, GenericVoteStore};
|
||||||
/// use std::sync::Arc;
|
/// use std::sync::Arc;
|
||||||
///
|
///
|
||||||
/// let store = SledStore::open("./data").await?;
|
/// let store = HybridStore::open("./data").await?;
|
||||||
/// let vote_store = Arc::new(GenericVoteStore::new(store));
|
/// let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
/// let lens = VoteAwareConsensusLens::new(vote_store);
|
/// let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
///
|
///
|
||||||
@@ -147,19 +147,23 @@ impl<V: VoteStore + 'static> AsyncLens for VoteAwareConsensusLens<V> {
|
|||||||
|
|
||||||
// Lookup vote count and aggregate weight from VoteStore
|
// Lookup vote count and aggregate weight from VoteStore
|
||||||
// These are O(1) operations thanks to VoteStore's cached counters
|
// These are O(1) operations thanks to VoteStore's cached counters
|
||||||
let vote_count = match self.vote_store.get_vote_count(&assertion_hash).await {
|
let vote_count =
|
||||||
Ok(count) => count,
|
match self.vote_store.get_vote_count(&assertion_hash, &assertion.subject).await {
|
||||||
Err(e) => {
|
Ok(count) => count,
|
||||||
debug!(
|
Err(e) => {
|
||||||
assertion_hash = %hex::encode(assertion_hash),
|
debug!(
|
||||||
error = %e,
|
assertion_hash = %hex::encode(assertion_hash),
|
||||||
"Failed to get vote count, treating as 0"
|
error = %e,
|
||||||
);
|
"Failed to get vote count, treating as 0"
|
||||||
0
|
);
|
||||||
}
|
0
|
||||||
};
|
}
|
||||||
|
};
|
||||||
|
|
||||||
let aggregate_weight = match self.vote_store.get_aggregate_weight(&assertion_hash).await
|
let aggregate_weight = match self
|
||||||
|
.vote_store
|
||||||
|
.get_aggregate_weight(&assertion_hash, &assertion.subject)
|
||||||
|
.await
|
||||||
{
|
{
|
||||||
Ok(weight) => weight,
|
Ok(weight) => weight,
|
||||||
Err(e) => {
|
Err(e) => {
|
||||||
@@ -228,7 +232,7 @@ mod tests {
|
|||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use stemedb_core::testing::{self, AssertionBuilder};
|
use stemedb_core::testing::{self, AssertionBuilder};
|
||||||
use stemedb_core::types::Vote;
|
use stemedb_core::types::Vote;
|
||||||
use stemedb_storage::{GenericVoteStore, SledStore};
|
use stemedb_storage::{GenericVoteStore, HybridStore};
|
||||||
|
|
||||||
fn create_assertion(subject: &str, value: f64, timestamp: u64) -> Assertion {
|
fn create_assertion(subject: &str, value: f64, timestamp: u64) -> Assertion {
|
||||||
AssertionBuilder::new().subject(subject).object_number(value).timestamp(timestamp).build()
|
AssertionBuilder::new().subject(subject).object_number(value).timestamp(timestamp).build()
|
||||||
@@ -240,7 +244,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_empty_candidates() {
|
async fn test_empty_candidates() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store));
|
let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
|
|
||||||
@@ -252,7 +256,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_single_candidate() {
|
async fn test_single_candidate() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store));
|
let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
|
|
||||||
@@ -266,7 +270,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_selects_highest_vote_weight() {
|
async fn test_selects_highest_vote_weight() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store));
|
let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
||||||
|
|
||||||
@@ -277,19 +281,31 @@ mod tests {
|
|||||||
|
|
||||||
// Add votes: a1 gets 0.5 weight, a2 gets 1.5 weight (winner), a3 gets 0.3 weight
|
// Add votes: a1 gets 0.5 weight, a2 gets 1.5 weight (winner), a3 gets 0.3 weight
|
||||||
let hash1 =
|
let hash1 =
|
||||||
VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&a1)
|
VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&a1)
|
||||||
.unwrap();
|
.unwrap();
|
||||||
let hash2 =
|
let hash2 =
|
||||||
VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&a2)
|
VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&a2)
|
||||||
.unwrap();
|
.unwrap();
|
||||||
let hash3 =
|
let hash3 =
|
||||||
VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&a3)
|
VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&a3)
|
||||||
.unwrap();
|
.unwrap();
|
||||||
|
|
||||||
vote_store.put_vote(&create_vote(hash1, [1u8; 32], 0.5, 2000)).await.expect("put");
|
vote_store
|
||||||
vote_store.put_vote(&create_vote(hash2, [2u8; 32], 0.8, 2001)).await.expect("put");
|
.put_vote(&create_vote(hash1, [1u8; 32], 0.5, 2000), "Agent1")
|
||||||
vote_store.put_vote(&create_vote(hash2, [3u8; 32], 0.7, 2002)).await.expect("put");
|
.await
|
||||||
vote_store.put_vote(&create_vote(hash3, [4u8; 32], 0.3, 2003)).await.expect("put");
|
.expect("put");
|
||||||
|
vote_store
|
||||||
|
.put_vote(&create_vote(hash2, [2u8; 32], 0.8, 2001), "Agent2")
|
||||||
|
.await
|
||||||
|
.expect("put");
|
||||||
|
vote_store
|
||||||
|
.put_vote(&create_vote(hash2, [3u8; 32], 0.7, 2002), "Agent2")
|
||||||
|
.await
|
||||||
|
.expect("put");
|
||||||
|
vote_store
|
||||||
|
.put_vote(&create_vote(hash3, [4u8; 32], 0.3, 2003), "Agent3")
|
||||||
|
.await
|
||||||
|
.expect("put");
|
||||||
|
|
||||||
let resolution = lens.resolve_async(&[a1, a2.clone(), a3]).await;
|
let resolution = lens.resolve_async(&[a1, a2.clone(), a3]).await;
|
||||||
|
|
||||||
@@ -305,7 +321,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_no_votes_returns_most_recent() {
|
async fn test_no_votes_returns_most_recent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store));
|
let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
|
|
||||||
@@ -324,7 +340,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_tie_breaking_by_timestamp() {
|
async fn test_tie_breaking_by_timestamp() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store));
|
let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
||||||
|
|
||||||
@@ -333,14 +349,20 @@ mod tests {
|
|||||||
|
|
||||||
// Give both the same vote weight
|
// Give both the same vote weight
|
||||||
let hash_old =
|
let hash_old =
|
||||||
VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&old)
|
VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&old)
|
||||||
.unwrap();
|
.unwrap();
|
||||||
let hash_new =
|
let hash_new =
|
||||||
VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&new)
|
VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(&new)
|
||||||
.unwrap();
|
.unwrap();
|
||||||
|
|
||||||
vote_store.put_vote(&create_vote(hash_old, [1u8; 32], 0.5, 3000)).await.expect("put");
|
vote_store
|
||||||
vote_store.put_vote(&create_vote(hash_new, [2u8; 32], 0.5, 3001)).await.expect("put");
|
.put_vote(&create_vote(hash_old, [1u8; 32], 0.5, 3000), "Old")
|
||||||
|
.await
|
||||||
|
.expect("put");
|
||||||
|
vote_store
|
||||||
|
.put_vote(&create_vote(hash_new, [2u8; 32], 0.5, 3001), "New")
|
||||||
|
.await
|
||||||
|
.expect("put");
|
||||||
|
|
||||||
let resolution = lens.resolve_async(&[old, new.clone()]).await;
|
let resolution = lens.resolve_async(&[old, new.clone()]).await;
|
||||||
|
|
||||||
@@ -351,7 +373,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_mixed_votes_and_no_votes() {
|
async fn test_mixed_votes_and_no_votes() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store));
|
let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
||||||
|
|
||||||
@@ -359,12 +381,15 @@ mod tests {
         let without_votes = create_assertion("NoVotes", 200.0, 2000);

         let hash_with =
-            VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(
+            VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(
                 &with_votes,
             )
             .unwrap();

-        vote_store.put_vote(&create_vote(hash_with, [1u8; 32], 0.8, 3000)).await.expect("put");
+        vote_store
+            .put_vote(&create_vote(hash_with, [1u8; 32], 0.8, 3000), "WithVotes")
+            .await
+            .expect("put");

         let resolution = lens.resolve_async(&[with_votes.clone(), without_votes]).await;

@@ -378,7 +403,7 @@ mod tests {

     #[tokio::test]
     async fn test_lens_name() {
-        let store = SledStore::open_temp().expect("Failed to create store");
+        let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
         let vote_store = Arc::new(GenericVoteStore::new(store));
         let lens = VoteAwareConsensusLens::new(vote_store);

@@ -387,7 +412,7 @@ mod tests {

     #[tokio::test]
     async fn test_many_votes_on_single_assertion() {
-        let store = SledStore::open_temp().expect("Failed to create store");
+        let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
         let vote_store = Arc::new(GenericVoteStore::new(store));
         let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));

@@ -395,10 +420,12 @@ mod tests {
         let unpopular = create_assertion("Unpopular", 200.0, 1100);

         let hash_popular =
-            VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(&popular)
-                .unwrap();
+            VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(
+                &popular,
+            )
+            .unwrap();
         let hash_unpopular =
-            VoteAwareConsensusLens::<GenericVoteStore<SledStore>>::compute_assertion_hash(
+            VoteAwareConsensusLens::<GenericVoteStore<HybridStore>>::compute_assertion_hash(
                 &unpopular,
             )
             .unwrap();
@@ -411,14 +438,14 @@ mod tests {
                 id
             };
             vote_store
-                .put_vote(&create_vote(hash_popular, agent_id, 0.5, 2000 + i as u64))
+                .put_vote(&create_vote(hash_popular, agent_id, 0.5, 2000 + i as u64), "Popular")
                 .await
                 .expect("put");
         }

         // Unpopular gets 1 vote
         vote_store
-            .put_vote(&create_vote(hash_unpopular, [99u8; 32], 0.5, 2100))
+            .put_vote(&create_vote(hash_unpopular, [99u8; 32], 0.5, 2100), "Unpopular")
             .await
             .expect("put");

@@ -10,7 +10,7 @@
 use std::sync::Arc;

 use stemedb_core::types::Assertion;
-use stemedb_storage::{IndexStore, KVStore, VectorIndex, VisualIndex};
+use stemedb_storage::{key_codec, IndexStore, KVStore, VectorIndex, VisualIndex};
 use tracing::debug;

 use crate::error::{QueryError, Result};
@@ -25,7 +25,7 @@ impl<S: KVStore + 'static> QueryEngine<S> {

         let mut results = Vec::with_capacity(hash_list.len());
         for hash in hash_list {
-            let assertion_key = format!("H:{}", hex::encode(hash)).into_bytes();
+            let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
             if let Some(data) = self.store.get(&assertion_key).await? {
                 match self.deserialize_assertion(&data) {
                     Ok(assertion) => results.push(assertion),
@@ -49,7 +49,7 @@ impl<S: KVStore + 'static> QueryEngine<S> {

         let mut results = Vec::with_capacity(hash_list.len());
         for hash in hash_list {
-            let assertion_key = format!("H:{}", hex::encode(hash)).into_bytes();
+            let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
             if let Some(data) = self.store.get(&assertion_key).await? {
                 match self.deserialize_assertion(&data) {
                     Ok(assertion) => results.push(assertion),
@@ -63,20 +63,34 @@ impl<S: KVStore + 'static> QueryEngine<S> {
         Ok(results)
     }

-    /// Fetch all assertions (full scan of H: prefix).
+    /// Fetch all assertions by scanning the subjects discovery index.
     ///
-    /// This is O(n) and should be avoided for large databases.
+    /// This scans `\x00SUBJECTS:` to discover all known subjects, then fetches
+    /// all assertions per subject. This is O(n) and should be avoided for large databases.
     /// Use subject/predicate indexes when possible.
     pub(super) async fn fetch_all_assertions(&self) -> Result<Vec<Assertion>> {
-        let entries = self.store.scan_prefix(b"H:").await?;
+        // Discover all subjects via the subjects index
+        let subject_entries = self.store.scan_prefix(&key_codec::subjects_scan_prefix()).await?;

-        let mut assertions = Vec::with_capacity(entries.len());
-        for (_key, data) in entries {
-            match self.deserialize_assertion(&data) {
-                Ok(assertion) => assertions.push(assertion),
-                Err(e) => {
-                    debug!("Skipping malformed assertion: {:?}", e);
-                    // Skip malformed entries rather than failing the whole query
+        let mut assertions = Vec::new();
+        for (key, _) in subject_entries {
+            // Extract subject from key: \x00SUBJECTS:{subject}
+            let subject = match key_codec::extract_subject_from_subjects_key(&key) {
+                Some(s) => s,
+                None => continue,
+            };
+
+            // Fetch all assertions for this subject via the subject index
+            let hash_list = self.index_store.get_by_subject(&subject).await?;
+            for hash in hash_list {
+                let assertion_key = key_codec::assertion_key(&subject, &hex::encode(hash));
+                if let Some(data) = self.store.get(&assertion_key).await? {
+                    match self.deserialize_assertion(&data) {
+                        Ok(assertion) => assertions.push(assertion),
+                        Err(e) => {
+                            debug!("Skipping malformed assertion: {:?}", e);
+                        }
+                    }
                 }
             }
         }
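The rewritten `fetch_all_assertions` in the hunk above first scans a discovery prefix to learn which subjects exist, then fetches assertions per subject. A minimal sketch of the discovery step follows, assuming an in-memory `BTreeMap` in place of the real `KVStore`. The helper names `subjects_key`, `extract_subject`, and `discover_subjects` are hypothetical, and the real `key_codec` encoding may differ from the plain `\x00SUBJECTS:{subject}` layout used here.

```rust
use std::collections::BTreeMap;

/// Assumed discovery prefix, taken from the doc comment in the diff.
/// The actual byte layout produced by `key_codec` is not shown in the source.
const SUBJECTS_PREFIX: &[u8] = b"\x00SUBJECTS:";

/// Hypothetical helper: build a discovery key for one subject.
fn subjects_key(subject: &str) -> Vec<u8> {
    [SUBJECTS_PREFIX, subject.as_bytes()].concat()
}

/// Hypothetical helper: recover the subject from a discovery key.
/// Returns `None` for keys outside the prefix or with invalid UTF-8.
fn extract_subject(key: &[u8]) -> Option<String> {
    let rest = key.strip_prefix(SUBJECTS_PREFIX)?;
    String::from_utf8(rest.to_vec()).ok()
}

/// Prefix scan over an ordered map: start at the prefix, stop at the
/// first key that no longer carries it, and skip undecodable keys.
fn discover_subjects(store: &BTreeMap<Vec<u8>, Vec<u8>>) -> Vec<String> {
    store
        .range(SUBJECTS_PREFIX.to_vec()..)
        .take_while(|(k, _)| k.starts_with(SUBJECTS_PREFIX))
        .filter_map(|(k, _)| extract_subject(k))
        .collect()
}
```

Keys that do not decode to a subject are skipped rather than failing the scan, which mirrors the `None => continue` arm in the diff.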
@@ -99,22 +113,39 @@ impl<S: KVStore + 'static> QueryEngine<S> {

         debug!(candidates_count = neighbors.len(), "Vector index returned candidates");

-        // Fetch assertions by their hashes
+        // Fetch assertions by their hashes using reverse index for subject lookup
         let mut results = Vec::with_capacity(neighbors.len());
         for (hash, distance) in neighbors {
-            let assertion_key = format!("H:{}", hex::encode(hash)).into_bytes();
+            let hash_hex = hex::encode(hash);
+            // Look up subject from reverse index
+            let reverse_key = key_codec::hash_subject_key(&hash_hex);
+            let subject = match self.store.get(&reverse_key).await? {
+                Some(bytes) => match String::from_utf8(bytes) {
+                    Ok(s) => s,
+                    Err(_) => {
+                        debug!(hash = %hash_hex, "Invalid UTF-8 in reverse index, skipping");
+                        continue;
+                    }
+                },
+                None => {
+                    debug!(hash = %hash_hex, "No reverse index entry, skipping");
+                    continue;
+                }
+            };
+
+            let assertion_key = key_codec::assertion_key(&subject, &hash_hex);
             if let Some(data) = self.store.get(&assertion_key).await? {
                 match self.deserialize_assertion(&data) {
                     Ok(assertion) => {
                         debug!(
-                            hash = %hex::encode(hash),
+                            hash = %hash_hex,
                             distance,
                             "Found assertion via vector index"
                         );
                         results.push(assertion);
                     }
                     Err(e) => {
-                        debug!(hash = %hex::encode(hash), "Skipping malformed assertion: {:?}", e);
+                        debug!(hash = %hash_hex, "Skipping malformed assertion: {:?}", e);
                     }
                 }
             }
@@ -147,22 +178,39 @@ impl<S: KVStore + 'static> QueryEngine<S> {

         debug!(candidates_count = matches.len(), threshold, "Visual index returned candidates");

-        // Fetch assertions by their hashes
+        // Fetch assertions by their hashes using reverse index for subject lookup
         let mut results = Vec::with_capacity(matches.len());
         for (hash, distance) in matches {
-            let assertion_key = format!("H:{}", hex::encode(hash)).into_bytes();
+            let hash_hex = hex::encode(hash);
+            // Look up subject from reverse index
+            let reverse_key = key_codec::hash_subject_key(&hash_hex);
+            let subject = match self.store.get(&reverse_key).await? {
+                Some(bytes) => match String::from_utf8(bytes) {
+                    Ok(s) => s,
+                    Err(_) => {
+                        debug!(hash = %hash_hex, "Invalid UTF-8 in reverse index, skipping");
+                        continue;
+                    }
+                },
+                None => {
+                    debug!(hash = %hash_hex, "No reverse index entry, skipping");
+                    continue;
+                }
+            };
+
+            let assertion_key = key_codec::assertion_key(&subject, &hash_hex);
             if let Some(data) = self.store.get(&assertion_key).await? {
                 match self.deserialize_assertion(&data) {
                     Ok(assertion) => {
                         debug!(
-                            hash = %hex::encode(hash),
+                            hash = %hash_hex,
                             distance,
                             "Found assertion via visual index"
                         );
                         results.push(assertion);
                     }
                     Err(e) => {
-                        debug!(hash = %hex::encode(hash), "Skipping malformed assertion: {:?}", e);
+                        debug!(hash = %hash_hex, "Skipping malformed assertion: {:?}", e);
                     }
                 }
             }
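Both similarity-search hunks above replace a direct `H:{hash}` lookup with two reads: a reverse-index read that maps the hash to its subject, then a read of the subject-scoped assertion key. The sketch below shows that two-step lookup against a plain `HashMap`. The function names `hash_subject_key` and `assertion_key` come from the diff, but the byte layouts (`HS:` and `A:` prefixes) and the `fetch_by_hash` helper are assumptions made for illustration only.

```rust
use std::collections::HashMap;

/// Stand-in for the real key-value store (assumption).
type Store = HashMap<Vec<u8>, Vec<u8>>;

/// Hypothetical layout for the hash -> subject reverse index.
fn hash_subject_key(hash_hex: &str) -> Vec<u8> {
    format!("HS:{}", hash_hex).into_bytes()
}

/// Hypothetical layout for a subject-scoped assertion key.
fn assertion_key(subject: &str, hash_hex: &str) -> Vec<u8> {
    format!("A:{}:{}", subject, hash_hex).into_bytes()
}

/// Two-step lookup: resolve the subject through the reverse index,
/// then read the assertion bytes under the subject-scoped key.
/// A missing or non-UTF-8 reverse entry yields `None` (the diff logs and skips).
fn fetch_by_hash<'a>(store: &'a Store, hash_hex: &str) -> Option<&'a [u8]> {
    let subject_bytes = store.get(&hash_subject_key(hash_hex))?;
    let subject = std::str::from_utf8(subject_bytes).ok()?;
    store
        .get(&assertion_key(subject, hash_hex))
        .map(Vec::as_slice)
}
```

The extra read is the cost of scoping assertion keys by subject: a hash alone no longer identifies the storage key, so any caller that starts from a hash has to go through the reverse index first.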
@@ -9,7 +9,7 @@
 use std::time::{SystemTime, UNIX_EPOCH};

 use stemedb_core::types::{Assertion, MaterializedView};
-use stemedb_storage::KVStore;
+use stemedb_storage::{key_codec, KVStore};
 use tracing::debug;

 use crate::decay::{apply_decay, apply_source_class_decay};
@@ -35,7 +35,7 @@ impl<S: KVStore + 'static> QueryEngine<S> {
         predicate: &str,
         query: &Query,
     ) -> Result<Option<QueryResult>> {
-        let mv_key = format!("MV:{}:{}", subject, predicate).into_bytes();
+        let mv_key = key_codec::mv_key(subject, predicate);

         let data = match self.store.get(&mv_key).await? {
             Some(data) => data,
@@ -2,14 +2,14 @@

 use std::sync::Arc;
 use stemedb_core::types::LifecycleStage;
-use stemedb_storage::SledStore;
+use stemedb_storage::HybridStore;

 use super::{create_test_assertion, store_assertion, QueryEngine};
 use crate::query::Query;

 #[tokio::test]
 async fn test_query_empty_store() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));
     let engine = QueryEngine::new(Arc::new(store));

     let query = Query::builder().subject("Tesla").build();
@@ -21,7 +21,7 @@ async fn test_query_empty_store() {

 #[tokio::test]
 async fn test_query_by_subject() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     let tesla = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
     let apple = create_test_assertion("Apple", "revenue", LifecycleStage::Approved);
@@ -40,7 +40,7 @@ async fn test_query_by_subject() {

 #[tokio::test]
 async fn test_query_by_lifecycle() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     let approved = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
     let proposed = create_test_assertion("Tesla", "profit", LifecycleStage::Proposed);
@@ -60,7 +60,7 @@ async fn test_query_by_lifecycle() {

 #[tokio::test]
 async fn test_query_with_limit() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store multiple assertions
     for i in 0..5 {
@@ -81,7 +81,7 @@ async fn test_query_with_limit() {

 #[tokio::test]
 async fn test_query_all_filters() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     let target = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
     let wrong_subject = create_test_assertion("Apple", "revenue", LifecycleStage::Approved);
@@ -2,14 +2,14 @@

 use std::sync::Arc;
 use stemedb_core::types::{LifecycleStage, MaterializedView};
-use stemedb_storage::{KVStore, SledStore};
+use stemedb_storage::{key_codec, HybridStore, KVStore};

 use super::{create_test_assertion, store_assertion, QueryEngine};
 use crate::query::Query;

 /// Helper to store a materialized view with a custom conflict score.
 async fn store_mv_with_conflict(
-    store: &SledStore,
+    store: &Arc<HybridStore>,
     subject: &str,
     predicate: &str,
     conflict_score: f32,
@@ -26,14 +26,14 @@ async fn store_mv_with_conflict(
         conflict_score,
     };

-    let key = format!("MV:{}:{}", subject, predicate).into_bytes();
+    let key = key_codec::mv_key(subject, predicate);
     let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV");
     store.put(&key, &bytes).await.expect("put MV");
 }

 #[tokio::test]
 async fn test_min_conflict_score_returns_empty_when_below_threshold() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MV with low conflict (agreement)
     store_mv_with_conflict(&store, "Aspirin", "cardiovascular_benefit", 0.15).await;
@@ -55,7 +55,7 @@ async fn test_min_conflict_score_returns_empty_when_below_threshold() {

 #[tokio::test]
 async fn test_min_conflict_score_returns_result_when_above_threshold() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MV with high conflict (disagreement)
     store_mv_with_conflict(&store, "Semaglutide", "muscle_effect", 0.85).await;
@@ -77,7 +77,7 @@ async fn test_min_conflict_score_returns_result_when_above_threshold() {

 #[tokio::test]
 async fn test_min_conflict_score_edge_case_exact_match() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MV with conflict score exactly at threshold
     store_mv_with_conflict(&store, "Drug", "effect", 0.5).await;
@@ -98,7 +98,7 @@ async fn test_min_conflict_score_edge_case_exact_match() {

 #[tokio::test]
 async fn test_max_conflict_score_returns_result_when_below_threshold() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MV with low conflict (agreement)
     store_mv_with_conflict(&store, "Aspirin", "cardiovascular_benefit", 0.15).await;
@@ -120,7 +120,7 @@ async fn test_max_conflict_score_returns_result_when_below_threshold() {

 #[tokio::test]
 async fn test_max_conflict_score_returns_empty_when_above_threshold() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MV with high conflict (disagreement)
     store_mv_with_conflict(&store, "Semaglutide", "muscle_effect", 0.85).await;
@@ -142,7 +142,7 @@ async fn test_max_conflict_score_returns_empty_when_above_threshold() {

 #[tokio::test]
 async fn test_max_conflict_score_edge_case_exact_match() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MV with conflict score exactly at threshold
     store_mv_with_conflict(&store, "Drug", "effect", 0.5).await;
@@ -163,7 +163,7 @@ async fn test_max_conflict_score_edge_case_exact_match() {

 #[tokio::test]
 async fn test_both_conflict_scores_filters_range() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MVs with different conflict scores
     store_mv_with_conflict(&store, "Drug_A", "effect", 0.1).await; // Too low
@@ -213,7 +213,7 @@ async fn test_both_conflict_scores_filters_range() {

 #[tokio::test]
 async fn test_no_conflict_filters_returns_all() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MVs with different conflict scores
     store_mv_with_conflict(&store, "Drug_A", "effect", 0.1).await;
@@ -233,7 +233,7 @@ async fn test_no_conflict_filters_returns_all() {

 #[tokio::test]
 async fn test_conflict_filters_combine_with_lifecycle() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MV with high conflict and Approved lifecycle
     let approved = create_test_assertion("Drug", "effect", LifecycleStage::Approved);
@@ -248,7 +248,7 @@ async fn test_conflict_filters_combine_with_lifecycle() {
         conflict_score: 0.8,
     };

-    let key = b"MV:Drug:effect".to_vec();
+    let key = key_codec::mv_key("Drug", "effect");
     let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV");
     store.put(&key, &bytes).await.expect("put MV");

@@ -271,7 +271,7 @@ async fn test_conflict_filters_combine_with_lifecycle() {

 #[tokio::test]
 async fn test_conflict_filters_with_wrong_lifecycle_returns_empty() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store MV with high conflict but Approved lifecycle
     let approved = create_test_assertion("Drug", "effect", LifecycleStage::Approved);
@@ -286,7 +286,7 @@ async fn test_conflict_filters_with_wrong_lifecycle_returns_empty() {
         conflict_score: 0.8,
     };

-    let key = b"MV:Drug:effect".to_vec();
+    let key = key_codec::mv_key("Drug", "effect");
     let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV");
     store.put(&key, &bytes).await.expect("put MV");

@@ -2,14 +2,14 @@

 use std::sync::Arc;
 use stemedb_core::types::{LifecycleStage, ObjectValue};
-use stemedb_storage::SledStore;
+use stemedb_storage::HybridStore;

 use super::{create_test_assertion, store_assertion, QueryEngine};
 use crate::query::Query;

 #[tokio::test]
 async fn test_compound_index_lookup() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Create multiple assertions with different subject/predicate combinations
     let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
@@ -34,7 +34,7 @@ async fn test_compound_index_lookup() {

 #[tokio::test]
 async fn test_compound_index_multiple_assertions() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store multiple assertions with same subject+predicate but different values/timestamps
     let mut assertion1 = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
@@ -66,7 +66,7 @@ async fn test_compound_index_multiple_assertions() {

 #[tokio::test]
 async fn test_subject_only_index_still_works() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
     let tesla_profit = create_test_assertion("Tesla", "profit", LifecycleStage::Approved);
@@ -89,7 +89,7 @@ async fn test_subject_only_index_still_works() {

 #[tokio::test]
 async fn test_compound_index_empty_result() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);

@@ -3,7 +3,7 @@
 use std::sync::Arc;
 use std::time::{SystemTime, UNIX_EPOCH};
 use stemedb_core::types::LifecycleStage;
-use stemedb_storage::SledStore;
+use stemedb_storage::HybridStore;

 use super::{
     create_test_assertion, store_assertion, store_materialized_view,
@@ -13,7 +13,7 @@ use crate::query::Query;

 #[tokio::test]
 async fn test_fast_path_returns_materialized_view() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
     store_assertion(&store, &assertion).await;
@@ -32,7 +32,7 @@ async fn test_fast_path_returns_materialized_view() {

 #[tokio::test]
 async fn test_fast_path_falls_back_when_no_mv() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // Store assertion but NO materialized view
     let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
@@ -50,7 +50,7 @@ async fn test_fast_path_falls_back_when_no_mv() {

 #[tokio::test]
 async fn test_fast_path_respects_lifecycle_filter() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     // MV winner is Approved
     let approved = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
@@ -78,7 +78,7 @@ async fn test_fast_path_respects_lifecycle_filter() {

 #[tokio::test]
 async fn test_fast_path_not_used_for_subject_only() {
-    let store = SledStore::open_temp().expect("store");
+    let store = Arc::new(HybridStore::open_temp().expect("store"));

     let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
     store_assertion(&store, &assertion).await;
@ -95,7 +95,7 @@ async fn test_fast_path_not_used_for_subject_only() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_query_strategy_selection() {
|
async fn test_query_strategy_selection() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
let tesla_revenue = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
||||||
let apple_profit = create_test_assertion("Apple", "profit", LifecycleStage::Approved);
|
let apple_profit = create_test_assertion("Apple", "profit", LifecycleStage::Approved);
|
||||||
@ -127,7 +127,7 @@ async fn test_query_strategy_selection() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_fast_path_stale_view_falls_back() {
|
async fn test_fast_path_stale_view_falls_back() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
// Store an assertion and multiple MVs with different timestamps
|
// Store an assertion and multiple MVs with different timestamps
|
||||||
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
||||||
@ -159,7 +159,7 @@ async fn test_fast_path_stale_view_falls_back() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_fast_path_fresh_view_used() {
|
async fn test_fast_path_fresh_view_used() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
let assertion = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
||||||
store_assertion(&store, &assertion).await;
|
store_assertion(&store, &assertion).await;
|
||||||
@ -186,7 +186,7 @@ async fn test_fast_path_fresh_view_used() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_fast_path_no_max_stale_always_uses_mv() {
|
async fn test_fast_path_no_max_stale_always_uses_mv() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
||||||
let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
|
let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
|
||||||
@ -212,7 +212,7 @@ async fn test_fast_path_no_max_stale_always_uses_mv() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_fast_path_max_stale_zero_rejects_old_mv() {
|
async fn test_fast_path_max_stale_zero_rejects_old_mv() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
||||||
let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
|
let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
|
||||||
@ -239,7 +239,7 @@ async fn test_fast_path_max_stale_zero_rejects_old_mv() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_fast_path_max_stale_zero_accepts_brand_new_mv() {
|
async fn test_fast_path_max_stale_zero_accepts_brand_new_mv() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
|
|
||||||
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
let mv_winner = create_test_assertion("Tesla", "revenue", LifecycleStage::Approved);
|
||||||
let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
|
let other = create_test_assertion("Tesla", "revenue", LifecycleStage::Proposed);
|
||||||
|
|||||||
@ -1,11 +1,10 @@
|
|||||||
//! Test suite for QueryEngine.
|
//! Test suite for QueryEngine.
|
||||||
|
|
||||||
use rkyv::ser::serializers::AllocSerializer;
|
use std::sync::Arc;
|
||||||
use rkyv::ser::Serializer;
|
|
||||||
|
|
||||||
use stemedb_core::testing::AssertionBuilder;
|
use stemedb_core::testing::AssertionBuilder;
|
||||||
use stemedb_core::types::{Assertion, LifecycleStage, MaterializedView};
|
use stemedb_core::types::{Assertion, LifecycleStage, MaterializedView};
|
||||||
use stemedb_storage::{GenericIndexStore, IndexStore, KVStore, SledStore};
|
use stemedb_storage::{key_codec, GenericIndexStore, HybridStore, IndexStore, KVStore};
|
||||||
|
|
||||||
use super::QueryEngine;
|
use super::QueryEngine;
|
||||||
|
|
||||||
@ -32,13 +31,11 @@ pub(super) fn create_test_assertion(
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Helper to store an assertion in the KV store and update indexes.
|
/// Helper to store an assertion in the KV store and update indexes.
|
||||||
pub(super) async fn store_assertion(store: &SledStore, assertion: &Assertion) {
|
pub(super) async fn store_assertion(store: &Arc<HybridStore>, assertion: &Assertion) {
|
||||||
let mut serializer = AllocSerializer::<4096>::default();
|
let bytes = stemedb_core::serde::serialize(assertion).expect("serialize");
|
||||||
serializer.serialize_value(assertion).expect("serialize");
|
|
||||||
let bytes = serializer.into_serializer().into_inner();
|
|
||||||
|
|
||||||
let hash = blake3::hash(&bytes);
|
let hash = blake3::hash(&bytes);
|
||||||
let key = format!("H:{}", hash.to_hex()).into_bytes();
|
let key = key_codec::assertion_key(&assertion.subject, &hash.to_hex());
|
||||||
store.put(&key, &bytes).await.expect("put");
|
store.put(&key, &bytes).await.expect("put");
|
||||||
|
|
||||||
// Update indexes using IndexStore
|
// Update indexes using IndexStore
|
||||||
@ -52,7 +49,7 @@ pub(super) async fn store_assertion(store: &SledStore, assertion: &Assertion) {
|
|||||||
|
|
||||||
/// Helper to store a materialized view directly in the KV store.
|
/// Helper to store a materialized view directly in the KV store.
|
||||||
pub(super) async fn store_materialized_view(
|
pub(super) async fn store_materialized_view(
|
||||||
store: &SledStore,
|
store: &Arc<HybridStore>,
|
||||||
subject: &str,
|
subject: &str,
|
||||||
predicate: &str,
|
predicate: &str,
|
||||||
winner: &Assertion,
|
winner: &Assertion,
|
||||||
@ -62,7 +59,7 @@ pub(super) async fn store_materialized_view(
|
|||||||
|
|
||||||
/// Helper to store a materialized view with a custom materialized_at timestamp.
|
/// Helper to store a materialized view with a custom materialized_at timestamp.
|
||||||
pub(super) async fn store_materialized_view_with_time(
|
pub(super) async fn store_materialized_view_with_time(
|
||||||
store: &SledStore,
|
store: &Arc<HybridStore>,
|
||||||
subject: &str,
|
subject: &str,
|
||||||
predicate: &str,
|
predicate: &str,
|
||||||
winner: &Assertion,
|
winner: &Assertion,
|
||||||
@ -77,7 +74,7 @@ pub(super) async fn store_materialized_view_with_time(
|
|||||||
conflict_score: 0.1,
|
conflict_score: 0.1,
|
||||||
};
|
};
|
||||||
|
|
||||||
let key = format!("MV:{}:{}", subject, predicate).into_bytes();
|
let key = key_codec::mv_key(subject, predicate);
|
||||||
let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV");
|
let bytes = stemedb_core::serde::serialize(&view).expect("serialize MV");
|
||||||
store.put(&key, &bytes).await.expect("put MV");
|
store.put(&key, &bytes).await.expect("put MV");
|
||||||
}
|
}
|
||||||
|
|||||||
@ -29,16 +29,10 @@ use crate::error::{QueryError, Result};
|
|||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use stemedb_core::types::{Assertion, EscalationEvent, EscalationPolicy, MaterializedView};
|
use stemedb_core::types::{Assertion, EscalationEvent, EscalationPolicy, MaterializedView};
|
||||||
use stemedb_lens::AsyncLens;
|
use stemedb_lens::AsyncLens;
|
||||||
use stemedb_storage::{EscalationStore, GenericIndexStore, KVStore};
|
use stemedb_storage::{key_codec, EscalationStore, GenericIndexStore, KVStore};
|
||||||
use tokio::sync::Notify;
|
use tokio::sync::Notify;
|
||||||
use tracing::{debug, error, info, instrument, warn};
|
use tracing::{debug, error, info, instrument, warn};
|
||||||
|
|
||||||
/// Key prefix for materialized views.
|
|
||||||
const MV_PREFIX: &str = "MV:";
|
|
||||||
|
|
||||||
/// Key prefix for compound indexes (used to discover subject+predicate pairs).
|
|
||||||
const SP_PREFIX: &[u8] = b"SP:";
|
|
||||||
|
|
||||||
/// Report from a single materialization pass.
|
/// Report from a single materialization pass.
|
||||||
#[derive(Debug, Default)]
|
#[derive(Debug, Default)]
|
||||||
pub struct MaterializeReport {
|
pub struct MaterializeReport {
|
||||||
@ -64,11 +58,11 @@ pub struct MaterializeReport {
|
|||||||
///
|
///
|
||||||
/// ```ignore
|
/// ```ignore
|
||||||
/// use stemedb_query::Materializer;
|
/// use stemedb_query::Materializer;
|
||||||
/// use stemedb_storage::{SledStore, GenericVoteStore};
|
/// use stemedb_storage::{HybridStore, GenericVoteStore};
|
||||||
/// use stemedb_lens::VoteAwareConsensusLens;
|
/// use stemedb_lens::VoteAwareConsensusLens;
|
||||||
/// use std::sync::Arc;
|
/// use std::sync::Arc;
|
||||||
///
|
///
|
||||||
/// let store = Arc::new(SledStore::open("./data")?);
|
/// let store = Arc::new(HybridStore::open("./data")?);
|
||||||
/// let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
/// let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
/// let lens = VoteAwareConsensusLens::new(vote_store);
|
/// let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
///
|
///
|
||||||
@ -138,24 +132,32 @@ impl<S: KVStore + 'static> Materializer<S> {
|
|||||||
pub async fn step(&self) -> Result<MaterializeReport> {
|
pub async fn step(&self) -> Result<MaterializeReport> {
|
||||||
let mut report = MaterializeReport::default();
|
let mut report = MaterializeReport::default();
|
||||||
|
|
||||||
// Discover all subject+predicate pairs from SP: index
|
// Discover all subject+predicate pairs from subject-prefixed SP: keys
|
||||||
let sp_entries = self.store.scan_prefix(SP_PREFIX).await?;
|
// We scan all subjects first, then fetch their SP: keys
|
||||||
|
let subject_entries = self.store.scan_prefix(&key_codec::subjects_scan_prefix()).await?;
|
||||||
|
let mut sp_pairs: Vec<(String, String)> = Vec::new();
|
||||||
|
|
||||||
for (key, _value) in &sp_entries {
|
for (key, _) in &subject_entries {
|
||||||
report.pairs_scanned += 1;
|
let subject = match key_codec::extract_subject_from_subjects_key(key) {
|
||||||
|
Some(s) => s,
|
||||||
// Parse the SP:{subject}:{predicate} key
|
None => continue,
|
||||||
let (subject, predicate) = match Self::parse_sp_key(key) {
|
|
||||||
Some(pair) => pair,
|
|
||||||
None => {
|
|
||||||
warn!(key = %String::from_utf8_lossy(key), "Skipping malformed SP: key");
|
|
||||||
report.errors += 1;
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
};
|
};
|
||||||
|
|
||||||
|
// Scan this subject's SP: keys
|
||||||
|
let sp_prefix = key_codec::subject_predicate_scan_prefix(&subject);
|
||||||
|
let sp_entries = self.store.scan_prefix(&sp_prefix).await?;
|
||||||
|
for (sp_key, _) in sp_entries {
|
||||||
|
if let Some((s, p)) = key_codec::extract_sp_key(&sp_key) {
|
||||||
|
sp_pairs.push((s, p));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for (subject, predicate) in &sp_pairs {
|
||||||
|
report.pairs_scanned += 1;
|
||||||
|
|
||||||
// Materialize this subject+predicate pair
|
// Materialize this subject+predicate pair
|
||||||
match self.materialize_pair(&subject, &predicate).await {
|
match self.materialize_pair(subject, predicate).await {
|
||||||
Ok(Some(view)) => {
|
Ok(Some(view)) => {
|
||||||
report.views_updated += 1;
|
report.views_updated += 1;
|
||||||
// Check escalation policies
|
// Check escalation policies
|
||||||
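The rewritten `step()` in the hunk above replaces a single global `SP:` scan with a two-level discovery: list the subjects first, then prefix-scan each subject's `SP:` keys. The following is a minimal sketch of that pattern, not the actual `key_codec` or `KVStore` code. It assumes an in-memory `BTreeMap` in place of the store, a hypothetical `SUBJECTS:` prefix for the subject index (the real layout is not shown in this diff), and the `{subject}\x00SP:{predicate}` key shape taken from the test comments.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for the KV store (assumption: ordered byte keys).
type Store = BTreeMap<Vec<u8>, Vec<u8>>;

/// Hypothetical prefix for the subject index; the real layout is not shown in the diff.
const SUBJECTS_PREFIX: &[u8] = b"SUBJECTS:";

/// Return every key that starts with `prefix`, in sorted order.
/// Plays the role of `KVStore::scan_prefix` in this sketch.
fn scan_prefix(store: &Store, prefix: &[u8]) -> Vec<Vec<u8>> {
    store
        .range(prefix.to_vec()..)
        .map(|(k, _)| k)
        .take_while(|k| k.starts_with(prefix))
        .cloned()
        .collect()
}

/// Two-level discovery: list subjects, then scan each subject's SP: keys.
fn discover_pairs(store: &Store) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for subject_key in scan_prefix(store, SUBJECTS_PREFIX) {
        // Skip subject keys that are not valid UTF-8.
        let subject = match std::str::from_utf8(&subject_key[SUBJECTS_PREFIX.len()..]) {
            Ok(s) => s.to_string(),
            Err(_) => continue,
        };

        // Build the per-subject scan prefix: {subject}\x00SP:
        let mut sp_prefix = subject.as_bytes().to_vec();
        sp_prefix.extend_from_slice(b"\x00SP:");

        for sp_key in scan_prefix(store, &sp_prefix) {
            if let Ok(predicate) = std::str::from_utf8(&sp_key[sp_prefix.len()..]) {
                pairs.push((subject.clone(), predicate.to_string()));
            }
        }
    }
    pairs
}
```

Because the NUL byte terminates the subject, a scan for `Tes\x00SP:` cannot pick up keys that belong to `Tesla`, which is why no last-colon parsing is needed in this layout.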
@ -244,8 +246,8 @@ impl<S: KVStore + 'static> Materializer<S> {
|
|||||||
materialized_at: now,
|
materialized_at: now,
|
||||||
};
|
};
|
||||||
|
|
||||||
// Write to MV:{subject}:{predicate}
|
// Write to {subject}\x00MV:{predicate}
|
||||||
let mv_key = Self::mv_key(subject, predicate);
|
let mv_key = key_codec::mv_key(subject, predicate);
|
||||||
let serialized = stemedb_core::serde::serialize(&view)
|
let serialized = stemedb_core::serde::serialize(&view)
|
||||||
.map_err(|e| QueryError::Deserialization(e.to_string()))?;
|
.map_err(|e| QueryError::Deserialization(e.to_string()))?;
|
||||||
self.store.put(&mv_key, &serialized).await?;
|
self.store.put(&mv_key, &serialized).await?;
|
||||||
@ -271,7 +273,7 @@ impl<S: KVStore + 'static> Materializer<S> {
|
|||||||
subject: &str,
|
subject: &str,
|
||||||
predicate: &str,
|
predicate: &str,
|
||||||
) -> Result<Option<MaterializedView>> {
|
) -> Result<Option<MaterializedView>> {
|
||||||
let key = Self::mv_key(subject, predicate);
|
let key = key_codec::mv_key(subject, predicate);
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(data) => {
|
Some(data) => {
|
||||||
let view: MaterializedView = stemedb_core::serde::deserialize(&data)
|
let view: MaterializedView = stemedb_core::serde::deserialize(&data)
|
||||||
@ -358,7 +360,7 @@ impl<S: KVStore + 'static> Materializer<S> {
|
|||||||
|
|
||||||
let mut candidates = Vec::with_capacity(hash_list.len());
|
let mut candidates = Vec::with_capacity(hash_list.len());
|
||||||
for hash in hash_list {
|
for hash in hash_list {
|
||||||
let key = format!("H:{}", hex::encode(hash)).into_bytes();
|
let key = key_codec::assertion_key(subject, &hex::encode(hash));
|
||||||
if let Some(data) = self.store.get(&key).await? {
|
if let Some(data) = self.store.get(&key).await? {
|
||||||
match stemedb_core::serde::deserialize::<Assertion>(&data) {
|
match stemedb_core::serde::deserialize::<Assertion>(&data) {
|
||||||
Ok(assertion) => candidates.push(assertion),
|
Ok(assertion) => candidates.push(assertion),
|
||||||
@ -376,34 +378,6 @@ impl<S: KVStore + 'static> Materializer<S> {
|
|||||||
Ok(candidates)
|
Ok(candidates)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Parse a `SP:{subject}:{predicate}` key into its components.
|
|
||||||
///
|
|
||||||
/// Uses `rfind(':')` to split on the **last** colon, because ConceptPath
|
|
||||||
/// subjects contain `://` (e.g., `code://rust/citadeldb/auth/jwt`).
|
|
||||||
/// Predicates never contain `://`, so the last colon is always the separator.
|
|
||||||
///
|
|
||||||
/// Returns `None` if the key is malformed.
|
|
||||||
fn parse_sp_key(key: &[u8]) -> Option<(String, String)> {
|
|
||||||
let key_str = std::str::from_utf8(key).ok()?;
|
|
||||||
let without_prefix = key_str.strip_prefix("SP:")?;
|
|
||||||
|
|
||||||
// Split on the LAST colon — subjects may contain colons (e.g., scheme://)
|
|
||||||
let colon_pos = without_prefix.rfind(':')?;
|
|
||||||
if colon_pos == 0 || colon_pos == without_prefix.len() - 1 {
|
|
||||||
return None;
|
|
||||||
}
|
|
||||||
|
|
||||||
let subject = &without_prefix[..colon_pos];
|
|
||||||
let predicate = &without_prefix[colon_pos + 1..];
|
|
||||||
|
|
||||||
Some((subject.to_string(), predicate.to_string()))
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Construct the MV key for a subject+predicate pair.
|
|
||||||
fn mv_key(subject: &str, predicate: &str) -> Vec<u8> {
|
|
||||||
format!("{}{}:{}", MV_PREFIX, subject, predicate).into_bytes()
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Check if any escalation policies should trigger for this materialized view.
|
/// Check if any escalation policies should trigger for this materialized view.
|
||||||
///
|
///
|
||||||
/// If a policy triggers, write an escalation event to storage.
|
/// If a policy triggers, write an escalation event to storage.
|
||||||
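The removed `parse_sp_key` and `mv_key` helpers are superseded by `key_codec` functions whose source is not part of this diff. As a rough illustration only, the sketch below shows how subject-prefixed keys of the shape `{subject}\x00MV:{predicate}` and `{subject}\x00SP:{predicate}` (as described in the updated comments and tests) could be built and parsed. The function bodies and the `tagged_key` helper are assumptions, not the project's implementation.

```rust
/// Separator between the subject and the rest of the key
/// (taken from the `{subject}\x00MV:{predicate}` comments in the diff).
const SEP: u8 = 0x00;

/// Hypothetical helper: build `{subject}\x00{tag}{suffix}`.
fn tagged_key(subject: &str, tag: &[u8], suffix: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(subject.len() + 1 + tag.len() + suffix.len());
    key.extend_from_slice(subject.as_bytes());
    key.push(SEP);
    key.extend_from_slice(tag);
    key.extend_from_slice(suffix.as_bytes());
    key
}

/// Sketch of a materialized-view key: `{subject}\x00MV:{predicate}`.
fn mv_key(subject: &str, predicate: &str) -> Vec<u8> {
    tagged_key(subject, b"MV:", predicate)
}

/// Sketch of a compound-index key: `{subject}\x00SP:{predicate}`.
fn subject_predicate_key(subject: &str, predicate: &str) -> Vec<u8> {
    tagged_key(subject, b"SP:", predicate)
}

/// Split an SP key back into (subject, predicate).
/// Returns `None` when the separator or the `SP:` tag is missing,
/// or when either part is not valid UTF-8.
fn extract_sp_key(key: &[u8]) -> Option<(String, String)> {
    // The first NUL byte ends the subject, so colons inside the subject are harmless.
    let sep = key.iter().position(|&b| b == SEP)?;
    let subject = std::str::from_utf8(&key[..sep]).ok()?;
    let rest = key[sep + 1..].strip_prefix(b"SP:")?;
    let predicate = std::str::from_utf8(rest).ok()?;
    Some((subject.to_string(), predicate.to_string()))
}
```

Splitting on the first NUL byte rather than the last colon means ConceptPath subjects such as `code://rust/citadeldb/auth/jwt` round-trip without special handling, which matches the simplified test cases later in this diff.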
|
|||||||
@ -3,7 +3,7 @@ use stemedb_core::testing::{self, AssertionBuilder};
|
|||||||
use stemedb_core::types::{EscalationLevel, EscalationPolicy, ObjectValue, Vote};
|
use stemedb_core::types::{EscalationLevel, EscalationPolicy, ObjectValue, Vote};
|
||||||
use stemedb_lens::VoteAwareConsensusLens;
|
use stemedb_lens::VoteAwareConsensusLens;
|
||||||
use stemedb_storage::{
|
use stemedb_storage::{
|
||||||
EscalationStore, GenericEscalationStore, GenericVoteStore, SledStore, VoteStore,
|
key_codec, EscalationStore, GenericEscalationStore, GenericVoteStore, HybridStore, VoteStore,
|
||||||
};
|
};
|
||||||
use tokio::sync::Notify;
|
use tokio::sync::Notify;
|
||||||
|
|
||||||
@ -16,17 +16,17 @@ fn create_assertion(subject: &str, predicate: &str, value: f64, timestamp: u64)
|
|||||||
.build()
|
.build()
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Store an assertion at H:{hash} and update indexes.
|
/// Store an assertion at {subject}\x00H:{hash} and update indexes.
|
||||||
async fn store_assertion(store: &Arc<SledStore>, assertion: &Assertion) -> [u8; 32] {
|
async fn store_assertion(store: &Arc<HybridStore>, assertion: &Assertion) -> [u8; 32] {
|
||||||
use stemedb_storage::IndexStore;
|
use stemedb_storage::IndexStore;
|
||||||
|
|
||||||
let bytes = stemedb_core::serde::serialize(assertion).expect("serialize");
|
let bytes = stemedb_core::serde::serialize(assertion).expect("serialize");
|
||||||
let hash = blake3::hash(&bytes);
|
let hash = blake3::hash(&bytes);
|
||||||
let key = format!("H:{}", hash.to_hex()).into_bytes();
|
let assertion_hash: [u8; 32] = *hash.as_bytes();
|
||||||
|
let key = key_codec::assertion_key(&assertion.subject, &hash.to_hex());
|
||||||
store.put(&key, &bytes).await.expect("put");
|
store.put(&key, &bytes).await.expect("put");
|
||||||
|
|
||||||
let index_store = GenericIndexStore::new(store.clone());
|
let index_store = GenericIndexStore::new(store.clone());
|
||||||
let assertion_hash: [u8; 32] = *hash.as_bytes();
|
|
||||||
index_store
|
index_store
|
||||||
.add_to_indexes(&assertion.subject, &assertion.predicate, &assertion_hash)
|
.add_to_indexes(&assertion.subject, &assertion.predicate, &assertion_hash)
|
||||||
.await
|
.await
|
||||||
@ -41,7 +41,7 @@ fn create_vote(assertion_hash: [u8; 32], agent_id: [u8; 32], weight: f32, timest
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_empty_store() {
|
async fn test_empty_store() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
let materializer = Materializer::new(store, Box::new(lens));
|
let materializer = Materializer::new(store, Box::new(lens));
|
||||||
@ -55,7 +55,7 @@ async fn test_empty_store() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_single_assertion_materialized() {
|
async fn test_single_assertion_materialized() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
||||||
@ -86,7 +86,7 @@ async fn test_single_assertion_materialized() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_vote_weighted_winner() {
|
async fn test_vote_weighted_winner() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
||||||
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
||||||
@ -98,9 +98,9 @@ async fn test_vote_weighted_winner() {
|
|||||||
let hash2 = store_assertion(&store, &a2).await;
|
let hash2 = store_assertion(&store, &a2).await;
|
||||||
|
|
||||||
// Give a2 more votes
|
// Give a2 more votes
|
||||||
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.3, 2000)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.3, 2000), "Tesla").await.expect("put");
|
||||||
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.8, 2001)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.8, 2001), "Tesla").await.expect("put");
|
||||||
vote_store.put_vote(&create_vote(hash2, [30u8; 32], 0.7, 2002)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash2, [30u8; 32], 0.7, 2002), "Tesla").await.expect("put");
|
||||||
|
|
||||||
// Materialize
|
// Materialize
|
||||||
let report = materializer.step().await.expect("step");
|
let report = materializer.step().await.expect("step");
|
||||||
@ -119,7 +119,7 @@ async fn test_vote_weighted_winner() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_multiple_pairs_materialized() {
|
async fn test_multiple_pairs_materialized() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
||||||
@ -145,7 +145,7 @@ async fn test_multiple_pairs_materialized() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_idempotent_materialization() {
|
async fn test_idempotent_materialization() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
||||||
@ -163,7 +163,7 @@ async fn test_idempotent_materialization() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_no_mv_for_nonexistent_pair() {
|
async fn test_no_mv_for_nonexistent_pair() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
let materializer = Materializer::new(store, Box::new(lens));
|
let materializer = Materializer::new(store, Box::new(lens));
|
||||||
@ -174,26 +174,21 @@ async fn test_no_mv_for_nonexistent_pair() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_parse_sp_key() {
|
async fn test_parse_sp_key() {
|
||||||
// Valid key
|
// Valid key (key_codec format: {subject}\x00SP:{predicate})
|
||||||
let key = b"SP:Tesla:revenue";
|
let key = key_codec::subject_predicate_key("Tesla", "revenue");
|
||||||
let result = Materializer::<SledStore>::parse_sp_key(key);
|
let result = key_codec::extract_sp_key(&key);
|
||||||
assert_eq!(result, Some(("Tesla".to_string(), "revenue".to_string())));
|
assert_eq!(result, Some(("Tesla".to_string(), "revenue".to_string())));
|
||||||
|
|
||||||
// Missing predicate
|
// Wrong prefix (subject index key, not SP: key)
|
||||||
let key = b"SP:Tesla";
|
let key = key_codec::subject_index_key("Tesla");
|
||||||
assert!(Materializer::<SledStore>::parse_sp_key(key).is_none());
|
assert!(key_codec::extract_sp_key(&key).is_none());
|
||||||
|
|
||||||
// Empty subject
|
// ConceptPath subject with :// in scheme
|
||||||
let key = b"SP::revenue";
|
let key = key_codec::subject_predicate_key(
|
||||||
assert!(Materializer::<SledStore>::parse_sp_key(key).is_none());
|
"code://rust/citadeldb/auth/jwt/audience_validation",
|
||||||
|
"config_value",
|
||||||
// Wrong prefix
|
);
|
||||||
let key = b"S:Tesla";
|
let result = key_codec::extract_sp_key(&key);
|
||||||
assert!(Materializer::<SledStore>::parse_sp_key(key).is_none());
|
|
||||||
|
|
||||||
// ConceptPath subject with :// in scheme — must split on LAST colon
|
|
||||||
let key = b"SP:code://rust/citadeldb/auth/jwt/audience_validation:config_value";
|
|
||||||
let result = Materializer::<SledStore>::parse_sp_key(key);
|
|
||||||
assert_eq!(
|
assert_eq!(
|
||||||
result,
|
result,
|
||||||
Some((
|
Some((
|
||||||
@ -203,21 +198,18 @@ async fn test_parse_sp_key() {
|
|||||||
);
|
);
|
||||||
|
|
||||||
// ConceptPath with multiple scheme-like colons
|
// ConceptPath with multiple scheme-like colons
|
||||||
let key = b"SP:rfc://7519/jwt/audience_validation:must_validate";
|
let key =
|
||||||
let result = Materializer::<SledStore>::parse_sp_key(key);
|
key_codec::subject_predicate_key("rfc://7519/jwt/audience_validation", "must_validate");
|
||||||
|
let result = key_codec::extract_sp_key(&key);
|
||||||
assert_eq!(
|
assert_eq!(
|
||||||
result,
|
result,
|
||||||
Some(("rfc://7519/jwt/audience_validation".to_string(), "must_validate".to_string(),))
|
Some(("rfc://7519/jwt/audience_validation".to_string(), "must_validate".to_string(),))
|
||||||
);
|
);
|
||||||
|
|
||||||
// Empty predicate after ConceptPath subject
|
|
||||||
let key = b"SP:code://rust/citadeldb/auth/jwt:";
|
|
||||||
assert!(Materializer::<SledStore>::parse_sp_key(key).is_none());
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_materialize_pair_directly() {
|
async fn test_materialize_pair_directly() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
let materializer = Materializer::new(store.clone(), Box::new(lens));
|
||||||
@ -242,13 +234,17 @@ async fn test_materialize_pair_directly() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_mv_key_construction() {
|
async fn test_mv_key_construction() {
|
||||||
let key = Materializer::<SledStore>::mv_key("Tesla", "revenue");
|
let key = key_codec::mv_key("Tesla", "revenue");
|
||||||
assert_eq!(key, b"MV:Tesla:revenue");
|
// key_codec format: {subject}\x00MV:{predicate}
|
||||||
|
let mut expected = b"Tesla".to_vec();
|
||||||
|
expected.push(0x00);
|
||||||
|
expected.extend_from_slice(b"MV:revenue");
|
||||||
|
assert_eq!(key, expected);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_run_notified_triggers_on_notify() {
|
async fn test_run_notified_triggers_on_notify() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(vote_store);
|
let lens = VoteAwareConsensusLens::new(vote_store);
|
||||||
let materializer = Arc::new(Materializer::new(store.clone(), Box::new(lens)));
|
let materializer = Arc::new(Materializer::new(store.clone(), Box::new(lens)));
|
||||||
@ -285,7 +281,7 @@ async fn test_run_notified_triggers_on_notify() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_escalation_triggers_on_high_conflict() {
|
async fn test_escalation_triggers_on_high_conflict() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
||||||
|
|
||||||
@ -311,8 +307,8 @@ async fn test_escalation_triggers_on_high_conflict() {
|
|||||||
let hash2 = store_assertion(&store, &a2).await;
|
let hash2 = store_assertion(&store, &a2).await;
|
||||||
|
|
||||||
// Give both some votes (not relevant for conflict, but for resolution)
|
// Give both some votes (not relevant for conflict, but for resolution)
|
||||||
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000), "Tesla").await.expect("put");
|
||||||
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001), "Tesla").await.expect("put");
|
||||||
|
|
||||||
// Materialize
|
// Materialize
|
||||||
let report = materializer.step().await.expect("step");
|
let report = materializer.step().await.expect("step");
|
||||||
@ -340,7 +336,7 @@ async fn test_escalation_triggers_on_high_conflict() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_escalation_does_not_trigger_on_low_conflict() {
|
async fn test_escalation_does_not_trigger_on_low_conflict() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
||||||
|
|
||||||
@ -363,8 +359,8 @@ async fn test_escalation_does_not_trigger_on_low_conflict() {
|
|||||||
let hash2 = store_assertion(&store, &a2).await;
|
let hash2 = store_assertion(&store, &a2).await;
|
||||||
|
|
||||||
// Skewed votes create low conflict
|
// Skewed votes create low conflict
|
||||||
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.2, 2000)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.2, 2000), "Tesla").await.expect("put");
|
||||||
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.9, 2001)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.9, 2001), "Tesla").await.expect("put");
|
||||||
|
|
||||||
// Materialize
|
// Materialize
|
||||||
let report = materializer.step().await.expect("step");
|
let report = materializer.step().await.expect("step");
|
||||||
@ -379,7 +375,7 @@ async fn test_escalation_does_not_trigger_on_low_conflict() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_escalation_predicate_pattern_matching() {
|
async fn test_escalation_predicate_pattern_matching() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
||||||
|
|
||||||
@ -403,8 +399,14 @@ async fn test_escalation_predicate_pattern_matching() {
|
|||||||
a2.confidence = 0.9; // High confidence
|
a2.confidence = 0.9; // High confidence
|
||||||
let hash1 = store_assertion(&store, &a1).await;
|
let hash1 = store_assertion(&store, &a1).await;
|
||||||
let hash2 = store_assertion(&store, &a2).await;
|
let hash2 = store_assertion(&store, &a2).await;
|
||||||
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000)).await.expect("put");
|
vote_store
|
||||||
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001)).await.expect("put");
|
.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000), "Semaglutide")
|
||||||
|
.await
|
||||||
|
.expect("put");
|
||||||
|
vote_store
|
||||||
|
.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001), "Semaglutide")
|
||||||
|
.await
|
||||||
|
.expect("put");
|
||||||
|
|
||||||
let mut a3 = create_assertion("Tesla", "revenue", 96.7, 1000);
|
let mut a3 = create_assertion("Tesla", "revenue", 96.7, 1000);
|
||||||
a3.confidence = 0.3; // Low confidence
|
a3.confidence = 0.3; // Low confidence
|
||||||
@ -412,8 +414,8 @@ async fn test_escalation_predicate_pattern_matching() {
|
|||||||
a4.confidence = 1.0; // High confidence
|
a4.confidence = 1.0; // High confidence
|
||||||
let hash3 = store_assertion(&store, &a3).await;
|
let hash3 = store_assertion(&store, &a3).await;
|
||||||
let hash4 = store_assertion(&store, &a4).await;
|
let hash4 = store_assertion(&store, &a4).await;
|
||||||
vote_store.put_vote(&create_vote(hash3, [30u8; 32], 0.5, 2002)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash3, [30u8; 32], 0.5, 2002), "Tesla").await.expect("put");
|
||||||
vote_store.put_vote(&create_vote(hash4, [40u8; 32], 0.5, 2003)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash4, [40u8; 32], 0.5, 2003), "Tesla").await.expect("put");
|
||||||
|
|
||||||
// Materialize
|
// Materialize
|
||||||
let report = materializer.step().await.expect("step");
|
let report = materializer.step().await.expect("step");
|
||||||
@ -429,7 +431,7 @@ async fn test_escalation_predicate_pattern_matching() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_no_escalation_without_store() {
|
async fn test_no_escalation_without_store() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
||||||
|
|
||||||
@ -441,8 +443,8 @@ async fn test_no_escalation_without_store() {
|
|||||||
let a2 = create_assertion("Tesla", "revenue", 100.0, 1100);
|
let a2 = create_assertion("Tesla", "revenue", 100.0, 1100);
|
||||||
let hash1 = store_assertion(&store, &a1).await;
|
let hash1 = store_assertion(&store, &a1).await;
|
||||||
let hash2 = store_assertion(&store, &a2).await;
|
let hash2 = store_assertion(&store, &a2).await;
|
||||||
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash1, [10u8; 32], 0.5, 2000), "Tesla").await.expect("put");
|
||||||
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001)).await.expect("put");
|
vote_store.put_vote(&create_vote(hash2, [20u8; 32], 0.5, 2001), "Tesla").await.expect("put");
|
||||||
|
|
||||||
// Materialize
|
// Materialize
|
||||||
let report = materializer.step().await.expect("step");
|
let report = materializer.step().await.expect("step");
|
||||||
|
|||||||
@ -37,7 +37,7 @@ use stemedb_core::types::{ConflictAnalysis, EntityId, RelationId};
|
|||||||
use stemedb_lens::{AnalysisLens, SkepticLens};
|
use stemedb_lens::{AnalysisLens, SkepticLens};
|
||||||
use stemedb_storage::trust_rank_store::TrustRankStore;
|
use stemedb_storage::trust_rank_store::TrustRankStore;
|
||||||
use stemedb_storage::vote_store::VoteStore;
|
use stemedb_storage::vote_store::VoteStore;
|
||||||
use stemedb_storage::{GenericIndexStore, IndexStore, KVStore};
|
use stemedb_storage::{key_codec, GenericIndexStore, IndexStore, KVStore};
|
||||||
use tracing::instrument;
|
use tracing::instrument;
|
||||||
|
|
||||||
/// A "Trust but Verify" view that shows disagreement instead of hiding it.
|
/// A "Trust but Verify" view that shows disagreement instead of hiding it.
|
||||||
@ -96,7 +96,7 @@ where
|
|||||||
// Load all assertions
|
// Load all assertions
|
||||||
let mut candidates = Vec::with_capacity(hash_list.len());
|
let mut candidates = Vec::with_capacity(hash_list.len());
|
||||||
for hash in hash_list {
|
for hash in hash_list {
|
||||||
let key = format!("H:{}", hex::encode(hash)).into_bytes();
|
let key = key_codec::assertion_key(subject, &hex::encode(hash));
|
||||||
if let Some(data) = self.store.get(&key).await? {
|
if let Some(data) = self.store.get(&key).await? {
|
||||||
if let Ok(assertion) = stemedb_core::serde::deserialize(&data) {
|
if let Ok(assertion) = stemedb_core::serde::deserialize(&data) {
|
||||||
candidates.push(assertion);
|
candidates.push(assertion);
|
||||||
@ -129,11 +129,11 @@ mod tests {
|
|||||||
use super::*;
|
use super::*;
|
||||||
use stemedb_core::testing::AssertionBuilder;
|
use stemedb_core::testing::AssertionBuilder;
|
||||||
use stemedb_core::types::ResolutionStatus;
|
use stemedb_core::types::ResolutionStatus;
|
||||||
use stemedb_storage::{GenericTrustRankStore, GenericVoteStore, SledStore};
|
use stemedb_storage::{GenericTrustRankStore, GenericVoteStore, HybridStore};
|
||||||
|
|
||||||
async fn store_assertion(
|
async fn store_assertion(
|
||||||
store: &Arc<SledStore>,
|
store: &Arc<HybridStore>,
|
||||||
index_store: &GenericIndexStore<Arc<SledStore>>,
|
index_store: &GenericIndexStore<Arc<HybridStore>>,
|
||||||
subject: &str,
|
subject: &str,
|
||||||
predicate: &str,
|
predicate: &str,
|
||||||
value: f64,
|
value: f64,
|
||||||
@ -148,7 +148,7 @@ mod tests {
|
|||||||
|
|
||||||
let bytes = stemedb_core::serde::serialize(&assertion).expect("serialize");
|
let bytes = stemedb_core::serde::serialize(&assertion).expect("serialize");
|
||||||
let hash = blake3::hash(&bytes);
|
let hash = blake3::hash(&bytes);
|
||||||
let key = format!("H:{}", hash.to_hex()).into_bytes();
|
let key = key_codec::assertion_key(subject, &hash.to_hex());
|
||||||
store.put(&key, &bytes).await.expect("put");
|
store.put(&key, &bytes).await.expect("put");
|
||||||
|
|
||||||
let assertion_hash: [u8; 32] = *hash.as_bytes();
|
let assertion_hash: [u8; 32] = *hash.as_bytes();
|
||||||
@ -157,9 +157,9 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_resolve_empty() {
|
async fn test_resolve_empty() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new((*store).clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new((*store).clone()));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store.clone()));
|
||||||
let resolver = SkepticResolver::new(store, vote_store, trust_store);
|
let resolver = SkepticResolver::new(store, vote_store, trust_store);
|
||||||
|
|
||||||
let result = resolver.resolve("NonExistent", "predicate").await.expect("resolve");
|
let result = resolver.resolve("NonExistent", "predicate").await.expect("resolve");
|
||||||
@ -168,13 +168,13 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_resolve_single_claim() {
|
async fn test_resolve_single_claim() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let index_store = GenericIndexStore::new(store.clone());
|
let index_store = GenericIndexStore::new(store.clone());
|
||||||
|
|
||||||
store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.9).await;
|
store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.9).await;
|
||||||
|
|
||||||
let vote_store = Arc::new(GenericVoteStore::new((*store).clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new((*store).clone()));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store.clone()));
|
||||||
let resolver = SkepticResolver::new(store, vote_store, trust_store);
|
let resolver = SkepticResolver::new(store, vote_store, trust_store);
|
||||||
|
|
||||||
let result = resolver.resolve("Drug", "effect").await.expect("resolve");
|
let result = resolver.resolve("Drug", "effect").await.expect("resolve");
|
||||||
@ -189,15 +189,15 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_resolve_contested_claims() {
|
async fn test_resolve_contested_claims() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let index_store = GenericIndexStore::new(store.clone());
|
let index_store = GenericIndexStore::new(store.clone());
|
||||||
|
|
||||||
// Add two conflicting claims with equal weight
|
// Add two conflicting claims with equal weight
|
||||||
store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.5).await;
|
store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.5).await;
|
||||||
store_assertion(&store, &index_store, "Drug", "effect", 200.0, 0.5).await;
|
store_assertion(&store, &index_store, "Drug", "effect", 200.0, 0.5).await;
|
||||||
|
|
||||||
let vote_store = Arc::new(GenericVoteStore::new((*store).clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new((*store).clone()));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store.clone()));
|
||||||
let resolver = SkepticResolver::new(store, vote_store, trust_store);
|
let resolver = SkepticResolver::new(store, vote_store, trust_store);
|
||||||
|
|
||||||
let result = resolver.resolve("Drug", "effect").await.expect("resolve");
|
let result = resolver.resolve("Drug", "effect").await.expect("resolve");
|
||||||
@ -211,13 +211,13 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_resolve_includes_computed_at() {
|
async fn test_resolve_includes_computed_at() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let index_store = GenericIndexStore::new(store.clone());
|
let index_store = GenericIndexStore::new(store.clone());
|
||||||
|
|
||||||
store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.9).await;
|
store_assertion(&store, &index_store, "Drug", "effect", 100.0, 0.9).await;
|
||||||
|
|
||||||
let vote_store = Arc::new(GenericVoteStore::new((*store).clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
let trust_store = Arc::new(GenericTrustRankStore::new((*store).clone()));
|
let trust_store = Arc::new(GenericTrustRankStore::new(store.clone()));
|
||||||
let resolver = SkepticResolver::new(store, vote_store, trust_store);
|
let resolver = SkepticResolver::new(store, vote_store, trust_store);
|
||||||
|
|
||||||
let result = resolver.resolve("Drug", "effect").await.expect("resolve");
|
let result = resolver.resolve("Drug", "effect").await.expect("resolve");
|
||||||
|
|||||||
@ -17,7 +17,7 @@ use stemedb_core::testing::AssertionBuilder;
|
|||||||
use stemedb_core::types::{Assertion, LifecycleStage, ObjectValue, SignatureEntry};
|
use stemedb_core::types::{Assertion, LifecycleStage, ObjectValue, SignatureEntry};
|
||||||
use stemedb_ingest::worker::{serialize_assertion, IngestWorker};
|
use stemedb_ingest::worker::{serialize_assertion, IngestWorker};
|
||||||
use stemedb_query::{Query, QueryEngine};
|
use stemedb_query::{Query, QueryEngine};
|
||||||
use stemedb_storage::{KVStore, SledStore};
|
use stemedb_storage::{key_codec, HybridStore, KVStore};
|
||||||
use stemedb_wal::Journal;
|
use stemedb_wal::Journal;
|
||||||
use tempfile::tempdir;
|
use tempfile::tempdir;
|
||||||
use tokio::sync::Mutex;
|
use tokio::sync::Mutex;
|
||||||
@ -100,15 +100,27 @@ async fn test_e2e_decay_reduces_old_confidence() {
|
|||||||
journal.append(serialize_assertion(&new_assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&new_assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(SledStore::open(&db_dir).expect("open store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
|
||||||
|
|
||||||
let mut worker = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker");
|
let mut worker = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker");
|
||||||
worker.step().await.expect("step 1");
|
worker.step().await.expect("step 1");
|
||||||
worker.step().await.expect("step 2");
|
worker.step().await.expect("step 2");
|
||||||
|
|
||||||
// Verify both assertions are stored
|
// Verify both assertions are stored (check via subject-scoped assertion keys)
|
||||||
let h_entries = store.scan_prefix(b"H:").await.expect("scan");
|
let old_hash =
|
||||||
assert_eq!(h_entries.len(), 2, "should have two assertions");
|
*blake3::hash(&stemedb_core::serde::serialize(&old_assertion).expect("ser")).as_bytes();
|
||||||
|
let new_hash =
|
||||||
|
*blake3::hash(&stemedb_core::serde::serialize(&new_assertion).expect("ser")).as_bytes();
|
||||||
|
let old_key = key_codec::assertion_key("Semaglutide", &hex::encode(old_hash));
|
||||||
|
let new_key = key_codec::assertion_key("Semaglutide", &hex::encode(new_hash));
|
||||||
|
assert!(
|
||||||
|
store.get(&old_key).await.expect("get old").is_some(),
|
||||||
|
"old assertion should be stored"
|
||||||
|
);
|
||||||
|
assert!(
|
||||||
|
store.get(&new_key).await.expect("get new").is_some(),
|
||||||
|
"new assertion should be stored"
|
||||||
|
);
|
||||||
|
|
||||||
// Query WITHOUT decay: old assertion wins (0.95 > 0.6)
|
// Query WITHOUT decay: old assertion wins (0.95 > 0.6)
|
||||||
let engine = QueryEngine::new(store.clone());
|
let engine = QueryEngine::new(store.clone());
|
||||||
|
|||||||
@ -23,7 +23,7 @@ use stemedb_core::types::{Assertion, LifecycleStage, ObjectValue, SignatureEntry
|
|||||||
use stemedb_ingest::worker::{serialize_assertion, IngestWorker};
|
use stemedb_ingest::worker::{serialize_assertion, IngestWorker};
|
||||||
use stemedb_lens::{RecencyLens, SyncLensWrapper, VoteAwareConsensusLens};
|
use stemedb_lens::{RecencyLens, SyncLensWrapper, VoteAwareConsensusLens};
|
||||||
use stemedb_query::{Materializer, Query, QueryEngine};
|
use stemedb_query::{Materializer, Query, QueryEngine};
|
||||||
use stemedb_storage::{GenericVoteStore, KVStore, SledStore, VoteStore};
|
use stemedb_storage::{key_codec, GenericVoteStore, HybridStore, KVStore, VoteStore};
|
||||||
use stemedb_wal::Journal;
|
use stemedb_wal::Journal;
|
||||||
use tempfile::tempdir;
|
use tempfile::tempdir;
|
||||||
use tokio::sync::{Mutex, Notify};
|
use tokio::sync::{Mutex, Notify};
|
||||||
@ -114,7 +114,7 @@ async fn test_e2e_write_materialize_read() {
|
|||||||
|
|
||||||
// === Step 2: Run IngestWorker to process WAL ===
|
// === Step 2: Run IngestWorker to process WAL ===
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(SledStore::open(&db_dir).expect("open store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
|
||||||
let notify = Arc::new(Notify::new());
|
let notify = Arc::new(Notify::new());
|
||||||
|
|
||||||
let mut worker = IngestWorker::new(journal.clone(), store.clone())
|
let mut worker = IngestWorker::new(journal.clone(), store.clone())
|
||||||
@ -125,15 +125,15 @@ async fn test_e2e_write_materialize_read() {
|
|||||||
let bytes_processed = worker.step().await.expect("ingest step");
|
let bytes_processed = worker.step().await.expect("ingest step");
|
||||||
assert!(bytes_processed > 0, "should have processed data from WAL");
|
assert!(bytes_processed > 0, "should have processed data from WAL");
|
||||||
|
|
||||||
// Verify assertion stored at H:{hash}
|
// Verify assertion stored at {subject}\x00H:{hash}
|
||||||
let assertion_hash = compute_assertion_hash(&assertion);
|
let assertion_hash = compute_assertion_hash(&assertion);
|
||||||
let h_key = format!("H:{}", hex::encode(assertion_hash)).into_bytes();
|
let h_key = key_codec::assertion_key("Tesla_Inc", &hex::encode(assertion_hash));
|
||||||
let stored = store.get(&h_key).await.expect("get assertion");
|
let stored = store.get(&h_key).await.expect("get assertion");
|
||||||
assert!(stored.is_some(), "assertion should be stored at H: key");
|
assert!(stored.is_some(), "assertion should be stored at H: key");
|
||||||
|
|
||||||
// Verify compound index SP:{subject}:{predicate} created
|
// Verify compound index {subject}\x00SP:{predicate} created
|
||||||
let sp_key = b"SP:Tesla_Inc:has_revenue";
|
let sp_prefix = key_codec::subject_predicate_scan_prefix("Tesla_Inc");
|
||||||
let sp_entries = store.scan_prefix(sp_key).await.expect("scan SP: prefix");
|
let sp_entries = store.scan_prefix(&sp_prefix).await.expect("scan SP: prefix");
|
||||||
assert_eq!(sp_entries.len(), 1, "should have one SP: index entry");
|
assert_eq!(sp_entries.len(), 1, "should have one SP: index entry");
|
||||||
|
|
||||||
// === Step 3: Run Materializer ===
|
// === Step 3: Run Materializer ===
|
||||||
@ -145,9 +145,9 @@ async fn test_e2e_write_materialize_read() {
|
|||||||
assert_eq!(report.pairs_scanned, 1, "should scan one subject+predicate pair");
|
assert_eq!(report.pairs_scanned, 1, "should scan one subject+predicate pair");
|
||||||
assert_eq!(report.views_updated, 1, "should update one materialized view");
|
assert_eq!(report.views_updated, 1, "should update one materialized view");
|
||||||
|
|
||||||
// Verify MV:{subject}:{predicate} written
|
// Verify {subject}\x00MV:{predicate} written
|
||||||
let mv_key = b"MV:Tesla_Inc:has_revenue";
|
let mv_key = key_codec::mv_key("Tesla_Inc", "has_revenue");
|
||||||
let mv_data = store.get(mv_key).await.expect("get MV");
|
let mv_data = store.get(&mv_key).await.expect("get MV");
|
||||||
assert!(mv_data.is_some(), "materialized view should exist");
|
assert!(mv_data.is_some(), "materialized view should exist");
|
||||||
|
|
||||||
// === Step 4: Query via QueryEngine ===
|
// === Step 4: Query via QueryEngine ===
|
||||||
@ -186,7 +186,7 @@ async fn test_e2e_vote_consensus() {
|
|||||||
|
|
||||||
// Ingest both
|
// Ingest both
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(SledStore::open(&db_dir).expect("open store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
|
||||||
|
|
||||||
let mut worker =
|
let mut worker =
|
||||||
IngestWorker::new(journal.clone(), store.clone()).await.expect("create worker");
|
IngestWorker::new(journal.clone(), store.clone()).await.expect("create worker");
|
||||||
@ -197,25 +197,28 @@ async fn test_e2e_vote_consensus() {
|
|||||||
let bytes2 = worker.step().await.expect("step 2");
|
let bytes2 = worker.step().await.expect("step 2");
|
||||||
assert!(bytes2 > 0, "should ingest second assertion");
|
assert!(bytes2 > 0, "should ingest second assertion");
|
||||||
|
|
||||||
|
// Compute hashes for both assertions
|
||||||
|
let hash_a = compute_assertion_hash(&assertion_a);
|
||||||
|
let hash_b = compute_assertion_hash(&assertion_b);
|
||||||
|
|
||||||
// Verify both are stored
|
// Verify both are stored
|
||||||
let h_entries = store.scan_prefix(b"H:").await.expect("scan H:");
|
let h_key_a = key_codec::assertion_key("Semaglutide", &hex::encode(hash_a));
|
||||||
assert_eq!(h_entries.len(), 2, "should have two assertions");
|
let h_key_b = key_codec::assertion_key("Semaglutide", &hex::encode(hash_b));
|
||||||
|
assert!(store.get(&h_key_a).await.expect("get a").is_some(), "assertion_a should be stored");
|
||||||
|
assert!(store.get(&h_key_b).await.expect("get b").is_some(), "assertion_b should be stored");
|
||||||
|
|
||||||
// Add votes via VoteStore
|
// Add votes via VoteStore
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
|
|
||||||
let hash_a = compute_assertion_hash(&assertion_a);
|
|
||||||
let hash_b = compute_assertion_hash(&assertion_b);
|
|
||||||
|
|
||||||
// assertion_a gets 3 votes (total weight = 2.7)
|
// assertion_a gets 3 votes (total weight = 2.7)
|
||||||
for i in 0..3 {
|
for i in 0..3 {
|
||||||
let vote = create_vote(hash_a, i, 0.9, 2000 + i as u64);
|
let vote = create_vote(hash_a, i, 0.9, 2000 + i as u64);
|
||||||
vote_store.put_vote(&vote).await.expect("put vote for a");
|
vote_store.put_vote(&vote, "Semaglutide").await.expect("put vote for a");
|
||||||
}
|
}
|
||||||
|
|
||||||
// assertion_b gets 1 vote (total weight = 0.2)
|
// assertion_b gets 1 vote (total weight = 0.2)
|
||||||
let vote_b = create_vote(hash_b, 10, 0.2, 2100);
|
let vote_b = create_vote(hash_b, 10, 0.2, 2100);
|
||||||
vote_store.put_vote(&vote_b).await.expect("put vote for b");
|
vote_store.put_vote(&vote_b, "Semaglutide").await.expect("put vote for b");
|
||||||
|
|
||||||
// Materialize with VoteAwareConsensusLens
|
// Materialize with VoteAwareConsensusLens
|
||||||
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
let lens = VoteAwareConsensusLens::new(Arc::clone(&vote_store));
|
||||||
@ -258,7 +261,7 @@ async fn test_e2e_update_winner() {
|
|||||||
journal.append(serialize_assertion(&assertion_v1).expect("ser")).expect("append v1");
|
journal.append(serialize_assertion(&assertion_v1).expect("ser")).expect("append v1");
|
||||||
|
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(SledStore::open(&db_dir).expect("open store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
|
||||||
|
|
||||||
// Ingest v1
|
// Ingest v1
|
||||||
let mut worker = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker");
|
let mut worker = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker");
|
||||||
@ -294,8 +297,12 @@ async fn test_e2e_update_winner() {
|
|||||||
assert!(bytes2 > 0, "should process new assertion");
|
assert!(bytes2 > 0, "should process new assertion");
|
||||||
|
|
||||||
// Verify both assertions are now stored
|
// Verify both assertions are now stored
|
||||||
let h_entries = store.scan_prefix(b"H:").await.expect("scan");
|
let hash_v1 = compute_assertion_hash(&assertion_v1);
|
||||||
assert_eq!(h_entries.len(), 2, "should have two assertions");
|
let hash_v2 = compute_assertion_hash(&assertion_v2);
|
||||||
|
let key_v1 = key_codec::assertion_key("Apple_Inc", &hex::encode(hash_v1));
|
||||||
|
let key_v2 = key_codec::assertion_key("Apple_Inc", &hex::encode(hash_v2));
|
||||||
|
assert!(store.get(&key_v1).await.expect("get v1").is_some(), "v1 should be stored");
|
||||||
|
assert!(store.get(&key_v2).await.expect("get v2").is_some(), "v2 should be stored");
|
||||||
|
|
||||||
// Re-materialize
|
// Re-materialize
|
||||||
let lens2 = SyncLensWrapper(RecencyLens);
|
let lens2 = SyncLensWrapper(RecencyLens);
|
||||||
@ -334,7 +341,7 @@ async fn test_e2e_cursor_persistence() {
|
|||||||
journal.append(serialize_assertion(&a3).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&a3).expect("ser")).expect("append");
|
||||||
|
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(SledStore::open(&db_dir).expect("open store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
|
||||||
|
|
||||||
// Worker 1: Process first 2 assertions
|
// Worker 1: Process first 2 assertions
|
||||||
let mut worker1 = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker1");
|
let mut worker1 = IngestWorker::new(journal.clone(), store.clone()).await.expect("worker1");
|
||||||
@ -342,8 +349,12 @@ async fn test_e2e_cursor_persistence() {
|
|||||||
worker1.step().await.expect("step 2");
|
worker1.step().await.expect("step 2");
|
||||||
|
|
||||||
// Verify 2 assertions stored
|
// Verify 2 assertions stored
|
||||||
let h_entries = store.scan_prefix(b"H:").await.expect("scan");
|
let hash1 = compute_assertion_hash(&a1);
|
||||||
assert_eq!(h_entries.len(), 2, "worker1 should have processed 2 assertions");
|
let hash2 = compute_assertion_hash(&a2);
|
||||||
|
let key1 = key_codec::assertion_key("Entity_A", &hex::encode(hash1));
|
||||||
|
let key2 = key_codec::assertion_key("Entity_B", &hex::encode(hash2));
|
||||||
|
assert!(store.get(&key1).await.expect("get a1").is_some(), "a1 should be stored");
|
||||||
|
assert!(store.get(&key2).await.expect("get a2").is_some(), "a2 should be stored");
|
||||||
|
|
||||||
// Drop worker1, simulate restart
|
// Drop worker1, simulate restart
|
||||||
drop(worker1);
|
drop(worker1);
|
||||||
@ -358,8 +369,9 @@ async fn test_e2e_cursor_persistence() {
|
|||||||
assert_eq!(steps, 1, "worker2 should only process 1 new assertion");
|
assert_eq!(steps, 1, "worker2 should only process 1 new assertion");
|
||||||
|
|
||||||
// Verify all 3 assertions now stored
|
// Verify all 3 assertions now stored
|
||||||
let h_entries = store.scan_prefix(b"H:").await.expect("scan");
|
let hash3 = compute_assertion_hash(&a3);
|
||||||
assert_eq!(h_entries.len(), 3, "should have all 3 assertions");
|
let key3 = key_codec::assertion_key("Entity_C", &hex::encode(hash3));
|
||||||
|
assert!(store.get(&key3).await.expect("get a3").is_some(), "a3 should be stored");
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Test: Event-driven materialization via Notify.
|
/// Test: Event-driven materialization via Notify.
|
||||||
@ -378,7 +390,7 @@ async fn test_e2e_notify_integration() {
|
|||||||
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
journal.append(serialize_assertion(&assertion).expect("ser")).expect("append");
|
||||||
|
|
||||||
let journal = Arc::new(Mutex::new(journal));
|
let journal = Arc::new(Mutex::new(journal));
|
||||||
let store = Arc::new(SledStore::open(&db_dir).expect("open store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("open store"));
|
||||||
let notify = Arc::new(Notify::new());
|
let notify = Arc::new(Notify::new());
|
||||||
|
|
||||||
// Track if notification was received
|
// Track if notification was received
|
||||||
|
|||||||
@ -11,7 +11,7 @@ use tracing::debug;
|
|||||||
|
|
||||||
use crate::agent::Agent;
|
use crate::agent::Agent;
|
||||||
use crate::helpers::{
|
use crate::helpers::{
|
||||||
verify_assertion_text, wait_until_ingested, write_assertion_to_wal, CURSOR_KEY,
|
cursor_key, verify_assertion_text, wait_until_ingested, write_assertion_to_wal,
|
||||||
};
|
};
|
||||||
use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
||||||
|
|
||||||
@ -48,7 +48,7 @@ pub(crate) async fn run_mv_integration_test<S: KVStore + 'static>(
|
|||||||
);
|
);
|
||||||
|
|
||||||
// Check cursor state before writing
|
// Check cursor state before writing
|
||||||
let cursor_before = match store.get(CURSOR_KEY).await {
|
let cursor_before = match store.get(&cursor_key()).await {
|
||||||
Ok(Some(bytes)) => {
|
Ok(Some(bytes)) => {
|
||||||
if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) {
|
if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) {
|
||||||
u64::from_le_bytes(arr)
|
u64::from_le_bytes(arr)
|
||||||
|
|||||||
@ -5,7 +5,7 @@ use std::time::{Duration, Instant};
|
|||||||
use stemedb_core::serde::serialize;
|
use stemedb_core::serde::serialize;
|
||||||
use stemedb_core::types::{Assertion, Hash, Vote};
|
use stemedb_core::types::{Assertion, Hash, Vote};
|
||||||
use stemedb_ingest::{serialize_assertion, serialize_vote};
|
use stemedb_ingest::{serialize_assertion, serialize_vote};
|
||||||
use stemedb_storage::KVStore;
|
use stemedb_storage::{key_codec, KVStore};
|
||||||
use stemedb_wal::Journal;
|
use stemedb_wal::Journal;
|
||||||
use tokio::sync::Mutex;
|
use tokio::sync::Mutex;
|
||||||
use tracing::debug;
|
use tracing::debug;
|
||||||
@ -68,7 +68,10 @@ pub(crate) fn compute_assertion_hash(assertion: &Assertion) -> Hash {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// The cursor key used by the ingestor to track its progress.
|
/// The cursor key used by the ingestor to track its progress.
|
||||||
pub(crate) const CURSOR_KEY: &[u8] = b"__CURSOR__:ingest";
|
/// Uses key_codec format: `\x00META:cursor:ingest`
|
||||||
|
pub(crate) fn cursor_key() -> Vec<u8> {
|
||||||
|
key_codec::cursor_key()
|
||||||
|
}
|
||||||
|
|
||||||
/// Wait until the ingestor cursor reaches or exceeds the target offset.
|
/// Wait until the ingestor cursor reaches or exceeds the target offset.
|
||||||
///
|
///
|
||||||
@ -96,7 +99,7 @@ pub(crate) async fn wait_until_ingested<S: KVStore>(
|
|||||||
|
|
||||||
loop {
|
loop {
|
||||||
// Read current cursor position
|
// Read current cursor position
|
||||||
if let Ok(Some(bytes)) = store.get(CURSOR_KEY).await {
|
if let Ok(Some(bytes)) = store.get(&cursor_key()).await {
|
||||||
if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) {
|
if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) {
|
||||||
let cursor = u64::from_le_bytes(arr);
|
let cursor = u64::from_le_bytes(arr);
|
||||||
// Use > (strictly greater) because journal.append() returns the START offset
|
// Use > (strictly greater) because journal.append() returns the START offset
|
||||||
|
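Note on the key layout these hunks migrate to: the comments in this diff describe subject-scoped keys (`{subject}\x00H:{hash}`, `{subject}\x00MV:{predicate}`) and a cursor key of the form `\x00META:cursor:ingest`, with the cursor value stored as 8 little-endian bytes. The sketch below is a minimal illustration of that layout only. The function bodies are assumptions written for this note, not the actual `stemedb_storage::key_codec` implementation, and `decode_cursor` and `is_ingested` are hypothetical helper names.

```rust
// Hypothetical stand-ins for the key_codec helpers referenced in this diff.
// Byte layouts follow the comments in the diff; the real implementations
// in stemedb_storage::key_codec may differ.

/// Builds `{subject}\x00H:{hash_hex}` (assumed layout).
fn assertion_key(subject: &str, hash_hex: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(subject.len() + 3 + hash_hex.len());
    key.extend_from_slice(subject.as_bytes());
    key.push(0); // NUL separator between the subject and the key-type tag
    key.extend_from_slice(b"H:");
    key.extend_from_slice(hash_hex.as_bytes());
    key
}

/// Builds `{subject}\x00MV:{predicate}` (assumed layout).
fn mv_key(subject: &str, predicate: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(subject.len() + 4 + predicate.len());
    key.extend_from_slice(subject.as_bytes());
    key.push(0);
    key.extend_from_slice(b"MV:");
    key.extend_from_slice(predicate.as_bytes());
    key
}

/// Builds `\x00META:cursor:ingest` (layout taken from the doc comment in helpers.rs).
fn cursor_key() -> Vec<u8> {
    b"\x00META:cursor:ingest".to_vec()
}

/// Decodes a stored cursor value: exactly 8 little-endian bytes, else None.
fn decode_cursor(bytes: &[u8]) -> Option<u64> {
    if bytes.len() != 8 {
        return None;
    }
    let mut arr = [0u8; 8];
    arr.copy_from_slice(bytes);
    Some(u64::from_le_bytes(arr))
}

/// A record counts as ingested once the cursor is strictly past its start
/// offset, because `journal.append()` returns the START offset of the record.
fn is_ingested(cursor: u64, record_start: u64) -> bool {
    cursor > record_start
}
```

With this layout, all keys for one subject share the `{subject}\x00` prefix, which is presumably why the tests above replace the global `scan_prefix(b"H:")` checks with per-key `get` calls.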
|||||||
@ -5,7 +5,7 @@ use std::sync::Arc;
|
|||||||
use stemedb_core::types::{LifecycleStage, ObjectValue};
|
use stemedb_core::types::{LifecycleStage, ObjectValue};
|
||||||
use stemedb_ingest::Ingestor;
|
use stemedb_ingest::Ingestor;
|
||||||
use stemedb_query::{Query, QueryEngine};
|
use stemedb_query::{Query, QueryEngine};
|
||||||
use stemedb_storage::SledStore;
|
use stemedb_storage::HybridStore;
|
||||||
use stemedb_wal::Journal;
|
use stemedb_wal::Journal;
|
||||||
use tokio::sync::Mutex;
|
use tokio::sync::Mutex;
|
||||||
use tracing::{debug, info, warn};
|
use tracing::{debug, info, warn};
|
||||||
@ -61,7 +61,7 @@ pub async fn run_simulation(
|
|||||||
.map_err(|e| SimulationSetupError::JournalOpen(e.to_string()))?,
|
.map_err(|e| SimulationSetupError::JournalOpen(e.to_string()))?,
|
||||||
));
|
));
|
||||||
let store = Arc::new(
|
let store = Arc::new(
|
||||||
SledStore::open(temp_db_dir.path())
|
HybridStore::open(temp_db_dir.path())
|
||||||
.map_err(|e| SimulationSetupError::StoreOpen(e.to_string()))?,
|
.map_err(|e| SimulationSetupError::StoreOpen(e.to_string()))?,
|
||||||
);
|
);
|
||||||
|
|
||||||
|
|||||||
@ -10,18 +10,27 @@ workspace = true
|
|||||||
|
|
||||||
[dependencies]
|
[dependencies]
|
||||||
stemedb-core = { path = "../stemedb-core" }
|
stemedb-core = { path = "../stemedb-core" }
|
||||||
sled = "0.34"
|
fjall = "2"
|
||||||
|
redb = "2"
|
||||||
|
dashmap = "6"
|
||||||
|
tempfile = "3.10"
|
||||||
thiserror = "1.0"
|
thiserror = "1.0"
|
||||||
tracing = "0.1"
|
tracing = "0.1"
|
||||||
async-trait = "0.1"
|
async-trait = "0.1"
|
||||||
blake3 = "1.5"
|
blake3 = "1.5"
|
||||||
hex = "0.4"
|
hex = "0.4"
|
||||||
|
memchr = "2"
|
||||||
rkyv = { version = "0.7", features = ["validation"] }
|
rkyv = { version = "0.7", features = ["validation"] }
|
||||||
# HNSW vector index for k-NN similarity search
|
# HNSW vector index for k-NN similarity search
|
||||||
hnsw_rs = "0.3"
|
hnsw_rs = "0.3"
|
||||||
# Thread-safe read-write locks for index access
|
# Thread-safe read-write locks for index access
|
||||||
parking_lot = "0.12"
|
parking_lot = "0.12"
|
||||||
|
tokio = { version = "1", features = ["sync", "rt"] }
|
||||||
|
|
||||||
[dev-dependencies]
|
[dev-dependencies]
|
||||||
tokio = { version = "1", features = ["macros", "rt"] }
|
tokio = { version = "1", features = ["macros", "rt", "rt-multi-thread"] }
|
||||||
tempfile = "3.10"
|
criterion = { version = "0.5", features = ["html_reports", "async_tokio"] }
|
||||||
|
|
||||||
|
[[bench]]
|
||||||
|
name = "kv_store"
|
||||||
|
harness = false
|
||||||
|
|||||||
145
crates/stemedb-storage/benches/kv_store.rs
Normal file
@ -0,0 +1,145 @@
|
|||||||
|
#![allow(missing_docs, clippy::unwrap_used, clippy::expect_used)]
|
||||||
|
|
||||||
|
use criterion::{criterion_group, criterion_main, Criterion};
|
||||||
|
use stemedb_storage::key_codec;
|
||||||
|
use stemedb_storage::{HybridStore, KVStore};
|
||||||
|
use tokio::runtime::Runtime;
|
||||||
|
|
||||||
|
fn sequential_put(c: &mut Criterion) {
|
||||||
|
let rt = Runtime::new().expect("runtime");
|
||||||
|
let store = HybridStore::open_temp().expect("store");
|
||||||
|
|
||||||
|
c.bench_function("sequential_put_10k", |b| {
|
||||||
|
b.iter(|| {
|
||||||
|
rt.block_on(async {
|
||||||
|
for i in 0..10_000u64 {
|
||||||
|
let hash_hex = format!("bench_{}", i);
|
||||||
|
let key = key_codec::assertion_key("Bench", &hash_hex);
|
||||||
|
let value = format!("value_{}", i);
|
||||||
|
store.put(&key, value.as_bytes()).await.unwrap();
|
||||||
|
}
|
||||||
|
})
|
||||||
|
})
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
fn random_get(c: &mut Criterion) {
|
||||||
|
let rt = Runtime::new().expect("runtime");
|
||||||
|
let store = HybridStore::open_temp().expect("store");
|
||||||
|
|
||||||
|
// Pre-populate (read-heavy keys → redb via S: tag)
|
||||||
|
rt.block_on(async {
|
||||||
|
for i in 0..10_000u64 {
|
||||||
|
let key = key_codec::subject_predicate_key("Bench", &format!("pred_{}", i));
|
||||||
|
let value = format!("value_{}", i);
|
||||||
|
store.put(&key, value.as_bytes()).await.unwrap();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("random_get_10k", |b| {
|
||||||
|
b.iter(|| {
|
||||||
|
rt.block_on(async {
|
||||||
|
for i in 0..10_000u64 {
|
||||||
|
let key = key_codec::subject_predicate_key("Bench", &format!("pred_{}", i));
|
||||||
|
let _ = store.get(&key).await.unwrap();
|
||||||
|
}
|
||||||
|
})
|
||||||
|
})
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
fn prefix_scan(c: &mut Criterion) {
|
||||||
|
let rt = Runtime::new().expect("runtime");
|
||||||
|
let store = HybridStore::open_temp().expect("store");
|
||||||
|
|
||||||
|
// Pre-populate: 1K keys under "target", 9K under "other"
|
||||||
|
rt.block_on(async {
|
||||||
|
for i in 0..1_000u64 {
|
||||||
|
let key = key_codec::subject_predicate_key("target", &format!("pred_{}", i));
|
||||||
|
store.put(&key, b"matching").await.unwrap();
|
||||||
|
}
|
||||||
|
for i in 0..9_000u64 {
|
||||||
|
let key = key_codec::subject_predicate_key("other", &format!("pred_{}", i));
|
||||||
|
store.put(&key, b"non_matching").await.unwrap();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("prefix_scan_1k_of_10k", |b| {
|
||||||
|
b.iter(|| {
|
||||||
|
rt.block_on(async {
|
||||||
|
let prefix = key_codec::subject_scan_prefix("target");
|
||||||
|
let results = store.scan_prefix(&prefix).await.unwrap();
|
||||||
|
assert_eq!(results.len(), 1_000);
|
||||||
|
})
|
||||||
|
})
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
fn atomic_increment(c: &mut Criterion) {
|
||||||
|
let rt = Runtime::new().expect("runtime");
|
||||||
|
let store = HybridStore::open_temp().expect("store");
|
||||||
|
|
||||||
|
c.bench_function("atomic_increment_10k", |b| {
|
||||||
|
b.iter(|| {
|
||||||
|
rt.block_on(async {
|
||||||
|
for i in 0..10_000u64 {
|
||||||
|
let hash_hex = format!("counter_{}", i % 100);
|
||||||
|
let key = key_codec::vote_count_key("Bench", &hash_hex);
|
||||||
|
store.fetch_and_add_u64(&key, 1).await.unwrap();
|
||||||
|
}
|
||||||
|
})
|
||||||
|
})
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
fn mixed_workload(c: &mut Criterion) {
|
||||||
|
let rt = Runtime::new().expect("runtime");
|
||||||
|
let store = HybridStore::open_temp().expect("store");
|
||||||
|
|
||||||
|
// Pre-populate read-heavy keys
|
||||||
|
rt.block_on(async {
|
||||||
|
for i in 0..1_000u64 {
|
||||||
|
let key = key_codec::subject_predicate_key("mixed", &format!("pred_{}", i));
|
||||||
|
store.put(&key, b"initial_value").await.unwrap();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("mixed_70r_20w_10s", |b| {
|
||||||
|
b.iter(|| {
|
||||||
|
rt.block_on(async {
|
||||||
|
for i in 0..1_000u64 {
|
||||||
|
match i % 10 {
|
||||||
|
// 70% reads (redb path)
|
||||||
|
0..=6 => {
|
||||||
|
let key = key_codec::subject_predicate_key(
|
||||||
|
"mixed",
|
||||||
|
&format!("pred_{}", i % 1000),
|
||||||
|
);
|
||||||
|
let _ = store.get(&key).await.unwrap();
|
||||||
|
}
|
||||||
|
// 20% writes (fjall path)
|
||||||
|
7 | 8 => {
|
||||||
|
let key = key_codec::assertion_key("mixed", &format!("write_{}", i));
|
||||||
|
store.put(&key, b"new_value").await.unwrap();
|
||||||
|
}
|
||||||
|
// 10% scans (redb path)
|
||||||
|
_ => {
|
||||||
|
let prefix = key_codec::subject_scan_prefix("mixed");
|
||||||
|
let _ = store.scan_prefix(&prefix).await.unwrap();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
})
|
||||||
|
})
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
criterion_group!(
|
||||||
|
benches,
|
||||||
|
sequential_put,
|
||||||
|
random_get,
|
||||||
|
prefix_scan,
|
||||||
|
atomic_increment,
|
||||||
|
mixed_workload
|
||||||
|
);
|
||||||
|
criterion_main!(benches);
|
||||||
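The `mixed_70r_20w_10s` benchmark above derives its operation mix from `i % 10`. The sketch below isolates that dispatch so the ratio can be checked without a store; the `Op` enum and the `classify` and `op_counts` functions are illustrative names, not part of the benchmark file.

```rust
/// Operation kinds exercised by the mixed benchmark.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Read,
    Write,
    Scan,
}

/// Same arms as the benchmark's `match i % 10`.
fn classify(i: u64) -> Op {
    match i % 10 {
        0..=6 => Op::Read, // 7 of every 10 iterations
        7 | 8 => Op::Write, // 2 of every 10 iterations
        _ => Op::Scan,      // the remaining 1 of every 10
    }
}

/// Count (reads, writes, scans) over the iterations `0..n`.
fn op_counts(n: u64) -> (u64, u64, u64) {
    let mut counts = (0, 0, 0);
    for i in 0..n {
        match classify(i) {
            Op::Read => counts.0 += 1,
            Op::Write => counts.1 += 1,
            Op::Scan => counts.2 += 1,
        }
    }
    counts
}
```

Over the benchmark's 1,000 iterations this gives 700 reads, 200 writes, and 100 scans, matching the name of the benchmark.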
@ -8,8 +8,8 @@
|
|||||||
//!
|
//!
|
||||||
//! | Key Pattern | Value | Purpose |
|
//! | Key Pattern | Value | Purpose |
|
||||||
//! |-------------|-------|---------|
|
//! |-------------|-------|---------|
|
||||||
//! | `AUD:{query_id}` | Serialized QueryAudit | Individual audit records |
|
//! | `\x00AUD:{query_id}` | Serialized QueryAudit | Individual audit records |
|
||||||
//! | `AUDA:{agent_id}:{timestamp}:{query_id}` | Empty | Agent index for temporal queries |
|
//! | `\x00AUDA:{agent_id}:{timestamp}:{query_id}` | Empty | Agent index for temporal queries |
|
||||||
//!
|
//!
|
||||||
//! # Design Philosophy
|
//! # Design Philosophy
|
||||||
//!
|
//!
|
||||||
@ -54,8 +54,8 @@ pub trait AuditStore: Send + Sync {
|
|||||||
///
|
///
|
||||||
/// This operation:
|
/// This operation:
|
||||||
/// 1. Serializes the audit using rkyv
|
/// 1. Serializes the audit using rkyv
|
||||||
/// 2. Stores at `AUD:{query_id}`
|
/// 2. Stores at `\x00AUD:{query_id}`
|
||||||
/// 3. Creates agent index entry at `AUDA:{agent_id}:{timestamp}:{query_id}`
|
/// 3. Creates agent index entry at `\x00AUDA:{agent_id}:{timestamp}:{query_id}`
|
||||||
///
|
///
|
||||||
/// # Returns
|
/// # Returns
|
||||||
/// The query_id for reference.
|
/// The query_id for reference.
|
||||||
@ -89,7 +89,7 @@ pub trait AuditStore: Send + Sync {
|
|||||||
|
|
||||||
/// List recent audit records across all agents.
|
/// List recent audit records across all agents.
|
||||||
///
|
///
|
||||||
/// Scans all `AUD:` keys and returns the most recent audits.
|
/// Scans all `\x00AUD:` keys and returns the most recent audits.
|
||||||
///
|
///
|
||||||
/// # Arguments
|
/// # Arguments
|
||||||
/// * `limit` - Maximum number of records to return
|
/// * `limit` - Maximum number of records to return
|
||||||
@ -105,7 +105,8 @@ pub trait AuditStore: Send + Sync {
|
|||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::SledStore;
|
use crate::HybridStore;
|
||||||
|
use std::sync::Arc;
|
||||||
use stemedb_core::types::{ContributingAssertion, LifecycleStage};
|
use stemedb_core::types::{ContributingAssertion, LifecycleStage};
|
||||||
|
|
||||||
fn create_test_audit(
|
fn create_test_audit(
|
||||||
@ -137,7 +138,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_put_and_get_audit() {
|
async fn test_put_and_get_audit() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let audit_store = GenericAuditStore::new(store);
|
let audit_store = GenericAuditStore::new(store);
|
||||||
|
|
||||||
let query_id = [10u8; 32];
|
let query_id = [10u8; 32];
|
||||||
@ -161,7 +162,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_get_audits_for_agent() {
|
async fn test_get_audits_for_agent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let audit_store = GenericAuditStore::new(store);
|
let audit_store = GenericAuditStore::new(store);
|
||||||
|
|
||||||
let agent1 = [1u8; 32];
|
let agent1 = [1u8; 32];
|
||||||
@ -201,7 +202,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_list_recent_audits() {
|
async fn test_list_recent_audits() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let audit_store = GenericAuditStore::new(store);
|
let audit_store = GenericAuditStore::new(store);
|
||||||
|
|
||||||
// Create audits with different timestamps
|
// Create audits with different timestamps
|
||||||
@ -224,7 +225,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_audit_without_agent() {
|
async fn test_audit_without_agent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let audit_store = GenericAuditStore::new(store);
|
let audit_store = GenericAuditStore::new(store);
|
||||||
|
|
||||||
// Audit without agent_id (anonymous query)
|
// Audit without agent_id (anonymous query)
|
||||||
@ -241,7 +242,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_has_audits_for_agent() {
|
async fn test_has_audits_for_agent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let audit_store = GenericAuditStore::new(store);
|
let audit_store = GenericAuditStore::new(store);
|
||||||
|
|
||||||
let agent1 = [1u8; 32];
|
let agent1 = [1u8; 32];
|
||||||
@ -262,7 +263,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_get_nonexistent_audit() {
|
async fn test_get_nonexistent_audit() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let audit_store = GenericAuditStore::new(store);
|
let audit_store = GenericAuditStore::new(store);
|
||||||
|
|
||||||
let nonexistent = [99u8; 32];
|
let nonexistent = [99u8; 32];
|
||||||
@ -273,7 +274,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_empty_agent_audits() {
|
async fn test_empty_agent_audits() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let audit_store = GenericAuditStore::new(store);
|
let audit_store = GenericAuditStore::new(store);
|
||||||
|
|
||||||
let agent = [1u8; 32];
|
let agent = [1u8; 32];
|
||||||
|
|||||||
@ -1,6 +1,7 @@
|
|||||||
//! AuditStore implementation backed by a generic KVStore.
|
//! AuditStore implementation backed by a generic KVStore.
|
||||||
|
|
||||||
use crate::error::Result;
|
use crate::error::Result;
|
||||||
|
use crate::key_codec;
|
||||||
use crate::traits::KVStore;
|
use crate::traits::KVStore;
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use stemedb_core::types::{QueryAudit, QueryId};
|
use stemedb_core::types::{QueryAudit, QueryId};
|
||||||
@ -8,12 +9,6 @@ use tracing::{debug, instrument};
|
|||||||
|
|
||||||
use super::AuditStore;
|
use super::AuditStore;
|
||||||
|
|
||||||
/// Key prefix for individual audit records.
|
|
||||||
pub(crate) const AUDIT_PREFIX: &[u8] = b"AUD:";
|
|
||||||
/// Key prefix for agent-based temporal index.
|
|
||||||
#[allow(dead_code)] // Documented for reference; actual key construction uses format!()
|
|
||||||
pub(crate) const AGENT_AUDIT_PREFIX: &[u8] = b"AUDA:";
|
|
||||||
|
|
||||||
/// AuditStore implementation backed by a generic KVStore.
|
/// AuditStore implementation backed by a generic KVStore.
|
||||||
///
|
///
|
||||||
/// This implementation maintains an agent index for efficient temporal queries.
|
/// This implementation maintains an agent index for efficient temporal queries.
|
||||||
@ -29,9 +24,8 @@ impl<S: KVStore> GenericAuditStore<S> {
|
|||||||
|
|
||||||
/// Construct the key for an individual audit record.
|
/// Construct the key for an individual audit record.
|
||||||
pub(crate) fn audit_key(query_id: &QueryId) -> Vec<u8> {
|
pub(crate) fn audit_key(query_id: &QueryId) -> Vec<u8> {
|
||||||
let mut key = AUDIT_PREFIX.to_vec();
|
let query_hex = hex::encode(query_id);
|
||||||
key.extend_from_slice(&hex::encode(query_id).into_bytes());
|
key_codec::audit_key(&query_hex)
|
||||||
key
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Construct the agent index key.
|
/// Construct the agent index key.
|
||||||
@ -40,26 +34,16 @@ impl<S: KVStore> GenericAuditStore<S> {
|
|||||||
timestamp: u64,
|
timestamp: u64,
|
||||||
query_id: &QueryId,
|
query_id: &QueryId,
|
||||||
) -> Vec<u8> {
|
) -> Vec<u8> {
|
||||||
// Format: AUDA:{agent_hex}:{timestamp_be}:{query_hex}
|
|
||||||
// Using big-endian timestamp for lexicographic ordering
|
|
||||||
let agent_hex = hex::encode(agent_id);
|
let agent_hex = hex::encode(agent_id);
|
||||||
let timestamp_hex = format!("{:016x}", timestamp); // Zero-padded hex for sorting
|
let timestamp_hex = format!("{:016x}", timestamp);
|
||||||
let query_hex = hex::encode(query_id);
|
let query_hex = hex::encode(query_id);
|
||||||
format!("AUDA:{}:{}:{}", agent_hex, timestamp_hex, query_hex).into_bytes()
|
key_codec::audit_agent_index_key(&agent_hex, &timestamp_hex, &query_hex)
|
||||||
}
|
|
||||||
|
|
||||||
/// Construct the prefix for scanning an agent's audits from a timestamp.
|
|
||||||
#[allow(dead_code)] // Reserved for future optimized range queries
|
|
||||||
pub(crate) fn agent_scan_prefix(agent_id: &[u8; 32], from_timestamp: u64) -> Vec<u8> {
|
|
||||||
let agent_hex = hex::encode(agent_id);
|
|
||||||
let timestamp_hex = format!("{:016x}", from_timestamp);
|
|
||||||
format!("AUDA:{}:{}", agent_hex, timestamp_hex).into_bytes()
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Construct the prefix for scanning all audits for an agent.
|
/// Construct the prefix for scanning all audits for an agent.
|
||||||
pub(crate) fn agent_full_prefix(agent_id: &[u8; 32]) -> Vec<u8> {
|
pub(crate) fn agent_full_prefix(agent_id: &[u8; 32]) -> Vec<u8> {
|
||||||
let agent_hex = hex::encode(agent_id);
|
let agent_hex = hex::encode(agent_id);
|
||||||
format!("AUDA:{}:", agent_hex).into_bytes()
|
key_codec::audit_agent_prefix(&agent_hex)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Serialize an audit using the canonical serde helpers.
|
/// Serialize an audit using the canonical serde helpers.
|
||||||
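The agent index key above formats the timestamp with `{:016x}`. A short sketch of why the fixed width matters for prefix scans: zero-padded hex strings sort lexicographically in the same order as the numbers they encode, while unpadded ones do not. The function name `timestamp_hex` is only illustrative.

```rust
/// Encode a timestamp as 16 zero-padded lowercase hex digits,
/// the same format string used for the agent index key.
fn timestamp_hex(ts: u64) -> String {
    format!("{:016x}", ts)
}

/// True if byte-wise (lexicographic) order of the encoded keys
/// agrees with numeric order of the timestamps.
fn order_preserved(a: u64, b: u64) -> bool {
    a.cmp(&b) == timestamp_hex(a).cmp(&timestamp_hex(b))
}
```

Without padding, `format!("{:x}", 9)` yields `"9"`, which sorts after `"10"` (the encoding of 16), so a range scan over unpadded keys would return entries out of temporal order.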
@ -74,9 +58,13 @@ impl<S: KVStore> GenericAuditStore<S> {
|
|||||||
|
|
||||||
/// Extract query_id from an agent index key.
|
/// Extract query_id from an agent index key.
|
||||||
pub(crate) fn extract_query_id_from_key(key: &[u8]) -> Option<QueryId> {
|
pub(crate) fn extract_query_id_from_key(key: &[u8]) -> Option<QueryId> {
|
||||||
// Key format: AUDA:{agent_hex}:{timestamp_hex}:{query_hex}
|
// Key format: \x00AUDA:{agent_hex}:{timestamp_hex}:{query_hex}
|
||||||
let key_str = std::str::from_utf8(key).ok()?;
|
let key_str = std::str::from_utf8(key).ok()?;
|
||||||
let parts: Vec<&str> = key_str.split(':').collect();
|
|
||||||
|
// Skip the leading \x00 if present
|
||||||
|
let key_content = key_str.strip_prefix('\x00').unwrap_or(key_str);
|
||||||
|
|
||||||
|
let parts: Vec<&str> = key_content.split(':').collect();
|
||||||
if parts.len() != 4 {
|
if parts.len() != 4 {
|
||||||
return None;
|
return None;
|
||||||
}
|
}
|
||||||
@ -92,8 +80,13 @@ impl<S: KVStore> GenericAuditStore<S> {
|
|||||||
|
|
||||||
/// Extract timestamp from an agent index key.
|
/// Extract timestamp from an agent index key.
|
||||||
pub(crate) fn extract_timestamp_from_key(key: &[u8]) -> Option<u64> {
|
pub(crate) fn extract_timestamp_from_key(key: &[u8]) -> Option<u64> {
|
||||||
|
// Key format: \x00AUDA:{agent_hex}:{timestamp_hex}:{query_hex}
|
||||||
let key_str = std::str::from_utf8(key).ok()?;
|
let key_str = std::str::from_utf8(key).ok()?;
|
||||||
let parts: Vec<&str> = key_str.split(':').collect();
|
|
||||||
|
// Skip the leading \x00 if present
|
||||||
|
let key_content = key_str.strip_prefix('\x00').unwrap_or(key_str);
|
||||||
|
|
||||||
|
let parts: Vec<&str> = key_content.split(':').collect();
|
||||||
if parts.len() != 4 {
|
if parts.len() != 4 {
|
||||||
return None;
|
return None;
|
||||||
}
|
}
|
||||||
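The two extractors above share the same parsing steps: decode as UTF-8, drop an optional leading `\x00`, split on `:`, and require four parts. The sketch below combines them in one standalone function. It is an assumption-laden illustration: the name `parse_agent_index_key`, the `AUDA` tag check, and the base-16 timestamp parse are not shown in the hunks and are inferred from the documented key format `\x00AUDA:{agent_hex}:{timestamp_hex}:{query_hex}`.

```rust
/// Parse an agent index key into (timestamp, query_id_hex).
/// Accepts keys with or without the leading `\x00` byte.
fn parse_agent_index_key(key: &[u8]) -> Option<(u64, String)> {
    let key_str = std::str::from_utf8(key).ok()?;

    // Skip the leading \x00 if present, as the diff does.
    let content = key_str.strip_prefix('\x00').unwrap_or(key_str);

    let parts: Vec<&str> = content.split(':').collect();
    // Assumption: the first segment is the literal tag "AUDA".
    if parts.len() != 4 || parts[0] != "AUDA" {
        return None;
    }

    // Assumption: the timestamp segment is hex, as written by `{:016x}`.
    let timestamp = u64::from_str_radix(parts[2], 16).ok()?;
    Some((timestamp, parts[3].to_string()))
}
```

Because hex segments never contain `:`, splitting on the separator is unambiguous for keys built this way.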
@ -202,7 +195,8 @@ impl<S: KVStore + 'static> AuditStore for GenericAuditStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self), fields(limit))]
|
#[instrument(skip(self), fields(limit))]
|
||||||
async fn list_recent_audits(&self, limit: usize) -> Result<Vec<QueryAudit>> {
|
async fn list_recent_audits(&self, limit: usize) -> Result<Vec<QueryAudit>> {
|
||||||
let entries = self.store.scan_prefix(AUDIT_PREFIX).await?;
|
let prefix = key_codec::audit_scan_prefix();
|
||||||
|
let entries = self.store.scan_prefix(&prefix).await?;
|
||||||
|
|
||||||
let mut audits = Vec::with_capacity(entries.len().min(limit));
|
let mut audits = Vec::with_capacity(entries.len().min(limit));
|
||||||
|
|
||||||
|
|||||||
@ -10,9 +10,9 @@ pub enum StorageError {
|
|||||||
#[error("Storage IO error: {0}")]
|
#[error("Storage IO error: {0}")]
|
||||||
Io(#[from] std::io::Error),
|
Io(#[from] std::io::Error),
|
||||||
|
|
||||||
/// Error specific to the sled backend.
|
/// Error from the underlying storage backend (fjall, redb, etc.).
|
||||||
#[error("Sled error: {0}")]
|
#[error("Backend error: {0}")]
|
||||||
Sled(#[from] sled::Error),
|
Backend(String),
|
||||||
|
|
||||||
/// Serialization/Deserialization error.
|
/// Serialization/Deserialization error.
|
||||||
#[error("Serialization error: {0}")]
|
#[error("Serialization error: {0}")]
|
||||||
|
|||||||
@ -4,15 +4,13 @@
|
|||||||
//! time-range queries. External systems can poll for pending escalations and
|
//! time-range queries. External systems can poll for pending escalations and
|
||||||
//! resolve them after review.
|
//! resolve them after review.
|
||||||
|
|
||||||
|
use crate::key_codec;
|
||||||
use crate::{KVStore, Result, StorageError};
|
use crate::{KVStore, Result, StorageError};
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use stemedb_core::types::EscalationEvent;
|
use stemedb_core::types::EscalationEvent;
|
||||||
use tracing::{debug, instrument};
|
use tracing::{debug, instrument};
|
||||||
|
|
||||||
/// Key prefix for escalation events.
|
|
||||||
const ESC_PREFIX: &[u8] = b"ESC:";
|
|
||||||
|
|
||||||
/// Storage trait for escalation events.
|
/// Storage trait for escalation events.
|
||||||
///
|
///
|
||||||
/// Provides operations for writing, reading, and resolving escalations triggered
|
/// Provides operations for writing, reading, and resolving escalations triggered
|
||||||
@ -56,16 +54,13 @@ impl<S: KVStore> GenericEscalationStore<S> {
|
|||||||
Self { store }
|
Self { store }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Construct the storage key for an escalation event.
|
|
||||||
///
|
|
||||||
/// Format: `ESC:{timestamp_nanos}:{id_hex}`
|
|
||||||
fn escalation_key(event: &EscalationEvent) -> Vec<u8> {
|
|
||||||
format!("ESC:{}:{}", event.timestamp, hex::encode(event.id)).into_bytes()
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Parse a key into (timestamp, id).
|
/// Parse a key into (timestamp, id).
|
||||||
|
///
|
||||||
|
/// Key format: `\x00ESC:{timestamp}:{id_hex}`
|
||||||
fn parse_key(key: &[u8]) -> Option<(u64, [u8; 32])> {
|
fn parse_key(key: &[u8]) -> Option<(u64, [u8; 32])> {
|
||||||
let key_str = std::str::from_utf8(key).ok()?;
|
let key_str = std::str::from_utf8(key).ok()?;
|
||||||
|
// Remove the leading \x00 if present
|
||||||
|
let key_str = key_str.strip_prefix('\x00').unwrap_or(key_str);
|
||||||
let parts: Vec<&str> = key_str.split(':').collect();
|
let parts: Vec<&str> = key_str.split(':').collect();
|
||||||
if parts.len() != 3 || parts[0] != "ESC" {
|
if parts.len() != 3 || parts[0] != "ESC" {
|
||||||
return None;
|
return None;
|
||||||
@ -88,7 +83,7 @@ impl<S: KVStore> GenericEscalationStore<S> {
|
|||||||
impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
|
impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
|
||||||
#[instrument(skip(self, event), fields(id = %hex::encode(event.id), subject = %event.subject, predicate = %event.predicate))]
|
#[instrument(skip(self, event), fields(id = %hex::encode(event.id), subject = %event.subject, predicate = %event.predicate))]
|
||||||
async fn write_escalation(&self, event: &EscalationEvent) -> Result<()> {
|
async fn write_escalation(&self, event: &EscalationEvent) -> Result<()> {
|
||||||
let key = Self::escalation_key(event);
|
let key = key_codec::escalation_key(event.timestamp, &hex::encode(event.id));
|
||||||
let serialized = stemedb_core::serde::serialize(event)
|
let serialized = stemedb_core::serde::serialize(event)
|
||||||
.map_err(|e| StorageError::Serialization(e.to_string()))?;
|
.map_err(|e| StorageError::Serialization(e.to_string()))?;
|
||||||
|
|
||||||
@ -109,7 +104,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
|
|||||||
#[instrument(skip(self))]
|
#[instrument(skip(self))]
|
||||||
async fn get_escalations_since(&self, since: u64) -> Result<Vec<EscalationEvent>> {
|
async fn get_escalations_since(&self, since: u64) -> Result<Vec<EscalationEvent>> {
|
||||||
// Scan all escalation keys and filter by timestamp
|
// Scan all escalation keys and filter by timestamp
|
||||||
let entries = self.store.scan_prefix(ESC_PREFIX).await?;
|
let entries = self.store.scan_prefix(&key_codec::escalation_scan_prefix()).await?;
|
||||||
|
|
||||||
let mut events = Vec::new();
|
let mut events = Vec::new();
|
||||||
for (key, data) in entries {
|
for (key, data) in entries {
|
||||||
@ -138,7 +133,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
|
|||||||
#[instrument(skip(self), fields(id = %hex::encode(id)))]
|
#[instrument(skip(self), fields(id = %hex::encode(id)))]
|
||||||
async fn resolve_escalation(&self, id: &[u8; 32]) -> Result<bool> {
|
async fn resolve_escalation(&self, id: &[u8; 32]) -> Result<bool> {
|
||||||
// Scan for the event with this ID
|
// Scan for the event with this ID
|
||||||
let entries = self.store.scan_prefix(ESC_PREFIX).await?;
|
let entries = self.store.scan_prefix(&key_codec::escalation_scan_prefix()).await?;
|
||||||
|
|
||||||
for (key, data) in entries {
|
for (key, data) in entries {
|
||||||
if let Some((_timestamp, found_id)) = Self::parse_key(&key) {
|
if let Some((_timestamp, found_id)) = Self::parse_key(&key) {
|
||||||
@ -176,7 +171,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self))]
|
#[instrument(skip(self))]
|
||||||
async fn get_pending_escalations(&self) -> Result<Vec<EscalationEvent>> {
|
async fn get_pending_escalations(&self) -> Result<Vec<EscalationEvent>> {
|
||||||
let entries = self.store.scan_prefix(ESC_PREFIX).await?;
|
let entries = self.store.scan_prefix(&key_codec::escalation_scan_prefix()).await?;
|
||||||
|
|
||||||
let mut events = Vec::new();
|
let mut events = Vec::new();
|
||||||
for (_key, data) in entries {
|
for (_key, data) in entries {
|
||||||
@ -199,7 +194,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self), fields(id = %hex::encode(id)))]
|
#[instrument(skip(self), fields(id = %hex::encode(id)))]
|
||||||
async fn get_escalation(&self, id: &[u8; 32]) -> Result<Option<EscalationEvent>> {
|
async fn get_escalation(&self, id: &[u8; 32]) -> Result<Option<EscalationEvent>> {
|
||||||
let entries = self.store.scan_prefix(ESC_PREFIX).await?;
|
let entries = self.store.scan_prefix(&key_codec::escalation_scan_prefix()).await?;
|
||||||
|
|
||||||
for (key, data) in entries {
|
for (key, data) in entries {
|
||||||
if let Some((_timestamp, found_id)) = Self::parse_key(&key) {
|
if let Some((_timestamp, found_id)) = Self::parse_key(&key) {
|
||||||
@ -218,7 +213,7 @@ impl<S: KVStore + 'static> EscalationStore for GenericEscalationStore<S> {
|
|||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::SledStore;
|
use crate::HybridStore;
|
||||||
use stemedb_core::types::{EscalationEvent, EscalationLevel};
|
use stemedb_core::types::{EscalationEvent, EscalationLevel};
|
||||||
|
|
||||||
fn create_event(
|
fn create_event(
|
||||||
@ -243,7 +238,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_write_and_get_escalation() {
|
async fn test_write_and_get_escalation() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let esc_store = GenericEscalationStore::new(store);
|
let esc_store = GenericEscalationStore::new(store);
|
||||||
|
|
||||||
let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
||||||
@ -256,7 +251,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_get_escalations_since() {
|
async fn test_get_escalations_since() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let esc_store = GenericEscalationStore::new(store);
|
let esc_store = GenericEscalationStore::new(store);
|
||||||
|
|
||||||
let e1 = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
let e1 = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
||||||
@ -277,7 +272,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_resolve_escalation() {
|
async fn test_resolve_escalation() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let esc_store = GenericEscalationStore::new(store);
|
let esc_store = GenericEscalationStore::new(store);
|
||||||
|
|
||||||
let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
||||||
@ -303,7 +298,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_get_pending_escalations() {
|
async fn test_get_pending_escalations() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let esc_store = GenericEscalationStore::new(store);
|
let esc_store = GenericEscalationStore::new(store);
|
||||||
|
|
||||||
let e1 = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
let e1 = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
||||||
@ -327,7 +322,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_resolve_nonexistent_escalation() {
|
async fn test_resolve_nonexistent_escalation() {
|
||||||
let store = Arc::new(SledStore::open_temp().expect("store"));
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let esc_store = GenericEscalationStore::new(store);
|
let esc_store = GenericEscalationStore::new(store);
|
||||||
|
|
||||||
let nonexistent_id = [42u8; 32];
|
let nonexistent_id = [42u8; 32];
|
||||||
@ -338,10 +333,10 @@ mod tests {
|
|||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_parse_key() {
|
async fn test_parse_key() {
|
||||||
let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
let event = create_event("Tesla", "revenue", 0.85, 1000, EscalationLevel::High);
|
||||||
let key = GenericEscalationStore::<SledStore>::escalation_key(&event);
|
let key = key_codec::escalation_key(event.timestamp, &hex::encode(event.id));
|
||||||
|
|
||||||
let (timestamp, id) =
|
let (timestamp, id) =
|
||||||
GenericEscalationStore::<SledStore>::parse_key(&key).expect("parse should succeed");
|
GenericEscalationStore::<HybridStore>::parse_key(&key).expect("parse should succeed");
|
||||||
|
|
||||||
assert_eq!(timestamp, 1000);
|
assert_eq!(timestamp, 1000);
|
||||||
assert_eq!(id, event.id);
|
assert_eq!(id, event.id);
|
||||||
|
|||||||
213
crates/stemedb-storage/src/fjall_backend.rs
Normal file
@@ -0,0 +1,213 @@
+use crate::error::{Result, StorageError};
+use crate::traits::KVStore;
+use async_trait::async_trait;
+use dashmap::DashMap;
+use std::path::Path;
+use std::sync::Arc;
+use tracing::instrument;
+
+fn fjall_err(e: fjall::Error) -> StorageError {
+    StorageError::Backend(e.to_string())
+}
+
+/// Fjall (LSM-tree) implementation of the KVStore trait.
+///
+/// Used for write-heavy key prefixes: assertions (`H:`), votes (`V:`, `VC:`, `VW:`),
+/// epochs (`E:`), supersession markers (`SUPERSEDED:`), and ingestion cursors (`__CURSOR__:`).
+pub struct FjallStore {
+    keyspace: fjall::Keyspace,
+    partition: fjall::PartitionHandle,
+    atomic_locks: Arc<DashMap<Vec<u8>, Arc<tokio::sync::Mutex<()>>>>,
+    _temp_dir: Option<tempfile::TempDir>,
+}
+
+impl std::fmt::Debug for FjallStore {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("FjallStore").finish()
+    }
+}
+
+impl FjallStore {
+    /// Open or create a Fjall database at the given path.
+    #[instrument(skip_all)]
+    pub fn open(path: impl AsRef<Path>) -> Result<Self> {
+        let keyspace = fjall::Config::new(path.as_ref()).open().map_err(fjall_err)?;
+        let partition = keyspace
+            .open_partition("default", fjall::PartitionCreateOptions::default())
+            .map_err(fjall_err)?;
+        Ok(Self { keyspace, partition, atomic_locks: Arc::new(DashMap::new()), _temp_dir: None })
+    }
+
+    /// Open a temporary Fjall database for testing.
+    ///
+    /// The database will be automatically deleted when the returned store is dropped.
+    pub fn open_temp() -> Result<Self> {
+        let temp_dir = tempfile::tempdir().map_err(StorageError::Io)?;
+        let keyspace = fjall::Config::new(temp_dir.path()).open().map_err(fjall_err)?;
+        let partition = keyspace
+            .open_partition("default", fjall::PartitionCreateOptions::default())
+            .map_err(fjall_err)?;
+        Ok(Self {
+            keyspace,
+            partition,
+            atomic_locks: Arc::new(DashMap::new()),
+            _temp_dir: Some(temp_dir),
+        })
+    }
+}
+
+#[async_trait]
+impl KVStore for FjallStore {
+    #[instrument(skip_all, fields(key_len = key.len()))]
+    async fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> {
+        let result = self.partition.get(key).map_err(fjall_err)?;
+        Ok(result.map(|slice| slice.to_vec()))
+    }
+
+    #[instrument(skip_all, fields(key_len = key.len(), value_len = value.len()))]
+    async fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
+        self.partition.insert(key, value).map_err(fjall_err)?;
+        Ok(())
+    }
+
+    #[instrument(skip_all, fields(key_len = key.len()))]
+    async fn delete(&self, key: &[u8]) -> Result<()> {
+        self.partition.remove(key).map_err(fjall_err)?;
+        Ok(())
+    }
+
+    #[instrument(skip_all, fields(prefix_len = prefix.len()))]
+    async fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
+        let mut results = Vec::new();
+        for item in self.partition.prefix(prefix) {
+            let (k, v) = item.map_err(fjall_err)?;
+            results.push((k.to_vec(), v.to_vec()));
+        }
+        Ok(results)
+    }
+
+    #[instrument(skip_all)]
+    async fn flush(&self) -> Result<()> {
+        self.keyspace.persist(fjall::PersistMode::SyncAll).map_err(fjall_err)?;
+        Ok(())
+    }
+
+    #[instrument(skip_all, fields(key_len = key.len(), delta))]
+    async fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64> {
+        let lock = self
+            .atomic_locks
+            .entry(key.to_vec())
+            .or_insert_with(|| Arc::new(tokio::sync::Mutex::new(())))
+            .clone();
+        let _guard = lock.lock().await;
+
+        let current = match self.partition.get(key).map_err(fjall_err)? {
+            Some(bytes) => {
+                let arr: [u8; 8] = bytes.as_ref().try_into().map_err(|_| {
+                    StorageError::Serialization(format!(
+                        "Corrupted u64 counter: expected 8 bytes, got {}",
+                        bytes.len()
+                    ))
+                })?;
+                u64::from_le_bytes(arr)
+            }
+            None => 0,
+        };
+        let new_val = current.saturating_add(delta);
+        self.partition.insert(key, new_val.to_le_bytes()).map_err(fjall_err)?;
+        Ok(new_val)
+    }
+
+    #[instrument(skip_all, fields(key_len = key.len()))]
+    async fn compare_and_swap_f32<F>(&self, key: &[u8], update_fn: F) -> Result<f32>
+    where
+        F: Fn(f32) -> f32 + Send + Sync,
+    {
+        let lock = self
+            .atomic_locks
+            .entry(key.to_vec())
+            .or_insert_with(|| Arc::new(tokio::sync::Mutex::new(())))
+            .clone();
+        let _guard = lock.lock().await;
+
+        let current = match self.partition.get(key).map_err(fjall_err)? {
+            Some(bytes) => {
+                let arr: [u8; 4] = bytes.as_ref().try_into().map_err(|_| {
+                    StorageError::Serialization(format!(
+                        "Corrupted f32 value: expected 4 bytes, got {}",
+                        bytes.len()
+                    ))
+                })?;
+                f32::from_le_bytes(arr)
+            }
+            None => 0.0,
+        };
+        let new_val = update_fn(current);
+        self.partition.insert(key, new_val.to_le_bytes()).map_err(fjall_err)?;
+        Ok(new_val)
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[tokio::test]
+    async fn test_fjall_store_roundtrip() {
+        let store = FjallStore::open_temp().expect("Failed to create temp DB");
+        let key = b"test_key";
+        let value = b"test_value";
+
+        store.put(key, value).await.expect("Put failed");
+        let retrieved = store.get(key).await.expect("Get failed");
+        assert_eq!(retrieved, Some(value.to_vec()));
+
+        store.delete(key).await.expect("Delete failed");
+        let deleted = store.get(key).await.expect("Get failed");
+        assert_eq!(deleted, None);
+    }
+
+    #[tokio::test]
+    async fn test_fjall_scan_prefix() {
+        let store = FjallStore::open_temp().expect("Failed to create temp DB");
+        store.put(b"prefix:1", b"val1").await.unwrap();
+        store.put(b"prefix:2", b"val2").await.unwrap();
+        store.put(b"other:3", b"val3").await.unwrap();
+
+        let results = store.scan_prefix(b"prefix:").await.unwrap();
+        assert_eq!(results.len(), 2);
+        assert_eq!(results[0], (b"prefix:1".to_vec(), b"val1".to_vec()));
+        assert_eq!(results[1], (b"prefix:2".to_vec(), b"val2".to_vec()));
+    }
+
+    #[tokio::test]
+    async fn test_fjall_fetch_and_add() {
+        let store = FjallStore::open_temp().expect("Failed to create temp DB");
+        let key = b"counter";
+
+        let val = store.fetch_and_add_u64(key, 5).await.unwrap();
+        assert_eq!(val, 5);
+
+        let val = store.fetch_and_add_u64(key, 3).await.unwrap();
+        assert_eq!(val, 8);
+    }
+
+    #[tokio::test]
+    async fn test_fjall_compare_and_swap_f32() {
+        let store = FjallStore::open_temp().expect("Failed to create temp DB");
+        let key = b"weight";
+
+        let val = store.compare_and_swap_f32(key, |current| current + 1.5).await.unwrap();
+        assert!((val - 1.5).abs() < f32::EPSILON);
+
+        let val = store.compare_and_swap_f32(key, |current| current + 2.0).await.unwrap();
+        assert!((val - 3.5).abs() < f32::EPSILON);
+    }
+
+    #[tokio::test]
+    async fn test_fjall_flush() {
+        let store = FjallStore::open_temp().expect("Failed to create temp DB");
+        store.put(b"key", b"value").await.unwrap();
+        store.flush().await.expect("Flush should succeed");
+    }
+}
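Note on the per-key locking in `fetch_and_add_u64` above: the read-modify-write is serialized by a lock looked up per key, so updates to different keys do not block each other. The following is a minimal, std-only sketch of that pattern, not the fjall-backed code. `CounterStore` and its in-memory `HashMap` are hypothetical stand-ins, and it uses blocking `std::sync::Mutex` where the diff uses `tokio::sync::Mutex` and `DashMap`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical in-memory stand-in for a key-value store (not the fjall API).
#[derive(Default)]
struct CounterStore {
    data: Mutex<HashMap<Vec<u8>, Vec<u8>>>,
    /// One lock per key, created lazily on first use.
    locks: Mutex<HashMap<Vec<u8>, Arc<Mutex<()>>>>,
}

impl CounterStore {
    fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64, String> {
        // Clone the per-key lock out of the registry so the registry itself
        // is unlocked again before we wait on the key.
        let lock = {
            let mut locks = self.locks.lock().unwrap();
            locks.entry(key.to_vec()).or_default().clone()
        };
        let _guard = lock.lock().unwrap();

        // Read the current value; a missing key counts as zero.
        let existing = self.data.lock().unwrap().get(key).cloned();
        let current = match existing {
            Some(bytes) => {
                if bytes.len() != 8 {
                    return Err(format!("expected 8 bytes, got {}", bytes.len()));
                }
                let mut arr = [0u8; 8];
                arr.copy_from_slice(&bytes);
                u64::from_le_bytes(arr)
            }
            None => 0,
        };

        // Saturate instead of wrapping, then store as little-endian bytes.
        let new_val = current.saturating_add(delta);
        self.data
            .lock()
            .unwrap()
            .insert(key.to_vec(), new_val.to_le_bytes().to_vec());
        Ok(new_val)
    }
}

fn main() {
    let store = Arc::new(CounterStore::default());
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                for _ in 0..100 {
                    store.fetch_and_add_u64(b"counter", 1).unwrap();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Adding zero reads back the total accumulated by all threads.
    println!("{:?}", store.fetch_and_add_u64(b"counter", 0));
}
```

In this sketch the data map is unlocked between the read and the write, so the per-key lock is what keeps concurrent increments from being lost. The lock registry only grows; whether that matters for the real store depends on how many distinct counter keys exist, which the diff does not say.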
@@ -1,17 +1,14 @@
 //! Storage for gold standard assertions.
 //!
-//! Gold standards are stored at `GS:{subject}:{predicate}` to enable efficient
-//! lookups when verifying agent submissions against known truths.
+//! Gold standards are stored at `{subject}\x00GS:{predicate}` with a secondary
+//! index at `\x00GS_LIST:{subject}:{predicate}` for listing all gold standards.

-use crate::{KVStore, Result, StorageError};
+use crate::{key_codec, KVStore, Result, StorageError};
 use async_trait::async_trait;
 use std::sync::Arc;
 use stemedb_core::types::GoldStandard;
 use tracing::{debug, instrument};

-/// Key prefix for gold standard entries.
-const GS_PREFIX: &[u8] = b"GS:";
-
 /// Storage trait for gold standard operations.
 ///
 /// Provides operations for creating, reading, listing, and removing gold standards
@@ -71,25 +68,23 @@ impl<S: KVStore> GenericGoldStandardStore<S> {
     pub fn new(store: Arc<S>) -> Self {
         Self { store }
     }
-
-    /// Construct the storage key for a gold standard.
-    ///
-    /// Format: `GS:{subject}:{predicate}`
-    fn gold_standard_key(subject: &str, predicate: &str) -> Vec<u8> {
-        format!("GS:{}:{}", subject, predicate).into_bytes()
-    }
 }

 #[async_trait]
 impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {
     #[instrument(skip(self, gs), fields(subject = %gs.subject, predicate = %gs.predicate))]
     async fn set_gold_standard(&self, gs: &GoldStandard) -> Result<()> {
-        let key = Self::gold_standard_key(&gs.subject, &gs.predicate);
+        let key = key_codec::gold_standard_key(&gs.subject, &gs.predicate);
+        let list_key = key_codec::gs_list_key(&gs.subject, &gs.predicate);
         let serialized = stemedb_core::serde::serialize(gs)
             .map_err(|e| StorageError::Serialization(e.to_string()))?;

+        // Write primary key
         self.store.put(&key, &serialized).await?;

+        // Write secondary index for listing (empty value, just presence matters)
+        self.store.put(&list_key, &[]).await?;
+
         debug!(
             subject = %gs.subject,
             predicate = %gs.predicate,
@@ -106,7 +101,7 @@ impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {
         subject: &str,
         predicate: &str,
     ) -> Result<Option<GoldStandard>> {
-        let key = Self::gold_standard_key(subject, predicate);
+        let key = key_codec::gold_standard_key(subject, predicate);

         match self.store.get(&key).await? {
             Some(data) => {
@@ -135,14 +130,31 @@ impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {

     #[instrument(skip(self))]
     async fn list_gold_standards(&self) -> Result<Vec<GoldStandard>> {
-        let entries = self.store.scan_prefix(GS_PREFIX).await?;
+        // Scan the GS_LIST secondary index
+        let list_entries = self.store.scan_prefix(&key_codec::gs_list_scan_prefix()).await?;

         let mut gold_standards = Vec::new();
-        for (_key, data) in entries {
-            match stemedb_core::serde::deserialize::<GoldStandard>(&data) {
-                Ok(gs) => gold_standards.push(gs),
-                Err(e) => {
-                    debug!(error = %e, "Skipping malformed gold standard");
+        for (list_key, _) in list_entries {
+            // Extract subject and predicate from GS_LIST key: \x00GS_LIST:{subject}:{predicate}
+            let tag = key_codec::extract_tag(&list_key);
+            if let Some(suffix) = tag.strip_prefix(b"GS_LIST:") {
+                if let Ok(suffix_str) = std::str::from_utf8(suffix) {
+                    // Split by first colon to get subject and predicate
+                    if let Some(colon_pos) = suffix_str.find(':') {
+                        let subject = &suffix_str[..colon_pos];
+                        let predicate = &suffix_str[colon_pos + 1..];
+
+                        // Fetch the actual gold standard from the primary key
+                        let key = key_codec::gold_standard_key(subject, predicate);
+                        if let Some(data) = self.store.get(&key).await? {
+                            match stemedb_core::serde::deserialize::<GoldStandard>(&data) {
+                                Ok(gs) => gold_standards.push(gs),
+                                Err(e) => {
+                                    debug!(error = %e, subject = %subject, predicate = %predicate, "Skipping malformed gold standard");
+                                }
+                            }
+                        }
+                    }
                 }
             }
         }
@@ -158,13 +170,16 @@ impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {

     #[instrument(skip(self), fields(subject = %subject, predicate = %predicate))]
     async fn remove_gold_standard(&self, subject: &str, predicate: &str) -> Result<bool> {
-        let key = Self::gold_standard_key(subject, predicate);
+        let key = key_codec::gold_standard_key(subject, predicate);
+        let list_key = key_codec::gs_list_key(subject, predicate);

         // Check if it exists first
         let exists = self.store.get(&key).await?.is_some();

         if exists {
+            // Delete both primary key and secondary index
             self.store.delete(&key).await?;
+            self.store.delete(&list_key).await?;
             debug!(
                 subject = %subject,
                 predicate = %predicate,
@@ -185,7 +200,7 @@ impl<S: KVStore + 'static> GoldStandardStore for GenericGoldStandardStore<S> {
 #[cfg(test)]
 mod tests {
     use super::*;
-    use crate::SledStore;
+    use crate::HybridStore;
     use stemedb_core::types::GoldStandard;

     fn create_gold_standard(subject: &str, predicate: &str, expected_object: &str) -> GoldStandard {
@@ -201,7 +216,7 @@ mod tests {

     #[tokio::test]
     async fn test_set_and_get_gold_standard() {
-        let store = Arc::new(SledStore::open_temp().expect("store"));
+        let store = Arc::new(HybridStore::open_temp().expect("store"));
         let gs_store = GenericGoldStandardStore::new(store);

         let gs = create_gold_standard("Earth", "has_shape", "oblate_spheroid");
@@ -218,7 +233,7 @@ mod tests {

     #[tokio::test]
     async fn test_get_nonexistent_gold_standard() {
-        let store = Arc::new(SledStore::open_temp().expect("store"));
+        let store = Arc::new(HybridStore::open_temp().expect("store"));
         let gs_store = GenericGoldStandardStore::new(store);

         let result = gs_store.get_gold_standard("NonExistent", "predicate").await.expect("get");
@@ -228,7 +243,7 @@ mod tests {

     #[tokio::test]
     async fn test_list_gold_standards() {
-        let store = Arc::new(SledStore::open_temp().expect("store"));
+        let store = Arc::new(HybridStore::open_temp().expect("store"));
         let gs_store = GenericGoldStandardStore::new(store);

         let gs1 = create_gold_standard("Earth", "has_shape", "oblate_spheroid");
@@ -253,7 +268,7 @@ mod tests {

     #[tokio::test]
     async fn test_remove_gold_standard() {
-        let store = Arc::new(SledStore::open_temp().expect("store"));
+        let store = Arc::new(HybridStore::open_temp().expect("store"));
         let gs_store = GenericGoldStandardStore::new(store);

         let gs = create_gold_standard("Earth", "has_shape", "oblate_spheroid");
@@ -274,7 +289,7 @@ mod tests {

     #[tokio::test]
     async fn test_remove_nonexistent_gold_standard() {
-        let store = Arc::new(SledStore::open_temp().expect("store"));
+        let store = Arc::new(HybridStore::open_temp().expect("store"));
         let gs_store = GenericGoldStandardStore::new(store);

         let removed =
@@ -284,7 +299,7 @@ mod tests {

     #[tokio::test]
     async fn test_overwrite_gold_standard() {
-        let store = Arc::new(SledStore::open_temp().expect("store"));
+        let store = Arc::new(HybridStore::open_temp().expect("store"));
         let gs_store = GenericGoldStandardStore::new(store);

         let gs1 = create_gold_standard("Earth", "has_shape", "sphere");
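Note on the `GS_LIST` key parsing in `list_gold_standards` above: the suffix `{subject}:{predicate}` is split at the first colon. A small std-only sketch of that split, under the assumption that the suffix has already been extracted and decoded as UTF-8; the helper name `split_gs_list_suffix` is hypothetical and does not appear in the diff.

```rust
/// Split a `{subject}:{predicate}` suffix at the first colon.
/// Returns `None` when the suffix contains no colon at all.
fn split_gs_list_suffix(suffix: &str) -> Option<(&str, &str)> {
    let colon_pos = suffix.find(':')?;
    Some((&suffix[..colon_pos], &suffix[colon_pos + 1..]))
}

fn main() {
    // A plain subject splits as expected.
    println!("{:?}", split_gs_list_suffix("Earth:has_shape"));
    // A subject that itself contains a colon is cut at the first one.
    println!("{:?}", split_gs_list_suffix("urn:earth:has_shape"));
}
```

With a first-colon split, a subject containing `:` would be cut short and the remainder attributed to the predicate, so the follow-up primary-key lookup would target a different key. The diff does not say whether subjects may contain colons; if they can, the list key would need a separator that cannot appear in a subject.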
352
crates/stemedb-storage/src/hybrid_backend.rs
Normal file
@ -0,0 +1,352 @@
|
|||||||
|
use crate::error::{Result, StorageError};
|
||||||
|
use crate::fjall_backend::FjallStore;
|
||||||
|
use crate::key_codec;
|
||||||
|
use crate::redb_backend::RedbStore;
|
||||||
|
use crate::traits::KVStore;
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use std::path::Path;
|
||||||
|
use tracing::instrument;
|
||||||
|
|
||||||
|
/// Which backend handles a given key.
|
||||||
|
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||||
|
enum Backend {
|
||||||
|
/// Fjall (LSM) — optimized for write-heavy workloads.
|
||||||
|
Fjall,
|
||||||
|
/// Redb (B-tree) — optimized for read-heavy workloads.
|
||||||
|
Redb,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Hybrid storage backend that routes keys to fjall (write-heavy) or redb (read-heavy).
|
||||||
|
///
|
||||||
|
/// Keys follow the `key_codec` format:
|
||||||
|
/// - Subject-prefixed: `{subject}\x00{TAG}:{suffix}`
|
||||||
|
/// - Global: `\x00{TAG}:{suffix}`
|
||||||
|
///
|
||||||
|
/// Routing extracts the TAG and dispatches:
|
||||||
|
/// - **Fjall**: `H:` (assertions), `V:` (votes), `VC:` (vote counts), `VW:` (vote weights),
|
||||||
|
/// `E:` (epochs), `SUPERSEDED:`, `META:` (cursors, counters)
|
||||||
|
/// - **Redb**: `S:` (subject index), `SP:` (compound index), `MV:` (materialized views),
|
||||||
|
/// `TRUST:` (trust ranks), `AUD:` (audits), `QUOTA:` (quotas), `TP:` (trust packs),
|
||||||
|
/// `GS:` (gold standards), `ESC:` (escalations), and everything else
|
||||||
|
pub struct HybridStore {
|
||||||
|
fjall: FjallStore,
|
||||||
|
redb: RedbStore,
|
||||||
|
_temp_dir: Option<tempfile::TempDir>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl std::fmt::Debug for HybridStore {
|
||||||
|
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||||
|
f.debug_struct("HybridStore").finish()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Route a key to the appropriate backend based on its tag.
|
||||||
|
///
|
||||||
|
/// Uses `key_codec::extract_tag` to parse the tag portion from keys in
|
||||||
|
/// `{subject}\x00{TAG}:{suffix}` or `\x00{TAG}:{suffix}` format.
|
||||||
|
fn route(key: &[u8]) -> Backend {
|
||||||
|
let tag = key_codec::extract_tag(key);
|
||||||
|
if tag.starts_with(b"H:")
|
||||||
|
|| tag.starts_with(b"V:")
|
||||||
|
|| tag.starts_with(b"VC:")
|
||||||
|
|| tag.starts_with(b"VW:")
|
||||||
|
|| tag.starts_with(b"E:")
|
||||||
|
|| tag.starts_with(b"SUPERSEDED:")
|
||||||
|
|| tag.starts_with(b"META:")
|
||||||
|
{
|
||||||
|
Backend::Fjall
|
||||||
|
} else {
|
||||||
|
Backend::Redb
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Check if a prefix is ambiguous — it could match keys in both backends.
|
||||||
|
///
|
||||||
|
/// This happens when scanning by subject only (`{subject}\x00`) since a subject
|
||||||
|
/// can have keys in both fjall (assertions, votes) and redb (indexes, views).
|
||||||
|
fn is_cross_backend_prefix(prefix: &[u8]) -> bool {
|
||||||
|
// A subject-only prefix ends with \x00 and has no tag after it
|
||||||
|
if prefix.is_empty() {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
let tag = key_codec::extract_tag(prefix);
|
||||||
|
// If the extracted tag is empty, the prefix doesn't specify which backend
|
||||||
|
tag.is_empty()
|
||||||
|
}
|
||||||
|
|
||||||
|
impl HybridStore {
|
||||||
|
/// Open or create a HybridStore at the given path.
|
||||||
|
///
|
||||||
|
/// Creates `fjall/` and `redb/` subdirectories under the given path.
|
||||||
|
#[instrument(skip_all)]
|
||||||
|
pub fn open(path: impl AsRef<Path>) -> Result<Self> {
|
||||||
|
let base = path.as_ref();
|
||||||
|
let fjall_path = base.join("fjall");
|
||||||
|
let redb_path = base.join("redb");
|
||||||
|
|
||||||
|
std::fs::create_dir_all(&fjall_path).map_err(StorageError::Io)?;
|
||||||
|
std::fs::create_dir_all(&redb_path).map_err(StorageError::Io)?;
|
||||||
|
|
||||||
|
let fjall = FjallStore::open(&fjall_path)?;
|
||||||
|
let redb = RedbStore::open(redb_path.join("data.redb"))?;
|
||||||
|
|
||||||
|
Ok(Self { fjall, redb, _temp_dir: None })
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Open a temporary HybridStore for testing.
|
||||||
|
///
|
||||||
|
/// Both backends share one temp directory with `fjall/` and `redb/` subdirectories.
|
||||||
|
pub fn open_temp() -> Result<Self> {
|
||||||
|
let temp_dir = tempfile::tempdir().map_err(StorageError::Io)?;
|
||||||
|
let redb_dir = temp_dir.path().join("redb");
|
||||||
|
std::fs::create_dir_all(&redb_dir).map_err(StorageError::Io)?;
|
||||||
|
let fjall = FjallStore::open(temp_dir.path().join("fjall"))?;
|
||||||
|
let redb = RedbStore::open(redb_dir.join("data.redb"))?;
|
||||||
|
|
||||||
|
Ok(Self { fjall, redb, _temp_dir: Some(temp_dir) })
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl KVStore for HybridStore {
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len()))]
|
||||||
|
async fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> {
|
||||||
|
match route(key) {
|
||||||
|
Backend::Fjall => self.fjall.get(key).await,
|
||||||
|
Backend::Redb => self.redb.get(key).await,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len(), value_len = value.len()))]
|
||||||
|
async fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
|
||||||
|
match route(key) {
|
||||||
|
Backend::Fjall => self.fjall.put(key, value).await,
|
||||||
|
Backend::Redb => self.redb.put(key, value).await,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len()))]
|
||||||
|
async fn delete(&self, key: &[u8]) -> Result<()> {
|
||||||
|
match route(key) {
|
||||||
|
Backend::Fjall => self.fjall.delete(key).await,
|
||||||
|
Backend::Redb => self.redb.delete(key).await,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(prefix_len = prefix.len()))]
|
||||||
|
async fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
|
||||||
|
if is_cross_backend_prefix(prefix) {
|
||||||
|
// Subject-only prefix — scan both backends and merge
|
||||||
|
let mut results = self.fjall.scan_prefix(prefix).await?;
|
||||||
|
results.extend(self.redb.scan_prefix(prefix).await?);
|
||||||
|
results.sort_by(|a, b| a.0.cmp(&b.0));
|
||||||
|
return Ok(results);
|
||||||
|
}
|
||||||
|
match route(prefix) {
|
||||||
|
Backend::Fjall => self.fjall.scan_prefix(prefix).await,
|
||||||
|
Backend::Redb => self.redb.scan_prefix(prefix).await,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all)]
|
||||||
|
async fn flush(&self) -> Result<()> {
|
||||||
|
// Flush fjall first (write-heavy, most critical for durability),
|
||||||
|
// then redb (always durable after commit, so this is a no-op).
|
||||||
|
self.fjall.flush().await?;
|
||||||
|
self.redb.flush().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len(), delta))]
|
||||||
|
async fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64> {
|
||||||
|
match route(key) {
|
||||||
|
Backend::Fjall => self.fjall.fetch_and_add_u64(key, delta).await,
|
||||||
|
Backend::Redb => self.redb.fetch_and_add_u64(key, delta).await,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len()))]
|
||||||
|
async fn compare_and_swap_f32<F>(&self, key: &[u8], update_fn: F) -> Result<f32>
|
||||||
|
where
|
||||||
|
F: Fn(f32) -> f32 + Send + Sync,
|
||||||
|
{
|
||||||
|
match route(key) {
|
||||||
|
Backend::Fjall => self.fjall.compare_and_swap_f32(key, update_fn).await,
|
||||||
|
Backend::Redb => self.redb.compare_and_swap_f32(key, update_fn).await,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
use crate::key_codec;
|
||||||
|
|
||||||
|
// ── Basic KVStore contract tests ──
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_hybrid_store_roundtrip() {
|
||||||
|
let store = HybridStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
let key = b"test_key";
|
||||||
|
let value = b"test_value";
|
||||||
|
|
||||||
|
store.put(key, value).await.expect("Put failed");
|
||||||
|
let retrieved = store.get(key).await.expect("Get failed");
|
||||||
|
assert_eq!(retrieved, Some(value.to_vec()));
|
||||||
|
|
||||||
|
store.delete(key).await.expect("Delete failed");
|
||||||
|
let deleted = store.get(key).await.expect("Get failed");
|
||||||
|
assert_eq!(deleted, None);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_hybrid_scan_prefix() {
|
||||||
|
let store = HybridStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
let k1 = key_codec::subject_index_key("subject1");
|
||||||
|
let k2 = key_codec::subject_predicate_key("subject1", "pred");
|
||||||
|
let k3 = key_codec::subject_index_key("subject2");
|
||||||
|
|
||||||
|
store.put(&k1, b"val1").await.unwrap();
|
||||||
|
store.put(&k2, b"val2").await.unwrap();
|
||||||
|
store.put(&k3, b"val3").await.unwrap();
|
||||||
|
|
||||||
|
let prefix = key_codec::subject_scan_prefix("subject1");
|
||||||
|
let results = store.scan_prefix(&prefix).await.unwrap();
|
||||||
|
assert_eq!(results.len(), 2);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_hybrid_fetch_and_add() {
|
||||||
|
let store = HybridStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
|
||||||
|
// Vote count (fjall path — subject-prefixed)
|
||||||
|
let vc_key = key_codec::vote_count_key("Tesla", "abc123");
|
||||||
|
let val = store.fetch_and_add_u64(&vc_key, 5).await.unwrap();
|
||||||
|
assert_eq!(val, 5);
|
||||||
|
let val = store.fetch_and_add_u64(&vc_key, 3).await.unwrap();
|
||||||
|
assert_eq!(val, 8);
|
||||||
|
|
||||||
|
// Quota counter (redb path — global)
|
||||||
|
let qt_key = key_codec::quota_key("agent1", 1000);
|
||||||
|
let val = store.fetch_and_add_u64(&qt_key, 10).await.unwrap();
|
||||||
|
assert_eq!(val, 10);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_hybrid_compare_and_swap_f32() {
|
||||||
|
let store = HybridStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
|
||||||
|
// Vote weight (fjall path — subject-prefixed)
|
||||||
|
let vw_key = key_codec::vote_weight_key("Tesla", "abc123");
|
||||||
|
let val = store.compare_and_swap_f32(&vw_key, |c| c + 1.5).await.unwrap();
|
||||||
|
assert!((val - 1.5).abs() < f32::EPSILON);
|
||||||
|
|
||||||
|
// Trust rank (redb path — global)
|
||||||
|
let tr_key = key_codec::trust_rank_key("agent1");
|
||||||
|
let val = store.compare_and_swap_f32(&tr_key, |c| c + 0.8).await.unwrap();
|
||||||
|
assert!((val - 0.8).abs() < f32::EPSILON);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_hybrid_flush() {
|
||||||
|
let store = HybridStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
let h_key = key_codec::assertion_key("Tesla", "hash1");
|
||||||
|
let s_key = key_codec::subject_index_key("Tesla");
|
||||||
|
store.put(&h_key, b"assertion_data").await.unwrap();
|
||||||
|
store.put(&s_key, b"index_data").await.unwrap();
|
||||||
|
store.flush().await.expect("Flush should succeed");
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Routing tests with key_codec keys ──
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_routing_fjall_subject_prefixed() {
|
||||||
|
// Subject-prefixed write-heavy keys → Fjall
|
||||||
|
assert_eq!(route(&key_codec::assertion_key("Tesla", "abc")), Backend::Fjall);
|
||||||
|
assert_eq!(route(&key_codec::vote_key("Tesla", "abc", "def")), Backend::Fjall);
|
||||||
|
assert_eq!(route(&key_codec::vote_count_key("Tesla", "abc")), Backend::Fjall);
|
||||||
|
assert_eq!(route(&key_codec::vote_weight_key("Tesla", "abc")), Backend::Fjall);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_routing_fjall_global() {
|
||||||
|
// Global write-heavy keys → Fjall
|
||||||
|
assert_eq!(route(&key_codec::epoch_key("deadbeef")), Backend::Fjall);
|
||||||
|
assert_eq!(route(&key_codec::superseded_key("deadbeef")), Backend::Fjall);
|
||||||
|
assert_eq!(route(&key_codec::cursor_key()), Backend::Fjall);
|
||||||
|
assert_eq!(route(&key_codec::assertion_count_key()), Backend::Fjall);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_routing_redb_subject_prefixed() {
|
||||||
|
// Subject-prefixed read-heavy keys → Redb
|
||||||
|
assert_eq!(route(&key_codec::subject_index_key("Tesla")), Backend::Redb);
|
||||||
|
assert_eq!(route(&key_codec::subject_predicate_key("Tesla", "rev")), Backend::Redb);
|
||||||
|
assert_eq!(route(&key_codec::mv_key("Tesla", "revenue")), Backend::Redb);
|
||||||
|
assert_eq!(route(&key_codec::gold_standard_key("Earth", "shape")), Backend::Redb);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_routing_redb_global() {
|
||||||
|
// Global read-heavy keys → Redb
|
||||||
|
assert_eq!(route(&key_codec::trust_rank_key("agent1")), Backend::Redb);
|
||||||
|
assert_eq!(route(&key_codec::quota_key("agent1", 1000)), Backend::Redb);
|
||||||
|
assert_eq!(route(&key_codec::audit_key("query1")), Backend::Redb);
|
||||||
|
assert_eq!(route(&key_codec::escalation_key(1000, "hash1")), Backend::Redb);
|
||||||
|
assert_eq!(route(&key_codec::trust_pack_key(&[1u8; 32])), Backend::Redb);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_routing_default_to_redb() {
|
||||||
|
assert_eq!(route(b"unknown:key"), Backend::Redb);
|
||||||
|
assert_eq!(route(b""), Backend::Redb);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_cross_backend_isolation() {
|
||||||
|
let store = HybridStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
|
||||||
|
// Write to fjall (assertion — subject-prefixed)
|
||||||
|
let h_key = key_codec::assertion_key("Tesla", "hash1");
|
||||||
|
store.put(&h_key, b"assertion").await.unwrap();
|
||||||
|
// Write to redb (index — subject-prefixed)
|
||||||
|
let s_key = key_codec::subject_index_key("Tesla");
|
||||||
|
store.put(&s_key, b"index").await.unwrap();
|
||||||
|
|
||||||
|
// Both should be retrievable
|
||||||
|
assert_eq!(store.get(&h_key).await.unwrap(), Some(b"assertion".to_vec()));
|
||||||
|
assert_eq!(store.get(&s_key).await.unwrap(), Some(b"index".to_vec()));
|
||||||
|
|
||||||
|
// Deleting from one backend shouldn't affect the other
|
||||||
|
store.delete(&h_key).await.unwrap();
|
||||||
|
assert_eq!(store.get(&h_key).await.unwrap(), None);
|
||||||
|
assert_eq!(store.get(&s_key).await.unwrap(), Some(b"index".to_vec()));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_prefix_scan_within_backend() {
|
||||||
|
let store = HybridStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
|
||||||
|
// Write assertion hashes (fjall — subject-prefixed)
|
||||||
|
let h1 = key_codec::assertion_key("Earth", "aaa");
|
||||||
|
let h2 = key_codec::assertion_key("Earth", "bbb");
|
||||||
|
store.put(&h1, b"val1").await.unwrap();
|
||||||
|
store.put(&h2, b"val2").await.unwrap();
|
||||||
|
|
||||||
|
// Write trust rank entries (redb, global)
|
||||||
|
let tr1 = key_codec::trust_rank_key("agent_a");
|
||||||
|
let tr2 = key_codec::trust_rank_key("agent_b");
|
||||||
|
store.put(&tr1, b"rank1").await.unwrap();
|
||||||
|
store.put(&tr2, b"rank2").await.unwrap();
|
||||||
|
|
||||||
|
// Scan fjall (subject prefix)
|
||||||
|
let earth_prefix = key_codec::subject_scan_prefix("Earth");
|
||||||
|
let h_results = store.scan_prefix(&earth_prefix).await.unwrap();
|
||||||
|
assert_eq!(h_results.len(), 2);
|
||||||
|
|
||||||
|
// Scan redb (global prefix)
|
||||||
|
let trust_prefix = key_codec::trust_rank_scan_prefix();
|
||||||
|
let tr_results = store.scan_prefix(&trust_prefix).await.unwrap();
|
||||||
|
assert_eq!(tr_results.len(), 2);
|
||||||
|
}
|
||||||
|
}
|
||||||
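Reviewer sketch: the routing tests above pin down which key families land on which backend, but `route` itself is not part of this excerpt. The following is a minimal sketch of a tag-based router that would satisfy those assertions. The `FJALL_TAGS` list and the `tag_of` helper are assumptions inferred from the tests, not the actual `HybridStore` implementation.

```rust
/// Backends named in the routing tests.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Fjall,
    Redb,
}

/// Tags the tests expect on the write-heavy backend.
/// Assumption: inferred from the assertions, not copied from the real router.
const FJALL_TAGS: &[&[u8]] = &[b"H:", b"V:", b"VC:", b"VW:", b"E:", b"SUPERSEDED:", b"META:"];

/// Bytes after the first `\x00`, or the whole key when there is no separator.
fn tag_of(key: &[u8]) -> &[u8] {
    match key.iter().position(|&b| b == 0x00) {
        Some(pos) => &key[pos + 1..],
        None => key,
    }
}

/// Route a key by its tag; anything unrecognized falls back to Redb.
pub fn route(key: &[u8]) -> Backend {
    let tag = tag_of(key);
    if FJALL_TAGS.iter().any(|&t| tag.starts_with(t)) {
        Backend::Fjall
    } else {
        Backend::Redb
    }
}
```

Matching on the full tag including its trailing colon keeps `SUP:` (Redb) distinct from `SUPERSEDED:` (Fjall), and `ESC:` distinct from `E:`.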
@ -21,17 +21,12 @@
|
|||||||
//! All operations are append-only and content-addressed.
|
//! All operations are append-only and content-addressed.
|
||||||
|
|
||||||
use crate::error::Result;
|
use crate::error::Result;
|
||||||
|
use crate::key_codec;
|
||||||
use crate::traits::KVStore;
|
use crate::traits::KVStore;
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use stemedb_core::types::Hash;
|
use stemedb_core::types::Hash;
|
||||||
use tracing::{debug, instrument};
|
use tracing::{debug, instrument};
|
||||||
|
|
||||||
/// Key prefix for subject-only index.
|
|
||||||
const SUBJECT_PREFIX: &[u8] = b"S:";
|
|
||||||
|
|
||||||
/// Key prefix for compound subject+predicate index.
|
|
||||||
const SUBJECT_PREDICATE_PREFIX: &[u8] = b"SP:";
|
|
||||||
|
|
||||||
/// Specialized storage trait for assertion index operations.
|
/// Specialized storage trait for assertion index operations.
|
||||||
///
|
///
|
||||||
/// This trait provides index-specific operations on top of a generic KVStore,
|
/// This trait provides index-specific operations on top of a generic KVStore,
|
||||||
@ -108,22 +103,6 @@ impl<S: KVStore> GenericIndexStore<S> {
|
|||||||
Self { store }
|
Self { store }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Construct the key for the subject index.
|
|
||||||
fn subject_key(subject: &str) -> Vec<u8> {
|
|
||||||
let mut key = SUBJECT_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(subject.as_bytes());
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Construct the key for the compound subject+predicate index.
|
|
||||||
fn subject_predicate_key(subject: &str, predicate: &str) -> Vec<u8> {
|
|
||||||
let mut key = SUBJECT_PREDICATE_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(subject.as_bytes());
|
|
||||||
key.push(b':');
|
|
||||||
key.extend_from_slice(predicate.as_bytes());
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Serialize a hash list using the canonical serde helpers.
|
/// Serialize a hash list using the canonical serde helpers.
|
||||||
fn serialize_hash_list(hashes: &Vec<Hash>) -> Result<Vec<u8>> {
|
fn serialize_hash_list(hashes: &Vec<Hash>) -> Result<Vec<u8>> {
|
||||||
crate::serde_helpers::serialize(hashes)
|
crate::serde_helpers::serialize(hashes)
|
||||||
@ -164,14 +143,18 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
|
|||||||
predicate: &str,
|
predicate: &str,
|
||||||
assertion_hash: &Hash,
|
assertion_hash: &Hash,
|
||||||
) -> Result<()> {
|
) -> Result<()> {
|
||||||
// Update subject index: S:{subject}
|
// Update subject index
|
||||||
let subject_key = Self::subject_key(subject);
|
let subject_key = key_codec::subject_index_key(subject);
|
||||||
self.append_to_index(subject_key, assertion_hash).await?;
|
self.append_to_index(subject_key, assertion_hash).await?;
|
||||||
|
|
||||||
// Update compound index: SP:{subject}:{predicate}
|
// Update compound index
|
||||||
let sp_key = Self::subject_predicate_key(subject, predicate);
|
let sp_key = key_codec::subject_predicate_key(subject, predicate);
|
||||||
self.append_to_index(sp_key, assertion_hash).await?;
|
self.append_to_index(sp_key, assertion_hash).await?;
|
||||||
|
|
||||||
|
// Update subjects discovery index
|
||||||
|
let subjects_index_key = key_codec::subjects_index_key(subject);
|
||||||
|
self.store.put(&subjects_index_key, &[]).await?;
|
||||||
|
|
||||||
debug!(
|
debug!(
|
||||||
subject,
|
subject,
|
||||||
predicate,
|
predicate,
|
||||||
@ -184,7 +167,7 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self), fields(subject = %subject))]
|
#[instrument(skip(self), fields(subject = %subject))]
|
||||||
async fn get_by_subject(&self, subject: &str) -> Result<Vec<Hash>> {
|
async fn get_by_subject(&self, subject: &str) -> Result<Vec<Hash>> {
|
||||||
let key = Self::subject_key(subject);
|
let key = key_codec::subject_index_key(subject);
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(data) => {
|
Some(data) => {
|
||||||
let hashes = Self::deserialize_hash_list(&data)?;
|
let hashes = Self::deserialize_hash_list(&data)?;
|
||||||
@ -200,7 +183,7 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self), fields(subject = %subject, predicate = %predicate))]
|
#[instrument(skip(self), fields(subject = %subject, predicate = %predicate))]
|
||||||
async fn get_by_subject_predicate(&self, subject: &str, predicate: &str) -> Result<Vec<Hash>> {
|
async fn get_by_subject_predicate(&self, subject: &str, predicate: &str) -> Result<Vec<Hash>> {
|
||||||
let key = Self::subject_predicate_key(subject, predicate);
|
let key = key_codec::subject_predicate_key(subject, predicate);
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(data) => {
|
Some(data) => {
|
||||||
let hashes = Self::deserialize_hash_list(&data)?;
|
let hashes = Self::deserialize_hash_list(&data)?;
|
||||||
@ -216,13 +199,13 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self), fields(subject = %subject))]
|
#[instrument(skip(self), fields(subject = %subject))]
|
||||||
async fn has_subject(&self, subject: &str) -> Result<bool> {
|
async fn has_subject(&self, subject: &str) -> Result<bool> {
|
||||||
let key = Self::subject_key(subject);
|
let key = key_codec::subject_index_key(subject);
|
||||||
Ok(self.store.get(&key).await?.is_some())
|
Ok(self.store.get(&key).await?.is_some())
|
||||||
}
|
}
|
||||||
|
|
||||||
#[instrument(skip(self), fields(subject = %subject, predicate = %predicate))]
|
#[instrument(skip(self), fields(subject = %subject, predicate = %predicate))]
|
||||||
async fn has_subject_predicate(&self, subject: &str, predicate: &str) -> Result<bool> {
|
async fn has_subject_predicate(&self, subject: &str, predicate: &str) -> Result<bool> {
|
||||||
let key = Self::subject_predicate_key(subject, predicate);
|
let key = key_codec::subject_predicate_key(subject, predicate);
|
||||||
Ok(self.store.get(&key).await?.is_some())
|
Ok(self.store.get(&key).await?.is_some())
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -230,11 +213,12 @@ impl<S: KVStore + 'static> IndexStore for GenericIndexStore<S> {
|
|||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::SledStore;
|
use crate::HybridStore;
|
||||||
|
use std::sync::Arc;
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_add_and_get_by_subject() {
|
async fn test_add_and_get_by_subject() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let index_store = GenericIndexStore::new(store);
|
let index_store = GenericIndexStore::new(store);
|
||||||
|
|
||||||
let subject = "Tesla";
|
let subject = "Tesla";
|
||||||
@ -255,7 +239,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_add_and_get_by_subject_predicate() {
|
async fn test_add_and_get_by_subject_predicate() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let index_store = GenericIndexStore::new(store);
|
let index_store = GenericIndexStore::new(store);
|
||||||
|
|
||||||
let subject = "Tesla";
|
let subject = "Tesla";
|
||||||
@ -279,7 +263,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_idempotent_insert() {
|
async fn test_idempotent_insert() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let index_store = GenericIndexStore::new(store);
|
let index_store = GenericIndexStore::new(store);
|
||||||
|
|
||||||
let subject = "Tesla";
|
let subject = "Tesla";
|
||||||
@ -299,7 +283,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_empty_index_returns_empty_vec() {
|
async fn test_empty_index_returns_empty_vec() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let index_store = GenericIndexStore::new(store);
|
let index_store = GenericIndexStore::new(store);
|
||||||
|
|
||||||
let hashes = index_store.get_by_subject("Nonexistent").await.expect("get");
|
let hashes = index_store.get_by_subject("Nonexistent").await.expect("get");
|
||||||
@ -312,7 +296,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_has_subject() {
|
async fn test_has_subject() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let index_store = GenericIndexStore::new(store);
|
let index_store = GenericIndexStore::new(store);
|
||||||
|
|
||||||
let subject = "Tesla";
|
let subject = "Tesla";
|
||||||
@ -331,7 +315,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_has_subject_predicate() {
|
async fn test_has_subject_predicate() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let index_store = GenericIndexStore::new(store);
|
let index_store = GenericIndexStore::new(store);
|
||||||
|
|
||||||
let subject = "Tesla";
|
let subject = "Tesla";
|
||||||
@ -353,7 +337,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_multiple_subjects_isolated() {
|
async fn test_multiple_subjects_isolated() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let index_store = GenericIndexStore::new(store);
|
let index_store = GenericIndexStore::new(store);
|
||||||
|
|
||||||
let hash1 = [1u8; 32];
|
let hash1 = [1u8; 32];
|
||||||
@ -378,8 +362,8 @@ mod tests {
|
|||||||
let hashes = vec![[1u8; 32], [2u8; 32], [3u8; 32]];
|
let hashes = vec![[1u8; 32], [2u8; 32], [3u8; 32]];
|
||||||
|
|
||||||
let serialized =
|
let serialized =
|
||||||
GenericIndexStore::<SledStore>::serialize_hash_list(&hashes).expect("serialize");
|
GenericIndexStore::<HybridStore>::serialize_hash_list(&hashes).expect("serialize");
|
||||||
let deserialized = GenericIndexStore::<SledStore>::deserialize_hash_list(&serialized)
|
let deserialized = GenericIndexStore::<HybridStore>::deserialize_hash_list(&serialized)
|
||||||
.expect("deserialize");
|
.expect("deserialize");
|
||||||
|
|
||||||
assert_eq!(hashes, deserialized);
|
assert_eq!(hashes, deserialized);
|
||||||
|
|||||||
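Reviewer sketch: the hunks above swap the hand-rolled `SP:{subject}:{predicate}` keys for `key_codec::subject_predicate_key`. One side effect, not stated in the commit message, is that the `\x00` separator removes an ambiguity the colon-delimited layout had when a subject or predicate itself contains a colon (the `key_codec` tests in this commit accept such subjects). The helper names `old_sp_key` and `new_sp_key` below are illustrative only; they mirror the removed and added key layouts.

```rust
/// Pre-change compound index layout: `SP:{subject}:{predicate}`.
fn old_sp_key(subject: &str, predicate: &str) -> Vec<u8> {
    let mut key = b"SP:".to_vec();
    key.extend_from_slice(subject.as_bytes());
    key.push(b':');
    key.extend_from_slice(predicate.as_bytes());
    key
}

/// Post-change layout: `{subject}\x00SP:{predicate}`.
fn new_sp_key(subject: &str, predicate: &str) -> Vec<u8> {
    let mut key = subject.as_bytes().to_vec();
    key.push(0x00); // separator byte; subjects are validated to never contain it
    key.extend_from_slice(b"SP:");
    key.extend_from_slice(predicate.as_bytes());
    key
}
```

With the old layout, `("a:b", "c")` and `("a", "b:c")` encode to the same bytes; with the new layout they do not, because the first `\x00` always marks the end of the subject.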
340
crates/stemedb-storage/src/key_codec/mod.rs
Normal file
@ -0,0 +1,340 @@
|
|||||||
|
//! Central key encoding/decoding for subject-prefix range sharding.
|
||||||
|
//!
|
||||||
|
//! ALL storage keys flow through this module. Keys are partitioned into two families:
|
||||||
|
//!
|
||||||
|
//! **Subject-prefixed keys** — co-located by subject for range sharding:
|
||||||
|
//! ```text
|
||||||
|
//! {subject}\x00{TAG}:{suffix}
|
||||||
|
//! ```
|
||||||
|
//!
|
||||||
|
//! **Global keys**: metadata, trust, quotas, epochs. They start with `\x00`, so they sort before every subject-prefixed key:
|
||||||
|
//! ```text
|
||||||
|
//! \x00{TAG}:{suffix}
|
||||||
|
//! ```
|
||||||
|
//!
|
||||||
|
//! A prefix scan on `{subject}\x00` returns ALL data for that subject.
|
||||||
|
//! A prefix scan on `\x00` returns ALL global metadata.
|
||||||
|
|
||||||
|
use crate::error::{Result, StorageError};
|
||||||
|
|
||||||
|
/// Separator byte between subject and tag. Also serves as global key prefix.
|
||||||
|
pub const SEPARATOR: u8 = 0x00;
|
||||||
|
|
||||||
|
// ── Subject validation ──────────────────────────────────────────────
|
||||||
|
|
||||||
|
/// Validate that a subject string is non-empty and does not contain the separator byte.
|
||||||
|
///
|
||||||
|
/// Subjects containing `\x00` would corrupt key boundaries. This MUST be
|
||||||
|
/// called on all inbound subjects at the ingestion boundary.
|
||||||
|
pub fn validate_subject(subject: &str) -> Result<()> {
|
||||||
|
if subject.as_bytes().contains(&SEPARATOR) {
|
||||||
|
return Err(StorageError::InputValidation(
|
||||||
|
"Subject must not contain null byte (\\x00)".to_string(),
|
||||||
|
));
|
||||||
|
}
|
||||||
|
if subject.is_empty() {
|
||||||
|
return Err(StorageError::InputValidation("Subject must not be empty".to_string()));
|
||||||
|
}
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Key builders ────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
/// Build a subject-prefixed key: `{subject}\x00{tag}{suffix}`.
|
||||||
|
fn subject_key(subject: &str, tag: &[u8], suffix: &[u8]) -> Vec<u8> {
|
||||||
|
let mut key = Vec::with_capacity(subject.len() + 1 + tag.len() + suffix.len());
|
||||||
|
key.extend_from_slice(subject.as_bytes());
|
||||||
|
key.push(SEPARATOR);
|
||||||
|
key.extend_from_slice(tag);
|
||||||
|
key.extend_from_slice(suffix);
|
||||||
|
key
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Build a global key: `\x00{tag}{suffix}`.
|
||||||
|
fn global_key(tag: &[u8], suffix: &[u8]) -> Vec<u8> {
|
||||||
|
let mut key = Vec::with_capacity(1 + tag.len() + suffix.len());
|
||||||
|
key.push(SEPARATOR);
|
||||||
|
key.extend_from_slice(tag);
|
||||||
|
key.extend_from_slice(suffix);
|
||||||
|
key
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Subject-prefixed keys ───────────────────────────────────────────
|
||||||
|
|
||||||
|
/// Assertion key: `{subject}\x00H:{hash_hex}`
|
||||||
|
pub fn assertion_key(subject: &str, hash_hex: &str) -> Vec<u8> {
|
||||||
|
subject_key(subject, b"H:", hash_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Subject index key: `{subject}\x00S:`
|
||||||
|
pub fn subject_index_key(subject: &str) -> Vec<u8> {
|
||||||
|
subject_key(subject, b"S:", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Subject+predicate index key: `{subject}\x00SP:{predicate}`
|
||||||
|
pub fn subject_predicate_key(subject: &str, predicate: &str) -> Vec<u8> {
|
||||||
|
subject_key(subject, b"SP:", predicate.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Materialized view key: `{subject}\x00MV:{predicate}`
|
||||||
|
pub fn mv_key(subject: &str, predicate: &str) -> Vec<u8> {
|
||||||
|
subject_key(subject, b"MV:", predicate.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Vote key: `{subject}\x00V:{assert_hex}:{vote_hex}`
|
||||||
|
pub fn vote_key(subject: &str, assertion_hex: &str, vote_hex: &str) -> Vec<u8> {
|
||||||
|
let suffix = format!("{}:{}", assertion_hex, vote_hex);
|
||||||
|
subject_key(subject, b"V:", suffix.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Vote scan prefix: `{subject}\x00V:{assert_hex}:`
|
||||||
|
pub fn vote_scan_prefix(subject: &str, assertion_hex: &str) -> Vec<u8> {
|
||||||
|
let suffix = format!("{}:", assertion_hex);
|
||||||
|
subject_key(subject, b"V:", suffix.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Vote count cache key: `{subject}\x00VC:{assert_hex}`
|
||||||
|
pub fn vote_count_key(subject: &str, assertion_hex: &str) -> Vec<u8> {
|
||||||
|
subject_key(subject, b"VC:", assertion_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Vote weight cache key: `{subject}\x00VW:{assert_hex}`
|
||||||
|
pub fn vote_weight_key(subject: &str, assertion_hex: &str) -> Vec<u8> {
|
||||||
|
subject_key(subject, b"VW:", assertion_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Gold standard key: `{subject}\x00GS:{predicate}`
|
||||||
|
pub fn gold_standard_key(subject: &str, predicate: &str) -> Vec<u8> {
|
||||||
|
subject_key(subject, b"GS:", predicate.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Subject+predicate scan prefix: `{subject}\x00SP:` — returns all SP keys for a subject.
|
||||||
|
pub fn subject_predicate_scan_prefix(subject: &str) -> Vec<u8> {
|
||||||
|
subject_key(subject, b"SP:", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Subject scan prefix: `{subject}\x00` — returns ALL data for a subject.
|
||||||
|
pub fn subject_scan_prefix(subject: &str) -> Vec<u8> {
|
||||||
|
let mut key = Vec::with_capacity(subject.len() + 1);
|
||||||
|
key.extend_from_slice(subject.as_bytes());
|
||||||
|
key.push(SEPARATOR);
|
||||||
|
key
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Global keys ─────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
/// Trust rank key: `\x00TRUST:{agent_id_hex}`
|
||||||
|
pub fn trust_rank_key(agent_id_hex: &str) -> Vec<u8> {
|
||||||
|
global_key(b"TRUST:", agent_id_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Quota record key: `\x00QUOTA:{agent_hex}:{window}`
|
||||||
|
pub fn quota_key(agent_hex: &str, window: u64) -> Vec<u8> {
|
||||||
|
let suffix = format!("{}:{}", agent_hex, window);
|
||||||
|
global_key(b"QUOTA:", suffix.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Quota limit key: `\x00QLIMIT:{agent_id_hex}`
|
||||||
|
pub fn quota_limit_key(agent_id_hex: &str) -> Vec<u8> {
|
||||||
|
global_key(b"QLIMIT:", agent_id_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Epoch key: `\x00E:{epoch_id_hex}`
|
||||||
|
pub fn epoch_key(epoch_id_hex: &str) -> Vec<u8> {
|
||||||
|
global_key(b"E:", epoch_id_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Superseded marker key: `\x00SUPERSEDED:{epoch_id_hex}`
|
||||||
|
pub fn superseded_key(epoch_id_hex: &str) -> Vec<u8> {
|
||||||
|
global_key(b"SUPERSEDED:", epoch_id_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Supersession record key: `\x00SUP:{target_hash_hex}`
|
||||||
|
pub fn supersession_key(target_hash_hex: &str) -> Vec<u8> {
|
||||||
|
global_key(b"SUP:", target_hash_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Supersession agent index key: `\x00SUP:IDX:{agent_hex}:{ts_be_bytes}`
|
||||||
|
pub fn supersession_index_key(agent_hex: &str, timestamp_be_bytes: &[u8]) -> Vec<u8> {
|
||||||
|
let mut suffix = Vec::with_capacity(agent_hex.len() + 1 + timestamp_be_bytes.len());
|
||||||
|
suffix.extend_from_slice(agent_hex.as_bytes());
|
||||||
|
suffix.push(b':');
|
||||||
|
suffix.extend_from_slice(timestamp_be_bytes);
|
||||||
|
global_key(b"SUP:IDX:", &suffix)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Supersession agent scan prefix: `\x00SUP:IDX:{agent_hex}:`
|
||||||
|
pub fn supersession_index_prefix(agent_hex: &str) -> Vec<u8> {
|
||||||
|
let suffix = format!("{}:", agent_hex);
|
||||||
|
global_key(b"SUP:IDX:", suffix.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Audit record key: `\x00AUD:{query_id_hex}`
|
||||||
|
pub fn audit_key(query_id_hex: &str) -> Vec<u8> {
|
||||||
|
global_key(b"AUD:", query_id_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Audit agent index key: `\x00AUDA:{agent_hex}:{timestamp_hex}:{query_hex}`
|
||||||
|
pub fn audit_agent_index_key(agent_hex: &str, timestamp_hex: &str, query_hex: &str) -> Vec<u8> {
|
||||||
|
let suffix = format!("{}:{}:{}", agent_hex, timestamp_hex, query_hex);
|
||||||
|
global_key(b"AUDA:", suffix.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Audit agent scan prefix: `\x00AUDA:{agent_hex}:`
|
||||||
|
pub fn audit_agent_prefix(agent_hex: &str) -> Vec<u8> {
|
||||||
|
let suffix = format!("{}:", agent_hex);
|
||||||
|
global_key(b"AUDA:", suffix.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Audit listing prefix: `\x00AUD:`
|
||||||
|
pub fn audit_scan_prefix() -> Vec<u8> {
|
||||||
|
global_key(b"AUD:", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Escalation key: `\x00ESC:{timestamp}:{id_hex}`
|
||||||
|
pub fn escalation_key(timestamp: u64, id_hex: &str) -> Vec<u8> {
|
||||||
|
let suffix = format!("{}:{}", timestamp, id_hex);
|
||||||
|
global_key(b"ESC:", suffix.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Escalation scan prefix: `\x00ESC:`
|
||||||
|
pub fn escalation_scan_prefix() -> Vec<u8> {
|
||||||
|
global_key(b"ESC:", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Trust pack key: `\x00TP:{pack_id_bytes}`
|
||||||
|
pub fn trust_pack_key(pack_id: &[u8]) -> Vec<u8> {
|
||||||
|
global_key(b"TP:", pack_id)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Trust pack scan prefix: `\x00TP:`
|
||||||
|
pub fn trust_pack_scan_prefix() -> Vec<u8> {
|
||||||
|
global_key(b"TP:", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Gold standard verified key: `\x00GS_VERIFIED:{agent_hex}:{subject}:{predicate}`
|
||||||
|
pub fn gs_verified_key(agent_hex: &str, subject: &str, predicate: &str) -> Vec<u8> {
|
||||||
|
let suffix = format!("{}:{}:{}", agent_hex, subject, predicate);
|
||||||
|
global_key(b"GS_VERIFIED:", suffix.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Cursor key: `\x00META:cursor:ingest`
|
||||||
|
pub fn cursor_key() -> Vec<u8> {
|
||||||
|
global_key(b"META:cursor:ingest", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Assertion count key: `\x00META:assertion_count`
|
||||||
|
pub fn assertion_count_key() -> Vec<u8> {
|
||||||
|
global_key(b"META:assertion_count", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Trust rank scan prefix for decay: `\x00TRUST:`
|
||||||
|
pub fn trust_rank_scan_prefix() -> Vec<u8> {
|
||||||
|
global_key(b"TRUST:", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Secondary indexes ───────────────────────────────────────────────
|
||||||
|
|
||||||
|
/// Known subjects index key: `\x00SUBJECTS:{subject}`
|
||||||
|
pub fn subjects_index_key(subject: &str) -> Vec<u8> {
|
||||||
|
global_key(b"SUBJECTS:", subject.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Known subjects scan prefix: `\x00SUBJECTS:`
|
||||||
|
pub fn subjects_scan_prefix() -> Vec<u8> {
|
||||||
|
global_key(b"SUBJECTS:", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Gold standard listing index: `\x00GS_LIST:{subject}:{predicate}`
|
||||||
|
pub fn gs_list_key(subject: &str, predicate: &str) -> Vec<u8> {
|
||||||
|
let suffix = format!("{}:{}", subject, predicate);
|
||||||
|
global_key(b"GS_LIST:", suffix.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Gold standard listing scan prefix: `\x00GS_LIST:`
|
||||||
|
pub fn gs_list_scan_prefix() -> Vec<u8> {
|
||||||
|
global_key(b"GS_LIST:", b"")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Hash-to-subject reverse index: `\x00HASH_SUBJECT:{hash_hex}`
|
||||||
|
pub fn hash_subject_key(hash_hex: &str) -> Vec<u8> {
|
||||||
|
global_key(b"HASH_SUBJECT:", hash_hex.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Key extraction / parsing ────────────────────────────────────────
|
||||||
|
|
||||||
|
/// Extract subject from a `\x00SUBJECTS:{subject}` key.
|
||||||
|
///
|
||||||
|
/// Returns the subject string, or `None` if the key doesn't match the expected format.
|
||||||
|
pub fn extract_subject_from_subjects_key(key: &[u8]) -> Option<String> {
|
||||||
|
let prefix = b"\x00SUBJECTS:";
|
||||||
|
if key.starts_with(prefix) {
|
||||||
|
std::str::from_utf8(&key[prefix.len()..]).ok().map(|s| s.to_string())
|
||||||
|
} else {
|
||||||
|
None
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Extract subject and predicate from a `{subject}\x00SP:{predicate}` key.
|
||||||
|
///
|
||||||
|
/// Returns `(subject, predicate)` or `None` if the key doesn't match.
|
||||||
|
pub fn extract_sp_key(key: &[u8]) -> Option<(String, String)> {
|
||||||
|
// Find the \x00 separator
|
||||||
|
let sep_pos = memchr::memchr(SEPARATOR, key)?;
|
||||||
|
if sep_pos == 0 {
|
||||||
|
return None; // Global key, not subject-prefixed
|
||||||
|
}
|
||||||
|
|
||||||
|
let subject = std::str::from_utf8(&key[..sep_pos]).ok()?;
|
||||||
|
let after_sep = &key[sep_pos + 1..];
|
||||||
|
|
||||||
|
// Check for SP: tag
|
||||||
|
if !after_sep.starts_with(b"SP:") {
|
||||||
|
return None;
|
||||||
|
}
|
||||||
|
|
||||||
|
let predicate = std::str::from_utf8(&after_sep[3..]).ok()?;
|
||||||
|
if subject.is_empty() || predicate.is_empty() {
|
||||||
|
return None;
|
||||||
|
}
|
||||||
|
|
||||||
|
Some((subject.to_string(), predicate.to_string()))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Extract the tag and suffix from a key (everything after the separator); keys without a separator are returned unchanged.
|
||||||
|
///
|
||||||
|
/// For subject-prefixed keys: returns bytes after `{subject}\x00`
|
||||||
|
/// For global keys: returns bytes after `\x00`
|
||||||
|
pub fn extract_tag(key: &[u8]) -> &[u8] {
|
||||||
|
if key.first() == Some(&SEPARATOR) {
|
||||||
|
// Global key: \x00TAG:rest
|
||||||
|
&key[1..]
|
||||||
|
} else if let Some(pos) = memchr::memchr(SEPARATOR, key) {
|
||||||
|
// Subject-prefixed: subject\x00TAG:rest
|
||||||
|
&key[pos + 1..]
|
||||||
|
} else {
|
||||||
|
key
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Check if a key is a global key (starts with `\x00`).
|
||||||
|
pub fn is_global_key(key: &[u8]) -> bool {
|
||||||
|
key.first() == Some(&SEPARATOR)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Extract the subject from a subject-prefixed key.
|
||||||
|
///
|
||||||
|
/// Returns `None` for global keys or keys without a separator.
|
||||||
|
pub fn extract_subject(key: &[u8]) -> Option<&str> {
|
||||||
|
if is_global_key(key) {
|
||||||
|
return None;
|
||||||
|
}
|
||||||
|
if let Some(pos) = memchr::memchr(SEPARATOR, key) {
|
||||||
|
std::str::from_utf8(&key[..pos]).ok()
|
||||||
|
} else {
|
||||||
|
None
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests;
|
||||||
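Reviewer sketch: the module docs for `key_codec` claim that a prefix scan on `{subject}\x00` returns all data for one subject and that global keys sort first. The sketch below checks that property against a plain `BTreeMap`, used here as a stand-in for an ordered KV backend. The `scan_prefix` and `demo_store` helpers are hypothetical, and `subject_key` / `global_key` are simplified re-statements of the private builders in the file above.

```rust
use std::collections::BTreeMap;

const SEPARATOR: u8 = 0x00;

/// `{subject}\x00{tag}{suffix}`, same layout as the private builder in `key_codec`.
fn subject_key(subject: &str, tag: &str, suffix: &str) -> Vec<u8> {
    let mut key = subject.as_bytes().to_vec();
    key.push(SEPARATOR);
    key.extend_from_slice(tag.as_bytes());
    key.extend_from_slice(suffix.as_bytes());
    key
}

/// `\x00{tag}{suffix}`.
fn global_key(tag: &str, suffix: &str) -> Vec<u8> {
    let mut key = vec![SEPARATOR];
    key.extend_from_slice(tag.as_bytes());
    key.extend_from_slice(suffix.as_bytes());
    key
}

/// Collect every key that starts with `prefix` from an ordered map.
fn scan_prefix(map: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<Vec<u8>> {
    map.range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k.clone())
        .collect()
}

/// Small illustrative data set: two subjects sharing a textual prefix, plus one global key.
fn demo_store() -> BTreeMap<Vec<u8>, Vec<u8>> {
    let mut map = BTreeMap::new();
    map.insert(subject_key("Tesla", "H:", "abc"), b"assertion".to_vec());
    map.insert(subject_key("Tesla", "SP:", "revenue"), b"index".to_vec());
    map.insert(subject_key("TeslaEnergy", "H:", "def"), b"other".to_vec());
    map.insert(global_key("TRUST:", "agent1"), b"rank".to_vec());
    map
}
```

Scanning `Tesla\x00` returns only the two `Tesla` keys: the separator stops the scan from spilling into `TeslaEnergy`, which a bare `Tesla` prefix would also match.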
231
crates/stemedb-storage/src/key_codec/tests.rs
Normal file
@ -0,0 +1,231 @@
|
|||||||
|
//! Tests for key encoding/decoding.
|
||||||
|
|
||||||
|
use super::*;
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_validate_subject_rejects_null() {
|
||||||
|
let result = validate_subject("has\x00null");
|
||||||
|
assert!(result.is_err());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_validate_subject_rejects_empty() {
|
||||||
|
let result = validate_subject("");
|
||||||
|
assert!(result.is_err());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_validate_subject_accepts_normal() {
|
||||||
|
validate_subject("Tesla").expect("normal subject should be valid");
|
||||||
|
validate_subject("AAPL").expect("ticker should be valid");
|
||||||
|
validate_subject("Earth::Moon").expect("colons should be valid");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_assertion_key() {
|
||||||
|
let key = assertion_key("Tesla", "abc123");
|
||||||
|
assert_eq!(key, b"Tesla\x00H:abc123");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_subject_index_key() {
|
||||||
|
let key = subject_index_key("Tesla");
|
||||||
|
assert_eq!(key, b"Tesla\x00S:");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_subject_predicate_key() {
|
||||||
|
let key = subject_predicate_key("Tesla", "revenue");
|
||||||
|
assert_eq!(key, b"Tesla\x00SP:revenue");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_mv_key() {
|
||||||
|
let key = mv_key("Tesla", "revenue");
|
||||||
|
assert_eq!(key, b"Tesla\x00MV:revenue");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_vote_key() {
|
||||||
|
let key = vote_key("Tesla", "aaa", "bbb");
|
||||||
|
assert_eq!(key, b"Tesla\x00V:aaa:bbb");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_vote_scan_prefix() {
|
||||||
|
let key = vote_scan_prefix("Tesla", "aaa");
|
||||||
|
assert_eq!(key, b"Tesla\x00V:aaa:");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_vote_count_key() {
|
||||||
|
let key = vote_count_key("Tesla", "aaa");
|
||||||
|
assert_eq!(key, b"Tesla\x00VC:aaa");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_vote_weight_key() {
|
||||||
|
let key = vote_weight_key("Tesla", "aaa");
|
||||||
|
assert_eq!(key, b"Tesla\x00VW:aaa");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_gold_standard_key() {
|
||||||
|
let key = gold_standard_key("Earth", "has_shape");
|
||||||
|
assert_eq!(key, b"Earth\x00GS:has_shape");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_trust_rank_key() {
|
||||||
|
let key = trust_rank_key("abc123");
|
||||||
|
assert_eq!(key, b"\x00TRUST:abc123");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_quota_key() {
|
||||||
|
let key = quota_key("abc", 1705314000);
|
||||||
|
assert_eq!(key, b"\x00QUOTA:abc:1705314000");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_quota_limit_key() {
|
||||||
|
let key = quota_limit_key("abc");
|
||||||
|
assert_eq!(key, b"\x00QLIMIT:abc");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_epoch_key() {
|
||||||
|
let key = epoch_key("deadbeef");
|
||||||
|
assert_eq!(key, b"\x00E:deadbeef");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_superseded_key() {
|
||||||
|
let key = superseded_key("deadbeef");
|
||||||
|
assert_eq!(key, b"\x00SUPERSEDED:deadbeef");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_supersession_key() {
|
||||||
|
let key = supersession_key("deadbeef");
|
||||||
|
assert_eq!(key, b"\x00SUP:deadbeef");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_audit_key() {
|
||||||
|
let key = audit_key("abc123");
|
||||||
|
assert_eq!(key, b"\x00AUD:abc123");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_escalation_key() {
|
||||||
|
let key = escalation_key(1000, "abc123");
|
||||||
|
assert_eq!(key, b"\x00ESC:1000:abc123");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_trust_pack_key() {
|
||||||
|
let key = trust_pack_key(&[1u8; 32]);
|
||||||
|
assert_eq!(&key[..4], b"\x00TP:");
|
||||||
|
assert_eq!(&key[4..], &[1u8; 32]);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_cursor_key() {
|
||||||
|
let key = cursor_key();
|
||||||
|
assert_eq!(key, b"\x00META:cursor:ingest");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_assertion_count_key() {
|
||||||
|
let key = assertion_count_key();
|
||||||
|
assert_eq!(key, b"\x00META:assertion_count");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_subjects_index_key() {
|
||||||
|
let key = subjects_index_key("Tesla");
|
||||||
|
assert_eq!(key, b"\x00SUBJECTS:Tesla");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_gs_list_key() {
|
||||||
|
let key = gs_list_key("Earth", "has_shape");
|
||||||
|
assert_eq!(key, b"\x00GS_LIST:Earth:has_shape");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_hash_subject_key() {
|
||||||
|
let key = hash_subject_key("abc123");
|
||||||
|
assert_eq!(key, b"\x00HASH_SUBJECT:abc123");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_gs_verified_key() {
|
||||||
|
let key = gs_verified_key("abc", "Earth", "has_shape");
|
||||||
|
assert_eq!(key, b"\x00GS_VERIFIED:abc:Earth:has_shape");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_subject_scan_prefix() {
|
||||||
|
let prefix = subject_scan_prefix("Tesla");
|
||||||
|
assert_eq!(prefix, b"Tesla\x00");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_extract_tag_global() {
|
||||||
|
let key = b"\x00TRUST:abc123";
|
||||||
|
assert_eq!(extract_tag(key), b"TRUST:abc123");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_extract_tag_subject() {
|
||||||
|
let key = b"Tesla\x00H:abc123";
|
||||||
|
assert_eq!(extract_tag(key), b"H:abc123");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_is_global_key() {
|
||||||
|
assert!(is_global_key(b"\x00TRUST:abc"));
|
||||||
|
assert!(!is_global_key(b"Tesla\x00H:abc"));
|
||||||
|
assert!(!is_global_key(b"plain_key"));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_extract_subject() {
|
||||||
|
assert_eq!(extract_subject(b"Tesla\x00H:abc"), Some("Tesla"));
|
||||||
|
assert_eq!(extract_subject(b"Earth\x00GS:pred"), Some("Earth"));
|
||||||
|
assert_eq!(extract_subject(b"\x00TRUST:abc"), None);
|
||||||
|
assert_eq!(extract_subject(b"no_separator"), None);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_subject_colocation() {
|
||||||
|
// All Tesla keys should share the same prefix for range sharding
|
||||||
|
let h = assertion_key("Tesla", "abc");
|
||||||
|
let s = subject_index_key("Tesla");
|
||||||
|
let sp = subject_predicate_key("Tesla", "revenue");
|
||||||
|
let mv = mv_key("Tesla", "revenue");
|
||||||
|
let v = vote_key("Tesla", "abc", "def");
|
||||||
|
let vc = vote_count_key("Tesla", "abc");
|
||||||
|
let vw = vote_weight_key("Tesla", "abc");
|
||||||
|
let gs = gold_standard_key("Tesla", "stock_price");
|
||||||
|
|
||||||
|
let prefix = b"Tesla\x00";
|
||||||
|
assert!(h.starts_with(prefix));
|
||||||
|
assert!(s.starts_with(prefix));
|
||||||
|
assert!(sp.starts_with(prefix));
|
||||||
|
assert!(mv.starts_with(prefix));
|
||||||
|
assert!(v.starts_with(prefix));
|
||||||
|
assert!(vc.starts_with(prefix));
|
||||||
|
assert!(vw.starts_with(prefix));
|
||||||
|
assert!(gs.starts_with(prefix));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_global_keys_sort_first() {
|
||||||
|
// Global keys (\x00...) sort before subject keys (a-z...)
|
||||||
|
let global = trust_rank_key("abc");
|
||||||
|
let subject = assertion_key("Apple", "abc");
|
||||||
|
assert!(global < subject, "Global keys should sort before subject keys");
|
||||||
|
}
|
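The key-codec tests above pin down a layout in which subject-scoped keys are `subject`, a `\x00` separator, then a tag, while global keys begin with `\x00`. The following is a minimal sketch of that layout, not the actual `key_codec` module (which is not shown in this hunk). The helper names `subject_key` and `global_key` are hypothetical, and `is_global_key` and `extract_subject` are re-implemented here only to match the assertions in the tests.

```rust
/// Separator byte between the subject and the tag (taken from the test fixtures).
const SEP: u8 = 0x00;

/// Hypothetical helper: build a subject-scoped key `subject \x00 tag`.
fn subject_key(subject: &str, tag: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(subject.len() + 1 + tag.len());
    key.extend_from_slice(subject.as_bytes());
    key.push(SEP);
    key.extend_from_slice(tag.as_bytes());
    key
}

/// Hypothetical helper: build a global key `\x00 tag`.
fn global_key(tag: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(1 + tag.len());
    key.push(SEP);
    key.extend_from_slice(tag.as_bytes());
    key
}

/// A key is global when its first byte is the separator.
fn is_global_key(key: &[u8]) -> bool {
    key.first() == Some(&SEP)
}

/// Return the subject of a subject-scoped key, or `None` for global keys
/// and for keys that contain no separator at all.
fn extract_subject(key: &[u8]) -> Option<&str> {
    let pos = key.iter().position(|&b| b == SEP)?;
    if pos == 0 {
        return None; // global key: nothing precedes the separator
    }
    std::str::from_utf8(&key[..pos]).ok()
}
```

Because `\x00` is the smallest byte value, every global key compares below any key whose subject is non-empty, and all keys for one subject share the `subject\x00` prefix, which is what the colocation and sort-order tests check.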
||||||
@ -1,7 +1,7 @@
|
|||||||
//! Storage engine abstractions and implementations for Episteme.
|
//! Storage engine abstractions and implementations for Episteme.
|
||||||
//!
|
//!
|
||||||
//! This crate provides the `KVStore` trait for pluggable storage backends
|
//! This crate provides the `KVStore` trait for pluggable storage backends
|
||||||
//! and a concrete implementation using `sled`.
|
//! and a concrete `HybridStore` that routes keys to fjall (write-heavy) or redb (read-heavy).
|
||||||
//!
|
//!
|
||||||
//! # The Ballot Box
|
//! # The Ballot Box
|
||||||
//!
|
//!
|
||||||
@ -10,13 +10,13 @@
|
|||||||
//! votes from assertions to enable thousands of agents to vote simultaneously.
|
//! votes from assertions to enable thousands of agents to vote simultaneously.
|
||||||
//!
|
//!
|
||||||
//! ```ignore
|
//! ```ignore
|
||||||
//! use stemedb_storage::{SledStore, GenericVoteStore, VoteStore};
|
//! use stemedb_storage::{HybridStore, GenericVoteStore, VoteStore};
|
||||||
//!
|
//!
|
||||||
//! let kv_store = SledStore::open("./data")?;
|
//! let kv_store = HybridStore::open("./data")?;
|
||||||
//! let vote_store = GenericVoteStore::new(kv_store);
|
//! let vote_store = GenericVoteStore::new(kv_store);
|
||||||
//!
|
//!
|
||||||
//! // High-velocity vote ingestion
|
//! // High-velocity vote ingestion
|
||||||
//! let vote_hash = vote_store.put_vote(&vote).await?;
|
//! let vote_hash = vote_store.put_vote(&vote, "subject").await?;
|
||||||
//!
|
//!
|
||||||
//! // O(1) aggregation via caches
|
//! // O(1) aggregation via caches
|
||||||
//! let count = vote_store.get_vote_count(&assertion_hash).await?;
|
//! let count = vote_store.get_vote_count(&assertion_hash).await?;
|
||||||
@ -30,9 +30,9 @@
|
|||||||
//! weighted in the Authority lens.
|
//! weighted in the Authority lens.
|
||||||
//!
|
//!
|
||||||
//! ```ignore
|
//! ```ignore
|
||||||
//! use stemedb_storage::{SledStore, GenericTrustRankStore, TrustRankStore};
|
//! use stemedb_storage::{HybridStore, GenericTrustRankStore, TrustRankStore};
|
||||||
//!
|
//!
|
||||||
//! let kv_store = SledStore::open("./data")?;
|
//! let kv_store = HybridStore::open("./data")?;
|
||||||
//! let trust_store = GenericTrustRankStore::new(kv_store);
|
//! let trust_store = GenericTrustRankStore::new(kv_store);
|
||||||
//!
|
//!
|
||||||
//! // Get agent's current reputation
|
//! // Get agent's current reputation
|
||||||
@ -51,9 +51,9 @@
|
|||||||
//! Every query is logged with provenance to enable "Why did you think that?" debugging.
|
//! Every query is logged with provenance to enable "Why did you think that?" debugging.
|
||||||
//!
|
//!
|
||||||
//! ```ignore
|
//! ```ignore
|
||||||
//! use stemedb_storage::{SledStore, GenericAuditStore, AuditStore};
|
//! use stemedb_storage::{HybridStore, GenericAuditStore, AuditStore};
|
||||||
//!
|
//!
|
||||||
//! let kv_store = SledStore::open("./data")?;
|
//! let kv_store = HybridStore::open("./data")?;
|
||||||
//! let audit_store = GenericAuditStore::new(kv_store);
|
//! let audit_store = GenericAuditStore::new(kv_store);
|
||||||
//!
|
//!
|
||||||
//! // Log a query audit
|
//! // Log a query audit
|
||||||
@ -72,9 +72,9 @@
|
|||||||
//! Users subscribe to domain expert packs to see reality through trusted lenses.
|
//! Users subscribe to domain expert packs to see reality through trusted lenses.
|
||||||
//!
|
//!
|
||||||
//! ```ignore
|
//! ```ignore
|
||||||
//! use stemedb_storage::{SledStore, GenericTrustPackStore, TrustPackStore};
|
//! use stemedb_storage::{HybridStore, GenericTrustPackStore, TrustPackStore};
|
||||||
//!
|
//!
|
||||||
//! let kv_store = SledStore::open("./data")?;
|
//! let kv_store = HybridStore::open("./data")?;
|
||||||
//! let pack_store = GenericTrustPackStore::new(kv_store);
|
//! let pack_store = GenericTrustPackStore::new(kv_store);
|
||||||
//!
|
//!
|
||||||
//! // Create and store a pack
|
//! // Create and store a pack
|
||||||
@ -94,9 +94,9 @@
|
|||||||
//! runaway agents from exhausting system resources.
|
//! runaway agents from exhausting system resources.
|
||||||
//!
|
//!
|
||||||
//! ```ignore
|
//! ```ignore
|
||||||
//! use stemedb_storage::{SledStore, GenericQuotaStore, QuotaStore, OperationType};
|
//! use stemedb_storage::{HybridStore, GenericQuotaStore, QuotaStore, OperationType};
|
||||||
//!
|
//!
|
||||||
//! let kv_store = SledStore::open("./data")?;
|
//! let kv_store = HybridStore::open("./data")?;
|
||||||
//! let quota_store = GenericQuotaStore::new(kv_store);
|
//! let quota_store = GenericQuotaStore::new(kv_store);
|
||||||
//!
|
//!
|
||||||
//! // Check and record cost for an operation
|
//! // Check and record cost for an operation
|
||||||
@ -123,9 +123,9 @@
|
|||||||
//! earn TrustRank and unlock premium features.
|
//! earn TrustRank and unlock premium features.
|
||||||
//!
|
//!
|
||||||
//! ```ignore
|
//! ```ignore
|
||||||
//! use stemedb_storage::{SledStore, GenericGoldStandardStore, GoldStandardStore};
|
//! use stemedb_storage::{HybridStore, GenericGoldStandardStore, GoldStandardStore};
|
||||||
//!
|
//!
|
||||||
//! let kv_store = SledStore::open("./data")?;
|
//! let kv_store = HybridStore::open("./data")?;
|
||||||
//! let gs_store = GenericGoldStandardStore::new(kv_store);
|
//! let gs_store = GenericGoldStandardStore::new(kv_store);
|
||||||
//!
|
//!
|
||||||
//! // Create and store a gold standard
|
//! // Create and store a gold standard
|
||||||
@ -141,22 +141,29 @@
|
|||||||
//! }
|
//! }
|
||||||
//! ```
|
//! ```
|
||||||
|
|
||||||
|
/// Central key encoding/decoding for subject-prefix range sharding.
|
||||||
|
pub mod key_codec;
|
||||||
|
|
||||||
/// Query audit trail storage for incident investigation.
|
/// Query audit trail storage for incident investigation.
|
||||||
pub mod audit_store;
|
pub mod audit_store;
|
||||||
/// Error types and Result wrapper for storage operations.
|
/// Error types and Result wrapper for storage operations.
|
||||||
pub mod error;
|
pub mod error;
|
||||||
/// Escalation event storage for high-conflict assertions.
|
/// Escalation event storage for high-conflict assertions.
|
||||||
pub mod escalation_store;
|
pub mod escalation_store;
|
||||||
|
/// Fjall (LSM-tree) backend for write-heavy key prefixes.
|
||||||
|
pub mod fjall_backend;
|
||||||
/// Gold standard assertions for agent verification.
|
/// Gold standard assertions for agent verification.
|
||||||
pub mod gold_standard_store;
|
pub mod gold_standard_store;
|
||||||
|
/// Hybrid storage backend: routes keys to fjall (write-heavy) or redb (read-heavy).
|
||||||
|
pub mod hybrid_backend;
|
||||||
/// Specialized storage for assertion indexes.
|
/// Specialized storage for assertion indexes.
|
||||||
pub mod index_store;
|
pub mod index_store;
|
||||||
/// Economic throttling via Token Bucket quotas (The Meter).
|
/// Economic throttling via Token Bucket quotas (The Meter).
|
||||||
pub mod quota_store;
|
pub mod quota_store;
|
||||||
|
/// Redb (B-tree) backend for read-heavy key prefixes.
|
||||||
|
pub mod redb_backend;
|
||||||
/// Storage-layer serialization helpers.
|
/// Storage-layer serialization helpers.
|
||||||
pub(crate) mod serde_helpers;
|
pub(crate) mod serde_helpers;
|
||||||
/// Sled implementation of the storage backend.
|
|
||||||
pub mod sled_backend;
|
|
||||||
/// Assertion supersession storage (Error Correction).
|
/// Assertion supersession storage (Error Correction).
|
||||||
pub mod supersession_store;
|
pub mod supersession_store;
|
||||||
/// Core traits for key-value storage.
|
/// Core traits for key-value storage.
|
||||||
@ -176,12 +183,12 @@ pub use audit_store::{AuditStore, GenericAuditStore};
|
|||||||
pub use error::{Result, StorageError};
|
pub use error::{Result, StorageError};
|
||||||
pub use escalation_store::{EscalationStore, GenericEscalationStore};
|
pub use escalation_store::{EscalationStore, GenericEscalationStore};
|
||||||
pub use gold_standard_store::{GenericGoldStandardStore, GoldStandardStore};
|
pub use gold_standard_store::{GenericGoldStandardStore, GoldStandardStore};
|
||||||
|
pub use hybrid_backend::HybridStore;
|
||||||
pub use index_store::{GenericIndexStore, IndexStore};
|
pub use index_store::{GenericIndexStore, IndexStore};
|
||||||
pub use quota_store::{
|
pub use quota_store::{
|
||||||
CostConfig, GenericQuotaStore, OperationType, QuotaCheckResult, QuotaRecord, QuotaStore,
|
CostConfig, GenericQuotaStore, OperationType, QuotaCheckResult, QuotaRecord, QuotaStore,
|
||||||
DEFAULT_QUOTA_LIMIT,
|
DEFAULT_QUOTA_LIMIT,
|
||||||
};
|
};
|
||||||
pub use sled_backend::SledStore;
|
|
||||||
pub use supersession_store::{GenericSupersessionStore, SupersessionStore};
|
pub use supersession_store::{GenericSupersessionStore, SupersessionStore};
|
||||||
pub use traits::KVStore;
|
pub use traits::KVStore;
|
||||||
pub use trust_pack_store::{GenericTrustPackStore, TrustPackStore};
|
pub use trust_pack_store::{GenericTrustPackStore, TrustPackStore};
|
||||||
|
|||||||
@ -37,9 +37,6 @@ use async_trait::async_trait;
|
|||||||
#[allow(dead_code)] // Documented for reference; actual key construction uses format!()
|
#[allow(dead_code)] // Documented for reference; actual key construction uses format!()
|
||||||
const QUOTA_PREFIX: &[u8] = b"QT:";
|
const QUOTA_PREFIX: &[u8] = b"QT:";
|
||||||
|
|
||||||
/// Key prefix for per-agent quota limit overrides.
|
|
||||||
const QUOTA_LIMIT_PREFIX: &[u8] = b"QL:";
|
|
||||||
|
|
||||||
/// Default quota limit per agent per hour (10,000 tokens).
|
/// Default quota limit per agent per hour (10,000 tokens).
|
||||||
pub const DEFAULT_QUOTA_LIMIT: u64 = 10_000;
|
pub const DEFAULT_QUOTA_LIMIT: u64 = 10_000;
|
||||||
|
|
||||||
@ -108,7 +105,8 @@ pub trait QuotaStore: Send + Sync {
|
|||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::SledStore;
|
use crate::HybridStore;
|
||||||
|
use std::sync::Arc;
|
||||||
|
|
||||||
fn test_agent() -> [u8; 32] {
|
fn test_agent() -> [u8; 32] {
|
||||||
[1u8; 32]
|
[1u8; 32]
|
||||||
@ -157,7 +155,7 @@ mod tests {
|
|||||||
fn test_hour_window() {
|
fn test_hour_window() {
|
||||||
// 2024-01-15 10:50:00 UTC = 1705315800
|
// 2024-01-15 10:50:00 UTC = 1705315800
|
||||||
let timestamp = 1705315800;
|
let timestamp = 1705315800;
|
||||||
let window = GenericQuotaStore::<SledStore>::hour_window(timestamp);
|
let window = GenericQuotaStore::<HybridStore>::hour_window(timestamp);
|
||||||
|
|
||||||
// 1705315800 / 3600 = 473698.833... -> 473698 * 3600 = 1705312800
|
// 1705315800 / 3600 = 473698.833... -> 473698 * 3600 = 1705312800
|
||||||
assert_eq!(window, 1705312800);
|
assert_eq!(window, 1705312800);
|
||||||
@ -167,7 +165,7 @@ mod tests {
|
|||||||
#[test]
|
#[test]
|
||||||
fn test_reset_timestamp() {
|
fn test_reset_timestamp() {
|
||||||
let timestamp = 1705315800; // :50 past the hour
|
let timestamp = 1705315800; // :50 past the hour
|
||||||
let reset = GenericQuotaStore::<SledStore>::reset_timestamp(timestamp);
|
let reset = GenericQuotaStore::<HybridStore>::reset_timestamp(timestamp);
|
||||||
|
|
||||||
// Window is 1705312800, next hour is +3600 = 1705316400
|
// Window is 1705312800, next hour is +3600 = 1705316400
|
||||||
assert_eq!(reset, 1705316400);
|
assert_eq!(reset, 1705316400);
|
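The two tests above rely on hourly window arithmetic. A minimal sketch of that arithmetic follows, assuming `hour_window` truncates the timestamp to a multiple of 3600 seconds (its body is not shown in this diff). The `+ 3600` in `reset_timestamp` matches the implementation hunk in this commit. The free functions and the `WINDOW_SECS` constant are illustrative stand-ins for the associated functions on `GenericQuotaStore`.

```rust
/// Length of one quota window in seconds (one hour).
const WINDOW_SECS: u64 = 3600;

/// Start of the hourly window containing `timestamp` (assumed: integer truncation).
fn hour_window(timestamp: u64) -> u64 {
    timestamp / WINDOW_SECS * WINDOW_SECS
}

/// Timestamp at which the current window's quota resets.
fn reset_timestamp(timestamp: u64) -> u64 {
    hour_window(timestamp) + WINDOW_SECS
}
```

With the fixture value 1705315800, integer division yields 473698 whole hours, so the window starts at 1705312800 and resets at 1705316400.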
||||||
@ -203,7 +201,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_check_and_record_basic() {
|
async fn test_check_and_record_basic() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let quota_store = GenericQuotaStore::new(store);
|
let quota_store = GenericQuotaStore::new(store);
|
||||||
|
|
||||||
let agent_id = test_agent();
|
let agent_id = test_agent();
|
||||||
@ -223,7 +221,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_quota_enforcement() {
|
async fn test_quota_enforcement() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let quota_store = GenericQuotaStore::with_config(
|
let quota_store = GenericQuotaStore::with_config(
|
||||||
store,
|
store,
|
||||||
CostConfig::default(),
|
CostConfig::default(),
|
||||||
@ -260,7 +258,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_quota_resets_each_hour() {
|
async fn test_quota_resets_each_hour() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let quota_store = GenericQuotaStore::with_config(store, CostConfig::default(), 50);
|
let quota_store = GenericQuotaStore::with_config(store, CostConfig::default(), 50);
|
||||||
|
|
||||||
let agent_id = test_agent();
|
let agent_id = test_agent();
|
||||||
@ -294,7 +292,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_custom_quota_limit() {
|
async fn test_custom_quota_limit() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let quota_store = GenericQuotaStore::new(store);
|
let quota_store = GenericQuotaStore::new(store);
|
||||||
|
|
||||||
let agent_id = test_agent();
|
let agent_id = test_agent();
|
||||||
@ -312,7 +310,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_get_quota_status() {
|
async fn test_get_quota_status() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let quota_store = GenericQuotaStore::new(store);
|
let quota_store = GenericQuotaStore::new(store);
|
||||||
|
|
||||||
let agent_id = test_agent();
|
let agent_id = test_agent();
|
||||||
@ -337,7 +335,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_different_operation_types() {
|
async fn test_different_operation_types() {
|
||||||
let store = SledStore::open_temp().expect("store");
|
let store = Arc::new(HybridStore::open_temp().expect("store"));
|
||||||
let quota_store = GenericQuotaStore::new(store);
|
let quota_store = GenericQuotaStore::new(store);
|
||||||
|
|
||||||
let agent_id = test_agent();
|
let agent_id = test_agent();
|
||||||
@ -376,9 +374,9 @@ mod tests {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let serialized =
|
let serialized =
|
||||||
GenericQuotaStore::<SledStore>::serialize_record(&record).expect("serialize");
|
GenericQuotaStore::<HybridStore>::serialize_record(&record).expect("serialize");
|
||||||
let deserialized =
|
let deserialized =
|
||||||
GenericQuotaStore::<SledStore>::deserialize_record(&serialized).expect("deserialize");
|
GenericQuotaStore::<HybridStore>::deserialize_record(&serialized).expect("deserialize");
|
||||||
|
|
||||||
assert_eq!(record, deserialized);
|
assert_eq!(record, deserialized);
|
||||||
}
|
}
|
||||||
|
|||||||
@ -2,9 +2,9 @@
|
|||||||
|
|
||||||
use super::{
|
use super::{
|
||||||
CostConfig, OperationType, QuotaCheckResult, QuotaRecord, QuotaStore, DEFAULT_QUOTA_LIMIT,
|
CostConfig, OperationType, QuotaCheckResult, QuotaRecord, QuotaStore, DEFAULT_QUOTA_LIMIT,
|
||||||
QUOTA_LIMIT_PREFIX,
|
|
||||||
};
|
};
|
||||||
use crate::error::{Result, StorageError};
|
use crate::error::{Result, StorageError};
|
||||||
|
use crate::key_codec;
|
||||||
use crate::traits::KVStore;
|
use crate::traits::KVStore;
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use tracing::{debug, instrument, warn};
|
use tracing::{debug, instrument, warn};
|
||||||
@ -40,19 +40,6 @@ impl<S: KVStore> GenericQuotaStore<S> {
|
|||||||
Self::hour_window(timestamp) + 3600
|
Self::hour_window(timestamp) + 3600
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Construct the key for a quota record.
|
|
||||||
fn quota_key(agent_id: &[u8; 32], window_start: u64) -> Vec<u8> {
|
|
||||||
let agent_hex = hex::encode(agent_id);
|
|
||||||
format!("QT:{}:{}", agent_hex, window_start).into_bytes()
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Construct the key for a quota limit override.
|
|
||||||
fn limit_key(agent_id: &[u8; 32]) -> Vec<u8> {
|
|
||||||
let mut key = QUOTA_LIMIT_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(agent_id);
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Serialize a QuotaRecord using the canonical serde helpers.
|
/// Serialize a QuotaRecord using the canonical serde helpers.
|
||||||
pub(crate) fn serialize_record(record: &QuotaRecord) -> Result<Vec<u8>> {
|
pub(crate) fn serialize_record(record: &QuotaRecord) -> Result<Vec<u8>> {
|
||||||
crate::serde_helpers::serialize(record)
|
crate::serde_helpers::serialize(record)
|
||||||
@ -91,7 +78,7 @@ impl<S: KVStore + 'static> QuotaStore for GenericQuotaStore<S> {
|
|||||||
let limit = self.get_quota_limit(agent_id).await?;
|
let limit = self.get_quota_limit(agent_id).await?;
|
||||||
|
|
||||||
// Get or create quota record for this window
|
// Get or create quota record for this window
|
||||||
let key = Self::quota_key(agent_id, window_start);
|
let key = key_codec::quota_key(&hex::encode(agent_id), window_start);
|
||||||
let mut record = match self.store.get(&key).await? {
|
let mut record = match self.store.get(&key).await? {
|
||||||
Some(data) => Self::deserialize_record(&data)?,
|
Some(data) => Self::deserialize_record(&data)?,
|
||||||
None => QuotaRecord::new(*agent_id, window_start),
|
None => QuotaRecord::new(*agent_id, window_start),
|
||||||
@ -138,7 +125,7 @@ impl<S: KVStore + 'static> QuotaStore for GenericQuotaStore<S> {
|
|||||||
let reset_at = Self::reset_timestamp(timestamp);
|
let reset_at = Self::reset_timestamp(timestamp);
|
||||||
let limit = self.get_quota_limit(agent_id).await?;
|
let limit = self.get_quota_limit(agent_id).await?;
|
||||||
|
|
||||||
let key = Self::quota_key(agent_id, window_start);
|
let key = key_codec::quota_key(&hex::encode(agent_id), window_start);
|
||||||
let record = match self.store.get(&key).await? {
|
let record = match self.store.get(&key).await? {
|
||||||
Some(data) => Self::deserialize_record(&data)?,
|
Some(data) => Self::deserialize_record(&data)?,
|
||||||
None => QuotaRecord::new(*agent_id, window_start),
|
None => QuotaRecord::new(*agent_id, window_start),
|
||||||
@ -155,7 +142,7 @@ impl<S: KVStore + 'static> QuotaStore for GenericQuotaStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id), limit))]
|
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id), limit))]
|
||||||
async fn set_quota_limit(&self, agent_id: &[u8; 32], limit: u64) -> Result<()> {
|
async fn set_quota_limit(&self, agent_id: &[u8; 32], limit: u64) -> Result<()> {
|
||||||
let key = Self::limit_key(agent_id);
|
let key = key_codec::quota_limit_key(&hex::encode(agent_id));
|
||||||
self.store.put(&key, &limit.to_le_bytes()).await?;
|
self.store.put(&key, &limit.to_le_bytes()).await?;
|
||||||
debug!("Set custom quota limit");
|
debug!("Set custom quota limit");
|
||||||
Ok(())
|
Ok(())
|
||||||
@ -163,7 +150,7 @@ impl<S: KVStore + 'static> QuotaStore for GenericQuotaStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))]
|
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))]
|
||||||
async fn get_quota_limit(&self, agent_id: &[u8; 32]) -> Result<u64> {
|
async fn get_quota_limit(&self, agent_id: &[u8; 32]) -> Result<u64> {
|
||||||
let key = Self::limit_key(agent_id);
|
let key = key_codec::quota_limit_key(&hex::encode(agent_id));
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(bytes) if bytes.len() == 8 => {
|
Some(bytes) if bytes.len() == 8 => {
|
||||||
let arr: [u8; 8] = bytes
|
let arr: [u8; 8] = bytes
|
||||||
|
|||||||
280
crates/stemedb-storage/src/redb_backend.rs
Normal file
@ -0,0 +1,280 @@
|
|||||||
|
use crate::error::{Result, StorageError};
|
||||||
|
use crate::traits::KVStore;
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use redb::ReadableTable;
|
||||||
|
use std::path::Path;
|
||||||
|
use std::sync::Arc;
|
||||||
|
use tracing::instrument;
|
||||||
|
|
||||||
|
const DATA_TABLE: redb::TableDefinition<&[u8], &[u8]> = redb::TableDefinition::new("data");
|
||||||
|
|
||||||
|
fn redb_err(e: impl std::fmt::Display) -> StorageError {
|
||||||
|
StorageError::Backend(e.to_string())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Compute the lexicographic successor of a byte prefix.
|
||||||
|
///
|
||||||
|
/// Returns `None` if the prefix is empty or consists only of `0xFF` bytes (no successor possible).
|
||||||
|
fn prefix_successor(prefix: &[u8]) -> Option<Vec<u8>> {
|
||||||
|
let mut end = prefix.to_vec();
|
||||||
|
while let Some(last) = end.last_mut() {
|
||||||
|
if *last < 0xFF {
|
||||||
|
*last += 1;
|
||||||
|
return Some(end);
|
||||||
|
}
|
||||||
|
end.pop();
|
||||||
|
}
|
||||||
|
None
|
||||||
|
}
|
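To illustrate how the prefix successor turns a prefix scan into a half-open range query, here is a sketch that swaps redb for an in-memory `BTreeMap`. The `prefix_successor` body is copied from the diff. The `scan_prefix_keys` helper and the use of `BTreeMap` are assumptions made for illustration and are not part of the commit.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Same logic as in the diff: drop trailing 0xFF bytes, then bump the last byte.
fn prefix_successor(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.last_mut() {
        if *last < 0xFF {
            *last += 1;
            return Some(end);
        }
        end.pop();
    }
    None
}

/// Hypothetical helper: return all keys starting with `prefix`, in order.
fn scan_prefix_keys(map: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<Vec<u8>> {
    let end = prefix_successor(prefix);
    // With a successor the range is [prefix, successor); without one
    // (empty or all-0xFF prefix) the range is open-ended.
    let upper: Bound<&[u8]> = match end.as_deref() {
        Some(e) => Bound::Excluded(e),
        None => Bound::Unbounded,
    };
    map.range::<[u8], _>((Bound::Included(prefix), upper))
        .map(|(k, _)| k.clone())
        .collect()
}
```

For the prefix `prefix:` the successor is `prefix;` (the last byte `:` is incremented), so the range covers exactly the keys that start with `prefix:`.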
||||||
|
|
||||||
|
/// Redb (B-tree) implementation of the KVStore trait.
|
||||||
|
///
|
||||||
|
/// Used for read-heavy key prefixes: indexes (`S:`, `SP:`), materialized views (`MV:`),
|
||||||
|
/// trust ranks (`TR:`), audits (`QA:`), quotas (`QT:`), trust packs (`TP:`),
|
||||||
|
/// gold standards (`GS:`), and escalations (`ESC:`).
|
||||||
|
pub struct RedbStore {
|
||||||
|
db: Arc<redb::Database>,
|
||||||
|
_temp_dir: Option<tempfile::TempDir>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl std::fmt::Debug for RedbStore {
|
||||||
|
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||||
|
f.debug_struct("RedbStore").finish()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl RedbStore {
|
||||||
|
/// Open or create a Redb database at the given path.
|
||||||
|
#[instrument(skip_all)]
|
||||||
|
pub fn open(path: impl AsRef<Path>) -> Result<Self> {
|
||||||
|
let db = redb::Database::create(path.as_ref()).map_err(redb_err)?;
|
||||||
|
Ok(Self { db: Arc::new(db), _temp_dir: None })
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Open a temporary Redb database for testing.
|
||||||
|
///
|
||||||
|
/// The database will be automatically deleted when the returned store is dropped.
|
||||||
|
pub fn open_temp() -> Result<Self> {
|
||||||
|
let temp_dir = tempfile::tempdir().map_err(StorageError::Io)?;
|
||||||
|
let db_path = temp_dir.path().join("data.redb");
|
||||||
|
let db = redb::Database::create(&db_path).map_err(redb_err)?;
|
||||||
|
Ok(Self { db: Arc::new(db), _temp_dir: Some(temp_dir) })
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl KVStore for RedbStore {
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len()))]
|
||||||
|
async fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> {
|
||||||
|
let read_txn = self.db.begin_read().map_err(redb_err)?;
|
||||||
|
let table = match read_txn.open_table(DATA_TABLE) {
|
||||||
|
Ok(t) => t,
|
||||||
|
Err(redb::TableError::TableDoesNotExist(_)) => return Ok(None),
|
||||||
|
Err(e) => return Err(redb_err(e)),
|
||||||
|
};
|
||||||
|
match table.get(key).map_err(redb_err)? {
|
||||||
|
Some(guard) => Ok(Some(guard.value().to_vec())),
|
||||||
|
None => Ok(None),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len(), value_len = value.len()))]
|
||||||
|
async fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
|
||||||
|
let write_txn = self.db.begin_write().map_err(redb_err)?;
|
||||||
|
{
|
||||||
|
let mut table = write_txn.open_table(DATA_TABLE).map_err(redb_err)?;
|
||||||
|
table.insert(key, value).map_err(redb_err)?;
|
||||||
|
}
|
||||||
|
write_txn.commit().map_err(redb_err)?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len()))]
|
||||||
|
async fn delete(&self, key: &[u8]) -> Result<()> {
|
||||||
|
let write_txn = self.db.begin_write().map_err(redb_err)?;
|
||||||
|
{
|
||||||
|
let mut table = write_txn.open_table(DATA_TABLE).map_err(redb_err)?;
|
||||||
|
table.remove(key).map_err(redb_err)?;
|
||||||
|
}
|
||||||
|
write_txn.commit().map_err(redb_err)?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(prefix_len = prefix.len()))]
|
||||||
|
async fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
|
||||||
|
let read_txn = self.db.begin_read().map_err(redb_err)?;
|
||||||
|
let table = match read_txn.open_table(DATA_TABLE) {
|
||||||
|
Ok(t) => t,
|
||||||
|
Err(redb::TableError::TableDoesNotExist(_)) => return Ok(Vec::new()),
|
||||||
|
Err(e) => return Err(redb_err(e)),
|
||||||
|
};
|
||||||
|
|
||||||
|
let mut results = Vec::new();
|
||||||
|
match prefix_successor(prefix) {
|
||||||
|
Some(end_key) => {
|
||||||
|
let range = table.range(prefix..end_key.as_slice()).map_err(redb_err)?;
|
||||||
|
for entry in range {
|
||||||
|
let (k, v) = entry.map_err(redb_err)?;
|
||||||
|
results.push((k.value().to_vec(), v.value().to_vec()));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
None => {
|
||||||
|
// prefix is empty or all 0xFF: there is no upper bound, so scan from prefix to the end
|
||||||
|
let range = table.range(prefix..).map_err(redb_err)?;
|
||||||
|
for entry in range {
|
||||||
|
let (k, v) = entry.map_err(redb_err)?;
|
||||||
|
results.push((k.value().to_vec(), v.value().to_vec()));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
Ok(results)
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all)]
|
||||||
|
async fn flush(&self) -> Result<()> {
|
||||||
|
// redb is always durable after commit — flush is a no-op
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len(), delta))]
|
||||||
|
async fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64> {
|
||||||
|
let write_txn = self.db.begin_write().map_err(redb_err)?;
|
||||||
|
let new_val = {
|
||||||
|
let mut table = write_txn.open_table(DATA_TABLE).map_err(redb_err)?;
|
||||||
|
let current = match table.get(key).map_err(redb_err)? {
|
||||||
|
Some(guard) => {
|
||||||
|
let arr: [u8; 8] = guard.value().try_into().map_err(|_| {
|
||||||
|
StorageError::Serialization(format!(
|
||||||
|
"Corrupted u64 counter: expected 8 bytes, got {}",
|
||||||
|
guard.value().len()
|
||||||
|
))
|
||||||
|
})?;
|
||||||
|
u64::from_le_bytes(arr)
|
||||||
|
}
|
||||||
|
None => 0,
|
||||||
|
};
|
||||||
|
let new_val = current.saturating_add(delta);
|
||||||
|
table.insert(key, new_val.to_le_bytes().as_slice()).map_err(redb_err)?;
|
||||||
|
new_val
|
||||||
|
};
|
||||||
|
write_txn.commit().map_err(redb_err)?;
|
||||||
|
Ok(new_val)
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip_all, fields(key_len = key.len()))]
|
||||||
|
async fn compare_and_swap_f32<F>(&self, key: &[u8], update_fn: F) -> Result<f32>
|
||||||
|
where
|
||||||
|
F: Fn(f32) -> f32 + Send + Sync,
|
||||||
|
{
|
||||||
|
let write_txn = self.db.begin_write().map_err(redb_err)?;
|
||||||
|
let new_val = {
|
||||||
|
let mut table = write_txn.open_table(DATA_TABLE).map_err(redb_err)?;
|
||||||
|
let current = match table.get(key).map_err(redb_err)? {
|
||||||
|
Some(guard) => {
|
||||||
|
let arr: [u8; 4] = guard.value().try_into().map_err(|_| {
|
||||||
|
StorageError::Serialization(format!(
|
||||||
|
"Corrupted f32 value: expected 4 bytes, got {}",
|
||||||
|
guard.value().len()
|
||||||
|
))
|
||||||
|
})?;
|
||||||
|
f32::from_le_bytes(arr)
|
||||||
|
}
|
||||||
|
None => 0.0,
|
||||||
|
};
|
||||||
|
let new_val = update_fn(current);
|
||||||
|
table.insert(key, new_val.to_le_bytes().as_slice()).map_err(redb_err)?;
|
||||||
|
new_val
|
||||||
|
};
|
||||||
|
write_txn.commit().map_err(redb_err)?;
|
||||||
|
Ok(new_val)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_redb_store_roundtrip() {
|
||||||
|
let store = RedbStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
let key = b"test_key";
|
||||||
|
let value = b"test_value";
|
||||||
|
|
||||||
|
store.put(key, value).await.expect("Put failed");
|
||||||
|
let retrieved = store.get(key).await.expect("Get failed");
|
||||||
|
assert_eq!(retrieved, Some(value.to_vec()));
|
||||||
|
|
||||||
|
store.delete(key).await.expect("Delete failed");
|
||||||
|
let deleted = store.get(key).await.expect("Get failed");
|
||||||
|
assert_eq!(deleted, None);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_redb_scan_prefix() {
|
||||||
|
let store = RedbStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
store.put(b"prefix:1", b"val1").await.unwrap();
|
||||||
|
store.put(b"prefix:2", b"val2").await.unwrap();
|
||||||
|
store.put(b"other:3", b"val3").await.unwrap();
|
||||||
|
|
||||||
|
let results = store.scan_prefix(b"prefix:").await.unwrap();
|
||||||
|
assert_eq!(results.len(), 2);
|
||||||
|
assert_eq!(results[0], (b"prefix:1".to_vec(), b"val1".to_vec()));
|
||||||
|
assert_eq!(results[1], (b"prefix:2".to_vec(), b"val2".to_vec()));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_redb_fetch_and_add() {
|
||||||
|
let store = RedbStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
let key = b"counter";
|
||||||
|
|
||||||
|
let val = store.fetch_and_add_u64(key, 5).await.unwrap();
|
||||||
|
assert_eq!(val, 5);
|
||||||
|
|
||||||
|
let val = store.fetch_and_add_u64(key, 3).await.unwrap();
|
||||||
|
assert_eq!(val, 8);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_redb_compare_and_swap_f32() {
|
||||||
|
let store = RedbStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
let key = b"weight";
|
||||||
|
|
||||||
|
let val = store.compare_and_swap_f32(key, |current| current + 1.5).await.unwrap();
|
||||||
|
assert!((val - 1.5).abs() < f32::EPSILON);
|
||||||
|
|
||||||
|
let val = store.compare_and_swap_f32(key, |current| current + 2.0).await.unwrap();
|
||||||
|
assert!((val - 3.5).abs() < f32::EPSILON);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_redb_flush() {
|
||||||
|
let store = RedbStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
store.put(b"key", b"value").await.unwrap();
|
||||||
|
store.flush().await.expect("Flush should succeed");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_redb_get_nonexistent_table() {
|
||||||
|
let store = RedbStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
// Get from empty database (table doesn't exist yet)
|
||||||
|
let result = store.get(b"missing").await.unwrap();
|
||||||
|
assert_eq!(result, None);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_redb_scan_prefix_empty_table() {
|
||||||
|
let store = RedbStore::open_temp().expect("Failed to create temp DB");
|
||||||
|
// Scan from empty database
|
||||||
|
let results = store.scan_prefix(b"prefix:").await.unwrap();
|
||||||
|
assert!(results.is_empty());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_prefix_successor() {
|
||||||
|
assert_eq!(prefix_successor(b"abc"), Some(b"abd".to_vec()));
|
||||||
|
assert_eq!(prefix_successor(b"ab\xff"), Some(b"ac".to_vec()));
|
||||||
|
assert_eq!(prefix_successor(b"\xff\xff\xff"), None);
|
||||||
|
assert_eq!(prefix_successor(b""), None);
|
||||||
|
assert_eq!(prefix_successor(b"a\xff\xff"), Some(b"b".to_vec()));
|
||||||
|
}
|
||||||
|
}
|
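Note on `test_prefix_successor`: the body of `prefix_successor` is not part of this hunk, so the sketch below is only one implementation consistent with the five assertions in the test. The name `prefix_successor_sketch` is hypothetical. It computes the exclusive upper bound for a prefix range scan: trailing `0xFF` bytes are dropped, then the last remaining byte is incremented.

```rust
/// Hypothetical reconstruction, not the committed implementation.
/// Returns the smallest byte string strictly greater than every key that
/// starts with `prefix`, or `None` when no such bound exists
/// (empty prefix, or a prefix made only of 0xFF bytes).
fn prefix_successor_sketch(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut successor = prefix.to_vec();

    // Trailing 0xFF bytes cannot be incremented without overflow, so drop them.
    while let Some(&last) = successor.last() {
        if last == 0xFF {
            successor.pop();
        } else {
            break;
        }
    }

    // Increment the last remaining byte; an empty buffer means "no upper bound".
    let last = successor.last_mut()?;
    *last += 1; // cannot overflow: `last` is not 0xFF here
    Some(successor)
}
```

A range scan would then cover `prefix..successor`, falling back to an unbounded upper end when the function returns `None`.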
||||||
@ -1,156 +0,0 @@
|
|||||||
use crate::error::{Result, StorageError};
|
|
||||||
use crate::traits::KVStore;
|
|
||||||
use async_trait::async_trait;
|
|
||||||
use sled::Db;
|
|
||||||
use std::path::Path;
|
|
||||||
|
|
||||||
/// Sled-based implementation of the KVStore trait.
|
|
||||||
#[derive(Debug, Clone)]
|
|
||||||
pub struct SledStore {
|
|
||||||
db: Db,
|
|
||||||
}
|
|
||||||
|
|
||||||
impl SledStore {
|
|
||||||
/// Open or create a new Sled database at the given path.
|
|
||||||
pub fn open(path: impl AsRef<Path>) -> Result<Self> {
|
|
||||||
let db = sled::open(path).map_err(StorageError::Sled)?;
|
|
||||||
Ok(Self { db })
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Open a temporary Sled database for testing.
|
|
||||||
///
|
|
||||||
/// The database will be automatically deleted when dropped.
|
|
||||||
/// Useful for unit tests in this and other crates.
|
|
||||||
pub fn open_temp() -> Result<Self> {
|
|
||||||
let config = sled::Config::new().temporary(true);
|
|
||||||
let db = config.open().map_err(StorageError::Sled)?;
|
|
||||||
Ok(Self { db })
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl KVStore for SledStore {
|
|
||||||
async fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> {
|
|
||||||
let result = self.db.get(key).map_err(StorageError::Sled)?;
|
|
||||||
Ok(result.map(|ivec| ivec.to_vec()))
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
|
|
||||||
self.db.insert(key, value).map_err(StorageError::Sled)?;
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn delete(&self, key: &[u8]) -> Result<()> {
|
|
||||||
self.db.remove(key).map_err(StorageError::Sled)?;
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
|
|
||||||
let iter = self.db.scan_prefix(prefix);
|
|
||||||
let mut results = Vec::new();
|
|
||||||
for item in iter {
|
|
||||||
let (k, v) = item.map_err(StorageError::Sled)?;
|
|
||||||
results.push((k.to_vec(), v.to_vec()));
|
|
||||||
}
|
|
||||||
Ok(results)
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn flush(&self) -> Result<()> {
|
|
||||||
self.db.flush_async().await.map_err(StorageError::Sled)?;
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn fetch_and_add_u64(&self, key: &[u8], delta: u64) -> Result<u64> {
|
|
||||||
let result = self
|
|
||||||
.db
|
|
||||||
.update_and_fetch(key, |old| {
|
|
||||||
let current = match old {
|
|
||||||
Some(bytes) => match <[u8; 8]>::try_from(bytes) {
|
|
||||||
Ok(arr) => u64::from_le_bytes(arr),
|
|
||||||
Err(_) => 0, // Corrupted data, start fresh
|
|
||||||
},
|
|
||||||
None => 0, // Key doesn't exist, start at 0
|
|
||||||
};
|
|
||||||
Some(current.saturating_add(delta).to_le_bytes().to_vec())
|
|
||||||
})
|
|
||||||
.map_err(StorageError::Sled)?;
|
|
||||||
|
|
||||||
// Result is Some because our update_fn always returns Some
|
|
||||||
let bytes = result.ok_or_else(|| {
|
|
||||||
StorageError::Serialization("fetch_and_add_u64 returned None unexpectedly".to_string())
|
|
||||||
})?;
|
|
||||||
let arr: [u8; 8] = bytes.as_ref().try_into().map_err(|_| {
|
|
||||||
StorageError::Serialization("fetch_and_add_u64 returned wrong size".to_string())
|
|
||||||
})?;
|
|
||||||
Ok(u64::from_le_bytes(arr))
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn compare_and_swap_f32<F>(&self, key: &[u8], update_fn: F) -> Result<f32>
|
|
||||||
where
|
|
||||||
F: Fn(f32) -> f32 + Send + Sync,
|
|
||||||
{
|
|
||||||
let result = self
|
|
||||||
.db
|
|
||||||
.update_and_fetch(key, |old| {
|
|
||||||
let current = match old {
|
|
||||||
Some(bytes) => match <[u8; 4]>::try_from(bytes) {
|
|
||||||
Ok(arr) => f32::from_le_bytes(arr),
|
|
||||||
Err(_) => 0.0, // Corrupted data, start fresh
|
|
||||||
},
|
|
||||||
None => 0.0, // Key doesn't exist, start at 0.0
|
|
||||||
};
|
|
||||||
let new_value = update_fn(current);
|
|
||||||
Some(new_value.to_le_bytes().to_vec())
|
|
||||||
})
|
|
||||||
.map_err(StorageError::Sled)?;
|
|
||||||
|
|
||||||
let bytes = result.ok_or_else(|| {
|
|
||||||
StorageError::Serialization(
|
|
||||||
"compare_and_swap_f32 returned None unexpectedly".to_string(),
|
|
||||||
)
|
|
||||||
})?;
|
|
||||||
let arr: [u8; 4] = bytes.as_ref().try_into().map_err(|_| {
|
|
||||||
StorageError::Serialization("compare_and_swap_f32 returned wrong size".to_string())
|
|
||||||
})?;
|
|
||||||
Ok(f32::from_le_bytes(arr))
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[cfg(test)]
|
|
||||||
mod tests {
|
|
||||||
use super::*;
|
|
||||||
|
|
||||||
#[tokio::test]
|
|
||||||
async fn test_sled_store_roundtrip() {
|
|
||||||
let store = SledStore::open_temp().expect("Failed to create temp DB");
|
|
||||||
let key = b"test_key";
|
|
||||||
let value = b"test_value";
|
|
||||||
|
|
||||||
// Put
|
|
||||||
store.put(key, value).await.expect("Put failed");
|
|
||||||
|
|
||||||
// Get
|
|
||||||
let retrieved = store.get(key).await.expect("Get failed");
|
|
||||||
assert_eq!(retrieved, Some(value.to_vec()));
|
|
||||||
|
|
||||||
// Delete
|
|
||||||
store.delete(key).await.expect("Delete failed");
|
|
||||||
|
|
||||||
// Get after delete
|
|
||||||
let deleted = store.get(key).await.expect("Get failed");
|
|
||||||
assert_eq!(deleted, None);
|
|
||||||
}
|
|
||||||
|
|
||||||
#[tokio::test]
|
|
||||||
async fn test_scan_prefix() {
|
|
||||||
let store = SledStore::open_temp().expect("Failed to create temp DB");
|
|
||||||
store.put(b"prefix:1", b"val1").await.unwrap();
|
|
||||||
store.put(b"prefix:2", b"val2").await.unwrap();
|
|
||||||
store.put(b"other:3", b"val3").await.unwrap();
|
|
||||||
|
|
||||||
let results = store.scan_prefix(b"prefix:").await.unwrap();
|
|
||||||
assert_eq!(results.len(), 2);
|
|
||||||
assert_eq!(results[0], (b"prefix:1".to_vec(), b"val1".to_vec()));
|
|
||||||
assert_eq!(results[1], (b"prefix:2".to_vec(), b"val2".to_vec()));
|
|
||||||
}
|
|
||||||
}
|
|
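Note on counter decoding: the removed `SledStore` treated a stored counter of the wrong length as zero ("Corrupted data, start fresh"), while the `RedbStore` code earlier in this diff returns a serialization error instead. The sketch below isolates that stricter behaviour as plain functions. `decode_counter` and `next_counter` are hypothetical names, and `String` stands in for `StorageError` so the example is self-contained.

```rust
/// Hypothetical helper: decodes a little-endian u64 counter.
/// A missing key counts as 0; a value that is not exactly 8 bytes is an error
/// rather than being silently reset.
fn decode_counter(raw: Option<&[u8]>) -> Result<u64, String> {
    let bytes = match raw {
        Some(bytes) => bytes,
        None => return Ok(0),
    };
    if bytes.len() != 8 {
        return Err(format!(
            "Corrupted u64 counter: expected 8 bytes, got {}",
            bytes.len()
        ));
    }
    let mut arr = [0u8; 8];
    arr.copy_from_slice(bytes);
    Ok(u64::from_le_bytes(arr))
}

/// Hypothetical helper: the value a fetch-and-add would store and return.
/// Uses saturating addition, as both store implementations in the diff do.
fn next_counter(raw: Option<&[u8]>, delta: u64) -> Result<u64, String> {
    Ok(decode_counter(raw)?.saturating_add(delta))
}
```

Surfacing the error means a truncated counter is reported rather than quietly restarting from zero.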
||||||
@ -20,17 +20,13 @@
|
|||||||
//! 5. Audit trail preserved: "Who fixed it? When? Why?"
|
//! 5. Audit trail preserved: "Who fixed it? When? Why?"
|
||||||
|
|
||||||
use crate::error::{Result, StorageError};
|
use crate::error::{Result, StorageError};
|
||||||
|
use crate::key_codec;
|
||||||
use crate::traits::KVStore;
|
use crate::traits::KVStore;
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use stemedb_core::serde::{deserialize, serialize};
|
use stemedb_core::serde::{deserialize, serialize};
|
||||||
use stemedb_core::types::{Hash, Supersession};
|
use stemedb_core::types::{Hash, Supersession};
|
||||||
use tracing::{debug, instrument};
|
use tracing::{debug, instrument};
|
||||||
|
|
||||||
/// Key prefix for supersession records.
|
|
||||||
const SUPERSESSION_PREFIX: &[u8] = b"SUP:";
|
|
||||||
/// Key prefix for agent supersession index.
|
|
||||||
const SUPERSESSION_INDEX_PREFIX: &[u8] = b"SUP:IDX:";
|
|
||||||
|
|
||||||
/// Specialized storage trait for supersession operations.
|
/// Specialized storage trait for supersession operations.
|
||||||
///
|
///
|
||||||
/// This trait provides supersession-specific operations on top of a generic KVStore,
|
/// This trait provides supersession-specific operations on top of a generic KVStore,
|
||||||
@ -95,30 +91,6 @@ impl<S: KVStore> GenericSupersessionStore<S> {
|
|||||||
pub fn new(store: S) -> Self {
|
pub fn new(store: S) -> Self {
|
||||||
Self { store }
|
Self { store }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Build the key for a supersession record.
|
|
||||||
fn supersession_key(target_hash: &Hash) -> Vec<u8> {
|
|
||||||
let mut key = SUPERSESSION_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(target_hash);
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Build the key for the agent supersession index.
|
|
||||||
fn index_key(agent_id: &[u8; 32], timestamp: u64) -> Vec<u8> {
|
|
||||||
let mut key = SUPERSESSION_INDEX_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(agent_id);
|
|
||||||
key.push(b':');
|
|
||||||
key.extend_from_slice(&timestamp.to_be_bytes());
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Build the prefix for scanning an agent's supersessions.
|
|
||||||
fn index_prefix(agent_id: &[u8; 32]) -> Vec<u8> {
|
|
||||||
let mut key = SUPERSESSION_INDEX_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(agent_id);
|
|
||||||
key.push(b':');
|
|
||||||
key
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
@ -131,11 +103,15 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
|
|||||||
})?;
|
})?;
|
||||||
|
|
||||||
// Store at primary key
|
// Store at primary key
|
||||||
let key = Self::supersession_key(&supersession.target_hash);
|
let key = key_codec::supersession_key(&hex::encode(supersession.target_hash));
|
||||||
self.store.put(&key, &bytes).await?;
|
self.store.put(&key, &bytes).await?;
|
||||||
|
|
||||||
// Store index entry (value is the target_hash for lookup)
|
// Store index entry (value is the target_hash for lookup)
|
||||||
let index_key = Self::index_key(&supersession.agent_id, supersession.timestamp);
|
let timestamp_bytes = supersession.timestamp.to_be_bytes();
|
||||||
|
let index_key = key_codec::supersession_index_key(
|
||||||
|
&hex::encode(supersession.agent_id),
|
||||||
|
&timestamp_bytes,
|
||||||
|
);
|
||||||
self.store.put(&index_key, &supersession.target_hash).await?;
|
self.store.put(&index_key, &supersession.target_hash).await?;
|
||||||
|
|
||||||
debug!(
|
debug!(
|
||||||
@ -149,7 +125,7 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
|
|||||||
|
|
||||||
#[instrument(skip(self))]
|
#[instrument(skip(self))]
|
||||||
async fn get_supersession(&self, target_hash: &Hash) -> Result<Option<Supersession>> {
|
async fn get_supersession(&self, target_hash: &Hash) -> Result<Option<Supersession>> {
|
||||||
let key = Self::supersession_key(target_hash);
|
let key = key_codec::supersession_key(&hex::encode(target_hash));
|
||||||
|
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(bytes) => {
|
Some(bytes) => {
|
||||||
@ -167,7 +143,7 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
|
|||||||
|
|
||||||
#[instrument(skip(self))]
|
#[instrument(skip(self))]
|
||||||
async fn is_superseded(&self, target_hash: &Hash) -> Result<bool> {
|
async fn is_superseded(&self, target_hash: &Hash) -> Result<bool> {
|
||||||
let key = Self::supersession_key(target_hash);
|
let key = key_codec::supersession_key(&hex::encode(target_hash));
|
||||||
Ok(self.store.get(&key).await?.is_some())
|
Ok(self.store.get(&key).await?.is_some())
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -179,7 +155,7 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
|
|||||||
to_timestamp: Option<u64>,
|
to_timestamp: Option<u64>,
|
||||||
limit: Option<usize>,
|
limit: Option<usize>,
|
||||||
) -> Result<Vec<Supersession>> {
|
) -> Result<Vec<Supersession>> {
|
||||||
let prefix = Self::index_prefix(agent_id);
|
let prefix = key_codec::supersession_index_prefix(&hex::encode(agent_id));
|
||||||
let entries = self.store.scan_prefix(&prefix).await?;
|
let entries = self.store.scan_prefix(&prefix).await?;
|
||||||
|
|
||||||
let to_ts = to_timestamp.unwrap_or(u64::MAX);
|
let to_ts = to_timestamp.unwrap_or(u64::MAX);
|
||||||
@ -188,38 +164,41 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
|
|||||||
let mut supersessions = Vec::new();
|
let mut supersessions = Vec::new();
|
||||||
|
|
||||||
for (key, target_hash_bytes) in entries {
|
for (key, target_hash_bytes) in entries {
|
||||||
// Extract timestamp from key (last 8 bytes after the prefix + agent_id + colon)
|
// Extract timestamp from key
|
||||||
// Key format: SUP:IDX:{agent_id}:{timestamp}
|
// Key format: \x00SUP:IDX:{agent_hex}:{timestamp_be_bytes}
|
||||||
let timestamp_start = SUPERSESSION_INDEX_PREFIX.len() + 32 + 1; // prefix + agent_id + ':'
|
// We need to find the last colon and extract the 8 bytes after it
|
||||||
if key.len() < timestamp_start + 8 {
|
if let Some(last_colon_pos) = key.iter().rposition(|&b| b == b':') {
|
||||||
continue; // Malformed key
|
let timestamp_start = last_colon_pos + 1;
|
||||||
}
|
if key.len() < timestamp_start + 8 {
|
||||||
|
continue; // Malformed key
|
||||||
|
}
|
||||||
|
|
||||||
let timestamp_bytes: [u8; 8] =
|
let timestamp_bytes: [u8; 8] =
|
||||||
key[timestamp_start..timestamp_start + 8].try_into().map_err(|_| {
|
key[timestamp_start..timestamp_start + 8].try_into().map_err(|_| {
|
||||||
StorageError::Serialization("Invalid timestamp in index key".to_string())
|
StorageError::Serialization("Invalid timestamp in index key".to_string())
|
||||||
|
})?;
|
||||||
|
let timestamp = u64::from_be_bytes(timestamp_bytes);
|
||||||
|
|
||||||
|
// Filter by time range
|
||||||
|
if timestamp < from_timestamp || timestamp > to_ts {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse target hash
|
||||||
|
if target_hash_bytes.len() != 32 {
|
||||||
|
continue; // Malformed value
|
||||||
|
}
|
||||||
|
let target_hash: Hash = target_hash_bytes.try_into().map_err(|_| {
|
||||||
|
StorageError::Serialization("Invalid target hash in index".to_string())
|
||||||
})?;
|
})?;
|
||||||
let timestamp = u64::from_be_bytes(timestamp_bytes);
|
|
||||||
|
|
||||||
// Filter by time range
|
// Fetch the actual supersession record
|
||||||
if timestamp < from_timestamp || timestamp > to_ts {
|
if let Some(supersession) = self.get_supersession(&target_hash).await? {
|
||||||
continue;
|
supersessions.push(supersession);
|
||||||
}
|
|
||||||
|
|
||||||
// Parse target hash
|
if supersessions.len() >= max_results {
|
||||||
if target_hash_bytes.len() != 32 {
|
break;
|
||||||
continue; // Malformed value
|
}
|
||||||
}
|
|
||||||
let target_hash: Hash = target_hash_bytes.try_into().map_err(|_| {
|
|
||||||
StorageError::Serialization("Invalid target hash in index".to_string())
|
|
||||||
})?;
|
|
||||||
|
|
||||||
// Fetch the actual supersession record
|
|
||||||
if let Some(supersession) = self.get_supersession(&target_hash).await? {
|
|
||||||
supersessions.push(supersession);
|
|
||||||
|
|
||||||
if supersessions.len() >= max_results {
|
|
||||||
break;
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
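Review note on the timestamp extraction in this hunk: the new code finds the timestamp by searching for the last `:` byte. The timestamp is stored as raw big-endian bytes, so one of those bytes can itself be `0x3A` (`:`). When that happens, the length check that follows fails and the entry is skipped as malformed. The sketch below reads the fixed-width suffix instead. It assumes the key layout given in the hunk's comment (`\x00SUP:IDX:{agent_hex}:{timestamp_be_bytes}`). `build_index_key` is a hypothetical stand-in for `key_codec::supersession_index_key`, whose real implementation is not shown here.

```rust
/// Hypothetical stand-in for the key builder; layout taken from the hunk's comment.
fn build_index_key(agent_hex: &str, timestamp: u64) -> Vec<u8> {
    let mut key = b"\x00SUP:IDX:".to_vec();
    key.extend_from_slice(agent_hex.as_bytes());
    key.push(b':');
    key.extend_from_slice(&timestamp.to_be_bytes());
    key
}

/// Hypothetical helper: reads the trailing 8 bytes of an index key as a
/// big-endian u64, without scanning for a separator.
fn timestamp_from_index_key(key: &[u8]) -> Option<u64> {
    // The timestamp is assumed to be the final 8 bytes of the key.
    let start = key.len().checked_sub(8)?;
    // The byte just before it must be the ':' separator.
    if start == 0 || key[start - 1] != b':' {
        return None;
    }
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&key[start..]);
    Some(u64::from_be_bytes(bytes))
}
```

With a timestamp of 58 (`0x3A`), the last-colon search lands on the final byte of the key, while the suffix-based read still returns 58.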
||||||
@ -234,13 +213,13 @@ impl<S: KVStore + Send + Sync> SupersessionStore for GenericSupersessionStore<S>
|
|||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::SledStore;
|
use crate::HybridStore;
|
||||||
use stemedb_core::types::SupersessionType;
|
use stemedb_core::types::SupersessionType;
|
||||||
use tempfile::tempdir;
|
use tempfile::tempdir;
|
||||||
|
|
||||||
async fn create_test_store() -> GenericSupersessionStore<SledStore> {
|
async fn create_test_store() -> GenericSupersessionStore<HybridStore> {
|
||||||
let dir = tempdir().expect("Failed to create temp dir");
|
let dir = tempdir().expect("Failed to create temp dir");
|
||||||
let store = SledStore::open(dir.path()).expect("Failed to open store");
|
let store = HybridStore::open(dir.path()).expect("Failed to open store");
|
||||||
GenericSupersessionStore::new(store)
|
GenericSupersessionStore::new(store)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -4,7 +4,7 @@ use std::sync::Arc;
|
|||||||
|
|
||||||
/// Abstract interface for Key-Value storage backends.
|
/// Abstract interface for Key-Value storage backends.
|
||||||
///
|
///
|
||||||
/// This trait allows us to swap the underlying storage engine (e.g., sled, RocksDB)
|
/// This trait allows us to swap the underlying storage engine (e.g., fjall, redb)
|
||||||
/// without changing the core logic of the database.
|
/// without changing the core logic of the database.
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
pub trait KVStore: Send + Sync {
|
pub trait KVStore: Send + Sync {
|
||||||
|
|||||||
@ -8,7 +8,7 @@
|
|||||||
//!
|
//!
|
||||||
//! | Key Pattern | Value | Purpose |
|
//! | Key Pattern | Value | Purpose |
|
||||||
//! |-------------|-------|---------|
|
//! |-------------|-------|---------|
|
||||||
//! | `TP:{pack_id}` | Serialized TrustPack | Pack definition and agent membership |
|
//! | `\x00TP:{pack_id}` | Serialized TrustPack | Pack definition and agent membership |
|
||||||
//!
|
//!
|
||||||
//! # Design Philosophy
|
//! # Design Philosophy
|
||||||
//!
|
//!
|
||||||
@ -21,14 +21,12 @@
|
|||||||
//! All operations are defensive against missing data (missing pack returns None).
|
//! All operations are defensive against missing data (missing pack returns None).
|
||||||
|
|
||||||
use crate::error::{Result, StorageError};
|
use crate::error::{Result, StorageError};
|
||||||
|
use crate::key_codec;
|
||||||
use crate::traits::KVStore;
|
use crate::traits::KVStore;
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use stemedb_core::types::{PackId, TrustPack};
|
use stemedb_core::types::{PackId, TrustPack};
|
||||||
use tracing::{debug, instrument};
|
use tracing::{debug, instrument};
|
||||||
|
|
||||||
/// Key prefix for TrustPack entries.
|
|
||||||
const TRUST_PACK_PREFIX: &[u8] = b"TP:";
|
|
||||||
|
|
||||||
/// Specialized storage trait for TrustPack operations.
|
/// Specialized storage trait for TrustPack operations.
|
||||||
///
|
///
|
||||||
/// This trait provides pack-specific operations on top of a generic KVStore,
|
/// This trait provides pack-specific operations on top of a generic KVStore,
|
||||||
@ -127,13 +125,6 @@ impl<S: KVStore> GenericTrustPackStore<S> {
|
|||||||
Self { store }
|
Self { store }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Construct the key for a TrustPack entry.
|
|
||||||
fn trust_pack_key(pack_id: &PackId) -> Vec<u8> {
|
|
||||||
let mut key = TRUST_PACK_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(pack_id);
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Serialize a TrustPack using the canonical serde helpers.
|
/// Serialize a TrustPack using the canonical serde helpers.
|
||||||
fn serialize_pack(pack: &TrustPack) -> Result<Vec<u8>> {
|
fn serialize_pack(pack: &TrustPack) -> Result<Vec<u8>> {
|
||||||
crate::serde_helpers::serialize(pack)
|
crate::serde_helpers::serialize(pack)
|
||||||
@ -150,7 +141,7 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
|
|||||||
#[instrument(skip(self, pack), fields(pack_id = %hex::encode(pack.id), pack_name = %pack.name, agent_count = pack.agent_count()))]
|
#[instrument(skip(self, pack), fields(pack_id = %hex::encode(pack.id), pack_name = %pack.name, agent_count = pack.agent_count()))]
|
||||||
async fn put_pack(&self, pack: &TrustPack) -> Result<PackId> {
|
async fn put_pack(&self, pack: &TrustPack) -> Result<PackId> {
|
||||||
let serialized = Self::serialize_pack(pack)?;
|
let serialized = Self::serialize_pack(pack)?;
|
||||||
let key = Self::trust_pack_key(&pack.id);
|
let key = key_codec::trust_pack_key(&pack.id);
|
||||||
self.store.put(&key, &serialized).await?;
|
self.store.put(&key, &serialized).await?;
|
||||||
|
|
||||||
debug!(
|
debug!(
|
||||||
@ -163,7 +154,7 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self), fields(pack_id = %hex::encode(pack_id)))]
|
#[instrument(skip(self), fields(pack_id = %hex::encode(pack_id)))]
|
||||||
async fn get_pack(&self, pack_id: &PackId) -> Result<Option<TrustPack>> {
|
async fn get_pack(&self, pack_id: &PackId) -> Result<Option<TrustPack>> {
|
||||||
let key = Self::trust_pack_key(pack_id);
|
let key = key_codec::trust_pack_key(pack_id);
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(data) => {
|
Some(data) => {
|
||||||
let pack = Self::deserialize_pack(&data)?;
|
let pack = Self::deserialize_pack(&data)?;
|
||||||
@ -192,7 +183,7 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
|
|||||||
let new_count = pack.agent_count();
|
let new_count = pack.agent_count();
|
||||||
|
|
||||||
let serialized = Self::serialize_pack(&pack)?;
|
let serialized = Self::serialize_pack(&pack)?;
|
||||||
let key = Self::trust_pack_key(pack_id);
|
let key = key_codec::trust_pack_key(pack_id);
|
||||||
self.store.put(&key, &serialized).await?;
|
self.store.put(&key, &serialized).await?;
|
||||||
|
|
||||||
debug!(old_count, new_count, added = new_count > old_count, "Updated pack membership");
|
debug!(old_count, new_count, added = new_count > old_count, "Updated pack membership");
|
||||||
@ -211,7 +202,7 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
|
|||||||
let new_count = pack.agent_count();
|
let new_count = pack.agent_count();
|
||||||
|
|
||||||
let serialized = Self::serialize_pack(&pack)?;
|
let serialized = Self::serialize_pack(&pack)?;
|
||||||
let key = Self::trust_pack_key(pack_id);
|
let key = key_codec::trust_pack_key(pack_id);
|
||||||
self.store.put(&key, &serialized).await?;
|
self.store.put(&key, &serialized).await?;
|
||||||
|
|
||||||
debug!(old_count, new_count, removed = new_count < old_count, "Updated pack membership");
|
debug!(old_count, new_count, removed = new_count < old_count, "Updated pack membership");
|
||||||
@ -236,16 +227,17 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
|
|||||||
|
|
||||||
#[instrument(skip(self))]
|
#[instrument(skip(self))]
|
||||||
async fn list_packs(&self) -> Result<Vec<PackId>> {
|
async fn list_packs(&self) -> Result<Vec<PackId>> {
|
||||||
let prefix = TRUST_PACK_PREFIX.to_vec();
|
let prefix = key_codec::trust_pack_scan_prefix();
|
||||||
let entries = self.store.scan_prefix(&prefix).await?;
|
let entries = self.store.scan_prefix(&prefix).await?;
|
||||||
|
|
||||||
let pack_ids: Vec<PackId> = entries
|
let pack_ids: Vec<PackId> = entries
|
||||||
.into_iter()
|
.into_iter()
|
||||||
.filter_map(|(key, _data)| {
|
.filter_map(|(key, _data)| {
|
||||||
// Extract pack_id from key: "TP:{pack_id}"
|
// Extract pack_id from key: "\x00TP:{pack_id}"
|
||||||
if key.len() == TRUST_PACK_PREFIX.len() + 32 {
|
// Key format: \x00 (1 byte) + "TP:" (3 bytes) + pack_id (32 bytes) = 36 bytes
|
||||||
|
if key.len() == 36 {
|
||||||
let mut pack_id = [0u8; 32];
|
let mut pack_id = [0u8; 32];
|
||||||
pack_id.copy_from_slice(&key[TRUST_PACK_PREFIX.len()..]);
|
pack_id.copy_from_slice(&key[4..]); // Skip \x00TP:
|
||||||
Some(pack_id)
|
Some(pack_id)
|
||||||
} else {
|
} else {
|
||||||
None
|
None
|
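Note on the `list_packs` key parsing: the hunk replaces `TRUST_PACK_PREFIX.len()` with the literals `36` and `4`. A sketch that keeps the layout in one place is shown below. It assumes the prefix is exactly `\x00TP:` as the hunk's comment states. `TRUST_PACK_KEY_PREFIX` and `pack_id_from_key` are hypothetical names, not part of `key_codec`.

```rust
/// Assumed key prefix, taken from the comment in the hunk: \x00 followed by "TP:".
const TRUST_PACK_KEY_PREFIX: &[u8] = b"\x00TP:";

/// Hypothetical helper: extracts the 32-byte pack id from a trust pack key.
/// Returns `None` for a wrong prefix or a wrong id length.
fn pack_id_from_key(key: &[u8]) -> Option<[u8; 32]> {
    // Verify the prefix rather than only the total length.
    let id_bytes = key.strip_prefix(TRUST_PACK_KEY_PREFIX)?;
    if id_bytes.len() != 32 {
        return None;
    }
    let mut pack_id = [0u8; 32];
    pack_id.copy_from_slice(id_bytes);
    Some(pack_id)
}
```

Checking the prefix as well as the length also rejects a 36-byte key that happens to come from a different namespace.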
||||||
@ -261,7 +253,8 @@ impl<S: KVStore + 'static> TrustPackStore for GenericTrustPackStore<S> {
|
|||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::SledStore;
|
use crate::HybridStore;
|
||||||
|
use std::sync::Arc;
|
||||||
|
|
||||||
fn create_test_pack(id: PackId, name: &str, maintainer: [u8; 32]) -> TrustPack {
|
fn create_test_pack(id: PackId, name: &str, maintainer: [u8; 32]) -> TrustPack {
|
||||||
TrustPack::new(id, name.to_string(), maintainer)
|
TrustPack::new(id, name.to_string(), maintainer)
|
||||||
@ -269,7 +262,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_put_and_get_pack() {
|
async fn test_put_and_get_pack() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let pack_store = GenericTrustPackStore::new(store);
|
let pack_store = GenericTrustPackStore::new(store);
|
||||||
|
|
||||||
let pack_id = [1u8; 32];
|
let pack_id = [1u8; 32];
|
||||||
@ -292,7 +285,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_add_agent_idempotent() {
|
async fn test_add_agent_idempotent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let pack_store = GenericTrustPackStore::new(store);
|
let pack_store = GenericTrustPackStore::new(store);
|
||||||
|
|
||||||
let pack_id = [2u8; 32];
|
let pack_id = [2u8; 32];
|
||||||
@ -315,7 +308,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_remove_agent() {
|
async fn test_remove_agent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let pack_store = GenericTrustPackStore::new(store);
|
let pack_store = GenericTrustPackStore::new(store);
|
||||||
|
|
||||||
let pack_id = [3u8; 32];
|
let pack_id = [3u8; 32];
|
||||||
@ -345,7 +338,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_contains_agent() {
|
async fn test_contains_agent() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let pack_store = GenericTrustPackStore::new(store);
|
let pack_store = GenericTrustPackStore::new(store);
|
||||||
|
|
||||||
let pack_id = [4u8; 32];
|
let pack_id = [4u8; 32];
|
||||||
@ -367,7 +360,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_list_packs() {
|
async fn test_list_packs() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let pack_store = GenericTrustPackStore::new(store);
|
let pack_store = GenericTrustPackStore::new(store);
|
||||||
|
|
||||||
// Initially empty
|
// Initially empty
|
||||||
@ -394,7 +387,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_missing_pack_returns_none() {
|
async fn test_missing_pack_returns_none() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let pack_store = GenericTrustPackStore::new(store);
|
let pack_store = GenericTrustPackStore::new(store);
|
||||||
|
|
||||||
let nonexistent_id = [99u8; 32];
|
let nonexistent_id = [99u8; 32];
|
||||||
@ -411,7 +404,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_add_to_missing_pack_errors() {
|
async fn test_add_to_missing_pack_errors() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let pack_store = GenericTrustPackStore::new(store);
|
let pack_store = GenericTrustPackStore::new(store);
|
||||||
|
|
||||||
let nonexistent_id = [98u8; 32];
|
let nonexistent_id = [98u8; 32];
|
||||||
@ -424,7 +417,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_remove_from_missing_pack_errors() {
|
async fn test_remove_from_missing_pack_errors() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let pack_store = GenericTrustPackStore::new(store);
|
let pack_store = GenericTrustPackStore::new(store);
|
||||||
|
|
||||||
let nonexistent_id = [97u8; 32];
|
let nonexistent_id = [97u8; 32];
|
||||||
@ -446,9 +439,9 @@ mod tests {
|
|||||||
original.add_agent([3u8; 32]);
|
original.add_agent([3u8; 32]);
|
||||||
|
|
||||||
let serialized =
|
let serialized =
|
||||||
GenericTrustPackStore::<SledStore>::serialize_pack(&original).expect("serialize");
|
GenericTrustPackStore::<HybridStore>::serialize_pack(&original).expect("serialize");
|
||||||
let deserialized =
|
let deserialized = GenericTrustPackStore::<HybridStore>::deserialize_pack(&serialized)
|
||||||
GenericTrustPackStore::<SledStore>::deserialize_pack(&serialized).expect("deserialize");
|
.expect("deserialize");
|
||||||
|
|
||||||
assert_eq!(original, deserialized);
|
assert_eq!(original, deserialized);
|
||||||
assert_eq!(deserialized.agent_count(), 3);
|
assert_eq!(deserialized.agent_count(), 3);
|
||||||
@ -456,7 +449,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_multiple_agents() {
|
async fn test_multiple_agents() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let pack_store = GenericTrustPackStore::new(store);
|
let pack_store = GenericTrustPackStore::new(store);
|
||||||
|
|
||||||
let pack_id = [5u8; 32];
|
let pack_id = [5u8; 32];
|
||||||
|
|||||||
@ -1,8 +1,9 @@
|
|||||||
//! Basic tests for TrustRank model and CRUD operations.
|
//! Basic tests for TrustRank model and CRUD operations.
|
||||||
|
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::SledStore;
|
use crate::HybridStore;
|
||||||
use model::{DEFAULT_HALF_LIFE_SECONDS, DEFAULT_TRUST_SCORE};
|
use model::{DEFAULT_HALF_LIFE_SECONDS, DEFAULT_TRUST_SCORE};
|
||||||
|
use std::sync::Arc;
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_trust_rank_new() {
|
fn test_trust_rank_new() {
|
||||||
@ -88,7 +89,7 @@ fn test_decay_no_time_elapsed() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_get_default_trust_rank() {
|
async fn test_get_default_trust_rank() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [1u8; 32];
|
let agent_id = [1u8; 32];
|
||||||
@ -100,7 +101,7 @@ async fn test_get_default_trust_rank() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_put_and_get_trust_rank() {
|
async fn test_put_and_get_trust_rank() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [2u8; 32];
|
let agent_id = [2u8; 32];
|
||||||
@ -117,7 +118,7 @@ async fn test_put_and_get_trust_rank() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_update_trust_rank() {
|
async fn test_update_trust_rank() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [3u8; 32];
|
let agent_id = [3u8; 32];
|
||||||
@ -133,7 +134,7 @@ async fn test_update_trust_rank() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_record_outcome_updates_score() {
|
async fn test_record_outcome_updates_score() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [4u8; 32];
|
let agent_id = [4u8; 32];
|
||||||
@ -155,7 +156,7 @@ async fn test_record_outcome_updates_score() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_decay_trust_ranks() {
|
async fn test_decay_trust_ranks() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
// Create several agents with different scores
|
// Create several agents with different scores
|
||||||
@ -184,7 +185,7 @@ async fn test_decay_trust_ranks() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_decay_no_change_skips_update() {
|
async fn test_decay_no_change_skips_update() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [5u8; 32];
|
let agent_id = [5u8; 32];
|
||||||
@ -208,8 +209,8 @@ async fn test_serialization_roundtrip() {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let serialized =
|
let serialized =
|
||||||
GenericTrustRankStore::<SledStore>::serialize_trust_rank(&original).expect("serialize");
|
GenericTrustRankStore::<HybridStore>::serialize_trust_rank(&original).expect("serialize");
|
||||||
let deserialized = GenericTrustRankStore::<SledStore>::deserialize_trust_rank(&serialized)
|
let deserialized = GenericTrustRankStore::<HybridStore>::deserialize_trust_rank(&serialized)
|
||||||
.expect("deserialize");
|
.expect("deserialize");
|
||||||
|
|
||||||
assert_eq!(original, deserialized);
|
assert_eq!(original, deserialized);
|
||||||
|
|||||||
@ -1,12 +1,13 @@
|
|||||||
//! Tests for gold standard verification and advanced TrustRank features.
|
//! Tests for gold standard verification and advanced TrustRank features.
|
||||||
|
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::SledStore;
|
use crate::HybridStore;
|
||||||
|
use std::sync::Arc;
|
||||||
use stemedb_core::types::GoldStandard;
|
use stemedb_core::types::GoldStandard;
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_custom_half_life() {
|
async fn test_custom_half_life() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [6u8; 32];
|
let agent_id = [6u8; 32];
|
||||||
@ -33,7 +34,7 @@ async fn test_custom_half_life() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_verify_correct_answer() {
|
async fn test_verify_correct_answer() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [7u8; 32];
|
let agent_id = [7u8; 32];
|
||||||
@ -65,7 +66,7 @@ async fn test_verify_correct_answer() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_verify_incorrect_answer() {
|
async fn test_verify_incorrect_answer() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [8u8; 32];
|
let agent_id = [8u8; 32];
|
||||||
@ -97,7 +98,7 @@ async fn test_verify_incorrect_answer() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_verify_multiple_gold_standards() {
|
async fn test_verify_multiple_gold_standards() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [9u8; 32];
|
let agent_id = [9u8; 32];
|
||||||
@ -142,7 +143,7 @@ async fn test_verify_multiple_gold_standards() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_verify_score_clamping() {
|
async fn test_verify_score_clamping() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [10u8; 32];
|
let agent_id = [10u8; 32];
|
||||||
@ -173,7 +174,7 @@ async fn test_verify_score_clamping() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_first_gold_standard_verification_succeeds_with_reward() {
|
async fn test_first_gold_standard_verification_succeeds_with_reward() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [11u8; 32];
|
let agent_id = [11u8; 32];
|
||||||
@ -198,7 +199,7 @@ async fn test_first_gold_standard_verification_succeeds_with_reward() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_second_gold_standard_verification_returns_already_verified() {
|
async fn test_second_gold_standard_verification_returns_already_verified() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [12u8; 32];
|
let agent_id = [12u8; 32];
|
||||||
@ -232,7 +233,7 @@ async fn test_second_gold_standard_verification_returns_already_verified() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_different_gold_standard_verification_works_after_first() {
|
async fn test_different_gold_standard_verification_works_after_first() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [13u8; 32];
|
let agent_id = [13u8; 32];
|
||||||
@ -277,7 +278,7 @@ async fn test_different_gold_standard_verification_works_after_first() {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_incorrect_answer_penalizes_and_marks_verified() {
|
async fn test_incorrect_answer_penalizes_and_marks_verified() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let trust_store = GenericTrustRankStore::new(store);
|
let trust_store = GenericTrustRankStore::new(store);
|
||||||
|
|
||||||
let agent_id = [14u8; 32];
|
let agent_id = [14u8; 32];
|
||||||
|
|||||||
@ -4,6 +4,7 @@
|
|||||||
//! including CRUD operations, decay mechanics, and learning loop integration.
|
//! including CRUD operations, decay mechanics, and learning loop integration.
|
||||||
|
|
||||||
use crate::error::Result;
|
use crate::error::Result;
|
||||||
|
use crate::key_codec;
|
||||||
use crate::traits::KVStore;
|
use crate::traits::KVStore;
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use tracing::{debug, instrument};
|
use tracing::{debug, instrument};
|
||||||
@ -11,16 +12,9 @@ use tracing::{debug, instrument};
|
|||||||
use super::model::{TrustRank, DEFAULT_HALF_LIFE_SECONDS};
|
use super::model::{TrustRank, DEFAULT_HALF_LIFE_SECONDS};
|
||||||
use super::TrustRankStore;
|
use super::TrustRankStore;
|
||||||
|
|
||||||
/// Key prefix for TrustRank entries.
|
|
||||||
const TRUST_RANK_PREFIX: &[u8] = b"TR:";
|
|
||||||
|
|
||||||
/// Key prefix for gold standard verification markers.
|
|
||||||
/// Format: GS_VERIFIED:{agent_id_hex}:{subject}:{predicate}
|
|
||||||
const GS_VERIFIED_PREFIX: &[u8] = b"GS_VERIFIED:";
|
|
||||||
|
|
||||||
/// TrustRankStore implementation backed by a generic KVStore.
|
/// TrustRankStore implementation backed by a generic KVStore.
|
||||||
///
|
///
|
||||||
/// This implementation stores TrustRank data at `TR:{agent_id}` and provides
|
/// This implementation stores TrustRank data at `\x00TRUST:{agent_id_hex}` and provides
|
||||||
/// all operations for reputation management.
|
/// all operations for reputation management.
|
||||||
pub struct GenericTrustRankStore<S> {
|
pub struct GenericTrustRankStore<S> {
|
||||||
store: S,
|
store: S,
|
||||||
@ -32,25 +26,6 @@ impl<S: KVStore> GenericTrustRankStore<S> {
|
|||||||
Self { store }
|
Self { store }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Construct the key for a TrustRank entry.
|
|
||||||
pub(crate) fn trust_rank_key(agent_id: &[u8; 32]) -> Vec<u8> {
|
|
||||||
let mut key = TRUST_RANK_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(agent_id);
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Construct the key for a gold standard verification marker.
|
|
||||||
/// Format: GS_VERIFIED:{agent_id_hex}:{subject}:{predicate}
|
|
||||||
pub(crate) fn gs_verified_key(agent_id: &[u8; 32], subject: &str, predicate: &str) -> Vec<u8> {
|
|
||||||
let mut key = GS_VERIFIED_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(hex::encode(agent_id).as_bytes());
|
|
||||||
key.push(b':');
|
|
||||||
key.extend_from_slice(subject.as_bytes());
|
|
||||||
key.push(b':');
|
|
||||||
key.extend_from_slice(predicate.as_bytes());
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Serialize a TrustRank using the canonical serde helpers.
|
/// Serialize a TrustRank using the canonical serde helpers.
|
||||||
pub(crate) fn serialize_trust_rank(trust_rank: &TrustRank) -> Result<Vec<u8>> {
|
pub(crate) fn serialize_trust_rank(trust_rank: &TrustRank) -> Result<Vec<u8>> {
|
||||||
crate::serde_helpers::serialize(trust_rank)
|
crate::serde_helpers::serialize(trust_rank)
|
||||||
@ -66,7 +41,7 @@ impl<S: KVStore> GenericTrustRankStore<S> {
|
|||||||
impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
|
impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
|
||||||
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))]
|
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))]
|
||||||
async fn get_trust_rank(&self, agent_id: &[u8; 32]) -> Result<TrustRank> {
|
async fn get_trust_rank(&self, agent_id: &[u8; 32]) -> Result<TrustRank> {
|
||||||
let key = Self::trust_rank_key(agent_id);
|
let key = key_codec::trust_rank_key(&hex::encode(agent_id));
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(data) => {
|
Some(data) => {
|
||||||
let trust_rank = Self::deserialize_trust_rank(&data)?;
|
let trust_rank = Self::deserialize_trust_rank(&data)?;
|
||||||
@ -97,7 +72,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
|
|||||||
let new_score = trust_rank.adjust_score(delta, timestamp);
|
let new_score = trust_rank.adjust_score(delta, timestamp);
|
||||||
|
|
||||||
let serialized = Self::serialize_trust_rank(&trust_rank)?;
|
let serialized = Self::serialize_trust_rank(&trust_rank)?;
|
||||||
let key = Self::trust_rank_key(agent_id);
|
let key = key_codec::trust_rank_key(&hex::encode(agent_id));
|
||||||
self.store.put(&key, &serialized).await?;
|
self.store.put(&key, &serialized).await?;
|
||||||
|
|
||||||
debug!(new_score, "Updated TrustRank");
|
debug!(new_score, "Updated TrustRank");
|
||||||
@ -111,7 +86,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
|
|||||||
half_life_seconds: Option<u64>,
|
half_life_seconds: Option<u64>,
|
||||||
) -> Result<usize> {
|
) -> Result<usize> {
|
||||||
let half_life = half_life_seconds.unwrap_or(DEFAULT_HALF_LIFE_SECONDS);
|
let half_life = half_life_seconds.unwrap_or(DEFAULT_HALF_LIFE_SECONDS);
|
||||||
let prefix = TRUST_RANK_PREFIX.to_vec();
|
let prefix = key_codec::trust_rank_scan_prefix();
|
||||||
let entries = self.store.scan_prefix(&prefix).await?;
|
let entries = self.store.scan_prefix(&prefix).await?;
|
||||||
|
|
||||||
let mut decayed_count = 0;
|
let mut decayed_count = 0;
|
||||||
@ -158,7 +133,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
|
|||||||
let new_score = trust_rank.adjust_score(delta, timestamp);
|
let new_score = trust_rank.adjust_score(delta, timestamp);
|
||||||
|
|
||||||
let serialized = Self::serialize_trust_rank(&trust_rank)?;
|
let serialized = Self::serialize_trust_rank(&trust_rank)?;
|
||||||
let key = Self::trust_rank_key(agent_id);
|
let key = key_codec::trust_rank_key(&hex::encode(agent_id));
|
||||||
self.store.put(&key, &serialized).await?;
|
self.store.put(&key, &serialized).await?;
|
||||||
|
|
||||||
debug!(
|
debug!(
|
||||||
@ -172,7 +147,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
|
|||||||
#[instrument(skip(self, trust_rank), fields(agent_id = %hex::encode(trust_rank.agent_id)))]
|
#[instrument(skip(self, trust_rank), fields(agent_id = %hex::encode(trust_rank.agent_id)))]
|
||||||
async fn put_trust_rank(&self, trust_rank: &TrustRank) -> Result<()> {
|
async fn put_trust_rank(&self, trust_rank: &TrustRank) -> Result<()> {
|
||||||
let serialized = Self::serialize_trust_rank(trust_rank)?;
|
let serialized = Self::serialize_trust_rank(trust_rank)?;
|
||||||
let key = Self::trust_rank_key(&trust_rank.agent_id);
|
let key = key_codec::trust_rank_key(&hex::encode(trust_rank.agent_id));
|
||||||
self.store.put(&key, &serialized).await?;
|
self.store.put(&key, &serialized).await?;
|
||||||
debug!(score = trust_rank.score, "Stored TrustRank");
|
debug!(score = trust_rank.score, "Stored TrustRank");
|
||||||
Ok(())
|
Ok(())
|
||||||
@ -194,8 +169,11 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
|
|||||||
use super::model::TrustAdjustment;
|
use super::model::TrustAdjustment;
|
||||||
|
|
||||||
// Check if the agent has already verified this gold standard
|
// Check if the agent has already verified this gold standard
|
||||||
let verified_key =
|
let verified_key = key_codec::gs_verified_key(
|
||||||
Self::gs_verified_key(agent_id, &gold_standard.subject, &gold_standard.predicate);
|
&hex::encode(agent_id),
|
||||||
|
&gold_standard.subject,
|
||||||
|
&gold_standard.predicate,
|
||||||
|
);
|
||||||
if self.store.get(&verified_key).await?.is_some() {
|
if self.store.get(&verified_key).await?.is_some() {
|
||||||
debug!("Agent has already verified this gold standard");
|
debug!("Agent has already verified this gold standard");
|
||||||
return Ok(TrustAdjustment::AlreadyVerified);
|
return Ok(TrustAdjustment::AlreadyVerified);
|
||||||
@ -224,7 +202,7 @@ impl<S: KVStore + 'static> TrustRankStore for GenericTrustRankStore<S> {
|
|||||||
|
|
||||||
// Store the updated trust rank
|
// Store the updated trust rank
|
||||||
let serialized = Self::serialize_trust_rank(&trust_rank)?;
|
let serialized = Self::serialize_trust_rank(&trust_rank)?;
|
||||||
let key = Self::trust_rank_key(agent_id);
|
let key = key_codec::trust_rank_key(&hex::encode(agent_id));
|
||||||
self.store.put(&key, &serialized).await?;
|
self.store.put(&key, &serialized).await?;
|
||||||
|
|
||||||
// Mark this gold standard as verified by this agent (value is just a timestamp)
|
// Mark this gold standard as verified by this agent (value is just a timestamp)
|
||||||
|
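Reviewer note: the hunks above replace the local `trust_rank_key` and `TRUST_RANK_PREFIX` helpers with calls into `key_codec`, and the updated doc comment gives the new layout as `\x00TRUST:{agent_id_hex}`. The body of `key_codec` is not part of this hunk, so the following is only a minimal sketch of what such helpers might look like, assuming the prefix is exactly the bytes `\x00TRUST:` and that the agent ID is lowercase hex. `TRUST_PREFIX` and `to_hex` are illustrative names (the latter stands in for `hex::encode`), not identifiers from the commit.

```rust
/// Assumed prefix, taken from the doc comment `\x00TRUST:{agent_id_hex}`.
/// The real constant in `key_codec` may differ.
const TRUST_PREFIX: &[u8] = b"\x00TRUST:";

/// Lowercase hex encoding; a dependency-free stand-in for `hex::encode`.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        out.push_str(&format!("{:02x}", b));
    }
    out
}

/// Sketch of a trust rank key: prefix followed by the hex-encoded agent ID.
fn trust_rank_key(agent_id_hex: &str) -> Vec<u8> {
    let mut key = TRUST_PREFIX.to_vec();
    key.extend_from_slice(agent_id_hex.as_bytes());
    key
}

/// Sketch of the scan prefix that `decay_trust_ranks` would pass to `scan_prefix`.
fn trust_rank_scan_prefix() -> Vec<u8> {
    TRUST_PREFIX.to_vec()
}
```

Under these assumptions every trust rank key starts with the scan prefix, which is the property `decay_trust_ranks` relies on when it iterates over all entries. Note that the old layout appended the raw 32-byte agent ID, while the new call sites pass `hex::encode(agent_id)`, so keys written under the old scheme would not be found by the new one.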
|||||||
@ -7,9 +7,9 @@
|
|||||||
//!
|
//!
|
||||||
//! | Key Pattern | Value | Purpose |
|
//! | Key Pattern | Value | Purpose |
|
||||||
//! |-------------|-------|---------|
|
//! |-------------|-------|---------|
|
||||||
//! | `V:{assertion_hash}:{vote_hash}` | Serialized Vote | Individual votes |
|
//! | `{subject}\x00V:{assertion_hex}:{vote_hex}` | Serialized Vote | Individual votes |
|
||||||
//! | `VC:{assertion_hash}` | u64 (LE) | Vote count cache |
|
//! | `{subject}\x00VC:{assertion_hex}` | u64 (LE) | Vote count cache |
|
||||||
//! | `VW:{assertion_hash}` | f32 (LE) | Aggregate weight cache |
|
//! | `{subject}\x00VW:{assertion_hex}` | f32 (LE) | Aggregate weight cache |
|
||||||
//!
|
//!
|
||||||
//! # Design Philosophy
|
//! # Design Philosophy
|
||||||
//!
|
//!
|
||||||
@ -29,21 +29,20 @@ use crate::error::Result;
|
|||||||
|
|
||||||
pub use store_impl::GenericVoteStore;
|
pub use store_impl::GenericVoteStore;
|
||||||
|
|
||||||
/// Key prefix for individual votes.
|
|
||||||
#[allow(dead_code)] // Documented for reference; actual key construction uses format!()
|
|
||||||
const VOTE_PREFIX: &[u8] = b"V:";
|
|
||||||
|
|
||||||
/// Specialized storage trait for high-velocity vote operations.
|
/// Specialized storage trait for high-velocity vote operations.
|
||||||
///
|
///
|
||||||
/// This trait provides vote-specific operations on top of a generic KVStore,
|
/// This trait provides vote-specific operations on top of a generic KVStore,
|
||||||
/// enabling efficient vote ingestion and aggregation for the Ballot Box pattern.
|
/// enabling efficient vote ingestion and aggregation for the Ballot Box pattern.
|
||||||
///
|
///
|
||||||
|
/// All methods require a `subject` parameter to co-locate vote data with the
|
||||||
|
/// assertion's subject for range sharding.
|
||||||
|
///
|
||||||
/// # Example
|
/// # Example
|
||||||
///
|
///
|
||||||
/// ```ignore
|
/// ```ignore
|
||||||
/// let vote_store = SledVoteStore::new(kv_store);
|
/// let vote_store = GenericVoteStore::new(kv_store);
|
||||||
/// let vote_hash = vote_store.put_vote(&vote).await?;
|
/// let vote_hash = vote_store.put_vote(&vote, "Tesla").await?;
|
||||||
/// let votes = vote_store.get_votes_for_assertion(&assertion_hash).await?;
|
/// let votes = vote_store.get_votes_for_assertion(&assertion_hash, "Tesla").await?;
|
||||||
/// ```
|
/// ```
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
pub trait VoteStore: Send + Sync {
|
pub trait VoteStore: Send + Sync {
|
||||||
@ -52,26 +51,32 @@ pub trait VoteStore: Send + Sync {
|
|||||||
/// This operation:
|
/// This operation:
|
||||||
/// 1. Serializes the vote using rkyv
|
/// 1. Serializes the vote using rkyv
|
||||||
/// 2. Computes BLAKE3 hash for content addressing
|
/// 2. Computes BLAKE3 hash for content addressing
|
||||||
/// 3. Stores at `V:{assertion_hash}:{vote_hash}`
|
/// 3. Stores at `{subject}\x00V:{assertion_hex}:{vote_hex}`
|
||||||
/// 4. Updates vote count and aggregate weight caches
|
/// 4. Updates vote count and aggregate weight caches
|
||||||
///
|
///
|
||||||
/// # Returns
|
/// # Returns
|
||||||
/// The BLAKE3 hash of the serialized vote (content address).
|
/// The BLAKE3 hash of the serialized vote (content address).
|
||||||
async fn put_vote(&self, vote: &Vote) -> Result<Hash>;
|
async fn put_vote(&self, vote: &Vote, subject: &str) -> Result<Hash>;
|
||||||
|
|
||||||
/// Get a specific vote by its hash.
|
/// Get a specific vote by its hash.
|
||||||
///
|
///
|
||||||
/// # Arguments
|
/// # Arguments
|
||||||
/// * `assertion_hash` - The assertion this vote is for
|
/// * `assertion_hash` - The assertion this vote is for
|
||||||
/// * `vote_hash` - The content-addressed hash of the vote
|
/// * `vote_hash` - The content-addressed hash of the vote
|
||||||
|
/// * `subject` - The subject the assertion belongs to
|
||||||
///
|
///
|
||||||
/// # Returns
|
/// # Returns
|
||||||
/// The vote if found, None otherwise.
|
/// The vote if found, None otherwise.
|
||||||
async fn get_vote(&self, assertion_hash: &Hash, vote_hash: &Hash) -> Result<Option<Vote>>;
|
async fn get_vote(
|
||||||
|
&self,
|
||||||
|
assertion_hash: &Hash,
|
||||||
|
vote_hash: &Hash,
|
||||||
|
subject: &str,
|
||||||
|
) -> Result<Option<Vote>>;
|
||||||
|
|
||||||
/// Get all votes for a specific assertion.
|
/// Get all votes for a specific assertion.
|
||||||
///
|
///
|
||||||
/// Scans all keys with prefix `V:{assertion_hash}:` and deserializes.
|
/// Scans all keys with prefix `{subject}\x00V:{assertion_hex}:` and deserializes.
|
||||||
///
|
///
|
||||||
/// # Performance
|
/// # Performance
|
||||||
/// O(n) where n is the number of votes. For high-cardinality assertions,
|
/// O(n) where n is the number of votes. For high-cardinality assertions,
|
||||||
@ -79,37 +84,43 @@ pub trait VoteStore: Send + Sync {
|
|||||||
///
|
///
|
||||||
/// # Returns
|
/// # Returns
|
||||||
/// Vector of votes, empty if no votes exist.
|
/// Vector of votes, empty if no votes exist.
|
||||||
async fn get_votes_for_assertion(&self, assertion_hash: &Hash) -> Result<Vec<Vote>>;
|
async fn get_votes_for_assertion(
|
||||||
|
&self,
|
||||||
|
assertion_hash: &Hash,
|
||||||
|
subject: &str,
|
||||||
|
) -> Result<Vec<Vote>>;
|
||||||
|
|
||||||
/// Get the number of votes for an assertion.
|
/// Get the number of votes for an assertion.
|
||||||
///
|
///
|
||||||
/// Uses cached counter at `VC:{assertion_hash}` for O(1) performance.
|
/// Uses cached counter at `{subject}\x00VC:{assertion_hex}` for O(1) performance.
|
||||||
///
|
///
|
||||||
/// # Returns
|
/// # Returns
|
||||||
/// Vote count, 0 if no votes exist.
|
/// Vote count, 0 if no votes exist.
|
||||||
async fn get_vote_count(&self, assertion_hash: &Hash) -> Result<u64>;
|
async fn get_vote_count(&self, assertion_hash: &Hash, subject: &str) -> Result<u64>;
|
||||||
|
|
||||||
/// Get the aggregate weight (sum of all vote weights) for an assertion.
|
/// Get the aggregate weight (sum of all vote weights) for an assertion.
|
||||||
///
|
///
|
||||||
/// Uses cached value at `VW:{assertion_hash}` for O(1) performance.
|
/// Uses cached value at `{subject}\x00VW:{assertion_hex}` for O(1) performance.
|
||||||
/// The weight is the sum of all `vote.weight` values.
|
/// The weight is the sum of all `vote.weight` values.
|
||||||
///
|
///
|
||||||
/// # Returns
|
/// # Returns
|
||||||
/// Aggregate weight, 0.0 if no votes exist.
|
/// Aggregate weight, 0.0 if no votes exist.
|
||||||
async fn get_aggregate_weight(&self, assertion_hash: &Hash) -> Result<f32>;
|
async fn get_aggregate_weight(&self, assertion_hash: &Hash, subject: &str) -> Result<f32>;
|
||||||
|
|
||||||
/// Check if any votes exist for an assertion.
|
/// Check if any votes exist for an assertion.
|
||||||
///
|
///
|
||||||
/// More efficient than `get_vote_count() > 0` as it can short-circuit.
|
/// More efficient than `get_vote_count() > 0` as it can short-circuit.
|
||||||
async fn has_votes(&self, assertion_hash: &Hash) -> Result<bool>;
|
async fn has_votes(&self, assertion_hash: &Hash, subject: &str) -> Result<bool>;
|
||||||
}
|
}
|
||||||
|
|
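Reviewer note: the trait docs above describe subject-prefixed keys (`{subject}\x00V:{assertion_hex}:{vote_hex}`, `{subject}\x00VC:{assertion_hex}`) and a little-endian `u64` count cache. The actual construction lives in `key_codec` and `store_impl`, which this hunk does not show, so the sketch below only illustrates the documented layout. All function names in it are hypothetical.

```rust
use std::convert::TryFrom;

/// Builds `{subject}\x00{tag}{assertion_hex}` (hypothetical shared helper).
fn keyed(subject: &str, tag: &str, assertion_hex: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(subject.len() + 1 + tag.len() + assertion_hex.len());
    key.extend_from_slice(subject.as_bytes());
    key.push(0x00); // separator between the subject and the record tag
    key.extend_from_slice(tag.as_bytes());
    key.extend_from_slice(assertion_hex.as_bytes());
    key
}

/// Scan prefix `{subject}\x00V:{assertion_hex}:` covering all votes for one assertion.
fn vote_scan_prefix(subject: &str, assertion_hex: &str) -> Vec<u8> {
    let mut key = keyed(subject, "V:", assertion_hex);
    key.push(b':');
    key
}

/// Individual vote key `{subject}\x00V:{assertion_hex}:{vote_hex}`.
fn vote_key(subject: &str, assertion_hex: &str, vote_hex: &str) -> Vec<u8> {
    let mut key = vote_scan_prefix(subject, assertion_hex);
    key.extend_from_slice(vote_hex.as_bytes());
    key
}

/// Count cache key `{subject}\x00VC:{assertion_hex}`.
fn vote_count_key(subject: &str, assertion_hex: &str) -> Vec<u8> {
    keyed(subject, "VC:", assertion_hex)
}

/// Encodes the cached count as a little-endian u64, per the key table.
fn encode_count(count: u64) -> [u8; 8] {
    count.to_le_bytes()
}

/// Decodes a cached count; returns None if the value is not exactly 8 bytes.
fn decode_count(bytes: &[u8]) -> Option<u64> {
    <[u8; 8]>::try_from(bytes).ok().map(u64::from_le_bytes)
}
```

With this layout, a prefix scan for one assertion's votes does not pick up the `VC:` cache entry, because the byte after `V` is `C` in the cache key and `:` in the scan prefix. Keys for the same subject also sort next to each other, which is the co-location the trait docs mention for range sharding.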
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::SledStore;
|
use crate::HybridStore;
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
|
|
||||||
|
const TEST_SUBJECT: &str = "TestSubject";
|
||||||
|
|
||||||
fn create_test_vote(assertion_hash: Hash, weight: f32) -> Vote {
|
fn create_test_vote(assertion_hash: Hash, weight: f32) -> Vote {
|
||||||
Vote {
|
Vote {
|
||||||
assertion_hash,
|
assertion_hash,
|
||||||
@ -124,18 +135,20 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_put_and_get_vote() {
|
async fn test_put_and_get_vote() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = GenericVoteStore::new(store);
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
let assertion_hash = [0u8; 32];
|
let assertion_hash = [0u8; 32];
|
||||||
let vote = create_test_vote(assertion_hash, 0.8);
|
let vote = create_test_vote(assertion_hash, 0.8);
|
||||||
|
|
||||||
// Put vote
|
// Put vote
|
||||||
let vote_hash = vote_store.put_vote(&vote).await.expect("Failed to put vote");
|
let vote_hash = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("Failed to put vote");
|
||||||
|
|
||||||
// Get vote back
|
// Get vote back
|
||||||
let retrieved =
|
let retrieved = vote_store
|
||||||
vote_store.get_vote(&assertion_hash, &vote_hash).await.expect("Failed to get vote");
|
.get_vote(&assertion_hash, &vote_hash, TEST_SUBJECT)
|
||||||
|
.await
|
||||||
|
.expect("Failed to get vote");
|
||||||
|
|
||||||
assert!(retrieved.is_some());
|
assert!(retrieved.is_some());
|
||||||
let retrieved_vote = retrieved.expect("Vote should exist");
|
let retrieved_vote = retrieved.expect("Vote should exist");
|
||||||
@ -145,7 +158,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_get_votes_for_assertion() {
|
async fn test_get_votes_for_assertion() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = GenericVoteStore::new(store);
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
let assertion_hash = [1u8; 32];
|
let assertion_hash = [1u8; 32];
|
||||||
@ -159,51 +172,55 @@ mod tests {
|
|||||||
};
|
};
|
||||||
let vote_other = create_test_vote(other_assertion, 0.9);
|
let vote_other = create_test_vote(other_assertion, 0.9);
|
||||||
|
|
||||||
vote_store.put_vote(&vote1).await.expect("put");
|
vote_store.put_vote(&vote1, TEST_SUBJECT).await.expect("put");
|
||||||
vote_store.put_vote(&vote2).await.expect("put");
|
vote_store.put_vote(&vote2, TEST_SUBJECT).await.expect("put");
|
||||||
vote_store.put_vote(&vote_other).await.expect("put");
|
vote_store.put_vote(&vote_other, TEST_SUBJECT).await.expect("put");
|
||||||
|
|
||||||
// Get votes for assertion
|
// Get votes for assertion
|
||||||
let votes = vote_store.get_votes_for_assertion(&assertion_hash).await.expect("get");
|
let votes =
|
||||||
|
vote_store.get_votes_for_assertion(&assertion_hash, TEST_SUBJECT).await.expect("get");
|
||||||
|
|
||||||
assert_eq!(votes.len(), 2);
|
assert_eq!(votes.len(), 2);
|
||||||
|
|
||||||
// Get votes for other assertion
|
// Get votes for other assertion
|
||||||
let other_votes = vote_store.get_votes_for_assertion(&other_assertion).await.expect("get");
|
let other_votes =
|
||||||
|
vote_store.get_votes_for_assertion(&other_assertion, TEST_SUBJECT).await.expect("get");
|
||||||
|
|
||||||
assert_eq!(other_votes.len(), 1);
|
assert_eq!(other_votes.len(), 1);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_vote_count_cache() {
|
async fn test_vote_count_cache() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = GenericVoteStore::new(store);
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
let assertion_hash = [3u8; 32];
|
let assertion_hash = [3u8; 32];
|
||||||
|
|
||||||
// Initially zero
|
// Initially zero
|
||||||
let count = vote_store.get_vote_count(&assertion_hash).await.expect("count");
|
let count = vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
|
||||||
assert_eq!(count, 0);
|
assert_eq!(count, 0);
|
||||||
|
|
||||||
// Add votes and check count increments
|
// Add votes and check count increments
|
||||||
for i in 0..5 {
|
for i in 0..5 {
|
||||||
let vote = Vote { agent_id: [i; 32], ..create_test_vote(assertion_hash, 0.5) };
|
let vote = Vote { agent_id: [i; 32], ..create_test_vote(assertion_hash, 0.5) };
|
||||||
vote_store.put_vote(&vote).await.expect("put");
|
vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
|
||||||
|
|
||||||
let count = vote_store.get_vote_count(&assertion_hash).await.expect("count");
|
let count =
|
||||||
|
vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
|
||||||
assert_eq!(count, (i as u64) + 1);
|
assert_eq!(count, (i as u64) + 1);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_aggregate_weight_cache() {
|
async fn test_aggregate_weight_cache() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = GenericVoteStore::new(store);
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
let assertion_hash = [4u8; 32];
|
let assertion_hash = [4u8; 32];
|
||||||
|
|
||||||
// Initially zero
|
// Initially zero
|
||||||
let weight = vote_store.get_aggregate_weight(&assertion_hash).await.expect("weight");
|
let weight =
|
||||||
|
vote_store.get_aggregate_weight(&assertion_hash, TEST_SUBJECT).await.expect("weight");
|
||||||
assert!((weight - 0.0).abs() < f32::EPSILON);
|
assert!((weight - 0.0).abs() < f32::EPSILON);
|
||||||
|
|
||||||
// Add votes with known weights
|
// Add votes with known weights
|
||||||
@ -213,10 +230,13 @@ mod tests {
|
|||||||
for (i, &w) in weights.iter().enumerate() {
|
for (i, &w) in weights.iter().enumerate() {
|
||||||
let vote =
|
let vote =
|
||||||
Vote { agent_id: [i as u8; 32], weight: w, ..create_test_vote(assertion_hash, w) };
|
Vote { agent_id: [i as u8; 32], weight: w, ..create_test_vote(assertion_hash, w) };
|
||||||
vote_store.put_vote(&vote).await.expect("put");
|
vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
|
||||||
expected_total += w;
|
expected_total += w;
|
||||||
|
|
||||||
let actual = vote_store.get_aggregate_weight(&assertion_hash).await.expect("weight");
|
let actual = vote_store
|
||||||
|
.get_aggregate_weight(&assertion_hash, TEST_SUBJECT)
|
||||||
|
.await
|
||||||
|
.expect("weight");
|
||||||
|
|
||||||
// Float comparison with tolerance
|
// Float comparison with tolerance
|
||||||
assert!(
|
assert!(
|
||||||
@ -230,57 +250,58 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_has_votes() {
|
async fn test_has_votes() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = GenericVoteStore::new(store);
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
let assertion_hash = [5u8; 32];
|
let assertion_hash = [5u8; 32];
|
||||||
|
|
||||||
// No votes initially
|
// No votes initially
|
||||||
assert!(!vote_store.has_votes(&assertion_hash).await.expect("has"));
|
assert!(!vote_store.has_votes(&assertion_hash, TEST_SUBJECT).await.expect("has"));
|
||||||
|
|
||||||
// Add a vote
|
// Add a vote
|
||||||
let vote = create_test_vote(assertion_hash, 0.5);
|
let vote = create_test_vote(assertion_hash, 0.5);
|
||||||
vote_store.put_vote(&vote).await.expect("put");
|
vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
|
||||||
|
|
||||||
// Now has votes
|
// Now has votes
|
||||||
assert!(vote_store.has_votes(&assertion_hash).await.expect("has"));
|
assert!(vote_store.has_votes(&assertion_hash, TEST_SUBJECT).await.expect("has"));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_empty_assertion_returns_empty_vec() {
|
async fn test_empty_assertion_returns_empty_vec() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = GenericVoteStore::new(store);
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
let nonexistent = [99u8; 32];
|
let nonexistent = [99u8; 32];
|
||||||
let votes = vote_store.get_votes_for_assertion(&nonexistent).await.expect("get");
|
let votes =
|
||||||
|
vote_store.get_votes_for_assertion(&nonexistent, TEST_SUBJECT).await.expect("get");
|
||||||
|
|
||||||
assert!(votes.is_empty(), "Should return empty vec, not error");
|
assert!(votes.is_empty(), "Should return empty vec, not error");
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_content_addressing() {
|
async fn test_content_addressing() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = GenericVoteStore::new(store);
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
let assertion_hash = [6u8; 32];
|
let assertion_hash = [6u8; 32];
|
||||||
let vote = create_test_vote(assertion_hash, 0.5);
|
let vote = create_test_vote(assertion_hash, 0.5);
|
||||||
|
|
||||||
// Same vote should produce same hash
|
// Same vote should produce same hash
|
||||||
let hash1 = vote_store.put_vote(&vote).await.expect("put");
|
let hash1 = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
|
||||||
let hash2 = vote_store.put_vote(&vote).await.expect("put");
|
let hash2 = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
|
||||||
|
|
||||||
assert_eq!(hash1, hash2, "Same vote should produce same hash");
|
assert_eq!(hash1, hash2, "Same vote should produce same hash");
|
||||||
|
|
||||||
// Count should still increment (idempotent storage but not idempotent counting)
|
// Count should still increment (idempotent storage but not idempotent counting)
|
||||||
// This is by design - duplicate vote detection is a higher-level concern
|
// This is by design - duplicate vote detection is a higher-level concern
|
||||||
let count = vote_store.get_vote_count(&assertion_hash).await.expect("count");
|
let count = vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
|
||||||
assert_eq!(count, 2);
|
assert_eq!(count, 2);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_high_velocity_simulation() {
|
async fn test_high_velocity_simulation() {
|
||||||
// Simulate many agents voting on the same assertion
|
// Simulate many agents voting on the same assertion
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store));
|
let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
|
|
||||||
let assertion_hash = [7u8; 32];
|
let assertion_hash = [7u8; 32];
|
||||||
@ -302,15 +323,16 @@ mod tests {
|
|||||||
source_url: None,
|
source_url: None,
|
||||||
observed_context: None,
|
observed_context: None,
|
||||||
};
|
};
|
||||||
vote_store.put_vote(&vote).await.expect("put");
|
vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("put");
|
||||||
}
|
}
|
||||||
|
|
||||||
// Verify counts
|
// Verify counts
|
||||||
let count = vote_store.get_vote_count(&assertion_hash).await.expect("count");
|
let count = vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
|
||||||
assert_eq!(count, num_votes);
|
assert_eq!(count, num_votes);
|
||||||
|
|
||||||
// Verify we can retrieve all votes
|
// Verify we can retrieve all votes
|
||||||
let votes = vote_store.get_votes_for_assertion(&assertion_hash).await.expect("get");
|
let votes =
|
||||||
|
vote_store.get_votes_for_assertion(&assertion_hash, TEST_SUBJECT).await.expect("get");
|
||||||
assert_eq!(votes.len(), num_votes as usize);
|
assert_eq!(votes.len(), num_votes as usize);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -324,7 +346,7 @@ mod tests {
|
|||||||
async fn test_concurrent_vote_ingestion() {
|
async fn test_concurrent_vote_ingestion() {
|
||||||
use tokio::task::JoinSet;
|
use tokio::task::JoinSet;
|
||||||
|
|
||||||
let store = Arc::new(SledStore::open_temp().expect("Failed to create store"));
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = Arc::new(GenericVoteStore::new(store));
|
let vote_store = Arc::new(GenericVoteStore::new(store));
|
||||||
|
|
||||||
let assertion_hash = [8u8; 32];
|
let assertion_hash = [8u8; 32];
|
||||||
@ -352,7 +374,7 @@ mod tests {
|
|||||||
source_url: None,
|
source_url: None,
|
||||||
observed_context: None,
|
observed_context: None,
|
||||||
};
|
};
|
||||||
vs.put_vote(&vote).await.expect("concurrent put should succeed");
|
vs.put_vote(&vote, TEST_SUBJECT).await.expect("concurrent put should succeed");
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
@ -364,7 +386,8 @@ mod tests {
|
|||||||
|
|
||||||
// Verify final vote count is exactly num_concurrent_tasks * votes_per_task
|
// Verify final vote count is exactly num_concurrent_tasks * votes_per_task
|
||||||
let expected_count = (num_concurrent_tasks * votes_per_task) as u64;
|
let expected_count = (num_concurrent_tasks * votes_per_task) as u64;
|
||||||
let actual_count = vote_store.get_vote_count(&assertion_hash).await.expect("count");
|
let actual_count =
|
||||||
|
vote_store.get_vote_count(&assertion_hash, TEST_SUBJECT).await.expect("count");
|
||||||
assert_eq!(
|
assert_eq!(
|
||||||
actual_count, expected_count,
|
actual_count, expected_count,
|
||||||
"Vote count should be {} (got {}). Race condition detected!",
|
"Vote count should be {} (got {}). Race condition detected!",
|
||||||
@ -374,7 +397,8 @@ mod tests {
|
|||||||
// Verify aggregate weight is approximately correct
|
// Verify aggregate weight is approximately correct
|
||||||
// (some float imprecision is expected with concurrent additions)
|
// (some float imprecision is expected with concurrent additions)
|
||||||
let expected_weight = (num_concurrent_tasks * votes_per_task) as f32 * vote_weight;
|
let expected_weight = (num_concurrent_tasks * votes_per_task) as f32 * vote_weight;
|
||||||
let actual_weight = vote_store.get_aggregate_weight(&assertion_hash).await.expect("weight");
|
let actual_weight =
|
||||||
|
vote_store.get_aggregate_weight(&assertion_hash, TEST_SUBJECT).await.expect("weight");
|
||||||
let tolerance = 0.01 * expected_weight; // 1% tolerance for float accumulation
|
let tolerance = 0.01 * expected_weight; // 1% tolerance for float accumulation
|
||||||
assert!(
|
assert!(
|
||||||
(actual_weight - expected_weight).abs() < tolerance,
|
(actual_weight - expected_weight).abs() < tolerance,
|
||||||
@ -386,7 +410,7 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_vote_with_provenance_fields() {
|
async fn test_vote_with_provenance_fields() {
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = GenericVoteStore::new(store);
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
let assertion_hash = [10u8; 32];
|
let assertion_hash = [10u8; 32];
|
||||||
@ -401,11 +425,13 @@ mod tests {
|
|||||||
};
|
};
|
||||||
|
|
||||||
// Put vote with provenance
|
// Put vote with provenance
|
||||||
let vote_hash = vote_store.put_vote(&vote).await.expect("Failed to put vote");
|
let vote_hash = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("Failed to put vote");
|
||||||
|
|
||||||
// Get vote back and verify provenance fields
|
// Get vote back and verify provenance fields
|
||||||
let retrieved =
|
let retrieved = vote_store
|
||||||
vote_store.get_vote(&assertion_hash, &vote_hash).await.expect("Failed to get vote");
|
.get_vote(&assertion_hash, &vote_hash, TEST_SUBJECT)
|
||||||
|
.await
|
||||||
|
.expect("Failed to get vote");
|
||||||
|
|
||||||
assert!(retrieved.is_some());
|
assert!(retrieved.is_some());
|
||||||
let retrieved_vote = retrieved.expect("Vote should exist");
|
let retrieved_vote = retrieved.expect("Vote should exist");
|
||||||
@ -417,7 +443,7 @@ mod tests {
|
|||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_vote_backward_compatibility() {
|
async fn test_vote_backward_compatibility() {
|
||||||
// Test that votes without provenance fields (None) work correctly
|
// Test that votes without provenance fields (None) work correctly
|
||||||
let store = SledStore::open_temp().expect("Failed to create store");
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
let vote_store = GenericVoteStore::new(store);
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
let assertion_hash = [11u8; 32];
|
let assertion_hash = [11u8; 32];
|
||||||
@ -428,9 +454,11 @@ mod tests {
|
|||||||
assert_eq!(vote.observed_context, None);
|
assert_eq!(vote.observed_context, None);
|
||||||
|
|
||||||
// Put and retrieve
|
// Put and retrieve
|
||||||
let vote_hash = vote_store.put_vote(&vote).await.expect("Failed to put vote");
|
let vote_hash = vote_store.put_vote(&vote, TEST_SUBJECT).await.expect("Failed to put vote");
|
||||||
let retrieved =
|
let retrieved = vote_store
|
||||||
vote_store.get_vote(&assertion_hash, &vote_hash).await.expect("Failed to get vote");
|
.get_vote(&assertion_hash, &vote_hash, TEST_SUBJECT)
|
||||||
|
.await
|
||||||
|
.expect("Failed to get vote");
|
||||||
|
|
||||||
assert!(retrieved.is_some());
|
assert!(retrieved.is_some());
|
||||||
let retrieved_vote = retrieved.expect("Vote should exist");
|
let retrieved_vote = retrieved.expect("Vote should exist");
|
||||||
@ -438,4 +466,24 @@ mod tests {
|
|||||||
assert_eq!(retrieved_vote.observed_context, None);
|
assert_eq!(retrieved_vote.observed_context, None);
|
||||||
assert_eq!(retrieved_vote.weight, 0.7);
|
assert_eq!(retrieved_vote.weight, 0.7);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_votes_isolated_by_subject() {
|
||||||
|
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
|
||||||
|
let vote_store = GenericVoteStore::new(store);
|
||||||
|
|
||||||
|
let assertion_hash = [12u8; 32];
|
||||||
|
let vote = create_test_vote(assertion_hash, 0.5);
|
||||||
|
|
||||||
|
// Store vote under "Tesla"
|
||||||
|
vote_store.put_vote(&vote, "Tesla").await.expect("put");
|
||||||
|
|
||||||
|
// Should NOT be visible under "Apple"
|
||||||
|
let count = vote_store.get_vote_count(&assertion_hash, "Apple").await.expect("count");
|
||||||
|
assert_eq!(count, 0, "Votes should be isolated by subject");
|
||||||
|
|
||||||
|
// Should be visible under "Tesla"
|
||||||
|
let count = vote_store.get_vote_count(&assertion_hash, "Tesla").await.expect("count");
|
||||||
|
assert_eq!(count, 1);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
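The concurrent ingestion test above expects an exact vote count and an aggregate weight within 1% tolerance. As a rough illustration of why an atomic increment plus a compare-and-swap loop can satisfy that, here is a minimal in-memory sketch. `VoteCounters`, `add_vote`, and `ingest_concurrently` are hypothetical names introduced for this example; the real store persists these values through `KVStore`, and its actual CAS primitive is not shown in this diff.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical in-memory stand-in for the cached vote count and aggregate weight.
pub struct VoteCounters {
    count: AtomicU64,
    /// The f32 total is stored as its raw bit pattern so it can be swapped atomically.
    weight_bits: AtomicU32,
}

impl VoteCounters {
    pub fn new() -> Self {
        VoteCounters {
            count: AtomicU64::new(0),
            weight_bits: AtomicU32::new(0.0f32.to_bits()),
        }
    }

    /// Record one vote and return the new vote count.
    pub fn add_vote(&self, weight: f32) -> u64 {
        // Integer counter: a single atomic fetch-and-add is enough.
        let new_count = self.count.fetch_add(1, Ordering::SeqCst) + 1;

        // Float total: retry the read-modify-write until no other thread interfered.
        let mut current = self.weight_bits.load(Ordering::SeqCst);
        loop {
            let updated = (f32::from_bits(current) + weight).to_bits();
            match self.weight_bits.compare_exchange_weak(
                current,
                updated,
                Ordering::SeqCst,
                Ordering::SeqCst,
            ) {
                Ok(_) => break,
                // Another writer won the race (or the CAS failed spuriously): retry.
                Err(observed) => current = observed,
            }
        }
        new_count
    }

    pub fn count(&self) -> u64 {
        self.count.load(Ordering::SeqCst)
    }

    pub fn weight(&self) -> f32 {
        f32::from_bits(self.weight_bits.load(Ordering::SeqCst))
    }
}

/// Spawn `tasks` threads that each add `votes_per_task` votes of the same weight.
pub fn ingest_concurrently(tasks: usize, votes_per_task: usize, weight: f32) -> (u64, f32) {
    let counters = Arc::new(VoteCounters::new());
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let shared = Arc::clone(&counters);
            thread::spawn(move || {
                for _ in 0..votes_per_task {
                    shared.add_vote(weight);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    (counters.count(), counters.weight())
}
```

Calling `ingest_concurrently(8, 100, 0.5)` should never lose an update, so the count is exact. The float total still depends on the order of additions, which is why a tolerance-based comparison, as in the test, remains the safer assertion.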
|
|||||||
@ -1,6 +1,7 @@
|
|||||||
//! GenericVoteStore implementation backed by a generic KVStore.
|
//! GenericVoteStore implementation backed by a generic KVStore.
|
||||||
|
|
||||||
use crate::error::{Result, StorageError};
|
use crate::error::{Result, StorageError};
|
||||||
|
use crate::key_codec;
|
||||||
use crate::traits::KVStore;
|
use crate::traits::KVStore;
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use stemedb_core::types::{Hash, Vote};
|
use stemedb_core::types::{Hash, Vote};
|
||||||
@ -8,11 +9,6 @@ use tracing::{debug, instrument};
|
|||||||
|
|
||||||
use super::VoteStore;
|
use super::VoteStore;
|
||||||
|
|
||||||
/// Key prefix for vote count cache.
|
|
||||||
const VOTE_COUNT_PREFIX: &[u8] = b"VC:";
|
|
||||||
/// Key prefix for aggregate weight cache.
|
|
||||||
const VOTE_WEIGHT_PREFIX: &[u8] = b"VW:";
|
|
||||||
|
|
||||||
/// VoteStore implementation backed by a generic KVStore.
|
/// VoteStore implementation backed by a generic KVStore.
|
||||||
///
|
///
|
||||||
/// This implementation maintains caches for vote counts and aggregate weights
|
/// This implementation maintains caches for vote counts and aggregate weights
|
||||||
@ -27,33 +23,6 @@ impl<S: KVStore> GenericVoteStore<S> {
|
|||||||
Self { store }
|
Self { store }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Construct the key for an individual vote.
|
|
||||||
fn vote_key(assertion_hash: &Hash, vote_hash: &Hash) -> Vec<u8> {
|
|
||||||
let assertion_hex = hex::encode(assertion_hash);
|
|
||||||
let vote_hex = hex::encode(vote_hash);
|
|
||||||
format!("V:{}:{}", assertion_hex, vote_hex).into_bytes()
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Construct the prefix for scanning all votes on an assertion.
|
|
||||||
fn vote_scan_prefix(assertion_hash: &Hash) -> Vec<u8> {
|
|
||||||
let assertion_hex = hex::encode(assertion_hash);
|
|
||||||
format!("V:{}:", assertion_hex).into_bytes()
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Construct the key for the vote count cache.
|
|
||||||
fn vote_count_key(assertion_hash: &Hash) -> Vec<u8> {
|
|
||||||
let mut key = VOTE_COUNT_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(assertion_hash);
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Construct the key for the aggregate weight cache.
|
|
||||||
fn vote_weight_key(assertion_hash: &Hash) -> Vec<u8> {
|
|
||||||
let mut key = VOTE_WEIGHT_PREFIX.to_vec();
|
|
||||||
key.extend_from_slice(assertion_hash);
|
|
||||||
key
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Serialize a vote using the canonical serde helpers.
|
/// Serialize a vote using the canonical serde helpers.
|
||||||
fn serialize_vote(vote: &Vote) -> Result<Vec<u8>> {
|
fn serialize_vote(vote: &Vote) -> Result<Vec<u8>> {
|
||||||
crate::serde_helpers::serialize(vote)
|
crate::serde_helpers::serialize(vote)
|
||||||
@ -67,8 +36,8 @@ impl<S: KVStore> GenericVoteStore<S> {
|
|||||||
|
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
|
impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
|
||||||
#[instrument(skip(self, vote), fields(assertion_hash = %hex::encode(vote.assertion_hash), weight = vote.weight))]
|
#[instrument(skip(self, vote), fields(assertion_hash = %hex::encode(vote.assertion_hash), weight = vote.weight, subject))]
|
||||||
async fn put_vote(&self, vote: &Vote) -> Result<Hash> {
|
async fn put_vote(&self, vote: &Vote, subject: &str) -> Result<Hash> {
|
||||||
// Serialize the vote
|
// Serialize the vote
|
||||||
let serialized = Self::serialize_vote(vote)?;
|
let serialized = Self::serialize_vote(vote)?;
|
||||||
|
|
||||||
@ -76,21 +45,23 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
|
|||||||
let vote_hash_bytes = blake3::hash(&serialized);
|
let vote_hash_bytes = blake3::hash(&serialized);
|
||||||
let vote_hash: Hash = *vote_hash_bytes.as_bytes();
|
let vote_hash: Hash = *vote_hash_bytes.as_bytes();
|
||||||
|
|
||||||
// Store the vote
|
// Store the vote using subject-prefixed key
|
||||||
let key = Self::vote_key(&vote.assertion_hash, &vote_hash);
|
let assertion_hex = hex::encode(vote.assertion_hash);
|
||||||
|
let vote_hex = hex::encode(vote_hash);
|
||||||
|
let key = key_codec::vote_key(subject, &assertion_hex, &vote_hex);
|
||||||
self.store.put(&key, &serialized).await?;
|
self.store.put(&key, &serialized).await?;
|
||||||
|
|
||||||
debug!(
|
debug!(
|
||||||
vote_hash = %hex::encode(vote_hash),
|
vote_hash = %vote_hex,
|
||||||
"Stored vote"
|
"Stored vote"
|
||||||
);
|
);
|
||||||
|
|
||||||
// Update vote count cache (atomic increment - prevents race conditions)
|
// Update vote count cache (atomic increment - prevents race conditions)
|
||||||
let count_key = Self::vote_count_key(&vote.assertion_hash);
|
let count_key = key_codec::vote_count_key(subject, &assertion_hex);
|
||||||
let new_count = self.store.fetch_and_add_u64(&count_key, 1).await?;
|
let new_count = self.store.fetch_and_add_u64(&count_key, 1).await?;
|
||||||
|
|
||||||
// Update aggregate weight cache (atomic CAS - prevents race conditions)
|
// Update aggregate weight cache (atomic CAS - prevents race conditions)
|
||||||
let weight_key = Self::vote_weight_key(&vote.assertion_hash);
|
let weight_key = key_codec::vote_weight_key(subject, &assertion_hex);
|
||||||
let vote_weight = vote.weight;
|
let vote_weight = vote.weight;
|
||||||
let new_weight = self
|
let new_weight = self
|
||||||
.store
|
.store
|
||||||
@ -102,9 +73,16 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
|
|||||||
Ok(vote_hash)
|
Ok(vote_hash)
|
||||||
}
|
}
|
||||||
|
|
||||||
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), vote_hash = %hex::encode(vote_hash)))]
|
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), vote_hash = %hex::encode(vote_hash), subject))]
|
||||||
async fn get_vote(&self, assertion_hash: &Hash, vote_hash: &Hash) -> Result<Option<Vote>> {
|
async fn get_vote(
|
||||||
let key = Self::vote_key(assertion_hash, vote_hash);
|
&self,
|
||||||
|
assertion_hash: &Hash,
|
||||||
|
vote_hash: &Hash,
|
||||||
|
subject: &str,
|
||||||
|
) -> Result<Option<Vote>> {
|
||||||
|
let assertion_hex = hex::encode(assertion_hash);
|
||||||
|
let vote_hex = hex::encode(vote_hash);
|
||||||
|
let key = key_codec::vote_key(subject, &assertion_hex, &vote_hex);
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(data) => {
|
Some(data) => {
|
||||||
let vote = Self::deserialize_vote(&data)?;
|
let vote = Self::deserialize_vote(&data)?;
|
||||||
@ -114,9 +92,14 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash)))]
|
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), subject))]
|
||||||
async fn get_votes_for_assertion(&self, assertion_hash: &Hash) -> Result<Vec<Vote>> {
|
async fn get_votes_for_assertion(
|
||||||
let prefix = Self::vote_scan_prefix(assertion_hash);
|
&self,
|
||||||
|
assertion_hash: &Hash,
|
||||||
|
subject: &str,
|
||||||
|
) -> Result<Vec<Vote>> {
|
||||||
|
let assertion_hex = hex::encode(assertion_hash);
|
||||||
|
let prefix = key_codec::vote_scan_prefix(subject, &assertion_hex);
|
||||||
let entries = self.store.scan_prefix(&prefix).await?;
|
let entries = self.store.scan_prefix(&prefix).await?;
|
||||||
|
|
||||||
let mut votes = Vec::with_capacity(entries.len());
|
let mut votes = Vec::with_capacity(entries.len());
|
||||||
@ -129,9 +112,10 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
|
|||||||
Ok(votes)
|
Ok(votes)
|
||||||
}
|
}
|
||||||
|
|
||||||
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash)))]
|
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), subject))]
|
||||||
async fn get_vote_count(&self, assertion_hash: &Hash) -> Result<u64> {
|
async fn get_vote_count(&self, assertion_hash: &Hash, subject: &str) -> Result<u64> {
|
||||||
let key = Self::vote_count_key(assertion_hash);
|
let assertion_hex = hex::encode(assertion_hash);
|
||||||
|
let key = key_codec::vote_count_key(subject, &assertion_hex);
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(bytes) if bytes.len() == 8 => {
|
Some(bytes) if bytes.len() == 8 => {
|
||||||
let arr: [u8; 8] = bytes.try_into().map_err(|_| {
|
let arr: [u8; 8] = bytes.try_into().map_err(|_| {
|
||||||
@ -143,9 +127,10 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash)))]
|
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), subject))]
|
||||||
async fn get_aggregate_weight(&self, assertion_hash: &Hash) -> Result<f32> {
|
async fn get_aggregate_weight(&self, assertion_hash: &Hash, subject: &str) -> Result<f32> {
|
||||||
let key = Self::vote_weight_key(assertion_hash);
|
let assertion_hex = hex::encode(assertion_hash);
|
||||||
|
let key = key_codec::vote_weight_key(subject, &assertion_hex);
|
||||||
match self.store.get(&key).await? {
|
match self.store.get(&key).await? {
|
||||||
Some(bytes) if bytes.len() == 4 => {
|
Some(bytes) if bytes.len() == 4 => {
|
||||||
let arr: [u8; 4] = bytes
|
let arr: [u8; 4] = bytes
|
||||||
@ -157,9 +142,9 @@ impl<S: KVStore + 'static> VoteStore for GenericVoteStore<S> {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash)))]
|
#[instrument(skip(self), fields(assertion_hash = %hex::encode(assertion_hash), subject))]
|
||||||
async fn has_votes(&self, assertion_hash: &Hash) -> Result<bool> {
|
async fn has_votes(&self, assertion_hash: &Hash, subject: &str) -> Result<bool> {
|
||||||
let count = self.get_vote_count(assertion_hash).await?;
|
let count = self.get_vote_count(assertion_hash, subject).await?;
|
||||||
Ok(count > 0)
|
Ok(count > 0)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
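The change above routes every vote key through `key_codec` with a `subject` argument, which is what makes `test_votes_isolated_by_subject` pass. The actual key layout lives in `key_codec` and is not part of this diff, so the sketch below assumes a hypothetical layout (`<subject> 0x00 "V:" <assertion_hex> ":" <vote_hex>`) and uses a `BTreeMap` in place of the real `KVStore`, purely to show how a subject prefix scopes a prefix scan.

```rust
use std::collections::BTreeMap;

/// Hypothetical scan prefix: `<subject> 0x00 "V:" <assertion_hex> ":"`.
/// The 0x00 separator keeps subject "Tes" from being a key prefix of subject "Tesla".
pub fn vote_scan_prefix(subject: &str, assertion_hex: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(subject.len() + assertion_hex.len() + 4);
    key.extend_from_slice(subject.as_bytes());
    key.push(0);
    key.extend_from_slice(b"V:");
    key.extend_from_slice(assertion_hex.as_bytes());
    key.push(b':');
    key
}

/// Hypothetical full key for one vote: the scan prefix followed by the vote hash in hex.
pub fn vote_key(subject: &str, assertion_hex: &str, vote_hex: &str) -> Vec<u8> {
    let mut key = vote_scan_prefix(subject, assertion_hex);
    key.extend_from_slice(vote_hex.as_bytes());
    key
}

/// Count the entries whose key starts with `prefix`, relying on the map's sorted order.
pub fn count_with_prefix(store: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> usize {
    store
        .range(prefix.to_vec()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .count()
}
```

Under this assumed layout, a vote stored under "Tesla" is counted only when scanning with the "Tesla" prefix, and a scan for "Apple" or "Tes" does not see it. A layout like this only holds if subjects can never contain the separator byte; how `key_codec` handles that is not visible here.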
|
|||||||
@ -14,6 +14,12 @@ thiserror = "1.0"
|
|||||||
tracing = "0.1"
|
tracing = "0.1"
|
||||||
byteorder = "1.5"
|
byteorder = "1.5"
|
||||||
blake3 = "1.5"
|
blake3 = "1.5"
|
||||||
|
crc32c = "0.6"
|
||||||
|
tokio = { version = "1", features = ["sync", "time", "rt"], optional = true }
|
||||||
|
|
||||||
|
[features]
|
||||||
|
group-commit = ["tokio"]
|
||||||
|
|
||||||
[dev-dependencies]
|
[dev-dependencies]
|
||||||
tempfile = "3.10"
|
tempfile = "3.10"
|
||||||
|
tokio = { version = "1", features = ["sync", "time", "rt", "macros"] }
|
||||||
@ -33,6 +33,37 @@ pub enum QuarantineError {
|
|||||||
path: PathBuf,
|
path: PathBuf,
|
||||||
},
|
},
|
||||||
|
|
||||||
|
/// CRC32C checksum mismatch (fast integrity check, detects torn writes).
|
||||||
|
#[error(
|
||||||
|
"CRC32C mismatch at offset {offset}: expected {expected:#010x}, actual {actual:#010x}"
|
||||||
|
)]
|
||||||
|
Crc32cMismatch {
|
||||||
|
/// File offset where the corrupt record starts.
|
||||||
|
offset: u64,
|
||||||
|
/// Expected CRC32C value from the record header.
|
||||||
|
expected: u32,
|
||||||
|
/// Actual CRC32C computed from the data.
|
||||||
|
actual: u32,
|
||||||
|
},
|
||||||
|
|
||||||
|
/// Record length field is invalid (zero or exceeds MAX_RECORD_SIZE).
|
||||||
|
#[error("Invalid record length at offset {offset}: {length} bytes")]
|
||||||
|
InvalidRecordLength {
|
||||||
|
/// File offset where the record starts.
|
||||||
|
offset: u64,
|
||||||
|
/// The invalid length value read.
|
||||||
|
length: u32,
|
||||||
|
},
|
||||||
|
|
||||||
|
/// Generic record corruption with a descriptive reason.
|
||||||
|
#[error("Corrupt record at offset {offset}: {reason}")]
|
||||||
|
CorruptRecord {
|
||||||
|
/// File offset where corruption was detected.
|
||||||
|
offset: u64,
|
||||||
|
/// Human-readable description of the corruption.
|
||||||
|
reason: String,
|
||||||
|
},
|
||||||
|
|
||||||
/// Generic IO error.
|
/// Generic IO error.
|
||||||
#[error(transparent)]
|
#[error(transparent)]
|
||||||
IoGeneric(#[from] io::Error),
|
IoGeneric(#[from] io::Error),
|
||||||
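The new `Crc32cMismatch` and `InvalidRecordLength` variants above correspond to the two cheap checks a recovery scan can make on a v2 record (`[len:u32][crc32c:u32][blake3:32][payload:N]`) before any BLAKE3 work. The sketch below is a self-contained approximation, not the commit's `recover_file()`: it uses a bitwise software CRC32C instead of the `crc32c` crate, treats the BLAKE3 hash as an opaque 32-byte value supplied by the caller, and introduces the hypothetical names `FrameError`, `encode_record`, `check_record`, and `valid_prefix_len`.

```rust
/// Reflected Castagnoli polynomial used by CRC32C.
const CRC32C_POLY: u32 = 0x82F6_3B78;
/// payload_len (4) + crc32c (4) + blake3 (32), as in the v2 format.
const RECORD_OVERHEAD: usize = 40;
const MAX_RECORD_SIZE: usize = 100 * 1024 * 1024;

/// Bitwise CRC32C that continues from a previously finalized CRC (pass 0 to start).
pub fn crc32c_append(crc: u32, data: &[u8]) -> u32 {
    let mut state = !crc;
    for &byte in data {
        state ^= u32::from(byte);
        for _ in 0..8 {
            state = if state & 1 == 1 { (state >> 1) ^ CRC32C_POLY } else { state >> 1 };
        }
    }
    !state
}

/// Hypothetical error type mirroring the variants added in this diff.
#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    Truncated { offset: u64 },
    InvalidRecordLength { offset: u64, length: u32 },
    Crc32cMismatch { offset: u64, expected: u32, actual: u32 },
}

/// Serialize one record; `blake3_hash` is assumed to be computed elsewhere.
pub fn encode_record(blake3_hash: &[u8; 32], payload: &[u8]) -> Vec<u8> {
    let len_bytes = (payload.len() as u32).to_le_bytes();
    // The CRC covers everything except itself: len, then hash, then payload.
    let crc = crc32c_append(0, &len_bytes);
    let crc = crc32c_append(crc, blake3_hash);
    let crc = crc32c_append(crc, payload);
    let mut out = Vec::with_capacity(RECORD_OVERHEAD + payload.len());
    out.extend_from_slice(&len_bytes);
    out.extend_from_slice(&crc.to_le_bytes());
    out.extend_from_slice(blake3_hash);
    out.extend_from_slice(payload);
    out
}

/// Validate the record starting at `offset`; on success return its on-disk size.
pub fn check_record(file: &[u8], offset: u64) -> Result<usize, FrameError> {
    let rest = file.get(offset as usize..).unwrap_or(&[]);
    if rest.len() < RECORD_OVERHEAD {
        return Err(FrameError::Truncated { offset });
    }
    let length = u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]);
    if length == 0 || length as usize > MAX_RECORD_SIZE {
        return Err(FrameError::InvalidRecordLength { offset, length });
    }
    let total = RECORD_OVERHEAD + length as usize;
    if rest.len() < total {
        return Err(FrameError::Truncated { offset });
    }
    let expected = u32::from_le_bytes([rest[4], rest[5], rest[6], rest[7]]);
    let crc = crc32c_append(0, &rest[0..4]);
    let crc = crc32c_append(crc, &rest[8..RECORD_OVERHEAD]);
    let actual = crc32c_append(crc, &rest[RECORD_OVERHEAD..total]);
    if expected != actual {
        return Err(FrameError::Crc32cMismatch { offset, expected, actual });
    }
    Ok(total)
}

/// Scan records from `start` and return the offset where the valid prefix ends.
pub fn valid_prefix_len(file: &[u8], start: u64) -> u64 {
    let mut offset = start;
    while (offset as usize) < file.len() {
        match check_record(file, offset) {
            Ok(size) => offset += size as u64,
            Err(_) => break, // first bad record: everything after it would be truncated
        }
    }
    offset
}
```

A recovery routine built this way would truncate the file to `valid_prefix_len(...)`, passing the header size as `start` for a real WAL file. Because the length comes first, the scanner knows how many bytes to read before touching either checksum, which is the ordering rationale given in the record format comments.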
|
|||||||
@ -6,12 +6,16 @@ use std::io::{Read, Write};
|
|||||||
pub const MAGIC: &[u8; 4] = b"STEM";
|
pub const MAGIC: &[u8; 4] = b"STEM";
|
||||||
|
|
||||||
/// Current file format version.
|
/// Current file format version.
|
||||||
pub const VERSION: u8 = 1;
|
pub const VERSION: u8 = 2;
|
||||||
|
|
||||||
/// Size of the file header in bytes.
|
/// Size of the file header in bytes.
|
||||||
/// Magic (4) + Version (1) + Reserved (3)
|
/// Magic (4) + Version (1) + Reserved (3)
|
||||||
pub const HEADER_SIZE: usize = 8;
|
pub const HEADER_SIZE: usize = 8;
|
||||||
|
|
||||||
|
/// Per-record overhead in bytes (v2 format).
|
||||||
|
/// payload_len (4) + crc32c (4) + blake3 (32) = 40
|
||||||
|
pub const RECORD_OVERHEAD: usize = 40;
|
||||||
|
|
||||||
/// Maximum record size (100 MB).
|
/// Maximum record size (100 MB).
|
||||||
pub const MAX_RECORD_SIZE: usize = 100 * 1024 * 1024;
|
pub const MAX_RECORD_SIZE: usize = 100 * 1024 * 1024;
|
||||||
|
|
||||||
@ -61,10 +65,13 @@ impl FileHeader {
|
|||||||
|
|
||||||
let version = reader.read_u8().map_err(QuarantineError::IoGeneric)?;
|
let version = reader.read_u8().map_err(QuarantineError::IoGeneric)?;
|
||||||
if version != VERSION {
|
if version != VERSION {
|
||||||
return Err(QuarantineError::IoGeneric(std::io::Error::new(
|
return Err(QuarantineError::CorruptRecord {
|
||||||
std::io::ErrorKind::InvalidData,
|
offset: 0,
|
||||||
format!("Unsupported version: {}", version),
|
reason: format!(
|
||||||
)));
|
"Unsupported WAL version {} (expected {}). Delete the WAL and re-ingest.",
|
||||||
|
version, VERSION
|
||||||
|
),
|
||||||
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
// Skip reserved bytes
|
// Skip reserved bytes
|
||||||
@ -75,69 +82,102 @@ impl FileHeader {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Compute CRC32C over the concatenation of len_bytes, blake3, and payload.
|
||||||
|
///
|
||||||
|
/// The CRC covers everything except itself, providing fast integrity checking
|
||||||
|
/// that detects torn writes before the more expensive BLAKE3 verification.
|
||||||
|
pub fn compute_crc32c(len_bytes: &[u8; 4], blake3: &[u8; 32], payload: &[u8]) -> u32 {
|
||||||
|
let crc = crc32c::crc32c_append(0, len_bytes);
|
||||||
|
let crc = crc32c::crc32c_append(crc, blake3);
|
||||||
|
crc32c::crc32c_append(crc, payload)
|
||||||
|
}
|
||||||
|
|
||||||
/// A single log record in the WAL.
|
/// A single log record in the WAL.
|
||||||
///
|
///
|
||||||
/// Format:
|
/// v2 Format: `[payload_len:u32_LE][crc32c:u32][blake3:32][payload:N]`
|
||||||
/// - Checksum (32 bytes, BLAKE3)
|
///
|
||||||
/// - Payload Length (4 bytes, u32 LE)
|
/// - Length first: recovery scanner knows read size before touching checksums
|
||||||
/// - Payload (N bytes)
|
/// - CRC32C second: fast integrity check, rejects torn writes
|
||||||
|
/// - BLAKE3 before payload: content-addressing hash in fixed 40-byte header
|
||||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||||
pub struct Record {
|
pub struct Record {
|
||||||
/// BLAKE3 checksum of the payload.
|
/// BLAKE3 checksum of the payload.
|
||||||
pub checksum: [u8; 32],
|
pub checksum: [u8; 32],
|
||||||
|
/// CRC32C integrity check covering len + blake3 + payload.
|
||||||
|
pub crc: u32,
|
||||||
/// The actual data payload.
|
/// The actual data payload.
|
||||||
pub payload: Vec<u8>,
|
pub payload: Vec<u8>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Record {
|
impl Record {
|
||||||
/// Create a new record from a payload, calculating the checksum.
|
/// Create a new record from a payload, calculating both checksums.
|
||||||
pub fn new(payload: Vec<u8>) -> Self {
|
pub fn new(payload: Vec<u8>) -> Self {
|
||||||
let checksum = blake3::hash(&payload).into();
|
let checksum: [u8; 32] = blake3::hash(&payload).into();
|
||||||
Self { checksum, payload }
|
let len_bytes = (payload.len() as u32).to_le_bytes();
|
||||||
|
let crc = compute_crc32c(&len_bytes, &checksum, &payload);
|
||||||
|
Self { checksum, crc, payload }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Calculate the on-disk size of this record.
|
/// Calculate the on-disk size of this record.
|
||||||
pub fn disk_size(&self) -> u64 {
|
pub fn disk_size(&self) -> u64 {
|
||||||
(32 + 4 + self.payload.len()) as u64
|
(RECORD_OVERHEAD + self.payload.len()) as u64
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Write the record to a writer.
|
/// Write the record to a writer in v2 format.
|
||||||
|
///
|
||||||
|
/// Layout: `[payload_len:u32_LE][crc32c:u32][blake3:32][payload:N]`
|
||||||
pub fn write_to<W: Write>(&self, writer: &mut W) -> Result<()> {
|
pub fn write_to<W: Write>(&self, writer: &mut W) -> Result<()> {
|
||||||
writer.write_all(&self.checksum).map_err(QuarantineError::IoGeneric)?;
|
|
||||||
writer
|
writer
|
||||||
.write_u32::<LittleEndian>(self.payload.len() as u32)
|
.write_u32::<LittleEndian>(self.payload.len() as u32)
|
||||||
.map_err(QuarantineError::IoGeneric)?;
|
.map_err(QuarantineError::IoGeneric)?;
|
||||||
|
writer.write_u32::<LittleEndian>(self.crc).map_err(QuarantineError::IoGeneric)?;
|
||||||
|
writer.write_all(&self.checksum).map_err(QuarantineError::IoGeneric)?;
|
||||||
writer.write_all(&self.payload).map_err(QuarantineError::IoGeneric)?;
|
writer.write_all(&self.payload).map_err(QuarantineError::IoGeneric)?;
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Read a record from a reader and verify its checksum.
|
/// Read a record from a reader in v2 format and verify both checksums.
|
||||||
|
///
|
||||||
|
/// CRC32C is checked first (fast reject for torn writes), then BLAKE3.
|
||||||
pub fn read_from<R: Read>(reader: &mut R) -> Result<Self> {
|
pub fn read_from<R: Read>(reader: &mut R) -> Result<Self> {
|
||||||
let mut checksum = [0u8; 32];
|
|
||||||
reader.read_exact(&mut checksum).map_err(QuarantineError::IoGeneric)?;
|
|
||||||
|
|
||||||
let len = reader.read_u32::<LittleEndian>().map_err(QuarantineError::IoGeneric)?;
|
let len = reader.read_u32::<LittleEndian>().map_err(QuarantineError::IoGeneric)?;
|
||||||
|
|
||||||
if len as usize > MAX_RECORD_SIZE {
|
if len == 0 || len as usize > MAX_RECORD_SIZE {
|
||||||
return Err(QuarantineError::IoGeneric(std::io::Error::new(
|
return Err(QuarantineError::IoGeneric(std::io::Error::new(
|
||||||
std::io::ErrorKind::InvalidData,
|
std::io::ErrorKind::InvalidData,
|
||||||
format!("Record too large: {} bytes", len),
|
format!("Invalid record length: {} bytes", len),
|
||||||
)));
|
)));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
let stored_crc = reader.read_u32::<LittleEndian>().map_err(QuarantineError::IoGeneric)?;
|
||||||
|
|
||||||
|
let mut checksum = [0u8; 32];
|
||||||
|
reader.read_exact(&mut checksum).map_err(QuarantineError::IoGeneric)?;
|
||||||
|
|
||||||
let mut payload = vec![0u8; len as usize];
|
let mut payload = vec![0u8; len as usize];
|
||||||
reader.read_exact(&mut payload).map_err(QuarantineError::IoGeneric)?;
|
reader.read_exact(&mut payload).map_err(QuarantineError::IoGeneric)?;
|
||||||
|
|
||||||
// Verify checksum
|
// Verify CRC32C first (fast reject for torn writes)
|
||||||
|
let len_bytes = len.to_le_bytes();
|
||||||
|
let computed_crc = compute_crc32c(&len_bytes, &checksum, &payload);
|
||||||
|
if stored_crc != computed_crc {
|
||||||
|
return Err(QuarantineError::Crc32cMismatch {
|
||||||
|
offset: 0, // caller should adjust
|
||||||
|
expected: stored_crc,
|
||||||
|
actual: computed_crc,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// Verify BLAKE3 (content-addressing integrity)
|
||||||
let calculated: [u8; 32] = blake3::hash(&payload).into();
|
let calculated: [u8; 32] = blake3::hash(&payload).into();
|
||||||
if checksum != calculated {
|
if checksum != calculated {
|
||||||
return Err(QuarantineError::IoGeneric(std::io::Error::new(
|
return Err(QuarantineError::IoGeneric(std::io::Error::new(
|
||||||
std::io::ErrorKind::InvalidData,
|
std::io::ErrorKind::InvalidData,
|
||||||
"Checksum mismatch",
|
"BLAKE3 checksum mismatch",
|
||||||
)));
|
)));
|
||||||
}
|
}
|
||||||
|
|
||||||
Ok(Self { checksum, payload })
|
Ok(Self { checksum, crc: stored_crc, payload })
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -151,46 +191,166 @@ mod tests {
|
|||||||
let header = FileHeader::new();
|
let header = FileHeader::new();
|
||||||
let mut buffer = Vec::new();
|
let mut buffer = Vec::new();
|
||||||
|
|
||||||
header.write_to(&mut buffer).unwrap();
|
header.write_to(&mut buffer).expect("write header");
|
||||||
assert_eq!(buffer.len(), HEADER_SIZE);
|
assert_eq!(buffer.len(), HEADER_SIZE);
|
||||||
|
|
||||||
let mut reader = Cursor::new(buffer);
|
let mut reader = Cursor::new(buffer);
|
||||||
let read_header = FileHeader::read_from(&mut reader).unwrap();
|
let read_header = FileHeader::read_from(&mut reader).expect("read header");
|
||||||
|
|
||||||
assert_eq!(header, read_header);
|
assert_eq!(header, read_header);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_record_roundtrip() {
|
fn test_record_v2_roundtrip() {
|
||||||
let payload = b"test payload data".to_vec();
|
let payload = b"test payload data".to_vec();
|
||||||
let record = Record::new(payload.clone());
|
let record = Record::new(payload.clone());
|
||||||
let mut buffer = Vec::new();
|
let mut buffer = Vec::new();
|
||||||
|
|
||||||
record.write_to(&mut buffer).unwrap();
|
record.write_to(&mut buffer).expect("write record");
|
||||||
assert_eq!(buffer.len() as u64, record.disk_size());
|
assert_eq!(buffer.len() as u64, record.disk_size());
|
||||||
|
assert_eq!(buffer.len(), RECORD_OVERHEAD + payload.len());
|
||||||
|
|
||||||
let mut reader = Cursor::new(buffer);
|
let mut reader = Cursor::new(buffer);
|
||||||
let read_record = Record::read_from(&mut reader).unwrap();
|
let read_record = Record::read_from(&mut reader).expect("read record");
|
||||||
|
|
||||||
assert_eq!(record, read_record);
|
assert_eq!(record, read_record);
|
||||||
assert_eq!(read_record.payload, payload);
|
assert_eq!(read_record.payload, payload);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_record_checksum_validation() {
|
fn test_crc32c_detects_payload_corruption() {
|
||||||
let payload = b"test data".to_vec();
|
let payload = b"test data".to_vec();
|
||||||
let record = Record::new(payload);
|
let record = Record::new(payload);
|
||||||
let mut buffer = Vec::new();
|
let mut buffer = Vec::new();
|
||||||
record.write_to(&mut buffer).unwrap();
|
record.write_to(&mut buffer).expect("write record");
|
||||||
|
|
||||||
// Corrupt the payload in the buffer
|
// Corrupt a byte in the payload region (after the 40-byte header)
|
||||||
let len = buffer.len();
|
let last = buffer.len() - 1;
|
||||||
buffer[len - 1] ^= 0xFF; // Flip bits in the last byte
|
buffer[last] ^= 0xFF;
|
||||||
|
|
||||||
let mut reader = Cursor::new(buffer);
|
let mut reader = Cursor::new(buffer);
|
||||||
let result = Record::read_from(&mut reader);
|
let result = Record::read_from(&mut reader);
|
||||||
|
|
||||||
assert!(result.is_err());
|
assert!(result.is_err());
|
||||||
assert_eq!(result.unwrap_err().to_string(), "Checksum mismatch");
|
let err = result.unwrap_err();
|
||||||
|
assert!(
|
||||||
|
matches!(err, QuarantineError::Crc32cMismatch { .. }),
|
||||||
|
"Expected Crc32cMismatch, got: {}",
|
||||||
|
err
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_crc32c_detects_length_corruption() {
|
||||||
|
let payload = b"test data".to_vec();
|
||||||
|
let record = Record::new(payload);
|
||||||
|
let mut buffer = Vec::new();
|
||||||
|
record.write_to(&mut buffer).expect("write record");
|
||||||
|
|
||||||
|
// Corrupt the length field (first 4 bytes) - change to a valid but wrong length
|
||||||
|
// Set length to payload.len() + 1 (still within bounds)
|
||||||
|
let corrupted_len = (record.payload.len() as u32 + 1).to_le_bytes();
|
||||||
|
buffer[0] = corrupted_len[0];
|
||||||
|
buffer[1] = corrupted_len[1];
|
||||||
|
buffer[2] = corrupted_len[2];
|
||||||
|
buffer[3] = corrupted_len[3];
|
||||||
|
|
||||||
|
let mut reader = Cursor::new(buffer);
|
||||||
|
let result = Record::read_from(&mut reader);
|
||||||
|
|
||||||
|
// Should fail - either EOF because length is too long, or CRC mismatch
|
||||||
|
assert!(result.is_err());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_blake3_still_verified() {
|
||||||
|
// Create a record, then manually construct a buffer with correct CRC
|
||||||
|
// but wrong BLAKE3 to verify the second check works
|
||||||
|
let payload = b"original data".to_vec();
|
||||||
|
let record = Record::new(payload);
|
||||||
|
let mut buffer = Vec::new();
|
||||||
|
record.write_to(&mut buffer).expect("write record");
|
||||||
|
|
||||||
|
// Build a fresh frame whose BLAKE3 field (bytes 8..40) does not match the payload, with the CRC recomputed to match
|
||||||
|
// This is contrived but tests that BLAKE3 is independently verified
|
||||||
|
let bad_payload = b"tampered data".to_vec();
|
||||||
|
let bad_checksum: [u8; 32] = blake3::hash(b"wrong data").into();
|
||||||
|
let len_bytes = (bad_payload.len() as u32).to_le_bytes();
|
||||||
|
let new_crc = compute_crc32c(&len_bytes, &bad_checksum, &bad_payload);
|
||||||
|
|
||||||
|
let mut tampered = Vec::new();
|
||||||
|
tampered.extend_from_slice(&len_bytes);
|
||||||
|
tampered.extend_from_slice(&new_crc.to_le_bytes());
|
||||||
|
tampered.extend_from_slice(&bad_checksum);
|
||||||
|
tampered.extend_from_slice(&bad_payload);
|
||||||
|
|
||||||
|
let mut reader = Cursor::new(tampered);
|
||||||
|
let result = Record::read_from(&mut reader);
|
||||||
|
|
||||||
|
assert!(result.is_err());
|
||||||
|
let err_msg = result.unwrap_err().to_string();
|
||||||
|
assert!(err_msg.contains("BLAKE3"), "Expected BLAKE3 error, got: {}", err_msg);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_header_rejects_v1() {
|
||||||
|
let mut buffer = Vec::new();
|
||||||
|
// Write magic
|
||||||
|
buffer.extend_from_slice(MAGIC);
|
||||||
|
// Write version 1
|
||||||
|
buffer.push(1);
|
||||||
|
// Write reserved
|
||||||
|
buffer.extend_from_slice(&[0u8; 3]);
|
||||||
|
|
||||||
|
let mut reader = Cursor::new(buffer);
|
||||||
|
let result = FileHeader::read_from(&mut reader);
|
||||||
|
|
||||||
|
assert!(result.is_err());
|
||||||
|
let err_msg = result.unwrap_err().to_string();
|
||||||
|
assert!(
|
||||||
|
err_msg.contains("Unsupported WAL version 1"),
|
||||||
|
"Expected version error, got: {}",
|
||||||
|
err_msg
|
||||||
|
);
|
||||||
|
assert!(
|
||||||
|
err_msg.contains("Delete the WAL"),
|
||||||
|
"Expected remediation advice, got: {}",
|
||||||
|
err_msg
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_record_disk_size() {
|
||||||
|
let payload = vec![0u8; 100];
|
||||||
|
let record = Record::new(payload);
|
||||||
|
assert_eq!(record.disk_size(), (RECORD_OVERHEAD + 100) as u64);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_record_empty_payload_rejected() {
|
||||||
|
// Empty payload (len=0) should be rejected on read
|
||||||
|
let mut buffer = Vec::new();
|
||||||
|
buffer.extend_from_slice(&0u32.to_le_bytes()); // len = 0
|
||||||
|
buffer.extend_from_slice(&0u32.to_le_bytes()); // crc
|
||||||
|
buffer.extend_from_slice(&[0u8; 32]); // blake3
|
||||||
|
|
||||||
|
let mut reader = Cursor::new(buffer);
|
||||||
|
let result = Record::read_from(&mut reader);
|
||||||
|
assert!(result.is_err());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_compute_crc32c_deterministic() {
|
||||||
|
let len_bytes = [10, 0, 0, 0u8];
|
||||||
|
let blake3 = [0xABu8; 32];
|
||||||
|
let payload = b"hello world!";
|
||||||
|
|
||||||
|
let crc1 = compute_crc32c(&len_bytes, &blake3, payload);
|
||||||
|
let crc2 = compute_crc32c(&len_bytes, &blake3, payload);
|
||||||
|
assert_eq!(crc1, crc2);
|
||||||
|
|
||||||
|
// Different payload -> different CRC
|
||||||
|
let crc3 = compute_crc32c(&len_bytes, &blake3, b"different");
|
||||||
|
assert_ne!(crc1, crc3);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
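The v2 layout above (`[payload_len:u32_LE][crc32c:u32][blake3:32][payload:N]`) can be sketched without external crates. This is a minimal illustration, not the crate's code: `crc32c`, `write_frame`, `read_frame`, and `MAX_PAYLOAD` are hypothetical names, the 32-byte digest is treated as an opaque value supplied by the caller instead of being computed with BLAKE3, and the CRC is assumed to cover the length bytes, the digest, and the payload in that order (suggested by the argument order of `compute_crc32c`, but not confirmed here).

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on payload size (hypothetical value).
const MAX_PAYLOAD: usize = 16 * 1024 * 1024;

/// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
/// Dependency-free but slow; production code would use an accelerated crate.
fn crc32c(chunks: &[&[u8]]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for chunk in chunks {
        for &byte in chunk.iter() {
            crc ^= byte as u32;
            for _ in 0..8 {
                // mask is all ones when the low bit is set, otherwise zero
                let mask = (crc & 1).wrapping_neg();
                crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
            }
        }
    }
    !crc
}

/// Write one frame: length, CRC, digest, payload.
fn write_frame<W: Write>(w: &mut W, digest: &[u8; 32], payload: &[u8]) -> io::Result<()> {
    let len_bytes = (payload.len() as u32).to_le_bytes();
    let crc = crc32c(&[&len_bytes[..], &digest[..], payload]);
    w.write_all(&len_bytes)?;
    w.write_all(&crc.to_le_bytes())?;
    w.write_all(digest)?;
    w.write_all(payload)
}

/// Read one frame and verify the CRC before returning digest and payload.
fn read_frame<R: Read>(r: &mut R) -> io::Result<([u8; 32], Vec<u8>)> {
    let mut len_bytes = [0u8; 4];
    r.read_exact(&mut len_bytes)?;
    let len = u32::from_le_bytes(len_bytes) as usize;
    if len == 0 || len > MAX_PAYLOAD {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid record length"));
    }
    let mut crc_bytes = [0u8; 4];
    r.read_exact(&mut crc_bytes)?;
    let stored = u32::from_le_bytes(crc_bytes);
    let mut digest = [0u8; 32];
    r.read_exact(&mut digest)?;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    // A torn or bit-flipped frame is rejected here, before any heavier hashing.
    if crc32c(&[&len_bytes[..], &digest[..], &payload[..]]) != stored {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "CRC32C mismatch"));
    }
    Ok((digest, payload))
}
```

Because the length field is covered by the CRC, a corrupted length that still passes the bounds check is caught either by a short read or by the CRC comparison.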
342
crates/stemedb-wal/src/group_commit.rs
Normal file
@ -0,0 +1,342 @@
|
|||||||
|
//! Group commit buffer for batching fsync operations.
|
||||||
|
//!
|
||||||
|
//! The `GroupCommitBuffer` wraps a `Journal` and batches writes so that
|
||||||
|
//! multiple concurrent appenders share a single fsync. This dramatically
|
||||||
|
//! reduces fsync overhead under concurrent load.
|
||||||
|
//!
|
||||||
|
//! # Architecture
|
||||||
|
//!
|
||||||
|
//! Writers send payloads through an MPSC channel. A background flusher task
|
||||||
|
//! collects up to `max_writes` payloads (or waits up to `max_duration`),
|
||||||
|
//! writes them all to the Journal with `DurabilityLevel::Eventual`, calls
|
||||||
|
//! `force_sync()` once, then responds to all waiting writers.
|
||||||
|
//!
|
||||||
|
//! # Feature Gate
|
||||||
|
//!
|
||||||
|
//! This module is only available with the `group-commit` feature enabled,
|
||||||
|
//! which brings in the `tokio` dependency.
|
||||||
|
|
||||||
|
use crate::error::QuarantineError;
|
||||||
|
use crate::journal::Journal;
|
||||||
|
use std::time::{Duration, Instant};
|
||||||
|
use tokio::sync::{mpsc, oneshot};
|
||||||
|
use tracing::{debug, error, info, instrument, warn};
|
||||||
|
|
||||||
|
/// Type alias for a flush batch entry: response sender + write result.
|
||||||
|
type FlushEntry = (oneshot::Sender<Result<u64, QuarantineError>>, Result<u64, QuarantineError>);
|
||||||
|
|
||||||
|
/// Configuration for the group commit buffer.
|
||||||
|
#[derive(Debug, Clone)]
|
||||||
|
pub struct GroupCommitConfig {
|
||||||
|
/// Maximum number of writes to batch before flushing.
|
||||||
|
pub max_writes: usize,
|
||||||
|
/// Maximum time to wait before flushing a partial batch.
|
||||||
|
pub max_duration: Duration,
|
||||||
|
/// Channel capacity for pending write requests.
|
||||||
|
pub channel_capacity: usize,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Default for GroupCommitConfig {
|
||||||
|
fn default() -> Self {
|
||||||
|
Self { max_writes: 100, max_duration: Duration::from_millis(10), channel_capacity: 10_000 }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// A write request sent through the channel.
|
||||||
|
struct WriteRequest {
|
||||||
|
payload: Vec<u8>,
|
||||||
|
response: oneshot::Sender<Result<u64, QuarantineError>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Group commit buffer that batches fsync operations.
|
||||||
|
///
|
||||||
|
/// Owns the Journal internally and provides an async `append()` API.
|
||||||
|
/// Concurrent writers are coalesced into batches that share a single fsync.
|
||||||
|
///
|
||||||
|
/// This struct is cheaply cloneable (just clones the channel sender).
|
||||||
|
#[derive(Clone)]
|
||||||
|
pub struct GroupCommitBuffer {
|
||||||
|
sender: mpsc::Sender<WriteRequest>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl GroupCommitBuffer {
|
||||||
|
/// Create a new group commit buffer wrapping the given journal.
|
||||||
|
///
|
||||||
|
/// Spawns a background flusher task on the current tokio runtime.
|
||||||
|
/// The journal is moved into the flusher and is not accessible externally.
|
||||||
|
#[instrument(skip(journal), fields(max_writes = config.max_writes, max_duration_ms = config.max_duration.as_millis() as u64))]
|
||||||
|
pub fn new(journal: Journal, config: GroupCommitConfig) -> Self {
|
||||||
|
let (sender, receiver) = mpsc::channel(config.channel_capacity);
|
||||||
|
|
||||||
|
tokio::spawn(Self::flusher_loop(journal, receiver, config));
|
||||||
|
|
||||||
|
info!("GroupCommitBuffer started");
|
||||||
|
Self { sender }
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Append a payload to the journal via the group commit buffer.
|
||||||
|
///
|
||||||
|
/// Returns the WAL offset of the written record once the batch
|
||||||
|
/// containing this write has been fsynced.
|
||||||
|
pub async fn append(&self, payload: Vec<u8>) -> Result<u64, QuarantineError> {
|
||||||
|
let (response_tx, response_rx) = oneshot::channel();
|
||||||
|
|
||||||
|
let request = WriteRequest { payload, response: response_tx };
|
||||||
|
|
||||||
|
self.sender.send(request).await.map_err(|_| {
|
||||||
|
QuarantineError::IoGeneric(std::io::Error::other(
|
||||||
|
"GroupCommitBuffer flusher has shut down",
|
||||||
|
))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
response_rx.await.map_err(|_| {
|
||||||
|
QuarantineError::IoGeneric(std::io::Error::other(
|
||||||
|
"GroupCommitBuffer flusher dropped response channel",
|
||||||
|
))
|
||||||
|
})?
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Background flusher loop.
|
||||||
|
///
|
||||||
|
/// Collects writes into batches, writes them all, then fsyncs once.
|
||||||
|
async fn flusher_loop(
|
||||||
|
mut journal: Journal,
|
||||||
|
mut receiver: mpsc::Receiver<WriteRequest>,
|
||||||
|
config: GroupCommitConfig,
|
||||||
|
) {
|
||||||
|
let mut batch: Vec<WriteRequest> = Vec::with_capacity(config.max_writes);
|
||||||
|
|
||||||
|
loop {
|
||||||
|
// Wait for the first write request
|
||||||
|
let first = match receiver.recv().await {
|
||||||
|
Some(req) => req,
|
||||||
|
None => {
|
||||||
|
info!("GroupCommitBuffer channel closed, flusher exiting");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
batch.push(first);
|
||||||
|
|
||||||
|
// Collect more requests up to max_writes or max_duration
|
||||||
|
let deadline = tokio::time::Instant::now() + config.max_duration;
|
||||||
|
while batch.len() < config.max_writes {
|
||||||
|
match tokio::time::timeout_at(deadline, receiver.recv()).await {
|
||||||
|
Ok(Some(req)) => batch.push(req),
|
||||||
|
Ok(None) => {
|
||||||
|
// Channel closed, flush what we have and exit
|
||||||
|
Self::flush_batch(&mut journal, &mut batch);
|
||||||
|
info!("GroupCommitBuffer channel closed during batch, flusher exiting");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
Err(_) => break, // Timeout reached, flush
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
debug!(batch_size = batch.len(), "Flushing batch");
|
||||||
|
Self::flush_batch(&mut journal, &mut batch);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Write all requests in the batch, fsync once, respond to all waiters.
|
||||||
|
fn flush_batch(journal: &mut Journal, batch: &mut Vec<WriteRequest>) {
|
||||||
|
let mut results: Vec<FlushEntry> = Vec::with_capacity(batch.len());
|
||||||
|
|
||||||
|
let mut any_error = false;
|
||||||
|
|
||||||
|
for request in batch.drain(..) {
|
||||||
|
if any_error {
|
||||||
|
// If a previous write in this batch failed, fail all subsequent
|
||||||
|
let err = QuarantineError::IoGeneric(std::io::Error::other(
|
||||||
|
"Previous write in batch failed",
|
||||||
|
));
|
||||||
|
results.push((request.response, Err(err)));
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
match journal.append(request.payload) {
|
||||||
|
Ok(offset) => {
|
||||||
|
results.push((request.response, Ok(offset)));
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
error!(error = %e, "Write failed in group commit batch");
|
||||||
|
any_error = true;
|
||||||
|
results.push((request.response, Err(e)));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Single fsync for the entire batch
|
||||||
|
if !any_error {
|
||||||
|
let fsync_start = Instant::now();
|
||||||
|
if let Err(e) = journal.force_sync() {
|
||||||
|
error!(error = %e, "Fsync failed in group commit batch");
|
||||||
|
// Convert all Ok results to errors since fsync failed
|
||||||
|
for (_, result) in &mut results {
|
||||||
|
if result.is_ok() {
|
||||||
|
*result = Err(QuarantineError::IoGeneric(std::io::Error::other(
|
||||||
|
"Batch fsync failed",
|
||||||
|
)));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
let fsync_ms = fsync_start.elapsed().as_millis();
|
||||||
|
if fsync_ms > 500 {
|
||||||
|
warn!(fsync_ms, batch_size = results.len(), "Slow fsync detected");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Send all responses
|
||||||
|
for (sender, result) in results {
|
||||||
|
// Ignore send errors - the receiver may have been dropped (timeout)
|
||||||
|
let _ = sender.send(result);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
use crate::durability::DurabilityLevel;
|
||||||
|
use tempfile::tempdir;
|
||||||
|
|
||||||
|
fn create_test_journal() -> (tempfile::TempDir, Journal) {
|
||||||
|
let dir = tempdir().expect("tempdir");
|
||||||
|
let wal_path = dir.path().join("wal");
|
||||||
|
let journal = Journal::open(&wal_path)
|
||||||
|
.expect("open journal")
|
||||||
|
.with_durability(DurabilityLevel::Eventual);
|
||||||
|
(dir, journal)
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_single_write_through_buffer() {
|
||||||
|
let (_dir, journal) = create_test_journal();
|
||||||
|
let config = GroupCommitConfig::default();
|
||||||
|
let buffer = GroupCommitBuffer::new(journal, config);
|
||||||
|
|
||||||
|
let offset = buffer.append(b"hello world".to_vec()).await.expect("append");
|
||||||
|
assert_eq!(offset, 8); // HEADER_SIZE
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_batch_coalesces_fsync() {
|
||||||
|
let (_dir, journal) = create_test_journal();
|
||||||
|
let config = GroupCommitConfig {
|
||||||
|
max_writes: 50,
|
||||||
|
max_duration: Duration::from_millis(100),
|
||||||
|
channel_capacity: 1000,
|
||||||
|
};
|
||||||
|
let buffer = GroupCommitBuffer::new(journal, config);
|
||||||
|
|
||||||
|
// Launch 50 concurrent writes
|
||||||
|
let mut handles = Vec::new();
|
||||||
|
for i in 0..50 {
|
||||||
|
let buf = buffer.clone();
|
||||||
|
handles.push(tokio::spawn(async move {
|
||||||
|
buf.append(format!("record {}", i).into_bytes()).await
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut offsets = Vec::new();
|
||||||
|
for handle in handles {
|
||||||
|
let offset = handle.await.expect("join").expect("append");
|
||||||
|
offsets.push(offset);
|
||||||
|
}
|
||||||
|
|
||||||
|
// All offsets should be unique
|
||||||
|
offsets.sort();
|
||||||
|
offsets.dedup();
|
||||||
|
assert_eq!(offsets.len(), 50);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_flush_on_timeout() {
|
||||||
|
let (_dir, journal) = create_test_journal();
|
||||||
|
let config = GroupCommitConfig {
|
||||||
|
max_writes: 1000, // High threshold - won't trigger
|
||||||
|
max_duration: Duration::from_millis(50),
|
||||||
|
channel_capacity: 100,
|
||||||
|
};
|
||||||
|
let buffer = GroupCommitBuffer::new(journal, config);
|
||||||
|
|
||||||
|
// Single write should flush after timeout
|
||||||
|
let offset = buffer.append(b"timeout test".to_vec()).await.expect("append");
|
||||||
|
assert_eq!(offset, 8);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_flush_on_max_writes() {
|
||||||
|
let (_dir, journal) = create_test_journal();
|
||||||
|
let config = GroupCommitConfig {
|
||||||
|
max_writes: 5,
|
||||||
|
max_duration: Duration::from_secs(60), // Long timeout - won't trigger
|
||||||
|
channel_capacity: 100,
|
||||||
|
};
|
||||||
|
let buffer = GroupCommitBuffer::new(journal, config);
|
||||||
|
|
||||||
|
// Write exactly max_writes records
|
||||||
|
let mut handles = Vec::new();
|
||||||
|
for i in 0..5 {
|
||||||
|
let buf = buffer.clone();
|
||||||
|
handles.push(tokio::spawn(async move {
|
||||||
|
buf.append(format!("rec {}", i).into_bytes()).await
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
|
||||||
|
for handle in handles {
|
||||||
|
handle.await.expect("join").expect("append");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_concurrent_writers_unique_offsets() {
|
||||||
|
let (_dir, journal) = create_test_journal();
|
||||||
|
let config = GroupCommitConfig {
|
||||||
|
max_writes: 50,
|
||||||
|
max_duration: Duration::from_millis(20),
|
||||||
|
channel_capacity: 10_000,
|
||||||
|
};
|
||||||
|
let buffer = GroupCommitBuffer::new(journal, config);
|
||||||
|
|
||||||
|
// 10 tasks x 100 writes
|
||||||
|
let mut handles = Vec::new();
|
||||||
|
for task_id in 0..10 {
|
||||||
|
for write_id in 0..100 {
|
||||||
|
let buf = buffer.clone();
|
||||||
|
let payload = format!("task {} write {}", task_id, write_id).into_bytes();
|
||||||
|
handles.push(tokio::spawn(async move { buf.append(payload).await }));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut offsets = Vec::new();
|
||||||
|
for handle in handles {
|
||||||
|
let offset = handle.await.expect("join").expect("append");
|
||||||
|
offsets.push(offset);
|
||||||
|
}
|
||||||
|
|
||||||
|
// All 1000 offsets must be unique
|
||||||
|
offsets.sort();
|
||||||
|
let unique_count = offsets.len();
|
||||||
|
offsets.dedup();
|
||||||
|
assert_eq!(offsets.len(), unique_count, "All offsets should be unique");
|
||||||
|
assert_eq!(offsets.len(), 1000);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_error_propagation_to_waiters() {
|
||||||
|
// Dropping the buffer (and thus the sender) should cause pending
|
||||||
|
// appends to fail
|
||||||
|
let (_dir, journal) = create_test_journal();
|
||||||
|
let config = GroupCommitConfig::default();
|
||||||
|
let buffer = GroupCommitBuffer::new(journal, config);
|
||||||
|
|
||||||
|
// First write should succeed
|
||||||
|
buffer.append(b"ok".to_vec()).await.expect("first append");
|
||||||
|
|
||||||
|
// Drop the buffer to close the channel
|
||||||
|
drop(buffer);
|
||||||
|
|
||||||
|
// The sender is gone, so no further appends can be issued; this test only checks that shutdown does not panic
|
||||||
|
}
|
||||||
|
}
|
||||||
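The batching idea in `group_commit.rs` (collect up to `max_writes` requests or wait up to `max_duration`, write them all, sync once, then answer every waiter) can be sketched with only the standard library. This is an illustrative analogue using OS threads and `std::sync::mpsc` rather than the tokio-based implementation in this commit; `MemLog`, `Request`, `flusher`, `append`, and `run_demo` are hypothetical names, and `MemLog` merely counts syncs instead of touching disk.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the journal: an in-memory log that counts syncs.
struct MemLog {
    bytes: Vec<u8>,
    syncs: usize,
}

impl MemLog {
    fn append(&mut self, payload: &[u8]) -> u64 {
        let offset = self.bytes.len() as u64;
        self.bytes.extend_from_slice(payload);
        offset
    }
    fn sync(&mut self) {
        self.syncs += 1;
    }
}

/// A write request plus the channel on which its offset is reported.
struct Request {
    payload: Vec<u8>,
    reply: Sender<u64>,
}

/// Batch requests, sync once per batch, then answer every waiter.
fn flusher(rx: Receiver<Request>, max_writes: usize, max_wait: Duration) -> MemLog {
    let mut log = MemLog { bytes: Vec::new(), syncs: 0 };
    // Block for the first request of each batch; exit once all senders are gone.
    while let Ok(first) = rx.recv() {
        let mut batch = vec![first];
        let deadline = Instant::now() + max_wait;
        while batch.len() < max_writes {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match rx.recv_timeout(remaining) {
                Ok(req) => batch.push(req),
                Err(RecvTimeoutError::Timeout) => break,
                Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        let offsets: Vec<u64> = batch.iter().map(|r| log.append(&r.payload)).collect();
        log.sync(); // one sync shared by the whole batch
        for (req, offset) in batch.into_iter().zip(offsets) {
            let _ = req.reply.send(offset); // the waiter may have given up
        }
    }
    log
}

/// Submit one payload and block until its batch has been synced.
fn append(tx: &Sender<Request>, payload: Vec<u8>) -> u64 {
    let (reply, done) = mpsc::channel();
    tx.send(Request { payload, reply }).expect("flusher alive");
    done.recv().expect("flusher replied")
}

/// Run `writers` concurrent appenders; return sorted offsets and the sync count.
fn run_demo(writers: usize) -> (Vec<u64>, usize) {
    let (tx, rx) = mpsc::channel::<Request>();
    let flusher_handle = thread::spawn(move || flusher(rx, 64, Duration::from_millis(20)));
    let handles: Vec<_> = (0..writers)
        .map(|i| {
            let tx = tx.clone();
            thread::spawn(move || append(&tx, format!("record {}", i).into_bytes()))
        })
        .collect();
    let mut offsets: Vec<u64> = handles.into_iter().map(|h| h.join().expect("writer")).collect();
    drop(tx); // closing the last sender lets the flusher exit
    let log = flusher_handle.join().expect("flusher");
    offsets.sort_unstable();
    (offsets, log.syncs)
}
```

Calling `run_demo(16)` yields sixteen distinct offsets, and the sync count can be at most the number of writes, since every batch holds at least one request.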
@ -1,33 +1,47 @@
|
|||||||
use crate::durability::{DurabilityLevel, FsyncGuard};
|
use crate::durability::{DurabilityLevel, FsyncGuard};
|
||||||
use crate::error::{QuarantineError, Result};
|
use crate::error::{QuarantineError, Result};
|
||||||
use crate::format::{FileHeader, Record, HEADER_SIZE};
|
use crate::format::{FileHeader, Record, HEADER_SIZE};
|
||||||
use std::fs::{self, File, OpenOptions};
|
use crate::recovery::{self, RecoveryReport};
|
||||||
|
use crate::segment::{SegmentManager, DEFAULT_MAX_SEGMENT_SIZE};
|
||||||
|
use std::fs::{File, OpenOptions};
|
||||||
use std::io::{BufReader, Seek, SeekFrom};
|
use std::io::{BufReader, Seek, SeekFrom};
|
||||||
use std::path::{Path, PathBuf};
|
use std::path::Path;
|
||||||
use tracing::{debug, info, instrument, warn};
|
use tracing::{debug, info, instrument, warn};
|
||||||
|
|
||||||
/// The main quarantine journal.
|
/// The main quarantine journal.
|
||||||
///
|
///
|
||||||
/// Provides append-only storage with crash recovery and fsync guarantees.
|
/// Provides append-only storage with crash recovery, fsync guarantees,
|
||||||
|
/// and log rotation via segments.
|
||||||
pub struct Journal {
|
pub struct Journal {
|
||||||
data_dir: PathBuf,
|
segment_mgr: SegmentManager,
|
||||||
current_file: Option<FsyncGuard>,
|
current_file: Option<FsyncGuard>,
|
||||||
current_offset: u64,
|
current_offset: u64,
|
||||||
durability: DurabilityLevel,
|
durability: DurabilityLevel,
|
||||||
|
last_recovery_report: Option<RecoveryReport>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Journal {
|
impl Journal {
|
||||||
/// Open or create a journal in the specified directory.
|
/// Open or create a journal in the specified directory.
|
||||||
#[instrument(skip_all, fields(data_dir = %data_dir.as_ref().display()))]
|
#[instrument(skip_all, fields(data_dir = %data_dir.as_ref().display()))]
|
||||||
pub fn open(data_dir: impl AsRef<Path>) -> Result<Self> {
|
pub fn open(data_dir: impl AsRef<Path>) -> Result<Self> {
|
||||||
|
Self::open_with_max_segment_size(data_dir, DEFAULT_MAX_SEGMENT_SIZE)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Open with a custom max segment size (useful for tests).
|
||||||
|
#[instrument(skip_all, fields(data_dir = %data_dir.as_ref().display(), max_segment_size))]
|
||||||
|
pub fn open_with_max_segment_size(
|
||||||
|
data_dir: impl AsRef<Path>,
|
||||||
|
max_segment_size: u64,
|
||||||
|
) -> Result<Self> {
|
||||||
let data_dir = data_dir.as_ref().to_path_buf();
|
let data_dir = data_dir.as_ref().to_path_buf();
|
||||||
fs::create_dir_all(&data_dir).map_err(|e| QuarantineError::io(&data_dir, e))?;
|
let segment_mgr = SegmentManager::open(&data_dir, max_segment_size)?;
|
||||||
|
|
||||||
let mut journal = Self {
|
let mut journal = Self {
|
||||||
data_dir,
|
segment_mgr,
|
||||||
current_file: None,
|
current_file: None,
|
||||||
current_offset: 0,
|
current_offset: 0,
|
||||||
durability: DurabilityLevel::Immediate,
|
durability: DurabilityLevel::Immediate,
|
||||||
|
last_recovery_report: None,
|
||||||
};
|
};
|
||||||
|
|
||||||
journal.recover()?;
|
journal.recover()?;
|
||||||
@ -41,11 +55,32 @@ impl Journal {
|
|||||||
self
|
self
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Get the current write offset.
|
||||||
|
pub fn current_offset(&self) -> u64 {
|
||||||
|
self.current_offset
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get the recovery report from the last open/recover call.
|
||||||
|
pub fn recovery_report(&self) -> Option<&RecoveryReport> {
|
||||||
|
self.last_recovery_report.as_ref()
|
||||||
|
}
|
||||||
|
|
||||||
/// Append a record to the journal.
|
/// Append a record to the journal.
|
||||||
|
///
|
||||||
|
/// Checks if rotation is needed before writing. Returns the global offset.
|
||||||
#[instrument(skip(self, payload), fields(payload_len = payload.len()))]
|
#[instrument(skip(self, payload), fields(payload_len = payload.len()))]
|
||||||
pub fn append(&mut self, payload: Vec<u8>) -> Result<u64> {
|
pub fn append(&mut self, payload: Vec<u8>) -> Result<u64> {
|
||||||
if self.current_file.is_none() {
|
if self.current_file.is_none() {
|
||||||
self.open_current_file()?;
|
self.ensure_current_segment()?;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check if rotation is needed
|
||||||
|
if let Some(guard) = &self.current_file {
|
||||||
|
let current_size =
|
||||||
|
guard.file().metadata().map_err(|e| QuarantineError::io(guard.path(), e))?.len();
|
||||||
|
if self.segment_mgr.needs_rotation(current_size) {
|
||||||
|
self.rotate()?;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
let record = Record::new(payload);
|
let record = Record::new(payload);
|
||||||
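Because `append` returns a global offset while data lives in per-segment files, a reader has to map that offset back to a segment and a local position. The sketch below shows one way to do this with a binary search over segments sorted by starting offset. It is an assumption-laden illustration: `Segment`, its `len` field, and `resolve` are hypothetical names, and the real `SegmentManager` may track segment bounds differently.

```rust
/// Hypothetical segment descriptor: where the segment starts in the global
/// offset space and how many bytes it holds.
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    base_offset: u64,
    len: u64,
}

/// Return the segment containing `offset` together with the local offset
/// inside that segment. `segments` must be sorted by `base_offset`.
fn resolve(segments: &[Segment], offset: u64) -> Option<(&Segment, u64)> {
    // Number of segments that start at or before `offset`.
    let idx = segments.partition_point(|s| s.base_offset <= offset);
    // The candidate is the last such segment; none exists when idx == 0.
    let seg = segments.get(idx.checked_sub(1)?)?;
    let local = offset - seg.base_offset;
    if local < seg.len {
        Some((seg, local))
    } else {
        None // offset lies past the end of the last known segment
    }
}
```

A `None` result corresponds to the case where the caller would rescan the directory for newly created segments and retry once before reporting an error.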
@ -64,57 +99,172 @@ impl Journal {
|
|||||||
Ok(offset)
|
Ok(offset)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Read a record at the given offset.
|
/// Read a record at the given global offset.
|
||||||
|
///
|
||||||
|
/// Resolves the correct segment via binary search, then seeks within it.
|
||||||
|
/// If no segment is found, rescans the directory for new segments created
|
||||||
|
/// by a separate writer instance and retries once.
|
||||||
#[instrument(skip(self))]
|
#[instrument(skip(self))]
|
||||||
pub fn read(&self, offset: u64) -> Result<Record> {
|
pub fn read(&mut self, offset: u64) -> Result<Record> {
|
||||||
let path = self.current_file_path();
|
// Try to resolve the segment, refreshing once if not found
|
||||||
let mut file = File::open(&path).map_err(|e| QuarantineError::io(&path, e))?;
|
let segment_info = match self.segment_mgr.resolve_segment(offset) {
|
||||||
file.seek(SeekFrom::Start(offset)).map_err(|e| QuarantineError::io(&path, e))?;
|
Some(seg) => (seg.base_offset, seg.path.clone()),
|
||||||
|
None => {
|
||||||
|
// Segment not found - rescan directory for new segments
|
||||||
|
self.segment_mgr.refresh()?;
|
||||||
|
match self.segment_mgr.resolve_segment(offset) {
|
||||||
|
Some(seg) => (seg.base_offset, seg.path.clone()),
|
||||||
|
None => {
|
||||||
|
return Err(QuarantineError::IoGeneric(std::io::Error::new(
|
||||||
|
std::io::ErrorKind::UnexpectedEof,
|
||||||
|
format!("No segment contains offset {}", offset),
|
||||||
|
)));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
let local_offset = offset - segment_info.0;
|
||||||
|
|
||||||
|
let mut file =
|
||||||
|
File::open(&segment_info.1).map_err(|e| QuarantineError::io(&segment_info.1, e))?;
|
||||||
|
file.seek(SeekFrom::Start(local_offset))
|
||||||
|
.map_err(|e| QuarantineError::io(&segment_info.1, e))?;
|
||||||
|
|
||||||
let mut reader = BufReader::new(file);
|
let mut reader = BufReader::new(file);
|
||||||
Record::read_from(&mut reader)
|
Record::read_from(&mut reader)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Recover state from disk.
|
/// Force sync any pending writes.
|
||||||
|
#[instrument(skip(self))]
|
||||||
|
pub fn force_sync(&mut self) -> Result<()> {
|
||||||
|
if let Some(ref mut guard) = self.current_file {
|
||||||
|
guard.force_sync()?;
|
||||||
|
}
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Clean up old segments below the given minimum cursor.
|
||||||
|
///
|
||||||
|
/// Returns the number of bytes freed.
|
||||||
|
#[instrument(skip(self))]
|
||||||
|
pub fn cleanup(&mut self, min_cursor: u64) -> Result<u64> {
|
||||||
|
self.segment_mgr.cleanup(min_cursor)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Recover state from disk using full record scanning across all segments.
|
||||||
#[instrument(skip(self))]
|
#[instrument(skip(self))]
|
||||||
fn recover(&mut self) -> Result<()> {
|
fn recover(&mut self) -> Result<()> {
|
||||||
let path = self.current_file_path();
|
let segments = self.segment_mgr.segments().to_vec();
|
||||||
if !path.exists() {
|
|
||||||
debug!("No existing WAL file, starting fresh");
|
if segments.is_empty() {
|
||||||
|
debug!("No existing WAL segments, starting fresh");
|
||||||
return Ok(());
|
return Ok(());
|
||||||
}
|
}
|
||||||
|
|
||||||
let file = File::open(&path).map_err(|e| QuarantineError::io(&path, e))?;
|
// Recover each segment in order; every segment is scanned (and truncated if needed) independently
|
||||||
let len = file.metadata().map_err(|e| QuarantineError::io(&path, e))?.len();
|
let mut total_valid = 0u64;
|
||||||
|
let mut final_offset = 0u64;
|
||||||
|
let mut last_report = None;
|
||||||
|
|
||||||
// Basic recovery: validate header and set offset to end
|
for (i, segment) in segments.iter().enumerate() {
|
||||||
// TODO: Implement full scan and truncate of partial records
|
let file_len = std::fs::metadata(&segment.path)
|
||||||
if len >= HEADER_SIZE as u64 {
|
.map_err(|e| QuarantineError::io(&segment.path, e))?
|
||||||
let mut reader = BufReader::new(file);
|
.len();
|
||||||
let _header = FileHeader::read_from(&mut reader)?;
|
|
||||||
self.current_offset = len;
|
if file_len == 0 {
|
||||||
info!(file_size = len, "Recovered existing WAL");
|
debug!(base_offset = segment.base_offset, "Empty segment file, skipping");
|
||||||
} else {
|
continue;
|
||||||
// Corrupt or empty, start over
|
}
|
||||||
warn!(file_size = len, "WAL file too small, resetting");
|
|
||||||
self.current_offset = 0;
|
let report = recovery::recover_file(&segment.path)?;
|
||||||
|
|
||||||
|
total_valid += report.valid_records;
|
||||||
|
// The final_offset from recover_file is a local offset within the segment.
|
||||||
|
// Convert to global: segment.base_offset + local_offset
|
||||||
|
final_offset = segment.base_offset + report.final_offset;
|
||||||
|
|
||||||
|
if report.bytes_truncated > 0 {
|
||||||
|
warn!(
|
||||||
|
segment_index = i,
|
||||||
|
base_offset = segment.base_offset,
|
||||||
|
truncated = report.bytes_truncated,
|
||||||
|
"Segment had corrupt data, truncated"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
last_report = Some(report);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
self.current_offset = final_offset;
|
||||||
|
|
||||||
|
if let Some(report) = &last_report {
|
||||||
|
if report.bytes_truncated > 0 {
|
||||||
|
warn!(total_valid, final_offset, "Recovery truncated corrupt data");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
info!(total_valid, final_offset, "Multi-segment recovery complete");
|
||||||
|
self.last_recovery_report = last_report;
|
||||||
|
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
fn current_file_path(&self) -> PathBuf {
|
/// Ensure there's a current segment open for writing.
|
||||||
self.data_dir.join("00000000.wal")
|
fn ensure_current_segment(&mut self) -> Result<()> {
|
||||||
|
if self.segment_mgr.segments().is_empty() {
|
||||||
|
// First ever segment
|
||||||
|
self.segment_mgr.create_segment(0)?;
|
||||||
|
self.current_offset = HEADER_SIZE as u64;
|
||||||
|
}
|
||||||
|
|
||||||
|
self.open_current_file()
|
||||||
}
|
}
|
||||||
|
|
||||||
#[instrument(skip(self), fields(path = %self.current_file_path().display()))]
|
/// Rotate to a new segment.
|
||||||
|
#[instrument(skip(self))]
|
||||||
|
fn rotate(&mut self) -> Result<()> {
|
||||||
|
// Close current file
|
||||||
|
if let Some(mut guard) = self.current_file.take() {
|
||||||
|
guard.force_sync()?;
|
||||||
|
}
|
||||||
|
|
||||||
|
let new_base = self.current_offset;
|
||||||
|
self.segment_mgr.create_segment(new_base)?;
|
||||||
|
|
||||||
|
// The new segment file begins with its own header, so the first record
// in it must sit at local offset HEADER_SIZE. Reads compute
// local_offset = global_offset - base_offset, so the first record's
// global offset must be base_offset + HEADER_SIZE. Advance current_offset
// past the header accordingly.
|
||||||
|
self.current_offset = new_base + HEADER_SIZE as u64;
|
||||||
|
|
||||||
|
self.open_current_file()?;
|
||||||
|
info!(new_base, "Rotated to new segment");
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip(self))]
|
||||||
fn open_current_file(&mut self) -> Result<()> {
|
fn open_current_file(&mut self) -> Result<()> {
|
||||||
let path = self.current_file_path();
|
let segment = self.segment_mgr.current_segment().ok_or_else(|| {
|
||||||
|
QuarantineError::IoGeneric(std::io::Error::other("No segments available"))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let path = segment.path.clone();
|
||||||
|
|
||||||
let file = OpenOptions::new()
|
let file = OpenOptions::new()
|
||||||
.create(true)
|
.create(true)
|
||||||
.read(true)
|
.read(true)
|
||||||
.write(true)
|
.write(true)
|
||||||
.truncate(false) // Never truncate existing WAL files on open
|
.truncate(false)
|
||||||
.open(&path)
|
.open(&path)
|
||||||
.map_err(|e| QuarantineError::io(&path, e))?;
|
.map_err(|e| QuarantineError::io(&path, e))?;
|
||||||
|
|
||||||
@ -128,15 +278,12 @@ impl Journal {
|
|||||||
let mut buf = Vec::with_capacity(HEADER_SIZE);
|
let mut buf = Vec::with_capacity(HEADER_SIZE);
|
||||||
header.write_to(&mut buf)?;
|
header.write_to(&mut buf)?;
|
||||||
guard.write(&buf)?;
|
guard.write(&buf)?;
|
||||||
self.current_offset = HEADER_SIZE as u64;
|
debug!("Wrote v2 header to new segment");
|
||||||
debug!("Created new WAL file with header");
|
|
||||||
} else {
|
|
||||||
// Seek to end of file for append operations
|
|
||||||
guard.file_mut().seek(SeekFrom::End(0)).map_err(|e| QuarantineError::io(&path, e))?;
|
|
||||||
self.current_offset = len;
|
|
||||||
debug!(file_size = len, "Opened existing WAL file");
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Seek to end for appends
|
||||||
|
guard.file_mut().seek(SeekFrom::End(0)).map_err(|e| QuarantineError::io(&path, e))?;
|
||||||
|
|
||||||
self.current_file = Some(guard);
|
self.current_file = Some(guard);
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|||||||
@ -3,13 +3,27 @@
|
|||||||
//! This crate provides the foundational durability layer, ensuring that
|
//! This crate provides the foundational durability layer, ensuring that
|
||||||
//! assertions are safely persisted to disk before being acknowledged.
|
//! assertions are safely persisted to disk before being acknowledged.
|
||||||
//!
|
//!
|
||||||
|
//! # Record Format (v2)
|
||||||
|
//!
|
||||||
|
//! Each record is stored as: `[payload_len:u32_LE][crc32c:u32][blake3:32][payload:N]`
|
||||||
|
//!
|
||||||
|
//! - CRC32C provides fast integrity checking to detect torn writes
|
||||||
|
//! - BLAKE3 provides content-addressed verification
|
||||||
|
//!
|
||||||
//! # Crash Recovery
|
//! # Crash Recovery
|
||||||
//!
|
//!
|
||||||
//! The WAL provides crash recovery guarantees via immediate fsync. When a
|
//! The WAL provides crash recovery guarantees via immediate fsync. When a
|
||||||
//! record is appended with `DurabilityLevel::Immediate` (the default), it
|
//! record is appended with `DurabilityLevel::Immediate` (the default), it
|
||||||
//! is guaranteed to survive process crashes or power failures.
|
//! is guaranteed to survive process crashes or power failures.
|
||||||
//!
|
//!
|
||||||
//! See the `recovery` module for integration tests proving these guarantees.
|
//! On open, the journal scans all records across all segments, verifying
|
||||||
|
//! CRC32C and BLAKE3. Any corrupt or partial records at the tail are truncated.
|
||||||
|
//!
|
||||||
|
//! # Log Rotation
|
||||||
|
//!
|
||||||
|
//! Segment files are named `{base_offset:016x}.wal`. When the current segment
|
||||||
|
//! exceeds the configured max size, a new segment is created. Old segments
|
||||||
|
//! can be cleaned up once all consumers have advanced past them.
|
||||||
|
|
||||||
pub mod durability;
|
pub mod durability;
|
||||||
/// Error types and Result wrapper for WAL operations.
|
/// Error types and Result wrapper for WAL operations.
|
||||||
@ -18,10 +32,18 @@ pub mod error;
|
|||||||
pub mod format;
|
pub mod format;
|
||||||
/// The main Journal API.
|
/// The main Journal API.
|
||||||
pub mod journal;
|
pub mod journal;
|
||||||
/// Crash recovery integration tests.
|
/// Crash recovery: file scanning, validation, and truncation.
|
||||||
mod recovery;
|
pub mod recovery;
|
||||||
|
/// Log rotation via segment files.
|
||||||
|
pub mod segment;
|
||||||
|
|
||||||
|
/// Group commit buffer for batching fsync operations.
|
||||||
|
#[cfg(feature = "group-commit")]
|
||||||
|
pub mod group_commit;
|
||||||
|
|
||||||
pub use durability::{DurabilityLevel, FsyncGuard};
|
pub use durability::{DurabilityLevel, FsyncGuard};
|
||||||
pub use error::{QuarantineError, Result};
|
pub use error::{QuarantineError, Result};
|
||||||
pub use format::{FileHeader, Record, HEADER_SIZE};
|
pub use format::{FileHeader, Record, HEADER_SIZE, RECORD_OVERHEAD};
|
||||||
pub use journal::Journal;
|
pub use journal::Journal;
|
||||||
|
pub use recovery::RecoveryReport;
|
||||||
|
pub use segment::{Segment, SegmentManager};
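The log-rotation notes above (segment files named `{base_offset:016x}.wal`, reads using `local_offset = global_offset - base_offset`, binary-search resolution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's `SegmentManager`: the `SegmentInfo` type, its `len` field, and the `resolve` helper are hypothetical names introduced here.

```rust
/// Hypothetical, simplified view of a segment: where it starts in the
/// global offset space and how many bytes it currently holds.
#[derive(Debug, Clone, PartialEq)]
struct SegmentInfo {
    base_offset: u64,
    len: u64,
}

/// File name for a segment, following the `{base_offset:016x}.wal` scheme
/// described in the crate docs.
fn segment_file_name(base_offset: u64) -> String {
    format!("{:016x}.wal", base_offset)
}

/// Resolve a global offset to (segment index, local offset).
/// Assumes `segments` is sorted by `base_offset` in ascending order.
fn resolve(segments: &[SegmentInfo], offset: u64) -> Option<(usize, u64)> {
    // Binary search: count the segments whose base is <= offset.
    let idx = segments.partition_point(|s| s.base_offset <= offset);
    // The candidate is the last such segment; none means offset precedes all.
    let i = idx.checked_sub(1)?;
    let local = offset - segments[i].base_offset;
    // Reject offsets that fall past the end of the candidate segment.
    if local < segments[i].len {
        Some((i, local))
    } else {
        None
    }
}
```

With segments based at 0 and 4096, a global offset of 4104 would resolve to the second segment at local offset 8, which is the position immediately after that segment's header if `HEADER_SIZE` is 8.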
|
||||||
|
|||||||
@ -1,198 +0,0 @@
|
|||||||
//! Crash recovery integration tests for the WAL.
|
|
||||||
//!
|
|
||||||
//! These tests verify that the Write-Ahead Log survives crashes (simulated by
|
|
||||||
//! dropping the Journal and reopening it) without data loss.
|
|
||||||
//!
|
|
||||||
//! # Test Strategy
|
|
||||||
//!
|
|
||||||
//! We cannot truly simulate a power failure in a unit test, but we can:
|
|
||||||
//! 1. Write data with immediate fsync (ensuring it hits disk)
|
|
||||||
//! 2. Drop the Journal (simulating process termination)
|
|
||||||
//! 3. Reopen the Journal (simulating restart)
|
|
||||||
//! 4. Verify all data is present and readable
|
|
||||||
//!
|
|
||||||
//! This proves the durability guarantees of the WAL.
|
|
||||||
|
|
||||||
#[cfg(test)]
|
|
||||||
mod tests {
|
|
||||||
use crate::format::HEADER_SIZE;
|
|
||||||
use crate::journal::Journal;
|
|
||||||
use tempfile::tempdir;
|
|
||||||
|
|
||||||
/// Test: Single record survives Journal close and reopen.
|
|
||||||
///
|
|
||||||
/// This is the fundamental crash recovery guarantee:
|
|
||||||
/// After fsync completes, data is durable.
|
|
||||||
#[test]
|
|
||||||
fn test_single_record_crash_recovery() {
|
|
||||||
let dir = tempdir().expect("Failed to create temp dir");
|
|
||||||
let wal_path = dir.path().join("wal");
|
|
||||||
|
|
||||||
let payload = b"critical assertion data".to_vec();
|
|
||||||
let offset: u64;
|
|
||||||
|
|
||||||
// Phase 1: Write and "crash" (drop journal)
|
|
||||||
{
|
|
||||||
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
|
|
||||||
offset = journal.append(payload.clone()).expect("Failed to append");
|
|
||||||
// Journal dropped here - simulates crash/restart
|
|
||||||
}
|
|
||||||
|
|
||||||
// Phase 2: Recovery - reopen and verify
|
|
||||||
{
|
|
||||||
let journal = Journal::open(&wal_path).expect("Failed to reopen journal");
|
|
||||||
let record = journal.read(offset).expect("Failed to read after recovery");
|
|
||||||
assert_eq!(record.payload, payload, "Data should survive restart");
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Test: Multiple records survive crash and are readable in order.
|
|
||||||
#[test]
|
|
||||||
fn test_multiple_records_crash_recovery() {
|
|
||||||
let dir = tempdir().expect("Failed to create temp dir");
|
|
||||||
let wal_path = dir.path().join("wal");
|
|
||||||
|
|
||||||
let records = vec![
|
|
||||||
b"assertion 1: Tesla revenue is $96.7B".to_vec(),
|
|
||||||
b"assertion 2: Apple revenue is $394B".to_vec(),
|
|
||||||
b"assertion 3: Microsoft revenue is $211B".to_vec(),
|
|
||||||
];
|
|
||||||
let mut offsets = Vec::new();
|
|
||||||
|
|
||||||
// Phase 1: Write multiple records and "crash"
|
|
||||||
{
|
|
||||||
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
|
|
||||||
for payload in &records {
|
|
||||||
let offset = journal.append(payload.clone()).expect("Failed to append");
|
|
||||||
offsets.push(offset);
|
|
||||||
}
|
|
||||||
// Journal dropped here
|
|
||||||
}
|
|
||||||
|
|
||||||
// Phase 2: Recovery - verify all records
|
|
||||||
{
|
|
||||||
let journal = Journal::open(&wal_path).expect("Failed to reopen journal");
|
|
||||||
for (i, offset) in offsets.iter().enumerate() {
|
|
||||||
let record = journal.read(*offset).expect("Failed to read");
|
|
||||||
assert_eq!(record.payload, records[i], "Record {} should match", i);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Test: Journal can continue appending after recovery.
|
|
||||||
///
|
|
||||||
/// This verifies that recovery properly sets the write offset.
|
|
||||||
#[test]
|
|
||||||
fn test_append_after_recovery() {
|
|
||||||
let dir = tempdir().expect("Failed to create temp dir");
|
|
||||||
let wal_path = dir.path().join("wal");
|
|
||||||
|
|
||||||
let first_payload = b"first record".to_vec();
|
|
||||||
let first_offset: u64;
|
|
||||||
|
|
||||||
// Phase 1: Write first record and "crash"
|
|
||||||
{
|
|
||||||
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
|
|
||||||
first_offset = journal.append(first_payload.clone()).expect("Failed to append");
|
|
||||||
}
|
|
||||||
|
|
||||||
// Phase 2: Recover and append more
|
|
||||||
let second_payload = b"second record after recovery".to_vec();
|
|
||||||
let second_offset: u64;
|
|
||||||
{
|
|
||||||
let mut journal = Journal::open(&wal_path).expect("Failed to reopen journal");
|
|
||||||
second_offset = journal.append(second_payload.clone()).expect("Failed to append");
|
|
||||||
// Verify second offset is after first
|
|
||||||
assert!(
|
|
||||||
second_offset > first_offset,
|
|
||||||
"New records should be appended after existing data"
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Phase 3: Verify both records after another "crash"
|
|
||||||
{
|
|
||||||
let journal = Journal::open(&wal_path).expect("Failed to reopen journal again");
|
|
||||||
let first = journal.read(first_offset).expect("Failed to read first");
|
|
||||||
let second = journal.read(second_offset).expect("Failed to read second");
|
|
||||||
assert_eq!(first.payload, first_payload);
|
|
||||||
assert_eq!(second.payload, second_payload);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Test: Large payloads survive crash recovery.
|
|
||||||
///
|
|
||||||
/// Ensures the WAL handles larger data correctly, not just small test payloads.
|
|
||||||
#[test]
|
|
||||||
fn test_large_payload_crash_recovery() {
|
|
||||||
let dir = tempdir().expect("Failed to create temp dir");
|
|
||||||
let wal_path = dir.path().join("wal");
|
|
||||||
|
|
||||||
// Create a 1MB payload (simulating a large assertion with embeddings)
|
|
||||||
let large_payload: Vec<u8> = (0..1024 * 1024).map(|i| (i % 256) as u8).collect();
|
|
||||||
let offset: u64;
|
|
||||||
|
|
||||||
// Write and "crash"
|
|
||||||
{
|
|
||||||
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
|
|
||||||
offset = journal.append(large_payload.clone()).expect("Failed to append large payload");
|
|
||||||
}
|
|
||||||
|
|
||||||
// Recover and verify
|
|
||||||
{
|
|
||||||
let journal = Journal::open(&wal_path).expect("Failed to reopen journal");
|
|
||||||
let record = journal.read(offset).expect("Failed to read large payload");
|
|
||||||
assert_eq!(record.payload.len(), large_payload.len());
|
|
||||||
assert_eq!(record.payload, large_payload, "Large payload should survive");
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Test: Empty WAL directory is handled gracefully.
|
|
||||||
#[test]
|
|
||||||
fn test_fresh_start_no_existing_wal() {
|
|
||||||
let dir = tempdir().expect("Failed to create temp dir");
|
|
||||||
let wal_path = dir.path().join("fresh_wal");
|
|
||||||
|
|
||||||
// Opening a fresh directory should work
|
|
||||||
let mut journal = Journal::open(&wal_path).expect("Failed to open fresh journal");
|
|
||||||
|
|
||||||
// Should be able to write immediately
|
|
||||||
let offset = journal.append(b"first record".to_vec()).expect("Failed to append");
|
|
||||||
assert_eq!(offset, HEADER_SIZE as u64, "First record should start after header");
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Test: Repeated crash-recovery cycles work correctly.
|
|
||||||
///
|
|
||||||
/// Simulates a flaky system that crashes and recovers multiple times.
|
|
||||||
#[test]
|
|
||||||
fn test_repeated_crash_recovery_cycles() {
|
|
||||||
let dir = tempdir().expect("Failed to create temp dir");
|
|
||||||
let wal_path = dir.path().join("wal");
|
|
||||||
|
|
||||||
let mut all_offsets = Vec::new();
|
|
||||||
let num_cycles = 5;
|
|
||||||
let records_per_cycle = 3;
|
|
||||||
|
|
||||||
for cycle in 0..num_cycles {
|
|
||||||
// Write some records
|
|
||||||
{
|
|
||||||
let mut journal = Journal::open(&wal_path).expect("Failed to open journal");
|
|
||||||
for i in 0..records_per_cycle {
|
|
||||||
let payload = format!("cycle {} record {}", cycle, i).into_bytes();
|
|
||||||
let offset = journal.append(payload).expect("Failed to append");
|
|
||||||
all_offsets.push((offset, cycle, i));
|
|
||||||
}
|
|
||||||
// "Crash" - drop journal
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Final verification - all records from all cycles should be present
|
|
||||||
{
|
|
||||||
let journal = Journal::open(&wal_path).expect("Failed to reopen journal");
|
|
||||||
for (offset, cycle, i) in &all_offsets {
|
|
||||||
let record = journal.read(*offset).expect("Failed to read");
|
|
||||||
let expected = format!("cycle {} record {}", cycle, i).into_bytes();
|
|
||||||
assert_eq!(record.payload, expected, "Record from cycle {} should survive", cycle);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
236
crates/stemedb-wal/src/recovery/mod.rs
Normal file
@ -0,0 +1,236 @@
|
|||||||
|
//! Crash recovery for WAL files.
|
||||||
|
//!
|
||||||
|
//! Provides `recover_file()` which scans a WAL file record-by-record,
|
||||||
|
//! verifying CRC32C and BLAKE3 checksums. On encountering corruption or
|
||||||
|
//! a partial record, it truncates the file to the last valid offset.
|
||||||
|
//!
|
||||||
|
//! Recovery never returns `Err` for data corruption — it logs and truncates.
|
||||||
|
//! Only I/O failures (disk error, permission denied) produce errors.
|
||||||
|
|
||||||
|
use crate::error::{QuarantineError, Result};
|
||||||
|
use crate::format::{
|
||||||
|
compute_crc32c, FileHeader, HEADER_SIZE, MAX_RECORD_SIZE, RECORD_OVERHEAD, VERSION,
|
||||||
|
};
|
||||||
|
use byteorder::{LittleEndian, ReadBytesExt};
|
||||||
|
use std::fs::{File, OpenOptions};
|
||||||
|
use std::io::{BufReader, Read, Seek, SeekFrom};
|
||||||
|
use std::path::Path;
|
||||||
|
use std::time::{Duration, Instant};
|
||||||
|
use tracing::{info, instrument, warn};
|
||||||
|
|
||||||
|
/// Report from a recovery scan.
|
||||||
|
#[derive(Debug, Clone)]
|
||||||
|
pub struct RecoveryReport {
|
||||||
|
/// Number of valid records found.
|
||||||
|
pub valid_records: u64,
|
||||||
|
/// Number of invalid/corrupt records encountered (always 0 or 1,
|
||||||
|
/// since we stop at first corruption).
|
||||||
|
pub invalid_records: u64,
|
||||||
|
/// Bytes truncated from the end of the file.
|
||||||
|
pub bytes_truncated: u64,
|
||||||
|
/// Time spent during recovery.
|
||||||
|
pub recovery_duration: Duration,
|
||||||
|
/// Final valid offset (write position after recovery).
|
||||||
|
pub final_offset: u64,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Recover a single WAL file, returning a report.
|
||||||
|
///
|
||||||
|
/// Algorithm:
|
||||||
|
/// 1. Read and validate FileHeader (must be v2)
|
||||||
|
/// 2. Sequential scan from HEADER_SIZE:
|
||||||
|
/// - Read payload_len. EOF -> clean end.
|
||||||
|
/// - Validate length (> 0, <= MAX_RECORD_SIZE). Invalid -> truncate.
|
||||||
|
/// - Read crc32c, blake3, payload. EOF -> truncate at scan position.
|
||||||
|
/// - Verify CRC32C. Mismatch -> truncate.
|
||||||
|
/// - Verify BLAKE3. Mismatch -> truncate.
|
||||||
|
/// - Advance scan position.
|
||||||
|
/// 3. If truncation needed: set_len + fsync.
|
||||||
|
/// 4. Return RecoveryReport.
|
||||||
|
#[instrument(skip_all, fields(path = %path.as_ref().display()))]
|
||||||
|
pub fn recover_file(path: impl AsRef<Path>) -> Result<RecoveryReport> {
|
||||||
|
let path = path.as_ref();
|
||||||
|
let start = Instant::now();
|
||||||
|
|
||||||
|
let file = File::open(path).map_err(|e| QuarantineError::io(path, e))?;
|
||||||
|
let file_len = file.metadata().map_err(|e| QuarantineError::io(path, e))?.len();
|
||||||
|
|
||||||
|
// File too small for header
|
||||||
|
if file_len < HEADER_SIZE as u64 {
|
||||||
|
warn!(file_len, "WAL file smaller than header, truncating to 0");
|
||||||
|
let wfile =
|
||||||
|
OpenOptions::new().write(true).open(path).map_err(|e| QuarantineError::io(path, e))?;
|
||||||
|
wfile.set_len(0).map_err(|e| QuarantineError::io(path, e))?;
|
||||||
|
wfile.sync_all().map_err(|e| QuarantineError::io(path, e))?;
|
||||||
|
return Ok(RecoveryReport {
|
||||||
|
valid_records: 0,
|
||||||
|
invalid_records: 0,
|
||||||
|
bytes_truncated: file_len,
|
||||||
|
recovery_duration: start.elapsed(),
|
||||||
|
final_offset: 0,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut reader = BufReader::new(file);
|
||||||
|
|
||||||
|
// Validate header
|
||||||
|
let header_result = FileHeader::read_from(&mut reader);
|
||||||
|
if let Err(e) = header_result {
|
||||||
|
warn!(error = %e, "WAL header invalid, cannot recover");
|
||||||
|
return Err(e);
|
||||||
|
}
|
||||||
|
let header = header_result?;
|
||||||
|
if header.version != VERSION {
|
||||||
|
return Err(QuarantineError::CorruptRecord {
|
||||||
|
offset: 0,
|
||||||
|
reason: format!(
|
||||||
|
"Unsupported WAL version {} (expected {}). Delete the WAL and re-ingest.",
|
||||||
|
header.version, VERSION
|
||||||
|
),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// Sequential scan
|
||||||
|
let mut scan_offset = HEADER_SIZE as u64;
|
||||||
|
let mut valid_records: u64 = 0;
|
||||||
|
let mut needs_truncation = false;
|
||||||
|
|
||||||
|
loop {
|
||||||
|
if scan_offset >= file_len {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Not enough room for even the fixed header portion of a record
|
||||||
|
let remaining = file_len - scan_offset;
|
||||||
|
if remaining < RECORD_OVERHEAD as u64 {
|
||||||
|
warn!(
|
||||||
|
offset = scan_offset,
|
||||||
|
remaining_bytes = remaining,
|
||||||
|
"Partial record header at end of file"
|
||||||
|
);
|
||||||
|
needs_truncation = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
reader.seek(SeekFrom::Start(scan_offset)).map_err(|e| QuarantineError::io(path, e))?;
|
||||||
|
|
||||||
|
// Read payload_len
|
||||||
|
let payload_len = match reader.read_u32::<LittleEndian>() {
|
||||||
|
Ok(len) => len,
|
||||||
|
Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => break,
|
||||||
|
Err(e) => return Err(QuarantineError::io(path, e)),
|
||||||
|
};
|
||||||
|
|
||||||
|
// Validate length
|
||||||
|
if payload_len == 0 || payload_len as usize > MAX_RECORD_SIZE {
|
||||||
|
warn!(offset = scan_offset, payload_len, "Invalid record length, truncating");
|
||||||
|
needs_truncation = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check if enough bytes remain for the full record
|
||||||
|
let record_size = RECORD_OVERHEAD as u64 + payload_len as u64;
|
||||||
|
if scan_offset + record_size > file_len {
|
||||||
|
warn!(
|
||||||
|
offset = scan_offset,
|
||||||
|
expected_size = record_size,
|
||||||
|
file_len,
|
||||||
|
"Truncated record at end of file"
|
||||||
|
);
|
||||||
|
needs_truncation = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Read CRC32C
|
||||||
|
let stored_crc = match reader.read_u32::<LittleEndian>() {
|
||||||
|
Ok(crc) => crc,
|
||||||
|
Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => {
|
||||||
|
needs_truncation = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
Err(e) => return Err(QuarantineError::io(path, e)),
|
||||||
|
};
|
||||||
|
|
||||||
|
// Read BLAKE3
|
||||||
|
let mut blake3_hash = [0u8; 32];
|
||||||
|
if let Err(e) = reader.read_exact(&mut blake3_hash) {
|
||||||
|
if e.kind() == std::io::ErrorKind::UnexpectedEof {
|
||||||
|
needs_truncation = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
return Err(QuarantineError::io(path, e));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Read payload
|
||||||
|
let mut payload = vec![0u8; payload_len as usize];
|
||||||
|
if let Err(e) = reader.read_exact(&mut payload) {
|
||||||
|
if e.kind() == std::io::ErrorKind::UnexpectedEof {
|
||||||
|
needs_truncation = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
return Err(QuarantineError::io(path, e));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Verify CRC32C
|
||||||
|
let len_bytes = payload_len.to_le_bytes();
|
||||||
|
let computed_crc = compute_crc32c(&len_bytes, &blake3_hash, &payload);
|
||||||
|
if stored_crc != computed_crc {
|
||||||
|
warn!(
|
||||||
|
offset = scan_offset,
|
||||||
|
expected = stored_crc,
|
||||||
|
actual = computed_crc,
|
||||||
|
"CRC32C mismatch, truncating"
|
||||||
|
);
|
||||||
|
needs_truncation = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Verify BLAKE3
|
||||||
|
let computed_blake3: [u8; 32] = blake3::hash(&payload).into();
|
||||||
|
if blake3_hash != computed_blake3 {
|
||||||
|
warn!(offset = scan_offset, "BLAKE3 mismatch, truncating");
|
||||||
|
needs_truncation = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Record is valid
|
||||||
|
scan_offset += record_size;
|
||||||
|
valid_records += 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Truncate if needed
|
||||||
|
let bytes_truncated = if needs_truncation && scan_offset < file_len {
|
||||||
|
let truncated = file_len - scan_offset;
|
||||||
|
let wfile =
|
||||||
|
OpenOptions::new().write(true).open(path).map_err(|e| QuarantineError::io(path, e))?;
|
||||||
|
wfile.set_len(scan_offset).map_err(|e| QuarantineError::io(path, e))?;
|
||||||
|
wfile.sync_all().map_err(|e| QuarantineError::io(path, e))?;
|
||||||
|
if let Some(parent) = path.parent() {
|
||||||
|
let _ = crate::durability::sync_directory(parent);
|
||||||
|
}
|
||||||
|
info!(truncated_bytes = truncated, final_offset = scan_offset, "Truncated corrupt tail");
|
||||||
|
truncated
|
||||||
|
} else {
|
||||||
|
0
|
||||||
|
};
|
||||||
|
|
||||||
|
let report = RecoveryReport {
|
||||||
|
valid_records,
|
||||||
|
invalid_records: u64::from(needs_truncation),
|
||||||
|
bytes_truncated,
|
||||||
|
recovery_duration: start.elapsed(),
|
||||||
|
final_offset: scan_offset,
|
||||||
|
};
|
||||||
|
|
||||||
|
info!(
|
||||||
|
valid_records = report.valid_records,
|
||||||
|
bytes_truncated = report.bytes_truncated,
|
||||||
|
final_offset = report.final_offset,
|
||||||
|
"Recovery complete"
|
||||||
|
);
|
||||||
|
|
||||||
|
Ok(report)
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests;
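To make the v2 record layout `[payload_len:u32_LE][crc32c:u32][blake3:32][payload:N]` concrete, here is a dependency-free sketch of how the CRC32C field could be produced and checked. It assumes, based on the `compute_crc32c(&len_bytes, &blake3_hash, &payload)` call in `recover_file`, that the checksum covers the length bytes, the hash, and the payload in that order. The helper names (`crc32c_parts`, `encode_record`, `crc_matches`) are hypothetical, the bitwise CRC loop stands in for whatever implementation the `format` module actually uses, and the 32-byte hash is passed in by the caller rather than computed with BLAKE3.

```rust
/// Reflected Castagnoli polynomial used by CRC32C.
const CRC32C_POLY: u32 = 0x82F6_3B78;
/// Fixed bytes per record implied by the layout: 4 (len) + 4 (crc) + 32 (hash).
const RECORD_OVERHEAD: usize = 40;

/// Feed `data` into a running (pre-inverted) CRC state, one bit at a time.
fn crc32c_update(mut state: u32, data: &[u8]) -> u32 {
    for &byte in data {
        state ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (state & 1).wrapping_neg();
            state = (state >> 1) ^ (CRC32C_POLY & mask);
        }
    }
    state
}

/// CRC32C over several slices, equivalent to hashing their concatenation.
fn crc32c_parts(parts: &[&[u8]]) -> u32 {
    let mut state = !0u32;
    for part in parts {
        state = crc32c_update(state, part);
    }
    !state
}

/// Serialize one record. `content_hash` is a placeholder for the BLAKE3
/// digest of the payload, which this sketch does not compute.
fn encode_record(payload: &[u8], content_hash: [u8; 32]) -> Vec<u8> {
    let len_bytes = (payload.len() as u32).to_le_bytes();
    let crc = crc32c_parts(&[&len_bytes[..], &content_hash[..], payload]);
    let mut out = Vec::with_capacity(RECORD_OVERHEAD + payload.len());
    out.extend_from_slice(&len_bytes);
    out.extend_from_slice(&crc.to_le_bytes());
    out.extend_from_slice(&content_hash);
    out.extend_from_slice(payload);
    out
}

/// Check the stored CRC of a single, complete serialized record.
fn crc_matches(record: &[u8]) -> bool {
    if record.len() < RECORD_OVERHEAD {
        return false;
    }
    let len = u32::from_le_bytes([record[0], record[1], record[2], record[3]]) as usize;
    if record.len() != RECORD_OVERHEAD + len {
        return false;
    }
    let stored = u32::from_le_bytes([record[4], record[5], record[6], record[7]]);
    stored == crc32c_parts(&[&record[0..4], &record[8..40], &record[40..]])
}
```

Flipping any byte of the length, hash, or payload changes the computed CRC, which is the condition under which the recovery scan stops and truncates the tail.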
|
||||||
413
crates/stemedb-wal/src/recovery/tests.rs
Normal file
@ -0,0 +1,413 @@
|
|||||||
|
//! Tests for crash recovery and log rotation integration.
|
||||||
|
|
||||||
|
use super::*;
|
||||||
|
use crate::format::{FileHeader, Record, HEADER_SIZE, MAX_RECORD_SIZE, RECORD_OVERHEAD};
|
||||||
|
use crate::journal::Journal;
|
||||||
|
use std::io::Write;
|
||||||
|
use tempfile::tempdir;
|
||||||
|
|
||||||
|
/// Helper: write a raw WAL file with header + records for testing
|
||||||
|
fn write_test_wal(path: &Path, records: &[&[u8]]) -> Vec<u64> {
|
||||||
|
let mut file = File::create(path).expect("create file");
|
||||||
|
let header = FileHeader::new();
|
||||||
|
let mut buf = Vec::new();
|
||||||
|
header.write_to(&mut buf).expect("write header");
|
||||||
|
file.write_all(&buf).expect("write header bytes");
|
||||||
|
|
||||||
|
let mut offsets = Vec::new();
|
||||||
|
let mut offset = HEADER_SIZE as u64;
|
||||||
|
|
||||||
|
for payload in records {
|
||||||
|
offsets.push(offset);
|
||||||
|
let record = Record::new(payload.to_vec());
|
||||||
|
let mut rec_buf = Vec::new();
|
||||||
|
record.write_to(&mut rec_buf).expect("write record");
|
||||||
|
file.write_all(&rec_buf).expect("write record bytes");
|
||||||
|
offset += record.disk_size();
|
||||||
|
}
|
||||||
|
|
||||||
|
file.sync_all().expect("sync");
|
||||||
|
offsets
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_recovery_truncates_partial_record() {
|
||||||
|
let dir = tempdir().expect("tempdir");
|
||||||
|
let wal_file = dir.path().join("test.wal");
|
||||||
|
|
||||||
|
write_test_wal(&wal_file, &[b"record 0", b"record 1", b"record 2"]);
|
||||||
|
|
||||||
|
// Append 5 trailing junk bytes
|
||||||
|
let mut file = OpenOptions::new().append(true).open(&wal_file).expect("open");
|
||||||
|
file.write_all(&[0xDE, 0xAD, 0xBE, 0xEF, 0x42]).expect("write junk");
|
||||||
|
file.sync_all().expect("sync");
|
||||||
|
|
||||||
|
let report = recover_file(&wal_file).expect("recover");
|
||||||
|
assert_eq!(report.valid_records, 3);
|
||||||
|
assert_eq!(report.bytes_truncated, 5);
|
||||||
|
assert_eq!(report.invalid_records, 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_recovery_truncates_corrupt_checksum() {
|
||||||
|
let dir = tempdir().expect("tempdir");
|
||||||
|
let wal_file = dir.path().join("test.wal");
|
||||||
|
|
||||||
|
write_test_wal(&wal_file, &[b"record 0", b"record 1", b"record 2"]);
|
||||||
|
|
||||||
|
// Corrupt a byte in record 2's payload area
|
||||||
|
let mut data = std::fs::read(&wal_file).expect("read file");
|
||||||
|
let corrupt_pos = data.len() - 2;
|
||||||
|
data[corrupt_pos] ^= 0xFF;
|
||||||
|
std::fs::write(&wal_file, &data).expect("write corrupted");
|
||||||
|
|
||||||
|
let report = recover_file(&wal_file).expect("recover");
|
||||||
|
assert_eq!(report.valid_records, 2);
|
||||||
|
assert_eq!(report.invalid_records, 1);
|
||||||
|
assert!(report.bytes_truncated > 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_recovery_handles_empty_after_header() {
|
||||||
|
let dir = tempdir().expect("tempdir");
|
||||||
|
let wal_file = dir.path().join("test.wal");
|
||||||
|
|
||||||
|
// Write just a header
|
||||||
|
let mut file = File::create(&wal_file).expect("create");
|
||||||
|
let header = FileHeader::new();
|
||||||
|
let mut buf = Vec::new();
|
||||||
|
header.write_to(&mut buf).expect("write header");
|
||||||
|
file.write_all(&buf).expect("write");
|
||||||
|
file.sync_all().expect("sync");
|
||||||
|
|
||||||
|
let report = recover_file(&wal_file).expect("recover");
|
||||||
|
assert_eq!(report.valid_records, 0);
|
||||||
|
assert_eq!(report.bytes_truncated, 0);
|
||||||
|
assert_eq!(report.final_offset, HEADER_SIZE as u64);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_recovery_handles_truncated_header() {
|
||||||
|
let dir = tempdir().expect("tempdir");
|
||||||
|
let wal_file = dir.path().join("test.wal");
|
||||||
|
|
||||||
|
// Write only 4 bytes (less than HEADER_SIZE = 8)
|
||||||
|
std::fs::write(&wal_file, [0x53, 0x54, 0x45, 0x4D]).expect("write");
|
||||||
|
|
||||||
|
let report = recover_file(&wal_file).expect("recover");
|
||||||
|
assert_eq!(report.valid_records, 0);
|
||||||
|
assert_eq!(report.bytes_truncated, 4);
|
||||||
|
assert_eq!(report.final_offset, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
fn test_recovery_report_metrics() {
    let dir = tempdir().expect("tempdir");
    let wal_file = dir.path().join("test.wal");

    write_test_wal(&wal_file, &[b"alpha", b"beta", b"gamma"]);

    let report = recover_file(&wal_file).expect("recover");
    assert_eq!(report.valid_records, 3);
    assert_eq!(report.invalid_records, 0);
    assert_eq!(report.bytes_truncated, 0);
    assert!(report.recovery_duration < Duration::from_secs(5));

    let expected_offset = HEADER_SIZE as u64 + 3 * RECORD_OVERHEAD as u64 + 5 + 4 + 5; // alpha=5, beta=4, gamma=5
    assert_eq!(report.final_offset, expected_offset);
}

#[test]
fn test_recovery_preserves_valid_before_corruption() {
    let dir = tempdir().expect("tempdir");
    let wal_file = dir.path().join("test.wal");

    let offsets = write_test_wal(&wal_file, &[b"keep me", b"keep me too", b"corrupt me"]);

    // Corrupt record 2 by flipping a CRC byte (bytes 4..8 of that record)
    let mut data = std::fs::read(&wal_file).expect("read");
    let record2_crc_offset = offsets[2] as usize + 4; // skip payload_len, hit CRC
    data[record2_crc_offset] ^= 0xFF;
    std::fs::write(&wal_file, &data).expect("write");

    let report = recover_file(&wal_file).expect("recover");
    assert_eq!(report.valid_records, 2);
    assert_eq!(report.final_offset, offsets[2]);

    // Verify the file was actually truncated
    let new_len = std::fs::metadata(&wal_file).expect("metadata").len();
    assert_eq!(new_len, offsets[2]);
}

#[test]
fn test_recovery_handles_zero_length_record() {
    let dir = tempdir().expect("tempdir");
    let wal_file = dir.path().join("test.wal");

    write_test_wal(&wal_file, &[b"good record"]);

    // Append a record with payload_len = 0 (invalid)
    let mut file = OpenOptions::new().append(true).open(&wal_file).expect("open");
    file.write_all(&0u32.to_le_bytes()).expect("write zero len");
    file.write_all(&[0u8; 36]).expect("write padding"); // crc + blake3
    file.sync_all().expect("sync");

    let report = recover_file(&wal_file).expect("recover");
    assert_eq!(report.valid_records, 1);
    assert_eq!(report.invalid_records, 1);
    assert!(report.bytes_truncated > 0);
}

#[test]
fn test_recovery_handles_impossibly_large_length() {
    let dir = tempdir().expect("tempdir");
    let wal_file = dir.path().join("test.wal");

    write_test_wal(&wal_file, &[b"good record"]);

    // Append a record with payload_len = MAX + 1 (invalid)
    let huge_len = (MAX_RECORD_SIZE as u32) + 1;
    let mut file = OpenOptions::new().append(true).open(&wal_file).expect("open");
    file.write_all(&huge_len.to_le_bytes()).expect("write huge len");
    file.sync_all().expect("sync");

    let report = recover_file(&wal_file).expect("recover");
    assert_eq!(report.valid_records, 1);
    assert_eq!(report.invalid_records, 1);
}

/// Integration test: Journal uses recover_file under the hood
#[test]
fn test_journal_recovery_integration() {
    let dir = tempdir().expect("tempdir");
    let wal_path = dir.path().join("wal");

    let offsets: Vec<u64>;

    // Write records via Journal
    {
        let mut journal = Journal::open(&wal_path).expect("open journal");
        offsets = (0..5)
            .map(|i| journal.append(format!("record {}", i).into_bytes()).expect("append"))
            .collect();
    }

    // Append junk to simulate torn write
    let wal_file = wal_path.join("0000000000000000.wal");
    let mut file = OpenOptions::new().append(true).open(&wal_file).expect("open");
    file.write_all(&[0xFF; 20]).expect("write junk");
    file.sync_all().expect("sync");

    // Journal should recover cleanly
    {
        let mut journal = Journal::open(&wal_path).expect("reopen journal");
        for (i, offset) in offsets.iter().enumerate() {
            let record = journal.read(*offset).expect("read record");
            assert_eq!(record.payload, format!("record {}", i).into_bytes());
        }
    }
}

/// Performance: recovery of 10K records should finish within the 10-second bound asserted below
#[test]
fn test_recovery_performance_10k_records() {
    let dir = tempdir().expect("tempdir");
    let wal_file = dir.path().join("test.wal");

    let payloads: Vec<&[u8]> =
        (0..10_000).map(|_| b"benchmark payload data here" as &[u8]).collect();
    write_test_wal(&wal_file, &payloads);

    // Corrupt the last record
    let mut data = std::fs::read(&wal_file).expect("read");
    let last = data.len() - 1;
    data[last] ^= 0xFF;
    std::fs::write(&wal_file, &data).expect("write");

    let report = recover_file(&wal_file).expect("recover");
    assert_eq!(report.valid_records, 9_999);
    assert!(
        report.recovery_duration < Duration::from_secs(10),
        "Recovery took {:?}",
        report.recovery_duration
    );
}

// =========================================================================
// Wave 4: Log Rotation Integration Tests
// =========================================================================

/// Test: Rotation creates new segments at the configured threshold.
#[test]
fn test_rotation_creates_new_segment() {
    let dir = tempdir().expect("tempdir");
    let wal_path = dir.path().join("wal");

    // Use a tiny max_segment_size (1KB) to trigger rotation quickly
    let mut journal = Journal::open_with_max_segment_size(&wal_path, 1024).expect("open journal");

    let mut offsets = Vec::new();
    // Write enough records to trigger multiple rotations
    for i in 0..50 {
        let payload = format!("rotation test record {} with some padding data", i).into_bytes();
        let offset = journal.append(payload).expect("append");
        offsets.push(offset);
    }

    // Verify we created multiple segments
    let segment_files: Vec<_> = std::fs::read_dir(&wal_path)
        .expect("readdir")
        .filter_map(|e| e.ok())
        .filter(|e| e.path().extension().map(|ext| ext == "wal").unwrap_or(false))
        .collect();
    assert!(segment_files.len() > 1, "Expected multiple segments, got {}", segment_files.len());
}

/// Test: Records can be read across segment boundaries.
#[test]
fn test_read_across_segments() {
    let dir = tempdir().expect("tempdir");
    let wal_path = dir.path().join("wal");

    // 512 byte threshold to force rotation
    let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("open journal");

    let mut records = Vec::new();
    for i in 0..30 {
        let payload = format!("cross-segment record {}", i).into_bytes();
        let offset = journal.append(payload.clone()).expect("append");
        records.push((offset, payload));
    }

    // Read all records back; their offsets are spread across several segments
    for (offset, expected_payload) in &records {
        let record = journal.read(*offset).expect("read across segments");
        assert_eq!(&record.payload, expected_payload);
    }
}

/// Test: Recovery works across multiple segments.
#[test]
fn test_recovery_across_segments() {
    let dir = tempdir().expect("tempdir");
    let wal_path = dir.path().join("wal");

    let mut records = Vec::new();

    // Write records with small segments
    {
        let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("open");
        for i in 0..20 {
            let payload = format!("recovery segment test {}", i).into_bytes();
            let offset = journal.append(payload.clone()).expect("append");
            records.push((offset, payload));
        }
    }

    // Append junk to the last segment to simulate torn write
    let last_segment = std::fs::read_dir(&wal_path)
        .expect("readdir")
        .filter_map(|e| e.ok())
        .filter(|e| e.path().extension().map(|ext| ext == "wal").unwrap_or(false))
        .max_by_key(|e| e.file_name())
        .expect("at least one segment");
    let mut file =
        OpenOptions::new().append(true).open(last_segment.path()).expect("open last segment");
    file.write_all(&[0xDE, 0xAD, 0xBE, 0xEF]).expect("write junk");
    file.sync_all().expect("sync");

    // Recovery should preserve all valid records
    {
        let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("reopen");
        for (offset, expected) in &records {
            let record = journal.read(*offset).expect("read after recovery");
            assert_eq!(&record.payload, expected);
        }
    }
}

/// Test: Appending after recovery with rotation works.
#[test]
fn test_append_after_recovery_with_rotation() {
    let dir = tempdir().expect("tempdir");
    let wal_path = dir.path().join("wal");

    let mut pre_recovery_records = Vec::new();

    // Phase 1: Write some records
    {
        let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("open");
        for i in 0..10 {
            let payload = format!("before recovery {}", i).into_bytes();
            let offset = journal.append(payload.clone()).expect("append");
            pre_recovery_records.push((offset, payload));
        }
    }

    // Phase 2: Recover and continue writing
    let mut post_recovery_records = Vec::new();
    {
        let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("reopen");

        // Verify pre-recovery records
        for (offset, expected) in &pre_recovery_records {
            let record = journal.read(*offset).expect("read pre-recovery");
            assert_eq!(&record.payload, expected);
        }

        // Write more records
        for i in 0..10 {
            let payload = format!("after recovery {}", i).into_bytes();
            let offset = journal.append(payload.clone()).expect("append");
            post_recovery_records.push((offset, payload));
        }
    }

    // Phase 3: Final verification
    {
        let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("final open");
        for (offset, expected) in pre_recovery_records.iter().chain(&post_recovery_records) {
            let record = journal.read(*offset).expect("read final");
            assert_eq!(&record.payload, expected);
        }
    }
}

/// Test: Cleanup removes old segments after cursor advances.
#[test]
fn test_cleanup_removes_old_segments() {
    let dir = tempdir().expect("tempdir");
    let wal_path = dir.path().join("wal");

    let mut journal = Journal::open_with_max_segment_size(&wal_path, 512).expect("open");

    // Write enough to create multiple segments
    let mut last_offset = 0;
    for i in 0..30 {
        let payload = format!("cleanup test record {}", i).into_bytes();
        last_offset = journal.append(payload).expect("append");
    }

    let count_segments = || -> usize {
        std::fs::read_dir(&wal_path)
            .expect("readdir")
            .filter_map(|e| e.ok())
            .filter(|e| e.path().extension().map(|ext| ext == "wal").unwrap_or(false))
            .count()
    };

    let initial_segments = count_segments();
    assert!(initial_segments > 1, "Should have multiple segments");

    // Cleanup with cursor at the last offset should remove old segments
    let freed = journal.cleanup(last_offset).expect("cleanup");
    assert!(freed > 0, "Should have freed some bytes");

    let final_segments = count_segments();
    assert!(
        final_segments < initial_segments,
        "Should have fewer segments after cleanup: {} -> {}",
        initial_segments,
        final_segments
    );
}
368
crates/stemedb-wal/src/segment.rs
Normal file
@ -0,0 +1,368 @@
//! Log rotation via segment files with global offset addressing.
//!
//! Each segment file is named `{base_offset:016x}.wal` where `base_offset` is
//! the global WAL offset where that segment begins. Reads resolve the correct
//! segment via binary search, and writes rotate to a new segment when the
//! current one reaches the configured `max_segment_size`
//! (see `DEFAULT_MAX_SEGMENT_SIZE`).
//!
//! # Cleanup
//!
//! `SegmentManager::cleanup(min_cursor)` deletes segments whose entire range
//! is below `min_cursor`, freeing disk space after consumers have advanced.

use crate::error::{QuarantineError, Result};
use crate::format::{FileHeader, HEADER_SIZE};
use std::fs;
use std::path::{Path, PathBuf};
use tracing::{debug, info, instrument, warn};

/// Default maximum segment size (1 GB).
pub const DEFAULT_MAX_SEGMENT_SIZE: u64 = 1024 * 1024 * 1024;

/// A single WAL segment file.
#[derive(Debug, Clone)]
pub struct Segment {
    /// Global WAL offset where this segment starts.
    pub base_offset: u64,
    /// Path to the segment file.
    pub path: PathBuf,
    /// Current file size in bytes.
    pub size: u64,
}

impl Segment {
    /// Format a segment filename from its base offset.
    pub fn filename(base_offset: u64) -> String {
        format!("{:016x}.wal", base_offset)
    }

    /// Parse a base offset from a segment filename.
    pub fn parse_filename(name: &str) -> Option<u64> {
        let stem = name.strip_suffix(".wal")?;
        // Require exactly 16 hex digits; `from_str_radix` alone would also
        // accept a leading `+` sign.
        if stem.len() != 16 || !stem.bytes().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        u64::from_str_radix(stem, 16).ok()
    }
}

/// Manages multiple WAL segment files.
pub struct SegmentManager {
    /// Directory containing segment files.
    data_dir: PathBuf,
    /// Segments sorted by base_offset.
    segments: Vec<Segment>,
    /// Maximum size per segment before rotation.
    max_segment_size: u64,
}

impl SegmentManager {
    /// Open a segment directory, creating it if missing, and scan it for segment files.
    #[instrument(skip_all, fields(data_dir = %data_dir.as_ref().display()))]
    pub fn open(data_dir: impl AsRef<Path>, max_segment_size: u64) -> Result<Self> {
        let data_dir = data_dir.as_ref().to_path_buf();
        fs::create_dir_all(&data_dir).map_err(|e| QuarantineError::io(&data_dir, e))?;

        let mut segments = Vec::new();

        let entries = fs::read_dir(&data_dir).map_err(|e| QuarantineError::io(&data_dir, e))?;
        for entry in entries {
            let entry = entry.map_err(|e| QuarantineError::io(&data_dir, e))?;
            let name = entry.file_name();
            let name_str = name.to_string_lossy();

            if let Some(base_offset) = Segment::parse_filename(&name_str) {
                let meta = entry.metadata().map_err(|e| QuarantineError::io(entry.path(), e))?;
                segments.push(Segment { base_offset, path: entry.path(), size: meta.len() });
            }
        }

        segments.sort_by_key(|s| s.base_offset);

        debug!(segment_count = segments.len(), "SegmentManager opened");
        Ok(Self { data_dir, segments, max_segment_size })
    }

    /// Rescan the data directory for new segment files.
    ///
    /// This is used by read-only journal instances that need to discover
    /// segments created by a separate writer instance.
    #[instrument(skip(self), fields(data_dir = %self.data_dir.display()))]
    pub fn refresh(&mut self) -> Result<()> {
        let mut segments = Vec::new();

        let entries =
            fs::read_dir(&self.data_dir).map_err(|e| QuarantineError::io(&self.data_dir, e))?;
        for entry in entries {
            let entry = entry.map_err(|e| QuarantineError::io(&self.data_dir, e))?;
            let name = entry.file_name();
            let name_str = name.to_string_lossy();

            if let Some(base_offset) = Segment::parse_filename(&name_str) {
                let meta = entry.metadata().map_err(|e| QuarantineError::io(entry.path(), e))?;
                segments.push(Segment { base_offset, path: entry.path(), size: meta.len() });
            }
        }

        segments.sort_by_key(|s| s.base_offset);
        debug!(segment_count = segments.len(), "SegmentManager refreshed");
        self.segments = segments;
        Ok(())
    }

    /// Get all segments, sorted by base_offset.
    pub fn segments(&self) -> &[Segment] {
        &self.segments
    }

    /// Find the segment containing the given global offset.
    ///
    /// Uses binary search: finds the last segment whose `base_offset <= offset`.
    pub fn resolve_segment(&self, offset: u64) -> Option<&Segment> {
        if self.segments.is_empty() {
            return None;
        }

        // Binary search for the largest base_offset <= offset
        let idx = match self.segments.binary_search_by_key(&offset, |s| s.base_offset) {
            Ok(exact) => exact,
            Err(insert) => {
                if insert == 0 {
                    return None; // offset is before all segments
                }
                insert - 1
            }
        };

        Some(&self.segments[idx])
    }

    /// Get the current (latest) segment, if any.
    pub fn current_segment(&self) -> Option<&Segment> {
        self.segments.last()
    }

    /// Check if the current segment needs rotation.
    pub fn needs_rotation(&self, current_segment_size: u64) -> bool {
        current_segment_size >= self.max_segment_size
    }

    /// Create a new segment with the given base offset.
    ///
    /// Writes a v2 FileHeader to the new file and adds it to the segment list.
    #[instrument(skip(self), fields(base_offset))]
    pub fn create_segment(&mut self, base_offset: u64) -> Result<&Segment> {
        let filename = Segment::filename(base_offset);
        let path = self.data_dir.join(&filename);

        // Write header
        let header = FileHeader::new();
        let mut buf = Vec::with_capacity(HEADER_SIZE);
        header.write_to(&mut buf)?;
        fs::write(&path, &buf).map_err(|e| QuarantineError::io(&path, e))?;

        let segment = Segment { base_offset, path, size: HEADER_SIZE as u64 };

        self.segments.push(segment);
        info!(base_offset, filename, "Created new segment");

        self.segments.last().ok_or_else(|| {
            QuarantineError::IoGeneric(std::io::Error::other("segment list unexpectedly empty"))
        })
    }

    /// Delete segments whose entire range is below `min_cursor`.
    ///
    /// A segment can be deleted if the *next* segment's base_offset <= min_cursor,
    /// meaning no reads will ever need the deleted segment.
    ///
    /// Returns the number of bytes freed.
    #[instrument(skip(self))]
    pub fn cleanup(&mut self, min_cursor: u64) -> Result<u64> {
        let mut freed = 0u64;
        let mut to_remove = Vec::new();

        for (i, _segment) in self.segments.iter().enumerate() {
            // Can only delete if there's a next segment and it starts at or below min_cursor
            if i + 1 < self.segments.len() && self.segments[i + 1].base_offset <= min_cursor {
                to_remove.push(i);
            }
        }

        // Remove in reverse order to preserve indices
        for &idx in to_remove.iter().rev() {
            let segment = &self.segments[idx];
            info!(
                base_offset = segment.base_offset,
                size = segment.size,
                path = %segment.path.display(),
                "Deleting old segment"
            );
            match fs::remove_file(&segment.path) {
                Ok(()) => {
                    freed += segment.size;
                    self.segments.remove(idx);
                }
                Err(e) => {
                    warn!(
                        error = %e,
                        path = %segment.path.display(),
                        "Failed to delete segment file, keeping in list"
                    );
                }
            }
        }

        if freed > 0 {
            info!(
                freed_bytes = freed,
                remaining_segments = self.segments.len(),
                "Cleanup complete"
            );
        }

        Ok(freed)
    }

    /// Get the data directory path.
    pub fn data_dir(&self) -> &Path {
        &self.data_dir
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use tempfile::tempdir;

    #[test]
    fn test_segment_name_roundtrip() {
        let offsets = [0u64, 1, 255, 65536, 0xDEAD_BEEF, u64::MAX];
        for offset in offsets {
            let name = Segment::filename(offset);
            let parsed = Segment::parse_filename(&name);
            assert_eq!(parsed, Some(offset), "Roundtrip failed for offset {}", offset);
        }
    }

    #[test]
    fn test_parse_filename_rejects_invalid() {
        assert_eq!(Segment::parse_filename("not_a_wal.txt"), None);
        assert_eq!(Segment::parse_filename("short.wal"), None);
        assert_eq!(Segment::parse_filename("0000000000000000.log"), None);
        assert_eq!(Segment::parse_filename(""), None);
        // Too many hex digits
        assert_eq!(Segment::parse_filename("00000000000000000.wal"), None);
    }

    #[test]
    fn test_resolve_segment_binary_search() {
        let dir = tempdir().expect("tempdir");
        let mut mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");

        // Create segments at offsets 0, 1000, 2000
        mgr.create_segment(0).expect("seg 0");
        mgr.create_segment(1000).expect("seg 1000");
        mgr.create_segment(2000).expect("seg 2000");

        // Offset 0 -> segment 0
        assert_eq!(mgr.resolve_segment(0).map(|s| s.base_offset), Some(0));
        // Offset 500 -> segment 0
        assert_eq!(mgr.resolve_segment(500).map(|s| s.base_offset), Some(0));
        // Offset 999 -> segment 0
        assert_eq!(mgr.resolve_segment(999).map(|s| s.base_offset), Some(0));
        // Offset 1000 -> segment 1000
        assert_eq!(mgr.resolve_segment(1000).map(|s| s.base_offset), Some(1000));
        // Offset 1500 -> segment 1000
        assert_eq!(mgr.resolve_segment(1500).map(|s| s.base_offset), Some(1000));
        // Offset 2000 -> segment 2000
        assert_eq!(mgr.resolve_segment(2000).map(|s| s.base_offset), Some(2000));
        // Offset 99999 -> segment 2000
        assert_eq!(mgr.resolve_segment(99999).map(|s| s.base_offset), Some(2000));
    }

    #[test]
    fn test_resolve_segment_empty() {
        let dir = tempdir().expect("tempdir");
        let mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");
        assert!(mgr.resolve_segment(0).is_none());
    }

    #[test]
    fn test_rotation_creates_new_segment() {
        let dir = tempdir().expect("tempdir");
        // Small threshold for testing: 1KB
        let mut mgr = SegmentManager::open(dir.path(), 1024).expect("open");

        mgr.create_segment(0).expect("create seg 0");
        assert_eq!(mgr.segments().len(), 1);

        // Simulate that segment 0 grew beyond threshold
        assert!(mgr.needs_rotation(2048));
        assert!(!mgr.needs_rotation(512));

        mgr.create_segment(2048).expect("create seg 2048");
        assert_eq!(mgr.segments().len(), 2);
    }

    #[test]
    fn test_cleanup_deletes_old_segments() {
        let dir = tempdir().expect("tempdir");
        let mut mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");

        mgr.create_segment(0).expect("seg 0");
        mgr.create_segment(1000).expect("seg 1000");
        mgr.create_segment(2000).expect("seg 2000");
        assert_eq!(mgr.segments().len(), 3);

        // Cleanup with min_cursor=1500: can delete seg 0 (next seg starts at 1000 <= 1500)
        let freed = mgr.cleanup(1500).expect("cleanup");
        assert!(freed > 0);
        assert_eq!(mgr.segments().len(), 2);
        assert_eq!(mgr.segments()[0].base_offset, 1000);

        // Cleanup with min_cursor=2500: can delete seg 1000 (next starts at 2000 <= 2500)
        let freed = mgr.cleanup(2500).expect("cleanup");
        assert!(freed > 0);
        assert_eq!(mgr.segments().len(), 1);
        assert_eq!(mgr.segments()[0].base_offset, 2000);

        // Last segment is never deleted
        let freed = mgr.cleanup(u64::MAX).expect("cleanup");
        assert_eq!(freed, 0);
        assert_eq!(mgr.segments().len(), 1);
    }

    #[test]
    fn test_segment_manager_scans_existing_files() {
        let dir = tempdir().expect("tempdir");

        // Create segments manually, then reopen
        {
            let mut mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");
            mgr.create_segment(0).expect("seg 0");
            mgr.create_segment(5000).expect("seg 5000");
            mgr.create_segment(10000).expect("seg 10000");
        }

        // Reopen and verify scan
        let mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("reopen");
        assert_eq!(mgr.segments().len(), 3);
        assert_eq!(mgr.segments()[0].base_offset, 0);
        assert_eq!(mgr.segments()[1].base_offset, 5000);
        assert_eq!(mgr.segments()[2].base_offset, 10000);
    }

    #[test]
    fn test_segment_file_has_valid_header() {
        let dir = tempdir().expect("tempdir");
        let mut mgr = SegmentManager::open(dir.path(), DEFAULT_MAX_SEGMENT_SIZE).expect("open");
        mgr.create_segment(0).expect("seg 0");

        // Read the file and verify header
        let data = std::fs::read(&mgr.segments()[0].path).expect("read");
        assert_eq!(data.len(), HEADER_SIZE);
        assert_eq!(&data[0..4], b"STEM");
        assert_eq!(data[4], 2); // version
    }
}
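
Reviewer sketch (not part of the commit): the addressing and cleanup rules in `segment.rs` can be tried in isolation with the standalone functions below. It assumes only what the file above states: filenames are `{base_offset:016x}.wal`, an offset resolves to the last segment whose base is at or below it, and a segment is deletable when the next segment's base is at or below `min_cursor`. The function names `segment_filename`, `resolve`, and `deletable` are hypothetical, and `partition_point` is used here in place of the `binary_search_by_key` call in the commit.

```rust
/// Format a segment filename from its base offset (same format string as
/// `Segment::filename` in the diff above).
pub fn segment_filename(base_offset: u64) -> String {
    format!("{:016x}.wal", base_offset)
}

/// Given segment base offsets sorted ascending, return the base of the
/// segment that contains `offset`, i.e. the largest base <= offset.
/// Returns `None` when there are no segments or `offset` precedes them all.
pub fn resolve(bases: &[u64], offset: u64) -> Option<u64> {
    // Index of the first base that is strictly greater than `offset`.
    let idx = bases.partition_point(|&b| b <= offset);
    idx.checked_sub(1).map(|i| bases[i])
}

/// Return the bases of segments that are safe to delete: a segment
/// qualifies only if a following segment exists and starts at or below
/// `min_cursor`. The last segment is therefore never returned.
pub fn deletable(bases: &[u64], min_cursor: u64) -> Vec<u64> {
    bases
        .windows(2)
        .filter(|pair| pair[1] <= min_cursor)
        .map(|pair| pair[0])
        .collect()
}
```

With bases `[0, 1000, 2000]`, `resolve` maps offset 999 to segment 0 and offset 1000 to segment 1000, and `deletable` with a cursor of 1500 yields only segment 0, matching the expectations in `test_resolve_segment_binary_search` and `test_cleanup_deletes_old_segments`.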
25
roadmap.md
@ -603,17 +603,18 @@
|
|||||||
|
|
||||||
#### 5A. Storage Engine Replacement
|
#### 5A. Storage Engine Replacement
|
||||||
|
|
||||||
- [ ] **5A.1 Replace sled with redb + fjall**: sled is abandoned (author recommends alternatives).
|
- [x] **5A.1 Replace sled with redb + fjall**: sled is abandoned (author recommends alternatives).
|
||||||
- **Problem:** sled is alpha-stage with known performance regressions and no active development. Our entire storage layer depends on it.
|
- **Problem:** sled is alpha-stage with known performance regressions and no active development. Our entire storage layer depends on it.
|
||||||
- **Solution:** Use **redb** (pure Rust B-tree, 1.0 stable since 2023) for read-heavy paths and **fjall** (Rust LSM engine v3.0, lowest write amplification) for write-heavy paths.
|
- **Solution:** HybridStore routes keys by prefix — **fjall** (LSM) for write-heavy paths (`H:`, `V:`, `VC:`, `VW:`, `E:`, `SUPERSEDED:`, `__CURSOR__:`) and **redb** (B-tree) for read-heavy paths (`S:`, `SP:`, `MV:`, `TR:`, `QA:`, `QT:`, `TP:`, `GS:`, `ESC:`).
|
||||||
- **Tasks:**
|
- **Tasks:**
|
||||||
- [ ] Abstract `KVStore` trait to be backend-agnostic (already trait-based, verify no sled-specific leakage).
|
- [x] Generalize `StorageError::Sled` to `StorageError::Backend(String)`.
|
||||||
- [ ] Implement `RedbStore` backend with ACID transactions.
|
- [x] Implement `FjallStore` backend with DashMap per-key locks for atomics.
|
||||||
- [ ] Implement `FjallStore` backend for high-throughput assertion writes.
|
- [x] Implement `RedbStore` backend with ACID transactions.
|
||||||
- [ ] Benchmark: redb vs fjall vs sled for our access patterns (bulk load, random read, prefix scan).
|
- [x] Implement `HybridStore` routing layer with prefix-based dispatch.
|
||||||
- [ ] Migration tool: read all sled data, write to new backend.
|
- [x] Migrate all ~500 tests from `SledStore` to `HybridStore`.
|
||||||
- [ ] Update all integration tests.
|
- [x] Remove sled dependency entirely.
|
||||||
- **Crates:** `redb = "2.0"`, `fjall = "3.0"`
|
- [x] Add criterion benchmarks (sequential put, random get, prefix scan, atomic increment, mixed workload).
|
||||||
|
- **Crates:** `redb = "2"`, `fjall = "2"`, `dashmap = "6"`
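
A minimal sketch of the prefix-based dispatch described in the Solution bullet, for illustration only. The prefix lists are copied from that bullet; the `Backend` enum, the `route` function, and the choice to return `None` for an unknown prefix are assumptions, not the actual `HybridStore` API.

```rust
/// Hypothetical backend tag; the real HybridStore types are not shown in this diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// LSM engine, used for write-heavy key families.
    Fjall,
    /// B-tree engine, used for read-heavy key families.
    Redb,
}

/// Pick a backend from the key's prefix (the text before the first `:`).
/// Matching the whole prefix token keeps `S:`, `SP:` and `SUPERSEDED:` distinct.
/// Assumption: keys with no `:` or an unlisted prefix yield `None`.
pub fn route(key: &str) -> Option<Backend> {
    let (prefix, _rest) = key.split_once(':')?;
    match prefix {
        "H" | "V" | "VC" | "VW" | "E" | "SUPERSEDED" | "__CURSOR__" => Some(Backend::Fjall),
        "S" | "SP" | "MV" | "TR" | "QA" | "QT" | "TP" | "GS" | "ESC" => Some(Backend::Redb),
        _ => None,
    }
}
```

Splitting on the first `:` rather than using `starts_with` avoids ordering bugs between prefixes that share leading characters, such as `S:` and `SUPERSEDED:`.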
|
||||||
|
|
||||||
- [ ] **5A.2 Key Layout Redesign**: Prepare keys for subject-prefix range sharding.
|
- [ ] **5A.2 Key Layout Redesign**: Prepare keys for subject-prefix range sharding.
|
||||||
- **Problem:** Current keys (`H:{hash}`, `S:{subject}`, `MV:{subject}:{predicate}`) scatter related data across the keyspace. Distributed sharding needs co-location.
|
- **Problem:** Current keys (`H:{hash}`, `S:{subject}`, `MV:{subject}:{predicate}`) scatter related data across the keyspace. Distributed sharding needs co-location.
|
||||||
@ -930,9 +931,9 @@
|
|||||||
* [x] **Phase 3 The Pilot**: Consumer Health vertical integration. ✅ COMPLETE
|
* [x] **Phase 3 The Pilot**: Consumer Health vertical integration. ✅ COMPLETE
|
||||||
* [x] **Phase 4 The Hive**: Trust & Scale + Extension Primitives. ✅ COMPLETE
|
* [x] **Phase 4 The Hive**: Trust & Scale + Extension Primitives. ✅ COMPLETE
|
||||||
* [ ] **Phase 5 The Forge**: Foundation hardening — replace sled, fix WAL, persist indices.
|
* [ ] **Phase 5 The Forge**: Foundation hardening — replace sled, fix WAL, persist indices.
|
||||||
|
* [x] **5A.1**: Replace sled with redb/fjall (HybridStore). ✅ COMPLETE
|
||||||
|
|
||||||
### Next Up
|
### Next Up
|
||||||
* **Phase 5A.1**: Replace sled with redb/fjall (critical — sled is abandoned).
|
|
||||||
* **Phase 5B.2**: Implement real crash recovery (current recovery is a stub).
|
* **Phase 5B.2**: Implement real crash recovery (current recovery is a stub).
|
||||||
* **Phase 5B.3**: Group commit for WAL throughput.
|
* **Phase 5B.3**: Group commit for WAL throughput.
|
||||||
* **Phase 5A.2**: Key layout redesign for subject-prefix sharding.
|
* **Phase 5A.2**: Key layout redesign for subject-prefix sharding.
|
||||||
@ -1053,7 +1054,7 @@
|
|||||||
* [docs/research/distributed-write-path.md](docs/research/distributed-write-path.md) — Spanner/CockroachDB-style distributed writes adapted for append-only model.
|
* [docs/research/distributed-write-path.md](docs/research/distributed-write-path.md) — Spanner/CockroachDB-style distributed writes adapted for append-only model.
|
||||||
|
|
||||||
### Key Architectural Decisions
|
### Key Architectural Decisions
|
||||||
* **sled → redb/fjall**: sled is abandoned. redb for reads, fjall for writes.
|
* **sled → redb/fjall**: sled is abandoned. HybridStore routes by key prefix: redb for reads, fjall for writes. ✅ COMPLETE
|
||||||
* **Raft log = WAL**: TiKV eliminated duplicate WAL in v5.4. We should too.
|
* **Raft log = WAL**: TiKV eliminated duplicate WAL in v5.4. We should too.
|
||||||
* **CRDT for data, Raft for coordination**: Assertions are a G-Set CRDT (merge = set union). Only cluster metadata needs Raft.
|
* **CRDT for data, Raft for coordination**: Assertions are a G-Set CRDT (merge = set union). Only cluster metadata needs Raft.
|
||||||
* **Subject-prefix ranges**: Co-locate all data for a subject on one shard. Split hot subjects via range split.
|
* **Subject-prefix ranges**: Co-locate all data for a subject on one shard. Split hot subjects via range split.
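
The "merge = set union" property of a G-Set can be sketched as below. This is a minimal illustration, not the project's CRDT implementation: `GSet`, `AssertionHash`, and the choice of a 32-byte hash as the set element are assumptions made for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical element type: a 32-byte content hash identifying an assertion.
type AssertionHash = [u8; 32];

/// Grow-only set: elements can be added but never removed.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct GSet {
    items: BTreeSet<AssertionHash>,
}

impl GSet {
    /// Adds an element; returns true if it was not already present.
    fn insert(&mut self, hash: AssertionHash) -> bool {
        self.items.insert(hash)
    }

    /// Merge is plain set union, so it is commutative, associative,
    /// and idempotent. Replicas converge regardless of merge order.
    fn merge(&mut self, other: &GSet) {
        self.items.extend(other.items.iter().copied());
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}
```

Because union never loses elements, two replicas that exchange state in either order end up identical, which is why assertion data would not need a consensus round under this model.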
|
||||||
@ -1173,7 +1174,7 @@ Phase 3 (Data Foundation) Phase 4 (Extension Primitives) Extensio
|
|||||||
Phase 5 (The Forge) Phase 6 (The Mesh) Phase 7+8
|
Phase 5 (The Forge) Phase 6 (The Mesh) Phase 7+8
|
||||||
======================= ======================= ==================
|
======================= ======================= ==================
|
||||||
|
|
||||||
[5A.1 Replace sled] ──────────────> [6A.1 CRDT Foundation] ──┐
|
[5A.1 Replace sled ✅] ───────────> [6A.1 CRDT Foundation] ──┐
|
||||||
| |
|
| |
|
||||||
[5A.2 Key Layout] ───────────────> [6C.2 Range Sharding] ──> |
|
[5A.2 Key Layout] ───────────────> [6C.2 Range Sharding] ──> |
|
||||||
|
|
|
|
||||||
|
|||||||
@ -10,30 +10,15 @@ Think of it as **Git for Truth**: just as Git lets developers work on different
|
|||||||
|
|
||||||
## The Problem We Solve
|
## The Problem We Solve
|
||||||
|
|
||||||
### The Semaglutide Story
|
|
||||||
|
|
||||||
A woman researching a weight-loss medication finds:
|
|
||||||
|
|
||||||
| Source | Says |
|
|
||||||
|--------|------|
|
|
||||||
| Her doctor | "Generally well-tolerated" |
|
|
||||||
| FDA label | "Thyroid warning, gastroparesis rare" |
|
|
||||||
| Reddit (500+ posts) | "Stomach paralysis, can't eat, hospitalized" |
|
|
||||||
| Clinical trials | "No gastroparesis signal in Phase III" |
|
|
||||||
|
|
||||||
**What should she believe?**
|
|
||||||
|
|
||||||
A traditional database would force someone to pick one answer. The Reddit signal gets ignored or the clinical trial gets overwritten. In January 2024, the FDA added a gastroparesis warning. The Reddit users were right. The system failed because it couldn't hold "clinical trials say X, patients report Y, and these disagree" as a structured fact.
|
|
||||||
|
|
||||||
### The M&A Story
|
### The M&A Story
|
||||||
|
|
||||||
Three analyst teams assess an acquisition target. They find:
|
Three analyst teams assess an acquisition target. They find:
|
||||||
|
|
||||||
| Team | Revenue Estimate |
|
| Team | Revenue Estimate |
|
||||||
|------|------------------|
|
| -------------------- | ---------------- |
|
||||||
| SEC Filing Analysis | $47M |
|
| SEC Filing Analysis | $47M |
|
||||||
| Investor Deck | $62M |
|
| Investor Deck | $62M |
|
||||||
| Bank Statement Audit | $52M |
|
| Bank Statement Audit | $52M |
|
||||||
|
|
||||||
The database forces "canonical truth." The acquirer picks the investor deck number. They overpay by $180M. Post-acquisition, the SEC filing was right.
|
The database forces "canonical truth." The acquirer picks the investor deck number. They overpay by $180M. Post-acquisition, the SEC filing was right.
|
||||||
|
|
||||||
@ -41,21 +26,34 @@ The database forces "canonical truth." The acquirer picks the investor deck numb
|
|||||||
|
|
||||||
An AI agent is tasked with deploying a microservice update. It finds:
|
An AI agent is tasked with deploying a microservice update. It finds:
|
||||||
|
|
||||||
| Source | Says |
|
| Source | Says |
|
||||||
|--------|------|
|
| ---------------------- | --------------------------------------------- |
|
||||||
| RFC 7519 (JWT spec) | "Tokens MUST be validated with `aud` claim" |
|
| RFC 7519 (JWT spec) | "Tokens MUST be validated with `aud` claim" |
|
||||||
| Internal Wiki (2024) | "Skip `aud` validation for internal services" |
|
| Internal Wiki (2024) | "Skip `aud` validation for internal services" |
|
||||||
| Approved Runbook v3.2 | "Validate all claims including `aud`" |
|
| Approved Runbook v3.2 | "Validate all claims including `aud`" |
|
||||||
| Stack Overflow snippet | "Just set `verify=false`, it's internal" |
|
| Stack Overflow snippet | "Just set `verify=false`, it's internal" |
|
||||||
|
|
||||||
The agent picks the Stack Overflow snippet—it's the most recent thing it found. It deploys. At 2 AM, an attacker uses a token minted for the staging environment to access production. Customer data leaks. The postmortem reveals: the agent never saw the conflict between the RFC and the wiki. The database held "the latest answer," not "the disagreement."
|
The agent picks the Stack Overflow snippet—it's the most recent thing it found. It deploys. At 2 AM, an attacker uses a token minted for the staging environment to access production. Customer data leaks. The postmortem reveals: the agent never saw the conflict between the RFC and the wiki. The database held "the latest answer," not "the disagreement."
|
||||||
|
|
||||||
**The problem wasn't bad data. The problem was that the database erased the disagreement.**
|
**The problem wasn't bad data. The problem was that the database erased the disagreement.**
|
||||||
|
|
||||||
Episteme would have surfaced the conflict: "RFC 7519 (Tier 0, regulatory) contradicts Internal Wiki (Tier 3, expert). Conflict score: 0.9. The Approved Runbook agrees with the RFC." The agent—or a human reviewer—sees the disagreement *before* deployment, not after the breach.
|
Episteme would have surfaced the conflict: "RFC 7519 (Tier 0, regulatory) contradicts Internal Wiki (Tier 3, expert). Conflict score: 0.9. The Approved Runbook agrees with the RFC." The agent—or a human reviewer—sees the disagreement _before_ deployment, not after the breach.
|
||||||
|
|
||||||
**Episteme prevents AI agents from hallucinating production configs.**
|
**Episteme prevents AI agents from hallucinating production configs.**
|
||||||
|
|
||||||
|
### The Pharmaceutical Safety Story
|
||||||
|
|
||||||
|
A doctor reviews the safety profile of a newly prescribed medication and finds conflicting information across sources:
|
||||||
|
|
||||||
|
| Source | Says |
|
||||||
|
| ------------------- | -------------------------------------------- |
|
||||||
|
| Prescribing info | "Generally well-tolerated" |
|
||||||
|
| FDA label | "Thyroid warning, gastroparesis rare" |
|
||||||
|
| Reddit (500+ posts) | "Stomach paralysis, can't eat, hospitalized" |
|
||||||
|
| Clinical trials | "No gastroparesis signal in Phase III" |
|
||||||
|
|
||||||
|
A traditional database would force someone to pick one answer. The patient reports get ignored or the clinical trial gets overwritten. When the FDA later adds a gastroparesis warning, it turns out the patient community was right. The system failed because it couldn't hold "clinical trials say X, patients report Y, and these disagree" as a structured fact.
|
||||||
|
|
||||||
### What These Stories Have in Common
|
### What These Stories Have in Common
|
||||||
|
|
||||||
The problem wasn't bad data. In each case, the correct information existed. The problem was that the database erased the disagreement—and nobody automated the reconciliation.
|
The problem wasn't bad data. In each case, the correct information existed. The problem was that the database erased the disagreement—and nobody automated the reconciliation.
|
||||||
@ -94,14 +92,14 @@ You can query for the **conflict score** and see exactly where sources agree and
|
|||||||
|
|
||||||
Every claim has a **source class** that affects how much weight it carries:
|
Every claim has a **source class** that affects how much weight it carries:
|
||||||
|
|
||||||
| Tier | Source Type | Examples | Decay Rate |
|
| Tier | Source Type | Examples | Decay Rate |
|
||||||
|------|-------------|----------|------------|
|
| ---- | ------------- | -------------------- | ----------------- |
|
||||||
| 0 | Regulatory | FDA, SEC, EMA | Never fades |
|
| 0 | Regulatory | FDA, SEC, EMA | Never fades |
|
||||||
| 1 | Clinical | Peer-reviewed trials | 2 year half-life |
|
| 1 | Clinical | Peer-reviewed trials | 2 year half-life |
|
||||||
| 2 | Observational | Real-world studies | 1 year half-life |
|
| 2 | Observational | Real-world studies | 1 year half-life |
|
||||||
| 3 | Expert | Doctor opinions | 6 month half-life |
|
| 3 | Expert | Doctor opinions | 6 month half-life |
|
||||||
| 4 | Community | Patient registries | 3 month half-life |
|
| 4 | Community | Patient registries | 3 month half-life |
|
||||||
| 5 | Anecdotal | Reddit, social media | 30 day half-life |
|
| 5 | Anecdotal | Reddit, social media | 30 day half-life |
|
||||||
|
|
||||||
A million Reddit posts can't outvote an FDA label. But they can signal "something is happening here" that deserves attention.
|
A million Reddit posts can't outvote an FDA label. But they can signal "something is happening here" that deserves attention.
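
One way to read the "Decay Rate" column is as standard exponential half-life decay. The sketch below assumes that interpretation; `SourceTier`, `half_life_days`, and `decayed_weight` are hypothetical names, and the day counts are approximate conversions of the table's months and years rather than values taken from the codebase.

```rust
/// Source tiers 0 to 5 from the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SourceTier {
    Regulatory,
    Clinical,
    Observational,
    Expert,
    Community,
    Anecdotal,
}

impl SourceTier {
    /// Half-life in days; `None` means the tier never fades.
    /// Day counts are assumed conversions (1 year = 365 days).
    fn half_life_days(self) -> Option<f64> {
        match self {
            SourceTier::Regulatory => None,
            SourceTier::Clinical => Some(730.0),
            SourceTier::Observational => Some(365.0),
            SourceTier::Expert => Some(182.5),
            SourceTier::Community => Some(91.25),
            SourceTier::Anecdotal => Some(30.0),
        }
    }
}

/// Weight remaining after `age_days`: initial * 0.5^(age / half_life).
fn decayed_weight(initial: f64, tier: SourceTier, age_days: f64) -> f64 {
    match tier.half_life_days() {
        None => initial,
        Some(half_life) => initial * 0.5_f64.powf(age_days / half_life),
    }
}
```

Under these assumptions an anecdotal claim loses half its weight every 30 days, while a regulatory claim keeps its initial weight indefinitely.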
|
||||||
|
|
||||||
@ -145,13 +143,13 @@ Episteme preserves every historical state. You can query what was believed at an
|
|||||||
|
|
||||||
The same data can be queried with different **Lenses**:
|
The same data can be queried with different **Lenses**:
|
||||||
|
|
||||||
| Lens | Question | Answer Style |
|
| Lens | Question | Answer Style |
|
||||||
|------|----------|--------------|
|
| ------------- | -------------------------------- | ------------------------------------- |
|
||||||
| **Consensus** | "What do most sources agree on?" | The most common answer |
|
| **Consensus** | "What do most sources agree on?" | The most common answer |
|
||||||
| **Authority** | "What do trusted sources say?" | Weighted by source tier |
|
| **Authority** | "What do trusted sources say?" | Weighted by source tier |
|
||||||
| **Recency** | "What's the latest?" | Most recent claim wins |
|
| **Recency** | "What's the latest?" | Most recent claim wins |
|
||||||
| **Skeptic** | "Where is there disagreement?" | Shows all claims with conflict scores |
|
| **Skeptic** | "Where is there disagreement?" | Shows all claims with conflict scores |
|
||||||
| **Layered** | "What does each tier believe?" | Tier-by-tier breakdown |
|
| **Layered** | "What does each tier believe?" | Tier-by-tier breakdown |
|
||||||
|
|
||||||
The **Skeptic** lens is particularly powerful: instead of hiding disagreement, it surfaces it. "Here's where clinical trials and patient reports diverge."
|
The **Skeptic** lens is particularly powerful: instead of hiding disagreement, it surfaces it. "Here's where clinical trials and patient reports diverge."
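
The two simplest rows of the table, Consensus and Recency, can be sketched as pure functions over the same slice of claims. The `Claim` struct and its fields are assumptions for illustration and do not reflect the actual lens API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified claim record.
#[derive(Debug, Clone, PartialEq)]
struct Claim {
    value: String,
    timestamp: u64, // seconds since the Unix epoch
}

/// Consensus lens: the most common value.
/// Ties go to the lexicographically smallest value so the result is deterministic.
fn consensus(claims: &[Claim]) -> Option<&str> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for claim in claims {
        *counts.entry(claim.value.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(value, _)| value)
}

/// Recency lens: the claim with the latest timestamp wins.
fn recency(claims: &[Claim]) -> Option<&Claim> {
    claims.iter().max_by_key(|claim| claim.timestamp)
}
```

Both functions read the same underlying data and neither mutates it, which is the point of a lens: the stored claims stay intact and only the resolution strategy changes.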
|
||||||
|
|
||||||
@ -162,6 +160,7 @@ The **Skeptic** lens is particularly powerful: instead of hiding disagreement, i
|
|||||||
### Consumer Health Intelligence
|
### Consumer Health Intelligence
|
||||||
|
|
||||||
**The Living Review:** A continuously updated assessment of a drug or treatment that:
|
**The Living Review:** A continuously updated assessment of a drug or treatment that:
|
||||||
|
|
||||||
- Shows regulatory, clinical, and patient evidence separately
|
- Shows regulatory, clinical, and patient evidence separately
|
||||||
- Surfaces emerging signals from patient communities before clinical confirmation
|
- Surfaces emerging signals from patient communities before clinical confirmation
|
||||||
- Time-travels to "what was known when you started treatment"
|
- Time-travels to "what was known when you started treatment"
|
||||||
@ -172,6 +171,7 @@ The **Skeptic** lens is particularly powerful: instead of hiding disagreement, i
|
|||||||
### Financial Due Diligence
|
### Financial Due Diligence
|
||||||
|
|
||||||
**The Contradiction Detector:** Multiple analyst teams assess a target. The system:
|
**The Contradiction Detector:** Multiple analyst teams assess a target. The system:
|
||||||
|
|
||||||
- Holds all revenue/liability estimates without forcing resolution
|
- Holds all revenue/liability estimates without forcing resolution
|
||||||
- Shows where teams agree (high confidence) vs. disagree (investigate further)
|
- Shows where teams agree (high confidence) vs. disagree (investigate further)
|
||||||
- Tracks which sources informed which conclusions
|
- Tracks which sources informed which conclusions
|
||||||
@ -182,6 +182,7 @@ The **Skeptic** lens is particularly powerful: instead of hiding disagreement, i
|
|||||||
### DevOps & Production Safety
|
### DevOps & Production Safety
|
||||||
|
|
||||||
**The Config Guardian:** AI agents deploy infrastructure changes. The system:
|
**The Config Guardian:** AI agents deploy infrastructure changes. The system:
|
||||||
|
|
||||||
- Holds specs from RFCs, internal wikis, runbooks, and Stack Overflow with source tiers
|
- Holds specs from RFCs, internal wikis, runbooks, and Stack Overflow with source tiers
|
||||||
- Blocks deployments when high-tier sources (RFCs, approved runbooks) conflict with the agent's chosen config
|
- Blocks deployments when high-tier sources (RFCs, approved runbooks) conflict with the agent's chosen config
|
||||||
- Auto-escalates to human review when conflict score exceeds threshold
|
- Auto-escalates to human review when conflict score exceeds threshold
|
||||||
@ -192,6 +193,7 @@ The **Skeptic** lens is particularly powerful: instead of hiding disagreement, i
|
|||||||
### AI Agent Collaboration
|
### AI Agent Collaboration
|
||||||
|
|
||||||
**The Shared Memory:** Multiple AI research agents explore a topic. The system:
|
**The Shared Memory:** Multiple AI research agents explore a topic. The system:
|
||||||
|
|
||||||
- Lets each agent contribute observations with confidence scores
|
- Lets each agent contribute observations with confidence scores
|
||||||
- Resolves conflicts based on agent reputation (trust scores)
|
- Resolves conflicts based on agent reputation (trust scores)
|
||||||
- Maintains audit trail: "Agent A believed X because it read Y"
|
- Maintains audit trail: "Agent A believed X because it read Y"
|
||||||
@ -227,10 +229,10 @@ In return for that vote, the extension **overlays everything**:
|
|||||||
│ Conflict Score: ██████████░░ 0.82 │
|
│ Conflict Score: ██████████░░ 0.82 │
|
||||||
│ │
|
│ │
|
||||||
│ ▼ Competing claims (4 sources) │
|
│ ▼ Competing claims (4 sources) │
|
||||||
│ FDA Label (Tier 0): gastroparesis warning added │
|
│ FDA Label (Tier 0): gastroparesis warning added │
|
||||||
│ NEJM Trial (Tier 1): no signal in Phase III │
|
│ NEJM Trial (Tier 1): no signal in Phase III │
|
||||||
│ Patient Registry (Tier 4): 340 reports │
|
│ Patient Registry (Tier 4): 340 reports │
|
||||||
│ This page (Tier 5): "no serious side effects" │
|
│ This page (Tier 5): "no serious side effects" │
|
||||||
│ │
|
│ │
|
||||||
│ ▼ Decay: this claim is 8mo old, confidence 0.11 │
|
│ ▼ Decay: this claim is 8mo old, confidence 0.11 │
|
||||||
│ ▼ Timeline: 3 major shifts since publication │
|
│ ▼ Timeline: 3 major shifts since publication │
|
||||||
@ -254,6 +256,7 @@ This isn't a fact-checker. Fact-checkers pick a side. This shows you **all the s
|
|||||||
**Traditional databases optimize for consensus.** They want one answer.
|
**Traditional databases optimize for consensus.** They want one answer.
|
||||||
|
|
||||||
**Episteme optimizes for epistemic honesty.** It wants you to see:
|
**Episteme optimizes for epistemic honesty.** It wants you to see:
|
||||||
|
|
||||||
- What different sources believe
|
- What different sources believe
|
||||||
- How confident they are
|
- How confident they are
|
||||||
- Where they disagree
|
- Where they disagree
|
||||||
@ -283,16 +286,16 @@ The result: a database that acts more like a **version control system for knowle
|
|||||||
|
|
||||||
## When Is Episteme the Right Choice?
|
## When Is Episteme the Right Choice?
|
||||||
|
|
||||||
| Scenario | Episteme? | Why |
|
| Scenario | Episteme? | Why |
|
||||||
|----------|-----------|-----|
|
| ---------------------------------------- | --------- | ------------------------ |
|
||||||
| Multiple sources report different things | Yes | Core use case |
|
| Multiple sources report different things | Yes | Core use case |
|
||||||
| You need to weight sources by authority | Yes | Source class hierarchy |
|
| You need to weight sources by authority | Yes | Source class hierarchy |
|
||||||
| You need to surface disagreement | Yes | Skeptic lens |
|
| You need to surface disagreement | Yes | Skeptic lens |
|
||||||
| You need historical snapshots | Yes | Time-travel queries |
|
| You need historical snapshots | Yes | Time-travel queries |
|
||||||
| You need audit trails | Yes | Query audit + signatures |
|
| You need audit trails | Yes | Query audit + signatures |
|
||||||
| You have one source of truth | No | Use Postgres |
|
| You have one source of truth | No | Use Postgres |
|
||||||
| Data never conflicts | No | Use Postgres |
|
| Data never conflicts | No | Use Postgres |
|
||||||
| Consensus is pre-determined | No | Use Postgres |
|
| Consensus is pre-determined | No | Use Postgres |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||