fix: merge upstream 10 commits, fix DashMap deadlock, deterministic sim ingestion
Merged 10 upstream commits (MemTable, read-your-writes tests, feed endpoint, security hardening, signed assertions, source registry, dashboard enhancements) and fixed all test failures across the full workspace (2656/2656 passing).

Key fixes:
- fix(cluster): DashMap deadlock in swim.rs suspect_node/fail_node/alive_node
  - DashMap::get_mut RefMut + iter() on same map = non-reentrant write lock deadlock
  - Fix: extract clone in scoped block to drop RefMut before calling update_node_gauges()
  - 6 previously-hanging SWIM tests now pass in <2s
- fix(sim): replace background-task+polling ingestion with synchronous process_pending()
  - smoke_high_volume_simulation was CPU-starved under 2656 parallel tests
  - Removed ingestor.start() + wait_until_ingested() pattern throughout sim
  - All arena functions now call ingestor.process_pending() directly (deterministic)
- fix(test): v2 signature helper used wrong hash (rkyv vs canonical compute_content_hash_v2)
- fix(test): quota test signed "test" but v1 requires "subject:predicate" format
- fix(test): http_validation now accepts 400 for valid-format-but-invalid-crypto hex
- fix(test): scale_adaptive micro tier assertions updated (auto_promote upstream change)
- config: add nextest.toml with slow-timeout for background-task-tests group

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
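The deadlock fix can be illustrated without the DashMap crate. The sketch below is an analogy, not the code in swim.rs: a `std::sync::RwLock<HashMap>` stands in for the map's internal lock, and `Registry`, `NodeState`, `mark_suspect`, and `count_alive` are hypothetical names. What it shows is the scoped block from the commit message: the write guard is dropped before the function that iterates the same map is called.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Clone, Debug, PartialEq)]
enum NodeState {
    Alive,
    Suspect,
}

// Hypothetical stand-in for the SWIM membership table.
struct Registry {
    nodes: RwLock<HashMap<String, NodeState>>,
}

impl Registry {
    fn new() -> Self {
        Registry { nodes: RwLock::new(HashMap::new()) }
    }

    fn add_alive(&self, id: &str) {
        self.nodes.write().unwrap().insert(id.to_string(), NodeState::Alive);
    }

    // Iterates the whole map under its own read lock.
    // Plays the role of update_node_gauges() in the commit message.
    fn count_alive(&self) -> usize {
        let guard = self.nodes.read().unwrap();
        guard.values().filter(|s| **s == NodeState::Alive).count()
    }

    // Marks a node as suspect and returns the refreshed alive count,
    // or None if the node is unknown.
    fn mark_suspect(&self, id: &str) -> Option<usize> {
        // Scoped block: mutate, clone what is needed, and let the write
        // guard drop at the closing brace.
        let updated = {
            let mut guard = self.nodes.write().unwrap();
            let state = guard.get_mut(id)?;
            *state = NodeState::Suspect;
            state.clone()
        };
        debug_assert_eq!(updated, NodeState::Suspect);
        // No guard is held here. Calling count_alive() while the write
        // guard was still alive could block forever or panic on this thread.
        Some(self.count_alive())
    }
}
```

Moving the `count_alive()` call inside the scoped block would reproduce the hazard, which is the shape of bug the commit describes for `get_mut` followed by `iter()` on the same map.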
parent ad07a75d0a
commit 02ecac9a07
22
.config/nextest.toml
Normal file
@ -0,0 +1,22 @@
# Nextest configuration for StemeDB workspace.
#
# References:
# https://nextest.rs/configuration/overview.html
# https://nextest.rs/configuration/test-groups.html

# Tests that spawn background tokio tasks and wait for them to make progress
# (e.g. IngestWorker cursor polling) need exclusive CPU access under parallel
# test load. Without this, the background tasks get starved and timeout.
[test-groups.background-task-tests]
max-threads = 1

[profile.default]
# Give long-running simulation tests enough time before marking them as slow.
slow-timeout = { period = "60s" }

[[profile.default.overrides]]
# smoke_high_volume_simulation spawns a background IngestWorker and polls its
# cursor. Under full parallel load (2000+ concurrent tests) the tokio scheduler
# starves the background task, causing spurious timeout failures.
filter = 'test(smoke_high_volume_simulation)'
test-group = "background-task-tests"
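The starvation described in the comments above is what the commit message's switch to a synchronous `process_pending()` is meant to avoid inside the sim. The following is a minimal sketch of that idea under stated assumptions: `Ingestor`, `submit`, and the in-memory queues are hypothetical, and only the method name `process_pending` comes from the commit message.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for the sim's ingestor.
struct Ingestor {
    pending: VecDeque<String>,
    ingested: Vec<String>,
}

impl Ingestor {
    fn new() -> Self {
        Ingestor { pending: VecDeque::new(), ingested: Vec::new() }
    }

    // Simulates a WAL append that has not been indexed yet.
    fn submit(&mut self, record: &str) {
        self.pending.push_back(record.to_string());
    }

    // Drains everything that is pending on the caller's thread and returns
    // how many records were ingested. There is no background task and no
    // polling loop, so the outcome does not depend on scheduler load.
    fn process_pending(&mut self) -> usize {
        let mut count = 0;
        while let Some(record) = self.pending.pop_front() {
            self.ingested.push(record);
            count += 1;
        }
        count
    }
}
```

A test written against this shape can assert on `ingested` immediately after `process_pending()` returns, with no timeout to tune.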
13
GEMINI.md
@ -12,7 +12,7 @@ It serves as the "Git for Truth," allowing agents to:
|
|||||||
## Tech Stack
|
## Tech Stack
|
||||||
* **Language:** Rust (2024 edition)
|
* **Language:** Rust (2024 edition)
|
||||||
* **Durability:** `stemedb-wal` (Quarantine Pattern with `fs2`, `blake3` checksums)
|
* **Durability:** `stemedb-wal` (Quarantine Pattern with `fs2`, `blake3` checksums)
|
||||||
* **Storage:** `stemedb-storage` (`sled` embedded KV, abstracted via `KVStore` trait)
|
* **Storage:** `stemedb-storage` (Hybrid Store: `fjall` LSM-tree for writes, `redb` B-tree for reads)
|
||||||
* **Serialization:** `rkyv` (Zero-copy deserialization for high performance)
|
* **Serialization:** `rkyv` (Zero-copy deserialization for high performance)
|
||||||
* **Ingestion:** `stemedb-ingest` (Async background worker bridging WAL and Store)
|
* **Ingestion:** `stemedb-ingest` (Async background worker bridging WAL and Store)
|
||||||
* **Simulation:** `stemedb-sim` (Agent-based modeling to verify system behavior)
|
* **Simulation:** `stemedb-sim` (Agent-based modeling to verify system behavior)
|
||||||
@ -25,12 +25,12 @@ The system follows a "Spine -> Lattice -> Cortex" architecture:
|
|||||||
* **Ingestor:** Background task that tails the WAL and indexes data.
|
* **Ingestor:** Background task that tails the WAL and indexes data.
|
||||||
* **KV Store:** Persistent storage for assertions and indexes.
|
* **KV Store:** Persistent storage for assertions and indexes.
|
||||||
|
|
||||||
2. **The Lattice (Connectivity) - *In Progress*:**
|
2. **The Lattice (Connectivity) - *Implemented*:**
|
||||||
* **Ballot Box:** High-velocity vote stream.
|
* **Ballot Box:** High-velocity vote stream.
|
||||||
* **Materialized Views:** Pre-computed truth states.
|
* **Materialized Views:** Pre-computed truth states.
|
||||||
|
|
||||||
3. **The Cortex (Reasoning) - *Planned*:**
|
3. **The Cortex (Reasoning) - *Implemented*:**
|
||||||
* **Lenses:** WASM-based filters for truth resolution.
|
* **Lenses:** WASM-based filters for truth resolution (Consensus, Authority, Recency, etc.).
|
||||||
* **SMT:** Sparse Merkle Trees for efficient branching.
|
* **SMT:** Sparse Merkle Trees for efficient branching.
|
||||||
|
|
||||||
## Key Files & Directories
|
## Key Files & Directories
|
||||||
@ -38,8 +38,9 @@ The system follows a "Spine -> Lattice -> Cortex" architecture:
|
|||||||
* `crates/`
|
* `crates/`
|
||||||
* `stemedb-core/`: Core data structures (`Assertion`, `Vote`, `Epoch`) and types.
|
* `stemedb-core/`: Core data structures (`Assertion`, `Vote`, `Epoch`) and types.
|
||||||
* `stemedb-wal/`: Durability primitives (`Journal`, `FsyncGuard`, `Record`).
|
* `stemedb-wal/`: Durability primitives (`Journal`, `FsyncGuard`, `Record`).
|
||||||
* `stemedb-storage/`: Storage engine abstraction and `sled` implementation.
|
* `stemedb-storage/`: Storage engine abstraction and Hybrid Store implementation.
|
||||||
* `stemedb-ingest/`: Async ingestion pipeline logic.
|
* `stemedb-ingest/`: Async ingestion pipeline logic.
|
||||||
|
* `stemedb-lens/`: Truth Lenses (`Recency`, `Consensus`, `Authority`, `Skeptic`).
|
||||||
* `stemedb-sim/`: "The Arena" simulation for end-to-end verification.
|
* `stemedb-sim/`: "The Arena" simulation for end-to-end verification.
|
||||||
* `architecture.md`: Detailed system design and data flow.
|
* `architecture.md`: Detailed system design and data flow.
|
||||||
* `roadmap.md`: Phased implementation plan and status.
|
* `roadmap.md`: Phased implementation plan and status.
|
||||||
@ -62,4 +63,4 @@ The project uses a `Makefile` for common tasks:
|
|||||||
* Zero warnings allowed.
|
* Zero warnings allowed.
|
||||||
* Missing documentation is a hard error.
|
* Missing documentation is a hard error.
|
||||||
* **Testing:** Every crate must have unit tests. The `stemedb-sim` crate serves as the integration test suite.
|
* **Testing:** Every crate must have unit tests. The `stemedb-sim` crate serves as the integration test suite.
|
||||||
* **Architecture:** Follow the "Defensive by Default" philosophy. Durability > Speed > Features.
|
* **Architecture:** Follow the "Defensive by Default" philosophy. Durability > Speed > Features.
|
||||||
@ -23,8 +23,8 @@ fn test_micro_team_sees_patterns() {
|
|||||||
// - Scale tier: Micro (1-5 projects)
|
// - Scale tier: Micro (1-5 projects)
|
||||||
// - Emerging min_projects: max(2, 0.50*3) = max(2, 1.5) = 2
|
// - Emerging min_projects: max(2, 0.50*3) = max(2, 1.5) = 2
|
||||||
// - Adoption rate: 2/3 = 67% >= 50%
|
// - Adoption rate: 2/3 = 67% >= 50%
|
||||||
// Should require review (emerging tier)
|
// Micro emerging auto-promotes for immediate visibility
|
||||||
assert_eq!(decision, PromotionDecision::RequireReview);
|
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Community));
|
||||||
}
|
}
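The arithmetic in the test comments above can be checked with a small helper. This is a sketch only: `meets_emerging_threshold` is a hypothetical name, the 0.50 rate and the floor of 2 are taken from the comments, and comparing against the unrounded threshold is an assumption, since the diff does not show how the real code rounds.

```rust
// Returns true when `adopting` out of `total` projects satisfies both
// conditions from the test comments:
//   min_projects = max(2, 0.50 * total)
//   adoption rate = adopting / total >= 50%
fn meets_emerging_threshold(adopting: u32, total: u32) -> bool {
    if total == 0 {
        return false;
    }
    let rate = 0.50_f64;
    // Assumption: compare against the unrounded threshold.
    let min_projects = f64::max(2.0, rate * total as f64);
    let adoption = adopting as f64 / total as f64;
    adopting as f64 >= min_projects && adoption >= rate
}
```

For the case in the test (2 of 3 projects), the threshold is max(2, 1.5) = 2 and the adoption rate is about 67%, so both conditions hold.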
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
@ -40,8 +40,8 @@ fn test_micro_team_regulatory_disabled() {
|
|||||||
);
|
);
|
||||||
|
|
||||||
// Regulatory tier is disabled for micro teams
|
// Regulatory tier is disabled for micro teams
|
||||||
// Should fall through to emerging tier
|
// Falls through to emerging tier, which auto-promotes for immediate visibility
|
||||||
assert_eq!(decision, PromotionDecision::RequireReview);
|
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Community));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
|
|||||||
@ -3,6 +3,7 @@ use tracing::instrument;
|
|||||||
|
|
||||||
use crate::llm::{chunk_text, create_client, deduplicate_claims, ChunkConfig, LlmConfig};
|
use crate::llm::{chunk_text, create_client, deduplicate_claims, ChunkConfig, LlmConfig};
|
||||||
use crate::types::{Claim, ClaimCheck, ClaimStatus};
|
use crate::types::{Claim, ClaimCheck, ClaimStatus};
|
||||||
|
use crate::stemedb::Client as StemeClient;
|
||||||
|
|
||||||
use super::settings::SettingsState;
|
use super::settings::SettingsState;
|
||||||
|
|
||||||
@ -90,28 +91,60 @@ fn map_llm_error_to_user_message(e: &crate::llm::LlmError) -> String {
|
|||||||
|
|
||||||
/// Check claims against the knowledge graph.
|
/// Check claims against the knowledge graph.
|
||||||
#[tauri::command]
|
#[tauri::command]
|
||||||
pub async fn check_claims(claims: Vec<Claim>) -> Result<Vec<ClaimCheck>, String> {
|
pub async fn check_claims(
|
||||||
|
state: State<'_, SettingsState>,
|
||||||
|
claims: Vec<Claim>,
|
||||||
|
) -> Result<Vec<ClaimCheck>, String> {
|
||||||
tracing::info!(count = claims.len(), "Checking claims");
|
tracing::info!(count = claims.len(), "Checking claims");
|
||||||
|
|
||||||
// TODO: Week 3 - Check against Episteme
|
let settings = state.0.lock().map_err(|e| format!("Failed to read settings: {}", e))?.clone();
|
||||||
Ok(claims
|
let client = StemeClient::new(settings.stemedb_url);
|
||||||
.into_iter()
|
|
||||||
.map(|claim| ClaimCheck { claim, status: ClaimStatus::New, related: vec![] })
|
let mut checks = Vec::new();
|
||||||
.collect())
|
for claim in claims {
|
||||||
|
match client.check_claim(&claim).await {
|
||||||
|
Ok(check) => checks.push(check),
|
||||||
|
Err(e) => {
|
||||||
|
tracing::warn!("Failed to check claim: {}", e);
|
||||||
|
// Return as new if check fails
|
||||||
|
checks.push(ClaimCheck {
|
||||||
|
claim,
|
||||||
|
status: ClaimStatus::New,
|
||||||
|
related: vec![],
|
||||||
|
});
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(checks)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Save claims to the knowledge graph.
|
/// Save claims to the knowledge graph.
|
||||||
#[tauri::command]
|
#[tauri::command]
|
||||||
pub async fn save_claims(claims: Vec<Claim>) -> Result<usize, String> {
|
pub async fn save_claims(
|
||||||
|
state: State<'_, SettingsState>,
|
||||||
|
claims: Vec<Claim>,
|
||||||
|
) -> Result<usize, String> {
|
||||||
let count = claims.len();
|
let count = claims.len();
|
||||||
tracing::info!(count, "Saving claims");
|
tracing::info!(count, "Saving claims");
|
||||||
// TODO: Week 3 - Save to Episteme
|
|
||||||
Ok(count)
|
let settings = state.0.lock().map_err(|e| format!("Failed to read settings: {}", e))?.clone();
|
||||||
|
let client = StemeClient::new(settings.stemedb_url);
|
||||||
|
|
||||||
|
let mut saved = 0;
|
||||||
|
for claim in claims {
|
||||||
|
match client.save_claim(&claim).await {
|
||||||
|
Ok(_) => saved += 1,
|
||||||
|
Err(e) => tracing::warn!("Failed to save claim: {}", e),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(saved)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Get the current claim count.
|
/// Get the current claim count.
|
||||||
#[tauri::command]
|
#[tauri::command]
|
||||||
pub async fn get_claim_count() -> Result<usize, String> {
|
pub async fn get_claim_count() -> Result<usize, String> {
|
||||||
// TODO: Week 3 - Query Episteme
|
// TODO: Implement stats endpoint in StemeDB
|
||||||
Ok(0)
|
Ok(0)
|
||||||
}
|
}
|
||||||
@ -3,6 +3,7 @@
|
|||||||
mod commands;
|
mod commands;
|
||||||
mod llm;
|
mod llm;
|
||||||
mod types;
|
mod types;
|
||||||
|
mod stemedb;
|
||||||
|
|
||||||
use commands::{
|
use commands::{
|
||||||
check_claims, extract_claims, get_claim_count, get_settings, save_claims, test_llm_connection,
|
check_claims, extract_claims, get_claim_count, get_settings, save_claims, test_llm_connection,
|
||||||
@ -41,4 +42,4 @@ pub fn run() {
|
|||||||
eprintln!("Failed to start Tauri application: {e}");
|
eprintln!("Failed to start Tauri application: {e}");
|
||||||
std::process::exit(1);
|
std::process::exit(1);
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
135
applications/disputed/app/src-tauri/src/stemedb.rs
Normal file
@ -0,0 +1,135 @@
use serde::{Deserialize, Serialize};
use crate::types::{Claim, ClaimCheck, ClaimStatus, RelatedClaim};

#[derive(Debug, Deserialize)]
pub struct QueryResponse {
    pub assertions: Vec<AssertionResponse>,
    pub conflict_score: Option<f32>,
}

#[derive(Debug, Deserialize)]
pub struct AssertionResponse {
    pub subject: String,
    pub predicate: String,
    pub object: ObjectValue,
    pub confidence: f32,
    pub source_class: String,
}

#[derive(Debug, Deserialize)]
#[serde(untagged)]
pub enum ObjectValue {
    Text(String),
    Number(f64),
    Boolean(bool),
    Link(String),
    Image(String),
}

// `fmt` belongs to `Display`, not `ToString`; implementing `Display`
// also provides `to_string()` through the standard blanket impl.
impl std::fmt::Display for ObjectValue {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            ObjectValue::Text(s) => write!(f, "{}", s),
            ObjectValue::Number(n) => write!(f, "{}", n),
            ObjectValue::Boolean(b) => write!(f, "{}", b),
            ObjectValue::Link(s) => write!(f, "{}", s),
            ObjectValue::Image(s) => write!(f, "[Image: {}]", s),
        }
    }
}

pub struct Client {
    url: String,
    http: reqwest::Client,
}

impl Client {
    pub fn new(url: String) -> Self {
        Self {
            url: url.trim_end_matches('/').to_string(),
            http: reqwest::Client::new(),
        }
    }

    pub async fn check_claim(&self, claim: &Claim) -> Result<ClaimCheck, String> {
        let url = format!("{}/v1/query", self.url);

        // Query using Skeptic lens to see conflicts
        let response = self.http.get(&url)
            .query(&[
                ("subject", &claim.subject),
                ("predicate", &claim.predicate),
                ("lens", &"Skeptic".to_string()),
            ])
            .send()
            .await
            .map_err(|e| format!("Request failed: {}", e))?;

        if !response.status().is_success() {
            return Err(format!("API error: {}", response.status()));
        }

        let data: QueryResponse = response.json().await
            .map_err(|e| format!("Parse error: {}", e))?;

        let status = if data.assertions.is_empty() {
            ClaimStatus::New
        } else if let Some(score) = data.conflict_score {
            if score > 0.5 {
                ClaimStatus::Contradicts
            } else {
                ClaimStatus::Matches
            }
        } else {
            ClaimStatus::Matches
        };

        let related = data.assertions.into_iter().map(|a| {
            RelatedClaim {
                claim: Claim {
                    subject: a.subject,
                    predicate: a.predicate,
                    object: a.object.to_string(),
                    confidence: a.confidence,
                    quote: "".to_string(),
                    source: Some(a.source_class),
                },
                relationship: "existing".to_string(),
                source: "stemedb".to_string(),
            }
        }).collect();

        Ok(ClaimCheck {
            claim: claim.clone(),
            status,
            related,
        })
    }

    pub async fn save_claim(&self, claim: &Claim) -> Result<(), String> {
        let url = format!("{}/v1/assert", self.url);

        let body = serde_json::json!({
            "subject": claim.subject,
            "predicate": claim.predicate,
            "object": {
                "type": "Text",
                "value": claim.object
            },
            "confidence": claim.confidence,
            "source_class": "Anecdotal", // Default for Disputed
        });

        let response = self.http.post(&url)
            .json(&body)
            .send()
            .await
            .map_err(|e| format!("Request failed: {}", e))?;

        if !response.status().is_success() {
            return Err(format!("API error: {}", response.status()));
        }

        Ok(())
    }
}
@ -48,6 +48,7 @@ pub struct Settings {
|
|||||||
pub api_key: Option<String>,
|
pub api_key: Option<String>,
|
||||||
pub auto_save: bool,
|
pub auto_save: bool,
|
||||||
pub notifications_enabled: bool,
|
pub notifications_enabled: bool,
|
||||||
|
pub stemedb_url: String,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Default for Settings {
|
impl Default for Settings {
|
||||||
@ -57,6 +58,7 @@ impl Default for Settings {
|
|||||||
api_key: None,
|
api_key: None,
|
||||||
auto_save: false,
|
auto_save: false,
|
||||||
notifications_enabled: true,
|
notifications_enabled: true,
|
||||||
|
stemedb_url: "http://localhost:18180".to_string(),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -1,7 +1,7 @@
|
|||||||
# Episteme (StemeDB) Architecture
|
# Episteme (StemeDB) Architecture
|
||||||
|
|
||||||
> **Design Philosophy:** Immutable History, Probabilistic Resolution, Materialized Speed.
|
> **Design Philosophy:** Immutable History, Probabilistic Resolution, Materialized Speed.
|
||||||
> **Status:** Draft Spec v1.1
|
> **Status:** Implementation v1.0
|
||||||
|
|
||||||
## 1. System Overview
|
## 1. System Overview
|
||||||
|
|
||||||
@ -82,22 +82,26 @@ struct TrustPack {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### 2.4. The Storage Layout (LSM Tree)
|
### 2.4. The Storage Layout (Hybrid Store)
|
||||||
|
|
||||||
| Key | Value | Purpose |
|
Episteme uses a **Hybrid Storage** architecture to balance write throughput and read latency:
|
||||||
| :--- | :--- | :--- |
|
* **Fjall (LSM-Tree):** Used for write-heavy, append-only data (Assertions, Votes, WAL).
|
||||||
| `H:{Hash}` | `Assertion` | Immutable Content Store |
|
* **Redb (B-Tree):** Used for read-heavy, random-access data (Indexes, Materialized Views).
|
||||||
| `V:{Hash}` | `List<Vote>` | The Ballot Box (Append-only) |
|
|
||||||
| `MV:{Subject}:{Predicate}` | `Assertion` | **Materialized View** (The "Winner") |
|
| Key | Value | Purpose | Backend |
|
||||||
| `TP:{PackID}` | `TrustPack` | Curation Lists |
|
| :--- | :--- | :--- | :--- |
|
||||||
| `S:{Subject}` | `List<Hash>` | Adjacency Index |
|
| `H:{Hash}` | `Assertion` | Immutable Content Store | Fjall |
|
||||||
|
| `V:{Hash}` | `List<Vote>` | The Ballot Box (Append-only) | Fjall |
|
||||||
|
| `MV:{Subject}:{Predicate}` | `Assertion` | **Materialized View** (The "Winner") | Redb |
|
||||||
|
| `TP:{PackID}` | `TrustPack` | Curation Lists | Redb |
|
||||||
|
| `S:{Subject}` | `List<Hash>` | Adjacency Index | Redb |
|
||||||
|
|
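The key table above implies a routing rule from key prefix to backend. A minimal sketch of that rule follows; `Backend` and `backend_for_key` are hypothetical names, and the mapping is read directly off the table rather than from the `stemedb-storage` source.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Fjall, // LSM-tree: write-heavy, append-only data
    Redb,  // B-tree: read-heavy, random-access data
}

// Maps a storage key to the backend listed for its prefix in the table.
// Returns None for keys without a known prefix.
fn backend_for_key(key: &str) -> Option<Backend> {
    // Split at the first ':' so that "MV:..." is not mistaken for "V:...".
    let (prefix, _rest) = key.split_once(':')?;
    match prefix {
        "H" | "V" => Some(Backend::Fjall),
        "MV" | "TP" | "S" => Some(Backend::Redb),
        _ => None,
    }
}
```

Matching on the whole prefix rather than using `starts_with` keeps the `V:` and `MV:` families from colliding.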
||||||
---
|
---
|
||||||
|
|
||||||
## 3. The Write Path (The Ballot Box)
|
## 3. The Write Path (The Ballot Box)
|
||||||
|
|
||||||
1. **Ingest:** Agents submit `Assertions` or `Votes`.
|
1. **Ingest:** Agents submit `Assertions` or `Votes`.
|
||||||
2. **Journal:** Written to `episteme-wal`.
|
2. **Journal:** Written to `episteme-wal` (Quarantine Pattern).
|
||||||
3. **Ballot Box:** Votes are appended to the `V:{Hash}` stream.
|
3. **Ballot Box:** Votes are appended to the `V:{Hash}` stream.
|
||||||
4. **Compactor (Async):** A background worker aggregates Votes + TrustRank to update the `MV` key.
|
4. **Compactor (Async):** A background worker aggregates Votes + TrustRank to update the `MV` key.
|
||||||
|
|
||||||
@ -119,12 +123,12 @@ struct TrustPack {
|
|||||||
4. Sum weights of remaining votes.
|
4. Sum weights of remaining votes.
|
||||||
* Cost: **O(1)** (if Materialized per Pack) or **O(M)** (Fast calculation).
|
* Cost: **O(1)** (if Materialized per Pack) or **O(M)** (Fast calculation).
|
||||||
|
|
||||||
### Standard Lenses
|
### Standard Lenses (Implemented)
|
||||||
* **Consensus:** Highest cluster density.
|
* **Consensus:** Highest cluster density (Vote-aware).
|
||||||
* **Authority:** Filter by **Trust Pack**.
|
* **Authority:** Filter by **Trust Pack** and **TrustRank**.
|
||||||
* **Recency:** Last Writer Wins.
|
* **Recency:** Last Writer Wins (Hybrid Logical Clock).
|
||||||
* **EpochAware:** Validates against current paradigm.
|
* **EpochAware:** Validates against current paradigm.
|
||||||
* **Constraints:** (New) Returns all `must_use`/`forbidden` assertions for a context. Acts as a "Pre-Flight Check."
|
* **Skeptic:** Surfaces conflicts and divergence.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -148,18 +152,19 @@ The system continuously exports data to train the next generation of Agents.
|
|||||||
## 7. Implementation Roadmap
|
## 7. Implementation Roadmap
|
||||||
|
|
||||||
### Phase 1: The Spine (Foundation)
|
### Phase 1: The Spine (Foundation)
|
||||||
* [ ] Reuse `quarantine-journal` pattern for WAL.
|
* [x] Reuse `quarantine-journal` pattern for WAL (`stemedb-wal`).
|
||||||
* [ ] Implement `Assertion`, `Epoch`, and **`Vote`** structs.
|
* [x] Implement `Assertion`, `Epoch`, and **`Vote`** structs (`stemedb-core`).
|
||||||
* [ ] Basic `sled` storage backend.
|
* [x] Hybrid Storage backend (`stemedb-storage`).
|
||||||
|
|
||||||
### Phase 2: The Lattice (Connectivity)
|
### Phase 2: The Lattice (Connectivity)
|
||||||
* [ ] **The Ballot Box**: Implement separate Vote storage stream.
|
* [x] **The Ballot Box**: Separate Vote storage stream.
|
||||||
* [ ] **Materializer**: Implement background worker to maintain `MV` keys.
|
* [x] **Materializer**: Background worker to maintain `MV` keys.
|
||||||
* [ ] **Trust Packs**: Implement BitSet/BloomFilter logic for agent sets.
|
* [x] **Trust Packs**: Agent sets for filtering.
|
||||||
* [ ] **The Meter**: Implement Budget/TAN middleware in Job Manager.
|
* [ ] **The Meter**: Implement Budget/TAN middleware in Job Manager.
|
||||||
* [ ] **Agent Wallet**: Sidecar for key management/signing.
|
* [ ] **Agent Wallet**: Sidecar for key management/signing.
|
||||||
|
|
||||||
### Phase 3: The Cortex (Reasoning)
|
### Phase 3: The Cortex (Reasoning)
|
||||||
|
* [x] **Lenses**: `Recency`, `Consensus`, `Authority`, `Skeptic` implemented (`stemedb-lens`).
|
||||||
* [ ] SMT Backend & Branching.
|
* [ ] SMT Backend & Branching.
|
||||||
* [ ] Vector Search.
|
* [ ] Vector Search.
|
||||||
* [ ] **Lens: Constraints**: Implement the pre-flight check logic.
|
* [ ] **Lens: Constraints**: Implement the pre-flight check logic.
|
||||||
@ -167,4 +172,4 @@ The system continuously exports data to train the next generation of Agents.
|
|||||||
### Phase 4: The Hive (Learning)
|
### Phase 4: The Hive (Learning)
|
||||||
* [ ] **The Simulator**: Log exporter pipeline.
|
* [ ] **The Simulator**: Log exporter pipeline.
|
||||||
* [ ] **Trust Marketplace**: API for publishing/subscribing to Trust Packs.
|
* [ ] **Trust Marketplace**: API for publishing/subscribing to Trust Packs.
|
||||||
* [ ] **The Super Curator**: Implement "Judge" agent with Visual Anchoring.
|
* [ ] **The Super Curator**: Implement "Judge" agent with Visual Anchoring.
|
||||||
|
|||||||
@ -100,7 +100,7 @@ pub enum SupersessionTypeDto {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Lens strategy for conflict resolution.
|
/// Lens strategy for conflict resolution.
|
||||||
#[derive(Debug, Clone, Copy, Serialize, Deserialize, ToSchema)]
|
#[derive(Debug, Clone, Copy, Serialize, Deserialize, ToSchema, PartialEq)]
|
||||||
#[serde(rename_all = "PascalCase")]
|
#[serde(rename_all = "PascalCase")]
|
||||||
pub enum LensDto {
|
pub enum LensDto {
|
||||||
/// Latest timestamp wins
|
/// Latest timestamp wins
|
||||||
@ -140,6 +140,10 @@ pub enum LensDto {
|
|||||||
/// Use for agent pre-flight checks: "What MUST I use? What's FORBIDDEN?"
|
/// Use for agent pre-flight checks: "What MUST I use? What's FORBIDDEN?"
|
||||||
/// Predicate patterns: `must_use:*`, `forbidden:*`, `prefer:*`
|
/// Predicate patterns: `must_use:*`, `forbidden:*`, `prefer:*`
|
||||||
Constraints,
|
Constraints,
|
||||||
|
|
||||||
|
/// Surfaces all claims with conflict score calculation.
|
||||||
|
/// Use for "Trust but Verify" dashboards and Cognitive Firewall overlays.
|
||||||
|
Skeptic,
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Agent signature entry.
|
/// Agent signature entry.
|
||||||
|
|||||||
@ -63,8 +63,8 @@ pub async fn scan(
|
|||||||
file_source: aphoria::FileSource::All,
|
file_source: aphoria::FileSource::All,
|
||||||
benchmark: false,
|
benchmark: false,
|
||||||
show_claims: false,
|
show_claims: false,
|
||||||
strict: false,
|
|
||||||
show_observations: false,
|
show_observations: false,
|
||||||
|
strict: false,
|
||||||
};
|
};
|
||||||
|
|
||||||
// Execute scan
|
// Execute scan
|
||||||
|
|||||||
@ -9,6 +9,7 @@ use crate::{
|
|||||||
hex,
|
hex,
|
||||||
state::AppState,
|
state::AppState,
|
||||||
};
|
};
|
||||||
|
use stemedb_storage::MemTableEntry;
|
||||||
|
|
||||||
use stemedb_core::limits::MAX_NARRATIVE_LEN;
|
use stemedb_core::limits::MAX_NARRATIVE_LEN;
|
||||||
use stemedb_core::types::{
|
use stemedb_core::types::{
|
||||||
@ -68,7 +69,12 @@ pub async fn create_assertion(
|
|||||||
let hash = blake3::hash(&serialized_assertion);
|
let hash = blake3::hash(&serialized_assertion);
|
||||||
|
|
||||||
// Append to WAL via group commit buffer
|
// Append to WAL via group commit buffer
|
||||||
state.commit_buffer.append(payload).await?;
|
let wal_offset = state.commit_buffer.append(payload).await?;
|
||||||
|
|
||||||
|
// Insert into MemTable for immediate visibility (read-your-writes)
|
||||||
|
// This must happen AFTER WAL commit to maintain durability guarantees
|
||||||
|
let entry = MemTableEntry::new(assertion, *hash.as_bytes(), wal_offset);
|
||||||
|
state.memtable.insert(entry);
|
||||||
|
|
||||||
metrics::counter!("stemedb_assertions_ingested_total").increment(1);
|
metrics::counter!("stemedb_assertions_ingested_total").increment(1);
|
||||||
|
|
||||||
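The ordering in the hunk above (WAL append first, MemTable insert second) is what keeps read-your-writes from weakening durability. The sketch below shows that ordering with in-memory stand-ins. `Wal`, `MemTable`, and `write_assertion` are hypothetical; the real `commit_buffer` and `MemTableEntry` types are not shown in the diff.

```rust
use std::collections::HashMap;

// Hypothetical in-memory WAL: the offset is simply the record index.
struct Wal {
    records: Vec<Vec<u8>>,
}

impl Wal {
    fn append(&mut self, payload: Vec<u8>) -> Result<u64, String> {
        if payload.is_empty() {
            return Err("empty payload".to_string());
        }
        self.records.push(payload);
        Ok((self.records.len() - 1) as u64)
    }
}

// Hypothetical MemTable: content hash -> WAL offset.
struct MemTable {
    entries: HashMap<[u8; 32], u64>,
}

fn write_assertion(
    wal: &mut Wal,
    mem: &mut MemTable,
    hash: [u8; 32],
    payload: Vec<u8>,
) -> Result<u64, String> {
    // Durability first: if the append fails, `?` returns early and the
    // MemTable is never touched.
    let wal_offset = wal.append(payload)?;
    // Only now make the write visible to readers.
    mem.entries.insert(hash, wal_offset);
    Ok(wal_offset)
}
```

Reversing the two steps would let a reader observe an assertion that is lost if the append then fails.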
|
|||||||
@ -31,8 +31,8 @@ struct CandidateMetadata {
|
|||||||
lifecycle: LifecycleStage,
|
lifecycle: LifecycleStage,
|
||||||
}
|
}
|
||||||
use stemedb_lens::{
|
use stemedb_lens::{
|
||||||
AsyncLens, ConfidenceLens, ConsensusLens, EpochAwareLens, Lens, RecencyLens,
|
AnalysisLens, AsyncLens, ConfidenceLens, ConsensusLens, EpochAwareLens, Lens, RecencyLens,
|
||||||
TrustAwareAuthorityLens, VoteAwareConsensusLens,
|
SkepticLens, TrustAwareAuthorityLens, VoteAwareConsensusLens,
|
||||||
};
|
};
|
||||||
use stemedb_query::Query;
|
use stemedb_query::Query;
|
||||||
use stemedb_storage::{AuditStore, GenericAuditStore, GenericTrustRankStore, GenericVoteStore};
|
use stemedb_storage::{AuditStore, GenericAuditStore, GenericTrustRankStore, GenericVoteStore};
|
||||||
@ -428,6 +428,15 @@ async fn apply_lens_with_confidence(
|
|||||||
let lens = EpochAwareLens::with_recency(store);
|
let lens = EpochAwareLens::with_recency(store);
|
||||||
lens.resolve_async(&assertions).await
|
lens.resolve_async(&assertions).await
|
||||||
}
|
}
|
||||||
|
LensDto::Skeptic => {
|
||||||
|
// SkepticLens returns all assertions with a conflict score, not a single winner.
|
||||||
|
// Used for "Trust but Verify" dashboards and Cognitive Firewall overlays.
|
||||||
|
let vote_store = std::sync::Arc::new(GenericVoteStore::new(store.clone()));
|
||||||
|
let trust_store = std::sync::Arc::new(GenericTrustRankStore::new(store));
|
||||||
|
let lens = SkepticLens::new(vote_store, trust_store);
|
||||||
|
let analysis = lens.analyze(&assertions).await;
|
||||||
|
return Ok((assertions, 1.0, analysis.conflict_score));
|
||||||
|
}
|
||||||
LensDto::LayeredConsensus => {
|
LensDto::LayeredConsensus => {
|
||||||
// LayeredConsensus returns a different response type with per-tier results.
|
// LayeredConsensus returns a different response type with per-tier results.
|
||||||
// Use the dedicated /v1/layered endpoint for this lens.
|
// Use the dedicated /v1/layered endpoint for this lens.
|
||||||
|
|||||||
@ -1,21 +1,4 @@
|
|||||||
//! Episteme (StemeDB) API server binary.
|
//! Episteme (StemeDB) API server binary.
|
||||||
//!
|
|
||||||
//! This starts the HTTP API server with the following components:
|
|
||||||
//! 1. Opens Journal (WAL) for writes (via GroupCommitBuffer) and reads
|
|
||||||
//! 2. Opens HybridStore (KV storage)
|
|
||||||
//! 3. Spawns IngestWorker background task to tail WAL
|
|
||||||
//! 4. Starts axum HTTP server with OpenAPI documentation
|
|
||||||
//! 5. Optionally enables The Meter (economic throttling)
|
|
||||||
//!
|
|
||||||
//! # Environment Variables
|
|
||||||
//!
|
|
||||||
//! | Variable | Default | Description |
|
|
||||||
//! |----------|---------|-------------|
|
|
||||||
//! | `STEMEDB_WAL_DIR` | `data/wal` | Directory for WAL files |
|
|
||||||
//! | `STEMEDB_DB_DIR` | `data/db` | Directory for KV store |
|
|
||||||
//! | `STEMEDB_BIND_ADDR` | `127.0.0.1:18180` | HTTP server bind address |
|
|
||||||
//! | `STEMEDB_METER_ENABLED` | `true` | Enable economic throttling |
|
|
||||||
//! | `STEMEDB_CORPUS_DB_DIR` | (none) | Optional: Directory for Aphoria corpus DB |
|
|
||||||
|
|
||||||
use std::net::SocketAddr;
|
use std::net::SocketAddr;
|
||||||
use std::path::PathBuf;
|
use std::path::PathBuf;
|
||||||
@ -38,19 +21,10 @@ use std::path::Path;
|
|||||||
/// Server configuration.
|
/// Server configuration.
|
||||||
#[derive(Debug, Clone)]
|
#[derive(Debug, Clone)]
|
||||||
struct Config {
|
struct Config {
|
||||||
/// Directory for WAL files
|
|
||||||
wal_dir: PathBuf,
|
wal_dir: PathBuf,
|
||||||
|
|
||||||
/// Directory for KV store
|
|
||||||
db_dir: PathBuf,
|
db_dir: PathBuf,
|
||||||
|
|
||||||
/// HTTP server bind address
|
|
||||||
bind_addr: String,
|
bind_addr: String,
|
||||||
|
|
||||||
/// Enable economic throttling (The Meter)
|
|
||||||
meter_enabled: bool,
|
meter_enabled: bool,
|
||||||
|
|
||||||
/// Optional corpus database directory (for Aphoria corpus)
|
|
||||||
corpus_db_dir: Option<PathBuf>,
|
corpus_db_dir: Option<PathBuf>,
|
||||||
|
|
||||||
/// TLS certificate path (optional - enables HTTPS)
|
/// TLS certificate path (optional - enables HTTPS)
|
||||||
@ -66,6 +40,9 @@ struct Config {
|
|||||||
read_body_limit: usize,
|
read_body_limit: usize,
|
||||||
/// HTTP request timeout in seconds (default: 30)
|
/// HTTP request timeout in seconds (default: 30)
|
||||||
http_timeout_secs: u64,
|
http_timeout_secs: u64,
|
||||||
|
|
||||||
|
/// Skip Ed25519 signature verification (unsafe, for dev/testing only)
|
||||||
|
unsafe_skip_signatures: bool,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Default for Config {
|
impl Default for Config {
|
||||||
@ -82,6 +59,7 @@ impl Default for Config {
|
|||||||
write_body_limit: 1024 * 1024, // 1MB
|
write_body_limit: 1024 * 1024, // 1MB
|
||||||
read_body_limit: 64 * 1024, // 64KB
|
read_body_limit: 64 * 1024, // 64KB
|
||||||
http_timeout_secs: 30,
|
http_timeout_secs: 30,
|
||||||
|
unsafe_skip_signatures: false,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -98,26 +76,20 @@ impl Config {
|
|||||||
}
|
}
|
||||||
|
|
||||||
impl Config {
|
impl Config {
|
||||||
/// Load configuration from environment variables.
|
|
||||||
fn from_env() -> Self {
|
fn from_env() -> Self {
|
||||||
let mut config = Self::default();
|
let mut config = Self::default();
|
||||||
|
|
||||||
if let Ok(wal_dir) = std::env::var("STEMEDB_WAL_DIR") {
|
if let Ok(wal_dir) = std::env::var("STEMEDB_WAL_DIR") {
|
||||||
config.wal_dir = PathBuf::from(wal_dir);
|
config.wal_dir = PathBuf::from(wal_dir);
|
||||||
}
|
}
|
||||||
|
|
||||||
if let Ok(db_dir) = std::env::var("STEMEDB_DB_DIR") {
|
if let Ok(db_dir) = std::env::var("STEMEDB_DB_DIR") {
|
||||||
config.db_dir = PathBuf::from(db_dir);
|
config.db_dir = PathBuf::from(db_dir);
|
||||||
}
|
}
|
||||||
|
|
||||||
if let Ok(bind_addr) = std::env::var("STEMEDB_BIND_ADDR") {
|
if let Ok(bind_addr) = std::env::var("STEMEDB_BIND_ADDR") {
|
||||||
config.bind_addr = bind_addr;
|
config.bind_addr = bind_addr;
|
||||||
}
|
}
|
||||||
|
|
||||||
if let Ok(meter_enabled) = std::env::var("STEMEDB_METER_ENABLED") {
|
if let Ok(meter_enabled) = std::env::var("STEMEDB_METER_ENABLED") {
|
||||||
config.meter_enabled = meter_enabled.to_lowercase() != "false" && meter_enabled != "0";
|
config.meter_enabled = meter_enabled.to_lowercase() != "false" && meter_enabled != "0";
|
||||||
}
|
}
|
||||||
|
|
||||||
if let Ok(corpus_db_dir) = std::env::var("STEMEDB_CORPUS_DB_DIR") {
|
if let Ok(corpus_db_dir) = std::env::var("STEMEDB_CORPUS_DB_DIR") {
|
||||||
config.corpus_db_dir = Some(PathBuf::from(corpus_db_dir));
|
config.corpus_db_dir = Some(PathBuf::from(corpus_db_dir));
|
||||||
}
|
}
|
||||||
@ -149,6 +121,10 @@ impl Config {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if let Ok(val) = std::env::var("STEMEDB_UNSAFE_SKIP_SIGNATURES") {
|
||||||
|
config.unsafe_skip_signatures = val.to_lowercase() == "true" || val == "1";
|
||||||
|
}
|
||||||
|
|
||||||
config
|
config
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
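The two boolean flags parsed in `from_env` follow opposite conventions: `STEMEDB_METER_ENABLED` treats any value other than `false` or `0` as enabled, while `STEMEDB_UNSAFE_SKIP_SIGNATURES` treats only `true` or `1` as enabled. A minimal sketch of both rules follows. The helper names `parse_opt_out` and `parse_opt_in` are hypothetical and do not appear in this commit; the comparisons are copied from the diff above.

```rust
/// Opt-out rule: every value enables the flag except "false"
/// (case-insensitive) or "0". Same comparison as STEMEDB_METER_ENABLED.
fn parse_opt_out(val: &str) -> bool {
    val.to_lowercase() != "false" && val != "0"
}

/// Opt-in rule: only "true" (case-insensitive) or "1" enables the flag.
/// Same comparison as STEMEDB_UNSAFE_SKIP_SIGNATURES.
fn parse_opt_in(val: &str) -> bool {
    val.to_lowercase() == "true" || val == "1"
}
```

Under these rules a mistyped value such as `STEMEDB_UNSAFE_SKIP_SIGNATURES=ture` leaves signature verification on, whereas a mistyped `STEMEDB_METER_ENABLED=flase` still enables the meter.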
@ -169,72 +145,42 @@ async fn load_tls_config(
|
|||||||
|
|
||||||
#[tokio::main]
|
#[tokio::main]
|
||||||
async fn main() -> Result<(), Box<dyn std::error::Error>> {
|
async fn main() -> Result<(), Box<dyn std::error::Error>> {
|
||||||
// Initialize tracing
|
let env_filter = tracing_subscriber::EnvFilter::try_from_default_env()
|
||||||
let env_filter = match tracing_subscriber::EnvFilter::try_from_default_env() {
|
.unwrap_or_else(|_| "stemedb_api=debug,tower_http=debug".into());
|
||||||
Ok(filter) => filter,
|
|
||||||
Err(_) => "stemedb_api=debug,tower_http=debug".into(),
|
|
||||||
};
|
|
||||||
|
|
||||||
tracing_subscriber::registry().with(env_filter).with(tracing_subscriber::fmt::layer()).init();
|
tracing_subscriber::registry().with(env_filter).with(tracing_subscriber::fmt::layer()).init();
|
||||||
|
|
||||||
// Initialize Prometheus metrics recorder (must be done before any metrics are recorded)
|
let prometheus_handle = Arc::new(PrometheusBuilder::new().install_recorder()?);
|
||||||
let prometheus_handle = PrometheusBuilder::new()
|
|
||||||
.install_recorder()
|
|
||||||
.map_err(|e| format!("Failed to install Prometheus recorder: {e}"))?;
|
|
||||||
let prometheus_handle = Arc::new(prometheus_handle);
|
|
||||||
info!("Prometheus metrics recorder initialized");
|
|
||||||
|
|
||||||
let config = Config::from_env();
|
let config = Config::from_env();
|
||||||
|
|
||||||
info!("Starting Episteme (StemeDB) API server");
|
|
||||||
info!(?config, "Configuration loaded");
|
|
||||||
|
|
||||||
// Ensure directories exist
|
|
||||||
std::fs::create_dir_all(&config.wal_dir)?;
|
std::fs::create_dir_all(&config.wal_dir)?;
|
||||||
std::fs::create_dir_all(&config.db_dir)?;
|
std::fs::create_dir_all(&config.db_dir)?;
|
||||||
|
|
||||||
// Open write Journal (owned by GroupCommitBuffer)
|
|
||||||
info!("Opening write Journal at {:?}", config.wal_dir);
|
|
||||||
let write_journal = Journal::open(&config.wal_dir)?;
|
let write_journal = Journal::open(&config.wal_dir)?;
|
||||||
|
|
||||||
// Open read Journal (for IngestWorker to tail)
|
|
||||||
info!("Opening read Journal at {:?}", config.wal_dir);
|
|
||||||
let read_journal = Journal::open(&config.wal_dir)?;
|
let read_journal = Journal::open(&config.wal_dir)?;
|
||||||
|
|
||||||
info!("Opening HybridStore at {:?}", config.db_dir);
|
|
||||||
let store = Arc::new(HybridStore::open(&config.db_dir)?);
|
let store = Arc::new(HybridStore::open(&config.db_dir)?);
|
||||||
|
let corpus_store = config.corpus_db_dir.as_ref().map(|d| {
|
||||||
|
let _ = std::fs::create_dir_all(d);
|
||||||
|
Arc::new(HybridStore::open(d).unwrap())
|
||||||
|
});
|
||||||
|
|
||||||
// Open optional corpus store (for Aphoria corpus)
|
|
||||||
let corpus_store = if let Some(ref corpus_dir) = config.corpus_db_dir {
|
|
||||||
// Ensure corpus directory exists
|
|
||||||
std::fs::create_dir_all(corpus_dir)?;
|
|
||||||
info!("Opening corpus HybridStore at {:?}", corpus_dir);
|
|
||||||
Some(Arc::new(HybridStore::open(corpus_dir)?))
|
|
||||||
} else {
|
|
||||||
info!("No separate corpus DB configured, using main store for corpus queries");
|
|
||||||
None
|
|
||||||
};
|
|
||||||
|
|
||||||
// Create application state (initializes GroupCommitBuffer)
|
|
||||||
let state = AppState::new(write_journal, read_journal, Arc::clone(&store), corpus_store);
|
let state = AppState::new(write_journal, read_journal, Arc::clone(&store), corpus_store);
|
||||||
|
|
||||||
// Spawn IngestWorker background task (uses read journal)
|
|
||||||
info!("Spawning IngestWorker background task");
|
|
||||||
let worker_journal = state.journal.clone();
|
let worker_journal = state.journal.clone();
|
||||||
let worker_store = store;
|
let worker_store = store;
|
||||||
let worker_flush_notify = Arc::clone(&state.flush_notify);
|
let worker_flush_notify = Arc::clone(&state.flush_notify);
|
||||||
|
let skip_sigs = config.unsafe_skip_signatures;
|
||||||
|
|
||||||
|
let worker_memtable = Arc::clone(&state.memtable);
|
||||||
tokio::spawn(async move {
|
tokio::spawn(async move {
|
||||||
let worker_result = IngestWorker::new(worker_journal, worker_store).await;
|
match IngestWorker::new(worker_journal, worker_store).await {
|
||||||
match worker_result {
|
|
||||||
Ok(worker) => {
|
Ok(worker) => {
|
||||||
// Wire up flush notification so IngestWorker reacts immediately to new data
|
let mut worker = worker
|
||||||
let mut worker = worker.with_flush_notify(worker_flush_notify);
|
.with_flush_notify(worker_flush_notify)
|
||||||
info!("IngestWorker started with flush notification, entering run loop");
|
.with_memtable(worker_memtable)
|
||||||
|
.with_skip_signature_verification(skip_sigs);
|
||||||
|
info!(skip_signatures = skip_sigs, "IngestWorker started");
|
||||||
worker.run().await;
|
worker.run().await;
|
||||||
}
|
}
|
||||||
Err(e) => {
|
Err(e) => error!("Failed to create IngestWorker: {:?}", e),
|
||||||
error!("Failed to create IngestWorker: {:?}", e);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
|
|||||||
@ -7,7 +7,7 @@ use stemedb_query::QueryEngine;
|
|||||||
use stemedb_storage::{
|
use stemedb_storage::{
|
||||||
CircuitBreakerConfig, GenericAdmissionStore, GenericAliasStore, GenericApiKeyStore,
|
CircuitBreakerConfig, GenericAdmissionStore, GenericAliasStore, GenericApiKeyStore,
|
||||||
GenericCircuitBreakerStore, GenericEscalationStore, GenericQuarantineStore, GenericQuotaStore,
|
GenericCircuitBreakerStore, GenericEscalationStore, GenericQuarantineStore, GenericQuotaStore,
|
||||||
GenericTrustRankStore, HybridStore,
|
GenericTrustRankStore, HybridStore, MemTable,
|
||||||
};
|
};
|
||||||
use stemedb_wal::group_commit::{GroupCommitBuffer, GroupCommitConfig};
|
use stemedb_wal::group_commit::{GroupCommitBuffer, GroupCommitConfig};
|
||||||
use stemedb_wal::Journal;
|
use stemedb_wal::Journal;
|
||||||
@ -81,6 +81,10 @@ pub struct AppState {
|
|||||||
/// API key store for authentication (P4.2)
|
/// API key store for authentication (P4.2)
|
||||||
pub api_key_store: Arc<ApiKeyStoreImpl>,
|
pub api_key_store: Arc<ApiKeyStoreImpl>,
|
||||||
|
|
||||||
|
/// MemTable for read-your-writes consistency.
|
||||||
|
/// Assertions are inserted here after WAL commit, before KVStore indexing.
|
||||||
|
pub memtable: Arc<MemTable>,
|
||||||
|
|
||||||
/// Notification channel for signaling IngestWorker when new data is flushed.
|
/// Notification channel for signaling IngestWorker when new data is flushed.
|
||||||
///
|
///
|
||||||
/// When GroupCommitBuffer successfully flushes a batch, it signals this
|
/// When GroupCommitBuffer successfully flushes a batch, it signals this
|
||||||
@ -149,6 +153,9 @@ impl AppState {
|
|||||||
// Create API key store for authentication (P4.2)
|
// Create API key store for authentication (P4.2)
|
||||||
let api_key_store = Arc::new(GenericApiKeyStore::new(Arc::clone(&store)));
|
let api_key_store = Arc::new(GenericApiKeyStore::new(Arc::clone(&store)));
|
||||||
|
|
||||||
|
// Create MemTable for read-your-writes consistency
|
||||||
|
let memtable = Arc::new(MemTable::new(10_000));
|
||||||
|
|
||||||
Self {
|
Self {
|
||||||
commit_buffer,
|
commit_buffer,
|
||||||
journal,
|
journal,
|
||||||
@ -162,6 +169,7 @@ impl AppState {
|
|||||||
quarantine_store,
|
quarantine_store,
|
||||||
circuit_breaker_store,
|
circuit_breaker_store,
|
||||||
api_key_store,
|
api_key_store,
|
||||||
|
memtable,
|
||||||
flush_notify,
|
flush_notify,
|
||||||
#[cfg(feature = "aphoria")]
|
#[cfg(feature = "aphoria")]
|
||||||
scan_cache: ScanCache::new(),
|
scan_cache: ScanCache::new(),
|
||||||
@ -171,7 +179,8 @@ impl AppState {
|
|||||||
/// Get a QueryEngine for this state.
|
/// Get a QueryEngine for this state.
|
||||||
///
|
///
|
||||||
/// Creates a new QueryEngine each time since it cannot be cloned.
|
/// Creates a new QueryEngine each time since it cannot be cloned.
|
||||||
|
/// Attaches the MemTable for read-your-writes consistency.
|
||||||
pub fn query_engine(&self) -> QueryEngine<HybridStore> {
|
pub fn query_engine(&self) -> QueryEngine<HybridStore> {
|
||||||
QueryEngine::new(self.store.clone())
|
QueryEngine::new(self.store.clone()).with_memtable(Arc::clone(&self.memtable))
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
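The doc comments above say that assertions enter the MemTable after the WAL commit and before KVStore indexing, and that `query_engine()` attaches the MemTable so queries can see them. The diff does not show the MemTable API itself, so the following is only an illustrative sketch of that idea. `MemTableSketch`, `Record`, and `merge_query` are hypothetical names, and treating the argument of `MemTable::new(10_000)` as an entry capacity is an assumption.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical record type; the real assertion type carries more fields.
#[derive(Clone, Debug, PartialEq)]
struct Record {
    hash: u64,
    subject: String,
}

/// Bounded buffer of recently committed records (illustrative only).
/// Assumes capacity > 0.
struct MemTableSketch {
    capacity: usize,
    recent: VecDeque<Record>,
}

impl MemTableSketch {
    fn new(capacity: usize) -> Self {
        Self { capacity, recent: VecDeque::new() }
    }

    /// Called once the WAL commit has succeeded; evicts the oldest
    /// record when the buffer is full.
    fn insert(&mut self, record: Record) {
        if self.recent.len() == self.capacity {
            self.recent.pop_front();
        }
        self.recent.push_back(record);
    }

    /// Returns the indexed results plus any buffered records for the
    /// subject that the index has not returned yet (de-duplicated by hash).
    fn merge_query(&self, subject: &str, indexed: Vec<Record>) -> Vec<Record> {
        let mut seen: HashSet<u64> = indexed.iter().map(|r| r.hash).collect();
        let mut out = indexed;
        for r in self.recent.iter().filter(|r| r.subject == subject) {
            if seen.insert(r.hash) {
                out.push(r.clone());
            }
        }
        out
    }
}
```

With this shape, a query issued right after a write finds the record in the buffer even when the background index is still empty, and once the index catches up the hash check keeps the record from appearing twice.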
|
|||||||
@ -207,7 +207,8 @@ async fn test_quota_consumption_with_meter() {
|
|||||||
|
|
||||||
let app = create_router_with_meter(state);
|
let app = create_router_with_meter(state);
|
||||||
|
|
||||||
let (agent_id, signature) = common::sign_message("test");
|
// v1 signature: sign "subject:predicate"
|
||||||
|
let (agent_id, signature) = common::sign_message("QuotaTest:test");
|
||||||
let agent_id_hex = hex::encode(agent_id);
|
let agent_id_hex = hex::encode(agent_id);
|
||||||
|
|
||||||
// Set a low quota limit for testing
|
// Set a low quota limit for testing
|
||||||
|
|||||||
236
crates/stemedb-api/tests/http_read_your_writes.rs
Normal file
@ -0,0 +1,236 @@
|
|||||||
|
//! HTTP integration tests for read-your-writes consistency.
|
||||||
|
//!
|
||||||
|
//! These tests verify that after POST /assert returns 201, an immediate
|
||||||
|
//! GET /query returns the assertion without waiting for background indexing.
|
||||||
|
//!
|
||||||
|
//! The MemTable provides this consistency by storing assertions after WAL
|
||||||
|
//! commit, and merging them with KVStore results during query.
|
||||||
|
|
||||||
|
#![allow(clippy::expect_used)]
|
||||||
|
|
||||||
|
mod common;
|
||||||
|
|
||||||
|
use axum::{
|
||||||
|
body::Body,
|
||||||
|
http::{Request, StatusCode},
|
||||||
|
};
|
||||||
|
use tower::ServiceExt;
|
||||||
|
|
||||||
|
use stemedb_api::create_router;
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Read-Your-Writes Consistency Tests
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_read_your_writes_immediate() {
|
||||||
|
let env = common::create_test_env().await;
|
||||||
|
let app = create_router(env.state);
|
||||||
|
|
||||||
|
// Create a unique subject for this test
|
||||||
|
let subject = format!(
|
||||||
|
"TestSubject_{}",
|
||||||
|
std::time::SystemTime::now()
|
||||||
|
.duration_since(std::time::UNIX_EPOCH)
|
||||||
|
.expect("time")
|
||||||
|
.as_nanos()
|
||||||
|
);
|
||||||
|
let predicate = "test_predicate";
|
||||||
|
|
||||||
|
// 1. POST /assert with unique subject
|
||||||
|
let assertion_json = common::create_signed_assertion_json(&subject, predicate, 42.0);
|
||||||
|
|
||||||
|
let request = Request::builder()
|
||||||
|
.uri("/v1/assert")
|
||||||
|
.method("POST")
|
||||||
|
.header("Content-Type", "application/json")
|
||||||
|
.body(Body::from(assertion_json.to_string()))
|
||||||
|
.expect("Request");
|
||||||
|
|
||||||
|
let response = app.clone().oneshot(request).await.expect("Request");
|
||||||
|
assert_eq!(response.status(), StatusCode::CREATED, "POST /assert should return 201");
|
||||||
|
|
||||||
|
// Parse the response to get the hash
|
||||||
|
let body = axum::body::to_bytes(response.into_body(), usize::MAX).await.expect("Body");
|
||||||
|
let create_response: serde_json::Value = serde_json::from_slice(&body).expect("JSON");
|
||||||
|
let created_hash = create_response["hash"].as_str().expect("hash field");
|
||||||
|
|
||||||
|
// 2. IMMEDIATELY query for the same subject (no sleep!)
|
||||||
|
let query_uri = format!("/v1/query?subject={}", subject);
|
||||||
|
let request =
|
||||||
|
Request::builder().uri(&query_uri).method("GET").body(Body::empty()).expect("Request");
|
||||||
|
|
||||||
|
let response = app.clone().oneshot(request).await.expect("Request");
|
||||||
|
assert_eq!(response.status(), StatusCode::OK, "GET /query should return 200");
|
||||||
|
|
||||||
|
let body = axum::body::to_bytes(response.into_body(), usize::MAX).await.expect("Body");
|
||||||
|
let query_result: serde_json::Value = serde_json::from_slice(&body).expect("JSON");
|
||||||
|
|
||||||
|
// 3. Assert the assertion is returned immediately
|
||||||
|
let assertions = query_result["assertions"].as_array().expect("assertions array");
|
||||||
|
assert!(!assertions.is_empty(), "Query should return the just-created assertion immediately");
|
||||||
|
|
||||||
|
// Verify we got the correct assertion
|
||||||
|
let found = assertions.iter().any(|a| a["subject"].as_str() == Some(subject.as_str()));
|
||||||
|
assert!(found, "The queried assertion should match the created subject");
|
||||||
|
|
||||||
|
// Note: resolved_hash is only set for MV hits and may be absent for MemTable lookups,
|
||||||
|
// so we only verify that the assertion was found by subject.
|
||||||
|
let _ = created_hash; // Suppress unused variable warning
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_read_your_writes_with_predicate() {
|
||||||
|
let env = common::create_test_env().await;
|
||||||
|
let app = create_router(env.state);
|
||||||
|
|
||||||
|
// Create unique identifiers
|
||||||
|
let subject = format!(
|
||||||
|
"Company_{}",
|
||||||
|
std::time::SystemTime::now()
|
||||||
|
.duration_since(std::time::UNIX_EPOCH)
|
||||||
|
.expect("time")
|
||||||
|
.as_nanos()
|
||||||
|
);
|
||||||
|
let predicate = "revenue";
|
||||||
|
|
||||||
|
// 1. POST /assert
|
||||||
|
let assertion_json = common::create_signed_assertion_json(&subject, predicate, 1_000_000.0);
|
||||||
|
|
||||||
|
let request = Request::builder()
|
||||||
|
.uri("/v1/assert")
|
||||||
|
.method("POST")
|
||||||
|
.header("Content-Type", "application/json")
|
||||||
|
.body(Body::from(assertion_json.to_string()))
|
||||||
|
.expect("Request");
|
||||||
|
|
||||||
|
let response = app.clone().oneshot(request).await.expect("Request");
|
||||||
|
assert_eq!(response.status(), StatusCode::CREATED);
|
||||||
|
|
||||||
|
// 2. IMMEDIATELY query with both subject and predicate
|
||||||
|
let query_uri = format!("/v1/query?subject={}&predicate={}", subject, predicate);
|
||||||
|
let request =
|
||||||
|
Request::builder().uri(&query_uri).method("GET").body(Body::empty()).expect("Request");
|
||||||
|
|
||||||
|
let response = app.clone().oneshot(request).await.expect("Request");
|
||||||
|
assert_eq!(response.status(), StatusCode::OK);
|
||||||
|
|
||||||
|
let body = axum::body::to_bytes(response.into_body(), usize::MAX).await.expect("Body");
|
||||||
|
let query_result: serde_json::Value = serde_json::from_slice(&body).expect("JSON");
|
||||||
|
|
||||||
|
// 3. Verify assertion is found
|
||||||
|
let assertions = query_result["assertions"].as_array().expect("assertions array");
|
||||||
|
assert!(!assertions.is_empty(), "Query should find the assertion immediately");
|
||||||
|
|
||||||
|
let found = assertions.iter().any(|a| {
|
||||||
|
a["subject"].as_str() == Some(subject.as_str())
|
||||||
|
&& a["predicate"].as_str() == Some(predicate)
|
||||||
|
});
|
||||||
|
assert!(found, "The assertion should match subject and predicate");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_read_your_writes_multiple_assertions() {
|
||||||
|
let env = common::create_test_env().await;
|
||||||
|
let app = create_router(env.state);
|
||||||
|
|
||||||
|
// Create unique subject
|
||||||
|
let subject = format!(
|
||||||
|
"MultiAssert_{}",
|
||||||
|
std::time::SystemTime::now()
|
||||||
|
.duration_since(std::time::UNIX_EPOCH)
|
||||||
|
.expect("time")
|
||||||
|
.as_nanos()
|
||||||
|
);
|
||||||
|
|
||||||
|
// Create multiple assertions with different predicates
|
||||||
|
let predicates = vec!["revenue", "profit", "employees"];
|
||||||
|
let values = vec![100.0, 20.0, 50.0];
|
||||||
|
|
||||||
|
for (predicate, value) in predicates.iter().zip(values.iter()) {
|
||||||
|
let assertion_json = common::create_signed_assertion_json(&subject, predicate, *value);
|
||||||
|
|
||||||
|
let request = Request::builder()
|
||||||
|
.uri("/v1/assert")
|
||||||
|
.method("POST")
|
||||||
|
.header("Content-Type", "application/json")
|
||||||
|
.body(Body::from(assertion_json.to_string()))
|
||||||
|
.expect("Request");
|
||||||
|
|
||||||
|
let response = app.clone().oneshot(request).await.expect("Request");
|
||||||
|
assert_eq!(response.status(), StatusCode::CREATED);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Query for all assertions with this subject
|
||||||
|
let query_uri = format!("/v1/query?subject={}", subject);
|
||||||
|
let request =
|
||||||
|
Request::builder().uri(&query_uri).method("GET").body(Body::empty()).expect("Request");
|
||||||
|
|
||||||
|
let response = app.clone().oneshot(request).await.expect("Request");
|
||||||
|
assert_eq!(response.status(), StatusCode::OK);
|
||||||
|
|
||||||
|
let body = axum::body::to_bytes(response.into_body(), usize::MAX).await.expect("Body");
|
||||||
|
let query_result: serde_json::Value = serde_json::from_slice(&body).expect("JSON");
|
||||||
|
|
||||||
|
let assertions = query_result["assertions"].as_array().expect("assertions array");
|
||||||
|
assert_eq!(assertions.len(), 3, "Should find all 3 assertions immediately");
|
||||||
|
|
||||||
|
// Verify each predicate was found
|
||||||
|
for predicate in &predicates {
|
||||||
|
let found = assertions.iter().any(|a| a["predicate"].as_str() == Some(*predicate));
|
||||||
|
assert!(found, "Should find assertion with predicate: {}", predicate);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_read_your_writes_does_not_affect_other_subjects() {
|
||||||
|
let env = common::create_test_env().await;
|
||||||
|
let app = create_router(env.state);
|
||||||
|
|
||||||
|
let timestamp = std::time::SystemTime::now()
|
||||||
|
.duration_since(std::time::UNIX_EPOCH)
|
||||||
|
.expect("time")
|
||||||
|
.as_nanos();
|
||||||
|
|
||||||
|
let subject1 = format!("Subject1_{}", timestamp);
|
||||||
|
let subject2 = format!("Subject2_{}", timestamp);
|
||||||
|
|
||||||
|
// Create assertion for subject1
|
||||||
|
let assertion_json = common::create_signed_assertion_json(&subject1, "test", 1.0);
|
||||||
|
|
||||||
|
let request = Request::builder()
|
||||||
|
.uri("/v1/assert")
|
||||||
|
.method("POST")
|
||||||
|
.header("Content-Type", "application/json")
|
||||||
|
.body(Body::from(assertion_json.to_string()))
|
||||||
|
.expect("Request");
|
||||||
|
|
||||||
|
let response = app.clone().oneshot(request).await.expect("Request");
|
||||||
|
assert_eq!(response.status(), StatusCode::CREATED);
|
||||||
|
|
||||||
|
// Query for subject2 (should be empty)
|
||||||
|
let query_uri = format!("/v1/query?subject={}", subject2);
|
||||||
|
let request =
|
||||||
|
Request::builder().uri(&query_uri).method("GET").body(Body::empty()).expect("Request");
|
||||||
|
|
||||||
|
let response = app.clone().oneshot(request).await.expect("Request");
|
||||||
|
assert_eq!(response.status(), StatusCode::OK);
|
||||||
|
|
||||||
|
let body = axum::body::to_bytes(response.into_body(), usize::MAX).await.expect("Body");
|
||||||
|
let query_result: serde_json::Value = serde_json::from_slice(&body).expect("JSON");
|
||||||
|
|
||||||
|
let assertions = query_result["assertions"].as_array().expect("assertions array");
|
||||||
|
assert!(assertions.is_empty(), "Subject2 should have no assertions");
|
||||||
|
|
||||||
|
// Query for subject1 (should have one assertion)
|
||||||
|
let query_uri = format!("/v1/query?subject={}", subject1);
|
||||||
|
let request =
|
||||||
|
Request::builder().uri(&query_uri).method("GET").body(Body::empty()).expect("Request");
|
||||||
|
|
||||||
|
let response = app.clone().oneshot(request).await.expect("Request");
|
||||||
|
let body = axum::body::to_bytes(response.into_body(), usize::MAX).await.expect("Body");
|
||||||
|
let query_result: serde_json::Value = serde_json::from_slice(&body).expect("JSON");
|
||||||
|
|
||||||
|
let assertions = query_result["assertions"].as_array().expect("assertions array");
|
||||||
|
assert_eq!(assertions.len(), 1, "Subject1 should have one assertion");
|
||||||
|
}
|
||||||
@ -217,12 +217,15 @@ async fn test_hex_decode_valid() {
|
|||||||
|
|
||||||
let response = app.oneshot(request).await.expect("Request");
|
let response = app.oneshot(request).await.expect("Request");
|
||||||
|
|
||||||
// Accept either 201 (success) or 500 (ingest worker not running in test)
|
// Accept 201 (success), 400 (invalid signature crypto — hex format was valid, but
|
||||||
// We're primarily testing that the hex validation doesn't reject valid lengths
|
// verify_assertion_signatures rejects non-Ed25519-valid bytes), or 500 (ingest worker
|
||||||
|
// not running in test). We're primarily testing that hex-format validation accepts
|
||||||
|
// correct-length strings without rejecting them at the parsing layer.
|
||||||
assert!(
|
assert!(
|
||||||
response.status() == StatusCode::CREATED
|
response.status() == StatusCode::CREATED
|
||||||
|
|| response.status() == StatusCode::BAD_REQUEST
|
||||||
|| response.status() == StatusCode::INTERNAL_SERVER_ERROR,
|
|| response.status() == StatusCode::INTERNAL_SERVER_ERROR,
|
||||||
"Expected 201 or 500, got {}",
|
"Expected 201, 400, or 500, got {}",
|
||||||
response.status()
|
response.status()
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|||||||
@ -12,7 +12,6 @@
|
|||||||
|
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tls_tests {
|
mod tls_tests {
|
||||||
use super::*;
|
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
#[ignore = "TLS tests require self-signed certificate generation"]
|
#[ignore = "TLS tests require self-signed certificate generation"]
|
||||||
@ -43,7 +42,6 @@ mod tls_tests {
|
|||||||
|
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod body_limit_tests {
|
mod body_limit_tests {
|
||||||
use super::*;
|
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
#[ignore = "Body limit tests require test server"]
|
#[ignore = "Body limit tests require test server"]
|
||||||
@ -80,7 +78,6 @@ mod body_limit_tests {
|
|||||||
|
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod timeout_tests {
|
mod timeout_tests {
|
||||||
use super::*;
|
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
#[ignore = "Timeout tests require mock slow handlers"]
|
#[ignore = "Timeout tests require mock slow handlers"]
|
||||||
@ -109,7 +106,6 @@ mod timeout_tests {
|
|||||||
|
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod secret_sanitization_tests {
|
mod secret_sanitization_tests {
|
||||||
use super::*;
|
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
#[ignore = "Secret sanitization tests require log capture"]
|
#[ignore = "Secret sanitization tests require log capture"]
|
||||||
@ -150,7 +146,6 @@ mod secret_sanitization_tests {
|
|||||||
|
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod rate_limit_tests {
|
mod rate_limit_tests {
|
||||||
use super::*;
|
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
#[ignore = "Rate limit tests require test server"]
|
#[ignore = "Rate limit tests require test server"]
|
||||||
@ -195,7 +190,6 @@ mod rate_limit_tests {
|
|||||||
|
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod integration_tests {
|
mod integration_tests {
|
||||||
use super::*;
|
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
#[ignore = "Integration tests require full server setup"]
|
#[ignore = "Integration tests require full server setup"]
|
||||||
@ -224,7 +218,6 @@ mod integration_tests {
|
|||||||
// Helper functions for test setup
|
// Helper functions for test setup
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod test_helpers {
|
mod test_helpers {
|
||||||
use super::*;
|
|
||||||
|
|
||||||
/// Generate self-signed certificate for testing.
|
/// Generate self-signed certificate for testing.
|
||||||
#[allow(dead_code)]
|
#[allow(dead_code)]
|
||||||
@ -243,7 +236,7 @@ mod test_helpers {
|
|||||||
|
|
||||||
/// Capture log output during test.
|
/// Capture log output during test.
|
||||||
#[allow(dead_code)]
|
#[allow(dead_code)]
|
||||||
fn capture_logs<F>(f: F) -> String
|
fn capture_logs<F>(_f: F) -> String
|
||||||
where
|
where
|
||||||
F: FnOnce(),
|
F: FnOnce(),
|
||||||
{
|
{
|
||||||
|
|||||||
@ -285,40 +285,61 @@ impl SwimMembership {
|
|||||||
/// Marks a node as suspected (failed to respond to probe).
|
/// Marks a node as suspected (failed to respond to probe).
|
||||||
#[instrument(skip(self))]
|
#[instrument(skip(self))]
|
||||||
pub fn suspect_node(&self, node_id: NodeId) {
|
pub fn suspect_node(&self, node_id: NodeId) {
|
||||||
if let Some(mut entry) = self.members.get_mut(&node_id) {
|
// IMPORTANT: Clone the entry and drop the RefMut BEFORE calling update_node_gauges.
|
||||||
if entry.state == NodeState::Alive {
|
// DashMap::get_mut holds a shard write lock; update_node_gauges calls iter() which
|
||||||
entry.state = NodeState::Suspect;
|
// acquires read locks on all shards. parking_lot write locks are non-reentrant —
|
||||||
entry.lamport_time = self.tick();
|
// calling iter() while get_mut's RefMut is alive deadlocks on the same shard.
|
||||||
|
let gossip_entry = {
|
||||||
|
if let Some(mut entry) = self.members.get_mut(&node_id) {
|
||||||
|
if entry.state == NodeState::Alive {
|
||||||
|
entry.state = NodeState::Suspect;
|
||||||
|
entry.lamport_time = self.tick();
|
||||||
|
|
||||||
info!(node = %node_id.short_hex(), "Marking node as suspect");
|
info!(node = %node_id.short_hex(), "Marking node as suspect");
|
||||||
let _ = self.event_tx.send(MembershipEvent::NodeSuspected(node_id));
|
let _ = self.event_tx.send(MembershipEvent::NodeSuspected(node_id));
|
||||||
self.suspects.insert(node_id, Instant::now());
|
self.suspects.insert(node_id, Instant::now());
|
||||||
counter!("stemedb_membership_events_total", "type" => "suspected").increment(1);
|
counter!("stemedb_membership_events_total", "type" => "suspected").increment(1);
|
||||||
self.update_node_gauges();
|
Some(entry.clone())
|
||||||
|
} else {
|
||||||
// Queue for gossip
|
None
|
||||||
self.queue_gossip(entry.clone());
|
}
|
||||||
|
} else {
|
||||||
|
None
|
||||||
}
|
}
|
||||||
|
}; // RefMut dropped here — safe to iterate the map now
|
||||||
|
|
||||||
|
if let Some(entry) = gossip_entry {
|
||||||
|
self.update_node_gauges();
|
||||||
|
self.queue_gossip(entry);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Marks a node as dead (suspicion timeout expired).
|
/// Marks a node as dead (suspicion timeout expired).
|
||||||
#[instrument(skip(self))]
|
#[instrument(skip(self))]
|
||||||
pub fn fail_node(&self, node_id: NodeId) {
|
pub fn fail_node(&self, node_id: NodeId) {
|
||||||
if let Some(mut entry) = self.members.get_mut(&node_id) {
|
// IMPORTANT: same deadlock hazard as suspect_node — drop RefMut before update_node_gauges.
|
||||||
if entry.state == NodeState::Suspect {
|
let gossip_entry = {
|
||||||
entry.state = NodeState::Dead;
|
if let Some(mut entry) = self.members.get_mut(&node_id) {
|
||||||
entry.lamport_time = self.tick();
|
if entry.state == NodeState::Suspect {
|
||||||
|
entry.state = NodeState::Dead;
|
||||||
|
entry.lamport_time = self.tick();
|
||||||
|
|
||||||
warn!(node = %node_id.short_hex(), "Marking node as dead");
|
warn!(node = %node_id.short_hex(), "Marking node as dead");
|
||||||
let _ = self.event_tx.send(MembershipEvent::NodeFailed(node_id));
|
let _ = self.event_tx.send(MembershipEvent::NodeFailed(node_id));
|
||||||
self.suspects.remove(&node_id);
|
self.suspects.remove(&node_id);
|
||||||
counter!("stemedb_membership_events_total", "type" => "failed").increment(1);
|
counter!("stemedb_membership_events_total", "type" => "failed").increment(1);
|
||||||
self.update_node_gauges();
|
Some(entry.clone())
|
||||||
|
} else {
|
||||||
// Queue for gossip
|
None
|
||||||
self.queue_gossip(entry.clone());
|
}
|
||||||
|
} else {
|
||||||
|
None
|
||||||
}
|
}
|
||||||
|
}; // RefMut dropped here
|
||||||
|
|
||||||
|
if let Some(entry) = gossip_entry {
|
||||||
|
self.update_node_gauges();
|
||||||
|
self.queue_gossip(entry);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -327,37 +348,46 @@ impl SwimMembership {
|
|||||||
pub fn alive_node(&self, node_id: NodeId, info: NodeInfo) {
|
pub fn alive_node(&self, node_id: NodeId, info: NodeInfo) {
|
||||||
let lamport = self.tick();
|
let lamport = self.tick();
|
||||||
|
|
||||||
match self.members.get_mut(&node_id) {
|
// IMPORTANT: same deadlock hazard — drop RefMut from get_mut before update_node_gauges.
|
||||||
Some(mut entry) => {
|
let result = {
|
||||||
// Only update if incarnation is higher or equal
|
match self.members.get_mut(&node_id) {
|
||||||
if info.incarnation >= entry.node.incarnation {
|
Some(mut entry) => {
|
||||||
let was_suspect = entry.state == NodeState::Suspect;
|
// Only update if incarnation is higher or equal
|
||||||
entry.node = info.clone();
|
if info.incarnation >= entry.node.incarnation {
|
||||||
entry.state = NodeState::Alive;
|
let was_suspect = entry.state == NodeState::Suspect;
|
||||||
entry.lamport_time = lamport;
|
entry.node = info.clone();
|
||||||
|
entry.state = NodeState::Alive;
|
||||||
|
entry.lamport_time = lamport;
|
||||||
|
|
||||||
self.suspects.remove(&node_id);
|
self.suspects.remove(&node_id);
|
||||||
self.queue_gossip(entry.clone());
|
|
||||||
|
|
||||||
if was_suspect {
|
if was_suspect {
|
||||||
counter!("stemedb_membership_events_total", "type" => "recovered")
|
counter!("stemedb_membership_events_total", "type" => "recovered")
|
||||||
.increment(1);
|
.increment(1);
|
||||||
|
}
|
||||||
|
Some((entry.clone(), MembershipEvent::NodeUpdated(info)))
|
||||||
|
} else {
|
||||||
|
None
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
None => {
|
||||||
|
// New node — insert() releases any lock immediately, so update_node_gauges
|
||||||
|
// is safe to call right after.
|
||||||
|
let entry = MembershipEntry::new(info.clone(), NodeState::Alive, lamport);
|
||||||
|
self.members.insert(node_id, entry.clone());
|
||||||
|
self.queue_gossip(entry);
|
||||||
|
counter!("stemedb_membership_events_total", "type" => "joined").increment(1);
|
||||||
self.update_node_gauges();
|
self.update_node_gauges();
|
||||||
|
let _ = self.event_tx.send(MembershipEvent::NodeJoined(info));
|
||||||
let _ = self.event_tx.send(MembershipEvent::NodeUpdated(info));
|
return;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
None => {
|
}; // RefMut dropped here
|
||||||
// New node
|
|
||||||
let entry = MembershipEntry::new(info.clone(), NodeState::Alive, lamport);
|
|
||||||
self.members.insert(node_id, entry.clone());
|
|
||||||
self.queue_gossip(entry);
|
|
||||||
counter!("stemedb_membership_events_total", "type" => "joined").increment(1);
|
|
||||||
self.update_node_gauges();
|
|
||||||
|
|
||||||
let _ = self.event_tx.send(MembershipEvent::NodeJoined(info));
|
if let Some((entry, event)) = result {
|
||||||
}
|
self.update_node_gauges();
|
||||||
|
self.queue_gossip(entry);
|
||||||
|
let _ = self.event_tx.send(event);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
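Note on the swim.rs change above: the fix confines the mutable map guard to a scoped block so it is dropped before `update_node_gauges()` touches the same map again. The following is a minimal sketch of that pattern only, not the project's code. It assumes a `std::sync::RwLock<HashMap>` as a stand-in for DashMap, and `Members`, `count_alive`, and `mark_alive` are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Clone, Debug, PartialEq)]
enum NodeState {
    Alive,
    Suspect,
}

// Hypothetical stand-in for the membership table (the real code uses DashMap).
struct Members {
    map: RwLock<HashMap<u32, NodeState>>,
}

impl Members {
    // Plays the role of update_node_gauges(): it takes its own lock on the map,
    // so it must not be called while the caller still holds a write guard.
    fn count_alive(&self) -> usize {
        self.map
            .read()
            .unwrap()
            .values()
            .filter(|s| **s == NodeState::Alive)
            .count()
    }

    // Marks a node alive and returns (was_suspect, alive_count), or None if unknown.
    fn mark_alive(&self, id: u32) -> Option<(bool, usize)> {
        let was_suspect = {
            let mut guard = self.map.write().unwrap();
            let state = guard.get_mut(&id)?;
            let was = *state == NodeState::Suspect;
            *state = NodeState::Alive;
            was
        }; // write guard dropped here, before the map is locked again
        Some((was_suspect, self.count_alive()))
    }
}
```

Calling `count_alive()` inside the scoped block would try to take a second lock on the same thread while the write guard is still alive, which is the kind of self-deadlock the commit message describes for `get_mut` followed by `iter()`.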
|||||||
@ -1,28 +1,5 @@
|
|||||||
//! Background worker that tails the WAL and updates the KV store.
|
//! Background worker that tails the WAL and updates the KV store.
|
||||||
//!
|
|
||||||
//! The worker reads records from the Write-Ahead Log and persists them
|
|
||||||
//! to the storage engine using content-addressed keys.
|
|
||||||
//!
|
|
||||||
//! # Storage Layout
|
|
||||||
//!
|
|
||||||
//! Following the architecture spec, records are stored with these key prefixes:
|
|
||||||
//! - `H:{hash}` - Assertions (content-addressed by BLAKE3 hash)
|
|
||||||
//! - `V:{assertion_hash}:{vote_hash}` - Votes on assertions
|
|
||||||
//! - `E:{hash}` - Epochs (paradigm definitions)
|
|
||||||
//! - `S:{subject}` - Subject adjacency index (list of assertion hashes)
|
|
||||||
|
|
||||||
use std::sync::atomic::{AtomicBool, Ordering};
|
|
||||||
use std::sync::Arc;
|
|
||||||
use stemedb_core::types::HlcTimestamp;
|
|
||||||
use stemedb_storage::{GenericIndexStore, GenericVoteStore, KVStore, VectorIndex, VisualIndex};
|
|
||||||
use stemedb_wal::{Journal, HEADER_SIZE};
|
|
||||||
use tokio::sync::{Mutex, Notify};
|
|
||||||
use tracing::{debug, info, warn};
|
|
||||||
|
|
||||||
use crate::error::{IngestError, Result};
|
|
||||||
use crate::gossip::GossipBroadcast;
|
|
||||||
|
|
||||||
// Module declarations
|
|
||||||
mod processing;
|
mod processing;
|
||||||
mod record_types;
|
mod record_types;
|
||||||
mod run;
|
mod run;
|
||||||
@ -31,7 +8,19 @@ mod storage;
|
|||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests;
|
mod tests;
|
||||||
|
|
||||||
// Re-exports
|
use std::sync::atomic::{AtomicBool, Ordering};
|
||||||
|
use std::sync::Arc;
|
||||||
|
use stemedb_core::types::HlcTimestamp;
|
||||||
|
use stemedb_storage::{
|
||||||
|
GenericIndexStore, GenericVoteStore, KVStore, MemTable, VectorIndex, VisualIndex,
|
||||||
|
};
|
||||||
|
use stemedb_wal::{Journal, HEADER_SIZE};
|
||||||
|
use tokio::sync::{Mutex, Notify};
|
||||||
|
use tracing::{debug, info, warn};
|
||||||
|
|
||||||
|
use crate::error::{IngestError, Result};
|
||||||
|
use crate::gossip::GossipBroadcast;
|
||||||
|
|
||||||
pub use record_types::{serialize_assertion, serialize_epoch, serialize_vote, RecordType};
|
pub use record_types::{serialize_assertion, serialize_epoch, serialize_vote, RecordType};
|
||||||
|
|
||||||
/// Background worker that tails the WAL and updates the KV store.
|
/// Background worker that tails the WAL and updates the KV store.
|
||||||
@ -41,37 +30,21 @@ pub struct IngestWorker<S> {
|
|||||||
index_store: GenericIndexStore<Arc<S>>,
|
index_store: GenericIndexStore<Arc<S>>,
|
||||||
vote_store: GenericVoteStore<Arc<S>>,
|
vote_store: GenericVoteStore<Arc<S>>,
|
||||||
current_offset: u64,
|
current_offset: u64,
|
||||||
/// Optional notification channel for event-driven materialization.
|
|
||||||
/// When set, the worker signals this after each successful ingestion
|
|
||||||
/// so downstream consumers (e.g., the Materializer) can react immediately.
|
|
||||||
notify: Option<Arc<Notify>>,
|
notify: Option<Arc<Notify>>,
|
||||||
/// Optional vector index for semantic similarity search.
|
|
||||||
/// When set, assertions with embedding vectors are indexed on ingestion.
|
|
||||||
vector_index: Option<Arc<dyn VectorIndex>>,
|
vector_index: Option<Arc<dyn VectorIndex>>,
|
||||||
/// Optional visual index for perceptual hash similarity search.
|
|
||||||
/// When set, assertions with visual_hash are indexed on ingestion.
|
|
||||||
visual_index: Option<Arc<dyn VisualIndex>>,
|
visual_index: Option<Arc<dyn VisualIndex>>,
|
||||||
/// Shutdown signal shared with Ingestor.
|
|
||||||
/// When set to true, the run() loop exits gracefully.
|
|
||||||
shutdown: Arc<AtomicBool>,
|
shutdown: Arc<AtomicBool>,
|
||||||
/// Hybrid Logical Clock for distributed causal ordering.
|
|
||||||
///
|
|
||||||
/// Used to generate HLC timestamps for supersessions and epoch
|
|
||||||
/// ingestion. Provides causal ordering guarantees across distributed
|
|
||||||
/// nodes, even with clock skew.
|
|
||||||
hlc: uhlc::HLC,
|
hlc: uhlc::HLC,
|
||||||
/// Optional gossip broadcaster for distributed replication.
|
|
||||||
///
|
|
||||||
/// When set, the worker broadcasts newly ingested assertions to peer nodes.
|
|
||||||
gossip_broadcaster: Option<Arc<dyn GossipBroadcast>>,
|
gossip_broadcaster: Option<Arc<dyn GossipBroadcast>>,
|
||||||
|
/// MemTable for read-your-writes eviction signaling.
|
||||||
|
/// When assertions are indexed, we signal the MemTable to evict them.
|
||||||
|
memtable: Option<Arc<MemTable>>,
|
||||||
|
/// DEBUG ONLY: Skip signature verification for demos/testing
|
||||||
|
pub skip_signature_verification: bool,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl<S: KVStore + 'static> IngestWorker<S> {
|
impl<S: KVStore + 'static> IngestWorker<S> {
|
||||||
/// Create a new ingest worker, resuming from the last persisted cursor.
|
/// Create a new ingest worker.
|
||||||
///
|
|
||||||
/// If a cursor checkpoint exists in the KV store, the worker resumes
|
|
||||||
/// from that offset. Otherwise, it starts from the beginning of the WAL
|
|
||||||
/// (after the file header).
|
|
||||||
pub async fn new(journal: Arc<Mutex<Journal>>, store: Arc<S>) -> Result<Self> {
|
pub async fn new(journal: Arc<Mutex<Journal>>, store: Arc<S>) -> Result<Self> {
|
||||||
let index_store = GenericIndexStore::new(store.clone());
|
let index_store = GenericIndexStore::new(store.clone());
|
||||||
let vote_store = GenericVoteStore::new(store.clone());
|
let vote_store = GenericVoteStore::new(store.clone());
|
||||||
@ -86,10 +59,7 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
offset
|
offset
|
||||||
}
|
}
|
||||||
Some(bytes) => {
|
Some(bytes) => {
|
||||||
warn!(
|
warn!(len = bytes.len(), "Corrupt cursor value, starting from beginning");
|
||||||
len = bytes.len(),
|
|
||||||
"Corrupt cursor value (expected 8 bytes), starting from beginning"
|
|
||||||
);
|
|
||||||
HEADER_SIZE as u64
|
HEADER_SIZE as u64
|
||||||
}
|
}
|
||||||
None => {
|
None => {
|
||||||
@ -97,7 +67,6 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
HEADER_SIZE as u64
|
HEADER_SIZE as u64
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
// Initialize HLC with random node ID
|
|
||||||
let hlc = uhlc::HLCBuilder::new().build();
|
let hlc = uhlc::HLCBuilder::new().build();
|
||||||
|
|
||||||
Ok(Self {
|
Ok(Self {
|
||||||
@ -112,13 +81,12 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
shutdown: Arc::new(AtomicBool::new(false)),
|
shutdown: Arc::new(AtomicBool::new(false)),
|
||||||
hlc,
|
hlc,
|
||||||
gossip_broadcaster: None,
|
gossip_broadcaster: None,
|
||||||
|
memtable: None,
|
||||||
|
skip_signature_verification: false,
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Create a new ingest worker with a shared shutdown signal.
|
/// Create with shutdown signal.
|
||||||
///
|
|
||||||
/// This is used by the Ingestor to coordinate shutdown between the
|
|
||||||
/// manager and the background task.
|
|
||||||
pub async fn with_shutdown(
|
pub async fn with_shutdown(
|
||||||
journal: Arc<Mutex<Journal>>,
|
journal: Arc<Mutex<Journal>>,
|
||||||
store: Arc<S>,
|
store: Arc<S>,
|
||||||
@ -129,154 +97,85 @@ impl<S: KVStore + 'static> IngestWorker<S> {
|
|||||||
Ok(worker)
|
Ok(worker)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Check if shutdown has been requested.
|
/// Is shutdown?
|
||||||
pub fn is_shutdown(&self) -> bool {
|
pub fn is_shutdown(&self) -> bool {
|
||||||
self.shutdown.load(Ordering::Relaxed)
|
self.shutdown.load(Ordering::Relaxed)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Attach a notification channel for event-driven downstream consumers.
|
/// With notify.
|
||||||
///
|
|
||||||
/// After each successful record ingestion, the worker will signal this
|
|
||||||
/// `Notify` so consumers like the Materializer can react immediately
|
|
||||||
/// instead of polling on a fixed interval.
|
|
||||||
pub fn with_notify(mut self, notify: Arc<Notify>) -> Self {
|
pub fn with_notify(mut self, notify: Arc<Notify>) -> Self {
|
||||||
self.notify = Some(notify);
|
self.notify = Some(notify);
|
||||||
self
|
self
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Attach a notification channel for event-driven WAL reading.
|
/// With flush notify.
|
||||||
///
|
|
||||||
/// When the GroupCommitBuffer flushes new data to the WAL, it signals
|
|
||||||
/// this `Notify` so the worker can immediately refresh its segment list
|
|
||||||
/// and process the new records. This is the counterpart to the downstream
|
|
||||||
/// `with_notify` - this one is for upstream signaling from the writer.
|
|
||||||
///
|
|
||||||
/// Note: This uses the same internal field as `with_notify` since we
|
|
||||||
/// want to wake up on both upstream writes and downstream requests.
|
|
||||||
/// The run loop handles both cases by refreshing segments and processing.
|
|
||||||
pub fn with_flush_notify(mut self, notify: Arc<Notify>) -> Self {
|
pub fn with_flush_notify(mut self, notify: Arc<Notify>) -> Self {
|
||||||
self.notify = Some(notify);
|
self.notify = Some(notify);
|
||||||
self
|
self
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Attach a vector index for semantic similarity search.
|
/// With vector index.
|
||||||
///
|
|
||||||
/// When set, assertions with embedding vectors (`vector` field) are
|
|
||||||
/// automatically indexed during ingestion, enabling k-NN queries.
|
|
||||||
///
|
|
||||||
/// # Example
|
|
||||||
/// ```ignore
|
|
||||||
/// let vector_index = Arc::new(HnswVectorIndex::new(128));
|
|
||||||
/// let worker = IngestWorker::new(journal, store)
|
|
||||||
/// .await?
|
|
||||||
/// .with_vector_index(vector_index);
|
|
||||||
/// ```
|
|
||||||
pub fn with_vector_index(mut self, index: Arc<dyn VectorIndex>) -> Self {
|
pub fn with_vector_index(mut self, index: Arc<dyn VectorIndex>) -> Self {
|
||||||
self.vector_index = Some(index);
|
self.vector_index = Some(index);
|
||||||
self
|
self
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Attach a visual index for perceptual hash similarity search.
|
/// With visual index.
|
||||||
///
|
|
||||||
/// When set, assertions with visual hashes (`visual_hash` field) are
|
|
||||||
/// automatically indexed during ingestion, enabling visual similarity queries.
|
|
||||||
///
|
|
||||||
/// # Example
|
|
||||||
/// ```ignore
|
|
||||||
/// let visual_index = Arc::new(BkTreeVisualIndex::new());
|
|
||||||
/// let worker = IngestWorker::new(journal, store)
|
|
||||||
/// .await?
|
|
||||||
/// .with_visual_index(visual_index);
|
|
||||||
/// ```
|
|
||||||
pub fn with_visual_index(mut self, index: Arc<dyn VisualIndex>) -> Self {
|
pub fn with_visual_index(mut self, index: Arc<dyn VisualIndex>) -> Self {
|
||||||
self.visual_index = Some(index);
|
self.visual_index = Some(index);
|
||||||
self
|
self
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Configure the HLC with a specific node ID.
|
/// With node ID.
|
||||||
///
|
|
||||||
/// Use this when running multiple nodes in a distributed cluster to ensure
|
|
||||||
/// each node has a unique identifier for total ordering of concurrent events.
|
|
||||||
///
|
|
||||||
/// # Example
|
|
||||||
/// ```ignore
|
|
||||||
/// let node_id = uhlc::ID::try_from(&node_uuid.as_bytes()[..]).unwrap();
|
|
||||||
/// let worker = IngestWorker::new(journal, store)
|
|
||||||
/// .await?
|
|
||||||
/// .with_node_id(node_id);
|
|
||||||
/// ```
|
|
||||||
pub fn with_node_id(mut self, node_id: uhlc::ID) -> Self {
|
pub fn with_node_id(mut self, node_id: uhlc::ID) -> Self {
|
||||||
self.hlc = uhlc::HLCBuilder::new().with_id(node_id).build();
|
self.hlc = uhlc::HLCBuilder::new().with_id(node_id).build();
|
||||||
self
|
self
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Attach a gossip broadcaster for distributed replication.
|
/// With gossip.
|
||||||
///
|
|
||||||
/// When set, newly ingested assertions are broadcast to peer nodes
|
|
||||||
/// for low-latency replication. The gossip layer is best-effort:
|
|
||||||
/// failures are logged but don't block the ingestion pipeline.
|
|
||||||
///
|
|
||||||
/// # Example
|
|
||||||
/// ```ignore
|
|
||||||
/// let broadcaster = GossipBroadcaster::new(peers).await?;
|
|
||||||
/// let worker = IngestWorker::new(journal, store)
|
|
||||||
/// .await?
|
|
||||||
/// .with_gossip_broadcaster(Arc::new(broadcaster));
|
|
||||||
/// ```
|
|
||||||
pub fn with_gossip_broadcaster(mut self, broadcaster: Arc<dyn GossipBroadcast>) -> Self {
|
pub fn with_gossip_broadcaster(mut self, broadcaster: Arc<dyn GossipBroadcast>) -> Self {
|
||||||
self.gossip_broadcaster = Some(broadcaster);
|
self.gossip_broadcaster = Some(broadcaster);
|
||||||
self
|
self
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Returns the gossip broadcaster if configured.
|
/// With MemTable for read-your-writes eviction signaling.
|
||||||
|
///
|
||||||
|
/// When assertions are indexed in KVStore, the IngestWorker signals
|
||||||
|
/// the MemTable to evict them, freeing memory.
|
||||||
|
pub fn with_memtable(mut self, memtable: Arc<MemTable>) -> Self {
|
||||||
|
self.memtable = Some(memtable);
|
||||||
|
self
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get gossip.
|
||||||
pub fn gossip_broadcaster(&self) -> Option<&Arc<dyn GossipBroadcast>> {
|
pub fn gossip_broadcaster(&self) -> Option<&Arc<dyn GossipBroadcast>> {
|
||||||
self.gossip_broadcaster.as_ref()
|
self.gossip_broadcaster.as_ref()
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Generates a new HLC timestamp.
|
/// Gen HLC.
|
||||||
///
|
|
||||||
/// The returned timestamp is guaranteed to be greater than all previously
|
|
||||||
/// generated timestamps from this worker, even if the system clock goes
|
|
||||||
/// backwards.
|
|
||||||
///
|
|
||||||
/// Use this when creating supersessions or other records that need
|
|
||||||
/// causal ordering across distributed nodes.
|
|
||||||
pub fn generate_hlc_timestamp(&self) -> HlcTimestamp {
|
pub fn generate_hlc_timestamp(&self) -> HlcTimestamp {
|
||||||
HlcTimestamp::now(&self.hlc)
|
HlcTimestamp::now(&self.hlc)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Updates the HLC with a timestamp from a remote node.
|
/// Update HLC.
|
||||||
///
|
|
||||||
/// Call this when receiving data from another node to ensure the local
|
|
||||||
/// clock stays synchronized. The HLC will advance to at least the
|
|
||||||
/// remote timestamp, maintaining causal ordering.
|
|
||||||
///
|
|
||||||
/// # Arguments
|
|
||||||
///
|
|
||||||
/// * `remote` - HLC timestamp received from a remote node
|
|
||||||
///
|
|
||||||
/// # Returns
|
|
||||||
///
|
|
||||||
/// Ok(()) if the clock was updated, Err if the timestamp is too far
|
|
||||||
/// in the future (clock skew protection).
|
|
||||||
pub fn update_hlc_from_remote(&self, remote: &HlcTimestamp) -> Result<()> {
|
pub fn update_hlc_from_remote(&self, remote: &HlcTimestamp) -> Result<()> {
|
||||||
if let Some(ts) = remote.to_uhlc() {
|
if let Some(ts) = remote.to_uhlc() {
|
||||||
self.hlc.update_with_timestamp(&ts).map_err(|e| {
|
self.hlc.update_with_timestamp(&ts).map_err(|e| {
|
||||||
warn!(
|
warn!(remote_time = remote.time_ntp64, error = %e, "HLC update failed");
|
||||||
remote_time = remote.time_ntp64,
|
|
||||||
error = %e,
|
|
||||||
"Failed to update HLC from remote timestamp (clock skew?)"
|
|
||||||
);
|
|
||||||
IngestError::InputValidation(format!("HLC update failed: {}", e))
|
IngestError::InputValidation(format!("HLC update failed: {}", e))
|
||||||
})?;
|
})?;
|
||||||
}
|
}
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Returns the current HLC node ID as bytes.
|
/// Get HLC node ID.
|
||||||
///
|
|
||||||
/// Useful for including in CRDT state or other distributed data structures.
|
|
||||||
pub fn hlc_node_id(&self) -> [u8; 16] {
|
pub fn hlc_node_id(&self) -> [u8; 16] {
|
||||||
self.hlc.get_id().to_le_bytes()
|
self.hlc.get_id().to_le_bytes()
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// DEBUG ONLY: Enable skipping signature verification.
|
||||||
|
pub fn with_skip_signature_verification(mut self, skip: bool) -> Self {
|
||||||
|
self.skip_signature_verification = skip;
|
||||||
|
self
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
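Note on the shutdown flag in the worker above: the removed doc comments describe an `Arc<AtomicBool>` shared between the Ingestor and the worker so that the run loop can exit gracefully. A minimal sketch of that idea follows, assuming a simplified synchronous loop. `Worker`, `run`, and `processed` are hypothetical names; only `with_shutdown` and `is_shutdown` mirror names in the diff, and their bodies here are illustrative.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical, simplified worker; the real IngestWorker is async and tails a WAL.
struct Worker {
    shutdown: Arc<AtomicBool>,
    processed: u64,
}

impl Worker {
    // The manager keeps a clone of the same flag and flips it to request shutdown.
    fn with_shutdown(shutdown: Arc<AtomicBool>) -> Self {
        Worker { shutdown, processed: 0 }
    }

    fn is_shutdown(&self) -> bool {
        self.shutdown.load(Ordering::Relaxed)
    }

    // Processes up to `limit` items, checking the flag before each one.
    // Returns the total number of items processed so far.
    fn run(&mut self, limit: u64) -> u64 {
        for _ in 0..limit {
            if self.is_shutdown() {
                break;
            }
            self.processed += 1;
        }
        self.processed
    }
}
```

Checking the flag once per item keeps shutdown latency bounded by the cost of a single item rather than a whole batch.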
|||||||
@ -1,112 +1,44 @@
|
|||||||
//! Storage helper methods for the IngestWorker.
|
//! Storage orchestration for IngestWorker.
|
||||||
//!
|
|
||||||
//! Contains methods for persisting cursors, writing supersession cascades,
|
|
||||||
//! and building storage keys.
|
|
||||||
|
|
||||||
use super::IngestWorker;
|
use super::IngestWorker;
|
||||||
use crate::error::Result;
|
use crate::error::Result;
|
||||||
use stemedb_core::serde::deserialize;
|
|
||||||
use stemedb_core::types::Epoch;
|
|
||||||
use stemedb_storage::key_codec;
|
|
||||||
use stemedb_storage::KVStore;
|
use stemedb_storage::KVStore;
|
||||||
use tracing::{debug, warn};
|
|
||||||
|
|
||||||
impl<S: KVStore + 'static> IngestWorker<S> {
|
impl<S: KVStore + 'static> IngestWorker<S> {
|
||||||
/// Maximum depth for walking supersession chains at write time.
|
/// Write cascade markers for superseded epochs.
|
||||||
pub(super) const MAX_CASCADE_DEPTH: usize = 100;
|
|
||||||
|
|
||||||
/// Write `\x00SUPERSEDED:` markers for the full transitive closure of superseded epochs.
|
|
||||||
///
|
///
|
||||||
/// All markers point to the LATEST superseding epoch (`new_epoch_id`).
|
/// Walks the epoch supersession chain and writes markers so that
|
||||||
/// For chain C→B→A: writes `SUPERSEDED:B→C` and `SUPERSEDED:A→C`.
|
/// queries can check if an epoch is superseded in O(1) time.
|
||||||
///
|
pub async fn write_supersession_cascade(
|
||||||
/// This enables O(1) "is this epoch superseded?" checks at query time:
|
|
||||||
/// just look for `\x00SUPERSEDED:{epoch_id}` key existence.
|
|
||||||
///
|
|
||||||
/// # Algorithm
|
|
||||||
///
|
|
||||||
/// 1. Start with the immediately superseded epoch
|
|
||||||
/// 2. Write marker pointing to the new (latest) epoch
|
|
||||||
/// 3. Read the superseded epoch to check if it also supersedes something
|
|
||||||
/// 4. Repeat transitively until end of chain or max depth
|
|
||||||
///
|
|
||||||
/// # Safety
|
|
||||||
///
|
|
||||||
/// - Cycle detection via visited set
|
|
||||||
/// - Max depth guard (100 levels)
|
|
||||||
/// - Missing/corrupt epochs gracefully terminate the walk
|
|
||||||
pub(super) async fn write_supersession_cascade(
|
|
||||||
&self,
|
&self,
|
||||||
new_epoch_id: &[u8; 32],
|
new_epoch_id: &[u8; 32],
|
||||||
superseded_id: &[u8; 32],
|
old_epoch_id: &[u8; 32],
|
||||||
) -> Result<()> {
|
) -> Result<()> {
|
||||||
let mut current_id = *superseded_id;
|
let mut current_id = *old_epoch_id;
|
||||||
let mut visited = std::collections::HashSet::new();
|
|
||||||
let mut depth = 0;
|
let mut depth = 0;
|
||||||
|
const MAX_CASCADE_DEPTH: usize = 100;
|
||||||
|
|
||||||
loop {
|
while depth < MAX_CASCADE_DEPTH {
|
||||||
// Cycle detection
|
// Write marker: \x00SUPERSEDED:{old_id} -> {new_id}
|
||||||
if !visited.insert(current_id) {
|
let key = stemedb_storage::key_codec::superseded_key(&hex::encode(current_id));
|
||||||
debug!(
|
self.store.put(&key, new_epoch_id).await?;
|
||||||
epoch_id = %hex::encode(current_id),
|
|
||||||
"Cycle detected in supersession cascade, stopping write"
|
|
||||||
);
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Max depth guard
|
// Follow the chain: look up what this epoch superseded
|
||||||
if depth >= Self::MAX_CASCADE_DEPTH {
|
let epoch_key = stemedb_storage::key_codec::epoch_key(&hex::encode(current_id));
|
||||||
warn!(
|
if let Some(bytes) = self.store.get(&epoch_key).await? {
|
||||||
depth,
|
let epoch: stemedb_core::types::Epoch = stemedb_core::serde::deserialize(&bytes)
|
||||||
new_epoch = %hex::encode(new_epoch_id),
|
.map_err(|e| crate::error::IngestError::Serialization(e.to_string()))?;
|
||||||
"Supersession cascade exceeded max depth"
|
|
||||||
);
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Write marker: \x00SUPERSEDED:{current_id} → new_epoch_id (always the LATEST)
|
if let Some(prev_id) = epoch.supersedes {
|
||||||
let marker_key = key_codec::superseded_key(&hex::encode(current_id));
|
current_id = prev_id;
|
||||||
self.store.put(&marker_key, new_epoch_id).await?;
|
depth += 1;
|
||||||
|
} else {
|
||||||
debug!(
|
|
||||||
superseded = %hex::encode(current_id),
|
|
||||||
by = %hex::encode(new_epoch_id),
|
|
||||||
depth,
|
|
||||||
"Wrote supersession marker"
|
|
||||||
);
|
|
||||||
|
|
||||||
// Check if current_id also superseded something (transitive closure)
|
|
||||||
let epoch_key = key_codec::epoch_key(&hex::encode(current_id));
|
|
||||||
let ancestor_epoch = match self.store.get(&epoch_key).await? {
|
|
||||||
Some(bytes) => match deserialize::<Epoch>(&bytes) {
|
|
||||||
Ok(e) => e,
|
|
||||||
Err(e) => {
|
|
||||||
debug!(
|
|
||||||
epoch_id = %hex::encode(current_id),
|
|
||||||
error = %e,
|
|
||||||
"Failed to deserialize ancestor epoch, stopping cascade"
|
|
||||||
);
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
},
|
|
||||||
None => {
|
|
||||||
debug!(
|
|
||||||
epoch_id = %hex::encode(current_id),
|
|
||||||
"Ancestor epoch not found, stopping cascade"
|
|
||||||
);
|
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
};
|
} else {
|
||||||
|
break;
|
||||||
match ancestor_epoch.supersedes {
|
|
||||||
Some(grandparent_id) => {
|
|
||||||
current_id = grandparent_id;
|
|
||||||
depth += 1;
|
|
||||||
}
|
|
||||||
None => break, // End of chain
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
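Note on the supersession cascade above: the removed doc comments describe walking the chain transitively, pointing every marker at the latest epoch, with a visited set for cycle detection and a depth cap of 100. The sketch below illustrates that described algorithm over an in-memory map. It is an assumption-laden model, not the project's storage code: epoch ids are plain `u32` values, and `write_cascade` and its parameters are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

const MAX_CASCADE_DEPTH: usize = 100;

// `supersedes` maps an epoch id to the epoch it directly supersedes.
// Returns the markers written: superseded id -> latest superseding epoch id.
fn write_cascade(
    supersedes: &HashMap<u32, u32>,
    new_epoch: u32,
    superseded: u32,
) -> HashMap<u32, u32> {
    let mut markers = HashMap::new();
    let mut visited = HashSet::new();
    let mut current = superseded;
    let mut depth = 0;

    loop {
        // Stop on a cycle (id already seen) or when the depth cap is reached.
        if !visited.insert(current) || depth >= MAX_CASCADE_DEPTH {
            break;
        }
        // Every marker points at the latest epoch, not the immediate successor.
        markers.insert(current, new_epoch);
        match supersedes.get(&current) {
            Some(&prev) => {
                current = prev;
                depth += 1;
            }
            None => break, // end of chain
        }
    }
    markers
}
```

For a chain where epoch 3 supersedes 2 and 2 supersedes 1, calling `write_cascade(&supersedes, 3, 2)` records markers for both 2 and 1 pointing at 3, matching the C→B→A example in the removed comments.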
|||||||
@ -1,12 +1,14 @@
|
|||||||
//! Candidate retrieval from indexes.
|
//! Candidate retrieval from indexes.
|
||||||
//!
|
//!
|
||||||
//! This module handles fetching assertions from different indexes:
|
//! This module handles fetching assertions from different indexes:
|
||||||
|
//! - MemTable (read-your-writes, checked first for freshest data)
|
||||||
//! - Subject index (S:{subject})
|
//! - Subject index (S:{subject})
|
||||||
//! - Compound index (SP:{subject}:{predicate})
|
//! - Compound index (SP:{subject}:{predicate})
|
||||||
//! - Vector index (HNSW k-NN)
|
//! - Vector index (HNSW k-NN)
|
||||||
//! - Visual index (BK-tree hamming distance)
|
//! - Visual index (BK-tree hamming distance)
|
||||||
//! - Full scan (H: prefix)
|
//! - Full scan (H: prefix)
|
||||||
|
|
||||||
|
use std::collections::HashSet;
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
|
|
||||||
use stemedb_core::types::Assertion;
|
use stemedb_core::types::Assertion;
|
||||||
@ -19,12 +21,43 @@ use crate::query::parse_hex_phash;
|
|||||||
use super::QueryEngine;
|
use super::QueryEngine;
|
||||||
|
|
||||||
impl<S: KVStore + 'static> QueryEngine<S> {
|
impl<S: KVStore + 'static> QueryEngine<S> {
|
||||||
/// Fetch assertions for a specific subject using the subject index.
|
/// Compute the hash of an assertion for deduplication.
|
||||||
pub(super) async fn fetch_by_subject(&self, subject: &str) -> Result<Vec<Assertion>> {
|
///
|
||||||
let hash_list = self.index_store.get_by_subject(subject).await?;
|
/// Uses BLAKE3 hash of the serialized assertion, matching the hash
|
||||||
|
/// computed during ingestion.
|
||||||
|
fn compute_assertion_hash(assertion: &Assertion) -> Option<[u8; 32]> {
|
||||||
|
let serialized = stemedb_core::serde::serialize(assertion).ok()?;
|
||||||
|
Some(*blake3::hash(&serialized).as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
let mut results = Vec::with_capacity(hash_list.len());
|
/// Fetch assertions for a specific subject using the subject index.
|
||||||
|
///
|
||||||
|
/// Merges MemTable results (freshest) with KVStore results, deduplicating by hash.
|
||||||
|
pub(super) async fn fetch_by_subject(&self, subject: &str) -> Result<Vec<Assertion>> {
|
||||||
|
let mut seen_hashes: HashSet<[u8; 32]> = HashSet::new();
|
||||||
|
let mut results = Vec::new();
|
||||||
|
|
||||||
|
// Check MemTable first (has freshest data)
|
||||||
|
if let Some(ref memtable) = self.memtable {
|
||||||
|
for assertion in memtable.get_by_subject(subject) {
|
||||||
|
if let Some(hash) = Self::compute_assertion_hash(&assertion) {
|
||||||
|
if seen_hashes.insert(hash) {
|
||||||
|
results.push(assertion);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if !results.is_empty() {
|
||||||
|
debug!(subject, memtable_count = results.len(), "Found assertions in MemTable");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Then fetch from KVStore (existing indexed data)
|
||||||
|
let hash_list = self.index_store.get_by_subject(subject).await?;
|
||||||
for hash in hash_list {
|
for hash in hash_list {
|
||||||
|
if !seen_hashes.insert(hash) {
|
||||||
|
continue; // Already in results from MemTable
|
||||||
|
}
|
||||||
|
|
||||||
let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
|
let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
|
||||||
if let Some(data) = self.store.get(&assertion_key).await? {
|
if let Some(data) = self.store.get(&assertion_key).await? {
|
||||||
match self.deserialize_assertion(&data) {
|
match self.deserialize_assertion(&data) {
|
||||||
@ -40,15 +73,42 @@ impl<S: KVStore + 'static> QueryEngine<S> {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Fetch assertions for a specific subject and predicate using the compound index.
|
/// Fetch assertions for a specific subject and predicate using the compound index.
|
||||||
|
///
|
||||||
|
/// Merges MemTable results (freshest) with KVStore results, deduplicating by hash.
|
||||||
pub(super) async fn fetch_by_subject_predicate(
|
pub(super) async fn fetch_by_subject_predicate(
|
||||||
&self,
|
&self,
|
||||||
subject: &str,
|
subject: &str,
|
||||||
predicate: &str,
|
predicate: &str,
|
||||||
) -> Result<Vec<Assertion>> {
|
) -> Result<Vec<Assertion>> {
|
||||||
let hash_list = self.index_store.get_by_subject_predicate(subject, predicate).await?;
|
let mut seen_hashes: HashSet<[u8; 32]> = HashSet::new();
|
||||||
|
let mut results = Vec::new();
|
||||||
|
|
||||||
let mut results = Vec::with_capacity(hash_list.len());
|
// Check MemTable first (has freshest data)
|
||||||
|
if let Some(ref memtable) = self.memtable {
|
||||||
|
for assertion in memtable.get_by_subject_predicate(subject, predicate) {
|
||||||
|
if let Some(hash) = Self::compute_assertion_hash(&assertion) {
|
||||||
|
if seen_hashes.insert(hash) {
|
||||||
|
results.push(assertion);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if !results.is_empty() {
|
||||||
|
debug!(
|
||||||
|
subject,
|
||||||
|
predicate,
|
||||||
|
memtable_count = results.len(),
|
||||||
|
"Found assertions in MemTable"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Then fetch from KVStore (existing indexed data)
|
||||||
|
let hash_list = self.index_store.get_by_subject_predicate(subject, predicate).await?;
|
||||||
for hash in hash_list {
|
for hash in hash_list {
|
||||||
|
if !seen_hashes.insert(hash) {
|
||||||
|
continue; // Already in results from MemTable
|
||||||
|
}
|
||||||
|
|
||||||
let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
|
let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
|
||||||
if let Some(data) = self.store.get(&assertion_key).await? {
|
if let Some(data) = self.store.get(&assertion_key).await? {
|
||||||
match self.deserialize_assertion(&data) {
|
match self.deserialize_assertion(&data) {
|
||||||
@ -67,11 +127,25 @@ impl<S: KVStore + 'static> QueryEngine<S> {
|
|||||||
///
|
///
|
||||||
/// Used for alias resolution where a single query subject expands to
|
/// Used for alias resolution where a single query subject expands to
|
||||||
/// multiple aliased subjects (e.g., code:// and rfc:// paths).
|
/// multiple aliased subjects (e.g., code:// and rfc:// paths).
|
||||||
|
/// Merges MemTable results with KVStore results.
|
||||||
pub(super) async fn fetch_by_subjects(&self, subjects: &[String]) -> Result<Vec<Assertion>> {
|
pub(super) async fn fetch_by_subjects(&self, subjects: &[String]) -> Result<Vec<Assertion>> {
|
||||||
use std::collections::HashSet;
|
|
||||||
let mut seen_hashes: HashSet<[u8; 32]> = HashSet::new();
|
let mut seen_hashes: HashSet<[u8; 32]> = HashSet::new();
|
||||||
let mut results = Vec::new();
|
let mut results = Vec::new();
|
||||||
|
|
||||||
|
// Check MemTable first (has freshest data)
|
||||||
|
if let Some(ref memtable) = self.memtable {
|
||||||
|
for subject in subjects {
|
||||||
|
for assertion in memtable.get_by_subject(subject) {
|
||||||
|
if let Some(hash) = Self::compute_assertion_hash(&assertion) {
|
||||||
|
if seen_hashes.insert(hash) {
|
||||||
|
results.push(assertion);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Then fetch from KVStore
|
||||||
for subject in subjects {
|
for subject in subjects {
|
||||||
let hash_list = self.index_store.get_by_subject(subject).await?;
|
let hash_list = self.index_store.get_by_subject(subject).await?;
|
||||||
for hash in hash_list {
|
for hash in hash_list {
|
||||||
@ -102,15 +176,29 @@ impl<S: KVStore + 'static> QueryEngine<S> {
|
|||||||
/// Fetch assertions for multiple subjects with predicate filter, deduplicating by hash.
|
/// Fetch assertions for multiple subjects with predicate filter, deduplicating by hash.
|
||||||
///
|
///
|
||||||
/// Used for alias resolution when both subject and predicate are specified.
|
/// Used for alias resolution when both subject and predicate are specified.
|
||||||
|
/// Merges MemTable results with KVStore results.
|
||||||
pub(super) async fn fetch_by_subjects_predicate(
|
pub(super) async fn fetch_by_subjects_predicate(
|
||||||
&self,
|
&self,
|
||||||
subjects: &[String],
|
subjects: &[String],
|
||||||
predicate: &str,
|
predicate: &str,
|
||||||
) -> Result<Vec<Assertion>> {
|
) -> Result<Vec<Assertion>> {
|
||||||
use std::collections::HashSet;
|
|
||||||
let mut seen_hashes: HashSet<[u8; 32]> = HashSet::new();
|
let mut seen_hashes: HashSet<[u8; 32]> = HashSet::new();
|
||||||
let mut results = Vec::new();
|
let mut results = Vec::new();
|
||||||
|
|
||||||
|
// Check MemTable first (has freshest data)
|
||||||
|
if let Some(ref memtable) = self.memtable {
|
||||||
|
for subject in subjects {
|
||||||
|
for assertion in memtable.get_by_subject_predicate(subject, predicate) {
|
||||||
|
if let Some(hash) = Self::compute_assertion_hash(&assertion) {
|
||||||
|
if seen_hashes.insert(hash) {
|
||||||
|
results.push(assertion);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Then fetch from KVStore
|
||||||
for subject in subjects {
|
for subject in subjects {
|
||||||
let hash_list = self.index_store.get_by_subject_predicate(subject, predicate).await?;
|
let hash_list = self.index_store.get_by_subject_predicate(subject, predicate).await?;
|
||||||
for hash in hash_list {
|
for hash in hash_list {
|
||||||
|
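Note on the candidate-fetch changes above: each fetch path reads the MemTable first, then the KVStore index, and uses a `HashSet` of content hashes so an assertion present in both is returned once. A minimal sketch of that merge step follows, assuming entries are already paired with their hashes. `merge_dedup` is a hypothetical helper, and the 4-byte hashes and `String` payloads are placeholders for the 32-byte BLAKE3 hashes and `Assertion` values in the diff.

```rust
use std::collections::HashSet;

// Merges fresh in-memory entries with persisted ones, keeping the first
// occurrence of each hash. MemTable entries come first, so they win ties.
fn merge_dedup(
    memtable: Vec<([u8; 4], String)>,
    store: Vec<([u8; 4], String)>,
) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for (hash, value) in memtable.into_iter().chain(store) {
        // insert() returns false when the hash was already present.
        if seen.insert(hash) {
            out.push(value);
        }
    }
    out
}
```

The deduplication only works if the hash computed for a MemTable entry matches the hash stored in the index, which is why the diff's `compute_assertion_hash` is documented as matching the hash computed during ingestion.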
|||||||
@ -6,7 +6,7 @@
|
|||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
|
|
||||||
use stemedb_core::types::Assertion;
|
use stemedb_core::types::Assertion;
|
||||||
use stemedb_storage::{AliasStore, GenericIndexStore, KVStore, VectorIndex, VisualIndex};
|
use stemedb_storage::{AliasStore, GenericIndexStore, KVStore, MemTable, VectorIndex, VisualIndex};
|
||||||
// Trait import required for IndexStore methods on GenericIndexStore
|
// Trait import required for IndexStore methods on GenericIndexStore
|
||||||
#[allow(unused_imports)]
|
#[allow(unused_imports)]
|
||||||
use stemedb_storage::IndexStore;
|
use stemedb_storage::IndexStore;
|
||||||
@ -45,13 +45,24 @@ pub struct QueryEngine<S> {
|
|||||||
pub(super) visual_index: Option<Arc<dyn VisualIndex>>,
|
pub(super) visual_index: Option<Arc<dyn VisualIndex>>,
|
||||||
/// Optional alias store for cross-scheme subject resolution.
|
/// Optional alias store for cross-scheme subject resolution.
|
||||||
pub(super) alias_store: Option<Arc<dyn AliasStore>>,
|
pub(super) alias_store: Option<Arc<dyn AliasStore>>,
|
||||||
|
/// Optional MemTable for read-your-writes consistency.
|
||||||
|
/// When set, queries merge MemTable with KVStore to ensure recently
|
||||||
|
/// written assertions are immediately visible.
|
||||||
|
pub(super) memtable: Option<Arc<MemTable>>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl<S: KVStore + 'static> QueryEngine<S> {
|
impl<S: KVStore + 'static> QueryEngine<S> {
|
||||||
/// Create a new query engine backed by the given store.
|
/// Create a new query engine backed by the given store.
|
||||||
pub fn new(store: Arc<S>) -> Self {
|
pub fn new(store: Arc<S>) -> Self {
|
||||||
let index_store = GenericIndexStore::new(store.clone());
|
let index_store = GenericIndexStore::new(store.clone());
|
||||||
Self { store, index_store, vector_index: None, visual_index: None, alias_store: None }
|
Self {
|
||||||
|
store,
|
||||||
|
index_store,
|
||||||
|
vector_index: None,
|
||||||
|
visual_index: None,
|
||||||
|
alias_store: None,
|
||||||
|
memtable: None,
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Attach a vector index for k-NN similarity search.
|
/// Attach a vector index for k-NN similarity search.
|
||||||
@ -83,6 +94,16 @@ impl<S: KVStore + 'static> QueryEngine<S> {
|
|||||||
self
|
self
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Attach a MemTable for read-your-writes consistency.
|
||||||
|
///
|
||||||
|
/// When set, queries merge MemTable entries with KVStore results to
|
||||||
|
/// ensure recently written assertions are immediately visible before
|
||||||
|
/// the IngestWorker has processed them into KVStore indexes.
|
||||||
|
pub fn with_memtable(mut self, memtable: Arc<MemTable>) -> Self {
|
||||||
|
self.memtable = Some(memtable);
|
||||||
|
self
|
||||||
|
}
|
||||||
|
|
||||||
/// Execute a query and return matching assertions.
|
/// Execute a query and return matching assertions.
|
||||||
///
|
///
|
||||||
/// # Query Execution Strategy
|
/// # Query Execution Strategy
|
||||||
|
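`with_memtable` follows the same consuming-builder shape as the other optional attachments on `QueryEngine`. A small sketch of that pattern, assuming hypothetical `ToyEngine` and `ToyMemTable` types in place of the real ones:

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the real MemTable.
#[derive(Debug, Default)]
struct ToyMemTable {
    entries: Vec<String>,
}

/// Hypothetical engine with a single optional component.
struct ToyEngine {
    memtable: Option<Arc<ToyMemTable>>,
}

impl ToyEngine {
    /// Start with no MemTable attached.
    fn new() -> Self {
        Self { memtable: None }
    }

    /// Consume the engine, attach the MemTable, and return the engine for chaining.
    fn with_memtable(mut self, memtable: Arc<ToyMemTable>) -> Self {
        self.memtable = Some(memtable);
        self
    }

    /// Number of entries visible through the MemTable, or 0 when none is attached.
    fn fresh_count(&self) -> usize {
        self.memtable.as_ref().map_or(0, |m| m.entries.len())
    }
}
```

Because the field is an `Option`, callers that never attach a MemTable keep the previous behavior unchanged.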
|||||||
@ -7,6 +7,7 @@ pub use ed25519_dalek::{Signer, SigningKey};
|
|||||||
pub use rand::rngs::OsRng;
|
pub use rand::rngs::OsRng;
|
||||||
pub use std::sync::Arc;
|
pub use std::sync::Arc;
|
||||||
pub use stemedb_core::serde::serialize;
|
pub use stemedb_core::serde::serialize;
|
||||||
|
pub use stemedb_core::signing::compute_content_hash_v2;
|
||||||
pub use stemedb_core::testing::AssertionBuilder;
|
pub use stemedb_core::testing::AssertionBuilder;
|
||||||
pub use stemedb_core::types::{
|
pub use stemedb_core::types::{
|
||||||
Assertion, EscalationLevel, EscalationPolicy, LifecycleStage, ObjectValue, ResolutionStatus,
|
Assertion, EscalationLevel, EscalationPolicy, LifecycleStage, ObjectValue, ResolutionStatus,
|
||||||
@ -93,12 +94,13 @@ pub fn create_signed_assertion_v2(
|
|||||||
.signatures(vec![])
|
.signatures(vec![])
|
||||||
.build();
|
.build();
|
||||||
|
|
||||||
// Serialize to get content hash
|
// Compute the canonical v2 content hash (subject:predicate:object:source_hash:...).
|
||||||
let bytes = serialize(&assertion).expect("serialize assertion for v2 signing");
|
// Must use compute_content_hash_v2, NOT blake3::hash(serialize(&assertion)) —
|
||||||
let content_hash = blake3::hash(&bytes);
|
// the verifier checks the canonical fields hash, not the rkyv serialization hash.
|
||||||
|
let content_hash = compute_content_hash_v2(&assertion);
|
||||||
|
|
||||||
// Sign the content hash (v2 enterprise format)
|
// Sign the content hash (v2 enterprise format)
|
||||||
let signature = signing_key.sign(content_hash.as_bytes());
|
let signature = signing_key.sign(&content_hash);
|
||||||
|
|
||||||
// Add signature with version 2
|
// Add signature with version 2
|
||||||
assertion.signatures = vec![SignatureEntry {
|
assertion.signatures = vec![SignatureEntry {
|
||||||
|
|||||||
@ -12,7 +12,7 @@ use tokio::sync::Mutex;
|
|||||||
use tracing::debug;
|
use tracing::debug;
|
||||||
|
|
||||||
use crate::agent::Agent;
|
use crate::agent::Agent;
|
||||||
use crate::helpers::{wait_until_ingested, write_assertion_to_wal};
|
use crate::helpers::write_assertion_to_wal;
|
||||||
use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
||||||
|
|
||||||
/// Test that RecencyLens correctly selects the most recent assertion.
|
/// Test that RecencyLens correctly selects the most recent assertion.
|
||||||
@ -22,9 +22,9 @@ use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
|||||||
pub(crate) async fn run_recency_lens_test<S: KVStore + 'static>(
|
pub(crate) async fn run_recency_lens_test<S: KVStore + 'static>(
|
||||||
journal: &Arc<Mutex<Journal>>,
|
journal: &Arc<Mutex<Journal>>,
|
||||||
store: &Arc<S>,
|
store: &Arc<S>,
|
||||||
|
ingestor: &stemedb_ingest::Ingestor<S>,
|
||||||
agents: &[Agent],
|
agents: &[Agent],
|
||||||
result: &mut SimulationResult,
|
result: &mut SimulationResult,
|
||||||
ingestion_wait_ms: u64,
|
|
||||||
) -> bool {
|
) -> bool {
|
||||||
let agent = &agents[0];
|
let agent = &agents[0];
|
||||||
let subject = "RecencyTest_Entity";
|
let subject = "RecencyTest_Entity";
|
||||||
@ -48,35 +48,32 @@ pub(crate) async fn run_recency_lens_test<S: KVStore + 'static>(
|
|||||||
Some(2000),
|
Some(2000),
|
||||||
);
|
);
|
||||||
|
|
||||||
// Write both to WAL and track last offset
|
// Write both to WAL
|
||||||
let _old_result = match write_assertion_to_wal(journal, &old_assertion).await {
|
if let Err(e) = write_assertion_to_wal(journal, &old_assertion).await {
|
||||||
Ok(r) => r,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::WriteFailure,
|
||||||
tick: 0,
|
message: format!("Recency test: failed to write old assertion: {}", e),
|
||||||
kind: ErrorKind::WriteFailure,
|
});
|
||||||
message: format!("Recency test: failed to write old assertion: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
let new_result = match write_assertion_to_wal(journal, &new_assertion).await {
|
if let Err(e) = write_assertion_to_wal(journal, &new_assertion).await {
|
||||||
Ok(r) => r,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::WriteFailure,
|
||||||
tick: 0,
|
message: format!("Recency test: failed to write new assertion: {}", e),
|
||||||
kind: ErrorKind::WriteFailure,
|
});
|
||||||
message: format!("Recency test: failed to write new assertion: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
let last_offset = new_result.end_offset;
|
|
||||||
|
|
||||||
// Wait for ingestion to reach the last offset
|
// Synchronously drain all pending WAL entries (deterministic, no background task scheduling)
|
||||||
if let Err(e) = wait_until_ingested(&**store, last_offset, ingestion_wait_ms).await {
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
result.errors.push(e);
|
result.errors.push(SimulationError {
|
||||||
|
tick: 0,
|
||||||
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("Recency test: ingestion failed: {}", e),
|
||||||
|
});
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -152,9 +149,9 @@ pub(crate) async fn run_recency_lens_test<S: KVStore + 'static>(
|
|||||||
pub(crate) async fn run_lifecycle_test<S: KVStore + 'static>(
|
pub(crate) async fn run_lifecycle_test<S: KVStore + 'static>(
|
||||||
journal: &Arc<Mutex<Journal>>,
|
journal: &Arc<Mutex<Journal>>,
|
||||||
store: &Arc<S>,
|
store: &Arc<S>,
|
||||||
|
ingestor: &stemedb_ingest::Ingestor<S>,
|
||||||
agents: &[Agent],
|
agents: &[Agent],
|
||||||
result: &mut SimulationResult,
|
result: &mut SimulationResult,
|
||||||
ingestion_wait_ms: u64,
|
|
||||||
) -> bool {
|
) -> bool {
|
||||||
let agent = &agents[0];
|
let agent = &agents[0];
|
||||||
let subject = "LifecycleTest_Entity";
|
let subject = "LifecycleTest_Entity";
|
||||||
@ -178,35 +175,32 @@ pub(crate) async fn run_lifecycle_test<S: KVStore + 'static>(
|
|||||||
Some(2000),
|
Some(2000),
|
||||||
);
|
);
|
||||||
|
|
||||||
// Write both to WAL and track last offset
|
// Write both to WAL
|
||||||
let _proposed_result = match write_assertion_to_wal(journal, &proposed).await {
|
if let Err(e) = write_assertion_to_wal(journal, &proposed).await {
|
||||||
Ok(r) => r,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::WriteFailure,
|
||||||
tick: 0,
|
message: format!("Lifecycle test: failed to write proposed: {}", e),
|
||||||
kind: ErrorKind::WriteFailure,
|
});
|
||||||
message: format!("Lifecycle test: failed to write proposed: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
let approved_result = match write_assertion_to_wal(journal, &approved).await {
|
if let Err(e) = write_assertion_to_wal(journal, &approved).await {
|
||||||
Ok(r) => r,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::WriteFailure,
|
||||||
tick: 0,
|
message: format!("Lifecycle test: failed to write approved: {}", e),
|
||||||
kind: ErrorKind::WriteFailure,
|
});
|
||||||
message: format!("Lifecycle test: failed to write approved: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
let last_offset = approved_result.end_offset;
|
|
||||||
|
|
||||||
// Wait for ingestion to reach the last offset
|
// Synchronously drain all pending WAL entries
|
||||||
if let Err(e) = wait_until_ingested(&**store, last_offset, ingestion_wait_ms).await {
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
result.errors.push(e);
|
result.errors.push(SimulationError {
|
||||||
|
tick: 0,
|
||||||
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("Lifecycle test: ingestion failed: {}", e),
|
||||||
|
});
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -10,9 +10,7 @@ use tokio::sync::Mutex;
|
|||||||
use tracing::debug;
|
use tracing::debug;
|
||||||
|
|
||||||
use crate::agent::Agent;
|
use crate::agent::Agent;
|
||||||
use crate::helpers::{
|
use crate::helpers::{compute_assertion_hash, write_assertion_to_wal, write_vote_to_wal};
|
||||||
compute_assertion_hash, wait_until_ingested, write_assertion_to_wal, write_vote_to_wal,
|
|
||||||
};
|
|
||||||
use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
||||||
|
|
||||||
// ============================================================================
|
// ============================================================================
|
||||||
@ -32,9 +30,9 @@ use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
|||||||
pub(crate) async fn run_vote_consensus_test<S: KVStore + 'static>(
|
pub(crate) async fn run_vote_consensus_test<S: KVStore + 'static>(
|
||||||
journal: &Arc<Mutex<Journal>>,
|
journal: &Arc<Mutex<Journal>>,
|
||||||
store: &Arc<S>,
|
store: &Arc<S>,
|
||||||
|
ingestor: &stemedb_ingest::Ingestor<S>,
|
||||||
agents: &[Agent],
|
agents: &[Agent],
|
||||||
result: &mut SimulationResult,
|
result: &mut SimulationResult,
|
||||||
ingestion_wait_ms: u64,
|
|
||||||
) -> bool {
|
) -> bool {
|
||||||
// Need at least 3 agents: Alpha, Beta, Believer
|
// Need at least 3 agents: Alpha, Beta, Believer
|
||||||
if agents.len() < 3 {
|
if agents.len() < 3 {
|
||||||
@ -71,37 +69,34 @@ pub(crate) async fn run_vote_consensus_test<S: KVStore + 'static>(
|
|||||||
Some(1001),
|
Some(1001),
|
||||||
);
|
);
|
||||||
|
|
||||||
// Write both assertions to WAL and track last offset
|
// Write both assertions to WAL
|
||||||
let _alpha_result = match write_assertion_to_wal(journal, &alpha_assertion).await {
|
if let Err(e) = write_assertion_to_wal(journal, &alpha_assertion).await {
|
||||||
Ok(r) => r,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::WriteFailure,
|
||||||
tick: 0,
|
message: format!("Vote consensus test: failed to write Alpha assertion: {}", e),
|
||||||
kind: ErrorKind::WriteFailure,
|
});
|
||||||
message: format!("Vote consensus test: failed to write Alpha assertion: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
result.assertions_written += 1;
|
result.assertions_written += 1;
|
||||||
|
|
||||||
let beta_result = match write_assertion_to_wal(journal, &beta_assertion).await {
|
if let Err(e) = write_assertion_to_wal(journal, &beta_assertion).await {
|
||||||
Ok(r) => r,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::WriteFailure,
|
||||||
tick: 0,
|
message: format!("Vote consensus test: failed to write Beta assertion: {}", e),
|
||||||
kind: ErrorKind::WriteFailure,
|
});
|
||||||
message: format!("Vote consensus test: failed to write Beta assertion: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
result.assertions_written += 1;
|
result.assertions_written += 1;
|
||||||
let mut last_offset = beta_result.end_offset;
|
|
||||||
|
|
||||||
// Wait for assertions to be ingested
|
// Synchronously drain assertions from WAL
|
||||||
if let Err(e) = wait_until_ingested(&**store, last_offset, ingestion_wait_ms).await {
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
result.errors.push(e);
|
result.errors.push(SimulationError {
|
||||||
|
tick: 0,
|
||||||
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("Vote consensus test: assertion ingestion failed: {}", e),
|
||||||
|
});
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -135,22 +130,23 @@ pub(crate) async fn run_vote_consensus_test<S: KVStore + 'static>(
|
|||||||
|
|
||||||
// Believer votes for Alpha's assertion (weight 1.0) - this tips the balance
|
// Believer votes for Alpha's assertion (weight 1.0) - this tips the balance
|
||||||
let believer_vote = believer.vote(alpha_hash, 1.0);
|
let believer_vote = believer.vote(alpha_hash, 1.0);
|
||||||
last_offset = match write_vote_to_wal(journal, &believer_vote).await {
|
if let Err(e) = write_vote_to_wal(journal, &believer_vote).await {
|
||||||
Ok(offset) => offset,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::VoteWriteFailure,
|
||||||
tick: 0,
|
message: format!("Vote consensus test: failed to write Believer vote: {}", e),
|
||||||
kind: ErrorKind::VoteWriteFailure,
|
});
|
||||||
message: format!("Vote consensus test: failed to write Believer vote: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
result.votes_written += 1;
|
result.votes_written += 1;
|
||||||
|
|
||||||
// Wait for votes to be ingested
|
// Synchronously drain votes from WAL
|
||||||
if let Err(e) = wait_until_ingested(&**store, last_offset, ingestion_wait_ms).await {
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
result.errors.push(e);
|
result.errors.push(SimulationError {
|
||||||
|
tick: 0,
|
||||||
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("Vote consensus test: vote ingestion failed: {}", e),
|
||||||
|
});
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -236,9 +232,9 @@ pub(crate) async fn run_vote_consensus_test<S: KVStore + 'static>(
|
|||||||
pub(crate) async fn run_troll_resistance_test<S: KVStore + 'static>(
|
pub(crate) async fn run_troll_resistance_test<S: KVStore + 'static>(
|
||||||
journal: &Arc<Mutex<Journal>>,
|
journal: &Arc<Mutex<Journal>>,
|
||||||
store: &Arc<S>,
|
store: &Arc<S>,
|
||||||
|
ingestor: &stemedb_ingest::Ingestor<S>,
|
||||||
agents: &[Agent],
|
agents: &[Agent],
|
||||||
result: &mut SimulationResult,
|
result: &mut SimulationResult,
|
||||||
ingestion_wait_ms: u64,
|
|
||||||
) -> bool {
|
) -> bool {
|
||||||
// Need at least 3 agents: Scientist, Troll, Ally
|
// Need at least 3 agents: Scientist, Troll, Ally
|
||||||
if agents.len() < 3 {
|
if agents.len() < 3 {
|
||||||
@ -276,39 +272,33 @@ pub(crate) async fn run_troll_resistance_test<S: KVStore + 'static>(
|
|||||||
);
|
);
|
||||||
|
|
||||||
// Write both assertions to WAL and track last offset
|
// Write both assertions to WAL
|
||||||
let _scientist_result = match write_assertion_to_wal(journal, &scientist_assertion).await {
|
if let Err(e) = write_assertion_to_wal(journal, &scientist_assertion).await {
|
||||||
Ok(r) => r,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::WriteFailure,
|
||||||
tick: 0,
|
message: format!("Troll resistance test: failed to write scientist assertion: {}", e),
|
||||||
kind: ErrorKind::WriteFailure,
|
});
|
||||||
message: format!(
|
return false;
|
||||||
"Troll resistance test: failed to write scientist assertion: {}",
|
}
|
||||||
e
|
|
||||||
),
|
|
||||||
});
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
result.assertions_written += 1;
|
result.assertions_written += 1;
|
||||||
|
|
||||||
let troll_result = match write_assertion_to_wal(journal, &troll_assertion).await {
|
if let Err(e) = write_assertion_to_wal(journal, &troll_assertion).await {
|
||||||
Ok(r) => r,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::WriteFailure,
|
||||||
tick: 0,
|
message: format!("Troll resistance test: failed to write troll assertion: {}", e),
|
||||||
kind: ErrorKind::WriteFailure,
|
});
|
||||||
message: format!("Troll resistance test: failed to write troll assertion: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
result.assertions_written += 1;
|
result.assertions_written += 1;
|
||||||
let mut last_offset = troll_result.end_offset;
|
|
||||||
|
|
||||||
// Wait for assertions to be ingested
|
// Synchronously drain assertions from WAL
|
||||||
if let Err(e) = wait_until_ingested(&**store, last_offset, ingestion_wait_ms).await {
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
result.errors.push(e);
|
result.errors.push(SimulationError {
|
||||||
|
tick: 0,
|
||||||
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("Troll resistance test: assertion ingestion failed: {}", e),
|
||||||
|
});
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -342,22 +332,23 @@ pub(crate) async fn run_troll_resistance_test<S: KVStore + 'static>(
|
|||||||
|
|
||||||
// Ally votes for scientist's assertion (weight 1.0) - tips balance in scientist's favor
|
// Ally votes for scientist's assertion (weight 1.0) - tips balance in scientist's favor
|
||||||
let ally_vote = ally.vote(scientist_hash, 1.0);
|
let ally_vote = ally.vote(scientist_hash, 1.0);
|
||||||
last_offset = match write_vote_to_wal(journal, &ally_vote).await {
|
if let Err(e) = write_vote_to_wal(journal, &ally_vote).await {
|
||||||
Ok(offset) => offset,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::VoteWriteFailure,
|
||||||
tick: 0,
|
message: format!("Troll resistance test: failed to write ally vote: {}", e),
|
||||||
kind: ErrorKind::VoteWriteFailure,
|
});
|
||||||
message: format!("Troll resistance test: failed to write ally vote: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
result.votes_written += 1;
|
result.votes_written += 1;
|
||||||
|
|
||||||
// Wait for votes to be ingested
|
// Synchronously drain votes from WAL
|
||||||
if let Err(e) = wait_until_ingested(&**store, last_offset, ingestion_wait_ms).await {
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
result.errors.push(e);
|
result.errors.push(SimulationError {
|
||||||
|
tick: 0,
|
||||||
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("Troll resistance test: vote ingestion failed: {}", e),
|
||||||
|
});
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -10,9 +10,7 @@ use tokio::sync::Mutex;
|
|||||||
use tracing::debug;
|
use tracing::debug;
|
||||||
|
|
||||||
use crate::agent::Agent;
|
use crate::agent::Agent;
|
||||||
use crate::helpers::{
|
use crate::helpers::{verify_assertion_text, write_assertion_to_wal};
|
||||||
cursor_key, verify_assertion_text, wait_until_ingested, write_assertion_to_wal,
|
|
||||||
};
|
|
||||||
use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
||||||
|
|
||||||
// ============================================================================
|
// ============================================================================
|
||||||
@ -30,9 +28,9 @@ use crate::types::{ErrorKind, SimulationError, SimulationResult};
|
|||||||
pub(crate) async fn run_mv_integration_test<S: KVStore + 'static>(
|
pub(crate) async fn run_mv_integration_test<S: KVStore + 'static>(
|
||||||
journal: &Arc<Mutex<Journal>>,
|
journal: &Arc<Mutex<Journal>>,
|
||||||
store: &Arc<S>,
|
store: &Arc<S>,
|
||||||
|
ingestor: &stemedb_ingest::Ingestor<S>,
|
||||||
agents: &[Agent],
|
agents: &[Agent],
|
||||||
result: &mut SimulationResult,
|
result: &mut SimulationResult,
|
||||||
ingestion_wait_ms: u64,
|
|
||||||
) -> bool {
|
) -> bool {
|
||||||
let agent = &agents[0];
|
let agent = &agents[0];
|
||||||
let subject = "MV_Test_Entity";
|
let subject = "MV_Test_Entity";
|
||||||
@ -47,37 +45,23 @@ pub(crate) async fn run_mv_integration_test<S: KVStore + 'static>(
|
|||||||
Some(3000),
|
Some(3000),
|
||||||
);
|
);
|
||||||
|
|
||||||
// Check cursor state before writing
|
if let Err(e) = write_assertion_to_wal(journal, &assertion).await {
|
||||||
let cursor_before = match store.get(&cursor_key()).await {
|
result.errors.push(SimulationError {
|
||||||
Ok(Some(bytes)) => {
|
tick: 0,
|
||||||
if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) {
|
kind: ErrorKind::WriteFailure,
|
||||||
u64::from_le_bytes(arr)
|
message: format!("MV integration test: failed to write assertion: {}", e),
|
||||||
} else {
|
});
|
||||||
0
|
return false;
|
||||||
}
|
}
|
||||||
}
|
|
||||||
_ => 0,
|
|
||||||
};
|
|
||||||
debug!(" MV integration test: cursor before write = {}", cursor_before);
|
|
||||||
|
|
||||||
let write_result = match write_assertion_to_wal(journal, &assertion).await {
|
|
||||||
Ok(r) => r,
|
|
||||||
Err(e) => {
|
|
||||||
result.errors.push(SimulationError {
|
|
||||||
tick: 0,
|
|
||||||
kind: ErrorKind::WriteFailure,
|
|
||||||
message: format!("MV integration test: failed to write assertion: {}", e),
|
|
||||||
});
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
result.assertions_written += 1;
|
result.assertions_written += 1;
|
||||||
let last_offset = write_result.end_offset;
|
|
||||||
debug!(last_offset, cursor_before, "MV integration test: wrote assertion");
|
|
||||||
|
|
||||||
// Wait for ingestion to complete
|
// Synchronously drain WAL entries
|
||||||
if let Err(e) = wait_until_ingested(&**store, last_offset, ingestion_wait_ms).await {
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
result.errors.push(e);
|
result.errors.push(SimulationError {
|
||||||
|
tick: 0,
|
||||||
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("MV integration test: ingestion failed: {}", e),
|
||||||
|
});
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -205,9 +189,9 @@ pub(crate) async fn run_mv_integration_test<S: KVStore + 'static>(
|
|||||||
pub(crate) async fn run_fast_path_test<S: KVStore + 'static>(
|
pub(crate) async fn run_fast_path_test<S: KVStore + 'static>(
|
||||||
journal: &Arc<Mutex<Journal>>,
|
journal: &Arc<Mutex<Journal>>,
|
||||||
store: &Arc<S>,
|
store: &Arc<S>,
|
||||||
|
ingestor: &stemedb_ingest::Ingestor<S>,
|
||||||
agents: &[Agent],
|
agents: &[Agent],
|
||||||
result: &mut SimulationResult,
|
result: &mut SimulationResult,
|
||||||
ingestion_wait_ms: u64,
|
|
||||||
) -> bool {
|
) -> bool {
|
||||||
let agent = &agents[0];
|
let agent = &agents[0];
|
||||||
let subject = "FastPath_Entity";
|
let subject = "FastPath_Entity";
|
||||||
@ -222,23 +206,23 @@ pub(crate) async fn run_fast_path_test<S: KVStore + 'static>(
|
|||||||
Some(3100),
|
Some(3100),
|
||||||
);
|
);
|
||||||
|
|
||||||
let write_result = match write_assertion_to_wal(journal, &assertion).await {
|
if let Err(e) = write_assertion_to_wal(journal, &assertion).await {
|
||||||
Ok(r) => r,
|
result.errors.push(SimulationError {
|
||||||
Err(e) => {
|
tick: 0,
|
||||||
result.errors.push(SimulationError {
|
kind: ErrorKind::WriteFailure,
|
||||||
tick: 0,
|
message: format!("Fast-path test: failed to write assertion: {}", e),
|
||||||
kind: ErrorKind::WriteFailure,
|
});
|
||||||
message: format!("Fast-path test: failed to write assertion: {}", e),
|
return false;
|
||||||
});
|
}
|
||||||
return false;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
result.assertions_written += 1;
|
result.assertions_written += 1;
|
||||||
let last_offset = write_result.end_offset;
|
|
||||||
|
|
||||||
// Wait for ingestion
|
// Synchronously drain WAL entries
|
||||||
if let Err(e) = wait_until_ingested(&**store, last_offset, ingestion_wait_ms).await {
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
result.errors.push(e);
|
result.errors.push(SimulationError {
|
||||||
|
tick: 0,
|
||||||
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("Fast-path test: ingestion failed: {}", e),
|
||||||
|
});
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -323,9 +307,9 @@ pub(crate) async fn run_fast_path_test<S: KVStore + 'static>(
|
|||||||
pub(crate) async fn run_mv_freshness_test<S: KVStore + 'static>(
|
pub(crate) async fn run_mv_freshness_test<S: KVStore + 'static>(
|
||||||
journal: &Arc<Mutex<Journal>>,
|
journal: &Arc<Mutex<Journal>>,
|
||||||
store: &Arc<S>,
|
store: &Arc<S>,
|
||||||
|
ingestor: &stemedb_ingest::Ingestor<S>,
|
||||||
agents: &[Agent],
|
agents: &[Agent],
|
||||||
result: &mut SimulationResult,
|
result: &mut SimulationResult,
|
||||||
ingestion_wait_ms: u64,
|
|
||||||
) -> bool {
|
) -> bool {
|
||||||
let agent = &agents[0];
|
let agent = &agents[0];
|
||||||
let subject = "Freshness_Entity";
|
let subject = "Freshness_Entity";
|
||||||
@ -334,7 +318,6 @@ pub(crate) async fn run_mv_freshness_test<S: KVStore + 'static>(
|
|||||||
let base_timestamp = 4000u64;
|
let base_timestamp = 4000u64;
|
||||||
|
|
||||||
// Write 10 assertions with incrementing timestamps
|
// Write 10 assertions with incrementing timestamps
|
||||||
let mut last_offset = 0u64;
|
|
||||||
for i in 0..num_assertions {
|
for i in 0..num_assertions {
|
||||||
let assertion = agent.sign_assertion_with_options(
|
let assertion = agent.sign_assertion_with_options(
|
||||||
subject,
|
subject,
|
||||||
@ -344,25 +327,24 @@ pub(crate) async fn run_mv_freshness_test<S: KVStore + 'static>(
|
|||||||
Some(base_timestamp + i as u64),
|
Some(base_timestamp + i as u64),
|
||||||
);
|
);
|
||||||
|
|
||||||
match write_assertion_to_wal(journal, &assertion).await {
|
if let Err(e) = write_assertion_to_wal(journal, &assertion).await {
|
||||||
Ok(r) => {
|
result.errors.push(SimulationError {
|
||||||
result.assertions_written += 1;
|
tick: 0,
|
||||||
last_offset = r.end_offset;
|
kind: ErrorKind::WriteFailure,
|
||||||
}
|
message: format!("MV freshness test: failed to write assertion {}: {}", i, e),
|
||||||
Err(e) => {
|
});
|
||||||
result.errors.push(SimulationError {
|
return false;
|
||||||
tick: 0,
|
|
||||||
kind: ErrorKind::WriteFailure,
|
|
||||||
message: format!("MV freshness test: failed to write assertion {}: {}", i, e),
|
|
||||||
});
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
result.assertions_written += 1;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Wait for all assertions to be ingested
|
// Synchronously drain all pending WAL entries
|
||||||
if let Err(e) = wait_until_ingested(&**store, last_offset, ingestion_wait_ms).await {
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
result.errors.push(e);
|
result.errors.push(SimulationError {
|
||||||
|
tick: 0,
|
||||||
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("MV freshness test: ingestion failed: {}", e),
|
||||||
|
});
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -1,21 +1,18 @@
|
|||||||
//! Helper functions for WAL operations and ingestion synchronization.
|
//! Helper functions for WAL operations and assertion verification.
|
||||||
|
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use std::time::{Duration, Instant};
|
|
||||||
use stemedb_core::serde::serialize;
|
use stemedb_core::serde::serialize;
|
||||||
use stemedb_core::types::{Assertion, Hash, Vote};
|
use stemedb_core::types::{Assertion, Hash, Vote};
|
||||||
use stemedb_ingest::{serialize_assertion, serialize_vote};
|
use stemedb_ingest::{serialize_assertion, serialize_vote};
|
||||||
use stemedb_storage::{key_codec, KVStore};
|
|
||||||
use stemedb_wal::Journal;
|
use stemedb_wal::Journal;
|
||||||
use tokio::sync::Mutex;
|
use tokio::sync::Mutex;
|
||||||
use tracing::debug;
|
|
||||||
|
|
||||||
use crate::types::{ErrorKind, SimulationError};
|
use crate::types::{ErrorKind, SimulationError};
|
||||||
|
|
||||||
/// Result from writing to WAL, includes the raw bytes and the journal offset after the write.
|
/// Result from writing to WAL, includes the raw bytes and the journal offset after the write.
|
||||||
pub(crate) struct WalWriteResult {
|
pub(crate) struct WalWriteResult {
|
||||||
pub raw_bytes: Vec<u8>,
|
pub raw_bytes: Vec<u8>,
|
||||||
/// The journal offset AFTER this write (use this as target for wait_until_ingested)
|
/// The journal offset AFTER this write.
|
||||||
pub end_offset: u64,
|
pub end_offset: u64,
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -67,67 +64,6 @@ pub(crate) fn compute_assertion_hash(assertion: &Assertion) -> Hash {
|
|||||||
*blake3::hash(&bytes).as_bytes()
|
*blake3::hash(&bytes).as_bytes()
|
||||||
}
|
}
|
||||||
|
|
||||||
/// The cursor key used by the ingestor to track its progress.
|
|
||||||
/// Uses key_codec format: `\x00META:cursor:ingest`
|
|
||||||
pub(crate) fn cursor_key() -> Vec<u8> {
|
|
||||||
key_codec::cursor_key()
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Wait until the ingestor cursor reaches or exceeds the target offset.
|
|
||||||
///
|
|
||||||
/// This replaces hardcoded sleep timers with cursor-based polling, making
|
|
||||||
/// tests deterministic rather than timing-dependent.
|
|
||||||
///
|
|
||||||
/// Polls every 10ms and times out after max_wait_ms milliseconds.
|
|
||||||
///
|
|
||||||
/// # Arguments
|
|
||||||
/// * `store` - The KVStore to read the cursor from
|
|
||||||
/// * `target_offset` - The minimum cursor offset to wait for
|
|
||||||
/// * `max_wait_ms` - Maximum time to wait in milliseconds
|
|
||||||
///
|
|
||||||
/// # Returns
|
|
||||||
/// * `Ok(())` if cursor reached target
|
|
||||||
/// * `Err(SimulationError)` if timeout exceeded
|
|
||||||
pub(crate) async fn wait_until_ingested<S: KVStore>(
|
|
||||||
store: &S,
|
|
||||||
target_offset: u64,
|
|
||||||
max_wait_ms: u64,
|
|
||||||
) -> Result<(), SimulationError> {
|
|
||||||
let start = Instant::now();
|
|
||||||
let timeout = Duration::from_millis(max_wait_ms);
|
|
||||||
let poll_interval = Duration::from_millis(10);
|
|
||||||
|
|
||||||
loop {
|
|
||||||
// Read current cursor position
|
|
||||||
if let Ok(Some(bytes)) = store.get(&cursor_key()).await {
|
|
||||||
if let Ok(arr) = <[u8; 8]>::try_from(bytes.as_slice()) {
|
|
||||||
let cursor = u64::from_le_bytes(arr);
|
|
||||||
// Use > (strictly greater) because journal.append() returns the START offset
|
|
||||||
// of the record. The cursor must move PAST this offset to confirm the record
|
|
||||||
// was fully processed.
|
|
||||||
if cursor > target_offset {
|
|
||||||
debug!(cursor, target_offset, "Ingestion sync: cursor passed target");
|
|
||||||
return Ok(());
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Check timeout
|
|
||||||
if start.elapsed() > timeout {
|
|
||||||
return Err(SimulationError {
|
|
||||||
tick: 0,
|
|
||||||
kind: ErrorKind::WriteFailure,
|
|
||||||
message: format!(
|
|
||||||
"Ingestion sync timeout: cursor did not reach {} within {}ms",
|
|
||||||
target_offset, max_wait_ms
|
|
||||||
),
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
tokio::time::sleep(poll_interval).await;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Verify that an assertion matches expected subject, predicate, and text value.
|
/// Verify that an assertion matches expected subject, predicate, and text value.
|
||||||
///
|
///
|
||||||
/// Used by arena3 tests to validate MV winner properties.
|
/// Used by arena3 tests to validate MV winner properties.
|
||||||
|
|||||||
@ -16,9 +16,7 @@ use crate::arenas::{
|
|||||||
run_mv_integration_test, run_recency_lens_test, run_troll_resistance_test,
|
run_mv_integration_test, run_recency_lens_test, run_troll_resistance_test,
|
||||||
run_vote_consensus_test,
|
run_vote_consensus_test,
|
||||||
};
|
};
|
||||||
use crate::helpers::{
|
use crate::helpers::{compute_assertion_hash, write_assertion_to_wal, write_vote_to_wal};
|
||||||
compute_assertion_hash, wait_until_ingested, write_assertion_to_wal, write_vote_to_wal,
|
|
||||||
};
|
|
||||||
use crate::strategy::{self, AgentAction, AgentStrategy, StrategyMetrics, WorldState};
|
use crate::strategy::{self, AgentAction, AgentStrategy, StrategyMetrics, WorldState};
|
||||||
use crate::types::{
|
use crate::types::{
|
||||||
ErrorKind, SimulationConfig, SimulationError, SimulationResult, SimulationSetupError,
|
ErrorKind, SimulationConfig, SimulationError, SimulationResult, SimulationSetupError,
|
||||||
@ -68,12 +66,11 @@ pub async fn run_simulation(
|
|||||||
debug!(" WAL initialized at {:?}", temp_wal_dir.path());
|
debug!(" WAL initialized at {:?}", temp_wal_dir.path());
|
||||||
debug!(" KV Store initialized at {:?}", temp_db_dir.path());
|
debug!(" KV Store initialized at {:?}", temp_db_dir.path());
|
||||||
|
|
||||||
// 2. Start Ingestor
|
// 2. Create Ingestor (no background task - we drain synchronously via process_pending)
|
||||||
let mut ingestor = Ingestor::new(journal.clone(), store.clone())
|
let ingestor = Ingestor::new(journal.clone(), store.clone())
|
||||||
.await
|
.await
|
||||||
.map_err(|e| SimulationSetupError::IngestorCreate(e.to_string()))?;
|
.map_err(|e| SimulationSetupError::IngestorCreate(e.to_string()))?;
|
||||||
ingestor.start();
|
debug!(" Ingestor created (synchronous drain mode).");
|
||||||
debug!(" Ingestor started (background worker).");
|
|
||||||
|
|
||||||
// 3. Setup Agents with strategies
|
// 3. Setup Agents with strategies
|
||||||
let mut agents: Vec<Agent> = Vec::with_capacity(agent_count);
|
let mut agents: Vec<Agent> = Vec::with_capacity(agent_count);
|
||||||
@ -230,12 +227,14 @@ pub async fn run_simulation(
|
|||||||
|
|
||||||
info!(" {} assertions written to WAL.", result.assertions_written);
|
info!(" {} assertions written to WAL.", result.assertions_written);
|
||||||
|
|
||||||
// 6. Wait for Ingestion (cursor-based sync)
|
// 6. Synchronously drain all pending WAL entries (deterministic, no background scheduling)
|
||||||
info!("⏳ Waiting for ingestion to reach offset {}...", last_journal_offset);
|
info!("⚙️ Processing {} WAL bytes synchronously...", last_journal_offset);
|
||||||
if let Err(e) =
|
if let Err(e) = ingestor.process_pending().await {
|
||||||
wait_until_ingested(&*store, last_journal_offset, config.ingestion_wait_ms).await
|
result.errors.push(SimulationError {
|
||||||
{
|
tick: 0,
|
||||||
result.errors.push(e);
|
kind: ErrorKind::WriteFailure,
|
||||||
|
message: format!("Main ingestion failed: {}", e),
|
||||||
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
@ -339,15 +338,14 @@ pub async fn run_simulation(
|
|||||||
// ========================================================================
|
// ========================================================================
|
||||||
info!("🔬 Arena 1.2: Testing Recency Lens...");
|
info!("🔬 Arena 1.2: Testing Recency Lens...");
|
||||||
result.recency_test_passed =
|
result.recency_test_passed =
|
||||||
run_recency_lens_test(&journal, &store, &agents, &mut result, config.ingestion_wait_ms)
|
run_recency_lens_test(&journal, &store, &ingestor, &agents, &mut result).await;
|
||||||
.await;
|
|
||||||
|
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
// 9. Arena 1.3: Lifecycle Filtering Test
|
// 9. Arena 1.3: Lifecycle Filtering Test
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
info!("🔬 Arena 1.3: Testing Lifecycle Filtering...");
|
info!("🔬 Arena 1.3: Testing Lifecycle Filtering...");
|
||||||
result.lifecycle_test_passed =
|
result.lifecycle_test_passed =
|
||||||
run_lifecycle_test(&journal, &store, &agents, &mut result, config.ingestion_wait_ms).await;
|
run_lifecycle_test(&journal, &store, &ingestor, &agents, &mut result).await;
|
||||||
|
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
// 10. Arena 1.4: Query Audit Verification
|
// 10. Arena 1.4: Query Audit Verification
|
||||||
@ -366,16 +364,14 @@ pub async fn run_simulation(
|
|||||||
// ========================================================================
|
// ========================================================================
|
||||||
info!("🗳️ Arena 2.2: Testing Vote-Aware Consensus...");
|
info!("🗳️ Arena 2.2: Testing Vote-Aware Consensus...");
|
||||||
result.vote_consensus_test_passed =
|
result.vote_consensus_test_passed =
|
||||||
run_vote_consensus_test(&journal, &store, &agents, &mut result, config.ingestion_wait_ms)
|
run_vote_consensus_test(&journal, &store, &ingestor, &agents, &mut result).await;
|
||||||
.await;
|
|
||||||
|
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
// 12. Arena 2.3: Troll Vote Resistance
|
// 12. Arena 2.3: Troll Vote Resistance
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
info!("🗳️ Arena 2.3: Testing Troll Vote Resistance...");
|
info!("🗳️ Arena 2.3: Testing Troll Vote Resistance...");
|
||||||
result.troll_resistance_test_passed =
|
result.troll_resistance_test_passed =
|
||||||
run_troll_resistance_test(&journal, &store, &agents, &mut result, config.ingestion_wait_ms)
|
run_troll_resistance_test(&journal, &store, &ingestor, &agents, &mut result).await;
|
||||||
.await;
|
|
||||||
|
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
// ARENA 3: Materialized Views
|
// ARENA 3: Materialized Views
|
||||||
@ -388,23 +384,21 @@ pub async fn run_simulation(
|
|||||||
// ========================================================================
|
// ========================================================================
|
||||||
info!("✨ Arena 3.1: Testing MV Integration...");
|
info!("✨ Arena 3.1: Testing MV Integration...");
|
||||||
result.mv_integration_test_passed =
|
result.mv_integration_test_passed =
|
||||||
run_mv_integration_test(&journal, &store, &agents, &mut result, config.ingestion_wait_ms)
|
run_mv_integration_test(&journal, &store, &ingestor, &agents, &mut result).await;
|
||||||
.await;
|
|
||||||
|
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
// 14. Arena 3.2: Fast-Path Verification
|
// 14. Arena 3.2: Fast-Path Verification
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
info!("✨ Arena 3.2: Testing Fast-Path Verification...");
|
info!("✨ Arena 3.2: Testing Fast-Path Verification...");
|
||||||
result.fast_path_test_passed =
|
result.fast_path_test_passed =
|
||||||
run_fast_path_test(&journal, &store, &agents, &mut result, config.ingestion_wait_ms).await;
|
run_fast_path_test(&journal, &store, &ingestor, &agents, &mut result).await;
|
||||||
|
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
// 15. Arena 3.3: MV Freshness Under Load
|
// 15. Arena 3.3: MV Freshness Under Load
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
info!("✨ Arena 3.3: Testing MV Freshness Under Load...");
|
info!("✨ Arena 3.3: Testing MV Freshness Under Load...");
|
||||||
result.mv_freshness_test_passed =
|
result.mv_freshness_test_passed =
|
||||||
run_mv_freshness_test(&journal, &store, &agents, &mut result, config.ingestion_wait_ms)
|
run_mv_freshness_test(&journal, &store, &ingestor, &agents, &mut result).await;
|
||||||
.await;
|
|
||||||
|
|
||||||
// ========================================================================
|
// ========================================================================
|
||||||
// ARENA 4: Agent Persona Verification
|
// ARENA 4: Agent Persona Verification
|
||||||
@ -469,13 +463,9 @@ pub async fn run_simulation(
|
|||||||
result.strategy_metrics =
|
result.strategy_metrics =
|
||||||
strategy_map.into_iter().map(|(name, metrics)| (name.to_string(), metrics)).collect();
|
strategy_map.into_iter().map(|(name, metrics)| (name.to_string(), metrics)).collect();
|
||||||
|
|
||||||
// 16. Shut down the ingestor gracefully
|
// 16. Drop the ingestor (no background task running - we used synchronous process_pending).
|
||||||
//
|
// The TempDir cleanup happens after this point.
|
||||||
// This is critical: we must stop the background ingestion task BEFORE
|
drop(ingestor);
|
||||||
// the TempDir is dropped, otherwise the task will try to read from
|
|
||||||
// deleted WAL files.
|
|
||||||
info!("Shutting down ingestor...");
|
|
||||||
ingestor.shutdown(std::time::Duration::from_secs(2)).await;
|
|
||||||
|
|
||||||
// 17. Log summary
|
// 17. Log summary
|
||||||
if result.is_success() {
|
if result.is_success() {
|
||||||
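Editor's sketch (not part of the commit): the hunks above swap a background ingest task plus cursor polling for a direct `process_pending()` call. The toy model below shows why a synchronous drain is deterministic: once the call returns, the cursor has reached the end of the journal, with no scheduler involved. `ToyPipeline` and all of its methods are hypothetical stand-ins and say nothing about the real `Ingestor` API beyond the method name used in the diff.

```rust
use std::collections::VecDeque;

/// Toy model of a WAL plus ingestor. Hypothetical; not the StemeDB `Ingestor`.
#[derive(Default)]
pub struct ToyPipeline {
    /// Records appended but not yet applied to the store.
    pending: VecDeque<Vec<u8>>,
    /// Records already applied to the (toy) store, in append order.
    applied: Vec<Vec<u8>>,
    /// Byte offset just past the last applied record.
    cursor: u64,
}

impl ToyPipeline {
    /// Append a record and return the journal offset after the write.
    pub fn append(&mut self, record: &[u8]) -> u64 {
        self.pending.push_back(record.to_vec());
        let pending_bytes: u64 = self.pending.iter().map(|r| r.len() as u64).sum();
        self.cursor + pending_bytes
    }

    /// Drain every pending record on the caller's thread.
    /// Returns the number of records applied by this call.
    pub fn process_pending(&mut self) -> usize {
        let mut processed = 0;
        while let Some(record) = self.pending.pop_front() {
            self.cursor += record.len() as u64;
            self.applied.push(record);
            processed += 1;
        }
        processed
    }

    /// Offset just past the last applied record.
    pub fn cursor(&self) -> u64 {
        self.cursor
    }

    /// Number of records applied so far.
    pub fn applied_len(&self) -> usize {
        self.applied.len()
    }
}
```

In this model a caller never needs a timeout: after `process_pending()` returns, `cursor()` equals the offset returned by the last `append()`.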
|
|||||||
@ -43,7 +43,7 @@ async fn smoke_high_volume_simulation() {
|
|||||||
AgentSpec { count: 3, strategy: StrategyType::Believer },
|
AgentSpec { count: 3, strategy: StrategyType::Believer },
|
||||||
],
|
],
|
||||||
tick_count: 50,
|
tick_count: 50,
|
||||||
ingestion_wait_ms: 1000, // More time for larger workload
|
ingestion_wait_ms: 3000, // More time for larger workload
|
||||||
};
|
};
|
||||||
|
|
||||||
let result = run_simulation(config).await.expect("Simulation setup should not fail");
|
let result = run_simulation(config).await.expect("Simulation setup should not fail");
|
||||||
|
|||||||
@ -212,6 +212,9 @@ pub mod visual_index;
|
|||||||
/// High-velocity vote storage (The Ballot Box).
|
/// High-velocity vote storage (The Ballot Box).
|
||||||
pub mod vote_store;
|
pub mod vote_store;
|
||||||
|
|
||||||
|
/// MemTable for read-your-writes consistency.
|
||||||
|
pub mod memtable;
|
||||||
|
|
||||||
pub use admission_store::{
|
pub use admission_store::{
|
||||||
AdmissionCheck, AdmissionStatus, AdmissionStatusResult, AdmissionStore, GenericAdmissionStore,
|
AdmissionCheck, AdmissionStatus, AdmissionStatusResult, AdmissionStore, GenericAdmissionStore,
|
||||||
};
|
};
|
||||||
@ -255,6 +258,9 @@ pub use visual_index::{
|
|||||||
};
|
};
|
||||||
pub use vote_store::{GenericVoteStore, VoteStore};
|
pub use vote_store::{GenericVoteStore, VoteStore};
|
||||||
|
|
||||||
|
// MemTable exports
|
||||||
|
pub use memtable::{MemTable, MemTableEntry};
|
||||||
|
|
||||||
// Pattern aggregate store exports (Community Corpus)
|
// Pattern aggregate store exports (Community Corpus)
|
||||||
pub use pattern_aggregate_store::{
|
pub use pattern_aggregate_store::{
|
||||||
GenericPatternAggregateStore, PatternAggregate, PatternAggregateStore,
|
GenericPatternAggregateStore, PatternAggregate, PatternAggregateStore,
|
||||||
|
|||||||
27
crates/stemedb-storage/src/memtable/entry.rs
Normal file
@ -0,0 +1,27 @@
|
|||||||
|
//! MemTable entry type.
|
||||||
|
|
||||||
|
use stemedb_core::types::Assertion;
|
||||||
|
|
||||||
|
/// An entry in the MemTable representing an assertion waiting for KVStore indexing.
|
||||||
|
///
|
||||||
|
/// Entries are inserted after WAL commit and evicted once the IngestWorker
|
||||||
|
/// has processed them and updated the KVStore indexes.
|
||||||
|
#[derive(Debug, Clone)]
|
||||||
|
pub struct MemTableEntry {
|
||||||
|
/// The assertion data.
|
||||||
|
pub assertion: Assertion,
|
||||||
|
/// The BLAKE3 hash of the serialized assertion (content-addressed ID).
|
||||||
|
pub hash: [u8; 32],
|
||||||
|
/// The WAL offset where this assertion was written.
|
||||||
|
/// Used for eviction: entries with wal_offset <= indexed_offset can be evicted.
|
||||||
|
pub wal_offset: u64,
|
||||||
|
/// When this entry was inserted (for time-based safety eviction).
|
||||||
|
pub inserted_at: std::time::Instant,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl MemTableEntry {
|
||||||
|
/// Create a new MemTableEntry.
|
||||||
|
pub fn new(assertion: Assertion, hash: [u8; 32], wal_offset: u64) -> Self {
|
||||||
|
Self { assertion, hash, wal_offset, inserted_at: std::time::Instant::now() }
|
||||||
|
}
|
||||||
|
}
|
||||||
43
crates/stemedb-storage/src/memtable/mod.rs
Normal file
@ -0,0 +1,43 @@
|
|||||||
|
//! MemTable for read-your-writes consistency.
|
||||||
|
//!
|
||||||
|
//! The MemTable sits between the WAL commit and KVStore indexing,
|
||||||
|
//! providing immediate visibility of assertions. This ensures that
|
||||||
|
//! after `POST /assert` returns 201, an immediate `GET /query` will
|
||||||
|
//! return the assertion without waiting for background indexing.
|
||||||
|
//!
|
||||||
|
//! # Architecture
|
||||||
|
//!
|
||||||
|
//! ```text
|
||||||
|
//! Write: POST /assert → WAL (fsync) → MemTable → return 201
|
||||||
|
//! ↓
|
||||||
|
//! Query: GET /query → MemTable ∪ KVStore → Lens → response
|
||||||
|
//!
|
||||||
|
//! Background: IngestWorker → WAL → KVStore → evict from MemTable
|
||||||
|
//! ```
|
||||||
|
//!
|
||||||
|
//! # Usage
|
||||||
|
//!
|
||||||
|
//! ```ignore
|
||||||
|
//! use stemedb_storage::memtable::{MemTable, MemTableEntry};
|
||||||
|
//!
|
||||||
|
//! let memtable = MemTable::new(10_000);
|
||||||
|
//!
|
||||||
|
//! // After WAL commit, insert into MemTable
|
||||||
|
//! let entry = MemTableEntry::new(assertion, hash, wal_offset);
|
||||||
|
//! memtable.insert(entry);
|
||||||
|
//!
|
||||||
|
//! // Query merges MemTable with KVStore
|
||||||
|
//! let assertions = memtable.get_by_subject("Tesla");
|
||||||
|
//!
|
||||||
|
//! // After IngestWorker indexes, evict from MemTable
|
||||||
|
//! memtable.advance_indexed_offset(new_offset);
|
||||||
|
//! ```
|
||||||
|
|
||||||
|
mod entry;
|
||||||
|
mod table;
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests;
|
||||||
|
|
||||||
|
pub use entry::MemTableEntry;
|
||||||
|
pub use table::MemTable;
|
||||||
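Editor's sketch (not part of the commit): the module docs describe the query path as `MemTable ∪ KVStore`. Between the moment the IngestWorker indexes a record and the moment the MemTable evicts it, the same assertion can presumably be visible in both sources, so a union would need to deduplicate. The snippet below shows one way to do that by content hash. `Row` and `merge_rows` are hypothetical names; the real `MemTable` getters return `Assertion` values without the hash, so an actual merge would have to carry or recompute it.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an assertion identified by its content hash.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub hash: [u8; 32],
    pub subject: String,
}

/// Union of both sources, deduplicated by hash.
/// KVStore rows come first; a MemTable row is appended only when its hash
/// has not been seen yet.
pub fn merge_rows(from_kv: Vec<Row>, from_memtable: Vec<Row>) -> Vec<Row> {
    let capacity = from_kv.len() + from_memtable.len();
    let mut seen: HashSet<[u8; 32]> = HashSet::with_capacity(capacity);
    let mut merged = Vec::with_capacity(capacity);
    for row in from_kv.into_iter().chain(from_memtable) {
        // `insert` returns false when the hash was already present.
        if seen.insert(row.hash) {
            merged.push(row);
        }
    }
    merged
}
```

Putting KVStore rows first is an arbitrary choice in this sketch; because rows are content-addressed, either copy of a duplicate should carry the same data.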
262
crates/stemedb-storage/src/memtable/table.rs
Normal file
@ -0,0 +1,262 @@
|
|||||||
|
//! MemTable implementation for read-your-writes consistency.
|
||||||
|
//!
|
||||||
|
//! The MemTable provides immediate visibility of assertions after WAL commit,
|
||||||
|
//! before the IngestWorker has processed them into KVStore indexes.
|
||||||
|
//!
|
||||||
|
//! # Design
|
||||||
|
//!
|
||||||
|
//! - Thread-safe via DashMap for concurrent reads/writes
|
||||||
|
//! - Three indexes for efficient lookup:
|
||||||
|
//! - by_hash: O(1) hash lookup
|
||||||
|
//! - by_subject: subject → list of hashes
|
||||||
|
//! - by_subject_predicate: (subject, predicate) → list of hashes
|
||||||
|
//! - Eviction based on WAL offset watermark
|
||||||
|
//! - Safety valve: age-based eviction for stale entries
|
||||||
|
|
||||||
|
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
|
||||||
|
use std::time::{Duration, Instant};
|
||||||
|
|
||||||
|
use dashmap::DashMap;
|
||||||
|
use stemedb_core::types::Assertion;
|
||||||
|
use tracing::{debug, warn};
|
||||||
|
|
||||||
|
use super::MemTableEntry;
|
||||||
|
|
||||||
|
/// In-memory buffer for assertions pending KVStore indexing.
|
||||||
|
///
|
||||||
|
/// Provides read-your-writes consistency: queries merge MemTable with KVStore
|
||||||
|
/// to ensure recently written assertions are immediately visible.
|
||||||
|
pub struct MemTable {
|
||||||
|
/// Primary index: hash → entry
|
||||||
|
by_hash: DashMap<[u8; 32], MemTableEntry>,
|
||||||
|
|
||||||
|
/// Subject index: subject → list of hashes
|
||||||
|
by_subject: DashMap<String, Vec<[u8; 32]>>,
|
||||||
|
|
||||||
|
/// Compound index: (subject, predicate) → list of hashes
|
||||||
|
by_subject_predicate: DashMap<(String, String), Vec<[u8; 32]>>,
|
||||||
|
|
||||||
|
/// WAL offset up to which assertions have been indexed in KVStore.
|
||||||
|
/// Entries with wal_offset <= indexed_offset are safe to evict.
|
||||||
|
indexed_offset: AtomicU64,
|
||||||
|
|
||||||
|
/// Maximum entries before triggering aggressive eviction (soft limit).
|
||||||
|
max_entries: usize,
|
||||||
|
|
||||||
|
/// Eviction age threshold for safety valve (default: 30 seconds).
|
||||||
|
max_age: Duration,
|
||||||
|
|
||||||
|
/// Count of entries (for metrics without locking).
|
||||||
|
entry_count: AtomicUsize,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl MemTable {
|
||||||
|
/// Create a new MemTable with the specified capacity limit.
|
||||||
|
pub fn new(max_entries: usize) -> Self {
|
||||||
|
Self {
|
||||||
|
by_hash: DashMap::new(),
|
||||||
|
by_subject: DashMap::new(),
|
||||||
|
by_subject_predicate: DashMap::new(),
|
||||||
|
indexed_offset: AtomicU64::new(0),
|
||||||
|
max_entries,
|
||||||
|
max_age: Duration::from_secs(30),
|
||||||
|
entry_count: AtomicUsize::new(0),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Create a MemTable with custom max age for testing.
|
||||||
|
#[cfg(test)]
|
||||||
|
pub fn with_max_age(max_entries: usize, max_age: Duration) -> Self {
|
||||||
|
Self {
|
||||||
|
by_hash: DashMap::new(),
|
||||||
|
by_subject: DashMap::new(),
|
||||||
|
by_subject_predicate: DashMap::new(),
|
||||||
|
indexed_offset: AtomicU64::new(0),
|
||||||
|
max_entries,
|
||||||
|
max_age,
|
||||||
|
entry_count: AtomicUsize::new(0),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Insert an entry into the MemTable.
|
||||||
|
///
|
||||||
|
/// Updates all three indexes in turn (one map at a time, not as a single atomic step). If the hash already exists, the entry
|
||||||
|
/// is replaced (idempotent on retry).
|
||||||
|
pub fn insert(&self, entry: MemTableEntry) {
|
||||||
|
let hash = entry.hash;
|
||||||
|
let subject = entry.assertion.subject.clone();
|
||||||
|
let predicate = entry.assertion.predicate.clone();
|
||||||
|
|
||||||
|
// Insert into primary index
|
||||||
|
let was_new = self.by_hash.insert(hash, entry).is_none();
|
||||||
|
|
||||||
|
if was_new {
|
||||||
|
// Update subject index
|
||||||
|
self.by_subject.entry(subject.clone()).or_default().push(hash);
|
||||||
|
|
||||||
|
// Update compound index
|
||||||
|
self.by_subject_predicate.entry((subject, predicate)).or_default().push(hash);
|
||||||
|
|
||||||
|
self.entry_count.fetch_add(1, Ordering::Relaxed);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check if we need to evict
|
||||||
|
if self.len() > self.max_entries {
|
||||||
|
self.evict_stale_entries();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get an entry by its hash.
|
||||||
|
pub fn get_by_hash(&self, hash: &[u8; 32]) -> Option<Assertion> {
|
||||||
|
self.by_hash.get(hash).map(|entry| entry.assertion.clone())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get all assertions for a subject.
|
||||||
|
pub fn get_by_subject(&self, subject: &str) -> Vec<Assertion> {
|
||||||
|
let hashes = match self.by_subject.get(subject) {
|
||||||
|
Some(ref_multi) => ref_multi.clone(),
|
||||||
|
None => return Vec::new(),
|
||||||
|
};
|
||||||
|
|
||||||
|
let mut results = Vec::with_capacity(hashes.len());
|
||||||
|
for hash in hashes {
|
||||||
|
if let Some(entry) = self.by_hash.get(&hash) {
|
||||||
|
results.push(entry.assertion.clone());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
results
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get all assertions for a subject and predicate.
|
||||||
|
pub fn get_by_subject_predicate(&self, subject: &str, predicate: &str) -> Vec<Assertion> {
|
||||||
|
let key = (subject.to_string(), predicate.to_string());
|
||||||
|
let hashes = match self.by_subject_predicate.get(&key) {
|
||||||
|
Some(ref_multi) => ref_multi.clone(),
|
||||||
|
None => return Vec::new(),
|
||||||
|
};
|
||||||
|
|
||||||
|
let mut results = Vec::with_capacity(hashes.len());
|
||||||
|
for hash in hashes {
|
||||||
|
if let Some(entry) = self.by_hash.get(&hash) {
|
||||||
|
results.push(entry.assertion.clone());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
results
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Advance the indexed offset watermark.
|
||||||
|
///
|
||||||
|
/// Called by IngestWorker after processing records. Entries with
|
||||||
|
/// wal_offset <= this value are now in KVStore and can be evicted.
|
||||||
|
pub fn advance_indexed_offset(&self, offset: u64) {
|
||||||
|
self.indexed_offset.fetch_max(offset, Ordering::Release);
|
||||||
|
debug!(offset, "Advanced indexed offset");
|
||||||
|
|
||||||
|
// Trigger eviction after advancing
|
||||||
|
self.evict_indexed_entries();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get current indexed offset.
|
||||||
|
pub fn indexed_offset(&self) -> u64 {
|
||||||
|
self.indexed_offset.load(Ordering::Acquire)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Evict entries that have been indexed in KVStore.
|
||||||
|
///
|
||||||
|
/// Entries with wal_offset <= indexed_offset are safe to remove because
|
||||||
|
/// queries will find them via KVStore indexes.
|
||||||
|
pub fn evict_indexed_entries(&self) {
|
||||||
|
let indexed_up_to = self.indexed_offset.load(Ordering::Acquire);
|
||||||
|
if indexed_up_to == 0 {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Collect hashes to evict
|
||||||
|
let to_evict: Vec<[u8; 32]> = self
|
||||||
|
.by_hash
|
||||||
|
.iter()
|
||||||
|
.filter(|entry| entry.wal_offset <= indexed_up_to)
|
||||||
|
.map(|entry| entry.hash)
|
||||||
|
.collect();
|
||||||
|
|
||||||
|
if !to_evict.is_empty() {
|
||||||
|
debug!(count = to_evict.len(), indexed_up_to, "Evicting indexed entries");
|
||||||
|
}
|
||||||
|
|
||||||
|
for hash in to_evict {
|
||||||
|
self.remove_by_hash(&hash);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Evict entries older than max_age (safety valve).
|
||||||
|
///
|
||||||
|
/// This prevents unbounded memory growth if IngestWorker is slow or stuck.
|
||||||
|
fn evict_stale_entries(&self) {
|
||||||
|
let Some(threshold) = Instant::now().checked_sub(self.max_age) else { return }; // `Instant - Duration` can panic on underflow
|
||||||
|
|
||||||
|
let to_evict: Vec<[u8; 32]> = self
|
||||||
|
.by_hash
|
||||||
|
.iter()
|
||||||
|
.filter(|entry| entry.inserted_at < threshold)
|
||||||
|
.map(|entry| entry.hash)
|
||||||
|
.collect();
|
||||||
|
|
||||||
|
if !to_evict.is_empty() {
|
||||||
|
warn!(
|
||||||
|
count = to_evict.len(),
|
||||||
|
max_age_secs = self.max_age.as_secs(),
|
||||||
|
"Safety evicting stale entries"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
for hash in to_evict {
|
||||||
|
self.remove_by_hash(&hash);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Remove an entry by hash, updating all indexes.
|
||||||
|
fn remove_by_hash(&self, hash: &[u8; 32]) {
|
||||||
|
if let Some((_, entry)) = self.by_hash.remove(hash) {
|
||||||
|
let subject = &entry.assertion.subject;
|
||||||
|
let predicate = &entry.assertion.predicate;
|
||||||
|
|
||||||
|
// Update subject index
|
||||||
|
if let Some(mut hashes) = self.by_subject.get_mut(subject) {
|
||||||
|
hashes.retain(|h| h != hash);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Update compound index
|
||||||
|
let key = (subject.clone(), predicate.clone());
|
||||||
|
if let Some(mut hashes) = self.by_subject_predicate.get_mut(&key) {
|
||||||
|
hashes.retain(|h| h != hash);
|
||||||
|
}
|
||||||
|
|
||||||
|
self.entry_count.fetch_sub(1, Ordering::Relaxed);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get current entry count.
|
||||||
|
pub fn len(&self) -> usize {
|
||||||
|
self.entry_count.load(Ordering::Relaxed)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Check if empty.
|
||||||
|
pub fn is_empty(&self) -> bool {
|
||||||
|
self.len() == 0
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Clear all entries (for testing).
|
||||||
|
#[cfg(test)]
|
||||||
|
pub fn clear(&self) {
|
||||||
|
self.by_hash.clear();
|
||||||
|
self.by_subject.clear();
|
||||||
|
self.by_subject_predicate.clear();
|
||||||
|
self.entry_count.store(0, Ordering::Relaxed);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Default for MemTable {
|
||||||
|
fn default() -> Self {
|
||||||
|
Self::new(10_000)
|
||||||
|
}
|
||||||
|
}
|
||||||
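Editor's sketch (not part of the commit): the watermark eviction in `table.rs` can be reduced to a few lines using only the standard library. This version replaces the three `DashMap` indexes with a single `Mutex<HashMap>` and keeps only the hash and WAL offset, so it illustrates the `fetch_max` plus "evict everything at or below the watermark" idea rather than the real concurrency behaviour. `OffsetBuffer` and its methods are hypothetical names. Unlike `evict_indexed_entries`, it does not special-case a watermark of 0.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Simplified stand-in for the MemTable: content hash -> WAL offset.
pub struct OffsetBuffer {
    entries: Mutex<HashMap<[u8; 32], u64>>,
    indexed_offset: AtomicU64,
}

impl OffsetBuffer {
    pub fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()), indexed_offset: AtomicU64::new(0) }
    }

    /// Record that `hash` was written at `wal_offset`.
    pub fn insert(&self, hash: [u8; 32], wal_offset: u64) {
        self.entries.lock().unwrap().insert(hash, wal_offset);
    }

    /// Raise the watermark (it never moves backwards) and drop every entry
    /// whose offset is at or below it. Returns the number of evicted entries.
    pub fn advance(&self, offset: u64) -> usize {
        // fetch_max returns the previous value; the effective watermark is
        // the larger of the previous value and the new offset.
        let previous = self.indexed_offset.fetch_max(offset, Ordering::AcqRel);
        let watermark = previous.max(offset);
        let mut entries = self.entries.lock().unwrap();
        let before = entries.len();
        entries.retain(|_, wal_offset| *wal_offset > watermark);
        before - entries.len()
    }

    /// Number of entries still buffered.
    pub fn len(&self) -> usize {
        self.entries.lock().unwrap().len()
    }

    /// Current watermark.
    pub fn watermark(&self) -> u64 {
        self.indexed_offset.load(Ordering::Acquire)
    }
}
```

Because `fetch_max` is monotonic, a late call with a smaller offset leaves the watermark unchanged and cannot bring evicted entries back.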
248
crates/stemedb-storage/src/memtable/tests.rs
Normal file
@ -0,0 +1,248 @@
|
|||||||
|
//! Unit tests for MemTable.
|
||||||
|
|
||||||
|
use std::time::Duration;
|
||||||
|
|
||||||
|
use stemedb_core::types::{Assertion, HlcTimestamp, LifecycleStage, ObjectValue, SourceClass};
|
||||||
|
|
||||||
|
use super::{MemTable, MemTableEntry};
|
||||||
|
|
||||||
|
fn make_test_assertion(subject: &str, predicate: &str) -> Assertion {
|
||||||
|
Assertion {
|
||||||
|
subject: subject.to_string(),
|
||||||
|
predicate: predicate.to_string(),
|
||||||
|
object: ObjectValue::Text("test".to_string()),
|
||||||
|
parent_hash: None,
|
||||||
|
source_hash: [0u8; 32],
|
||||||
|
source_class: SourceClass::Expert,
|
||||||
|
visual_hash: None,
|
||||||
|
epoch: None,
|
||||||
|
source_metadata: None,
|
||||||
|
narrative: None,
|
||||||
|
lifecycle: LifecycleStage::Proposed,
|
||||||
|
signatures: vec![],
|
||||||
|
confidence: 0.9,
|
||||||
|
timestamp: 1234567890,
|
||||||
|
hlc_timestamp: HlcTimestamp::default(),
|
||||||
|
vector: None,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_insert_and_get_by_hash() {
|
||||||
|
let memtable = MemTable::new(100);
|
||||||
|
let assertion = make_test_assertion("Tesla", "revenue");
|
||||||
|
let hash = [1u8; 32];
|
||||||
|
let entry = MemTableEntry::new(assertion.clone(), hash, 100);
|
||||||
|
|
||||||
|
memtable.insert(entry);
|
||||||
|
|
||||||
|
let result = memtable.get_by_hash(&hash);
|
||||||
|
assert!(result.is_some());
|
||||||
|
assert_eq!(result.as_ref().map(|a| &a.subject), Some(&"Tesla".to_string()));
|
||||||
|
assert_eq!(memtable.len(), 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_insert_and_get_by_subject() {
|
||||||
|
let memtable = MemTable::new(100);
|
||||||
|
|
||||||
|
let assertion1 = make_test_assertion("Tesla", "revenue");
|
||||||
|
let assertion2 = make_test_assertion("Tesla", "profit");
|
||||||
|
|
||||||
|
memtable.insert(MemTableEntry::new(assertion1, [1u8; 32], 100));
|
||||||
|
memtable.insert(MemTableEntry::new(assertion2, [2u8; 32], 200));
|
||||||
|
|
||||||
|
let results = memtable.get_by_subject("Tesla");
|
||||||
|
assert_eq!(results.len(), 2);
|
||||||
|
|
||||||
|
let predicates: Vec<_> = results.iter().map(|a| &a.predicate).collect();
|
||||||
|
assert!(predicates.contains(&&"revenue".to_string()));
|
||||||
|
assert!(predicates.contains(&&"profit".to_string()));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_insert_and_get_by_subject_predicate() {
|
||||||
|
let memtable = MemTable::new(100);
|
||||||
|
|
||||||
|
let assertion1 = make_test_assertion("Tesla", "revenue");
|
||||||
|
let assertion2 = make_test_assertion("Tesla", "revenue");
|
||||||
|
let assertion3 = make_test_assertion("Tesla", "profit");
|
||||||
|
|
||||||
|
memtable.insert(MemTableEntry::new(assertion1, [1u8; 32], 100));
|
||||||
|
memtable.insert(MemTableEntry::new(assertion2, [2u8; 32], 200));
|
||||||
|
memtable.insert(MemTableEntry::new(assertion3, [3u8; 32], 300));
|
||||||
|
|
||||||
|
let results = memtable.get_by_subject_predicate("Tesla", "revenue");
|
||||||
|
assert_eq!(results.len(), 2);
|
||||||
|
|
||||||
|
let results = memtable.get_by_subject_predicate("Tesla", "profit");
|
||||||
|
    assert_eq!(results.len(), 1);
}

#[test]
fn test_eviction_by_offset() {
    let memtable = MemTable::new(100);

    let assertion1 = make_test_assertion("Tesla", "revenue");
    let assertion2 = make_test_assertion("Apple", "revenue");

    memtable.insert(MemTableEntry::new(assertion1, [1u8; 32], 100));
    memtable.insert(MemTableEntry::new(assertion2, [2u8; 32], 200));

    assert_eq!(memtable.len(), 2);

    // Advance indexed offset to 150 - should evict first entry
    memtable.advance_indexed_offset(150);

    assert_eq!(memtable.len(), 1);
    assert!(memtable.get_by_hash(&[1u8; 32]).is_none());
    assert!(memtable.get_by_hash(&[2u8; 32]).is_some());

    // Advance to 250 - should evict second entry
    memtable.advance_indexed_offset(250);

    assert_eq!(memtable.len(), 0);
}

#[test]
fn test_eviction_updates_subject_index() {
    let memtable = MemTable::new(100);

    let assertion1 = make_test_assertion("Tesla", "revenue");
    let assertion2 = make_test_assertion("Tesla", "profit");

    memtable.insert(MemTableEntry::new(assertion1, [1u8; 32], 100));
    memtable.insert(MemTableEntry::new(assertion2, [2u8; 32], 200));

    assert_eq!(memtable.get_by_subject("Tesla").len(), 2);

    // Evict first entry
    memtable.advance_indexed_offset(150);

    let results = memtable.get_by_subject("Tesla");
    assert_eq!(results.len(), 1);
    assert_eq!(results[0].predicate, "profit");
}

#[test]
fn test_idempotent_insert() {
    let memtable = MemTable::new(100);

    let assertion = make_test_assertion("Tesla", "revenue");
    let hash = [1u8; 32];

    memtable.insert(MemTableEntry::new(assertion.clone(), hash, 100));
    memtable.insert(MemTableEntry::new(assertion.clone(), hash, 100));
    memtable.insert(MemTableEntry::new(assertion, hash, 100));

    // Should only have 1 entry
    assert_eq!(memtable.len(), 1);
    assert_eq!(memtable.get_by_subject("Tesla").len(), 1);
}

#[test]
fn test_empty_lookups() {
    let memtable = MemTable::new(100);

    assert!(memtable.get_by_hash(&[0u8; 32]).is_none());
    assert!(memtable.get_by_subject("Nonexistent").is_empty());
    assert!(memtable.get_by_subject_predicate("Nonexistent", "pred").is_empty());
}

#[test]
fn test_multiple_subjects_isolated() {
    let memtable = MemTable::new(100);

    let assertion1 = make_test_assertion("Tesla", "revenue");
    let assertion2 = make_test_assertion("Apple", "revenue");

    memtable.insert(MemTableEntry::new(assertion1, [1u8; 32], 100));
    memtable.insert(MemTableEntry::new(assertion2, [2u8; 32], 200));

    assert_eq!(memtable.get_by_subject("Tesla").len(), 1);
    assert_eq!(memtable.get_by_subject("Apple").len(), 1);
    assert_eq!(memtable.get_by_subject("Tesla")[0].subject, "Tesla");
    assert_eq!(memtable.get_by_subject("Apple")[0].subject, "Apple");
}

#[tokio::test]
async fn test_concurrent_insert_and_read() {
    use std::sync::Arc;

    let memtable = Arc::new(MemTable::new(10_000));

    // Spawn writers
    let mut handles = Vec::new();
    for i in 0..10 {
        let mt = Arc::clone(&memtable);
        handles.push(tokio::spawn(async move {
            for j in 0..100 {
                let subject = format!("Subject_{}", i);
                let assertion = make_test_assertion(&subject, "predicate");
                let mut hash = [0u8; 32];
                hash[0] = i as u8;
                hash[1] = j as u8;
                mt.insert(MemTableEntry::new(assertion, hash, (i * 100 + j) as u64));
            }
        }));
    }

    // Spawn readers concurrently
    for i in 0..10 {
        let mt = Arc::clone(&memtable);
        handles.push(tokio::spawn(async move {
            for _ in 0..50 {
                let _ = mt.get_by_subject(&format!("Subject_{}", i));
                tokio::task::yield_now().await;
            }
        }));
    }

    for handle in handles {
        handle.await.unwrap();
    }

    // Should have 1000 entries total (10 writers * 100 entries)
    assert_eq!(memtable.len(), 1000);
}

#[test]
fn test_indexed_offset_tracking() {
    let memtable = MemTable::new(100);

    assert_eq!(memtable.indexed_offset(), 0);

    memtable.advance_indexed_offset(100);
    assert_eq!(memtable.indexed_offset(), 100);

    memtable.advance_indexed_offset(50); // Should not go backwards
    assert_eq!(memtable.indexed_offset(), 100);

    memtable.advance_indexed_offset(200);
    assert_eq!(memtable.indexed_offset(), 200);
}

#[test]
fn test_stale_eviction_with_short_max_age() {
    // Use a very short max age for testing
    let memtable = MemTable::with_max_age(2, Duration::from_millis(10));

    let assertion1 = make_test_assertion("Tesla", "revenue");
    let assertion2 = make_test_assertion("Apple", "revenue");

    memtable.insert(MemTableEntry::new(assertion1, [1u8; 32], 100));

    // Wait for the entry to become stale
    std::thread::sleep(Duration::from_millis(20));

    // Insert another entry to trigger eviction
    memtable.insert(MemTableEntry::new(assertion2, [2u8; 32], 200));

    // The first entry is now older than max_age; the second is fresh.
    // The third insert triggers eviction (max_entries = 2), which should remove the stale first entry.
    let assertion3 = make_test_assertion("Google", "revenue");
    memtable.insert(MemTableEntry::new(assertion3, [3u8; 32], 300));

    // First entry should be gone (stale)
    assert!(memtable.get_by_hash(&[1u8; 32]).is_none());
}
116
future-vision.md
Normal file
@ -0,0 +1,116 @@
# Vision: Epistemic Logits (The Neuro-Symbolic Cortex)

> **Status:** Vision / L9 Roadmap
> **Target:** Solves "Intrinsic Hallucination"
> **Core Concept:** StemeDB is no longer just a database we query; it is a constraint layer applied to the model's probability distribution during inference.

---

## 1. The Problem: The "RAG Ceiling"

Current architectures (including our own Aphoria/ADK stack) rely on **Retrieval-Augmented Generation (RAG)**. This is a "Glass Box" system, but it is composed of two disconnected brains:

1. **The Retriever (StemeDB):** Knows what is true, what is conflicted, and who said what.
2. **The Generator (LLM):** Knows how to predict the next token based on statistical patterns.

In our current architecture, we paste the Truth (1) into the Context Window of the Generator (2) and *hope* the Generator attends to it.

**The Failure Mode:** The Generator can still ignore the context. It can hallucinate. It can state a high-conflict fact with absolute certainty ("X is true") instead of qualified uncertainty ("Some sources claim X").

**We cannot fix this by prompting. We must fix it by math.**

---

## 2. The Solution: Epistemic Logits

**Epistemic Logits** is a decoding strategy that modifies the probability distribution of the LLM's output layer in real time, based on the `ConflictScore` and `TrustRank` of the concepts being generated.

We move StemeDB from the **Input Layer** (Prompt) to the **Activation Layer** (Logits).

### The Core Equation

$$ P_{final}(token) \propto P_{model}(token) \times E(token \mid Subject, Predicate) $$

Where $E$ is the **Epistemic Function**, and the product is renormalized over the vocabulary so that $P_{final}$ sums to 1:
* If `ConflictScore > 0.8` (High Disagreement) AND `Token` implies certainty ("is", "proven", "fact"), then $E \to 0$ (Penalty).
* If `ConflictScore > 0.8` AND `Token` implies uncertainty ("reported", "alleged", "contested"), then $E \to 1$ (no penalty, which acts as a relative boost after renormalization).
* If `SourceTier` is Low (Anecdotal) AND `Time` is old (Decayed), then $E \to 0$.

**Result:** The model *physically cannot* state a contested claim as a fact. It effectively has a "physics engine" for Truth.
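
The rules above can be sketched in Rust as a multiplicative reweighting step. This is a minimal illustration, not StemeDB's implementation: `Stance`, `EpistemicState`, `epistemic_factor`, and `reweight` are hypothetical names, the `1e-6` penalty is an assumed stand-in for $E \to 0$, and only the two `ConflictScore` rules are modelled (the `SourceTier`/decay rule is omitted). How a token gets classified as certain or hedged is assumed to be decided elsewhere.

```rust
/// Hypothetical coarse classification of what a candidate token implies.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Stance {
    Certain, // e.g. "is", "proven", "fact"
    Hedged,  // e.g. "reported", "alleged", "contested"
    Neutral,
}

/// Hypothetical epistemic state for one (subject, predicate) pair.
pub struct EpistemicState {
    pub conflict_score: f64,
}

/// E(token | subject, predicate): a multiplicative factor in (0, 1].
pub fn epistemic_factor(state: &EpistemicState, stance: Stance) -> f64 {
    const HIGH_CONFLICT: f64 = 0.8; // threshold taken from the text
    const PENALTY: f64 = 1e-6; // assumed stand-in for "E -> 0"
    if state.conflict_score > HIGH_CONFLICT && stance == Stance::Certain {
        PENALTY
    } else {
        1.0
    }
}

/// Multiply each model probability by E, then renormalize so the result sums to 1.
pub fn reweight(probs: &[f64], stances: &[Stance], state: &EpistemicState) -> Vec<f64> {
    assert_eq!(probs.len(), stances.len());
    let scaled: Vec<f64> = probs
        .iter()
        .zip(stances)
        .map(|(p, s)| p * epistemic_factor(state, *s))
        .collect();
    let total: f64 = scaled.iter().sum();
    if total == 0.0 {
        // Nothing left to normalize; fall back to the model's own distribution.
        return probs.to_vec();
    }
    scaled.iter().map(|p| p / total).collect()
}
```

With a conflict score of 0.9 and probabilities `[0.6, 0.3, 0.1]` for a certain, a hedged, and a neutral token, nearly all of the mass on the certain token is redistributed to the other two.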

---

## 3. Architecture: The Neuro-Symbolic Stack

```ascii
      [ User Query ]
            │
            ▼
[ 1. Semantic Router ] ───► [ StemeDB (The Graph) ]
            │                          │
            │ (Context)                │ (Constraints & Scores)
            ▼                          ▼
     [ 2. LLM Core ]       [ 3. Epistemic Decoder ]
      (Transformer)            (Logit Processor)
            │                          │
            └──► [ Raw Logits ] ──────►│
                                       │ ◄── "Don't say 'proven' if Conflict > 0.5"
                                       │
                                       ▼
                                [ Final Token ]
```

### Component 1: The Lookahead Mapper
To constrain logits, we must know what the model is *about* to say. We implement a lightweight "Concept Probe" (a small BERT model or sparse autoencoder) that runs parallel to the main LLM.
* **Input:** Current generation stream.
* **Output:** The `StemeDB::SubjectID` the stream is discussing.

### Component 2: The Constraint Projector
Once the Subject is identified, StemeDB projects the **Epistemic State** of that subject into a set of forbidden/boosted tokens.
* *State:* `Semaglutide::has_side_effect` -> Conflict: High.
* *Constraint:* Ban absolute assertions. Boost attribution markers ("According to FDA...", "Patients report...").
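
One way to picture the projection is as a pair of token-id lists applied to the raw logits. The sketch below is illustrative only: `TokenConstraints`, `project`, and `apply` are hypothetical names, the 0.8 threshold is borrowed from the Core Equation section, and banning via negative infinity plus an additive boost is an assumed technique, not something the design above specifies.

```rust
/// Token ids to suppress or favor for one subject (hypothetical projector output).
#[derive(Debug, Default, PartialEq)]
pub struct TokenConstraints {
    pub banned: Vec<u32>,
    pub boosted: Vec<u32>,
}

/// Project a conflict score into token constraints.
/// `certainty_ids` and `attribution_ids` are assumed to be precomputed vocabulary ids
/// for absolute assertions and attribution markers respectively.
pub fn project(
    conflict_score: f32,
    certainty_ids: &[u32],
    attribution_ids: &[u32],
) -> TokenConstraints {
    const HIGH_CONFLICT: f32 = 0.8;
    if conflict_score > HIGH_CONFLICT {
        TokenConstraints {
            banned: certainty_ids.to_vec(),
            boosted: attribution_ids.to_vec(),
        }
    } else {
        TokenConstraints::default()
    }
}

/// Apply constraints to raw logits in place: banned tokens become unreachable,
/// boosted tokens get an additive bias. Out-of-range ids are ignored.
pub fn apply(logits: &mut [f32], constraints: &TokenConstraints, boost: f32) {
    for &id in &constraints.banned {
        if let Some(logit) = logits.get_mut(id as usize) {
            *logit = f32::NEG_INFINITY;
        }
    }
    for &id in &constraints.boosted {
        if let Some(logit) = logits.get_mut(id as usize) {
            *logit += boost;
        }
    }
}
```

Setting a logit to negative infinity gives the token zero probability after softmax, which is the strictest reading of "ban"; a softer design would subtract a finite penalty instead.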

### Component 3: The Reward Loop (RLHF on Reality)
We use the `VoteStore` not just for consensus, but to train a **Reward Model**.
* **Data:** Millions of historical votes where Agents disagreed.
* **Training:** Fine-tune the LLM to prefer outputs that align with the *weighted consensus* of the Graph.
* **Outcome:** The model "intuitively" knows which sources are trustworthy (Tier 0/1) without needing RAG retrieval for every fact.

---

## 4. Implementation Roadmap (The Path to L9)

### Phase 1: Structured Decoding (The "Guardrails")
*Integrate StemeDB with grammar-constrained generation libraries (like `guidance` or `outlines`).*

* **Mechanism:** Force the LLM to output a citation struct `{ claim: "...", source_id: "...", confidence: 0.0-1.0 }` for every assertion.
* **Validation:** If the generated `source_id` does not exist in StemeDB, or if the `confidence` doesn't match the `VoteStore`, reject the token stream and regenerate.
* **Deliverable:** `crates/stemedb-guidance`: A Rust binding for grammar-constrained sampling backed by the KV store.
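
The validation step could look roughly like the following. This is a sketch under stated assumptions: `Citation` mirrors the struct from the Mechanism bullet, but `Rejection`, `validate`, the `tolerance` parameter, and the use of a plain `HashMap` from `source_id` to consensus confidence (standing in for StemeDB and the `VoteStore`) are all hypothetical.

```rust
use std::collections::HashMap;

/// The citation struct the LLM is forced to emit for every assertion.
#[derive(Debug, Clone, PartialEq)]
pub struct Citation {
    pub claim: String,
    pub source_id: String,
    pub confidence: f64,
}

/// Reasons to reject a generated citation and regenerate (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Rejection {
    ConfidenceOutOfRange,
    UnknownSource,
    ConfidenceMismatch,
}

/// Check a citation against a map of known source ids to consensus confidence.
/// `tolerance` is an assumed knob for how far the stated confidence may drift.
pub fn validate(
    citation: &Citation,
    consensus: &HashMap<String, f64>,
    tolerance: f64,
) -> Result<(), Rejection> {
    // Also rejects NaN, since NaN is not contained in any range.
    if !(0.0..=1.0).contains(&citation.confidence) {
        return Err(Rejection::ConfidenceOutOfRange);
    }
    let expected = consensus
        .get(&citation.source_id)
        .ok_or(Rejection::UnknownSource)?;
    if (citation.confidence - expected).abs() > tolerance {
        return Err(Rejection::ConfidenceMismatch);
    }
    Ok(())
}
```

A caller would regenerate on any `Err`, which matches the "reject the token stream and regenerate" behavior described above.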

### Phase 2: DPO Pipeline (The "Training")
*Direct Preference Optimization using StemeDB history.*

* **Mechanism:** Export the `VoteStore` history as `(Prompt, Chosen, Rejected)` tuples.
    * *Chosen:* An assertion supported by Tier 0 (Regulatory) sources.
    * *Rejected:* A conflicting assertion supported only by Tier 5 (Anecdotal) sources.
* **Action:** Fine-tune a Llama-3 8B model on this dataset.
* **Deliverable:** `crates/stemedb-rlhf`: A pipeline that turns WAL segments into HuggingFace datasets.
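
As a rough sketch of the export step, the function below pairs Tier 0 claims with conflicting Tier 5 claims about the same subject and predicate. Everything here is an assumption for illustration: the `Claim` and `PreferencePair` types, the `build_pairs` name, and the prompt template are hypothetical, and a real pipeline would read from the `VoteStore` or WAL rather than an in-memory slice.

```rust
use std::collections::BTreeMap;

/// A simplified assertion record (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Claim {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub tier: u8, // 0 = Regulatory, 5 = Anecdotal, per the text
}

/// One DPO training example.
#[derive(Debug, PartialEq)]
pub struct PreferencePair {
    pub prompt: String,
    pub chosen: String,
    pub rejected: String,
}

/// Build (prompt, chosen, rejected) tuples from conflicting claims.
pub fn build_pairs(claims: &[Claim]) -> Vec<PreferencePair> {
    // Group by (subject, predicate); BTreeMap keeps the output order deterministic.
    let mut groups: BTreeMap<(&str, &str), Vec<&Claim>> = BTreeMap::new();
    for claim in claims {
        groups
            .entry((claim.subject.as_str(), claim.predicate.as_str()))
            .or_default()
            .push(claim);
    }

    let mut pairs = Vec::new();
    for ((subject, predicate), group) in &groups {
        for chosen in group.iter().filter(|c| c.tier == 0) {
            // Only Tier 5 claims that actually disagree with the chosen one qualify.
            for rejected in group
                .iter()
                .filter(|c| c.tier == 5 && c.object != chosen.object)
            {
                pairs.push(PreferencePair {
                    // Assumed prompt template, purely illustrative.
                    prompt: format!("What is the {} of {}?", predicate, subject),
                    chosen: chosen.object.clone(),
                    rejected: rejected.object.clone(),
                });
            }
        }
    }
    pairs
}
```

Each resulting pair maps onto the `prompt` / `chosen` / `rejected` columns that preference-tuning datasets commonly use; serializing them (for example as JSON Lines) is left out of the sketch.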

### Phase 3: The Logit Processor (The "Cortex")
*Real-time intervention.*

* **Mechanism:** A custom sampler (integrated into `llama.cpp` or `vLLM`) that queries StemeDB's `MaterializedView` in real time (sub-millisecond) during inference.
* **Optimization:** This requires the `HybridStore` to be memory-mapped into the inference engine's address space, so that lookups avoid IPC and network round-trips.
* **Deliverable:** `episteme-inference`: A standalone inference server that exposes an OpenAI-compatible API but enforces StemeDB truth constraints.

---

## 5. The Impact

When we achieve Epistemic Logits, we solve the **Liability Gap**.

Currently, no enterprise can deploy an autonomous agent for critical tasks (Medical, Legal, Finance) because they cannot guarantee the output.

With Epistemic Logits, we provide a mathematical guarantee: **"This system is incapable of stating a claim with higher confidence than the underlying evidence supports."**

This transforms AI from a creative writing tool into a **fiduciary instrument**.
70
scripts/demo-cognitive-firewall.sh
Executable file
@ -0,0 +1,70 @@
#!/bin/bash
# API URL
API="http://localhost:18180/v1"

echo "🔥 Cognitive Firewall Demo: Real-time Truth Resolution"
echo "====================================================="

SUBJECT="Cognitive_Firewall_Test_$(date +%s)"
echo "Testing Subject: $SUBJECT"

# Generate dummy source hashes
SOURCE_HASH_FDA="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
SOURCE_HASH_REDDIT="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"

echo "💉 Injecting Claim 1 (FDA): 'Safe'..."
curl -s -X POST "$API/assert" \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "'"$SUBJECT"'",
    "predicate": "status",
    "object": {"type": "Text", "value": "Safe"},
    "source_hash": "'"$SOURCE_HASH_FDA"'",
    "source_class": "Regulatory",
    "confidence": 1.0,
    "signatures": [{"agent_id": "0000000000000000000000000000000000000000000000000000000000000000", "signature": "00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000", "timestamp": 0, "version": 1}]
  }' > /dev/null

echo "💉 Injecting Claim 2 (Reddit): 'Dangerous'..."
curl -s -X POST "$API/assert" \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "'"$SUBJECT"'",
    "predicate": "status",
    "object": {"type": "Text", "value": "Dangerous"},
    "source_hash": "'"$SOURCE_HASH_REDDIT"'",
    "source_class": "Anecdotal",
    "confidence": 0.8,
    "signatures": [{"agent_id": "0000000000000000000000000000000000000000000000000000000000000000", "signature": "00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000", "timestamp": 0, "version": 1}]
  }' > /dev/null

echo "⏳ Waiting for Ingestion (Log -> KV)..."
for i in {1..10}; do
  echo -n "."
  sleep 2
  RESPONSE=$(curl -s -G "$API/query" \
    --data-urlencode "subject=$SUBJECT" \
    --data-urlencode "predicate=status" \
    --data-urlencode "lens=Skeptic")
  COUNT=$(echo "$RESPONSE" | jq -r '.total_count // 0')
  # jq prints nothing if the response is empty or not valid JSON; default to 0.
  COUNT=${COUNT:-0}
  if [ "$COUNT" -gt 0 ]; then
    echo " Success!"
    break
  fi
done

CONFLICT=$(echo "$RESPONSE" | jq -r '.conflict_score // 0')
CONFLICT=${CONFLICT:-0}

echo "-----------------------------------------------------"
echo "📊 Results:"
echo " Assertions Found: $COUNT"
echo " Conflict Score: $CONFLICT"

if [ "$COUNT" -eq 0 ]; then
  echo "❌ ERROR: No assertions found after 20s."
elif (( $(echo "$CONFLICT > 0.5" | bc -l) )); then
  echo "🔴 RED ALERT: High Conflict Detected! Firewall Active."
else
  echo "🟢 GREEN: Consensus Reached."
fi
echo "-----------------------------------------------------"