jordan 3320c24afa feat: WAL hardening (Phase 5B) - CRC32C, crash recovery, group commit, log rotation

Add CRC32C checksums to WAL record format (v2), implement crash recovery
with automatic truncation of corrupt records, add feature-gated group commit
buffer for batched fsync under concurrent load, and implement log rotation
via segment files with global offset addressing.

Key changes:
- Record format v2: [len:u32][crc32c:u32][blake3:32][payload:N]
- recover_file() scans and truncates corrupt tail records
- GroupCommitBuffer batches fsync via MPSC channel (tokio feature gate)
- SegmentManager with binary search resolution and cursor-based cleanup
- Journal::read() auto-refreshes segments on miss for writer/reader split
- Split recovery.rs and key_codec.rs into directory modules for 500-line max

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-02 12:36:35 -07:00


Lens

Last Updated: 2026-02-01
Confidence: High
Status: Implemented in stemedb-lens v0.1.0

Summary

A Lens resolves conflicting assertions into a deterministic answer at read time. Multiple truths coexist; the Lens chooses which to return.

Key Facts:

Stateless compute (no side effects)
Deterministic (same input = same output)
Fast (runs on every read, so implementations should avoid allocations)
Pluggable (implement Lens trait)

File Pointer: crates/stemedb-lens/src/lib.rs

The Traits

Synchronous Lens

pub trait Lens: Send + Sync {
    fn resolve(&self, candidates: &[Assertion]) -> Resolution;
    fn name(&self) -> &'static str;
}
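
As an illustration of the "Pluggable" point, here is a minimal sketch of a custom lens. The `Assertion` and `Resolution` structs below are simplified, hypothetical stand-ins (the crate's real types are not shown in this document), and `LatestWins` is an illustrative name, not one of the shipped lenses.

```rust
// Hypothetical stand-ins for the crate's types; only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub object: String,
    pub timestamp: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Resolution {
    pub winner: Option<Assertion>,
}

// Same shape as the trait shown above.
pub trait Lens: Send + Sync {
    fn resolve(&self, candidates: &[Assertion]) -> Resolution;
    fn name(&self) -> &'static str;
}

/// Stateless lens: it holds no data, so the same candidates always
/// produce the same result.
pub struct LatestWins;

impl Lens for LatestWins {
    fn resolve(&self, candidates: &[Assertion]) -> Resolution {
        // Compare by timestamp, then by object value, so the outcome does
        // not depend on the order of the input slice. Only the winner is cloned.
        let winner = candidates
            .iter()
            .max_by(|a, b| {
                a.timestamp
                    .cmp(&b.timestamp)
                    .then_with(|| a.object.cmp(&b.object))
            })
            .cloned();
        Resolution { winner }
    }

    fn name(&self) -> &'static str {
        "latest-wins"
    }
}
```

The secondary comparison on `object` is an assumption made here to keep the result deterministic when two candidates share a timestamp.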

Async Lens

For lenses requiring I/O (e.g., VoteStore lookups):

#[async_trait]
pub trait AsyncLens: Send + Sync {
    async fn resolve_async(&self, candidates: &[Assertion]) -> Resolution;
    fn name(&self) -> &'static str;
}

Analysis Lens (Trust but Verify)

For lenses that surface conflict instead of resolving it:

#[async_trait]
pub trait AnalysisLens: Send + Sync {
    async fn analyze(&self, candidates: &[Assertion]) -> ConflictAnalysis;
    fn name(&self) -> &'static str;
}

Returns ConflictAnalysis with:

status: Unanimous, Agreed, or Contested
conflict_score: 0.0 (unanimous) to 1.0 (chaos) using normalized Shannon entropy
claims: All distinct claims ranked by weight share

VoteAwareConsensus Implementation

The VoteAwareConsensusLens integrates with the Ballot Box pattern (VoteStore) to resolve based on actual vote counts.

Resolution Strategy:

For each candidate assertion, look up its vote count and aggregate weight (an O(1) cached lookup)
Rank by aggregate weight (sum of all vote weights)
Return assertion with highest aggregate weight
Tiebreaker: If weights equal, prefer most recent timestamp

Confidence Calculation:

confidence = winner_weight / total_weight_across_all_candidates
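
The ranking, tiebreaker, and confidence rules above can be sketched in a few lines. This is not the crate's code: `Tallied` and `pick_by_weight` are hypothetical names, and the aggregate weight is assumed to have been fetched from the VoteStore already.

```rust
/// Hypothetical view of a candidate after its votes have been aggregated.
#[derive(Debug, Clone, PartialEq)]
pub struct Tallied {
    pub id: &'static str,
    pub aggregate_weight: f64,
    pub timestamp: u64,
}

/// Returns the winning candidate and its confidence
/// (winner weight divided by total weight across all candidates).
pub fn pick_by_weight(candidates: &[Tallied]) -> Option<(&Tallied, f64)> {
    let total: f64 = candidates.iter().map(|c| c.aggregate_weight).sum();
    let winner = candidates.iter().max_by(|a, b| {
        a.aggregate_weight
            .total_cmp(&b.aggregate_weight)
            // Tiebreaker: on equal weight, the more recent timestamp wins.
            .then_with(|| a.timestamp.cmp(&b.timestamp))
    })?;
    // Assumption: with no votes at all, report zero confidence
    // instead of dividing by zero.
    let confidence = if total > 0.0 {
        winner.aggregate_weight / total
    } else {
        0.0
    };
    Some((winner, confidence))
}
```

With weights 3.0, 5.0, and 2.0, the second candidate wins with confidence 5.0 / 10.0 = 0.5.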

Example:

use stemedb_lens::VoteAwareConsensusLens;
use stemedb_storage::{HybridStore, GenericVoteStore};
use std::sync::Arc;

let store = HybridStore::open("./data").await?;
let vote_store = Arc::new(GenericVoteStore::new(store));
let lens = VoteAwareConsensusLens::new(vote_store);

let resolution = lens.resolve_async(&candidates).await;

TrustAwareAuthority Implementation

The TrustAwareAuthorityLens integrates with TrustRank to weight assertions by agent reputation. This is the foundation of "The Hive" learning loop.

Resolution Strategy:

For each candidate assertion, look up the primary signer's TrustRank (an O(1) lookup)
Calculate weighted score: assertion.confidence * agent.trust_rank
Return assertion with highest weighted score
Tiebreaker: If scores equal, prefer most recent timestamp
New agents default to 0.5 trust score
Unsigned assertions treated as 0.0 trust

Confidence Calculation:

weighted_score = assertion.confidence * agent.trust_rank
confidence = weighted_score  // Direct weighted score
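
A minimal sketch of the scoring rules above, including the defaults for new agents and unsigned assertions. The `Signed` struct, the function names, and the plain `HashMap` standing in for the TrustRank store are all assumptions made for illustration.

```rust
use std::collections::HashMap;

const DEFAULT_TRUST: f64 = 0.5; // agents with no recorded TrustRank
const UNSIGNED_TRUST: f64 = 0.0; // assertions without a signer

/// Hypothetical view of an assertion: only the fields the scoring needs.
pub struct Signed {
    pub id: &'static str,
    pub confidence: f64,
    pub signer: Option<&'static str>,
    pub timestamp: u64,
}

/// weighted_score = assertion.confidence * agent.trust_rank
pub fn weighted_score(a: &Signed, trust: &HashMap<&str, f64>) -> f64 {
    let rank = match a.signer {
        None => UNSIGNED_TRUST,
        Some(agent) => trust.get(agent).copied().unwrap_or(DEFAULT_TRUST),
    };
    a.confidence * rank
}

/// Returns the winner and its weighted score, which is also the confidence.
pub fn pick_by_trust<'a>(
    candidates: &'a [Signed],
    trust: &HashMap<&str, f64>,
) -> Option<(&'a Signed, f64)> {
    candidates
        .iter()
        .map(|c| (c, weighted_score(c, trust)))
        .max_by(|(a, sa), (b, sb)| {
            // Tiebreaker: on equal scores, the more recent timestamp wins.
            sa.total_cmp(sb).then_with(|| a.timestamp.cmp(&b.timestamp))
        })
}
```

Note the consequence of the defaults: an unsigned assertion scores 0.0 regardless of its declared confidence, so it can only win when every other candidate also scores 0.0.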

TrustRank Learning Loop:

Agents start at 0.5 (neutral)
Accurate predictions: +0.05 per correct assertion
Inaccurate predictions: -0.1 per incorrect assertion (higher penalty discourages spam)
Confidence half-life: Scores decay over 30 days by default
Scores bounded to [0.0, 1.0]
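
The outcome update can be sketched as a pure function. The constant and function names are hypothetical; time-based decay is left out because this document does not specify the decay curve.

```rust
pub const NEUTRAL: f64 = 0.5; // starting score for a new agent
const REWARD: f64 = 0.05; // added per accurate assertion
const PENALTY: f64 = 0.1; // subtracted per inaccurate assertion

/// Applies one recorded outcome and keeps the score within [0.0, 1.0].
pub fn apply_outcome(score: f64, was_accurate: bool) -> f64 {
    let next = if was_accurate {
        score + REWARD
    } else {
        score - PENALTY
    };
    next.clamp(0.0, 1.0)
}
```

Because the penalty is twice the reward, an agent needs two accurate assertions to recover from one inaccurate assertion.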

Example:

use stemedb_lens::TrustAwareAuthorityLens;
use stemedb_storage::{HybridStore, GenericTrustRankStore};
use std::sync::Arc;

let store = HybridStore::open("./data").await?;
let trust_store = Arc::new(GenericTrustRankStore::new(store));
// Clone the Arc so trust_store stays usable after the lens takes ownership
let lens = TrustAwareAuthorityLens::new(trust_store.clone());

let resolution = lens.resolve_async(&candidates).await;

// Record outcome for learning
trust_store.record_outcome(&agent_id, was_accurate, timestamp).await?;

// Apply decay periodically
trust_store.decay_trust_ranks(current_timestamp, None).await?;

Standard Lenses

| Lens | Strategy | Use Case | Status |
| --- | --- | --- | --- |
| Recency | Latest timestamp wins | News, real-time | ✅ Implemented |
| Consensus | Most common object value | Democratic truth (basic) | ✅ Implemented |
| VoteAwareConsensus | Highest vote weight from VoteStore | Democratic truth (advanced) | ✅ Implemented |
| Confidence | Highest assertion `confidence` field | Source-declared certainty | ✅ Implemented |
| Authority | Alias for TrustAwareAuthority | Reputation-weighted (user-friendly name) | ✅ Implemented |
| TrustAwareAuthority | Weighted by TrustRank reputation | Expert truth (The Hive) | ✅ Implemented |
| Skeptic | Returns all claims with conflict score | "Trust but Verify" dashboards | ✅ Implemented |
| EpochAware | Filters superseded epochs first | Paradigm-safe queries | ✅ Implemented |
| Constraints | Returns `must_use`/`forbidden` predicates | Pre-flight checks | 🔜 Planned |

Note: The Authority lens is now an alias for TrustAwareAuthority (both use agent reputation via TrustRank). Use Confidence if you want to select by the assertion's self-declared confidence field without considering agent reputation.

EpochAwareLens Implementation

The EpochAwareLens filters assertions from superseded epochs before delegating to an inner lens. This enables "paradigm-safe" queries where obsolete worldviews are automatically excluded.

Resolution Strategy:

Collect all unique epoch IDs from candidate assertions
For each epoch, read E:{epoch_id} from store
Walk the supersedes chain to build a set of superseded epoch IDs
Filter candidates: exclude any assertion whose epoch is in the superseded set
Delegate filtered candidates to inner lens (default: RecencyLens)

Key Design Decisions:

| Behavior | Choice | Rationale |
| --- | --- | --- |
| Missing epoch record | Include assertion (fail-open) | Data availability > metadata consistency |
| Cycle in supersession chain | Stop walking, include assertions | Pathological data shouldn't hide valid assertions |
| Max depth exceeded (100) | Stop walking, log warning | Prevent infinite loops |
| No epochs in candidates | Delegate directly to inner lens | Optimization for common case |
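
The chain walk and the fail-open choices in the table can be sketched as follows. This is an in-memory illustration under stated assumptions: the `supersedes` map (newer epoch ID to the epoch it supersedes) stands in for reading E:{epoch_id} records from the store, candidates are reduced to `(id, optional epoch)` pairs, and a detected cycle causes the whole walk to be discarded. All function names are hypothetical, and the warning log on max depth is omitted.

```rust
use std::collections::{HashMap, HashSet};

const MAX_DEPTH: usize = 100;

/// Collects every epoch reachable by following `supersedes` links,
/// starting from the epochs seen in the candidates.
pub fn superseded_epochs<'a>(
    seen: &[&'a str],
    supersedes: &HashMap<&'a str, &'a str>,
) -> HashSet<&'a str> {
    let mut superseded = HashSet::new();
    for &start in seen {
        let mut walk = vec![start]; // epochs visited on this walk, in order
        let mut current = start;
        let mut cyclic = false;
        // At most MAX_DEPTH hops per walk.
        while walk.len() <= MAX_DEPTH {
            // Missing record or end of chain: nothing more to exclude.
            let Some(&older) = supersedes.get(current) else { break };
            if walk.contains(&older) {
                cyclic = true;
                break;
            }
            walk.push(older);
            current = older;
        }
        // Fail-open on cycles: discard this walk instead of hiding assertions.
        if !cyclic {
            superseded.extend(walk.into_iter().skip(1));
        }
    }
    superseded
}

/// Keeps the IDs of candidates whose epoch has not been superseded.
/// Candidates without an epoch always pass through.
pub fn filter_superseded<'a>(
    candidates: &[(&'a str, Option<&'a str>)],
    supersedes: &HashMap<&'a str, &'a str>,
) -> Vec<&'a str> {
    let seen: Vec<&'a str> = candidates.iter().filter_map(|(_, e)| *e).collect();
    let dead = superseded_epochs(&seen, supersedes);
    candidates
        .iter()
        .filter(|(_, e)| e.map_or(true, |e| !dead.contains(e)))
        .map(|(id, _)| *id)
        .collect()
}
```

Because the walk starts only from epochs present in the candidates, a query that matches nothing but old-epoch assertions finds no superseding record to start from and returns them unfiltered.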

Use Case: Accounting Standard Migration (GAAP → IFRS)

# Create epochs representing paradigm shift
POST /v1/epoch {"name": "GAAP-Era", "start_timestamp": 0}
# Returns epoch_id: "abc123..."

POST /v1/epoch {
  "name": "IFRS-Transition",
  "supersedes": "abc123...",
  "supersession_type": "Temporal",
  "start_timestamp": 1704067200
}
# Returns epoch_id: "def456..."

# Query with epoch awareness
GET /v1/query?subject=Acme&predicate=lease_liability&lens=EpochAware

# Returns IFRS treatment (new epoch)
# GAAP treatment (old epoch) automatically excluded

Example:

use stemedb_lens::EpochAwareLens;
use stemedb_storage::HybridStore;
use std::sync::Arc;

let store = Arc::new(HybridStore::open("./data").await?);

// Default: filter superseded epochs, then pick most recent
let lens = EpochAwareLens::with_recency(store.clone());

// Custom: filter superseded epochs, then use consensus
use stemedb_lens::ConsensusLens;
let lens = EpochAwareLens::with_sync_lens(store, ConsensusLens);

let resolution = lens.resolve_async(&candidates).await;

Limitation: The lens excludes an old-epoch assertion only when at least one assertion from a superseding epoch is also among the candidates. If a query matches only old-epoch assertions (no new-epoch assertions exist for it), they pass through unfiltered. This is intentional fail-open behavior.

SkepticLens Implementation

The SkepticLens surfaces conflict instead of hiding it. It implements AnalysisLens rather than Lens, returning a ConflictAnalysis with all competing claims.

Resolution Strategy:

Group assertions by object value
For each group, calculate aggregate vote weight (or fallback to confidence)
Calculate normalized Shannon entropy as conflict score
Determine status: Unanimous (<0.1), Agreed (<0.4), or Contested (>=0.4)
Build ClaimSummary for each group with supporting agents and source provenance

Conflict Score Formula:

entropy = -sum(p * log2(p)) for each claim weight proportion
conflict_score = entropy / log2(num_claims)  // Normalized to 0.0-1.0
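
The formula and the status thresholds can be written out as a short sketch. The names `conflict_score`, `status`, and `Status` are hypothetical, and returning 0.0 for fewer than two claims is an assumption made here to avoid dividing by log2(1) = 0.

```rust
#[derive(Debug, PartialEq)]
pub enum Status {
    Unanimous,
    Agreed,
    Contested,
}

/// Normalized Shannon entropy over claim weights, in the range 0.0 to 1.0.
pub fn conflict_score(weights: &[f64]) -> f64 {
    let total: f64 = weights.iter().sum();
    // Assumption: a single claim (or no weight at all) means no conflict.
    if weights.len() < 2 || total <= 0.0 {
        return 0.0;
    }
    let entropy: f64 = weights
        .iter()
        .map(|w| w / total)
        .filter(|p| *p > 0.0) // p * log2(p) tends to 0 as p tends to 0
        .map(|p| -p * p.log2())
        .sum();
    entropy / (weights.len() as f64).log2()
}

/// Thresholds from the resolution strategy: <0.1, <0.4, >=0.4.
pub fn status(score: f64) -> Status {
    if score < 0.1 {
        Status::Unanimous
    } else if score < 0.4 {
        Status::Agreed
    } else {
        Status::Contested
    }
}
```

Two claims with equal weight give the maximum score of 1.0. The thresholds are fairly strict: a 90/10 split scores about 0.47 and is already Contested, while a 95/5 split scores about 0.29 and counts as Agreed.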

API Endpoint:

GET /v1/skeptic?subject=Semaglutide&predicate=muscle_effect

{
  "status": "Contested",
  "conflict_score": 0.72,
  "claims": [
    {
      "value": {"type": "Text", "value": "Significant loss"},
      "weight_share": 0.45,
      "assertion_count": 12,
      "supporting_agents": [...]
    },
    {
      "value": {"type": "Text", "value": "Minimal loss"},
      "weight_share": 0.35,
      "assertion_count": 3
    }
  ],
  "candidates_count": 17
}

Example:

use stemedb_lens::SkepticLens;
use stemedb_storage::{HybridStore, GenericVoteStore, GenericTrustRankStore};
use std::sync::Arc;

let store = HybridStore::open("./data").await?;
let vote_store = Arc::new(GenericVoteStore::new(store.clone()));
let trust_store = Arc::new(GenericTrustRankStore::new(store));
let lens = SkepticLens::new(vote_store, trust_store);

let analysis = lens.analyze(&candidates).await;
if analysis.status == ResolutionStatus::Contested {
    println!("⚠️ This fact is disputed! Conflict score: {:.2}", analysis.conflict_score);
}

Lens::Constraints (Pre-Flight Check)

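A special lens for agent safety that returns rules, not facts. Status: planned (see the Standard Lenses table); the request and response below show the intended shape.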

GET /query?context=python_http&lens=constraints

-> Returns:
{
  "constraints": [
    { "must_use": "axios", "forbidden": "requests", "reason": "User correction" }
  ]
}

Origin: Solves the "Optimization Conflict" where agents forget corrections. Acts as a compiler error for agent intent.
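
Since this lens is still planned, the following is only a sketch of how a caller might act on a constraints response before executing a plan. `Constraint` mirrors the fields in the JSON above; `preflight` and the idea of checking a list of planned dependencies are hypothetical.

```rust
/// Mirrors one entry of the "constraints" array shown above.
pub struct Constraint {
    pub must_use: &'static str,
    pub forbidden: &'static str,
    pub reason: &'static str,
}

/// Rejects a plan that uses any forbidden item, much like a compiler error:
/// the first violation is returned together with the required alternative.
pub fn preflight(planned: &[&str], constraints: &[Constraint]) -> Result<(), String> {
    for c in constraints {
        if planned.contains(&c.forbidden) {
            return Err(format!(
                "'{}' is forbidden; use '{}' ({})",
                c.forbidden, c.must_use, c.reason
            ));
        }
    }
    Ok(())
}
```

An agent would run such a check before generating or executing code, so that an earlier user correction blocks the forbidden choice instead of being forgotten.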

See agile-agent-team.md for full explanation.

Query Flow

Client: GET(Subject="Tesla", Predicate="Revenue", Lens="Consensus")
Index lookup: SP:Tesla:Revenue -> [Hash1, Hash2, Hash3]
Hydrate: Load assertions from hashes
Resolve: ConsensusLens.resolve(&assertions)
Return: Single deterministic answer with confidence
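
The flow above can be sketched end to end with in-memory maps. Everything here is an illustrative assumption: `Store`, `query`, and `consensus` are hypothetical names, assertions are reduced to their object value, confidence is taken to be the winning value's share of candidates, and count ties go to the lexicographically smallest value to keep the result deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the index and the assertion store.
pub struct Store {
    /// "SP:{subject}:{predicate}" -> assertion hashes
    pub index: HashMap<String, Vec<u64>>,
    /// assertion hash -> object value
    pub assertions: HashMap<u64, String>,
}

/// Most common object value wins.
pub fn consensus(values: &[&str]) -> Option<(String, f64)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    let total = values.len() as f64;
    counts
        .into_iter()
        // Highest count first; on a tie, the smaller value wins.
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(value, n)| (value.to_string(), n as f64 / total))
}

/// Index lookup, hydrate, resolve, return.
pub fn query(store: &Store, subject: &str, predicate: &str) -> Option<(String, f64)> {
    let key = format!("SP:{subject}:{predicate}");
    let hashes = store.index.get(&key)?;
    // Hydrate: load each assertion, skipping hashes that are missing.
    let values: Vec<&str> = hashes
        .iter()
        .filter_map(|h| store.assertions.get(h))
        .map(String::as_str)
        .collect();
    consensus(&values)
}
```

With three hydrated assertions, two of which share a value, the query returns that value with confidence 2/3.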
