Pre-Aphoria Validation Battery
Purpose: Verify stemedb behaves as documented before building ConceptPath and Aphoria on top of it. Every test maps to a claim the product makes or a code path Aphoria depends on.
Test file: crates/stemedb-query/tests/battery_pre_aphoria.rs
Battery 1: The Semaglutide Scenario
Reproduces the exact example from what-is-episteme.md: four sources across three source-class tiers (Regulatory, Clinical, Anecdotal), one subject, conflicting claims. If this doesn't work, the product demo fails.
1.1 test_semaglutide_four_sources_ingest_and_query
Setup:
- Agent A signs: subject=Semaglutide, predicate=has_side_effect, object=Text("gastroparesis_warning"), source_class=Regulatory, confidence=1.0, timestamp=T
- Agent B signs: subject=Semaglutide, predicate=has_side_effect, object=Text("no_gastroparesis_signal"), source_class=Clinical, confidence=0.9, timestamp=T+1
- Agent C signs: subject=Semaglutide, predicate=has_side_effect, object=Text("gastroparesis"), source_class=Anecdotal, confidence=0.2, timestamp=T+2
- Agent D signs: subject=Semaglutide, predicate=has_side_effect, object=Text("no_gastroparesis_signal"), source_class=Clinical, confidence=0.9, timestamp=T+3
Ingest all four through WAL + IngestWorker.
Assert:
- All four assertions are stored (a query with no lens returns 4 results)
- Authority lens (TrustAwareAuthority) winner is the Regulatory assertion (Agent A, FDA)
- Recency lens winner is Agent D (most recent)
- Consensus lens groups by object value: "no_gastroparesis_signal" has 2 assertions; the two gastroparesis variants ("gastroparesis_warning" and "gastroparesis") account for the other 2, as separate groups of 1 each, since they are distinct Text values
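The expected lens winners can be checked by hand with a minimal sketch. It assumes simplified semantics: authority ranks by tier (lower is stronger, confidence breaks ties), recency takes the latest timestamp, and consensus counts exact object values. The Candidate struct and all function names are hypothetical, not stemedb's types, and TrustAwareAuthority may weigh factors this sketch does not model.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Simplified stand-in for a signed assertion (hypothetical field names).
#[derive(Debug, Clone)]
pub struct Candidate {
    pub agent: &'static str,
    pub object: &'static str,
    pub tier: u8, // 0 = Regulatory, 1 = Clinical, 5 = Anecdotal
    pub confidence: f64,
    pub timestamp: u64,
}

/// The four Semaglutide assertions from 1.1, relative to base time `t`.
pub fn semaglutide(t: u64) -> Vec<Candidate> {
    let c = |agent: &'static str, object: &'static str, tier: u8, confidence: f64, dt: u64| Candidate {
        agent,
        object,
        tier,
        confidence,
        timestamp: t + dt,
    };
    vec![
        c("A", "gastroparesis_warning", 0, 1.0, 0),
        c("B", "no_gastroparesis_signal", 1, 0.9, 1),
        c("C", "gastroparesis", 5, 0.2, 2),
        c("D", "no_gastroparesis_signal", 1, 0.9, 3),
    ]
}

/// Authority (simplified): lowest tier number wins; higher confidence breaks ties.
pub fn authority_winner(cs: &[Candidate]) -> Option<&Candidate> {
    cs.iter().min_by(|a, b| {
        a.tier
            .cmp(&b.tier)
            .then(b.confidence.partial_cmp(&a.confidence).unwrap_or(Ordering::Equal))
    })
}

/// Recency: latest timestamp wins.
pub fn recency_winner(cs: &[Candidate]) -> Option<&Candidate> {
    cs.iter().max_by_key(|c| c.timestamp)
}

/// Consensus: count assertions per exact object value.
pub fn consensus_groups(cs: &[Candidate]) -> BTreeMap<&'static str, usize> {
    let mut groups = BTreeMap::new();
    for c in cs {
        *groups.entry(c.object).or_insert(0) += 1;
    }
    groups
}
```

Because grouping is by exact value in this sketch, "gastroparesis_warning" and "gastroparesis" never merge; any normalization would have to be an explicit step.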
1.2 test_semaglutide_skeptic_analysis
Using the same four assertions from 1.1:
Assert:
- Skeptic lens analyze() returns a ConflictAnalysis with:
  - candidates_count = 4
  - claims.len() >= 2 (at least two distinct object values)
  - status = Contested (conflict_score >= 0.4)
  - conflict_score > 0.3 (there is real disagreement between object values)
- The claim with object "no_gastroparesis_signal" has assertion_count = 2
- Claims are sorted descending by weight_share
1.3 test_semaglutide_source_class_decay
Using the same four assertions, all with timestamp 6 months (~180 days) ago:
Query with source_class_decay: true. Expected effective confidences:
- Regulatory assertion (Tier 0): confidence unchanged (no half-life)
- Clinical assertions (Tier 1, 730-day half-life): confidence decayed slightly (~0.9 * 2^(-180/730) ~ 0.76)
- Anecdotal assertion (Tier 5, 30-day half-life): confidence decayed to near zero (~0.2 * 2^(-180/30) ~ 0.003)
Assert:
- After decay, the Anecdotal assertion's effective confidence is < 0.01
- After decay, the Regulatory assertion's confidence is exactly 1.0
- After decay, Clinical assertions' confidence is between 0.7 and 0.85
- Authority lens after decay still picks Regulatory as winner
1.4 test_semaglutide_time_travel
Using the same four assertions with staggered timestamps (T, T+100, T+200, T+300):
Query with as_of: T+150:
- Only assertions at T and T+100 are included
- Assert exactly 2 candidates
- Conflict landscape differs from the full query (only the FDA and NEJM assertions, from Agents A and B)
Battery 2: The JWT Conflict Scenario
Reproduces the JWT outage story. Validates escalation — the claim that Episteme is an "active safety system."
2.1 test_jwt_conflict_escalation_fires
Setup:
- RFC 7519 (Tier 0, confidence 1.0): predicate=aud_validation, object=Boolean(true)
- Internal wiki (Tier 3, confidence 0.8): predicate=aud_validation, object=Boolean(false)
- Stack Overflow (Tier 5, confidence 0.6): predicate=aud_validation, object=Boolean(false)
- Approved runbook (Tier 2, confidence 0.95): predicate=aud_validation, object=Boolean(true)
Configure escalation policy:
- name: "security-config"
- min_conflict_score: 0.5
- level: High
- predicate_pattern: None
Ingest all four. Run materializer with escalation policies.
Assert:
- Escalation event is created (query the ESC: prefix, find at least one)
- Event has level = High
- Event has conflict_score >= 0.5
- Event has correct subject and predicate
- Event has resolved = false
2.2 test_jwt_escalation_predicate_filter
Same four assertions as 2.1. Two policies:
- Policy A: predicate_pattern: Some("aud"), min_conflict_score: 0.3, level: Critical
- Policy B: predicate_pattern: Some("revenue"), min_conflict_score: 0.3, level: Medium
Assert:
- Policy A fires (predicate aud_validation contains "aud")
- Policy B does NOT fire (predicate doesn't contain "revenue")
- Only one escalation event exists, with level Critical
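The firing rule implied by 2.1 and 2.2 can be sketched as a pure predicate: a policy fires when its predicate_pattern (if set) is a substring of the predicate and the conflict score reaches min_conflict_score. The Level and Policy types and the fires/firing_levels functions are hypothetical; persisting the resulting events under the ESC: prefix is not modeled.

```rust
/// Hypothetical escalation levels and policy shape (field names follow 2.1/2.2).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Medium,
    High,
    Critical,
}

#[derive(Debug, Clone)]
pub struct Policy {
    pub name: &'static str,
    pub min_conflict_score: f64,
    pub level: Level,
    pub predicate_pattern: Option<&'static str>,
}

impl Policy {
    /// Fires when the pattern (if any) is a substring of the predicate and the
    /// conflict score reaches the policy's threshold.
    pub fn fires(&self, predicate: &str, conflict_score: f64) -> bool {
        let pattern_ok = self.predicate_pattern.map_or(true, |p| predicate.contains(p));
        pattern_ok && conflict_score >= self.min_conflict_score
    }
}

/// Levels of every policy that fires for one (predicate, conflict_score) pair;
/// each firing policy would produce one escalation event.
pub fn firing_levels(policies: &[Policy], predicate: &str, conflict_score: f64) -> Vec<Level> {
    policies
        .iter()
        .filter(|p| p.fires(predicate, conflict_score))
        .map(|p| p.level)
        .collect()
}
```

With Policies A and B from 2.2 and predicate aud_validation, only A's Critical level comes back; the 2.1 policy (predicate_pattern: None) matches any predicate once the score reaches 0.5.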
2.3 test_jwt_layered_lens_tier_agreement
Same four assertions. Query with Layered Consensus lens.
Assert:
- Tier 0 result: winner object = Boolean(true) (RFC says validate)
- Tier 2 result: winner object = Boolean(true) (Runbook agrees)
- Tier 3 result: winner object = Boolean(false) (Wiki says skip)
- Tier 5 result: winner object = Boolean(false) (SO says skip)
- overall_conflict_score > 0.5 (cross-tier disagreement between Tiers 0/2 and 3/5)
- overall_winner comes from Tier 0 (highest authority)
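A simplified model of the per-tier structure in 2.3 helps make the expected results concrete. This sketch assumes each tier's winner is its highest-confidence claim and that the overall winner is the winner of the lowest-numbered tier; the Claim type and function names are hypothetical, and the layered lens's actual overall_conflict_score formula is not reproduced here (only a yes/no disagreement check).

```rust
use std::collections::BTreeMap;

/// Hypothetical claim shape: source tier, confidence, and a Boolean object.
#[derive(Debug, Clone, Copy)]
pub struct Claim {
    pub tier: u8,
    pub confidence: f64,
    pub object: bool,
}

/// Per-tier winner: the highest-confidence claim within each tier.
pub fn tier_winners(claims: &[Claim]) -> BTreeMap<u8, Claim> {
    let mut winners: BTreeMap<u8, Claim> = BTreeMap::new();
    for c in claims {
        winners
            .entry(c.tier)
            .and_modify(|w| {
                if c.confidence > w.confidence {
                    *w = *c;
                }
            })
            .or_insert(*c);
    }
    winners
}

/// Overall winner: the winner of the highest-authority (lowest-numbered) tier.
/// BTreeMap iterates keys in ascending order, so the first value is that tier.
pub fn overall_winner(winners: &BTreeMap<u8, Claim>) -> Option<Claim> {
    winners.values().next().copied()
}

/// True when at least two tier winners disagree on the object value.
pub fn tiers_disagree(winners: &BTreeMap<u8, Claim>) -> bool {
    let mut objects = winners.values().map(|w| w.object);
    match objects.next() {
        Some(first) => objects.any(|o| o != first),
        None => false,
    }
}
```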
Battery 3: Decay Math Precision
Aphoria computes conflict scores after decay. If decay is wrong, every conflict score is wrong.
3.1 test_decay_tier0_never_decays
Regulatory assertion, confidence 0.95, timestamp 10 years ago.
Query with source_class_decay: true.
Assert: effective confidence is exactly 0.95 (unchanged).
3.2 test_decay_tier1_exact_halflife
Clinical assertion, confidence 1.0, timestamp exactly 730 days ago.
Query with source_class_decay: true.
Assert: effective confidence is 0.5 (within tolerance of 0.02).
3.3 test_decay_tier1_two_halflives
Clinical assertion, confidence 1.0, timestamp exactly 1460 days ago.
Query with source_class_decay: true.
Assert: effective confidence is 0.25 (within tolerance of 0.02).
3.4 test_decay_tier5_exact_halflife
Anecdotal assertion, confidence 1.0, timestamp exactly 30 days ago.
Query with source_class_decay: true.
Assert: effective confidence is 0.5 (within tolerance of 0.02).
3.5 test_decay_tier5_three_halflives
Anecdotal assertion, confidence 1.0, timestamp exactly 90 days ago.
Query with source_class_decay: true.
Assert: effective confidence is 0.125 (within tolerance of 0.02).
3.6 test_decay_zero_confidence_stays_zero
Assertion with confidence 0.0, any tier, any age.
Assert: effective confidence is 0.0 after decay (0 * anything = 0).
3.7 test_decay_never_goes_negative
Anecdotal assertion, confidence 0.01, timestamp 365 days ago (12+ half-lives).
Assert: effective confidence >= 0.0.
3.8 test_decay_uses_as_of_for_age_calculation
Two assertions, both at timestamp T=1000:
- Assertion A: Clinical, confidence 0.9
- Assertion B: Anecdotal, confidence 0.9
Query with as_of: T + 730*86400 (exactly 730 days after assertions) and source_class_decay: true.
Assert:
- A's effective confidence ~ 0.45 (Clinical, one half-life)
- B's effective confidence is near zero (Anecdotal, 24+ half-lives at the 30-day rate: 0.9 * 2^(-730/30) ≈ 4e-8)
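All of Battery 3 follows from one formula, c_eff = c * 2^(-age / half_life), with age measured from the assertion timestamp to the query's reference time. A minimal sketch, assuming timestamps are Unix seconds (consistent with the 730*86400 offset in 3.8) and using only the half-lives this plan names; SourceClass, half_life_days, and effective_confidence are hypothetical names, and other tiers are omitted because their half-lives are not given here.

```rust
const SECS_PER_DAY: f64 = 86_400.0;

/// Only the source classes whose half-lives this plan states.
#[derive(Debug, Clone, Copy)]
pub enum SourceClass {
    Regulatory, // Tier 0: never decays
    Clinical,   // Tier 1: 730-day half-life
    Anecdotal,  // Tier 5: 30-day half-life
}

pub fn half_life_days(class: SourceClass) -> Option<f64> {
    match class {
        SourceClass::Regulatory => None,
        SourceClass::Clinical => Some(730.0),
        SourceClass::Anecdotal => Some(30.0),
    }
}

/// c_eff = c * 2^(-age / half_life). Age is measured against `as_of` when it is
/// set and against `now` otherwise, so historical queries decay correctly.
pub fn effective_confidence(
    confidence: f64,
    class: SourceClass,
    asserted_at: u64,
    as_of: Option<u64>,
    now: u64,
) -> f64 {
    let reference = as_of.unwrap_or(now);
    let age_days = reference.saturating_sub(asserted_at) as f64 / SECS_PER_DAY;
    match half_life_days(class) {
        None => confidence, // returned untouched, so 3.1 can assert exact equality
        // Defensive clamp: for non-negative inputs the product is already >= 0,
        // and f64::max also maps a NaN confidence to 0.0.
        Some(h) => (confidence * (-age_days / h).exp2()).max(0.0),
    }
}
```

Taking `now` as a parameter instead of reading the clock keeps every case in this battery deterministic.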
Battery 4: Conflict Score Calibration
Two conflict score implementations exist. compute_conflict_score in traits.rs uses confidence variance. calculate_conflict_score in skeptic/analysis.rs uses Shannon entropy over object value groups. Both need validation.
4.1 test_variance_conflict_score_unanimous
5 assertions, all confidence 0.8.
compute_conflict_score() returns 0.0 (no variance).
4.2 test_variance_conflict_score_maximum
2 assertions, confidence 0.0 and 1.0.
compute_conflict_score() returns 1.0 (maximum variance).
4.3 test_variance_conflict_score_moderate
3 assertions, confidence 0.2, 0.5, 0.8.
compute_conflict_score() returns a value between 0.2 and 0.8.
4.4 test_variance_conflict_score_single
1 assertion. Returns 0.0.
4.5 test_variance_conflict_score_empty
0 assertions. Returns 0.0.
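The expectations in 4.1 to 4.5 (and 4.8) are consistent with a normalized confidence variance. This sketch assumes population variance divided by 0.25, the maximum variance for values in [0, 1], so {0.0, 1.0} maps to 1.0; the real formula in traits.rs may differ, and variance_conflict_score is a hypothetical name.

```rust
/// Variance-based conflict score over candidate confidences (sketch).
pub fn variance_conflict_score(confidences: &[f32]) -> f32 {
    // Zero or one candidate cannot disagree.
    if confidences.len() < 2 {
        return 0.0;
    }
    let n = confidences.len() as f32;
    let mean = confidences.iter().sum::<f32>() / n;
    let var = confidences.iter().map(|c| (c - mean).powi(2)).sum::<f32>() / n;
    // Normalize by the largest possible variance for values in [0, 1].
    let score = var / 0.25;
    // Defensive: NaN inputs propagate into `score`; report 0.0 instead.
    if score.is_nan() {
        0.0
    } else {
        score.clamp(0.0, 1.0)
    }
}
```

Note that this score depends only on confidences: three candidates at 0.9 score (near) zero whatever their object values are, which is exactly the blind spot 4.6 targets.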
4.6 test_skeptic_entropy_same_confidence_different_objects [POTENTIAL BUG DETECTOR]
Three assertions, ALL with confidence 0.9:
- Assertion A: object Text("yes"), confidence 0.9
- Assertion B: object Text("no"), confidence 0.9
- Assertion C: object Text("no"), confidence 0.9
Skeptic lens analyze():
- Groups into 2 claims: "yes" (weight 0.9) and "no" (weight 1.8)
- Entropy is non-zero because the total weight is split across two groups (shares 1/3 and 2/3)
- conflict_score > 0.0
- status is NOT Unanimous
Note: The variance-based compute_conflict_score would return 0.0 for these candidates (all same confidence). The Skeptic entropy-based score correctly detects the disagreement. This test validates the Skeptic lens is the correct tool for Aphoria's conflict detection, NOT the variance-based score.
4.7 test_skeptic_entropy_unanimous_different_confidence
Three assertions, all same object Text("yes"), but different confidences (0.3, 0.6, 0.9):
Skeptic lens analyze():
- Groups into 1 claim (all same object)
- conflict_score = 0.0 (unanimous: no disagreement on the value)
- status = Unanimous
Note: Even though confidences differ, there's no actual conflict — all sources agree. The Skeptic lens correctly identifies this as unanimous.
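For contrast, here is a sketch of an entropy-based score over object groups, matching the weights used in 4.6 (each group's weight is its summed confidence). Normalizing by log2 of the number of groups, so the score lands in [0, 1], is an assumption; skeptic/analysis.rs may normalize differently, and entropy_conflict_score is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Shannon entropy of the weight shares across object-value groups,
/// normalized by log2(group count). Candidates are (object, confidence).
pub fn entropy_conflict_score(candidates: &[(&str, f64)]) -> f64 {
    // Group by exact object value; weight = summed confidence.
    let mut weights: BTreeMap<&str, f64> = BTreeMap::new();
    for &(object, confidence) in candidates {
        *weights.entry(object).or_insert(0.0) += confidence;
    }
    let total: f64 = weights.values().sum();
    // One claim (or no weight at all) means no disagreement on the value.
    if weights.len() < 2 || total <= 0.0 {
        return 0.0;
    }
    let entropy: f64 = weights
        .values()
        .filter(|&&w| w > 0.0)
        .map(|&w| {
            let p = w / total;
            -p * p.log2()
        })
        .sum();
    entropy / (weights.len() as f64).log2()
}
```

For 4.6 the shares are 1/3 and 2/3, giving about 0.918 bits (and 0.918 after normalizing by log2(2) = 1), while the variance sketch would report 0.0 for the same candidates. For 4.7 there is a single group, so the score is 0.0 regardless of the differing confidences.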
4.8 test_variance_score_nan_defensive
2 assertions with confidence f32::NAN.
compute_conflict_score() returns 0.0 (defensive, not NaN propagation).
Battery 5: scan_prefix with ConceptPath-shaped Keys
Storage foundation for hierarchical queries.
5.1 test_prefix_scan_concept_path_keys
Store via IndexStore:
S:code://rust/citadeldb/auth/jwt/aud_validation → [hash_a]
S:code://rust/citadeldb/auth/jwt/expiry → [hash_b]
S:code://rust/citadeldb/net/tls/verify → [hash_c]
S:code://rust/citadeldb/auth/oauth/scopes → [hash_d]
Assert:
- scan_prefix("S:code://rust/citadeldb/auth/jwt/") → 2 keys (aud_validation, expiry)
- scan_prefix("S:code://rust/citadeldb/auth/") → 3 keys (jwt/aud_validation, jwt/expiry, oauth/scopes)
- scan_prefix("S:code://rust/citadeldb/") → 4 keys (all)
- scan_prefix("S:code://") → 4 keys (all)
- scan_prefix("S:rfc://") → 0 keys (different scheme)
5.2 test_prefix_scan_no_false_positives
Store:
S:code://rust/citadeldb/auth → [hash_a]
S:code://rust/citadeldb/authentication → [hash_b]
Assert:
- scan_prefix("S:code://rust/citadeldb/auth/") → 0 keys (the trailing slash excludes both "auth" itself, which has no children, and "authentication")
- scan_prefix("S:code://rust/citadeldb/auth") → 2 keys (both match the prefix "auth")
This validates that the trailing / in hierarchical queries is necessary to prevent auth from matching authentication.
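Prefix scans like these rely on keys being ordered byte-wise, so every key sharing a prefix forms one contiguous run that starts at the prefix itself. The sketch below uses a std BTreeMap as a stand-in for the IndexStore (an assumption; the real store's API is not shown here), and scan_prefix is a hypothetical free function, not the store's method.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Seek to the first key >= prefix, then read until a key stops matching.
pub fn scan_prefix<'a, V>(store: &'a BTreeMap<String, V>, prefix: &str) -> Vec<&'a str> {
    store
        .range::<str, _>((Bound::Included(prefix), Bound::Unbounded))
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k.as_str())
        .collect()
}
```

Since '/' (0x2F) sorts before letters, a scan for "…/auth/" starts after "…/auth" and stops at "…/authentication" without yielding either key, which is the behavior 5.2 checks.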
5.3 test_prefix_scan_sp_keys_with_concept_paths
Store via IndexStore (using SP: compound keys):
SP:code://rust/citadeldb/auth/jwt/aud_validation:config_value → [hash_a]
SP:code://rust/citadeldb/auth/jwt/expiry:config_value → [hash_b]
Assert:
- scan_prefix("SP:code://rust/citadeldb/auth/jwt/") → 2 keys
- The parsed SP key for hash_a correctly splits into subject=code://rust/citadeldb/auth/jwt/aud_validation and predicate=config_value (validates the rfind fix: the key is split at the last ':', because the code:// subject itself contains a ':')
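The rfind fix can be sketched as a key parser that splits on the last ':' after the SP: prefix. This assumes predicates never contain ':'; parse_sp_key is a hypothetical name, and rejecting empty halves is an extra check of this sketch.

```rust
/// Split an `SP:{subject}:{predicate}` key on the LAST ':' (rfind).
/// Splitting on the first ':' would cut a ConceptPath subject such as
/// `code://...` at its scheme separator.
pub fn parse_sp_key(key: &str) -> Option<(&str, &str)> {
    let rest = key.strip_prefix("SP:")?;
    let idx = rest.rfind(':')?;
    let (subject, predicate) = (&rest[..idx], &rest[idx + 1..]);
    if subject.is_empty() || predicate.is_empty() {
        None
    } else {
        Some((subject, predicate))
    }
}
```

A first-colon split of the same remainder would yield the subject "code", which is the failure the fix guards against.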
Battery 6: Signature Tamper Detection
Aphoria ingests signed assertions. If signature verification has gaps, tampered claims enter the graph.
6.1 test_valid_signature_accepted
Agent A signs an assertion. Ingest through IngestWorker.
Assert: assertion is stored, index entries exist.
6.2 test_tampered_confidence_rejected
Agent A signs assertion with confidence=0.8. Modify the serialized assertion bytes to change confidence to 1.0. Attempt to ingest.
Assert: IngestError::InvalidSignature. Assertion is NOT stored.
6.3 test_tampered_subject_rejected
Agent A signs assertion with subject="X". Clone the assertion, change subject to "Y", keep original signature.
Assert: ingestion fails with invalid signature.
6.4 test_wrong_agent_id_rejected
Agent A signs assertion. Replace agent_id in the SignatureEntry with Agent B's public key (but keep Agent A's signature bytes).
Assert: ingestion fails — the signature was made by A's private key but claims to be from B's public key.
6.5 test_multi_sig_all_valid_accepted
Agent A and Agent B both sign the same assertion (two valid SignatureEntries).
Assert: ingestion succeeds.
6.6 test_multi_sig_one_invalid_rejected
Agent A signs validly, Agent B's signature is invalid (tampered).
Assert: ingestion fails. ALL signatures must be valid.
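The acceptance rule behind 6.2 to 6.6 can be sketched independently of the signature scheme: every entry must verify over the same canonical assertion bytes under the key it claims. In this sketch SignatureEntry, VerifyError, and verify_all are hypothetical, `verify` is a placeholder for a real signature check that is not implemented here, and rejecting an assertion with no signatures at all is an assumption this plan does not state.

```rust
/// Hypothetical signature entry: the public key the signer claims, plus signature bytes.
pub struct SignatureEntry {
    pub agent_id: [u8; 32],
    pub signature: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum VerifyError {
    NoSignatures,
    InvalidSignature { index: usize },
}

/// Accept only if at least one signature is present and EVERY entry verifies.
/// `verify(public_key, message, signature)` stands in for a real scheme.
pub fn verify_all<F>(message: &[u8], entries: &[SignatureEntry], verify: F) -> Result<(), VerifyError>
where
    F: Fn(&[u8; 32], &[u8], &[u8]) -> bool,
{
    if entries.is_empty() {
        return Err(VerifyError::NoSignatures);
    }
    for (index, entry) in entries.iter().enumerate() {
        // Tampered bytes (6.2, 6.3), a swapped agent_id (6.4), or one bad
        // co-signature (6.6) all make this check fail.
        if !verify(&entry.agent_id, message, entry.signature.as_slice()) {
            return Err(VerifyError::InvalidSignature { index });
        }
    }
    Ok(())
}
```

Because the message is the serialized assertion, editing confidence or subject after signing changes the bytes every signature is checked against, so no field-specific tamper checks are needed.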
Battery 7: Materialized View Consistency
Aphoria queries MVs for fast conflict checks. Stale or inconsistent MVs produce wrong verdicts.
7.1 test_mv_initial_materialization
Ingest assertion A (confidence 0.9) for subject=S, predicate=P.
Run materializer step().
Assert:
- MV exists at MV:{S}:{P}
- MV winner_hash matches A's content hash
- MV confidence = 0.9
- Changelog entry exists (first materialization)
7.2 test_mv_winner_changes_on_update
Ingest A (confidence 0.9), materialize. Then ingest B (same S/P, confidence 0.95), materialize again.
Assert:
- MV winner changes to B
- Changelog has 2 entries: initial (winner=A), update (previous=A, new=B)
7.3 test_mv_no_changelog_when_winner_unchanged
Ingest A (confidence 0.9), materialize. Ingest B (same S/P, confidence 0.5), materialize again.
Assert:
- MV winner stays A (B has lower confidence)
- No new changelog entry after second materialization
7.4 test_mv_since_query_returns_changelog
Ingest A at T=1000, materialize at T=1001. Ingest B at T=2000, materialize at T=2001.
Query with since: 1500:
- Returns changelog entries only from after T=1500
- Should include the B materialization but not the A materialization
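The changelog rules in 7.1 to 7.4 amount to "append a row only when the winner changes" plus a time filter. A minimal sketch, assuming the winner is simply the highest-confidence candidate (as 7.2 and 7.3 imply) and that since is an exclusive lower bound; View, ChangelogEntry, and Materializer are hypothetical types, not the real MV layout.

```rust
/// Hypothetical materialized view for one (subject, predicate) pair.
#[derive(Debug, Clone, PartialEq)]
pub struct View {
    pub winner_hash: u64,
    pub confidence: f64,
}

/// One changelog row: when the winner changed, from what, to what.
#[derive(Debug, Clone, PartialEq)]
pub struct ChangelogEntry {
    pub at: u64,
    pub previous: Option<u64>,
    pub new: u64,
}

#[derive(Debug, Default)]
pub struct Materializer {
    pub view: Option<View>,
    pub changelog: Vec<ChangelogEntry>,
}

impl Materializer {
    /// One step over (content_hash, confidence) candidates: the highest
    /// confidence wins, and a changelog row is appended only when the
    /// winner differs from the current one.
    pub fn step(&mut self, now: u64, candidates: &[(u64, f64)]) {
        let (hash, confidence) = match candidates.iter().max_by(|a, b| a.1.total_cmp(&b.1)) {
            Some(&best) => best,
            None => return, // nothing to materialize
        };
        let previous = self.view.as_ref().map(|v| v.winner_hash);
        if previous != Some(hash) {
            self.changelog.push(ChangelogEntry { at: now, previous, new: hash });
        }
        self.view = Some(View { winner_hash: hash, confidence });
    }

    /// `since` query: changelog rows recorded strictly after `t`.
    pub fn since(&self, t: u64) -> Vec<&ChangelogEntry> {
        self.changelog.iter().filter(|e| e.at > t).collect()
    }
}
```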
7.5 test_mv_max_stale_fast_path
Ingest A, materialize. Query immediately with max_stale: 60.
Assert: fast path is used (MV is fresh).
7.6 test_mv_max_stale_slow_path
Ingest A, materialize. Wait (or mock time) so MV is 120 seconds old. Query with max_stale: 60.
Assert: slow path is used (MV is stale, falls through to index lookup).
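The max_stale decision in 7.5 and 7.6 reduces to comparing the MV's age with the bound. Passing the current time in explicitly is one way to "mock time" for 7.6 without sleeping. In this sketch Path and choose_path are hypothetical, and treating an age exactly equal to max_stale as fresh is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Path {
    Fast, // serve the materialized view
    Slow, // fall through to the index lookup
}

/// Use the MV only if it was materialized at most `max_stale_secs` ago.
pub fn choose_path(materialized_at: Option<u64>, now: u64, max_stale_secs: u64) -> Path {
    match materialized_at {
        Some(t) if now.saturating_sub(t) <= max_stale_secs => Path::Fast,
        _ => Path::Slow, // no MV yet, or the MV is too old
    }
}
```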
Findings to Watch For
Known Risk: Two Conflict Score Implementations
compute_conflict_score in traits.rs (line 89) uses confidence variance. It measures how much confidence values disagree, not how much object values disagree. Three sources saying "yes" at 0.9 and two saying "no" at 0.9 produce a conflict score of 0.0 because all confidences are identical.
calculate_conflict_score in skeptic/analysis.rs (line 36) uses Shannon entropy over object value groups. It correctly detects that "yes" vs "no" is a real conflict regardless of confidence values.
Aphoria must use the Skeptic lens for conflict detection, not the standard lens conflict score. Battery 4.6 validates this distinction explicitly. If Aphoria were to use compute_conflict_score from standard lenses, it would miss conflicts where sources disagree on values but agree on confidence levels.
Known Risk: Decay + Time-Travel Interaction
When both source_class_decay and as_of are set, the age calculation must use as_of as the reference time, not now. Battery 3.8 validates this. If the implementation uses now for age but filters by as_of for inclusion, the decay amounts will be wrong for historical queries.
ConceptPath Readiness
Battery 5 validates the storage layer works with ConceptPath-shaped keys before any type changes. If these tests pass, the scan_prefix foundation is solid and ConceptPath implementation can proceed with confidence.