jordan 3320c24afa feat: WAL hardening (Phase 5B) - CRC32C, crash recovery, group commit, log rotation

Add CRC32C checksums to WAL record format (v2), implement crash recovery
with automatic truncation of corrupt records, add feature-gated group commit
buffer for batched fsync under concurrent load, and implement log rotation
via segment files with global offset addressing.

Key changes:
- Record format v2: [len:u32][crc32c:u32][blake3:32][payload:N]
- recover_file() scans and truncates corrupt tail records
- GroupCommitBuffer batches fsync via MPSC channel (tokio feature gate)
- SegmentManager with binary search resolution and cursor-based cleanup
- Journal::read() auto-refreshes segments on miss for writer/reader split
- Split recovery.rs and key_codec.rs into directory modules for 500-line max

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-02 12:36:35 -07:00


Pre-Aphoria Validation Battery

Purpose: Verify stemedb behaves as documented before building ConceptPath and Aphoria on top of it. Every test maps to a claim the product makes or a code path Aphoria depends on.

Test file: crates/stemedb-query/tests/battery_pre_aphoria.rs

Battery 1: The Semaglutide Scenario

Reproduces the exact example from what-is-episteme.md. Four sources, four tiers, one subject, conflicting claims. If this doesn't work, the product demo fails.

1.1 `test_semaglutide_four_sources_ingest_and_query`

Setup:

Agent A signs: subject=Semaglutide, predicate=has_side_effect, object=Text("gastroparesis_warning"), source_class=Regulatory, confidence=1.0, timestamp=T
Agent B signs: subject=Semaglutide, predicate=has_side_effect, object=Text("no_gastroparesis_signal"), source_class=Clinical, confidence=0.9, timestamp=T+1
Agent C signs: subject=Semaglutide, predicate=has_side_effect, object=Text("gastroparesis"), source_class=Anecdotal, confidence=0.2, timestamp=T+2
Agent D signs: subject=Semaglutide, predicate=has_side_effect, object=Text("no_gastroparesis_signal"), source_class=Clinical, confidence=0.9, timestamp=T+3

Ingest all four through WAL + IngestWorker.

Assert:

All four assertions are stored (query with no lens returns 4 results)
Authority lens (TrustAwareAuthority) winner is the Regulatory assertion (FDA)
Recency lens winner is Agent D (most recent)
Consensus lens groups by object value: "no_gastroparesis_signal" has 2 assertions; "gastroparesis_warning" and "gastroparesis" have 1 each (2 combined)

1.2 `test_semaglutide_skeptic_analysis`

Using the same four assertions from 1.1:

Assert:

Skeptic lens analyze() returns ConflictAnalysis with:
- candidates_count = 4
- claims.len() >= 2 (at least two distinct object values)
- status = Contested (conflict_score >= 0.4)
- conflict_score > 0.3 (there is real disagreement between object values)
- The claim with object "no_gastroparesis_signal" has assertion_count = 2
- Claims are sorted descending by weight_share

1.3 `test_semaglutide_source_class_decay`

Using the same four assertions, all with a timestamp 6 months (180 days) ago:

Query with source_class_decay: true:

Regulatory assertion (Tier 0): confidence unchanged (no half-life)
Clinical assertions (Tier 1, 730-day half-life): confidence decayed slightly (~0.9 * 2^(-180/730) ~ 0.76)
Anecdotal assertion (Tier 5, 30-day half-life): confidence decayed to near zero (~0.2 * 2^(-180/30) ~ 0.003)

Assert:

After decay, the Anecdotal assertion's effective confidence is < 0.01
After decay, the Regulatory assertion's confidence is exactly 1.0
After decay, Clinical assertions' confidence is between 0.7 and 0.85
Authority lens after decay still picks Regulatory as winner
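
The arithmetic behind these bounds can be checked with a minimal sketch of half-life decay. The `SourceClass` enum, `half_life_days`, and `decayed_confidence` below are hypothetical names for illustration, not stemedb's API; only the half-lives (none, 730 days, 30 days) come from the text above.

```rust
/// Hypothetical subset of source classes; tiers follow the text above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SourceClass {
    Regulatory, // Tier 0
    Clinical,   // Tier 1
    Anecdotal,  // Tier 5
}

impl SourceClass {
    /// Half-life in days; `None` means the class never decays.
    pub fn half_life_days(self) -> Option<f32> {
        match self {
            SourceClass::Regulatory => None,
            SourceClass::Clinical => Some(730.0),
            SourceClass::Anecdotal => Some(30.0),
        }
    }
}

/// Exponential half-life decay: confidence * 2^(-age / half_life).
/// Negative ages are clamped to zero so confidence never grows.
pub fn decayed_confidence(confidence: f32, class: SourceClass, age_days: f32) -> f32 {
    match class.half_life_days() {
        None => confidence,
        Some(half_life) => confidence * (-(age_days.max(0.0)) / half_life).exp2(),
    }
}
```

Under these assumptions, a 180-day-old Clinical assertion at 0.9 lands inside the 0.7 to 0.85 window, and a 180-day-old Anecdotal assertion at 0.2 falls to 0.2 / 64, well under 0.01.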

1.4 `test_semaglutide_time_travel`

Using the same four assertions with staggered timestamps (T, T+100, T+200, T+300):

Query with as_of: T+150:

Only assertions at T and T+100 are included
Assert exactly 2 candidates
Conflict landscape is different from the full query (only FDA + NEJM)

Battery 2: The JWT Conflict Scenario

Reproduces the JWT outage story. Validates escalation — the claim that Episteme is an "active safety system."

2.1 `test_jwt_conflict_escalation_fires`

Setup:

RFC 7519 (Tier 0, confidence 1.0): predicate=aud_validation, object=Boolean(true)
Internal wiki (Tier 3, confidence 0.8): predicate=aud_validation, object=Boolean(false)
Stack Overflow (Tier 5, confidence 0.6): predicate=aud_validation, object=Boolean(false)
Approved runbook (Tier 2, confidence 0.95): predicate=aud_validation, object=Boolean(true)

Configure escalation policy:

name: "security-config"
min_conflict_score: 0.5
level: High
predicate_pattern: None

Ingest all four. Run materializer with escalation policies.

Assert:

Escalation event is created (query ESC: prefix, find at least one)
Event has level = High
Event has conflict_score >= 0.5
Event has correct subject and predicate
Event resolved = false

2.2 `test_jwt_escalation_predicate_filter`

Same four assertions as 2.1. Two policies:

Policy A: predicate_pattern: Some("aud"), min_conflict_score: 0.3, level: Critical
Policy B: predicate_pattern: Some("revenue"), min_conflict_score: 0.3, level: Medium

Assert:

Policy A fires (predicate aud_validation contains "aud")
Policy B does NOT fire (predicate doesn't contain "revenue")
Only one escalation event exists, with level Critical
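
The matching rule these assertions imply can be sketched as follows. This is an illustration under assumptions: `EscalationPolicy`, `Level`, and `fires` are hypothetical shapes, and substring matching on the predicate is inferred from the "contains" wording above rather than confirmed against the implementation.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Level {
    Medium,
    High,
    Critical,
}

/// Hypothetical policy shape mirroring the fields listed in 2.1 and 2.2.
pub struct EscalationPolicy {
    pub name: &'static str,
    pub predicate_pattern: Option<&'static str>,
    pub min_conflict_score: f32,
    pub level: Level,
}

impl EscalationPolicy {
    /// A policy fires when the score meets its threshold and its pattern,
    /// if any, occurs as a substring of the predicate.
    /// A `None` pattern matches every predicate.
    pub fn fires(&self, predicate: &str, conflict_score: f32) -> bool {
        let pattern_ok = self
            .predicate_pattern
            .map_or(true, |pattern| predicate.contains(pattern));
        pattern_ok && conflict_score >= self.min_conflict_score
    }
}
```

With a predicate of `aud_validation` and any score at or above 0.3, a policy with pattern `"aud"` fires and one with pattern `"revenue"` does not, which is the split 2.2 expects.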

2.3 `test_jwt_layered_lens_tier_agreement`

Same four assertions. Query with Layered Consensus lens.

Assert:

Tier 0 result: winner object = Boolean(true) (RFC says validate)
Tier 2 result: winner object = Boolean(true) (Runbook agrees)
Tier 3 result: winner object = Boolean(false) (Wiki says skip)
Tier 5 result: winner object = Boolean(false) (SO says skip)
overall_conflict_score > 0.5 (cross-tier disagreement: Tiers 0 and 2 say true, Tiers 3 and 5 say false)
overall_winner comes from Tier 0 (highest authority)

Battery 3: Decay Math Precision

Aphoria computes conflict scores after decay. If decay is wrong, every conflict score is wrong.

3.1 `test_decay_tier0_never_decays`

Regulatory assertion, confidence 0.95, timestamp 10 years ago. Query with source_class_decay: true.

Assert: effective confidence is exactly 0.95 (unchanged).

3.2 `test_decay_tier1_exact_halflife`

Clinical assertion, confidence 1.0, timestamp exactly 730 days ago. Query with source_class_decay: true.

Assert: effective confidence is 0.5 (within tolerance of 0.02).

3.3 `test_decay_tier1_two_halflives`

Clinical assertion, confidence 1.0, timestamp exactly 1460 days ago. Query with source_class_decay: true.

Assert: effective confidence is 0.25 (within tolerance of 0.02).

3.4 `test_decay_tier5_exact_halflife`

Anecdotal assertion, confidence 1.0, timestamp exactly 30 days ago. Query with source_class_decay: true.

Assert: effective confidence is 0.5 (within tolerance of 0.02).

3.5 `test_decay_tier5_three_halflives`

Anecdotal assertion, confidence 1.0, timestamp exactly 90 days ago. Query with source_class_decay: true.

Assert: effective confidence is 0.125 (within tolerance of 0.02).

3.6 `test_decay_zero_confidence_stays_zero`

Assertion with confidence 0.0, any tier, any age.

Assert: effective confidence is 0.0 after decay (0 * anything = 0).

3.7 `test_decay_never_goes_negative`

Anecdotal assertion, confidence 0.01, timestamp 365 days ago (12+ half-lives).

Assert: effective confidence >= 0.0.

3.8 `test_decay_uses_as_of_for_age_calculation`

Two assertions, both at timestamp T=1000:

Assertion A: Clinical, confidence 0.9
Assertion B: Anecdotal, confidence 0.9

Query with as_of: T + 730*86400 (exactly 730 days after assertions) and source_class_decay: true.

Assert:

A's effective confidence ~ 0.45 (Clinical, one half-life)
B's effective confidence is near zero (Anecdotal, 24+ half-lives at 30-day rate)
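
A minimal sketch of the reference-time rule this test pins down: age is measured against `as_of` when it is set, and against the wall clock otherwise. The function names are hypothetical, and timestamps are assumed to be Unix seconds, as the `730*86400` offset above suggests.

```rust
const SECS_PER_DAY: f64 = 86_400.0;

/// Age in days relative to the query's reference time.
/// `as_of` takes precedence over `now` when present.
pub fn age_days(timestamp: u64, as_of: Option<u64>, now: u64) -> f32 {
    let reference = as_of.unwrap_or(now);
    // saturating_sub keeps assertions "from the future" at age zero.
    (reference.saturating_sub(timestamp) as f64 / SECS_PER_DAY) as f32
}

/// Half-life decay applied to an already computed age.
pub fn decay(confidence: f32, half_life_days: f32, age_days: f32) -> f32 {
    confidence * (-age_days / half_life_days).exp2()
}
```

If the implementation used `now` instead, a historical query issued years later would report a larger age and a smaller confidence than the 0.45 expected for Assertion A.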

Battery 4: Conflict Score Calibration

Two conflict score implementations exist. compute_conflict_score in traits.rs uses confidence variance. calculate_conflict_score in skeptic/analysis.rs uses Shannon entropy over object value groups. Both need validation.

4.1 `test_variance_conflict_score_unanimous`

5 assertions, all confidence 0.8. compute_conflict_score() returns 0.0 (no variance).

4.2 `test_variance_conflict_score_maximum`

2 assertions, confidence 0.0 and 1.0. compute_conflict_score() returns 1.0 (maximum variance).

4.3 `test_variance_conflict_score_moderate`

3 assertions, confidence 0.2, 0.5, 0.8. compute_conflict_score() returns a value between 0.2 and 0.8.

4.4 `test_variance_conflict_score_single`

1 assertion. Returns 0.0.

4.5 `test_variance_conflict_score_empty`

0 assertions. Returns 0.0.
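
The expectations in 4.1 through 4.5 are consistent with a population variance scaled by 4, so that the extreme pair {0.0, 1.0} maps to 1.0. That normalization is an assumption made for this sketch; the actual formula in traits.rs may differ, and `variance_conflict_score` is a hypothetical name.

```rust
/// Population variance of confidences, scaled by 4 and clamped to [0, 1].
/// Fewer than two inputs, or any non-finite result, yields 0.0.
pub fn variance_conflict_score(confidences: &[f32]) -> f32 {
    if confidences.len() < 2 {
        return 0.0;
    }
    let n = confidences.len() as f32;
    let mean = confidences.iter().sum::<f32>() / n;
    let variance = confidences
        .iter()
        .map(|c| (c - mean).powi(2))
        .sum::<f32>()
        / n;
    let score = variance * 4.0;
    // Defensive: NaN or infinite inputs collapse to "no conflict".
    if score.is_finite() {
        score.clamp(0.0, 1.0)
    } else {
        0.0
    }
}
```

Under this normalization the moderate case (0.2, 0.5, 0.8) has variance 0.06 and scores about 0.24, inside the 0.2 to 0.8 window.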

4.6 `test_skeptic_entropy_same_confidence_different_objects` [POTENTIAL BUG DETECTOR]

Three assertions, ALL with confidence 0.9:

Object A: Text("yes"), confidence 0.9
Object B: Text("no"), confidence 0.9
Object C: Text("no"), confidence 0.9

Skeptic lens analyze():

Groups into 2 claims: "yes" (weight 0.9) and "no" (weight 1.8)
Entropy is non-zero because the weight is split across two groups
conflict_score > 0.0
status is NOT Unanimous

Note: The variance-based compute_conflict_score would return 0.0 for these candidates (all same confidence). The Skeptic entropy-based score correctly detects the disagreement. This test validates the Skeptic lens is the correct tool for Aphoria's conflict detection, NOT the variance-based score.

4.7 `test_skeptic_entropy_unanimous_different_confidence`

Three assertions, all same object Text("yes"), but different confidences (0.3, 0.6, 0.9):

Skeptic lens analyze():

Groups into 1 claim (all same object)
conflict_score = 0.0 (unanimous — no disagreement on the value)
status = Unanimous

Note: Even though confidences differ, there's no actual conflict — all sources agree. The Skeptic lens correctly identifies this as unanimous.
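
Tests 4.6 and 4.7 can be illustrated with a small entropy-based score. This is a sketch, not the code in skeptic/analysis.rs: the function name is hypothetical, group weight is taken as the sum of confidences (matching the 0.9 and 1.8 weights quoted in 4.6), and dividing by log2 of the group count is an assumed normalization.

```rust
use std::collections::BTreeMap;

/// Normalized Shannon entropy over per-object weight shares.
/// Returns 0.0 for fewer than two groups and 1.0 for an even split.
pub fn entropy_conflict_score(assertions: &[(&str, f32)]) -> f32 {
    // Sum confidence per distinct object value.
    let mut weights: BTreeMap<&str, f32> = BTreeMap::new();
    for (object, confidence) in assertions {
        *weights.entry(*object).or_insert(0.0) += *confidence;
    }
    let total: f32 = weights.values().sum();
    if weights.len() < 2 || total <= 0.0 {
        return 0.0;
    }
    let entropy: f32 = weights
        .values()
        .map(|w| w / total)
        .filter(|p| *p > 0.0)
        .map(|p| -p * p.log2())
        .sum();
    // Divide by the maximum entropy for this many groups.
    entropy / (weights.len() as f32).log2()
}
```

For the 4.6 input the shares are 1/3 and 2/3, giving a score of roughly 0.92, while the 4.7 input collapses to a single group and scores 0.0.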

4.8 `test_variance_score_nan_defensive`

2 assertions with confidence f32::NAN. compute_conflict_score() returns 0.0 (defensive, not NaN propagation).

Battery 5: scan_prefix with ConceptPath-shaped Keys

Storage foundation for hierarchical queries.

5.1 `test_prefix_scan_concept_path_keys`

Store via IndexStore:

S:code://rust/citadeldb/auth/jwt/aud_validation  → [hash_a]
S:code://rust/citadeldb/auth/jwt/expiry          → [hash_b]
S:code://rust/citadeldb/net/tls/verify           → [hash_c]
S:code://rust/citadeldb/auth/oauth/scopes        → [hash_d]

Assert:

scan_prefix("S:code://rust/citadeldb/auth/jwt/") → 2 keys (aud_validation, expiry)
scan_prefix("S:code://rust/citadeldb/auth/") → 3 keys (jwt/aud, jwt/expiry, oauth/scopes)
scan_prefix("S:code://rust/citadeldb/") → 4 keys (all)
scan_prefix("S:code://") → 4 keys (all)
scan_prefix("S:rfc://") → 0 keys (different scheme)

5.2 `test_prefix_scan_no_false_positives`

Store:

S:code://rust/citadeldb/auth          → [hash_a]
S:code://rust/citadeldb/authentication → [hash_b]

Assert:

scan_prefix("S:code://rust/citadeldb/auth/") → 0 keys (trailing slash prevents matching "auth" without children)
scan_prefix("S:code://rust/citadeldb/auth") → 2 keys (both match the prefix "auth")

This validates that the trailing / in hierarchical queries is necessary to prevent auth from matching authentication.
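
The trailing-slash behavior can be reproduced with any ordered key-value store. The sketch below uses a standard-library `BTreeMap` as a stand-in for IndexStore; the free function `scan_prefix` here is hypothetical and only mimics the semantics described above.

```rust
use std::collections::BTreeMap;

/// Collect every key that starts with `prefix`, in sorted order.
/// Keys sharing a prefix are contiguous in a sorted map, so the scan
/// seeks to the prefix and stops at the first non-matching key.
pub fn scan_prefix<'a>(index: &'a BTreeMap<String, Vec<u8>>, prefix: &str) -> Vec<&'a str> {
    index
        .range(prefix.to_string()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, _)| key.as_str())
        .collect()
}
```

With the two keys from 5.2 stored, scanning `"S:code://rust/citadeldb/auth"` returns both, while adding the trailing `/` returns neither, since neither key continues with a slash.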

5.3 `test_prefix_scan_sp_keys_with_concept_paths`

Store via IndexStore (using SP: compound keys):

SP:code://rust/citadeldb/auth/jwt/aud_validation:config_value  → [hash_a]
SP:code://rust/citadeldb/auth/jwt/expiry:config_value          → [hash_b]

Assert:

scan_prefix("SP:code://rust/citadeldb/auth/jwt/") → 2 keys
The parsed SP key for hash_a correctly splits into subject=code://rust/citadeldb/auth/jwt/aud_validation and predicate=config_value (validates the rfind fix)
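
Because ConceptPath subjects contain `://`, splitting an SP key at the first colon would cut the subject in half. A minimal sketch of a last-colon split follows; `parse_sp_key` is a hypothetical name, and the sketch assumes predicates never contain a colon.

```rust
/// Split an `SP:` compound key into (subject, predicate) at the LAST colon,
/// so colons inside the subject (such as the one in `code://`) are preserved.
pub fn parse_sp_key(key: &str) -> Option<(&str, &str)> {
    let body = key.strip_prefix("SP:")?;
    let split = body.rfind(':')?;
    let (subject, predicate) = (&body[..split], &body[split + 1..]);
    if subject.is_empty() || predicate.is_empty() {
        return None;
    }
    Some((subject, predicate))
}
```

A first-colon split on the same key would yield subject `code` and a predicate starting with `//rust/...`, which is the failure mode this test guards against.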

Battery 6: Signature Tamper Detection

Aphoria ingests signed assertions. If signature verification has gaps, tampered claims enter the graph.

6.1 `test_valid_signature_accepted`

Agent A signs an assertion. Ingest through IngestWorker.

Assert: assertion is stored, index entries exist.

6.2 `test_tampered_confidence_rejected`

Agent A signs assertion with confidence=0.8. Modify the serialized assertion bytes to change confidence to 1.0. Attempt to ingest.

Assert: IngestError::InvalidSignature. Assertion is NOT stored.

6.3 `test_tampered_subject_rejected`

Agent A signs assertion with subject="X". Clone the assertion, change subject to "Y", keep original signature.

Assert: ingestion fails with invalid signature.

6.4 `test_wrong_agent_id_rejected`

Agent A signs assertion. Replace agent_id in the SignatureEntry with Agent B's public key (but keep Agent A's signature bytes).

Assert: ingestion fails — the signature was made by A's private key but claims to be from B's public key.

6.5 `test_multi_sig_all_valid_accepted`

Agent A and Agent B both sign the same assertion (two valid SignatureEntries).

Assert: ingestion succeeds.

6.6 `test_multi_sig_one_invalid_rejected`

Agent A signs validly, Agent B's signature is invalid (tampered).

Assert: ingestion fails. ALL signatures must be valid.

Battery 7: Materialized View Consistency

Aphoria queries MVs for fast conflict checks. Stale or inconsistent MVs produce wrong verdicts.

7.1 `test_mv_initial_materialization`

Ingest assertion A (confidence 0.9) for subject=S, predicate=P. Run materializer step().

Assert:

MV exists at MV:{S}:{P}
MV winner_hash matches A's content hash
MV confidence = 0.9
Changelog entry exists (first materialization)

7.2 `test_mv_winner_changes_on_update`

Ingest A (confidence 0.9), materialize. Then ingest B (same S/P, confidence 0.95), materialize again.

Assert:

MV winner changes to B
Changelog has 2 entries: initial (winner=A), update (previous=A, new=B)

7.3 `test_mv_no_changelog_when_winner_unchanged`

Ingest A (confidence 0.9), materialize. Ingest B (same S/P, confidence 0.5), materialize again.

Assert:

MV winner stays A (B has lower confidence)
No new changelog entry after second materialization
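
The changelog behavior in 7.2 and 7.3 amounts to "append only when the winner hash changes". The sketch below models that rule with hypothetical `View` and `ChangelogEntry` types and assumes the winner is simply the highest-confidence candidate, which is what the two tests imply but may not match the materializer's real lens.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ChangelogEntry {
    pub previous: Option<String>,
    pub new: String,
}

#[derive(Default)]
pub struct View {
    pub winner: Option<(String, f32)>,
    pub changelog: Vec<ChangelogEntry>,
}

impl View {
    /// Recompute the winner from (content_hash, confidence) candidates and
    /// record a changelog entry only if the winning hash differs.
    pub fn materialize(&mut self, candidates: &[(&str, f32)]) {
        let best = candidates
            .iter()
            .copied()
            .max_by(|a, b| a.1.total_cmp(&b.1));
        let Some((hash, confidence)) = best else { return };
        let previous = self.winner.as_ref().map(|(h, _)| h.clone());
        if previous.as_deref() != Some(hash) {
            self.changelog.push(ChangelogEntry {
                previous,
                new: hash.to_string(),
            });
        }
        self.winner = Some((hash.to_string(), confidence));
    }
}
```

Materializing A alone records the initial entry; adding a lower-confidence B leaves the changelog untouched; adding a higher-confidence assertion records a second entry whose `previous` is A.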

7.4 `test_mv_since_query_returns_changelog`

Ingest A at T=1000, materialize at T=1001. Ingest B at T=2000, materialize at T=2001.

Query with since: 1500:

Returns changelog entries only from after T=1500
Should include the B materialization but not the A materialization

7.5 `test_mv_max_stale_fast_path`

Ingest A, materialize. Query immediately with max_stale: 60.

Assert: fast path is used (MV is fresh).

7.6 `test_mv_max_stale_slow_path`

Ingest A, materialize. Wait (or mock time) so MV is 120 seconds old. Query with max_stale: 60.

Assert: slow path is used (MV is stale, falls through to index lookup).
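
The fast/slow decision in 7.5 and 7.6 reduces to comparing the MV's age with `max_stale`. A minimal sketch, assuming both values are in seconds and that an MV exactly `max_stale` seconds old still counts as fresh (the boundary is not specified above); `QueryPath` and `choose_path` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum QueryPath {
    /// Serve directly from the materialized view.
    Fast,
    /// Fall through to the index lookup.
    Slow,
}

/// Pick the query path from the MV's materialization time and the
/// caller's staleness budget, both expressed in seconds.
pub fn choose_path(materialized_at: u64, now: u64, max_stale_secs: u64) -> QueryPath {
    if now.saturating_sub(materialized_at) <= max_stale_secs {
        QueryPath::Fast
    } else {
        QueryPath::Slow
    }
}
```

Passing `now` as a parameter rather than reading the clock inside the function is what makes the 120-second case in 7.6 testable without sleeping.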

Findings to Watch For

Known Risk: Two Conflict Score Implementations

compute_conflict_score in traits.rs (line 89) uses confidence variance. It measures how much confidence values disagree, not how much object values disagree. Three sources saying "yes" at 0.9 and two sources saying "no" at 0.9 produce a conflict score of 0.0 because all confidences are identical.

calculate_conflict_score in skeptic/analysis.rs (line 36) uses Shannon entropy over object value groups. It correctly detects that "yes" vs "no" is a real conflict regardless of confidence values.

Aphoria must use the Skeptic lens for conflict detection, not the standard lens conflict score. Battery 4.6 validates this distinction explicitly. If Aphoria were to use compute_conflict_score from standard lenses, it would miss conflicts where sources disagree on values but agree on confidence levels.

Known Risk: Decay + Time-Travel Interaction

When both source_class_decay and as_of are set, the age calculation must use as_of as the reference time, not now. Battery 3.8 validates this. If the implementation uses now for age but filters by as_of for inclusion, the decay amounts will be wrong for historical queries.

ConceptPath Readiness

Battery 5 validates the storage layer works with ConceptPath-shaped keys before any type changes. If these tests pass, the scan_prefix foundation is solid and ConceptPath implementation can proceed with confidence.
