# Pre-Aphoria Validation Battery

**Purpose:** Verify stemedb behaves as documented before building ConceptPath and Aphoria on top of it. Every test maps to a claim the product makes or a code path Aphoria depends on.

**Test file:** `crates/stemedb-query/tests/battery_pre_aphoria.rs`

---

## Battery 1: The Semaglutide Scenario

Reproduces the exact example from `what-is-episteme.md`. Four sources across three tiers (Regulatory, Clinical, Anecdotal), one subject, conflicting claims. If this doesn't work, the product demo fails.

### 1.1 `test_semaglutide_four_sources_ingest_and_query`

Setup:
- Agent A signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("gastroparesis_warning")`, source_class=Regulatory, confidence=1.0, timestamp=T
- Agent B signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("no_gastroparesis_signal")`, source_class=Clinical, confidence=0.9, timestamp=T+1
- Agent C signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("gastroparesis")`, source_class=Anecdotal, confidence=0.2, timestamp=T+2
- Agent D signs: subject=`Semaglutide`, predicate=`has_side_effect`, object=`Text("no_gastroparesis_signal")`, source_class=Clinical, confidence=0.9, timestamp=T+3

Ingest all four through WAL + IngestWorker.
Assert:
- All four assertions are stored (query with no lens returns 4 results)
- Authority lens (TrustAwareAuthority) winner is the Regulatory assertion (FDA)
- Recency lens winner is Agent D (most recent)
- Consensus lens groups by object value: `"no_gastroparesis_signal"` has 2 assertions; the two positive signals, `"gastroparesis_warning"` and `"gastroparesis"`, are distinct object values and account for the other 2

### 1.2 `test_semaglutide_skeptic_analysis`

Using the same four assertions from 1.1.

Assert:
- Skeptic lens `analyze()` returns `ConflictAnalysis` with:
  - `candidates_count` = 4
  - `claims.len()` >= 2 (at least two distinct object values)
  - `status` = `Contested` (conflict_score >= 0.4)
  - `conflict_score` > 0.3 (there is real disagreement between object values)
- The claim with object `"no_gastroparesis_signal"` has `assertion_count` = 2
- Claims are sorted descending by `weight_share`

### 1.3 `test_semaglutide_source_class_decay`

Using the same four assertions, all with a timestamp 6 months (~180 days) ago.

Query with `source_class_decay: true`:
- Regulatory assertion (Tier 0): confidence unchanged (no half-life)
- Clinical assertions (Tier 1, 730-day half-life): confidence decays modestly (`0.9 * 2^(-180/730) ≈ 0.76`)
- Anecdotal assertion (Tier 5, 30-day half-life): confidence decays to near zero (`0.2 * 2^(-180/30) ≈ 0.003`)

Assert:
- After decay, the Anecdotal assertion's effective confidence is < 0.01
- After decay, the Regulatory assertion's confidence is exactly 1.0
- After decay, the Clinical assertions' confidence is between 0.7 and 0.85
- Authority lens after decay still picks Regulatory as winner

### 1.4 `test_semaglutide_time_travel`

Using the same four assertions with staggered timestamps (T, T+100, T+200, T+300).

Query with `as_of: T+150`:
- Only the assertions at T and T+100 are included
- Assert exactly 2 candidates
- The conflict landscape differs from the full query (only FDA and NEJM, i.e. Agents A and B)

---

## Battery 2: The JWT Conflict Scenario

Reproduces the JWT outage story. Validates escalation, which underpins the claim that Episteme is an "active safety system."
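The escalation policies configured in 2.1 and 2.2 have four fields: `name`, `min_conflict_score`, `level`, and `predicate_pattern`. A minimal Rust sketch of the firing rule those tests imply: a policy fires when the conflict score reaches its threshold and, if a pattern is set, the predicate contains it as a substring (2.2 says "contains"). `Level`, `EscalationPolicy`, `fires`, and `fired_levels` are hypothetical stand-ins, not stemedb's actual types.

```rust
/// Hypothetical escalation levels; only the ones named in Battery 2 are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Medium,
    High,
    Critical,
}

/// Hypothetical mirror of the policy fields used in 2.1 and 2.2.
#[derive(Debug, Clone)]
pub struct EscalationPolicy {
    pub name: String,
    pub min_conflict_score: f32,
    pub level: Level,
    /// `None` matches every predicate; `Some(p)` matches predicates containing `p`.
    pub predicate_pattern: Option<String>,
}

impl EscalationPolicy {
    /// Assumed rule: threshold reached AND (no pattern OR substring match).
    pub fn fires(&self, predicate: &str, conflict_score: f32) -> bool {
        let predicate_ok = match &self.predicate_pattern {
            None => true,
            Some(pattern) => predicate.contains(pattern.as_str()),
        };
        predicate_ok && conflict_score >= self.min_conflict_score
    }
}

/// Levels of every policy that fires for one (predicate, conflict score) pair.
pub fn fired_levels(policies: &[EscalationPolicy], predicate: &str, conflict_score: f32) -> Vec<Level> {
    policies
        .iter()
        .filter(|p| p.fires(predicate, conflict_score))
        .map(|p| p.level)
        .collect()
}
```

With the 2.2 policies, `fired_levels` returns only `Critical` for `aud_validation` at any score of at least 0.3. If the real `predicate_pattern` is a glob or regex rather than a substring, only the match arm in `fires` would change.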
### 2.1 `test_jwt_conflict_escalation_fires`

Setup:
- RFC 7519 (Tier 0, confidence 1.0): predicate=`aud_validation`, object=`Boolean(true)`
- Internal wiki (Tier 3, confidence 0.8): predicate=`aud_validation`, object=`Boolean(false)`
- Stack Overflow (Tier 5, confidence 0.6): predicate=`aud_validation`, object=`Boolean(false)`
- Approved runbook (Tier 2, confidence 0.95): predicate=`aud_validation`, object=`Boolean(true)`

Configure escalation policy:
```
name: "security-config"
min_conflict_score: 0.5
level: High
predicate_pattern: None
```

Ingest all four. Run the materializer with escalation policies.

Assert:
- An escalation event is created (query the `ESC:` prefix and find at least one)
- Event has `level` = `High`
- Event has `conflict_score` >= 0.5
- Event has the correct subject and predicate
- Event has `resolved` = false

### 2.2 `test_jwt_escalation_predicate_filter`

Same four assertions as 2.1. Two policies:
- Policy A: `predicate_pattern: Some("aud")`, `min_conflict_score: 0.3`, `level: Critical`
- Policy B: `predicate_pattern: Some("revenue")`, `min_conflict_score: 0.3`, `level: Medium`

Assert:
- Policy A fires (predicate `aud_validation` contains "aud")
- Policy B does NOT fire (predicate doesn't contain "revenue")
- Only one escalation event exists, with level `Critical`

### 2.3 `test_jwt_layered_lens_tier_agreement`

Same four assertions. Query with the Layered Consensus lens.

Assert:
- Tier 0 result: winner object = `Boolean(true)` (RFC says validate)
- Tier 2 result: winner object = `Boolean(true)` (runbook agrees)
- Tier 3 result: winner object = `Boolean(false)` (wiki says skip)
- Tier 5 result: winner object = `Boolean(false)` (SO says skip)
- `overall_conflict_score` > 0.5 (cross-tier disagreement: Tiers 0 and 2 vs. Tiers 3 and 5)
- `overall_winner` comes from Tier 0 (highest authority)

---

## Battery 3: Decay Math Precision

Aphoria computes conflict scores after decay. If decay is wrong, every conflict score is wrong.
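A minimal Rust sketch of the decay rule this battery checks, assuming `effective = confidence * 2^(-age / half_life)` with the half-lives stated in 1.3 (Tier 0 none, Tier 1 730 days, Tier 5 30 days), timestamps in Unix seconds (as the `730*86400` in 3.8 suggests), and age measured from the query's `as_of` when it is set. `SourceClass`, `half_life_days`, and `effective_confidence` are hypothetical names, not stemedb's API.

```rust
/// Hypothetical subset of source classes used in Batteries 1 and 3.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceClass {
    Regulatory, // Tier 0
    Clinical,   // Tier 1
    Anecdotal,  // Tier 5
}

const SECONDS_PER_DAY: f64 = 86_400.0;

/// Half-life in days; `None` means the class never decays.
pub fn half_life_days(class: SourceClass) -> Option<f64> {
    match class {
        SourceClass::Regulatory => None,
        SourceClass::Clinical => Some(730.0),
        SourceClass::Anecdotal => Some(30.0),
    }
}

/// `reference_time` should be the query's `as_of` if set, otherwise "now".
pub fn effective_confidence(
    confidence: f64,
    class: SourceClass,
    asserted_at: i64,
    reference_time: i64,
) -> f64 {
    let half_life = match half_life_days(class) {
        Some(h) => h,
        None => return confidence, // Tier 0: unchanged
    };
    // Clamp negative ages (assertion newer than the reference time) to zero.
    let age_days = ((reference_time - asserted_at).max(0) as f64) / SECONDS_PER_DAY;
    let decayed = confidence * (-age_days / half_life).exp2();
    // 2^x is always positive, so this only guards against a negative input confidence.
    decayed.max(0.0)
}
```

Passing the query's `as_of` rather than the wall clock as `reference_time` is the behavior 3.8 checks: with `as_of` = T + 730 days, a Clinical 0.9 assertion lands at 0.45.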
### 3.1 `test_decay_tier0_never_decays`

Regulatory assertion, confidence 0.95, timestamp 10 years ago. Query with `source_class_decay: true`.

Assert: effective confidence is exactly 0.95 (unchanged).

### 3.2 `test_decay_tier1_exact_halflife`

Clinical assertion, confidence 1.0, timestamp exactly 730 days ago. Query with `source_class_decay: true`.

Assert: effective confidence is 0.5 (within a tolerance of 0.02).

### 3.3 `test_decay_tier1_two_halflives`

Clinical assertion, confidence 1.0, timestamp exactly 1460 days ago. Query with `source_class_decay: true`.

Assert: effective confidence is 0.25 (within a tolerance of 0.02).

### 3.4 `test_decay_tier5_exact_halflife`

Anecdotal assertion, confidence 1.0, timestamp exactly 30 days ago. Query with `source_class_decay: true`.

Assert: effective confidence is 0.5 (within a tolerance of 0.02).

### 3.5 `test_decay_tier5_three_halflives`

Anecdotal assertion, confidence 1.0, timestamp exactly 90 days ago. Query with `source_class_decay: true`.

Assert: effective confidence is 0.125 (within a tolerance of 0.02).

### 3.6 `test_decay_zero_confidence_stays_zero`

Assertion with confidence 0.0, any tier, any age.

Assert: effective confidence is 0.0 after decay (0 * anything = 0).

### 3.7 `test_decay_never_goes_negative`

Anecdotal assertion, confidence 0.01, timestamp 365 days ago (12+ half-lives).

Assert: effective confidence >= 0.0.

### 3.8 `test_decay_uses_as_of_for_age_calculation`

Two assertions, both at timestamp T=1000:
- Assertion A: Clinical, confidence 0.9
- Assertion B: Anecdotal, confidence 0.9

Query with `as_of: T + 730*86400` (exactly 730 days after the assertions) and `source_class_decay: true`.

Assert:
- A's effective confidence ≈ 0.45 (Clinical, one half-life)
- B's effective confidence is near zero (Anecdotal, 24+ half-lives at the 30-day rate)

---

## Battery 4: Conflict Score Calibration

Two conflict score implementations exist. `compute_conflict_score` in `traits.rs` uses confidence variance.
`calculate_conflict_score` in `skeptic/analysis.rs` uses Shannon entropy over object value groups. Both need validation.

### 4.1 `test_variance_conflict_score_unanimous`

5 assertions, all with confidence 0.8. `compute_conflict_score()` returns 0.0 (no variance).

### 4.2 `test_variance_conflict_score_maximum`

2 assertions, confidence 0.0 and 1.0. `compute_conflict_score()` returns 1.0 (maximum variance).

### 4.3 `test_variance_conflict_score_moderate`

3 assertions, confidence 0.2, 0.5, 0.8. `compute_conflict_score()` returns a value between 0.2 and 0.8.

### 4.4 `test_variance_conflict_score_single`

1 assertion. Returns 0.0.

### 4.5 `test_variance_conflict_score_empty`

0 assertions. Returns 0.0.

### 4.6 `test_skeptic_entropy_same_confidence_different_objects` [POTENTIAL BUG DETECTOR]

Three assertions, ALL with confidence 0.9:
- Assertion A: object `Text("yes")`, confidence 0.9
- Assertion B: object `Text("no")`, confidence 0.9
- Assertion C: object `Text("no")`, confidence 0.9

Skeptic lens `analyze()`:
- Groups into 2 claims: "yes" (weight 0.9) and "no" (weight 1.8)
- Entropy is non-zero because the total weight is split across two groups (1/3 vs. 2/3)
- `conflict_score` > 0.0
- `status` is NOT `Unanimous`

**Note:** The variance-based `compute_conflict_score` would return 0.0 for these candidates (all have the same confidence). The Skeptic entropy-based score correctly detects the disagreement. This test confirms that the Skeptic lens, NOT the variance-based score, is the correct tool for Aphoria's conflict detection.

### 4.7 `test_skeptic_entropy_unanimous_different_confidence`

Three assertions, all with the same object `Text("yes")` but different confidences (0.3, 0.6, 0.9).

Skeptic lens `analyze()`:
- Groups into 1 claim (all same object)
- `conflict_score` = 0.0 (unanimous: no disagreement on the value)
- `status` = `Unanimous`

**Note:** Even though confidences differ, there's no actual conflict: all sources agree. The Skeptic lens correctly identifies this as unanimous.
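To make the 4.1 to 4.8 expectations concrete, here is a Rust sketch of both scoring styles. The normalizations are assumptions, not taken from `traits.rs` or `skeptic/analysis.rs`: the variance score divides population variance by 0.25 (the maximum for values in [0, 1]) so that {0.0, 1.0} maps to 1.0, and the entropy score groups candidates by object value, weights each group by summed confidence (matching the weights quoted in 4.6), and divides Shannon entropy by ln(group count). `variance_conflict_score` and `entropy_conflict_score` are hypothetical names.

```rust
use std::collections::HashMap;

/// Variance-style score: looks only at confidences, never at object values.
pub fn variance_conflict_score(confidences: &[f64]) -> f64 {
    if confidences.len() < 2 || confidences.iter().any(|c| c.is_nan()) {
        return 0.0; // empty, single, or NaN input: defensive zero (4.4, 4.5, 4.8)
    }
    let n = confidences.len() as f64;
    let mean = confidences.iter().sum::<f64>() / n;
    let variance = confidences.iter().map(|c| (c - mean).powi(2)).sum::<f64>() / n;
    // Population variance of values in [0, 1] peaks at 0.25 ({0, 1}); scale to [0, 1].
    (variance / 0.25).min(1.0)
}

/// Entropy-style score: groups by object value, weights groups by summed confidence.
pub fn entropy_conflict_score(candidates: &[(&str, f64)]) -> f64 {
    let mut weights: HashMap<&str, f64> = HashMap::new();
    for &(object, confidence) in candidates {
        *weights.entry(object).or_insert(0.0) += confidence;
    }
    let total: f64 = weights.values().sum();
    if weights.len() < 2 || total <= 0.0 {
        return 0.0; // a single group (or no weight at all) is unanimous
    }
    let entropy: f64 = weights
        .values()
        .filter(|w| **w > 0.0)
        .map(|w| {
            let p = *w / total;
            -p * p.ln()
        })
        .sum();
    // Divide by the maximum possible entropy for this many groups.
    entropy / (weights.len() as f64).ln()
}
```

On the 4.6 inputs the variance score is 0.0 while the entropy score is about 0.918 under this normalization, which is exactly the gap described in the Known Risk on the two implementations.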
### 4.8 `test_variance_score_nan_defensive`

2 assertions with confidence `f32::NAN`. `compute_conflict_score()` returns 0.0 (defensive handling; NaN must not propagate).

---

## Battery 5: `scan_prefix` with ConceptPath-shaped Keys

Validates the storage foundation for hierarchical queries.

### 5.1 `test_prefix_scan_concept_path_keys`

Store via IndexStore:
```
S:code://rust/citadeldb/auth/jwt/aud_validation → [hash_a]
S:code://rust/citadeldb/auth/jwt/expiry → [hash_b]
S:code://rust/citadeldb/net/tls/verify → [hash_c]
S:code://rust/citadeldb/auth/oauth/scopes → [hash_d]
```

Assert:
- `scan_prefix("S:code://rust/citadeldb/auth/jwt/")` → 2 keys (aud_validation, expiry)
- `scan_prefix("S:code://rust/citadeldb/auth/")` → 3 keys (jwt/aud_validation, jwt/expiry, oauth/scopes)
- `scan_prefix("S:code://rust/citadeldb/")` → 4 keys (all)
- `scan_prefix("S:code://")` → 4 keys (all)
- `scan_prefix("S:rfc://")` → 0 keys (different scheme)

### 5.2 `test_prefix_scan_no_false_positives`

Store:
```
S:code://rust/citadeldb/auth → [hash_a]
S:code://rust/citadeldb/authentication → [hash_b]
```

Assert:
- `scan_prefix("S:code://rust/citadeldb/auth/")` → 0 keys (neither key starts with `auth/`: `auth` is stored as a leaf with no children, and `authentication` is a different segment)
- `scan_prefix("S:code://rust/citadeldb/auth")` → 2 keys (both start with the prefix `auth`)

This validates that the trailing `/` in hierarchical queries is necessary to prevent `auth` from matching `authentication`.

### 5.3 `test_prefix_scan_sp_keys_with_concept_paths`

Store via IndexStore (using `SP:` compound keys):
```
SP:code://rust/citadeldb/auth/jwt/aud_validation:config_value → [hash_a]
SP:code://rust/citadeldb/auth/jwt/expiry:config_value → [hash_b]
```

Assert:
- `scan_prefix("SP:code://rust/citadeldb/auth/jwt/")` → 2 keys
- The parsed SP key for hash_a splits into subject=`code://rust/citadeldb/auth/jwt/aud_validation` and predicate=`config_value`. Splitting at the last `:` (the `rfind` fix) keeps the colon from the `code://` scheme inside the subject.

---

## Battery 6: Signature Tamper Detection

Aphoria ingests signed assertions.
If signature verification has gaps, tampered claims enter the graph.

### 6.1 `test_valid_signature_accepted`

Agent A signs an assertion. Ingest through IngestWorker.

Assert: the assertion is stored and index entries exist.

### 6.2 `test_tampered_confidence_rejected`

Agent A signs an assertion with confidence=0.8. Modify the serialized assertion bytes to change confidence to 1.0. Attempt to ingest.

Assert: `IngestError::InvalidSignature`. The assertion is NOT stored.

### 6.3 `test_tampered_subject_rejected`

Agent A signs an assertion with subject="X". Clone the assertion, change the subject to "Y", and keep the original signature.

Assert: ingestion fails with an invalid signature.

### 6.4 `test_wrong_agent_id_rejected`

Agent A signs an assertion. Replace `agent_id` in the `SignatureEntry` with Agent B's public key (but keep Agent A's signature bytes).

Assert: ingestion fails. The signature was made with A's private key but claims to come from B's public key, so verification against B's key must fail.

### 6.5 `test_multi_sig_all_valid_accepted`

Agent A and Agent B both sign the same assertion (two valid SignatureEntries).

Assert: ingestion succeeds.

### 6.6 `test_multi_sig_one_invalid_rejected`

Agent A signs validly; Agent B's signature is invalid (tampered).

Assert: ingestion fails. ALL signatures must be valid.

---

## Battery 7: Materialized View Consistency

Aphoria queries MVs for fast conflict checks. Stale or inconsistent MVs produce wrong verdicts.

### 7.1 `test_mv_initial_materialization`

Ingest assertion A (confidence 0.9) for subject=S, predicate=P. Run materializer `step()`.

Assert:
- MV exists at `MV:{S}:{P}`
- MV winner_hash matches A's content hash
- MV confidence = 0.9
- A changelog entry exists (first materialization)

### 7.2 `test_mv_winner_changes_on_update`

Ingest A (confidence 0.9), materialize. Then ingest B (same S/P, confidence 0.95), materialize again.
Assert:
- MV winner changes to B
- Changelog has 2 entries: initial (winner=A), update (previous=A, new=B)

### 7.3 `test_mv_no_changelog_when_winner_unchanged`

Ingest A (confidence 0.9), materialize. Ingest B (same S/P, confidence 0.5), materialize again.

Assert:
- MV winner stays A (B has lower confidence)
- No new changelog entry after the second materialization

### 7.4 `test_mv_since_query_returns_changelog`

Ingest A at T=1000, materialize at T=1001. Ingest B at T=2000, materialize at T=2001.

Query with `since: 1500`:
- Returns only changelog entries from after T=1500
- Should include the B materialization but not the A materialization

### 7.5 `test_mv_max_stale_fast_path`

Ingest A, materialize. Query immediately with `max_stale: 60`.

Assert: the fast path is used (MV is fresh).

### 7.6 `test_mv_max_stale_slow_path`

Ingest A, materialize. Wait (or mock time) until the MV is 120 seconds old. Query with `max_stale: 60`.

Assert: the slow path is used (MV is stale, so the query falls through to index lookup).

---

## Findings to Watch For

### Known Risk: Two Conflict Score Implementations

`compute_conflict_score` in `traits.rs` (line 89) uses **confidence variance**. It measures how much confidence values disagree, not how much object values disagree. Three sources saying "yes" at 0.9 and two sources saying "no" at 0.9 produce a conflict score of **0.0** because all confidences are identical.

`calculate_conflict_score` in `skeptic/analysis.rs` (line 36) uses **Shannon entropy over object value groups**. It correctly detects that "yes" vs. "no" is a real conflict regardless of confidence values.

**Aphoria must use the Skeptic lens for conflict detection, not the standard lens conflict score.** Test 4.6 validates this distinction explicitly. If Aphoria were to use `compute_conflict_score` from the standard lenses, it would miss conflicts where sources disagree on values but agree on confidence levels.
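A worked version of the scenario above, assuming (as a sketch, not the code in `skeptic/analysis.rs`) that group weights are summed confidences and that entropy is normalized by the log of the group count:

```math
\begin{aligned}
\bar{c} &= 0.9, \qquad \operatorname{Var}(c) = \tfrac{1}{5}\sum_{i=1}^{5}(c_i - \bar{c})^2 = 0 \\
w_{\text{yes}} &= 3 \times 0.9 = 2.7, \qquad w_{\text{no}} = 2 \times 0.9 = 1.8, \qquad p_{\text{yes}} = \tfrac{2.7}{4.5} = 0.6, \qquad p_{\text{no}} = 0.4 \\
H &= -\left(0.6 \log_2 0.6 + 0.4 \log_2 0.4\right) \approx 0.971 \text{ bits}, \qquad \frac{H}{\log_2 2} \approx 0.971
\end{aligned}
```

Five identical confidences give zero variance, while the 60/40 split over object values yields an entropy close to its maximum of 1.0.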
### Known Risk: Decay + Time-Travel Interaction

When both `source_class_decay` and `as_of` are set, the age calculation must use `as_of` as the reference time, not `now`. Test 3.8 validates this. If the implementation uses `now` for age but filters by `as_of` for inclusion, the decay amounts will be wrong for historical queries.

### ConceptPath Readiness

Battery 5 validates that the storage layer works with ConceptPath-shaped keys before any type changes. If these tests pass, the `scan_prefix` foundation is solid and ConceptPath implementation can proceed with confidence.
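A minimal in-memory Rust sketch of the prefix semantics Battery 5 relies on, using `std::collections::BTreeMap` as a stand-in for IndexStore. This is an assumption: the real store's API and ordering guarantees are not shown here. `scan_prefix` walks keys in sorted order starting at the prefix and stops at the first key that no longer starts with it. `parse_sp_key` splits at the last `:`, which is the `rfind` behavior 5.3 validates. `Index`, `scan_prefix`, and `parse_sp_key` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical stand-in for IndexStore: key -> list of content hashes.
pub type Index = BTreeMap<String, Vec<&'static str>>;

/// Returns every key that starts with `prefix`, in sorted (byte-wise) order.
pub fn scan_prefix<'a>(index: &'a Index, prefix: &str) -> Vec<&'a str> {
    index
        .range::<str, _>((Bound::Included(prefix), Bound::Unbounded))
        .map(|(k, _)| k.as_str())
        .take_while(|k| k.starts_with(prefix))
        .collect()
}

/// Splits `SP:{subject}:{predicate}` at the LAST ':' so the scheme's ':'
/// (as in `code://`) stays inside the subject. Assumes predicates contain no ':'.
pub fn parse_sp_key(key: &str) -> Option<(&str, &str)> {
    key.strip_prefix("SP:")?.rsplit_once(':')
}
```

Because keys compare byte-wise, `.../auth/` sorts before `.../authentication` (`/` is 0x2F, `e` is 0x65), so the first 5.2 scan starts at `authentication`, sees that it does not begin with `auth/`, and returns nothing.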