jml 9bfa626203 docs: reorganize documentation structure for clarity

Major documentation restructure to improve discoverability and reduce duplication.

## Changes

**Deleted (Archived/Consolidated)**:
- Removed duplicate getting started guides
- Archived outdated planning documents
- Consolidated corpus and configuration docs
- Removed obsolete vision/spec files (superseded by vision.md)
- Cleaned up scrapyard and old PDFs

**New Structure**:
- docs/about/ - Project overview and introduction
- docs/guides/ - User guides (moved from root)
- docs/specs/ - Technical specifications
- docs/sdk/ - SDK documentation (Go)
- docs/references/ - API references
- docs/archive/ - Archived historical docs
- applications/aphoria/docs/advanced/ - Advanced topics
- applications/aphoria/docs/reference/ - CLI reference
- applications/aphoria/docs/archive/ - Archived aphoria docs

**Updated**:
- README.md - New root README with clear navigation
- CONTRIBUTING.md - Contribution guidelines
- CLAUDE.md - Updated paths to new structure
- roadmap.md - Added recent completions

## Files Changed
- 57 files changed
- 1,977 insertions(+)
- 961 deletions(-)

**Net change**: +1,016 lines (added CONTRIBUTING.md, README.md, reorganized content)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-11 07:33:40 +00:00


Implementation Audit Checklist

Systematic verification of every completed roadmap feature (Phases 1-5B). Organized by functional theme, with an actionable `/investigate` or `/trace-feature` command for each entry.

Scope: 8 Rust crates + Go SDK | Phases: 1, 2, 2.5, 3, 4, 5A, 5B

How to use: Work through each section. Run the suggested command for each entry. Mark items as they are verified. Priority guides triage order.

Note: Phase 5B roadmap checkboxes are not yet updated, but the code was delivered in commit 3320c24 ("feat: WAL hardening (Phase 5B)"). All 5B features (CRC32C, crash recovery, group commit, log rotation) exist in the codebase and are included in this checklist.

1. Write Path (WAL, Ingestion, Durability)

1.1 WAL Record Format & CRC32C Integrity

Field	Value
Priority	Critical
Why it matters	Torn writes corrupt the append-only log. CRC32C is the first line of defense.
Key files	`crates/stemedb-wal/src/format.rs`, `crates/stemedb-wal/src/journal.rs`
What to verify	Record format is `[len:u32][crc32c:u32][data][blake3:32]`. CRC32C validated on every read before deserialization. Torn writes detected and rejected.
Command	`/investigate WAL record format — confirm dual checksum (CRC32C + BLAKE3), torn write rejection, and that no record can be deserialized without passing CRC32C`
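A minimal sketch of the framing described above, useful as a reference point when reading `format.rs`. It assumes little-endian `len` and `crc32c` fields and a CRC32C computed over `data` only. It treats the 32-byte BLAKE3 trailer as opaque, so no BLAKE3 code is included and the example stays dependency-free. `encode_record`, `decode_record`, and `DecodeError` are hypothetical names, not the crate's API. For the audit, the order of checks is what matters: bounds check first, then CRC32C, and only after both pass is the payload handed to a deserializer.

```rust
/// Errors a reader can hit before any deserialization happens.
#[derive(Debug, PartialEq)]
enum DecodeError {
    /// Fewer bytes than the length field promises (torn write).
    Truncated,
    /// All bytes present, but the stored CRC32C does not match.
    CrcMismatch,
}

const HEADER_LEN: usize = 8; // [len:u32][crc32c:u32]
const TRAILER_LEN: usize = 32; // [blake3:32], opaque in this sketch

/// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
/// Slow but dependency-free; production code would use a table-driven
/// or hardware-accelerated implementation.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

fn encode_record(data: &[u8], blake3: [u8; 32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + data.len() + TRAILER_LEN);
    out.extend_from_slice(&(data.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32c(data).to_le_bytes());
    out.extend_from_slice(data);
    out.extend_from_slice(&blake3);
    out
}

/// Returns the payload and the total bytes consumed. The payload is only
/// returned after the CRC32C check passes.
fn decode_record(buf: &[u8]) -> Result<(&[u8], usize), DecodeError> {
    if buf.len() < HEADER_LEN {
        return Err(DecodeError::Truncated);
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let stored_crc = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let total = HEADER_LEN + len + TRAILER_LEN;
    if buf.len() < total {
        return Err(DecodeError::Truncated);
    }
    let data = &buf[HEADER_LEN..HEADER_LEN + len];
    if crc32c(data) != stored_crc {
        return Err(DecodeError::CrcMismatch);
    }
    Ok((data, total))
}
```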

1.2 Crash Recovery (Full Scan & Truncate)

Field	Value
Priority	Critical
Why it matters	After an unclean shutdown, the WAL must recover to a consistent state. Partial writes must be truncated, not served.
Key files	`crates/stemedb-wal/src/journal.rs`, `crates/stemedb-wal/src/segment.rs`
What to verify	`recover()` performs sequential record scan with CRC32C validation. Truncates at first corrupted/incomplete record. Recovery metrics tracked (valid/invalid records, bytes truncated).
Command	`/trace-feature WAL crash recovery — trace from journal open through record scan, truncation, and metric reporting`
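The scan-and-truncate rule can be stated independently of the record format. In this sketch, `recover_scan` and `RecoveryReport` are hypothetical names. `toy_decode` is a stand-in frame (`[len:u8][sum:u8][data]` with a wrapping byte sum) that exists only to keep the example self-contained; the real journal would plug in its CRC32C-checked decoder. Everything from the first rejected record onward counts as truncated, even if later bytes happen to parse.

```rust
/// Metrics a recovery pass should report.
#[derive(Debug, PartialEq)]
struct RecoveryReport {
    valid_records: usize,
    /// Offset to truncate the segment to (end of the last good record).
    valid_len: usize,
    truncated_bytes: usize,
}

/// Scan records from the start of `buf`, stopping at the first record that
/// `decode` rejects. `decode` returns the size of a valid record at the
/// front of its input, or `None` if those bytes are torn or corrupt.
fn recover_scan(buf: &[u8], decode: impl Fn(&[u8]) -> Option<usize>) -> RecoveryReport {
    let mut offset = 0;
    let mut valid_records = 0;
    while offset < buf.len() {
        match decode(&buf[offset..]) {
            // Reject zero or oversized lengths so the scan always terminates.
            Some(used) if used > 0 && used <= buf.len() - offset => {
                offset += used;
                valid_records += 1;
            }
            _ => break, // first bad record: it and everything after are dropped
        }
    }
    RecoveryReport {
        valid_records,
        valid_len: offset,
        truncated_bytes: buf.len() - offset,
    }
}

/// Toy frame for this demo only: [len:u8][sum:u8][data].
fn toy_decode(buf: &[u8]) -> Option<usize> {
    let (&len, rest) = buf.split_first()?;
    let (&sum, rest) = rest.split_first()?;
    let data = rest.get(..len as usize)?;
    let actual = data.iter().fold(0u8, |acc, &b| acc.wrapping_add(b));
    if actual == sum {
        Some(2 + len as usize)
    } else {
        None
    }
}
```

After truncating to `valid_len`, a second scan should report zero truncated bytes, which is a cheap property to assert in the real recovery tests.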

1.3 Group Commit

Field	Value
Priority	High
Why it matters	Without group commit, an fsync per write caps throughput at roughly 1K writes/sec on typical disks. Group commit amortizes one fsync across a whole batch of writes, raising throughput substantially.
Key files	`crates/stemedb-wal/src/group_commit.rs`, `crates/stemedb-wal/src/durability.rs`
What to verify	`GroupCommitBuffer` buffers up to N writes or T milliseconds (whichever comes first) before issuing a single fsync. Writers wait on `Notify`. The background flusher calls fsync and then notifies all waiters. `max_writes` and `max_duration` are configurable.
Command	`/investigate group commit — verify buffer/flush cycle, writer notification, and that no write is acknowledged before its fsync completes`
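The acknowledgment invariant can be modeled synchronously. This sketch substitutes `std::sync::{Mutex, Condvar}` and OS threads for Tokio's `Notify`, and simulates fsync by incrementing a counter. `GroupCommit`, `write`, `run_flusher`, and `run_writers` are hypothetical names, not the `GroupCommitBuffer` API. A writer returns only once `durable_seq` covers its own sequence number, which is the property to confirm in `group_commit.rs`.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Default)]
struct State {
    pending: usize,   // writes buffered since the last fsync
    next_seq: u64,    // sequence number of the most recent write
    durable_seq: u64, // highest sequence number covered by a completed fsync
    fsyncs: u64,      // simulated fsync calls
    shutdown: bool,
}

struct GroupCommit {
    state: Mutex<State>,
    cv: Condvar,
    max_writes: usize,
    max_duration: Duration,
}

impl GroupCommit {
    fn new(max_writes: usize, max_duration: Duration) -> Self {
        GroupCommit { state: Mutex::new(State::default()), cv: Condvar::new(), max_writes, max_duration }
    }

    /// Buffer one write, then block until an fsync covers it.
    fn write(&self) -> u64 {
        let mut st = self.state.lock().unwrap();
        st.next_seq += 1;
        st.pending += 1;
        let seq = st.next_seq;
        self.cv.notify_all(); // let the flusher re-check the batch size
        while st.durable_seq < seq {
            st = self.cv.wait(st).unwrap();
        }
        seq
    }

    /// Flush when `max_writes` are buffered or `max_duration` elapses.
    fn run_flusher(&self) {
        let mut st = self.state.lock().unwrap();
        loop {
            let deadline = Instant::now() + self.max_duration;
            while st.pending < self.max_writes && !st.shutdown {
                let now = Instant::now();
                if now >= deadline {
                    break;
                }
                st = self.cv.wait_timeout(st, deadline - now).unwrap().0;
            }
            if st.pending > 0 {
                // A real WAL would call File::sync_data() here.
                st.fsyncs += 1;
                st.durable_seq = st.next_seq;
                st.pending = 0;
                self.cv.notify_all(); // acknowledge the whole batch at once
            }
            if st.shutdown {
                return;
            }
        }
    }

    fn shutdown(&self) {
        self.state.lock().unwrap().shutdown = true;
        self.cv.notify_all();
    }
}

/// Usage: run `n` concurrent writers against one flusher thread.
fn run_writers(gc: &Arc<GroupCommit>, n: usize) -> Vec<u64> {
    let flusher = {
        let gc = Arc::clone(gc);
        thread::spawn(move || gc.run_flusher())
    };
    let writers: Vec<_> = (0..n)
        .map(|_| {
            let gc = Arc::clone(gc);
            thread::spawn(move || gc.write())
        })
        .collect();
    let mut seqs: Vec<u64> = writers.into_iter().map(|h| h.join().unwrap()).collect();
    gc.shutdown();
    flusher.join().unwrap();
    seqs.sort_unstable();
    seqs
}
```

Holding the lock across the simulated fsync keeps the sketch short. A production flusher would swap the batch out, release the lock, sync, and only then publish the new `durable_seq`.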

1.4 Log Rotation

Field	Value
Priority	High
Why it matters	Unbounded WAL growth will eventually exhaust disk. Rotation keeps disk usage bounded.
Key files	`crates/stemedb-wal/src/segment.rs`, `crates/stemedb-wal/src/journal.rs`
What to verify	Segment naming follows `{seq:08x}.wal`. New segment created when current exceeds threshold. Old segments deleted after cursor passes them. Recovery works across multiple segments.
Command	`/investigate log rotation — verify segment creation, naming, cleanup after cursor advancement, and multi-segment recovery`
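A sketch of the naming, rotation, and cleanup rules. It assumes 32-bit sequence numbers (exactly eight hex digits) and models "the cursor has passed a segment" as "the segment's sequence is lower than the cursor's segment". All function names are hypothetical. Because the names are fixed-width and zero-padded, a plain lexical sort of file names also matches numeric order, which is convenient for multi-segment recovery.

```rust
/// Segment file name: sequence number as 8 lowercase hex digits.
fn segment_name(seq: u32) -> String {
    format!("{:08x}.wal", seq)
}

/// Inverse of `segment_name`; rejects anything that is not exactly
/// 8 hex digits followed by `.wal` (including a leading `+`).
fn parse_segment_name(name: &str) -> Option<u32> {
    let hex = name.strip_suffix(".wal")?;
    if hex.len() != 8 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    u32::from_str_radix(hex, 16).ok()
}

/// Rotate once the active segment has grown past the size threshold.
fn needs_rotation(active_size: u64, max_segment_bytes: u64) -> bool {
    active_size > max_segment_bytes
}

/// Segment sequence numbers in the order recovery should replay them;
/// unrelated files in the directory are ignored.
fn recovery_order(file_names: &[&str]) -> Vec<u32> {
    let mut seqs: Vec<u32> = file_names.iter().filter_map(|n| parse_segment_name(n)).collect();
    seqs.sort_unstable();
    seqs
}

/// Segments that lie entirely behind the ingest cursor's segment.
fn deletable_segments(seqs: &[u32], cursor_segment: u32) -> Vec<u32> {
    seqs.iter().copied().filter(|&s| s < cursor_segment).collect()
}
```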

1.5 Ed25519 Signature Verification

Field	Value
Priority	Critical
Why it matters	Unsigned or mis-signed assertions bypass the entire trust model. Every assertion must be cryptographically verified before storage.
Key files	`crates/stemedb-ingest/src/worker/processing.rs`, `crates/stemedb-ingest/src/worker/tests/signatures.rs`
What to verify	Signature verified during ingestion before KV write. Invalid signatures rejected with error. Multi-sig (`SignatureEntry` vec) verified. Unsigned assertions rejected.
Command	`/investigate Ed25519 verification in ingestion — confirm every assertion is verified, invalid sigs rejected, and multi-sig entries all checked`

1.6 Cursor Persistence

Field	Value
Priority	Critical
Why it matters	If the ingest cursor is lost, the entire WAL is re-processed on restart. If it advances too early, assertions are silently dropped.
Key files	`crates/stemedb-ingest/src/worker/run.rs`, `crates/stemedb-ingest/src/worker/storage.rs`, `crates/stemedb-ingest/src/worker/tests/cursor.rs`
What to verify	Cursor stored at `__CURSOR__:ingest` key in KV. Updated after successful processing (not before). Survives restart and resumes from correct offset.
Command	`/trace-feature cursor persistence — trace from WAL tail through processing, cursor update, restart, and resume`
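A minimal model of the ordering this entry asks about: apply first, then persist the cursor. It assumes the cursor is an index into the WAL (the real one is more likely a byte offset or a segment/offset pair) and uses a `HashMap` in place of the KV store; `Kv`, `load_cursor`, and `ingest` are hypothetical. In this model a crash between the two steps re-delivers one entry, so ingestion is at-least-once and the apply step needs to be idempotent.

```rust
use std::collections::HashMap;

const CURSOR_KEY: &str = "__CURSOR__:ingest";

type Kv = HashMap<String, Vec<u8>>;

/// Read the persisted cursor (index of the next WAL entry); 0 if absent.
fn load_cursor(kv: &Kv) -> u64 {
    match kv.get(CURSOR_KEY) {
        Some(v) if v.len() == 8 => {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(v);
            u64::from_le_bytes(bytes)
        }
        _ => 0,
    }
}

/// Apply WAL entries from the persisted cursor onward. The cursor is written
/// only after `apply` succeeds, so a failure re-delivers the same entry on
/// the next run instead of silently skipping it.
fn ingest(
    kv: &mut Kv,
    wal: &[&str],
    mut apply: impl FnMut(&mut Kv, &str) -> Result<(), String>,
) -> Result<u64, String> {
    let mut cursor = load_cursor(kv);
    while let Some(&entry) = wal.get(cursor as usize) {
        apply(&mut *kv, entry)?; // on error, return without advancing
        cursor += 1;
        // Ideally this write shares an atomic batch with apply's writes.
        kv.insert(CURSOR_KEY.to_string(), cursor.to_le_bytes().to_vec());
    }
    Ok(cursor)
}
```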

1.7 Epoch Cascade & Supersession Markers

Field	Value
Priority	High
Why it matters	Without transitive cascade, epoch chains require O(chain_length) walks. Markers enable O(1) supersession checks.
Key files	`crates/stemedb-ingest/src/worker/processing.rs`, `crates/stemedb-ingest/src/worker/tests/epoch_cascade.rs`, `crates/stemedb-storage/src/supersession_store.rs`
What to verify	`write_supersession_cascade()` writes `SUPERSEDED:{old_epoch_id}` for the full transitive closure. All markers point to the LATEST superseding epoch. Max depth guard (100) and cycle detection.
Command	`/investigate epoch cascade — verify transitive closure, marker correctness for A->B->C chains, cycle detection, and depth guard`
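A sketch of the cascade for an A->B->C chain. `supersession_cascade` is a hypothetical pure variant of `write_supersession_cascade()`: it computes every marker first and returns them only if the whole chain is valid, so a cycle or depth overrun writes nothing. It assumes each epoch supersedes at most one other epoch (the single optional `supersedes` field from 8.1). Whether the real function stops cleanly or leaves partial markers behind on a cycle is one of the things this audit should establish.

```rust
use std::collections::{HashMap, HashSet};

const MAX_CASCADE_DEPTH: usize = 100;

/// `supersedes` maps an epoch to the epoch it directly supersedes.
/// Returns (marker_key, latest_epoch) pairs: every transitively superseded
/// epoch points at `new_epoch`. Returns an error, and no markers, on a
/// cycle or when the chain is deeper than the guard allows.
fn supersession_cascade(
    new_epoch: &str,
    supersedes: &HashMap<String, String>,
) -> Result<Vec<(String, String)>, String> {
    let mut seen: HashSet<&str> = HashSet::new();
    seen.insert(new_epoch);
    let mut markers = Vec::new();
    let mut current = new_epoch;
    while let Some(old) = supersedes.get(current) {
        if !seen.insert(old.as_str()) {
            return Err(format!("cycle detected at epoch {}", old));
        }
        if markers.len() >= MAX_CASCADE_DEPTH {
            return Err("supersession chain exceeds max depth".to_string());
        }
        markers.push((format!("SUPERSEDED:{}", old), new_epoch.to_string()));
        current = old.as_str();
    }
    Ok(markers)
}
```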

1.8 Ingestion Record Type Routing

Field	Value
Priority	Medium
Why it matters	The ingest worker handles assertions, votes, and epochs. Mis-routing corrupts data.
Key files	`crates/stemedb-ingest/src/worker/record_types.rs`, `crates/stemedb-ingest/src/worker/processing.rs`
What to verify	Each record type deserialized and routed correctly. Unknown record types handled gracefully (logged, not panicked). Indexes (S:, SP:) updated on assertion ingest.
Command	`/investigate ingest record routing — verify assertion/vote/epoch dispatch, index updates, and unknown-type handling`

2. Read Path (Query Engine, Materialized Views)

2.1 MV Fast Path

Field	Value
Priority	Critical
Why it matters	The MV fast path is the primary read performance mechanism. If it silently serves stale data or misses updates, query results are wrong.
Key files	`crates/stemedb-query/src/engine.rs`, `crates/stemedb-query/src/query.rs`
What to verify	`try_fast_path()` looks up `MV:{subject}:{predicate}`. Returns immediately if MV exists and no features bypass it. Falls through to slow path when MV missing.
Command	`/trace-feature MV fast path — trace from QueryEngine::execute through try_fast_path, MV lookup, and fallback to slow path`

2.2 MV Staleness Detection

Field	Value
Priority	High
Why it matters	Without staleness detection, the fast path serves arbitrarily old cached results.
Key files	`crates/stemedb-query/src/engine.rs`, `crates/stemedb-query/src/query.rs`
What to verify	`max_stale` parameter on Query. If set and MV age exceeds threshold, falls through to slow path. No `max_stale` = any MV age accepted (backward compatible). `max_stale=0` rejects all but brand-new MVs.
Command	`/investigate MV staleness — verify threshold comparison logic, edge cases (zero, None), and that stale MVs trigger slow path with debug log`

2.3 Fast Path Bypass Conditions

Field	Value
Priority	High
Why it matters	Several features must bypass the fast path because MVs don't capture their state. Missing a bypass means wrong results.
Key files	`crates/stemedb-query/src/engine.rs`
What to verify	Fast path bypassed when: `as_of` is set (time-travel), `since` is set (changelog), `decay_halflife` is set (confidence decay), `source_class_decay` is enabled, `vector_near` is set. All other queries use fast path.
Command	`/investigate fast path bypass — enumerate all conditions that force slow path and verify each is correctly checked`
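Taken together, the bypass list above and the staleness rule from 2.2 reduce to a single predicate. The sketch below models it with a hypothetical `Query` struct that holds only the fields named in this checklist. Timestamps are nanoseconds, and `mv_updated_at` is `None` when no MV exists. The real `try_fast_path()` may order or structure these checks differently.

```rust
/// Only the query fields this checklist names; timestamps in nanoseconds.
#[derive(Default)]
struct Query {
    as_of: Option<u64>,
    since: Option<u64>,
    decay_halflife: Option<u64>,
    source_class_decay: bool,
    vector_near: Option<Vec<f32>>,
    max_stale: Option<u64>,
}

/// Features whose state an MV does not capture force the slow path.
fn bypasses_fast_path(q: &Query) -> bool {
    q.as_of.is_some()
        || q.since.is_some()
        || q.decay_halflife.is_some()
        || q.source_class_decay
        || q.vector_near.is_some()
}

/// No `max_stale` accepts any MV age; `Some(0)` accepts only an MV
/// written at exactly `now`.
fn mv_is_fresh(mv_updated_at: u64, now: u64, max_stale: Option<u64>) -> bool {
    match max_stale {
        None => true,
        Some(limit) => now.saturating_sub(mv_updated_at) <= limit,
    }
}

/// Fast path only when nothing forces a bypass and a fresh MV exists.
fn can_use_fast_path(q: &Query, mv_updated_at: Option<u64>, now: u64) -> bool {
    !bypasses_fast_path(q) && mv_updated_at.map_or(false, |t| mv_is_fresh(t, now, q.max_stale))
}
```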

2.4 Slow Path Filtering & Index Routing

Field	Value
Priority	High
Why it matters	The slow path is the correctness fallback. If index routing picks the wrong index or filtering misses conditions, results are incorrect.
Key files	`crates/stemedb-query/src/engine.rs`, `crates/stemedb-storage/src/index_store.rs`
What to verify	QueryEngine routes: SP index (subject+predicate) -> S index (subject only) -> full scan. `Query::matches()` checks all filter fields (subject, predicate, lifecycle, epoch, as_of, visual_near, metadata filters).
Command	`/trace-feature query index routing — trace from execute() through index selection, candidate fetch, matches() filtering, and result assembly`

2.5 Materializer Worker

Field	Value
Priority	High
Why it matters	The materializer pre-computes winning assertions for O(1) reads. If it fails to run or computes wrong winners, the fast path serves incorrect data.
Key files	`crates/stemedb-query/src/materializer.rs`
What to verify	`step()` processes pending subject/predicate pairs. `run()` loops continuously. `run_notified()` wakes on Notify events. `materialize_pair()` applies lens, writes MV, writes changelog on winner change, fires escalation checks.
Command	`/trace-feature materializer — trace step/run/run_notified cycle, MV write, changelog write, and escalation trigger`

3. Lens Resolution

3.1 RecencyLens

Field	Value
Priority	High
Why it matters	Default lens. Picks the newest assertion. If timestamp comparison is wrong, newest-wins semantics break.
Key files	`crates/stemedb-lens/src/recency.rs`
What to verify	Selects assertion with highest timestamp. Computes conflict_score. Empty candidates return empty resolution.
Command	`/investigate RecencyLens — verify timestamp comparison, conflict score computation, empty/single candidate edge cases`

3.2 ConsensusLens

Field	Value
Priority	High
Why it matters	Groups assertions by object value, picks the group with most support. Incorrect grouping means wrong consensus.
Key files	`crates/stemedb-lens/src/consensus.rs`
What to verify	Groups by object value. Picks group with highest total confidence. Conflict score reflects inter-group disagreement.
Command	`/investigate ConsensusLens — verify grouping logic, confidence aggregation, and conflict score for multi-group scenarios`

3.3 ConfidenceLens (formerly AuthorityLens)

Field	Value
Priority	Medium
Why it matters	Selects by raw confidence field. The rename from AuthorityLens must not have broken routing.
Key files	`crates/stemedb-lens/src/confidence.rs`
What to verify	Picks assertion with highest `confidence` field. `LensDto::Confidence` routes here. Old `Authority` name no longer routes here.
Command	`/investigate ConfidenceLens — verify confidence-field selection, DTO routing, and that Authority name redirects to TrustAwareAuthorityLens`

3.4 VoteAwareConsensusLens

Field	Value
Priority	High
Why it matters	Real vote-weighted resolution. If vote counts are miscounted or weights wrong, democratic consensus breaks.
Key files	`crates/stemedb-lens/src/vote_aware_consensus.rs`, `crates/stemedb-storage/src/vote_store/`
What to verify	Fetches vote counts/weights from VoteStore. Groups by object value weighted by votes. Falls back gracefully when no votes exist.
Command	`/trace-feature VoteAwareConsensusLens — trace from lens resolve through VoteStore lookup, weight calculation, and winner selection`

3.5 TrustAwareAuthorityLens

Field	Value
Priority	High
Why it matters	Weights assertions by agent TrustRank. If trust scores are not fetched, or are weighted incorrectly, the reputation system is merely decorative.
Key files	`crates/stemedb-lens/src/trust_aware_authority.rs`, `crates/stemedb-storage/src/trust_rank_store/`
What to verify	Fetches per-agent TrustRank. Weights assertion confidence by trust score. `LensDto::Authority` routes here (not to ConfidenceLens). Falls back when TrustRankStore unavailable.
Command	`/trace-feature TrustAwareAuthorityLens — trace from resolve through TrustRankStore lookup, weight multiplication, and winner selection`

3.6 EpochAwareLens

Field	Value
Priority	High
Why it matters	Filters out assertions from superseded epochs. Without this, old paradigm data contaminates current results.
Key files	`crates/stemedb-lens/src/lib.rs` (epoch_aware logic may be inlined or split)
What to verify	Uses O(1) `SUPERSEDED:{epoch_id}` marker lookup. Fail-open on missing markers. Cycle detection + max depth 100. Decorates any inner lens.
Command	`/investigate EpochAwareLens — verify marker-based filtering, fail-open semantics, cycle detection, and decorator pattern with inner lens`

3.7 SkepticLens

Field	Value
Priority	Medium
Why it matters	Surfaces disagreement instead of resolving it. Critical for the browser extension's contradiction overlay.
Key files	`crates/stemedb-lens/src/skeptic.rs`, `crates/stemedb-query/src/skeptic.rs`, `crates/stemedb-api/src/handlers/skeptic.rs`
What to verify	Returns `ConflictAnalysis` with Shannon entropy-based conflict score. `ResolutionStatus` (Unanimous/Agreed/Contested) thresholds correct. All claims ranked by weight. API endpoint `/v1/skeptic` returns full analysis.
Command	`/trace-feature SkepticLens — trace from API endpoint through SkepticResolver, entropy computation, status classification, and response assembly`
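The checklist specifies a Shannon-entropy conflict score but not its normalization or the status thresholds. The sketch below takes one weight per distinct claim and normalizes entropy by `log2(k)` for `k` claims. It treats a single claim as Unanimous and uses a hypothetical `AGREED_MAX` of 0.5 to separate Agreed from Contested. All three choices are assumptions to compare against `skeptic.rs`, not the real values.

```rust
#[derive(Debug, PartialEq)]
enum ResolutionStatus {
    Unanimous,
    Agreed,
    Contested,
}

/// Hypothetical boundary between Agreed and Contested.
const AGREED_MAX: f64 = 0.5;

/// Shannon entropy of the claims' weight distribution, divided by
/// log2(k) for k claims so the result lies in [0, 1].
fn entropy_conflict(weights: &[f64]) -> f64 {
    // Zero, negative, and NaN weights carry no probability mass.
    let positive: Vec<f64> = weights.iter().copied().filter(|w| *w > 0.0).collect();
    if positive.len() < 2 {
        return 0.0;
    }
    let total: f64 = positive.iter().sum();
    let entropy: f64 = positive
        .iter()
        .map(|w| {
            let p = w / total;
            -p * p.log2()
        })
        .sum();
    entropy / (positive.len() as f64).log2()
}

fn classify(weights: &[f64]) -> (f64, ResolutionStatus) {
    let score = entropy_conflict(weights);
    let claims = weights.iter().filter(|w| **w > 0.0).count();
    let status = if claims <= 1 {
        ResolutionStatus::Unanimous
    } else if score <= AGREED_MAX {
        ResolutionStatus::Agreed
    } else {
        ResolutionStatus::Contested
    };
    (score, status)
}
```

With weights 9:1 the normalized entropy is about 0.469, so this sketch calls it Agreed; an even 1:1 split scores 1.0 and is Contested.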

3.8 LayeredConsensusLens

Field	Value
Priority	Medium
Why it matters	Per-source-class consensus for the pharma vertical. Ensures the Regulatory tier outweighs the Anecdotal tier even when it has fewer assertions.
Key files	`crates/stemedb-lens/src/layered_consensus.rs`, `crates/stemedb-api/src/handlers/layered.rs`
What to verify	Groups by SourceClass tier. Per-tier resolution with individual conflict scores. Cross-tier conflict via Shannon entropy. Overall winner from highest-authority tier. API endpoint `/v1/layered` returns `LayeredQueryResponse`.
Command	`/trace-feature LayeredConsensusLens — trace from API endpoint through tier grouping, per-tier resolution, cross-tier conflict, and overall winner`

3.9 ConstraintsLens

Field	Value
Priority	Medium
Why it matters	Pre-flight safety checks for must_use/forbidden/prefer predicates. Incorrect categorization could allow forbidden items or miss required ones.
Key files	`crates/stemedb-lens/src/constraints.rs`, `crates/stemedb-api/src/handlers/constraints.rs`
What to verify	Categorizes by predicate pattern: `must_use:`, `forbidden:`, `prefer:*`. Priority: must_use > forbidden > prefer. Sorted by confidence within category. Non-constraint predicates ignored. API endpoint `/v1/constraints` returns `ConstraintsResponse`.
Command	`/investigate ConstraintsLens — verify predicate pattern matching, priority ordering, confidence sort, and that non-constraint predicates are fully excluded`
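A sketch of the categorization and ordering rules. A derived `Ord` on the category enum lets declaration order double as priority. The sketch assumes one `(predicate, confidence)` pair per assertion and plain prefix matching; `Category`, `categorize`, and `order_constraints` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Declaration order is priority: MustUse > Forbidden > Prefer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Category {
    MustUse,
    Forbidden,
    Prefer,
}

fn categorize(predicate: &str) -> Option<Category> {
    if predicate.starts_with("must_use:") {
        Some(Category::MustUse)
    } else if predicate.starts_with("forbidden:") {
        Some(Category::Forbidden)
    } else if predicate.starts_with("prefer:") {
        Some(Category::Prefer)
    } else {
        None // not a constraint predicate: excluded entirely
    }
}

/// Keep only constraint predicates, ordered by category priority and then
/// by descending confidence within each category.
fn order_constraints<'a>(items: &[(&'a str, f64)]) -> Vec<(Category, &'a str, f64)> {
    let mut out: Vec<_> = items
        .iter()
        .filter_map(|&(pred, conf)| categorize(pred).map(|cat| (cat, pred, conf)))
        .collect();
    out.sort_by(|a, b| a.0.cmp(&b.0).then(b.2.partial_cmp(&a.2).unwrap_or(Ordering::Equal)));
    out
}
```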

4. Trust & Safety

4.1 TrustRank Engine

Field	Value
Priority	High
Why it matters	Foundation of the reputation system. If trust scores drift, decay wrong, or clamp incorrectly, the entire trust model is unreliable.
Key files	`crates/stemedb-storage/src/trust_rank_store/store_impl.rs`, `crates/stemedb-storage/src/trust_rank_store/model.rs`
What to verify	Per-agent trust score storage and retrieval. `record_outcome()` for accuracy tracking. Trust score clamping to valid range. Decay calculation with configurable half-life.
Command	`/investigate TrustRank engine — verify score storage, outcome recording, clamping, and decay calculation`

4.2 Gold Standard Verification

Field	Value
Priority	High
Why it matters	Sybil defense. If agents can game gold standards (for example, by verifying repeatedly to farm rewards, or by being rewarded for wrong answers), the trust bootstrapping mechanism is broken.
Key files	`crates/stemedb-storage/src/gold_standard_store.rs`, `crates/stemedb-storage/src/trust_rank_store/store_impl.rs`, `crates/stemedb-storage/src/trust_rank_store/gold_standard_tests.rs`
What to verify	`GoldStandard` CRUD operations. `verify_agent_against_gold_standard()` with correct/incorrect matching. Deduplication markers at `GS_VERIFIED:{agent_id}:{subject}:{predicate}`. Trust adjustments: +0.05 reward, -0.1 penalty. Clamping after adjustment.
Command	`/trace-feature gold standard verification — trace from admin API through gold standard creation, agent verification, trust adjustment, and dedup marker`
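A sketch of the reward, penalty, clamp, and dedup sequence. It assumes trust scores live in [0.0, 1.0] and that an unseen agent starts at a hypothetical 0.5. It also treats "correct" as an exact string match with the gold answer. `verify_against_gold` is a hypothetical name. One detail worth checking in the real marker format: colon-joined keys such as `GS_VERIFIED:{agent_id}:{subject}:{predicate}` can collide if IDs may themselves contain `:`.

```rust
use std::collections::{HashMap, HashSet};

const REWARD: f64 = 0.05;
const PENALTY: f64 = 0.1;
const DEFAULT_TRUST: f64 = 0.5; // hypothetical starting score

/// Returns the new trust score, or `None` if this agent was already
/// verified against this (subject, predicate) gold standard.
fn verify_against_gold(
    trust: &mut HashMap<String, f64>,
    verified: &mut HashSet<String>,
    agent_id: &str,
    subject: &str,
    predicate: &str,
    agent_answer: &str,
    gold_answer: &str,
) -> Option<f64> {
    let marker = format!("GS_VERIFIED:{}:{}:{}", agent_id, subject, predicate);
    if !verified.insert(marker) {
        return None; // dedup: repeated verification earns nothing
    }
    let score = trust.entry(agent_id.to_string()).or_insert(DEFAULT_TRUST);
    let delta = if agent_answer == gold_answer { REWARD } else { -PENALTY };
    *score = (*score + delta).clamp(0.0, 1.0); // clamp after adjustment
    Some(*score)
}
```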

4.3 Escalation Triggers

Field	Value
Priority	High
Why it matters	Active safety system. If high-conflict assertions don't fire escalations, dangerous disagreements go unnoticed.
Key files	`crates/stemedb-storage/src/escalation_store.rs`, `crates/stemedb-core/src/types/escalation.rs`, `crates/stemedb-query/src/materializer.rs`
What to verify	`EscalationPolicy` with configurable threshold + level. Materializer fires events when conflict_score exceeds policy threshold. Events stored at `ESC:{timestamp_nanos}:{id_hex}`. Predicate pattern matching on policies. API endpoints for query and resolution.
Command	`/trace-feature escalation triggers — trace from materializer conflict_score computation through policy check, event creation, storage, and API retrieval`

4.4 Conflict Score Computation

Field	Value
Priority	Medium
Why it matters	Conflict score drives escalations, filtering, and the Skeptic lens. If the formula is wrong, downstream features are unreliable.
Key files	`crates/stemedb-lens/src/traits.rs` (compute_conflict_score function)
What to verify	Normalized variance of confidence values. 0 or 1 candidates = 0.0. All same confidence = 0.0. Max variance (0.0 vs 1.0) = 1.0. NaN handling returns 0.0. Score added to all lens resolutions and MaterializedViews.
Command	`/investigate conflict score — verify formula correctness, edge cases (empty, single, NaN), and propagation to Resolution and MaterializedView`
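One reading of "normalized variance" that satisfies every edge case listed is the population variance of the confidences divided by 0.25. That divisor is the largest variance possible for values in [0, 1] (half at 0.0, half at 1.0). The 0.25 divisor is inferred from the stated `0.0 vs 1.0 = 1.0` case, not confirmed from `traits.rs`. The sketch also returns 0.0 when any input is NaN or infinite, which is an assumption about how "NaN handling" is scoped.

```rust
/// Normalized variance of confidence values, in [0, 1].
fn compute_conflict_score(confidences: &[f64]) -> f64 {
    if confidences.len() < 2 || confidences.iter().any(|c| !c.is_finite()) {
        return 0.0; // 0 or 1 candidates, or NaN/Inf input
    }
    let n = confidences.len() as f64;
    let mean = confidences.iter().sum::<f64>() / n;
    let variance = confidences.iter().map(|c| (c - mean).powi(2)).sum::<f64>() / n;
    // 0.25 is the maximum population variance for values in [0, 1].
    (variance / 0.25).clamp(0.0, 1.0)
}
```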

4.5 Conflict Score Filtering

Field	Value
Priority	Medium
Why it matters	Browser extension needs "only high-conflict claims." If filtering is wrong, the UI shows wrong results or nothing.
Key files	`crates/stemedb-query/src/engine.rs`, `crates/stemedb-api/src/dto/query_params.rs`
What to verify	`min_conflict_score` and `max_conflict_score` on Query. Fast-path filtering checks MV conflict_score. API validation: scores 0.0-1.0, finite (rejects NaN/Inf).
Command	`/investigate conflict score filtering — verify min/max thresholds on fast path, boundary behavior, NaN/Inf rejection, and combination with other filters`

4.6 Batch TrustRank Decay API

Field	Value
Priority	Medium
Why it matters	External orchestrators (Gardener) need to trigger scheduled trust decay. If the endpoint doesn't work, trust scores never decay.
Key files	`crates/stemedb-api/src/handlers/admin.rs`, `crates/stemedb-storage/src/trust_rank_store/store_impl.rs`
What to verify	`POST /v1/admin/decay-trust-ranks` accepts `now` and `half_life_seconds`. Delegates to `TrustRankStore::decay_trust_ranks()`. Response includes `decayed_count`, `timestamp_used`, `half_life_used`, `status`.
Command	`/investigate trust decay API — verify endpoint accepts params, delegates correctly, and response has all required fields`

5. Search & Similarity

5.1 HNSW Vector Search

Field	Value
Priority	High
Why it matters	Semantic k-NN search for embeddings. If the index returns wrong neighbors or crashes on edge cases, the semantic search feature is broken.
Key files	`crates/stemedb-storage/src/vector_index.rs`, `crates/stemedb-query/src/engine.rs`
What to verify	`HnswVectorIndex` with RwLock protection. Input validation: dimension mismatch, NaN, Infinity rejected. Idempotent insert. QueryEngine uses O(log N) HNSW when `vector_near` set + index configured. Falls back to standard path without index.
Command	`/trace-feature vector search — trace from API query with vector_near through QueryEngine, HNSW lookup, candidate fetch, and result assembly`

5.2 BK-Tree Visual Search

Field	Value
Priority	High
Why it matters	Image provenance via perceptual hashes. If hamming distance or BK-tree traversal is wrong, similar images aren't found.
Key files	`crates/stemedb-storage/src/visual_index.rs`, `crates/stemedb-query/src/engine.rs`
What to verify	`BkTreeVisualIndex` with hamming distance metric. Threshold clamped to 0-64. Results sorted by distance ascending. Idempotent insert. QueryEngine uses O(log N) BK-tree when `visual_near` set + index configured. Falls back to brute-force scan without index.
Command	`/trace-feature visual search — trace from API query with visual_near through QueryEngine, BK-tree lookup, threshold filtering, and result assembly`
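A compact BK-tree over 64-bit perceptual hashes that matches the listed behaviors: Hamming metric, threshold clamped to 64, results in ascending distance, and idempotent insert. `BkTree` is a hypothetical single-threaded stand-in for `BkTreeVisualIndex`, so the `RwLock` is omitted. Ties are broken by hash value for deterministic output, which is an assumption.

```rust
use std::collections::HashMap;

/// Hamming distance between two 64-bit perceptual hashes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

struct Node {
    hash: u64,
    children: HashMap<u32, Node>, // keyed by distance to this node
}

impl Node {
    fn new(hash: u64) -> Self {
        Node { hash, children: HashMap::new() }
    }
}

#[derive(Default)]
struct BkTree {
    root: Option<Node>,
}

impl BkTree {
    /// Idempotent: inserting a hash already in the tree is a no-op.
    fn insert(&mut self, hash: u64) {
        if self.root.is_none() {
            self.root = Some(Node::new(hash));
            return;
        }
        let mut node = self.root.as_mut().unwrap();
        loop {
            let d = hamming(node.hash, hash);
            if d == 0 {
                return; // already present (or just inserted)
            }
            node = node.children.entry(d).or_insert_with(|| Node::new(hash));
        }
    }

    /// All hashes within `threshold` (clamped to 64) of `query`,
    /// sorted by ascending distance, then by hash.
    fn search(&self, query: u64, threshold: u32) -> Vec<(u64, u32)> {
        let t = threshold.min(64);
        let mut out = Vec::new();
        let mut stack: Vec<&Node> = self.root.iter().collect();
        while let Some(node) = stack.pop() {
            let d = hamming(node.hash, query);
            if d <= t {
                out.push((node.hash, d));
            }
            // Triangle inequality: only edges in [d - t, d + t] can lead to matches.
            for (&edge, child) in &node.children {
                if edge + t >= d && edge <= d + t {
                    stack.push(child);
                }
            }
        }
        out.sort_by_key(|&(hash, dist)| (dist, hash));
        out
    }
}
```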

5.3 Visual Hash Brute-Force Fallback

Field	Value
Priority	Medium
Why it matters	Without a BK-tree index, visual search falls back to O(N) scan in `Query::matches()`. If the fallback is broken, visual search fails silently when no index is configured.
Key files	`crates/stemedb-query/src/query.rs`
What to verify	`visual_near` and `visual_threshold` on Query. `matches()` computes hamming distance when set. Invalid hex rejected. Wrong-length hash rejected. Default threshold behavior.
Command	`/investigate visual hash brute-force — verify hamming distance in matches(), invalid input handling, and default threshold`

6. Time & Decay

6.1 Time-Travel (as_of)

Field	Value
Priority	High
Why it matters	Historical queries are fundamental to "Git for Truth." If as_of doesn't exclude future assertions or incorrectly uses MVs, historical queries are wrong.
Key files	`crates/stemedb-query/src/query.rs`, `crates/stemedb-query/src/engine.rs`
What to verify	`as_of` parameter filters assertions by `timestamp <= as_of`. Fast path bypassed (MVs reflect current state). Edge case: `assertion.timestamp == as_of` included. Works with lens resolution (only historical candidates).
Command	`/investigate time-travel — verify as_of filtering in matches(), fast path bypass, exact-timestamp edge case, and lens interaction`

6.2 Change Tracking (since)

Field	Value
Priority	High
Why it matters	"What changed since I last looked?" is the returning consumer story. If changelog entries are missed or timestamps wrong, consumers miss updates.
Key files	`crates/stemedb-query/src/materializer.rs`, `crates/stemedb-query/src/engine.rs`, `crates/stemedb-core/src/types/materialized.rs`
What to verify	`MVC:{subject}:{predicate}:{timestamp_nanos}` changelog keys. Written when MV winner changes (not on same-winner re-materialization). `since` parameter triggers `fetch_changes_since()`. Entries sorted ascending. Fast path bypassed when `since` is set.
Command	`/trace-feature change tracking — trace from materializer winner-change detection through changelog write, since-based fetch, and API response`
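A sketch of the two changelog rules for a single subject/predicate pair: write only when the winner changes, and return entries in ascending timestamp order. `Changelog` and `changelog_key` are hypothetical. The sketch tracks the previous winner in memory, where the materializer would compare against the stored MV. It also treats `since` as exclusive, which the real `fetch_changes_since()` may not. If `MVC:` keys encode `timestamp_nanos` as decimal text, they need fixed-width zero padding for a lexical prefix scan to return entries in numeric order.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Changelog for one (subject, predicate) pair.
#[derive(Default)]
struct Changelog {
    last_winner: Option<String>,
    entries: BTreeMap<u64, String>, // timestamp_nanos -> new winner
}

impl Changelog {
    /// Called after each materialization; records an entry only when the
    /// winner actually changed. Returns whether an entry was written.
    fn on_materialize(&mut self, winner: &str, timestamp_nanos: u64) -> bool {
        if self.last_winner.as_deref() == Some(winner) {
            return false;
        }
        self.last_winner = Some(winner.to_string());
        self.entries.insert(timestamp_nanos, winner.to_string());
        true
    }

    /// Entries strictly after `since`, in ascending timestamp order.
    fn changes_since(&self, since: u64) -> Vec<(u64, &str)> {
        self.entries
            .range((Bound::Excluded(since), Bound::Unbounded))
            .map(|(&ts, winner)| (ts, winner.as_str()))
            .collect()
    }
}

/// Text key with the timestamp zero-padded to 20 digits (enough for
/// u64::MAX), so byte order matches numeric order.
fn changelog_key(subject: &str, predicate: &str, timestamp_nanos: u64) -> String {
    format!("MVC:{}:{}:{:020}", subject, predicate, timestamp_nanos)
}
```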

6.3 Semantic Decay (Confidence Half-Life)

Field	Value
Priority	High
Why it matters	Old assertions should lose influence. If the decay formula is wrong, or decay is not applied before lens resolution, stale high-confidence assertions dominate forever.
Key files	`crates/stemedb-query/src/decay.rs`, `crates/stemedb-query/src/engine.rs`
What to verify	`apply_decay()`: `confidence * 2^(-(age / halflife))`. Applied after filtering, before lens. Zero halflife = no decay (avoids div-by-zero). Future assertions = no decay. Confidence clamped to [0.0, 1.0]. Only confidence changes; other fields preserved. Fast path bypassed.
Command	`/investigate semantic decay — verify formula, application order (after filter, before lens), zero-halflife safety, and field preservation`
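The decay formula from this entry, together with its two guard cases. Timestamps and half-life are assumed to share one unit, and `apply_decay` here is a free function, not the crate's signature. Source-class-aware decay (6.4) would choose `halflife` per assertion; in this sketch a half-life of 0 is the only way to express "no decay".

```rust
/// Semantic decay: confidence * 2^(-(age / halflife)), clamped to [0, 1].
/// Timestamps and half-life share one unit (e.g. nanoseconds).
fn apply_decay(confidence: f64, assertion_ts: u64, now: u64, halflife: u64) -> f64 {
    // Zero half-life disables decay (no division by zero), and
    // assertions timestamped in the future are not decayed either.
    if halflife == 0 || assertion_ts >= now {
        return confidence.clamp(0.0, 1.0);
    }
    let age = (now - assertion_ts) as f64;
    let factor = (-(age / halflife as f64)).exp2();
    (confidence * factor).clamp(0.0, 1.0)
}
```

For example, a 0.8-confidence assertion that is exactly one half-life old decays to 0.4, and at two half-lives to 0.2.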

6.4 Source-Class-Aware Decay

Field	Value
Priority	Medium
Why it matters	Regulatory data should never decay, while Anecdotal data should decay quickly (30-day half-life). If tier-specific half-lives are wrong, the evidence hierarchy is undermined.
Key files	`crates/stemedb-query/src/decay.rs`, `crates/stemedb-core/src/types/source.rs`
What to verify	`SourceClass::default_decay_days()` returns tier-specific half-lives. Tier 0 (Regulatory) = no decay. Tier 5 (Anecdotal) = 30 days. `apply_source_class_decay()` uses per-assertion source_class. Time-travel compatible (uses as_of if set).
Command	`/investigate source-class decay — verify per-tier half-life values, Regulatory no-decay, Anecdotal rapid decay, and as_of interaction`

6.5 Epoch Supersession at Query Time

Field	Value
Priority	High
Why it matters	Superseded epoch assertions should be excluded from results. If markers aren't checked or fail-open is wrong, old paradigm data leaks into current queries.
Key files	`crates/stemedb-lens/src/lib.rs`, `crates/stemedb-storage/src/supersession_store.rs`
What to verify	`is_epoch_superseded()` uses O(1) marker lookup. Assertions from superseded epochs filtered before lens resolution. Fail-open: missing marker = not superseded. Works with all inner lenses.
Command	`/investigate epoch supersession at query time — verify marker lookup, filtering position in pipeline, fail-open behavior, and inner lens compatibility`

7. Source Provenance

7.1 Source Document Storage

Field	Value
Priority	Medium
Why it matters	100% citation recall requires every source document to be retrievable by its hash. If storage or retrieval is broken, provenance claims are unverifiable.
Key files	`crates/stemedb-api/src/handlers/source.rs`
What to verify	`POST /v1/source` stores document at `SRC:{hash}`. BLAKE3 content hash. Base64 encoding for binary-safe transport. 10MB size limit. Content-addressed (idempotent). `GET /v1/provenance/{hash}` retrieves by hash. Format: `[content_type_len:4][content_type][content]`.
Command	`/trace-feature source document storage — trace from POST /v1/source through hashing, storage format, and GET /v1/provenance retrieval`
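A sketch of the stored value layout `[content_type_len:4][content_type][content]`. It assumes a little-endian length prefix and reads "10MB" as 10 MiB of decoded content; both are worth confirming in `source.rs`. BLAKE3 hashing and the `SRC:{hash}` key are left out because they need an external crate. Because the key is derived from the content, storing the same document twice writes the same key, which is what makes the endpoint idempotent.

```rust
/// Assumption: "10MB" read as 10 MiB of decoded (post-base64) content.
const MAX_SOURCE_BYTES: usize = 10 * 1024 * 1024;

/// Encode `[content_type_len:4][content_type][content]`, length little-endian.
fn encode_source(content_type: &str, content: &[u8]) -> Result<Vec<u8>, String> {
    if content.len() > MAX_SOURCE_BYTES {
        return Err(format!(
            "source document is {} bytes; limit is {}",
            content.len(),
            MAX_SOURCE_BYTES
        ));
    }
    let ct = content_type.as_bytes();
    let mut out = Vec::with_capacity(4 + ct.len() + content.len());
    out.extend_from_slice(&(ct.len() as u32).to_le_bytes());
    out.extend_from_slice(ct);
    out.extend_from_slice(content);
    Ok(out)
}

/// Inverse of `encode_source`; `None` on a truncated or non-UTF-8 header.
fn decode_source(buf: &[u8]) -> Option<(&str, &[u8])> {
    let len = buf.get(..4)?;
    let ct_len = u32::from_le_bytes([len[0], len[1], len[2], len[3]]) as usize;
    let end = 4usize.checked_add(ct_len)?;
    let content_type = std::str::from_utf8(buf.get(4..end)?).ok()?;
    Some((content_type, &buf[end..]))
}
```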

7.2 Source Metadata Indexing

Field	Value
Priority	Medium
Why it matters	Queryable metadata fields (journal, doi, platform, study_design) enable filtered searches. If indexing is broken, metadata queries return empty or wrong results.
Key files	`crates/stemedb-storage/src/lib.rs` (SourceMetadataIndexStore), `crates/stemedb-ingest/src/worker/processing.rs`
What to verify	`SMV:{field}:{value}` key pattern. Case-insensitive normalization. IngestWorker extracts indexed fields from `source_metadata` JSON. Query supports `source_journal`, `source_doi`, `source_platform`, `source_study_design` with AND semantics. Malformed JSON gracefully skipped.
Command	`/investigate source metadata indexing — verify index key pattern, case normalization, ingest-time extraction, query-time filtering, and malformed JSON handling`

7.3 Rich Source Metadata (Opaque Blob)

Field	Value
Priority	Low
Why it matters	`source_metadata: Option<Vec<u8>>` stores arbitrary provenance. If serialization roundtrip is broken, metadata is silently lost.
Key files	`crates/stemedb-core/src/types/assertion.rs`, `crates/stemedb-api/src/dto/create.rs`, `crates/stemedb-api/src/dto/responses.rs`
What to verify	`Vec<u8>` field for rkyv zero-copy. API accepts JSON string, converts to bytes. Response converts bytes back with defensive UTF-8 handling. Builder supports `.source_metadata_json()` and `.source_metadata()`.
Command	`/investigate source metadata blob — verify serialization roundtrip, API JSON<->bytes conversion, and defensive UTF-8 handling`

8. API & Integration

8.1 Core CRUD Endpoints

Field	Value
Priority	Critical
Why it matters	These are the primary write and read endpoints. If any is broken, the database is unusable.
Key files	`crates/stemedb-api/src/handlers/assert.rs`, `crates/stemedb-api/src/handlers/vote.rs`, `crates/stemedb-api/src/handlers/epoch.rs`, `crates/stemedb-api/src/handlers/query.rs`, `crates/stemedb-api/src/handlers/health.rs`
What to verify

Endpoint	Method	Handler	Verify
`/v1/assert`	POST	`assert.rs`	Accepts JSON, writes to WAL, returns assertion hash
`/v1/vote`	POST	`vote.rs`	High-throughput vote ingestion with provenance fields
`/v1/epoch`	POST	`epoch.rs`	Creates epoch with optional `supersedes` field
`/v1/query`	GET	`query.rs`	Subject/Predicate/Lens/Lifecycle/Epoch/as_of/since/decay/vector/visual filters
`/v1/health`	GET	`health.rs`	Returns assertion count and uptime

Command	`/trace-feature core API endpoints — trace each CRUD endpoint from HTTP handler through DTO validation, WAL write (or query), and response assembly`

8.2 Advanced Query Endpoints

Field	Value
Priority	High
Why it matters	Specialized query endpoints serve distinct use cases. If routing is wrong, queries silently fall through to the wrong handler.
Key files	`crates/stemedb-api/src/handlers/skeptic.rs`, `crates/stemedb-api/src/handlers/layered.rs`, `crates/stemedb-api/src/handlers/constraints.rs`
What to verify

Endpoint	Method	Handler	Verify
`/v1/skeptic`	GET	`skeptic.rs`	Returns ConflictAnalysis with entropy-based scoring
`/v1/layered`	GET	`layered.rs`	Returns LayeredQueryResponse with per-tier resolution
`/v1/constraints`	GET	`constraints.rs`	Returns ConstraintsResponse with must_use/forbidden/prefer

Command	`/investigate advanced query endpoints — verify each endpoint returns correct response type and that LensDto redirects work`

8.3 Admin Endpoints

Field	Value
Priority	Medium
Why it matters	Admin endpoints control trust decay, gold standards, and escalations. If access control is missing, any agent can manipulate trust.
Key files	`crates/stemedb-api/src/handlers/admin.rs`, `crates/stemedb-api/src/handlers/gold_standard.rs`, `crates/stemedb-api/src/handlers/escalation.rs`
What to verify

Endpoint	Method	Handler	Verify
`/v1/admin/decay-trust-ranks`	POST	`admin.rs`	Batch trust decay with configurable params
`/v1/admin/gold-standards`	POST/GET/DELETE	`gold_standard.rs`	Gold standard CRUD
`/v1/admin/verify-agent`	POST	`gold_standard.rs`	Agent verification against gold standard
`/v1/admin/escalations`	GET	`escalation.rs`	Query escalation events
`/v1/admin/escalations/{id}/resolve`	POST	`escalation.rs`	Resolve escalation

Command	`/investigate admin endpoints — verify each endpoint works, and note whether any access control exists (or is missing)`

8.4 Provenance & Audit Endpoints

Field	Value
Priority	Medium
Why it matters	Audit trail and source provenance are compliance requirements. If broken, query decisions are untraceable.
Key files	`crates/stemedb-api/src/handlers/source.rs`, `crates/stemedb-api/src/handlers/audit.rs`
What to verify

Endpoint	Method	Handler	Verify
`/v1/source`	POST	`source.rs`	Store source document by BLAKE3 hash
`/v1/provenance/{hash}`	GET	`source.rs`	Retrieve source document
`/v1/audit/queries`	GET	`audit.rs`	Query audit history by agent
`/v1/audit/query/{id}`	GET	`audit.rs`	Full reasoning trace for single query

Command	`/trace-feature audit trail — trace from query execution through QueryAudit creation, storage at AUD: key, and retrieval via audit endpoints`

8.5 Quota Meter (TAN)

Field	Value
Priority	Medium
Why it matters	Economic throttling prevents abuse. If the meter doesn't deduct correctly or the middleware doesn't enforce quotas, agents can write unlimited data.
Key files	`crates/stemedb-api/src/middleware/meter.rs`, `crates/stemedb-api/src/handlers/meter.rs`, `crates/stemedb-storage/src/quota_store/`
What to verify	Token Bucket algorithm with per-agent per-hour quotas. Cost model: Assert=10, Vote=1, Query=5+lens, +1/KB payload. `MeterLayer` tower middleware deducts on every request. `GET /v1/meter/quota` returns remaining. `POST /v1/meter/quota/limit` sets custom limits.
Command	`/trace-feature quota meter — trace from HTTP request through MeterLayer middleware, cost calculation, QuotaStore deduction, and quota check endpoint`
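A sketch of the cost model and a token bucket. The per-lens surcharge and the rounding of "+1/KB" are not specified here, so `lens_cost` is a parameter and each started KiB is assumed to cost 1. The bucket holds one hour of quota and refills continuously at quota/3600 tokens per second; the real store may instead reset a fixed hourly window. `Op`, `request_cost`, and `TokenBucket` are hypothetical names.

```rust
/// Request kinds from the cost model in this entry.
enum Op {
    Assert,
    Vote,
    /// `lens_cost` is the per-lens surcharge (real values not listed here).
    Query { lens_cost: u64 },
}

/// Assert=10, Vote=1, Query=5+lens, plus 1 per KiB of payload
/// (assumption: a partial KiB is rounded up).
fn request_cost(op: &Op, payload_bytes: u64) -> u64 {
    let base = match op {
        Op::Assert => 10,
        Op::Vote => 1,
        Op::Query { lens_cost } => 5 + lens_cost,
    };
    base + (payload_bytes + 1023) / 1024
}

/// Per-agent bucket: at most one hour of quota, refilling continuously.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: f64, // seconds
}

impl TokenBucket {
    fn per_hour(quota: u64, now_secs: f64) -> Self {
        let capacity = quota as f64;
        TokenBucket { capacity, tokens: capacity, refill_per_sec: capacity / 3600.0, last_refill: now_secs }
    }

    /// Deduct `cost` if enough tokens remain; otherwise reject and deduct nothing.
    fn try_consume(&mut self, cost: u64, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_refill).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = self.last_refill.max(now_secs);
        if self.tokens >= cost as f64 {
            self.tokens -= cost as f64;
            true
        } else {
            false
        }
    }
}
```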

8.6 OpenAPI / Swagger

Field	Value
Priority	Low
Why it matters	Developer experience. If the OpenAPI spec doesn't match the actual endpoints, SDK generation produces wrong clients.
Key files	`crates/stemedb-api/src/lib.rs` (utoipa annotations)
What to verify	`GET /swagger-ui` serves interactive docs. All endpoints annotated with utoipa. DTOs have proper schema annotations. Endpoint list in OpenAPI matches actual routes.
Command	`/investigate OpenAPI spec — verify all endpoints are annotated, DTO schemas are correct, and swagger-ui is served`

8.7 Vote Provenance Witness

Field	Value
Priority	Medium
Why it matters	Votes with provenance (source_url, observed_context) are cryptographic witnesses. Without validation, votes carry no evidentiary weight.
Key files	`crates/stemedb-core/src/types/voting.rs`, `crates/stemedb-api/src/handlers/vote.rs`, `crates/stemedb-api/src/dto/create.rs`
What to verify	`source_url` max 2048 chars (non-empty if provided). `observed_context` max 64KB. Backward compatible (existing votes without provenance remain valid). API DTOs serialize/deserialize both fields.
Command	`/investigate vote provenance — verify input validation limits, backward compatibility, and API roundtrip for source_url and observed_context`
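A minimal sketch of the validation rules above, written so the boundary cases (empty URL, exactly 2048 vs 2049 characters, just over the context limit) are explicit. `validate_provenance` and `ProvenanceError` are hypothetical names; counting `source_url` in Unicode scalar values and reading "64KB" as 64 × 1024 bytes are assumptions to confirm against `voting.rs`.

```rust
/// Limits from the checklist; 64KB is assumed to mean 64 * 1024 bytes.
pub const MAX_SOURCE_URL_CHARS: usize = 2048;
pub const MAX_OBSERVED_CONTEXT_BYTES: usize = 64 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum ProvenanceError {
    EmptySourceUrl,
    SourceUrlTooLong(usize),
    ObservedContextTooLarge(usize),
}

/// Both fields are optional, so a vote without provenance stays valid
/// (the backward-compatibility requirement).
pub fn validate_provenance(
    source_url: Option<&str>,
    observed_context: Option<&str>,
) -> Result<(), ProvenanceError> {
    if let Some(url) = source_url {
        if url.is_empty() {
            return Err(ProvenanceError::EmptySourceUrl);
        }
        // Counts chars, not bytes (assumption).
        let n = url.chars().count();
        if n > MAX_SOURCE_URL_CHARS {
            return Err(ProvenanceError::SourceUrlTooLong(n));
        }
    }
    if let Some(ctx) = observed_context {
        if ctx.len() > MAX_OBSERVED_CONTEXT_BYTES {
            return Err(ProvenanceError::ObservedContextTooLarge(ctx.len()));
        }
    }
    Ok(())
}
```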

9. Storage Engine

9.1 HybridStore Routing

Field	Value
Priority	Critical
Why it matters	Every KV operation flows through HybridStore. Wrong routing sends write-heavy data to the read-optimized backend (or vice versa), causing performance degradation or correctness issues.
Key files	`crates/stemedb-storage/src/hybrid_backend.rs`, `crates/stemedb-storage/src/fjall_backend.rs`, `crates/stemedb-storage/src/redb_backend.rs`
What to verify	Prefix-based dispatch: fjall (LSM) for write-heavy (`H:`, `V:`, `VC:`, `VW:`, `E:`, `SUPERSEDED:`, `__CURSOR__:`), redb (B-tree) for read-heavy (`S:`, `SP:`, `MV:`, `TR:`, `QA:`, `QT:`, `TP:`, `GS:`, `ESC:`). All KVStore trait methods dispatched correctly. No key falls through unmatched.
Command	`/trace-feature HybridStore routing — trace prefix dispatch logic, verify every key prefix is routed, and confirm no unmatched-prefix fallthrough`
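A minimal sketch of table-driven prefix dispatch that makes "no key falls through unmatched" checkable: `route` returns `None` instead of defaulting. The prefix lists are copied from the row above; `Backend`, `route`, and matching on the leading bytes of the key are assumptions. If keys are laid out subject-first as described in 9.4, the real dispatcher cannot simply match leading bytes, and confirming how it finds the type prefix is part of this check.

```rust
/// Physical backend chosen for a key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// LSM tree: write-heavy prefixes.
    Fjall,
    /// B-tree: read-heavy prefixes.
    Redb,
}

/// Every entry ends in `:`, so no prefix is a prefix of another
/// (`S:` does not match `SP:` or `SUPERSEDED:`) and table order does not matter.
const FJALL_PREFIXES: &[&str] = &["H:", "V:", "VC:", "VW:", "E:", "SUPERSEDED:", "__CURSOR__:"];
const REDB_PREFIXES: &[&str] = &["S:", "SP:", "MV:", "TR:", "QA:", "QT:", "TP:", "GS:", "ESC:"];

fn has_any_prefix(key: &[u8], prefixes: &[&str]) -> bool {
    prefixes.iter().any(|p| key.starts_with(p.as_bytes()))
}

/// Returns `None` for an unknown prefix rather than falling back to a default
/// backend, so an unrouted key surfaces as an error instead of a silent misroute.
pub fn route(key: &[u8]) -> Option<Backend> {
    if has_any_prefix(key, FJALL_PREFIXES) {
        Some(Backend::Fjall)
    } else if has_any_prefix(key, REDB_PREFIXES) {
        Some(Backend::Redb)
    } else {
        None
    }
}
```

The `AUD:` prefix mentioned in 8.4 appears in neither list, so it is worth confirming whether audit records are actually stored under `QA:` or fall through the router.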

9.2 FjallStore Backend

Field	Value
Priority	High
Why it matters	Write-heavy paths depend on fjall. If atomic operations (increment, CAS) don't work correctly under concurrency, vote counts and cursors become corrupted.
Key files	`crates/stemedb-storage/src/fjall_backend.rs`
What to verify	All KVStore trait methods implemented. DashMap per-key locks for atomics. ACID transactions. Error mapping to `StorageError::Backend`.
Command	`/investigate FjallStore — verify atomic operations under concurrent access, DashMap locking, and error mapping`
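A minimal sketch of why per-key locks are needed: each `get`/`put` on the backend is atomic by itself, but an increment or CAS is a read-modify-write that must be serialized per key. `ToyStore` is hypothetical, and a std `Mutex<HashMap<_, Arc<Mutex<()>>>>` stands in for `DashMap` to keep the example dependency-free. Lock poisoning is recovered with `PoisonError::into_inner` rather than `unwrap()`, in line with 10.1.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, PoisonError, RwLock};

/// Toy stand-in for FjallStore: `get` and `put` are individually atomic,
/// but a `get` followed by a `put` is not.
#[derive(Default)]
pub struct ToyStore {
    data: RwLock<HashMap<Vec<u8>, u64>>,
    /// std replacement for `DashMap<Vec<u8>, Arc<Mutex<()>>>`: one lock per key.
    key_locks: Mutex<HashMap<Vec<u8>, Arc<Mutex<()>>>>,
}

impl ToyStore {
    pub fn get(&self, key: &[u8]) -> Option<u64> {
        let data = self.data.read().unwrap_or_else(PoisonError::into_inner);
        data.get(key).copied()
    }

    fn put(&self, key: &[u8], value: u64) {
        let mut data = self.data.write().unwrap_or_else(PoisonError::into_inner);
        data.insert(key.to_vec(), value);
    }

    /// Fetches (or creates) the lock for `key`; the map lock is held only briefly.
    fn lock_for(&self, key: &[u8]) -> Arc<Mutex<()>> {
        let mut locks = self.key_locks.lock().unwrap_or_else(PoisonError::into_inner);
        Arc::clone(locks.entry(key.to_vec()).or_default())
    }

    /// Read-modify-write under the key's lock; other keys are not blocked.
    pub fn increment(&self, key: &[u8], delta: u64) -> u64 {
        let lock = self.lock_for(key);
        let _guard = lock.lock().unwrap_or_else(PoisonError::into_inner);
        let next = self.get(key).unwrap_or(0) + delta;
        self.put(key, next);
        next
    }

    /// Compare-and-swap under the same per-key lock.
    pub fn compare_and_swap(&self, key: &[u8], expected: Option<u64>, new: u64) -> bool {
        let lock = self.lock_for(key);
        let _guard = lock.lock().unwrap_or_else(PoisonError::into_inner);
        if self.get(key) != expected {
            return false;
        }
        self.put(key, new);
        true
    }
}
```

A concurrency test in this shape (many threads incrementing one key, then checking the exact total) is the quickest way to catch a missing or misplaced lock. The lock map here grows without bound; whether the real implementation prunes idle entries is worth noting during the audit.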

9.3 RedbStore Backend

Field	Value
Priority	High
Why it matters	Read-heavy paths depend on redb. If ACID transactions don't commit correctly, materialized views and indexes become corrupted.
Key files	`crates/stemedb-storage/src/redb_backend.rs`
What to verify	All KVStore trait methods implemented. ACID transactions for writes. Prefix scan via range queries. Error mapping to `StorageError::Backend`.
Command	`/investigate RedbStore — verify ACID transactions, prefix scan correctness, and error mapping`
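A minimal sketch of prefix scan expressed as a single range query over an ordered map, which is what "prefix scan via range queries" implies for a B-tree backend. A std `BTreeMap` stands in for a redb table; `scan_prefix` and `prefix_upper_bound` are hypothetical helpers. The subtle part is the exclusive upper bound: it is the shortest key greater than every key with the prefix, and it does not exist when the prefix is empty or all `0xFF`.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Smallest key greater than every key starting with `prefix`, or `None`
/// if no such key exists (empty prefix, or prefix made only of 0xFF bytes).
pub fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.pop() {
        if last < 0xFF {
            end.push(last + 1);
            return Some(end);
        }
    }
    None
}

/// All entries whose key starts with `prefix`, in key order, via one range query.
pub fn scan_prefix<'a>(
    map: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<(&'a [u8], &'a [u8])> {
    let start = Bound::Included(prefix.to_vec());
    let end = match prefix_upper_bound(prefix) {
        Some(e) => Bound::Excluded(e),
        None => Bound::Unbounded,
    };
    map.range((start, end))
        .map(|(k, v)| (k.as_slice(), v.as_slice()))
        .collect()
}
```

An off-by-one in the upper bound typically shows up as `S:` scans leaking `SP:` keys, or as scans that drop keys ending in `0xFF`; both are cheap to test.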

9.4 Key Codec & Subject Co-location

Field	Value
Priority	High
Why it matters	Key codec is the foundation for distributed sharding. If keys aren't co-located by subject, range sharding (Phase 6) will split related data across nodes.
Key files	`crates/stemedb-storage/src/key_codec/mod.rs`, `crates/stemedb-storage/src/key_codec/tests.rs`
What to verify	40+ key builder functions. Subject-prefixed keys use `{subject}\x00` separator. Global keys use `\x00` prefix (sort first). Subject validation. Zero hardcoded key patterns in store files (all use key_codec). `test_subject_colocation` and `test_global_keys_sort_first` pass.
Command	`/investigate key codec — verify subject co-location layout, \x00 separator/prefix usage, and that all 91+ call sites use key_codec (no hardcoded patterns)`
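A minimal sketch of the two key layouts and the ordering properties the named tests check. `subject_key`, `global_key`, the `{subject}\x00{tag}{id}` field order, and the validation rules (non-empty subject, no embedded `\x00`) are assumptions for illustration; the real builders in `key_codec/mod.rs` may differ.

```rust
/// Errors from the hypothetical subject check below.
#[derive(Debug, PartialEq, Eq)]
pub enum KeyError {
    EmptySubject,
    SubjectContainsNul,
}

/// Assumed rules: an empty subject would yield a key starting with `\x00`
/// (indistinguishable from a global key), and an embedded `\x00` would make
/// the subject boundary ambiguous.
fn validate_subject(subject: &str) -> Result<(), KeyError> {
    if subject.is_empty() {
        Err(KeyError::EmptySubject)
    } else if subject.contains('\0') {
        Err(KeyError::SubjectContainsNul)
    } else {
        Ok(())
    }
}

/// `{subject}\x00{tag}{id}`: every record for a subject shares the byte prefix
/// `{subject}\x00`, so the records occupy one contiguous key range.
pub fn subject_key(subject: &str, tag: &str, id: &str) -> Result<Vec<u8>, KeyError> {
    validate_subject(subject)?;
    let mut key = Vec::with_capacity(subject.len() + 1 + tag.len() + id.len());
    key.extend_from_slice(subject.as_bytes());
    key.push(0x00);
    key.extend_from_slice(tag.as_bytes());
    key.extend_from_slice(id.as_bytes());
    Ok(key)
}

/// `\x00{tag}{id}`: the leading NUL sorts before the first byte of any valid
/// subject, so global keys come first in a full scan.
pub fn global_key(tag: &str, id: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(1 + tag.len() + id.len());
    key.push(0x00);
    key.extend_from_slice(tag.as_bytes());
    key.extend_from_slice(id.as_bytes());
    key
}
```

The NUL separator is what keeps `aspirin` and `aspirin2` apart: `aspirin\x00` is never a prefix of an `aspirin2...` key, so a range split on subject boundaries never mixes the two.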

9.5 StorageError Generalization

Field	Value
Priority	Low
Why it matters	The error type was generalized from `Sled` to `Backend(String)`. If any code still references the old variant, it won't compile (but this is worth confirming).
Key files	`crates/stemedb-storage/src/error.rs`
What to verify	`StorageError::Backend(String)` exists. No references to `StorageError::Sled`. Both fjall and redb map their errors correctly.
Command	`/investigate StorageError — verify Backend variant exists, no Sled references remain, and both backends map errors correctly`
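A minimal sketch of what "both backends map errors correctly" can look like: one `From` impl per backend error type, so `?` converts at the call site and callers only ever see `StorageError::Backend`. `FjallError`, `RedbError`, `fjall_get`, `get_via_fjall`, and the `"fjall: "`/`"redb: "` message prefixes are hypothetical stand-ins (the real crates are not used), and any other variants of the real enum are omitted.

```rust
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
pub enum StorageError {
    /// Backend-agnostic failure carrying the backend's message.
    Backend(String),
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::Backend(msg) => write!(f, "storage backend error: {msg}"),
        }
    }
}

impl std::error::Error for StorageError {}

/// Hypothetical stand-ins for the fjall and redb error types.
#[derive(Debug)]
pub struct FjallError(pub String);
#[derive(Debug)]
pub struct RedbError(pub String);

impl From<FjallError> for StorageError {
    fn from(e: FjallError) -> Self {
        StorageError::Backend(format!("fjall: {}", e.0))
    }
}

impl From<RedbError> for StorageError {
    fn from(e: RedbError) -> Self {
        StorageError::Backend(format!("redb: {}", e.0))
    }
}

/// Hypothetical backend call that fails on demand.
fn fjall_get(fail: bool) -> Result<u64, FjallError> {
    if fail {
        Err(FjallError("io failure".to_string()))
    } else {
        Ok(42)
    }
}

/// `?` converts the backend error through `From`, so callers only see `StorageError`.
pub fn get_via_fjall(fail: bool) -> Result<u64, StorageError> {
    Ok(fjall_get(fail)?)
}
```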

10. Cross-Cutting Concerns

10.1 No-Unwrap Enforcement

Field	Value
Priority	Critical
Why it matters	`unwrap()` and `expect()` in production code cause panics. CI enforces these lints at deny level. A single slip crashes the server.
Key files	All `crates/stemedb-*/src/**/*.rs`
What to verify	`clippy::unwrap_used` and `clippy::expect_used` at deny level in the workspace Cargo.toml (`[workspace.lints.clippy]`) or via crate-level attributes; clippy.toml cannot set lint levels, but it can set `allow-unwrap-in-tests`/`allow-expect-in-tests`. No `unwrap()` or `expect()` in production code (test code is allowed). CI runs `cargo clippy -- -D warnings`.
Command	`/investigate no-unwrap enforcement — verify clippy config, scan for any unwrap/expect in production code, and confirm CI enforcement`
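A minimal sketch of the pattern the lints force. The `#[deny]` attribute applies the two lints to one module so the effect is visible locally; workspace-wide, the same levels would normally sit in the root Cargo.toml's `[workspace.lints.clippy]` table, with `[lints] workspace = true` in each member crate. `read_limit` and `LookupError` are hypothetical.

```rust
#[deny(clippy::unwrap_used, clippy::expect_used)]
pub mod production {
    use std::collections::HashMap;

    #[derive(Debug, PartialEq, Eq)]
    pub enum LookupError {
        MissingField(&'static str),
        BadNumber(String),
    }

    /// `fields.get("limit").unwrap().parse().unwrap()` would panic on bad input and
    /// fail `cargo clippy -- -D warnings`; `ok_or` + `?` + `map_err` turn both
    /// failure modes into typed errors the caller can handle.
    pub fn read_limit(fields: &HashMap<String, String>) -> Result<u32, LookupError> {
        let raw = fields.get("limit").ok_or(LookupError::MissingField("limit"))?;
        raw.parse::<u32>()
            .map_err(|e| LookupError::BadNumber(e.to_string()))
    }
}
```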

10.2 Structured Logging

Field	Value
Priority	Medium
Why it matters	`println!`/`eprintln!` bypass structured logging. Without `tracing`, production debugging is impossible.
Key files	All `crates/stemedb-*/src/**/*.rs`
What to verify	`tracing` used everywhere (info!, warn!, error!, debug!). `clippy::print_stdout`/`print_stderr` at warn level. `#[instrument]` on public methods in WAL, storage, ingestion, and lens code. `stemedb-sim` may use `#![allow()]` for CLI output.
Command	`/investigate structured logging — verify tracing usage, clippy print enforcement, and #[instrument] on critical public methods`

10.3 rkyv Zero-Copy Serialization

Field	Value
Priority	High
Why it matters	All data goes through rkyv. If the raw `AllocSerializer` is used instead of the wrapper, serialization may miss fields or produce incompatible formats.
Key files	`crates/stemedb-core/src/serde.rs`
What to verify	`stemedb_core::serde::{serialize, deserialize}` wrapper functions exist. No raw `AllocSerializer` in production code. Roundtrip tests for all core types (Assertion, Vote, Epoch, MaterializedView, ChangeEntry, etc.).
Command	`/investigate rkyv serialization — verify wrapper usage, scan for raw AllocSerializer, and confirm roundtrip tests for all core types`

10.4 Go SDK (steme)

Field	Value
Priority	Medium
Why it matters	The Go SDK is the primary client integration. If it's out of sync with the API, external consumers get errors.
Key files	`sdk/go/steme/client.go`, `sdk/go/steme/assertion.go`, `sdk/go/steme/query.go`, `sdk/go/steme/signer.go`, `sdk/go/steme/types.go`, `sdk/go/steme/errors.go`
What to verify	HTTP client covers all endpoints. Ed25519 signing matches server verification. Fluent builder pattern for assertions and queries. Error types match API error responses. Integration test exists. Types match latest API DTOs.
Command	`/trace-feature Go SDK — trace from client.Assert() through HTTP request construction, Ed25519 signing, and response parsing. Verify all API endpoints have SDK methods.`

10.5 Go ADK Integration

Field	Value
Priority	Medium
Why it matters	ADK-Go tools let AI agents interact with Episteme. If tool definitions are wrong, agents can't use the database.
Key files	`sdk/go/adk/tools.go`, `sdk/go/adk/callbacks.go`, `sdk/go/adk/config.go`, `sdk/go/adk/types.go`
What to verify	Tool definitions match Episteme API capabilities. Callbacks wire correctly. Config supports endpoint and auth. Types match API DTOs.
Command	`/investigate ADK-Go integration — verify tool definitions, callback wiring, and type alignment with latest API`

10.6 SourceClass Enum & Evidence Hierarchy

Field	Value
Priority	Medium
Why it matters	The 6-tier evidence hierarchy (Regulatory through Anecdotal) drives decay rates, authority weights, and layered consensus. If tiers are mis-numbered, the entire evidence model is inverted.
Key files	`crates/stemedb-core/src/types/source.rs`
What to verify	6 tiers: Regulatory(0), Clinical(1), Observational(2), Expert(3), Community(4), Anecdotal(5). `tier()` returns correct ordinal. `default_decay_days()` returns tier-specific values. `authority_weight()` returns tier-specific weights. Serialization preserves tier identity.
Command	`/investigate SourceClass — verify tier numbering, decay days, authority weights, and serialization roundtrip`
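A minimal sketch of pinning the tier ordinals so they cannot be silently inverted: explicit discriminants on a `#[repr(u8)]` enum plus a round-trip check over every variant. The six tiers and their numbers come from the row above; `ALL` and `from_tier` are hypothetical helpers. The real `default_decay_days()` and `authority_weight()` values are not given in this checklist, so they are left out rather than guessed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum SourceClass {
    Regulatory = 0,
    Clinical = 1,
    Observational = 2,
    Expert = 3,
    Community = 4,
    Anecdotal = 5,
}

impl SourceClass {
    /// All tiers, strongest evidence first.
    pub const ALL: [SourceClass; 6] = [
        SourceClass::Regulatory,
        SourceClass::Clinical,
        SourceClass::Observational,
        SourceClass::Expert,
        SourceClass::Community,
        SourceClass::Anecdotal,
    ];

    /// Ordinal taken from the explicit discriminant, so reordering the
    /// variants in the source cannot renumber the tiers.
    pub fn tier(self) -> u8 {
        self as u8
    }

    /// Hypothetical inverse of `tier()`, e.g. for decoding a stored tier number.
    pub fn from_tier(tier: u8) -> Option<Self> {
        match tier {
            0 => Some(Self::Regulatory),
            1 => Some(Self::Clinical),
            2 => Some(Self::Observational),
            3 => Some(Self::Expert),
            4 => Some(Self::Community),
            5 => Some(Self::Anecdotal),
            _ => None,
        }
    }
}
```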

10.7 Simulation Pipeline

Field	Value
Priority	Low
Why it matters	The simulator tests the full pipeline under synthetic load. If it doesn't exercise all features, it provides false confidence.
Key files	`crates/stemedb-sim/src/runner.rs`, `crates/stemedb-sim/src/agent.rs`, `crates/stemedb-sim/src/strategy.rs`
What to verify	Runner exercises write path (assertions, votes, epochs). Agent strategies produce realistic data patterns. Results can be queried through standard query path.
Command	`/investigate simulation pipeline — verify runner exercises assertions/votes/epochs, agent strategies are diverse, and output is queryable`

Summary

Section	Entries	Critical	High	Medium	Low
1. Write Path	8	3	3	1	0
2. Read Path	5	1	4	0	0
3. Lens Resolution	9	0	6	3	0
4. Trust & Safety	6	0	3	3	0
5. Search & Similarity	3	0	2	1	0
6. Time & Decay*	5	0	3	1	0
7. Source Provenance	3	0	0	2	1
8. API & Integration	7	1	1	4	1
9. Storage Engine	5	1	3	0	1
10. Cross-Cutting	7	1	1	4	1
Total	58	7	26	19	4

*Section 6 includes 1 entry at High that spans as_of+epoch interactions

Crate Coverage

Crate	Entries
`stemedb-wal`	1.1, 1.2, 1.3, 1.4
`stemedb-ingest`	1.5, 1.6, 1.7, 1.8
`stemedb-core`	7.3, 10.3, 10.6
`stemedb-storage`	4.1, 4.2, 4.3, 5.1, 5.2, 7.2, 9.1, 9.2, 9.3, 9.4, 9.5
`stemedb-query`	2.1-2.5, 5.3, 6.1-6.4
`stemedb-lens`	3.1-3.9, 4.4, 6.5
`stemedb-api`	8.1-8.7
`stemedb-sim`	10.7
Go SDK (`sdk/go/steme`)	10.4
Go ADK (`sdk/go/adk`)	10.5

Command Index

Type	Count
`/investigate`	38
`/trace-feature`	18
Total	56
