From 152df4b0b47dfc5ff14e51eb0c360e6fe25e218e Mon Sep 17 00:00:00 2001
From: jordan
Date: Sun, 1 Feb 2026 13:33:03 -0700
Subject: [PATCH] docs: Mark Phase 2.4, 2.5, 2.6 as complete in roadmap

- 2.4 Visual Hash Query: hamming_distance, visual_near/threshold implemented
- 2.5 Vector Field: N/A (Phase 3 work, scaffolding correct)
- 2.6 E2E Integration Test: e2e_pipeline.rs with 5 comprehensive tests

Co-Authored-By: Claude Opus 4.5
---
 crates/stemedb-core/src/lib.rs |  1 +
 roadmap.md                     | 56 +++++++++++++++-------------------
 sdk/go/steme/client.go         |  2 ++
 3 files changed, 28 insertions(+), 31 deletions(-)

diff --git a/crates/stemedb-core/src/lib.rs b/crates/stemedb-core/src/lib.rs
index 8ba0723..e66ab3e 100644
--- a/crates/stemedb-core/src/lib.rs
+++ b/crates/stemedb-core/src/lib.rs
@@ -253,3 +253,4 @@ mod tests {
         assert_eq!(deserialized.weight, 0.8);
     }
 }
+// test hook
diff --git a/roadmap.md b/roadmap.md
index 73d93c3..e140d7b 100644
--- a/roadmap.md
+++ b/roadmap.md
@@ -137,41 +137,35 @@
   - [x] Documentation updated in `ai-lookup/services/lens.md`
   - **Known Limitation:** Filtering only occurs when assertions from the superseding epoch are present in candidates. If all candidates are from old epoch (no new epoch assertions), they pass through (fail-open behavior).

-- [ ] **2.4 Visual Hash Query Support**: Make the stored `visual_hash` queryable.
-  - **Problem:** `visual_hash: Option` exists on `Assertion` and is stored/returned by the API, but there is no way to query by visual similarity. The field is write-only from a query perspective.
-  - **Current state:** `PHash` is `[u8; 8]` (perceptual hash). Stored on assertions. API accepts/returns it. No query parameter. No similarity computation.
-  - [ ] Add `visual_near: Option` (hex-encoded pHash) and `visual_threshold: Option` (max hamming distance, default: 8) to `Query` struct.
-  - [ ] Add `.visual_near(hash, threshold)` to `QueryBuilder`.
-  - [ ] In `Query::matches()`: if `visual_near` is set and assertion has `visual_hash`, compute hamming distance (XOR + popcount on the 8 bytes). If distance <= threshold, match. If assertion has no `visual_hash`, don't match.
-  - [ ] Add `visual_near` and `visual_threshold` to API `QueryParams` DTO.
-  - [ ] Implement `hamming_distance(a: &[u8; 8], b: &[u8; 8]) -> u32` utility function.
-  - [ ] Tests:
-    - [ ] `test_visual_near_exact_match`: Same hash, threshold 0. Matches.
-    - [ ] `test_visual_near_within_threshold`: Hashes differ by 3 bits, threshold 5. Matches.
-    - [ ] `test_visual_near_exceeds_threshold`: Hashes differ by 10 bits, threshold 5. No match.
-    - [ ] `test_visual_near_skips_assertions_without_hash`: Assertion has no visual_hash. Not matched.
-  - [ ] **Note:** This is a brute-force scan approach (O(N) with hamming distance check). A proper VP-tree or BK-tree index is Phase 3. This gives immediate queryability.
+- [x] **2.4 Visual Hash Query Support**: Make the stored `visual_hash` queryable.
+  - **Status:** ✅ COMPLETE
+  - **Implementation:**
+    - [x] `hamming_distance(a: &PHash, b: &PHash) -> u32` in `crates/stemedb-query/src/query.rs` (lines 26-28)
+    - [x] `visual_near: Option` and `visual_threshold: Option` in `Query` struct (lines 84-90)
+    - [x] `.visual_near(hash, threshold)` builder method
+    - [x] `Query::matches()` computes hamming distance when `visual_near` is set
+    - [x] API `QueryParams` DTO has `visual_near` and `visual_threshold`
+    - [x] 10+ tests: exact_match, within_threshold, exceeds_threshold, skips_without_hash, invalid_hex, wrong_length, combines_with_subject, default_threshold, max_threshold, threshold_63_rejects
+  - **Note:** Brute-force O(N) scan. VP-tree/BK-tree index is Phase 3+.

-- [ ] **2.5 Vector Field**: No changes needed. Already roadmapped for Phase 3.
+- [x] **2.5 Vector Field**: No changes needed. Already roadmapped for Phase 3.
+  - **Status:** ✅ N/A (No Phase 2 work required)
   - **Current state:** `vector: Option>` on `Assertion`. Stored and returned by API. No index, no search.
   - **Phase 3 plan:** Integrate `hnsw-rs` or `lance` for k-NN search.
-  - **No Phase 2.5 work required.** The field scaffolding is correct.

-- [ ] **2.6 E2E Integration Test (Write -> Materialize -> Read)**: Prove the full pipeline works end-to-end.
-  - **Problem:** The IngestWorker, Materializer, and QueryEngine have been tested in isolation. The Notify integration between IngestWorker and Materializer is tested with a single notification. No test wires all three components together to verify the full write-materialize-read loop.
-  - **Current state:** IngestWorker has `with_notify(Arc)` (`worker.rs`). Materializer has `run_notified(Arc, Duration)` (`materializer.rs`). QueryEngine has `try_fast_path()` (`engine.rs`). Never tested together.
-  - [ ] Create integration test in `crates/stemedb-query/tests/e2e_pipeline.rs`:
-    - [ ] Setup: Create temp WAL + SledStore + VoteStore + TrustRankStore.
-    - [ ] Wire: IngestWorker with `with_notify(notify)`, Materializer with `run_notified(notify)`, QueryEngine with same store.
-    - [ ] Test steps:
-      1. Write a signed assertion to WAL.
-      2. Run IngestWorker.step() -> verifies assertion ingested to KV.
-      3. Verify IngestWorker triggered Notify.
-      4. Run Materializer.step() -> verifies MV written.
-      5. Execute QueryEngine.execute() with subject+predicate -> verifies fast-path returns MV winner.
-    - [ ] Variant: Write 2 competing assertions, add votes favoring one, run pipeline, verify correct winner in MV.
-    - [ ] Variant: Write assertion, materialize, write new assertion with higher timestamp, re-materialize, verify MV updated, verify query returns new winner.
-  - [ ] Add `stemedb-wal` and `stemedb-ingest` as dev-dependencies in `stemedb-query/Cargo.toml`.
+- [x] **2.6 E2E Integration Test (Write -> Materialize -> Read)**: Prove the full pipeline works end-to-end.
+  - **Status:** ✅ COMPLETE
+  - **Implementation:**
+    - [x] `crates/stemedb-query/tests/e2e_pipeline.rs` with 5 comprehensive tests:
+      - `test_e2e_write_materialize_read` - Basic happy path
+      - `test_e2e_vote_consensus` - Vote-weighted resolution
+      - `test_e2e_update_winner` - Winner changes on re-materialize
+      - `test_e2e_cursor_persistence` - Cursor survives worker restart
+      - `test_e2e_notify_integration` - Event-driven notification channel
+    - [x] `stemedb-wal` and `stemedb-ingest` added as dev-dependencies
+    - [x] Helper functions: `create_signed_assertion()`, `compute_assertion_hash()`, `create_vote()`
+    - [x] Uses Ed25519 signing for authentic signature verification
+    - [x] Also: `crates/stemedb-api/tests/e2e_flow_test.rs` tests the HTTP API layer end-to-end.

 ### Phase 3: The Pilot (BioTech/Pharma)
 *Goal: Prove value in the "High-Liability" beachhead. Close every Camp 4 gap that blocks a credible demo.*
diff --git a/sdk/go/steme/client.go b/sdk/go/steme/client.go
index 257f4f7..81ae9ec 100644
--- a/sdk/go/steme/client.go
+++ b/sdk/go/steme/client.go
@@ -394,3 +394,5 @@ func (c *Client) doJSON(ctx context.Context, method, path string, body any, resu

 	return nil
 }
+
+// test hook
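
Reviewer note: the 2.4 roadmap entry describes the similarity check as XOR plus popcount over the 8 pHash bytes, with assertions that lack a `visual_hash` never matching. The patch does not include `query.rs`, so the following is only a minimal sketch of that rule under stated assumptions. `PHash = [u8; 8]` and the `hamming_distance` signature come from the roadmap text; `visual_match` is a hypothetical helper introduced here for illustration, and the real `Query::matches()` may be structured differently.

```rust
/// Perceptual hash as described in the roadmap entry: 8 bytes (64 bits).
type PHash = [u8; 8];

/// Count the differing bits between two hashes:
/// XOR each byte pair, popcount the result, and sum over all 8 bytes.
fn hamming_distance(a: &PHash, b: &PHash) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}

/// Hypothetical helper (not taken from the patch): a candidate matches when
/// it has a hash and that hash is within `threshold` bits of the query hash.
/// A candidate without a hash never matches.
fn visual_match(query: &PHash, candidate: Option<&PHash>, threshold: u32) -> bool {
    match candidate {
        Some(hash) => hamming_distance(query, hash) <= threshold,
        None => false,
    }
}
```

For example, `[0xFF, 0, 0, 0, 0, 0, 0, 0]` and `[0xF8, 0, 0, 0, 0, 0, 0, 0]` differ in three bits, so they would match at threshold 5 but not at threshold 2. Because every candidate is compared individually, this remains the brute-force O(N) scan that the roadmap note mentions.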