# UAT: Visual Anchoring (pHash Validation)

**Date:** YYYY-MM-DD
**Feature:** Perceptual Hash Provenance
**Status:** [ ] PASS / [ ] FAIL / [ ] BLOCKED

## Scenario

An OCR-extracted claim from a PDF table needs validation against the original visual. The perceptual hash (pHash) of the source image allows:

1. Detecting if the source has been tampered with
2. Fuzzy-matching similar screenshots
3. Provenance tracking to original visual evidence

## Acceptance Criteria

| Criterion | Expected | Met? |
|-----------|----------|------|
| Assertion stored with pHash | visual_hash populated | [ ] |
| Same image = same pHash | Hamming distance = 0 | [ ] |
| Similar image = close pHash | Hamming distance < 10 | [ ] |
| Different image = far pHash | Hamming distance > 20 | [ ] |
| Query by pHash similarity | Returns matching assertions | [ ] |

## Test Matrix

| Step | Action | Expected | Actual | Status |
|------|--------|----------|--------|--------|
| 1 | Ingest assertion with pHash | Hash returned | | [ ] |
| 2 | Query by exact pHash | Assertion returned | | [ ] |
| 3 | Query by similar pHash | Assertion returned (fuzzy) | | [ ] |
| 4 | Query by different pHash | No match | | [ ] |

## pHash Background

Perceptual hashing creates a fingerprint of visual content that:

- Survives JPEG compression
- Survives minor cropping/resizing
- Distinguishes semantically different images

We use an 8-byte (64-bit) pHash.
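
The bit distance between two such hashes can be checked locally before running the queries. The following is a minimal bash sketch and not part of StemeDB: the function name `phash_hamming` is hypothetical, and it assumes both hashes are 16-character hex strings like the ones used in this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a StemeDB command): Hamming distance between
# two 64-bit pHashes given as 16-character hex strings.
phash_hamming() {
  local a="$1" b="$2"
  local hex16='^[0-9a-fA-F]{16}$'
  if [[ ! $a =~ $hex16 || ! $b =~ $hex16 ]]; then
    echo "expected two 16-character hex strings" >&2
    return 1
  fi
  # Number of set bits for each possible 4-bit value (0..15).
  local -a bits=(0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4)
  local dist=0 i x
  for (( i = 0; i < 16; i++ )); do
    # XOR one hex digit at a time; this avoids signed 64-bit
    # overflow in shell arithmetic for hashes with the top bit set.
    x=$(( 0x${a:i:1} ^ 0x${b:i:1} ))
    dist=$(( dist + bits[x] ))
  done
  echo "$dist"
}

# The hash pairs used by the test commands in this document:
phash_hamming a1b2c3d4e5f60718 a1b2c3d4e5f60718   # -> 0
phash_hamming a1b2c3d4e5f60718 a1b2c3d4e5f60719   # -> 1
phash_hamming a1b2c3d4e5f60718 1234567890abcdef   # -> 37
```

Comparing one hex digit at a time keeps the helper independent of how the shell treats 64-bit values whose top bit is set.
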
Hamming distance (the number of differing bits between two hashes) measures similarity:

- 0 = identical
- < 10 = visually similar
- > 20 = different images

## Setup Commands

```bash
# Start StemeDB
cargo run --bin stemedb-api &
sleep 2
```

## Test Commands

### Step 1: Ingest Assertion with Visual Hash

```bash
# pHash of a hypothetical FDA label table screenshot
# In real usage, this would be computed from the actual image
PHASH_HEX="a1b2c3d4e5f60718"

curl -X POST http://localhost:18180/v1/assertions \
  -H "Content-Type: application/json" \
  -d "{
    \"subject\": \"Semaglutide\",
    \"predicate\": \"adverse_event_rate\",
    \"object\": {\"Number\": 0.043},
    \"confidence\": 0.98,
    \"source_class\": \"Regulatory\",
    \"visual_hash\": \"$PHASH_HEX\"
  }"
```

**Expected:** Hash returned
**Actual:**
**Status:** [ ]

### Step 2: Query by Exact pHash

```bash
curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60718"
```

**Expected:** Returns the assertion from Step 1
**Actual:**
**Status:** [ ]

### Step 3: Query by Similar pHash (Hamming distance < 10)

```bash
# Slightly different pHash (1 bit flipped: last byte 0x18 -> 0x19)
curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60719&phash_threshold=10"
```

**Expected:** Returns the assertion (fuzzy match)
**Actual:**
**Status:** [ ]

### Step 4: Query by Different pHash (Hamming distance > 20)

```bash
# Completely different pHash
curl "http://localhost:18180/v1/query?visual_hash=1234567890abcdef&phash_threshold=10"
```

**Expected:** No results (too different)
**Actual:**
**Status:** [ ]

## Sign-Off Checklist

- [ ] visual_hash field stored in assertion
- [ ] Exact pHash match works
- [ ] Fuzzy pHash match within threshold works
- [ ] Different pHash correctly excluded
- [ ] pHash indexed for efficient lookup

## Notes

*pHash computation happens client-side (during extraction). StemeDB stores and indexes the hash but doesn't compute it.*

---

**Tester:**
**Date:**
**Result:**