UAT: Visual Anchoring (pHash Validation)
Date: YYYY-MM-DD
Feature: Perceptual Hash Provenance
Status: [ ] PASS / [ ] FAIL / [ ] BLOCKED
Scenario
An OCR-extracted claim from a PDF table needs validation against the original visual. The perceptual hash (pHash) of the source image allows:
- Detecting if the source has been tampered with
- Fuzzy-matching similar screenshots
- Provenance tracking to original visual evidence
Acceptance Criteria
| Criterion | Expected | Met? |
|---|---|---|
| Assertion stored with pHash | visual_hash populated | [ ] |
| Same image = same pHash | Hamming distance = 0 | [ ] |
| Similar image = close pHash | Hamming distance < 10 | [ ] |
| Different image = far pHash | Hamming distance > 20 | [ ] |
| Query by pHash similarity | Returns matching assertions | [ ] |
Test Matrix
| Step | Action | Expected | Actual | Status |
|---|---|---|---|---|
| 1 | Ingest assertion with pHash | Hash returned | | [ ] |
| 2 | Query by exact pHash | Assertion returned | | [ ] |
| 3 | Query by similar pHash | Assertion returned (fuzzy) | | [ ] |
| 4 | Query by different pHash | No match | | [ ] |
pHash Background
Perceptual hashing creates a fingerprint of visual content that:
- Survives JPEG compression
- Survives minor cropping/resizing
- Distinguishes semantically different images
We use an 8-byte (64-bit) pHash, written as 16 hex digits. Hamming distance (the number of bit positions in which two hashes differ) measures similarity:
- 0 = identical
- < 10 = visually similar
- > 20 = different images
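The thresholds above can be checked locally before running any queries. The following is a minimal POSIX shell sketch, not StemeDB code: `phash_distance` is a hypothetical helper name, and it assumes both hashes are equal-length hex strings like the ones used in this document.

```shell
# Hypothetical helper (not part of StemeDB): Hamming distance between two
# equal-length hex strings, compared one hex digit (4 bits) at a time.
# The body runs in a subshell so its variables do not leak into the caller.
phash_distance() (
    a=$1
    b=$2
    [ "${#a}" -eq "${#b}" ] || { echo "length mismatch" >&2; return 1; }
    dist=0
    while [ -n "$a" ]; do
        da=${a%"${a#?}"}    # first hex digit of a
        db=${b%"${b#?}"}    # first hex digit of b
        a=${a#?}            # drop the digit just read
        b=${b#?}
        x=$(( 0x$da ^ 0x$db ))    # bits that differ within this digit
        while [ "$x" -ne 0 ]; do
            dist=$(( dist + (x & 1) ))
            x=$(( x >> 1 ))
        done
    done
    echo "$dist"
)

phash_distance a1b2c3d4e5f60718 a1b2c3d4e5f60718   # prints 0
phash_distance a1b2c3d4e5f60718 a1b2c3d4e5f60719   # prints 1
phash_distance a1b2c3d4e5f60718 1234567890abcdef   # prints 37
```

Comparing digit by digit sidesteps the signed 64-bit limit of shell arithmetic, which a full 64-bit hash can exceed.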
Setup Commands
# Start StemeDB
cargo run --bin stemedb-api &
sleep 2
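A fixed `sleep 2` may be too short when `cargo run` still has to compile. One possible alternative is to poll until the server answers. This is a sketch under assumptions: `wait_for_api` is a hypothetical helper, and it assumes the server speaks plain HTTP on the same `localhost:18180` address used by the test commands. Any HTTP response, including an error status, is treated as "ready".

```shell
# Hypothetical helper: poll a URL until curl gets any HTTP response.
# Arguments (all optional): URL, number of attempts, delay in seconds.
wait_for_api() (
    url=${1:-http://localhost:18180/}
    attempts=${2:-30}
    delay=${3:-1}
    while [ "$attempts" -gt 0 ]; do
        # Without -f, curl exits 0 on any HTTP status; a refused
        # connection or timeout gives a non-zero exit code.
        if curl -s -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        attempts=$(( attempts - 1 ))
        sleep "$delay"
    done
    return 1
)

# Usage, in place of the fixed sleep:
#   cargo run --bin stemedb-api &
#   wait_for_api || { echo "StemeDB did not come up" >&2; exit 1; }
```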
Test Commands
Step 1: Ingest Assertion with Visual Hash
# pHash of a hypothetical FDA label table screenshot
# In real usage, this would be computed from the actual image
PHASH_HEX="a1b2c3d4e5f60718"
curl -X POST http://localhost:18180/v1/assertions \
  -H "Content-Type: application/json" \
  -d "{
    \"subject\": \"Semaglutide\",
    \"predicate\": \"adverse_event_rate\",
    \"object\": {\"Number\": 0.043},
    \"confidence\": 0.98,
    \"source_class\": \"Regulatory\",
    \"visual_hash\": \"$PHASH_HEX\"
  }"
Expected: Hash returned
Actual:
Status: [ ]
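Because the hash is computed client-side and passed in as a string, a malformed value is easy to send by accident. A small pre-flight check, sketched here as a hypothetical `is_valid_phash` helper, could reject anything that is not exactly 16 hex digits (the 64-bit size stated above). Whether the API itself validates the field is not covered by this document.

```shell
# Hypothetical helper: succeed only if the argument is exactly 16 hex digits.
is_valid_phash() {
    [ "${#1}" -eq 16 ] || return 1        # 64 bits = 16 hex digits
    case $1 in
        *[!0-9a-fA-F]*) return 1 ;;       # reject any non-hex character
    esac
    return 0
}

PHASH_HEX="a1b2c3d4e5f60718"
if is_valid_phash "$PHASH_HEX"; then
    echo "ok: $PHASH_HEX"
else
    echo "invalid pHash: $PHASH_HEX" >&2
fi
```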
Step 2: Query by Exact pHash
curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60718"
Expected: Returns the assertion from Step 1
Actual:
Status: [ ]
Step 3: Query by Similar pHash (Hamming distance < 10)
# Slightly different pHash (last bit flipped: ...18 -> ...19, Hamming distance 1)
curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60719&phash_threshold=10"
Expected: Returns the assertion (fuzzy match)
Actual:
Status: [ ]
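Fixtures at a known distance from the Step 1 hash make the fuzzy-match boundary easier to probe. The sketch below uses a hypothetical `flip_low_bits` helper that flips the n lowest bits of a 16-digit hex hash, which yields a Hamming distance of exactly n. It is an illustration for building test inputs, not something StemeDB provides.

```shell
# Hypothetical helper: flip the n lowest bits (0 <= n <= 16) of a 16-digit
# hex pHash, producing a hash at Hamming distance exactly n from the input.
flip_low_bits() (
    hash=$1
    n=$2
    [ "${#hash}" -eq 16 ] || return 1              # expect 64 bits of hex
    [ "$n" -ge 0 ] && [ "$n" -le 16 ] || return 1
    head=${hash%????}           # first 12 hex digits, left unchanged
    tail=${hash#"$head"}        # last 4 hex digits (the low 16 bits)
    mask=$(( (1 << n) - 1 ))    # n one-bits
    printf '%s%04x\n' "$head" "$(( 0x$tail ^ mask ))"
)

BASE="a1b2c3d4e5f60718"
flip_low_bits "$BASE" 1     # a1b2c3d4e5f60719, distance 1
flip_low_bits "$BASE" 9     # a1b2c3d4e5f606e7, distance 9
flip_low_bits "$BASE" 12    # a1b2c3d4e5f608e7, distance 12
```

With `phash_threshold=10`, the distance-9 fixture should match and the distance-12 fixture should not. A fixture at exactly 10 would show whether the comparison is strict, which this document does not state.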
Step 4: Query by Different pHash (Hamming distance > 20)
# Completely different pHash (Hamming distance 37 from the Step 1 hash)
curl "http://localhost:18180/v1/query?visual_hash=1234567890abcdef&phash_threshold=10"
Expected: No results (too different)
Actual:
Status: [ ]
Sign-Off Checklist
- [ ] visual_hash field stored in assertion
- [ ] Exact pHash match works
- [ ] Fuzzy pHash match within threshold works
- [ ] Different pHash correctly excluded
- [ ] pHash indexed for efficient lookup
Notes
pHash computation happens client-side (during extraction). StemeDB stores and indexes the hash but doesn't compute it.
Tester:
Date:
Result: