# UAT: Visual Anchoring (pHash Validation)

**Date:** YYYY-MM-DD
**Feature:** Perceptual Hash Provenance
**Status:** [ ] PASS / [ ] FAIL / [ ] BLOCKED

## Scenario

An OCR-extracted claim from a PDF table needs validation against the original visual. The perceptual hash (pHash) of the source image allows:
1. Detecting if the source has been tampered with
2. Fuzzy-matching similar screenshots
3. Provenance tracking to original visual evidence

## Acceptance Criteria

| Criterion | Expected | Met? |
|-----------|----------|------|
| Assertion stored with pHash | visual_hash populated | [ ] |
| Same image = same pHash | Hamming distance = 0 | [ ] |
| Similar image = close pHash | Hamming distance < 10 | [ ] |
| Different image = far pHash | Hamming distance > 20 | [ ] |
| Query by pHash similarity | Returns matching assertions | [ ] |

## Test Matrix

| Step | Action | Expected | Actual | Status |
|------|--------|----------|--------|--------|
| 1 | Ingest assertion with pHash | Hash returned | | [ ] |
| 2 | Query by exact pHash | Assertion returned | | [ ] |
| 3 | Query by similar pHash | Assertion returned (fuzzy) | | [ ] |
| 4 | Query by different pHash | No match | | [ ] |

## pHash Background

Perceptual hashing creates a fingerprint of visual content that:
- Survives JPEG compression
- Survives minor cropping/resizing
- Distinguishes semantically different images

We use an 8-byte (64-bit) pHash. Hamming distance measures similarity:
- 0 = identical
- < 10 = visually similar
- > 20 = different images

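To check the expected distances by hand, here is a minimal sketch of a Hamming-distance helper in shell. The function name `phash_distance` is hypothetical and not part of StemeDB; it assumes both hashes are given as exactly 16 hex characters. Each hash is split into two 32-bit halves so the arithmetic stays within the shell's signed 64-bit integers.

```bash
# Hypothetical helper: count the bits that differ between two 64-bit pHashes.
# Arguments: two hashes, each written as exactly 16 hex characters.
phash_distance() (
  a=$1 b=$2
  count=0
  # Compare the high 32 bits, then the low 32 bits.
  for pair in "${a%????????}:${b%????????}" "${a#????????}:${b#????????}"; do
    x=$(( 0x${pair%%:*} ^ 0x${pair##*:} ))
    # Kernighan's trick: each iteration clears the lowest set bit.
    while [ "$x" -ne 0 ]; do
      x=$(( x & (x - 1) ))
      count=$(( count + 1 ))
    done
  done
  echo "$count"
)

# Hash values taken from the test commands in this document.
phash_distance a1b2c3d4e5f60718 a1b2c3d4e5f60718   # prints 0
phash_distance a1b2c3d4e5f60718 a1b2c3d4e5f60719   # prints 1
phash_distance a1b2c3d4e5f60718 1234567890abcdef   # prints 37
```

The function body runs in a subshell (parentheses instead of braces), so its working variables do not leak into the tester's session.
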
## Setup Commands

```bash
# Start StemeDB
cargo run --bin stemedb-api &
sleep 2
```

## Test Commands

### Step 1: Ingest Assertion with Visual Hash

```bash
# pHash of a hypothetical FDA label table screenshot
# In real usage, this would be computed from the actual image
PHASH_HEX="a1b2c3d4e5f60718"

curl -X POST http://localhost:18180/v1/assertions \
  -H "Content-Type: application/json" \
  -d "{
    \"subject\": \"Semaglutide\",
    \"predicate\": \"adverse_event_rate\",
    \"object\": {\"Number\": 0.043},
    \"confidence\": 0.98,
    \"source_class\": \"Regulatory\",
    \"visual_hash\": \"$PHASH_HEX\"
  }"
```

**Expected:** Hash returned
**Actual:**
**Status:** [ ]

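Since the pHash is 64 bits, its hex form should be exactly 16 hex characters. A malformed value can be caught before ingesting with a small pre-flight check, sketched below. The name `is_valid_phash` is hypothetical, and accepting uppercase hex digits is an assumption; how the API itself validates `visual_hash` is not specified here.

```bash
# Hypothetical pre-flight check: succeed only for a 16-character hex string.
is_valid_phash() {
  case $1 in
    *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 16 ]              # 8 bytes = 16 hex characters
}

PHASH_HEX="a1b2c3d4e5f60718"
if is_valid_phash "$PHASH_HEX"; then
  echo "pHash format OK"
else
  echo "pHash must be 16 hex characters" >&2
fi
```
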
### Step 2: Query by Exact pHash

```bash
curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60718"
```

**Expected:** Returns the assertion from Step 1
**Actual:**
**Status:** [ ]

### Step 3: Query by Similar pHash (Hamming distance < 10)

```bash
# Slightly different pHash (1 bit flipped: final hex digit 8 -> 9)
curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60719&phash_threshold=10"
```

**Expected:** Returns the assertion (fuzzy match)
**Actual:**
**Status:** [ ]

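To exercise the fuzzy match at distances other than 1, a nearby hash can be derived by XOR-ing the stored hash with a bit mask: every set bit in the mask flips one bit of the hash. The sketch below is illustrative only. `phash_xor_low` is a hypothetical helper name, and it touches only the low 32 bits to keep the shell arithmetic simple.

```bash
# Hypothetical helper: XOR the low 32 bits of a 16-hex-character pHash
# with a mask, producing a hash at a known Hamming distance from the input.
phash_xor_low() (
  hash=$1 mask=$2
  high=${hash%????????}   # first 8 hex characters, left unchanged
  low=${hash#????????}    # last 8 hex characters
  printf '%s%08x\n' "$high" "$(( 0x$low ^ $mask ))"
)

phash_xor_low a1b2c3d4e5f60718 0x1    # prints a1b2c3d4e5f60719 (1 bit flipped)
phash_xor_low a1b2c3d4e5f60718 0xff   # prints a1b2c3d4e5f607e7 (8 bits flipped)
```

The second value differs from the Step 1 hash by 8 bits, so it makes a useful extra probe close to the "< 10" boundary.
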
### Step 4: Query by Different pHash (Hamming distance > 20)

```bash
# Completely different pHash (37 bits differ from the Step 1 hash)
curl "http://localhost:18180/v1/query?visual_hash=1234567890abcdef&phash_threshold=10"
```

**Expected:** No results (too different)
**Actual:**
**Status:** [ ]

## Sign-Off Checklist

- [ ] visual_hash field stored in assertion
- [ ] Exact pHash match works
- [ ] Fuzzy pHash match within threshold works
- [ ] Different pHash correctly excluded
- [ ] pHash indexed for efficient lookup

## Notes

*pHash computation happens client-side (during extraction). StemeDB stores and indexes the hash but doesn't compute it.*

---

**Tester:**
**Date:**
**Result:**