stemedb/uat/consumer-health/glp1-visual-anchoring.md
jordan 8f6506b70a feat: Aphoria scan modes + stemedb-ontology crate + consumer health UAT
2026-02-04 21:57:33 -07:00


UAT: Visual Anchoring (pHash Validation)

Date: YYYY-MM-DD
Feature: Perceptual Hash Provenance
Status: [ ] PASS / [ ] FAIL / [ ] BLOCKED

Scenario

An OCR-extracted claim from a PDF table needs validation against the original visual. The perceptual hash (pHash) of the source image allows:

  1. Detecting if the source has been tampered with
  2. Fuzzy-matching similar screenshots
  3. Tracing provenance back to the original visual evidence

Acceptance Criteria

| Criterion | Expected | Met? |
| --- | --- | --- |
| Assertion stored with pHash | visual_hash populated | [ ] |
| Same image = same pHash | Hamming distance = 0 | [ ] |
| Similar image = close pHash | Hamming distance < 10 | [ ] |
| Different image = far pHash | Hamming distance > 20 | [ ] |
| Query by pHash similarity | Returns matching assertions | [ ] |

Test Matrix

| Step | Action | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 1 | Ingest assertion with pHash | Hash returned | | [ ] |
| 2 | Query by exact pHash | Assertion returned | | [ ] |
| 3 | Query by similar pHash | Assertion returned (fuzzy) | | [ ] |
| 4 | Query by different pHash | No match | | [ ] |

pHash Background

Perceptual hashing creates a fingerprint of visual content that:

  • Survives JPEG compression
  • Survives minor cropping/resizing
  • Distinguishes semantically different images

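We use an 8-byte (64-bit) pHash. The Hamming distance between two hashes (the number of differing bits, from 0 to 64) measures similarity:

  • 0 = identical
  • < 10 = visually similar
  • > 20 = different images

Distances from 10 to 20 fall between the two thresholds and are not exercised by this test.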
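The thresholds above can be checked by hand against the hashes used in the test commands. The following is a minimal bash sketch, not StemeDB code: hamming_hex and classify_distance are hypothetical helper names, and the "unspecified" label for distances 10 to 20 is an assumption, since this document defines no outcome for that range.

```bash
#!/usr/bin/env bash
# Hamming distance between two 64-bit hashes given as 16 hex digits.
# Works nibble by nibble, so it never depends on 64-bit integer overflow.
hamming_hex() {
  local a=$1 b=$2 dist=0 i x
  # Number of set bits for every 4-bit value 0..15.
  local -a bits=(0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4)
  if (( ${#a} != 16 || ${#b} != 16 )); then
    echo "expected two 16-digit hex strings" >&2
    return 1
  fi
  for (( i = 0; i < 16; i++ )); do
    x=$(( 0x${a:i:1} ^ 0x${b:i:1} ))   # differing bits in this nibble
    dist=$(( dist + bits[x] ))
  done
  echo "$dist"
}

# Map a distance onto the categories from this document.
# Distances 10..20 are not defined there, hence "unspecified" (assumption).
classify_distance() {
  local d=$1
  if (( d == 0 )); then
    echo "identical"
  elif (( d < 10 )); then
    echo "similar"
  elif (( d > 20 )); then
    echo "different"
  else
    echo "unspecified"
  fi
}

# Compare the Step 1 hash with the hashes queried in Steps 2, 3 and 4.
base="a1b2c3d4e5f60718"
for other in "$base" a1b2c3d4e5f60719 1234567890abcdef; do
  d=$(hamming_hex "$base" "$other")
  echo "$other distance=$d $(classify_distance "$d")"
done
```

For the three hashes in this document the distances come out as 0, 1 and 37, so the Step 3 vector sits well inside the "similar" band and the Step 4 vector well outside it.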

Setup Commands

# Start StemeDB
cargo run --bin stemedb-api &
sleep 2
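A fixed two-second sleep can be too short when cargo run has to compile first. The sketch below polls until the server answers instead. It assumes the API listens on localhost:18180 (the port used in the test commands) and treats any HTTP response, including an error status, as "up"; wait_for_api is a hypothetical helper name, and no particular health endpoint is assumed.

```bash
#!/usr/bin/env bash
# Poll a URL until the server answers, or give up after a number of attempts.
# curl exits 0 on any HTTP response (no -f flag), so a 404 still counts as "up".
wait_for_api() {
  local url=$1 attempts=${2:-30} i
  for (( i = 0; i < attempts; i++ )); do
    if curl -s -o /dev/null --max-time 1 "$url"; then
      return 0
    fi
    sleep 1   # pause before the next attempt
  done
  return 1
}
```

Usage, in place of the fixed sleep: `wait_for_api http://localhost:18180/ 30 || echo "StemeDB API did not start" >&2`.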

Test Commands

Step 1: Ingest Assertion with Visual Hash

# pHash of a hypothetical FDA label table screenshot
# In real usage, this would be computed from the actual image
PHASH_HEX="a1b2c3d4e5f60718"

curl -X POST http://localhost:18180/v1/assertions \
  -H "Content-Type: application/json" \
  -d "{
    \"subject\": \"Semaglutide\",
    \"predicate\": \"adverse_event_rate\",
    \"object\": {\"Number\": 0.043},
    \"confidence\": 0.98,
    \"source_class\": \"Regulatory\",
    \"visual_hash\": \"$PHASH_HEX\"
  }"

Expected: Hash returned
Actual:
Status: [ ]

Step 2: Query by Exact pHash

curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60718"

Expected: Returns the assertion from Step 1
Actual:
Status: [ ]

Step 3: Query by Similar pHash (Hamming distance < 10)

# Slightly different pHash (1 bit flipped: final hex digit 8 -> 9)
curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60719&phash_threshold=10"

Expected: Returns the assertion (fuzzy match)
Actual:
Status: [ ]
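The Step 3 vector is only 1 bit away from the stored hash, so it does not probe the edge of the threshold. A sketch for generating boundary vectors follows, assuming nothing about the server: flip_low_bits is a hypothetical helper that flips the n lowest bits of a 16-digit hex hash, which gives a Hamming distance of exactly n from the input. Whether a distance equal to phash_threshold counts as a match is not stated in this document, so the distance-10 vector is a probe, not an expectation.

```bash
#!/usr/bin/env bash
# Flip the n lowest bits (0..16) of a 64-bit hash given as 16 hex digits.
# Only the last four hex digits are touched, so no 64-bit arithmetic is needed.
flip_low_bits() {
  local hex=$1 n=$2
  if (( ${#hex} != 16 || n < 0 || n > 16 )); then
    echo "usage: flip_low_bits <16-hex-digits> <0..16>" >&2
    return 1
  fi
  local head=${hex:0:12} tail=${hex:12:4}
  # XOR with a mask of n one-bits flips exactly those n bits.
  printf '%s%04x\n' "$head" $(( 0x$tail ^ ((1 << n) - 1) ))
}

# Vectors just below, at, and just above a threshold of 10.
base="a1b2c3d4e5f60718"
for n in 9 10 11; do
  echo "distance=$n visual_hash=$(flip_low_bits "$base" "$n")"
done
```

Each printed hash can be substituted for the visual_hash value in the Step 3 query, keeping phash_threshold=10.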

Step 4: Query by Different pHash (Hamming distance > 20)

# Unrelated pHash (Hamming distance 37 from the Step 1 hash)
curl "http://localhost:18180/v1/query?visual_hash=1234567890abcdef&phash_threshold=10"

Expected: No results (too different)
Actual:
Status: [ ]

Sign-Off Checklist

  • visual_hash field stored in assertion
  • Exact pHash match works
  • Fuzzy pHash match within threshold works
  • Different pHash correctly excluded
  • pHash indexed for efficient lookup

Notes

pHash computation happens client-side (during extraction). StemeDB stores and indexes the hash but doesn't compute it.


Tester:
Date:
Result: