jordan 8f6506b70a feat: Aphoria scan modes + stemedb-ontology crate + consumer health UAT

Major additions:
- Staged scanning modes (working tree, staged, committed) with git integration
- Drift detection for baseline vs current state comparisons
- Hosted API handlers for policy CRUD operations via StemeDB API
- stemedb-ontology crate with domain definitions and medical extractors
- Consumer health vertical UAT scenarios (GLP-1, gastroparesis, etc.)
- Aphoria development skill documentation

Code organization:
- Split large files into focused modules to stay under 500-line limit
- Extracted config tests, episteme helpers/drift/aliases, API helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-04 21:57:33 -07:00

3.3 KiB


UAT: Visual Anchoring (pHash Validation)

Date: YYYY-MM-DD
Feature: Perceptual Hash Provenance
Status: [ ] PASS / [ ] FAIL / [ ] BLOCKED

Scenario

An OCR-extracted claim from a PDF table needs validation against the original visual. The perceptual hash (pHash) of the source image allows:

Detecting if the source has been tampered with
Fuzzy-matching similar screenshots
Provenance tracking to original visual evidence

Acceptance Criteria

Criterion	Expected	Met?
Assertion stored with pHash	visual_hash populated	[ ]
Same image = same pHash	Hamming distance = 0	[ ]
Similar image = close pHash	Hamming distance < 10	[ ]
Different image = far pHash	Hamming distance > 20	[ ]
Query by pHash similarity	Returns matching assertions	[ ]

Test Matrix

Step	Action	Expected	Status
1	Ingest assertion with pHash	Hash returned	[ ]
2	Query by exact pHash	Assertion returned	[ ]
3	Query by similar pHash	Assertion returned (fuzzy)	[ ]
4	Query by different pHash	No match	[ ]

pHash Background

Perceptual hashing creates a fingerprint of visual content that:

Survives JPEG compression
Survives minor cropping/resizing
Distinguishes semantically different images

We use an 8-byte (64-bit) pHash. Similarity is measured by Hamming distance, the number of bit positions in which two hashes differ (0 to 64):

0 = identical
< 10 = visually similar
> 20 = different images
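
The distance bands above can be checked by hand. The following is a minimal Python sketch, assuming hashes are exchanged as 16-character hex strings as in the test commands in this document. The helpers `hamming_distance` and `classify` are hypothetical and not part of StemeDB, and the 10 to 20 range, which the list leaves undefined, is labelled "ambiguous" here purely as an assumption.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the bits that differ between two 64-bit hashes given as hex strings."""
    if len(hex_a) != 16 or len(hex_b) != 16:
        raise ValueError("expected 16 hex characters (8 bytes) per hash")
    # XOR leaves a 1 in every bit position where the two hashes disagree.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def classify(distance: int) -> str:
    """Map a distance onto the bands listed above (hypothetical helper)."""
    if distance == 0:
        return "identical"
    if distance < 10:
        return "similar"
    if distance > 20:
        return "different"
    return "ambiguous"  # assumption: the document does not define 10..20


stored = "a1b2c3d4e5f60718"
print(hamming_distance(stored, stored))              # 0
print(hamming_distance(stored, "a1b2c3d4e5f60719"))  # 1
print(hamming_distance(stored, "1234567890abcdef"))  # 37
```

With the hashes used in this UAT, the Step 3 query hash is 1 bit away from the stored hash and the Step 4 query hash is 37 bits away, so they fall in the similar and different bands respectively.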

Setup Commands

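# Start StemeDB in the background
cargo run --bin stemedb-api &
# Wait until the API port answers; a fixed sleep can expire before the first build finishes
until curl -s -o /dev/null http://localhost:18180/; do sleep 1; done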

Test Commands

Step 1: Ingest Assertion with Visual Hash

# pHash of a hypothetical FDA label table screenshot
# In real usage, this would be computed from the actual image
PHASH_HEX="a1b2c3d4e5f60718"

curl -X POST http://localhost:18180/v1/assertions \
  -H "Content-Type: application/json" \
  -d "{
    \"subject\": \"Semaglutide\",
    \"predicate\": \"adverse_event_rate\",
    \"object\": {\"Number\": 0.043},
    \"confidence\": 0.98,
    \"source_class\": \"Regulatory\",
    \"visual_hash\": \"$PHASH_HEX\"
  }"

Expected: Hash returned
Actual:
Status: [ ]
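
Hand-escaped JSON inside a shell string is easy to get wrong, so the Step 1 request body can also be generated. The sketch below is one way to do that in Python, reusing the field names from the curl command above. The helper `build_assertion` is hypothetical, and the lowercase-hex check is an assumption about what the API accepts, not documented StemeDB behaviour.

```python
import json
import re

# 8 bytes rendered as 16 lowercase hex characters (assumed wire format).
HEX64 = re.compile(r"[0-9a-f]{16}")


def build_assertion(phash_hex: str) -> str:
    """Return the Step 1 request body, rejecting malformed hashes early."""
    if not HEX64.fullmatch(phash_hex):
        raise ValueError(f"visual_hash must be 16 lowercase hex chars, got {phash_hex!r}")
    payload = {
        "subject": "Semaglutide",
        "predicate": "adverse_event_rate",
        "object": {"Number": 0.043},
        "confidence": 0.98,
        "source_class": "Regulatory",
        "visual_hash": phash_hex,
    }
    return json.dumps(payload)


body = build_assertion("a1b2c3d4e5f60718")
print(body)
```

The printed body can be piped into the same curl call by replacing the inline string with `-d @-`, which tells curl to read the request data from standard input.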

Step 2: Query by Exact pHash

curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60718"

Expected: Returns the assertion from Step 1
Actual:
Status: [ ]

Step 3: Query by Similar pHash (Hamming distance < 10)

# Slightly different pHash (1 bit flipped: last byte 0x18 -> 0x19)
curl "http://localhost:18180/v1/query?visual_hash=a1b2c3d4e5f60719&phash_threshold=10"

Expected: Returns the assertion (fuzzy match)
Actual:
Status: [ ]
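
Step 3 exercises a single distance. To probe closer to the threshold, query hashes at a known distance can be derived from the stored hash by toggling specific bits rather than editing hex digits by hand, which makes it easy to miscount. The sketch below assumes bit 0 is the least significant bit; `flip_bits` is a hypothetical helper.

```python
from typing import Iterable


def flip_bits(phash_hex: str, bit_positions: Iterable[int]) -> str:
    """Return phash_hex with the given bits toggled (bit 0 = least significant)."""
    value = int(phash_hex, 16)
    for pos in set(bit_positions):  # de-duplicate so no bit is toggled back
        if not 0 <= pos < 64:
            raise ValueError(f"bit position out of range: {pos}")
        value ^= 1 << pos
    return f"{value:016x}"  # zero-pad back to 16 hex characters


stored = "a1b2c3d4e5f60718"
print(flip_bits(stored, [0]))       # a1b2c3d4e5f60719 (distance 1)
print(flip_bits(stored, [0, 1]))    # a1b2c3d4e5f6071b (distance 2)
print(flip_bits(stored, range(9)))  # a1b2c3d4e5f606e7 (distance 9)
```

The distance-9 value sits just inside a `< 10` cutoff, which makes it a useful boundary case alongside the 1-bit query.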

Step 4: Query by Different pHash (Hamming distance > 20)

# Completely different pHash (Hamming distance 37 from the Step 1 hash)
curl "http://localhost:18180/v1/query?visual_hash=1234567890abcdef&phash_threshold=10"

Expected: No results (too different)
Actual:
Status: [ ]
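
The expected outcomes of Steps 2 through 4 can be predicted offline with a linear scan over the stored hashes. This is a reference model only, not a description of how StemeDB executes the query. It assumes `phash_threshold` is an exclusive upper bound, matching the `< 10` wording in the acceptance criteria, which this document does not confirm. The function name `query_similar` is hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def query_similar(stored_hashes, query_hex, threshold):
    """Return (hash, distance) pairs with distance < threshold, nearest first."""
    scored = [(h, hamming(h, query_hex)) for h in stored_hashes]
    # Assumption: the threshold is exclusive, as in "Hamming distance < 10".
    hits = [(h, d) for h, d in scored if d < threshold]
    return sorted(hits, key=lambda pair: pair[1])


stored = ["a1b2c3d4e5f60718"]  # the assertion ingested in Step 1

print(query_similar(stored, "a1b2c3d4e5f60718", 10))  # [('a1b2c3d4e5f60718', 0)]
print(query_similar(stored, "a1b2c3d4e5f60719", 10))  # [('a1b2c3d4e5f60718', 1)]
print(query_similar(stored, "1234567890abcdef", 10))  # []
```

If the API's results disagree with this model for a hash near the cutoff, the first thing to check is whether the server treats the threshold as inclusive.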

Sign-Off Checklist

visual_hash field stored in assertion
Exact pHash match works
Fuzzy pHash match within threshold works
Different pHash correctly excluded
pHash indexed for efficient lookup

Notes

pHash computation happens client-side (during extraction). StemeDB stores and indexes the hash but doesn't compute it.

Tester:
Date:
Result:
