stemedb/uat/consumer-health/WEEK4_EXECUTION_PLAN.md

# Week 4 UAT Execution Plan - Consumer Health

**Date:** 2026-02-05
**Milestone:** stemedb-ontology Week 4 - UAT scenarios documented and verified
**Status:** Infrastructure Ready

## Objective

Validate the four critical Consumer Health UAT scenarios programmatically:
1. GLP-1 Muscle Loss Contradiction (Skeptic Lens)
2. Gastroparesis Multi-Source (Source Hierarchy)
3. Layered Consensus (Per-Tier Positions)
4. Time Travel Query (as_of Snapshot)

## Infrastructure Created

### Integration Test Suite

**Location:** `crates/stemedb-ontology/tests/consumer_health_uat.rs` (relative to the `stemedb` repository root)

**Purpose:** Programmatic validation of UAT scenarios against a running StemeDB API instance.

**Features:**
- HTTP client for API calls
- DTO structures matching API contracts
- Assertion helpers for validation
- Structured test output with pass/fail/skip status
- Environment-aware API URL configuration
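
As an illustration of the environment-aware URL configuration, a minimal sketch of how a test helper might resolve the base URL is shown here. The function names and the choice of `http://localhost:18180` as the fallback are assumptions for illustration; the actual helper in `consumer_health_uat.rs` may differ.

```rust
use std::env;

/// Fallback used when `STEMEDB_API_URL` is unset. The port matches the commands
/// in this plan; treating it as the default is an assumption.
const DEFAULT_API_URL: &str = "http://localhost:18180";

/// Resolve the base URL from an optional override. Whitespace and trailing
/// slashes are trimmed so paths such as `/v1/skeptic` can be appended directly.
fn resolve_api_url(override_url: Option<&str>) -> String {
    let raw = override_url
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .unwrap_or(DEFAULT_API_URL);
    raw.trim_end_matches('/').to_string()
}

/// Read the override from the process environment (hypothetical helper name).
fn api_url_from_env() -> String {
    resolve_api_url(env::var("STEMEDB_API_URL").ok().as_deref())
}
```

Keeping the resolution logic in a pure function (`resolve_api_url`) makes it checkable without mutating the process environment.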

### Test Execution

```bash
# Start StemeDB API
cargo run -p stemedb-api &

# Wait for startup
sleep 2

# Run individual scenarios
STEMEDB_API_URL=http://localhost:18180 cargo test --test consumer_health_uat uat_glp1_muscle_loss_contradiction -- --ignored --nocapture

STEMEDB_API_URL=http://localhost:18180 cargo test --test consumer_health_uat uat_gastroparesis_multi_source -- --ignored --nocapture

STEMEDB_API_URL=http://localhost:18180 cargo test --test consumer_health_uat uat_layered_consensus -- --ignored --nocapture

# Run all scenarios
STEMEDB_API_URL=http://localhost:18180 cargo test --test consumer_health_uat run_all_uat_scenarios -- --ignored --nocapture
```

## Scenario Details

### 1. GLP-1 Muscle Loss Contradiction

**UAT File:** `glp1-muscle-loss-contradiction.md`
**Test Function:** `uat_glp1_muscle_loss_contradiction()`
**Status:** ✅ Ready to Execute

**What it tests:**
- Two peer-reviewed studies with opposing conclusions coexist
- Skeptic Lens surfaces both claims without averaging
- Conflict score >= 0.5 for binary disagreement
- Status = "Contested"
- Both Boolean values present in claims array

**API Endpoints:**
- `POST /v1/assert` - Create Study A and Study B assertions
- `GET /v1/skeptic?subject=Semaglutide:MuscleMass&predicate=muscle_sparing_effect`

**Expected Outcome:**
```json
{
  "status": "Contested",
  "conflict_score": >= 0.5,
  "claims": [
    {"value": {"Boolean": false}, "weight_share": ~0.51},
    {"value": {"Boolean": true}, "weight_share": ~0.49}
  ],
  "candidates_count": 2
}
```

**Validation Checks:**
- ✅ 2 candidates returned
- ✅ 2 distinct claims
- ✅ Conflict score >= 0.5
- ✅ Status = "Contested"
- ✅ Both true and false values present
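
The validation checks above could be expressed roughly as follows. This is a sketch, not the code in the test suite: the `SkepticResponse` and `Claim` types are simplified, hypothetical mirrors of the expected-outcome JSON, and only the fields the checks read are modelled.

```rust
/// Hypothetical, simplified claim: only the Boolean value is modelled.
#[derive(Debug, Clone, PartialEq)]
struct Claim {
    value: bool,
}

/// Hypothetical mirror of the Skeptic Lens response (field names follow the
/// expected-outcome JSON; the Rust types are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct SkepticResponse {
    status: String,
    conflict_score: f64,
    claims: Vec<Claim>,
    candidates_count: usize,
}

/// Run the five checks and return one message per failed check.
fn validate_contradiction(resp: &SkepticResponse) -> Vec<String> {
    let mut failures = Vec::new();
    if resp.candidates_count != 2 {
        failures.push(format!("expected 2 candidates, got {}", resp.candidates_count));
    }
    if resp.claims.len() != 2 {
        failures.push(format!("expected 2 distinct claims, got {}", resp.claims.len()));
    }
    if resp.conflict_score < 0.5 {
        failures.push(format!("conflict score {} is below 0.5", resp.conflict_score));
    }
    if resp.status != "Contested" {
        failures.push(format!("expected status Contested, got {}", resp.status));
    }
    let has_true = resp.claims.iter().any(|c| c.value);
    let has_false = resp.claims.iter().any(|c| !c.value);
    if !(has_true && has_false) {
        failures.push("expected both true and false values".to_string());
    }
    failures
}
```

Collecting failures instead of panicking on the first one lets a single run report every check that did not hold, which is convenient when copying results back into the scenario file.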

---

### 2. Gastroparesis Multi-Source

**UAT File:** `gastroparesis-multi-source.md`
**Test Function:** `uat_gastroparesis_multi_source()`
**Status:** ✅ Ready to Execute

**What it tests:**
- A single regulatory source (Tier 0) dominates despite being outnumbered 100 to 1 by anecdotal (Tier 5) assertions
- Source hierarchy uses tier priority, not just weighted voting
- Layered view shows per-tier breakdown

**API Endpoints:**
- `POST /v1/assert` - Create 1 FDA + 100 Reddit assertions
- `GET /v1/layered?subject=Semaglutide&predicate=gastroparesis_risk`

**Expected Outcome:**
```json
{
  "tiers": [
    {"tier": 0, "source_class": "Regulatory", "candidates_count": 1, ...},
    {"tier": 5, "source_class": "Anecdotal", "candidates_count": 100, ...}
  ],
  "overall_winner": {...},  // From Tier 0
  "total_candidates": 101
}
```

**Validation Checks:**
- ✅ 101 total candidates
- ✅ Tier 0 present with 1 candidate
- ✅ Tier 5 present with 100 candidates
- ✅ Overall winner from Tier 0
- ✅ Tier structure correct
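
To make the difference between tier priority and volume-based voting concrete, here is a small sketch. It assumes that a lower tier number means higher authority (as in the Tier 0 / Tier 5 example) and uses a hypothetical `TierSummary` type; StemeDB's actual selection logic is not shown in this plan and may weigh additional factors.

```rust
/// Hypothetical per-tier summary, modelled on the expected-outcome JSON.
#[derive(Debug, Clone, PartialEq)]
struct TierSummary {
    tier: u8,
    source_class: String,
    candidates_count: usize,
}

/// Tier-priority selection: the populated tier with the lowest number wins,
/// no matter how many candidates the lower-authority tiers hold.
fn winning_tier(tiers: &[TierSummary]) -> Option<&TierSummary> {
    tiers
        .iter()
        .filter(|t| t.candidates_count > 0)
        .min_by_key(|t| t.tier)
}

/// Volume-only selection, shown purely for contrast: the largest tier wins.
fn largest_tier(tiers: &[TierSummary]) -> Option<&TierSummary> {
    tiers.iter().max_by_key(|t| t.candidates_count)
}
```

With 1 Regulatory and 100 Anecdotal candidates, `winning_tier` selects Tier 0 while `largest_tier` selects Tier 5, which is the distinction this scenario is meant to verify.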

---

### 3. Layered Consensus

**UAT File:** `layered-consensus.md`
**Test Function:** `uat_layered_consensus()`
**Status:** ✅ Ready to Execute

**What it tests:**
- Per-tier breakdown shows all populated tiers
- Within-tier conflict calculated (Tier 1 contested, Tier 5 unanimous)
- Cross-tier conflict calculated
- Overall winner from highest authority tier

**API Endpoints:**
- `POST /v1/assert` - Create 2 Clinical (conflicting) + 50 Anecdotal (unanimous)
- `GET /v1/layered?subject=Semaglutide:BodyComposition&predicate=lean_mass_preserved`

**Expected Outcome:**
```json
{
  "tiers": [
    {
      "tier": 1,
      "source_class": "Clinical",
      "candidates_count": 2,
      "conflict_score": > 0.5  // Contested within tier
    },
    {
      "tier": 5,
      "source_class": "Anecdotal",
      "candidates_count": 50,
      "conflict_score": < 0.1  // Unanimous within tier
    }
  ],
  "total_candidates": 52
}
```

**Validation Checks:**
- ✅ 52 total candidates
- ✅ Tier 1 conflict > 0.5
- ✅ Tier 5 conflict < 0.1
- ✅ Both tiers present
- ✅ Overall winner from Tier 1
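
The within-tier thresholds used in these checks (`> 0.5` for contested, `< 0.1` for unanimous) can be captured in a small classifier. This is a sketch under assumptions: the type names are hypothetical, and the `Indeterminate` label for scores between the two thresholds is introduced here for illustration and is not part of the plan.

```rust
/// Hypothetical per-tier record: only the fields the checks need.
#[derive(Debug, Clone, PartialEq)]
struct TierConflict {
    tier: u8,
    conflict_score: f64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Agreement {
    /// Conflict score above 0.5.
    Contested,
    /// Conflict score below 0.1.
    Unanimous,
    /// Neither threshold met (label assumed for this sketch).
    Indeterminate,
}

/// Map a within-tier conflict score onto the plan's two thresholds.
fn classify(conflict_score: f64) -> Agreement {
    if conflict_score > 0.5 {
        Agreement::Contested
    } else if conflict_score < 0.1 {
        Agreement::Unanimous
    } else {
        Agreement::Indeterminate
    }
}

/// Look up a tier by number and classify it; `None` if the tier is absent.
fn agreement_for(tiers: &[TierConflict], tier: u8) -> Option<Agreement> {
    tiers
        .iter()
        .find(|t| t.tier == tier)
        .map(|t| classify(t.conflict_score))
}
```

Returning `None` for a missing tier keeps the "both tiers present" check separate from the threshold checks.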

---

### 4. Time Travel Query

**UAT File:** `time-travel-query.md`
**Test Function:** `uat_time_travel_query()`
**Status:** ⊘ Not Yet Implemented

**What it tests:**
- Query knowledge graph as it existed at a specific timestamp
- Historical snapshot returns only assertions made before the `as_of` timestamp
- Snapshot queries support audit-trail and debugging use cases

**API Endpoints:**
- `GET /v1/query?subject=...&predicate=...&as_of=<timestamp>`

**Blocked By:**
- Implementation of `as_of` parameter in query handlers
- Timestamp filtering in query engine

**Next Steps:**
1. Add `as_of: Option<u64>` to query parameters
2. Filter assertions by timestamp in query engine
3. Update UAT test to use actual API
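
Steps 1 and 2 might look roughly like the sketch below. The `Assertion` type and `visible_at` function are hypothetical, and the plan does not specify whether the `as_of` boundary is inclusive; this sketch treats it as inclusive (`<=`), which should be confirmed against the intended semantics before implementation.

```rust
/// Hypothetical, minimal assertion record for illustrating the filter.
#[derive(Debug, Clone, PartialEq)]
struct Assertion {
    subject: String,
    /// Creation time; the unit (e.g. seconds since the Unix epoch) is an assumption.
    timestamp: u64,
}

/// Return the assertions visible at `as_of`. `None` means "no time travel":
/// every assertion is visible. The inclusive boundary is an assumption.
fn visible_at(assertions: &[Assertion], as_of: Option<u64>) -> Vec<&Assertion> {
    assertions
        .iter()
        .filter(|a| as_of.map_or(true, |cutoff| a.timestamp <= cutoff))
        .collect()
}
```

Modelling the parameter as `Option<u64>` keeps existing queries unchanged: callers that omit `as_of` get the current view.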

---

## Execution Checklist

### Pre-Execution

- [x] Integration test suite created
- [x] Test compilation verified
- [x] API endpoints confirmed to exist
- [x] Data structures validated against API DTOs
- [ ] StemeDB API server running
- [ ] Database initialized
- [ ] Ingest worker running (for assertion processing)

### Execution

- [ ] Run GLP-1 Muscle Loss Contradiction test
- [ ] Capture test output
- [ ] Update `glp1-muscle-loss-contradiction.md` with results
- [ ] Run Gastroparesis Multi-Source test
- [ ] Capture test output
- [ ] Update `gastroparesis-multi-source.md` with results
- [ ] Run Layered Consensus test
- [ ] Capture test output
- [ ] Update `layered-consensus.md` with results
- [ ] Document Time Travel Query as blocked
- [ ] Create issue for Time Travel Query implementation

### Post-Execution

- [ ] All passing tests have markdown files updated with actual results
- [ ] Failing tests have issues created
- [ ] Week 4 sign-off in roadmap
- [ ] Update stemedb-ontology README with UAT status

## Known Issues / Risks

### 1. Signature Requirement

**Issue:** API requires valid Ed25519 signatures for all assertions.

**Current Approach:** Using dummy signatures (`0000...` for agent_id and signature).

**Risk:** The dummy values are not valid signatures, so the tests will fail wherever signature verification is actually enforced.

**Mitigation:** Either:
- Add a test-mode flag that disables signature verification
- Generate valid signatures in test helper
- Use a test agent keypair

### 2. Assertion Ingestion Delay

**Issue:** Assertions go through WAL → Ingest Worker → Index Store.

**Current Approach:** `sleep(2-3 seconds)` after ingestion.

**Risk:** Race conditions if ingestion is slower than expected.

**Mitigation:**
- Increase sleep duration
- Add polling for assertion availability
- Use synchronous ingestion for tests
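
The polling mitigation could be sketched with a generic helper like the one below. It uses only the standard library; the `poll_until` name is hypothetical, and the probe closure stands in for whatever query the test would issue to see whether the assertions have reached the index.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Call `probe` repeatedly until it returns `Some(value)` or `timeout` elapses.
/// The probe always runs at least once, even with a zero timeout.
fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = probe() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(interval);
    }
}
```

Compared with a fixed sleep, this returns as soon as the data is available and fails with a clear timeout when ingestion is slower than expected, rather than producing a misleading assertion failure.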

### 3. Time Travel Query Not Implemented

**Issue:** `as_of` parameter not yet implemented in query handlers.

**Impact:** Scenario 4 will be skipped in Week 4.

**Plan:** Document as future work, implement in Phase 6.

### 4. Database State Isolation

**Issue:** Tests write to the same database and may interfere with each other.

**Current Approach:** Use unique subject/predicate combinations per test.

**Risk:** If tests are re-run, old data may pollute results.

**Mitigation:**
- Use unique identifiers per test run
- Add database cleanup between tests
- Use temporary databases per test
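
For the first mitigation, one possible way to scope subjects to a test run is sketched below. The helper names and the `:run-<id>` suffix format are assumptions for illustration; whether StemeDB subjects can safely carry an extra colon-separated segment would need to be checked.

```rust
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Build a run identifier from the current time and process id.
/// Not cryptographically unique, but sufficient to separate local re-runs.
fn new_run_id() -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    format!("{:x}-{:x}", nanos, process::id())
}

/// Append the run id to a base subject (suffix format is an assumption).
fn scoped_subject(base: &str, run_id: &str) -> String {
    format!("{base}:run-{run_id}")
}
```

A test would generate one run id at startup and use the scoped subject for both the `POST /v1/assert` calls and the subsequent query, so data from earlier runs is never matched.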

## Success Criteria

Week 4 is considered **complete** when:

1. ✅ Integration test suite compiles without errors
2. ✅ All API endpoints referenced in scenarios exist
3. ✅ DTOs match API contracts
4. ⏳ 3 out of 4 scenarios execute successfully (Time Travel deferred)
5. ⏳ UAT markdown files updated with actual results
6. ⏳ All assertion checks pass (conflict scores, counts, status)
7. ⏳ Test output captured and documented
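
Criterion 4 can be checked mechanically from the pass/fail/skip statuses the test suite reports. The sketch below is illustrative only: the `Outcome` and `Summary` types are hypothetical, and it interprets "3 out of 4 scenarios execute successfully" as at least three passes with no failures, skips being allowed.

```rust
/// Per-scenario status, mirroring the pass/fail/skip output of the suite.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Pass,
    Fail,
    Skip,
}

/// Tally of outcomes across all scenarios.
#[derive(Debug, Clone, PartialEq)]
struct Summary {
    passed: usize,
    failed: usize,
    skipped: usize,
}

/// Count how many scenarios ended in each status.
fn summarize(results: &[(&str, Outcome)]) -> Summary {
    let count = |wanted: Outcome| results.iter().filter(|(_, o)| *o == wanted).count();
    Summary {
        passed: count(Outcome::Pass),
        failed: count(Outcome::Fail),
        skipped: count(Outcome::Skip),
    }
}

/// Assumed reading of criterion 4: at least three passes and no failures.
fn scenario_criterion_met(summary: &Summary) -> bool {
    summary.failed == 0 && summary.passed >= 3
}
```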

## Next Steps After Week 4

1. **Week 5:** Implement Time Travel Query (`as_of` parameter)
2. **Week 6:** Add more complex scenarios (multi-tier disagreement, vote integration)
3. **Week 7:** Performance testing (1000s of assertions per tier)
4. **Week 8:** End-to-end workflows (extract → ingest → query → visualize)

## Running the Tests

### Quick Start

```bash
# Terminal 1: Start StemeDB API
cd /Users/jordanwashburn/Workspace/orchard9/stemedb
cargo run -p stemedb-api

# Terminal 2: Run UAT tests (after API is ready)
cd /Users/jordanwashburn/Workspace/orchard9/stemedb
STEMEDB_API_URL=http://localhost:18180 cargo test --test consumer_health_uat run_all_uat_scenarios -- --ignored --nocapture
```

### Individual Scenario Execution

```bash
# GLP-1 Muscle Loss
STEMEDB_API_URL=http://localhost:18180 cargo test --test consumer_health_uat uat_glp1 -- --ignored --nocapture

# Gastroparesis
STEMEDB_API_URL=http://localhost:18180 cargo test --test consumer_health_uat uat_gastroparesis -- --ignored --nocapture

# Layered Consensus
STEMEDB_API_URL=http://localhost:18180 cargo test --test consumer_health_uat uat_layered -- --ignored --nocapture
```

### CI/CD Integration

For automated testing in CI:

```yaml
# .github/workflows/uat.yml
name: Consumer Health UAT

on: [push, pull_request]

jobs:
  uat:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # actions-rs/toolchain is no longer maintained; dtolnay/rust-toolchain installs stable Rust
      - uses: dtolnay/rust-toolchain@stable

      - name: Start StemeDB API
        run: |
          cargo build -p stemedb-api
          cargo run -p stemedb-api &
          sleep 5

      - name: Run UAT Tests
        run: |
          STEMEDB_API_URL=http://localhost:18180 \
          cargo test --test consumer_health_uat -- --ignored --nocapture
```

## Documentation Updates Required

After successful execution:

1. **ai-lookup/index.md:** Add link to UAT results
2. **crates/stemedb-ontology/README.md:** Document test suite
3. **roadmap.md:** Mark Week 4 as complete
4. **uat/consumer-health/README.md:** Update with test results
5. **Each scenario .md file:** Fill in "Actual" columns with real data

---

**Prepared by:** Claude (Defensive Systems Architect)
**Date:** 2026-02-05
**Status:** Infrastructure Complete - Ready for Execution