Full-Cycle Pre-Commit Vision

Date: 2026-02-04
Status: Vision / Gap Analysis

Executive Summary

The pre-commit hook should be a bidirectional knowledge sync, not just a read-only linter. Every commit extracts claims from code, checks them against authority, and records observations back — building project memory and (optionally) contributing to community intelligence.

The Vision: Scan + Sync

┌─────────────────────────────────────────────────────────────┐
│                     PRE-COMMIT FLOW                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. EXTRACT         What claims does this code make?         │
│                     (TLS settings, timeouts, crypto, etc.)   │
│                                                              │
│  2. CHECK           Against authoritative corpus (Tier 0-2)  │
│                     Against project's own prior claims       │
│                                                              │
│  3. CLASSIFY                                                 │
│     ┌────────────────────┬──────────────────────────────┐    │
│     │ Scenario           │ Result                       │    │
│     ├────────────────────┼──────────────────────────────┤    │
│     │ Authority conflict │ FIX code or ACK deviation    │    │
│     │ Self conflict      │ Intentional change? Ack it   │    │
│     │ Novel claim        │ Record as observation        │    │
│     │ Unchanged claim    │ Update timestamp (heartbeat) │    │
│     └────────────────────┴──────────────────────────────┘    │
│                                                              │
│  4. UPDATE          Store observations to local Episteme     │
│                     - New claims → Tier 4 assertions         │
│                     - Changed claims → new version           │
│                     - Acks → explicit policy decisions       │
│                                                              │
│  5. GATE            Exit codes for git hook                  │
│                     - 2 = BLOCK (authority conflict)         │
│                     - 1 = FLAG (self conflict, review)       │
│                     - 0 = PASS                               │
│                                                              │
└─────────────────────────────────────────────────────────────┘
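Steps 3 and 5 of the flow can be sketched as a small classifier plus an exit-code gate. This is only an illustration: the names `Scenario`, `classify`, and `exit_code` are hypothetical, and both the precedence (authority conflicts outrank self conflicts) and the treatment of a compliant, never-seen claim as novel are assumptions rather than documented Aphoria behavior.

```python
from enum import Enum
from typing import Iterable, Optional


class Scenario(Enum):
    AUTHORITY_CONFLICT = "authority_conflict"
    SELF_CONFLICT = "self_conflict"
    NOVEL = "novel"
    UNCHANGED = "unchanged"


def classify(value: str, authority_ok: Optional[bool], prior: Optional[str]) -> Scenario:
    """Classify one extracted claim.

    authority_ok: None when no authority covers the claim, otherwise
                  whether the value complies with that authority.
    prior:        the project's previously recorded value, or None.
    """
    if authority_ok is False:
        return Scenario.AUTHORITY_CONFLICT
    if prior is None:
        return Scenario.NOVEL  # assumption: first sighting is always "novel"
    if prior != value:
        return Scenario.SELF_CONFLICT
    return Scenario.UNCHANGED


def exit_code(scenarios: Iterable[Scenario]) -> int:
    """Map a scan's scenarios to the hook exit code: 2 = BLOCK, 1 = FLAG, 0 = PASS."""
    seen = set(scenarios)
    if Scenario.AUTHORITY_CONFLICT in seen:
        return 2
    if Scenario.SELF_CONFLICT in seen:
        return 1
    return 0
```

A git hook would pass the result of `exit_code(...)` to `sys.exit`, so any non-zero value aborts the commit.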

Key Concepts

Observational Claims (Tier 4)

When code makes a claim with no authoritative coverage:

Code: connection_pool.max_size = 25
Authority: (nothing from RFC/OWASP/vendor)
Action: Record as Tier 4 (Observational) assertion
        subject: code://rust/myapp/db/connection_pool/max_size
        predicate: configured_as
        object: "25"
        source_class: Observational

This is the project's own belief — not authoritative, but tracked.
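As a rough sketch, the record above could be modeled as a plain data class. The field names follow the example; the class name `Observation` and the helpers `concept_path` and `observe` are hypothetical and not part of any existing Aphoria or StemeDB API.

```python
from dataclasses import asdict, dataclass


@dataclass
class Observation:
    subject: str
    predicate: str
    object: str
    source_class: str = "Observational"
    tier: int = 4


def concept_path(lang: str, project: str, path: str) -> str:
    """Build a code://{lang}/{project}/{path} subject (hypothetical helper)."""
    return f"code://{lang}/{project}/{path.strip('/')}"


def observe(lang: str, project: str, path: str, value) -> Observation:
    """Wrap an extracted value as a Tier 4 observational assertion."""
    return Observation(
        subject=concept_path(lang, project, path),
        predicate="configured_as",
        object=str(value),  # values are stored as strings, as in the example
    )
```

For instance, `asdict(observe("rust", "myapp", "db/connection_pool/max_size", 25))` yields a dictionary with the same fields as the record shown above, plus the assumed `tier` field.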

Self-Conflict Detection

On subsequent commits, detect drift from prior observations:

Prior: connection_pool.max_size = 25 (recorded 2026-01-15)
Now:   connection_pool.max_size = 10

Result: SELF-CONFLICT
        "You changed max_size from 25 to 10"
        "Was this intentional? [ack/revert/explain]"

This catches accidental changes to established patterns.
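A minimal sketch of that comparison, assuming prior observations are available as a mapping from subject to a (value, recorded date) pair. The `SelfConflict` type and the `detect_self_conflicts` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SelfConflict:
    subject: str
    old: str
    new: str
    recorded: str  # date the prior value was recorded

    def message(self) -> str:
        return f"You changed {self.subject} from {self.old} to {self.new}"


def detect_self_conflicts(
    prior: Dict[str, Tuple[str, str]], current: Dict[str, str]
) -> List[SelfConflict]:
    """Return one conflict per subject whose value differs from the stored one."""
    conflicts = []
    for subject, new in current.items():
        if subject not in prior:
            continue  # never seen before: a novel claim, not drift
        old, recorded = prior[subject]
        if old != new:
            conflicts.append(SelfConflict(subject, old, new, recorded))
    return conflicts
```

Subjects that appear only in `current` are skipped here because they are novel claims, which the flow records rather than flags.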

The Ack Decision Tree

Conflict detected
       │
       ▼
┌──────────────────┐
│ Source of truth? │
└────────┬─────────┘
         │
    ┌────┴────┐
    │         │
Authority   Self
    │         │
    ▼         ▼
┌───────┐  ┌────────────┐
│Fix or │  │Intentional │
│comply │  │change?     │
└───┬───┘  └─────┬──────┘
    │            │
    ▼            ▼
┌───────────┐ ┌─────────────────┐
│ack:       │ │ack:             │
│deviation  │ │policy_update    │
│from_rfc   │ │old=25, new=10   │
└───────────┘ └─────────────────┘
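The two branches of the tree could be expressed as a single routing function. This is a sketch under assumptions: `build_ack` and its dictionary keys are hypothetical, chosen to mirror the labels in the diagram.

```python
from typing import Optional


def build_ack(
    source: str,
    subject: str,
    old: Optional[str] = None,
    new: Optional[str] = None,
    reference: Optional[str] = None,
) -> dict:
    """Choose the acknowledgment shape from the source of truth in conflict."""
    if source == "authority":
        # Deviation from an authoritative source (e.g. an RFC), kept on record.
        return {"ack": "deviation", "subject": subject, "from": reference}
    if source == "self":
        # Intentional change to the project's own prior value.
        return {"ack": "policy_update", "subject": subject, "old": old, "new": new}
    raise ValueError(f"unknown source of truth: {source!r}")
```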

Community Contribution (Opt-In)

If configured, observations can be anonymously contributed:

# aphoria.toml
[community]
contribute = true
anonymize = true  # Strip project-specific paths

Aggregated patterns become community intelligence:

  • "90% of Rust projects use pool_size 20-50"
  • "This TLS pattern is always acknowledged → lower severity"
  • "This JWT pattern is always a real bug → raise severity"
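To make the idea concrete, here is a sketch of path anonymization and one aggregate statistic. Both functions are hypothetical; in particular, replacing the project segment with `*` is only one possible reading of "strip project-specific paths".

```python
from typing import Iterable


def anonymize_subject(subject: str) -> str:
    """Replace the project segment of code://{lang}/{project}/{path} with '*'."""
    scheme, rest = subject.split("://", 1)
    lang, _project, *path = rest.split("/")
    return f"{scheme}://{lang}/*/" + "/".join(path)


def share_in_range(values: Iterable[float], low: float, high: float) -> float:
    """Fraction of contributed values that fall within [low, high]."""
    values = list(values)
    if not values:
        return 0.0
    return sum(low <= v <= high for v in values) / len(values)
```

With contributed pool sizes of 25, 30, 100, and 20, `share_in_range(..., 20, 50)` gives 0.75, which is the kind of figure behind a statement like the first bullet above.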

End-to-End Example

First Commit (Project Init)

$ git commit -m "Initial API server"

aphoria: Scanning staged files...
aphoria: Extracted 47 claims from 12 files

AUTHORITY CONFLICTS (2):
  BLOCK: tls/min_version = TLS_1_1
         RFC 8996 deprecates TLS 1.0 and 1.1; TLS_1_2 is the minimum

  FLAG:  jwt/expiry = 7d
         OWASP recommends <= 24h for access tokens

NOVEL OBSERVATIONS (45):
  Recorded 45 observational claims (no authority coverage)
  Examples:
    - db/pool_size = 25
    - api/timeout = 30s
    - cache/ttl = 3600s

Action required: Fix 1 BLOCK before committing

Later Commit (Drift Detection)

$ git commit -m "Tune database settings"

aphoria: Scanning staged files...
aphoria: Extracted 3 changed claims

SELF-CONFLICTS (1):
  FLAG:  db/pool_size changed: 25 → 100
         Prior value recorded 2026-01-15
         Is this intentional?

Options:
  [a]ck  - Yes, this is intentional (records policy update)
  [r]evert - No, revert to prior value
  [e]xplain - Add rationale for the change

Acknowledgment with Rationale

$ aphoria ack db/pool_size --reason "Scaling for Black Friday traffic"

Recorded policy update:
  subject: code://rust/myapp/db/pool_size
  old_value: 25
  new_value: 100
  rationale: "Scaling for Black Friday traffic"
  timestamp: 2026-02-04T10:30:00Z
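A record like the one above could be assembled as follows. This is an illustrative sketch: `policy_update` is a hypothetical helper, and the `now` parameter exists only so the timestamp can be fixed in tests.

```python
from datetime import datetime, timezone
from typing import Optional


def policy_update(
    subject: str,
    old_value: str,
    new_value: str,
    rationale: str,
    now: Optional[datetime] = None,
) -> dict:
    """Build a policy-update record with a UTC timestamp in ISO 8601 form."""
    now = now or datetime.now(timezone.utc)
    return {
        "subject": subject,
        "old_value": old_value,
        "new_value": new_value,
        "rationale": rationale,
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```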

Required Capabilities

Currently Implemented

| Capability | Implementation |
|---|---|
| Extract claims from code | Walker + 10 extractors |
| Check against authority | ConceptIndex + corpus |
| Report conflicts | SARIF, JSON, table, markdown |
| Acknowledge conflicts | aphoria ack command |
| Baseline mode | aphoria baseline |
| Diff detection | aphoria diff |
| Exit codes | --exit-code flag |
| Trust Packs | Phase 6 complete |

Gaps

| Capability | Status | Notes |
|---|---|---|
| Record observational claims | | Write Tier 4 assertions for code claims |
| Self-conflict detection | | Query prior claims on same subject |
| Claim versioning | | Track value changes over time |
| Diff-only scanning | | --staged, --since-baseline flags |
| Ack with rationale | | --reason flag for ack command |
| Policy update assertions | | Record intentional changes as assertions |
| Community contribution | | Anonymous pattern telemetry |
| Heartbeat timestamps | | Update last-seen on unchanged claims |

Implementation Plan

Phase 4A: Observational Claims

  1. Add ingest_observations() to LocalEpisteme
  2. Store code claims as Tier 4 (Observational) assertions
  3. Key by code://{lang}/{project}/{path} concept paths
  4. Add --sync flag to aphoria scan to enable write-back
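The write-back in this phase might behave like the in-memory stand-in below. It is a sketch, not the real LocalEpisteme: the class `InMemoryEpisteme`, its attributes, and the return shape of `ingest_observations` are all assumptions made for illustration.

```python
from typing import Dict, List


class InMemoryEpisteme:
    """Toy stand-in for a local store of Tier 4 observations."""

    def __init__(self) -> None:
        self.history: Dict[str, List[str]] = {}  # subject -> values, oldest first
        self.last_seen: Dict[str, str] = {}      # subject -> heartbeat timestamp

    def ingest_observations(self, claims: Dict[str, str], seen_at: str) -> Dict[str, int]:
        """Record claims and report how many were new, changed, or unchanged."""
        counts = {"new": 0, "changed": 0, "unchanged": 0}
        for subject, value in claims.items():
            versions = self.history.setdefault(subject, [])
            if not versions:
                versions.append(value)   # novel claim: first version
                counts["new"] += 1
            elif versions[-1] != value:
                versions.append(value)   # changed claim: append a new version
                counts["changed"] += 1
            else:
                counts["unchanged"] += 1
            self.last_seen[subject] = seen_at  # heartbeat for every sighting
        return counts
```

Keeping the full value history per subject is one simple way to support both claim versioning and the heartbeat timestamp listed under Gaps.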

Phase 4B: Self-Conflict Detection

  1. Before conflict check, query own prior claims
  2. Compare current extraction to stored observations
  3. Report changes as SELF-CONFLICT with diff
  4. New verdict: Drift (distinct from Block/Flag)

Phase 4C: Diff-Only Scanning

  1. --staged flag: only scan git diff --cached files
  2. --since-baseline flag: only scan files changed since baseline
  3. Incremental extraction for fast pre-commit hooks
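Listing the staged files for item 1 can be done with a single git call. The sketch below uses standard `git diff` options; the function names are hypothetical, and restricting the result to added, copied, modified, and renamed files via `--diff-filter=ACMR` is an assumption about what a scanner would want (deleted files have nothing to scan).

```python
import subprocess
from typing import List


def parse_name_only(output: str) -> List[str]:
    """Split NUL-separated `git diff --name-only -z` output into paths."""
    return [path for path in output.split("\0") if path]


def staged_files(repo_dir: str = ".") -> List[str]:
    """Return paths staged for commit, skipping deletions."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z", "--diff-filter=ACMR"],
        cwd=repo_dir,
        check=True,           # raise if git fails (e.g. not a repository)
        capture_output=True,
        text=True,
    )
    return parse_name_only(result.stdout)
```

The `-z` option keeps paths containing spaces or newlines intact, which matters when the list is fed straight into an extractor.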

Phase 4D: Enhanced Ack

  1. --reason "text" flag for acknowledgments
  2. Store rationale in assertion metadata
  3. Distinguish ack (for authority conflicts) from update (for self-conflicts)
  4. Policy update assertions for intentional drift

Phase 4E: Community Contribution (Optional)

  1. Anonymous aggregation of observation patterns
  2. Opt-in telemetry endpoint
  3. Privacy-preserving path normalization
  4. Community corpus fed by aggregate patterns

Success Criteria

| Criterion | Metric |
|---|---|
| Pre-commit is fast | < 500ms for staged-only scan |
| Drift is caught | Self-conflicts detected on value changes |
| Memory persists | Observations survive across commits |
| Rationale is preserved | Ack reasons queryable in reports |
| Opt-in works | Community contribution respects config |

Open Questions

  1. Storage location: .aphoria/ in project root vs ~/.local/share/aphoria/?
  2. Observation expiry: Should old observations be pruned if not seen in N commits?
  3. Merge conflicts: How to handle observation conflicts during git merge?
  4. CI mode: Should CI record observations, or only local dev?