Full-Cycle Pre-Commit Vision
Date: 2026-02-04
Status: Vision / Gap Analysis
Executive Summary
The pre-commit hook should be a bidirectional knowledge sync, not just a read-only linter. Every commit extracts claims from code, checks them against authority, and records observations back — building project memory and (optionally) contributing to community intelligence.
The Vision: Scan + Sync
┌─────────────────────────────────────────────────────────────┐
│ PRE-COMMIT FLOW │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. EXTRACT What claims does this code make? │
│ (TLS settings, timeouts, crypto, etc.) │
│ │
│ 2. CHECK Against authoritative corpus (Tier 0-2) │
│ Against project's own prior claims │
│ │
│ 3. CLASSIFY │
│ ┌────────────────────┬──────────────────────────────┐ │
│ │ Scenario │ Result │ │
│ ├────────────────────┼──────────────────────────────┤ │
│ │ Authority conflict │ FIX code or ACK deviation │ │
│ │ Self conflict │ Intentional change? Ack it │ │
│ │ Novel claim │ Record as observation │ │
│ │ Unchanged claim │ Update timestamp (heartbeat) │ │
│ └────────────────────┴──────────────────────────────┘ │
│ │
│ 4. UPDATE Store observations to local Episteme │
│ - New claims → Tier 4 assertions │
│ - Changed claims → new version │
│ - Acks → explicit policy decisions │
│ │
│ 5. GATE Exit codes for git hook │
│ - 2 = BLOCK (authority conflict) │
│ - 1 = FLAG (self conflict, review) │
│ - 0 = PASS │
│ │
└─────────────────────────────────────────────────────────────┘
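The CLASSIFY and GATE steps above can be sketched in a few lines of Rust. This is a minimal illustration, not the Aphoria implementation: the `Scenario` enum and the `classify` and `exit_code` functions are hypothetical names, and the authority check is assumed to have already been reduced to a boolean. The exit codes follow the diagram (2 = BLOCK, 1 = FLAG, 0 = PASS).

```rust
/// The four scenarios from the CLASSIFY table (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scenario {
    AuthorityConflict,
    SelfConflict,
    NovelClaim,
    UnchangedClaim,
}

/// Classify one extracted claim.
/// - `violates_authority`: assumed result of the corpus check (Tier 0-2)
/// - `prior`: value this project recorded earlier for the same subject, if any
/// - `current`: value extracted from the code being committed
pub fn classify(violates_authority: bool, prior: Option<&str>, current: &str) -> Scenario {
    // Authority conflicts take precedence over everything else.
    if violates_authority {
        return Scenario::AuthorityConflict;
    }
    match prior {
        None => Scenario::NovelClaim,
        Some(previous) if previous != current => Scenario::SelfConflict,
        Some(_) => Scenario::UnchangedClaim,
    }
}

/// Map a batch of scenarios to the hook exit code: the most severe one wins.
pub fn exit_code(scenarios: &[Scenario]) -> i32 {
    scenarios
        .iter()
        .map(|s| match s {
            Scenario::AuthorityConflict => 2,
            Scenario::SelfConflict => 1,
            Scenario::NovelClaim | Scenario::UnchangedClaim => 0,
        })
        .max()
        .unwrap_or(0)
}
```

Taking the maximum means a single authority conflict blocks the commit even when every other claim passes, which matches the intent of the gate.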
Key Concepts
Observational Claims (Tier 4)
When code makes a claim with no authoritative coverage:
Code: connection_pool.max_size = 25
Authority: (nothing from RFC/OWASP/vendor)
Action: Record as Tier 4 (Observational) assertion
subject: code://rust/myapp/db/connection_pool/max_size
predicate: configured_as
object: "25"
source_class: Observational
This is the project's own belief — not authoritative, but tracked.
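As a rough sketch, the assertion above could be modeled as follows. The field names mirror the example; the `Assertion` and `SourceClass` types and the `concept_path` and `observe` helpers are hypothetical and only illustrate the shape of the data, not the actual StemeDB schema.

```rust
/// Source tier of an assertion. Only the tier discussed here is modeled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SourceClass {
    Observational, // Tier 4: the project's own belief
}

/// One recorded claim, with the fields shown in the example above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Assertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub source_class: SourceClass,
}

/// Build a `code://{lang}/{project}/{path}` subject from its segments.
pub fn concept_path(lang: &str, project: &str, path: &[&str]) -> String {
    format!("code://{}/{}/{}", lang, project, path.join("/"))
}

/// Wrap an extracted value as a Tier 4 observational assertion.
pub fn observe(lang: &str, project: &str, path: &[&str], value: &str) -> Assertion {
    Assertion {
        subject: concept_path(lang, project, path),
        predicate: "configured_as".to_string(),
        object: value.to_string(),
        source_class: SourceClass::Observational,
    }
}
```

Calling `observe("rust", "myapp", &["db", "connection_pool", "max_size"], "25")` yields the assertion shown in the example.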
Self-Conflict Detection
On subsequent commits, detect drift from prior observations:
Prior: connection_pool.max_size = 25 (recorded 2026-01-15)
Now: connection_pool.max_size = 10
Result: SELF-CONFLICT
"You changed max_size from 25 to 10"
"Was this intentional? [ack/revert/explain]"
This catches accidental changes to established patterns.
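A minimal sketch of this comparison, assuming prior observations live in an in-memory map keyed by subject. The `Observation` and `Drift` types and the `detect_drift` function are hypothetical. Novel claims are recorded, unchanged claims get a heartbeat, and changed claims are reported without overwriting the stored value (the sketch assumes the store is only updated once the change is acknowledged).

```rust
use std::collections::BTreeMap;

/// A previously recorded observation (dates kept as plain strings for brevity).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Observation {
    pub value: String,
    pub recorded: String,
    pub last_seen: String,
}

/// A detected self-conflict: the value differs from the prior observation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Drift {
    pub subject: String,
    pub old: String,
    pub new: String,
    pub recorded: String,
}

/// Compare current claims against the store and return every drift found.
pub fn detect_drift(
    store: &mut BTreeMap<String, Observation>,
    current: &[(&str, &str)],
    today: &str,
) -> Vec<Drift> {
    let mut drifts = Vec::new();
    for &(subject, value) in current {
        match store.get_mut(subject) {
            Some(obs) => {
                if obs.value != value {
                    // Changed: report it, but leave the stored value untouched.
                    drifts.push(Drift {
                        subject: subject.to_string(),
                        old: obs.value.clone(),
                        new: value.to_string(),
                        recorded: obs.recorded.clone(),
                    });
                } else {
                    // Unchanged: heartbeat only.
                    obs.last_seen = today.to_string();
                }
            }
            None => {
                // Novel: record as a new observation.
                store.insert(
                    subject.to_string(),
                    Observation {
                        value: value.to_string(),
                        recorded: today.to_string(),
                        last_seen: today.to_string(),
                    },
                );
            }
        }
    }
    drifts
}
```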
The Ack Decision Tree
Conflict detected
│
▼
┌──────────────────┐
│ Source of truth? │
└────────┬─────────┘
│
┌────┴────┐
│ │
Authority Self
│ │
▼ ▼
┌───────┐ ┌────────────┐
│Fix or │ │Intentional │
│comply │ │change? │
└───┬───┘ └─────┬──────┘
│ │
▼ ▼
┌───────────┐ ┌─────────────────┐
│ack: │ │ack: │
│deviation │ │policy_update │
│from_rfc │ │old=25, new=10 │
└───────────┘ └─────────────────┘
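The two leaves of the tree can be expressed as two kinds of acknowledgment record. The sketch below is illustrative only: `ConflictSource`, `Ack`, and `acknowledge` are hypothetical names, and the optional reason anticipates the rationale support described later in this document.

```rust
/// What the conflicting value was checked against.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConflictSource {
    /// An authoritative source, e.g. an RFC or OWASP guideline.
    Authority { reference: String },
    /// The project's own prior observation.
    OwnPrior { old: String, new: String },
}

/// The acknowledgment recorded for each branch of the tree.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Ack {
    /// Code deliberately deviates from an authority.
    Deviation { from: String, reason: Option<String> },
    /// The project intentionally changed its own policy.
    PolicyUpdate { old: String, new: String, reason: Option<String> },
}

/// Turn an acknowledged conflict into the matching record.
pub fn acknowledge(source: ConflictSource, reason: Option<&str>) -> Ack {
    let reason = reason.map(String::from);
    match source {
        ConflictSource::Authority { reference } => Ack::Deviation { from: reference, reason },
        ConflictSource::OwnPrior { old, new } => Ack::PolicyUpdate { old, new, reason },
    }
}
```

Keeping the two cases as distinct variants means a report can separate deviations from authority from routine policy updates without inspecting free-text reasons.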
Community Contribution (Opt-In)
If configured, observations can be anonymously contributed:
# aphoria.toml
[community]
contribute = true
anonymize = true # Strip project-specific paths
Aggregated patterns become community intelligence:
- "90% of Rust projects use pool_size 20-50"
- "This TLS pattern is always acknowledged → lower severity"
- "This JWT pattern is always a real bug → raise severity"
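What "strip project-specific paths" means in practice is still open. One possible reading, sketched below, replaces the project segment of a `code://{lang}/{project}/{path}` subject with a placeholder before anything leaves the machine. The `CommunityConfig` struct mirrors the two config keys above; the function names and the `_` placeholder are assumptions made for illustration.

```rust
/// Mirrors the `[community]` section of aphoria.toml shown above.
pub struct CommunityConfig {
    pub contribute: bool,
    pub anonymize: bool,
}

/// Replace the project segment of a `code://` subject with `_`.
/// Returns `None` if the subject does not have the expected shape.
pub fn anonymize_subject(subject: &str) -> Option<String> {
    let rest = subject.strip_prefix("code://")?;
    let mut parts = rest.splitn(3, '/');
    let lang = parts.next()?;
    let _project = parts.next()?; // dropped on purpose
    let path = parts.next()?;
    Some(format!("code://{}/_/{}", lang, path))
}

/// Decide what, if anything, may be contributed for one subject.
pub fn prepare_for_contribution(config: &CommunityConfig, subject: &str) -> Option<String> {
    if !config.contribute {
        return None; // opt-in: nothing is sent unless explicitly enabled
    }
    if config.anonymize {
        anonymize_subject(subject)
    } else {
        Some(subject.to_string())
    }
}
```

Returning `None` for subjects that cannot be parsed errs on the side of not contributing, which seems the safer default for an opt-in feature.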
End-to-End Example
First Commit (Project Init)
$ git commit -m "Initial API server"
aphoria: Scanning staged files...
aphoria: Extracted 47 claims from 12 files
AUTHORITY CONFLICTS (2):
BLOCK: tls/min_version = TLS_1_1
RFC 8996 deprecates TLS 1.0 and 1.1; use TLS_1_2 or later
FLAG: jwt/expiry = 7d
OWASP recommends <= 24h for access tokens
NOVEL OBSERVATIONS (45):
Recorded 45 observational claims (no authority coverage)
Examples:
- db/pool_size = 25
- api/timeout = 30s
- cache/ttl = 3600s
Action required: Fix 1 BLOCK before committing
Later Commit (Drift Detection)
$ git commit -m "Tune database settings"
aphoria: Scanning staged files...
aphoria: Extracted 3 changed claims
SELF-CONFLICTS (1):
FLAG: db/pool_size changed: 25 → 100
Prior value recorded 2026-01-15
Is this intentional?
Options:
[a]ck - Yes, this is intentional (records policy update)
[r]evert - No, revert to prior value
[e]xplain - Add rationale for the change
Acknowledgment with Rationale
$ aphoria ack db/pool_size --reason "Scaling for Black Friday traffic"
Recorded policy update:
subject: code://rust/myapp/db/pool_size
old_value: 25
new_value: 100
rationale: "Scaling for Black Friday traffic"
timestamp: 2026-02-04T10:30:00Z
Required Capabilities
Currently Implemented ✅
| Capability | Implementation |
|---|---|
| Extract claims from code | Walker + 10 extractors |
| Check against authority | ConceptIndex + corpus |
| Report conflicts | SARIF, JSON, table, markdown |
| Acknowledge conflicts | aphoria ack command |
| Baseline mode | aphoria baseline |
| Diff detection | aphoria diff |
| Exit codes | --exit-code flag |
| Trust Packs | Phase 6 complete |
Gaps ⬜
| Capability | Status | Notes |
|---|---|---|
| Record observational claims | ⬜ | Write Tier 4 assertions for code claims |
| Self-conflict detection | ⬜ | Query prior claims on same subject |
| Claim versioning | ⬜ | Track value changes over time |
| Diff-only scanning | ⬜ | --staged, --since-baseline flags |
| Ack with rationale | ⬜ | --reason flag for ack command |
| Policy update assertions | ⬜ | Record intentional changes as assertions |
| Community contribution | ⬜ | Anonymous pattern telemetry |
| Heartbeat timestamps | ⬜ | Update last-seen on unchanged claims |
Implementation Plan
Phase 4A: Observational Claims
- Add ingest_observations() to LocalEpisteme
- Store code claims as Tier 4 (Observational) assertions
- Key by code://{lang}/{project}/{path} concept paths
- Add --sync flag to aphoria scan to enable write-back
Phase 4B: Self-Conflict Detection
- Before conflict check, query own prior claims
- Compare current extraction to stored observations
- Report changes as SELF-CONFLICT with diff
- New verdict: Drift (distinct from Block/Flag)
Phase 4C: Diff-Only Scanning
- --staged flag: only scan git diff --cached files
- --since-baseline flag: only scan files changed since baseline
- Incremental extraction for fast pre-commit hooks
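A staged-only scan needs the list of staged files first. The sketch below gets it by shelling out to `git diff --cached --name-only`. The `--diff-filter=ACMR` option (added, copied, modified, renamed) is an assumption here, used to skip deleted files. The `staged_files` and `parse_name_only` helpers are hypothetical, and how Aphoria actually integrates with git is not specified in this document.

```rust
use std::io;
use std::process::Command;

/// Split the output of `git diff --name-only` into one path per line.
pub fn parse_name_only(output: &str) -> Vec<String> {
    output
        .lines()
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// List files staged for commit by invoking git in the current directory.
pub fn staged_files() -> io::Result<Vec<String>> {
    let out = Command::new("git")
        .args(["diff", "--cached", "--name-only", "--diff-filter=ACMR"])
        .output()?;
    if !out.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "git diff --cached exited with a non-zero status",
        ));
    }
    Ok(parse_name_only(&String::from_utf8_lossy(&out.stdout)))
}
```

Git quotes paths containing unusual characters in this output; a more robust version would pass `-z` and split on NUL bytes instead of newlines.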
Phase 4D: Enhanced Ack
- --reason "text" flag for acknowledgments
- Store rationale in assertion metadata
- ack for authority conflicts vs update for self-conflicts
- Policy update assertions for intentional drift
Phase 4E: Community Contribution (Optional)
- Anonymous aggregation of observation patterns
- Opt-in telemetry endpoint
- Privacy-preserving path normalization
- Community corpus fed by aggregate patterns
Success Criteria
| Criterion | Metric |
|---|---|
| Pre-commit is fast | < 500ms for staged-only scan |
| Drift is caught | Self-conflicts detected on value changes |
| Memory persists | Observations survive across commits |
| Rationale is preserved | Ack reasons queryable in reports |
| Opt-in works | Community contribution respects config |
Open Questions
- Storage location: .aphoria/ in project root vs ~/.local/share/aphoria/?
- Observation expiry: Should old observations be pruned if not seen in N commits?
- Merge conflicts: How to handle observation conflicts during git merge?
- CI mode: Should CI record observations, or only local dev?