jml 3b5f88b4f0 feat(aphoria): implement claims architecture (A1-A5) with verify engine, corpus, coverage, and explain

Complete Aphoria claims system overhaul:
- A1: Rename ExtractedClaim to Observation (extractors produce observations, not claims)
- A2: Add AuthoredClaim with full provenance, invariants, and authority tiers
- A3: Verify engine comparing observations against authored claims, CLI + formatters
- A4: Corpus as first-class assertions with predicate indexing, authority lens, trust packs
- A5: Coverage analysis, explain/docs generation, self-audit extractor, claim suggester skill

Also includes: 42 extractors updated for Observation type, verifiable_predicates trait,
conflict detection with comparison modes, claims TOML persistence, Grafana dashboard,
backup/restore scripts, and comprehensive test coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-08 09:11:47 +00:00

13 KiB

Aphoria Roadmap Archive

Completed phases, moved from roadmap.md. Full implementation details are preserved in git history.

Phase 0: StemeDB Foundation ✅

ConceptPath type, hierarchical index, alias store, source class inference, concept API endpoints. All shipped as Phase 5D of the main StemeDB roadmap.

Spec: docs/specs/concept-hierarchy.md

Phase 2: CLI Core ✅

End-to-end CLI pipeline with 10 extractors and bootstrapped corpus of 11 hardcoded assertions.

Task	Status
2.1 Project Walker	✅ `walker/mod.rs`, `walker/path_mapper.rs`, `walker/language.rs`
2.2 Extractors (10)	✅ `tls_verify`, `jwt_config`, `hardcoded_secrets`, `timeout_config`, `dep_versions`, `cors_config`, `rate_limit`, `weak_crypto`, `command_injection`, `sql_injection`
2.3 Ingestion Bridge	✅ `bridge.rs` — BLAKE3 hashing, Ed25519 signing, claim→assertion conversion
2.4 Conflict Query	✅ `episteme.rs` — LocalEpisteme with check_conflicts()
2.5 Report Output	✅ `report/` — table (comfy-table), JSON, SARIF 2.1.0, markdown
2.6 Acknowledge Command	✅ `lib.rs` acknowledge()
Baseline & Diff	✅ `lib.rs` set_baseline(), show_diff()
Status Command	✅ `lib.rs` show_status()

Phase 2 Code Quality Fixes ✅

DES/RC4 concept path misclassification: Split into check_hash_pattern() and check_encryption_pattern()
SHA1 edge case: Documented as intentionally broad
JS exec() regex: Tightened to require child_process. prefix

Phase 2A: Concept Matching ✅

2A.1 Leaf-Based Matching: ConceptIndex with tail-path matching (last 2 segments + predicate)
2A.2 Alias Resolution: Wired AliasStore into QueryEngine.execute() with resolve_aliases: bool
2A.3 Auto-Alias Creation: Auto-creates aliases when code and authority share leaf names
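The leaf-based matching in 2A.1 can be illustrated with a minimal sketch. It assumes concept paths are `/`-separated strings; the `TailKey` type, the `tail_key` function, and the example paths are hypothetical and are not taken from the Aphoria source.

```rust
/// Hypothetical lookup key: the last two path segments plus the predicate.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TailKey {
    pub tail: Vec<String>,
    pub predicate: String,
}

/// Builds a tail key from a concept path.
/// Assumption: segments are separated by '/'; shorter paths keep what they have.
pub fn tail_key(path: &str, predicate: &str) -> TailKey {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let start = segments.len().saturating_sub(2);
    TailKey {
        tail: segments[start..].iter().map(|s| s.to_string()).collect(),
        predicate: predicate.to_string(),
    }
}
```

Two paths with different prefixes, such as a code-derived path and an authority path, produce equal keys whenever their last two segments and predicate agree, which is what lets an index keyed this way find cross-source matches.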

Phase 1: Authoritative Corpus Expansion ✅

Expanded from 11 hardcoded assertions to a pluggable corpus system.

1.1 CorpusBuilder Trait ✅ — name, scheme, default_tier, build, requires_network
1.2 RFC Ingester ✅ — HTTP fetching, RFC 2119 keyword parsing, 8 RFC-specific parsers
1.3 OWASP Ingester ✅ — GitHub raw content, 9 cheat sheet parsers
1.4 Vendor Docs ✅ — PostgreSQL, Redis, reqwest, hyper, Go net/http, tokio-postgres, SQLx
1.5 Hardcoded Refactor ✅ — Original 11 assertions as HardcodedCorpusBuilder
1.6 CLI Integration ✅ — aphoria corpus build/list, --only, --offline, --clear-cache
1.7 Error Handling ✅ — Per-source graceful degradation

Files: corpus/mod.rs, corpus/hardcoded.rs, corpus/rfc.rs, corpus/owasp.rs, corpus/vendor.rs

Phase 3: Skill Integration ✅

3.1 Claude Code Skill ✅ — /aphoria scan, scan --fix, ack, status, diff, init, baseline
3.2 Agent Pre-Flight Hook ✅ — --exit-code (2=BLOCK, 1=FLAG, 0=clean), --strict
3.3 Alias Suggestion ✅ — Auto-alias creation from Phase 2A.3
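The exit-code contract in 3.2 (2=BLOCK, 1=FLAG, 0=clean) could be computed roughly as follows. This is a sketch only: the `Severity` enum and `exit_code` function are hypothetical names, and the effect of `--strict` is not modeled because the source does not describe it.

```rust
/// Hypothetical per-finding severity; only the three outcomes named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Clean,
    Flag,
    Block,
}

/// Maps a set of findings to a process exit code: the worst finding wins.
pub fn exit_code(findings: &[Severity]) -> i32 {
    if findings.contains(&Severity::Block) {
        2
    } else if findings.contains(&Severity::Flag) {
        1
    } else {
        0
    }
}
```

A pre-flight hook can then treat any non-zero status as a reason to stop or warn, depending on how the calling agent is configured.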

Phase 4: Full-Cycle Pre-Commit (Scan + Sync) ✅

Bidirectional knowledge sync: extract → check → classify → update → gate.

4A Observational Claims ✅ — --sync records novel claims as Tier 4 observations
4B Self-Conflict Detection ✅ — Drift detection with Verdict::Drift
4C Diff-Only Scanning ✅ — --staged for fast pre-commit hooks
4D Enhanced Ack ✅ — --reason, aphoria update for policy changes
4E Hosted Mode ✅ — Team aggregation via central StemeDB server, HostedClient

Phase 4.5: Ephemeral Scan Mode ✅

40x faster scans by skipping Episteme storage. Default mode ~0.25s, persistent ~1-2s.

ScanMode enum (Ephemeral default, Persistent opt-in with --persist)
EphemeralDetector — in-memory corpus + ConceptIndex
check_conflicts_pure() extracted as standalone function

Phase 5: Research Agent Loop ✅

5.1 Gap Detection ✅ — detect_gaps() compares claims against ConceptIndex
5.2 Gap Storage ✅ — JSON-backed persistent storage with eligibility tracking
5.3 Quality Validation ✅ — Source attribution, normative language, vague content detection
5.4 Research Execution ✅ — HTTP fetching, normative extraction, confidence scoring
5.5 CLI Integration ✅ — aphoria research run/status/gaps
5.6 Community Corpus ✅ — Opt-in anonymous pattern sharing with privacy-preserving anonymization
5.7 Security Extractors ✅ — weak_crypto, command_injection, sql_injection

Phase 6: Federated Policy & Trust Packs ✅

6.1 Trust Pack Format ✅ — rkyv serialization, Ed25519 signing
6.2 Policy Management ✅ — Local and remote loading with caching
6.3 Core Integration ✅ — EphemeralDetector + LocalEpisteme policy ingestion
6.4 CLI Commands ✅ — aphoria policy export, auto-loading

Phase 6.5: Trust Pack Extensions ✅

6.5.1 Predicate Aliases ✅ — enabled ↔ required ↔ mandatory ↔ enforced
6.5.2 Pack Signing Key Rotation ✅ — aphoria policy resign, signature chain audit trail
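One way to picture the predicate aliases in 6.5.1 is as a canonicalization step applied before predicates are compared. The sketch below assumes that `enabled` is the canonical form and that matching is case-insensitive; both choices, and the function names, are illustrative assumptions.

```rust
/// Collapses the four aliased predicates to one canonical spelling.
/// Assumption: "enabled" is canonical and comparison ignores ASCII case.
pub fn canonical_predicate(predicate: &str) -> String {
    let lower = predicate.trim().to_ascii_lowercase();
    match lower.as_str() {
        "enabled" | "required" | "mandatory" | "enforced" => "enabled".to_string(),
        _ => lower,
    }
}

/// Two predicates are equivalent if they canonicalize to the same string.
pub fn predicates_equivalent(a: &str, b: &str) -> bool {
    canonical_predicate(a) == canonical_predicate(b)
}
```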

Phase 7: Declarative Extractors ✅

TOML-defined custom extractors without Rust code.

7.1 Core Types ✅ — DeclarativeExtractorDef, DeclarativeExtractor
7.2 Configuration ✅ — [[extractors.declarative]] in aphoria.toml
7.3 Validation ✅ — ReDoS protection, confidence validation
7.4 Registry Integration ✅ — Enable/disable, Trust Pack integration
7.5 Error Handling ✅
7.6 Tests ✅ — 22 unit + 7 integration tests

Phase 7.5: LLM-in-the-Loop Extraction ✅

Gemini-powered semantic extraction for high-value files.

7.5.1 LLM Extractor ✅ — GeminiClient, structured JSON output
7.5.2 Selective Triggering ✅ — is_high_value_file(), token budget
7.5.3 Cost Controls ✅ — BLAKE3 caching, budget enforcement
7.5.4 Configuration ✅ — [llm] section in aphoria.toml

Phase 7.6: Pattern Learning Store ✅

Remembers the patterns the LLM finds so they can later be promoted to declarative extractors.

7.6.1 Schema ✅ — LearnedPattern, ClaimTemplate, ValueType
7.6.2 PatternStore ✅ — JSON-backed, RwLock thread safety, Levenshtein dedup
7.6.3 Normalization ✅ — Version/boolean/number/string placeholder replacement
7.6.4 Configuration ✅ — [learning] section
7.6.5 Scan Integration ✅ — Project hash, record/update patterns
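The Levenshtein dedup mentioned in 7.6.2 relies on edit distance between pattern strings. A minimal sketch of that distance, plus a hypothetical `is_near_duplicate` helper, is shown below; the distance threshold is a caller-supplied assumption, since the source does not state the value PatternStore uses.

```rust
/// Classic two-row dynamic-programming Levenshtein distance over chars.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr: Vec<usize> = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1) // deletion
                .min(curr[j] + 1); // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical dedup check: treat patterns within `max_distance` edits as the same.
pub fn is_near_duplicate(a: &str, b: &str, max_distance: usize) -> bool {
    levenshtein(a, b) <= max_distance
}
```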

Phase 7.7: Pattern → Extractor Promotion ✅

Learned patterns become declarative extractors via LLM regex generation.

7.7.1 Pipeline ✅ — PromotionPipeline, RegexGenerator, ExtractorValidator, YamlWriter
7.7.2 Regex Generation ✅ — Multi-example prompt, ReDoS safety
7.7.3 Validation ✅ — Positive tests, timing validation
7.7.4 Human Review ✅ — aphoria extractors review/stats/candidates/promote
7.7.5 Extractor Output ✅ — YAML files in .aphoria/extractors/learned/

Phase 7.8: LLM Prompt Evaluation ✅

Golden fixtures with precision/recall metrics and regression detection.

7.8.1 Fixture Format ✅ — TOML-based with must_contain/must_not_contain
7.8.2 Claim Matching ✅ — Tail-path matching, type coercion
7.8.3 Metrics ✅ — Precision/Recall/F1, per-category breakdown
7.8.4 Harness ✅ — Live/Cached/Mock modes, regression detection
7.8.5 Reports ✅ — Table, JSON, Markdown
7.8.6 CLI ✅ — aphoria eval run/baseline/update-baseline/list-fixtures/validate-fixtures
7.8.7 Seed Fixtures ✅ — 10 fixtures across tls, jwt, secrets, auth, negative, edge
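The metrics in 7.8.3 follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean. A small sketch follows; the `Metrics` struct and `compute_metrics` function are hypothetical names, and returning 0.0 when a denominator is zero is a convention assumed here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Metrics {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

/// Computes precision, recall, and F1 from raw match counts.
/// Assumption: a zero denominator yields 0.0 rather than NaN.
pub fn compute_metrics(true_pos: usize, false_pos: usize, false_neg: usize) -> Metrics {
    let ratio = |num: usize, den: usize| {
        if den == 0 {
            0.0
        } else {
            num as f64 / den as f64
        }
    };
    let precision = ratio(true_pos, true_pos + false_pos);
    let recall = ratio(true_pos, true_pos + false_neg);
    let f1 = if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    };
    Metrics { precision, recall, f1 }
}
```

Comparing the `f1` of a new run against a stored baseline is one simple way to flag a regression.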

Phase 8: Enterprise Extractor Improvements ✅

42 extractors total. Enterprise-grade detection for production codebases.

8.1 High-Entropy Secrets ✅ — Shannon entropy, known prefixes (AWS/Stripe/GitHub/GitLab/Slack)
8.2 Framework Extractors (10) ✅ — Spring, Django, Express, Rails, ASP.NET, Laravel, FastAPI, Next.js, Flask, NestJS
8.3 Config Deep Parsing ✅ — YAML/JSON/TOML tree walking, 11 security rules
8.4 Semantic TLS Version ✅ — Cross-language const detection, Terraform, Kubernetes
8.5 ORM SQL Injection ✅ — Django/SQLAlchemy/GORM/ActiveRecord/Prisma/Sequelize
8.6 Path Traversal ✅
8.7 Unvalidated Redirects ✅
8.8 Weak Password ✅
8.9 Security Headers ✅
8.10 Insecure Deserialization ✅
8.11 SSRF ✅
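The Shannon entropy used by 8.1 measures how evenly a string's symbols are distributed: H = -Σ p(x) log2 p(x). The sketch below computes it over bytes; operating on bytes rather than characters is an assumption, and any cutoff for calling a string "high entropy" is left to the caller because the source does not state one.

```rust
/// Shannon entropy of a string in bits per byte.
/// Returns 0.0 for an empty string.
pub fn shannon_entropy(s: &str) -> f64 {
    let bytes = s.as_bytes();
    if bytes.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts = [0usize; 256];
    for &b in bytes {
        counts[b as usize] += 1;
    }
    let len = bytes.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

Random-looking tokens score higher than natural-language identifiers of the same length, which is why entropy is a useful first filter before more specific prefix checks.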

Phase 9: Autonomous Extractor Generation ✅

Fully self-improving extraction system.

9.1 Autonomous Promotion ✅ — >0.95 confidence, >10 projects, full audit trail
9.2 Shadow Mode Testing ✅ — Isolated metrics, graduation gate, FP tracking
9.3 Auto-Rollback ✅ — FP rate >15% triggers automatic rollback
9.4 Cross-Project Learning ✅ — Privacy-preserving pattern sync, community extractors
9.5 Extractor Versioning ✅ — Changelogs, rollback, A/B comparison
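The thresholds in 9.1 and 9.3 can be read as two simple gates. The sketch below encodes them directly; the struct and function names are hypothetical, and defining the false-positive rate as false positives divided by total findings is an assumption about how the rate is measured.

```rust
/// Hypothetical summary of a candidate extractor's track record.
#[derive(Debug, Clone, Copy)]
pub struct CandidateStats {
    pub confidence: f64,
    pub project_count: u32,
}

/// 9.1 gate: strictly more than 0.95 confidence and more than 10 projects.
pub fn eligible_for_auto_promotion(stats: &CandidateStats) -> bool {
    stats.confidence > 0.95 && stats.project_count > 10
}

/// 9.3 gate: roll back when the false-positive rate exceeds 15%.
/// Assumption: rate = false_positives / total_findings; no findings means no rollback.
pub fn should_roll_back(false_positives: u32, total_findings: u32) -> bool {
    if total_findings == 0 {
        return false;
    }
    (false_positives as f64 / total_findings as f64) > 0.15
}
```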

Phase 10.1: Acknowledgment Expiry ✅

Time-limited exceptions with --expires flag.

--expires 90d or --expires 2026-12-31 (ISO 8601)
Expired acks resurface as BLOCK
Expired acks are preserved for the audit trail, per patent claim 25
All report formatters show expiry info
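The two `--expires` forms shown above (a day count such as `90d`, or an ISO 8601 calendar date) could be parsed along these lines. This is a standard-library-only sketch: the `Expiry` enum and `parse_expires` function are hypothetical, and the date check is deliberately coarse (it does not reject dates such as February 31).

```rust
/// Hypothetical parsed form of an `--expires` value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Expiry {
    /// Relative form, e.g. "90d".
    InDays(u32),
    /// Absolute form, e.g. "2026-12-31".
    OnDate { year: u16, month: u8, day: u8 },
}

/// Parses "<N>d" or "YYYY-MM-DD"; returns None for anything else.
pub fn parse_expires(input: &str) -> Option<Expiry> {
    let input = input.trim();
    if let Some(days) = input.strip_suffix('d') {
        return days.parse::<u32>().ok().map(Expiry::InDays);
    }
    let parts: Vec<&str> = input.split('-').collect();
    if parts.len() != 3 || parts[0].len() != 4 || parts[1].len() != 2 || parts[2].len() != 2 {
        return None;
    }
    let year = parts[0].parse::<u16>().ok()?;
    let month = parts[1].parse::<u8>().ok()?;
    let day = parts[2].parse::<u8>().ok()?;
    // Coarse range check only; month lengths and leap years are not validated.
    if (1..=12).contains(&month) && (1..=31).contains(&day) {
        Some(Expiry::OnDate { year, month, day })
    } else {
        None
    }
}
```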

Files: src/expiry.rs, cli.rs, report/*.rs

Phase 11: Evidence-Based Authority ✅

Evidence levels (ProductSpec > Standard > Research > Commit-only) with evidence-aware graduation.

11.1 Types ✅ — EvidenceLevel, PatternEvidence with ADR/spec/RFC references
11.2 Detection ✅ — Commit message parsing, ADR/spec file detection
11.3 Graduation ✅ — Thresholds vary by evidence (ProductSpec: 1 usage, Commit-only: 10)
11.4 Display ✅ — Evidence chain in output, --evidence filter
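The evidence ordering and the two graduation thresholds stated in 11.3 can be sketched as below. `EvidenceLevel` is named in the source, but its variant spellings here are guesses; thresholds for Standard and Research are not given in this archive, so the sketch returns `None` for them rather than inventing values. Treating "1 usage" as "at least 1" is also an assumption.

```rust
/// Variants are declared weakest-first so the derived ordering gives
/// ProductSpec > Standard > Research > CommitOnly.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EvidenceLevel {
    CommitOnly,
    Research,
    Standard,
    ProductSpec,
}

/// Usages required before a pattern graduates, where the source states it.
pub fn graduation_threshold(level: EvidenceLevel) -> Option<u32> {
    match level {
        EvidenceLevel::ProductSpec => Some(1),
        EvidenceLevel::CommitOnly => Some(10),
        // Not stated in the archive; left unknown on purpose.
        EvidenceLevel::Standard | EvidenceLevel::Research => None,
    }
}

/// Some(true/false) when the threshold is known, None otherwise.
pub fn can_graduate(level: EvidenceLevel, usages: u32) -> Option<bool> {
    graduation_threshold(level).map(|needed| usages >= needed)
}
```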

Files: src/evidence/mod.rs, evidence/types.rs, evidence/detection.rs

Phase 12: Knowledge Scope Hierarchy ✅

Organization → Team → Project scope levels with inheritance.

12.1 Scope Types ✅ — ScopeLevel enum, ScopeConfig
12.2 Inheritance ✅ — Security: no opt-out, Conventions: override with justification
12.3 Override Workflow ✅ — Justification + evidence required
12.4 Cross-Scope Queries ✅ — --scope org/team/project, --exclude-inherited
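The inheritance rules in 12.2 and 12.3 amount to a small decision: security rules can never be overridden by a narrower scope, while conventions can be overridden when both a justification and evidence are supplied. A minimal sketch, with hypothetical `RuleKind` and `OverrideRequest` types that are not taken from the Aphoria source:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuleKind {
    Security,
    Convention,
}

/// Hypothetical override request submitted by a team or project scope.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OverrideRequest {
    pub justification: String,
    pub evidence: Vec<String>,
}

/// Security rules have no opt-out; conventions need justification plus evidence.
pub fn override_allowed(kind: RuleKind, request: &OverrideRequest) -> bool {
    match kind {
        RuleKind::Security => false,
        RuleKind::Convention => {
            !request.justification.trim().is_empty() && !request.evidence.is_empty()
        }
    }
}
```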

Files: src/scope/mod.rs, scope/config.rs, scope/resolver.rs, scope/override_record.rs, scope/store.rs

Phase 13: Knowledge Lifecycle Management ✅

Active → Deprecated → Superseded → Archived lifecycle for patterns.

13.1 Status Types ✅ — KnowledgeStatus enum with history tracking
13.2 Deprecation ✅ — aphoria deprecate with --reason, --superseded-by, --sunset-date
13.3 Migration Guidance ✅ — Warnings in scan output, links to replacements
13.4 Migration Dashboard ✅ — aphoria migrations status, progress tracking, export
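The lifecycle above reads naturally as a linear state machine. The sketch below uses the `KnowledgeStatus` name from 13.1 with the four states listed; allowing only single forward steps (no skipping, no going back) is an assumption made for illustration, and the real transition rules may be looser.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KnowledgeStatus {
    Active,
    Deprecated,
    Superseded,
    Archived,
}

impl KnowledgeStatus {
    /// The next state in the Active → Deprecated → Superseded → Archived chain.
    pub fn next(self) -> Option<KnowledgeStatus> {
        match self {
            KnowledgeStatus::Active => Some(KnowledgeStatus::Deprecated),
            KnowledgeStatus::Deprecated => Some(KnowledgeStatus::Superseded),
            KnowledgeStatus::Superseded => Some(KnowledgeStatus::Archived),
            KnowledgeStatus::Archived => None,
        }
    }
}

/// Assumption: only single forward steps are valid transitions.
pub fn can_transition(from: KnowledgeStatus, to: KnowledgeStatus) -> bool {
    from.next() == Some(to)
}
```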

Files: src/lifecycle/mod.rs, lifecycle/store.rs, lifecycle/migration.rs

Phase 16: Ignore & Exclusion System ✅

Clean scans by excluding test fixtures and intentional patterns.

16.1 Glob Patterns ✅ — globset with **, *, ? support
16.2 .aphoriaignore ✅ — Gitignore-style patterns, merged with aphoria.toml
16.3 Inline Comments ✅ — // aphoria:ignore, ignore-next-line, ignore-block
16.4 Ack Export/Import ✅ — .aphoria/acks.toml, version-controllable

The Self-Learning Vision (Complete)

Phase 7: Declarative Extractors                          ✅
    ↓
Phase 7.5: LLM-in-the-Loop (Gemini semantic extraction) ✅
    ↓
Phase 7.6: Pattern Learning (remember what LLM finds)    ✅
    ↓
Phase 7.7: Pattern Promotion (patterns → extractors)     ✅
    ↓
Phase 7.8: LLM Prompt Evaluation (measure & improve)     ✅
    ↓
Phase 8: Enterprise Extractors (42 total)                ✅
    ↓
Phase 9: Autonomous Generation (fully self-improving)     ✅

Milestone Summary (Completed)

Phase	Deliverable	Status
0	ConceptPath in StemeDB	✅
2	Aphoria CLI (scan, report, ack)	✅
2A	Concept matching (leaf, alias, auto-alias)	✅
1	Authoritative corpus expansion	✅
3	Claude Code skill + hooks	✅
4	Full-cycle pre-commit (sync, drift, staged, hosted)	✅
4.5	Ephemeral scan mode (40x faster)	✅
5	Research agent loop + community corpus	✅
6	Federated Policy & Trust Packs	✅
6.5	Trust Pack Extensions	✅
7	Declarative Extractors	✅
7.5	LLM-in-the-Loop Extraction	✅
7.6	Pattern Learning Store	✅
7.7	Pattern → Extractor Promotion	✅
7.8	LLM Prompt Evaluation	✅
8	Enterprise Extractors (42 total)	✅
9	Autonomous Extractor Generation	✅
10.1	Acknowledgment Expiry	✅
11	Evidence-Based Authority	✅
12	Knowledge Scope Hierarchy	✅
13	Knowledge Lifecycle Management	✅
16	Ignore & Exclusion System	✅
