Aphoria Roadmap Archive
Completed phases moved from roadmap.md. Full implementation details preserved in git history.
Phase 0: StemeDB Foundation ✅
ConceptPath type, hierarchical index, alias store, source class inference, concept API endpoints. All shipped as Phase 5D of the main StemeDB roadmap.
Spec: docs/specs/concept-hierarchy.md
Phase 2: CLI Core ✅
End-to-end CLI pipeline with 10 extractors and bootstrapped corpus of 11 hardcoded assertions.
| Task | Status |
|---|---|
| 2.1 Project Walker | ✅ walker/mod.rs, walker/path_mapper.rs, walker/language.rs |
| 2.2 Extractors (10) | ✅ tls_verify, jwt_config, hardcoded_secrets, timeout_config, dep_versions, cors_config, rate_limit, weak_crypto, command_injection, sql_injection |
| 2.3 Ingestion Bridge | ✅ bridge.rs — BLAKE3 hashing, Ed25519 signing, claim→assertion conversion |
| 2.4 Conflict Query | ✅ episteme.rs — LocalEpisteme with check_conflicts() |
| 2.5 Report Output | ✅ report/ — table (comfy-table), JSON, SARIF 2.1.0, markdown |
| 2.6 Acknowledge Command | ✅ lib.rs acknowledge() |
| Baseline & Diff | ✅ lib.rs set_baseline(), show_diff() |
| Status Command | ✅ lib.rs show_status() |
Phase 2 Code Quality Fixes ✅
- DES/RC4 concept path misclassification: Split into `check_hash_pattern()` and `check_encryption_pattern()`
- SHA1 edge case: Documented as intentionally broad
- JS exec() regex: Tightened to require `child_process.` prefix
Phase 2A: Concept Matching ✅
- 2A.1 Leaf-Based Matching: `ConceptIndex` with tail-path matching (last 2 segments + predicate)
- 2A.2 Alias Resolution: Wired `AliasStore` into `QueryEngine.execute()` with `resolve_aliases: bool`
- 2A.3 Auto-Alias Creation: Auto-creates aliases when code and authority share leaf names
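The tail-path matching in 2A.1 can be pictured with a minimal sketch. It assumes `/`-separated concept paths and a `#` joiner for the predicate; both are illustrative choices, and `tail_key` is a hypothetical helper rather than part of `ConceptIndex`.

```rust
/// Build a lookup key from the last two path segments plus the predicate.
/// Assumption: concept paths are '/'-separated; the real `ConceptPath`
/// type may use a different representation.
fn tail_key(path: &str, predicate: &str) -> String {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    // Keep at most the final two segments; shorter paths are used whole.
    let start = segments.len().saturating_sub(2);
    format!("{}#{}", segments[start..].join("/"), predicate)
}
```

Under these assumptions, a code path and an authority path that share the same two trailing segments and predicate produce the same key, which is what lets them be compared.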
Phase 1: Authoritative Corpus Expansion ✅
Expanded from 11 hardcoded assertions to pluggable corpus system.
- 1.1 CorpusBuilder Trait ✅ — name, scheme, default_tier, build, requires_network
- 1.2 RFC Ingester ✅ — HTTP fetching, RFC 2119 keyword parsing, 8 RFC-specific parsers
- 1.3 OWASP Ingester ✅ — GitHub raw content, 9 cheat sheet parsers
- 1.4 Vendor Docs ✅ — PostgreSQL, Redis, reqwest, hyper, Go net/http, tokio-postgres, SQLx
- 1.5 Hardcoded Refactor ✅ — Original 11 assertions as `HardcodedCorpusBuilder`
- 1.6 CLI Integration ✅ — `aphoria corpus build/list`, `--only`, `--offline`, `--clear-cache`
- 1.7 Error Handling ✅ — Per-source graceful degradation
Files: corpus/mod.rs, corpus/hardcoded.rs, corpus/rfc.rs, corpus/owasp.rs, corpus/vendor.rs
Phase 3: Skill Integration ✅
- 3.1 Claude Code Skill ✅ — `/aphoria scan`, `scan --fix`, `ack`, `status`, `diff`, `init`, `baseline`
- 3.2 Agent Pre-Flight Hook ✅ — `--exit-code` (2=BLOCK, 1=FLAG, 0=clean), `--strict`
- 3.3 Alias Suggestion ✅ — Auto-alias creation from Phase 2A.3
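The exit-code convention in 3.2 (2=BLOCK, 1=FLAG, 0=clean) amounts to reporting the most severe finding. The sketch below illustrates that idea; the `Severity` enum and `exit_code` function are hypothetical names, and the effect of `--strict` is not modeled because the archive does not describe it.

```rust
/// Severity of a single finding, ordered from least to most severe.
/// Hypothetical type for illustration; discriminants mirror the exit codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Clean = 0,
    Flag = 1,
    Block = 2,
}

/// The process exit code is the highest severity seen, or 0 with no findings.
fn exit_code(findings: &[Severity]) -> i32 {
    findings.iter().copied().max().unwrap_or(Severity::Clean) as i32
}
```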
Phase 4: Full-Cycle Pre-Commit (Scan + Sync) ✅
Bidirectional knowledge sync: extract → check → classify → update → gate.
- 4A Observational Claims ✅ — `--sync` records novel claims as Tier 4 observations
- 4B Self-Conflict Detection ✅ — Drift detection with `Verdict::Drift`
- 4C Diff-Only Scanning ✅ — `--staged` for fast pre-commit hooks
- 4D Enhanced Ack ✅ — `--reason`, `aphoria update` for policy changes
- 4E Hosted Mode ✅ — Team aggregation via central StemeDB server, `HostedClient`
Phase 4.5: Ephemeral Scan Mode ✅
40x faster scans by skipping Episteme storage. Default mode ~0.25s, persistent ~1-2s.
- `ScanMode` enum (Ephemeral default, Persistent opt-in with `--persist`)
- `EphemeralDetector` — in-memory corpus + ConceptIndex
- `check_conflicts_pure()` extracted as standalone function
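A default-ephemeral mode with an opt-in flag might look like the following sketch. Only the `ScanMode` name and its two modes come from the notes above; the derives and the `scan_mode_from_flag` helper are assumptions for illustration.

```rust
/// Scan mode, with Ephemeral as the default.
/// The variant set follows the archive; everything else is assumed.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
enum ScanMode {
    #[default]
    Ephemeral,
    Persistent,
}

/// Hypothetical helper: map the presence of `--persist` to a mode.
fn scan_mode_from_flag(persist: bool) -> ScanMode {
    if persist {
        ScanMode::Persistent
    } else {
        ScanMode::default()
    }
}
```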
Phase 5: Research Agent Loop ✅
- 5.1 Gap Detection ✅ — `detect_gaps()` compares claims against ConceptIndex
- 5.2 Gap Storage ✅ — JSON-backed persistent storage with eligibility tracking
- 5.3 Quality Validation ✅ — Source attribution, normative language, vague content detection
- 5.4 Research Execution ✅ — HTTP fetching, normative extraction, confidence scoring
- 5.5 CLI Integration ✅ — `aphoria research run/status/gaps`
- 5.6 Community Corpus ✅ — Opt-in anonymous pattern sharing with privacy-preserving anonymization
- 5.7 Security Extractors ✅ — weak_crypto, command_injection, sql_injection
Phase 6: Federated Policy & Trust Packs ✅
- 6.1 Trust Pack Format ✅ — rkyv serialization, Ed25519 signing
- 6.2 Policy Management ✅ — Local and remote loading with caching
- 6.3 Core Integration ✅ — EphemeralDetector + LocalEpisteme policy ingestion
- 6.4 CLI Commands ✅ — `aphoria policy export`, auto-loading
Phase 6.5: Trust Pack Extensions ✅
- 6.5.1 Predicate Aliases ✅ — `enabled↔required↔mandatory↔enforced`
- 6.5.2 Pack Signing Key Rotation ✅ — `aphoria policy resign`, signature chain audit trail
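One way to read the predicate alias set in 6.5.1 is as an equivalence class that collapses to a single canonical form before comparison. In this sketch, choosing `enabled` as the canonical form is an arbitrary assumption, and both function names are hypothetical.

```rust
/// Map any member of the alias set to one canonical predicate.
/// Assumption: "enabled" is the canonical form; the actual choice is unknown.
fn canonical_predicate(p: &str) -> &str {
    match p {
        "enabled" | "required" | "mandatory" | "enforced" => "enabled",
        other => other,
    }
}

/// Two predicates match if they canonicalize to the same string.
fn predicates_match(a: &str, b: &str) -> bool {
    canonical_predicate(a) == canonical_predicate(b)
}
```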
Phase 7: Declarative Extractors ✅
TOML-defined custom extractors without Rust code.
- 7.1 Core Types ✅ — `DeclarativeExtractorDef`, `DeclarativeExtractor`
- 7.2 Configuration ✅ — `[[extractors.declarative]]` in aphoria.toml
- 7.3 Validation ✅ — ReDoS protection, confidence validation
- 7.4 Registry Integration ✅ — Enable/disable, Trust Pack integration
- 7.5 Error Handling ✅
- 7.6 Tests ✅ — 22 unit + 7 integration tests
Phase 7.5: LLM-in-the-Loop Extraction ✅
Gemini-powered semantic extraction for high-value files.
- 7.5.1 LLM Extractor ✅ — `GeminiClient`, structured JSON output
- 7.5.2 Selective Triggering ✅ — `is_high_value_file()`, token budget
- 7.5.3 Cost Controls ✅ — BLAKE3 caching, budget enforcement
- 7.5.4 Configuration ✅ — `[llm]` section in aphoria.toml
Phase 7.6: Pattern Learning Store ✅
Remember patterns LLM finds for promotion to declarative extractors.
- 7.6.1 Schema ✅ — `LearnedPattern`, `ClaimTemplate`, `ValueType`
- 7.6.2 PatternStore ✅ — JSON-backed, RwLock thread safety, Levenshtein dedup
- 7.6.3 Normalization ✅ — Version/boolean/number/string placeholder replacement
- 7.6.4 Configuration ✅ — `[learning]` section
- 7.6.5 Scan Integration ✅ — Project hash, record/update patterns
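The Levenshtein dedup mentioned in 7.6.2 relies on edit distance between normalized patterns. Below is a standard two-row implementation of that distance; the `is_near_duplicate` helper and its `max_distance` parameter are hypothetical, since the archive does not state the threshold the store uses.

```rust
/// Classic Levenshtein edit distance, keeping only two rows of the DP table.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // Row for the empty prefix of `a`: distance j to the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            let substitute = prev[j] + cost;
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr.push(substitute.min(delete).min(insert));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Hypothetical dedup check: treat patterns within `max_distance` edits as one.
fn is_near_duplicate(a: &str, b: &str, max_distance: usize) -> bool {
    levenshtein(a, b) <= max_distance
}
```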
Phase 7.7: Pattern → Extractor Promotion ✅
Learned patterns become declarative extractors via LLM regex generation.
- 7.7.1 Pipeline ✅ — `PromotionPipeline`, `RegexGenerator`, `ExtractorValidator`, `YamlWriter`
- 7.7.2 Regex Generation ✅ — Multi-example prompt, ReDoS safety
- 7.7.3 Validation ✅ — Positive tests, timing validation
- 7.7.4 Human Review ✅ — `aphoria extractors review/stats/candidates/promote`
- 7.7.5 Extractor Output ✅ — YAML files in `.aphoria/extractors/learned/`
Phase 7.8: LLM Prompt Evaluation ✅
Golden fixtures with precision/recall metrics and regression detection.
- 7.8.1 Fixture Format ✅ — TOML-based with `must_contain`/`must_not_contain`
- 7.8.2 Claim Matching ✅ — Tail-path matching, type coercion
- 7.8.3 Metrics ✅ — Precision/Recall/F1, per-category breakdown
- 7.8.4 Harness ✅ — Live/Cached/Mock modes, regression detection
- 7.8.5 Reports ✅ — Table, JSON, Markdown
- 7.8.6 CLI ✅ — `aphoria eval run/baseline/update-baseline/list-fixtures/validate-fixtures`
- 7.8.7 Seed Fixtures ✅ — 10 fixtures across tls, jwt, secrets, auth, negative, edge
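The metrics in 7.8.3 follow the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them from raw counts; the `Metrics` struct and the convention of returning 0.0 for empty denominators are assumptions, not the harness's documented behavior.

```rust
/// Precision, recall, and F1 for one evaluation run (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct Metrics {
    precision: f64,
    recall: f64,
    f1: f64,
}

/// Compute metrics from true positives, false positives, and false negatives.
/// Assumption: an empty denominator yields 0.0 rather than NaN.
fn metrics(tp: usize, fp: usize, fn_count: usize) -> Metrics {
    let ratio = |num: usize, den: usize| {
        if den == 0 { 0.0 } else { num as f64 / den as f64 }
    };
    let precision = ratio(tp, tp + fp);
    let recall = ratio(tp, tp + fn_count);
    let f1 = if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    };
    Metrics { precision, recall, f1 }
}
```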
Phase 8: Enterprise Extractor Improvements ✅
42 extractors total. Enterprise-grade detection for production codebases.
- 8.1 High-Entropy Secrets ✅ — Shannon entropy, known prefixes (AWS/Stripe/GitHub/GitLab/Slack)
- 8.2 Framework Extractors (10) ✅ — Spring, Django, Express, Rails, ASP.NET, Laravel, FastAPI, Next.js, Flask, NestJS
- 8.3 Config Deep Parsing ✅ — YAML/JSON/TOML tree walking, 11 security rules
- 8.4 Semantic TLS Version ✅ — Cross-language const detection, Terraform, Kubernetes
- 8.5 ORM SQL Injection ✅ — Django/SQLAlchemy/GORM/ActiveRecord/Prisma/Sequelize
- 8.6 Path Traversal ✅
- 8.7 Unvalidated Redirects ✅
- 8.8 Weak Password ✅
- 8.9 Security Headers ✅
- 8.10 Insecure Deserialization ✅
- 8.11 SSRF ✅
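The Shannon entropy check in 8.1 measures how evenly a string's characters are distributed, in bits per character. This is a minimal sketch of that calculation only; the thresholds and prefix handling the extractor applies are not given here, so none are shown.

```rust
use std::collections::HashMap;

/// Shannon entropy of a string in bits per character.
/// Returns 0.0 for the empty string.
fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}
```

Random-looking tokens score higher than ordinary identifiers, which is why entropy is a common signal for hardcoded secrets.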
Phase 9: Autonomous Extractor Generation ✅
Fully self-improving extraction system.
- 9.1 Autonomous Promotion ✅ — >0.95 confidence, >10 projects, full audit trail
- 9.2 Shadow Mode Testing ✅ — Isolated metrics, graduation gate, FP tracking
- 9.3 Auto-Rollback ✅ — FP rate >15% triggers automatic rollback
- 9.4 Cross-Project Learning ✅ — Privacy-preserving pattern sync, community extractors
- 9.5 Extractor Versioning ✅ — Changelogs, rollback, A/B comparison
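The numeric gates in 9.1 and 9.3 reduce to two predicates. The sketch below encodes only the thresholds stated above (confidence above 0.95, more than 10 projects, false-positive rate above 15%); the function names are hypothetical, and how the real system counts projects or false positives is not specified.

```rust
/// Promotion gate from 9.1: strictly above 0.95 confidence and 10 projects.
fn eligible_for_promotion(confidence: f64, project_count: usize) -> bool {
    confidence > 0.95 && project_count > 10
}

/// Rollback gate from 9.3: false-positive rate strictly above 15%.
/// Assumption: with no findings at all there is nothing to roll back.
fn should_rollback(false_positives: usize, total_findings: usize) -> bool {
    total_findings > 0 && (false_positives as f64 / total_findings as f64) > 0.15
}
```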
Phase 10.1: Acknowledgment Expiry ✅
Time-limited exceptions with --expires flag.
- `--expires 90d` or `--expires 2026-12-31` (ISO 8601)
- Expired acks resurface as BLOCK
- Preserved for audit trail per patent claim 25
- All report formatters show expiry info
Files: src/expiry.rs, cli.rs, report/*.rs
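The two `--expires` forms shown above (a day count such as `90d`, or an ISO 8601 date) could be parsed along these lines. This is a standard-library-only sketch; the `Expiry` enum and `parse_expiry` function are hypothetical and do not reflect the contents of src/expiry.rs. It checks month and day ranges but not days per month.

```rust
/// Parsed value of an `--expires` argument (hypothetical type).
#[derive(Debug, PartialEq)]
enum Expiry {
    AfterDays(u32),
    OnDate { year: u16, month: u8, day: u8 },
}

/// Accepts "<N>d" or "YYYY-MM-DD"; returns None for anything else.
fn parse_expiry(s: &str) -> Option<Expiry> {
    if let Some(days) = s.strip_suffix('d') {
        return days.parse().ok().map(Expiry::AfterDays);
    }
    let mut parts = s.split('-');
    let (y, m, d) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || y.len() != 4 || m.len() != 2 || d.len() != 2 {
        return None;
    }
    let year: u16 = y.parse().ok()?;
    let month: u8 = m.parse().ok()?;
    let day: u8 = d.parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some(Expiry::OnDate { year, month, day })
}
```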
Phase 11: Evidence-Based Authority ✅
Evidence levels (ProductSpec > Standard > Research > Commit-only) with evidence-aware graduation.
- 11.1 Types ✅ — `EvidenceLevel`, `PatternEvidence` with ADR/spec/RFC references
- 11.2 Detection ✅ — Commit message parsing, ADR/spec file detection
- 11.3 Graduation ✅ — Thresholds vary by evidence (ProductSpec: 1 usage, Commit-only: 10)
- 11.4 Display ✅ — Evidence chain in output, `--evidence` filter
Files: src/evidence/mod.rs, evidence/types.rs, evidence/detection.rs
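The ordering ProductSpec > Standard > Research > Commit-only and the evidence-dependent thresholds in 11.3 can be sketched with a derived `Ord`. Only the ProductSpec (1) and Commit-only (10) thresholds come from this archive; the values for Standard and Research below are placeholders invented for illustration, as are the variant names.

```rust
/// Evidence levels, declared weakest first so the derived ordering
/// gives ProductSpec > Standard > Research > CommitOnly.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EvidenceLevel {
    CommitOnly,
    Research,
    Standard,
    ProductSpec,
}

/// Usages required before a pattern graduates.
fn graduation_threshold(level: EvidenceLevel) -> u32 {
    match level {
        EvidenceLevel::ProductSpec => 1, // stated in 11.3
        EvidenceLevel::Standard => 3,    // ASSUMED placeholder, not from the source
        EvidenceLevel::Research => 5,    // ASSUMED placeholder, not from the source
        EvidenceLevel::CommitOnly => 10, // stated in 11.3
    }
}
```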
Phase 12: Knowledge Scope Hierarchy ✅
Organization → Team → Project scope levels with inheritance.
- 12.1 Scope Types ✅ — `ScopeLevel` enum, `ScopeConfig`
- 12.2 Inheritance ✅ — Security: no opt-out, Conventions: override with justification
- 12.3 Override Workflow ✅ — Justification + evidence required
- 12.4 Cross-Scope Queries ✅ — `--scope org/team/project`, `--exclude-inherited`
Files: src/scope/mod.rs, scope/config.rs, scope/resolver.rs, scope/override_record.rs, scope/store.rs
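The inheritance rule in 12.2 and 12.3 (security rules cannot be opted out of; conventions can be overridden with justification and evidence) can be expressed as a single check. The `RuleKind` enum and `override_allowed` function are hypothetical and only illustrate the rule as summarized here.

```rust
/// Kind of inherited rule (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuleKind {
    Security,
    Convention,
}

/// Security rules are never overridable; conventions need both a
/// non-empty justification and at least one piece of evidence.
fn override_allowed(kind: RuleKind, justification: &str, evidence: &[&str]) -> bool {
    match kind {
        RuleKind::Security => false,
        RuleKind::Convention => !justification.trim().is_empty() && !evidence.is_empty(),
    }
}
```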
Phase 13: Knowledge Lifecycle Management ✅
Active → Deprecated → Superseded → Archived lifecycle for patterns.
- 13.1 Status Types ✅ — `KnowledgeStatus` enum with history tracking
- 13.2 Deprecation ✅ — `aphoria deprecate` with `--reason`, `--superseded-by`, `--sunset-date`
- 13.3 Migration Guidance ✅ — Warnings in scan output, links to replacements
- 13.4 Migration Dashboard ✅ — `aphoria migrations status`, progress tracking, export
Files: src/lifecycle/mod.rs, lifecycle/store.rs, lifecycle/migration.rs
Phase 16: Ignore & Exclusion System ✅
Clean scans by excluding test fixtures and intentional patterns.
- 16.1 Glob Patterns ✅ — `globset` with `**`, `*`, `?` support
- 16.2 `.aphoriaignore` ✅ — Gitignore-style patterns, merged with aphoria.toml
- 16.3 Inline Comments ✅ — `// aphoria:ignore`, `ignore-next-line`, `ignore-block`
- 16.4 Ack Export/Import ✅ — `.aphoria/acks.toml`, version-controllable
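Inline suppression as in 16.3 can be illustrated with a small scanner that collects the line numbers a directive silences. The sketch assumes the directives are spelled `// aphoria:ignore` (same line) and `// aphoria:ignore-next-line` (following line); the exact spelling of the latter and the handling of `ignore-block` are not given here, so block ignores are left out.

```rust
/// Return 1-based line numbers suppressed by inline directives.
/// Assumptions: `//` comment syntax only, and the directive is the last
/// thing on its line. `ignore-block` is intentionally not handled.
fn suppressed_lines(source: &str) -> Vec<usize> {
    const MARKER: &str = "// aphoria:";
    let mut out = Vec::new();
    for (i, line) in source.lines().enumerate() {
        let Some(pos) = line.find(MARKER) else { continue };
        match line[pos + MARKER.len()..].trim() {
            "ignore" => out.push(i + 1),           // suppress this line
            "ignore-next-line" => out.push(i + 2), // suppress the line after
            _ => {}
        }
    }
    out
}
```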
The Self-Learning Vision (Complete)
Phase 7: Declarative Extractors ✅
↓
Phase 7.5: LLM-in-the-Loop (Gemini semantic extraction) ✅
↓
Phase 7.6: Pattern Learning (remember what LLM finds) ✅
↓
Phase 7.7: Pattern Promotion (patterns → extractors) ✅
↓
Phase 7.8: LLM Prompt Evaluation (measure & improve) ✅
↓
Phase 8: Enterprise Extractors (42 total) ✅
↓
Phase 9: Autonomous Generation (fully self-improving) ✅
Milestone Summary (Completed)
| Phase | Deliverable | Status |
|---|---|---|
| 0 | ConceptPath in StemeDB | ✅ |
| 2 | Aphoria CLI (scan, report, ack) | ✅ |
| 2A | Concept matching (leaf, alias, auto-alias) | ✅ |
| 1 | Authoritative corpus expansion | ✅ |
| 3 | Claude Code skill + hooks | ✅ |
| 4 | Full-cycle pre-commit (sync, drift, staged, hosted) | ✅ |
| 4.5 | Ephemeral scan mode (40x faster) | ✅ |
| 5 | Research agent loop + community corpus | ✅ |
| 6 | Federated Policy & Trust Packs | ✅ |
| 6.5 | Trust Pack Extensions | ✅ |
| 7 | Declarative Extractors | ✅ |
| 7.5 | LLM-in-the-Loop Extraction | ✅ |
| 7.6 | Pattern Learning Store | ✅ |
| 7.7 | Pattern → Extractor Promotion | ✅ |
| 7.8 | LLM Prompt Evaluation | ✅ |
| 8 | Enterprise Extractors (42 total) | ✅ |
| 9 | Autonomous Extractor Generation | ✅ |
| 10.1 | Acknowledgment Expiry | ✅ |
| 11 | Evidence-Based Authority | ✅ |
| 12 | Knowledge Scope Hierarchy | ✅ |
| 13 | Knowledge Lifecycle Management | ✅ |
| 16 | Ignore & Exclusion System | ✅ |