jml e95c978481 feat(aphoria): add inline claim markers and claim enrichment infrastructure
This commit implements Phase 17 of the Aphoria roadmap, adding:

**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)

**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)

**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns

**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion

**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries

**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion

Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 20:18:20 +00:00


Aphoria Roadmap Archive

Completed phases moved from roadmap.md. Full implementation details preserved in git history.


Phase 0: StemeDB Foundation

ConceptPath type, hierarchical index, alias store, source class inference, concept API endpoints. All shipped as Phase 5D of the main StemeDB roadmap.

Spec: docs/specs/concept-hierarchy.md


Phase 2: CLI Core

End-to-end CLI pipeline with 10 extractors and a bootstrapped corpus of 11 hardcoded assertions.

| Task | Status |
|---|---|
| 2.1 Project Walker | ✅ walker/mod.rs, walker/path_mapper.rs, walker/language.rs |
| 2.2 Extractors (10) | ✅ tls_verify, jwt_config, hardcoded_secrets, timeout_config, dep_versions, cors_config, rate_limit, weak_crypto, command_injection, sql_injection |
| 2.3 Ingestion Bridge | ✅ bridge.rs — BLAKE3 hashing, Ed25519 signing, claim→assertion conversion |
| 2.4 Conflict Query | ✅ episteme.rs — LocalEpisteme with check_conflicts() |
| 2.5 Report Output | ✅ report/ — table (comfy-table), JSON, SARIF 2.1.0, markdown |
| 2.6 Acknowledge Command | ✅ lib.rs acknowledge() |
| Baseline & Diff | ✅ lib.rs set_baseline(), show_diff() |
| Status Command | ✅ lib.rs show_status() |

Phase 2 Code Quality Fixes

  • DES/RC4 concept path misclassification: Split into check_hash_pattern() and check_encryption_pattern()
  • SHA1 edge case: Documented as intentionally broad
  • JS exec() regex: Tightened to require the child_process. prefix

Phase 2A: Concept Matching

  • 2A.1 Leaf-Based Matching: ConceptIndex with tail-path matching (last 2 segments + predicate)
  • 2A.2 Alias Resolution: Wired AliasStore into QueryEngine.execute() with resolve_aliases: bool
  • 2A.3 Auto-Alias Creation: Auto-creates aliases when code and authority share leaf names
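A minimal sketch of the leaf-based key behind 2A.1. It assumes concept paths are '/'-separated strings. The names tail_key and tails_match and the "path#predicate" key format are hypothetical and do not come from the actual ConceptIndex API.

```rust
/// Hypothetical helper: builds a leaf-based match key from the last two
/// segments of a '/'-separated concept path plus the predicate.
fn tail_key(concept_path: &str, predicate: &str) -> String {
    // Drop empty segments so leading or trailing slashes do not change the key.
    let segments: Vec<&str> = concept_path.split('/').filter(|s| !s.is_empty()).collect();
    // Keep at most the last two segments; shorter paths use what they have.
    let start = segments.len().saturating_sub(2);
    format!("{}#{}", segments[start..].join("/"), predicate)
}

/// A code claim and an authority claim are compared only when their keys agree.
fn tails_match(code_path: &str, code_pred: &str, authority_path: &str, authority_pred: &str) -> bool {
    tail_key(code_path, code_pred) == tail_key(authority_path, authority_pred)
}
```

With this key, code/myapp/http/tls/verify and rfc/8446/tls/verify both reduce to tls/verify#enabled. A code claim can therefore find its authority counterpart even though the two paths have unrelated prefixes.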

Phase 1: Authoritative Corpus Expansion

Expanded from 11 hardcoded assertions to a pluggable corpus system.

  • 1.1 CorpusBuilder Trait — name, scheme, default_tier, build, requires_network
  • 1.2 RFC Ingester — HTTP fetching, RFC 2119 keyword parsing, 8 RFC-specific parsers
  • 1.3 OWASP Ingester — GitHub raw content, 9 cheat sheet parsers
  • 1.4 Vendor Docs — PostgreSQL, Redis, reqwest, hyper, Go net/http, tokio-postgres, SQLx
  • 1.5 Hardcoded Refactor — Original 11 assertions as HardcodedCorpusBuilder
  • 1.6 CLI Integration aphoria corpus build/list, --only, --offline, --clear-cache
  • 1.7 Error Handling — Per-source graceful degradation

Files: corpus/mod.rs, corpus/hardcoded.rs, corpus/rfc.rs, corpus/owasp.rs, corpus/vendor.rs
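RFC 2119 defines the normative keywords that the RFC ingester parses. The block below is a minimal keyword scanner. It assumes case-sensitive, uppercase-only matching, following the clarification in RFC 8174. The Requirement classification is hypothetical, and the 8 RFC-specific parsers are not modelled.

```rust
/// Hypothetical classification of RFC 2119 requirement levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Requirement {
    Absolute,
    Prohibition,
    Recommended,
    NotRecommended,
    Optional,
}

/// RFC 2119 keywords; negated forms come first so "MUST NOT" wins over "MUST".
const KEYWORDS: &[(&str, Requirement)] = &[
    ("MUST NOT", Requirement::Prohibition),
    ("SHALL NOT", Requirement::Prohibition),
    ("SHOULD NOT", Requirement::NotRecommended),
    ("NOT RECOMMENDED", Requirement::NotRecommended),
    ("MUST", Requirement::Absolute),
    ("REQUIRED", Requirement::Absolute),
    ("SHALL", Requirement::Absolute),
    ("SHOULD", Requirement::Recommended),
    ("RECOMMENDED", Requirement::Recommended),
    ("MAY", Requirement::Optional),
    ("OPTIONAL", Requirement::Optional),
];

/// Returns the requirement levels found in a sentence, in reading order.
/// Matching is on whole uppercase words, so "must" or "MAYBE" are ignored.
fn find_keywords(sentence: &str) -> Vec<Requirement> {
    let words: Vec<&str> = sentence
        .split(|c: char| !c.is_ascii_alphabetic())
        .filter(|w| !w.is_empty())
        .collect();
    let mut found = Vec::new();
    let mut i = 0;
    while i < words.len() {
        // First keyword (in table order) whose words start at position i.
        let hit = KEYWORDS.iter().find_map(|(kw, req)| {
            let kw_words: Vec<&str> = kw.split(' ').collect();
            words[i..]
                .starts_with(kw_words.as_slice())
                .then_some((kw_words.len(), *req))
        });
        match hit {
            Some((len, req)) => {
                found.push(req);
                i += len; // skip the words the keyword consumed
            }
            None => i += 1,
        }
    }
    found
}
```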


Phase 3: Skill Integration

  • 3.1 Claude Code Skill /aphoria scan, scan --fix, ack, status, diff, init, baseline
  • 3.2 Agent Pre-Flight Hook --exit-code (2=BLOCK, 1=FLAG, 0=clean), --strict
  • 3.3 Alias Suggestion — Auto-alias creation from Phase 2A.3
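A sketch of how a pre-flight hook could derive the documented exit codes from the worst finding in a scan. The Severity enum is hypothetical; the real verdict type also has variants such as Drift. The sketch does not model --strict, because its behavior is not described here.

```rust
/// Hypothetical severity ladder; the derived Ord gives Clean < Flag < Block.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Clean,
    Flag,
    Block,
}

/// Maps the worst severity in a scan to the --exit-code contract:
/// 2 = BLOCK, 1 = FLAG, 0 = clean (including an empty scan).
fn exit_code(findings: &[Severity]) -> i32 {
    match findings.iter().max() {
        Some(Severity::Block) => 2,
        Some(Severity::Flag) => 1,
        _ => 0,
    }
}
```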

Phase 4: Full-Cycle Pre-Commit (Scan + Sync)

Bidirectional knowledge sync: extract → check → classify → update → gate.

  • 4A Observational Claims --sync records novel claims as Tier 4 observations
  • 4B Self-Conflict Detection — Drift detection with Verdict::Drift
  • 4C Diff-Only Scanning --staged for fast pre-commit hooks
  • 4D Enhanced Ack --reason, aphoria update for policy changes
  • 4E Hosted Mode — Team aggregation via central StemeDB server, HostedClient

Phase 4.5: Ephemeral Scan Mode

40x faster scans by skipping Episteme storage. Default mode ~0.25s, persistent ~1-2s.

  • ScanMode enum (Ephemeral default, Persistent opt-in with --persist)
  • EphemeralDetector — in-memory corpus + ConceptIndex
  • check_conflicts_pure() extracted as standalone function

Phase 5: Research Agent Loop

  • 5.1 Gap Detection detect_gaps() compares claims against ConceptIndex
  • 5.2 Gap Storage — JSON-backed persistent storage with eligibility tracking
  • 5.3 Quality Validation — Source attribution, normative language, vague content detection
  • 5.4 Research Execution — HTTP fetching, normative extraction, confidence scoring
  • 5.5 CLI Integration aphoria research run/status/gaps
  • 5.6 Community Corpus — Opt-in anonymous pattern sharing with privacy-preserving anonymization
  • 5.7 Security Extractors — weak_crypto, command_injection, sql_injection

Phase 6: Federated Policy & Trust Packs

  • 6.1 Trust Pack Format — rkyv serialization, Ed25519 signing
  • 6.2 Policy Management — Local and remote loading with caching
  • 6.3 Core Integration — EphemeralDetector + LocalEpisteme policy ingestion
  • 6.4 CLI Commands aphoria policy export, auto-loading

Phase 6.5: Trust Pack Extensions

  • 6.5.1 Predicate Aliases enabled, required, mandatory, enforced
  • 6.5.2 Pack Signing Key Rotation aphoria policy resign, signature chain audit trail

Phase 7: Declarative Extractors

TOML-defined custom extractors without Rust code.

  • 7.1 Core Types DeclarativeExtractorDef, DeclarativeExtractor
  • 7.2 Configuration [[extractors.declarative]] in aphoria.toml
  • 7.3 Validation — ReDoS protection, confidence validation
  • 7.4 Registry Integration — Enable/disable, Trust Pack integration
  • 7.5 Error Handling
  • 7.6 Tests — 22 unit + 7 integration tests

Phase 7.5: LLM-in-the-Loop Extraction

Gemini-powered semantic extraction for high-value files.

  • 7.5.1 LLM Extractor GeminiClient, structured JSON output
  • 7.5.2 Selective Triggering is_high_value_file(), token budget
  • 7.5.3 Cost Controls — BLAKE3 caching, budget enforcement
  • 7.5.4 Configuration [llm] section in aphoria.toml

Phase 7.6: Pattern Learning Store

Remembers the patterns the LLM finds so they can be promoted to declarative extractors.

  • 7.6.1 Schema LearnedPattern, ClaimTemplate, ValueType
  • 7.6.2 PatternStore — JSON-backed, RwLock thread safety, Levenshtein dedup
  • 7.6.3 Normalization — Version/boolean/number/string placeholder replacement
  • 7.6.4 Configuration [learning] section
  • 7.6.5 Scan Integration — Project hash, record/update patterns
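A rough sketch of 7.6.2 and 7.6.3 together. Values are first replaced with placeholders. Two normalized patterns then count as duplicates when their Levenshtein distance is within a threshold. The placeholder spellings, the key = value input shape and the threshold are all assumptions. This sketch also classifies any dotted digit run, such as 1.2, as a version.

```rust
/// Classifies a raw value into a hypothetical placeholder.
fn placeholder(value: &str) -> &'static str {
    let v = value.trim().trim_matches('"');
    if v.eq_ignore_ascii_case("true") || v.eq_ignore_ascii_case("false") {
        "<BOOL>"
    } else if is_version(v) {
        "<VERSION>"
    } else if v.parse::<f64>().is_ok() {
        "<NUMBER>"
    } else {
        "<STRING>"
    }
}

/// Two or more dot-separated digit groups, with an optional leading 'v'.
fn is_version(v: &str) -> bool {
    let v = v.strip_prefix('v').unwrap_or(v);
    let parts: Vec<&str> = v.split('.').collect();
    parts.len() >= 2 && parts.iter().all(|p| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit()))
}

/// Normalizes "key = value" by replacing the value; other lines are only trimmed.
fn normalize(line: &str) -> String {
    match line.split_once('=') {
        Some((key, value)) => format!("{} = {}", key.trim(), placeholder(value)),
        None => line.trim().to_string(),
    }
}

/// Classic edit distance over chars, keeping only two DP rows.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1; b_chars.len() + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitute or match
                .min(prev[j + 1] + 1) // delete from a
                .min(curr[j] + 1); // insert into a
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Duplicate check on normalized forms with a hypothetical distance threshold.
fn is_duplicate(a: &str, b: &str, max_distance: usize) -> bool {
    levenshtein(&normalize(a), &normalize(b)) <= max_distance
}
```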

Phase 7.7: Pattern → Extractor Promotion

Learned patterns become declarative extractors via LLM regex generation.

  • 7.7.1 Pipeline PromotionPipeline, RegexGenerator, ExtractorValidator, YamlWriter
  • 7.7.2 Regex Generation — Multi-example prompt, ReDoS safety
  • 7.7.3 Validation — Positive tests, timing validation
  • 7.7.4 Human Review aphoria extractors review/stats/candidates/promote
  • 7.7.5 Extractor Output — YAML files in .aphoria/extractors/learned/

Phase 7.8: LLM Prompt Evaluation

Golden fixtures with precision/recall metrics and regression detection.

  • 7.8.1 Fixture Format — TOML-based with must_contain/must_not_contain
  • 7.8.2 Claim Matching — Tail-path matching, type coercion
  • 7.8.3 Metrics — Precision/Recall/F1, per-category breakdown
  • 7.8.4 Harness — Live/Cached/Mock modes, regression detection
  • 7.8.5 Reports — Table, JSON, Markdown
  • 7.8.6 CLI aphoria eval run/baseline/update-baseline/list-fixtures/validate-fixtures
  • 7.8.7 Seed Fixtures — 10 fixtures across tls, jwt, secrets, auth, negative, edge
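A sketch of precision, recall and F1 from 7.8.3. The scoring rule is an assumption:

- each must_contain claim that is found counts as a true positive;
- each must_contain claim that is missed counts as a false negative;
- each must_not_contain claim that is present counts as a false positive.

Claims are compared as exact strings rather than by tail path, and an empty denominator yields 0.0.

```rust
/// Confusion counts for one fixture (hypothetical struct).
#[derive(Debug, Default, Clone, Copy)]
struct Counts {
    tp: u32,
    fp: u32,
    fn_: u32,
}

fn ratio(num: u32, den: u32) -> f64 {
    if den == 0 { 0.0 } else { num as f64 / den as f64 }
}

fn precision(c: Counts) -> f64 { ratio(c.tp, c.tp + c.fp) }
fn recall(c: Counts) -> f64 { ratio(c.tp, c.tp + c.fn_) }

/// Harmonic mean of precision and recall.
fn f1(c: Counts) -> f64 {
    let (p, r) = (precision(c), recall(c));
    if p + r == 0.0 { 0.0 } else { 2.0 * p * r / (p + r) }
}

/// Scores extracted claims against a fixture's must_contain / must_not_contain lists.
fn score(extracted: &[&str], must_contain: &[&str], must_not_contain: &[&str]) -> Counts {
    let tp = must_contain.iter().filter(|c| extracted.contains(*c)).count() as u32;
    let fp = must_not_contain.iter().filter(|c| extracted.contains(*c)).count() as u32;
    Counts { tp, fp, fn_: must_contain.len() as u32 - tp }
}
```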

Phase 8: Enterprise Extractor Improvements

42 extractors total. Enterprise-grade detection for production codebases.

  • 8.1 High-Entropy Secrets — Shannon entropy, known prefixes (AWS/Stripe/GitHub/GitLab/Slack)
  • 8.2 Framework Extractors (10) — Spring, Django, Express, Rails, ASP.NET, Laravel, FastAPI, Next.js, Flask, NestJS
  • 8.3 Config Deep Parsing — YAML/JSON/TOML tree walking, 11 security rules
  • 8.4 Semantic TLS Version — Cross-language const detection, Terraform, Kubernetes
  • 8.5 ORM SQL Injection — Django/SQLAlchemy/GORM/ActiveRecord/Prisma/Sequelize
  • 8.6 Path Traversal
  • 8.7 Unvalidated Redirects
  • 8.8 Weak Password
  • 8.9 Security Headers
  • 8.10 Insecure Deserialization
  • 8.11 SSRF
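Shannon entropy, H = -Σ p(c) log2 p(c) taken over the characters of a token, is a common way to flag random-looking strings. The sketch below combines it with prefix checks. The prefixes shown (AKIA, ghp_, sk_live_, glpat-, xoxb-) are widely documented token formats for AWS, GitHub, Stripe, GitLab and Slack. The 20-character minimum and the 4.0 bits-per-character cutoff are illustrative guesses, not Aphoria's settings.

```rust
use std::collections::HashMap;

/// Shannon entropy of a string in bits per character.
fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Example prefixes of well-known credential formats.
const KNOWN_PREFIXES: &[&str] = &["AKIA", "ghp_", "sk_live_", "glpat-", "xoxb-"];

/// A token is suspicious if it has a known prefix, or if it is long and high-entropy.
/// The length and entropy thresholds are hypothetical.
fn looks_like_secret(token: &str) -> bool {
    KNOWN_PREFIXES.iter().any(|p| token.starts_with(p))
        || (token.len() >= 20 && shannon_entropy(token) >= 4.0)
}
```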

Phase 9: Autonomous Extractor Generation

Fully self-improving extraction system.

  • 9.1 Autonomous Promotion — >0.95 confidence, >10 projects, full audit trail
  • 9.2 Shadow Mode Testing — Isolated metrics, graduation gate, FP tracking
  • 9.3 Auto-Rollback — FP rate >15% triggers automatic rollback
  • 9.4 Cross-Project Learning — Privacy-preserving pattern sync, community extractors
  • 9.5 Extractor Versioning — Changelogs, rollback, A/B comparison
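The documented gates for 9.1 and 9.3 reduce to simple threshold checks. The sketch below uses hypothetical function names. It compares the false-positive rate in integer arithmetic, which keeps exactly 15% below the strict "> 15%" trigger without relying on floating-point rounding.

```rust
/// 9.1: promote only when both documented thresholds are strictly exceeded.
fn eligible_for_autonomous_promotion(confidence: f64, project_count: u32) -> bool {
    confidence > 0.95 && project_count > 10
}

/// 9.3: roll back when false positives exceed 15% of findings.
/// fp / total > 15/100 is checked as fp * 100 > total * 15.
fn should_roll_back(false_positives: u32, total_findings: u32) -> bool {
    total_findings > 0 && (false_positives as u64) * 100 > (total_findings as u64) * 15
}
```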

Phase 10.1: Acknowledgment Expiry

Time-limited exceptions with --expires flag.

  • --expires 90d or --expires 2026-12-31 (ISO 8601)
  • Expired acks resurface as BLOCK
  • Preserved for audit trail per patent claim 25
  • All report formatters show expiry info

Files: src/expiry.rs, cli.rs, report/*.rs
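A sketch of turning an --expires value into an absolute day number. It rests on two assumptions: dates are counted as days since 1970-01-01, and the expiry day itself is still valid. The day count uses Howard Hinnant's days_from_civil algorithm. parse_expires and is_expired are hypothetical names. Validation is light, so an impossible date such as 2026-02-30 is accepted.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
fn days_from_civil(y: i64, m: u32, d: u32) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // years start in March
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let m = m as i64;
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d as i64 - 1; // [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146097 + doe - 719468
}

/// Parses "<N>d" (relative to `today`) or "YYYY-MM-DD"; returns the expiry day.
fn parse_expires(value: &str, today: i64) -> Option<i64> {
    if let Some(days) = value.strip_suffix('d') {
        return days.parse::<i64>().ok().map(|n| today + n);
    }
    let mut parts = value.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: u32 = parts.next()?.parse().ok()?;
    let d: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(days_from_civil(y, m, d))
}

/// An ack expires once `today` is past its expiry day; expired acks would resurface as BLOCK.
fn is_expired(expiry_day: i64, today: i64) -> bool {
    today > expiry_day
}
```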


Phase 11: Evidence-Based Authority

Evidence levels (ProductSpec > Standard > Research > Commit-only) with evidence-aware graduation.

  • 11.1 Types EvidenceLevel, PatternEvidence with ADR/spec/RFC references
  • 11.2 Detection — Commit message parsing, ADR/spec file detection
  • 11.3 Graduation — Thresholds vary by evidence (ProductSpec: 1 usage, Commit-only: 10)
  • 11.4 Display — Evidence chain in output, --evidence filter

Files: src/evidence/mod.rs, evidence/types.rs, evidence/detection.rs
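A sketch of evidence-aware graduation, modelled on (not copied from) the EvidenceLevel type. Only the two thresholds stated above are filled in. Standard and Research return None rather than invented numbers. Two further choices are assumptions: the strongest evidence decides the threshold, and a pattern with no evidence is treated as commit-only.

```rust
/// Declared weakest first, so the derived Ord gives
/// ProductSpec > Standard > Research > CommitOnly.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EvidenceLevel {
    CommitOnly,
    Research,
    Standard,
    ProductSpec,
}

/// Minimum usages before graduation; None where the threshold is not documented.
fn graduation_threshold(level: EvidenceLevel) -> Option<u32> {
    match level {
        EvidenceLevel::ProductSpec => Some(1),
        EvidenceLevel::CommitOnly => Some(10),
        EvidenceLevel::Standard | EvidenceLevel::Research => None,
    }
}

/// Decides graduation from the strongest evidence attached to a pattern.
fn can_graduate(evidence: &[EvidenceLevel], usages: u32) -> Option<bool> {
    let strongest = evidence.iter().copied().max().unwrap_or(EvidenceLevel::CommitOnly);
    graduation_threshold(strongest).map(|min| usages >= min)
}
```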


Phase 12: Knowledge Scope Hierarchy

Organization → Team → Project scope levels with inheritance.

  • 12.1 Scope Types ScopeLevel enum, ScopeConfig
  • 12.2 Inheritance — Security: no opt-out, Conventions: override with justification
  • 12.3 Override Workflow — Justification + evidence required
  • 12.4 Cross-Scope Queries --scope org/team/project, --exclude-inherited

Files: src/scope/mod.rs, scope/config.rs, scope/resolver.rs, scope/override_record.rs, scope/store.rs
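The inheritance rules in 12.2 and 12.3 can be expressed as a single check. The sketch below uses hypothetical Category and OverrideError types. Representing evidence as a list of reference strings, such as ADR identifiers, is an assumption.

```rust
/// Hypothetical rule categories with different inheritance rules.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Security,
    Convention,
}

#[derive(Debug, PartialEq, Eq)]
enum OverrideError {
    SecurityNotOverridable,
    MissingJustification,
    MissingEvidence,
}

/// Checks whether a lower scope may override an inherited rule:
/// security rules never; conventions only with a justification and evidence.
fn check_override(category: Category, justification: &str, evidence: &[&str]) -> Result<(), OverrideError> {
    match category {
        Category::Security => Err(OverrideError::SecurityNotOverridable),
        Category::Convention if justification.trim().is_empty() => Err(OverrideError::MissingJustification),
        Category::Convention if evidence.is_empty() => Err(OverrideError::MissingEvidence),
        Category::Convention => Ok(()),
    }
}
```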


Phase 13: Knowledge Lifecycle Management

Active → Deprecated → Superseded → Archived lifecycle for patterns.

  • 13.1 Status Types KnowledgeStatus enum with history tracking
  • 13.2 Deprecation aphoria deprecate with --reason, --superseded-by, --sunset-date
  • 13.3 Migration Guidance — Warnings in scan output, links to replacements
  • 13.4 Migration Dashboard aphoria migrations status, progress tracking, export

Files: src/lifecycle/mod.rs, lifecycle/store.rs, lifecycle/migration.rs
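A sketch of status history tracking. It assumes the lifecycle only moves forward: stages may be skipped, but a pattern never returns to an earlier status. The variant names come from the lifecycle above. The StatusHistory struct and its transition rule are hypothetical.

```rust
/// Declared in lifecycle order, so the derived Ord follows
/// Active < Deprecated < Superseded < Archived.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum KnowledgeStatus {
    Active,
    Deprecated,
    Superseded,
    Archived,
}

/// Hypothetical record of a pattern's status and the transitions it went through.
#[derive(Debug)]
struct StatusHistory {
    current: KnowledgeStatus,
    /// (status that was left, reason given for leaving it)
    transitions: Vec<(KnowledgeStatus, String)>,
}

impl StatusHistory {
    fn new() -> Self {
        Self { current: KnowledgeStatus::Active, transitions: Vec::new() }
    }

    /// Moves forward in the lifecycle; backward or repeated moves are rejected.
    fn transition(&mut self, to: KnowledgeStatus, reason: &str) -> Result<(), String> {
        if to <= self.current {
            return Err(format!("cannot move from {:?} to {:?}", self.current, to));
        }
        self.transitions.push((self.current, reason.to_string()));
        self.current = to;
        Ok(())
    }
}
```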


Phase 16: Ignore & Exclusion System

Clean scans by excluding test fixtures and intentional patterns.

  • 16.1 Glob Patterns globset with **, *, ? support
  • 16.2 .aphoriaignore — Gitignore-style patterns, merged with aphoria.toml
  • 16.3 Inline Comments // aphoria:ignore, ignore-next-line, ignore-block
  • 16.4 Ack Export/Import .aphoria/acks.toml, version-controllable
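Aphoria delegates glob matching to the globset crate. Purely for illustration, below is a std-only backtracking matcher with the usual semantics:

- `*` and `?` stay within one path segment;
- `**` spans directories;
- `**/` may also match zero directories.

Edge cases may differ from globset, and the recursion is exponential in the worst case.

```rust
/// Matches `path` against a glob pattern using the rules listed above.
fn glob_match(pattern: &str, path: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = path.chars().collect();
    match_from(&p, &s)
}

fn match_from(p: &[char], s: &[char]) -> bool {
    match p.first() {
        None => s.is_empty(),
        Some('*') if p.get(1) == Some(&'*') => {
            if p.get(2) == Some(&'/') {
                // `**/`: the rest must start at the beginning or right after a '/'.
                let rest = &p[3..];
                (0..=s.len()).any(|i| (i == 0 || s[i - 1] == '/') && match_from(rest, &s[i..]))
            } else {
                // Bare `**`: the rest may start anywhere, separators included.
                let rest = &p[2..];
                (0..=s.len()).any(|i| match_from(rest, &s[i..]))
            }
        }
        Some('*') => {
            // Single `*`: consume a run of non-separator characters.
            for i in 0..=s.len() {
                if match_from(&p[1..], &s[i..]) {
                    return true;
                }
                if i < s.len() && s[i] == '/' {
                    break;
                }
            }
            false
        }
        Some('?') => !s.is_empty() && s[0] != '/' && match_from(&p[1..], &s[1..]),
        Some(c) => s.first() == Some(c) && match_from(&p[1..], &s[1..]),
    }
}
```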

Phase 17: Pattern Enrichment & Best Practices Infrastructure

Backend infrastructure for enriched corpus patterns and team guideline ingestion.

Note: Backend only — UI integration not implemented. Patterns carry enrichment metadata, but the dashboard does not display it yet.

17.1 Enriched Pattern Metadata

What: Transform bare patterns like "md5: true" into actionable insights like "MD5 is deprecated (NIST 2010)".

| Task | Status |
|---|---|
| Add enrichment fields to PatternAggregate (category, verdict, explanation, authority_source) | ✅ |
| Add pattern_metadata() method to Extractor trait | ✅ |
| Create PatternEnricher service with exact/wildcard matching + noise detection | ✅ corpus/enricher.rs |
| Implement pattern_metadata() for 10 security extractors | ✅ See below |

Enriched Extractors:

  • WeakCryptoExtractor — MD5, SHA1, DES, RC4 deprecated
  • TlsVersionExtractor — TLS 1.0/1.1 deprecated, 1.2/1.3 recommended
  • TlsVerifyExtractor — cert_verification: false insecure
  • JwtConfigExtractor — algorithm: none forbidden
  • CorsConfigExtractor — allow_origin: * insecure
  • HardcodedSecretsExtractor — API keys/passwords critical
  • SqlInjectionExtractor — string interpolation vulnerable
  • CommandInjectionExtractor — shell exec vulnerable
  • PathTraversalExtractor — user-controlled paths vulnerable
  • InsecureDeserializationExtractor — pickle/yaml.load unsafe
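A sketch of the exact-then-wildcard lookup with a noise filter, as described in 17.1. The table keys, the noise list and the Enricher and PatternMeta types are hypothetical. Only the MD5 explanation string is taken from the text above.

```rust
use std::collections::HashMap;

/// Hypothetical enrichment record.
#[derive(Debug, Clone, PartialEq)]
struct PatternMeta {
    category: &'static str,
    verdict: &'static str,
    explanation: &'static str,
}

#[derive(Debug, PartialEq)]
enum Enrichment {
    Known(PatternMeta),
    Noise,
    Unknown,
}

/// Hypothetical enricher keyed by (predicate, value), where value "*" is a wildcard.
struct Enricher {
    table: HashMap<(String, String), PatternMeta>,
    noise_predicates: Vec<String>,
}

impl Enricher {
    fn enrich(&self, predicate: &str, value: &str) -> Enrichment {
        // Noise is filtered first so it never picks up a verdict.
        if self.noise_predicates.iter().any(|n| n == predicate) {
            return Enrichment::Noise;
        }
        // An exact (predicate, value) entry wins over a wildcard entry.
        let exact = self.table.get(&(predicate.to_string(), value.to_string()));
        let wildcard = || self.table.get(&(predicate.to_string(), "*".to_string()));
        match exact.or_else(wildcard) {
            Some(meta) => Enrichment::Known(meta.clone()),
            None => Enrichment::Unknown,
        }
    }
}
```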

17.2 TeamPolicy Authority Tier

What: New tier 2.5 between Observational and Expert for team-level architectural guidelines.

| Task | Status |
|---|---|
| Add TeamPolicy variant to SourceClass enum | ✅ stemedb-core/src/types/source.rs |
| Add tier_fractional() for 2.5 representation | ✅ |
| Update authority_weight() (0.6) and default_decay_days() (180) | ✅ |
| Add "team_policy" parsing to parse_authority_tier() | ✅ aphoria/src/types/authored_claim.rs |
| Update all DTO conversions (API, ontology) | ✅ |

17.3 Best Practices Import CLI

What: Batch import claims from TOML files (e.g., hexagonal architecture guidelines).

| Task | Status |
|---|---|
| Add Import subcommand to ClaimsCommands | ✅ cli/claims.rs |
| Implement handle_claims_import() with merge strategies | ✅ handlers/claims.rs |
| Support --authority-tier override | ✅ |
| Support --source-guide for tracking | ✅ |
| Support --dry-run for preview | ✅ |
| Merge strategies: skip_existing, overwrite, fail_on_duplicate | ✅ |

Usage:

aphoria claims import docs/guidelines.toml \
  --authority-tier team_policy \
  --source-guide "hexagonal-arch" \
  --dry-run
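A sketch of the three merge strategies, assuming claims are keyed by id. merge_claims is hypothetical and is not the actual handle_claims_import(). Under fail_on_duplicate, the sketch checks every id before writing, so a failed import leaves the existing claims untouched; this ordering is a design assumption.

```rust
use std::collections::BTreeMap;

/// Strategy names mirror skip_existing, overwrite, fail_on_duplicate.
#[derive(Debug, Clone, Copy)]
enum MergeStrategy {
    SkipExisting,
    Overwrite,
    FailOnDuplicate,
}

/// Merges imported (id, claim) pairs into `existing`; returns how many were written.
fn merge_claims(
    existing: &mut BTreeMap<String, String>,
    imported: Vec<(String, String)>,
    strategy: MergeStrategy,
) -> Result<usize, String> {
    if let MergeStrategy::FailOnDuplicate = strategy {
        // Validate everything up front so nothing is written on failure.
        if let Some((id, _)) = imported.iter().find(|(id, _)| existing.contains_key(id)) {
            return Err(format!("duplicate claim id: {id}"));
        }
    }
    let mut written = 0;
    for (id, claim) in imported {
        if existing.contains_key(&id) && matches!(strategy, MergeStrategy::SkipExisting) {
            continue; // keep the existing claim
        }
        existing.insert(id, claim);
        written += 1;
    }
    Ok(written)
}
```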

17.4 Guideline Tracking System

What: Track which guidelines have been imported for compliance filtering and change detection.

| Task | Status |
|---|---|
| Create GuidelineMetadata struct | ✅ types/ingested_guides.rs |
| Create IngestedGuidesFile with TOML persistence | ✅ |
| Track: id, name, source_path, document_hash, claim_ids | ✅ |
| Integrate with import command | ✅ |
| Store in .aphoria/ingested_guides.toml | ✅ |
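Change detection in 17.4 reduces to comparing a stored document_hash with a fresh hash of the guide. The sketch below uses assumed field types. std's DefaultHasher stands in for a real content hash here. DefaultHasher is not guaranteed to be stable across Rust releases, so a persisted hash should use something like BLAKE3, which Aphoria already uses elsewhere.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fields as listed for GuidelineMetadata; the types are assumptions.
#[derive(Debug, Clone)]
struct GuidelineMetadata {
    id: String,
    name: String,
    source_path: String,
    document_hash: u64,
    claim_ids: Vec<String>,
}

/// Stand-in content hash; not suitable for values persisted across toolchains.
fn hash_document(contents: &str) -> u64 {
    let mut h = DefaultHasher::new();
    contents.hash(&mut h);
    h.finish()
}

/// A guide needs re-import when its current text no longer matches the recorded hash.
fn has_changed(meta: &GuidelineMetadata, current_contents: &str) -> bool {
    meta.document_hash != hash_document(current_contents)
}
```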

17.5 Updated Comparison Modes

| Task | Status |
|---|---|
| Add Contains comparison mode | ✅ types/authored_claim.rs |
| Add NotContains comparison mode | ✅ |
| Update API DTOs | ✅ stemedb-api/src/dto/aphoria/types.rs |
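A sketch of the two new comparison modes. It assumes they test substring containment on string values; list membership is another plausible reading. Only Contains and NotContains are modelled, presumably alongside existing modes in the real enum. The satisfies function is hypothetical.

```rust
/// Only the two modes added in 17.5.
#[derive(Debug, Clone, Copy)]
enum ComparisonMode {
    Contains,
    NotContains,
}

/// Checks an observed value against the expected fragment, case-sensitively.
fn satisfies(mode: ComparisonMode, observed: &str, expected: &str) -> bool {
    match mode {
        ComparisonMode::Contains => observed.contains(expected),
        ComparisonMode::NotContains => !observed.contains(expected),
    }
}
```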

What's NOT Implemented

  • Dashboard UI Integration: Enrichment metadata exists in the backend, but there is no UI to display it
  • Category Badges: No visual badges for security/architecture/performance
  • Verdict Badges: No visual indicators for deprecated/recommended
  • Filtering UI: No dropdown to filter patterns by category
  • "Hide Noise" Toggle: Noise detection works, but there is no UI control
  • --check-policy Flag: Backend ready, but scan filtering is not implemented

Files Modified

Core:

  • crates/stemedb-core/src/types/source.rs — TeamPolicy tier
  • crates/stemedb-storage/src/pattern_aggregate_store/mod.rs — Enrichment fields

Aphoria:

  • applications/aphoria/src/extractors/traits.rs — pattern_metadata() method
  • applications/aphoria/src/corpus/enricher.rs — NEW: Pattern enrichment service
  • applications/aphoria/src/types/authored_claim.rs — Contains/NotContains modes
  • applications/aphoria/src/types/ingested_guides.rs — NEW: Guideline tracking
  • applications/aphoria/src/cli/claims.rs — Import subcommand
  • applications/aphoria/src/handlers/claims.rs — Import handler
  • 10 extractor files with pattern_metadata() implementations

API:

  • crates/stemedb-api/src/dto/enums.rs — TeamPolicy DTO
  • crates/stemedb-api/src/dto/aphoria/types.rs — Contains/NotContains DTOs
  • crates/stemedb-api/src/handlers/aphoria/claims.rs — Comparison mode conversion
  • crates/stemedb-api/src/handlers/layered.rs — SourceClass conversion

Ontology:

  • crates/stemedb-ontology/src/dto/enums.rs — TeamPolicy DTO
  • crates/stemedb-ontology/src/dto/conversions.rs — Conversion functions

The Self-Learning Vision (Complete)

Phase 7: Declarative Extractors                          ✅
    ↓
Phase 7.5: LLM-in-the-Loop (Gemini semantic extraction) ✅
    ↓
Phase 7.6: Pattern Learning (remember what LLM finds)    ✅
    ↓
Phase 7.7: Pattern Promotion (patterns → extractors)     ✅
    ↓
Phase 7.8: LLM Prompt Evaluation (measure & improve)     ✅
    ↓
Phase 8: Enterprise Extractors (42 total)                ✅
    ↓
Phase 9: Autonomous Generation (fully self-improving)     ✅

Milestone Summary (Completed)

| Phase | Deliverable | Status |
|---|---|---|
| 0 | ConceptPath in StemeDB | ✅ |
| 2 | Aphoria CLI (scan, report, ack) | ✅ |
| 2A | Concept matching (leaf, alias, auto-alias) | ✅ |
| 1 | Authoritative corpus expansion | ✅ |
| 3 | Claude Code skill + hooks | ✅ |
| 4 | Full-cycle pre-commit (sync, drift, staged, hosted) | ✅ |
| 4.5 | Ephemeral scan mode (40x faster) | ✅ |
| 5 | Research agent loop + community corpus | ✅ |
| 6 | Federated Policy & Trust Packs | ✅ |
| 6.5 | Trust Pack Extensions | ✅ |
| 7 | Declarative Extractors | ✅ |
| 7.5 | LLM-in-the-Loop Extraction | ✅ |
| 7.6 | Pattern Learning Store | ✅ |
| 7.7 | Pattern → Extractor Promotion | ✅ |
| 7.8 | LLM Prompt Evaluation | ✅ |
| 8 | Enterprise Extractors (42 total) | ✅ |
| 9 | Autonomous Extractor Generation | ✅ |
| 10.1 | Acknowledgment Expiry | ✅ |
| 11 | Evidence-Based Authority | ✅ |
| 12 | Knowledge Scope Hierarchy | ✅ |
| 13 | Knowledge Lifecycle Management | ✅ |
| 16 | Ignore & Exclusion System | ✅ |
| 17 | Pattern Enrichment & Best Practices Infrastructure (backend only) | ✅ |