stemedb/applications/aphoria/roadmap.md
jml 9bfa626203 docs: reorganize documentation structure for clarity
Major documentation restructure to improve discoverability and reduce duplication.

## Changes

**Deleted (Archived/Consolidated)**:
- Removed duplicate getting started guides
- Archived outdated planning documents
- Consolidated corpus and configuration docs
- Removed obsolete vision/spec files (superseded by vision.md)
- Cleaned up scrapyard and old PDFs

**New Structure**:
- docs/about/ - Project overview and introduction
- docs/guides/ - User guides (moved from root)
- docs/specs/ - Technical specifications
- docs/sdk/ - SDK documentation (Go)
- docs/references/ - API references
- docs/archive/ - Archived historical docs
- applications/aphoria/docs/advanced/ - Advanced topics
- applications/aphoria/docs/reference/ - CLI reference
- applications/aphoria/docs/archive/ - Archived aphoria docs

**Updated**:
- README.md - New root README with clear navigation
- CONTRIBUTING.md - Contribution guidelines
- CLAUDE.md - Updated paths to new structure
- roadmap.md - Added recent completions

## Files Changed
- 57 files changed
- 1,977 insertions(+)
- 961 deletions(-)

**Net change**: +1,016 lines (added CONTRIBUTING.md, README.md, reorganized content)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 07:33:40 +00:00


Aphoria Roadmap

Completed phases archived in roadmap-archive.md


Status Overview

| Phase | Deliverable | Status |
|---|---|---|
| 0-9, 11-13, 16-17 | Core CLI, Extractors (42), LLM, Learning, Enterprise, Lifecycle, Pattern Enrichment | Archived |
| CC | Corpus Infrastructure (Community Corpus, Wiki Import, Pattern Aggregation, Async Default) | Complete |
| 10 | UX & Enterprise Polish | 🔄 Partial (10.1 archived; 10.2-10.3 open) |
| 14 | Governance Workflows | 🎯 Current |
| DF-1 | Dogfood: Database Connection Pool | 🎯 ACTIVE |
| 15 | Evidence Source Integration | Future |
| A6 | AST-Aware Observation & Claim Verification | Future |

Current State

  • 42 built-in extractors + declarative custom extractors
  • Emergent corpus: RFC, OWASP, Vendor sources + community-driven patterns (CC.6)
  • Community corpus enabled by default (CC.7): use_community: true, proper async, no runtime hacks
  • Pattern aggregation active: Observations auto-feed pattern aggregates after each scan
  • No hardcoded assertions: Bootstrap via wiki import or Trust Packs
  • Ephemeral mode (~0.25s), persistent mode with drift detection
  • Observation/claim distinction (A1-A5 complete)
  • aphoria verify run|map for claim verification
  • 10 claims dogfooded in .aphoria/claims.toml
  • Self-improving: LLM extraction → pattern learning → autonomous promotion → shadow testing → auto-rollback

Recently Completed: Corpus Infrastructure (Phase CC)

Phase CC.1-CC.3: Removed hardcoded corpus, built emergent system (Feb 6-7)

  • Deleted hardcoded.rs (369 lines, 19 assertions)
  • Pattern aggregates stored in StemeDB: community://pattern/{BLAKE3(SPV)}
  • Multi-tier promotion: 95%+ (Regulatory), 80%+ (Clinical), 50%+ (Emerging, review required)
  • Wiki import: aphoria corpus import wiki ~/docs parses MUST/SHOULD patterns
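
The multi-tier promotion thresholds above can be sketched as a small classifier. This is a minimal illustration only: the `Tier` enum, `classify_tier`, and `requires_review` are hypothetical names, and the real logic in thresholds.rs may weigh additional signals beyond the adoption ratio.

```rust
/// Promotion tiers named in the roadmap; this enum is illustrative, not the project's type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Tier {
    Regulatory,
    Clinical,
    Emerging,
}

/// Maps an adoption ratio in [0.0, 1.0] to a tier, or `None` below the 50% floor.
pub fn classify_tier(adoption: f64) -> Option<Tier> {
    // Out-of-range input (including NaN) never promotes.
    if !(0.0..=1.0).contains(&adoption) {
        return None;
    }
    if adoption >= 0.95 {
        Some(Tier::Regulatory)
    } else if adoption >= 0.80 {
        Some(Tier::Clinical)
    } else if adoption >= 0.50 {
        Some(Tier::Emerging)
    } else {
        None
    }
}

/// Only the Emerging tier is described as "review required".
pub fn requires_review(tier: Tier) -> bool {
    tier == Tier::Emerging
}
```

Under these assumptions a pattern at 97% adoption lands in the Regulatory tier, while one at 60% lands in Emerging and still needs review.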

Phase CC.6: Pattern Aggregation (Emergent Learning) (Feb 8)

  • Observations now automatically feed back into pattern aggregates
  • Every scan with --persist --sync contributes to community learning
  • Config: aggregation_enabled: true (default)
  • Tracks project_count and observation_count per pattern
  • Privacy-preserving: wildcarded subjects, project deduplication

Phase CC.7: Make Community Corpus Default (Feb 8)

  • Created AsyncCorpusBuilder trait for async-native corpus builders
  • Refactored CommunityCorpusBuilder to implement AsyncCorpusBuilder
  • Removed rt.block_on() hack that caused "runtime within runtime" errors
  • Made entire corpus building chain properly async (16 functions updated)
  • Enabled use_community: true by default in CorpusConfig
  • All 1189 tests pass, no clippy warnings, no runtime errors

Philosophy: The corpus isn't written by experts. It's discovered by the community and validated by authorities.


Phase 10: UX & Enterprise Polish (Partial)

10.1 Acknowledgment Expiry — archived

10.2 Human-Readable Signer Names

Impact: MEDIUM | Effort: MEDIUM | Priority: P2

Map issuer hex IDs to human-readable team names in output.

Task Status
Add signer_name: Option<String> to PackHeader
Add contact: Option<String> to PackHeader (Slack channel, email)
Update policy export/import to preserve new fields
Show "Signed by Platform Security Team" instead of hex in output
Backward-compat: gracefully handle packs without new fields
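
One way the backward-compatible display could work is sketched below. Only `signer_name` and `contact` come from the task list; the `issuer_id` field name, the `signed_by` method, and the exact output format are assumptions made for illustration.

```rust
/// Hypothetical subset of PackHeader; `issuer_id` is an assumed field name.
#[derive(Debug, Clone, Default)]
pub struct PackHeader {
    pub issuer_id: String,           // hex issuer ID
    pub signer_name: Option<String>, // new optional field
    pub contact: Option<String>,     // new optional field (Slack channel, email)
}

impl PackHeader {
    /// Prefers the human-readable name and falls back to the hex ID,
    /// so packs created before the new fields existed still render.
    pub fn signed_by(&self) -> String {
        let who = self
            .signer_name
            .as_deref()
            .filter(|name| !name.trim().is_empty())
            .unwrap_or(self.issuer_id.as_str());
        match self.contact.as_deref() {
            Some(contact) if !contact.trim().is_empty() => {
                format!("Signed by {} ({})", who, contact)
            }
            _ => format!("Signed by {}", who),
        }
    }
}
```

Keeping both new fields as `Option<String>` means an older pack is simply read with both set to `None`.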

10.3 Speed Benchmarks

Impact: LOW | Effort: LOW | Priority: P3

Task Status
Create benchmarks/ directory with test corpora
Add aphoria scan --benchmark flag for self-test
Document test conditions in benchmark results

Phase CC: Corpus Infrastructure (Community Corpus)

Completed: 2026-02-08 | Removed hardcoded corpus, built emergent community-driven system

Philosophy

The corpus isn't written by experts. It's discovered by the community and validated by authorities. 95% adoption = "This is what the community does" = Authoritative.

CC.1 Delete Hardcoded Corpus

Task Status
Remove applications/aphoria/src/corpus/hardcoded.rs (369 lines)
Remove include_hardcoded from CorpusConfig
Remove from CorpusRegistry::with_defaults()
Update tests to use community corpus
Fix 5 pre-existing clippy errors in stemedb-api

Implemented: Destructive pre-release approach - no deprecation warnings, just deleted.

CC.2 Community Corpus Builder

Task Status
Create applications/aphoria/src/corpus/community.rs (393 lines)
Create applications/aphoria/src/corpus/thresholds.rs (230 lines)
Create applications/aphoria/src/corpus/resolver.rs (220 lines)
Create applications/aphoria/src/community/pattern_store.rs (332 lines)
Implement PatternAggregateStore trait with StemeDB backend
Multi-tier promotion: 95% (Regulatory), 80% (Clinical), 50% (Emerging)
Content-addressed storage: community://pattern/{BLAKE3(SPV)}
Config integration: use_community flag (opt-in)
Full scan flow integration

Storage Architecture:

  • Pattern aggregates stored as StemeDB assertions (no TOML files)
  • Predicate: pattern_aggregate with JSON metadata
  • Deduplication via content-addressed subjects
  • Privacy-preserving: wildcarded subjects, k-anonymity
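
The content-addressed subject scheme can be sketched as follows. This assumes "SPV" means the (subject, predicate, value) triple and that the hash is taken over some canonical encoding of it; the length-prefixed encoding and all names here are hypothetical. The hash function is injected so the sketch stays dependency-free; the roadmap specifies BLAKE3, which a real build would supply.

```rust
/// Assumed meaning of "SPV": a subject/predicate/value triple.
pub struct Spv<'a> {
    pub subject: &'a str,
    pub predicate: &'a str,
    pub value: &'a str,
}

/// Length-prefixed encoding, so ("ab", "c", ..) and ("a", "bc", ..) produce different bytes.
/// This encoding is an assumption, not the project's documented format.
pub fn canonical_bytes(spv: &Spv<'_>) -> Vec<u8> {
    let mut out = Vec::new();
    for field in [spv.subject, spv.predicate, spv.value] {
        out.extend_from_slice(&(field.len() as u64).to_be_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out
}

/// Builds `community://pattern/{hash}`; `hash_hex` stands in for a BLAKE3 hex digest.
pub fn pattern_subject<F>(spv: &Spv<'_>, hash_hex: F) -> String
where
    F: Fn(&[u8]) -> String,
{
    format!("community://pattern/{}", hash_hex(&canonical_bytes(spv)))
}
```

Because identical triples always hash to the same subject, writing the same pattern twice addresses the same record, which is what makes the deduplication above fall out of the addressing scheme.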

CC.3 Wiki Import Bootstrap

Task Status
Create applications/aphoria/src/corpus/wiki_importer.rs (332 lines)
Regex extraction of MUST/SHOULD patterns from markdown
Authority source parsing (RFC, OWASP, CWE references)
Smart subject normalization (TLS → tls/cert_verification)
CLI command: aphoria corpus import wiki <path>
PatternAggregator write path (stores to StemeDB)
Integration tests with fixtures (6 tests)
Documentation: docs/bootstrap-corpus.md

Usage:

# Create wiki with best practices
mkdir -p .aphoria/wiki
echo "TLS cert verification MUST be enabled. Authority: RFC 5246" > .aphoria/wiki/tls.md

# Import patterns
aphoria corpus import wiki .aphoria/wiki
# → Patterns now in StemeDB, available for conflict detection
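
To illustrate what the importer pulls out of a line like the one above, here is a minimal sketch. It uses plain string operations in place of the regex extraction the task list mentions, and `Level`, `WikiPattern`, and `parse_wiki_line` are hypothetical names. It does not distinguish MUST from MUST NOT, and it skips subject normalization.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Level {
    Must,
    Should,
}

#[derive(Debug, PartialEq, Eq)]
pub struct WikiPattern {
    pub statement: String,
    pub level: Level,
    pub authority: Option<String>,
}

/// Parses one wiki line; returns `None` when no MUST/SHOULD keyword is present.
pub fn parse_wiki_line(line: &str) -> Option<WikiPattern> {
    // Split off a trailing "Authority: ..." clause if there is one.
    let (body, authority) = match line.split_once("Authority:") {
        Some((body, auth)) => (body, Some(auth.trim().trim_end_matches('.').to_string())),
        None => (line, None),
    };
    // Case-sensitive whole-word match on the requirement keywords.
    let level = body.split_whitespace().find_map(|word| {
        match word.trim_matches(|c: char| !c.is_alphanumeric()) {
            "MUST" => Some(Level::Must),
            "SHOULD" => Some(Level::Should),
            _ => None,
        }
    })?;
    Some(WikiPattern {
        statement: body.trim().trim_end_matches('.').to_string(),
        level,
        authority: authority.filter(|a| !a.is_empty()),
    })
}
```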

CC.4 Trust Pack Bootstrap

Task Status
Extend Trust Packs to include pattern aggregates Future
aphoria trust-pack install <name> writes patterns to StemeDB Future
Create rfc-owasp-baseline.toml with ~20 common patterns Future

Status: Infrastructure exists, implementation deferred. Wiki import covers bootstrap needs.

CC.5 Skill-Driven Cold Start

Task Status
Enhance aphoria-suggest skill with bootstrap mode Future
Detect empty corpus during scan Future
Analyze project structure (Cargo.toml, package.json) Future
Suggest 3-5 baseline patterns based on detected stack Future

Status: Skill exists, bootstrap mode not implemented. Manual wiki creation works well.

CC.6 Pattern Aggregation (Emergent Learning)

Completed: 2026-02-08 | Observations now feed back into pattern aggregates automatically

Task Status
Add aggregation_enabled config field (default: true)
Implement aggregate_observations_to_patterns() in scanner
Add StemeDBPatternStore::get_pattern_by_spv() for lookup
Add StemeDBPatternStore::update_pattern() for updates
Add compute_project_hash() for deduplication
Hook into scan flow after observation recording
Group observations by (subject, predicate, value)
Wildcard project paths for anonymization
Create or update PatternAggregate records
Track project_count and observation_count

Implementation:

// scanner.rs:344-357
if config.corpus.aggregation_enabled && should_persist_locally {
    let project_hash = compute_project_hash(project_root);
    aggregate_observations_to_patterns(&novel_claims, &episteme, &project_hash).await?;
}

Flow:

  1. Scan extracts observations → recorded as Tier 4 assertions
  2. Observations aggregated by (wildcarded_subject, predicate, value)
  3. For each unique pattern:
    • If exists: increment observation_count, check new project → increment project_count
    • If new: create PatternAggregate with initial counts
  4. Stored as assertions with predicate "pattern_aggregate"
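
The flow above can be sketched with an in-memory map standing in for the StemeDB-backed store. All types here are simplified stand-ins, and the wildcarding rule (replace the first path segment with `*`) is an assumption about the subject layout, not the documented scheme.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PatternKey {
    pub subject: String, // wildcarded subject
    pub predicate: String,
    pub value: String,
}

#[derive(Debug, Default)]
pub struct PatternAggregate {
    pub observation_count: u64,
    pub project_count: u64,
    seen_projects: HashSet<String>, // project hashes already counted
}

pub struct Observation {
    pub subject: String,
    pub predicate: String,
    pub value: String,
}

/// Assumes subjects look like "<project>/<rest>"; the project part becomes "*".
pub fn wildcard_subject(subject: &str) -> String {
    match subject.split_once('/') {
        Some((_, rest)) => format!("*/{}", rest),
        None => subject.to_string(),
    }
}

/// Steps 2-3 of the flow: group by key, bump counts, count each project once.
pub fn aggregate(
    store: &mut HashMap<PatternKey, PatternAggregate>,
    observations: &[Observation],
    project_hash: &str,
) {
    for obs in observations {
        let key = PatternKey {
            subject: wildcard_subject(&obs.subject),
            predicate: obs.predicate.clone(),
            value: obs.value.clone(),
        };
        // Creates the aggregate with zeroed counts if the pattern is new.
        let agg = store.entry(key).or_default();
        agg.observation_count += 1;
        // `insert` returns true only the first time this project is seen.
        if agg.seen_projects.insert(project_hash.to_string()) {
            agg.project_count += 1;
        }
    }
}
```

Two scans of the same project raise `observation_count` twice but `project_count` only once, which is the project deduplication described in the Current State list.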

Result: The corpus is now emergent. Every scan with --persist --sync feeds the learning loop.


What Remains (Future Enhancement)

CC.4 Trust Pack Bootstrap (Unchanged - Future enhancement)

CC.5 Skill-Driven Cold Start (Unchanged - Future enhancement)


CC.7 Make Community Corpus Default

Completed: 2026-02-08 | Community corpus now enabled by default, async runtime issue resolved

Task Status
Create AsyncCorpusBuilder trait for async corpus builders
Implement dual registry (sync + async builders)
Refactor CommunityCorpusBuilder to implement AsyncCorpusBuilder
Remove rt.block_on() hack, use proper .await
Make build_corpus_with_stores() async
Make create_authoritative_corpus() async
Make EphemeralDetector::new() async
Make extract_claims_from_files() async
Update all 16 function callers to use .await
Change use_community: false → true in defaults
Verify tests pass with community corpus enabled (1189 tests)

Architecture Improvement:

  • Before: Sync CorpusBuilder trait forced async operations to use rt.block_on(), causing runtime errors in async contexts
  • After: Dual-trait approach (CorpusBuilder + AsyncCorpusBuilder) allows sync builders (RFC, OWASP, Vendor) to stay simple while community builder uses proper async
  • Result: No block_on() hacks anywhere, proper async/await throughout

Verification:

RUST_LOG=aphoria=debug aphoria scan --persist --sync .
# Logs show:
# ✅ "Registered community corpus builder (async)"
# ✅ "Building corpus (async)" for Community builder
# ✅ "Querying popular patterns from StemeDB"
# ✅ No "Cannot start a runtime from within a runtime" errors

CC.4 Trust Pack System (Bootstrap Option 2)

Task Status
aphoria trust-pack export --source community
aphoria trust-pack install <name>
Create rfc-owasp-bootstrap Trust Pack from old hardcoded corpus
Trust Pack validation and signing
Trust Pack registry/sharing mechanism

Usage:

aphoria trust-pack install rfc-owasp-bootstrap
# Installs 19 baseline assertions for new projects

CC.5 Corpus Management CLI

Task Status
aphoria corpus build - Build community corpus
aphoria corpus list - Show loaded corpus assertions
aphoria corpus candidates --min-adoption 0.50 - List promotion candidates
aphoria corpus promote <pattern-id> - Manual promotion
Update aphoria-corpus-curator skill for manual review

CC.6 Multi-Layer Corpus Resolver

Task Status
Create applications/aphoria/src/corpus/resolver.rs
Priority layers: Manual overrides > Trust Packs > Community > (deprecated hardcoded)
Conflict resolution: higher priority overwrites lower
Config: use_community = true default
Config: include_hardcoded = false default (post-migration)
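
The layered resolution rule (higher priority overwrites lower) can be sketched as below. The `Layer` enum and `resolve` function are hypothetical. The deprecated hardcoded layer is left out because that corpus was deleted in CC.1, and letting a later entry win within the same layer is an assumption.

```rust
use std::collections::HashMap;

/// Variants are declared in ascending priority so the derived `Ord` matches
/// "Manual overrides > Trust Packs > Community".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Layer {
    Community,
    TrustPack,
    ManualOverride,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Resolved {
    pub value: String,
    pub layer: Layer,
}

/// Resolves (layer, subject, value) entries in any order; the highest layer wins per subject.
pub fn resolve(entries: &[(Layer, &str, &str)]) -> HashMap<String, Resolved> {
    let mut out: HashMap<String, Resolved> = HashMap::new();
    for &(layer, subject, value) in entries {
        let replace = match out.get(subject) {
            Some(existing) => layer >= existing.layer,
            None => true,
        };
        if replace {
            out.insert(
                subject.to_string(),
                Resolved { value: value.to_string(), layer },
            );
        }
    }
    out
}
```

Recording the winning layer alongside the value keeps the result explainable: a report can say which layer a given assertion came from.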

Phase 14: Governance Workflows 🎯

Vision: Clear approval paths for pattern promotion with audit trails.

14.1 Approval Workflow Definition

Task Status
Create src/governance/mod.rs module
Define ApprovalWorkflow struct
Define ApprovalStage with required approvers
Support evidence-based auto-approve thresholds
Config: define workflows in .aphoria.toml

14.2 Approval State Machine

Task Status
Implement state transitions (pending → approved/rejected)
Multi-stage approval support
Timeout and escalation policies
Store approval history with timestamps
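
A minimal sketch of the state machine described above, covering multi-stage approval and timestamped history. All names are hypothetical and deliberately differ from the planned `ApprovalWorkflow` and `ApprovalStage` structs. Timeout and escalation policies are not modeled.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ApprovalState {
    Pending { stage: usize },
    Approved,
    Rejected { reason: String },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Decision {
    Approve,
    Reject(String),
}

#[derive(Debug)]
pub struct Approval {
    pub state: ApprovalState,
    pub total_stages: usize,
    /// (unix timestamp, approver, decision) for the audit trail.
    pub history: Vec<(u64, String, Decision)>,
}

impl Approval {
    pub fn new(total_stages: usize) -> Self {
        Approval {
            state: ApprovalState::Pending { stage: 0 },
            total_stages: total_stages.max(1), // at least one stage
            history: Vec::new(),
        }
    }

    /// Applies one decision; only a pending approval may transition.
    pub fn decide(&mut self, at: u64, approver: &str, decision: Decision) -> Result<(), String> {
        let stage = match self.state {
            ApprovalState::Pending { stage } => stage,
            _ => return Err("approval already finalized".to_string()),
        };
        self.history.push((at, approver.to_string(), decision.clone()));
        self.state = match decision {
            Decision::Reject(reason) => ApprovalState::Rejected { reason },
            Decision::Approve if stage + 1 >= self.total_stages => ApprovalState::Approved,
            Decision::Approve => ApprovalState::Pending { stage: stage + 1 },
        };
        Ok(())
    }
}
```

Rejecting decisions on a finalized approval keeps the history append-only, which is the property an audit trail would rely on.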

14.3 Approval CLI

Task Status
aphoria governance pending — list pending approvals
aphoria governance approve <id> --comment "..."
aphoria governance reject <id> --reason "..."
aphoria governance escalate <id>
Show approval status in pattern list

14.4 SOC 2 Audit Trail

Task Status
Full audit log for all governance actions
aphoria audit trail --pattern <id> — show timeline
Export governance history for auditors
Include approver identity and timestamp

Phase DF-1: Dogfood Project - Database Connection Pool 🎯

Status: ACTIVE | Start: 2026-02-09 | Target: 2026-02-14 (5 days)

Vision: Build a production-ready database connection pool with intentional violations, use Aphoria to detect and guide remediation. Demonstrates real-world value in preventing production incidents.

Overview

Product: dbpool - Safe, opinionated PostgreSQL connection pool for Rust

Why This Matters:

  • Connection pool misconfigurations cause real P0 incidents
  • Clear authority sources (HikariCP, PostgreSQL docs)
  • Demonstrates Aphoria preventing actual production problems
  • "Aphoria caught this before deployment" is compelling ROI

Key Metrics:

  • Claims to extract: 25-30
  • Intentional violations: 7-8
  • Expected detection rate: 100%
  • Final state: 0 conflicts, production-ready

DF-1.1 Preparation & Corpus Building (Day 1) 🔄

Goal: Extract claims from authority sources and populate corpus database

Task Status
Create project structure at applications/aphoria/dogfood/dbpool/
Write comprehensive plan in dogfood/dbpool/plan.md
Fetch HikariCP configuration documentation
Fetch PostgreSQL connection pooling guide
Extract OWASP A07 credential guidance
Create 25-30 claims via CLI (aphoria corpus create)
Verify all claims queryable via API
Document claim templates for future dogfoods

Deliverables:

  • docs/sources/hikaricp-config.md
  • docs/sources/postgresql-pooling.md
  • docs/sources/owasp-credentials.md
  • 25-30 claims in corpus database
  • Verification report

DF-1.2 Initial Implementation with Violations (Day 2)

Goal: Write working code that compiles but violates best practices

Task Status
Create Rust project with Cargo.toml
Implement PoolConfig with 5 violations
Implement ConnectionPool with 2 violations
Add basic tests (that pass despite violations)
Verify compilation successful

Intentional Violations:

  1. Unbounded max_connections (CRITICAL)
  2. Plaintext password in connection string (CRITICAL)
  3. Missing max_lifetime (CRITICAL)
  4. Excessive connection_timeout (ERROR)
  5. Zero min_connections (ERROR)
  6. Missing connection validation (ERROR)
  7. ⚠️ No metrics exposed (WARNING)
  8. ⚠️ Missing leak detection (WARNING)

DF-1.3 First Scan & Verification (Day 3)

Goal: Run Aphoria scan and verify all violations detected

Task Status
Create .aphoria/config.toml
Run initial scan, save results JSON
Verify 7-8 violations detected (100% accuracy) ⚠️ Gap identified
Generate markdown report
Take screenshots for demo
Verify 0 false positives

Actual Results:

  • 0/7 violations detected (expected - documented in planning as Scenario 1)
  • Built-in extractors cover security patterns, not library API patterns
  • All 7 claims authored successfully via A2 system
  • Verify system working correctly (all claims returned "missing" verdict)
  • Key Finding: Extractor coverage gap identified and documented

Discovered Limitation: Aphoria's 42 built-in extractors excel at security/infrastructure patterns (TLS, JWT, CORS, SQL injection, rate limits) but don't cover library API design validation (struct field types, missing fields, numeric constraints, function call patterns).

Why This Matters:

  • This is the expected outcome documented in STATE-2026-02-10.md (Scenario 1)
  • Validates Aphoria's architecture (claims, verify, scanning all work correctly)
  • Identifies product gap: custom extractors require Rust code, not TOML
  • Confirms LLM automation requirement for flywheel (needs /aphoria-custom-extractor-creator skill)

See: dogfood/dbpool/DAY3-FINDINGS.md for complete analysis

DF-1.4 Remediation & Re-verification (Day 4)

Goal: Fix violations incrementally, re-scan after each fix

Task Status
Fix unbounded max_connections → re-scan
Fix plaintext password → re-scan
Fix missing max_lifetime → re-scan
Fix excessive timeouts → re-scan
Fix zero min_connections → re-scan
Add connection validation → re-scan
Add metrics exposure → re-scan
Add leak detection → re-scan
Final verification: 0 conflicts

Deliverables:

  • Progressive scan results (v1 through v6)
  • Git tags for each fix milestone
  • Final clean scan report

DF-1.5 Documentation & Demo Preparation (Day 5)

Goal: Create compelling documentation and demo materials

Task Status
Write success story document
Create demo script for live presentation
Record performance metrics
Create before/after visual comparison
Document prevented incidents with cost estimates
Update this roadmap with completion status

Deliverables:

  • docs/SUCCESS-STORY.md - Comprehensive case study
  • demo.sh - Automated demo script
  • Screenshots and visuals
  • Metrics report (accuracy, performance)

Success Metrics

| Metric | Target | Actual |
|---|---|---|
| Claims Extracted | 25-30 | TBD |
| Violations Detected | 7-8 | TBD |
| Detection Accuracy | 100% | TBD |
| False Positives | 0 | TBD |
| Scan Performance | ≤0.3s | TBD |
| Final Conflicts | 0 | TBD |

Lessons Learned

From Day 3 (2026-02-10):

  1. Extractor Coverage Gap Validated

    • Built-in extractors (42 total) cover security patterns excellently
    • Library API design patterns (struct fields, type constraints) need custom extractors
    • Custom extractors require Rust code (~10-20 hours), not TOML configuration
    • This was documented in planning (Scenario 1 vs 2) and validated through execution
  2. Authored Claims System Works

    • A2 system successfully created 7 claims with full provenance/invariant/consequence
    • Claims loaded correctly, verify system working as designed
    • All claims returned "missing" verdict (correct - no matching observations)
    • Demonstrates claim authoring workflow even without detection
  3. Flywheel Automation is Critical

    • Manual TOML configuration cannot address the gap
    • Requires LLM-driven extractor generation (/aphoria-custom-extractor-creator skill)
    • Confirms vision.md's emphasis on LLM automation as core, not optional
    • Manual CLI is debug interface, not primary workflow
  4. Dogfooding Reveals Product Gaps

    • Time investment: Day 3 took 8 hours (3x planned) due to troubleshooting
    • Found fundamental limitation, not implementation bug
    • "Failure" to detect is actually success at identifying product needs
    • Documentation produced (CUSTOM-EXTRACTOR-GUIDE.md) valuable despite approach not working
  5. Next Priority Clear

    • Implement /aphoria-custom-extractor-creator skill (Priority 1)
    • LLM reads violation examples → generates Rust extractor code
    • Re-run dogfood to validate end-to-end automation
    • Expand built-in extractor library with common API patterns

Next Dogfoods

Potential follow-up dogfooding projects:

  • Health check service (healthd)
  • Rate limiter middleware (ratelimit-rs)
  • Secrets manager client (secrets-rs)

Full Plan: See applications/aphoria/dogfood/dbpool/plan.md


Phase 15: Evidence Source Integration

Vision: ADRs, specs, and standards automatically link to patterns.

15.1 ADR Auto-Detection

Task Status
Create src/evidence/adr.rs
Detect ADR-XXX patterns in commit messages
Scan for ADR files in standard locations
Parse ADR content for related patterns
Link ADR to patterns automatically
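
Detecting ADR references in a commit message might look like the sketch below. It assumes the `XXX` in ADR-XXX is a run of ASCII digits and uses plain string scanning. `find_adr_ids` is a hypothetical helper, and it does not check for a word boundary before `ADR-`.

```rust
/// Returns the distinct ADR IDs mentioned in `message`, in first-seen order.
pub fn find_adr_ids(message: &str) -> Vec<String> {
    let mut ids: Vec<String> = Vec::new();
    let mut rest = message;
    while let Some(pos) = rest.find("ADR-") {
        // "ADR-" is four ASCII bytes, so this slice lands on a char boundary.
        let after = &rest[pos + 4..];
        let digits: String = after.chars().take_while(|c| c.is_ascii_digit()).collect();
        if !digits.is_empty() {
            let id = format!("ADR-{}", digits);
            if !ids.contains(&id) {
                ids.push(id);
            }
        }
        rest = after;
    }
    ids
}
```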

15.2 Spec File Detection

Task Status
Create src/evidence/spec.rs
Detect spec files (specs/*.md, *.spec.md)
Parse requirement IDs (REQ-XXX)
Link requirements to patterns
Show requirement coverage in reports

15.3 Standard Reference Extraction

Task Status
Parse RFC references (RFC 7519)
Parse OWASP references (OWASP A03:2021)
Parse NIST references (NIST SP 800-53)
Auto-link to authoritative corpus

15.4 Evidence Display

Task Status
Show full evidence chain in pattern output
aphoria patterns --by-evidence grouping

Phase A6: AST-Aware Observation & Claim Verification

Evolved from the "Scout & Judge" proposal (2026-02-05). The original focused on LLM cost reduction via AST snippet extraction. Reframed through the observations/claims distinction: the Scout produces structurally richer observations that regex can't, and the Judge verifies authored claims against code rather than classifying security issues.

Why This Matters

The 42 regex extractors work well for direct pattern matching (~0.25s). But they can't follow indirection:

# Regex sees `requests.get(url, verify=should_verify)` — no match
# AST sees `should_verify = False` in scope — match
should_verify = False
requests.get(url, verify=should_verify)

And they can't verify authored claims. When a claim says "Wallet MUST NOT derive Clone", regex can find #[derive( but can't determine scope or negation semantics. An AST-aware scout + LLM judge can.

A6.1 Tree-sitter Infrastructure

Task Status
Add tree-sitter + language grammars to Cargo.toml
Create src/scout/mod.rs module
src/scout/engine.rs — parse files, run SCM queries
CandidateSnippet type with structural context
src/scout/queries/.scm query files per category/language
Language support: Python, Go, Rust, JavaScript/TypeScript
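
use std::collections::HashMap;

/// A region of interest found by the scout, with enough context for the judge.
pub struct CandidateSnippet {
    pub file_path: String,
    pub language: Language, // enum of supported languages, defined elsewhere
    pub start_line: usize,
    pub end_line: usize,
    pub code: String,
    /// Variable definitions in scope, keyed by name (e.g. should_verify → False).
    pub context_variables: HashMap<String, String>,
    /// ID of the SCM query that matched.
    pub query_id: String,
}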

A6.2 Scout as Observation Producer

AST-aware ROI detection for patterns regex can't follow.

Task Status
Variable indirection tracking (assign → use across lines)
Context expansion: function scope, variable defs, comments
Deduplication with existing regex extractors
SCM queries for TLS, secrets, auth, crypto categories
Integration: run scout after regex, drop overlaps, combine

Key design: Scout runs alongside (not instead of) regex extractors. Regex handles 90% at zero cost; scout handles the indirection cases regex misses.

A6.3 Judge as Claim Verifier

LLM receives focused snippet + authored claim → structured verdict.

Task Status
Refactor LlmExtractor to accept CandidateSnippet + AuthoredClaim
Verification prompt: "Does this code satisfy this claim?"
Structured output: `{ verdict: PASS | FAIL … }`
Wire into aphoria verify Direction 2 (walk claims, verify in code)
Maps to Extractor::verify() concept (historical: vision-gaps-2026-02-08)

Token efficiency: Snippet (~100 tokens) vs whole file (~2000 tokens) = 95% cost reduction per verification.
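
The 95% figure follows directly from the two token estimates: 1 - 100/2000 = 0.95. A small sketch of that arithmetic is below. `cost_reduction` is a hypothetical helper, and it ignores any fixed prompt overhead such as the claim text itself.

```rust
/// Fractional token saving from sending a snippet instead of the whole file.
/// Returns `None` when the whole-file count is zero.
pub fn cost_reduction(snippet_tokens: u32, whole_file_tokens: u32) -> Option<f64> {
    if whole_file_tokens == 0 {
        return None;
    }
    Some(1.0 - f64::from(snippet_tokens) / f64::from(whole_file_tokens))
}
```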

A6.4 Scout for Claim Suggestion

Scout identifies ROIs without matching authored claims, feeds context to aphoria-suggest.

Task Status
Identify ROIs with no matching claim in .aphoria/claims.toml
Enrich context for skill: snippet + function name + surrounding comments
Feed to aphoria-suggest skill for claim drafting

A6.5 Evaluation

Task Status
Scout recall: "Did scout find the vulnerable line in fixture?"
Judge precision: "Given snippet + claim, did LLM classify correctly?"
Cost metric: tokens_per_verification vs monolithic approach
Parallel run: shadow mode alongside regex for tuning

Phase A6 Priority

Lower priority than A5 flywheel completion and Phase 14 governance. Build when:

  1. Regex extractors hit limits on specific indirection patterns
  2. aphoria verify Direction 2 needs LLM-backed verification
  3. aphoria-suggest needs richer context than regex observations provide

Enterprise Pilot Success Metrics

90-Day Pilot Targets

| Metric | Target | Measurement |
|---|---|---|
| Patterns captured | 100+ observations | Count in knowledge graph |
| Patterns promoted | 10+ conventions | Count with status=Active |
| Cross-team adoption | 2+ teams connected | Unique team_ids |
| New hire guidance events | 5+ accepted suggestions | Accept rate tracking |
| False positive rate | <10% | FP feedback / total flags |
| Evidence-backed patterns | >50% | Patterns with Research+ evidence |

180-Day Production Targets

| Metric | Target | Measurement |
|---|---|---|
| Knowledge retention | 0 lost patterns on departures | Audit log |
| Onboarding velocity | 50% faster ramp | Time to first PR |
| Convention adoption | 80% across org | Compliance rate |
| SOC 2 evidence | Audit pass | External validation |
| Deprecated pattern migration | 90% complete by sunset | Migration tracking |

Enterprise Simulation UAT

See: uat/enterprise-simulation-uat.md