| created | last_updated | status | feature | timeline |
|---|---|---|---|---|
| 2026-02-08 | 2026-02-08 | Planning Document | Phase 2-3 - LLM-Assisted Document Ingestion | 4 weeks estimated |
Ingest Best Practices Documentation - Executable Policy
Problem Statement
Current Reality:
- Teams write extensive architecture/security/style guides (50+ pages)
- Developers are expected to read and remember all guidelines
- Compliance is checked manually in code review
- Guidelines drift out of sync with code over time
- New team members miss context from old documents
What Users Want:
- Write documentation once (Markdown, PDF, Confluence)
- Have Aphoria automatically enforce the guidelines
- Get real-time feedback during development
- Maintain compliance without manual review
Vision: Documentation That Enforces Itself
Example: Hexagonal Architecture Guide
Traditional Flow:
1. Architect writes: "HTTP handlers MUST be in adapters/http/"
2. Developer reads guide (hopefully)
3. Developer writes code in wrong location
4. Code reviewer catches it (maybe)
5. Fix during review (wasted time)
With Aphoria Ingestion:
1. Architect writes: "HTTP handlers MUST be in adapters/http/"
2. Run: aphoria ingest-guide architecture.md
3. Developer writes code in wrong location
4. aphoria scan immediately shows:
❌ File location violation
Expected: adapters/http/*_handler.go
Found: adapters/handlers/user.go
Fix: Move to adapters/http/user_handler.go
5. Developer fixes before commit (no review cycles wasted)
User Experience
1. Ingest Phase
$ aphoria ingest-guide docs/architecture/hexagonal.md \
--authority-tier team_policy \
--category architecture \
--dry-run
Analyzing: docs/architecture/hexagonal.md (15 KB, 342 lines)
📊 Extraction Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Section Claims Severity
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Directory Structure 8 MUST
Dependency Rules 6 MUST_NOT
Naming Conventions 5 MUST
Interface Definitions 4 SHOULD
Testing Strategy 3 SHOULD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total 26 claims extracted
🔍 Preview of Extracted Claims:
1. [MUST] HTTP handlers in adapters/http/ directory
Subject: code://go/*/adapters/http/**
Predicate: directory_pattern
Value: *_handler.go
Source: hexagonal.md:45-47
2. [MUST_NOT] Core domain imports infrastructure
Subject: code://go/*/core/domain/**
Predicate: imports_forbidden
Value: infrastructure/*
Source: hexagonal.md:62-64
3. [MUST] Handler files end with _handler.go
Subject: code://go/*/adapters/http/*.go
Predicate: filename_pattern
Value: *_handler.go
Source: hexagonal.md:89-91
... (23 more)
Would add 26 claims to authoritative corpus.
Estimated scan coverage: ~65% of codebase
Proceed with ingestion? [y/N]
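The filename_pattern values in the preview (for example `*_handler.go`) imply some form of wildcard matching. The following is a minimal sketch that supports a single `*` only; `matches_filename` is a hypothetical helper and not part of Aphoria, and the real matcher may well support richer globs such as `**`.

```rust
/// Matches `name` against a pattern containing at most one `*` wildcard.
/// The `*` stands for any run of characters, including an empty one.
/// Hypothetical helper for illustration only.
pub fn matches_filename(pattern: &str, name: &str) -> bool {
    match pattern.split_once('*') {
        // No wildcard: the pattern must equal the name exactly.
        None => pattern == name,
        // One wildcard: the name must start with the prefix and end with the
        // suffix, and be long enough that the two do not overlap.
        Some((prefix, suffix)) => {
            name.len() >= prefix.len() + suffix.len()
                && name.starts_with(prefix)
                && name.ends_with(suffix)
        }
    }
}
```

Under this sketch, `user_handler.go` satisfies `*_handler.go` while `user.go` does not, which lines up with the naming violations shown in the compliance report.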
2. Compliance Checking
$ aphoria scan --check-policy hexagonal-arch
📋 Policy Compliance Report: Hexagonal Architecture
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Directory Structure (94% compliant)
✓ 45 files in correct locations
❌ 3 violations:
• adapters/handlers/user.go → should be adapters/http/user_handler.go
• adapters/db/user_repo.go → should be adapters/persistence/user_repo.go
• domain/user_service.go → should be core/domain/user_service.go
✅ Dependency Rules (100% compliant)
✓ No forbidden imports detected
✓ Core domain is clean of infrastructure dependencies
⚠️ Naming Conventions (80% compliant)
✓ 35 files follow naming conventions
❌ 9 violations:
• adapters/http/user.go → should be user_handler.go
• adapters/http/order.go → should be order_handler.go
... (7 more)
✅ Interface Definitions (90% compliant)
✓ 18 interfaces properly named
⚠️ 2 warnings:
• PostgresUserRepository → consider UserStore (behavior-based naming)
• MySQLOrderRepository → consider OrderStore (behavior-based naming)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Overall Compliance: 91% (237 of 260 checks passed)
📝 Recommendations:
1. Run: aphoria fix --policy hexagonal-arch --auto-safe
This will automatically fix 8 file location issues
2. Manually review 2 interface naming suggestions
3. Update hexagonal.md if any rules need revision
Last policy update: hexagonal.md (modified 3 days ago)
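The overall figure in the report is simply passed checks divided by total checks (237 / 260 ≈ 91%). A small sketch of that arithmetic follows; `compliance_percent` is a hypothetical name, and rounding to the nearest whole percent is an assumption about how the report would display it.

```rust
/// Percentage of passed checks, rounded to the nearest whole number.
/// Returns None when there are no checks (avoids dividing by zero) or when
/// the inputs are inconsistent (more passes than checks).
/// Hypothetical helper for illustration only.
pub fn compliance_percent(passed: u64, total: u64) -> Option<u64> {
    if total == 0 || passed > total {
        return None;
    }
    // Adding total / 2 before the integer division rounds to nearest.
    Some((passed * 100 + total / 2) / total)
}
```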
3. Real-Time Feedback
$ git add adapters/handlers/user.go
$ git commit
⚠️ Pre-commit hook: Aphoria policy check
❌ Policy violations detected (hexagonal-arch):
adapters/handlers/user.go:
❌ File location violation
Expected: adapters/http/*_handler.go
Found: adapters/handlers/user.go
Rule: HTTP handlers must be in adapters/http/
Source: hexagonal.md:45 (team policy)
Suggested fix:
git mv adapters/handlers/user.go adapters/http/user_handler.go
Commit blocked. Fix violations or use --no-verify to skip.
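The suggested fix in the hook output can be derived mechanically from the rule and the offending path. The sketch below assumes a rule made of a required directory and a required filename suffix; `LocationRule` and `suggest_fix` are hypothetical names, not Aphoria types.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical location rule: files must live directly in `dir`
/// and their names must end with `suffix`.
pub struct LocationRule<'a> {
    pub dir: &'a str,    // e.g. "adapters/http"
    pub suffix: &'a str, // e.g. "_handler.go"
}

/// Returns the suggested target path if `found` violates the rule,
/// or None if the file is already compliant.
pub fn suggest_fix(rule: &LocationRule, found: &str) -> Option<PathBuf> {
    let path = Path::new(found);
    let file_name = path.file_name()?.to_str()?;
    let in_dir = path.parent() == Some(Path::new(rule.dir));
    let has_suffix = file_name.ends_with(rule.suffix);
    if in_dir && has_suffix {
        return None;
    }
    // Derive the base name: strip the required suffix if it is already
    // present, otherwise strip the extension ("user.go" -> "user").
    let stem = if has_suffix {
        &file_name[..file_name.len() - rule.suffix.len()]
    } else {
        path.file_stem()?.to_str()?
    };
    Some(Path::new(rule.dir).join(format!("{}{}", stem, rule.suffix)))
}
```

With the rule `adapters/http` plus `_handler.go`, the path `adapters/handlers/user.go` maps to `adapters/http/user_handler.go`, which is the target of the `git mv` shown above.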
Data Model
Ingested Claim Structure
pub struct IngestedClaim {
/// Unique claim ID
pub id: String,
/// Subject pattern (supports wildcards)
pub subject_pattern: String,
/// Predicate
pub predicate: String,
/// Expected value
pub value: ClaimValue,
/// Comparison mode
pub comparison: ComparisonMode, // MUST, MUST_NOT, SHOULD, MAY
/// Category
pub category: String, // "architecture" | "security" | "style"
/// Explanation (from doc)
pub explanation: String,
/// Authority tier
pub authority_tier: AuthorityTier, // TeamPolicy (Tier 2.5)
/// Source document tracking
pub source: DocumentSource,
/// When ingested
pub ingested_at: u64,
}
pub struct DocumentSource {
/// Path to source document
pub file_path: String,
/// Line numbers in source
pub line_start: u32,
pub line_end: u32,
/// Section heading
pub section: String, // "Directory Structure Rules"
/// Document version (hash)
pub document_hash: String,
}
pub enum ComparisonMode {
Must, // Value MUST match
MustNot, // Value MUST NOT match
Should, // Warning if doesn't match
May, // Informational only
}
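One way to read the four comparison modes is as a mapping from "did the observed value match?" to a report outcome. The sketch below restates `ComparisonMode` so it compiles on its own; the `Verdict` enum and `evaluate` function are assumptions for illustration and are not defined anywhere in this plan.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComparisonMode {
    Must,
    MustNot,
    Should,
    May,
}

/// Hypothetical outcome type for a single check.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Violation, // blocks a commit or fails a scan
    Warning,   // reported but non-blocking
    Info,      // informational only
}

/// Maps a comparison mode and a match result to a verdict.
pub fn evaluate(mode: ComparisonMode, matches: bool) -> Verdict {
    match (mode, matches) {
        (ComparisonMode::Must, true) => Verdict::Pass,
        (ComparisonMode::Must, false) => Verdict::Violation,
        (ComparisonMode::MustNot, true) => Verdict::Violation,
        (ComparisonMode::MustNot, false) => Verdict::Pass,
        (ComparisonMode::Should, true) => Verdict::Pass,
        (ComparisonMode::Should, false) => Verdict::Warning,
        // May never fails; it only surfaces information.
        (ComparisonMode::May, _) => Verdict::Info,
    }
}
```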
Authority Tier Hierarchy
Tier 0: System (StemeDB internals, not user-facing)
Tier 1: Regulatory (RFCs, legal requirements)
Tier 2: Clinical (OWASP, NIST, industry standards)
Tier 2.5: TeamPolicy ← NEW (team-specific guidelines)
Tier 3: Expert (recognized authorities, vetted claims)
Tier 4: Community (project-specific observations)
TeamPolicy tier:
- Higher authority than community observations
- Lower authority than industry standards (OWASP)
- Can override community patterns
- Cannot override RFCs or security standards
- Scoped to project/team
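The override rules above follow from a simple ordering on tiers. The sketch below is one possible encoding, assuming a lower tier number means higher authority; the `rank` scaling (tier number times ten, so that Tier 2.5 stays an integer) and the `can_override` method are illustrative choices, not the existing `AuthorityTier` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthorityTier {
    System,
    Regulatory,
    Clinical,
    TeamPolicy,
    Expert,
    Community,
}

impl AuthorityTier {
    /// Tier number times ten; a lower rank means higher authority.
    pub fn rank(self) -> u8 {
        match self {
            Self::System => 0,
            Self::Regulatory => 10,
            Self::Clinical => 20,
            Self::TeamPolicy => 25,
            Self::Expert => 30,
            Self::Community => 40,
        }
    }

    /// A tier can override another only if it is strictly more authoritative.
    pub fn can_override(self, other: AuthorityTier) -> bool {
        self.rank() < other.rank()
    }
}
```

Note that under a strict ordering TeamPolicy would also outrank Expert (Tier 3); the list above only mentions community patterns, so whether that is intended is worth confirming.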
Implementation
Phase 1: Manual Extraction (MVP - 2 days)
User manually creates claims TOML from guidelines, then imports:
# User writes team-guidelines.toml manually from reading architecture.md
aphoria claims import team-guidelines.toml --authority-tier team_policy
Pros: Works immediately, no LLM needed
Cons: Manual work, doesn't scale
Phase 2: LLM-Assisted Extraction (Week 1 - 3 days)
$ aphoria ingest-guide docs/architecture/hexagonal.md --edit
Processing: hexagonal.md
Using LLM to extract claims...
Found 26 potential claims. Review and edit before importing.
Opening editor...
# Generated claims (edit before importing)
[[claim]]
id = "hex-arch-http-handlers-001"
subject = "code://go/*/adapters/http/**"
predicate = "directory_pattern"
value = "*_handler.go"
comparison = "must"
category = "architecture"
explanation = "HTTP handlers must be in adapters/http/ directory"
source = "hexagonal.md:45-47"
# Edit to refine, then save and close to import
LLM Prompt:
Extract architectural claims from this document.
For each MUST/SHOULD/MUST NOT statement:
1. Identify the subject (what code element)
2. Identify the predicate (what property)
3. Identify the value (expected value)
4. Determine comparison mode (must/should/must_not)
5. Extract explanation
Format as TOML claims.
Example input:
"HTTP handlers MUST be in adapters/http/ directory and end with _handler.go"
Example output:
[[claim]]
subject = "code://go/*/adapters/http/**"
predicate = "directory_pattern"
value = "*_handler.go"
comparison = "must"
explanation = "HTTP handlers must be in adapters/http/ directory"
Implementation:
- File: applications/aphoria/src/llm/document_ingestion.rs
- Uses existing LLM infrastructure
- Outputs TOML for review before import
- User can edit/refine before committing
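The generated TOML carries the source location as a compact string such as "hexagonal.md:45-47", while the data model stores a path plus start and end lines. A minimal sketch of converting one into the other follows; `SourceRef` and `parse_source_ref` are hypothetical names, and accepting a single line ("hexagonal.md:45") as a one-line range is an assumption.

```rust
/// Hypothetical parsed form of a "file:start-end" source reference.
#[derive(Debug, PartialEq, Eq)]
pub struct SourceRef {
    pub file_path: String,
    pub line_start: u32,
    pub line_end: u32,
}

/// Parses "file:start-end" or "file:line". Returns None on malformed input,
/// on line number 0, or when the range is reversed.
pub fn parse_source_ref(s: &str) -> Option<SourceRef> {
    // Split on the last ':' so paths containing ':' still work.
    let (file, lines) = s.rsplit_once(':')?;
    let (start, end): (u32, u32) = match lines.split_once('-') {
        Some((a, b)) => (a.trim().parse::<u32>().ok()?, b.trim().parse::<u32>().ok()?),
        None => {
            let n = lines.trim().parse::<u32>().ok()?;
            (n, n)
        }
    };
    if file.is_empty() || start == 0 || end < start {
        return None;
    }
    Some(SourceRef {
        file_path: file.to_string(),
        line_start: start,
        line_end: end,
    })
}
```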
Phase 3: Automated Extraction with Validation (Week 2 - 4 days)
Fully automated pipeline with confidence scoring:
pub struct DocumentIngester {
llm: LlmClient,
validator: ClaimValidator,
}
impl DocumentIngester {
/// Ingest a document and extract claims.
pub async fn ingest(
&self,
doc_path: &Path,
options: IngestionOptions,
) -> Result<Vec<IngestedClaim>, IngestionError> {
// 1. Parse document (markdown/PDF/text)
let sections = self.parse_document(doc_path)?;
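// 2. Extract claims from each section using the LLM
// (assumes IngestionError implements From<LlmError>)
let mut raw_claims = Vec::new();
for section in &sections {
raw_claims.extend(self.extract_claims_from_section(section).await?);
}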
// 3. Validate and score confidence
let validated = self.validate_claims(raw_claims)?;
// 4. Filter by confidence threshold
let high_confidence: Vec<_> = validated
.into_iter()
.filter(|c| c.confidence >= options.min_confidence)
.collect();
// 5. Preview or auto-import
if options.dry_run {
self.preview_claims(&high_confidence)?;
Ok(vec![])
} else {
self.import_claims(high_confidence).await
}
}
/// Extract claims from a section using LLM.
async fn extract_claims_from_section(
&self,
section: &DocumentSection,
) -> Result<Vec<ExtractedClaim>, LlmError> {
let prompt = format!(
r#"Extract architectural claims from this section.
Section: {}
Content:
{}
For each claim:
1. Identify subject pattern (supports wildcards)
2. Identify predicate
3. Identify expected value
4. Determine severity (MUST/MUST_NOT/SHOULD/MAY)
5. Extract explanation
Return as JSON array."#,
section.heading,
section.content,
);
self.llm.extract_structured(prompt).await
}
/// Validate extracted claims for quality.
fn validate_claims(
&self,
claims: Vec<ExtractedClaim>,
) -> Result<Vec<ValidatedClaim>, ValidationError> {
let validated: Vec<ValidatedClaim> = claims
.into_iter()
.map(|claim| {
// Check if subject pattern is valid
let subject_valid = self.validator.validate_subject(&claim.subject);
// Check if predicate is recognized
let predicate_valid = self.validator.validate_predicate(&claim.predicate);
// Compute confidence score
let confidence = self.compute_confidence(&claim, subject_valid, predicate_valid);
ValidatedClaim {
claim,
confidence,
validation_issues: vec![],
}
})
.collect();
// The closure yields plain ValidatedClaim values, so wrap the Vec in Ok
Ok(validated)
}
}
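The pipeline calls `compute_confidence` without defining it. The following sketch shows one way such a score could be assembled from the validation results; the weights, the boolean inputs, and the `has_explicit_keyword` signal (whether the source sentence contains an explicit MUST/SHOULD keyword) are all assumptions for illustration and would need tuning against real documents.

```rust
/// Hypothetical confidence score in the range 0.0..=1.0.
/// Points are accumulated as integers out of 100 to keep the arithmetic exact.
pub fn compute_confidence(
    subject_valid: bool,
    predicate_valid: bool,
    has_explicit_keyword: bool,
) -> f64 {
    let mut points: u32 = 40; // base score for any extracted claim
    if subject_valid {
        points += 25; // subject pattern parsed cleanly
    }
    if predicate_valid {
        points += 25; // predicate is one the scanner recognizes
    }
    if has_explicit_keyword {
        points += 10; // the source sentence used MUST/SHOULD/etc. explicitly
    }
    f64::from(points) / 100.0
}

/// Applies the threshold the same way the filter step in `ingest` does.
pub fn passes_threshold(confidence: f64, min_confidence: f64) -> bool {
    confidence >= min_confidence
}
```

With the default threshold of 0.7, a claim whose subject and predicate both validate passes (0.9), while one with only a valid subject does not (0.65).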
CLI Design
Commands
# Ingest a document
aphoria ingest-guide <path> [options]
Options:
--authority-tier <tier> Authority tier (default: team_policy)
--category <category> Category (architecture|security|style)
--min-confidence <float> Min confidence to include (0.0-1.0, default: 0.7)
--dry-run Preview without importing
--edit Open editor to review/refine before importing
--project <name> Project scope (default: current project)
# List ingested guidelines
aphoria list-guides
# Check compliance against a guideline
aphoria check-compliance <guide-name>
# Update from changed document
aphoria update-guide <guide-name>
# Remove a guideline
aphoria remove-guide <guide-name>
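The `--min-confidence` option above is documented as a float in 0.0-1.0 with a default of 0.7. A minimal sketch of validating that value follows; `parse_min_confidence` is a hypothetical helper, and the actual CLI may rely on its argument parser for this instead.

```rust
/// Parses the raw `--min-confidence` value, falling back to the documented
/// default of 0.7 when the flag is absent. Hypothetical helper.
pub fn parse_min_confidence(arg: Option<&str>) -> Result<f64, String> {
    match arg {
        None => Ok(0.7),
        Some(raw) => {
            let value: f64 = raw
                .parse()
                .map_err(|_| format!("not a number: {}", raw))?;
            // The range check also rejects NaN, which is never inside a range.
            if (0.0..=1.0).contains(&value) {
                Ok(value)
            } else {
                Err(format!("must be between 0.0 and 1.0, got {}", raw))
            }
        }
    }
}
```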
Examples
# Ingest with preview
aphoria ingest-guide docs/architecture/hexagonal.md --dry-run
# Ingest with manual review
aphoria ingest-guide docs/security/owasp-top-10.md --edit
# Check compliance
aphoria check-compliance hexagonal-arch
# Update when doc changes
aphoria update-guide hexagonal-arch --from docs/architecture/hexagonal.md
# List all active guidelines
aphoria list-guides
Integration with Existing Features
1. Conflict Detection
Ingested claims are stored as authoritative assertions, so the existing conflict engine detects violations without any new machinery.
2. Scan Reports
Compliance shown in standard scan reports:
Conflicts: 3
❌ hexagonal.md:45 - File in wrong directory
❌ hexagonal.md:62 - Forbidden import detected
❌ hexagonal.md:89 - Invalid filename pattern
3. Authority Lens
TeamPolicy tier (2.5) ranks between Clinical (2) and Expert (3):
- Overrides community observations
- Can be overridden by team-authored claims (explicit)
- Respects RFCs and security standards
4. Pre-commit Hooks
Compliance checking in pre-commit:
#!/bin/bash
# .git/hooks/pre-commit
aphoria scan --check-policy hexagonal-arch --exit-code
Storage
Claims Storage
Ingested claims are stored as regular AuthoredClaim instances:
- File: .aphoria/claims.toml
- Tagged with ingested_from: "hexagonal.md"
- Authority tier: team_policy
Document Metadata
Track source documents:
# .aphoria/ingested_guides.toml
[[guide]]
id = "hexagonal-arch"
name = "Hexagonal Architecture Guidelines"
source_path = "docs/architecture/hexagonal.md"
document_hash = "blake3:abc123..."
ingested_at = 1234567890
claims_count = 26
authority_tier = "team_policy"
category = "architecture"
[[guide]]
id = "security-owasp"
name = "OWASP Top 10 Compliance"
source_path = "docs/security/owasp.md"
document_hash = "blake3:def456..."
ingested_at = 1234567900
claims_count = 15
authority_tier = "team_policy"
category = "security"
Update Detection
$ aphoria scan
⚠️ Warning: Source document has changed
Guide: hexagonal-arch
Source: docs/architecture/hexagonal.md
Last ingested: 3 days ago (hash: abc123...)
Current hash: xyz789...
Run: aphoria update-guide hexagonal-arch
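Update detection reduces to comparing the stored document hash with a hash of the file's current content. The sketch below takes the hash function as a parameter so it needs no external crate; the `blake3:` prefix in the metadata suggests BLAKE3 would be plugged in, but that wiring, along with the `GuideRecord` and `is_stale` names, is an assumption.

```rust
/// Hypothetical subset of an entry in .aphoria/ingested_guides.toml.
pub struct GuideRecord {
    pub id: String,
    pub document_hash: String,
}

/// Returns true when the current document content no longer matches the
/// hash recorded at ingestion time. `hash_fn` must produce strings in the
/// same format as the stored hash.
pub fn is_stale<F>(guide: &GuideRecord, current_content: &[u8], hash_fn: F) -> bool
where
    F: Fn(&[u8]) -> String,
{
    hash_fn(current_content) != guide.document_hash
}
```

Injecting the hasher also keeps the check easy to test with a stand-in function.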
Success Metrics
Phase 1 (Manual Import)
- Users can manually create claims from guidelines
- Claims are enforced during scans
- Pre-commit hooks work
Phase 2 (LLM-Assisted)
- LLM extracts 80%+ of claims correctly
- Users can review/edit before importing
- Saves >90% of manual effort
Phase 3 (Automated)
- Confidence scoring filters noise
- Automatic updates when docs change
- Compliance reports show trends
Open Questions
- How to handle ambiguous statements?
  - "Handlers should generally be in adapters/http/"
  - Extract as SHOULD with low confidence?
- How to handle conflicting guidelines?
  - Doc A says X, Doc B says Y
  - Use authority tier + recency?
- Should we support non-Markdown formats?
  - PDF extraction (common for external standards)
  - Confluence/Google Docs (via export)
  - Word documents
- How to version guidelines?
  - When doc changes, update claims or create new versions?
  - Show history of guideline changes?
- Should compliance be project-scoped or team-scoped?
  - Team-level guidelines apply to all team projects?
  - Project-specific guidelines override team guidelines?
Future Enhancements
1. Guided Onboarding
$ aphoria init --with-guides
No guidelines found. Would you like to:
1. Import existing documentation
2. Start from template (hexagonal/clean/ddd)
3. Skip for now
Choice: 1
Enter path to architecture guide: docs/architecture.md
Enter path to security guide: docs/security.md
Enter path to style guide: (skip)
Extracting claims...
Found 42 claims across 2 documents.
Review before importing? [Y/n]
2. Compliance Dashboard
Visual compliance tracking over time:
- Trend graphs (improving/declining)
- Per-guideline compliance scores
- Team comparison (if multiple teams)
3. AI-Generated Fix Suggestions
❌ File location violation
Expected: adapters/http/*_handler.go
Found: adapters/handlers/user.go
Suggested fix:
git mv adapters/handlers/user.go adapters/http/user_handler.go
Apply fix? [y/N]
4. Guideline Templates
Pre-built guidelines for common architectures:
- Hexagonal Architecture
- Clean Architecture
- Domain-Driven Design
- Microservices patterns
- Security baselines (OWASP, NIST)
$ aphoria init-guide --template hexagonal
Imported 35 hexagonal architecture claims.
Customize at: .aphoria/claims.toml
Timeline
- Week 1: Manual import (MVP) + LLM extraction prototype
- Week 2: Automated pipeline + confidence scoring
- Week 3: CLI polish + documentation + examples
- Week 4: Dashboard integration + compliance reports
Total: 4 weeks for complete feature