jml e95c978481 feat(aphoria): add inline claim markers and claim enrichment infrastructure

This commit implements Phase 17 of the Aphoria roadmap, adding:

**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)

**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)

**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns

**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion

**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries

**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion

Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-08 20:18:20 +00:00


created: 2026-02-08
last_updated: 2026-02-08
status: Planning Document
feature: Phase 2-3 - LLM-Assisted Document Ingestion
timeline: 4 weeks estimated

Ingest Best Practices Documentation - Executable Policy

Problem Statement

Current Reality:

Teams write extensive architecture/security/style guides (50+ pages)
Developers are expected to read and remember all guidelines
Compliance is checked manually in code review
Guidelines drift out of sync with code over time
New team members miss context from old documents

What Users Want:

Write documentation once (markdown, PDF, confluence)
Have Aphoria automatically enforce the guidelines
Get real-time feedback during development
Maintain compliance without manual review

Vision: Documentation That Enforces Itself

Example: Hexagonal Architecture Guide

Traditional Flow:

1. Architect writes: "HTTP handlers MUST be in adapters/http/"
2. Developer reads guide (hopefully)
3. Developer writes code in wrong location
4. Code reviewer catches it (maybe)
5. Fix during review (wasted time)

With Aphoria Ingestion:

1. Architect writes: "HTTP handlers MUST be in adapters/http/"
2. Run: aphoria ingest-guide architecture.md
3. Developer writes code in wrong location
4. aphoria scan immediately shows:
   ❌ File location violation
   Expected: adapters/http/*_handler.go
   Found: adapters/handlers/user.go
   Fix: Move to adapters/http/user_handler.go
5. Developer fixes before commit (no review cycles wasted)

User Experience

1. Ingest Phase

$ aphoria ingest-guide docs/architecture/hexagonal.md \
  --authority-tier team_policy \
  --category architecture \
  --dry-run

Analyzing: docs/architecture/hexagonal.md (15 KB, 342 lines)

📊 Extraction Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Section                   Claims    Severity
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Directory Structure       8         MUST
Dependency Rules          6         MUST_NOT
Naming Conventions        5         MUST
Interface Definitions     4         SHOULD
Testing Strategy          3         SHOULD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total                     26 claims extracted

🔍 Preview of Extracted Claims:

1. [MUST] HTTP handlers in adapters/http/ directory
   Subject: code://go/*/adapters/http/**
   Predicate: directory_pattern
   Value: *_handler.go
   Source: hexagonal.md:45-47

2. [MUST_NOT] Core domain imports infrastructure
   Subject: code://go/*/core/domain/**
   Predicate: imports_forbidden
   Value: infrastructure/*
   Source: hexagonal.md:62-64

3. [MUST] Handler files end with _handler.go
   Subject: code://go/*/adapters/http/*.go
   Predicate: filename_pattern
   Value: *_handler.go
   Source: hexagonal.md:89-91

... (23 more)

Would add 26 claims to authoritative corpus.
Estimated scan coverage: ~65% of codebase

Proceed with ingestion? [y/N]
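
The subject patterns in the preview rely on wildcards, but the plan does not define their semantics. A minimal sketch, assuming `*` matches any run of characters within one path segment and a segment that is exactly `**` matches zero or more whole segments (the function names here are hypothetical, not part of Aphoria):

```rust
/// Returns true if `path` matches `pattern`, comparing '/'-separated segments.
/// Assumed semantics (not specified in the plan): `*` matches within one
/// segment, and a segment equal to `**` matches zero or more segments.
pub fn subject_matches(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}

fn match_segments(pat: &[&str], segs: &[&str]) -> bool {
    let (first, rest) = match pat.split_first() {
        Some(parts) => parts,
        None => return segs.is_empty(),
    };
    if *first == "**" {
        // Let `**` consume 0..=n of the remaining segments.
        return (0..=segs.len()).any(|skip| match_segments(rest, &segs[skip..]));
    }
    match segs.split_first() {
        Some((seg, seg_rest)) => match_segment(first, seg) && match_segments(rest, seg_rest),
        None => false,
    }
}

/// Matches one segment against a pattern that may contain `*`.
pub fn match_segment(pat: &str, seg: &str) -> bool {
    let parts: Vec<&str> = pat.split('*').collect();
    if parts.len() == 1 {
        return pat == seg; // no wildcard: exact match
    }
    let (prefix, suffix) = (parts[0], parts[parts.len() - 1]);
    if seg.len() < prefix.len() + suffix.len()
        || !seg.starts_with(prefix)
        || !seg.ends_with(suffix)
    {
        return false;
    }
    // Literals between wildcards must appear in order in the middle part.
    let mut remaining = &seg[prefix.len()..seg.len() - suffix.len()];
    for mid in &parts[1..parts.len() - 1] {
        match remaining.find(*mid) {
            Some(idx) => remaining = &remaining[idx + mid.len()..],
            None => return false,
        }
    }
    true
}
```

Under these assumptions, `code://go/*/adapters/http/**` matches `code://go/myapp/adapters/http/user_handler.go`, and the value pattern `*_handler.go` rejects `user.go`.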

2. Compliance Checking

$ aphoria scan --check-policy hexagonal-arch

📋 Policy Compliance Report: Hexagonal Architecture
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Directory Structure (95% compliant)
   ✓ 45 files in correct locations
   ❌ 3 violations:
      • adapters/handlers/user.go → should be adapters/http/user_handler.go
      • adapters/db/user_repo.go → should be adapters/persistence/user_repo.go
      • domain/user_service.go → should be core/domain/user_service.go

✅ Dependency Rules (100% compliant)
   ✓ No forbidden imports detected
   ✓ Core domain is clean of infrastructure dependencies

⚠️  Naming Conventions (80% compliant)
   ✓ 35 files follow naming conventions
   ❌ 9 violations:
      • adapters/http/user.go → should be user_handler.go
      • adapters/http/order.go → should be order_handler.go
      ... (7 more)

✅ Interface Definitions (90% compliant)
   ✓ 18 interfaces properly named
   ⚠️  2 warnings:
      • PostgresUserRepository → consider UserStore (behavior-based naming)
      • MySQLOrderRepository → consider OrderStore (behavior-based naming)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Overall Compliance: 91% (237 of 260 checks passed)

📝 Recommendations:
  1. Run: aphoria fix --policy hexagonal-arch --auto-safe
     This will automatically fix 8 file location issues

  2. Manually review 2 interface naming suggestions

  3. Update hexagonal.md if any rules need revision

Last policy update: hexagonal.md (modified 3 days ago)
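
The "Overall Compliance" line in the report is a ratio of passed checks to total checks. A small sketch of that arithmetic, assuming rounding to the nearest whole percent and treating an empty tally as fully compliant (the `ComplianceTally` type is hypothetical):

```rust
/// Hypothetical tally of passed versus total checks for one policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ComplianceTally {
    pub passed: u32,
    pub total: u32,
}

impl ComplianceTally {
    /// Whole-number percentage, rounded to nearest.
    /// Assumption: a policy with no checks counts as 100% compliant.
    pub fn percent(&self) -> u32 {
        if self.total == 0 {
            return 100;
        }
        let passed = self.passed as u64;
        let total = self.total as u64;
        // Adding total/2 before dividing rounds to the nearest integer.
        ((passed * 100 + total / 2) / total) as u32
    }

    /// Combines two section tallies into one overall tally.
    pub fn merge(self, other: ComplianceTally) -> ComplianceTally {
        ComplianceTally {
            passed: self.passed + other.passed,
            total: self.total + other.total,
        }
    }
}
```

With the figures from the report, 237 passed out of 260 gives 91.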

3. Real-Time Feedback

$ git add adapters/handlers/user.go
$ git commit

⚠️  Pre-commit hook: Aphoria policy check

❌ Policy violations detected (hexagonal-arch):

  adapters/handlers/user.go:
    ❌ File location violation
       Expected: adapters/http/*_handler.go
       Found: adapters/handlers/user.go
       Rule: HTTP handlers must be in adapters/http/
       Source: hexagonal.md:45 (team policy)

       Suggested fix:
         git mv adapters/handlers/user.go adapters/http/user_handler.go

Commit blocked. Fix violations or use --no-verify to skip.

Data Model

Ingested Claim Structure

pub struct IngestedClaim {
    /// Unique claim ID
    pub id: String,

    /// Subject pattern (supports wildcards)
    pub subject_pattern: String,

    /// Predicate
    pub predicate: String,

    /// Expected value
    pub value: ClaimValue,

    /// Comparison mode
    pub comparison: ComparisonMode,  // MUST, MUST_NOT, SHOULD, MAY

    /// Category
    pub category: String,  // "architecture" | "security" | "style"

    /// Explanation (from doc)
    pub explanation: String,

    /// Authority tier
    pub authority_tier: AuthorityTier,  // TeamPolicy (Tier 2.5)

    /// Source document tracking
    pub source: DocumentSource,

    /// When ingested
    pub ingested_at: u64,
}

pub struct DocumentSource {
    /// Path to source document
    pub file_path: String,

    /// Line numbers in source
    pub line_start: u32,
    pub line_end: u32,

    /// Section heading
    pub section: String,  // "Directory Structure Rules"

    /// Document version (hash)
    pub document_hash: String,
}

pub enum ComparisonMode {
    Must,       // Value MUST match
    MustNot,    // Value MUST NOT match
    Should,     // Warning if doesn't match
    May,        // Informational only
}
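
The comments on `ComparisonMode` imply how a match result turns into a scan finding. A minimal sketch of that mapping, assuming a hypothetical `Outcome` type and the lowercase mode strings used in the generated TOML (`must`, `must_not`, `should`, and by extension `may`):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComparisonMode {
    Must,
    MustNot,
    Should,
    May,
}

/// Hypothetical scan outcome; the plan does not name this type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Pass,
    Violation,
    Warning,
    Info,
}

/// Parses the lowercase mode strings seen in the claims TOML.
pub fn parse_mode(s: &str) -> Option<ComparisonMode> {
    match s {
        "must" => Some(ComparisonMode::Must),
        "must_not" => Some(ComparisonMode::MustNot),
        "should" => Some(ComparisonMode::Should),
        "may" => Some(ComparisonMode::May),
        _ => None,
    }
}

/// Maps (mode, did the observed value match?) to an outcome.
pub fn evaluate(mode: ComparisonMode, matches: bool) -> Outcome {
    match (mode, matches) {
        (ComparisonMode::Must, true) => Outcome::Pass,
        (ComparisonMode::Must, false) => Outcome::Violation,
        (ComparisonMode::MustNot, true) => Outcome::Violation,
        (ComparisonMode::MustNot, false) => Outcome::Pass,
        (ComparisonMode::Should, true) => Outcome::Pass,
        (ComparisonMode::Should, false) => Outcome::Warning,
        // MAY is informational only, whatever the match result.
        (ComparisonMode::May, _) => Outcome::Info,
    }
}
```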

Authority Tier Hierarchy

Tier 0: System      (StemeDB internals, not user-facing)
Tier 1: Regulatory  (RFCs, legal requirements)
Tier 2: Clinical    (OWASP, NIST, industry standards)
Tier 2.5: TeamPolicy ← NEW (team-specific guidelines)
Tier 3: Expert      (recognized authorities, vetted claims)
Tier 4: Community   (project-specific observations)

TeamPolicy tier:

Higher authority than community observations
Lower authority than industry standards (OWASP)
Can override community patterns
Cannot override RFCs or security standards
Scoped to project/team
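
One way to represent the fractional Tier 2.5 is to store each tier number multiplied by ten, so ordering stays an integer comparison. This is a sketch under that assumption; the `rank` and `can_override` names are hypothetical, and it models only the strict tier ordering, not project scoping or explicit team-authored overrides:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthorityTier {
    System,
    Regulatory,
    Clinical,
    TeamPolicy,
    Expert,
    Community,
}

impl AuthorityTier {
    /// Tier number times ten, so that Tier 2.5 fits in an integer.
    /// A lower rank means higher authority.
    pub fn rank(self) -> u8 {
        match self {
            AuthorityTier::System => 0,
            AuthorityTier::Regulatory => 10,
            AuthorityTier::Clinical => 20,
            AuthorityTier::TeamPolicy => 25,
            AuthorityTier::Expert => 30,
            AuthorityTier::Community => 40,
        }
    }

    /// True if a claim at `self` outranks a claim at `other`.
    pub fn can_override(self, other: AuthorityTier) -> bool {
        self.rank() < other.rank()
    }
}
```

With this ordering, `TeamPolicy` overrides `Community` but not `Clinical` or `Regulatory`, matching the rules listed above.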

Implementation

Phase 1: Manual Extraction (MVP - 2 days)

User manually creates claims TOML from guidelines, then imports:

# User writes team-guidelines.toml manually from reading architecture.md
aphoria claims import team-guidelines.toml --authority-tier team_policy

Pros: Works immediately, no LLM needed
Cons: Manual work, doesn't scale

Phase 2: LLM-Assisted Extraction (Week 1 - 3 days)

$ aphoria ingest-guide docs/architecture/hexagonal.md --edit

Processing: hexagonal.md
Using LLM to extract claims...

Found 26 potential claims. Review and edit before importing.
Opening editor...

# Generated claims (edit before importing)
[[claim]]
id = "hex-arch-http-handlers-001"
subject = "code://go/*/adapters/http/**"
predicate = "directory_pattern"
value = "*_handler.go"
comparison = "must"
category = "architecture"
explanation = "HTTP handlers must be in adapters/http/ directory"
source = "hexagonal.md:45-47"

# Edit to refine, then save and close to import

LLM Prompt:

Extract architectural claims from this document.

For each MUST/SHOULD/MUST NOT statement:
1. Identify the subject (what code element)
2. Identify the predicate (what property)
3. Identify the value (expected value)
4. Determine comparison mode (must/should/must_not)
5. Extract explanation

Format as TOML claims.

Example input:
"HTTP handlers MUST be in adapters/http/ directory and end with _handler.go"

Example output:
[[claim]]
subject = "code://go/*/adapters/http/**"
predicate = "directory_pattern"
value = "*_handler.go"
comparison = "must"
explanation = "HTTP handlers must be in adapters/http/ directory"

Implementation:

File: applications/aphoria/src/llm/document_ingestion.rs
Uses existing LLM infrastructure
Outputs TOML for review before import
User can edit/refine before committing
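
The generated TOML carries the source as a compact string such as `hexagonal.md:45-47`, while `DocumentSource` stores the path and line range separately. A sketch of the conversion, assuming the `file:start-end` and `file:line` forms that appear in this plan (the `SourceRef` type and `parse_source_ref` function are hypothetical):

```rust
/// Hypothetical parsed form of a "file:start-end" source reference.
#[derive(Debug, PartialEq, Eq)]
pub struct SourceRef {
    pub file_path: String,
    pub line_start: u32,
    pub line_end: u32,
}

/// Parses "hexagonal.md:45-47" or "hexagonal.md:45".
/// Returns None for a missing line part, non-numeric lines,
/// line 0, or a range whose end precedes its start.
pub fn parse_source_ref(s: &str) -> Option<SourceRef> {
    // Split on the last ':' so paths that contain ':' still parse.
    let (file, lines) = s.rsplit_once(':')?;
    if file.is_empty() {
        return None;
    }
    let (start, end): (u32, u32) = match lines.split_once('-') {
        Some((a, b)) => (a.parse().ok()?, b.parse().ok()?),
        None => {
            let n: u32 = lines.parse().ok()?;
            (n, n)
        }
    };
    if start == 0 || end < start {
        return None;
    }
    Some(SourceRef {
        file_path: file.to_string(),
        line_start: start,
        line_end: end,
    })
}
```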

Phase 3: Automated Extraction with Validation (Week 2 - 4 days)

Fully automated pipeline with confidence scoring:

use std::path::Path;

pub struct DocumentIngester {
    llm: LlmClient,
    validator: ClaimValidator,
}

impl DocumentIngester {
    /// Ingest a document and extract claims.
    pub async fn ingest(
        &self,
        doc_path: &Path,
        options: IngestionOptions,
    ) -> Result<Vec<IngestedClaim>, IngestionError> {
        // 1. Parse document (markdown/PDF/text)
        let sections = self.parse_document(doc_path)?;

        // 2. Extract claims from each section using the LLM
        let mut raw_claims = Vec::new();
        for section in &sections {
            raw_claims.extend(self.extract_claims_from_section(section).await?);
        }

        // 3. Validate and score confidence
        let validated = self.validate_claims(raw_claims)?;

        // 4. Filter by confidence threshold
        let high_confidence: Vec<_> = validated
            .into_iter()
            .filter(|c| c.confidence >= options.min_confidence)
            .collect();

        // 5. Preview or auto-import
        if options.dry_run {
            self.preview_claims(&high_confidence)?;
            Ok(vec![])
        } else {
            self.import_claims(high_confidence).await
        }
    }

    /// Extract claims from a section using LLM.
    async fn extract_claims_from_section(
        &self,
        section: &DocumentSection,
    ) -> Result<Vec<ExtractedClaim>, LlmError> {
        let prompt = format!(
            r#"Extract architectural claims from this section.

Section: {}

Content:
{}

For each claim:
1. Identify subject pattern (supports wildcards)
2. Identify predicate
3. Identify expected value
4. Determine severity (MUST/MUST_NOT/SHOULD/MAY)
5. Extract explanation

Return as JSON array."#,
            section.heading,
            section.content,
        );

        self.llm.extract_structured(prompt).await
    }

    /// Validate extracted claims for quality.
    fn validate_claims(
        &self,
        claims: Vec<ExtractedClaim>,
    ) -> Result<Vec<ValidatedClaim>, ValidationError> {
        // The closure is infallible, so collect into a Vec and wrap it in Ok.
        let validated: Vec<ValidatedClaim> = claims
            .into_iter()
            .map(|claim| {
                // Check if subject pattern is valid
                let subject_valid = self.validator.validate_subject(&claim.subject);

                // Check if predicate is recognized
                let predicate_valid = self.validator.validate_predicate(&claim.predicate);

                // Compute confidence score
                let confidence = self.compute_confidence(&claim, subject_valid, predicate_valid);

                ValidatedClaim {
                    claim,
                    confidence,
                    validation_issues: vec![],
                }
            })
            .collect();

        Ok(validated)
    }
}
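
The pipeline calls `compute_confidence` but the plan does not define it. One possible shape, purely as an illustration: start from full confidence and subtract penalties for failed validation and for hedged wording. The penalty values and the hedge-word list are assumptions, chosen so that a hedged rule falls below the default `--min-confidence` of 0.7:

```rust
/// Hypothetical confidence heuristic; weights and hedge words are assumptions.
/// Takes the claim's explanation text plus the two validation results.
pub fn compute_confidence(explanation: &str, subject_valid: bool, predicate_valid: bool) -> f64 {
    let mut score: f64 = 1.0;
    if !subject_valid {
        score -= 0.4;
    }
    if !predicate_valid {
        score -= 0.4;
    }
    // Hedged wording such as "should generally" suggests a soft rule.
    const HEDGES: [&str; 4] = ["generally", "usually", "typically", "where possible"];
    let lower = explanation.to_lowercase();
    if HEDGES.iter().any(|h| lower.contains(*h)) {
        score -= 0.35;
    }
    score.clamp(0.0, 1.0)
}
```

A real implementation would likely also weigh signals from the LLM itself; this sketch only shows how validation results and wording could feed a single score.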

CLI Design

Commands

# Ingest a document
aphoria ingest-guide <path> [options]

Options:
  --authority-tier <tier>    Authority tier (default: team_policy)
  --category <category>      Category (architecture|security|style)
  --min-confidence <float>   Min confidence to include (0.0-1.0, default: 0.7)
  --dry-run                  Preview without importing
  --edit                     Open editor to review/refine before importing
  --project <name>           Project scope (default: current project)

# List ingested guidelines
aphoria list-guides

# Check compliance against a guideline
aphoria check-compliance <guide-name>

# Update from changed document
aphoria update-guide <guide-name>

# Remove a guideline
aphoria remove-guide <guide-name>

Examples

# Ingest with preview
aphoria ingest-guide docs/architecture/hexagonal.md --dry-run

# Ingest with manual review
aphoria ingest-guide docs/security/owasp-top-10.md --edit

# Check compliance
aphoria check-compliance hexagonal-arch

# Update when doc changes
aphoria update-guide hexagonal-arch --from docs/architecture/hexagonal.md

# List all active guidelines
aphoria list-guides

Integration with Existing Features

1. Conflict Detection

Ingested claims are stored as authoritative assertions, so the existing conflict engine detects violations against them.

2. Scan Reports

Compliance shown in standard scan reports:

Conflicts: 3
  ❌ hexagonal.md:45 - File in wrong directory
  ❌ hexagonal.md:62 - Forbidden import detected
  ❌ hexagonal.md:89 - Invalid filename pattern

3. Authority Lens

TeamPolicy tier (2.5) ranks between Clinical (2) and Expert (3):

Overrides community observations
Can be overridden by team-authored claims (explicit)
Respects RFCs and security standards

4. Pre-commit Hooks

Compliance checking in pre-commit:

#!/bin/bash
# .git/hooks/pre-commit

aphoria scan --check-policy hexagonal-arch --exit-code

Storage

Claims Storage

Ingested claims are stored as regular AuthoredClaim instances:

File: .aphoria/claims.toml
Tagged with ingested_from: "hexagonal.md"
Authority tier: team_policy

Document Metadata

Track source documents:

# .aphoria/ingested_guides.toml

[[guide]]
id = "hexagonal-arch"
name = "Hexagonal Architecture Guidelines"
source_path = "docs/architecture/hexagonal.md"
document_hash = "blake3:abc123..."
ingested_at = 1234567890
claims_count = 26
authority_tier = "team_policy"
category = "architecture"

[[guide]]
id = "security-owasp"
name = "OWASP Top 10 Compliance"
source_path = "docs/security/owasp.md"
document_hash = "blake3:def456..."
ingested_at = 1234567900
claims_count = 15
authority_tier = "team_policy"
category = "security"

Update Detection

$ aphoria scan

⚠️  Warning: Source document has changed
    Guide: hexagonal-arch
    Source: docs/architecture/hexagonal.md
    Last ingested: 3 days ago (hash: abc123...)
    Current hash: xyz789...

    Run: aphoria update-guide hexagonal-arch
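
The warning above comes down to comparing the hash stored at ingestion time with a hash of the current file contents. A dependency-free sketch of that check follows. Note that it substitutes FNV-1a for BLAKE3 only to stay within the standard library, and the `GuideRecord` struct is a hypothetical subset of an `ingested_guides.toml` entry:

```rust
/// FNV-1a (64-bit), used here only as a stand-in for BLAKE3 so the
/// example needs no external crates. It is not a cryptographic hash.
pub fn content_hash(bytes: &[u8]) -> String {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    format!("fnv1a:{:016x}", hash)
}

/// Hypothetical subset of one [[guide]] entry.
pub struct GuideRecord {
    pub id: String,
    pub source_path: String,
    pub document_hash: String,
}

/// True if the source document no longer matches the ingested hash.
pub fn is_stale(record: &GuideRecord, current_content: &[u8]) -> bool {
    record.document_hash != content_hash(current_content)
}
```

A scan could run this check for every recorded guide and print the warning for each stale one.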

Success Metrics

Phase 1 (Manual Import)

Users can manually create claims from guidelines
Claims are enforced during scans
Pre-commit hooks work

Phase 2 (LLM-Assisted)

LLM extracts 80%+ of claims correctly
Users can review/edit before importing
Saves >90% of manual effort

Phase 3 (Automated)

Confidence scoring filters noise
Automatic updates when docs change
Compliance reports show trends

Open Questions

1. How to handle ambiguous statements?
   - "Handlers should generally be in adapters/http/"
   - Extract as SHOULD with low confidence?
2. How to handle conflicting guidelines?
   - Doc A says X, Doc B says Y
   - Use authority tier + recency?
3. Should we support non-Markdown formats?
   - PDF extraction (common for external standards)
   - Confluence/Google Docs (via export)
   - Word documents
4. How to version guidelines?
   - When doc changes, update claims or create new versions?
   - Show history of guideline changes?
5. Should compliance be project-scoped or team-scoped?
   - Team-level guidelines apply to all team projects?
   - Project-specific guidelines override team guidelines?

Future Enhancements

1. Guided Onboarding

$ aphoria init --with-guides

No guidelines found. Would you like to:
1. Import existing documentation
2. Start from template (hexagonal/clean/ddd)
3. Skip for now

Choice: 1

Enter path to architecture guide: docs/architecture.md
Enter path to security guide: docs/security.md
Enter path to style guide: (skip)

Extracting claims...
Found 42 claims across 2 documents.
Review before importing? [Y/n]

2. Compliance Dashboard

Visual compliance tracking over time:

Trend graphs (improving/declining)
Per-guideline compliance scores
Team comparison (if multiple teams)

3. AI-Generated Fix Suggestions

❌ File location violation
   Expected: adapters/http/*_handler.go
   Found: adapters/handlers/user.go

   Suggested fix:
     git mv adapters/handlers/user.go adapters/http/user_handler.go

   Apply fix? [y/N]

4. Guideline Templates

Pre-built guidelines for common architectures:

Hexagonal Architecture
Clean Architecture
Domain-Driven Design
Microservices patterns
Security baselines (OWASP, NIST)

$ aphoria init-guide --template hexagonal

Imported 35 hexagonal architecture claims.
Customize at: .aphoria/claims.toml

Timeline

Week 1: Manual import (MVP) + LLM extraction prototype
Week 2: Automated pipeline + confidence scoring
Week 3: CLI polish + documentation + examples
Week 4: Dashboard integration + compliance reports

Total: 4 weeks for complete feature
