jml e95c978481 feat(aphoria): add inline claim markers and claim enrichment infrastructure

This commit implements Phase 17 of the Aphoria roadmap, adding:

**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)

**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)

**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns

**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion

**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries

**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion

Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-08 20:18:20 +00:00

9.6 KiB

Raw Blame History

Phase 17: Pattern Enrichment & Best Practices Infrastructure

Status: ✅ Complete (Backend Only) Date: 2026-02-08

What Was Built

This phase implemented backend infrastructure for enriched corpus patterns and team guideline ingestion. The features are fully functional via CLI but not yet integrated with the dashboard UI.

1. Enriched Pattern Metadata

The Problem

Community patterns showed bare statistics like "md5: true, 347 projects" with no context about whether MD5 is deprecated, recommended, or neutral.

The Solution

Extractors now provide enrichment metadata:

pub struct PatternMetadata {
    pub tail_path: String,         // "crypto/hashing/algorithm"
    pub predicate: String,          // "algorithm"
    pub value: Option<String>,      // "md5" (or None for wildcard)
    pub category: String,           // "security"
    pub verdict: String,            // "deprecated"
    pub explanation: String,        // "MD5 is cryptographically broken..."
    pub authority_source: Option<String>,  // "NIST SP 800-131A"
}

What Works Now

10 security extractors provide enrichment metadata
PatternEnricher service matches patterns to metadata (exact, wildcard, noise detection)
Data model supports category, verdict, explanation, authority_source

What's Missing

❌ Dashboard doesn't display this metadata yet ❌ No category filter dropdown ❌ No "Hide noise" toggle ❌ No visual badges for deprecated/recommended

2. TeamPolicy Authority Tier

The Problem

No authority tier between community observations (tier 4) and expert opinions (tier 3) for team-level architectural guidelines.

The Solution

New tier 2.5: TeamPolicy

Sits between Observational (tier 2) and Expert (tier 3)
Authority weight: 0.6 (between 0.7 and 0.5)
Decay: 180 days (same as Expert)
Use case: Team architectural guidelines, internal standards

What Works Now

# Create team policy claim
aphoria claims create \
  --tier team_policy \
  --id hex-arch-http-001 \
  --concept-path myapp/adapters/http \
  --predicate layer \
  --value adapter \
  --invariant "HTTP handlers MUST be in adapters layer" \
  --consequence "Business logic leaks into infrastructure" \
  --provenance "Architecture team decision 2026-02-08" \
  --category architecture \
  --by architecture-team

3. Best Practices Import CLI

The Problem

Teams write extensive architectural guidelines in markdown/PDFs but have no way to automatically enforce them.

The Solution

Batch import claims from TOML files:

# Preview import
aphoria claims import docs/hexagonal-arch.toml --dry-run

# Import with tracking
aphoria claims import docs/hexagonal-arch.toml \
  --authority-tier team_policy \
  --source-guide "hexagonal-arch"

What Works Now

Batch import claims from TOML
Override authority tier for all claims
Merge strategies: skip_existing, overwrite, fail_on_duplicate
Dry-run preview
Guideline tracking in .aphoria/ingested_guides.toml

Example TOML

[[claim]]
id = "hex-arch-http-001"
concept_path = "myapp/adapters/http"
predicate = "layer"
value = "adapter"
comparison = "equals"
provenance = "Hexagonal Architecture Guidelines"
invariant = "HTTP handlers MUST be in adapters layer"
consequence = "Business logic leaks into infrastructure"
authority_tier = "team_policy"
category = "architecture"
evidence = ["docs/architecture/hexagonal.md"]
created_by = "architecture-team"
created_at = "2026-02-08T12:00:00Z"

4. Guideline Tracking

The Problem

No way to track which guidelines have been imported, detect changes, or filter compliance.

The Solution

.aphoria/ingested_guides.toml tracks imported guidelines:

[[guide]]
id = "hexagonal-arch"
name = "Hexagonal Architecture Guidelines"
source_path = "docs/hexagonal.md"
document_hash = "blake3:abc123..."
ingested_at = "2026-02-08T12:00:00Z"
claims_count = 26
authority_tier = "team_policy"
category = "architecture"
claim_ids = ["hex-arch-http-001", "hex-arch-domain-imports-001", ...]

What Works Now

Guideline metadata tracked with BLAKE3 hash
Change detection (compare hash to detect doc updates)
Audit trail (who imported what, when)

What's Missing

❌ aphoria scan --check-policy <guide-id> not implemented ❌ No re-extraction workflow when source doc changes ❌ No compliance dashboard

5. Updated Comparison Modes

What Was Added

Two new comparison modes for list/substring matching:

Contains - Value must contain substring/element

comparison = "contains"
value = "Serialize"
# Passes: "Clone,Debug,Serialize"
# Fails: "Clone,Debug"

NotContains - Value must NOT contain substring/element

comparison = "not_contains"
value = "Clone"
# Passes: "Debug"
# Fails: "Clone,Debug"

10 Enriched Security Extractors

Extractor	Enriched Patterns	Authority Source
`WeakCryptoExtractor`	MD5, SHA1 (deprecated), DES, RC4	NIST SP 800-131A, RFC 7465
`TlsVersionExtractor`	TLS 1.0/1.1 (deprecated), 1.2/1.3 (recommended)	RFC 8996, RFC 8446
`TlsVerifyExtractor`	cert_verification: false (insecure)	OWASP
`JwtConfigExtractor`	algorithm: none (forbidden)	RFC 7519
`CorsConfigExtractor`	allow_origin: * (insecure)	OWASP, W3C CORS Spec
`HardcodedSecretsExtractor`	API keys/passwords (critical)	OWASP A07:2021
`SqlInjectionExtractor`	String interpolation (vulnerable)	OWASP A03:2021
`CommandInjectionExtractor`	Shell exec (vulnerable)	OWASP A03:2021
`PathTraversalExtractor`	User-controlled paths (vulnerable)	OWASP A01:2021
`InsecureDeserializationExtractor`	pickle/yaml.load (unsafe)	OWASP A08:2021

Files Created/Modified

New Files

applications/aphoria/src/corpus/enricher.rs - Pattern enrichment service
applications/aphoria/src/types/ingested_guides.rs - Guideline tracking

Modified Files

Core Types:

crates/stemedb-core/src/types/source.rs - TeamPolicy tier
crates/stemedb-storage/src/pattern_aggregate_store/mod.rs - Enrichment fields

Aphoria:

applications/aphoria/src/extractors/traits.rs - pattern_metadata() method
applications/aphoria/src/types/authored_claim.rs - Contains/NotContains modes
applications/aphoria/src/cli/claims.rs - Import subcommand
applications/aphoria/src/handlers/claims.rs - Import handler
10 extractor files with pattern_metadata() implementations

API & DTOs:

crates/stemedb-api/src/dto/enums.rs - TeamPolicy DTO
crates/stemedb-api/src/dto/aphoria/types.rs - Contains/NotContains DTOs
crates/stemedb-ontology/src/dto/enums.rs - TeamPolicy DTO

How to Use (CLI)

1. Create a guideline TOML file

cat > docs/architecture-guidelines.toml <<EOF
[[claim]]
id = "no-tokio-in-core"
concept_path = "myapp/core/imports/tokio"
predicate = "imported"
value = "true"
comparison = "absent"
provenance = "Architecture decision: core must be sync-only"
invariant = "Core modules MUST NOT import tokio"
consequence = "Creates async runtime coupling, breaks sync library users"
authority_tier = "team_policy"
category = "architecture"
evidence = ["ADR-003"]
created_by = "tech-lead"
created_at = "2026-02-08T12:00:00Z"
EOF

2. Import the guideline

aphoria claims import docs/architecture-guidelines.toml \
  --source-guide "architecture-2026" \
  --dry-run

3. Run verification

aphoria scan --persist
aphoria verify run

What's NOT Done (UI Integration)

The backend is complete but the dashboard doesn't display any of this:

❌ Category badges (security/architecture/performance) ❌ Verdict badges (deprecated/recommended/emerging) ❌ Explanation tooltips ("MD5 is deprecated - NIST 2010") ❌ Filter by category dropdown ❌ "Hide noise" toggle ❌ Guideline compliance filtering (--check-policy flag) ❌ Compliance dashboard showing guideline status

Next Steps

To Make This User-Visible:

Option 1: Dashboard Integration (Frontend work)

Add category/verdict badges to pattern cards
Show explanations in tooltips
Add category filter dropdown
Implement "Hide noise" toggle
Build compliance dashboard

Option 2: Enhanced CLI Output (Backend work)

Show enrichment in aphoria scan table output
Add --show-enrichment flag
Color-code deprecated patterns (red), recommended (green)
Filter by category: aphoria scan --category security

Option 3: Policy Filtering (Backend work)

Implement aphoria scan --check-policy <guide-id>
Show only violations of specific guideline
Pre-commit hook support for policy enforcement

Testing

All code compiles and passes existing tests. To verify:

# Build workspace
cargo build --workspace

# Test aphoria
cargo test --package aphoria

# Try the import command
aphoria claims import --help

Documentation Updated

✅ roadmap-archive.md - Added Phase 17
✅ roadmap.md - Updated status table
✅ cli-reference.md - Added aphoria claims import documentation
✅ comparison-modes.md - Contains/NotContains already documented
✅ This summary document

Questions?

Q: Why can't I see any changes in the UI? A: This phase implemented backend infrastructure only. The dashboard doesn't consume the enrichment metadata yet.

Q: How do I know it works? A: Use the CLI commands. The aphoria claims import command is fully functional.

Q: When will this show up in the dashboard? A: That requires frontend work to integrate the enrichment metadata into the UI components.

Q: Is this production-ready? A: The backend is production-ready. The CLI commands work. The UI integration is not done.

9.6 KiB Raw Blame History

Phase 17: Pattern Enrichment & Best Practices Infrastructure

What Was Built

1. Enriched Pattern Metadata

The Problem

The Solution

What Works Now

What's Missing

2. TeamPolicy Authority Tier

The Problem

The Solution

What Works Now

3. Best Practices Import CLI

The Problem

The Solution

What Works Now

Example TOML

4. Guideline Tracking

The Problem

The Solution

What Works Now

What's Missing

5. Updated Comparison Modes

What Was Added

10 Enriched Security Extractors

Files Created/Modified

New Files

Modified Files

How to Use (CLI)

1. Create a guideline TOML file

2. Import the guideline

3. Run verification

What's NOT Done (UI Integration)

Next Steps

To Make This User-Visible:

Testing

Documentation Updated

Questions?

9.6 KiB

Raw Blame History