stemedb/applications/aphoria/docs/phase-17-summary.md
jml e95c978481 feat(aphoria): add inline claim markers and claim enrichment infrastructure
This commit implements Phase 17 of the Aphoria roadmap, adding:

**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)

**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)

**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns

**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion

**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries

**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion

Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 20:18:20 +00:00

334 lines
9.6 KiB
Markdown

# Phase 17: Pattern Enrichment & Best Practices Infrastructure
**Status:** ✅ Complete (Backend Only)
**Date:** 2026-02-08
## What Was Built
This phase implemented **backend infrastructure** for enriched corpus patterns and team guideline ingestion. The features are **fully functional via CLI** but **not yet integrated with the dashboard UI**.
---
## 1. Enriched Pattern Metadata
### The Problem
Community patterns showed bare statistics like "md5: true, 347 projects" with no context about whether MD5 is deprecated, recommended, or neutral.
### The Solution
Extractors now provide enrichment metadata:
```rust
pub struct PatternMetadata {
pub tail_path: String, // "crypto/hashing/algorithm"
pub predicate: String, // "algorithm"
pub value: Option<String>, // "md5" (or None for wildcard)
pub category: String, // "security"
pub verdict: String, // "deprecated"
pub explanation: String, // "MD5 is cryptographically broken..."
pub authority_source: Option<String>, // "NIST SP 800-131A"
}
```
### What Works Now
- 10 security extractors provide enrichment metadata
- `PatternEnricher` service matches patterns to metadata (exact, wildcard, noise detection)
- Data model supports category, verdict, explanation, authority_source
### What's Missing
❌ Dashboard doesn't display this metadata yet
❌ No category filter dropdown
❌ No "Hide noise" toggle
❌ No visual badges for deprecated/recommended
---
## 2. TeamPolicy Authority Tier
### The Problem
No authority tier between community observations (tier 4) and expert opinions (tier 3) for team-level architectural guidelines.
### The Solution
New **tier 2.5**: `TeamPolicy`
- Sits between Observational (tier 2) and Expert (tier 3)
- Authority weight: 0.6 (between 0.7 and 0.5)
- Decay: 180 days (same as Expert)
- Use case: Team architectural guidelines, internal standards
### What Works Now
```bash
# Create team policy claim
aphoria claims create \
--tier team_policy \
--id hex-arch-http-001 \
--concept-path myapp/adapters/http \
--predicate layer \
--value adapter \
--invariant "HTTP handlers MUST be in adapters layer" \
--consequence "Business logic leaks into infrastructure" \
--provenance "Architecture team decision 2026-02-08" \
--category architecture \
--by architecture-team
```
---
## 3. Best Practices Import CLI
### The Problem
Teams write extensive architectural guidelines in markdown/PDFs but have no way to automatically enforce them.
### The Solution
Batch import claims from TOML files:
```bash
# Preview import
aphoria claims import docs/hexagonal-arch.toml --dry-run
# Import with tracking
aphoria claims import docs/hexagonal-arch.toml \
--authority-tier team_policy \
--source-guide "hexagonal-arch"
```
### What Works Now
- Batch import claims from TOML
- Override authority tier for all claims
- Merge strategies: `skip_existing`, `overwrite`, `fail_on_duplicate`
- Dry-run preview
- Guideline tracking in `.aphoria/ingested_guides.toml`
### Example TOML
```toml
[[claim]]
id = "hex-arch-http-001"
concept_path = "myapp/adapters/http"
predicate = "layer"
value = "adapter"
comparison = "equals"
provenance = "Hexagonal Architecture Guidelines"
invariant = "HTTP handlers MUST be in adapters layer"
consequence = "Business logic leaks into infrastructure"
authority_tier = "team_policy"
category = "architecture"
evidence = ["docs/architecture/hexagonal.md"]
created_by = "architecture-team"
created_at = "2026-02-08T12:00:00Z"
```
---
## 4. Guideline Tracking
### The Problem
No way to track which guidelines have been imported, detect changes, or filter compliance.
### The Solution
`.aphoria/ingested_guides.toml` tracks imported guidelines:
```toml
[[guide]]
id = "hexagonal-arch"
name = "Hexagonal Architecture Guidelines"
source_path = "docs/hexagonal.md"
document_hash = "blake3:abc123..."
ingested_at = "2026-02-08T12:00:00Z"
claims_count = 26
authority_tier = "team_policy"
category = "architecture"
claim_ids = ["hex-arch-http-001", "hex-arch-domain-imports-001", ...]
```
### What Works Now
- Guideline metadata tracked with BLAKE3 hash
- Change detection (compare hash to detect doc updates)
- Audit trail (who imported what, when)
### What's Missing
`aphoria scan --check-policy <guide-id>` not implemented
❌ No re-extraction workflow when source doc changes
❌ No compliance dashboard
---
## 5. Updated Comparison Modes
### What Was Added
Two new comparison modes for list/substring matching:
**Contains** - Value must contain substring/element
```toml
comparison = "contains"
value = "Serialize"
# Passes: "Clone,Debug,Serialize"
# Fails: "Clone,Debug"
```
**NotContains** - Value must NOT contain substring/element
```toml
comparison = "not_contains"
value = "Clone"
# Passes: "Debug"
# Fails: "Clone,Debug"
```
---
## 10 Enriched Security Extractors
| Extractor | Enriched Patterns | Authority Source |
|-----------|-------------------|------------------|
| `WeakCryptoExtractor` | MD5, SHA1 (deprecated), DES, RC4 | NIST SP 800-131A, RFC 7465 |
| `TlsVersionExtractor` | TLS 1.0/1.1 (deprecated), 1.2/1.3 (recommended) | RFC 8996, RFC 8446 |
| `TlsVerifyExtractor` | cert_verification: false (insecure) | OWASP |
| `JwtConfigExtractor` | algorithm: none (forbidden) | RFC 7519 |
| `CorsConfigExtractor` | allow_origin: * (insecure) | OWASP, W3C CORS Spec |
| `HardcodedSecretsExtractor` | API keys/passwords (critical) | OWASP A07:2021 |
| `SqlInjectionExtractor` | String interpolation (vulnerable) | OWASP A03:2021 |
| `CommandInjectionExtractor` | Shell exec (vulnerable) | OWASP A03:2021 |
| `PathTraversalExtractor` | User-controlled paths (vulnerable) | OWASP A01:2021 |
| `InsecureDeserializationExtractor` | pickle/yaml.load (unsafe) | OWASP A08:2021 |
---
## Files Created/Modified
### New Files
- `applications/aphoria/src/corpus/enricher.rs` - Pattern enrichment service
- `applications/aphoria/src/types/ingested_guides.rs` - Guideline tracking
### Modified Files
**Core Types:**
- `crates/stemedb-core/src/types/source.rs` - TeamPolicy tier
- `crates/stemedb-storage/src/pattern_aggregate_store/mod.rs` - Enrichment fields
**Aphoria:**
- `applications/aphoria/src/extractors/traits.rs` - `pattern_metadata()` method
- `applications/aphoria/src/types/authored_claim.rs` - Contains/NotContains modes
- `applications/aphoria/src/cli/claims.rs` - Import subcommand
- `applications/aphoria/src/handlers/claims.rs` - Import handler
- 10 extractor files with `pattern_metadata()` implementations
**API & DTOs:**
- `crates/stemedb-api/src/dto/enums.rs` - TeamPolicy DTO
- `crates/stemedb-api/src/dto/aphoria/types.rs` - Contains/NotContains DTOs
- `crates/stemedb-ontology/src/dto/enums.rs` - TeamPolicy DTO
---
## How to Use (CLI)
### 1. Create a guideline TOML file
```bash
cat > docs/architecture-guidelines.toml <<EOF
[[claim]]
id = "no-tokio-in-core"
concept_path = "myapp/core/imports/tokio"
predicate = "imported"
value = "true"
comparison = "absent"
provenance = "Architecture decision: core must be sync-only"
invariant = "Core modules MUST NOT import tokio"
consequence = "Creates async runtime coupling, breaks sync library users"
authority_tier = "team_policy"
category = "architecture"
evidence = ["ADR-003"]
created_by = "tech-lead"
created_at = "2026-02-08T12:00:00Z"
EOF
```
### 2. Import the guideline
```bash
aphoria claims import docs/architecture-guidelines.toml \
--source-guide "architecture-2026" \
--dry-run
```
### 3. Run verification
```bash
aphoria scan --persist
aphoria verify run
```
---
## What's NOT Done (UI Integration)
The backend is complete but the **dashboard doesn't display any of this**:
❌ Category badges (security/architecture/performance)
❌ Verdict badges (deprecated/recommended/emerging)
❌ Explanation tooltips ("MD5 is deprecated - NIST 2010")
❌ Filter by category dropdown
❌ "Hide noise" toggle
❌ Guideline compliance filtering (`--check-policy` flag)
❌ Compliance dashboard showing guideline status
---
## Next Steps
### To Make This User-Visible:
**Option 1: Dashboard Integration** (Frontend work)
- Add category/verdict badges to pattern cards
- Show explanations in tooltips
- Add category filter dropdown
- Implement "Hide noise" toggle
- Build compliance dashboard
**Option 2: Enhanced CLI Output** (Backend work)
- Show enrichment in `aphoria scan` table output
- Add `--show-enrichment` flag
- Color-code deprecated patterns (red), recommended (green)
- Filter by category: `aphoria scan --category security`
**Option 3: Policy Filtering** (Backend work)
- Implement `aphoria scan --check-policy <guide-id>`
- Show only violations of specific guideline
- Pre-commit hook support for policy enforcement
---
## Testing
All code compiles and passes existing tests. To verify:
```bash
# Build workspace
cargo build --workspace
# Test aphoria
cargo test --package aphoria
# Try the import command
aphoria claims import --help
```
---
## Documentation Updated
-`roadmap-archive.md` - Added Phase 17
-`roadmap.md` - Updated status table
-`cli-reference.md` - Added `aphoria claims import` documentation
-`comparison-modes.md` - Contains/NotContains already documented
- ✅ This summary document
---
## Questions?
**Q: Why can't I see any changes in the UI?**
A: This phase implemented backend infrastructure only. The dashboard doesn't consume the enrichment metadata yet.
**Q: How do I know it works?**
A: Use the CLI commands. The `aphoria claims import` command is fully functional.
**Q: When will this show up in the dashboard?**
A: That requires frontend work to integrate the enrichment metadata into the UI components.
**Q: Is this production-ready?**
A: The backend is production-ready. The CLI commands work. The UI integration is not done.