# Phase 17: Pattern Enrichment & Best Practices Infrastructure

**Status:** ✅ Complete (Backend Only)
**Date:** 2026-02-08

## What Was Built

This phase implemented **backend infrastructure** for enriched corpus patterns and team guideline ingestion. The features are **fully functional via CLI** but **not yet integrated with the dashboard UI**.

---

## 1. Enriched Pattern Metadata

### The Problem
Community patterns showed bare statistics like "md5: true, 347 projects" with no context about whether MD5 is deprecated, recommended, or neutral.

### The Solution
Extractors now provide enrichment metadata:

```rust
pub struct PatternMetadata {
    pub tail_path: String,         // "crypto/hashing/algorithm"
    pub predicate: String,          // "algorithm"
    pub value: Option<String>,      // "md5" (or None for wildcard)
    pub category: String,           // "security"
    pub verdict: String,            // "deprecated"
    pub explanation: String,        // "MD5 is cryptographically broken..."
    pub authority_source: Option<String>,  // "NIST SP 800-131A"
}
```

### What Works Now
- 10 security extractors provide enrichment metadata
- `PatternEnricher` service matches patterns to metadata (exact, wildcard, noise detection)
- Data model supports category, verdict, explanation, authority_source
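
The matching order can be sketched as follows. This is a minimal illustration, not the actual `PatternEnricher`: the `Enrichment` enum, the `enrich` signature, and the index layout are hypothetical, and it assumes "noise" means a pattern that no extractor metadata claims.

```rust
use std::collections::HashMap;

/// Mirrors the `PatternMetadata` struct above, trimmed to the fields used here.
#[derive(Debug, Clone, PartialEq)]
pub struct PatternMetadata {
    pub tail_path: String,
    pub predicate: String,
    pub value: Option<String>, // None = wildcard: applies to any value
    pub verdict: String,
}

/// Outcome of a lookup (hypothetical type, not taken from the codebase).
#[derive(Debug, PartialEq)]
pub enum Enrichment<'a> {
    Exact(&'a PatternMetadata),
    Wildcard(&'a PatternMetadata),
    /// No metadata claims this pattern; assumed here to count as noise.
    Noise,
}

/// Hypothetical enricher that indexes metadata by (tail_path, predicate, value).
pub struct PatternEnricher {
    index: HashMap<(String, String, Option<String>), PatternMetadata>,
}

impl PatternEnricher {
    pub fn new(entries: Vec<PatternMetadata>) -> Self {
        let index = entries
            .into_iter()
            .map(|m| ((m.tail_path.clone(), m.predicate.clone(), m.value.clone()), m))
            .collect();
        Self { index }
    }

    /// An exact value match wins; otherwise fall back to a wildcard entry.
    pub fn enrich(&self, tail_path: &str, predicate: &str, value: &str) -> Enrichment<'_> {
        let exact = (tail_path.to_string(), predicate.to_string(), Some(value.to_string()));
        if let Some(m) = self.index.get(&exact) {
            return Enrichment::Exact(m);
        }
        let wildcard = (tail_path.to_string(), predicate.to_string(), None);
        match self.index.get(&wildcard) {
            Some(m) => Enrichment::Wildcard(m),
            None => Enrichment::Noise,
        }
    }
}
```

Checking the exact key before the wildcard key keeps a specific verdict (for example `md5` as deprecated) from being shadowed by a generic entry on the same path.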

### What's Missing
- ❌ Dashboard doesn't display this metadata yet
- ❌ No category filter dropdown
- ❌ No "Hide noise" toggle
- ❌ No visual badges for deprecated/recommended

---

## 2. TeamPolicy Authority Tier

### The Problem
There was no authority tier for team-level architectural guidelines between community observations (Observational, tier 2) and expert opinions (Expert, tier 3).

### The Solution
New **tier 2.5**: `TeamPolicy`

- Sits between Observational (tier 2) and Expert (tier 3)
- Authority weight: 0.6 (between 0.7 and 0.5)
- Decay: 180 days (same as Expert)
- Use case: Team architectural guidelines, internal standards
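
As a rough sketch of how the new tier slots in, the enum below models only the three neighbouring tiers. It is not the real type in `stemedb-core`: the method names are hypothetical, the `observational` and `expert` spellings are guesses, and assigning 0.7 to Observational and 0.5 to Expert is an assumption inferred from the ordering above.

```rust
/// Three neighbouring tiers only; the full enum presumably has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AuthorityTier {
    Observational, // tier 2
    TeamPolicy,    // tier 2.5 (new in this phase)
    Expert,        // tier 3
}

impl AuthorityTier {
    /// 0.6 for TeamPolicy comes from this summary. Mapping 0.7 to
    /// Observational and 0.5 to Expert is an assumption.
    pub fn weight(self) -> f64 {
        match self {
            AuthorityTier::Observational => 0.7,
            AuthorityTier::TeamPolicy => 0.6,
            AuthorityTier::Expert => 0.5,
        }
    }

    /// Decay window in days; only values stated in this summary are filled in.
    pub fn decay_days(self) -> Option<u32> {
        match self {
            AuthorityTier::TeamPolicy | AuthorityTier::Expert => Some(180),
            AuthorityTier::Observational => None, // not stated in this document
        }
    }
}

/// Parses tier names. `team_policy` is the spelling used by the CLI and TOML
/// in this document; the other two spellings are assumed.
pub fn parse_tier(s: &str) -> Option<AuthorityTier> {
    match s {
        "observational" => Some(AuthorityTier::Observational),
        "team_policy" => Some(AuthorityTier::TeamPolicy),
        "expert" => Some(AuthorityTier::Expert),
        _ => None,
    }
}
```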

### What Works Now
```bash
# Create team policy claim
aphoria claims create \
  --tier team_policy \
  --id hex-arch-http-001 \
  --concept-path myapp/adapters/http \
  --predicate layer \
  --value adapter \
  --invariant "HTTP handlers MUST be in adapters layer" \
  --consequence "Business logic leaks into infrastructure" \
  --provenance "Architecture team decision 2026-02-08" \
  --category architecture \
  --by architecture-team
```

---

## 3. Best Practices Import CLI

### The Problem
Teams write extensive architectural guidelines in markdown/PDFs but have no way to automatically enforce them.

### The Solution
Batch import claims from TOML files:

```bash
# Preview import
aphoria claims import docs/hexagonal-arch.toml --dry-run

# Import with tracking
aphoria claims import docs/hexagonal-arch.toml \
  --authority-tier team_policy \
  --source-guide "hexagonal-arch"
```

### What Works Now
- Batch import claims from TOML
- Override authority tier for all claims
- Merge strategies: `skip_existing`, `overwrite`, `fail_on_duplicate`
- Dry-run preview
- Guideline tracking in `.aphoria/ingested_guides.toml`
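
One plausible reading of the three merge strategies, combined with dry-run, is sketched below. The `Claim`, `ImportReport`, and `import_claims` names are hypothetical and the real import handler may behave differently, for example in how it treats duplicates inside a single file.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MergeStrategy {
    SkipExisting,
    Overwrite,
    FailOnDuplicate,
}

/// Reduced claim: just enough fields to show merging (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub id: String,
    pub value: String,
}

#[derive(Debug, Default, PartialEq)]
pub struct ImportReport {
    pub added: usize,
    pub overwritten: usize,
    pub skipped: usize,
}

/// Applies `incoming` to a staged copy of `store`, then commits the copy
/// unless `dry_run` is set. A failed import leaves `store` untouched.
pub fn import_claims(
    store: &mut BTreeMap<String, Claim>,
    incoming: Vec<Claim>,
    strategy: MergeStrategy,
    dry_run: bool,
) -> Result<ImportReport, String> {
    let mut staged = store.clone();
    let mut report = ImportReport::default();
    for claim in incoming {
        match (staged.contains_key(&claim.id), strategy) {
            (true, MergeStrategy::FailOnDuplicate) => {
                return Err(format!("duplicate claim id: {}", claim.id));
            }
            (true, MergeStrategy::SkipExisting) => {
                report.skipped += 1;
                continue;
            }
            (true, MergeStrategy::Overwrite) => report.overwritten += 1,
            (false, _) => report.added += 1,
        }
        staged.insert(claim.id.clone(), claim);
    }
    if !dry_run {
        *store = staged;
    }
    Ok(report)
}
```

Staging the changes on a copy means the dry-run report and the real import share one code path, so the preview cannot drift from what the import would actually do.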

### Example TOML
```toml
[[claim]]
id = "hex-arch-http-001"
concept_path = "myapp/adapters/http"
predicate = "layer"
value = "adapter"
comparison = "equals"
provenance = "Hexagonal Architecture Guidelines"
invariant = "HTTP handlers MUST be in adapters layer"
consequence = "Business logic leaks into infrastructure"
authority_tier = "team_policy"
category = "architecture"
evidence = ["docs/architecture/hexagonal.md"]
created_by = "architecture-team"
created_at = "2026-02-08T12:00:00Z"
```

---

## 4. Guideline Tracking

### The Problem
No way to track which guidelines have been imported, detect changes, or filter compliance.

### The Solution
`.aphoria/ingested_guides.toml` tracks imported guidelines:

```toml
[[guide]]
id = "hexagonal-arch"
name = "Hexagonal Architecture Guidelines"
source_path = "docs/hexagonal.md"
document_hash = "blake3:abc123..."
ingested_at = "2026-02-08T12:00:00Z"
claims_count = 26
authority_tier = "team_policy"
category = "architecture"
claim_ids = ["hex-arch-http-001", "hex-arch-domain-imports-001", ...]
```

### What Works Now
- Guideline metadata tracked with BLAKE3 hash
- Change detection (compare hash to detect doc updates)
- Audit trail (who imported what, when)
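
Change detection reduces to comparing the recorded `document_hash` with a fresh digest of the source document. The sketch below is illustrative only: `IngestedGuide`, `GuideStatus`, and `check_guide` are hypothetical names, and the digest function is passed in as a parameter as a stand-in for a real BLAKE3 implementation so that the example has no dependencies.

```rust
/// Subset of a `[[guide]]` entry from `.aphoria/ingested_guides.toml`.
#[derive(Debug)]
pub struct IngestedGuide {
    pub id: String,
    pub document_hash: String, // e.g. "blake3:<hex digest>"
}

#[derive(Debug, PartialEq)]
pub enum GuideStatus {
    Unchanged,
    Changed { recorded: String, current: String },
}

/// Compares the recorded hash with a digest of the current document bytes.
/// `digest_hex` stands in for BLAKE3 and must return a hex string.
pub fn check_guide(
    guide: &IngestedGuide,
    current_doc: &[u8],
    digest_hex: impl Fn(&[u8]) -> String,
) -> GuideStatus {
    let current = format!("blake3:{}", digest_hex(current_doc));
    if current == guide.document_hash {
        GuideStatus::Unchanged
    } else {
        GuideStatus::Changed {
            recorded: guide.document_hash.clone(),
            current,
        }
    }
}
```

A `Changed` result is the natural trigger for the re-extraction workflow that is listed as missing below.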

### What's Missing
- ❌ `aphoria scan --check-policy <guide-id>` not implemented
- ❌ No re-extraction workflow when source doc changes
- ❌ No compliance dashboard

---

## 5. Updated Comparison Modes

### What Was Added
Two new comparison modes for list/substring matching:

**Contains** - Value must contain substring/element
```toml
comparison = "contains"
value = "Serialize"
# Passes: "Clone,Debug,Serialize"
# Fails: "Clone,Debug"
```

**NotContains** - Value must NOT contain substring/element
```toml
comparison = "not_contains"
value = "Clone"
# Passes: "Debug"
# Fails: "Clone,Debug"
```
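
The semantics of the two modes can be sketched as below. This assumes plain substring matching, which is consistent with all four examples above but looser than element-wise matching on a comma-separated list; the actual evaluator in `authored_claim.rs` may differ, and the `parse` and `passes` names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Comparison {
    Contains,
    NotContains,
}

impl Comparison {
    /// Parses the TOML spellings shown above.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "contains" => Some(Comparison::Contains),
            "not_contains" => Some(Comparison::NotContains),
            _ => None,
        }
    }

    /// `expected` is the claim's `value`; `observed` is what the scan found.
    /// Substring matching is an assumption (see the note above the block).
    pub fn passes(self, expected: &str, observed: &str) -> bool {
        let found = observed.contains(expected);
        match self {
            Comparison::Contains => found,
            Comparison::NotContains => !found,
        }
    }
}
```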

---

## 10 Enriched Security Extractors

| Extractor | Enriched Patterns | Authority Source |
|-----------|-------------------|------------------|
| `WeakCryptoExtractor` | MD5, SHA1 (deprecated), DES, RC4 | NIST SP 800-131A, RFC 7465 |
| `TlsVersionExtractor` | TLS 1.0/1.1 (deprecated), 1.2/1.3 (recommended) | RFC 8996, RFC 8446 |
| `TlsVerifyExtractor` | cert_verification: false (insecure) | OWASP |
| `JwtConfigExtractor` | algorithm: none (forbidden) | RFC 7519 |
| `CorsConfigExtractor` | allow_origin: * (insecure) | OWASP, W3C CORS Spec |
| `HardcodedSecretsExtractor` | API keys/passwords (critical) | OWASP A07:2021 |
| `SqlInjectionExtractor` | String interpolation (vulnerable) | OWASP A03:2021 |
| `CommandInjectionExtractor` | Shell exec (vulnerable) | OWASP A03:2021 |
| `PathTraversalExtractor` | User-controlled paths (vulnerable) | OWASP A01:2021 |
| `InsecureDeserializationExtractor` | pickle/yaml.load (unsafe) | OWASP A08:2021 |
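
On the producing side, each extractor exposes its enrichment through a `pattern_metadata()` method. The sketch below shows one way that could look for the MD5 row of the table. The trait name, the method signature, the default implementation, and the `"weak_crypto"` name are all assumptions; only the struct fields and the MD5 values come from this document.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct PatternMetadata {
    pub tail_path: String,
    pub predicate: String,
    pub value: Option<String>,
    pub category: String,
    pub verdict: String,
    pub explanation: String,
    pub authority_source: Option<String>,
}

/// Hypothetical shape of the extractor trait.
pub trait Extractor {
    fn name(&self) -> &'static str;

    /// Defaults to no enrichment, so extractors without metadata need no changes.
    fn pattern_metadata(&self) -> Vec<PatternMetadata> {
        Vec::new()
    }
}

pub struct WeakCryptoExtractor;

impl Extractor for WeakCryptoExtractor {
    fn name(&self) -> &'static str {
        "weak_crypto" // assumed identifier
    }

    fn pattern_metadata(&self) -> Vec<PatternMetadata> {
        vec![PatternMetadata {
            tail_path: "crypto/hashing/algorithm".to_string(),
            predicate: "algorithm".to_string(),
            value: Some("md5".to_string()),
            category: "security".to_string(),
            verdict: "deprecated".to_string(),
            // Text kept as elided in the struct comments earlier in this document.
            explanation: "MD5 is cryptographically broken...".to_string(),
            authority_source: Some("NIST SP 800-131A".to_string()),
        }]
    }
}
```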

---

## Files Created/Modified

### New Files
- `applications/aphoria/src/corpus/enricher.rs` - Pattern enrichment service
- `applications/aphoria/src/types/ingested_guides.rs` - Guideline tracking

### Modified Files
**Core Types:**
- `crates/stemedb-core/src/types/source.rs` - TeamPolicy tier
- `crates/stemedb-storage/src/pattern_aggregate_store/mod.rs` - Enrichment fields

**Aphoria:**
- `applications/aphoria/src/extractors/traits.rs` - `pattern_metadata()` method
- `applications/aphoria/src/types/authored_claim.rs` - Contains/NotContains modes
- `applications/aphoria/src/cli/claims.rs` - Import subcommand
- `applications/aphoria/src/handlers/claims.rs` - Import handler
- 10 extractor files with `pattern_metadata()` implementations

**API & DTOs:**
- `crates/stemedb-api/src/dto/enums.rs` - TeamPolicy DTO
- `crates/stemedb-api/src/dto/aphoria/types.rs` - Contains/NotContains DTOs
- `crates/stemedb-ontology/src/dto/enums.rs` - TeamPolicy DTO

---

## How to Use (CLI)

### 1. Create a guideline TOML file
```bash
cat > docs/architecture-guidelines.toml <<EOF
[[claim]]
id = "no-tokio-in-core"
concept_path = "myapp/core/imports/tokio"
predicate = "imported"
value = "true"
comparison = "absent"
provenance = "Architecture decision: core must be sync-only"
invariant = "Core modules MUST NOT import tokio"
consequence = "Creates async runtime coupling, breaks sync library users"
authority_tier = "team_policy"
category = "architecture"
evidence = ["ADR-003"]
created_by = "tech-lead"
created_at = "2026-02-08T12:00:00Z"
EOF
```

### 2. Import the guideline
```bash
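# Preview first; --dry-run does not persist anything
aphoria claims import docs/architecture-guidelines.toml \
  --source-guide "architecture-2026" --dry-run
aphoria claims import docs/architecture-guidelines.toml \
  --source-guide "architecture-2026"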
```

### 3. Run verification
```bash
aphoria scan --persist
aphoria verify run
```

---

## What's NOT Done (UI Integration)

The backend infrastructure is in place, but the **dashboard doesn't display any of it**, and policy filtering is not yet available from the CLI either:

- ❌ Category badges (security/architecture/performance)
- ❌ Verdict badges (deprecated/recommended/emerging)
- ❌ Explanation tooltips ("MD5 is deprecated - NIST 2010")
- ❌ Filter by category dropdown
- ❌ "Hide noise" toggle
- ❌ Guideline compliance filtering (`--check-policy` flag)
- ❌ Compliance dashboard showing guideline status

---

## Next Steps

### To Make This User-Visible:

**Option 1: Dashboard Integration** (Frontend work)
- Add category/verdict badges to pattern cards
- Show explanations in tooltips
- Add category filter dropdown
- Implement "Hide noise" toggle
- Build compliance dashboard

**Option 2: Enhanced CLI Output** (Backend work)
- Show enrichment in `aphoria scan` table output
- Add `--show-enrichment` flag
- Color-code deprecated patterns (red), recommended (green)
- Filter by category: `aphoria scan --category security`

**Option 3: Policy Filtering** (Backend work)
- Implement `aphoria scan --check-policy <guide-id>`
- Show only violations of specific guideline
- Pre-commit hook support for policy enforcement

---

## Testing

All code compiles and passes existing tests. To verify:

```bash
# Build workspace
cargo build --workspace

# Test aphoria
cargo test --package aphoria

# Try the import command
aphoria claims import --help
```

---

## Documentation Updated

- ✅ `roadmap-archive.md` - Added Phase 17
- ✅ `roadmap.md` - Updated status table
- ✅ `cli-reference.md` - Added `aphoria claims import` documentation
- ✅ `comparison-modes.md` - Contains/NotContains already documented
- ✅ This summary document

---

## Questions?

**Q: Why can't I see any changes in the UI?**
A: This phase implemented backend infrastructure only. The dashboard doesn't consume the enrichment metadata yet.

**Q: How do I know it works?**
A: Use the CLI commands. The `aphoria claims import` command is fully functional.

**Q: When will this show up in the dashboard?**
A: That requires frontend work to integrate the enrichment metadata into the UI components.

**Q: Is this production-ready?**
A: The backend is production-ready. The CLI commands work. The UI integration is not done.