jordan 157dbbb9eb feat: Complete Aphoria Phase 8-9 + UAT suite (90/90 tests passing)

## Phase 8: Enterprise Extractor Improvements ✅
- 14 security extractors (TLS, JWT, SQL injection, XSS, etc.)
- 10 framework-specific extractors (Spring, Django, Rails, etc.)
- Config file security detection (YAML, TOML)

## Phase 9: Autonomous Extractor Generation ✅
- Shadow mode executor with TP/FP tracking
- Graduation pipeline with confidence thresholds
- Auto-rollback on regression detection
- Cross-project pattern syncing

## UAT Suite Complete (14 scripts, 90 tests)
- test-core-detection.sh (6 tests)
- test-declarative-extractors.sh (5 tests)
- test-domain-frameworks.sh (5 tests)
- test-domain-unreal.sh (3 tests)
- test-llm-extraction.sh (6 tests)
- test-eval-harness.sh (5 tests)
- test-cross-language.sh (3 tests)
- test-precommit-performance.sh (4 tests)
- test-output-formats.sh (8 tests)
- test-drift-detection.sh (6 tests)
- test-exit-codes.sh (12 tests)
+ 3 more scripts

## Other Changes
- Updated roadmap to mark Phase 8-9 complete
- Added .gitignore entries for build artifacts
- Updated pre-commit: 800 line limit, exclude tests/data/cmd

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-06 22:50:55 -07:00

LLM Extraction Optimization

A systematic approach to maximizing the quality of Aphoria's LLM extraction.

Quick Links

Document	When to Use
Quick Start	First time optimizing, want to get started fast
Full Playbook	Comprehensive optimization guide with decision trees
Baseline Template	Recording metrics after each optimization cycle
Research Template	Investigating unknown issues or new approaches

Current Status

Latest Baseline: 2026-02-06

Metric	Current	Target	Status
Precision	0.93	0.80	✅ Exceeded
Recall	1.00	0.75	✅ Exceeded
F1	0.96	0.77	✅ Exceeded
Parse Rate	100%	95%	✅
Fixtures Passing	10/10	-	✅ All pass

Verdict: PASS - All metrics exceed targets.
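
The F1 row in the table above is the harmonic mean of precision and recall, and the verdict is a per-metric comparison against the targets. A minimal sketch of that arithmetic follows; the names `Targets`, `f1_score`, and `meets_targets` are hypothetical and are not taken from `src/eval/harness.rs`, and expressing parse rate as a fraction is an assumption.

```rust
/// Target thresholds for one evaluation run (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy)]
pub struct Targets {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
    /// Parse rate as a fraction in [0, 1], e.g. 0.95 for 95%.
    pub parse_rate: f64,
}

/// Harmonic mean of precision and recall; returns 0.0 when both are zero.
pub fn f1_score(precision: f64, recall: f64) -> f64 {
    let sum = precision + recall;
    if sum == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / sum
    }
}

/// True when every metric meets or exceeds its target.
pub fn meets_targets(precision: f64, recall: f64, parse_rate: f64, t: &Targets) -> bool {
    precision >= t.precision
        && recall >= t.recall
        && f1_score(precision, recall) >= t.f1
        && parse_rate >= t.parse_rate
}
```

With the table's values, `f1_score(0.93, 1.00)` evaluates to roughly 0.96, which matches the reported F1.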

Directory Structure

docs/llm-optimization/
├── index.md              # This file
├── quickstart.md         # 15-minute getting started
├── playbook.md           # Full optimization guide
├── baselines/            # Historical metrics
│   ├── template.md
│   └── YYYY-MM-DD.md     # One per baseline
└── research/             # Investigation notes
    ├── template.md
    └── [topic].md        # One per research topic

Key Commands

# Run evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live

# Check for regressions (CI)
aphoria eval run --mode cached --fail-on-regression

# Update baseline after improvements
aphoria eval update-baseline --force

# List fixtures
aphoria eval list-fixtures

# Validate fixtures
aphoria eval validate-fixtures

Optimization Flow

1. Run baseline evaluation
       ↓
2. Identify failure categories
       ↓
3. Apply targeted fixes (one at a time!)
       ↓
4. Validate: did metrics improve?
       ↓
   YES → Save new baseline, continue to next issue
   NO  → Revert, try different approach or research
       ↓
5. Repeat until targets met
       ↓
6. Set up CI to prevent regressions
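
Step 4 of the flow above is a save-or-revert decision. One way to make it mechanical is sketched below, assuming that "improved" means F1 went up while neither precision nor recall fell by more than a small tolerance. That criterion, and the names `Scores`, `Decision`, and `decide`, are illustrative assumptions rather than Aphoria's actual rule.

```rust
/// Headline metrics from one evaluation run (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Scores {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

/// Outcome of validating a single targeted fix.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Metrics improved: save a new baseline and move to the next issue.
    SaveBaseline,
    /// Metrics did not improve: revert and try another approach or research.
    Revert,
}

/// Assumed rule: accept only if F1 rose and neither precision nor recall
/// dropped by more than `tolerance` relative to the baseline.
pub fn decide(baseline: &Scores, candidate: &Scores, tolerance: f64) -> Decision {
    let improved = candidate.f1 > baseline.f1;
    let no_regression = candidate.precision >= baseline.precision - tolerance
        && candidate.recall >= baseline.recall - tolerance;
    if improved && no_regression {
        Decision::SaveBaseline
    } else {
        Decision::Revert
    }
}
```

Applying one fix at a time, as step 3 insists, is what makes this comparison meaningful: any change in the scores can be attributed to a single edit.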

Fixture Locations

Category	Path	Count
TLS	`tests/llm_fixtures/tls/`	2
JWT	`tests/llm_fixtures/jwt/`	2
Secrets	`tests/llm_fixtures/secrets/`	2
Auth	`tests/llm_fixtures/auth/`	1
Negative	`tests/llm_fixtures/negative/`	2
Edge	`tests/llm_fixtures/edge/`	1
Total		10

Related Files

Prompt source: src/llm/prompts.rs
Extractor: src/llm/extractor.rs
Client: src/llm/client.rs
Eval harness: src/eval/harness.rs
Fixtures: tests/llm_fixtures/

Contributing Fixtures

See the Fixture Writing Guide in the playbook.

Quick checklist:

- Create a TOML file in the appropriate category folder
- Include both `must_contain` and `must_not_contain`
- Run `aphoria eval validate-fixtures`
- Test with `aphoria eval run --max-fixtures 1`
- Update the category counts in `manifest.toml`
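
To illustrate why the checklist asks for both `must_contain` and `must_not_contain`, here is a minimal sketch of how such a pair could be checked against an extraction result. It assumes plain substring matching; the real fixture schema and matching rules are defined in the playbook and the eval harness, and the names `Expectations` and `check_output` are hypothetical.

```rust
/// Expected and forbidden strings for one fixture (hypothetical shape).
pub struct Expectations {
    pub must_contain: Vec<String>,
    pub must_not_contain: Vec<String>,
}

/// Returns one message per violated expectation; an empty vector means the output passes.
/// Assumption: matching is a case-sensitive substring test.
pub fn check_output(output: &str, exp: &Expectations) -> Vec<String> {
    let mut violations = Vec::new();
    for needle in &exp.must_contain {
        if !output.contains(needle.as_str()) {
            violations.push(format!("missing expected text: {}", needle));
        }
    }
    for needle in &exp.must_not_contain {
        if output.contains(needle.as_str()) {
            violations.push(format!("found forbidden text: {}", needle));
        }
    }
    violations
}
```

A fixture with only `must_contain` cannot detect false positives, and one with only `must_not_contain` cannot detect misses, so the pair covers both precision and recall.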
