## Phase 8: Enterprise Extractor Improvements ✅
- 14 security extractors (TLS, JWT, SQL injection, XSS, etc.)
- 10 framework-specific extractors (Spring, Django, Rails, etc.)
- Config file security detection (YAML, TOML)

## Phase 9: Autonomous Extractor Generation ✅
- Shadow mode executor with TP/FP tracking
- Graduation pipeline with confidence thresholds
- Auto-rollback on regression detection
- Cross-project pattern syncing

## UAT Suite Complete (14 scripts, 90 tests)
- test-core-detection.sh (6 tests)
- test-declarative-extractors.sh (5 tests)
- test-domain-frameworks.sh (5 tests)
- test-domain-unreal.sh (3 tests)
- test-llm-extraction.sh (6 tests)
- test-eval-harness.sh (5 tests)
- test-cross-language.sh (3 tests)
- test-precommit-performance.sh (4 tests)
- test-output-formats.sh (8 tests)
- test-drift-detection.sh (6 tests)
- test-exit-codes.sh (12 tests)
+ 3 more scripts

## Other Changes
- Updated roadmap to mark Phase 8-9 complete
- Added .gitignore entries for build artifacts
- Updated pre-commit: 800 line limit, exclude tests/data/cmd

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
LLM Extraction Optimization
Systematic approach to maximizing Aphoria's LLM extraction quality.
Quick Links
| Document | When to Use |
|---|---|
| Quick Start | First time optimizing, want to get started fast |
| Full Playbook | Comprehensive optimization guide with decision trees |
| Baseline Template | Recording metrics after each optimization cycle |
| Research Template | Investigating unknown issues or new approaches |
Current Status
Latest Baseline: 2026-02-06
| Metric | Current | Target | Status |
|---|---|---|---|
| Precision | 0.93 | 0.80 | ✅ Exceeded |
| Recall | 1.00 | 0.75 | ✅ Exceeded |
| F1 | 0.96 | 0.77 | ✅ Exceeded |
| Parse Rate | 100% | 95% | ✅ |
| Fixtures Passing | 10/10 | - | ✅ All pass |
Verdict: PASS - All metrics exceed targets.
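As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the other two rows. The sketch below is illustrative only: `f1_score` and `verdict` are hypothetical helper names, not part of Aphoria, and treating "meets or exceeds the target" as a pass is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def verdict(current: dict, targets: dict) -> str:
    """Assumption: a metric passes when it is at or above its target."""
    return "PASS" if all(current[k] >= targets[k] for k in targets) else "FAIL"


# Precision and recall are taken from the status table above.
current = {"precision": 0.93, "recall": 1.00}
current["f1"] = f1_score(current["precision"], current["recall"])
targets = {"precision": 0.80, "recall": 0.75, "f1": 0.77}

print(f"F1 = {current['f1']:.2f}, verdict = {verdict(current, targets)}")
# prints: F1 = 0.96, verdict = PASS
```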
Directory Structure
docs/llm-optimization/
├── index.md              # This file
├── quickstart.md         # 15-minute getting started
├── playbook.md           # Full optimization guide
├── baselines/            # Historical metrics
│   ├── template.md
│   └── YYYY-MM-DD.md     # One per baseline
└── research/             # Investigation notes
    ├── template.md
    └── [topic].md        # One per research topic
Key Commands
# Run evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live
# Check for regressions (CI)
aphoria eval run --mode cached --fail-on-regression
# Update baseline after improvements
aphoria eval update-baseline --force
# List fixtures
aphoria eval list-fixtures
# Validate fixtures
aphoria eval validate-fixtures
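For CI, the regression check can be wrapped in a small script. This is a minimal sketch: the command line is copied from the list above, while `regression_gate` is a hypothetical helper, and it assumes that `aphoria` exits with a non-zero status when `--fail-on-regression` trips. Confirm that behavior against the CLI before relying on it.

```python
import subprocess

# Command copied from the list above; no other flags are assumed.
CI_COMMAND = ["aphoria", "eval", "run", "--mode", "cached", "--fail-on-regression"]


def regression_gate(run=subprocess.run) -> bool:
    """Return True when the cached evaluation exits cleanly.

    Assumption: a non-zero exit code signals a regression (or another failure).
    The `run` parameter is injectable so the gate can be exercised without the CLI.
    """
    result = run(CI_COMMAND)
    return result.returncode == 0


# Typical use in a CI step (requires aphoria on PATH):
#     raise SystemExit(0 if regression_gate() else 1)
```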
Optimization Flow
1. Run baseline evaluation
↓
2. Identify failure categories
↓
3. Apply targeted fixes (one at a time!)
↓
4. Validate: did metrics improve?
↓
   YES → Save new baseline, continue to next issue
   NO → Revert, try a different approach or research
↓
5. Repeat until targets met
↓
6. Set up CI to prevent regressions
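Step 4 of the flow is a keep-or-revert decision. The sketch below shows one way to make it explicit. `Metrics` and `decide` are hypothetical names, and the rule used here (F1 must rise, and neither precision nor recall may drop by more than a tolerance) is an assumption about what "improve" means, not a rule stated in the playbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float


def decide(baseline: Metrics, candidate: Metrics, tolerance: float = 0.0) -> str:
    """Return "save-baseline" if the fix helped, otherwise "revert".

    Assumption: "improved" means F1 rose while precision and recall each
    dropped by no more than `tolerance`.
    """
    improved = candidate.f1 > baseline.f1
    no_regression = (
        candidate.precision >= baseline.precision - tolerance
        and candidate.recall >= baseline.recall - tolerance
    )
    return "save-baseline" if improved and no_regression else "revert"


# Illustrative numbers only: moving from the target values to the current ones.
print(decide(Metrics(0.80, 0.75, 0.77), Metrics(0.93, 1.00, 0.96)))
# prints: save-baseline
```

Applying one fix per cycle, as the flow insists, is what makes this comparison meaningful: a single change maps to a single metric delta.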
Fixture Locations
| Category | Path | Count |
|---|---|---|
| TLS | tests/llm_fixtures/tls/ | 2 |
| JWT | tests/llm_fixtures/jwt/ | 2 |
| Secrets | tests/llm_fixtures/secrets/ | 2 |
| Auth | tests/llm_fixtures/auth/ | 1 |
| Negative | tests/llm_fixtures/negative/ | 2 |
| Edge | tests/llm_fixtures/edge/ | 1 |
| Total | | 10 |
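The counts in this table can drift as fixtures are added. A small script can compare them against what is on disk. This sketch assumes each fixture is a single `.toml` file directly inside its category folder; `count_fixtures` and `mismatches` are hypothetical helpers, not Aphoria commands.

```python
from pathlib import Path

# Expected counts copied from the table above.
EXPECTED = {"tls": 2, "jwt": 2, "secrets": 2, "auth": 1, "negative": 2, "edge": 1}


def count_fixtures(root: Path) -> dict[str, int]:
    """Count *.toml files directly inside each category folder under root."""
    return {
        d.name: sum(1 for _ in d.glob("*.toml"))
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


def mismatches(root: Path, expected: dict[str, int] = EXPECTED) -> dict[str, tuple[int, int]]:
    """Map category -> (expected, found) for every category whose count differs."""
    found = count_fixtures(root)
    return {
        name: (expected.get(name, 0), found.get(name, 0))
        for name in sorted(set(expected) | set(found))
        if expected.get(name, 0) != found.get(name, 0)
    }


# Example, run from the repository root:
#     print(mismatches(Path("tests/llm_fixtures")) or "counts match the table")
```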
Related Files
- Prompt source: src/llm/prompts.rs
- Extractor: src/llm/extractor.rs
- Client: src/llm/client.rs
- Eval harness: src/eval/harness.rs
- Fixtures: tests/llm_fixtures/
Contributing Fixtures
See Fixture Writing Guide in the playbook.
Quick checklist:
- Create TOML file in appropriate category folder
- Include both must_contain and must_not_contain
- Run aphoria eval validate-fixtures
- Test with aphoria eval run --max-fixtures 1
- Update manifest.toml category counts