# LLM Extraction Optimization

> Systematic approach to maximizing Aphoria's LLM extraction quality.

## Quick Links

| Document | When to Use |
|----------|-------------|
| [Quick Start](./quickstart.md) | First time optimizing, want to get started fast |
| [Full Playbook](./playbook.md) | Comprehensive optimization guide with decision trees |
| [Baseline Template](./baselines/template.md) | Recording metrics after each optimization cycle |
| [Research Template](./research/template.md) | Investigating unknown issues or new approaches |

## Current Status

**Latest Baseline:** [2026-02-06](./baselines/2026-02-06.md)

| Metric | Current | Target | Status |
|--------|---------|--------|--------|
| Precision | 0.93 | 0.80 | ✅ Exceeded |
| Recall | 1.00 | 0.75 | ✅ Exceeded |
| F1 | 0.96 | 0.77 | ✅ Exceeded |
| Parse Rate | 100% | 95% | ✅ |
| Fixtures Passing | 10/10 | - | ✅ All pass |

**Verdict:** PASS - All metrics exceed targets.

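The F1 figures in the table are consistent with the standard definition, F1 = 2PR / (P + R). The sketch below recomputes them from the precision and recall columns. It assumes the eval harness uses this standard formula, which this page does not confirm, and `f1_score` is a hypothetical helper name, not part of the `aphoria` CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an aphoria command): harmonic mean of
# precision and recall, rounded to two decimals.
f1_score() {
  awk -v p="$1" -v r="$2" 'BEGIN {
    if (p + r == 0) { printf "%.2f\n", 0 }
    else            { printf "%.2f\n", 2 * p * r / (p + r) }
  }'
}

f1_score 0.93 1.00   # current precision and recall from the table
f1_score 0.80 0.75   # target precision and recall from the table
```

The two calls print 0.96 and 0.77, matching the Current and Target values in the F1 row.
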
## Directory Structure

```
docs/llm-optimization/
├── index.md              # This file
├── quickstart.md         # 15-minute getting started
├── playbook.md           # Full optimization guide
├── baselines/            # Historical metrics
│   ├── template.md
│   └── YYYY-MM-DD.md     # One per baseline
└── research/             # Investigation notes
    ├── template.md
    └── [topic].md        # One per research topic
```

## Key Commands

```bash
# Run evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live

# Check for regressions (CI)
aphoria eval run --mode cached --fail-on-regression

# Update baseline after improvements
aphoria eval update-baseline --force

# List fixtures
aphoria eval list-fixtures

# Validate fixtures
aphoria eval validate-fixtures
```

## Optimization Flow

```
1. Run baseline evaluation
       ↓
2. Identify failure categories
       ↓
3. Apply targeted fixes (one at a time!)
       ↓
4. Validate: did metrics improve?
       ↓
   YES → Save new baseline, continue to next issue
   NO  → Revert, try different approach or research
       ↓
5. Repeat until targets met
       ↓
6. Set up CI to prevent regressions
```

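Step 4 of the flow is a keep-or-revert decision. A minimal sketch of that decision follows, assuming F1 alone is the metric being compared (the playbook may weigh precision and recall separately). `decide_change` is a hypothetical helper, and the 0.90 baseline in the usage lines is an illustrative number, not a recorded result.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "keep" if the candidate F1 is strictly
# better than the baseline F1, otherwise "revert".
decide_change() {
  local baseline_f1="$1" candidate_f1="$2"
  awk -v base="$baseline_f1" -v cand="$candidate_f1" 'BEGIN {
    if (cand > base) print "keep"; else print "revert"
  }'
}

# Illustrative values only: baseline 0.90, candidate 0.96.
case "$(decide_change 0.90 0.96)" in
  keep)   echo "improved: save a new baseline, then move to the next issue" ;;
  revert) echo "no improvement: revert and try a different approach" ;;
esac
```

An unchanged score counts as "revert" here, so a fix that does not measurably help is not kept. That keeps each saved baseline attributable to a single change, in line with the one-fix-at-a-time rule in step 3.
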
## Fixture Locations

| Category | Path | Count |
|----------|------|-------|
| TLS | `tests/llm_fixtures/tls/` | 2 |
| JWT | `tests/llm_fixtures/jwt/` | 2 |
| Secrets | `tests/llm_fixtures/secrets/` | 2 |
| Auth | `tests/llm_fixtures/auth/` | 1 |
| Negative | `tests/llm_fixtures/negative/` | 2 |
| Edge | `tests/llm_fixtures/edge/` | 1 |
| **Total** | | **10** |

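The counts in this table can drift as fixtures are added. The sketch below prints one line per category directory so the numbers can be compared by hand. It assumes each fixture is a single `.toml` file directly inside its category folder, and `count_fixtures` is a hypothetical helper, not an `aphoria` subcommand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<category> <count>" for every
# subdirectory of the given fixture root, counting only *.toml files.
count_fixtures() {
  local root="$1" dir
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # skip the unexpanded glob when root is empty
    printf '%s %s\n' "$(basename "$dir")" \
      "$(find "$dir" -maxdepth 1 -type f -name '*.toml' | wc -l | tr -d ' ')"
  done
}

# Usage, run from the repository root:
#   count_fixtures tests/llm_fixtures
```
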
## Related Files

- **Prompt source:** `src/llm/prompts.rs`
- **Extractor:** `src/llm/extractor.rs`
- **Client:** `src/llm/client.rs`
- **Eval harness:** `src/eval/harness.rs`
- **Fixtures:** `tests/llm_fixtures/`

## Contributing Fixtures

See [Fixture Writing Guide](./playbook.md#appendix-b-fixture-writing-guide) in the playbook.

Quick checklist:

- [ ] Create TOML file in appropriate category folder
- [ ] Include both `must_contain` and `must_not_contain`
- [ ] Run `aphoria eval validate-fixtures`
- [ ] Test with `aphoria eval run --max-fixtures 1`
- [ ] Update `manifest.toml` category counts
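
The second checklist item can be checked quickly before running the validator. The sketch below only looks for the two key names anywhere in the file. It does not parse TOML or know the real fixture schema, so it is no substitute for `aphoria eval validate-fixtures`. `check_fixture_keys` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the two required key names are
# absent from a fixture file. Returns 0 only if both are present.
check_fixture_keys() {
  local file="$1" key missing=0
  for key in must_contain must_not_contain; do
    # Match the key as a whole identifier, whether it appears as
    # "key = ..." or inside a table header.
    if ! grep -Eq "(^|[^[:alnum:]_])${key}([^[:alnum:]_]|\$)" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (the path is a placeholder):
#   check_fixture_keys path/to/fixture.toml
```
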