stemedb/applications/aphoria/docs/llm-optimization/index.md
jordan 157dbbb9eb feat: Complete Aphoria Phase 8-9 + UAT suite (90/90 tests passing)
2026-02-06 22:50:55 -07:00


# LLM Extraction Optimization
> Systematic approach to maximizing Aphoria's LLM extraction quality.
## Quick Links
| Document | When to Use |
|----------|-------------|
| [Quick Start](./quickstart.md) | First time optimizing, want to get started fast |
| [Full Playbook](./playbook.md) | Comprehensive optimization guide with decision trees |
| [Baseline Template](./baselines/template.md) | Recording metrics after each optimization cycle |
| [Research Template](./research/template.md) | Investigating unknown issues or new approaches |
## Current Status
**Latest Baseline:** [2026-02-06](./baselines/2026-02-06.md)

| Metric | Current | Target | Status |
|--------|---------|--------|--------|
| Precision | 0.93 | 0.80 | ✅ Exceeded |
| Recall | 1.00 | 0.75 | ✅ Exceeded |
| F1 | 0.96 | 0.77 | ✅ Exceeded |
| Parse Rate | 100% | 95% | ✅ Exceeded |
| Fixtures Passing | 10/10 | - | ✅ All pass |

**Verdict:** PASS. Every metric with a target exceeds it, and all 10 fixtures pass.
## Directory Structure
```
docs/llm-optimization/
├── index.md              # This file
├── quickstart.md         # 15-minute getting started
├── playbook.md           # Full optimization guide
├── baselines/            # Historical metrics
│   ├── template.md
│   └── YYYY-MM-DD.md     # One per baseline
└── research/             # Investigation notes
    ├── template.md
    └── [topic].md        # One per research topic
```
## Key Commands
```bash
# Run evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live
# Check for regressions (CI)
aphoria eval run --mode cached --fail-on-regression
# Update baseline after improvements
aphoria eval update-baseline --force
# List fixtures
aphoria eval list-fixtures
# Validate fixtures
aphoria eval validate-fixtures
```
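
The validation and CI regression commands above can be chained into a single gate. This is a minimal sketch, not part of Aphoria: `ci_eval_gate` is a hypothetical helper, and it assumes both subcommands exit non-zero on failure (the `--fail-on-regression` flag name suggests this, but this page does not state it).

```bash
#!/usr/bin/env bash
# Hypothetical CI gate: validate fixtures first, then run the cached regression check.
# Assumption: both aphoria subcommands exit non-zero when they fail.
ci_eval_gate() {
  if ! aphoria eval validate-fixtures; then
    echo "fixture validation failed" >&2
    return 1
  fi
  if ! aphoria eval run --mode cached --fail-on-regression; then
    echo "eval run failed or regressed" >&2
    return 2
  fi
  echo "eval gate passed"
}
```

Calling `ci_eval_gate` as the final command of a CI step makes the step fail with status 1 (invalid fixtures) or 2 (failed or regressed eval).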
## Optimization Flow
```
1. Run baseline evaluation
2. Identify failure categories
3. Apply targeted fixes (one at a time!)
4. Validate: did metrics improve?
   YES → Save new baseline, continue to next issue
   NO  → Revert, try different approach or research
5. Repeat until targets met
6. Set up CI to prevent regressions
```
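
Step 4 needs a concrete rule for "did metrics improve?". One possible rule, sketched here as an assumption rather than the eval harness's actual logic, is to keep a change only when F1 strictly increases over the baseline. `decide` is a hypothetical helper that takes precision and recall for the baseline, then for the candidate run.

```bash
#!/usr/bin/env bash
# Usage: decide BASE_PRECISION BASE_RECALL NEW_PRECISION NEW_RECALL
# Prints "keep" if the candidate's F1 is strictly higher than the baseline's, else "revert".
# Assumption: F1 alone is the acceptance criterion; a real workflow may weigh other metrics.
decide() {
  awk -v base_p="$1" -v base_r="$2" -v new_p="$3" -v new_r="$4" '
    # F1 is the harmonic mean of precision and recall; defined as 0 when both are 0.
    function f1(p, r) { return (p + r > 0) ? 2 * p * r / (p + r) : 0 }
    BEGIN {
      if (f1(new_p, new_r) > f1(base_p, base_r)) print "keep"
      else print "revert"
    }'
}
```

For example, `decide 0.80 0.75 0.93 1.00` prints `keep`, while an unchanged or lower F1 prints `revert`.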
## Fixture Locations
| Category | Path | Count |
|----------|------|-------|
| TLS | `tests/llm_fixtures/tls/` | 2 |
| JWT | `tests/llm_fixtures/jwt/` | 2 |
| Secrets | `tests/llm_fixtures/secrets/` | 2 |
| Auth | `tests/llm_fixtures/auth/` | 1 |
| Negative | `tests/llm_fixtures/negative/` | 2 |
| Edge | `tests/llm_fixtures/edge/` | 1 |
| **Total** | | **10** |
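
The counts in this table can be cross-checked against the files on disk. The sketch below assumes each fixture is a single `*.toml` file placed directly in its category folder; `count_fixtures` is a hypothetical helper, not an Aphoria command.

```bash
#!/usr/bin/env bash
# Usage: count_fixtures [FIXTURE_ROOT]   (defaults to tests/llm_fixtures)
# Prints "<category> <count>" for each category folder, then "total <count>".
# Assumption: one fixture per *.toml file, located directly inside its category folder.
count_fixtures() {
  local root="${1:-tests/llm_fixtures}"
  local total=0 dir count
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # skip the literal pattern when no folder matches
    count=$(find "$dir" -maxdepth 1 -type f -name '*.toml' | wc -l | tr -d ' ')
    printf '%s %d\n' "$(basename "$dir")" "$count"
    total=$((total + count))
  done
  printf 'total %d\n' "$total"
}
```

Running `count_fixtures tests/llm_fixtures` from the repository root prints one line per category plus a total, which can be compared with the table and with the counts in `manifest.toml`.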
## Related Files
- **Prompt source:** `src/llm/prompts.rs`
- **Extractor:** `src/llm/extractor.rs`
- **Client:** `src/llm/client.rs`
- **Eval harness:** `src/eval/harness.rs`
- **Fixtures:** `tests/llm_fixtures/`
## Contributing Fixtures
See the [Fixture Writing Guide](./playbook.md#appendix-b-fixture-writing-guide) in the playbook.

Quick checklist:
- [ ] Create TOML file in appropriate category folder
- [ ] Include both `must_contain` and `must_not_contain`
- [ ] Run `aphoria eval validate-fixtures`
- [ ] Test with `aphoria eval run --max-fixtures 1`
- [ ] Update `manifest.toml` category counts
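
The second checklist item can be pre-checked before running the validator. This is a rough sketch that assumes both keys appear as plain `key = value` assignments somewhere in the fixture file; the actual fixture schema is described in the playbook, not here. `check_fixture_keys` is a hypothetical helper and does not replace `aphoria eval validate-fixtures`.

```bash
#!/usr/bin/env bash
# Usage: check_fixture_keys FIXTURE_FILE
# Returns 0 if the file assigns both must_contain and must_not_contain, 1 otherwise.
# Assumption: the keys are written as "key = ..." at the start of a line (leading spaces allowed).
check_fixture_keys() {
  local file="$1" key missing=0
  for key in must_contain must_not_contain; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "$file: missing $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A loop such as `for f in tests/llm_fixtures/*/*.toml; do check_fixture_keys "$f"; done` reports every fixture that lacks one of the two keys.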