# LLM Extraction Optimization

> A systematic approach to measuring and improving Aphoria's LLM extraction quality.

## Quick Links

| Document | When to Use |
|----------|-------------|
| [Quick Start](./quickstart.md) | Optimizing for the first time and want to get started fast |
| [Full Playbook](./playbook.md) | Comprehensive optimization guide with decision trees |
| [Baseline Template](./baselines/template.md) | Recording metrics after each optimization cycle |
| [Research Template](./research/template.md) | Investigating unknown issues or new approaches |

## Current Status

**Latest Baseline:** [2026-02-06](./baselines/2026-02-06.md)

| Metric | Current | Target | Status |
|--------|---------|--------|--------|
| Precision | 0.93 | 0.80 | ✅ Exceeded |
| Recall | 1.00 | 0.75 | ✅ Exceeded |
| F1 | 0.96 | 0.77 | ✅ Exceeded |
| Parse Rate | 100% | 95% | ✅ Exceeded |
| Fixtures Passing | 10/10 | - | ✅ All pass |

**Verdict:** PASS - every metric with a target exceeds it, and all 10 fixtures pass.
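
The F1 row follows from the precision and recall rows, since F1 is their harmonic mean. The Python sketch below recomputes it from the table values and checks each metric against its target. The names `f1_score`, `current`, and `targets` are illustrative and not part of the Aphoria eval harness, and treating "meets or exceeds" as a pass is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the Current Status table.
current = {"precision": 0.93, "recall": 1.00}
targets = {"precision": 0.80, "recall": 0.75, "f1": 0.77}

current["f1"] = f1_score(current["precision"], current["recall"])

# Assumption: a metric passes when it meets or exceeds its target.
status = {name: current[name] >= targets[name] for name in targets}

print(f"F1 = {current['f1']:.2f}")                  # F1 = 0.96
print("PASS" if all(status.values()) else "FAIL")  # PASS
```

The same function reproduces the F1 target: precision 0.80 and recall 0.75 give roughly 0.77.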

## Directory Structure

```
docs/llm-optimization/
├── index.md              # This file
├── quickstart.md         # 15-minute getting-started guide
├── playbook.md           # Full optimization guide
├── baselines/            # Historical metrics
│   ├── template.md
│   └── YYYY-MM-DD.md     # One per baseline
└── research/             # Investigation notes
    ├── template.md
    └── [topic].md        # One per research topic
```

## Key Commands

```bash
# Run evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live

# Check for regressions (CI)
aphoria eval run --mode cached --fail-on-regression

# Update baseline after improvements
aphoria eval update-baseline --force

# List fixtures
aphoria eval list-fixtures

# Validate fixtures
aphoria eval validate-fixtures
```

## Optimization Flow

```
1. Run baseline evaluation
       ↓
2. Identify failure categories
       ↓
3. Apply targeted fixes (one at a time!)
       ↓
4. Validate: did metrics improve?
       ↓
   YES → Save new baseline, continue to next issue
   NO  → Revert, try a different approach or research the issue
       ↓
5. Repeat until targets met
       ↓
6. Set up CI to prevent regressions
```
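
Step 4 of the flow above amounts to a keep-or-revert decision, sketched here in Python. The rule is an assumption, not the harness's actual regression check: "improved" is taken to mean that F1 rose and that neither precision nor recall fell by more than a small tolerance. The `Metrics` type, the `decide` function, and the tolerance value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float

    @property
    def f1(self) -> float:
        total = self.precision + self.recall
        return 0.0 if total == 0 else 2 * self.precision * self.recall / total


def decide(baseline: Metrics, candidate: Metrics, tolerance: float = 0.01) -> str:
    """Return "save" to keep the change as the new baseline, else "revert".

    Assumed rule (hypothetical): F1 must rise, and neither precision nor
    recall may drop by more than `tolerance`.
    """
    f1_improved = candidate.f1 > baseline.f1
    no_regression = (
        candidate.precision >= baseline.precision - tolerance
        and candidate.recall >= baseline.recall - tolerance
    )
    return "save" if f1_improved and no_regression else "revert"


# Hypothetical numbers for illustration only.
baseline = Metrics(precision=0.80, recall=0.70)
better = Metrics(precision=0.85, recall=0.70)
worse = Metrics(precision=0.90, recall=0.50)

print(decide(baseline, better))  # save
print(decide(baseline, worse))   # revert
```

Applying one fix at a time, as step 3 requires, keeps this comparison meaningful: any change in the metrics can be attributed to a single edit.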

## Fixture Locations

| Category | Path | Count |
|----------|------|-------|
| TLS | `tests/llm_fixtures/tls/` | 2 |
| JWT | `tests/llm_fixtures/jwt/` | 2 |
| Secrets | `tests/llm_fixtures/secrets/` | 2 |
| Auth | `tests/llm_fixtures/auth/` | 1 |
| Negative | `tests/llm_fixtures/negative/` | 2 |
| Edge | `tests/llm_fixtures/edge/` | 1 |
| **Total** | | **10** |
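
The Count column has to be kept in sync by hand as fixtures are added. The Python sketch below recounts the TOML files under each category folder so the table can be checked. It assumes one fixture per `.toml` file directly inside each category directory, which this page does not confirm, and `count_fixtures` is a hypothetical helper rather than an Aphoria command.

```python
from pathlib import Path


def count_fixtures(root: Path) -> dict[str, int]:
    """Map each category directory under `root` to its number of .toml files.

    Files directly under `root` are ignored; only subdirectories count
    as categories.
    """
    counts = {}
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[category.name] = sum(1 for _ in category.glob("*.toml"))
    return counts


if __name__ == "__main__":
    root = Path("tests/llm_fixtures")
    if root.is_dir():
        counts = count_fixtures(root)
        for name, n in counts.items():
            print(f"{name}: {n}")
        print(f"total: {sum(counts.values())}")
    else:
        print(f"{root} not found; run from the directory that contains tests/")
```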

## Related Files

- **Prompt source:** `src/llm/prompts.rs`
- **Extractor:** `src/llm/extractor.rs`
- **Client:** `src/llm/client.rs`
- **Eval harness:** `src/eval/harness.rs`
- **Fixtures:** `tests/llm_fixtures/`

## Contributing Fixtures

See [Fixture Writing Guide](./playbook.md#appendix-b-fixture-writing-guide) in the playbook.

Quick checklist:
- [ ] Create a TOML file in the appropriate category folder
- [ ] Include both `must_contain` and `must_not_contain`
- [ ] Run `aphoria eval validate-fixtures`
- [ ] Test with `aphoria eval run --max-fixtures 1`
- [ ] Update `manifest.toml` category counts