# LLM Optimization Quick Start

> Get started with LLM extraction optimization in 15 minutes.

## Prerequisites

1. Aphoria built and working
2. `GEMINI_API_KEY` set in environment
3. Fixtures exist in `tests/llm_fixtures/`

## Step 1: Validate Setup (2 min)

```bash
# Check fixtures are valid
aphoria eval validate-fixtures --fixtures tests/llm_fixtures

# Expected: "All fixtures are valid."
```

## Step 2: Run Baseline (5 min)

```bash
# Run live evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live --format table
```

Record these numbers:
- Precision: ______
- Recall: ______
- F1: ______
- Parse Rate: ______%
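
For reference, the first three numbers follow the standard definitions from true-positive, false-positive, and false-negative counts. The sketch below is a minimal illustration of those definitions, not Aphoria's implementation; the `Counts` struct and function names are hypothetical, and how the harness decides that an extracted item matches an expected one is not covered here.

```rust
/// Outcome counts summed over all fixtures (hypothetical struct, for illustration).
#[derive(Debug, Clone, Copy)]
struct Counts {
    true_positives: u32,  // extracted and expected
    false_positives: u32, // extracted but not expected
    false_negatives: u32, // expected but not extracted
}

/// Precision: fraction of extracted items that were expected.
fn precision(c: Counts) -> f64 {
    let denom = c.true_positives + c.false_positives;
    if denom == 0 { 0.0 } else { c.true_positives as f64 / denom as f64 }
}

/// Recall: fraction of expected items that were extracted.
fn recall(c: Counts) -> f64 {
    let denom = c.true_positives + c.false_negatives;
    if denom == 0 { 0.0 } else { c.true_positives as f64 / denom as f64 }
}

/// F1: harmonic mean of precision and recall.
fn f1(c: Counts) -> f64 {
    let (p, r) = (precision(c), recall(c));
    if p + r == 0.0 { 0.0 } else { 2.0 * p * r / (p + r) }
}

/// Parse rate as a percentage: responses that parsed, out of all responses.
fn parse_rate_pct(parsed: u32, total: u32) -> f64 {
    if total == 0 { 0.0 } else { 100.0 * parsed as f64 / total as f64 }
}
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why the next step looks at the two separately.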

## Step 3: Identify Priority (3 min)

Look at the output and answer:

| Question | Answer | Action |
|----------|--------|--------|
| Parse Rate < 95%? | Y/N | Fix output structure first |
| Recall < 70%? | Y/N | Add few-shot examples |
| Precision < 70%? | Y/N | Add negative examples |
| Many subject mismatches? | Y/N | Standardize vocabulary |
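
The table can be read as a simple decision rule. The sketch below assumes the rows are listed in priority order (parse rate first) and that precision and recall are reported as fractions between 0 and 1; both are assumptions, and the `Priority` enum and `pick_priority` function are hypothetical names. The subject-mismatch row is left out because it is a qualitative judgment rather than a threshold.

```rust
/// The action suggested by the first failing check (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Priority {
    FixOutputStructure,  // Parse Rate < 95%
    AddFewShotExamples,  // Recall < 70%
    AddNegativeExamples, // Precision < 70%
    NoneFlagged,         // all thresholds met
}

/// Checks the thresholds in table order and returns the first one that fails.
/// `parse_rate_pct` is 0..=100; `recall` and `precision` are 0.0..=1.0.
fn pick_priority(parse_rate_pct: f64, recall: f64, precision: f64) -> Priority {
    if parse_rate_pct < 95.0 {
        Priority::FixOutputStructure
    } else if recall < 0.70 {
        Priority::AddFewShotExamples
    } else if precision < 0.70 {
        Priority::AddNegativeExamples
    } else {
        Priority::NoneFlagged
    }
}
```

Checking parse rate first makes sense because responses that fail to parse contribute nothing to recall, so low recall may disappear once the output structure is fixed.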

## Step 4: Make ONE Change (5 min)

Pick the highest-priority issue and make a single change:

### If Parse Issues:
Edit `llm/extractor.rs` - add response cleaning:
```rust
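/// Strips a surrounding Markdown code fence (```json ... ``` or ``` ... ```)
/// so that only the JSON payload is handed to the parser.
fn clean_response(raw: &str) -> String {
    let s = raw.trim();
    // Drop the opening fence once; try the language-tagged form first.
    let s = s
        .strip_prefix("```json")
        .or_else(|| s.strip_prefix("```"))
        .unwrap_or(s);
    // Drop the closing fence, if present.
    let s = s.strip_suffix("```").unwrap_or(s);
    s.trim().to_string()
}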
```
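
Fence stripping does not help when the model wraps the JSON in prose, for example "Here is the result: ...". One possible fallback, sketched below, is to take everything from the first `{` to the last `}`. This assumes the response contains exactly one top-level JSON object (not an array) and no stray braces in the surrounding prose; `extract_json_object` is a hypothetical helper, not part of `llm/extractor.rs`.

```rust
/// Returns the slice from the first '{' to the last '}' (inclusive), if both exist.
/// Assumption: the response holds a single top-level JSON object.
fn extract_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let end = raw.rfind('}')?;
    if end < start {
        // A closing brace before any opening brace: nothing usable.
        return None;
    }
    // '}' is a single byte, so the inclusive range ends on a char boundary.
    Some(&raw[start..=end])
}
```

The returned slice still has to pass the JSON parser, so a response with unbalanced braces is rejected at that stage and is not accepted silently.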

### If Recall Issues:
Edit `llm/prompts.rs` - add examples:
```rust
const EXAMPLES: &str = r#"
Example: verify=False → {"subject": "tls/cert_verification", "predicate": "enabled", "value": false}
"#;
```

### If Precision Issues:
Edit `llm/prompts.rs` - add what NOT to flag:
```rust
const NEGATIVE_EXAMPLES: &str = r#"
Do NOT flag:
- verify=certifi.where() (using CA bundle, this is safe)
- API_KEY = os.environ['KEY'] (from environment, not hardcoded)
"#;
```
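
To show where constants like these end up, here is a minimal sketch of assembling them into a single prompt. The `BASE_INSTRUCTIONS` text and the `build_prompt` function are hypothetical placeholders; the actual prompt layout in `llm/prompts.rs` may differ.

```rust
// Hypothetical placeholder for the existing task instructions.
const BASE_INSTRUCTIONS: &str = "Extract claims from the code below as JSON.";

const EXAMPLES: &str = r#"
Example: verify=False → {"subject": "tls/cert_verification", "predicate": "enabled", "value": false}
"#;

const NEGATIVE_EXAMPLES: &str = r#"
Do NOT flag:
- verify=certifi.where() (using CA bundle, this is safe)
- API_KEY = os.environ['KEY'] (from environment, not hardcoded)
"#;

/// Joins instructions, positive examples, negative examples, and the code
/// under analysis into one prompt string (hypothetical layout).
fn build_prompt(source_code: &str) -> String {
    format!(
        "{}\n\n{}\n\n{}\n\nCode:\n{}",
        BASE_INSTRUCTIONS.trim(),
        EXAMPLES.trim(),
        NEGATIVE_EXAMPLES.trim(),
        source_code
    )
}
```

Keeping positive and negative examples in separate constants lets you change one at a time, which fits the "make ONE change" rule of this step.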

## Step 5: Validate Change

```bash
# Run eval again
aphoria eval run --fixtures tests/llm_fixtures --mode live --fail-on-regression
```

**If improved:** Save new baseline:
```bash
aphoria eval update-baseline --fixtures tests/llm_fixtures --force
```

**If regressed:** Revert the change and try a different approach.
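
If you are comparing against the numbers recorded in Step 2 by hand, the decision can be sketched as below. This is only an illustration of one reasonable rule, not how `--fail-on-regression` is implemented; the `Metrics` struct, `Verdict` enum, `compare` function, and the `tolerance` parameter are all hypothetical.

```rust
/// Metrics from one evaluation run, as fractions in 0.0..=1.0 (hypothetical struct).
#[derive(Debug, Clone, Copy)]
struct Metrics {
    precision: f64,
    recall: f64,
    f1: f64,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Unchanged,
    Regressed,
}

/// Flags a regression if any metric dropped by more than `tolerance`;
/// otherwise reports an improvement only if F1 went up.
fn compare(baseline: Metrics, current: Metrics, tolerance: f64) -> Verdict {
    let drops = [
        baseline.precision - current.precision,
        baseline.recall - current.recall,
        baseline.f1 - current.f1,
    ];
    if drops.iter().any(|d| *d > tolerance) {
        Verdict::Regressed
    } else if current.f1 > baseline.f1 {
        Verdict::Improved
    } else {
        Verdict::Unchanged
    }
}
```

A small tolerance absorbs run-to-run noise in live mode, so a change is not reverted over a difference that would vanish on a rerun.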

## What's Next?

- Read full playbook: [playbook.md](./playbook.md)
- Add more fixtures: [Fixture Writing Guide](./playbook.md#appendix-b-fixture-writing-guide)
- Set up CI: [CI Integration & Monitoring](./playbook.md#phase-5-ci-integration--monitoring)

## Common Commands

```bash
# Evaluate all fixtures
aphoria eval run --mode live

# Evaluate one category
aphoria eval run --mode live --category tls

# Use cached responses (fast, deterministic)
aphoria eval run --mode cached

# List all fixtures
aphoria eval list-fixtures

# Check for regressions (CI mode)
aphoria eval run --mode cached --fail-on-regression --threshold 0.05
```

## Troubleshooting

### "No fixtures found"
```bash
ls tests/llm_fixtures/
# Should see: manifest.toml, tls/, jwt/, etc.
```

### "API error"
```bash
# Prints "set" or "missing" without echoing the key itself
[ -n "$GEMINI_API_KEY" ] && echo "set" || echo "missing"
```

### "All fixtures failed"
```bash
# Run in mock mode to test harness
aphoria eval run --mode mock
# If this fails too, the problem is in the harness, not the LLM
```

### "Results differ between runs"
- LLM output is non-deterministic, so `--mode live` metrics can vary from run to run
- Use `--mode cached` for consistent results
- Set temperature to 0 in config (if supported)