# LLM Optimization Quick Start

> Get started with LLM extraction optimization in 15 minutes.

## Prerequisites

1. Aphoria built and working
2. `GEMINI_API_KEY` set in environment
3. Fixtures exist in `tests/llm_fixtures/`

## Step 1: Validate Setup (2 min)

```bash
# Check fixtures are valid
aphoria eval validate-fixtures --fixtures tests/llm_fixtures

# Expected: "All fixtures are valid."
```

## Step 2: Run Baseline (5 min)

```bash
# Run live evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live --format table
```

Record these numbers:

- Precision: ______
- Recall: ______
- F1: ______
- Parse Rate: ______%

## Step 3: Identify Priority (3 min)

Look at the output and answer:

| Question | Answer | Action |
|----------|--------|--------|
| Parse Rate < 95%? | Y/N | Fix output structure first |
| Recall < 70%? | Y/N | Add few-shot examples |
| Precision < 70%? | Y/N | Add negative examples |
| Many subject mismatches? | Y/N | Standardize vocabulary |

## Step 4: Make ONE Change (5 min)

Pick the highest-priority issue and make a single change:

### If Parse Issues:

Edit `llm/extractor.rs` - add response cleaning:

```rust
fn clean_response(raw: &str) -> String {
    raw.trim()
        .trim_start_matches("```json")
        .trim_start_matches("```")
        .trim_end_matches("```")
        .trim()
        .to_string()
}
```

### If Recall Issues:

Edit `llm/prompts.rs` - add examples:

```rust
const EXAMPLES: &str = r#"
Example: verify=False → {"subject": "tls/cert_verification", "predicate": "enabled", "value": false}
"#;
```

### If Precision Issues:

Edit `llm/prompts.rs` - add what NOT to flag:

```rust
const NEGATIVE_EXAMPLES: &str = r#"
Do NOT flag:
- verify=certifi.where() (using CA bundle, this is safe)
- API_KEY = os.environ['KEY'] (from environment, not hardcoded)
"#;
```

## Step 5: Validate Change

```bash
# Run eval again
aphoria eval run --fixtures tests/llm_fixtures --mode live --fail-on-regression
```

**If improved:** Save new baseline:

```bash
aphoria eval update-baseline --fixtures tests/llm_fixtures --force
```

**If regressed:** Revert change, try different approach.

## What's Next?

- Read full playbook: [playbook.md](./playbook.md)
- Add more fixtures: [playbook.md#fixture-writing-guide](./playbook.md#appendix-b-fixture-writing-guide)
- Set up CI: [playbook.md#ci-integration](./playbook.md#phase-5-ci-integration--monitoring)

## Common Commands

```bash
# Evaluate all fixtures
aphoria eval run --mode live

# Evaluate one category
aphoria eval run --mode live --category tls

# Use cached responses (fast, deterministic)
aphoria eval run --mode cached

# List all fixtures
aphoria eval list-fixtures

# Check for regressions (CI mode)
aphoria eval run --mode cached --fail-on-regression --threshold 0.05
```

## Troubleshooting

### "No fixtures found"

```bash
ls tests/llm_fixtures/
# Should see: manifest.toml, tls/, jwt/, etc.
```

### "API error"

```bash
echo $GEMINI_API_KEY
# Should show your key (not empty)
```

### "All fixtures failed"

```bash
# Run in mock mode to test harness
aphoria eval run --mode mock
# If this fails too, harness is broken
```

### "Results differ between runs"

- LLM is non-deterministic
- Use `--mode cached` for consistent results
- Set temperature to 0 in config (if supported)
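
Small run-to-run differences are expected with a non-deterministic model, which is why a regression check needs a tolerance. The sketch below shows the idea, assuming the threshold is read as an absolute drop in F1. That reading is an assumption about `--threshold 0.05`, not Aphoria's actual implementation, and `f1` and `is_regression` are hypothetical names used only for illustration.

```rust
/// F1 is the harmonic mean of precision and recall.
/// Returns 0.0 when both inputs are zero to avoid dividing by zero.
fn f1(precision: f64, recall: f64) -> f64 {
    if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    }
}

/// Hypothetical check: a run counts as a regression only when the score
/// dropped by more than `threshold` relative to the baseline.
fn is_regression(baseline: f64, current: f64, threshold: f64) -> bool {
    baseline - current > threshold
}

fn main() {
    // Illustrative numbers, not real evaluation results.
    let baseline = f1(0.80, 0.70);
    let rerun = f1(0.79, 0.69);

    println!("baseline F1 = {:.3}, rerun F1 = {:.3}", baseline, rerun);
    println!(
        "regression at threshold 0.05: {}",
        is_regression(baseline, rerun, 0.05)
    );
}
```

Under this reading, a one-point wobble in precision and recall stays well inside the tolerance, while a genuine drop would still be flagged.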