# LLM Optimization Quick Start
> Get started with LLM extraction optimization in 15 minutes.
## Prerequisites
1. Aphoria is built and the `aphoria` command runs
2. `GEMINI_API_KEY` is set in your environment
3. Fixtures exist in `tests/llm_fixtures/`
## Step 1: Validate Setup (2 min)
```bash
# Check fixtures are valid
aphoria eval validate-fixtures --fixtures tests/llm_fixtures
# Expected: "All fixtures are valid."
```
## Step 2: Run Baseline (5 min)
```bash
# Run live evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live --format table
```
Record these numbers:
- Precision: ______
- Recall: ______
- F1: ______
- Parse Rate: ______%
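
For reference, the four numbers relate to each other as sketched below. This assumes the standard definitions of precision, recall, and F1, and assumes Parse Rate is the share of LLM responses that parsed successfully; the `Counts` and `Scores` types and both function names are hypothetical and are not taken from Aphoria's code.

```rust
/// Outcome counts from comparing extracted claims with fixture expectations.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone, Copy)]
pub struct Counts {
    pub true_positives: u32,
    pub false_positives: u32,
    pub false_negatives: u32,
}

/// Scores in the range 0.0..=1.0.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Scores {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

/// Computes precision, recall, and F1, returning 0.0 where a ratio is undefined.
pub fn score(c: Counts) -> Scores {
    let tp = f64::from(c.true_positives);
    let fp = f64::from(c.false_positives);
    let fn_ = f64::from(c.false_negatives);

    // Precision: of everything flagged, how much was correct?
    let precision = if tp + fp > 0.0 { tp / (tp + fp) } else { 0.0 };
    // Recall: of everything that should have been flagged, how much was found?
    let recall = if tp + fn_ > 0.0 { tp / (tp + fn_) } else { 0.0 };
    // F1: harmonic mean of precision and recall.
    let f1 = if precision + recall > 0.0 {
        2.0 * precision * recall / (precision + recall)
    } else {
        0.0
    };

    Scores { precision, recall, f1 }
}

/// Parse rate as a percentage (assumed definition: parsed responses / total responses).
pub fn parse_rate_pct(parsed: u32, total: u32) -> f64 {
    if total == 0 {
        0.0
    } else {
        100.0 * f64::from(parsed) / f64::from(total)
    }
}
```

With 8 true positives, 2 false positives, and 4 false negatives, this gives a precision of 0.8 and a recall of about 0.67, so recall would be the weaker number to work on.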
## Step 3: Identify Priority (3 min)
Look at the output and answer each question:

| Question | Answer | Action |
|----------|--------|--------|
| Parse Rate < 95%? | Y/N | Fix output structure first |
| Recall < 70%? | Y/N | Add few-shot examples |
| Precision < 70%? | Y/N | Add negative examples |
| Many subject mismatches? | Y/N | Standardize vocabulary |
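
The table can be read as a simple decision rule. The sketch below assumes the rows are checked top to bottom (the table only says outright that parse issues come first) and that all rates are given as fractions between 0.0 and 1.0; the `Priority` enum and `pick_priority` function are hypothetical names used for illustration.

```rust
/// The single action to take next, mirroring the "Action" column of the table.
#[derive(Debug, PartialEq, Eq)]
pub enum Priority {
    FixOutputStructure,
    AddFewShotExamples,
    AddNegativeExamples,
    StandardizeVocabulary,
    NothingUrgent,
}

/// Picks one priority from the baseline numbers.
/// All rates are fractions (0.95 means 95%). Row order is an assumption.
pub fn pick_priority(
    parse_rate: f64,
    recall: f64,
    precision: f64,
    many_subject_mismatches: bool,
) -> Priority {
    if parse_rate < 0.95 {
        // Unparseable output hides every other metric, so fix it first.
        Priority::FixOutputStructure
    } else if recall < 0.70 {
        Priority::AddFewShotExamples
    } else if precision < 0.70 {
        Priority::AddNegativeExamples
    } else if many_subject_mismatches {
        Priority::StandardizeVocabulary
    } else {
        Priority::NothingUrgent
    }
}
```

Returning a single value matches the advice in Step 4: change one thing, then measure again.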
## Step 4: Make ONE Change (5 min)
Pick the highest-priority issue and make a single change:
### If Parse Issues:
Edit `llm/extractor.rs` - add response cleaning:
```rust
/// Strips Markdown code fences that the model sometimes wraps around its JSON.
fn clean_response(raw: &str) -> String {
    raw.trim()
        .trim_start_matches("```json")
        .trim_start_matches("```")
        .trim_end_matches("```")
        .trim()
        .to_string()
}
```
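
One way to confirm the cleaning step behaves as intended is to pin it down with a few unit tests. The sketch below repeats the function so it compiles on its own; the test names and sample responses are made up for illustration and are not taken from Aphoria's test suite.

```rust
/// Same cleaning logic as above, repeated so this sketch is self-contained.
fn clean_response(raw: &str) -> String {
    raw.trim()
        .trim_start_matches("```json")
        .trim_start_matches("```")
        .trim_end_matches("```")
        .trim()
        .to_string()
}

#[cfg(test)]
mod tests {
    use super::clean_response;

    #[test]
    fn strips_json_fence() {
        let raw = "```json\n{\"subject\": \"tls/cert_verification\"}\n```\n";
        assert_eq!(clean_response(raw), "{\"subject\": \"tls/cert_verification\"}");
    }

    #[test]
    fn strips_bare_fence() {
        let raw = "```\n[]\n```";
        assert_eq!(clean_response(raw), "[]");
    }

    #[test]
    fn leaves_plain_json_untouched() {
        assert_eq!(clean_response("  {\"ok\": true}  "), "{\"ok\": true}");
    }
}
```

This only handles a fence at the very start and end of the response. If the model writes prose before the opening fence, that fence is left in place and the result will still fail to parse.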
### If Recall Issues:
Edit `llm/prompts.rs` - add examples:
```rust
const EXAMPLES: &str = r#"
Example: verify=False → {"subject": "tls/cert_verification", "predicate": "enabled", "value": false}
"#;
```
### If Precision Issues:
Edit `llm/prompts.rs` - add what NOT to flag:
```rust
const NEGATIVE_EXAMPLES: &str = r#"
Do NOT flag:
- verify=certifi.where() (using CA bundle, this is safe)
- API_KEY = os.environ['KEY'] (from environment, not hardcoded)
"#;
```
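
How these constants reach the model depends on how `llm/prompts.rs` assembles its prompt, which this guide does not show. As a minimal sketch, assuming the prompt is a plain concatenation of instructions, positive examples, negative examples, and the code under review, it could look like this; `BASE_INSTRUCTIONS` and `build_prompt` are hypothetical names.

```rust
// Hypothetical base instructions; the real prompt text lives in llm/prompts.rs.
const BASE_INSTRUCTIONS: &str =
    "Extract security claims as JSON objects with subject, predicate, and value.";

const EXAMPLES: &str = r#"
Example: verify=False → {"subject": "tls/cert_verification", "predicate": "enabled", "value": false}
"#;

const NEGATIVE_EXAMPLES: &str = r#"
Do NOT flag:
- verify=certifi.where() (using CA bundle, this is safe)
- API_KEY = os.environ['KEY'] (from environment, not hardcoded)
"#;

/// Assembles the full prompt: instructions, what to flag, what not to flag,
/// then the source code to analyze.
pub fn build_prompt(source_code: &str) -> String {
    format!(
        "{}\n{}\n{}\nCode:\n{}\n",
        BASE_INSTRUCTIONS, EXAMPLES, NEGATIVE_EXAMPLES, source_code
    )
}
```

Keeping positive and negative examples in separate constants makes it easy to change one of them per iteration and attribute any metric shift to that change.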
## Step 5: Validate Change
```bash
# Run eval again
aphoria eval run --fixtures tests/llm_fixtures --mode live --fail-on-regression
```
**If improved:** Save new baseline:
```bash
aphoria eval update-baseline --fixtures tests/llm_fixtures --force
```
**If regressed:** Revert the change and try a different approach.
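
The exact rule behind `--fail-on-regression` is not spelled out in this guide. Given the `--threshold 0.05` flag listed under Common Commands, one plausible reading is that a run counts as a regression when a score drops below the baseline by more than the threshold. The sketch below encodes that assumption for F1 only; `is_regression` is a hypothetical name.

```rust
/// Returns true when the current F1 has dropped below the baseline by more
/// than `threshold`. All values are fractions (0.05 means five points of F1).
/// This is an assumed rule, not a description of Aphoria's actual check.
pub fn is_regression(baseline_f1: f64, current_f1: f64, threshold: f64) -> bool {
    baseline_f1 - current_f1 > threshold
}
```

Under this rule, a drop from 0.80 to 0.78 passes with a threshold of 0.05, while a drop to 0.74 fails. Small run-to-run noise in live mode therefore does not fail the check.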
## What's Next?
- Read full playbook: [playbook.md](./playbook.md)
- Add more fixtures: [playbook.md#fixture-writing-guide](./playbook.md#appendix-b-fixture-writing-guide)
- Set up CI: [playbook.md#ci-integration](./playbook.md#phase-5-ci-integration--monitoring)
## Common Commands
```bash
# Evaluate all fixtures
aphoria eval run --mode live
# Evaluate one category
aphoria eval run --mode live --category tls
# Use cached responses (fast, deterministic)
aphoria eval run --mode cached
# List all fixtures
aphoria eval list-fixtures
# Check for regressions (CI mode)
aphoria eval run --mode cached --fail-on-regression --threshold 0.05
```
## Troubleshooting
### "No fixtures found"
```bash
ls tests/llm_fixtures/
# Should see: manifest.toml, tls/, jwt/, etc.
```
### "API error"
```bash
[ -n "$GEMINI_API_KEY" ] && echo "set" || echo "missing"
# Should print "set" (this avoids echoing the key itself to the terminal)
```
### "All fixtures failed"
```bash
# Run in mock mode to test harness
aphoria eval run --mode mock
# If this fails too, harness is broken
```
### "Results differ between runs"
- LLM output is non-deterministic, so live runs can vary
- Use `--mode cached` for consistent results
- Set the temperature to 0 in the config (if supported)
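
To illustrate why a cached mode gives repeatable results, here is a minimal sketch of a response cache keyed by prompt text. It is an assumption about how such a mode could work, not a description of Aphoria's implementation; `ResponseCache` and `get_or_call` are hypothetical names.

```rust
use std::collections::HashMap;

/// In-memory cache of model responses, keyed by the exact prompt text.
#[derive(Default)]
pub struct ResponseCache {
    entries: HashMap<String, String>,
}

impl ResponseCache {
    /// Returns the stored response for `prompt`, or calls the model once and
    /// stores the result so later runs see the identical response.
    pub fn get_or_call<F>(&mut self, prompt: &str, call_model: F) -> String
    where
        F: FnOnce(&str) -> String,
    {
        if let Some(hit) = self.entries.get(prompt) {
            return hit.clone();
        }
        let response = call_model(prompt);
        self.entries.insert(prompt.to_string(), response.clone());
        response
    }
}
```

Because the second lookup never reaches the model, metrics computed from cached responses change only when the prompt or the scoring logic changes.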