# LLM Optimization Quick Start

> Get started with LLM extraction optimization in 15 minutes.

## Prerequisites

1. Aphoria built and working
2. `GEMINI_API_KEY` set in environment
3. Fixtures exist in `tests/llm_fixtures/`
## Step 1: Validate Setup (2 min)

```bash
# Check fixtures are valid
aphoria eval validate-fixtures --fixtures tests/llm_fixtures

# Expected: "All fixtures are valid."
```
## Step 2: Run Baseline (5 min)

```bash
# Run live evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live --format table
```

Record these numbers:

- Precision: ______
- Recall: ______
- F1: ______
- Parse Rate: ______%
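Each number can be recomputed from raw counts, which helps when comparing runs by hand. A minimal sketch using the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = their harmonic mean), and assuming parse rate is the share of responses that parsed successfully. The `EvalCounts` struct is hypothetical; Aphoria's own report types may differ:

```rust
/// Hypothetical per-run counts; Aphoria's real report types may differ.
#[derive(Debug, Clone, Copy)]
pub struct EvalCounts {
    pub true_positives: u32,
    pub false_positives: u32,
    pub false_negatives: u32,
    pub parsed_responses: u32,
    pub total_responses: u32,
}

/// Divide, returning 0.0 instead of NaN when the denominator is zero.
fn ratio(num: u32, den: u32) -> f64 {
    if den == 0 { 0.0 } else { num as f64 / den as f64 }
}

impl EvalCounts {
    /// Share of reported findings that were expected: TP / (TP + FP).
    pub fn precision(&self) -> f64 {
        ratio(self.true_positives, self.true_positives + self.false_positives)
    }

    /// Share of expected findings that were reported: TP / (TP + FN).
    pub fn recall(&self) -> f64 {
        ratio(self.true_positives, self.true_positives + self.false_negatives)
    }

    /// Harmonic mean of precision and recall (0.0 if both are zero).
    pub fn f1(&self) -> f64 {
        let (p, r) = (self.precision(), self.recall());
        if p + r == 0.0 { 0.0 } else { 2.0 * p * r / (p + r) }
    }

    /// Percentage of LLM responses that parsed as valid output.
    pub fn parse_rate_pct(&self) -> f64 {
        100.0 * ratio(self.parsed_responses, self.total_responses)
    }
}
```

For example, 8 true positives, 2 false positives, and 2 false negatives give precision and recall of 8/10 = 0.8 each, so F1 is also 0.8.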
## Step 3: Identify Priority (3 min)

Look at the output and answer:

| Question | Answer | Action |
|----------|--------|--------|
| Parse Rate < 95%? | Y/N | Fix output structure first |
| Recall < 70%? | Y/N | Add few-shot examples |
| Precision < 70%? | Y/N | Add negative examples |
| Many subject mismatches? | Y/N | Standardize vocabulary |
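Read top to bottom, the table works as a priority list: the first row says to fix output structure *first*, and Step 4 asks for only the highest-priority change. A minimal sketch of that decision, assuming the rows are in priority order and using the table's thresholds as fractions. The `Metrics` struct and `Action` enum are hypothetical, and "many subject mismatches" is left as a yes/no judgment because the table gives no numeric cutoff:

```rust
/// Hypothetical summary of one eval run; rates are fractions in 0.0..=1.0.
pub struct Metrics {
    pub parse_rate: f64,
    pub recall: f64,
    pub precision: f64,
    /// Reader's judgment from the output; the table gives no numeric cutoff.
    pub many_subject_mismatches: bool,
}

#[derive(Debug, PartialEq)]
pub enum Action {
    FixOutputStructure,
    AddFewShotExamples,
    AddNegativeExamples,
    StandardizeVocabulary,
    NoChangeNeeded,
}

/// Return the first matching row of the table, checked top to bottom,
/// so exactly ONE change is suggested per iteration.
pub fn next_action(m: &Metrics) -> Action {
    if m.parse_rate < 0.95 {
        Action::FixOutputStructure
    } else if m.recall < 0.70 {
        Action::AddFewShotExamples
    } else if m.precision < 0.70 {
        Action::AddNegativeExamples
    } else if m.many_subject_mismatches {
        Action::StandardizeVocabulary
    } else {
        Action::NoChangeNeeded
    }
}
```

Checking parse rate first matters because unparsed responses yield no findings at all, which would also drag down recall.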
## Step 4: Make ONE Change (5 min)

Pick the highest-priority issue and make a single change:

### If Parse Issues:

Edit `llm/extractor.rs` to strip Markdown code fences from the response before parsing:
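```rust
// Strip the Markdown code fences that models often wrap around JSON output.
fn clean_response(raw: &str) -> String {
    raw.trim()
        .trim_start_matches("```json") // opening fence with a language tag
        .trim_start_matches("```") // bare opening fence
        .trim_end_matches("```") // closing fence
        .trim() // whitespace left between the fences and the JSON
        .to_string()
}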
```
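`clean_response` only removes code fences. If parse failures persist because the model also adds prose around its JSON (for example "Here are the findings: ..."), one alternative is to cut the response down to its outermost JSON-looking span before parsing. A minimal sketch; `extract_json_span` is a hypothetical helper, not an existing Aphoria function, and it is a heuristic rather than a JSON parser:

```rust
/// Hypothetical fallback for responses that wrap JSON in prose.
/// Slices from the first '{' or '[' to the last '}' or ']'.
/// Heuristic only: returns None if no such span exists.
fn extract_json_span(raw: &str) -> Option<&str> {
    let start = raw.find(|c: char| c == '{' || c == '[')?;
    let end = raw.rfind(|c: char| c == '}' || c == ']')?;
    if end < start {
        return None;
    }
    // Both delimiters are single-byte ASCII, so end + 1 is a char boundary.
    Some(&raw[start..=end])
}
```

It would sit after `clean_response` and before the JSON parser. Because a stray `}` in trailing prose would widen the span, re-run the eval afterwards and confirm the parse rate actually improves.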
### If Recall Issues:

Edit `llm/prompts.rs` to add few-shot examples:

```rust
// Few-shot example: a code pattern and the finding it should produce.
const EXAMPLES: &str = r#"
Example: verify=False → {"subject": "tls/cert_verification", "predicate": "enabled", "value": false}
"#;
```
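Hand-written example lines tend to drift in wording as more are added, which also works against the vocabulary check in Step 3. One option is to render every example from a structured triple so all of them share the same `subject`/`predicate`/`value` shape. A sketch, assuming a hypothetical `ExampleFinding` type and `render_examples` helper (neither is part of Aphoria):

```rust
use std::fmt::Write;

/// Hypothetical few-shot example: a code snippet and the finding it should produce.
pub struct ExampleFinding<'a> {
    pub code: &'a str,
    pub subject: &'a str,
    pub predicate: &'a str,
    /// Already-serialized JSON value, e.g. "false" or "\"1.2\"".
    pub value_json: &'a str,
}

/// Render examples in the same "Example: code → {json}" format as EXAMPLES.
/// Subject and predicate are inserted verbatim, so they must not contain quotes.
pub fn render_examples(examples: &[ExampleFinding<'_>]) -> String {
    let mut out = String::new();
    for ex in examples {
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(
            out,
            r#"Example: {} → {{"subject": "{}", "predicate": "{}", "value": {}}}"#,
            ex.code, ex.subject, ex.predicate, ex.value_json
        )
        .unwrap();
    }
    out
}
```

Rendering `verify=False` with subject `tls/cert_verification`, predicate `enabled`, and value `false` reproduces the example line in `EXAMPLES` above.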
### If Precision Issues:

Edit `llm/prompts.rs` to list patterns the model should NOT flag:

```rust
// Negative examples: patterns that look risky but are safe and should not be reported.
const NEGATIVE_EXAMPLES: &str = r#"
Do NOT flag:
- verify=certifi.where() (using CA bundle, this is safe)
- API_KEY = os.environ['KEY'] (from environment, not hardcoded)
"#;
```
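Both constants only help if they reach the prompt the extractor sends. How `llm/prompts.rs` actually assembles its prompt is not shown in this guide; a minimal sketch, assuming a hypothetical `build_prompt` that places instructions first, then positive and negative examples, then the code under analysis. The instruction sentence is a placeholder, and `NEGATIVE_EXAMPLES` is shortened:

```rust
// Constants as in the snippets above (NEGATIVE_EXAMPLES shortened).
const EXAMPLES: &str = r#"
Example: verify=False → {"subject": "tls/cert_verification", "predicate": "enabled", "value": false}
"#;

const NEGATIVE_EXAMPLES: &str = r#"
Do NOT flag:
- verify=certifi.where() (using CA bundle, this is safe)
"#;

/// Hypothetical prompt builder; the first line is placeholder instruction text.
pub fn build_prompt(code: &str) -> String {
    format!(
        "Extract security-relevant findings as JSON.\n{}\n{}\nCode:\n{}\n",
        EXAMPLES.trim(),
        NEGATIVE_EXAMPLES.trim(),
        code
    )
}
```

Keeping positive and negative examples next to each other lets the model contrast `verify=False` with `verify=certifi.where()` directly.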
## Step 5: Validate Change

```bash
# Run eval again
aphoria eval run --fixtures tests/llm_fixtures --mode live --fail-on-regression
```
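`--fail-on-regression` presumably compares this run against the saved baseline, and the CI command under Common Commands adds `--threshold 0.05`. The exact rule the harness applies is not documented in this guide. A minimal sketch, assuming a regression means any metric falling more than `threshold` below its baseline value (absolute difference on a 0 to 1 scale); `Scores` and `regressions` are hypothetical:

```rust
/// Hypothetical metric snapshot on a 0.0..=1.0 scale.
#[derive(Debug, Clone, Copy)]
pub struct Scores {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

/// Names of metrics that dropped by more than `threshold` versus the baseline.
/// An empty result means the change can be kept.
pub fn regressions(baseline: &Scores, current: &Scores, threshold: f64) -> Vec<&'static str> {
    let pairs = [
        ("precision", baseline.precision, current.precision),
        ("recall", baseline.recall, current.recall),
        ("f1", baseline.f1, current.f1),
    ];
    pairs
        .iter()
        .filter(|(_, base, cur)| base - cur > threshold)
        .map(|(name, _, _)| *name)
        .collect()
}
```

An absolute threshold keeps the rule easy to reason about: with 0.05, a recall drop from 0.75 to 0.72 passes, while a drop to 0.65 fails.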
**If improved:** Save the new baseline:

```bash
aphoria eval update-baseline --fixtures tests/llm_fixtures --force
```

**If regressed:** Revert the change and try a different approach.
## What's Next?

- Read the full playbook: [playbook.md](./playbook.md)
- Add more fixtures: [Appendix B: Fixture Writing Guide](./playbook.md#appendix-b-fixture-writing-guide)
- Set up CI: [Phase 5: CI Integration & Monitoring](./playbook.md#phase-5-ci-integration--monitoring)

## Common Commands
```bash
# Evaluate all fixtures
aphoria eval run --mode live

# Evaluate one category
aphoria eval run --mode live --category tls

# Use cached responses (fast, deterministic)
aphoria eval run --mode cached

# List all fixtures
aphoria eval list-fixtures

# Check for regressions (CI mode)
aphoria eval run --mode cached --fail-on-regression --threshold 0.05
```
## Troubleshooting

### "No fixtures found"

```bash
ls tests/llm_fixtures/
# Should see: manifest.toml, tls/, jwt/, etc.
```
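The same check can run as a pre-flight step in a test or script. A minimal sketch using only the standard library; the expected layout (a `manifest.toml` plus one subdirectory per category) comes from the listing above, while `check_fixture_dir` itself is hypothetical:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Verify a fixture root looks usable: it must contain `manifest.toml`
/// and at least one category subdirectory (e.g. `tls/`, `jwt/`).
/// Returns the sorted category names.
pub fn check_fixture_dir(root: &Path) -> io::Result<Vec<String>> {
    if !root.join("manifest.toml").is_file() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("no manifest.toml in {}", root.display()),
        ));
    }
    let mut categories = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            categories.push(entry.file_name().to_string_lossy().into_owned());
        }
    }
    if categories.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            "no category subdirectories found",
        ));
    }
    categories.sort();
    Ok(categories)
}
```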
### "API error"

```bash
echo $GEMINI_API_KEY
# Should show your key (not empty)
```
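To fail fast with a clearer message than a generic API error, a tool can check the variable before making any calls. A minimal sketch using `std::env::var`; the function names are hypothetical, and Aphoria may already perform an equivalent check:

```rust
use std::env::{self, VarError};

const KEY_VAR: &str = "GEMINI_API_KEY";

/// Turn the raw environment lookup into a usable key or a readable error.
fn validate_key(raw: Result<String, VarError>) -> Result<String, String> {
    match raw {
        Ok(key) if !key.trim().is_empty() => Ok(key),
        Ok(_) => Err(format!("{} is set but empty", KEY_VAR)),
        Err(VarError::NotPresent) => Err(format!("{} is not set", KEY_VAR)),
        Err(VarError::NotUnicode(_)) => Err(format!("{} is not valid Unicode", KEY_VAR)),
    }
}

/// Read the key from the process environment before calling the API.
pub fn gemini_api_key() -> Result<String, String> {
    validate_key(env::var(KEY_VAR))
}
```

Splitting the lookup from the validation keeps `validate_key` testable without modifying the process environment.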
### "All fixtures failed"

```bash
# Run in mock mode to test the harness itself
aphoria eval run --mode mock
# If this also fails, the harness is broken
```

### "Results differ between runs"

- LLM output is non-deterministic, so live runs can vary
- Use `--mode cached` for consistent results
- Set temperature to 0 in config (if supported)
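When live results vary, it helps to know how large normal run-to-run noise is before treating a drop as a regression. A minimal sketch, assuming you record F1 from several identical `--mode live` runs by hand; `spread` is a hypothetical helper returning the mean and the range (max minus min):

```rust
/// Mean and range (max - min) of repeated scores; None for an empty slice.
pub fn spread(scores: &[f64]) -> Option<(f64, f64)> {
    if scores.is_empty() {
        return None;
    }
    let mean = scores.iter().sum::<f64>() / scores.len() as f64;
    let min = scores.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    Some((mean, max - min))
}
```

If the range across unchanged runs is close to the regression threshold (`--threshold 0.05` in the CI command), a single live run cannot distinguish a real regression from noise.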