LLM Optimization Quick Start
Get started with LLM extraction optimization in 15 minutes.
Prerequisites
- Aphoria built and working
- GEMINI_API_KEY set in environment
- Fixtures exist in tests/llm_fixtures/
Step 1: Validate Setup (2 min)
# Check fixtures are valid
aphoria eval validate-fixtures --fixtures tests/llm_fixtures
# Expected: "All fixtures are valid."
Step 2: Run Baseline (5 min)
# Run live evaluation
aphoria eval run --fixtures tests/llm_fixtures --mode live --format table
Record these numbers:
- Precision: ______
- Recall: ______
- F1: ______
- Parse Rate: ______%
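For reference when reading the baseline, the three quality numbers relate to each other as in the sketch below. It uses the standard definitions of precision, recall, and F1; the `Counts` type and its field names are hypothetical and are not taken from Aphoria's code.

```rust
/// Outcome counts for one evaluation run (hypothetical type for illustration).
#[derive(Debug, Clone, Copy)]
struct Counts {
    true_positives: u32,  // expected claims that were extracted
    false_positives: u32, // extracted claims that were not expected
    false_negatives: u32, // expected claims that were missed
}

/// Share of extracted claims that were expected.
fn precision(c: Counts) -> f64 {
    let reported = c.true_positives + c.false_positives;
    if reported == 0 {
        0.0
    } else {
        f64::from(c.true_positives) / f64::from(reported)
    }
}

/// Share of expected claims that were extracted.
fn recall(c: Counts) -> f64 {
    let expected = c.true_positives + c.false_negatives;
    if expected == 0 {
        0.0
    } else {
        f64::from(c.true_positives) / f64::from(expected)
    }
}

/// Harmonic mean of precision and recall.
fn f1(c: Counts) -> f64 {
    let (p, r) = (precision(c), recall(c));
    if p + r == 0.0 {
        0.0
    } else {
        2.0 * p * r / (p + r)
    }
}

fn main() {
    let c = Counts { true_positives: 8, false_positives: 2, false_negatives: 2 };
    // prints "precision=0.80 recall=0.80 f1=0.80"
    println!("precision={:.2} recall={:.2} f1={:.2}", precision(c), recall(c), f1(c));
}
```

Low precision means too many false positives; low recall means too many missed claims. Step 3 uses exactly that split to pick a fix.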
Step 3: Identify Priority (3 min)
Look at the output and answer:
| Question | Answer | Action |
|---|---|---|
| Parse Rate < 95%? | Y/N | Fix output structure first |
| Recall < 70%? | Y/N | Add few-shot examples |
| Precision < 70%? | Y/N | Add negative examples |
| Many subject mismatches? | Y/N | Standardize vocabulary |
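The table above can be read as a simple priority check. The sketch below encodes it with the same thresholds. The `Baseline` and `Action` types are hypothetical, and the top-to-bottom ordering is an assumption; the table only says explicitly that parse issues come first.

```rust
/// Numbers recorded in Step 2, as fractions in 0.0..=1.0 (hypothetical type).
struct Baseline {
    parse_rate: f64,
    recall: f64,
    precision: f64,
    many_subject_mismatches: bool,
}

/// The actions from the table, plus a "nothing to fix" case.
#[derive(Debug, PartialEq)]
enum Action {
    FixOutputStructure,
    AddFewShotExamples,
    AddNegativeExamples,
    StandardizeVocabulary,
    NoneNeeded,
}

/// Walks the table top to bottom and returns the first action that applies.
fn next_action(b: &Baseline) -> Action {
    if b.parse_rate < 0.95 {
        Action::FixOutputStructure
    } else if b.recall < 0.70 {
        Action::AddFewShotExamples
    } else if b.precision < 0.70 {
        Action::AddNegativeExamples
    } else if b.many_subject_mismatches {
        Action::StandardizeVocabulary
    } else {
        Action::NoneNeeded
    }
}

fn main() {
    let baseline = Baseline {
        parse_rate: 0.98,
        recall: 0.62,
        precision: 0.81,
        many_subject_mismatches: false,
    };
    // Parse rate is fine, recall is below 70%, so this prints "AddFewShotExamples".
    println!("{:?}", next_action(&baseline));
}
```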
Step 4: Make ONE Change (5 min)
Pick the highest-priority issue and make a single change:
If Parse Issues:
Edit llm/extractor.rs - add response cleaning:
fn clean_response(raw: &str) -> String {
    // Strip surrounding whitespace and any Markdown code fence around the JSON.
    raw.trim()
        .trim_start_matches("```json")
        .trim_start_matches("```")
        .trim_end_matches("```")
        .trim()
        .to_string()
}
If Recall Issues:
Edit llm/prompts.rs - add examples:
const EXAMPLES: &str = r#"
Example: verify=False → {"subject": "tls/cert_verification", "predicate": "enabled", "value": false}
"#;
If Precision Issues:
Edit llm/prompts.rs - add what NOT to flag:
const NEGATIVE_EXAMPLES: &str = r#"
Do NOT flag:
- verify=certifi.where() (using CA bundle, this is safe)
- API_KEY = os.environ['KEY'] (from environment, not hardcoded)
"#;
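One way the positive and negative example blocks could be combined into a single prompt is sketched below. `build_prompt`, its parameters, and the section layout are hypothetical; the actual prompt assembly in llm/prompts.rs may look different.

```rust
const EXAMPLES: &str = r#"
Example: verify=False → {"subject": "tls/cert_verification", "predicate": "enabled", "value": false}
"#;

const NEGATIVE_EXAMPLES: &str = r#"
Do NOT flag:
- verify=certifi.where() (using CA bundle, this is safe)
- API_KEY = os.environ['KEY'] (from environment, not hardcoded)
"#;

/// Hypothetical helper: instructions first, then few-shot examples,
/// then negative examples, then the code under analysis.
fn build_prompt(instructions: &str, source: &str) -> String {
    format!(
        "{}\n\n{}\n\n{}\n\nCode to analyze:\n{}\n",
        instructions.trim(),
        EXAMPLES.trim(),
        NEGATIVE_EXAMPLES.trim(),
        source
    )
}

fn main() {
    let prompt = build_prompt(
        "Extract security claims as JSON.",
        "requests.get(url, verify=False)",
    );
    println!("{prompt}");
}
```

Keeping each example block in its own constant makes it easy to change one at a time, which matches the "make ONE change" rule of this step.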
Step 5: Validate Change
# Run eval again
aphoria eval run --fixtures tests/llm_fixtures --mode live --fail-on-regression
If improved: Save new baseline:
aphoria eval update-baseline --fixtures tests/llm_fixtures --force
If regressed: Revert change, try different approach.
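Conceptually, a regression check compares the new score with the saved baseline and fails when the drop exceeds a tolerance. The sketch below assumes the tolerance is an absolute drop in a metric such as F1, like the 0.05 passed to --threshold under Common Commands. How Aphoria actually computes regressions is not documented here, so treat the function and its semantics as illustrative only.

```rust
/// Returns true when `current` fell below `baseline` by more than `threshold`.
/// Assumption: the threshold is an absolute difference, not a percentage.
fn is_regression(baseline: f64, current: f64, threshold: f64) -> bool {
    baseline - current > threshold
}

fn main() {
    let baseline_f1 = 0.80;
    let threshold = 0.05;

    for current_f1 in [0.82, 0.78, 0.72] {
        let verdict = if is_regression(baseline_f1, current_f1, threshold) {
            "regressed: revert the change"
        } else {
            "ok: within tolerance"
        };
        println!("baseline={baseline_f1:.2} current={current_f1:.2} -> {verdict}");
    }
}
```

Under this assumption a small dip (0.80 to 0.78) passes, which leaves room for run-to-run noise in live mode.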
What's Next?
- Read full playbook: playbook.md
- Add more fixtures: playbook.md#fixture-writing-guide
- Set up CI: playbook.md#ci-integration
Common Commands
# Evaluate all fixtures
aphoria eval run --mode live
# Evaluate one category
aphoria eval run --mode live --category tls
# Use cached responses (fast, deterministic)
aphoria eval run --mode cached
# List all fixtures
aphoria eval list-fixtures
# Check for regressions (CI mode)
aphoria eval run --mode cached --fail-on-regression --threshold 0.05
Troubleshooting
"No fixtures found"
ls tests/llm_fixtures/
# Should see: manifest.toml, tls/, jwt/, etc.
"API error"
echo $GEMINI_API_KEY
# Should show your key (not empty)
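The same check can be done without echoing the secret to the terminal. This is a minimal sketch using only `std::env`; `api_key_status` is a hypothetical helper, not part of Aphoria.

```rust
use std::env;

/// Classifies the key as usable, empty, or missing without revealing its value.
fn api_key_status(value: Option<&str>) -> Result<(), String> {
    match value {
        Some(v) if !v.trim().is_empty() => Ok(()),
        Some(_) => Err("GEMINI_API_KEY is set but empty".to_string()),
        None => Err("GEMINI_API_KEY is not set".to_string()),
    }
}

fn main() {
    // env::var returns Err when the variable is missing or not valid Unicode;
    // both cases are reported as "not set" here.
    let key = env::var("GEMINI_API_KEY").ok();
    match api_key_status(key.as_deref()) {
        Ok(()) => println!("GEMINI_API_KEY is set"),
        Err(reason) => eprintln!("{reason}"),
    }
}
```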
"All fixtures failed"
# Run in mock mode to test harness
aphoria eval run --mode mock
# If this fails too, harness is broken
"Results differ between runs"
- LLM is non-deterministic
- Use --mode cached for consistent results
- Set temperature to 0 in config (if supported)