jordan 157dbbb9eb feat: Complete Aphoria Phase 8-9 + UAT suite (90/90 tests passing)

## Phase 8: Enterprise Extractor Improvements ✅
- 14 security extractors (TLS, JWT, SQL injection, XSS, etc.)
- 10 framework-specific extractors (Spring, Django, Rails, etc.)
- Config file security detection (YAML, TOML)

## Phase 9: Autonomous Extractor Generation ✅
- Shadow mode executor with TP/FP tracking
- Graduation pipeline with confidence thresholds
- Auto-rollback on regression detection
- Cross-project pattern syncing

## UAT Suite Complete (14 scripts, 90 tests)
- test-core-detection.sh (6 tests)
- test-declarative-extractors.sh (5 tests)
- test-domain-frameworks.sh (5 tests)
- test-domain-unreal.sh (3 tests)
- test-llm-extraction.sh (6 tests)
- test-eval-harness.sh (5 tests)
- test-cross-language.sh (3 tests)
- test-precommit-performance.sh (4 tests)
- test-output-formats.sh (8 tests)
- test-drift-detection.sh (6 tests)
- test-exit-codes.sh (12 tests)
+ 3 more scripts

## Other Changes
- Updated roadmap to mark Phase 8-9 complete
- Added .gitignore entries for build artifacts
- Updated pre-commit: 800 line limit, exclude tests/data/cmd

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-06 22:50:55 -07:00


Baseline: YYYY-MM-DD

Prompt Version: X.Y.Z
Model: gemini-2.0-flash
Fixture Count: N

Overall Metrics

| Metric | Value | Target |
|---|---|---|
| Precision | X.XX | 0.80 |
| Recall | X.XX | 0.75 |
| F1 | X.XX | 0.77 |
| Parse Success | X.XX% | 95% |
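The F1 target follows from the other two: with precision 0.80 and recall 0.75, F1 = 2 · 0.80 · 0.75 / (0.80 + 0.75) ≈ 0.77. A minimal sketch of how the Value column could be filled in and checked against the targets is given here. The `Counts` class and `meets_targets` helper are hypothetical names for illustration, not part of the eval harness.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical tally of findings across all fixtures."""
    tp: int  # findings that match an expected finding
    fp: int  # findings with no matching expectation
    fn: int  # expected findings that were not reported


def precision(c: Counts) -> float:
    # Share of reported findings that were correct.
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    # Share of expected findings that were reported.
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Targets copied from the Overall Metrics table.
TARGETS = {"precision": 0.80, "recall": 0.75, "f1": 0.77}


def meets_targets(c: Counts) -> bool:
    p, r = precision(c), recall(c)
    values = {"precision": p, "recall": r, "f1": f1(p, r)}
    return all(values[name] >= target for name, target in TARGETS.items())
```

Treating an empty denominator as 0.0 is a design choice made for this sketch; a harness might prefer to report such a metric as undefined.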

Per-Category Breakdown

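| Category | Fixtures | Passed | Failed | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| tls | N | N | N | X.XX | X.XX | X.XX |
| jwt | N | N | N | X.XX | X.XX | X.XX |
| secrets | N | N | N | X.XX | X.XX | X.XX |
| auth | N | N | N | X.XX | X.XX | X.XX |
| negative | N | N | N | X.XX | X.XX | X.XX |
| edge | N | N | N | X.XX | X.XX | X.XX |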
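The rows of this breakdown could be produced by grouping per-fixture results by category, as in the sketch below. The result shape (`category`, `passed`, `tp`, `fp`, `fn` keys) is an assumption made for illustration; the harness's actual JSON output may differ. One point worth noting: a category with no expected findings and no reported findings (plausibly the `negative` row) has zero denominators, so the sketch reports its metrics as `None` rather than inventing a number.

```python
from collections import defaultdict
from typing import Optional


def ratio(num: int, den: int) -> Optional[float]:
    """Return num/den, or None when the denominator is zero (metric undefined)."""
    return num / den if den else None


def per_category(results: list[dict]) -> dict[str, dict]:
    """Aggregate hypothetical per-fixture results into one row per category."""
    totals = defaultdict(
        lambda: {"fixtures": 0, "passed": 0, "failed": 0, "tp": 0, "fp": 0, "fn": 0}
    )
    for r in results:
        row = totals[r["category"]]
        row["fixtures"] += 1
        row["passed" if r["passed"] else "failed"] += 1
        for key in ("tp", "fp", "fn"):
            row[key] += r[key]

    table = {}
    for category, row in totals.items():
        p = ratio(row["tp"], row["tp"] + row["fp"])
        rec = ratio(row["tp"], row["tp"] + row["fn"])
        # F1 is undefined if either input is undefined or both are zero.
        if p is None or rec is None or p + rec == 0:
            f1 = None
        else:
            f1 = 2 * p * rec / (p + rec)
        table[category] = {**row, "precision": p, "recall": rec, "f1": f1}
    return table
```

Counts are summed within a category before the ratios are taken (micro-averaging); averaging per-fixture ratios instead would weight small fixtures more heavily.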

Failed Fixtures

| ID | Category | Issue | Root Cause |
|---|---|---|---|

Changes Since Last Baseline

- Change 1
- Change 2

Known Issues

- Issue 1
- Issue 2

Next Optimization Targets

- Target 1
- Target 2
- Target 3

Raw Results

// Paste JSON output here for reference
