jordan 2026-02-07 19:51:05 -07:00
parent 183238d6ea
commit e0d2940b82
17 changed files with 801 additions and 308 deletions

@@ -0,0 +1,235 @@
---
name: aphoria-self-review
description: Run Self-Review SOP on Aphoria scan results. Use when evaluating scan quality, reducing noise, or auditing coverage after running Aphoria. Triggers on "review aphoria scan", "check scan quality", "reduce aphoria noise", "aphoria self-review".
---
# Aphoria Self-Review Skill
You are a security tool usability researcher combining SIEM analyst (signal-to-noise optimization), UX researcher (persona value assessment), and test engineer (coverage mapping) perspectives. Your job is to evaluate Aphoria scan quality and produce actionable recommendations for noise reduction and coverage improvement.
## Core Metrics
| Metric | Formula | Target | Interpretation |
|--------|---------|--------|----------------|
| Actionability Ratio | (Block + Flag) / Total | >= 0.30 | If < 0.30, too much noise |
| Persona Value Score | Useful / (Useful + Noise) | >= 0.70 | Computed per persona; below 0.70, that persona sees too much noise |
| Conflict Precision | True Positives / Total Sampled Conflicts | >= 0.80 | Below 0.80, too many conflicts are false positives |
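The three formulas can be sketched in Rust as follows. This is a minimal illustration, not Aphoria's API. `VerdictCounts` and the helper functions are hypothetical. The sketch assumes every claim carries exactly one of the four verdicts, so Total = Block + Flag + Pass + Ack. Empty denominators return `None` rather than dividing by zero.

```rust
/// Hypothetical verdict tallies for one scan (assumes one verdict per claim).
#[derive(Debug, Clone, Copy, Default)]
pub struct VerdictCounts {
    pub block: u32,
    pub flag: u32,
    pub pass: u32,
    pub ack: u32,
}

impl VerdictCounts {
    /// Total claims: the sum of all four verdict buckets.
    pub fn total(&self) -> u32 {
        self.block + self.flag + self.pass + self.ack
    }

    /// Actionability Ratio = (Block + Flag) / Total.
    pub fn actionability_ratio(&self) -> Option<f64> {
        ratio(self.block + self.flag, self.total())
    }
}

/// Division that yields `None` instead of NaN when the denominator is zero.
pub fn ratio(numerator: u32, denominator: u32) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(f64::from(numerator) / f64::from(denominator))
    }
}

/// Persona Value Score = Useful / (Useful + Noise).
pub fn persona_value_score(useful: u32, noise: u32) -> Option<f64> {
    ratio(useful, useful + noise)
}

/// Conflict Precision = True Positives / Total Sampled.
pub fn conflict_precision(true_positives: u32, total_sampled: u32) -> Option<f64> {
    ratio(true_positives, total_sampled)
}

pub const ACTIONABILITY_TARGET: f64 = 0.30;
pub const PERSONA_VALUE_TARGET: f64 = 0.70;
pub const CONFLICT_PRECISION_TARGET: f64 = 0.80;

/// PASS when the metric is defined and meets its ">=" target.
pub fn meets(value: Option<f64>, target: f64) -> bool {
    value.map_or(false, |v| v >= target)
}
```

Treating an undefined metric as FAIL is a deliberate choice. A scan with no claims or no sampled conflicts should prompt investigation, not pass silently.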
## Prerequisites
Before running this skill:
1. Verify a scan exists with JSON output. Run `aphoria scan --persist --format json` if needed.
2. Have access to the scan results (conflicts, claims, verdicts).
3. Know which persona is primary (Developer, Auditor, ADK User, or SDET).
## Phase 0: Load Scan Results
1. Read the scan output file or re-run with `--format json`.
2. Count total claims extracted.
3. Count Block, Flag, Pass, and Ack verdicts.
4. Identify the primary persona (ask user if unclear).
## Phase 1: Signal-to-Noise Analysis
5. Calculate actionability ratio: `(Block + Flag) / Total Claims`.
6. Compare against 0.30 threshold.
7. List top 5 claim types by volume.
8. Classify each top claim type as Signal (useful) or Noise (ignorable).
9. Identify extraction patterns that produce the most noise.
10. Document the specific files/directories generating noise.
Use checklist: [checklists/signal-noise.md](./checklists/signal-noise.md)
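Steps 7 and 8 start from a frequency count. The following is a minimal sketch. It assumes the claim types have already been pulled out of the scan JSON as strings; how Aphoria names them in its output is not specified here.

```rust
use std::collections::HashMap;

/// Count claims per claim type and return the `n` most frequent.
pub fn top_claim_types<'a, I>(claim_types: I, n: usize) -> Vec<(&'a str, usize)>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for claim_type in claim_types {
        *counts.entry(claim_type).or_insert(0) += 1;
    }
    let mut ranked: Vec<(&'a str, usize)> = counts.into_iter().collect();
    // Highest count first; ties broken alphabetically so the report is stable.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    ranked.truncate(n);
    ranked
}
```

Calling it with `n = 5` fills the Top Claim Types table. The Signal/Noise classification in step 8 remains a manual judgment.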
## Phase 2: Persona Value Audit
11. For **Developer** persona: What claims help "fix before commit"?
12. For **Auditor** persona: What claims reveal risk posture?
13. For **ADK User** persona: What claims provide useful agent context?
14. For **SDET** persona: What claims validate test coverage?
15. Calculate persona value score for each: `Useful / (Useful + Noise)`.
16. Flag personas with score < 0.70 as needing improvement.
Use checklist: [checklists/persona-value.md](./checklists/persona-value.md)
## Phase 3: Coverage Analysis
17. Map each active extractor to file types it processes.
18. Identify file types with zero claims extracted.
19. Check for blind spots in security-critical paths: `auth/`, `crypto/`, `network/`, `secrets/`.
20. Flag coverage gaps in critical paths as BLOCKER.
21. Document acceptable gaps (test fixtures, demos, third-party).
Use checklist: [checklists/coverage.md](./checklists/coverage.md)
## Phase 4: Conflict Quality Review
22. Sample at least 10 conflicts for manual review.
23. For each conflict, determine: True Positive or False Positive?
24. Calculate conflict precision: `True Positives / Total Sampled`.
25. Document false positive patterns (common causes).
26. Identify extractors with highest false positive rates.
27. Note conflicts that are technically correct but not actionable.
Use checklist: [checklists/conflict-quality.md](./checklists/conflict-quality.md)
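One way to "sample across extractors" without a random-number crate is a deterministic round-robin over per-extractor buckets. The sketch below assumes a hypothetical `ConflictRef` with `extractor` and `file` fields; it is not Aphoria's data model. Stratifying by file type works the same way with a different grouping key.

```rust
use std::collections::BTreeMap;

/// Hypothetical reference to one conflict in the scan output.
#[derive(Debug, Clone, PartialEq)]
pub struct ConflictRef {
    pub extractor: String,
    pub file: String,
}

/// Pick up to `n` conflicts, taking one from each extractor per round, so a
/// single noisy extractor cannot dominate the sample.
pub fn stratified_sample(conflicts: &[ConflictRef], n: usize) -> Vec<ConflictRef> {
    // Group by extractor; BTreeMap keeps the iteration order deterministic.
    let mut groups: BTreeMap<&str, Vec<&ConflictRef>> = BTreeMap::new();
    for c in conflicts {
        groups.entry(c.extractor.as_str()).or_default().push(c);
    }
    let mut sample = Vec::with_capacity(n.min(conflicts.len()));
    let mut round = 0;
    while sample.len() < n {
        let mut took_any = false;
        for bucket in groups.values() {
            if sample.len() == n {
                break;
            }
            if let Some(c) = bucket.get(round) {
                sample.push((*c).clone());
                took_any = true;
            }
        }
        if !took_any {
            break; // every bucket is exhausted
        }
        round += 1;
    }
    sample
}
```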
## Phase 5: Recommendations
28. Propose `.aphoriaignore` patterns for directory-level noise.
29. Suggest extractors to disable in `aphoria.toml` for extractor-level noise.
30. Recommend inline `// aphoria-ignore: reason` for one-off suppressions.
31. Identify missing extractors that would catch current blind spots.
32. Suggest threshold tuning for low-confidence noise.
33. Prioritize recommendations by impact (noise reduction potential).
## Phase 6: Action Plan
34. Create ordered list of changes by priority.
35. Provide exact config changes (copy-pasteable).
36. Define verification steps: re-scan and measure improvement.
37. Set targets for next review cycle.
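You can make step 36's "measure improvement" explicit by comparing each core metric between the baseline and the re-scan. This is a minimal sketch; the `Trend` type and the tolerance parameter are illustrative assumptions. All three core metrics are treated as higher-is-better, matching their `>=` targets.

```rust
/// How a higher-is-better metric moved between two scans.
#[derive(Debug, PartialEq)]
pub enum Trend {
    Improved,
    Regressed,
    Unchanged,
}

/// `tolerance` absorbs small differences (for example, from re-sampling
/// conflicts) so they are not reported as wins or losses.
pub fn compare(baseline: f64, rescan: f64, tolerance: f64) -> Trend {
    let delta = rescan - baseline;
    if delta > tolerance {
        Trend::Improved
    } else if delta < -tolerance {
        Trend::Regressed
    } else {
        Trend::Unchanged
    }
}

/// A next-cycle target counts as met only when the re-scan value reaches it.
pub fn target_met(rescan: f64, target: f64) -> bool {
    rescan >= target
}
```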
## Decision Points
### Decision 1: Signal-to-Noise Acceptable?
Stop and evaluate:
| Condition | Next Action |
|-----------|-------------|
| Ratio >= 0.30 | Proceed to Phase 2 (persona audit) |
| Ratio < 0.30 AND < 10 total claims | Coverage problem - investigate extractors |
| Ratio < 0.30 AND >= 10 claims | Noise suppression needed - focus on Phase 5 |
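Decision 1 is a three-way branch. Below is the same logic as a Rust sketch. The `NextAction` enum is hypothetical; the 0.30 and 10-claim cutoffs come from the table.

```rust
/// Next step after the signal-to-noise check (mirrors the Decision 1 table).
#[derive(Debug, PartialEq)]
pub enum NextAction {
    /// Ratio >= 0.30: proceed to Phase 2.
    PersonaAudit,
    /// Ratio < 0.30 with fewer than 10 claims: investigate extractors.
    CoverageProblem,
    /// Ratio < 0.30 with 10 or more claims: focus on Phase 5.
    NoiseSuppression,
}

pub fn decide_signal_to_noise(ratio: f64, total_claims: u32) -> NextAction {
    if ratio >= 0.30 {
        NextAction::PersonaAudit
    } else if total_claims < 10 {
        NextAction::CoverageProblem
    } else {
        NextAction::NoiseSuppression
    }
}
```

An empty scan can be passed as ratio `0.0`; it lands in the coverage branch, which matches the table's intent.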
### Decision 2: Noise Reduction Strategy?
Stop and choose approach based on noise pattern:
| Pattern | Strategy | Implementation |
|---------|----------|----------------|
| Entire directories are noise | `.aphoriaignore` | Add directory glob patterns |
| Specific extractor produces noise | `aphoria.toml` | Disable extractor for project |
| Specific file/line is noise | Inline ignore | Add `// aphoria-ignore: reason` |
| Low confidence claims are noise | Threshold tuning | Raise confidence threshold |
### Decision 3: Coverage Gaps Acceptable?
Stop and evaluate each gap:
| Gap Location | Verdict | Action |
|--------------|---------|--------|
| auth/, crypto/, network/, secrets/ | BLOCKER | Must add extractors or investigate |
| test/, fixtures/, examples/ | ACCEPTABLE | Document and proceed |
| vendor/, third-party/, node_modules/ | ACCEPTABLE | Document exclusion rationale |
| Other production code | WARNING | Investigate why no claims |
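The Decision 3 table can be applied to each zero-claim path by matching its directory segments. A minimal sketch, with the directory names taken from the table. The table does not say which row wins when a path matches both lists, such as `vendor/crypto`. This sketch assumes BLOCKER wins, the conservative choice; reverse the two checks if fixture paths such as `test/auth/` produce false blockers.

```rust
/// Verdict for a path that produced zero claims.
#[derive(Debug, PartialEq)]
pub enum GapVerdict {
    Blocker,
    Acceptable,
    Warning,
}

// Directory names from the Decision 3 table.
const CRITICAL_DIRS: &[&str] = &["auth", "crypto", "network", "secrets"];
const EXEMPT_DIRS: &[&str] =
    &["test", "fixtures", "examples", "vendor", "third-party", "node_modules"];

/// True if any `/`-separated segment of `path` equals one of `names`.
fn has_segment(path: &str, names: &[&str]) -> bool {
    path.split('/').any(|seg| names.iter().any(|n| *n == seg))
}

/// Critical segments are checked first (assumed precedence, see above).
pub fn classify_gap(path: &str) -> GapVerdict {
    if has_segment(path, CRITICAL_DIRS) {
        GapVerdict::Blocker
    } else if has_segment(path, EXEMPT_DIRS) {
        GapVerdict::Acceptable
    } else {
        GapVerdict::Warning
    }
}
```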
## Output Template
Generate a markdown report following this structure:
```markdown
# Aphoria Self-Review Report
**Project:** {project_name}
**Scan Date:** {date}
**Primary Persona:** {Developer/Auditor/ADK User/SDET}
**Total Claims:** {n}
**Total Conflicts:** {n}
## Signal-to-Noise Analysis
| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| Actionability Ratio | {x.xx} | >= 0.30 | {PASS/FAIL} |
| Block Verdicts | {n} | - | - |
| Flag Verdicts | {n} | - | - |
| Pass Verdicts | {n} | - | - |
### Top Claim Types
| Claim Type | Count | Classification |
|------------|-------|----------------|
| {type} | {n} | Signal/Noise |
| ... | ... | ... |
## Persona Value Audit
| Persona | Useful | Noise | Score | Status |
|---------|--------|-------|-------|--------|
| Developer | {n} | {n} | {x.xx} | {PASS/FAIL} |
| Auditor | {n} | {n} | {x.xx} | {PASS/FAIL} |
| ADK User | {n} | {n} | {x.xx} | {PASS/FAIL} |
| SDET | {n} | {n} | {x.xx} | {PASS/FAIL} |
## Coverage Analysis
### Active Extractors
| Extractor | File Types | Claims |
|-----------|------------|--------|
| {name} | {types} | {n} |
| ... | ... | ... |
### Coverage Gaps
| Path Pattern | Severity | Rationale |
|--------------|----------|-----------|
| {pattern} | {BLOCKER/WARNING/OK} | {reason} |
## Conflict Quality
| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| Conflict Precision | {x.xx} | >= 0.80 | {PASS/FAIL} |
| True Positives | {n} | - | - |
| False Positives | {n} | - | - |
### False Positive Patterns
{List common false positive causes}
## Recommendations
### Priority 1: Immediate Actions
{List high-impact changes}
### Priority 2: Configuration Changes
{Exact config changes, copy-pasteable}
### Priority 3: Future Improvements
{Longer-term suggestions}
## Action Plan
1. {First action with exact command/change}
2. {Second action}
3. Re-scan: `aphoria scan --persist --format json`
4. Measure improvement against baseline
## Targets for Next Review
- Actionability Ratio: {current} -> {target}
- Conflict Precision: {current} -> {target}
- Primary Persona Score: {current} -> {target}
```
## Constraints
1. **Never fabricate data.** All metrics must come from actual scan results.
2. **Sample fairly.** When reviewing conflicts, sample across extractors and file types.
3. **Document rationale.** Every recommendation needs a clear "why".
4. **Be specific.** Config changes must be copy-pasteable.
5. **Measure twice.** Always define verification steps.
## Example Invocations
```text
# User triggers
"review aphoria scan"
"check scan quality"
"reduce aphoria noise"
"aphoria self-review"
"audit scan coverage"
"why are there so many conflicts?"
```
## Related Skills
- `aphoria-dev`: Development guidelines for Aphoria
- `aphoria-remediate`: Fix conflicts after identifying them
- `aphoria-install`: Install and configure Aphoria

@@ -0,0 +1,124 @@
# Conflict Quality Checklist
Use this checklist during Phase 4 of the Self-Review SOP.
## Sampling Strategy
- Total conflicts in scan: ___
- Sample size (minimum 10): ___
- Sampling approach: Random / Stratified by extractor / Stratified by file
## Conflict Review Template
For each sampled conflict:
### Conflict #1
- **File:**
- **Line:**
- **Claim:**
- **Conflicting Authority:**
- **Conflict Score:**
- **Verdict:** Block / Flag / Pass
**Assessment:**
- [ ] True Positive: Real conflict that matters
- [ ] False Positive: Not actually a conflict
- [ ] True but Not Actionable: Correct but user can't/won't fix
**Rationale:**
---
### Conflict #2
- **File:**
- **Line:**
- **Claim:**
- **Conflicting Authority:**
- **Conflict Score:**
- **Verdict:** Block / Flag / Pass
**Assessment:**
- [ ] True Positive
- [ ] False Positive
- [ ] True but Not Actionable
**Rationale:**
---
(Repeat for 10+ conflicts)
## Precision Calculation
| Category | Count |
|----------|-------|
| True Positives | |
| False Positives | |
| True but Not Actionable | |
| **Total Sampled** | |
**Conflict Precision:** True Positives / Total Sampled = ___ / ___ = ___
**Status:** PASS (>= 0.80) / FAIL (< 0.80)
## False Positive Pattern Analysis
Group false positives by cause:
| Pattern | Count | Examples | Root Cause |
|---------|-------|----------|------------|
| Wrong concept path matching | | | |
| Stale authority data | | | |
| Context not considered | | | |
| Overly broad regex | | | |
| Test/fixture misidentified | | | |
| Other: ___ | | | |
## Extractor Quality
| Extractor | Conflicts Sampled | True Positive Rate | Issues |
|-----------|-------------------|-------------------|--------|
| | | | |
| | | | |
| | | | |
## High-Value True Positives
List conflicts that provided genuine value:
1. **File:** ___ — Why: ___
2. **File:** ___ — Why: ___
3. **File:** ___ — Why: ___
## Problematic False Positives
List conflicts that wasted time or caused confusion:
1. **File:** ___ — Problem: ___
2. **File:** ___ — Problem: ___
3. **File:** ___ — Problem: ___
## Authority Quality Issues
| Authority Source | Issue | Impact |
|-----------------|-------|--------|
| | | |
| | | |
## Recommendations from Conflict Review
Based on the conflict quality analysis:
1. **Suppress these patterns:** ___
2. **Fix these extractors:** ___
3. **Update these authorities:** ___
4. **Tune these thresholds:** ___
## Outcome
- [ ] Sample size >= 10
- [ ] All conflicts assessed
- [ ] Precision calculated
- [ ] False positive patterns documented
- [ ] Problematic extractors identified
- [ ] Recommendations generated

@@ -0,0 +1,104 @@
# Coverage Analysis Checklist
Use this checklist during Phase 3 of the Self-Review SOP.
## Active Extractors Inventory
List all extractors that produced claims:
| Extractor | File Types | Claims Count | Top Files |
|-----------|------------|--------------|-----------|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
## File Type Coverage
| File Extension | Extractor(s) | Claims | Status |
|----------------|--------------|--------|--------|
| .rs | | | Covered / Gap |
| .toml | | | Covered / Gap |
| .json | | | Covered / Gap |
| .yaml/.yml | | | Covered / Gap |
| .go | | | Covered / Gap |
| .py | | | Covered / Gap |
| .ts/.js | | | Covered / Gap |
| .env | | | Covered / Gap |
| Dockerfile | | | Covered / Gap |
| Other: ___ | | | Covered / Gap |
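The File Type Coverage table can be filled in mechanically once per-file claim counts are available. This is a minimal sketch. It assumes a hypothetical input of `(path, claim_count)` pairs that covers every scanned file, including files that produced zero claims. A report that omits such files cannot reveal gaps this way.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Roll per-file claim counts up by file type. Files with an extension are
/// keyed by it (".rs"); extensionless files such as `Dockerfile` or `.env`
/// are keyed by their file name.
pub fn coverage_by_type(files: &[(&str, usize)]) -> BTreeMap<String, usize> {
    let mut by_type = BTreeMap::new();
    for &(path, claims) in files {
        let p = Path::new(path);
        let key = match p.extension() {
            Some(ext) => format!(".{}", ext.to_string_lossy()),
            None => p
                .file_name()
                .map(|name| name.to_string_lossy().into_owned())
                .unwrap_or_default(),
        };
        *by_type.entry(key).or_insert(0) += claims;
    }
    by_type
}

/// File types whose total claim count is zero: the "Gap" rows of the table.
pub fn gaps(by_type: &BTreeMap<String, usize>) -> Vec<&str> {
    by_type
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(file_type, _)| file_type.as_str())
        .collect()
}
```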
## Security-Critical Path Audit
These paths MUST have coverage:
### Authentication (`auth/`, `authn/`, `login/`)
- [ ] Path exists in project: YES / NO
- [ ] Claims extracted: ___
- [ ] Extractors active: ___
- [ ] **Status:** Covered / BLOCKER
### Cryptography (`crypto/`, `encryption/`, `tls/`, `ssl/`)
- [ ] Path exists in project: YES / NO
- [ ] Claims extracted: ___
- [ ] Extractors active: ___
- [ ] **Status:** Covered / BLOCKER
### Networking (`network/`, `http/`, `api/`, `rpc/`)
- [ ] Path exists in project: YES / NO
- [ ] Claims extracted: ___
- [ ] Extractors active: ___
- [ ] **Status:** Covered / BLOCKER
### Secrets (`secrets/`, `credentials/`, `.env`)
- [ ] Path exists in project: YES / NO
- [ ] Claims extracted: ___
- [ ] Extractors active: ___
- [ ] **Status:** Covered / BLOCKER
### Authorization (`authz/`, `permissions/`, `acl/`, `rbac/`)
- [ ] Path exists in project: YES / NO
- [ ] Claims extracted: ___
- [ ] Extractors active: ___
- [ ] **Status:** Covered / BLOCKER
## Acceptable Gaps
Gaps that are expected and documented:
| Path Pattern | Reason for Exclusion |
|--------------|---------------------|
| `test/`, `tests/`, `*_test.rs` | Test fixtures, not production |
| `fixtures/`, `testdata/` | Mock data for testing |
| `examples/`, `demo/` | Documentation, not production |
| `vendor/`, `node_modules/` | Third-party code |
| `target/`, `dist/`, `build/` | Generated artifacts |
## Zero-Extraction Analysis
Files/directories with no claims:
| Path | Expected? | Investigation Needed? |
|------|-----------|----------------------|
| | YES / NO | YES / NO |
| | YES / NO | YES / NO |
| | YES / NO | YES / NO |
## Missing Extractor Analysis
Patterns that should be extracted but aren't:
| Pattern | Example File | Suggested Extractor |
|---------|--------------|---------------------|
| | | |
| | | |
## Outcome
- [ ] All active extractors documented
- [ ] File type coverage assessed
- [ ] Security-critical paths checked
- [ ] BLOCKER gaps identified: ___
- [ ] Acceptable gaps documented
- [ ] Missing extractors identified

@@ -0,0 +1,130 @@
# Persona Value Audit Checklist
Use this checklist during Phase 2 of the Self-Review SOP.
## Persona Definitions
### Developer Persona
**Question:** "What do I need to fix before commit?"
Useful claims:
- Security misconfigurations that block merge
- API contract violations
- Deprecated pattern usage
- Missing required validations
Noise claims:
- Informational metadata
- Correct defaults that don't need action
- Claims about unchanged code
### Auditor Persona
**Question:** "What's the risk posture?"
Useful claims:
- Security-relevant configurations
- Compliance violations
- Trust boundary crossings
- Cryptographic choices
Noise claims:
- Non-security configurations
- Style/formatting observations
- Internal implementation details
### ADK User Persona
**Question:** "What context should my agent see?"
Useful claims:
- API endpoints and contracts
- Authentication requirements
- Rate limits and quotas
- Error handling patterns
Noise claims:
- Internal implementation details
- Test-only configurations
- Build system metadata
### SDET Persona
**Question:** "Are we testing the right things?"
Useful claims:
- Security-critical code paths
- External integrations
- Error conditions
- Configuration variations
Noise claims:
- Already-tested patterns
- Trivial configurations
- Generated code
## Per-Persona Audit
### Developer
| Claim Type | Count | Classification | Rationale |
|------------|-------|----------------|-----------|
| | | Useful / Noise | |
| | | Useful / Noise | |
| | | Useful / Noise | |
- Useful claims: ___
- Noise claims: ___
- **Persona Value Score:** ___ / ___ = ___
- **Status:** PASS (>= 0.70) / FAIL (< 0.70)
### Auditor
| Claim Type | Count | Classification | Rationale |
|------------|-------|----------------|-----------|
| | | Useful / Noise | |
| | | Useful / Noise | |
| | | Useful / Noise | |
- Useful claims: ___
- Noise claims: ___
- **Persona Value Score:** ___ / ___ = ___
- **Status:** PASS (>= 0.70) / FAIL (< 0.70)
### ADK User
| Claim Type | Count | Classification | Rationale |
|------------|-------|----------------|-----------|
| | | Useful / Noise | |
| | | Useful / Noise | |
| | | Useful / Noise | |
- Useful claims: ___
- Noise claims: ___
- **Persona Value Score:** ___ / ___ = ___
- **Status:** PASS (>= 0.70) / FAIL (< 0.70)
### SDET
| Claim Type | Count | Classification | Rationale |
|------------|-------|----------------|-----------|
| | | Useful / Noise | |
| | | Useful / Noise | |
| | | Useful / Noise | |
- Useful claims: ___
- Noise claims: ___
- **Persona Value Score:** ___ / ___ = ___
- **Status:** PASS (>= 0.70) / FAIL (< 0.70)
## Cross-Persona Analysis
Claims useful to multiple personas (high value):
- ___
Claims that are noise for ALL personas (should suppress):
- ___
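Both lists come from set arithmetic over the per-persona audits. The sketch below assumes each persona's Useful classifications from the tables above have been collected into a set of claim type names.

```rust
use std::collections::BTreeSet;

/// Given every claim type seen in the scan and, per persona, the set of
/// claim types classified Useful, return
/// (types useful to two or more personas, types useful to none).
pub fn cross_persona<'a>(
    all_types: &BTreeSet<&'a str>,
    useful_by_persona: &[BTreeSet<&'a str>],
) -> (BTreeSet<&'a str>, BTreeSet<&'a str>) {
    let mut multi = BTreeSet::new();
    let mut none = BTreeSet::new();
    for &claim_type in all_types {
        let hits = useful_by_persona
            .iter()
            .filter(|useful| useful.contains(claim_type))
            .count();
        if hits >= 2 {
            multi.insert(claim_type);
        }
        if hits == 0 {
            none.insert(claim_type);
        }
    }
    (multi, none)
}
```

The second set is the suppression candidate list. The first set contains the claims to protect when tuning thresholds.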
## Outcome
- [ ] Primary persona identified
- [ ] All four personas audited
- [ ] Scores calculated
- [ ] Failing personas identified for improvement

@@ -0,0 +1,76 @@
# Signal-to-Noise Checklist
Use this checklist during Phase 1 of the Self-Review SOP.
## Verdict Counts
- [ ] Count Block verdicts: ___
- [ ] Count Flag verdicts: ___
- [ ] Count Pass verdicts: ___
- [ ] Count Ack verdicts: ___
- [ ] Total claims: ___
## Actionability Ratio
Formula: `(Block + Flag) / Total`
- [ ] Calculate ratio: ___
- [ ] Compare to threshold (0.30): PASS / FAIL
## Top Claim Types
List the 5 most frequent claim types:
| Rank | Claim Type | Count | Classification |
|------|------------|-------|----------------|
| 1 | | | Signal / Noise |
| 2 | | | Signal / Noise |
| 3 | | | Signal / Noise |
| 4 | | | Signal / Noise |
| 5 | | | Signal / Noise |
## Signal Classification Criteria
A claim is **Signal** if it helps answer:
- "What do I need to fix before commit?" (Developer)
- "What's the risk posture?" (Auditor)
- "What context should my agent see?" (ADK User)
- "Are we testing the right things?" (SDET)
A claim is **Noise** if:
- It describes build metadata (Cargo.toml version, package name)
- It extracts from test fixtures or mock data
- It duplicates information available elsewhere
- It has no actionable remediation
- It would never block a commit or change a decision
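Some of these criteria can be checked mechanically before manual review. Duplication and "would never change a decision" still need human judgment. The sketch assumes a hypothetical `ClaimView` holding the claim's file path, its predicate, and whether a remediation exists. The `name`/`version` predicates used for package metadata are assumptions, not Aphoria's confirmed predicate names.

```rust
/// Hypothetical, minimal view of a claim: just enough for the heuristic.
pub struct ClaimView<'a> {
    pub file: &'a str,
    pub predicate: &'a str,
    pub has_remediation: bool,
}

// Fixture directory names from the coverage checklist.
const FIXTURE_DIRS: &[&str] = &["test", "tests", "fixtures", "testdata"];

/// Return the first mechanical Noise criterion a claim trips, or None when
/// none applies and the claim is left for manual review.
pub fn noise_reason(claim: &ClaimView<'_>) -> Option<&'static str> {
    let file_name = claim.file.rsplit('/').next().unwrap_or(claim.file);
    let is_manifest = file_name == "Cargo.toml" || file_name == "package.json";
    // Only package-level name/version (assumed predicates) count as build
    // metadata; dependency claims from the same manifest are left alone.
    if is_manifest && (claim.predicate == "name" || claim.predicate == "version") {
        return Some("build metadata");
    }
    if claim.file.split('/').any(|seg| FIXTURE_DIRS.iter().any(|d| *d == seg)) {
        return Some("test fixture or mock data");
    }
    if !claim.has_remediation {
        return Some("no actionable remediation");
    }
    None
}
```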
## Noise Source Analysis
For each noise claim type, identify:
| Claim Type | Source Pattern | Noise Cause |
|------------|----------------|-------------|
| | Files: | Reason: |
| | Files: | Reason: |
| | Files: | Reason: |
## Common Noise Patterns
- [ ] Build/package metadata (Cargo.toml, package.json)
- [ ] Test fixtures and mock data
- [ ] Documentation examples
- [ ] Generated code
- [ ] Vendored dependencies
- [ ] Configuration defaults that are correct
## Noise Volume Assessment
- High noise directories: ___
- High noise file patterns: ___
- High noise extractors: ___
## Outcome
- [ ] Actionability ratio meets threshold (>= 0.30)
- [ ] Top noise sources identified
- [ ] Suppression strategy selected (see Decision 2: Noise Reduction Strategy)

@@ -47,6 +47,7 @@ A probabilistic knowledge graph database that stores Claims, not Facts. Append-o
| **Aphoria LLM eval** | Load skill: `aphoria-llm-optimization` |
| **General LLM optimization** | Load skill: `llm-optimization` |
| **Install Aphoria** | Load skill: `aphoria-install` |
| **Run Aphoria self-review** | Load skill: `aphoria-self-review` |
## Roadmap Maintenance

@@ -93,7 +93,7 @@ impl Default for TimeoutExtractorConfig {
impl Default for DepVersionConfig {
fn default() -> Self {
Self {
enabled: false, // OPT-IN: Disabled by default to reduce noise
enabled: false, // OPT-IN: Disabled by default to reduce noise
advisory_db: dirs_default_advisory_db(),
}
}

@@ -40,23 +40,16 @@ impl ApiKeySecurityExtractor {
// Rust: require_for_all: false
// Go: RequireForAll: false
// YAML: require_for_all: false
require_for_all_false: Regex::new(
r#"(?i)require_?for_?all\s*[:=]\s*false"#
)
.expect("valid regex"),
require_for_all_false: Regex::new(r#"(?i)require_?for_?all\s*[:=]\s*false"#)
.expect("valid regex"),
// Look for public_paths arrays - we'll count entries manually
// Handles Rust vec![...], Go []string{...}, YAML lists
public_paths_array: Regex::new(
r#"(?i)public_?paths\s*[:=]\s*(?:vec!|[\[\{])"#
)
.expect("valid regex"),
public_paths_array: Regex::new(r#"(?i)public_?paths\s*[:=]\s*(?:vec!|[\[\{])"#)
.expect("valid regex"),
// Using default rate limit constant
default_rate_limit: Regex::new(
r"DEFAULT_API_KEY_RATE_LIMIT"
)
.expect("valid regex"),
default_rate_limit: Regex::new(r"DEFAULT_API_KEY_RATE_LIMIT").expect("valid regex"),
}
}
@@ -157,7 +150,8 @@ impl Extractor for ApiKeySecurityExtractor {
line: line_num,
matched_text: line.trim().to_string(),
confidence,
description: "API key not required for all endpoints (require_for_all: false)".to_string(),
description: "API key not required for all endpoints (require_for_all: false)"
.to_string(),
});
}
@@ -199,7 +193,8 @@ impl Extractor for ApiKeySecurityExtractor {
line: line_num,
matched_text: line.trim().to_string(),
confidence: confidence * 0.7, // Lower confidence - might be intentional
description: "Using default API key rate limit without customization".to_string(),
description: "Using default API key rate limit without customization"
.to_string(),
});
}
}
@@ -251,12 +246,8 @@ api:
- /metrics
"#;
let claims = extractor.extract(
&["config".to_string()],
content,
Language::Yaml,
"config/api.yaml",
);
let claims =
extractor.extract(&["config".to_string()], content, Language::Yaml, "config/api.yaml");
assert!(!claims.is_empty());
let require_claim = claims.iter().find(|c| c.predicate == "require_api_key");
@@ -278,12 +269,8 @@ api:
]
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/middleware.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/middleware.rs");
assert!(!claims.is_empty());
let paths_claim = claims.iter().find(|c| c.predicate == "public_paths_count");
@@ -308,12 +295,8 @@ api:
]
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/middleware.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/middleware.rs");
// Should not flag this - only 3 paths
let paths_claim = claims.iter().find(|c| c.predicate == "public_paths_count");
@@ -327,12 +310,8 @@ api:
let rate_limit = record.rate_limit.unwrap_or(DEFAULT_API_KEY_RATE_LIMIT);
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/handlers.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/handlers.rs");
assert!(!claims.is_empty());
let rate_claim = claims.iter().find(|c| c.predicate == "using_default");
@@ -346,12 +325,8 @@ api:
pub const DEFAULT_API_KEY_RATE_LIMIT: u64 = 10_000;
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/config.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/config.rs");
// Should not flag constant definition
let rate_claim = claims.iter().find(|c| c.predicate == "using_default");
@@ -388,12 +363,7 @@ config := &AuthConfig{
}
"#;
let claims = extractor.extract(
&["go".to_string()],
content,
Language::Go,
"config.go",
);
let claims = extractor.extract(&["go".to_string()], content, Language::Go, "config.go");
assert!(!claims.is_empty());
let require_claim = claims.iter().find(|c| c.predicate == "require_api_key");

@@ -37,17 +37,12 @@ impl CircuitBreakerConfigExtractor {
pub fn new() -> Self {
Self {
// YAML/TOML: circuit_breaker_enabled: false
disabled_pattern: Regex::new(
r#"(?i)circuit_?breaker_?enabled\s*[:=]\s*false"#
)
.expect("valid regex"),
disabled_pattern: Regex::new(r#"(?i)circuit_?breaker_?enabled\s*[:=]\s*false"#)
.expect("valid regex"),
// Look for lines with just "enabled: false" in circuit breaker context
// We'll rely on the first pattern for most cases
config_disabled: Regex::new(
r"(?i)^\s*enabled\s*:\s*false"
)
.expect("valid regex"),
config_disabled: Regex::new(r"(?i)^\s*enabled\s*:\s*false").expect("valid regex"),
}
}
@@ -67,13 +62,7 @@ impl Extractor for CircuitBreakerConfigExtractor {
}
fn languages(&self) -> &[Language] {
&[
Language::Rust,
Language::Go,
Language::Yaml,
Language::Toml,
Language::Json,
]
&[Language::Rust, Language::Go, Language::Yaml, Language::Toml, Language::Json]
}
fn extract(
@@ -143,12 +132,8 @@ api:
timeout: 30s
"#;
let claims = extractor.extract(
&["config".to_string()],
content,
Language::Yaml,
"config/api.yaml",
);
let claims =
extractor.extract(&["config".to_string()], content, Language::Yaml, "config/api.yaml");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].predicate, "enabled");
@@ -165,12 +150,8 @@ circuit_breaker_enabled = false
timeout = 30
"#;
let claims = extractor.extract(
&["config".to_string()],
content,
Language::Toml,
"config.toml",
);
let claims =
extractor.extract(&["config".to_string()], content, Language::Toml, "config.toml");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].value, ObjectValue::Boolean(false));
@@ -186,12 +167,8 @@ timeout = 30
}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/config.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/config.rs");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].predicate, "enabled");
@@ -206,12 +183,8 @@ api:
failure_threshold: 5
"#;
let claims = extractor.extract(
&["config".to_string()],
content,
Language::Yaml,
"config/api.yaml",
);
let claims =
extractor.extract(&["config".to_string()], content, Language::Yaml, "config/api.yaml");
// Should not flag when enabled
assert_eq!(claims.len(), 0);
@@ -224,12 +197,8 @@ api:
circuit_breaker_enabled: false
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/config_test.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/config_test.rs");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].confidence, 0.5);
@@ -244,12 +213,7 @@ config := Config{
}
"#;
let claims = extractor.extract(
&["go".to_string()],
content,
Language::Go,
"config.go",
);
let claims = extractor.extract(&["go".to_string()], content, Language::Go, "config.go");
assert_eq!(claims.len(), 1);
}

@@ -39,13 +39,13 @@ impl ConstDeclarationsExtractor {
Self {
// const RAPL_POWER_UNIT: u32 = 0x606;
const_decl: Regex::new(
r"^\s*(?:pub\s+)?const\s+([A-Z_][A-Z0-9_]*)\s*:\s*(\w+)\s*=\s*([^;]+);"
r"^\s*(?:pub\s+)?const\s+([A-Z_][A-Z0-9_]*)\s*:\s*(\w+)\s*=\s*([^;]+);",
)
.expect("valid regex"),
// static MAX_CONNECTIONS: usize = 100;
static_decl: Regex::new(
r"^\s*(?:pub\s+)?static\s+([A-Z_][A-Z0-9_]*)\s*:\s*(\w+)\s*=\s*([^;]+);"
r"^\s*(?:pub\s+)?static\s+([A-Z_][A-Z0-9_]*)\s*:\s*(\w+)\s*=\s*([^;]+);",
)
.expect("valid regex"),
}
@@ -53,12 +53,7 @@ impl ConstDeclarationsExtractor {
/// Clean up the value string (remove comments, whitespace).
fn clean_value(&self, value: &str) -> String {
value
.split("//")
.next()
.unwrap_or(value)
.trim()
.to_string()
value.split("//").next().unwrap_or(value).trim().to_string()
}
/// Determine confidence based on context.
@@ -200,12 +195,8 @@ const RAPL_POWER_UNIT: u32 = 0x606;
pub const DEFAULT_TIMEOUT: u64 = 30;
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/lib.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/lib.rs");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].value, ObjectValue::Text("30".to_string()));
@@ -218,12 +209,8 @@ pub const DEFAULT_TIMEOUT: u64 = 30;
static MAX_CONNECTIONS: usize = 100;
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/server.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/server.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("static"));
@@ -237,12 +224,8 @@ static MAX_CONNECTIONS: usize = 100;
const TIMEOUT_MS: u64 = 5000; // 5 seconds
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/config.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/config.rs");
assert_eq!(claims.len(), 1);
// Comment should be stripped
@@ -256,12 +239,8 @@ const TIMEOUT_MS: u64 = 5000; // 5 seconds
const TEST_VALUE: u32 = 42;
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/lib_test.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/lib_test.rs");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].confidence, 0.5);

@@ -85,7 +85,8 @@ impl DepVersionsExtractor {
if in_dependencies {
if let Some(captures) = self.cargo_dep.captures(line) {
let package = captures.get(1).map(|m| m.as_str()).unwrap_or("");
let version = captures.get(2).or(captures.get(3)).map(|m| m.as_str()).unwrap_or("");
let version =
captures.get(2).or(captures.get(3)).map(|m| m.as_str()).unwrap_or("");
if !package.is_empty() && !version.is_empty() && version != "*" {
// Record the dependency for potential advisory lookup
@@ -397,7 +398,8 @@ tokio = "1.28"
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("tokio"));
assert!(!claims.iter().any(|c| c.concept_path.contains("name")));
assert!(!claims.iter().any(|c| c.concept_path.contains("version") && c.value == ObjectValue::Text("0.1.0".to_string())));
assert!(!claims.iter().any(|c| c.concept_path.contains("version")
&& c.value == ObjectValue::Text("0.1.0".to_string())));
}
#[test]

@@ -39,16 +39,11 @@ impl DerivePatternExtractor {
pub fn new() -> Self {
Self {
// Matches: #[derive(Debug, Clone, Serialize)]
derive_attr: Regex::new(
r#"#\[derive\s*\((.*?)\)\]"#
)
.expect("valid regex"),
derive_attr: Regex::new(r#"#\[derive\s*\((.*?)\)\]"#).expect("valid regex"),
// Matches struct/enum declarations
type_decl: Regex::new(
r"^\s*(?:pub\s+)?(?:struct|enum)\s+([A-Z][a-zA-Z0-9_]*)"
)
.expect("valid regex"),
type_decl: Regex::new(r"^\s*(?:pub\s+)?(?:struct|enum)\s+([A-Z][a-zA-Z0-9_]*)")
.expect("valid regex"),
}
}
@@ -78,8 +73,11 @@ impl DerivePatternExtractor {
"error"
} else if type_name.ends_with("Config") || type_name.ends_with("Settings") {
"config"
} else if type_name.ends_with("Request") || type_name.ends_with("Response")
|| type_name.ends_with("Message") || type_name.ends_with("Event") {
} else if type_name.ends_with("Request")
|| type_name.ends_with("Response")
|| type_name.ends_with("Message")
|| type_name.ends_with("Event")
{
"message"
} else if derives.iter().any(|d| d == "Serialize" || d == "Deserialize") {
"data" // Serializable types
@@ -239,12 +237,8 @@ pub enum WalletError {
}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/error.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/error.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("error"));
@@ -261,12 +255,8 @@ pub struct AppConfig {
}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/config.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/config.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("config"));
@@ -283,12 +273,8 @@ pub struct Wallet {
}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/lib.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/lib.rs");
assert_eq!(claims.len(), 1);
}
@@ -303,12 +289,8 @@ struct TestHelper {
}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/wallet_test.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/wallet_test.rs");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].confidence, 0.5);
@@ -322,12 +304,8 @@ struct TestHelper {
struct Foo {}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/lib.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/lib.rs");
assert_eq!(claims.len(), 1);
// Should be alphabetically sorted
@@ -357,7 +335,12 @@ pub struct AckMessage {
"#;
let claims = extractor.extract(
&["rust".to_string(), "maxwell".to_string(), "vsock".to_string(), "messages".to_string()],
&[
"rust".to_string(),
"maxwell".to_string(),
"vsock".to_string(),
"messages".to_string(),
],
content,
Language::Rust,
"src/vsock/messages.rs",

@@ -41,28 +41,24 @@ impl DurabilityConfigExtractor {
pub fn new() -> Self {
Self {
// Rust: DurabilityLevel::Eventual | ::Batched | ::Immediate
durability_enum: Regex::new(
r"DurabilityLevel::(Eventual|Batched|Immediate)"
)
.expect("valid regex"),
durability_enum: Regex::new(r"DurabilityLevel::(Eventual|Batched|Immediate)")
.expect("valid regex"),
// YAML: durability: "eventual" | "batched" | "immediate"
yaml_durability: Regex::new(
r#"(?i)durability\s*:\s*["']?(eventual|batched|immediate)["']?"#
r#"(?i)durability\s*:\s*["']?(eventual|batched|immediate)["']?"#,
)
.expect("valid regex"),
// TOML: fsync_strategy = "none" | "batched" | "immediate"
toml_fsync: Regex::new(
r#"(?i)fsync_strategy\s*=\s*["']?(none|batched|immediate)["']?"#
r#"(?i)fsync_strategy\s*=\s*["']?(none|batched|immediate)["']?"#,
)
.expect("valid regex"),
// Batched with parameters: DurabilityLevel::batched_with(max_writes, max_duration)
batched_pattern: Regex::new(
r"DurabilityLevel::batched(?:_with)?\("
)
.expect("valid regex"),
batched_pattern: Regex::new(r"DurabilityLevel::batched(?:_with)?\(")
.expect("valid regex"),
}
}
@@ -92,13 +88,7 @@ impl Extractor for DurabilityConfigExtractor {
}
fn languages(&self) -> &[Language] {
&[
Language::Rust,
Language::Go,
Language::Yaml,
Language::Toml,
Language::Json,
]
&[Language::Rust, Language::Go, Language::Yaml, Language::Toml, Language::Json]
}
fn extract(
@@ -237,12 +227,8 @@ mod tests {
DurabilityLevel::Batched { max_writes: 100, max_duration: Duration::from_secs(1) }
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/config.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/config.rs");
assert_eq!(claims.len(), 1);
if let ObjectValue::Text(ref value) = claims[0].value {
@@ -259,12 +245,8 @@ mod tests {
let guard = FsyncGuard::new(file, path, DurabilityLevel::Immediate);
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/guard.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/guard.rs");
assert_eq!(claims.len(), 1);
if let ObjectValue::Text(ref value) = claims[0].value {
@@ -307,12 +289,8 @@ fsync_strategy = "none"
max_file_size = 104857600
"#;
let claims = extractor.extract(
&["config".to_string()],
content,
Language::Toml,
"config.toml",
);
let claims =
extractor.extract(&["config".to_string()], content, Language::Toml, "config.toml");
assert_eq!(claims.len(), 1);
if let ObjectValue::Text(ref value) = claims[0].value {
@@ -329,12 +307,8 @@ max_file_size = 104857600
let level = DurabilityLevel::batched_with(50, Duration::from_millis(100));
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/journal.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/journal.rs");
assert_eq!(claims.len(), 1);
if let ObjectValue::Text(ref value) = claims[0].value {
@@ -352,12 +326,8 @@ max_file_size = 104857600
.with_durability(DurabilityLevel::Eventual);
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/wal_test.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/wal_test.rs");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].confidence, 0.5); // Test file gets reduced confidence
@@ -374,12 +344,8 @@ max_file_size = 104857600
}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/config.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/config.rs");
assert_eq!(claims.len(), 2);
// Should detect both eventual and immediate

@@ -37,16 +37,12 @@ impl ImportGraphExtractor {
Self {
// Matches: use tokio::runtime::Runtime;
// Captures the root crate name
use_statement: Regex::new(
r"^\s*(?:pub\s+)?use\s+([a-zA-Z_][a-zA-Z0-9_]*)"
)
.expect("valid regex"),
use_statement: Regex::new(r"^\s*(?:pub\s+)?use\s+([a-zA-Z_][a-zA-Z0-9_]*)")
.expect("valid regex"),
// For grouped imports: use tokio::{...};
use_group: Regex::new(
r"^\s*(?:pub\s+)?use\s+([a-zA-Z_][a-zA-Z0-9_]*)::\{"
)
.expect("valid regex"),
use_group: Regex::new(r"^\s*(?:pub\s+)?use\s+([a-zA-Z_][a-zA-Z0-9_]*)::\{")
.expect("valid regex"),
}
}
@@ -163,9 +159,8 @@ use std::sync::Arc;
assert_eq!(claims.len(), 3);
// Check that we captured the right crates
let crate_names: Vec<_> = claims.iter()
.filter_map(|c| c.concept_path.split('/').last())
.collect();
let crate_names: Vec<_> =
claims.iter().filter_map(|c| c.concept_path.split('/').last()).collect();
assert!(crate_names.contains(&"tokio"));
assert!(crate_names.contains(&"serde"));
@@ -199,12 +194,8 @@ use super::common;
use self::internal;
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/lib.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/lib.rs");
// Should not create claims for crate/super/self
assert_eq!(claims.len(), 0);
@@ -219,12 +210,8 @@ use tokio::sync::Mutex;
use tokio::time::sleep;
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/lib.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/lib.rs");
// Should only create one claim for "tokio" even though it's imported 3 times
assert_eq!(claims.len(), 1);
@@ -238,12 +225,8 @@ use tokio::time::sleep;
use tokio::runtime::Runtime;
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/wallet_test.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/wallet_test.rs");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].confidence, 0.5); // Test file gets reduced confidence

@@ -94,9 +94,9 @@ mod tls_verify;
mod tls_version;
mod traits;
mod unreal_config;
mod unsafe_atomic;
mod unreal_cpp;
mod unreal_performance;
mod unsafe_atomic;
mod unvalidated_redirects;
mod weak_crypto;
mod weak_password;

@@ -37,16 +37,11 @@ impl UnsafeAtomicExtractor {
pub fn new() -> Self {
Self {
// Ordering::SeqCst, Ordering::Relaxed, etc.
ordering_pattern: Regex::new(
r"Ordering::(SeqCst|Acquire|Release|AcqRel|Relaxed)"
)
.expect("valid regex"),
ordering_pattern: Regex::new(r"Ordering::(SeqCst|Acquire|Release|AcqRel|Relaxed)")
.expect("valid regex"),
// unsafe keyword (blocks or functions)
unsafe_keyword: Regex::new(
r"\b(unsafe)\s*(\{|fn)"
)
.expect("valid regex"),
unsafe_keyword: Regex::new(r"\b(unsafe)\s*(\{|fn)").expect("valid regex"),
}
}
@@ -130,7 +125,10 @@ impl Extractor for UnsafeAtomicExtractor {
line: 1,
matched_text: format!("{} unsafe blocks/functions", unsafe_count),
confidence: confidence * 0.9, // Slightly lower as this is a summary
description: format!("File contains {} unsafe block(s) or function(s)", unsafe_count),
description: format!(
"File contains {} unsafe block(s) or function(s)",
unsafe_count
),
});
}
@@ -159,8 +157,8 @@ self.balance.store(new_balance, Ordering::SeqCst);
// Should have one claim for SeqCst (deduplicated)
assert!(claims.iter().any(|c| {
c.concept_path.contains("atomics/ordering") &&
c.value == ObjectValue::Text("SeqCst".to_string())
c.concept_path.contains("atomics/ordering")
&& c.value == ObjectValue::Text("SeqCst".to_string())
}));
}
@@ -173,17 +171,12 @@ let b = atomic.load(Ordering::Relaxed);
atomic.store(x, Ordering::Release);
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/sync.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/sync.rs");
// Should have 3 distinct ordering claims (Acquire, Relaxed, Release)
let ordering_claims: Vec<_> = claims.iter()
.filter(|c| c.concept_path.contains("ordering"))
.collect();
let ordering_claims: Vec<_> =
claims.iter().filter(|c| c.concept_path.contains("ordering")).collect();
assert_eq!(ordering_claims.len(), 3);
}
@@ -197,12 +190,8 @@ unsafe {
}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/lib.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/lib.rs");
// Should have one unsafe count claim
let unsafe_claim = claims.iter().find(|c| c.concept_path.contains("unsafe/count"));
@@ -219,12 +208,8 @@ unsafe fn read_msr(reg: u32) -> u64 {
}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/msr.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/msr.rs");
let unsafe_claim = claims.iter().find(|c| c.concept_path.contains("unsafe"));
assert!(unsafe_claim.is_some());
@@ -247,12 +232,8 @@ fn bar() {
}
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/lib.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/lib.rs");
let unsafe_claim = claims.iter().find(|c| c.concept_path.contains("unsafe/count")).unwrap();
assert_eq!(unsafe_claim.value, ObjectValue::Number(3.0)); // 1 fn + 2 blocks
@@ -265,12 +246,8 @@ fn bar() {
unsafe { test_something(); }
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::Rust,
"src/test.rs",
);
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/test.rs");
assert!(!claims.is_empty());
// Confidence should be reduced for test files
@@ -318,10 +295,8 @@ impl Wallet {
);
// Should detect SeqCst ordering (all wallet ops use it consistently)
assert!(claims.iter().any(|c|
c.concept_path.contains("ordering") &&
c.value == ObjectValue::Text("SeqCst".to_string())
));
assert!(claims.iter().any(|c| c.concept_path.contains("ordering")
&& c.value == ObjectValue::Text("SeqCst".to_string())));
// Should NOT have unsafe claims (no unsafe code)
assert!(!claims.iter().any(|c| c.concept_path.contains("unsafe")));

@@ -61,6 +61,7 @@ pub async fn scan(
sync: false,
file_source: aphoria::FileSource::All,
benchmark: false,
show_claims: false,
};
// Execute scan