stemedb/applications/aphoria/uat/2026-02-03-citadel-scan-v1.md
jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command
  injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:24:14 -07:00

263 lines
8.3 KiB
Markdown

# UAT Report: Citadel Scan
**Date:** 2026-02-03
**Target:** `/Users/jordanwashburn/Workspace/orchard9/citadel`
**Aphoria Version:** 0.1.0
**Status:** PASS
---
## Executive Summary
Ran Aphoria scan against the Citadel codebase (a production Rust monorepo with ~50 crates). Initial scan revealed critical precision issues with extractors producing 98% false positives. After bug fixes, precision improved to 100% with only 3 true positive findings remaining.
| Metric | Before Fixes | After Fixes |
|--------|--------------|-------------|
| Files scanned | 1,438 | 1,438 |
| Claims extracted | 1,259 | 1,259 |
| Conflicts reported | 168 | 3 |
| True positives | ~3 | 3 |
| False positives | ~165 | 0 |
| Precision | ~2% | **100%** |
| Scan time | 1.7s | 1.7s |
---
## Test Environment
### Target Codebase: Citadel
- **Type:** Rust workspace with ~50 crates
- **Domain:** Log aggregation and observability platform
- **Notable crates:**
- `citadel-auth`, `citadel-auth-db`, `citadel-auth-oidc` (authentication)
- `citadel-api` (HTTP API)
- `citadel-agent` (log collection agent)
- `ingestion-edge`, `tiered-storage` (data pipeline)
- `citadel-community` (Next.js frontend)
- **Languages:** Rust, TypeScript, YAML, TOML, JSON
### Scan Configuration
```bash
cargo run --release -- scan /Users/jordanwashburn/Workspace/orchard9/citadel
```
- Mode: Ephemeral (no persistence)
- Format: Table (default), JSON for analysis
---
## Bugs Discovered and Fixed
### Bug #1: Rate Limit Regex Matches Embedded Zeros (HIGH)
**Symptom:** 150 of 168 conflicts were rate_limit false positives.
**Root Cause:** The regex pattern `r"(?i)(rate_?limit|ratelimit).*(?:disabled|off|false|0|none|skip)"` matched any line containing "rate limit" followed by any `0` digit anywhere.
**Example False Positive:**
```rust
// Line: RateLimiter::new(10); // 10 req/min
// Matched because: "RateLimiter" contains "rate...limit", and "10" contains "0"
```
**Fix:** Changed pattern to require assignment context:
```rust
// Before
r"(?i)(rate_?limit|ratelimit).*(?:disabled|off|false|0|none|skip)"
// After
r"(?i)(rate_?limit|ratelimit)(?:_enabled)?\s*[:=]\s*(?:disabled|off|false|\b0\b|none|skip)"
```
**Key changes:**
- Added `(?:_enabled)?` to match `ratelimit_enabled = false`
- Required `\s*[:=]\s*` for assignment context
- Changed `0` to `\b0\b` (word boundary) to avoid matching `10`, `100`, etc.
**File:** `src/extractors/rate_limit.rs:55-58`
---
### Bug #2: Rate Limit Numeric Pattern Doesn't Handle Underscores (MEDIUM)
**Symptom:** `rateLimitPerTenant: 100_000` not detected as high limit.
**Root Cause:** Regex `\d+` doesn't match underscore-separated numbers common in YAML/Rust.
**Fix:**
1. Changed pattern to `[\d_]+` to capture underscored numbers
2. Added underscore stripping before parsing: `value_str.chars().filter(|c| *c != '_').collect()`
**File:** `src/extractors/rate_limit.rs:59-70, 143-146`
---
### Bug #3: JWT Audience Pattern Matches "audit" (MEDIUM)
**Symptom:** `audit_log_path.is_none()` flagged as "JWT audience validation disabled".
**Root Cause:** Pattern `aud.*None` matched `audit_log_path...None` because `aud` is a prefix of `audit`.
**Example False Positive:**
```rust
// Line: assert!(config.audit_log_path.is_none());
// Matched because: "audit" starts with "aud", followed eventually by "None"
```
**Fix:** Made pattern specific to JWT audience contexts:
```rust
// Before
r"(?i)(set_audience.*\[\]|validate_aud.*false|aud.*None|ValidateAudience.*false)"
// After
r"(?i)(set_audience\s*\(\s*(?:vec!)?\s*\[\s*\]\s*\)|validate_aud\w*\s*[:=]\s*false|\baud(?:ience)?\s*[:=]\s*None|ValidateAudience\s*[:=]\s*false)"
```
**Key changes:**
- Used `\baud(?:ience)?` with word boundary
- Required assignment context `[:=]`
**File:** `src/extractors/jwt_config.rs:40-47`
---
### Bug #4: JWT Signature Verify Pattern Too Broad (MEDIUM)
**Symptom:** `insecure_skip_verify = false` flagged as "signature verification disabled" when it's actually the SECURE setting.
**Root Cause:** Pattern `verify.*false` matched any line with "verify" followed by "false".
**Example False Positives:**
```rust
// Line: insecure_skip_verify = false (SECURE - not skipping!)
// Line: const [isVerifying, setIsVerifying] = useState(false) (React UI state)
```
**Fix:** Required specific `verify_signature` or `signature_verify` context:
```rust
// Before
r"(?i)(dangerous_insecure|skip_signature|verify.*false|RequireSignedTokens.*false)"
// After
r"(?i)(dangerous_insecure|skip_signature|(?:verify_signature|signature_verify)\s*[:=]\s*false|RequireSignedTokens\s*[:=]\s*false)"
```
**File:** `src/extractors/jwt_config.rs:60-69`
---
### Bug #5: JWT Algorithm None Matches Documentation (LOW)
**Symptom:** Documentation line `(no 'alg: none')` flagged when it's describing PREVENTION.
**Example False Positive:**
```rust
// Line: /// - Algorithm whitelist (no `alg: none`)
// This is documentation saying alg:none is NOT allowed (secure!)
```
**Fix:** Added post-processing filter to exclude prevention documentation:
```rust
let is_prevention_doc = lower_line.contains("no alg")
|| lower_line.contains("no `alg")
|| lower_line.contains("whitelist")
|| lower_line.contains("reject")
// ... etc
```
**File:** `src/extractors/jwt_config.rs:170-183`
---
## Regression Tests Added
| Test | Purpose | File |
|------|---------|------|
| `test_rate_limiter_constructor_not_flagged` | `RateLimiter::new(10)` not flagged | rate_limit.rs |
| `test_rate_limit_constant_not_flagged` | `RATE_LIMIT: u64 = 100` not flagged | rate_limit.rs |
| `test_rate_limit_per_tenant_not_flagged` | Underscore numbers handled | rate_limit.rs |
| `test_rate_limit_explicitly_zero` | `rate_limit = 0` IS flagged | rate_limit.rs |
| `test_rate_limit_documentation_not_flagged` | Doc comments not flagged | rate_limit.rs |
| `test_skip_verify_false_not_flagged` | `skip_verify = false` not flagged | jwt_config.rs |
| `test_ui_state_not_flagged` | `isVerifying` state not flagged | jwt_config.rs |
| `test_verify_release_not_flagged` | Non-JWT verify functions | jwt_config.rs |
| `test_actual_signature_verify_disabled` | True positives still caught | jwt_config.rs |
---
## Final Scan Results
### Command
```bash
cargo run --release -- scan /Users/jordanwashburn/Workspace/orchard9/citadel
```
### Output Summary
```
3 BLOCK, 0 FLAG, 0 PASS
```
### Remaining True Positives
| File | Line | Finding |
|------|------|---------|
| `tools/citadel-cli/src/commands/agent.rs` | 903 | API key hardcoded in source |
| `crates/citadel-cli/src/commands/query.rs` | 736 | API key hardcoded in source |
| `crates/citadel-agent/src/config.rs` | 101 | API key hardcoded in source |
**Analysis:** All 3 are example API keys in test code or documentation. They follow the pattern `ck_live_*` or `ck_test_*` which correctly matches the hardcoded secrets extractor. These are true positives (low severity) that can be:
1. Fixed by moving examples to environment variables
2. Acknowledged with `aphoria ack` if intentional for documentation
---
## Quality Gates Passed
| Check | Status |
|-------|--------|
| `cargo test` | 158 passed, 0 failed |
| `cargo clippy -- -D warnings` | 0 warnings |
| `cargo fmt --check` | Formatted |
---
## Known Limitations
1. **No TLS conflicts found:** Citadel's TLS configuration appears compliant, or patterns aren't matching their specific setup.
2. **Secrets extractor catches test keys:** Low-severity true positives in test/doc code. May want to add confidence reduction for `test` or `example` path segments.
3. **Algorithm none filter is heuristic:** The documentation filter uses keyword matching which could miss edge cases or over-filter.
---
## Recommendations
1. **Add path-aware confidence:** Reduce confidence for findings in `/tests/`, `/examples/`, or files containing "test" in the name.
2. **Consider YAML-aware extraction:** The current line-by-line regex approach could be improved with proper YAML/TOML parsing for structured configs.
3. **Expand test corpus:** Use Citadel findings to build a larger regression test suite.
4. **Document patterns:** Add ai-lookup entries explaining the regex patterns for future maintainers.
---
## Appendix: Conflict Categories
### Before Fixes
| Category | Count | % |
|----------|-------|---|
| rate_limit | 150 | 89% |
| jwt | 15 | 9% |
| secrets | 3 | 2% |
| **Total** | **168** | |
### After Fixes
| Category | Count | % |
|----------|-------|---|
| secrets | 3 | 100% |
| **Total** | **3** | |