Aphoria Roadmap
Phase 0: StemeDB Foundation
Tracked in: roadmap.md § 5D. Concept Hierarchy
Changes to the core database that Aphoria depends on. These ship before the CLI and are tracked in the main StemeDB roadmap as Phase 5D.
| Aphoria Phase 0 | StemeDB Phase 5D | Status |
|---|---|---|
| 0.1 ConceptPath Type | 5D.1 ConceptPath Type | ⬜ |
| 0.2 ConceptPath in Assertion | (implicit in 5D.1) | ⬜ |
| 0.3 Hierarchical Index | 5D.4 Hierarchical Query | ⬜ |
| 0.4 Alias Store | 5D.3 Alias Store + 5D.5 Alias Resolution | ⬜ |
| 0.5 Source Class Inference | 5D.6 Source Class Inference | ⬜ |
| 0.6 Concept API Endpoints | 5D.7 Concept API Endpoints | ⬜ |
Spec: docs/specs/concept-hierarchy.md
Phase 1: Authoritative Corpus
Before Aphoria can find conflicts, Episteme needs the authoritative sources to conflict against.
1.1 RFC Ingester
A CLI tool (or ingestion module) that:
- Fetches RFC text from rfc-editor.org (text format, no PDF parsing needed)
- Extracts normative statements (MUST, MUST NOT, SHOULD, SHALL per RFC 2119)
- Maps each statement to a ConceptPath: rfc://{number}/{topic}/{claim}
- Ingests as Tier 0 assertions
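The extraction step can be sketched with the standard library alone. This is a minimal sketch, not the ingester itself: `NormativeStatement`, `find_keyword`, and `extract_normative` are hypothetical names, sentences are split naively on ". ", and only uppercase RFC 2119 keywords count as normative. The negated forms SHALL NOT and SHOULD NOT are included because RFC 2119 defines them alongside the keywords listed above.

```rust
/// RFC 2119 keywords, negated forms first so "MUST NOT" wins over "MUST".
const KEYWORDS: [&str; 6] = ["MUST NOT", "SHALL NOT", "SHOULD NOT", "MUST", "SHALL", "SHOULD"];

/// Hypothetical record for one normative sentence.
#[derive(Debug, PartialEq)]
pub struct NormativeStatement {
    pub keyword: &'static str,
    pub text: String,
}

/// Returns the first entry of KEYWORDS (in list order) that occurs as a
/// whole uppercase word in the sentence.
fn find_keyword(sentence: &str) -> Option<&'static str> {
    KEYWORDS.iter().copied().find(|&kw| {
        sentence.match_indices(kw).any(|(i, _)| {
            let before = sentence[..i].chars().next_back();
            let after = sentence[i + kw.len()..].chars().next();
            !before.map_or(false, char::is_alphanumeric)
                && !after.map_or(false, char::is_alphanumeric)
        })
    })
}

/// Collapses RFC line wrapping, splits on ". ", and keeps sentences that
/// contain a normative keyword.
pub fn extract_normative(text: &str) -> Vec<NormativeStatement> {
    let flat = text.split_whitespace().collect::<Vec<_>>().join(" ");
    flat.split(". ")
        .filter_map(|s| {
            let sentence = s.trim().trim_end_matches('.');
            find_keyword(sentence).map(|keyword| NormativeStatement {
                keyword,
                text: sentence.to_string(),
            })
        })
        .collect()
}
```

A real ingester would need a sturdier sentence splitter, since section references such as "4.1.3. " also end in a period followed by a space.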
Start with a curated list of security-relevant RFCs:
| RFC | Topic |
|---|---|
| 7519 | JWT |
| 6749 | OAuth 2.0 |
| 6750 | Bearer tokens |
| 8446 | TLS 1.3 |
| 7525 | TLS best practices |
| 6238 | TOTP |
| 7617 | HTTP Basic Auth |
| 9110 | HTTP Semantics |
1.2 OWASP Ingester
Parse OWASP Cheat Sheets (markdown source on GitHub):
- Extract each recommendation as a claim
- Map to owasp://cheatsheet/{topic}/{claim}
- Ingest as Tier 1 assertions
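One way to picture the path mapping, as a rough sketch under stated assumptions: every markdown bullet in a cheat sheet is treated as one recommendation, and the claim segment is a slug of the bullet text. `slugify` and `owasp_paths` are hypothetical helpers, and the real ingester may pick claims and claim names differently.

```rust
/// Lowercases the input and turns each run of non-alphanumeric characters
/// into a single '_' (none at the start or end).
pub fn slugify(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        if c.is_ascii_alphanumeric() {
            out.push(c.to_ascii_lowercase());
        } else if !out.is_empty() && !out.ends_with('_') {
            out.push('_');
        }
    }
    out.trim_end_matches('_').to_string()
}

/// Returns (concept path, recommendation text) for every bullet line.
/// Assumption: bullets start with "- " or "* " after optional indentation.
pub fn owasp_paths(topic: &str, markdown: &str) -> Vec<(String, String)> {
    markdown
        .lines()
        .filter_map(|line| {
            let t = line.trim_start();
            t.strip_prefix("- ").or_else(|| t.strip_prefix("* "))
        })
        .map(|rec| {
            let path = format!("owasp://cheatsheet/{}/{}", slugify(topic), slugify(rec));
            (path, rec.trim().to_string())
        })
        .collect()
}
```

Slugging the whole sentence produces long claim names. A curated short name per recommendation would probably read better in reports.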
Priority cheat sheets: Authentication, JWT, TLS, Secrets Management, Input Validation, Session Management.
1.3 Vendor Docs (Manual Bootstrap)
For v1, manually curate a small set of vendor doc claims:
- Postgres connection pool recommendations
- Redis timeout defaults
- Common HTTP client library defaults (reqwest, hyper, net/http)
These are vendor://{product}/{topic}/{claim} at Tier 2.
This doesn't need to be exhaustive. It needs to cover the claims that Aphoria's extractors will actually find in code.
Phase 2: CLI Core
The Aphoria binary itself.
2.1 Project Walker
Input: a project root path. Output: a list of files to scan, each tagged with:
- Language (rust, go, python, typescript, yaml, toml, json)
- ConceptPath prefix derived from directory structure
crates/citadeldb/src/auth/jwt.rs
→ language: rust
→ prefix: code://rust/citadeldb/auth/jwt
Normalization rules:
- Strip src/, lib/, pkg/, internal/ (language boilerplate)
- Strip crates/, packages/, apps/ (monorepo wrappers)
- Map config/, deploy/, infra/ to code://config/{project}/...
- File extension determines language, not directory
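The normalization rules can be sketched as a single function. This is an illustrative sketch, not the walker's actual code: `language_for` and `concept_prefix` are hypothetical names, the extension table covers only the languages listed above, and the `project` fallback (used when no monorepo wrapper names the package) is an assumption.

```rust
use std::path::Path;

const BOILERPLATE: [&str; 4] = ["src", "lib", "pkg", "internal"];
const WRAPPERS: [&str; 3] = ["crates", "packages", "apps"];
const CONFIG_DIRS: [&str; 3] = ["config", "deploy", "infra"];

/// The file extension alone decides the language.
fn language_for(ext: &str) -> Option<&'static str> {
    match ext {
        "rs" => Some("rust"),
        "go" => Some("go"),
        "py" => Some("python"),
        "ts" | "tsx" => Some("typescript"),
        "yaml" | "yml" => Some("yaml"),
        "toml" => Some("toml"),
        "json" => Some("json"),
        _ => None,
    }
}

/// Returns (language, ConceptPath prefix) for a path relative to the
/// project root, or None for files the walker would skip.
pub fn concept_prefix(project: &str, rel_path: &str) -> Option<(&'static str, String)> {
    let path = Path::new(rel_path);
    let language = language_for(path.extension()?.to_str()?)?;
    let stem = path.file_stem()?.to_str()?;

    let mut is_config = false;
    let mut in_wrapper = false;
    let mut segments: Vec<&str> = Vec::new();
    let dirs = path
        .parent()
        .into_iter()
        .flat_map(|p| p.iter())
        .filter_map(|c| c.to_str());
    for dir in dirs {
        if CONFIG_DIRS.contains(&dir) {
            is_config = true;
        } else if WRAPPERS.contains(&dir) {
            in_wrapper = true;
        } else if !BOILERPLATE.contains(&dir) {
            segments.push(dir);
        }
    }
    segments.push(stem);
    let tail = segments.join("/");

    let prefix = if is_config {
        format!("code://config/{}/{}", project, tail)
    } else if in_wrapper {
        // The directory after crates/ (or similar) already names the package.
        format!("code://{}/{}", language, tail)
    } else {
        // Assumption: single-package repos take the project name instead.
        format!("code://{}/{}/{}", language, project, tail)
    };
    Some((language, prefix))
}
```

With these rules, `concept_prefix("citadeldb", "crates/citadeldb/src/auth/jwt.rs")` yields the `code://rust/citadeldb/auth/jwt` prefix from the example above.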
2.2 Extractors
Each extractor is a module that:
- Takes a file path + content + language
- Returns a Vec<ExtractedClaim>
Ship these extractors in v1:
| Extractor | What it finds | Languages |
|---|---|---|
| tls_verify | TLS certificate verification disabled | rust, go, python, js/ts |
| jwt_config | JWT validation settings (aud, exp, alg) | rust, go, python, js/ts |
| hardcoded_secrets | Credentials in source (not .env) | all |
| timeout_config | HTTP/DB/Redis timeout values | all (config files) |
| dep_versions | Known-vulnerable dependency versions | Cargo.toml, go.mod, package.json, requirements.txt |
| cors_config | CORS allow-origin settings | rust, go, js/ts |
| rate_limit | Rate limiting disabled or unreasonable | rust, go, js/ts |
Extractors use regex + AST patterns, not LLMs. Each extractor declares:
- The patterns it searches for
- The ConceptPath leaf it maps to
- The predicate (e.g., config_value, enabled, version)
- How to extract the ObjectValue from the match
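The four declarations above map naturally onto a trait. The following is a minimal sketch under assumptions: the trait shape, the field types, and the `TlsVerify` struct are hypothetical, plain substring matching stands in for the regex and AST patterns, and the ConceptPath prefix is passed in rather than derived from the language. The three patterns are real option names from reqwest, Go's crypto/tls, and Python's requests.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ObjectValue {
    Boolean(bool),
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ExtractedClaim {
    pub path: String,
    pub predicate: &'static str,
    pub value: ObjectValue,
    pub source_location: String,
    pub confidence: f64,
}

pub trait Extractor {
    /// Substrings to search for (a stand-in for regex + AST patterns).
    fn patterns(&self) -> &[&'static str];
    /// ConceptPath leaf the matches map to.
    fn leaf(&self) -> &'static str;
    /// Predicate of the resulting claim.
    fn predicate(&self) -> &'static str;
    /// How to turn a matching line into an ObjectValue.
    fn value_for(&self, matched_line: &str) -> ObjectValue;

    /// Emits one claim per matching line, with a 1-based line number.
    fn extract(&self, prefix: &str, file: &str, content: &str) -> Vec<ExtractedClaim> {
        content
            .lines()
            .enumerate()
            .filter(|(_, line)| self.patterns().iter().any(|p| line.contains(*p)))
            .map(|(i, line)| ExtractedClaim {
                path: format!("{}/{}", prefix, self.leaf()),
                predicate: self.predicate(),
                value: self.value_for(line),
                source_location: format!("{}:{}", file, i + 1),
                confidence: 1.0, // exact pattern match, not a heuristic
            })
            .collect()
    }
}

/// Sketch of the tls_verify extractor.
pub struct TlsVerify;

impl Extractor for TlsVerify {
    fn patterns(&self) -> &[&'static str] {
        &[
            "danger_accept_invalid_certs(true)", // reqwest (Rust)
            "InsecureSkipVerify: true",          // crypto/tls (Go)
            "verify=False",                      // requests (Python)
        ]
    }
    fn leaf(&self) -> &'static str {
        "cert_verification"
    }
    fn predicate(&self) -> &'static str {
        "enabled"
    }
    fn value_for(&self, _matched_line: &str) -> ObjectValue {
        // Every pattern above means verification is switched off.
        ObjectValue::Boolean(false)
    }
}
```

Substring matching will also fire on comments and string literals, which is one reason the real extractors combine regex with AST patterns.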
2.3 Ingestion Bridge
Connect extractor output to the Episteme ingestion pipeline:
ExtractedClaim {
    path: code://rust/citadeldb/auth/jwt/audience_validation
    predicate: "enabled"
    value: Boolean(false)
    source_location: "src/auth/jwt.rs:47"
    confidence: 1.0  // regex match, not heuristic
}
        ↓
Assertion {
    subject: ConceptPath::parse("code://rust/citadeldb/auth/jwt/audience_validation")
    predicate: "enabled"
    object: ObjectValue::Boolean(false)
    source_class: SourceClass::Expert  // inferred from code:// scheme
    source_hash: blake3(file_content)
    source_metadata: { "file": "src/auth/jwt.rs", "line": 47 }
    confidence: 1.0
    lifecycle: LifecycleStage::Approved  // code is deployed, it's a fact about the code
}
The bridge handles:
- ConceptPath construction from extractor output
- Source hash computation (BLAKE3 of the file at scan time)
- Source metadata encoding (file path, line number, extraction method)
- Signing with the Aphoria agent's keypair
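A rough sketch of the bridge's first three responsibilities, under stated assumptions: `tier_for`, `bridge`, and this trimmed-down `Assertion` struct are hypothetical, the tier numbers follow the scheme-to-tier pairs used in this roadmap (rfc 0, owasp 1, vendor 2, code and internal 3), and the hash function is injected so the example needs no BLAKE3 crate. Signing is left out.

```rust
use std::collections::BTreeMap;

/// Infers the tier from the ConceptPath scheme.
pub fn tier_for(path: &str) -> Option<u8> {
    let (scheme, _) = path.split_once("://")?;
    match scheme {
        "rfc" => Some(0),
        "owasp" => Some(1),
        "vendor" => Some(2),
        "code" | "internal" => Some(3),
        _ => None,
    }
}

/// Trimmed-down stand-in for the real Assertion type.
#[derive(Debug, PartialEq)]
pub struct Assertion {
    pub subject: String,
    pub predicate: String,
    pub tier: u8,
    pub source_hash: String,
    pub source_metadata: BTreeMap<String, String>,
    pub confidence: f64,
}

/// Builds an assertion from extractor output. `hash` is whatever digest the
/// caller supplies (BLAKE3 of the file at scan time in the real bridge).
pub fn bridge(
    path: &str,
    predicate: &str,
    source_location: &str,
    file_content: &[u8],
    hash: impl Fn(&[u8]) -> String,
) -> Option<Assertion> {
    let tier = tier_for(path)?;
    // "src/auth/jwt.rs:47" -> ("src/auth/jwt.rs", 47)
    let (file, line) = source_location.rsplit_once(':')?;
    let line: u32 = line.parse().ok()?;

    let mut source_metadata = BTreeMap::new();
    source_metadata.insert("file".to_string(), file.to_string());
    source_metadata.insert("line".to_string(), line.to_string());

    Some(Assertion {
        subject: path.to_string(),
        predicate: predicate.to_string(),
        tier,
        source_hash: hash(file_content),
        source_metadata,
        confidence: 1.0,
    })
}
```

Passing the hash as a closure keeps the sketch self-contained. In the real bridge it would presumably be a direct BLAKE3 call.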
2.4 Conflict Query
After ingestion, query Episteme for each extracted concept:
for claim in extracted_claims {
    let results = query_engine.query(Query {
        subject: Some(claim.path.to_string()),
        resolve_aliases: true,
        hierarchical: false,
        lens: Some("skeptic"),
        ..Default::default()
    });

    if results.conflict_score > threshold {
        report.add_conflict(claim, results);
    }
}
The Skeptic lens returns all claims for the concept across all aliased paths, with a conflict score. If the code claim (Tier 3) contradicts an RFC claim (Tier 0), the conflict score will be high because of the tier spread.
2.5 Report Output
$ aphoria scan ./citadeldb --format table
┌──────────────────────────────────────────────────────────────────────┐
│ Aphoria Report: citadeldb │
│ Scanned: 142 files │ Claims: 23 │ Conflicts: 3 │
├──────────┬───────────────────────────────────────┬──────────┬───────┤
│ Verdict │ Concept │ Score │ Tier │
├──────────┼───────────────────────────────────────┼──────────┼───────┤
│ BLOCK │ auth/jwt/audience_validation │ 0.92 │ 0↔3 │
│ BLOCK │ net/tls/cert_verification │ 0.87 │ 1↔3 │
│ FLAG │ http/timeout │ 0.54 │ 2↔3 │
└──────────┴───────────────────────────────────────┴──────────┴───────┘
Details:

  BLOCK code://rust/citadeldb/auth/jwt/audience_validation
    Your code: aud validation disabled (src/auth/jwt.rs:47)
    RFC 7519:  aud validation MUST be enabled (Tier 0)
    Action:    Fix or acknowledge with: aphoria ack <path> --reason "..."

  BLOCK code://rust/citadeldb/net/tls/cert_verification
    Your code: verify = false (src/net/client.rs:23)
    OWASP:     verification required (Tier 1)
    Action:    Fix or acknowledge with: aphoria ack <path> --reason "..."

  FLAG code://rust/citadeldb/http/timeout
    Your code: timeout = 0 (infinite) (config/production.yaml:8)
    reqwest:   default timeout 30s (Tier 2)
    Action:    Review recommended
Output formats: table (default), json, sarif (for CI integration), markdown.
2.6 Acknowledge Command
$ aphoria ack code://rust/citadeldb/auth/jwt/audience_validation \
--reason "Internal service, no external JWT consumers. Accepted risk per SEC-2024-003."
This creates a new Assertion:
- Subject: internal://decision/citadeldb/auth/jwt/audience_validation
- Predicate: deviation_accepted
- Object: Text with the reason
- SourceClass: Expert (Tier 3)
- Aliased to: code://rust/citadeldb/auth/jwt/audience_validation
The conflict still exists in Episteme, but the acknowledgment is recorded. On the next scan the conflict still appears, now with context: "Acknowledged by [agent] on [date]: [reason]." The Skeptic lens sees the acknowledgment as an additional claim in the space.
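The subject mapping used by the acknowledgment can be sketched as follows. `Acknowledgment` and `acknowledge` are hypothetical names, and rejecting an empty reason is an assumption rather than documented behavior. The path rewrite mirrors the example above: drop the `code://` scheme and the language segment, then prepend `internal://decision/`.

```rust
/// Hypothetical shape of the assertion recorded by the ack command.
#[derive(Debug, PartialEq)]
pub struct Acknowledgment {
    pub subject: String,
    pub predicate: &'static str,
    pub reason: String,
    pub alias_of: String,
}

/// Builds the acknowledgment for a code:// concept path.
/// Returns None for non-code paths or an empty reason (assumed policy).
pub fn acknowledge(code_path: &str, reason: &str) -> Option<Acknowledgment> {
    let rest = code_path.strip_prefix("code://")?;
    // Drop the language segment: rust/citadeldb/... -> citadeldb/...
    let (_language, tail) = rest.split_once('/')?;
    let reason = reason.trim();
    if tail.is_empty() || reason.is_empty() {
        return None;
    }
    Some(Acknowledgment {
        subject: format!("internal://decision/{}", tail),
        predicate: "deviation_accepted",
        reason: reason.to_string(),
        alias_of: code_path.to_string(),
    })
}
```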
Phase 3: Skill Integration
3.1 Claude Code Skill
A /aphoria skill that wraps the CLI:
/aphoria scan          Scan current project, report conflicts
/aphoria scan --fix    Scan and offer to fix each conflict
/aphoria ack <path>    Acknowledge a conflict with a reason
/aphoria status        Show current conflict summary
/aphoria diff          Show new conflicts since last scan
The skill runs the CLI binary, parses the JSON output, and presents results inline in the Claude Code session.
3.2 Agent Pre-Flight Hook
A Claude Code hook that runs Aphoria before certain operations:
{
  "hooks": {
    "pre-commit": "aphoria scan --format sarif --exit-code",
    "pre-deploy": "aphoria scan --strict --exit-code"
  }
}
--exit-code returns non-zero if any BLOCK verdicts exist, preventing the commit or deploy.
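The exit-code rule is small enough to state as code. In this sketch the `Verdict` enum and `exit_code` function are hypothetical, and the specific non-zero value (1) is an assumption. The roadmap only says the status is non-zero when a BLOCK verdict exists.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Block,
    Flag,
}

/// Process exit status for a scan. Only BLOCK verdicts fail the run, and
/// only when --exit-code was passed.
pub fn exit_code(verdicts: &[Verdict], exit_code_flag: bool) -> i32 {
    if exit_code_flag && verdicts.contains(&Verdict::Block) {
        1
    } else {
        0
    }
}
```

FLAG verdicts never change the status here, so a hook configured this way blocks a commit only on BLOCK findings.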
3.3 Alias Suggestion Workflow
When Aphoria scans a new project and finds concepts that share leaf names with existing authoritative paths, it prompts:
New concept detected: code://rust/newproject/auth/jwt/audience_validation
Suggested alias:
→ rfc://7519/jwt/audience_validation (Tier 0, RFC 7519 Section 4.1.3)
Accept? [y/n/defer]
Accepting creates the alias. Deferring flags it for later review. Rejecting records that these are intentionally different concepts.
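Leaf-name matching, the trigger for the prompt above, can be sketched like this. `leaf` and `suggest_aliases` are hypothetical helpers, and exact string equality on the last path segment is an assumption. The real matcher may be fuzzier.

```rust
/// Last segment of a concept path, e.g. "audience_validation".
pub fn leaf(path: &str) -> Option<&str> {
    let (_, rest) = path.split_once("://")?;
    rest.rsplit('/').next().filter(|s| !s.is_empty())
}

/// Authoritative paths whose leaf equals the new concept's leaf,
/// in the order they were given.
pub fn suggest_aliases<'a>(new_path: &str, authoritative: &[&'a str]) -> Vec<&'a str> {
    match leaf(new_path) {
        Some(target) => authoritative
            .iter()
            .copied()
            .filter(|p| leaf(p) == Some(target))
            .collect(),
        None => Vec::new(),
    }
}
```

Exact leaf equality is cheap but depends on extractors and ingesters agreeing on leaf names, which is why each suggestion still goes through the accept/defer/reject prompt.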
Phase 4: CI Integration
4.1 GitHub Action
- name: Aphoria Scan
  uses: orchard9/aphoria-action@v1
  with:
    episteme-url: ${{ secrets.EPISTEME_URL }}
    fail-on: block
    format: sarif
Publishes SARIF results to GitHub Security tab. BLOCK verdicts fail the check. FLAG verdicts appear as warnings.
4.2 PR Comment Bot
On pull request, Aphoria scans the diff (not the whole project) and comments:
## Aphoria Report
This PR introduces 1 new conflict:
| File | Conflict | Score |
|------|----------|-------|
| src/auth/jwt.rs:47 | Disables aud validation (RFC 7519 requires it) | 0.92 |
Run `aphoria ack` to acknowledge, or fix before merge.
4.3 Baseline Mode
For existing projects with many conflicts, aphoria baseline records the current state. Subsequent scans only report new conflicts. This prevents the "500 warnings so we ignore all of them" problem.
$ aphoria baseline
Baseline recorded: 12 existing conflicts frozen.
Future scans will only report new conflicts.
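Baseline filtering amounts to a set difference. In this minimal sketch, conflicts are identified by their concept path (an assumption, since a real baseline may also key on file and line) and `new_conflicts` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Conflicts present in the current scan but absent from the baseline,
/// returned in sorted order.
pub fn new_conflicts(baseline: &BTreeSet<String>, current: &BTreeSet<String>) -> Vec<String> {
    current.difference(baseline).cloned().collect()
}
```

Conflicts that were in the baseline and have since been fixed simply drop out. Nothing here re-reports them.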
Phase 5: Research Agent Loop
5.1 Gap Detection
When Aphoria extracts a claim and no authoritative source exists for that concept, log it as a gap:
GAP: code://rust/citadeldb/cache/redis/max_memory_policy
No authoritative source found for redis/max_memory_policy
Seen in 3 projects
5.2 Research Agent Trigger
When a gap is seen across N projects (configurable, default 3), dispatch a research agent:
- Agent searches for authoritative documentation on redis max_memory_policy
- Finds Redis official docs
- Extracts normative claims: "default is noeviction, recommended allkeys-lru for cache use cases"
- Ingests as vendor://redis/cache/max_memory_policy at Tier 2
- Future Aphoria scans now have something to conflict against
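The trigger condition can be sketched as a counter over distinct projects. `GapTracker` and its methods are hypothetical, gaps are keyed by a concept string such as `redis/max_memory_policy`, and firing exactly once when the threshold is first reached is an assumed policy. The default of 3 comes from the text above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Tracks which projects have reported each gap.
pub struct GapTracker {
    threshold: usize,
    seen: BTreeMap<String, BTreeSet<String>>,
}

impl GapTracker {
    pub fn new(threshold: usize) -> Self {
        GapTracker { threshold, seen: BTreeMap::new() }
    }

    /// Records a gap sighting. Returns true only on the sighting that brings
    /// the number of distinct projects up to the threshold, so a research
    /// agent would be dispatched once per concept.
    pub fn record(&mut self, concept: &str, project: &str) -> bool {
        let projects = self.seen.entry(concept.to_string()).or_default();
        let inserted = projects.insert(project.to_string());
        inserted && projects.len() == self.threshold
    }
}

impl Default for GapTracker {
    /// N = 3, the default stated in the roadmap.
    fn default() -> Self {
        GapTracker::new(3)
    }
}
```

Repeat scans of the same project do not count twice, because each concept maps to a set of project names.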
5.3 Community Corpus Contributions
Users who run Aphoria can opt in to contribute their alias mappings and acknowledgment patterns (anonymized) to a shared corpus. Common patterns propagate:
- "Every Rust project has this JWT pattern" → pre-built alias set for Rust JWT libraries
- "This Redis config is always flagged and always acknowledged" → raise the default conflict threshold for that concept, so it is reported less readily
- "This TLS pattern is always a real bug" → lower the default conflict threshold for that concept, so it is reported more readily
Milestone Summary
| Phase | Deliverable | Depends On |
|---|---|---|
| 0 | ConceptPath in StemeDB | concept-hierarchy spec |
| 1 | Authoritative corpus (RFCs, OWASP) | Phase 0 |
| 2 | Aphoria CLI (scan, report, ack) | Phase 0, Phase 1 |
| 3 | Claude Code skill + hooks | Phase 2 |
| 4 | CI integration (GitHub Action, PR bot) | Phase 2 |
| 5 | Research agent loop | Phase 2, Phase 4 (gap data) |
Phase 0 and Phase 1 can run in parallel, because corpus ingestion uses the ConceptPath types as they are built. Phase 2 is the critical path. Everything after Phase 2 is distribution and flywheel.