# Aphoria Configuration Reference

Complete reference for Aphoria configuration options (`.aphoria/config.toml`).

---

## File Location

`.aphoria/config.toml` - Created by `aphoria init` in your project root.

---

## Quick Start

**Minimal configuration (defaults work for most projects):**
```toml
[project]
name = "my-project"
```

That's it! Aphoria uses sensible defaults for everything else.

---

## Database Configuration

### Per-Project Databases (Default)

**New in 2026-02-09:** Each project now has its own isolated database by default.

```toml
[episteme]
# Project database (observations from this project)
# Default: .aphoria/db (project-local)
data_dir = ".aphoria/db"

# Corpus database (aggregated patterns across all projects)
# Default: ~/.aphoria/corpus-db (home-based, shared)
corpus_data_dir = "~/.aphoria/corpus-db"
```

**Architecture:**
```
~/projects/
├── maxwell/
│   └── .aphoria/db/          # Maxwell's observations
├── billing-api/
│   └── .aphoria/db/          # Billing API's observations
└── ~/.aphoria/
    └── corpus-db/            # Shared corpus (all projects)
```

### Legacy Shared Mode

To use the old behavior (single shared database for all projects):

```toml
[episteme]
data_dir = "~/.aphoria/db"
```

### Disable Corpus Aggregation

To disable cross-project pattern aggregation:

```toml
[episteme]
corpus_data_dir = null
```

---

## Full Configuration Example

```toml
[project]
name = "my-project"
language = "rust"

[episteme]
# Per-project database (default: .aphoria/db)
data_dir = ".aphoria/db"

# Shared corpus database (default: ~/.aphoria/corpus-db)
corpus_data_dir = "~/.aphoria/corpus-db"

# Optional: Remote Episteme URL (future feature)
# url = "https://episteme.example.com"

[thresholds]
block = 0.7  # Conflict score at or above → BLOCK verdict
flag = 0.4   # Conflict score at or above → FLAG verdict

[extractors]
enabled = [
    "tls_verify",
    "tls_version",
    "jwt_config",
    "hardcoded_secrets",
    "timeout_config",
    "dep_versions",
    "cors_config",
    "durability_config",
    "rate_limit",
    # ... (42 total extractors, see cli-reference.md for full list)
]
disabled = []

[extractors.timeout_config]
min_reasonable_ms = 1000
max_reasonable_ms = 300_000

[extractors.dep_versions]
enabled = false  # OPT-IN: Disabled by default to reduce noise
advisory_db = "~/.aphoria/advisory-db"

[extractors.entropy]
min_entropy = 4.5
min_charset_variety = 0.4
min_length = 20
max_length = 200

[extractors.inline_markers]
enabled = false         # OPT-IN: Disabled by default
sync_to_pending = true  # Auto-sync when enabled

[scan]
exclude = [
    "target/",
    "node_modules/",
    ".git/",
    "vendor/",
]
max_file_size = 1_048_576  # 1MB
include_tests = false

[aliases]
auto_suggest = true
auto_accept_tier0 = true
auto_create_aliases = true

[corpus]
cache_dir = "~/.cache/aphoria"  # Or system cache dir
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
aggregation_enabled = true
use_legacy_thresholds = false  # Use adaptive thresholds (default)

# Optional: Override adaptive thresholds
# adaptive_thresholds = { micro_floor = 2, small_floor = 5 }

[hosted]
# Optional: Hosted mode for team aggregation
# url = "https://aphoria-hosted.example.com"
# project_id = "billing-api"
# team_id = "platform-team"
# sync_mode = "push_only"  # or "bidirectional"
# max_retries = 3
# retry_delay_ms = 1000
# api_key_env = "APHORIA_API_KEY"

[community]
enabled = false   # CRITICAL: Opt-in only
anonymize = true  # CRITICAL: Privacy by default
exclude = []
include = []
min_confidence = 0.8

[llm]
enabled = false
provider = "gemini"
model = "gemini-3-flash-preview"
api_key_env = "GEMINI_API_KEY"
max_tokens_per_scan = 50000
max_tokens_per_file = 4000
cache_responses = true
timeout_secs = 60
high_value_only = true
min_confidence = 0.7

[learning]
enabled = false
store = "local"
min_confidence = 0.7
prune_after_days = 90
max_patterns = 10_000

[learning.promotion]
min_projects = 5
min_confidence = 0.8
auto_promote = false
output_dir = ".aphoria/extractors/learned"
require_review = true

[autonomous]
# CRITICAL: Opt-in only - kill switch defaults to off
enabled = false
min_confidence = 0.95
min_projects = 10
require_zero_failures = true
require_zero_warnings = true
audit_log = true
# audit_dir defaults to ~/.aphoria/audit/
```

---

## Key Sections

### Project

Basic project metadata.

```toml
[project]
name = "my-project"  # Optional: auto-detected from directory name
language = "rust"    # Optional: auto-detected from file extensions
```

### Episteme

Database and storage configuration.

```toml
[episteme]
data_dir = ".aphoria/db"                  # Per-project observations
corpus_data_dir = "~/.aphoria/corpus-db"  # Shared corpus (optional)
# url = "https://episteme.example.com"    # Remote Episteme (future)
```

**Key Options:**
- `data_dir` - Where to store this project's observations
  - Default: `.aphoria/db` (project-local)
  - Override to `~/.aphoria/db` for legacy shared mode
- `corpus_data_dir` - Where to store aggregated patterns
  - Default: `~/.aphoria/corpus-db` (home-based, shared)
  - Set to `null` to disable cross-project aggregation

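
The way these two options turn into directories on disk can be sketched as follows. This is an illustration only, not Aphoria's implementation: the function names are hypothetical, and it assumes that relative paths are anchored at the project root, that `~` expands to the home directory, and that the `STEMEDB_DB_DIR` variable listed under Environment Variables takes precedence over `data_dir`.

```python
import os
from pathlib import Path

# Defaults as documented above.
DEFAULT_DATA_DIR = ".aphoria/db"
DEFAULT_CORPUS_DIR = "~/.aphoria/corpus-db"


def resolve_dir(value: str, project_root: Path) -> Path:
    """Expand `~` and anchor relative paths at the project root (assumed behavior)."""
    path = Path(os.path.expanduser(value))
    return path if path.is_absolute() else project_root / path


def storage_dirs(episteme: dict, project_root: Path, env=None) -> dict:
    """Resolve the project and corpus directories for one project.

    Assumed precedence for data_dir: STEMEDB_DB_DIR, then config, then default.
    """
    env = os.environ if env is None else env
    data_dir = env.get("STEMEDB_DB_DIR") or episteme.get("data_dir", DEFAULT_DATA_DIR)
    corpus = episteme.get("corpus_data_dir", DEFAULT_CORPUS_DIR)
    return {
        "data_dir": resolve_dir(data_dir, project_root),
        # None stands in for a disabled corpus (hypothetical representation).
        "corpus_data_dir": None if corpus is None else resolve_dir(corpus, project_root),
    }
```

Under these assumptions, two projects with the default configuration get separate `data_dir` values but the same `corpus_data_dir`, while setting `data_dir = "~/.aphoria/db"` in both makes them share one database, which matches the legacy shared mode described earlier.
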
### Thresholds

Conflict severity thresholds.

```toml
[thresholds]
block = 0.7  # High severity (blocks CI)
flag = 0.4   # Medium severity (warns)
```

Conflict scores range from 0.0 (no conflict) to 1.0 (total conflict).

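
A minimal sketch of how a conflict score maps to a verdict under these thresholds. The "at or above" comparison follows the comments in the full configuration example; the `Thresholds` class, the `verdict` function, and the `"PASS"` label for scores below `flag` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    block: float = 0.7  # documented default
    flag: float = 0.4   # documented default


def verdict(score: float, thresholds: Thresholds = Thresholds()) -> str:
    """Map a conflict score in [0.0, 1.0] to a verdict label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"conflict score out of range: {score}")
    if score >= thresholds.block:
        return "BLOCK"
    if score >= thresholds.flag:
        return "FLAG"
    return "PASS"  # hypothetical name for "no action"
```

With the defaults, a score of 0.4 is flagged and a score of 0.7 blocks; raising `block` to 0.9 would turn a 0.8 score from a block into a flag.
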
### Extractors

Control which extractors run.

```toml
[extractors]
enabled = ["tls_verify", "jwt_config"]  # ... (list abbreviated)
disabled = []
```

See [cli-reference.md](cli-reference.md) for the full list of 42 available extractors.

### Scan

Control which files are scanned.

```toml
[scan]
exclude = ["target/", "node_modules/"]
max_file_size = 1_048_576  # 1MB
include_tests = false
```

You can also use `.aphoriaignore` files (gitignore syntax).

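
The effect of `exclude` and `max_file_size` can be pictured with the sketch below. It is not Aphoria's scanner: `should_scan` is a hypothetical helper, and it assumes that each `exclude` entry names a directory matched against any directory component of the path and that files larger than `max_file_size` bytes are skipped. It does not model `include_tests` or `.aphoriaignore`.

```python
from pathlib import PurePosixPath

# Values taken from the full configuration example.
DEFAULT_EXCLUDE = ("target/", "node_modules/", ".git/", "vendor/")
DEFAULT_MAX_FILE_SIZE = 1_048_576  # bytes


def should_scan(
    rel_path: str,
    size_bytes: int,
    exclude=DEFAULT_EXCLUDE,
    max_file_size: int = DEFAULT_MAX_FILE_SIZE,
) -> bool:
    """Return True if a file passes the exclude and size filters (assumed semantics)."""
    excluded_dirs = {entry.rstrip("/") for entry in exclude}
    # Only directory components are checked, not the file name itself.
    directories = PurePosixPath(rel_path).parts[:-1]
    if any(part in excluded_dirs for part in directories):
        return False
    return size_bytes <= max_file_size
```
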
### Corpus

Control corpus building and thresholds.

```toml
[corpus]
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
aggregation_enabled = true
use_legacy_thresholds = false  # Use adaptive thresholds
```

**Scale-Adaptive Thresholds (default):**

Automatically adjusts promotion thresholds based on team size:
- Micro (1-5 projects): Patterns visible with 2/3 adoption
- Small (6-25 projects): Patterns visible with 5+ projects
- Enterprise (501+): Unchanged behavior

See [scale-adaptive-thresholds.md](scale-adaptive-thresholds.md) for details.

**Legacy Thresholds:**

```toml
[corpus]
use_legacy_thresholds = true
```

Fixed thresholds regardless of team size (old behavior).

### Hosted Mode

For team collaboration and pattern sharing.

```toml
[hosted]
url = "https://aphoria.example.com"
project_id = "billing-api"
team_id = "platform-team"
sync_mode = "push_only"
```

Requires a hosted Aphoria server (future feature).

### Community Sharing

**CRITICAL:** Opt-in only. Anonymous pattern contribution.

```toml
[community]
enabled = false   # Must explicitly opt-in
anonymize = true  # Project names are wildcarded
```

When enabled with `--sync`, observations are anonymized and shared with the community corpus.

**Privacy Guarantees:**
- Project names are wildcarded in paths
- No file paths, line numbers, or source code
- Only pattern aggregates (subject + predicate + value)

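
To make the guarantees above concrete, here is a rough sketch of what anonymizing one observation could look like. Everything about the data shape is hypothetical: the slash-separated subject format, the `file` and `line` keys, and the `anonymize` function are invented for illustration and are not taken from Aphoria.

```python
def anonymize(observation: dict, project_name: str) -> dict:
    """Keep only subject, predicate, and value, and wildcard the project name."""
    subject = "/".join(
        "*" if segment == project_name else segment
        for segment in observation["subject"].split("/")
    )
    # File paths, line numbers, and any other keys are dropped by construction.
    return {
        "subject": subject,
        "predicate": observation["predicate"],
        "value": observation["value"],
    }


observation = {
    "subject": "billing-api/http/client",  # hypothetical subject format
    "predicate": "timeout_ms",
    "value": 30000,
    "file": "src/client.rs",
    "line": 42,
}
shared = anonymize(observation, "billing-api")
```

Building the output from an explicit allow-list of keys, rather than deleting known sensitive keys, means a newly added field cannot leak by accident.
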
### LLM Extraction

Use LLMs (Gemini) for semantic claim detection.

```toml
[llm]
enabled = false  # OPT-IN
provider = "gemini"
model = "gemini-3-flash-preview"
api_key_env = "GEMINI_API_KEY"
```

Requires the API key to be set in the environment variable named by `api_key_env`.

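
The `api_key_env` option holds the name of an environment variable rather than the key itself, so the secret never has to appear in the config file. A minimal sketch of that indirection, assuming a hypothetical `resolve_api_key` helper and assuming that a missing key is an error only when `llm.enabled` is true:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(llm: dict, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Look up the LLM API key through the variable named in `api_key_env`."""
    env = os.environ if env is None else env
    if not llm.get("enabled", False):
        return None  # LLM extraction is opt-in, so no key is needed
    var_name = llm.get("api_key_env", "GEMINI_API_KEY")
    key = env.get(var_name)
    if not key:
        raise RuntimeError(f"LLM extraction is enabled but ${var_name} is not set")
    return key
```
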
### Learning & Autonomous Promotion

**CRITICAL:** Both require explicit opt-in.

```toml
[learning]
enabled = false  # Pattern learning from scans

[autonomous]
enabled = false  # Auto-promotion to extractors (kill switch)
```

See [vision-gaps.md](vision-gaps.md) for implementation status.

---

## Environment Variables

Aphoria respects these environment variables:

| Variable | Purpose | Default |
|----------|---------|---------|
| `APHORIA_API_KEY` | Hosted mode API key | None (required for hosted mode) |
| `GEMINI_API_KEY` | Gemini API key | None (required if `llm.enabled = true`) |
| `STEMEDB_DB_DIR` | Override `data_dir` | `.aphoria/db` |
| `APHORIA_CONFIG` | Config file path | `.aphoria/config.toml` |

---

## Migration Guide

### From Old Home-Based Database

**Before (legacy):**
```toml
# Default in old versions: ~/.aphoria/db
```

**After (new default):**
```toml
# Default now: ./.aphoria/db (per-project)
```

**To keep legacy behavior:**
```toml
[episteme]
data_dir = "~/.aphoria/db"
```

No data migration is needed; just set `data_dir` to the old path.

---

## See Also

- [CLI Reference](cli-reference.md) - All commands and flags
- [Scale-Adaptive Thresholds](scale-adaptive-thresholds.md) - Threshold configuration
- [Comparison Modes](comparison-modes.md) - Claim comparison operators
- [Vision Gaps](vision-gaps.md) - Implementation status