stemedb/applications/aphoria/docs/reference/configuration.md
jml 9bfa626203 docs: reorganize documentation structure for clarity
Major documentation restructure to improve discoverability and reduce duplication.

## Changes

**Deleted (Archived/Consolidated)**:
- Removed duplicate getting started guides
- Archived outdated planning documents
- Consolidated corpus and configuration docs
- Removed obsolete vision/spec files (superseded by vision.md)
- Cleaned up scrapyard and old PDFs

**New Structure**:
- docs/about/ - Project overview and introduction
- docs/guides/ - User guides (moved from root)
- docs/specs/ - Technical specifications
- docs/sdk/ - SDK documentation (Go)
- docs/references/ - API references
- docs/archive/ - Archived historical docs
- applications/aphoria/docs/advanced/ - Advanced topics
- applications/aphoria/docs/reference/ - CLI reference
- applications/aphoria/docs/archive/ - Archived aphoria docs

**Updated**:
- README.md - New root README with clear navigation
- CONTRIBUTING.md - Contribution guidelines
- CLAUDE.md - Updated paths to new structure
- roadmap.md - Added recent completions

## Files Changed
- 57 files changed
- 1,977 insertions(+)
- 961 deletions(-)

**Net change**: +1,016 lines (added CONTRIBUTING.md, README.md, reorganized content)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 07:33:40 +00:00

411 lines
8.6 KiB
Markdown

# Aphoria Configuration Reference
Complete reference for `aphoria.toml` configuration options.
---
## File Location
`.aphoria/config.toml` - Created by `aphoria init` in your project root.
---
## Quick Start
**Minimal configuration (defaults work for most projects):**
```toml
[project]
name = "my-project"
```
That's it! Aphoria uses sensible defaults for everything else.
---
## Database Configuration
### Per-Project Databases (Default)
**New in 2026-02-09:** Each project now has its own isolated database by default.
```toml
[episteme]
# Project database (observations from this project)
# Default: .aphoria/db (project-local)
data_dir = ".aphoria/db"
# Corpus database (aggregated patterns across all projects)
# Default: ~/.aphoria/corpus-db (home-based, shared)
corpus_data_dir = "~/.aphoria/corpus-db"
```
**Architecture:**
```
~/projects/
├── maxwell/
│ └── .aphoria/db/ # Maxwell's observations
├── billing-api/
│ └── .aphoria/db/ # Billing API's observations
└── ~/.aphoria/
└── corpus-db/ # Shared corpus (all projects)
```
### Legacy Shared Mode
To use the old behavior (single shared database for all projects):
```toml
[episteme]
data_dir = "~/.aphoria/db"
```
### Disable Corpus Aggregation
To disable cross-project pattern aggregation:
```toml
[episteme]
corpus_data_dir = null
```
---
## Full Configuration Example
```toml
[project]
name = "my-project"
language = "rust"
[episteme]
# Per-project database (default: .aphoria/db)
data_dir = ".aphoria/db"
# Shared corpus database (default: ~/.aphoria/corpus-db)
corpus_data_dir = "~/.aphoria/corpus-db"
# Optional: Remote Episteme URL (future feature)
# url = "https://episteme.example.com"
[thresholds]
block = 0.7 # Conflict score at or above → BLOCK verdict
flag = 0.4 # Conflict score at or above → FLAG verdict
[extractors]
enabled = [
"tls_verify",
"tls_version",
"jwt_config",
"hardcoded_secrets",
"timeout_config",
"dep_versions",
"cors_config",
"durability_config",
"rate_limit",
# ... (42 total extractors, see cli-reference.md for full list)
]
disabled = []
[extractors.timeout_config]
min_reasonable_ms = 1000
max_reasonable_ms = 300_000
[extractors.dep_versions]
enabled = false # OPT-IN: Disabled by default to reduce noise
advisory_db = "~/.aphoria/advisory-db"
[extractors.entropy]
min_entropy = 4.5
min_charset_variety = 0.4
min_length = 20
max_length = 200
[extractors.inline_markers]
enabled = false # OPT-IN: Disabled by default
sync_to_pending = true # Auto-sync when enabled
[scan]
exclude = [
"target/",
"node_modules/",
".git/",
"vendor/",
]
max_file_size = 1_048_576 # 1MB
include_tests = false
[aliases]
auto_suggest = true
auto_accept_tier0 = true
auto_create_aliases = true
[corpus]
cache_dir = "~/.cache/aphoria" # Or system cache dir
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
aggregation_enabled = true
use_legacy_thresholds = false # Use adaptive thresholds (default)
# Optional: Override adaptive thresholds
# adaptive_thresholds = { micro_floor = 2, small_floor = 5 }
[hosted]
# Optional: Hosted mode for team aggregation
# url = "https://aphoria-hosted.example.com"
# project_id = "billing-api"
# team_id = "platform-team"
# sync_mode = "push_only" # or "bidirectional"
# max_retries = 3
# retry_delay_ms = 1000
# api_key_env = "APHORIA_API_KEY"
[community]
enabled = false # CRITICAL: Opt-in only
anonymize = true # CRITICAL: Privacy by default
exclude = []
include = []
min_confidence = 0.8
[llm]
enabled = false
provider = "gemini"
model = "gemini-3-flash-preview"
api_key_env = "GEMINI_API_KEY"
max_tokens_per_scan = 50000
max_tokens_per_file = 4000
cache_responses = true
timeout_secs = 60
high_value_only = true
min_confidence = 0.7
[learning]
enabled = false
store = "local"
min_confidence = 0.7
prune_after_days = 90
max_patterns = 10_000
[learning.promotion]
min_projects = 5
min_confidence = 0.8
auto_promote = false
output_dir = ".aphoria/extractors/learned"
require_review = true
[autonomous]
# CRITICAL: Opt-in only - kill switch defaults to off
enabled = false
min_confidence = 0.95
min_projects = 10
require_zero_failures = true
require_zero_warnings = true
audit_log = true
# audit_dir defaults to ~/.aphoria/audit/
```
---
## Key Sections
### Project
Basic project metadata.
```toml
[project]
name = "my-project" # Optional: auto-detected from directory name
language = "rust" # Optional: auto-detected from file extensions
```
### Episteme
Database and storage configuration.
```toml
[episteme]
data_dir = ".aphoria/db" # Per-project observations
corpus_data_dir = "~/.aphoria/corpus-db" # Shared corpus (optional)
url = null # Remote Episteme (future)
```
**Key Options:**
- `data_dir` - Where to store this project's observations
- Default: `.aphoria/db` (project-local)
- Override to `~/.aphoria/db` for legacy shared mode
- `corpus_data_dir` - Where to store aggregated patterns
- Default: `~/.aphoria/corpus-db` (home-based, shared)
- Set to `null` to disable cross-project aggregation
### Thresholds
Conflict severity thresholds.
```toml
[thresholds]
block = 0.7 # High severity (blocks CI)
flag = 0.4 # Medium severity (warns)
```
Conflict scores range from 0.0 (no conflict) to 1.0 (total conflict).
### Extractors
Control which extractors run.
```toml
[extractors]
enabled = ["tls_verify", "jwt_config", ...]
disabled = []
```
See [cli-reference.md](cli-reference.md) for the full list of 42 available extractors.
### Scan
Control which files are scanned.
```toml
[scan]
exclude = ["target/", "node_modules/"]
max_file_size = 1_048_576 # 1MB
include_tests = false
```
You can also use `.aphoriaignore` files (gitignore syntax).
### Corpus
Control corpus building and thresholds.
```toml
[corpus]
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
aggregation_enabled = true
use_legacy_thresholds = false # Use adaptive thresholds
```
**Scale-Adaptive Thresholds (default):**
Automatically adjusts promotion thresholds based on team size:
- Micro (1-5 projects): Patterns visible with 2/3 adoption
- Small (6-25 projects): Patterns visible with 5+ projects
- Enterprise (501+): Unchanged behavior
See [../advanced/scale-adaptive-thresholds.md](../advanced/scale-adaptive-thresholds.md) for details.
**Legacy Thresholds:**
```toml
[corpus]
use_legacy_thresholds = true
```
Fixed thresholds regardless of team size (old behavior).
### Hosted Mode
For team collaboration and pattern sharing.
```toml
[hosted]
url = "https://aphoria.example.com"
project_id = "billing-api"
team_id = "platform-team"
sync_mode = "push_only"
```
Requires hosted Aphoria server (future feature).
### Community Sharing
**CRITICAL:** Opt-in only. Anonymous pattern contribution.
```toml
[community]
enabled = false # Must explicitly opt-in
anonymize = true # Project names are wildcarded
```
When enabled with `--sync`, observations are anonymized and shared with the community corpus.
**Privacy Guarantees:**
- Project names are wildcarded in paths
- No file paths, line numbers, or source code
- Only pattern aggregates (subject + predicate + value)
### LLM Extraction
Use LLMs (Gemini) for semantic claim detection.
```toml
[llm]
enabled = false # OPT-IN
provider = "gemini"
model = "gemini-3-flash-preview"
api_key_env = "GEMINI_API_KEY"
```
Requires API key in environment.
### Learning & Autonomous Promotion
**CRITICAL:** Both require explicit opt-in.
```toml
[learning]
enabled = false # Pattern learning from scans
[autonomous]
enabled = false # Auto-promotion to extractors (kill switch)
```
---
## Environment Variables
Aphoria respects these environment variables:
| Variable | Purpose | Default |
|----------|---------|---------|
| `APHORIA_API_KEY` | Hosted mode API key | None (required if hosted.enabled) |
| `GEMINI_API_KEY` | Gemini API key | None (required if llm.enabled) |
| `STEMEDB_DB_DIR` | Override `data_dir` | `.aphoria/db` |
| `APHORIA_CONFIG` | Config file path | `.aphoria/config.toml` |
---
## Migration Guide
### From Old Home-Based Database
**Before (legacy):**
```toml
# Default in old versions: ~/.aphoria/db
```
**After (new default):**
```toml
# Default now: ./.aphoria/db (per-project)
```
**To keep legacy behavior:**
```toml
[episteme]
data_dir = "~/.aphoria/db"
```
No migration needed - just set `data_dir` to old path.
---
## See Also
- [CLI Reference](cli-reference.md) - All commands and flags
- [Scale-Adaptive Thresholds](../advanced/scale-adaptive-thresholds.md) - Threshold configuration
- [Comparison Modes](comparison-modes.md) - Claim comparison operators