jml bb0c33f8d3 fix(api): enable querying of CLI-created community corpus items

## Problem
CLI-created community corpus items (tier 3) were stored correctly but
invisible via API queries. Two issues blocked discoverability:

1. **Prefix mismatch**: API hardcoded 'community://pattern/' for
   aggregated patterns, but CLI creates 'community://rust/http/...' URIs
2. **Query parameter parsing**: Axum's default parser doesn't support
   bracket notation (?sources[]=value) used by the dashboard

Result: 0/22 CLI-created items were queryable.

## Solution

### Fix 1: Broaden Community Prefix
- Changed: 'community://pattern/' → 'community://' in corpus handler
- Impact: Now matches both aggregated patterns AND CLI-created items
- Backward compatible: Broader prefix includes narrower results

### Fix 2: Add QsQuery Extractor
- Added: serde_qs dependency + custom QsQuery extractor
- Supports: Bracket notation for array parameters (?sources[]=a&sources[]=b)
- Compatible: Works with JavaScript URLSearchParams standard
- Tested: 3 new unit tests for extractor behavior

## Verification
- ✅ All 22 CLI-created community items now queryable (was 0)
- ✅ Source filtering works: community (22), RFC (2), vendor (5)
- ✅ Multi-source queries work: ?sources[]=community&sources[]=rfc → 24
- ✅ All 89 API tests pass + 3 new extractor tests
- ✅ Clippy clean (0 warnings)
- ✅ No regressions in existing functionality

## Files Changed
- crates/stemedb-api/Cargo.toml: Add serde_qs dependency
- crates/stemedb-api/src/extractors.rs: New QsQuery extractor (117 lines)
- crates/stemedb-api/src/handlers/aphoria/corpus.rs: Use QsQuery, broaden prefix
- crates/stemedb-api/src/lib.rs: Export extractors module

Also includes: Scale-adaptive thresholds, wiki corpus extraction,
documentation updates, and dashboard UI improvements from prior work.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-09 15:54:35 +00:00

8.7 KiB

Raw Blame History

Aphoria Configuration Reference

Complete reference for aphoria.toml configuration options.

File Location

.aphoria/config.toml - Created by aphoria init in your project root.

Quick Start

Minimal configuration (defaults work for most projects):

[project]
name = "my-project"

That's it! Aphoria uses sensible defaults for everything else.

Database Configuration

Per-Project Databases (Default)

New in 2026-02-09: Each project now has its own isolated database by default.

[episteme]
# Project database (observations from this project)
# Default: .aphoria/db (project-local)
data_dir = ".aphoria/db"

# Corpus database (aggregated patterns across all projects)
# Default: ~/.aphoria/corpus-db (home-based, shared)
corpus_data_dir = "~/.aphoria/corpus-db"

Architecture:

~/projects/
├── maxwell/
│   └── .aphoria/db/        # Maxwell's observations
├── billing-api/
│   └── .aphoria/db/        # Billing API's observations
└── ~/.aphoria/
    └── corpus-db/          # Shared corpus (all projects)

Legacy Shared Mode

To use the old behavior (single shared database for all projects):

[episteme]
data_dir = "~/.aphoria/db"

Disable Corpus Aggregation

To disable cross-project pattern aggregation:

[episteme]
corpus_data_dir = null

Full Configuration Example

[project]
name = "my-project"
language = "rust"

[episteme]
# Per-project database (default: .aphoria/db)
data_dir = ".aphoria/db"

# Shared corpus database (default: ~/.aphoria/corpus-db)
corpus_data_dir = "~/.aphoria/corpus-db"

# Optional: Remote Episteme URL (future feature)
# url = "https://episteme.example.com"

[thresholds]
block = 0.7  # Conflict score at or above → BLOCK verdict
flag = 0.4   # Conflict score at or above → FLAG verdict

[extractors]
enabled = [
    "tls_verify",
    "tls_version",
    "jwt_config",
    "hardcoded_secrets",
    "timeout_config",
    "dep_versions",
    "cors_config",
    "durability_config",
    "rate_limit",
    # ... (42 total extractors, see cli-reference.md for full list)
]
disabled = []

[extractors.timeout_config]
min_reasonable_ms = 1000
max_reasonable_ms = 300_000

[extractors.dep_versions]
enabled = false  # OPT-IN: Disabled by default to reduce noise
advisory_db = "~/.aphoria/advisory-db"

[extractors.entropy]
min_entropy = 4.5
min_charset_variety = 0.4
min_length = 20
max_length = 200

[extractors.inline_markers]
enabled = false        # OPT-IN: Disabled by default
sync_to_pending = true # Auto-sync when enabled

[scan]
exclude = [
    "target/",
    "node_modules/",
    ".git/",
    "vendor/",
]
max_file_size = 1_048_576  # 1MB
include_tests = false

[aliases]
auto_suggest = true
auto_accept_tier0 = true
auto_create_aliases = true

[corpus]
cache_dir = "~/.cache/aphoria"  # Or system cache dir
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
aggregation_enabled = true
use_legacy_thresholds = false  # Use adaptive thresholds (default)

# Optional: Override adaptive thresholds
# adaptive_thresholds = { micro_floor = 2, small_floor = 5 }

[hosted]
# Optional: Hosted mode for team aggregation
# url = "https://aphoria-hosted.example.com"
# project_id = "billing-api"
# team_id = "platform-team"
# sync_mode = "push_only"  # or "bidirectional"
# max_retries = 3
# retry_delay_ms = 1000
# api_key_env = "APHORIA_API_KEY"

[community]
enabled = false  # CRITICAL: Opt-in only
anonymize = true # CRITICAL: Privacy by default
exclude = []
include = []
min_confidence = 0.8

[llm]
enabled = false
provider = "gemini"
model = "gemini-3-flash-preview"
api_key_env = "GEMINI_API_KEY"
max_tokens_per_scan = 50000
max_tokens_per_file = 4000
cache_responses = true
timeout_secs = 60
high_value_only = true
min_confidence = 0.7

[learning]
enabled = false
store = "local"
min_confidence = 0.7
prune_after_days = 90
max_patterns = 10_000

[learning.promotion]
min_projects = 5
min_confidence = 0.8
auto_promote = false
output_dir = ".aphoria/extractors/learned"
require_review = true

[autonomous]
# CRITICAL: Opt-in only - kill switch defaults to off
enabled = false
min_confidence = 0.95
min_projects = 10
require_zero_failures = true
require_zero_warnings = true
audit_log = true
# audit_dir defaults to ~/.aphoria/audit/

Key Sections

Project

Basic project metadata.

[project]
name = "my-project"       # Optional: auto-detected from directory name
language = "rust"          # Optional: auto-detected from file extensions

Episteme

Database and storage configuration.

[episteme]
data_dir = ".aphoria/db"              # Per-project observations
corpus_data_dir = "~/.aphoria/corpus-db"  # Shared corpus (optional)
url = null                            # Remote Episteme (future)

Key Options:

data_dir - Where to store this project's observations
- Default: .aphoria/db (project-local)
- Override to ~/.aphoria/db for legacy shared mode
corpus_data_dir - Where to store aggregated patterns
- Default: ~/.aphoria/corpus-db (home-based, shared)
- Set to null to disable cross-project aggregation

Thresholds

Conflict severity thresholds.

[thresholds]
block = 0.7  # High severity (blocks CI)
flag = 0.4   # Medium severity (warns)

Conflict scores range from 0.0 (no conflict) to 1.0 (total conflict).

Extractors

Control which extractors run.

[extractors]
enabled = ["tls_verify", "jwt_config", ...]
disabled = []

See cli-reference.md for the full list of 42 available extractors.

Scan

Control which files are scanned.

[scan]
exclude = ["target/", "node_modules/"]
max_file_size = 1_048_576  # 1MB
include_tests = false

You can also use .aphoriaignore files (gitignore syntax).

Corpus

Control corpus building and thresholds.

[corpus]
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
aggregation_enabled = true
use_legacy_thresholds = false  # Use adaptive thresholds

Scale-Adaptive Thresholds (default):

Automatically adjusts promotion thresholds based on team size:

Micro (1-5 projects): Patterns visible with 2/3 adoption
Small (6-25 projects): Patterns visible with 5+ projects
Enterprise (501+): Unchanged behavior

See scale-adaptive-thresholds.md for details.

Legacy Thresholds:

[corpus]
use_legacy_thresholds = true

Fixed thresholds regardless of team size (old behavior).

Hosted Mode

For team collaboration and pattern sharing.

[hosted]
url = "https://aphoria.example.com"
project_id = "billing-api"
team_id = "platform-team"
sync_mode = "push_only"

Requires hosted Aphoria server (future feature).

CRITICAL: Opt-in only. Anonymous pattern contribution.

[community]
enabled = false  # Must explicitly opt-in
anonymize = true # Project names are wildcarded

When enabled with --sync, observations are anonymized and shared with the community corpus.

Privacy Guarantees:

Project names are wildcarded in paths
No file paths, line numbers, or source code
Only pattern aggregates (subject + predicate + value)

LLM Extraction

Use LLMs (Gemini) for semantic claim detection.

[llm]
enabled = false  # OPT-IN
provider = "gemini"
model = "gemini-3-flash-preview"
api_key_env = "GEMINI_API_KEY"

Requires API key in environment.

Learning & Autonomous Promotion

CRITICAL: Both require explicit opt-in.

[learning]
enabled = false  # Pattern learning from scans

[autonomous]
enabled = false  # Auto-promotion to extractors (kill switch)

See vision-gaps.md for implementation status.

Environment Variables

Aphoria respects these environment variables:

Variable	Purpose	Default
`APHORIA_API_KEY`	Hosted mode API key	None (required if hosted.enabled)
`GEMINI_API_KEY`	Gemini API key	None (required if llm.enabled)
`STEMEDB_DB_DIR`	Override `data_dir`	`.aphoria/db`
`APHORIA_CONFIG`	Config file path	`.aphoria/config.toml`

Migration Guide

From Old Home-Based Database

Before (legacy):

# Default in old versions: ~/.aphoria/db

After (new default):

# Default now: ./.aphoria/db (per-project)

To keep legacy behavior:

[episteme]
data_dir = "~/.aphoria/db"

No migration needed - just set data_dir to old path.

8.7 KiB Raw Blame History