# Documentation Update: Corpus Endpoint & Multi-Project Architecture **Date:** 2026-02-09 **Scope:** Align docs with Phase 1-3 implementation (corpus endpoint, per-project databases, corpus database) --- ## Changes Implemented (Code) ### Phase 1: Dashboard Corpus Endpoint ✅ - **New endpoint:** `/v1/aphoria/corpus` (replaces `/v1/aphoria/patterns` for valuable content) - **DTOs:** `CorpusItemDto`, `GetCorpusRequest`, `GetCorpusResponse` - **Purpose:** Return RFC/OWASP/Community best practices instead of statistical aggregates ### Phase 2: Per-Project Database Configuration ✅ - **Old default:** `~/.aphoria/db` (home-based, shared across all projects) - **New default:** `.aphoria/db` (project-local, isolated per-project) - **Override:** Users can set `[episteme] data_dir = "~/.aphoria/db"` for shared mode ### Phase 3: Corpus Database Architecture ✅ - **New field:** `EpistemeConfig.corpus_data_dir` - **Default:** `~/.aphoria/corpus-db` (home-based, shared across projects) - **Purpose:** Aggregated pattern data from multiple projects for community corpus building --- ## Documentation Issues Found ### 1. Stale Database Path Reference ❌ **File:** `applications/aphoria/docs/guides/the-first-scan.md:45` **Current (WRONG):** ```markdown This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your local database (`~/.aphoria/db`). ``` **Problem:** References old home-based path. Default is now `.aphoria/db` (project-local). **Fix Required:** ```markdown This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your project database (`.aphoria/db`). > **Note:** By default, each project has its own isolated database. To share a database across all projects on your machine, set `data_dir = "~/.aphoria/db"` in `aphoria.toml`. ``` --- ### 2. Missing Corpus Architecture Documentation ❌ **Issue:** No documentation explaining: - Per-project databases (observations) - Shared corpus database (aggregated patterns) - How community learning works across projects - The `/v1/aphoria/corpus` endpoint **Action Required:** Create new guide: `applications/aphoria/docs/guides/multi-project-architecture.md` **Outline:** ```markdown # Multi-Project Architecture ## Overview Aphoria now uses a dual-database architecture: - **Per-project databases** (`.aphoria/db/`) - Store observations from each project - **Shared corpus database** (`~/.aphoria/corpus-db/`) - Aggregate patterns across projects ## Per-Project Isolation Each project gets its own database: ``` ~/projects/ ├── maxwell/ │ └── .aphoria/db/ # Maxwell's observations ├── billing-api/ │ └── .aphoria/db/ # Billing API's observations └── frontend/ └── .aphoria/db/ # Frontend's observations ``` ## Community Corpus Building When you run `aphoria scan --persist --sync`: 1. Observations are written to your project database (`.aphoria/db/`) 2. Pattern aggregates are pushed to the corpus database (`~/.aphoria/corpus-db/`) 3. Patterns with 95%+ adoption + authority backing auto-promote to corpus The corpus database accumulates patterns from all your projects on this machine. ## Configuration **Default (per-project isolation):** ```toml # .aphoria/config.toml (default) [episteme] # data_dir defaults to ./.aphoria/db (project-local) # corpus_data_dir defaults to ~/.aphoria/corpus-db (shared) ``` **Shared mode (legacy behavior):** ```toml [episteme] data_dir = "~/.aphoria/db" # All projects share one database ``` ## API Endpoints For hosted/dashboard mode: - `/v1/aphoria/corpus` - Query RFC/OWASP/Community best practices - `/v1/aphoria/patterns` - Query statistical pattern aggregates (project counts) ``` --- ### 3. Dashboard References (Stale/Future) ⚠️ **Files:** - `applications/aphoria/docs/phase-17-summary.md` - References "dashboard" 6 times - `applications/aphoria/docs/scale-adaptive-thresholds.md:163` - "empty dashboard" **Issue:** These docs reference a dashboard that exists but isn't documented as a user-facing feature yet. **Action:** - **If dashboard is user-facing:** Create `applications/aphoria/docs/guides/dashboard-setup.md` - **If dashboard is internal only:** Add note to phase-17 that dashboard is "not yet production-ready" **Recommendation:** Dashboard is mentioned in implementation docs but not in user guides. Add to CLI reference: ```markdown ## Dashboard (Beta) Start the Aphoria dashboard: ```bash cd applications/aphoria-dashboard npm install npm run dev ``` Navigate to `http://localhost:3000` to view: - Scan results - Corpus items (RFC/OWASP/Community) - Claims coverage **Note:** Dashboard is in beta. For production use, query via API (`/v1/aphoria/*`). ``` --- ### 4. Configuration Guide Missing ❌ **Issue:** No comprehensive configuration reference showing all `aphoria.toml` options. **Action Required:** Create `applications/aphoria/docs/configuration.md` **Outline:** ```markdown # Configuration Reference ## File Location `.aphoria/config.toml` (created by `aphoria init`) ## Full Example ```toml [project] name = "my-project" language = "rust" [episteme] # Per-project database (default: .aphoria/db) data_dir = ".aphoria/db" # Shared corpus database (default: ~/.aphoria/corpus-db) corpus_data_dir = "~/.aphoria/corpus-db" # Optional: Remote Episteme URL (future feature) # url = "https://episteme.example.com" [thresholds] block = 0.7 # Conflict score to BLOCK flag = 0.4 # Conflict score to FLAG [extractors] enabled = [ "tls_verify", "jwt_config", # ... (see cli-reference.md for full list) ] [scan] exclude = [ "target/", "node_modules/", ".git/", ] max_file_size = 1_048_576 # 1MB [corpus] include_rfc = true include_owasp = true include_vendor = true use_community = true aggregation_enabled = true use_legacy_thresholds = false # Use adaptive thresholds (default) [hosted] # Optional: Hosted mode for team aggregation # url = "https://aphoria-hosted.example.com" # project_id = "billing-api" # team_id = "platform-team" [community] enabled = false # Opt-in for anonymous pattern sharing anonymize = true ``` ## Key Settings ### Database Paths **Per-project (default):** ```toml [episteme] data_dir = ".aphoria/db" ``` **Shared (legacy):** ```toml [episteme] data_dir = "~/.aphoria/db" ``` **Corpus database:** ```toml [episteme] corpus_data_dir = "~/.aphoria/corpus-db" # Default # Or disable: corpus_data_dir = null ``` ### Thresholds **Scale-Adaptive (default):** ```toml [corpus] use_legacy_thresholds = false ``` Auto-detects team size (Micro: 1-5 projects → Enterprise: 501+) and adjusts promotion thresholds accordingly. **Legacy (fixed thresholds):** ```toml [corpus] use_legacy_thresholds = true ``` See [scale-adaptive-thresholds.md](scale-adaptive-thresholds.md) for details. ``` --- ## Summary of Required Changes ### DELETE - None (no stale planning docs found related to this change) ### UPDATE 1. **`the-first-scan.md:45`** - Change `~/.aphoria/db` → `.aphoria/db` + add override note 2. **`README.md:39`** - Add note about per-project databases (optional, keep lean) 3. **`cli-reference.md`** - Add configuration section linking to new `configuration.md` ### CREATE 1. **`configuration.md`** - Complete config reference with database path examples 2. **`guides/multi-project-architecture.md`** - Explain dual-database architecture 3. **Optional: `guides/dashboard-setup.md`** - If dashboard is user-facing --- ## Implementation Plan ### Step 1: Fix Immediate Stale Reference (5 min) - Update `the-first-scan.md:45` with correct path ### Step 2: Create Configuration Guide (15 min) - New file: `configuration.md` - Include all `episteme` options with examples - Cross-reference from `cli-reference.md` ### Step 3: Create Multi-Project Guide (20 min) - New file: `guides/multi-project-architecture.md` - Explain per-project vs corpus databases - Include community learning flow diagram (optional) ### Step 4: Update README (5 min) - Add one-line note about per-project isolation - Keep it lean (link to configuration.md for details) ### Step 5: CLI Reference Update (5 min) - Add "Configuration" section - Link to `configuration.md` - Add dashboard section if ready for users --- ## Testing Checklist Before committing: - [ ] All bash examples tested and working - [ ] Cross-links verified (configuration.md ↔ cli-reference.md ↔ guides/) - [ ] No old terminology (`~/.aphoria/db` as default) - [ ] Examples match current CLI output - [ ] Dashboard references accurate (production vs beta) --- ## Questions for User 1. **Dashboard Status:** Is the Aphoria dashboard ready for user-facing docs, or should it remain "internal/beta" for now? 2. **Corpus Database:** Should we document how to disable corpus aggregation (`corpus_data_dir = null`), or is it always-on? 3. **Migration Guide:** Do we need a migration guide for users upgrading from old `~/.aphoria/db` to new per-project databases? - **Recommendation:** Not needed. Old users can override to `data_dir = "~/.aphoria/db"` for legacy behavior. --- ## Files to Modify ### High Priority (Stale References) - `applications/aphoria/docs/guides/the-first-scan.md` - Line 45 (stale path) ### Medium Priority (New Content) - `applications/aphoria/docs/configuration.md` (NEW) - `applications/aphoria/docs/guides/multi-project-architecture.md` (NEW) - `applications/aphoria/docs/cli-reference.md` - Add configuration section ### Low Priority (Enhancement) - `applications/aphoria/README.md` - Brief note on per-project isolation - `applications/aphoria/docs/guides/dashboard-setup.md` (NEW, if dashboard is ready) --- ## Next Steps **Immediate:** 1. Fix stale path reference in `the-first-scan.md` 2. Create `configuration.md` with database path examples **Follow-up:** 3. Create `multi-project-architecture.md` guide 4. Decide on dashboard documentation strategy