## Problem

CLI-created community corpus items (tier 3) were stored correctly but invisible via API queries. Two issues blocked discoverability:

1. **Prefix mismatch**: API hardcoded 'community://pattern/' for aggregated patterns, but CLI creates 'community://rust/http/...' URIs
2. **Query parameter parsing**: Axum's default parser doesn't support bracket notation (?sources[]=value) used by the dashboard

Result: 0/22 CLI-created items were queryable.

## Solution

### Fix 1: Broaden Community Prefix

- Changed: 'community://pattern/' → 'community://' in corpus handler
- Impact: Now matches both aggregated patterns AND CLI-created items
- Backward compatible: Broader prefix includes narrower results

### Fix 2: Add QsQuery Extractor

- Added: serde_qs dependency + custom QsQuery extractor
- Supports: Bracket notation for array parameters (?sources[]=a&sources[]=b)
- Compatible: Works with JavaScript URLSearchParams standard
- Tested: 3 new unit tests for extractor behavior

## Verification

- ✅ All 22 CLI-created community items now queryable (was 0)
- ✅ Source filtering works: community (22), RFC (2), vendor (5)
- ✅ Multi-source queries work: ?sources[]=community&sources[]=rfc → 24
- ✅ All 89 API tests pass + 3 new extractor tests
- ✅ Clippy clean (0 warnings)
- ✅ No regressions in existing functionality

## Files Changed

- crates/stemedb-api/Cargo.toml: Add serde_qs dependency
- crates/stemedb-api/src/extractors.rs: New QsQuery extractor (117 lines)
- crates/stemedb-api/src/handlers/aphoria/corpus.rs: Use QsQuery, broaden prefix
- crates/stemedb-api/src/lib.rs: Export extractors module

Also includes: Scale-adaptive thresholds, wiki corpus extraction, documentation updates, and dashboard UI improvements from prior work.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

# Documentation Update: Corpus Endpoint & Multi-Project Architecture

**Date:** 2026-02-09
**Scope:** Align docs with Phase 1-3 implementation (corpus endpoint, per-project databases, corpus database)

---

## Changes Implemented (Code)

### Phase 1: Dashboard Corpus Endpoint ✅
- **New endpoint:** `/v1/aphoria/corpus` (replaces `/v1/aphoria/patterns` for valuable content)
- **DTOs:** `CorpusItemDto`, `GetCorpusRequest`, `GetCorpusResponse`
- **Purpose:** Return RFC/OWASP/Community best practices instead of statistical aggregates

### Phase 2: Per-Project Database Configuration ✅
- **Old default:** `~/.aphoria/db` (home-based, shared across all projects)
- **New default:** `.aphoria/db` (project-local, isolated per-project)
- **Override:** Users can set `[episteme] data_dir = "~/.aphoria/db"` for shared mode

### Phase 3: Corpus Database Architecture ✅
- **New field:** `EpistemeConfig.corpus_data_dir`
- **Default:** `~/.aphoria/corpus-db` (home-based, shared across projects)
- **Purpose:** Aggregated pattern data from multiple projects for community corpus building
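
To make the Phase 2 and Phase 3 defaults concrete, here is a minimal Python sketch of how the two directories might be resolved. It is not Aphoria's actual (Rust) implementation: `resolve_dirs`, its arguments, and the rules for expanding `~` and relative paths are assumptions made for illustration. Only the default values `.aphoria/db` and `~/.aphoria/corpus-db` come from the notes above. With no overrides, the data directory lands inside the project and the corpus directory under the home directory; passing `{"data_dir": "~/.aphoria/db"}` reproduces the shared (legacy) mode.

```python
from pathlib import Path

# Defaults as stated in the Phase 2 / Phase 3 notes.
DEFAULT_DATA_DIR = ".aphoria/db"             # project-local
DEFAULT_CORPUS_DIR = "~/.aphoria/corpus-db"  # home-based, shared


def resolve_dirs(project_root, episteme=None, home=None):
    """Return (data_dir, corpus_data_dir) as paths.

    `episteme` stands in for the parsed [episteme] table (hypothetical shape);
    missing keys fall back to the documented defaults.
    """
    episteme = episteme or {}
    home = Path(home) if home else Path.home()
    project_root = Path(project_root)

    def expand(raw):
        # Assumption: "~/..." resolves against the home directory,
        # other relative paths resolve against the project root.
        if raw == "~" or raw.startswith("~/"):
            return home / raw[2:]
        path = Path(raw)
        return path if path.is_absolute() else project_root / path

    data_dir = expand(episteme.get("data_dir", DEFAULT_DATA_DIR))
    corpus_dir = expand(episteme.get("corpus_data_dir", DEFAULT_CORPUS_DIR))
    return data_dir, corpus_dir
```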

---

## Documentation Issues Found

### 1. Stale Database Path Reference ❌

**File:** `applications/aphoria/docs/guides/the-first-scan.md:45`

**Current (WRONG):**
```markdown
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your local database (`~/.aphoria/db`).
```

**Problem:** References old home-based path. Default is now `.aphoria/db` (project-local).

**Fix Required:**
```markdown
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your project database (`.aphoria/db`).

> **Note:** By default, each project has its own isolated database. To share a database across all projects on your machine, set `data_dir = "~/.aphoria/db"` under `[episteme]` in `aphoria.toml`.
```

---

### 2. Missing Corpus Architecture Documentation ❌

**Issue:** No documentation explaining:
- Per-project databases (observations)
- Shared corpus database (aggregated patterns)
- How community learning works across projects
- The `/v1/aphoria/corpus` endpoint

**Action Required:** Create new guide: `applications/aphoria/docs/guides/multi-project-architecture.md`

**Outline:**
````markdown
# Multi-Project Architecture

## Overview
Aphoria now uses a dual-database architecture:
- **Per-project databases** (`.aphoria/db/`) - Store observations from each project
- **Shared corpus database** (`~/.aphoria/corpus-db/`) - Aggregate patterns across projects

## Per-Project Isolation

Each project gets its own database:
```
~/projects/
├── maxwell/
│   └── .aphoria/db/          # Maxwell's observations
├── billing-api/
│   └── .aphoria/db/          # Billing API's observations
└── frontend/
    └── .aphoria/db/          # Frontend's observations
```

## Community Corpus Building

When you run `aphoria scan --persist --sync`:
1. Observations are written to your project database (`.aphoria/db/`)
2. Pattern aggregates are pushed to the corpus database (`~/.aphoria/corpus-db/`)
3. Patterns with 95%+ adoption + authority backing auto-promote to corpus

The corpus database accumulates patterns from all your projects on this machine.

## Configuration

**Default (per-project isolation):**
```toml
# .aphoria/config.toml (default)
[episteme]
# data_dir defaults to ./.aphoria/db (project-local)
# corpus_data_dir defaults to ~/.aphoria/corpus-db (shared)
```

**Shared mode (legacy behavior):**
```toml
[episteme]
data_dir = "~/.aphoria/db"  # All projects share one database
```

## API Endpoints

For hosted/dashboard mode:
- `/v1/aphoria/corpus` - Query RFC/OWASP/Community best practices
- `/v1/aphoria/patterns` - Query statistical pattern aggregates (project counts)
````
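
The promotion rule in step 3 of the outline (95%+ adoption plus authority backing) can be sketched as a small predicate. This is an illustrative Python sketch only, not the Aphoria implementation: `PatternAggregate`, its fields, and `should_promote` are hypothetical names, and the sketch assumes the fixed 95% cutoff from the outline rather than any scale-adaptive adjustment. Integer arithmetic is used so that a pattern sitting exactly on the 95% boundary (for example 19 of 20 projects) is not lost to floating-point rounding.

```python
from dataclasses import dataclass

ADOPTION_PERCENT = 95  # "95%+ adoption" from the outline


@dataclass
class PatternAggregate:
    # Hypothetical shape; the field names are illustrative only.
    name: str
    adopting_projects: int
    total_projects: int
    has_authority_backing: bool  # e.g. an RFC or OWASP item backs the pattern


def should_promote(pattern: PatternAggregate) -> bool:
    """True when adoption is at least 95% and an authority backs the pattern."""
    if pattern.total_projects <= 0:
        return False
    # adopting / total >= 0.95  is equivalent to  100 * adopting >= 95 * total
    meets_adoption = (
        100 * pattern.adopting_projects >= ADOPTION_PERCENT * pattern.total_projects
    )
    return meets_adoption and pattern.has_authority_backing
```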

---

### 3. Dashboard References (Stale/Future) ⚠️

**Files:**
- `applications/aphoria/docs/phase-17-summary.md` - References "dashboard" 6 times
- `applications/aphoria/docs/scale-adaptive-thresholds.md:163` - "empty dashboard"

**Issue:** These docs reference a dashboard that exists but isn't documented as a user-facing feature yet.

**Action:**
- **If dashboard is user-facing:** Create `applications/aphoria/docs/guides/dashboard-setup.md`
- **If dashboard is internal only:** Add note to phase-17 that dashboard is "not yet production-ready"

**Recommendation:** Dashboard is mentioned in implementation docs but not in user guides. Add to CLI reference:

````markdown
## Dashboard (Beta)

Start the Aphoria dashboard:
```bash
cd applications/aphoria-dashboard
npm install
npm run dev
```

Navigate to `http://localhost:3000` to view:
- Scan results
- Corpus items (RFC/OWASP/Community)
- Claims coverage

**Note:** Dashboard is in beta. For production use, query via API (`/v1/aphoria/*`).
````

---

### 4. Configuration Guide Missing ❌

**Issue:** No comprehensive configuration reference showing all `aphoria.toml` options.

**Action Required:** Create `applications/aphoria/docs/configuration.md`

**Outline:**
````markdown
# Configuration Reference

## File Location

`.aphoria/config.toml` (created by `aphoria init`)

## Full Example

```toml
[project]
name = "my-project"
language = "rust"

[episteme]
# Per-project database (default: .aphoria/db)
data_dir = ".aphoria/db"

# Shared corpus database (default: ~/.aphoria/corpus-db)
corpus_data_dir = "~/.aphoria/corpus-db"

# Optional: Remote Episteme URL (future feature)
# url = "https://episteme.example.com"

[thresholds]
block = 0.7  # Conflict score to BLOCK
flag = 0.4   # Conflict score to FLAG

[extractors]
enabled = [
  "tls_verify",
  "jwt_config",
  # ... (see cli-reference.md for full list)
]

[scan]
exclude = [
  "target/",
  "node_modules/",
  ".git/",
]
max_file_size = 1_048_576  # 1 MiB

[corpus]
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
aggregation_enabled = true
use_legacy_thresholds = false  # Use adaptive thresholds (default)

[hosted]
# Optional: Hosted mode for team aggregation
# url = "https://aphoria-hosted.example.com"
# project_id = "billing-api"
# team_id = "platform-team"

[community]
enabled = false  # Opt-in for anonymous pattern sharing
anonymize = true
```

## Key Settings

### Database Paths

**Per-project (default):**
```toml
[episteme]
data_dir = ".aphoria/db"
```

**Shared (legacy):**
```toml
[episteme]
data_dir = "~/.aphoria/db"
```

**Corpus database:**
```toml
[episteme]
corpus_data_dir = "~/.aphoria/corpus-db"  # Default
# Or disable: corpus_data_dir = null  (invalid as written: TOML has no null value; disable syntax to be confirmed)
```

### Thresholds

**Scale-Adaptive (default):**
```toml
[corpus]
use_legacy_thresholds = false
```

Auto-detects team size (Micro: 1-5 projects → Enterprise: 501+) and adjusts promotion thresholds accordingly.

**Legacy (fixed thresholds):**
```toml
[corpus]
use_legacy_thresholds = true
```

See [scale-adaptive-thresholds.md](scale-adaptive-thresholds.md) for details.
````
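
The `[thresholds]` values in the outline (`block = 0.7`, `flag = 0.4`) imply a two-cutoff classification of conflict scores. The Python sketch below shows one plausible reading, which may be worth stating explicitly in `configuration.md`. It is an assumption-laden illustration, not Aphoria's code: the function name `classify`, the `"PASS"` label for scores below the flag cutoff, and the inclusive (`>=`) comparisons are all hypothetical and should be checked against the actual scanner behavior.

```python
def classify(conflict_score, block=0.7, flag=0.4):
    """Map a conflict score to an action using the [thresholds] cutoffs.

    Assumptions: cutoffs are inclusive, and scores below `flag` are
    labelled "PASS" (a placeholder name, not taken from the docs).
    """
    if not flag <= block:
        raise ValueError("flag threshold must not exceed block threshold")
    if conflict_score >= block:
        return "BLOCK"
    if conflict_score >= flag:
        return "FLAG"
    return "PASS"
```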

---

## Summary of Required Changes

### DELETE
- None (no stale planning docs found related to this change)

### UPDATE
1. **`the-first-scan.md:45`** - Change `~/.aphoria/db` → `.aphoria/db` + add override note
2. **`README.md:39`** - Add note about per-project databases (optional, keep lean)
3. **`cli-reference.md`** - Add configuration section linking to new `configuration.md`

### CREATE
1. **`configuration.md`** - Complete config reference with database path examples
2. **`guides/multi-project-architecture.md`** - Explain dual-database architecture
3. **Optional: `guides/dashboard-setup.md`** - If dashboard is user-facing

---

## Implementation Plan

### Step 1: Fix Immediate Stale Reference (5 min)
- Update `the-first-scan.md:45` with correct path

### Step 2: Create Configuration Guide (15 min)
- New file: `configuration.md`
- Include all `episteme` options with examples
- Cross-reference from `cli-reference.md`

### Step 3: Create Multi-Project Guide (20 min)
- New file: `guides/multi-project-architecture.md`
- Explain per-project vs corpus databases
- Include community learning flow diagram (optional)

### Step 4: Update README (5 min)
- Add one-line note about per-project isolation
- Keep it lean (link to configuration.md for details)

### Step 5: CLI Reference Update (5 min)
- Add "Configuration" section
- Link to `configuration.md`
- Add dashboard section if ready for users

---

## Testing Checklist

Before committing:

- [ ] All bash examples tested and working
- [ ] Cross-links verified (configuration.md ↔ cli-reference.md ↔ guides/)
- [ ] No old terminology (`~/.aphoria/db` as default)
- [ ] Examples match current CLI output
- [ ] Dashboard references accurate (production vs beta)
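
The "no old terminology" item can be partly automated. The following Python sketch lists every mention of `~/.aphoria/db` in Markdown files under a docs directory so each hit can be reviewed by hand; it deliberately does not match `~/.aphoria/corpus-db`. The helper name `find_stale_mentions` is hypothetical, and the script cannot tell a stale default from a legitimate mention (such as the shared-mode override), so it only produces candidates. A typical call would be `list(find_stale_mentions("applications/aphoria/docs"))`.

```python
import re
from pathlib import Path

# Matches `~/.aphoria/db` but not longer names such as `~/.aphoria/db-backup`.
# `~/.aphoria/corpus-db` never matches because "db" must directly follow ".aphoria/".
STALE_PATH = re.compile(r"~/\.aphoria/db(?![\w-])")


def find_stale_mentions(docs_root):
    """Yield (path, line_number, line) for each mention of the old home-based path."""
    for path in sorted(Path(docs_root).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if STALE_PATH.search(line):
                yield path, number, line.strip()
```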

---

## Questions for User

1. **Dashboard Status:** Is the Aphoria dashboard ready for user-facing docs, or should it remain "internal/beta" for now?

2. **Corpus Database:** Should we document how to disable corpus aggregation (`corpus_data_dir = null`), or is it always-on? Note that TOML has no `null` value, so if disabling is supported, the exact syntax needs confirming before it is documented.

3. **Migration Guide:** Do we need a migration guide for users upgrading from old `~/.aphoria/db` to new per-project databases?
   - **Recommendation:** Not needed. Old users can override to `data_dir = "~/.aphoria/db"` for legacy behavior.

---

## Files to Modify

### High Priority (Stale References)
- `applications/aphoria/docs/guides/the-first-scan.md` - Line 45 (stale path)

### Medium Priority (New Content)
- `applications/aphoria/docs/configuration.md` (NEW)
- `applications/aphoria/docs/guides/multi-project-architecture.md` (NEW)
- `applications/aphoria/docs/cli-reference.md` - Add configuration section

### Low Priority (Enhancement)
- `applications/aphoria/README.md` - Brief note on per-project isolation
- `applications/aphoria/docs/guides/dashboard-setup.md` (NEW, if dashboard is ready)

---

## Next Steps

**Immediate:**
1. Fix stale path reference in `the-first-scan.md`
2. Create `configuration.md` with database path examples

**Follow-up:**
3. Create `multi-project-architecture.md` guide
4. Decide on dashboard documentation strategy