jml bb0c33f8d3 fix(api): enable querying of CLI-created community corpus items

## Problem
CLI-created community corpus items (tier 3) were stored correctly but
invisible via API queries. Two issues blocked discoverability:

1. **Prefix mismatch**: API hardcoded 'community://pattern/' for
   aggregated patterns, but CLI creates 'community://rust/http/...' URIs
2. **Query parameter parsing**: Axum's default parser doesn't support
   bracket notation (?sources[]=value) used by the dashboard

Result: 0/22 CLI-created items were queryable.

## Solution

### Fix 1: Broaden Community Prefix
- Changed: 'community://pattern/' → 'community://' in corpus handler
- Impact: Now matches both aggregated patterns AND CLI-created items
- Backward compatible: Broader prefix includes narrower results

### Fix 2: Add QsQuery Extractor
- Added: serde_qs dependency + custom QsQuery extractor
- Supports: Bracket notation for array parameters (?sources[]=a&sources[]=b)
- Compatible: Works with JavaScript URLSearchParams standard
- Tested: 3 new unit tests for extractor behavior

## Verification
- ✅ All 22 CLI-created community items now queryable (was 0)
- ✅ Source filtering works: community (22), RFC (2), vendor (5)
- ✅ Multi-source queries work: ?sources[]=community&sources[]=rfc → 24
- ✅ All 89 API tests pass + 3 new extractor tests
- ✅ Clippy clean (0 warnings)
- ✅ No regressions in existing functionality

## Files Changed
- crates/stemedb-api/Cargo.toml: Add serde_qs dependency
- crates/stemedb-api/src/extractors.rs: New QsQuery extractor (117 lines)
- crates/stemedb-api/src/handlers/aphoria/corpus.rs: Use QsQuery, broaden prefix
- crates/stemedb-api/src/lib.rs: Export extractors module

Also includes: Scale-adaptive thresholds, wiki corpus extraction,
documentation updates, and dashboard UI improvements from prior work.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-09 15:54:35 +00:00

9.7 KiB

Raw Blame History

Documentation Update: Corpus Endpoint & Multi-Project Architecture

Date: 2026-02-09 Scope: Align docs with Phase 1-3 implementation (corpus endpoint, per-project databases, corpus database)

Changes Implemented (Code)

Phase 1: Dashboard Corpus Endpoint ✅

New endpoint: /v1/aphoria/corpus (replaces /v1/aphoria/patterns for valuable content)
DTOs: CorpusItemDto, GetCorpusRequest, GetCorpusResponse
Purpose: Return RFC/OWASP/Community best practices instead of statistical aggregates

Phase 2: Per-Project Database Configuration ✅

Old default: ~/.aphoria/db (home-based, shared across all projects)
New default: .aphoria/db (project-local, isolated per-project)
Override: Users can set [episteme] data_dir = "~/.aphoria/db" for shared mode

Phase 3: Corpus Database Architecture ✅

New field: EpistemeConfig.corpus_data_dir
Default: ~/.aphoria/corpus-db (home-based, shared across projects)
Purpose: Aggregated pattern data from multiple projects for community corpus building

Documentation Issues Found

1. Stale Database Path Reference ❌

File: applications/aphoria/docs/guides/the-first-scan.md:45

Current (WRONG):

This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your local database (`~/.aphoria/db`).

Problem: References old home-based path. Default is now .aphoria/db (project-local).

Fix Required:

This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your project database (`.aphoria/db`).

> **Note:** By default, each project has its own isolated database. To share a database across all projects on your machine, set `data_dir = "~/.aphoria/db"` in `aphoria.toml`.

2. Missing Corpus Architecture Documentation ❌

Issue: No documentation explaining:

Per-project databases (observations)
Shared corpus database (aggregated patterns)
How community learning works across projects
The /v1/aphoria/corpus endpoint

Action Required: Create new guide: applications/aphoria/docs/guides/multi-project-architecture.md

Outline:

# Multi-Project Architecture

## Overview
Aphoria now uses a dual-database architecture:
- **Per-project databases** (`.aphoria/db/`) - Store observations from each project
- **Shared corpus database** (`~/.aphoria/corpus-db/`) - Aggregate patterns across projects

## Per-Project Isolation

Each project gets its own database:

~/projects/ ├── maxwell/ │ └── .aphoria/db/ # Maxwell's observations ├── billing-api/ │ └── .aphoria/db/ # Billing API's observations └── frontend/ └── .aphoria/db/ # Frontend's observations


## Community Corpus Building

When you run `aphoria scan --persist --sync`:
1. Observations are written to your project database (`.aphoria/db/`)
2. Pattern aggregates are pushed to the corpus database (`~/.aphoria/corpus-db/`)
3. Patterns with 95%+ adoption + authority backing auto-promote to corpus

The corpus database accumulates patterns from all your projects on this machine.

## Configuration

**Default (per-project isolation):**
```toml
# .aphoria/config.toml (default)
[episteme]
# data_dir defaults to ./.aphoria/db (project-local)
# corpus_data_dir defaults to ~/.aphoria/corpus-db (shared)

Shared mode (legacy behavior):

[episteme]
data_dir = "~/.aphoria/db"  # All projects share one database

API Endpoints

For hosted/dashboard mode:

/v1/aphoria/corpus - Query RFC/OWASP/Community best practices
/v1/aphoria/patterns - Query statistical pattern aggregates (project counts)


---

### 3. Dashboard References (Stale/Future) ⚠️

**Files:**
- `applications/aphoria/docs/phase-17-summary.md` - References "dashboard" 6 times
- `applications/aphoria/docs/scale-adaptive-thresholds.md:163` - "empty dashboard"

**Issue:** These docs reference a dashboard that exists but isn't documented as a user-facing feature yet.

**Action:**
- **If dashboard is user-facing:** Create `applications/aphoria/docs/guides/dashboard-setup.md`
- **If dashboard is internal only:** Add note to phase-17 that dashboard is "not yet production-ready"

**Recommendation:** Dashboard is mentioned in implementation docs but not in user guides. Add to CLI reference:

```markdown
## Dashboard (Beta)

Start the Aphoria dashboard:
```bash
cd applications/aphoria-dashboard
npm install
npm run dev

Navigate to http://localhost:3000 to view:

Scan results
Corpus items (RFC/OWASP/Community)
Claims coverage

Note: Dashboard is in beta. For production use, query via API (/v1/aphoria/*).


---

### 4. Configuration Guide Missing ❌

**Issue:** No comprehensive configuration reference showing all `aphoria.toml` options.

**Action Required:** Create `applications/aphoria/docs/configuration.md`

**Outline:**
```markdown
# Configuration Reference

## File Location

`.aphoria/config.toml` (created by `aphoria init`)

## Full Example

```toml
[project]
name = "my-project"
language = "rust"

[episteme]
# Per-project database (default: .aphoria/db)
data_dir = ".aphoria/db"

# Shared corpus database (default: ~/.aphoria/corpus-db)
corpus_data_dir = "~/.aphoria/corpus-db"

# Optional: Remote Episteme URL (future feature)
# url = "https://episteme.example.com"

[thresholds]
block = 0.7  # Conflict score to BLOCK
flag = 0.4   # Conflict score to FLAG

[extractors]
enabled = [
    "tls_verify",
    "jwt_config",
    # ... (see cli-reference.md for full list)
]

[scan]
exclude = [
    "target/",
    "node_modules/",
    ".git/",
]
max_file_size = 1_048_576  # 1MB

[corpus]
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
aggregation_enabled = true
use_legacy_thresholds = false  # Use adaptive thresholds (default)

[hosted]
# Optional: Hosted mode for team aggregation
# url = "https://aphoria-hosted.example.com"
# project_id = "billing-api"
# team_id = "platform-team"

[community]
enabled = false  # Opt-in for anonymous pattern sharing
anonymize = true

Key Settings

Database Paths

Per-project (default):

[episteme]
data_dir = ".aphoria/db"

Shared (legacy):

[episteme]
data_dir = "~/.aphoria/db"

Corpus database:

[episteme]
corpus_data_dir = "~/.aphoria/corpus-db"  # Default
# Or disable: corpus_data_dir = null

Thresholds

Scale-Adaptive (default):

[corpus]
use_legacy_thresholds = false

Auto-detects team size (Micro: 1-5 projects → Enterprise: 501+) and adjusts promotion thresholds accordingly.

Legacy (fixed thresholds):

[corpus]
use_legacy_thresholds = true

See scale-adaptive-thresholds.md for details.


---

## Summary of Required Changes

### DELETE
- None (no stale planning docs found related to this change)

### UPDATE
1. **`the-first-scan.md:45`** - Change `~/.aphoria/db` → `.aphoria/db` + add override note
2. **`README.md:39`** - Add note about per-project databases (optional, keep lean)
3. **`cli-reference.md`** - Add configuration section linking to new `configuration.md`

### CREATE
1. **`configuration.md`** - Complete config reference with database path examples
2. **`guides/multi-project-architecture.md`** - Explain dual-database architecture
3. **Optional: `guides/dashboard-setup.md`** - If dashboard is user-facing

---

## Implementation Plan

### Step 1: Fix Immediate Stale Reference (5 min)
- Update `the-first-scan.md:45` with correct path

### Step 2: Create Configuration Guide (15 min)
- New file: `configuration.md`
- Include all `episteme` options with examples
- Cross-reference from `cli-reference.md`

### Step 3: Create Multi-Project Guide (20 min)
- New file: `guides/multi-project-architecture.md`
- Explain per-project vs corpus databases
- Include community learning flow diagram (optional)

### Step 4: Update README (5 min)
- Add one-line note about per-project isolation
- Keep it lean (link to configuration.md for details)

### Step 5: CLI Reference Update (5 min)
- Add "Configuration" section
- Link to `configuration.md`
- Add dashboard section if ready for users

---

## Testing Checklist

Before committing:

- [ ] All bash examples tested and working
- [ ] Cross-links verified (configuration.md ↔ cli-reference.md ↔ guides/)
- [ ] No old terminology (`~/.aphoria/db` as default)
- [ ] Examples match current CLI output
- [ ] Dashboard references accurate (production vs beta)

---

## Questions for User

1. **Dashboard Status:** Is the Aphoria dashboard ready for user-facing docs, or should it remain "internal/beta" for now?

2. **Corpus Database:** Should we document how to disable corpus aggregation (`corpus_data_dir = null`), or is it always-on?

3. **Migration Guide:** Do we need a migration guide for users upgrading from old `~/.aphoria/db` to new per-project databases?
   - **Recommendation:** Not needed. Old users can override to `data_dir = "~/.aphoria/db"` for legacy behavior.

---

## Files to Modify

### High Priority (Stale References)
- `applications/aphoria/docs/guides/the-first-scan.md` - Line 45 (stale path)

### Medium Priority (New Content)
- `applications/aphoria/docs/configuration.md` (NEW)
- `applications/aphoria/docs/guides/multi-project-architecture.md` (NEW)
- `applications/aphoria/docs/cli-reference.md` - Add configuration section

### Low Priority (Enhancement)
- `applications/aphoria/README.md` - Brief note on per-project isolation
- `applications/aphoria/docs/guides/dashboard-setup.md` (NEW, if dashboard is ready)

---

## Next Steps

**Immediate:**
1. Fix stale path reference in `the-first-scan.md`
2. Create `configuration.md` with database path examples

**Follow-up:**
3. Create `multi-project-architecture.md` guide
4. Decide on dashboard documentation strategy

9.7 KiB Raw Blame History