stemedb/CORPUS-QUICK-START.md
jml bb0c33f8d3 fix(api): enable querying of CLI-created community corpus items
## Problem
CLI-created community corpus items (tier 3) were stored correctly but
invisible via API queries. Two issues blocked discoverability:

1. **Prefix mismatch**: API hardcoded 'community://pattern/' for
   aggregated patterns, but CLI creates 'community://rust/http/...' URIs
2. **Query parameter parsing**: Axum's default parser doesn't support
   bracket notation (?sources[]=value) used by the dashboard

Result: 0/22 CLI-created items were queryable.

## Solution

### Fix 1: Broaden Community Prefix
- Changed: 'community://pattern/' → 'community://' in corpus handler
- Impact: Now matches both aggregated patterns AND CLI-created items
- Backward compatible: Broader prefix includes narrower results

### Fix 2: Add QsQuery Extractor
- Added: serde_qs dependency + custom QsQuery extractor
- Supports: Bracket notation for array parameters (?sources[]=a&sources[]=b)
- Compatible: Works with JavaScript URLSearchParams standard
- Tested: 3 new unit tests for extractor behavior

## Verification
-  All 22 CLI-created community items now queryable (was 0)
-  Source filtering works: community (22), RFC (2), vendor (5)
-  Multi-source queries work: ?sources[]=community&sources[]=rfc → 24
-  All 89 API tests pass + 3 new extractor tests
-  Clippy clean (0 warnings)
-  No regressions in existing functionality

## Files Changed
- crates/stemedb-api/Cargo.toml: Add serde_qs dependency
- crates/stemedb-api/src/extractors.rs: New QsQuery extractor (117 lines)
- crates/stemedb-api/src/handlers/aphoria/corpus.rs: Use QsQuery, broaden prefix
- crates/stemedb-api/src/lib.rs: Export extractors module

Also includes: Scale-adaptive thresholds, wiki corpus extraction,
documentation updates, and dashboard UI improvements from prior work.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-09 15:54:35 +00:00

2.6 KiB

Corpus Quick Start Guide

TL;DR - API is Already Running!

The corpus API is currently serving data at:

  • URL: http://localhost:18180/v1/aphoria/corpus
  • Database: ~/.aphoria/corpus-db
  • Data: 2 RFC items (TLS cert verification, JWT audience validation)

Test It Right Now

# Get all RFC corpus items
curl -s 'http://localhost:18180/v1/aphoria/corpus?sources[]=rfc' | jq '.items[].subject'

# Expected output:
# "rfc://5246/tls/certificate_verification"
# "rfc://7519/audience_validation"

Import Production Wiki

cd ~/Workspace/stemedb
target/release/aphoria corpus import wiki ~/Workspace/orchard9/wiki/content

Start Dashboard

cd applications/aphoria-dashboard
npm run dev
# Open: http://localhost:3000/corpus

Restart API Later (if needed)

cd ~/Workspace/stemedb
STEMEDB_DB_DIR=$HOME/.aphoria/corpus-db \
STEMEDB_WAL_DIR=$HOME/.aphoria/corpus-db/wal \
target/release/stemedb-api

Query Examples

# Get all sources (RFC, OWASP, vendor, community)
curl 'http://localhost:18180/v1/aphoria/corpus'

# Filter by multiple sources
curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=rfc&sources[]=owasp'

# Filter by category
curl 'http://localhost:18180/v1/aphoria/corpus?category=security'

# Pagination
curl 'http://localhost:18180/v1/aphoria/corpus?limit=10&offset=0'

Response Format

{
  "items": [
    {
      "subject": "rfc://5246/tls/certificate_verification",
      "predicate": "enabled",
      "value": "true",
      "source": "rfc://",
      "tier": 0,
      "category": "security",
      "explanation": "TLS certificate verification MUST be enabled...",
      "authority_source": "RFC 5246 Section 7.4.2"
    }
  ],
  "total_matching": 2,
  "sources_included": ["rfc://"]
}

Files to Know

  • Corpus DB: ~/.aphoria/corpus-db/ (shared across projects)
  • Project DB: .aphoria/db/ (per-project)
  • Import CLI: aphoria corpus import wiki <path>
  • API Config: Set STEMEDB_DB_DIR to choose database

Troubleshooting

Dashboard shows empty results?

  • Check API is running on port 18180
  • Verify API is using corpus database: ps aux | grep stemedb-api
  • Check API logs for database path

API won't start?

  • Make sure corpus DB exists: ls ~/.aphoria/corpus-db/
  • Check port not in use: lsof -i :18180
  • View logs: tail -f /tmp/api-corpus.log

Need to reimport wiki?

rm -rf ~/.aphoria/corpus-db
target/release/aphoria corpus import wiki <path>

Current Status: API running, corpus database populated, ready for dashboard!