# Corpus Database Architecture

**Audience:** Engineers integrating Aphoria with StemeDB API, ops teams deploying both systems.

**What you'll learn:**
- How Aphoria's corpus database integrates with StemeDB API
- URI scheme inference for authoritative sources
- Where CLI-created corpus items live
- Git hooks for automatic binary rebuilds
- Production deployment patterns

---

## Quick Reference

```bash
# Aphoria CLI writes to:
~/.aphoria/corpus-db/

# StemeDB API reads from:
data/db/  # Default, or configure STEMEDB_CORPUS_DB_DIR

# Make API see Aphoria corpus:
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
stemedb-api
```

---

## Database Separation

### The Problem

Aphoria and StemeDB API use separate databases:

```
Aphoria CLI:
  └─ corpus create/build → ~/.aphoria/corpus-db/

StemeDB API:
  └─ GET /v1/aphoria/corpus → data/db/

Result: Items created via CLI aren't visible in API/Dashboard
```

### The Solution

Three integration patterns:

#### Pattern 1: Shared Database (Recommended for Development)

Point the API at Aphoria's corpus database:

```bash
# .env
STEMEDB_CORPUS_DB_DIR=/home/user/.aphoria/corpus-db

# Start API
cargo run --release -p stemedb-api
```

**Pros:**
- Zero synchronization needed
- Single source of truth
- Changes immediately visible

**Cons:**
- API has read-only access (can't write to corpus)
- Not suitable if API needs to write corpus items

#### Pattern 2: Unified Database (Recommended for Production)

Use one shared directory for both systems:

```bash
# Create shared directory
sudo mkdir -p /var/lib/stemedb/corpus
sudo chown aphoria:stemedb /var/lib/stemedb/corpus
sudo chmod 775 /var/lib/stemedb/corpus
```

```toml
# .aphoria/config.toml
[episteme]
corpus_data_dir = "/var/lib/stemedb/corpus"
```

```bash
# StemeDB API
export STEMEDB_CORPUS_DB_DIR="/var/lib/stemedb/corpus"
```

**Pros:**
- Single database, no sync
- Both systems have write access
- Production-ready pattern

**Cons:**
- Requires deployment coordination
- Permissions management needed

#### Pattern 3: Sync Mechanism (Future)

```bash
# Planned (not yet implemented)
aphoria corpus sync --to-api --api-db-dir data/db
```

**Use case:** When databases must remain separate.

---

## URI Scheme Inference

### The Problem

Corpus items need URI-schemed subjects for API prefix scanning:

```bash
# Without URI scheme (won't work):
subject: "tls/certificate_verification"

# API queries:
curl '/v1/aphoria/corpus?sources[]=rfc'
# Scans for "subject:rfc://" → doesn't match plain subjects
```

### The Solution

Automatic URI inference based on authority and tier:

```rust
// In aphoria corpus create
Authority: "RFC 5246 Section 7.4.2"
Tier: 0

// Auto-inferred:
subject_uri: "rfc://tls/certificate_verification"
```

### Inference Rules

| Condition | Scheme | Example |
|-----------|--------|---------|
| Already has `://` | Preserved | `rfc://test` → `rfc://test` |
| Authority contains "rfc" (case-insensitive) | `rfc://` | "RFC 5280" → `rfc://...` |
| Authority contains "owasp" | `owasp://` | "OWASP Top 10" → `owasp://...` |
| Authority contains "cwe" | `cwe://` | "CWE-120" → `cwe://...` |
| Tier 2 | `vendor://` | GitHub docs → `vendor://...` |
| Tier 3 | `community://` | Team wiki → `community://...` |
| Tier 0/1 unrecognized | `corpus://` | Unknown → `corpus://...` |

**Priority:** Authority matching > Tier-based > Fallback

### Examples

```bash
# RFC claim (tier 0)
aphoria corpus create \
  --subject "tls/validation" \
  --authority "RFC 5280 Section 6.1" \
  --tier 0
# Stored as: subject:rfc://tls/validation

# OWASP claim (tier 1)
aphoria corpus create \
  --subject "password/storage" \
  --authority "OWASP Password Storage Cheat Sheet" \
  --tier 1
# Stored as: subject:owasp://password/storage

# Vendor docs (tier 2)
aphoria corpus create \
  --subject "postgresql/connection_pool" \
  --authority "PostgreSQL Documentation" \
  --tier 2
# Stored as: subject:vendor://postgresql/connection_pool

# Community (tier 3)
aphoria corpus create \
  --subject "api/rest/pagination" \
  --authority "Team wiki: API standards" \
  --tier 3
# Stored as: subject:community://api/rest/pagination

# Already schemed (preserved)
aphoria corpus create \
  --subject "custom://myapp/feature" \
  --authority "Internal spec" \
  --tier 2
# Stored as: subject:custom://myapp/feature
```

---

## CLI-Created Corpus Source

### The Problem

Items created with `aphoria corpus create` weren't visible in:

```bash
aphoria corpus list
# Showed: RFC, OWASP, VendorDocs
# Missing: CLI-created items

aphoria corpus build
# Total assertions: 86
# Missing: CLI-created items
```

### The Solution

CLI-created items are now a first-class corpus source:

```rust
// Tagged at creation time
metadata: {
    "source": "cli_create",
    "description": "...",
    "authority_source": "...",
    "category": "..."
}

// Discovered by CliCreatedBuilder
impl AsyncCorpusBuilder for CliCreatedBuilder {
    async fn build(...) -> Vec {
        // Scan corpus DB
        // Filter by metadata: "source": "cli_create"
        // Return assertions
    }
}
```

### Now They Appear

```bash
aphoria corpus list
# Available corpus sources:
#   rfc:// (Tier 0) - RFC
#   owasp:// (Tier 1) - OWASP
#   vendor:// (Tier 2) - VendorDocs
#   cli:// (Tier 3) - CLI-Created Items  ← NEW

aphoria corpus build
# Corpus build complete:
#   Total assertions: 157
#   CLI-Created Items: 3 assertions  ← NEW
```

### Querying CLI-Created Items

```bash
# Via API
curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=cli'

# Via Dashboard
# Navigate to: http://localhost:3000/corpus
# Filter by "CLI-Created" source
```

---

## Git Hooks for Binary Rebuilds

### The Problem

Developer workflow:

1. `git pull` (gets CLI definition changes)
2. Run `aphoria corpus create`
3. Error: "unrecognized subcommand 'create'"
4. Confusion, time wasted
5. Realize binary is stale: `cargo build --release -p aphoria`

### The Solution

Automatic rebuild hooks:

```bash
# .git/hooks/post-merge
if git diff-tree ... | grep -q "^applications/aphoria/src/cli"; then
    echo "🔧 CLI changed, rebuilding aphoria..."
    cargo build --release -p aphoria
fi
```

### Installed Hooks

- **post-merge** - After `git pull` or `git merge`
- **post-checkout** - After `git checkout <branch>`
- **post-rewrite** - After `git rebase`

### What Triggers Rebuild

- **Aphoria CLI**: `applications/aphoria/src/cli/`
- **API handlers**: `crates/stemedb-api/src/`
- **Simulator**: `crates/stemedb-sim/src/`
- **Core libraries**: `crates/stemedb-*`
- **Dependencies**: `Cargo.toml` changes

### Installation

Hooks live in `.git/hooks/`, which git does not track. To check for them on a new clone:

```bash
cd /home/jml/Workspace/stemedb
ls -la .git/hooks/post-*

# If missing, check GIT-HOOKS-IMPLEMENTATION.md for setup
```

### Bypass Hook (Emergency)

```bash
# Temporarily disable all hooks for one command
# (--no-verify skips only the pre-merge-commit and commit-msg hooks, not post-merge)
git -c core.hooksPath=/dev/null pull

# Or set env var (takes effect only if the hook scripts check this variable)
GIT_HOOKS_DISABLE=1 git pull
```

---

## Deployment Configurations

### Local Development

**Aphoria:**

```bash
# Default: uses ~/.aphoria/corpus-db/
aphoria corpus create ...
aphoria corpus build
```

**StemeDB API:**

```bash
# Point to Aphoria's corpus
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
cargo run --release -p stemedb-api
```

### Docker Compose

```yaml
version: '3.8'

volumes:
  corpus-db:

services:
  stemedb-api:
    image: stemedb-api:latest
    environment:
      - STEMEDB_CORPUS_DB_DIR=/var/lib/stemedb/corpus
    volumes:
      - corpus-db:/var/lib/stemedb/corpus
    ports:
      - "18180:18180"

  aphoria-builder:
    image: aphoria:latest
    volumes:
      - corpus-db:/var/lib/stemedb/corpus
      - ./aphoria-config.toml:/etc/aphoria/config.toml
    command: corpus build
```

### Kubernetes

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: corpus-db
spec:
  accessModes: [ReadWriteMany]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stemedb-api
spec:
  template:
    spec:
      containers:
        - name: api
          image: stemedb-api:latest
          env:
            - name: STEMEDB_CORPUS_DB_DIR
              value: /var/lib/stemedb/corpus
          volumeMounts:
            - name: corpus-db
              mountPath: /var/lib/stemedb/corpus
      volumes:
        - name: corpus-db
          persistentVolumeClaim:
            claimName: corpus-db
```

### Production (Bare Metal)

```bash
# 1. Create shared corpus directory
sudo mkdir -p /var/lib/stemedb/corpus
sudo chown aphoria:stemedb /var/lib/stemedb/corpus
sudo chmod 775 /var/lib/stemedb/corpus

# 2. Configure Aphoria
cat > /etc/aphoria/config.toml < /etc/systemd/system/stemedb-api.service <