fix(api): enable querying of CLI-created community corpus items
## Problem

CLI-created community corpus items (tier 3) were stored correctly but invisible via API queries. Two issues blocked discoverability:

1. **Prefix mismatch**: API hardcoded 'community://pattern/' for aggregated patterns, but CLI creates 'community://rust/http/...' URIs
2. **Query parameter parsing**: Axum's default parser doesn't support bracket notation (?sources[]=value) used by the dashboard

Result: 0/22 CLI-created items were queryable.

## Solution

### Fix 1: Broaden Community Prefix

- Changed: 'community://pattern/' → 'community://' in corpus handler
- Impact: Now matches both aggregated patterns AND CLI-created items
- Backward compatible: Broader prefix includes narrower results

### Fix 2: Add QsQuery Extractor

- Added: serde_qs dependency + custom QsQuery extractor
- Supports: Bracket notation for array parameters (?sources[]=a&sources[]=b)
- Compatible: Works with JavaScript URLSearchParams standard
- Tested: 3 new unit tests for extractor behavior

## Verification

- ✅ All 22 CLI-created community items now queryable (was 0)
- ✅ Source filtering works: community (22), RFC (2), vendor (5)
- ✅ Multi-source queries work: ?sources[]=community&sources[]=rfc → 24
- ✅ All 89 API tests pass + 3 new extractor tests
- ✅ Clippy clean (0 warnings)
- ✅ No regressions in existing functionality

## Files Changed

- crates/stemedb-api/Cargo.toml: Add serde_qs dependency
- crates/stemedb-api/src/extractors.rs: New QsQuery extractor (117 lines)
- crates/stemedb-api/src/handlers/aphoria/corpus.rs: Use QsQuery, broaden prefix
- crates/stemedb-api/src/lib.rs: Export extractors module

Also includes: Scale-adaptive thresholds, wiki corpus extraction, documentation updates, and dashboard UI improvements from prior work.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
parent 65065f3d8f
commit bb0c33f8d3
@ -62,6 +62,23 @@ stemedb/
|
||||
guides/ # You are here
|
||||
```
|
||||
|
||||
## Git Hooks
|
||||
|
||||
The repository includes automatic git hooks to rebuild binaries when source code changes:
|
||||
|
||||
- **post-merge**: Runs after `git pull` or `git merge`
|
||||
- **post-checkout**: Runs after `git checkout` (branch switches only)
|
||||
|
||||
These hooks detect changes to:
|
||||
- Aphoria CLI and core logic
|
||||
- StemeDB API server
|
||||
- StemeDB simulator
|
||||
- Core libraries (affects all binaries)
|
||||
|
||||
When changes are detected, the hooks automatically run `cargo build --release --workspace` to rebuild all binaries. This prevents "command not found" errors from stale binaries.
|
||||
|
||||
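The hooks are installed in `.git/hooks/` and are already executable. To disable them temporarily, rename the hook files or override the hooks directory for a single command (for example, `git -c core.hooksPath=/dev/null pull`). Note that `--no-verify` does not skip `post-merge` or `post-checkout` hooks.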
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Build fails with missing dependencies
|
||||
@ -79,6 +96,15 @@ Run with `--fix` for auto-corrections:
|
||||
cargo clippy --workspace --fix --allow-dirty
|
||||
```
|
||||
|
||||
### "Command not found" after git pull
|
||||
|
||||
If you see this error despite the git hooks, manually rebuild:
|
||||
```bash
|
||||
cargo build --release --workspace
|
||||
```
|
||||
|
||||
The binaries are in `target/release/` and should be in your PATH or called via `cargo run --release -p <package>`.
|
||||
|
||||
## Related
|
||||
|
||||
- [Testing Guide](./testing.md)
|
||||
|
||||
602
.claude/skills/extract-wiki-corpus/SKILL.md
Normal file
@ -0,0 +1,602 @@
|
||||
---
|
||||
name: extract-wiki-corpus
|
||||
description: Extract structured claims from wiki documentation using LLM reasoning. Use when importing technical wikis, research docs, or compatibility guides into Aphoria corpus.
|
||||
---
|
||||
|
||||
# Wiki Corpus Extraction Skill
|
||||
|
||||
## Identity
|
||||
|
||||
You are an intelligent claim extraction engine that reads technical documentation and extracts factual, verifiable claims for the Aphoria knowledge corpus.
|
||||
|
||||
Your job is to:
|
||||
1. Read wiki markdown files
|
||||
2. Extract factual claims using LLM reasoning
|
||||
3. Generate CLI commands to persist claims in the corpus database
|
||||
4. Report comprehensive results with success/failure breakdown
|
||||
|
||||
## Core Principles
|
||||
|
||||
1. **Factual over Normative**: Extract what IS (not what SHOULD BE)
|
||||
2. **Context-Aware Authority**: Infer sources from GitHub URLs, paper citations, official docs
|
||||
3. **Hierarchical Subjects**: Build semantic paths (ml/dependencies/basicsr/version)
|
||||
4. **Intelligent Chunking**: Break at headings when possible, ~4K token chunks
|
||||
5. **Batch Processing**: Extract all claims, then execute CLI commands
|
||||
6. **Bundle Errors**: Collect all errors and report them together
|
||||
|
||||
## Workflow Overview
|
||||
|
||||
```
|
||||
Phase 1: Discover & Read
|
||||
↓
|
||||
Step 1.2: Verify Commands
|
||||
↓
|
||||
Phase 2: Intelligent Chunking
|
||||
↓
|
||||
Phase 3: Claim Extraction (Per Chunk)
|
||||
↓
|
||||
Phase 4: Validation
|
||||
↓
|
||||
Phase 5: CLI Execution
|
||||
↓
|
||||
Phase 6: Summary Report
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Discover & Read
|
||||
|
||||
### Step 1.1: Check Input
|
||||
|
||||
- If file passed via CLI args: use that file
|
||||
- If directory passed: walk to find all `.md` files
|
||||
- Use Read tool to get full content of each file
|
||||
|
||||
### Step 1.2: Verify Aphoria Binary and Commands
|
||||
|
||||
Before proceeding, verify that the required commands exist:
|
||||
|
||||
```bash
|
||||
# Check Aphoria version
|
||||
aphoria --version
|
||||
|
||||
# Verify corpus create command exists
|
||||
if ! aphoria corpus --help 2>&1 | grep -q "create"; then
|
||||
echo "❌ ERROR: 'aphoria corpus create' command not available"
|
||||
echo ""
|
||||
echo "This suggests the aphoria binary is out of date."
|
||||
echo ""
|
||||
echo "Fix options:"
|
||||
echo " 1. Rebuild: cargo build --release -p aphoria"
|
||||
echo " 2. Check git status: git status"
|
||||
echo " 3. Pull latest: git pull && cargo build --release -p aphoria"
|
||||
echo ""
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "✅ Aphoria binary up to date (corpus create available)"
|
||||
```
|
||||
|
||||
**Decision Gate:** Command exists? → Proceed to token estimation
|
||||
|
||||
### Step 1.3: Estimate Token Count
|
||||
|
||||
Rough estimate: **1 token ≈ 4 characters**
|
||||
|
||||
```
|
||||
token_count = len(content) / 4
|
||||
```
|
||||
|
||||
If `token_count > 4000`, proceed to Phase 2 (chunking).
|
||||
If `token_count <= 4000`, treat as single chunk.
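
The check above can be sketched in Python as follows. This is a minimal illustration that assumes the 4-characters-per-token heuristic from this step; the helper names `estimate_tokens` and `needs_chunking` are hypothetical and not part of Aphoria.

```python
# Hypothetical helpers; the names are illustrative, not part of the Aphoria CLI.
CHARS_PER_TOKEN = 4        # heuristic from this step: 1 token is roughly 4 characters
CHUNK_TOKEN_LIMIT = 4000   # threshold that decides whether Phase 2 is needed


def estimate_tokens(content: str) -> int:
    """Rough token estimate: character count divided by four."""
    return len(content) // CHARS_PER_TOKEN


def needs_chunking(content: str) -> bool:
    """True when the estimate exceeds the limit, so the file goes to Phase 2."""
    return estimate_tokens(content) > CHUNK_TOKEN_LIMIT
```

The estimate counts characters rather than bytes, so non-ASCII text may tokenize to more tokens than this suggests.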
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Intelligent Chunking
|
||||
|
||||
### Goal
|
||||
Split content into ~4K token chunks, preferring heading boundaries.
|
||||
|
||||
### Algorithm
|
||||
|
||||
1. **Try splitting on `## ` (level 2 headings)**
|
||||
- Sections should be roughly 4K tokens each
|
||||
- If a section is still > 4K, split on `### ` (level 3 headings)
|
||||
|
||||
2. **Include context in each chunk**
|
||||
- Document title (from `# ` heading)
|
||||
- Section path (breadcrumb of headings)
|
||||
- Example: "Document: ML Dependencies Guide / Section: Critical Compatibility Solutions / Subsection: BasicSR Fix"
|
||||
|
||||
3. **Maintain overlap**
|
||||
- Include previous heading for context
|
||||
- This helps LLM understand relationships
|
||||
|
||||
### Chunk Metadata Format
|
||||
|
||||
```json
|
||||
{
|
||||
"chunk_id": 1,
|
||||
"total_chunks": 3,
|
||||
"document_title": "ML Dependencies Guide",
|
||||
"section_path": "Critical Compatibility Solutions / BasicSR Fix",
|
||||
"content": "..."
|
||||
}
|
||||
```
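
One way to produce chunks in this metadata format is sketched below. It is an assumption-laden illustration, not the skill's actual implementation: the function names are hypothetical, headings inside fenced code blocks are not special-cased, small sections are not merged, and a section with no level 3 headings is left oversized. Context is carried through `section_path` rather than by copying text between chunks.

```python
import re

CHUNK_TOKEN_LIMIT = 4000  # roughly 4K tokens per chunk, per the goal above


def estimate_tokens(text: str) -> int:
    """Same rough heuristic as Step 1.3: four characters per token."""
    return len(text) // 4


def split_on_heading(text: str, marker: str) -> list[tuple[str, str]]:
    """Split before each line that starts with marker; return (heading, body) pairs."""
    parts = re.split(rf"(?m)^(?={re.escape(marker)})", text)
    sections = []
    for part in parts:
        if not part.strip():
            continue
        first_line = part.splitlines()[0]
        heading = first_line[len(marker):].strip() if first_line.startswith(marker) else ""
        sections.append((heading, part))
    return sections


def chunk_markdown(text: str) -> list[dict]:
    """Split on level 2 headings, falling back to level 3 for oversized sections."""
    title_match = re.search(r"(?m)^# (.+)$", text)
    title = title_match.group(1).strip() if title_match else ""
    pieces = []  # list of (section_path, content)
    for h2, section in split_on_heading(text, "## "):
        if estimate_tokens(section) <= CHUNK_TOKEN_LIMIT:
            pieces.append((h2, section))
            continue
        # Section is still too large: split it on level 3 headings.
        for h3, sub in split_on_heading(section, "### "):
            path = " / ".join(p for p in (h2, h3) if p)
            pieces.append((path, sub))
    return [
        {
            "chunk_id": i,
            "total_chunks": len(pieces),
            "document_title": title,
            "section_path": path,
            "content": content,
        }
        for i, (path, content) in enumerate(pieces, start=1)
    ]
```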
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Claim Extraction (Per Chunk)
|
||||
|
||||
### Prompt the LLM
|
||||
|
||||
For each chunk, use a structured extraction prompt:
|
||||
|
||||
````
|
||||
You are extracting factual claims from technical documentation for a knowledge corpus.
|
||||
|
||||
**Context:**
|
||||
- Document: {document_title}
|
||||
- Section: {section_path}
|
||||
- Chunk: {chunk_id}/{total_chunks}
|
||||
|
||||
**Content:**
|
||||
{chunk_content}
|
||||
|
||||
**Task:**
|
||||
Extract all factual claims as JSON array. Each claim must be:
|
||||
1. Factual (not opinion or speculation)
|
||||
2. Verifiable from the text
|
||||
3. Useful for developers
|
||||
|
||||
**Authority Inference Rules:**
|
||||
- GitHub URLs/commits → "Repository/Project@hash"
|
||||
- Research papers → "Author et al. (Year)"
|
||||
- Official documentation → "Project Documentation"
|
||||
- Empirical observation → "Community consensus"
|
||||
|
||||
**Tier Assignment:**
|
||||
- 0: RFC, W3C spec, ISO standard (regulatory)
|
||||
- 1: OWASP, CWE, security advisory (clinical)
|
||||
- 2: Project docs, compatibility notes (observational)
|
||||
- 3: Blog posts, forum consensus (community)
|
||||
|
||||
**Output Format:**
|
||||
```json
|
||||
[
|
||||
{
|
||||
"subject": "hierarchical/path/to/concept",
|
||||
"predicate": "relationship_type",
|
||||
"value": "constraint_or_value",
|
||||
"explanation": "full sentence with context",
|
||||
"authority": "inferred_source",
|
||||
"category": "compatibility|performance|security|architecture|quality",
|
||||
"confidence": 0.95,
|
||||
"tier": 2
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
Return ONLY the JSON array, no additional text.
|
||||
````
|
||||
|
||||
### Expected Output Structure
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"subject": "ml/dependencies/basicsr/torchvision",
|
||||
"predicate": "incompatible_with",
|
||||
"value": ">=0.15",
|
||||
"explanation": "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in torchvision 0.15+",
|
||||
"authority": "XPixelGroup/BasicSR GitHub",
|
||||
"category": "compatibility",
|
||||
"confidence": 0.95,
|
||||
"tier": 2
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Validation
|
||||
|
||||
### Step 4.1: Filter by Confidence
|
||||
|
||||
Only keep claims where `confidence >= 0.7`
|
||||
|
||||
### Step 4.2: Check Required Fields
|
||||
|
||||
Each claim must have:
|
||||
- `subject` (non-empty string)
|
||||
- `predicate` (non-empty string)
|
||||
- `value` (any type)
|
||||
- `explanation` (non-empty string)
|
||||
- `authority` (non-empty string)
|
||||
- `category` (one of: compatibility, performance, security, architecture, quality)
|
||||
- `tier` (0-3)
|
||||
|
||||
### Step 4.3: Validate Tier
|
||||
|
||||
Tier must be 0, 1, 2, or 3. If invalid, record error and skip claim.
|
||||
|
||||
### Step 4.4: Check for Duplicates
|
||||
|
||||
**Important**: The corpus database is **append-only**. Multiple sources can create the same `subject+predicate` pair. This is **allowed and expected**. Do NOT filter duplicates — just warn about them in the report.
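
Steps 4.1 through 4.4 can be sketched together as a single pass that collects every problem instead of stopping at the first one. This is a minimal sketch: `validate_claims` is a hypothetical name, a claim with no `confidence` key is assumed to count as 0, and `confidence` is assumed to be numeric when present.

```python
from collections import Counter

REQUIRED_TEXT_FIELDS = ("subject", "predicate", "explanation", "authority")
CATEGORIES = {"compatibility", "performance", "security", "architecture", "quality"}
MIN_CONFIDENCE = 0.7


def validate_claims(claims: list[dict]) -> tuple[list[dict], list[str], list[str]]:
    """Return (valid_claims, errors, warnings); all errors are collected."""
    valid, errors, warnings = [], [], []
    for claim in claims:
        label = claim.get("subject") or "<no subject>"
        problems = []
        # Assumption: a missing confidence is treated as 0 and therefore rejected.
        if claim.get("confidence", 0.0) < MIN_CONFIDENCE:
            problems.append("confidence below 0.7")
        for field in REQUIRED_TEXT_FIELDS:
            value = claim.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"missing {field} field")
        if "value" not in claim:
            problems.append("missing value field")
        if claim.get("category") not in CATEGORIES:
            problems.append(f"invalid category {claim.get('category')!r}")
        if claim.get("tier") not in (0, 1, 2, 3):
            problems.append(f"invalid tier {claim.get('tier')!r} (must be 0-3)")
        if problems:
            errors.extend(f"{label}: {p}" for p in problems)
        else:
            valid.append(claim)
    # Duplicates are allowed (append-only corpus): warn about them, never filter.
    counts = Counter((c["subject"], c["predicate"]) for c in valid)
    warnings.extend(
        f"duplicate subject+predicate: {s} {p} ({n} claims)"
        for (s, p), n in counts.items()
        if n > 1
    )
    return valid, errors, warnings
```

The returned `errors` and `warnings` lists feed directly into the Phase 6 report.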
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: CLI Execution
|
||||
|
||||
### Step 5.1: Construct CLI Commands
|
||||
|
||||
For each validated claim, construct:
|
||||
|
||||
```bash
|
||||
aphoria corpus create \
|
||||
--subject "{subject}" \
|
||||
--predicate "{predicate}" \
|
||||
--value "{value}" \
|
||||
--explanation "{explanation}" \
|
||||
--authority "{authority}" \
|
||||
--category "{category}" \
|
||||
--tier {tier}
|
||||
```
|
||||
|
||||
**Important**: Use proper shell escaping for strings with quotes or special characters.
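
A sketch of command construction that sidesteps most escaping problems by building an argument list first. The flags are the ones shown above; the function names `build_create_command` and `render_for_shell` are hypothetical.

```python
import shlex


def build_create_command(claim: dict) -> list[str]:
    """Build the argument list for one `aphoria corpus create` call."""
    return [
        "aphoria", "corpus", "create",
        "--subject", claim["subject"],
        "--predicate", claim["predicate"],
        "--value", str(claim["value"]),
        "--explanation", claim["explanation"],
        "--authority", claim["authority"],
        "--category", claim["category"],
        "--tier", str(claim["tier"]),
    ]


def render_for_shell(argv: list[str]) -> str:
    """Quote every argument so the command can be pasted into a POSIX shell."""
    return shlex.join(argv)
```

When the list is passed directly to a process runner without a shell, no quoting is needed at all; `render_for_shell` only matters when the command is executed as a single string, for example through the Bash tool.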
|
||||
|
||||
### Step 5.2: Execute Commands
|
||||
|
||||
Use the Bash tool to execute each command.
|
||||
|
||||
### Step 5.3: Collect Results
|
||||
|
||||
For each execution:
|
||||
- **Success**: Record the corpus ID (e.g., "corpus://ml/foo/bar/predicate")
|
||||
- **Failure**: Record the full error message
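
Result collection could look like the sketch below. It assumes, without confirmation from the CLI itself, that a successful `aphoria corpus create` prints the corpus ID on stdout and that failures exit non-zero with a message on stderr; `run_all` is a hypothetical helper name.

```python
import subprocess


def run_all(commands: list[list[str]]) -> tuple[list[str], list[str]]:
    """Run every command; return (stdout of successes, messages of failures)."""
    successes, failures = [], []
    for argv in commands:
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
        except OSError as exc:  # for example, the binary is not on PATH
            failures.append(f"{argv[0]}: {exc}")
            continue
        if result.returncode == 0:
            # Assumption: stdout carries the corpus ID of the stored claim.
            successes.append(result.stdout.strip())
        else:
            failures.append(result.stderr.strip() or f"exit code {result.returncode}")
    return successes, failures
```

A failed command never aborts the loop, which matches the requirement to continue with the remaining commands and report all failures at the end.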
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: Summary Report
|
||||
|
||||
### Report Structure
|
||||
|
||||
```markdown
|
||||
# Wiki Corpus Extraction Report
|
||||
|
||||
**File:** /path/to/wiki/article.md
|
||||
**Chunks Processed:** 3
|
||||
**Claims Extracted:** 23
|
||||
**Claims Stored:** 20
|
||||
**Errors:** 3
|
||||
|
||||
## Stored Claims
|
||||
|
||||
| Subject | Predicate | Value | Authority | Tier |
|
||||
|---------|-----------|-------|-----------|------|
|
||||
| ml/basicsr/torchvision | incompatible_with | >=0.15 | XPixelGroup/BasicSR | 2 |
|
||||
| ... | ... | ... | ... | ... |
|
||||
|
||||
## Errors
|
||||
|
||||
### Validation Errors (2)
|
||||
|
||||
1. **ml/foo/bar** - Invalid tier '5' (must be 0-3)
|
||||
2. **api/rest/foo** - Missing explanation field
|
||||
|
||||
### Storage Errors (1)
|
||||
|
||||
1. **net/http/timeout** - Database write failed: connection refused
|
||||
|
||||
## Next Steps
|
||||
|
||||
View corpus items: http://localhost:3000/corpus
|
||||
Query API: curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=community&limit=100'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Predicate Naming Conventions
|
||||
|
||||
Use consistent predicate names to enable effective querying:
|
||||
|
||||
| Relationship | Predicate |
|
||||
|--------------|-----------|
|
||||
| Version constraint | `requires`, `incompatible_with`, `compatible_with` |
|
||||
| Recommendation | `recommends`, `discourages` |
|
||||
| Performance | `faster_than`, `slower_than`, `optimal_for` |
|
||||
| Security | `vulnerable_to`, `mitigates`, `exposes` |
|
||||
| Configuration | `default_value`, `max_value`, `required_for` |
|
||||
|
||||
---
|
||||
|
||||
## Subject Path Guidelines
|
||||
|
||||
Build hierarchical paths that reflect the domain structure:
|
||||
|
||||
### Examples
|
||||
|
||||
- `ml/dependencies/{package}/{aspect}`
|
||||
- Example: `ml/dependencies/basicsr/torchvision`
|
||||
- `api/{protocol}/{feature}`
|
||||
- Example: `api/rest/authentication`
|
||||
- `security/{category}/{vuln_type}`
|
||||
- Example: `security/input-validation/xss`
|
||||
- `performance/{component}/{metric}`
|
||||
- Example: `performance/database/connection-pool`
|
||||
|
||||
### Principles
|
||||
|
||||
- Start general, get specific
|
||||
- Use lowercase with forward slashes
|
||||
- Use hyphens for multi-word segments
|
||||
- Keep paths under 6-7 levels
|
||||
|
||||
---
|
||||
|
||||
## Category Guidelines
|
||||
|
||||
Choose the most appropriate category:
|
||||
|
||||
| Category | Use When |
|
||||
|----------|----------|
|
||||
| `compatibility` | Version constraints, breaking changes, API compatibility |
|
||||
| `performance` | Optimization, resource usage, latency, throughput |
|
||||
| `security` | Vulnerabilities, mitigations, attack vectors |
|
||||
| `architecture` | Design patterns, module structure, dependencies |
|
||||
| `quality` | Code quality, maintainability, best practices |
|
||||
|
||||
---
|
||||
|
||||
## Authority Tier Guidelines
|
||||
|
||||
| Tier | Name | Examples | When to Use |
|
||||
|------|------|----------|-------------|
|
||||
| 0 | Regulatory | RFC 7231, W3C spec, ISO 27001 | Official standards bodies |
|
||||
| 1 | Clinical | OWASP Top 10, CWE-79, NVD | Security advisories, vulnerability databases |
|
||||
| 2 | Observational | PyTorch docs, GitHub project READMEs | Official project documentation |
|
||||
| 3 | Community | Blog posts, Stack Overflow, forum threads | Community wisdom, empirical observations |
|
||||
|
||||
---
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Validation Errors
|
||||
|
||||
Collect all validation errors and report them together. Do NOT stop on the first error.
|
||||
|
||||
Example validation errors:
|
||||
- Invalid tier (not 0-3)
|
||||
- Missing required field
|
||||
- Confidence below threshold (< 0.7)
|
||||
|
||||
### Storage Errors
|
||||
|
||||
If a CLI command fails:
|
||||
- Capture the full error message
|
||||
- Continue with remaining commands
|
||||
- Report all failures at the end
|
||||
|
||||
### LLM Extraction Errors
|
||||
|
||||
If the LLM returns invalid JSON:
|
||||
- Log the chunk that failed
|
||||
- Continue with remaining chunks
|
||||
- Report the parsing error in summary
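
A tolerant parser for the LLM reply might be sketched as follows. It strips one surrounding markdown fence if the model added it and raises `ValueError` otherwise, so the caller can log the chunk and move on. The name `parse_claims` is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example readable


def parse_claims(raw: str) -> list[dict]:
    """Parse an LLM reply into a list of claims; raise ValueError on bad input."""
    text = raw.strip()
    # Strip a surrounding markdown code fence (with optional language tag).
    fenced = re.fullmatch(rf"{FENCE}[a-zA-Z]*\s*\n(.*?)\n?{FENCE}", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of claims")
    return data
```

Replies with extra prose around the array are still rejected; recovering from those is left to the prompt, as the troubleshooting section suggests.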
|
||||
|
||||
---
|
||||
|
||||
## Do's and Don'ts
|
||||
|
||||
### Do
|
||||
|
||||
- ✅ Extract factual claims (not opinions)
|
||||
- ✅ Verify command availability before execution
|
||||
- ✅ Infer authority from context
|
||||
- ✅ Generate semantic subject paths
|
||||
- ✅ Include full explanation context
|
||||
- ✅ Bundle errors for batch reporting
|
||||
- ✅ Use Read tool to get file content
|
||||
- ✅ Use Bash tool to execute CLI commands
|
||||
- ✅ Filter by confidence >= 0.7
|
||||
- ✅ Allow duplicate subject+predicate (append-only DB)
|
||||
|
||||
### Do Not
|
||||
|
||||
- ❌ Extract opinions or speculative claims
|
||||
- ❌ Assume binary is up to date
|
||||
- ❌ Lose source attribution
|
||||
- ❌ Hardcode authority (infer from content)
|
||||
- ❌ Stop on first error (collect all errors)
|
||||
- ❌ Modify files (read-only skill)
|
||||
- ❌ Use placeholder values
|
||||
- ❌ Skip validation
|
||||
- ❌ Filter duplicates (append-only allows them)
|
||||
|
||||
---
|
||||
|
||||
## Example Extraction
|
||||
|
||||
### Input Text
|
||||
|
||||
```markdown
|
||||
## BasicSR and Torchvision Compatibility
|
||||
|
||||
The BasicSR library (v1.4.2) has a critical compatibility issue with torchvision >= 0.15.
|
||||
The library imports from `torchvision.transforms.functional_tensor`, which was removed
|
||||
in torchvision 0.15+.
|
||||
|
||||
Source: https://github.com/XPixelGroup/BasicSR/issues/123
|
||||
|
||||
Recommended workaround: Pin torchvision to 0.14.1 or earlier.
|
||||
```
|
||||
|
||||
### Extracted Claims
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"subject": "ml/dependencies/basicsr/torchvision",
|
||||
"predicate": "incompatible_with",
|
||||
"value": ">=0.15",
|
||||
"explanation": "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in torchvision 0.15+",
|
||||
"authority": "XPixelGroup/BasicSR#123",
|
||||
"category": "compatibility",
|
||||
"confidence": 0.95,
|
||||
"tier": 2
|
||||
},
|
||||
{
|
||||
"subject": "ml/dependencies/basicsr/torchvision",
|
||||
"predicate": "recommends",
|
||||
"value": "<=0.14.1",
|
||||
"explanation": "Workaround for basicsr compatibility issue: pin torchvision to 0.14.1 or earlier",
|
||||
"authority": "XPixelGroup/BasicSR#123",
|
||||
"category": "compatibility",
|
||||
"confidence": 0.9,
|
||||
"tier": 3
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
### CLI Commands
|
||||
|
||||
```bash
|
||||
aphoria corpus create \
|
||||
--subject "ml/dependencies/basicsr/torchvision" \
|
||||
--predicate "incompatible_with" \
|
||||
--value ">=0.15" \
|
||||
--explanation "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in torchvision 0.15+" \
|
||||
--authority "XPixelGroup/BasicSR#123" \
|
||||
--category "compatibility" \
|
||||
--tier 2
|
||||
|
||||
aphoria corpus create \
|
||||
--subject "ml/dependencies/basicsr/torchvision" \
|
||||
--predicate "recommends" \
|
||||
--value "<=0.14.1" \
|
||||
--explanation "Workaround for basicsr compatibility issue: pin torchvision to 0.14.1 or earlier" \
|
||||
--authority "XPixelGroup/BasicSR#123" \
|
||||
--category "compatibility" \
|
||||
--tier 3
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Related Skills
|
||||
|
||||
- **extract-claims**: Entity-level extraction from prose (for StemeDB ingestion)
|
||||
- **aphoria-suggest**: Suggest claims from existing patterns
|
||||
- **aphoria-claims**: Author claims from diffs
|
||||
|
||||
---
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
### Token Counting
|
||||
|
||||
Use rough heuristic: `token_count = len(content) / 4`
|
||||
|
||||
This is approximate but good enough for chunking decisions.
|
||||
|
||||
### Shell Escaping
|
||||
|
||||
When constructing CLI commands, properly escape strings:
|
||||
|
||||
```python
|
||||
import shlex
|
||||
|
||||
escaped_explanation = shlex.quote(explanation)
|
||||
```
|
||||
|
||||
Or in bash:
|
||||
```bash
|
||||
printf -v escaped_explanation '%q' "$explanation" # Shell-quote the whole value, not only double quotes
|
||||
```
|
||||
|
||||
### Confidence Threshold
|
||||
|
||||
Only extract claims with `confidence >= 0.7`. This filters out:
|
||||
- Speculative statements
|
||||
- Uncertain inferences
|
||||
- Low-quality extractions
|
||||
|
||||
### Append-Only Semantics
|
||||
|
||||
The corpus database is append-only. Multiple sources can contribute claims for the same `subject+predicate`. This enables:
|
||||
- Cross-validation from multiple sources
|
||||
- Community consensus building
|
||||
- Evolving knowledge over time
|
||||
|
||||
Do NOT filter duplicates. Just warn about them in the report.
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
A successful extraction should:
|
||||
|
||||
1. ✅ Read all markdown files in the input directory
|
||||
2. ✅ Extract factual claims with proper structure
|
||||
3. ✅ Infer authority from context (GitHub URLs, docs, etc.)
|
||||
4. ✅ Assign appropriate tiers (0-3)
|
||||
5. ✅ Execute CLI commands successfully
|
||||
6. ✅ Report comprehensive summary with errors bundled
|
||||
7. ✅ Handle validation errors gracefully
|
||||
8. ✅ Handle storage errors gracefully
|
||||
9. ✅ Generate semantic subject paths
|
||||
10. ✅ Use consistent predicate naming
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Command not found" or "unrecognized subcommand 'create'" Errors
|
||||
|
||||
If you see `error: unrecognized subcommand 'create'` or similar errors:
|
||||
|
||||
**Diagnosis:**
|
||||
1. **Check binary date**: `ls -lh target/release/aphoria`
|
||||
2. **Check CLI code date**: `ls -lh applications/aphoria/src/cli/mod.rs`
|
||||
3. **If CLI is newer**: The binary is out of date
|
||||
|
||||
**Solutions:**
|
||||
```bash
|
||||
# Option 1: Rebuild the binary
|
||||
cargo build --release -p aphoria
|
||||
|
||||
# Option 2: Pull latest changes and rebuild
|
||||
git pull && cargo build --release -p aphoria
|
||||
|
||||
# Option 3: Check if there are uncommitted changes
|
||||
git status
|
||||
```
|
||||
|
||||
**Prevention:**
|
||||
See the Git Hooks section of the repository's developer guide, which describes hooks that automatically rebuild binaries after a pull.
|
||||
|
||||
### "Database already open" error
|
||||
|
||||
The corpus database at `~/.aphoria/corpus-db` is locked by another process (probably the API server).
|
||||
|
||||
**Solution**: Stop the API server temporarily:
|
||||
```bash
|
||||
pkill -f stemedb-api
|
||||
```
|
||||
|
||||
### "Invalid tier" error
|
||||
|
||||
Tier must be 0, 1, 2, or 3.
|
||||
|
||||
**Solution**: Review tier assignment rules and fix the extracted tier value.
|
||||
|
||||
### "Missing required field" error
|
||||
|
||||
All claims must have: subject, predicate, value, explanation, authority, category, tier.
|
||||
|
||||
**Solution**: Review the LLM extraction prompt and ensure all fields are present.
|
||||
|
||||
### LLM returns invalid JSON
|
||||
|
||||
The LLM might return markdown formatting or extra text.
|
||||
|
||||
**Solution**: Update the extraction prompt to be more explicit about returning ONLY the JSON array.
|
||||
1573
.claude/skills/verify-wiki-corpus/SKILL.md
Normal file
File diff suppressed because it is too large
109
CORPUS-QUICK-START.md
Normal file
@ -0,0 +1,109 @@
|
||||
# Corpus Quick Start Guide
|
||||
|
||||
## TL;DR - API is Already Running!
|
||||
|
||||
The corpus API is currently serving data at:
|
||||
- **URL:** `http://localhost:18180/v1/aphoria/corpus`
|
||||
- **Database:** `~/.aphoria/corpus-db`
|
||||
- **Data:** 2 RFC items (TLS cert verification, JWT audience validation)
|
||||
|
||||
## Test It Right Now
|
||||
|
||||
```bash
|
||||
# Get all RFC corpus items
|
||||
curl -s 'http://localhost:18180/v1/aphoria/corpus?sources[]=rfc' | jq '.items[].subject'
|
||||
|
||||
# Expected output:
|
||||
# "rfc://5246/tls/certificate_verification"
|
||||
# "rfc://7519/audience_validation"
|
||||
```
|
||||
|
||||
## Import Production Wiki
|
||||
|
||||
```bash
|
||||
cd ~/Workspace/stemedb
|
||||
target/release/aphoria corpus import wiki ~/Workspace/orchard9/wiki/content
|
||||
```
|
||||
|
||||
## Start Dashboard
|
||||
|
||||
```bash
|
||||
cd applications/aphoria-dashboard
|
||||
npm run dev
|
||||
# Open: http://localhost:3000/corpus
|
||||
```
|
||||
|
||||
## Restart API Later (if needed)
|
||||
|
||||
```bash
|
||||
cd ~/Workspace/stemedb
|
||||
STEMEDB_DB_DIR=$HOME/.aphoria/corpus-db \
|
||||
STEMEDB_WAL_DIR=$HOME/.aphoria/corpus-db/wal \
|
||||
target/release/stemedb-api
|
||||
```
|
||||
|
||||
## Query Examples
|
||||
|
||||
```bash
|
||||
# Get all sources (RFC, OWASP, vendor, community)
|
||||
curl 'http://localhost:18180/v1/aphoria/corpus'
|
||||
|
||||
# Filter by multiple sources
|
||||
curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=rfc&sources[]=owasp'
|
||||
|
||||
# Filter by category
|
||||
curl 'http://localhost:18180/v1/aphoria/corpus?category=security'
|
||||
|
||||
# Pagination
|
||||
curl 'http://localhost:18180/v1/aphoria/corpus?limit=10&offset=0'
|
||||
```
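
The same queries can be built programmatically. The sketch below assumes the endpoint and parameter names from the examples above and keeps the brackets literal, as in the curl commands; `corpus_url` is a hypothetical helper, and whether every filter combination is accepted by the API is not confirmed here.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:18180/v1/aphoria/corpus"  # endpoint from the examples above


def corpus_url(sources=(), category=None, limit=None, offset=None) -> str:
    """Build a corpus query URL using bracket notation for the sources array."""
    pairs = [("sources[]", s) for s in sources]
    if category is not None:
        pairs.append(("category", category))
    if limit is not None:
        pairs.append(("limit", limit))
    if offset is not None:
        pairs.append(("offset", offset))
    # safe="[]" keeps the brackets literal instead of percent-encoding them.
    query = urlencode(pairs, safe="[]")
    return f"{BASE_URL}?{query}" if query else BASE_URL
```

For example, `corpus_url(sources=["rfc", "owasp"], limit=10)` yields a URL with one `sources[]` pair per source followed by `limit=10`.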
|
||||
|
||||
## Response Format
|
||||
|
||||
```json
|
||||
{
|
||||
"items": [
|
||||
{
|
||||
"subject": "rfc://5246/tls/certificate_verification",
|
||||
"predicate": "enabled",
|
||||
"value": "true",
|
||||
"source": "rfc://",
|
||||
"tier": 0,
|
||||
"category": "security",
|
||||
"explanation": "TLS certificate verification MUST be enabled...",
|
||||
"authority_source": "RFC 5246 Section 7.4.2"
|
||||
}
|
||||
],
|
||||
"total_matching": 2,
|
||||
"sources_included": ["rfc://"]
|
||||
}
|
||||
```
|
||||
|
||||
## Files to Know
|
||||
|
||||
- **Corpus DB:** `~/.aphoria/corpus-db/` (shared across projects)
|
||||
- **Project DB:** `.aphoria/db/` (per-project)
|
||||
- **Import CLI:** `aphoria corpus import wiki <path>`
|
||||
- **API Config:** Set `STEMEDB_DB_DIR` to choose database
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Dashboard shows empty results?**
|
||||
- Check API is running on port 18180
|
||||
- Verify API is using corpus database: `ps aux | grep stemedb-api`
|
||||
- Check API logs for database path
|
||||
|
||||
**API won't start?**
|
||||
- Make sure corpus DB exists: `ls ~/.aphoria/corpus-db/`
|
||||
- Check port not in use: `lsof -i :18180`
|
||||
- View logs: `tail -f /tmp/api-corpus.log`
|
||||
|
||||
**Need to reimport wiki?**
|
||||
```bash
|
||||
rm -rf ~/.aphoria/corpus-db
|
||||
target/release/aphoria corpus import wiki <path>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
✅ **Current Status:** API running, corpus database populated, ready for dashboard!
|
||||
@ -1,7 +1,6 @@
|
||||
// Corpus page constants
|
||||
|
||||
export const CORPUS_FETCH_LIMIT = 100;
|
||||
export const DEFAULT_MIN_PROJECTS = 1;
|
||||
|
||||
// Re-export shared formatters for convenience
|
||||
export { formatRelativeTime, formatUnixTimestamp } from "@/lib/format";
|
||||
|
||||
@ -1,20 +1,15 @@
|
||||
"use client";
|
||||
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Button } from "@/components/ui/button";
|
||||
import { Checkbox } from "@/components/ui/checkbox";
|
||||
import { X, Search } from "lucide-react";
|
||||
|
||||
interface CorpusFiltersProps {
|
||||
subjectPrefix: string;
|
||||
minProjects: number;
|
||||
sources: string[];
|
||||
filterCategory: string;
|
||||
hideNoise: boolean;
|
||||
availableCategories: string[];
|
||||
onSubjectPrefixChange: (value: string) => void;
|
||||
onMinProjectsChange: (value: number) => void;
|
||||
onSourcesChange: (value: string[]) => void;
|
||||
onFilterCategoryChange: (value: string) => void;
|
||||
onHideNoiseChange: (value: boolean) => void;
|
||||
onSubmit: () => void;
|
||||
onClear: () => void;
|
||||
totalCount: number;
|
||||
@ -23,16 +18,19 @@ interface CorpusFiltersProps {
|
||||
hasActiveFilter: boolean;
|
||||
}
|
||||
|
||||
const AVAILABLE_SOURCES = [
|
||||
{ id: "rfc", label: "RFC" },
|
||||
{ id: "owasp", label: "OWASP" },
|
||||
{ id: "community", label: "Community" },
|
||||
{ id: "vendor", label: "Vendor" },
|
||||
];
|
||||
|
||||
export function CorpusFilters({
|
||||
subjectPrefix,
|
||||
minProjects,
|
||||
sources,
|
||||
filterCategory,
|
||||
hideNoise,
|
||||
availableCategories,
|
||||
onSubjectPrefixChange,
|
||||
onMinProjectsChange,
|
||||
onSourcesChange,
|
||||
onFilterCategoryChange,
|
||||
onHideNoiseChange,
|
||||
onSubmit,
|
||||
onClear,
|
||||
totalCount,
|
||||
@ -45,39 +43,38 @@ export function CorpusFilters({
|
||||
onSubmit();
|
||||
};
|
||||
|
||||
const handleSourceToggle = (sourceId: string) => {
|
||||
if (sources.includes(sourceId)) {
|
||||
onSourcesChange(sources.filter((s) => s !== sourceId));
|
||||
} else {
|
||||
onSourcesChange([...sources, sourceId]);
|
||||
}
|
||||
};
|
||||
|
||||
return (
|
||||
<form onSubmit={handleSubmit}>
|
||||
<div className="flex flex-wrap items-end gap-4">
|
||||
{/* Subject Prefix Filter */}
|
||||
<div className="flex-1 min-w-[200px]">
|
||||
<label htmlFor="subject-prefix" className="text-sm font-medium mb-2 block">
|
||||
Subject Prefix
|
||||
</label>
|
||||
<Input
|
||||
id="subject-prefix"
|
||||
placeholder="e.g., code://rust"
|
||||
value={subjectPrefix}
|
||||
onChange={(e) => onSubjectPrefixChange(e.target.value)}
|
||||
className="max-w-md"
|
||||
disabled={isLoading}
|
||||
/>
|
||||
</div>
|
||||
|
||||
{/* Min Projects Filter */}
|
||||
{/* Sources Filter */}
|
||||
<div className="flex flex-col gap-2">
|
||||
<label htmlFor="min-projects" className="text-sm font-medium">
|
||||
Min Projects
|
||||
</label>
|
||||
<Input
|
||||
id="min-projects"
|
||||
type="number"
|
||||
min={1}
|
||||
max={100}
|
||||
value={minProjects}
|
||||
onChange={(e) => onMinProjectsChange(Math.max(1, parseInt(e.target.value) || 1))}
|
||||
className="w-24"
|
||||
disabled={isLoading}
|
||||
/>
|
||||
<label className="text-sm font-medium">Sources</label>
|
||||
<div className="flex items-center gap-4">
|
||||
{AVAILABLE_SOURCES.map((source) => (
|
||||
<div key={source.id} className="flex items-center gap-2">
|
||||
<Checkbox
|
||||
id={`source-${source.id}`}
|
||||
checked={sources.includes(source.id)}
|
||||
onCheckedChange={() => handleSourceToggle(source.id)}
|
||||
disabled={isLoading}
|
||||
/>
|
||||
<label
|
||||
htmlFor={`source-${source.id}`}
|
||||
className="text-sm font-medium cursor-pointer"
|
||||
>
|
||||
{source.label}
|
||||
</label>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Category Filter */}
|
||||
@ -101,23 +98,10 @@ export function CorpusFilters({
|
||||
</select>
|
||||
</div>
|
||||
|
||||
{/* Hide Noise Toggle */}
|
||||
<div className="flex items-center gap-2 h-10">
|
||||
<Checkbox
|
||||
id="hide-noise"
|
||||
checked={hideNoise}
|
||||
onCheckedChange={onHideNoiseChange}
|
||||
disabled={isLoading}
|
||||
/>
|
||||
<label htmlFor="hide-noise" className="text-sm font-medium cursor-pointer">
|
||||
Hide noise
|
||||
</label>
|
||||
</div>
|
||||
|
||||
{/* Submit Button */}
|
||||
<Button type="submit" disabled={isLoading}>
|
||||
<Search className="h-4 w-4 mr-2" />
|
||||
{isLoading ? "Searching..." : "Search"}
|
||||
{isLoading ? "Loading..." : "Apply"}
|
||||
</Button>
|
||||
|
||||
{/* Clear Button */}
|
||||
@ -136,8 +120,8 @@ export function CorpusFilters({
|
||||
{/* Results Count */}
|
||||
<div className="text-sm text-muted-foreground ml-auto">
|
||||
{filteredCount === totalCount
|
||||
? `${totalCount} patterns`
|
||||
: `${filteredCount} of ${totalCount} patterns`}
|
||||
? `${totalCount} items`
|
||||
: `${filteredCount} of ${totalCount} items`}
|
||||
</div>
|
||||
</div>
|
||||
</form>
|
||||
|
||||
@ -1,19 +1,19 @@
|
||||
"use client";
|
||||
|
||||
import type { PatternDto } from "@/lib/api";
|
||||
import type { CorpusItemDto } from "@/lib/api";
|
||||
import { CorpusRow } from "./corpus-row";
|
||||
|
||||
interface CorpusListProps {
|
||||
patterns: PatternDto[];
|
||||
items: CorpusItemDto[];
|
||||
}
|
||||
|
||||
export function CorpusList({ patterns }: CorpusListProps) {
|
||||
export function CorpusList({ items }: CorpusListProps) {
|
||||
return (
|
||||
<div className="grid gap-4 md:grid-cols-2 lg:grid-cols-3">
|
||||
{patterns.map((pattern) => (
|
||||
{items.map((item) => (
|
||||
<CorpusRow
|
||||
key={`${pattern.subject}:${pattern.predicate}:${pattern.value}`}
|
||||
pattern={pattern}
|
||||
key={`${item.subject}:${item.predicate}:${item.value}`}
|
||||
item={item}
|
||||
/>
|
||||
))}
|
||||
</div>
|
||||
|
||||
@ -3,12 +3,12 @@
|
||||
import { useState, useCallback, useEffect, useMemo } from "react";
|
||||
import {
|
||||
StemeDBClient,
|
||||
type GetPatternsResponse,
|
||||
type PatternDto,
|
||||
type GetCorpusResponse,
|
||||
type CorpusItemDto,
|
||||
ApiError,
|
||||
} from "@/lib/api";
|
||||
import type { PanelState } from "@/lib/types";
|
||||
import { CORPUS_FETCH_LIMIT, DEFAULT_MIN_PROJECTS } from "./constants";
|
||||
import { CORPUS_FETCH_LIMIT } from "./constants";
|
||||
import { ErrorState } from "@/components/shared/error-state";
|
||||
import { CorpusFilters } from "./corpus-filters";
|
||||
import { CorpusList } from "./corpus-list";
|
||||
@ -16,38 +16,34 @@ import { CorpusLoadingSkeleton } from "./corpus-loading-skeleton";
|
||||
import { CorpusEmptyState } from "./corpus-empty-state";
|
||||
|
||||
export function CorpusPanel() {
|
||||
const [state, setState] = useState<PanelState<GetPatternsResponse>>({
|
||||
const [state, setState] = useState<PanelState<GetCorpusResponse>>({
|
||||
status: "idle",
|
||||
});
|
||||
|
||||
// Input state (controlled form inputs) - doesn't trigger fetch
|
||||
const [inputPrefix, setInputPrefix] = useState("");
|
||||
const [inputMinProjects, setInputMinProjects] = useState(DEFAULT_MIN_PROJECTS);
|
||||
const [inputSources, setInputSources] = useState<string[]>(["rfc", "owasp", "community"]);
|
||||
|
||||
// Search state (actual search params) - triggers fetch
|
||||
const [searchPrefix, setSearchPrefix] = useState("");
|
||||
const [searchMinProjects, setSearchMinProjects] = useState(DEFAULT_MIN_PROJECTS);
|
||||
const [searchSources, setSearchSources] = useState<string[]>(["rfc", "owasp", "community"]);
|
||||
|
||||
// Client-side filter state
|
||||
const [filterCategory, setFilterCategory] = useState<string>("all");
|
||||
const [hideNoise, setHideNoise] = useState<boolean>(false);
|
||||
|
||||
const fetchData = useCallback(async () => {
|
||||
setState({ status: "loading" });
|
||||
try {
|
||||
const client = new StemeDBClient();
|
||||
const data = await client.getPatterns({
|
||||
subjectPrefix: searchPrefix || undefined,
|
||||
minProjects: searchMinProjects,
|
||||
const data = await client.getCorpus({
|
||||
sources: searchSources.length > 0 ? searchSources : undefined,
|
||||
limit: CORPUS_FETCH_LIMIT,
|
||||
});
|
||||
setState({ status: "success", data });
|
||||
} catch (err) {
|
||||
// 404 means no patterns - treat as empty success
|
||||
// 404 means no corpus items - treat as empty success
|
||||
if (err instanceof ApiError && err.status === 404) {
|
||||
setState({
|
||||
status: "success",
|
||||
data: { patterns: [], total_matching: 0 },
|
||||
data: { items: [], total_matching: 0, sources_included: [] },
|
||||
});
|
||||
return;
|
||||
}
|
||||
@ -59,7 +55,7 @@ export function CorpusPanel() {
|
||||
: "Unknown error";
|
||||
setState({ status: "error", error: message });
|
||||
}
|
||||
}, [searchPrefix, searchMinProjects]);
|
||||
}, [searchSources]);
|
||||
|
||||
// Fetch on mount
|
||||
useEffect(() => {
|
||||
@ -68,65 +64,56 @@ export function CorpusPanel() {
|
||||
|
||||
// Handle form submit - update search params which triggers fetch
|
||||
const handleSubmit = useCallback(() => {
|
||||
setSearchPrefix(inputPrefix);
|
||||
setSearchMinProjects(inputMinProjects);
|
||||
}, [inputPrefix, inputMinProjects]);
|
||||
setSearchSources(inputSources);
|
||||
}, [inputSources]);
|
||||
|
||||
// Handle clear - reset both input and search state
|
||||
const handleClear = useCallback(() => {
|
||||
setInputPrefix("");
|
||||
setInputMinProjects(DEFAULT_MIN_PROJECTS);
|
||||
setSearchPrefix("");
|
||||
setSearchMinProjects(DEFAULT_MIN_PROJECTS);
|
||||
const defaultSources = ["rfc", "owasp", "community"];
|
||||
setInputSources(defaultSources);
|
||||
setSearchSources(defaultSources);
|
||||
setFilterCategory("all");
|
||||
setHideNoise(false);
|
||||
}, []);
|
||||
|
||||
// Get raw patterns from server
|
||||
const rawPatterns = state.status === "success" ? state.data.patterns : [];
|
||||
// Get raw items from server
|
||||
const rawItems = state.status === "success" ? state.data.items : [];
|
||||
|
||||
// Extract available categories from patterns
|
||||
// Extract available categories from items
|
||||
const availableCategories = useMemo(() => {
|
||||
const categories = new Set<string>();
|
||||
rawPatterns.forEach((p) => {
|
||||
if (p.category) {
|
||||
categories.add(p.category);
|
||||
rawItems.forEach((item) => {
|
||||
if (item.category) {
|
||||
categories.add(item.category);
|
||||
}
|
||||
});
|
||||
return Array.from(categories).sort();
|
||||
}, [rawPatterns]);
|
||||
}, [rawItems]);
|
||||
|
||||
// Apply client-side filters
|
||||
const patterns = useMemo(() => {
|
||||
return rawPatterns.filter((p: PatternDto) => {
|
||||
const items = useMemo(() => {
|
||||
return rawItems.filter((item: CorpusItemDto) => {
|
||||
// Category filter
|
||||
if (filterCategory !== "all" && p.category !== filterCategory) {
|
||||
return false;
|
||||
}
|
||||
// Hide noise filter
|
||||
if (hideNoise && p.verdict === "noise") {
|
||||
if (filterCategory !== "all" && item.category !== filterCategory) {
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
});
|
||||
}, [rawPatterns, filterCategory, hideNoise]);
|
||||
}, [rawItems, filterCategory]);
|
||||
|
||||
const hasActiveFilter =
|
||||
searchPrefix !== "" ||
|
||||
searchMinProjects > DEFAULT_MIN_PROJECTS ||
|
||||
filterCategory !== "all" ||
|
||||
hideNoise;
|
||||
searchSources.length !== 3 || // Default is 3 sources
|
||||
filterCategory !== "all";
|
||||
|
||||
return (
|
||||
<div className="space-y-6">
|
||||
{/* Header */}
|
||||
<div className="rounded-lg border border-border bg-card p-6">
|
||||
<h2 className="text-lg font-medium text-card-foreground mb-2">
|
||||
Community Corpus
|
||||
Authoritative Corpus
|
||||
</h2>
|
||||
<p className="text-sm text-muted-foreground">
|
||||
Explore patterns discovered across projects using Aphoria. These anonymized
|
||||
observations help establish community consensus on configurations and practices.
|
||||
Explore best practices from RFC, OWASP, and community-validated patterns.
|
||||
These authoritative assertions represent trusted security and architecture guidelines.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
@ -135,19 +122,15 @@ export function CorpusPanel() {
|
||||
<div className="space-y-6">
|
||||
{/* Filters - always visible */}
|
||||
<CorpusFilters
|
||||
subjectPrefix={inputPrefix}
|
||||
minProjects={inputMinProjects}
|
||||
sources={inputSources}
|
||||
filterCategory={filterCategory}
|
||||
hideNoise={hideNoise}
|
||||
availableCategories={availableCategories}
|
||||
onSubjectPrefixChange={setInputPrefix}
|
||||
onMinProjectsChange={setInputMinProjects}
|
||||
onSourcesChange={setInputSources}
|
||||
onFilterCategoryChange={setFilterCategory}
|
||||
onHideNoiseChange={setHideNoise}
|
||||
onSubmit={handleSubmit}
|
||||
onClear={handleClear}
|
||||
totalCount={state.status === "success" ? state.data.total_matching : 0}
|
||||
filteredCount={patterns.length}
|
||||
filteredCount={items.length}
|
||||
isLoading={state.status === "loading"}
|
||||
hasActiveFilter={hasActiveFilter}
|
||||
/>
|
||||
@ -158,7 +141,7 @@ export function CorpusPanel() {
|
||||
{/* Error State */}
|
||||
{state.status === "error" && (
|
||||
<ErrorState
|
||||
title="Failed to Load Patterns"
|
||||
title="Failed to Load Corpus"
|
||||
error={state.error}
|
||||
onRetry={fetchData}
|
||||
/>
|
||||
@ -167,13 +150,13 @@ export function CorpusPanel() {
|
||||
{/* Success State */}
|
||||
{state.status === "success" && (
|
||||
<>
|
||||
{patterns.length === 0 ? (
|
||||
{items.length === 0 ? (
|
||||
<CorpusEmptyState
|
||||
hasFilter={hasActiveFilter}
|
||||
onClearFilter={handleClear}
|
||||
/>
|
||||
) : (
|
||||
<CorpusList patterns={patterns} />
|
||||
<CorpusList items={items} />
|
||||
)}
|
||||
</>
|
||||
)}
|
||||
|
||||
@ -1,21 +1,38 @@
|
||||
"use client";
|
||||
|
||||
import { cn } from "@/lib/utils";
|
||||
import type { PatternDto } from "@/lib/api";
|
||||
import { formatRelativeTime, extractDomain, extractConcept } from "./constants";
|
||||
import type { CorpusItemDto } from "@/lib/api";
|
||||
import { extractDomain, extractConcept } from "./constants";
|
||||
import { Badge } from "@/components/ui/badge";
|
||||
import { Users, Clock, Eye } from "lucide-react";
|
||||
import { Shield, BookOpen } from "lucide-react";
|
||||
import { EnrichmentBadge } from "./enrichment-badge";
|
||||
import { VerdictBadge } from "./verdict-badge";
|
||||
|
||||
interface CorpusRowProps {
|
||||
pattern: PatternDto;
|
||||
item: CorpusItemDto;
|
||||
className?: string;
|
||||
}
|
||||
|
||||
export function CorpusRow({ pattern, className }: CorpusRowProps) {
|
||||
const domain = extractDomain(pattern.subject);
|
||||
const concept = extractConcept(pattern.subject);
|
||||
// Map source scheme to display label
|
||||
function getSourceLabel(source: string): string {
|
||||
if (source.startsWith("rfc://")) return "RFC";
|
||||
if (source.startsWith("owasp://")) return "OWASP";
|
||||
if (source.startsWith("community://")) return "Community";
|
||||
if (source.startsWith("vendor://")) return "Vendor";
|
||||
return "Unknown";
|
||||
}
|
||||
|
||||
// Map tier to color variant
|
||||
function getTierVariant(tier: number): "default" | "secondary" | "outline" {
|
||||
if (tier === 0) return "default"; // Regulatory/RFC/OWASP - highest authority
|
||||
if (tier <= 2) return "secondary"; // Clinical/Observational
|
||||
return "outline"; // Expert/Community/Anecdotal
|
||||
}
|
||||
|
||||
export function CorpusRow({ item, className }: CorpusRowProps) {
|
||||
const domain = extractDomain(item.subject);
|
||||
const concept = extractConcept(item.subject);
|
||||
const sourceLabel = getSourceLabel(item.source);
|
||||
const tierVariant = getTierVariant(item.tier);
|
||||
|
||||
return (
|
||||
<div
|
||||
@ -28,61 +45,49 @@ export function CorpusRow({ pattern, className }: CorpusRowProps) {
|
||||
<div className="flex items-start justify-between gap-2 mb-3">
|
||||
<div className="min-w-0 flex-1">
|
||||
<div className="flex items-center gap-2 mb-1">
|
||||
<Badge variant={tierVariant} className="text-xs font-mono">
|
||||
<Shield className="h-3 w-3 mr-1" />
|
||||
{sourceLabel}
|
||||
</Badge>
|
||||
<Badge variant="outline" className="text-xs font-mono">
|
||||
{domain}
|
||||
Tier {item.tier}
|
||||
</Badge>
|
||||
<span className="text-xs text-muted-foreground truncate">
|
||||
{pattern.subject}
|
||||
{domain}
|
||||
</span>
|
||||
</div>
|
||||
<h3 className="text-base font-medium text-foreground">
|
||||
{concept}
|
||||
<span className="text-muted-foreground font-normal">
|
||||
{" "}.{pattern.predicate}
|
||||
{" "}.{item.predicate}
|
||||
</span>
|
||||
</h3>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Enrichment badges */}
|
||||
{(pattern.category || pattern.verdict) && (
|
||||
{/* Category badge */}
|
||||
{item.category && (
|
||||
<div className="flex items-center gap-2 mb-3">
|
||||
{pattern.category && <EnrichmentBadge category={pattern.category} />}
|
||||
{pattern.verdict && <VerdictBadge verdict={pattern.verdict} />}
|
||||
<EnrichmentBadge category={item.category} />
|
||||
</div>
|
||||
)}
|
||||
|
||||
{/* Value */}
|
||||
<div className="mb-4">
|
||||
<code className="text-sm bg-muted px-2 py-1 rounded font-mono break-all">
|
||||
{pattern.value}
|
||||
{item.value}
|
||||
</code>
|
||||
</div>
|
||||
|
||||
{/* Explanation */}
|
||||
{pattern.explanation && (
|
||||
<div className="mb-4 text-sm text-muted-foreground">
|
||||
<p>{pattern.explanation}</p>
|
||||
{pattern.authority_source && (
|
||||
<p className="text-xs mt-1">Authority: {pattern.authority_source}</p>
|
||||
)}
|
||||
</div>
|
||||
)}
|
||||
<div className="mb-4 text-sm text-muted-foreground">
|
||||
<p>{item.explanation}</p>
|
||||
</div>
|
||||
|
||||
{/* Stats */}
|
||||
<div className="flex flex-wrap items-center gap-4 text-xs text-muted-foreground">
|
||||
<div className="flex items-center gap-1">
|
||||
<Users className="h-3.5 w-3.5" />
|
||||
<span>{pattern.project_count} projects</span>
|
||||
</div>
|
||||
<div className="flex items-center gap-1">
|
||||
<Eye className="h-3.5 w-3.5" />
|
||||
<span>{pattern.observation_count} observations</span>
|
||||
</div>
|
||||
<div className="flex items-center gap-1 ml-auto">
|
||||
<Clock className="h-3.5 w-3.5" />
|
||||
<span>Last seen {formatRelativeTime(pattern.last_seen)}</span>
|
||||
</div>
|
||||
{/* Authority Source */}
|
||||
<div className="flex items-center gap-2 text-xs text-muted-foreground">
|
||||
<BookOpen className="h-3.5 w-3.5" />
|
||||
<span>{item.authority_source}</span>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
|
||||
@ -28,6 +28,7 @@ import {
|
||||
type CoverageReportResponse,
|
||||
type AcknowledgeViolationRequest,
|
||||
type AcknowledgeViolationResponse,
|
||||
type GetCorpusResponse,
|
||||
} from "./types";
|
||||
|
||||
export class StemeDBClient {
|
||||
@ -201,6 +202,24 @@ export class StemeDBClient {
|
||||
return this.fetch<GetPatternsResponse>(`/v1/aphoria/patterns${query ? `?${query}` : ""}`);
|
||||
}
|
||||
|
||||
async getCorpus(params: {
|
||||
sources?: string[];
|
||||
category?: string;
|
||||
limit?: number;
|
||||
offset?: number;
|
||||
} = {}): Promise<GetCorpusResponse> {
|
||||
const searchParams = new URLSearchParams();
|
||||
if (params.sources && params.sources.length > 0) {
|
||||
// Use array syntax sources[] for each value to match Rust serde expectations
|
||||
params.sources.forEach(s => searchParams.append("sources[]", s));
|
||||
}
|
||||
if (params.category) searchParams.set("category", params.category);
|
||||
if (params.limit !== undefined) searchParams.set("limit", String(params.limit));
|
||||
if (params.offset !== undefined) searchParams.set("offset", String(params.offset));
|
||||
const query = searchParams.toString();
|
||||
return this.fetch<GetCorpusResponse>(`/v1/aphoria/corpus${query ? `?${query}` : ""}`);
|
||||
}
|
||||
|
||||
async runScan(request: ScanRequest): Promise<ScanResponse> {
|
||||
return this.fetch<ScanResponse>("/v1/aphoria/scan", {
|
||||
method: "POST",
|
||||
|
||||
@ -268,6 +268,24 @@ export interface GetPatternsResponse {
|
||||
total_matching: number;
|
||||
}
|
||||
|
||||
// Corpus types (Phase 1: Dashboard Integration)
|
||||
export interface CorpusItemDto {
|
||||
subject: string;
|
||||
predicate: string;
|
||||
value: string;
|
||||
source: string;
|
||||
tier: number;
|
||||
category?: string;
|
||||
explanation: string;
|
||||
authority_source: string;
|
||||
}
|
||||
|
||||
export interface GetCorpusResponse {
|
||||
items: CorpusItemDto[];
|
||||
total_matching: number;
|
||||
sources_included: string[];
|
||||
}
|
||||
|
||||
export interface FindingDto {
|
||||
concept_path: string;
|
||||
predicate: string;
|
||||
|
||||
@ -63,6 +63,7 @@ thiserror = "1.0"
|
||||
|
||||
# Platform directories
|
||||
dirs = "5.0"
|
||||
shellexpand = "3.1"
|
||||
|
||||
# Logging
|
||||
tracing = "0.1"
|
||||
|
||||
229
applications/aphoria/docs/DOC-AUDIT-SUMMARY-2026-02-09.md
Normal file
@ -0,0 +1,229 @@
|
||||
# Documentation Audit Summary: Corpus Endpoint & Multi-Project Architecture
|
||||
|
||||
**Date:** 2026-02-09
|
||||
**Trigger:** Implementation of Phases 1-3 (corpus endpoint, per-project databases, corpus database)
|
||||
**Files Analyzed:** 39 markdown files, 12,104 total lines
|
||||
|
||||
---
|
||||
|
||||
## Changes Implemented
|
||||
|
||||
### Code Changes (Already Complete)
|
||||
- ✅ Phase 1: `/v1/aphoria/corpus` endpoint (returns RFC/OWASP/Community best practices)
|
||||
- ✅ Phase 2: Per-project database default (`.aphoria/db` instead of `~/.aphoria/db`)
|
||||
- ✅ Phase 3: Corpus database architecture (`~/.aphoria/corpus-db` for aggregated patterns)
|
||||
|
||||
### Documentation Updates (This Session)
|
||||
|
||||
#### UPDATED Files
|
||||
|
||||
1. **`guides/the-first-scan.md:45`** ✅
|
||||
- **Before:** `~/.aphoria/db` (stale path)
|
||||
- **After:** `.aphoria/db` + note about override for shared mode
|
||||
- **Impact:** Users no longer misled about default database location
|
||||
|
||||
2. **`cli-reference.md`** ✅
|
||||
- **Added:** Database architecture explanation in `aphoria init` section
|
||||
- **Added:** Configuration section at end with quick example
|
||||
- **Added:** Link to new `configuration.md`
|
||||
- **Impact:** Users can discover configuration options
|
||||
|
||||
#### CREATED Files
|
||||
|
||||
3. **`configuration.md`** ✅ (NEW - 397 lines)
|
||||
- **Purpose:** Complete `aphoria.toml` reference
|
||||
- **Sections:**
|
||||
- Database configuration (per-project vs shared)
|
||||
- All config sections with examples
|
||||
- Environment variables
|
||||
- Migration guide from legacy home-based database
|
||||
- **Impact:** Canonical configuration documentation
|
||||
|
||||
---
|
||||
|
||||
## Issues Found
|
||||
|
||||
### High Priority (Fixed)
|
||||
- ✅ **Stale database path** in `the-first-scan.md` - Fixed
|
||||
- ✅ **Missing configuration docs** - Created `configuration.md`
|
||||
- ✅ **No CLI reference link to config** - Added
|
||||
|
||||
### Medium Priority (Deferred)
|
||||
- ⚠️ **Dashboard references** (6 mentions in `phase-17-summary.md`)
|
||||
- **Status:** Dashboard exists but not documented as user-facing feature
|
||||
- **Decision Needed:** Is dashboard production-ready for user docs?
|
||||
- **Recommendation:** Add to CLI reference when ready, or mark as "internal/beta"
|
||||
|
||||
- ⚠️ **Multi-project architecture guide** (not created yet)
|
||||
- **Status:** Configuration explains database paths, but no dedicated architecture guide
|
||||
- **Decision Needed:** Is a separate guide needed, or is `configuration.md` sufficient?
|
||||
- **Recommendation:** Defer until users ask for it (YAGNI)
|
||||
|
||||
### Low Priority (No Action)
|
||||
- **No stale planning docs found** - All planning docs appear current or properly archived
|
||||
- **No duplicate content detected** - "Claims vs Observations" appears once (README.md)
|
||||
- **No old terminology** - No references to deprecated terms found
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
### Examples Tested
|
||||
✅ All bash examples in updated docs tested:
|
||||
```bash
|
||||
aphoria init # ✓ Creates .aphoria/db/ by default
|
||||
aphoria scan . # ✓ Works
|
||||
aphoria claims create # ✓ Works
|
||||
```
|
||||
|
||||
### Cross-Links Verified
|
||||
✅ All new cross-links resolve:
|
||||
- `cli-reference.md` → `configuration.md` ✓
|
||||
- `the-first-scan.md` references correct path ✓
|
||||
- `configuration.md` → `cli-reference.md`, `scale-adaptive-thresholds.md`, etc. ✓
|
||||
|
||||
### Terminology Check
|
||||
✅ No old terminology found:
|
||||
```bash
|
||||
grep -n "~/.aphoria/db" applications/aphoria/docs/guides/*.md
|
||||
# Only 1 reference in the-first-scan.md (correctly documented as override)
|
||||
```
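
The same terminology check can be sketched in TypeScript for use in a docs test. This is a minimal sketch, not part of the Aphoria tooling: `findStaleReferences` and the `StaleReference` shape are hypothetical names, and the docs are passed in as an in-memory map so the example stays self-contained.

```typescript
// Hypothetical helper: report every line that still mentions a stale string.
interface StaleReference {
  file: string;
  line: number; // 1-based line number within the file
  text: string; // trimmed content of the matching line
}

function findStaleReferences(
  docs: Record<string, string>, // file name -> file content
  needle: string,
): StaleReference[] {
  const hits: StaleReference[] = [];
  for (const [file, content] of Object.entries(docs)) {
    content.split("\n").forEach((text, index) => {
      if (text.includes(needle)) {
        hits.push({ file, line: index + 1, text: text.trim() });
      }
    });
  }
  return hits;
}

// Usage: one remaining mention, which a reviewer can then confirm is the documented override.
const report = findStaleReferences(
  {
    "the-first-scan.md": "Default: .aphoria/db\nOverride: data_dir = \"~/.aphoria/db\"\n",
    "configuration.md": "No legacy path here.",
  },
  "~/.aphoria/db",
);
console.log(report);
```

Returning the matches instead of only a pass/fail flag lets a reviewer tell an intentional mention (the override note) from a genuinely stale one.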
|
||||
|
||||
---
|
||||
|
||||
## Files Modified
|
||||
|
||||
### Updated (2 files)
|
||||
1. `applications/aphoria/docs/guides/the-first-scan.md` (+2 lines)
|
||||
2. `applications/aphoria/docs/cli-reference.md` (+19 lines)
|
||||
|
||||
### Created (2 files)
|
||||
3. `applications/aphoria/docs/configuration.md` (+397 lines, NEW)
|
||||
4. `applications/aphoria/docs/DOC-UPDATE-2026-02-09.md` (audit plan, reference only)
|
||||
|
||||
### Total Impact
|
||||
- **Lines added:** 418 lines
|
||||
- **Stale references fixed:** 1
|
||||
- **New canonical documentation:** 1 (configuration.md)
|
||||
|
||||
---
|
||||
|
||||
## Outstanding Decisions
|
||||
|
||||
### 1. Dashboard Documentation
|
||||
|
||||
**Question:** Should we create `guides/dashboard-setup.md`?
|
||||
|
||||
**Options:**
|
||||
- **A. Yes** - If dashboard is user-facing and production-ready
|
||||
- **B. Add brief section to CLI reference** - If dashboard is beta/internal
|
||||
- **C. No** - If dashboard is for developers only
|
||||
|
||||
**Current State:** Dashboard is mentioned in implementation docs but not user guides.
|
||||
|
||||
**Recommendation:** Option B - Add brief section to CLI reference:
|
||||
```markdown
|
||||
## Dashboard (Beta)
|
||||
|
||||
Start the Aphoria dashboard:
|
||||
```bash
|
||||
cd applications/aphoria-dashboard
|
||||
npm install
|
||||
npm run dev
|
||||
```
|
||||
|
||||
**Note:** Dashboard is in beta. For production use, query via API.
|
||||
```
|
||||
|
||||
### 2. Multi-Project Architecture Guide
|
||||
|
||||
**Question:** Do we need a dedicated guide explaining dual-database architecture?
|
||||
|
||||
**Options:**
|
||||
- **A. Yes** - Create `guides/multi-project-architecture.md`
|
||||
- **B. No** - `configuration.md` already explains database paths
|
||||
|
||||
**Current State:** Configuration guide covers database paths with examples.
|
||||
|
||||
**Recommendation:** Option B (YAGNI) - Only create if users request it. Current docs are sufficient.
|
||||
|
||||
### 3. Migration Guide
|
||||
|
||||
**Question:** Do we need a migration guide for upgrading from old `~/.aphoria/db`?
|
||||
|
||||
**Options:**
|
||||
- **A. Yes** - Create migration guide
|
||||
- **B. No** - Users can override via config
|
||||
|
||||
**Current State:** `configuration.md` includes "Migration Guide" section explaining override.
|
||||
|
||||
**Recommendation:** Option B - Current approach (override via config) is simple and documented.
|
||||
|
||||
---
|
||||
|
||||
## Quality Metrics
|
||||
|
||||
### Before
|
||||
- Stale references: 1 (database path in `the-first-scan.md`)
|
||||
- Configuration coverage: Partial (scattered across CLI reference)
|
||||
- Cross-references: Some broken (config not documented)
|
||||
|
||||
### After
|
||||
- Stale references: 0 ✅
|
||||
- Configuration coverage: Complete (dedicated `configuration.md`) ✅
|
||||
- Cross-references: All working ✅
|
||||
|
||||
### Coverage
|
||||
- Database architecture: **100%** (configuration.md, cli-reference.md, the-first-scan.md)
|
||||
- Corpus endpoint: **0%** (API-only, not user-facing yet)
|
||||
- Multi-project workflows: **50%** (config explains, no workflow guide)
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
### Immediate (Complete)
|
||||
- ✅ Fix stale database path
|
||||
- ✅ Create configuration reference
|
||||
- ✅ Update CLI reference with config section
|
||||
|
||||
### Follow-Up (When Dashboard Ready)
|
||||
- [ ] Decide on dashboard documentation strategy (user-facing vs internal)
|
||||
- [ ] Add dashboard section to CLI reference (if beta) or create guide (if production)
|
||||
|
||||
### Future (As Needed)
|
||||
- [ ] Consider `guides/multi-project-architecture.md` if users request workflow examples
|
||||
- [ ] Update when `/v1/aphoria/corpus` becomes user-facing (CLI wrapper or dashboard integration)
|
||||
|
||||
---
|
||||
|
||||
## Testing Checklist
|
||||
|
||||
Completed:
|
||||
- ✅ All bash examples tested and working
|
||||
- ✅ Cross-links verified (configuration.md ↔ cli-reference.md)
|
||||
- ✅ No old terminology (`~/.aphoria/db` only mentioned as override)
|
||||
- ✅ Examples match current CLI output
|
||||
- ✅ Configuration options match code (verified against `config/defaults.rs`)
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
**Documentation is now aligned with Phase 1-3 implementation.**
|
||||
|
||||
Key improvements:
|
||||
1. ✅ Stale database path fixed (users won't be confused)
|
||||
2. ✅ Complete configuration reference created (canonical source)
|
||||
3. ✅ CLI reference updated to guide users to config docs
|
||||
|
||||
**No regressions detected:**
|
||||
- All existing docs still accurate
|
||||
- No broken cross-links introduced
|
||||
- No old terminology found
|
||||
|
||||
**Outstanding work is low-priority:**
|
||||
- Dashboard docs (when ready)
|
||||
- Multi-project architecture guide (if requested)
|
||||
|
||||
The documentation now correctly reflects the new per-project database architecture and provides clear guidance for users who need to customize it.
|
||||
352
applications/aphoria/docs/DOC-UPDATE-2026-02-09.md
Normal file
@ -0,0 +1,352 @@
|
||||
# Documentation Update: Corpus Endpoint & Multi-Project Architecture
|
||||
|
||||
**Date:** 2026-02-09
|
||||
**Scope:** Align docs with Phase 1-3 implementation (corpus endpoint, per-project databases, corpus database)
|
||||
|
||||
---
|
||||
|
||||
## Changes Implemented (Code)
|
||||
|
||||
### Phase 1: Dashboard Corpus Endpoint ✅
|
||||
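- **New endpoint:** `/v1/aphoria/corpus` (supersedes `/v1/aphoria/patterns` as the dashboard's source of best-practice content; `/v1/aphoria/patterns` remains available for statistical aggregates)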
|
||||
- **DTOs:** `CorpusItemDto`, `GetCorpusRequest`, `GetCorpusResponse`
|
||||
- **Purpose:** Return RFC/OWASP/Community best practices instead of statistical aggregates
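
As an illustration of how a client might call this endpoint, the sketch below builds the request path with bracket-notation array parameters (`sources[]`), the form the dashboard client in this commit uses. `buildCorpusUrl` and `CorpusQuery` are hypothetical names chosen for the example; only the endpoint path and the parameter names come from the change itself.

```typescript
// Hypothetical query shape; parameter names follow the dashboard client in this commit.
interface CorpusQuery {
  sources?: string[];
  category?: string;
  limit?: number;
  offset?: number;
}

function buildCorpusUrl(params: CorpusQuery = {}): string {
  const search = new URLSearchParams();
  // Bracket notation: repeat the `sources[]` key once per selected source.
  for (const source of params.sources ?? []) {
    search.append("sources[]", source);
  }
  if (params.category) search.set("category", params.category);
  if (params.limit !== undefined) search.set("limit", String(params.limit));
  if (params.offset !== undefined) search.set("offset", String(params.offset));
  const query = search.toString();
  return `/v1/aphoria/corpus${query ? `?${query}` : ""}`;
}

console.log(buildCorpusUrl({ sources: ["community", "rfc"], limit: 50 }));
```

Note that `URLSearchParams` percent-encodes the brackets (`sources%5B%5D=...`), so whatever parser sits on the server side has to accept the encoded form as well as the literal `sources[]=...` form.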
|
||||
|
||||
### Phase 2: Per-Project Database Configuration ✅
|
||||
- **Old default:** `~/.aphoria/db` (home-based, shared across all projects)
|
||||
- **New default:** `.aphoria/db` (project-local, isolated per-project)
|
||||
- **Override:** Users can set `[episteme] data_dir = "~/.aphoria/db"` for shared mode
|
||||
|
||||
### Phase 3: Corpus Database Architecture ✅
|
||||
- **New field:** `EpistemeConfig.corpus_data_dir`
|
||||
- **Default:** `~/.aphoria/corpus-db` (home-based, shared across projects)
|
||||
- **Purpose:** Aggregated pattern data from multiple projects for community corpus building
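
The defaults in Phases 2 and 3 can be summarized as a small resolution rule. The sketch below is illustrative only: Aphoria itself is written in Rust, and `resolveDatabaseDirs`, `resolveDir`, and the assumption that relative paths resolve against the project root are hypothetical. Only the key names and default values are taken from the list above.

```typescript
import * as os from "os";
import * as path from "path";

// Hypothetical mirror of the `[episteme]` keys described above.
interface EpistemeConfig {
  data_dir?: string;
  corpus_data_dir?: string;
}

// Expand a leading "~" to the home directory; resolve anything else
// against the project root (an assumption made for this sketch).
function resolveDir(value: string, projectRoot: string, home: string): string {
  if (value === "~") return home;
  if (value.startsWith("~/")) return path.join(home, value.slice(2));
  return path.resolve(projectRoot, value);
}

function resolveDatabaseDirs(
  config: EpistemeConfig,
  projectRoot: string,
  home: string = os.homedir(),
): { dataDir: string; corpusDataDir: string } {
  return {
    // Phase 2 default: project-local observations database.
    dataDir: resolveDir(config.data_dir ?? ".aphoria/db", projectRoot, home),
    // Phase 3 default: corpus database shared across projects.
    corpusDataDir: resolveDir(config.corpus_data_dir ?? "~/.aphoria/corpus-db", projectRoot, home),
  };
}

// Usage: no overrides gives per-project isolation; setting data_dir restores shared mode.
console.log(resolveDatabaseDirs({}, "/work/billing-api", "/home/dev"));
console.log(resolveDatabaseDirs({ data_dir: "~/.aphoria/db" }, "/work/billing-api", "/home/dev"));
```

With no overrides, two projects get distinct `dataDir` values but the same `corpusDataDir`, which is the isolation-plus-aggregation split the two phases describe.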
|
||||
|
||||
---
|
||||
|
||||
## Documentation Issues Found
|
||||
|
||||
### 1. Stale Database Path Reference ❌
|
||||
|
||||
**File:** `applications/aphoria/docs/guides/the-first-scan.md:45`
|
||||
|
||||
**Current (WRONG):**
|
||||
```markdown
|
||||
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your local database (`~/.aphoria/db`).
|
||||
```
|
||||
|
||||
**Problem:** References old home-based path. Default is now `.aphoria/db` (project-local).
|
||||
|
||||
**Fix Required:**
|
||||
```markdown
|
||||
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your project database (`.aphoria/db`).
|
||||
|
||||
> **Note:** By default, each project has its own isolated database. To share a database across all projects on your machine, set `data_dir = "~/.aphoria/db"` in `aphoria.toml`.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 2. Missing Corpus Architecture Documentation ❌
|
||||
|
||||
**Issue:** No documentation explaining:
|
||||
- Per-project databases (observations)
|
||||
- Shared corpus database (aggregated patterns)
|
||||
- How community learning works across projects
|
||||
- The `/v1/aphoria/corpus` endpoint
|
||||
|
||||
**Action Required:** Create new guide: `applications/aphoria/docs/guides/multi-project-architecture.md`
|
||||
|
||||
**Outline:**
|
||||
```markdown
|
||||
# Multi-Project Architecture
|
||||
|
||||
## Overview
|
||||
Aphoria now uses a dual-database architecture:
|
||||
- **Per-project databases** (`.aphoria/db/`) - Store observations from each project
|
||||
- **Shared corpus database** (`~/.aphoria/corpus-db/`) - Aggregate patterns across projects
|
||||
|
||||
## Per-Project Isolation
|
||||
|
||||
Each project gets its own database:
|
||||
```
|
||||
~/projects/
|
||||
├── maxwell/
|
||||
│ └── .aphoria/db/ # Maxwell's observations
|
||||
├── billing-api/
|
||||
│ └── .aphoria/db/ # Billing API's observations
|
||||
└── frontend/
|
||||
└── .aphoria/db/ # Frontend's observations
|
||||
```
|
||||
|
||||
## Community Corpus Building
|
||||
|
||||
When you run `aphoria scan --persist --sync`:
|
||||
1. Observations are written to your project database (`.aphoria/db/`)
|
||||
2. Pattern aggregates are pushed to the corpus database (`~/.aphoria/corpus-db/`)
|
||||
3. Patterns with 95%+ adoption + authority backing auto-promote to corpus
|
||||
|
||||
The corpus database accumulates patterns from all your projects on this machine.
|
||||
|
||||
## Configuration
|
||||
|
||||
**Default (per-project isolation):**
|
||||
```toml
|
||||
# .aphoria/config.toml (default)
|
||||
[episteme]
|
||||
# data_dir defaults to ./.aphoria/db (project-local)
|
||||
# corpus_data_dir defaults to ~/.aphoria/corpus-db (shared)
|
||||
```
|
||||
|
||||
**Shared mode (legacy behavior):**
|
||||
```toml
|
||||
[episteme]
|
||||
data_dir = "~/.aphoria/db" # All projects share one database
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
For hosted/dashboard mode:
|
||||
- `/v1/aphoria/corpus` - Query RFC/OWASP/Community best practices
|
||||
- `/v1/aphoria/patterns` - Query statistical pattern aggregates (project counts)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3. Dashboard References (Stale/Future) ⚠️
|
||||
|
||||
**Files:**
|
||||
- `applications/aphoria/docs/phase-17-summary.md` - References "dashboard" 6 times
|
||||
- `applications/aphoria/docs/scale-adaptive-thresholds.md:163` - "empty dashboard"
|
||||
|
||||
**Issue:** These docs reference a dashboard that exists but isn't documented as a user-facing feature yet.
|
||||
|
||||
**Action:**
|
||||
- **If dashboard is user-facing:** Create `applications/aphoria/docs/guides/dashboard-setup.md`
|
||||
- **If dashboard is internal only:** Add note to phase-17 that dashboard is "not yet production-ready"
|
||||
|
||||
**Recommendation:** Dashboard is mentioned in implementation docs but not in user guides. Add to CLI reference:
|
||||
|
||||
```markdown
|
||||
## Dashboard (Beta)
|
||||
|
||||
Start the Aphoria dashboard:
|
||||
```bash
|
||||
cd applications/aphoria-dashboard
|
||||
npm install
|
||||
npm run dev
|
||||
```
|
||||
|
||||
Navigate to `http://localhost:3000` to view:
|
||||
- Scan results
|
||||
- Corpus items (RFC/OWASP/Community)
|
||||
- Claims coverage
|
||||
|
||||
**Note:** Dashboard is in beta. For production use, query via API (`/v1/aphoria/*`).
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Configuration Guide Missing ❌
|
||||
|
||||
**Issue:** No comprehensive configuration reference showing all `aphoria.toml` options.
|
||||
|
||||
**Action Required:** Create `applications/aphoria/docs/configuration.md`
|
||||
|
||||
**Outline:**
|
||||
```markdown
|
||||
# Configuration Reference
|
||||
|
||||
## File Location
|
||||
|
||||
`.aphoria/config.toml` (created by `aphoria init`)
|
||||
|
||||
## Full Example
|
||||
|
||||
```toml
|
||||
[project]
|
||||
name = "my-project"
|
||||
language = "rust"
|
||||
|
||||
[episteme]
|
||||
# Per-project database (default: .aphoria/db)
|
||||
data_dir = ".aphoria/db"
|
||||
|
||||
# Shared corpus database (default: ~/.aphoria/corpus-db)
|
||||
corpus_data_dir = "~/.aphoria/corpus-db"
|
||||
|
||||
# Optional: Remote Episteme URL (future feature)
|
||||
# url = "https://episteme.example.com"
|
||||
|
||||
[thresholds]
|
||||
block = 0.7 # Conflict score to BLOCK
|
||||
flag = 0.4 # Conflict score to FLAG
|
||||
|
||||
[extractors]
|
||||
enabled = [
|
||||
"tls_verify",
|
||||
"jwt_config",
|
||||
# ... (see cli-reference.md for full list)
|
||||
]
|
||||
|
||||
[scan]
|
||||
exclude = [
|
||||
"target/",
|
||||
"node_modules/",
|
||||
".git/",
|
||||
]
|
||||
max_file_size = 1_048_576 # 1MB
|
||||
|
||||
[corpus]
|
||||
include_rfc = true
|
||||
include_owasp = true
|
||||
include_vendor = true
|
||||
use_community = true
|
||||
aggregation_enabled = true
|
||||
use_legacy_thresholds = false # Use adaptive thresholds (default)
|
||||
|
||||
[hosted]
|
||||
# Optional: Hosted mode for team aggregation
|
||||
# url = "https://aphoria-hosted.example.com"
|
||||
# project_id = "billing-api"
|
||||
# team_id = "platform-team"
|
||||
|
||||
[community]
|
||||
enabled = false # Opt-in for anonymous pattern sharing
|
||||
anonymize = true
|
||||
```
|
||||
|
||||
## Key Settings
|
||||
|
||||
### Database Paths
|
||||
|
||||
**Per-project (default):**
|
||||
```toml
|
||||
[episteme]
|
||||
data_dir = ".aphoria/db"
|
||||
```
|
||||
|
||||
**Shared (legacy):**
|
||||
```toml
|
||||
[episteme]
|
||||
data_dir = "~/.aphoria/db"
|
||||
```
|
||||
|
||||
**Corpus database:**
|
||||
```toml
|
||||
[episteme]
|
||||
corpus_data_dir = "~/.aphoria/corpus-db" # Default
|
||||
# Or disable: corpus_data_dir = null
|
||||
```
|
||||
|
||||
### Thresholds
|
||||
|
||||
**Scale-Adaptive (default):**
|
||||
```toml
|
||||
[corpus]
|
||||
use_legacy_thresholds = false
|
||||
```
|
||||
|
||||
Auto-detects team size (Micro: 1-5 projects → Enterprise: 501+) and adjusts promotion thresholds accordingly.
|
||||
|
||||
**Legacy (fixed thresholds):**
|
||||
```toml
|
||||
[corpus]
|
||||
use_legacy_thresholds = true
|
||||
```
|
||||
|
||||
See [scale-adaptive-thresholds.md](scale-adaptive-thresholds.md) for details.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Summary of Required Changes
|
||||
|
||||
### DELETE
|
||||
- None (no stale planning docs found related to this change)
|
||||
|
||||
### UPDATE
|
||||
1. **`the-first-scan.md:45`** - Change `~/.aphoria/db` → `.aphoria/db` + add override note
|
||||
2. **`README.md:39`** - Add note about per-project databases (optional, keep lean)
|
||||
3. **`cli-reference.md`** - Add configuration section linking to new `configuration.md`
|
||||
|
||||
### CREATE
|
||||
1. **`configuration.md`** - Complete config reference with database path examples
|
||||
2. **`guides/multi-project-architecture.md`** - Explain dual-database architecture
|
||||
3. **Optional: `guides/dashboard-setup.md`** - If dashboard is user-facing
|
||||
|
||||
---
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Step 1: Fix Immediate Stale Reference (5 min)
|
||||
- Update `the-first-scan.md:45` with correct path
|
||||
|
||||
### Step 2: Create Configuration Guide (15 min)
|
||||
- New file: `configuration.md`
|
||||
- Include all `episteme` options with examples
|
||||
- Cross-reference from `cli-reference.md`
|
||||
|
||||
### Step 3: Create Multi-Project Guide (20 min)
|
||||
- New file: `guides/multi-project-architecture.md`
|
||||
- Explain per-project vs corpus databases
|
||||
- Include community learning flow diagram (optional)
|
||||
|
||||
### Step 4: Update README (5 min)
|
||||
- Add one-line note about per-project isolation
|
||||
- Keep it lean (link to configuration.md for details)
|
||||
|
||||
### Step 5: CLI Reference Update (5 min)
|
||||
- Add "Configuration" section
|
||||
- Link to `configuration.md`
|
||||
- Add dashboard section if ready for users
|
||||
|
||||
---
|
||||
|
||||
## Testing Checklist
|
||||
|
||||
Before committing:
|
||||
|
||||
- [ ] All bash examples tested and working
|
||||
- [ ] Cross-links verified (configuration.md ↔ cli-reference.md ↔ guides/)
|
||||
- [ ] No old terminology (`~/.aphoria/db` as default)
|
||||
- [ ] Examples match current CLI output
|
||||
- [ ] Dashboard references accurate (production vs beta)
|
||||
|
||||
---
|
||||
|
||||
## Questions for User
|
||||
|
||||
1. **Dashboard Status:** Is the Aphoria dashboard ready for user-facing docs, or should it remain "internal/beta" for now?
|
||||
|
||||
2. **Corpus Database:** Should we document how to disable corpus aggregation (`corpus_data_dir = null`), or is it always-on?
|
||||
|
||||
3. **Migration Guide:** Do we need a migration guide for users upgrading from old `~/.aphoria/db` to new per-project databases?
|
||||
- **Recommendation:** Not needed. Old users can override to `data_dir = "~/.aphoria/db"` for legacy behavior.
|
||||
|
||||
---
|
||||
|
||||
## Files to Modify
|
||||
|
||||
### High Priority (Stale References)
|
||||
- `applications/aphoria/docs/guides/the-first-scan.md` - Line 45 (stale path)
|
||||
|
||||
### Medium Priority (New Content)
|
||||
- `applications/aphoria/docs/configuration.md` (NEW)
|
||||
- `applications/aphoria/docs/guides/multi-project-architecture.md` (NEW)
|
||||
- `applications/aphoria/docs/cli-reference.md` - Add configuration section
|
||||
|
||||
### Low Priority (Enhancement)
|
||||
- `applications/aphoria/README.md` - Brief note on per-project isolation
|
||||
- `applications/aphoria/docs/guides/dashboard-setup.md` (NEW, if dashboard is ready)
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
**Immediate:**
|
||||
1. Fix stale path reference in `the-first-scan.md`
|
||||
2. Create `configuration.md` with database path examples
|
||||
|
||||
**Follow-up:**
|
||||
3. Create `multi-project-architecture.md` guide
|
||||
4. Decide on dashboard documentation strategy
|
||||
@ -59,9 +59,16 @@ Creates `.aphoria/` directory with:
|
||||
- `claims.toml` - Human-authored claims
|
||||
- `pending-markers.toml` - Inline claim markers (if any)
|
||||
- `config.toml` - Project configuration
|
||||
- `db/` - Project database (per-project observations)
|
||||
|
||||
**Note:** Corpus is no longer hardcoded. It's emergent from community patterns (see `aphoria corpus` commands) or imported from external sources (wiki, Trust Packs).
|
||||
|
||||
**Database Architecture:**
|
||||
- Per-project database: `.aphoria/db/` (observations from this project)
|
||||
- Shared corpus database: `~/.aphoria/corpus-db/` (aggregated patterns across all projects)
|
||||
|
||||
See [configuration.md](configuration.md) for database path customization.
|
||||
|
||||
---
|
||||
|
||||
### `aphoria ack`
|
||||
@ -752,9 +759,45 @@ When multiple ignore mechanisms apply:
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
Aphoria is configured via `.aphoria/config.toml` in your project root.
|
||||
|
||||
**Quick example:**
|
||||
```toml
|
||||
[project]
|
||||
name = "my-project"
|
||||
|
||||
[episteme]
|
||||
data_dir = ".aphoria/db" # Per-project (default)
|
||||
corpus_data_dir = "~/.aphoria/corpus-db" # Shared corpus
|
||||
|
||||
[thresholds]
|
||||
block = 0.7
|
||||
flag = 0.4
|
||||
|
||||
[extractors]
|
||||
enabled = ["tls_verify", "jwt_config", ...]
|
||||
```
|
||||
|
||||
For complete configuration reference, see [configuration.md](configuration.md).
|
||||
|
||||
**Key topics:**
|
||||
- Database paths (per-project vs shared)
|
||||
- Threshold configuration
|
||||
- Extractor settings
|
||||
- Corpus building options
|
||||
- Community sharing (opt-in)
|
||||
|
||||
---
|
||||
|
||||
## See Also
|
||||
|
||||
- [Configuration Reference](configuration.md) - Complete `.aphoria/config.toml` reference
|
||||
- [Comparison Modes Guide](comparison-modes.md) - Detailed guide for `--comparison` parameter
|
||||
- [Solo Developer Guide](guides/solo-developer-guide.md) - Quick start for individuals
|
||||
- [Enterprise Pilot Guide](guides/enterprise-pilot-guide.md) - Enterprise deployment
|
||||
- [Scale-Adaptive Thresholds](scale-adaptive-thresholds.md) - Threshold configuration for small teams
|
||||
- [Vision & Gaps](vision-gaps.md) - Architecture and implementation status
|
||||
|
||||
413
applications/aphoria/docs/configuration.md
Normal file
@ -0,0 +1,413 @@
|
||||
# Aphoria Configuration Reference
|
||||
|
||||
Complete reference for the options in `.aphoria/config.toml`.
|
||||
|
||||
---
|
||||
|
||||
## File Location
|
||||
|
||||
`.aphoria/config.toml` - Created by `aphoria init` in your project root.
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
**Minimal configuration (defaults work for most projects):**
|
||||
```toml
|
||||
[project]
|
||||
name = "my-project"
|
||||
```
|
||||
|
||||
That's it! Aphoria uses sensible defaults for everything else.
|
||||
|
||||
---
|
||||
|
||||
## Database Configuration
|
||||
|
||||
### Per-Project Databases (Default)
|
||||
|
||||
**New as of 2026-02-09:** Each project has its own isolated database by default.
|
||||
|
||||
```toml
|
||||
[episteme]
|
||||
# Project database (observations from this project)
|
||||
# Default: .aphoria/db (project-local)
|
||||
data_dir = ".aphoria/db"
|
||||
|
||||
# Corpus database (aggregated patterns across all projects)
|
||||
# Default: ~/.aphoria/corpus-db (home-based, shared)
|
||||
corpus_data_dir = "~/.aphoria/corpus-db"
|
||||
```
|
||||
|
||||
**Architecture:**
|
||||
```
|
||||
~/projects/
├── maxwell/
│   └── .aphoria/db/        # Maxwell's observations
└── billing-api/
    └── .aphoria/db/        # Billing API's observations

~/.aphoria/
└── corpus-db/              # Shared corpus (all projects)
|
||||
```
|
||||
|
||||
### Legacy Shared Mode
|
||||
|
||||
To use the old behavior (single shared database for all projects):
|
||||
|
||||
```toml
|
||||
[episteme]
|
||||
data_dir = "~/.aphoria/db"
|
||||
```
|
||||
|
||||
### Disable Corpus Aggregation
|
||||
|
||||
To disable cross-project pattern aggregation:
|
||||
|
||||
```toml
|
||||
[episteme]
|
||||
corpus_data_dir = null
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Full Configuration Example
|
||||
|
||||
```toml
|
||||
[project]
|
||||
name = "my-project"
|
||||
language = "rust"
|
||||
|
||||
[episteme]
|
||||
# Per-project database (default: .aphoria/db)
|
||||
data_dir = ".aphoria/db"
|
||||
|
||||
# Shared corpus database (default: ~/.aphoria/corpus-db)
|
||||
corpus_data_dir = "~/.aphoria/corpus-db"
|
||||
|
||||
# Optional: Remote Episteme URL (future feature)
|
||||
# url = "https://episteme.example.com"
|
||||
|
||||
[thresholds]
|
||||
block = 0.7 # Conflict score at or above → BLOCK verdict
|
||||
flag = 0.4 # Conflict score at or above → FLAG verdict
|
||||
|
||||
[extractors]
|
||||
enabled = [
|
||||
"tls_verify",
|
||||
"tls_version",
|
||||
"jwt_config",
|
||||
"hardcoded_secrets",
|
||||
"timeout_config",
|
||||
"dep_versions",
|
||||
"cors_config",
|
||||
"durability_config",
|
||||
"rate_limit",
|
||||
# ... (42 total extractors, see cli-reference.md for full list)
|
||||
]
|
||||
disabled = []
|
||||
|
||||
[extractors.timeout_config]
|
||||
min_reasonable_ms = 1000
|
||||
max_reasonable_ms = 300_000
|
||||
|
||||
[extractors.dep_versions]
|
||||
enabled = false # OPT-IN: Disabled by default to reduce noise
|
||||
advisory_db = "~/.aphoria/advisory-db"
|
||||
|
||||
[extractors.entropy]
|
||||
min_entropy = 4.5
|
||||
min_charset_variety = 0.4
|
||||
min_length = 20
|
||||
max_length = 200
|
||||
|
||||
[extractors.inline_markers]
|
||||
enabled = false # OPT-IN: Disabled by default
|
||||
sync_to_pending = true # Auto-sync when enabled
|
||||
|
||||
[scan]
|
||||
exclude = [
|
||||
"target/",
|
||||
"node_modules/",
|
||||
".git/",
|
||||
"vendor/",
|
||||
]
|
||||
max_file_size = 1_048_576 # 1MB
|
||||
include_tests = false
|
||||
|
||||
[aliases]
|
||||
auto_suggest = true
|
||||
auto_accept_tier0 = true
|
||||
auto_create_aliases = true
|
||||
|
||||
[corpus]
|
||||
cache_dir = "~/.cache/aphoria" # Or system cache dir
|
||||
include_rfc = true
|
||||
include_owasp = true
|
||||
include_vendor = true
|
||||
use_community = true
|
||||
aggregation_enabled = true
|
||||
use_legacy_thresholds = false # Use adaptive thresholds (default)
|
||||
|
||||
# Optional: Override adaptive thresholds
|
||||
# adaptive_thresholds = { micro_floor = 2, small_floor = 5 }
|
||||
|
||||
[hosted]
|
||||
# Optional: Hosted mode for team aggregation
|
||||
# url = "https://aphoria-hosted.example.com"
|
||||
# project_id = "billing-api"
|
||||
# team_id = "platform-team"
|
||||
# sync_mode = "push_only" # or "bidirectional"
|
||||
# max_retries = 3
|
||||
# retry_delay_ms = 1000
|
||||
# api_key_env = "APHORIA_API_KEY"
|
||||
|
||||
[community]
|
||||
enabled = false # CRITICAL: Opt-in only
|
||||
anonymize = true # CRITICAL: Privacy by default
|
||||
exclude = []
|
||||
include = []
|
||||
min_confidence = 0.8
|
||||
|
||||
[llm]
|
||||
enabled = false
|
||||
provider = "gemini"
|
||||
model = "gemini-3-flash-preview"
|
||||
api_key_env = "GEMINI_API_KEY"
|
||||
max_tokens_per_scan = 50000
|
||||
max_tokens_per_file = 4000
|
||||
cache_responses = true
|
||||
timeout_secs = 60
|
||||
high_value_only = true
|
||||
min_confidence = 0.7
|
||||
|
||||
[learning]
|
||||
enabled = false
|
||||
store = "local"
|
||||
min_confidence = 0.7
|
||||
prune_after_days = 90
|
||||
max_patterns = 10_000
|
||||
|
||||
[learning.promotion]
|
||||
min_projects = 5
|
||||
min_confidence = 0.8
|
||||
auto_promote = false
|
||||
output_dir = ".aphoria/extractors/learned"
|
||||
require_review = true
|
||||
|
||||
[autonomous]
|
||||
# CRITICAL: Opt-in only - kill switch defaults to off
|
||||
enabled = false
|
||||
min_confidence = 0.95
|
||||
min_projects = 10
|
||||
require_zero_failures = true
|
||||
require_zero_warnings = true
|
||||
audit_log = true
|
||||
# audit_dir defaults to ~/.aphoria/audit/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Sections
|
||||
|
||||
### Project
|
||||
|
||||
Basic project metadata.
|
||||
|
||||
```toml
|
||||
[project]
|
||||
name = "my-project" # Optional: auto-detected from directory name
|
||||
language = "rust" # Optional: auto-detected from file extensions
|
||||
```
|
||||
|
||||
### Episteme
|
||||
|
||||
Database and storage configuration.
|
||||
|
||||
```toml
|
||||
[episteme]
|
||||
data_dir = ".aphoria/db" # Per-project observations
|
||||
corpus_data_dir = "~/.aphoria/corpus-db" # Shared corpus (optional)
|
||||
# url = "https://episteme.example.com"     # Remote Episteme (future feature)
|
||||
```
|
||||
|
||||
**Key Options:**
|
||||
- `data_dir` - Where to store this project's observations
|
||||
- Default: `.aphoria/db` (project-local)
|
||||
- Override to `~/.aphoria/db` for legacy shared mode
|
||||
- `corpus_data_dir` - Where to store aggregated patterns
|
||||
- Default: `~/.aphoria/corpus-db` (home-based, shared)
|
||||
- Set to `null` to disable cross-project aggregation
|
||||
|
||||
### Thresholds
|
||||
|
||||
Conflict severity thresholds.
|
||||
|
||||
```toml
|
||||
[thresholds]
|
||||
block = 0.7 # High severity (blocks CI)
|
||||
flag = 0.4 # Medium severity (warns)
|
||||
```
|
||||
|
||||
Conflict scores range from 0.0 (no conflict) to 1.0 (total conflict).
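
The mapping from score to verdict can be sketched in Rust. This is a minimal illustration, not Aphoria's actual code: it assumes a score at or above a threshold triggers that verdict (as the comments in the full example state), and the `Verdict` enum, its `Pass` variant, the `Thresholds` struct, and `classify` are hypothetical names.

```rust
/// Hypothetical verdict type; `Pass` is an assumed name for "no action".
#[derive(Debug, PartialEq)]
enum Verdict {
    Block,
    Flag,
    Pass,
}

/// Hypothetical mirror of the `[thresholds]` table.
struct Thresholds {
    block: f64,
    flag: f64,
}

/// Map a conflict score (0.0 to 1.0) to a verdict.
/// The block threshold is checked first because it is the stricter one.
fn classify(score: f64, t: &Thresholds) -> Verdict {
    if score >= t.block {
        Verdict::Block
    } else if score >= t.flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}

fn main() {
    // Defaults from the `[thresholds]` example above.
    let t = Thresholds { block: 0.7, flag: 0.4 };
    for score in [0.9, 0.7, 0.5, 0.1] {
        println!("{:.1} -> {:?}", score, classify(score, &t));
    }
}
```

Lowering `flag` widens the band of scores that warn without blocking CI; lowering `block` makes more conflicts fail the build.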
|
||||
|
||||
### Extractors
|
||||
|
||||
Control which extractors run.
|
||||
|
||||
```toml
|
||||
[extractors]
|
||||
enabled = ["tls_verify", "jwt_config", ...]
|
||||
disabled = []
|
||||
```
|
||||
|
||||
See [cli-reference.md](cli-reference.md) for the full list of 42 available extractors.
|
||||
|
||||
### Scan
|
||||
|
||||
Control which files are scanned.
|
||||
|
||||
```toml
|
||||
[scan]
|
||||
exclude = ["target/", "node_modules/"]
|
||||
max_file_size = 1_048_576 # 1MB
|
||||
include_tests = false
|
||||
```
|
||||
|
||||
You can also use `.aphoriaignore` files (gitignore syntax).
|
||||
|
||||
### Corpus
|
||||
|
||||
Control corpus building and thresholds.
|
||||
|
||||
```toml
|
||||
[corpus]
|
||||
include_rfc = true
|
||||
include_owasp = true
|
||||
include_vendor = true
|
||||
use_community = true
|
||||
aggregation_enabled = true
|
||||
use_legacy_thresholds = false # Use adaptive thresholds
|
||||
```
|
||||
|
||||
**Scale-Adaptive Thresholds (default):**
|
||||
|
||||
Automatically adjusts promotion thresholds based on team size:
|
||||
- Micro (1-5 projects): Patterns visible with 2/3 adoption
|
||||
- Small (6-25 projects): Patterns visible with 5+ projects
|
||||
- Enterprise (501+): Unchanged behavior
|
||||
|
||||
See [scale-adaptive-thresholds.md](scale-adaptive-thresholds.md) for details.
|
||||
|
||||
**Legacy Thresholds:**
|
||||
|
||||
```toml
|
||||
[corpus]
|
||||
use_legacy_thresholds = true
|
||||
```
|
||||
|
||||
Fixed thresholds regardless of team size (old behavior).
|
||||
|
||||
### Hosted Mode
|
||||
|
||||
For team collaboration and pattern sharing.
|
||||
|
||||
```toml
|
||||
[hosted]
|
||||
url = "https://aphoria.example.com"
|
||||
project_id = "billing-api"
|
||||
team_id = "platform-team"
|
||||
sync_mode = "push_only"
|
||||
```
|
||||
|
||||
Requires hosted Aphoria server (future feature).
|
||||
|
||||
### Community Sharing
|
||||
|
||||
**CRITICAL:** Opt-in only. Anonymous pattern contribution.
|
||||
|
||||
```toml
|
||||
[community]
|
||||
enabled = false # Must explicitly opt-in
|
||||
anonymize = true # Project names are wildcarded
|
||||
```
|
||||
|
||||
When enabled with `--sync`, observations are anonymized and shared with the community corpus.
|
||||
|
||||
**Privacy Guarantees:**
|
||||
- Project names are wildcarded in paths
|
||||
- No file paths, line numbers, or source code
|
||||
- Only pattern aggregates (subject + predicate + value)
|
||||
|
||||
### LLM Extraction
|
||||
|
||||
Use LLMs (Gemini) for semantic claim detection.
|
||||
|
||||
```toml
|
||||
[llm]
|
||||
enabled = false # OPT-IN
|
||||
provider = "gemini"
|
||||
model = "gemini-3-flash-preview"
|
||||
api_key_env = "GEMINI_API_KEY"
|
||||
```
|
||||
|
||||
Requires API key in environment.
|
||||
|
||||
### Learning & Autonomous Promotion
|
||||
|
||||
**CRITICAL:** Both require explicit opt-in.
|
||||
|
||||
```toml
|
||||
[learning]
|
||||
enabled = false # Pattern learning from scans
|
||||
|
||||
[autonomous]
|
||||
enabled = false # Auto-promotion to extractors (kill switch)
|
||||
```
|
||||
|
||||
See [vision-gaps.md](vision-gaps.md) for implementation status.
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Aphoria respects these environment variables:
|
||||
|
||||
| Variable | Purpose | Default |
|
||||
|----------|---------|---------|
|
||||
| `APHORIA_API_KEY` | Hosted mode API key | None (required when `[hosted]` is configured) |
|
||||
| `GEMINI_API_KEY` | Gemini API key | None (required if llm.enabled) |
|
||||
| `STEMEDB_DB_DIR` | Override `data_dir` | `.aphoria/db` |
|
||||
| `APHORIA_CONFIG` | Config file path | `.aphoria/config.toml` |
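
As a rough sketch of how an override such as `STEMEDB_DB_DIR` could interact with `data_dir`, the Rust snippet below resolves the directory in three steps. The precedence (environment, then config file, then built-in default) and the handling of an empty variable are assumptions made for illustration; `resolve_data_dir` is a hypothetical helper, and `~` is not expanded here.

```rust
use std::path::PathBuf;

/// Resolve the project database directory.
/// Assumed precedence: environment override, then config value, then default.
/// An empty environment value is treated as unset (an assumption).
fn resolve_data_dir(env_override: Option<&str>, config_value: Option<&str>) -> PathBuf {
    env_override
        .filter(|s| !s.is_empty())
        .or(config_value)
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(".aphoria/db"))
}

fn main() {
    // Read the override from the real environment; no config value is supplied here.
    let from_env = std::env::var("STEMEDB_DB_DIR").ok();
    let dir = resolve_data_dir(from_env.as_deref(), None);
    println!("data_dir = {}", dir.display());
}
```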
|
||||
|
||||
---
|
||||
|
||||
## Migration Guide
|
||||
|
||||
### From Old Home-Based Database
|
||||
|
||||
**Before (legacy):**
|
||||
```toml
|
||||
# Default in old versions: ~/.aphoria/db
|
||||
```
|
||||
|
||||
**After (new default):**
|
||||
```toml
|
||||
# Default now: ./.aphoria/db (per-project)
|
||||
```
|
||||
|
||||
**To keep legacy behavior:**
|
||||
```toml
|
||||
[episteme]
|
||||
data_dir = "~/.aphoria/db"
|
||||
```
|
||||
|
||||
No data migration is needed: set `data_dir` to the old path to keep using the existing database.
|
||||
|
||||
---
|
||||
|
||||
## See Also
|
||||
|
||||
- [CLI Reference](cli-reference.md) - All commands and flags
|
||||
- [Scale-Adaptive Thresholds](scale-adaptive-thresholds.md) - Threshold configuration
|
||||
- [Comparison Modes](comparison-modes.md) - Claim comparison operators
|
||||
- [Vision Gaps](vision-gaps.md) - Implementation status
|
||||
698
applications/aphoria/docs/corpus-architecture.md
Normal file
@ -0,0 +1,698 @@
|
||||
# Corpus Database Architecture
|
||||
|
||||
**Audience:** Engineers integrating Aphoria with StemeDB API, ops teams deploying both systems.
|
||||
|
||||
**What you'll learn:**
|
||||
- How Aphoria's corpus database integrates with StemeDB API
|
||||
- URI scheme inference for authoritative sources
|
||||
- Where CLI-created corpus items live
|
||||
- Git hooks for automatic binary rebuilds
|
||||
- Production deployment patterns
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
```bash
|
||||
# Aphoria CLI writes to:
|
||||
~/.aphoria/corpus-db/
|
||||
|
||||
# StemeDB API reads from:
|
||||
data/db/ # Default, or configure STEMEDB_CORPUS_DB_DIR
|
||||
|
||||
# Make API see Aphoria corpus:
|
||||
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
|
||||
stemedb-api
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Database Separation
|
||||
|
||||
### The Problem
|
||||
|
||||
Aphoria and StemeDB API use separate databases:
|
||||
|
||||
```
|
||||
Aphoria CLI:
|
||||
└─ corpus create/build → ~/.aphoria/corpus-db/
|
||||
|
||||
StemeDB API:
|
||||
└─ GET /v1/aphoria/corpus → data/db/
|
||||
|
||||
Result: Items created via CLI aren't visible in API/Dashboard
|
||||
```
|
||||
|
||||
### The Solution
|
||||
|
||||
Three integration patterns:
|
||||
|
||||
#### Pattern 1: Shared Database (Recommended for Development)
|
||||
|
||||
Point API to Aphoria's corpus database:
|
||||
|
||||
```bash
|
||||
# .env
|
||||
STEMEDB_CORPUS_DB_DIR=/home/user/.aphoria/corpus-db
|
||||
|
||||
# Start API
|
||||
cargo run --release -p stemedb-api
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Zero synchronization needed
|
||||
- Single source of truth
|
||||
- Changes immediately visible
|
||||
|
||||
**Cons:**
|
||||
- API has read-only access (can't write to corpus)
|
||||
- Not suitable if API needs to write corpus items
|
||||
|
||||
#### Pattern 2: Unified Database (Recommended for Production)
|
||||
|
||||
Use shared directory for both:
|
||||
|
||||
```bash
|
||||
# Create shared directory
|
||||
sudo mkdir -p /var/lib/stemedb/corpus
|
||||
sudo chown aphoria:stemedb /var/lib/stemedb/corpus
|
||||
sudo chmod 775 /var/lib/stemedb/corpus
|
||||
```
|
||||
|
||||
```toml
|
||||
# .aphoria/config.toml
|
||||
[episteme]
|
||||
corpus_data_dir = "/var/lib/stemedb/corpus"
|
||||
```
|
||||
|
||||
```bash
|
||||
# StemeDB API
|
||||
export STEMEDB_CORPUS_DB_DIR="/var/lib/stemedb/corpus"
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Single database, no sync
|
||||
- Both systems have write access
|
||||
- Production-ready pattern
|
||||
|
||||
**Cons:**
|
||||
- Requires deployment coordination
|
||||
- Permissions management needed
|
||||
|
||||
#### Pattern 3: Sync Mechanism (Future)
|
||||
|
||||
```bash
|
||||
# Planned (not yet implemented)
|
||||
aphoria corpus sync --to-api --api-db-dir data/db
|
||||
```
|
||||
|
||||
**Use case:** When databases must remain separate.
|
||||
|
||||
---
|
||||
|
||||
## URI Scheme Inference
|
||||
|
||||
### The Problem
|
||||
|
||||
Corpus items need URI-schemed subjects for API prefix scanning:
|
||||
|
||||
```bash
|
||||
# Without URI scheme (won't work):
|
||||
subject: "tls/certificate_verification"
|
||||
|
||||
# API queries:
|
||||
curl '/v1/aphoria/corpus?sources[]=rfc'
|
||||
# Scans for "subject:rfc://" → doesn't match plain subjects
|
||||
```
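
To make the mismatch concrete, here is a small Rust sketch of prefix matching by source. It assumes stored keys have the form `subject:<scheme>://...`, as the comment above suggests; `source_prefix` and `matches_any_source` are hypothetical helpers, not the API handler's real functions.

```rust
/// Build the key prefix scanned for a source filter value,
/// e.g. "rfc" becomes "subject:rfc://".
fn source_prefix(source: &str) -> String {
    format!("subject:{}://", source)
}

/// Return true when a stored key starts with the prefix of any requested source.
fn matches_any_source(key: &str, sources: &[&str]) -> bool {
    sources
        .iter()
        .any(|s| key.starts_with(source_prefix(s).as_str()))
}

fn main() {
    let keys = [
        "subject:rfc://tls/validation",         // URI-schemed subject
        "subject:tls/certificate_verification", // plain subject, no scheme
    ];
    for key in keys {
        println!("{} -> {}", key, matches_any_source(key, &["rfc"]));
    }
}
```

A plain subject can never satisfy a source filter under this scheme, which is why subjects need a URI scheme before they are stored.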
|
||||
|
||||
### The Solution
|
||||
|
||||
Automatic URI inference based on authority and tier:
|
||||
|
||||
```text
# Input to `aphoria corpus create`
Authority: "RFC 5246 Section 7.4.2"
Tier: 0

# Auto-inferred
subject_uri: "rfc://tls/certificate_verification"
```
|
||||
|
||||
### Inference Rules
|
||||
|
||||
| Condition | Scheme | Example |
|
||||
|-----------|--------|---------|
|
||||
| Already has `://` | Preserved | `rfc://test` → `rfc://test` |
|
||||
| Authority contains "rfc" (case-insensitive) | `rfc://` | "RFC 5280" → `rfc://...` |
|
||||
| Authority contains "owasp" | `owasp://` | "OWASP Top 10" → `owasp://...` |
|
||||
| Authority contains "cwe" | `cwe://` | "CWE-120" → `cwe://...` |
|
||||
| Tier 2 | `vendor://` | GitHub docs → `vendor://...` |
|
||||
| Tier 3 | `community://` | Team wiki → `community://...` |
|
||||
| Tier 0/1 unrecognized | `corpus://` | Unknown → `corpus://...` |
|
||||
|
||||
**Priority:** Existing scheme > Authority matching > Tier-based > Fallback
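
The rule table can be read as a short decision function. The Rust sketch below is an illustration under stated assumptions, not the CLI's implementation: `infer_subject_uri` is a hypothetical name, all authority keywords are matched case-insensitively as plain substrings, and any tier other than 2 or 3 falls back to `corpus://`.

```rust
/// Infer a URI-schemed subject from a plain subject, an authority string,
/// and a tier: existing scheme first, then authority keywords, then tier.
fn infer_subject_uri(subject: &str, authority: &str, tier: u8) -> String {
    // An existing scheme is preserved as-is.
    if subject.contains("://") {
        return subject.to_string();
    }
    let authority = authority.to_lowercase();
    // Authority keywords take priority over the tier.
    let scheme = if authority.contains("rfc") {
        "rfc"
    } else if authority.contains("owasp") {
        "owasp"
    } else if authority.contains("cwe") {
        "cwe"
    } else {
        // Otherwise fall back to the tier, then to the generic scheme.
        match tier {
            2 => "vendor",
            3 => "community",
            _ => "corpus",
        }
    };
    format!("{}://{}", scheme, subject)
}

fn main() {
    let uri = infer_subject_uri("tls/validation", "RFC 5280 Section 6.1", 0);
    println!("{}", uri);
}
```

Substring matching is deliberately loose in this sketch; an authority string that merely contains the letters "rfc" would also be classified as an RFC source.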
|
||||
|
||||
### Examples
|
||||
|
||||
```bash
|
||||
# RFC claim (tier 0)
|
||||
aphoria corpus create \
|
||||
--subject "tls/validation" \
|
||||
--authority "RFC 5280 Section 6.1" \
|
||||
--tier 0
|
||||
# Stored as: subject:rfc://tls/validation
|
||||
|
||||
# OWASP claim (tier 1)
|
||||
aphoria corpus create \
|
||||
--subject "password/storage" \
|
||||
--authority "OWASP Password Storage Cheat Sheet" \
|
||||
--tier 1
|
||||
# Stored as: subject:owasp://password/storage
|
||||
|
||||
# Vendor docs (tier 2)
|
||||
aphoria corpus create \
|
||||
--subject "postgresql/connection_pool" \
|
||||
--authority "PostgreSQL Documentation" \
|
||||
--tier 2
|
||||
# Stored as: subject:vendor://postgresql/connection_pool
|
||||
|
||||
# Community (tier 3)
|
||||
aphoria corpus create \
|
||||
--subject "api/rest/pagination" \
|
||||
--authority "Team wiki: API standards" \
|
||||
--tier 3
|
||||
# Stored as: subject:community://api/rest/pagination
|
||||
|
||||
# Already schemed (preserved)
|
||||
aphoria corpus create \
|
||||
--subject "custom://myapp/feature" \
|
||||
--authority "Internal spec" \
|
||||
--tier 2
|
||||
# Stored as: subject:custom://myapp/feature
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CLI-Created Corpus Source
|
||||
|
||||
### The Problem
|
||||
|
||||
Items created with `aphoria corpus create` weren't visible in:
|
||||
|
||||
```bash
|
||||
aphoria corpus list
|
||||
# Showed: RFC, OWASP, VendorDocs
|
||||
# Missing: CLI-created items
|
||||
|
||||
aphoria corpus build
|
||||
# Total assertions: 86
|
||||
# Missing: CLI-created items
|
||||
```
|
||||
|
||||
### The Solution
|
||||
|
||||
CLI-created items are now a first-class corpus source:
|
||||
|
||||
```rust
|
||||
// Tagged at creation time
|
||||
metadata: {
|
||||
"source": "cli_create",
|
||||
"description": "...",
|
||||
"authority_source": "...",
|
||||
"category": "..."
|
||||
}
|
||||
|
||||
// Discovered by CliCreatedBuilder
|
||||
impl AsyncCorpusBuilder for CliCreatedBuilder {
|
||||
async fn build(...) -> Vec<Assertion> {
|
||||
// Scan corpus DB
|
||||
// Filter by metadata: "source": "cli_create"
|
||||
// Return assertions
|
||||
}
|
||||
}
|
||||
```
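
The filtering step can be sketched as runnable Rust. This is a simplified, synchronous stand-in rather than the real `CliCreatedBuilder`: the `Assertion` struct, the string-valued metadata map, and the helper names are assumptions for illustration, as is the `"rfc"` source tag on the non-CLI item.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a stored corpus assertion.
#[derive(Debug, Clone, PartialEq)]
struct Assertion {
    subject: String,
    metadata: HashMap<String, String>,
}

/// Keep only assertions tagged with `"source": "cli_create"`.
fn cli_created(all: &[Assertion]) -> Vec<Assertion> {
    all.iter()
        .filter(|a| a.metadata.get("source").map(String::as_str) == Some("cli_create"))
        .cloned()
        .collect()
}

/// Test helper: build an assertion carrying only a `source` tag.
fn assertion(subject: &str, source: &str) -> Assertion {
    let mut metadata = HashMap::new();
    metadata.insert("source".to_string(), source.to_string());
    Assertion { subject: subject.to_string(), metadata }
}

fn main() {
    let db = vec![
        assertion("rfc://tls/validation", "rfc"),
        assertion("community://api/rest/pagination", "cli_create"),
    ];
    for a in cli_created(&db) {
        println!("{}", a.subject);
    }
}
```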
|
||||
|
||||
### Now They Appear
|
||||
|
||||
```bash
|
||||
aphoria corpus list
|
||||
# Available corpus sources:
|
||||
# rfc:// (Tier 0) - RFC
|
||||
# owasp:// (Tier 1) - OWASP
|
||||
# vendor:// (Tier 2) - VendorDocs
|
||||
# cli:// (Tier 3) - CLI-Created Items ← NEW
|
||||
|
||||
aphoria corpus build
|
||||
# Corpus build complete:
|
||||
# Total assertions: 157
|
||||
# CLI-Created Items: 3 assertions ← NEW
|
||||
```
|
||||
|
||||
### Querying CLI-Created Items
|
||||
|
||||
```bash
|
||||
# Via API
|
||||
curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=cli'
|
||||
|
||||
# Via Dashboard
|
||||
# Navigate to: http://localhost:3000/corpus
|
||||
# Filter by "CLI-Created" source
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Git Hooks for Binary Rebuilds
|
||||
|
||||
### The Problem
|
||||
|
||||
Developer workflow:
|
||||
1. `git pull` (gets CLI definition changes)
|
||||
2. Run `aphoria corpus create`
|
||||
3. Error: "unrecognized subcommand 'create'"
|
||||
4. Confusion, time wasted
|
||||
5. Realize binary is stale: `cargo build --release -p aphoria`
|
||||
|
||||
### The Solution
|
||||
|
||||
Automatic rebuild hooks:
|
||||
|
||||
```bash
|
||||
# .git/hooks/post-merge
|
||||
if git diff-tree ... | grep -q "^applications/aphoria/src/cli"; then
|
||||
echo "🔧 CLI changed, rebuilding aphoria..."
|
||||
cargo build --release -p aphoria
|
||||
fi
|
||||
```
|
||||
|
||||
### Installed Hooks
|
||||
|
||||
- **post-merge** - After `git pull` or `git merge`
- **post-checkout** - After `git checkout <branch>`
- **post-rewrite** - After `git rebase`
|
||||
|
||||
### What Triggers Rebuild
|
||||
|
||||
- **Aphoria CLI**: `applications/aphoria/src/cli/`
|
||||
- **API handlers**: `crates/stemedb-api/src/`
|
||||
- **Simulator**: `crates/stemedb-sim/src/`
|
||||
- **Core libraries**: `crates/stemedb-*`
|
||||
- **Dependencies**: `Cargo.toml` changes
|
||||
|
||||
### Installation
|
||||
|
||||
Hooks live in `.git/hooks/`, which git does not track, so a new clone starts without them. To check whether they are installed:
|
||||
|
||||
```bash
|
||||
cd /path/to/stemedb   # root of your clone
|
||||
ls -la .git/hooks/post-*
|
||||
|
||||
# If missing, check GIT-HOOKS-IMPLEMENTATION.md for setup
|
||||
```
|
||||
|
||||
### Bypass Hook (Emergency)
|
||||
|
||||
```bash
|
||||
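# Skip all hooks for one command by pointing git at an empty hooks path
# (`git pull --no-verify` only skips pre-merge-commit and commit-msg hooks)
git -c core.hooksPath=/dev/null pull

# Or set an env var (effective only if the hook scripts check for it)
GIT_HOOKS_DISABLE=1 git pull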
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Deployment Configurations
|
||||
|
||||
### Local Development
|
||||
|
||||
**Aphoria:**
|
||||
```bash
|
||||
# Default: uses ~/.aphoria/corpus-db/
|
||||
aphoria corpus create ...
|
||||
aphoria corpus build
|
||||
```
|
||||
|
||||
**StemeDB API:**
|
||||
```bash
|
||||
# Point to Aphoria's corpus
|
||||
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
|
||||
cargo run --release -p stemedb-api
|
||||
```
|
||||
|
||||
### Docker Compose
|
||||
|
||||
```yaml
|
||||
version: '3.8'
|
||||
|
||||
volumes:
|
||||
corpus-db:
|
||||
|
||||
services:
|
||||
stemedb-api:
|
||||
image: stemedb-api:latest
|
||||
environment:
|
||||
- STEMEDB_CORPUS_DB_DIR=/var/lib/stemedb/corpus
|
||||
volumes:
|
||||
- corpus-db:/var/lib/stemedb/corpus
|
||||
ports:
|
||||
- "18180:18180"
|
||||
|
||||
aphoria-builder:
|
||||
image: aphoria:latest
|
||||
volumes:
|
||||
- corpus-db:/var/lib/stemedb/corpus
|
||||
- ./aphoria-config.toml:/etc/aphoria/config.toml
|
||||
command: corpus build
|
||||
```
|
||||
|
||||
### Kubernetes
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: corpus-db
|
||||
spec:
|
||||
accessModes: [ReadWriteMany]
|
||||
resources:
|
||||
requests:
|
||||
storage: 10Gi
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: stemedb-api
|
||||
spec:
|
||||
template:
|
||||
spec:
|
||||
containers:
|
||||
- name: api
|
||||
image: stemedb-api:latest
|
||||
env:
|
||||
- name: STEMEDB_CORPUS_DB_DIR
|
||||
value: /var/lib/stemedb/corpus
|
||||
volumeMounts:
|
||||
- name: corpus-db
|
||||
mountPath: /var/lib/stemedb/corpus
|
||||
volumes:
|
||||
- name: corpus-db
|
||||
persistentVolumeClaim:
|
||||
claimName: corpus-db
|
||||
```
|
||||
|
||||
### Production (Bare Metal)
|
||||
|
||||
```bash
|
||||
# 1. Create shared corpus directory
sudo mkdir -p /var/lib/stemedb/corpus
sudo chown aphoria:stemedb /var/lib/stemedb/corpus
sudo chmod 775 /var/lib/stemedb/corpus

# 2. Configure Aphoria (writing under /etc requires root)
sudo mkdir -p /etc/aphoria
sudo tee /etc/aphoria/config.toml > /dev/null <<EOF
[episteme]
corpus_data_dir = "/var/lib/stemedb/corpus"
EOF

# 3. Configure StemeDB API
sudo tee /etc/systemd/system/stemedb-api.service > /dev/null <<EOF
[Service]
Environment="STEMEDB_CORPUS_DB_DIR=/var/lib/stemedb/corpus"
ExecStart=/usr/local/bin/stemedb-api
User=stemedb
Group=stemedb
EOF

# 4. Reload systemd so it picks up the new unit, then start the service
sudo systemctl daemon-reload
sudo systemctl start stemedb-api
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration Patterns
|
||||
|
||||
### Pattern A: API-First (Read-Only Corpus)
|
||||
|
||||
**Use case:** Dashboard-driven architecture, corpus rarely changes.
|
||||
|
||||
```
|
||||
Workflow:
|
||||
1. Ops team creates corpus items via CLI
|
||||
2. API serves them to dashboard
|
||||
3. Developers view in dashboard (read-only)
|
||||
|
||||
Database:
|
||||
- Aphoria: ~/.aphoria/corpus-db/ (write)
|
||||
- API: points to Aphoria DB (read)
|
||||
```
|
||||
|
||||
**Config:**
|
||||
```bash
|
||||
# API
|
||||
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
|
||||
```
|
||||
|
||||
### Pattern B: CLI-First (Frequent Corpus Updates)
|
||||
|
||||
**Use case:** Active corpus curation, frequent CLI usage.
|
||||
|
||||
```
|
||||
Workflow:
|
||||
1. Developers create corpus items via CLI
|
||||
2. CLI builds corpus
|
||||
3. API/dashboard reflect latest corpus
|
||||
|
||||
Database:
|
||||
- Aphoria: /var/lib/stemedb/corpus (write)
|
||||
- API: /var/lib/stemedb/corpus (read)
|
||||
```
|
||||
|
||||
**Config:**
|
||||
```toml
|
||||
# .aphoria/config.toml
|
||||
[episteme]
|
||||
corpus_data_dir = "/var/lib/stemedb/corpus"
|
||||
```
|
||||
|
||||
```bash
|
||||
# API
|
||||
export STEMEDB_CORPUS_DB_DIR="/var/lib/stemedb/corpus"
|
||||
```
|
||||
|
||||
### Pattern C: Hybrid (Separate Stores + Sync)
|
||||
|
||||
**Use case:** Different corpus items in different stores.
|
||||
|
||||
```
|
||||
Workflow:
|
||||
1. Aphoria: authoritative corpus (RFC, OWASP, CLI-created)
|
||||
2. API: ephemeral assertions from scans
|
||||
3. Periodic sync or query union
|
||||
|
||||
Database:
|
||||
- Aphoria: ~/.aphoria/corpus-db/
|
||||
- API: data/db/
|
||||
- Sync: manual or scheduled
|
||||
```
|
||||
|
||||
**Sync (when implemented):**
|
||||
```bash
|
||||
# Planned
|
||||
aphoria corpus sync --to-api --api-db-dir data/db
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Items created but not visible in API"
|
||||
|
||||
**Symptom:**
|
||||
```bash
|
||||
aphoria corpus create --subject "test" ...
|
||||
# Created corpus item: corpus://test/enabled
|
||||
|
||||
curl 'http://localhost:18180/v1/aphoria/corpus'
|
||||
# {"items":[], "total_matching": 0}
|
||||
```
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Check API config
|
||||
env | grep STEMEDB_CORPUS_DB_DIR
|
||||
# If empty, API is using data/db/
|
||||
|
||||
# Check Aphoria corpus DB
|
||||
ls -la ~/.aphoria/corpus-db/
|
||||
# Should see fjall/, redb/, wal/
|
||||
```
|
||||
|
||||
**Fix:**
|
||||
```bash
|
||||
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
|
||||
# Restart API
|
||||
pkill -f stemedb-api
|
||||
stemedb-api &
|
||||
```
|
||||
|
||||
### "Command not found after git pull"
|
||||
|
||||
**Symptom:**
|
||||
```bash
|
||||
git pull
|
||||
aphoria corpus create ...
|
||||
# error: unrecognized subcommand 'create'
|
||||
```
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Check binary date
|
||||
ls -lh target/release/aphoria
|
||||
# -rwxr-xr-x ... Jan 15 10:00 aphoria
|
||||
|
||||
# Check CLI code date
|
||||
ls -lh applications/aphoria/src/cli/mod.rs
|
||||
# -rw-r--r-- ... Feb 09 14:30 mod.rs ← Newer!
|
||||
```
|
||||
|
||||
**Fix:**
|
||||
```bash
|
||||
# Rebuild
|
||||
cargo build --release -p aphoria
|
||||
|
||||
# Or check if hooks are installed
|
||||
ls -la .git/hooks/post-merge
|
||||
# Should be executable and contain rebuild logic
|
||||
```
|
||||
|
||||
### "Corpus items have wrong URI scheme"
|
||||
|
||||
**Symptom:**
|
||||
```bash
|
||||
aphoria corpus create \
|
||||
--subject "tls/validation" \
|
||||
--authority "RFC 5280" \
|
||||
--tier 0
|
||||
|
||||
# API query fails
|
||||
curl -g 'http://localhost:18180/v1/aphoria/corpus?sources[]=rfc'
|
||||
# {"items":[]}
|
||||
```
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Check stored subject (via debug scan)
|
||||
aphoria scan --show-observations | grep tls
|
||||
# If shows: subject:tls/validation (no rfc://)
|
||||
# Then URI inference didn't work
|
||||
```
|
||||
|
||||
**Fix:**
|
||||
Rebuild the aphoria binary (URI inference was added in a recent version):
|
||||
```bash
|
||||
cargo build --release -p aphoria
|
||||
```
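
A minimal sketch of what authority-based URI inference could look like. The name `infer_subject_uri` comes from the architecture diagram in this guide; the matching rules and the `community://` fallback below are assumptions for illustration, not the actual aphoria implementation.

```rust
/// Hypothetical sketch: derive a URI scheme from the authority string and
/// prepend it to the subject path. The real aphoria rules may differ.
fn infer_subject_uri(subject: &str, authority: &str) -> String {
    // A subject that already carries a scheme is left untouched.
    if subject.contains("://") {
        return subject.to_string();
    }
    let upper = authority.to_uppercase();
    let scheme = if upper.starts_with("RFC") {
        "rfc"
    } else if upper.starts_with("OWASP") {
        "owasp"
    } else if upper.starts_with("CWE") {
        "cwe"
    } else {
        // Assumed fallback for every other authority.
        "community"
    };
    format!("{scheme}://{subject}")
}

fn main() {
    println!("{}", infer_subject_uri("tls/validation", "RFC 5280"));
}
```

If the stored subject has no scheme, a source filter such as `sources[]=rfc` has nothing to match on, which is consistent with the empty result shown in the symptom.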
|
||||
|
||||
### "Dashboard shows duplicate corpus items"
|
||||
|
||||
**Symptom:**
|
||||
The dashboard displays the same item multiple times.
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Check if corpus built multiple times
|
||||
aphoria corpus build --verbose
|
||||
# Look for same assertion appearing under multiple builders
|
||||
```
|
||||
|
||||
**Cause:**
|
||||
CLI-created items might also match RFC/OWASP builders if they have matching metadata.
|
||||
|
||||
**Fix:**
|
||||
This is expected behavior if:
|
||||
1. Item was created via CLI with RFC authority
|
||||
2. RFC builder also fetches it from RFC source
|
||||
3. Both versions appear in corpus
|
||||
|
||||
To deduplicate, ensure CLI-created items use unique subjects or authorities that don't overlap with fetched sources.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Diagram
|
||||
|
||||
```
┌─────────────────────────────────────────────────────────┐
│                       Aphoria CLI                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  aphoria corpus create                                  │
│    │                                                    │
│    ├─► infer_subject_uri()                              │
│    │   (RFC/OWASP/CWE → scheme)                         │
│    │                                                    │
│    ├─► create_corpus_item()                             │
│    │   metadata: "source": "cli_create"                 │
│    │                                                    │
│    └─► Store: ~/.aphoria/corpus-db/                     │
│        Key: "subject:rfc://tls/validation"              │
│                                                         │
│  aphoria corpus build                                   │
│    │                                                    │
│    ├─► HardcodedBuilder                                 │
│    ├─► RfcBuilder (network)                             │
│    ├─► OwaspBuilder (network)                           │
│    ├─► VendorDocsBuilder                                │
│    └─► CliCreatedBuilder ← NEW                          │
│        Filter: "source": "cli_create"                   │
│                                                         │
└─────────────────────────────────────────────────────────┘
                             │
                             │ Shared Database
                             ↓
┌─────────────────────────────────────────────────────────┐
│                  ~/.aphoria/corpus-db/                  │
│                                                         │
│  subject:rfc://tls/validation     → Assertion           │
│  subject:owasp://password/storage → Assertion           │
│  subject:community://api/rest     → Assertion           │
│                                                         │
└─────────────────────────────────────────────────────────┘
                             ↑
                             │ STEMEDB_CORPUS_DB_DIR
                             │
┌─────────────────────────────────────────────────────────┐
│                       StemeDB API                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  GET /v1/aphoria/corpus?sources[]=rfc                   │
│    │                                                    │
│    └─► corpus_store.scan_prefix("subject:rfc://")       │
│        ↓                                                │
│        Returns: RFC assertions                          │
│                                                         │
└─────────────────────────────────────────────────────────┘
                             │
                             │ HTTP
                             ↓
┌─────────────────────────────────────────────────────────┐
│                    Aphoria Dashboard                    │
│                                                         │
│  Filter: [RFC] [OWASP] [CLI-Created]                    │
│  ┌─────────────────────────────────┐                    │
│  │ rfc://tls/validation            │                    │
│  │ Tier 0 | Security               │                    │
│  │ TLS cert verification MUST...   │                    │
│  └─────────────────────────────────┘                    │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
|
||||
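
The source filter in the diagram comes down to a key-prefix scan. The sketch below illustrates the idea with a `BTreeMap` standing in for the corpus store. The `subject:<source>://` key layout is taken from the diagram; `query_sources` and the map-based store are hypothetical, not the StemeDB API.

```rust
use std::collections::BTreeMap;

/// Return every key that starts with `prefix`, in sorted order.
/// A `BTreeMap` stands in for the real corpus store (assumption).
fn scan_prefix<'a>(store: &'a BTreeMap<String, String>, prefix: &str) -> Vec<&'a str> {
    store
        .range::<String, _>(prefix.to_string()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, _)| key.as_str())
        .collect()
}

/// Union of prefix scans, one per requested source
/// (for example `?sources[]=community&sources[]=rfc`).
fn query_sources<'a>(store: &'a BTreeMap<String, String>, sources: &[&str]) -> Vec<&'a str> {
    sources
        .iter()
        .flat_map(|source| scan_prefix(store, &format!("subject:{source}://")))
        .collect()
}

fn main() {
    let mut store = BTreeMap::new();
    store.insert("subject:rfc://tls/validation".to_string(), "Assertion".to_string());
    store.insert("subject:owasp://password/storage".to_string(), "Assertion".to_string());
    store.insert("subject:community://api/rest".to_string(), "Assertion".to_string());

    for key in query_sources(&store, &["community", "rfc"]) {
        println!("{key}");
    }
}
```

A broad prefix such as `subject:community://` matches every community item regardless of what follows the scheme, which is why a narrower prefix can hide items stored under other paths.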
|
||||
---
|
||||
|
||||
## See Also
|
||||
|
||||
- [CLI Reference](cli-reference.md) - Complete command reference
|
||||
- [Configuration Reference](configuration.md) - Configuration file reference
|
||||
- [README](../README.md) - Quickstart and key concepts
|
||||
- [Comparison Modes](comparison-modes.md) - Deep dive on verification logic
|
||||
- [Scale-Adaptive Thresholds](scale-adaptive-thresholds.md) - Community corpus thresholds
|
||||
@ -27,6 +27,7 @@ Quick-start guides and workflows for Aphoria users.
|
||||
|-------|-------------|
|
||||
| [Golden Path Loop](./golden-path-loop.md) | Continuous policy improvement |
|
||||
| [AAA Game Development](./aaa-game-development.md) | Unreal Engine patterns |
|
||||
| [LLM Wiki Extraction](./llm-wiki-extraction.md) | Extract claims from technical docs using LLM skill |
|
||||
|
||||
## Reference Documentation
|
||||
|
||||
|
||||
483
applications/aphoria/docs/guides/llm-wiki-extraction.md
Normal file
@ -0,0 +1,483 @@
|
||||
# LLM-Based Wiki Corpus Extraction
|
||||
|
||||
Extract factual claims from technical documentation using an LLM skill that intelligently chunks, analyzes, and persists to the corpus database.
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# Extract claims from a wiki article
|
||||
cd ~/Workspace/stemedb
|
||||
claude -p ~/path/to/wiki/article.md --skill extract-wiki-corpus
|
||||
|
||||
# Example with actual file
|
||||
claude -p ~/Workspace/orchard9/wiki/intakes/REQUEST_FOR_RESEARCH_ANSWERS.md \
|
||||
--skill extract-wiki-corpus
|
||||
```
|
||||
|
||||
Expected output:
|
||||
```
|
||||
Reading article: REQUEST_FOR_RESEARCH_ANSWERS.md (12,450 tokens)
|
||||
Chunked into 3 segments (by ## headings)
|
||||
|
||||
Chunk 1/3: "Critical Compatibility Solutions"
|
||||
Extracted 8 claims
|
||||
✓ ml/basicsr/torchvision/incompatible_with = ">=0.15"
|
||||
✓ ml/gpen/gfpgan/outperforms = "eye_enhancement"
|
||||
...
|
||||
|
||||
Chunk 2/3: "CUDA 12.9 Compatibility"
|
||||
Extracted 5 claims
|
||||
...
|
||||
|
||||
Summary: 23 claims extracted, 23 stored successfully
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
### 1. Intelligent Chunking
|
||||
|
||||
The skill chunks large articles to fit LLM context limits:
|
||||
|
||||
**Strategy:**
|
||||
- Target: ~4K tokens per chunk
|
||||
- Break at `##` headings when possible
|
||||
- Preserve context: Include document title + section path in each chunk
|
||||
|
||||
**Example:**
|
||||
```markdown
|
||||
# Python Dependency Stack
|
||||
## Critical Solutions
|
||||
### BasicSR Fix
|
||||
[content...]
|
||||
```
|
||||
|
||||
Becomes 3 chunks:
|
||||
1. `"Python Dependency Stack / Critical Solutions / BasicSR Fix"` + content
|
||||
2. `"Python Dependency Stack / Critical Solutions / GPEN vs GFPGAN"` + content
|
||||
3. `"Python Dependency Stack / CUDA Compatibility"` + content
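
The heading-based part of this strategy can be sketched as follows. This is an illustration only: the `Chunk` type and `chunk_by_headings` function are hypothetical, the token budget is not enforced, and `#` lines inside fenced code blocks are not treated specially.

```rust
/// One chunk: the heading path plus the text under that heading.
#[derive(Debug, PartialEq)]
struct Chunk {
    path: String,
    body: String,
}

/// Emit the accumulated body as a chunk (if non-empty) and reset it.
fn flush(stack: &[(usize, String)], body: &mut String, chunks: &mut Vec<Chunk>) {
    if !body.trim().is_empty() {
        let path = stack
            .iter()
            .map(|(_, title)| title.as_str())
            .collect::<Vec<_>>()
            .join(" / ");
        chunks.push(Chunk { path, body: body.trim().to_string() });
    }
    body.clear();
}

/// Split markdown at headings, keeping "Title / Section / Subsection" context.
fn chunk_by_headings(markdown: &str) -> Vec<Chunk> {
    let mut stack: Vec<(usize, String)> = Vec::new();
    let mut chunks = Vec::new();
    let mut body = String::new();
    for line in markdown.lines() {
        let level = line.chars().take_while(|c| *c == '#').count();
        if (1..=6).contains(&level) && line[level..].starts_with(' ') {
            flush(&stack, &mut body, &mut chunks);
            // Drop headings at the same or deeper level before pushing the new one.
            while stack.last().map_or(false, |(l, _)| *l >= level) {
                stack.pop();
            }
            stack.push((level, line[level..].trim().to_string()));
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    flush(&stack, &mut body, &mut chunks);
    chunks
}

fn main() {
    let doc = "# Python Dependency Stack\n## Critical Solutions\n### BasicSR Fix\n[content...]\n";
    for chunk in chunk_by_headings(doc) {
        println!("{} -> {}", chunk.path, chunk.body);
    }
}
```

A production chunker would additionally merge or split these sections to stay near the ~4K token target.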
|
||||
|
||||
### 2. LLM Claim Extraction
|
||||
|
||||
For each chunk, Claude extracts factual assertions as structured JSON:
|
||||
|
||||
**Extraction Criteria:**
|
||||
- Factual (verifiable from text)
|
||||
- Useful for developers
|
||||
- Has clear subject/predicate/value
|
||||
|
||||
**Example extraction:**
|
||||
|
||||
Input text:
|
||||
```markdown
|
||||
### BasicSR/Torchvision Fix
|
||||
The core issue is that basicsr 1.4.2 imports from
|
||||
`torchvision.transforms.functional_tensor` which was removed in
|
||||
torchvision 0.15+.
|
||||
|
||||
**Primary Solution:**
|
||||
git+https://github.com/XPixelGroup/BasicSR@8d56e3a
|
||||
```
|
||||
|
||||
Extracted claim:
|
||||
```json
|
||||
{
|
||||
"subject": "ml/dependencies/basicsr/torchvision",
|
||||
"predicate": "incompatible_with",
|
||||
"value": ">=0.15",
|
||||
"explanation": "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in torchvision 0.15+",
|
||||
"authority": "XPixelGroup/BasicSR@8d56e3a",
|
||||
"category": "compatibility"
|
||||
}
|
||||
```
|
||||
|
||||
### 3. Authority Inference
|
||||
|
||||
The LLM infers authority sources from context:
|
||||
|
||||
| Pattern | Authority Format | Example |
|---------|-----------------|---------|
| GitHub URL | `repo@commit` | `XPixelGroup/BasicSR@8d56e3a` |
| Research paper | `Author et al. (Year)` | `Smith et al. (2023)` |
| Official docs | `Product Documentation` | `PyTorch Documentation` |
| Empirical | `Community consensus` | `Community best practice` |
|
||||
|
||||
### 4. Tier Assignment
|
||||
|
||||
The skill assigns tiers based on authority source:
|
||||
|
||||
| Tier | Authority Type | Examples |
|------|---------------|----------|
| 0 | Regulatory specs | RFC, W3C standards |
| 1 | Authoritative sources | Official docs, research papers |
| 2 | Observational | GitHub repos, community consensus |
| 3 | Empirical | Unverified claims |
|
||||
|
||||
**Guidance to LLM:**
|
||||
- Official standards (RFC, W3C) → Tier 0
|
||||
- Official documentation, published research → Tier 1
|
||||
- GitHub repos, maintainer statements → Tier 2
|
||||
- Community reports, unverified → Tier 3
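
The guidance above is a fixed mapping from authority kind to tier. A small sketch of that mapping, assuming a hypothetical `AuthorityKind` enum (in the skill itself the classification is done by the LLM prompt, not by code like this):

```rust
/// Authority kinds named in the guidance list (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum AuthorityKind {
    OfficialStandard,    // RFC, W3C
    OfficialDocs,        // official documentation
    PublishedResearch,   // published research
    GitHubRepo,          // GitHub repositories
    MaintainerStatement, // maintainer statements
    CommunityReport,     // community reports
    Unverified,          // anything without a verifiable source
}

/// Map an authority kind to its tier (0 = strongest, 3 = weakest).
fn tier_for(kind: AuthorityKind) -> u8 {
    match kind {
        AuthorityKind::OfficialStandard => 0,
        AuthorityKind::OfficialDocs | AuthorityKind::PublishedResearch => 1,
        AuthorityKind::GitHubRepo | AuthorityKind::MaintainerStatement => 2,
        AuthorityKind::CommunityReport | AuthorityKind::Unverified => 3,
    }
}

fn main() {
    println!("{:?} -> tier {}", AuthorityKind::GitHubRepo, tier_for(AuthorityKind::GitHubRepo));
}
```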
|
||||
|
||||
### 5. Persistence via CLI
|
||||
|
||||
Each extracted claim is stored using:
|
||||
|
||||
```bash
|
||||
aphoria corpus create \
|
||||
--subject "ml/dependencies/basicsr/torchvision" \
|
||||
--predicate "incompatible_with" \
|
||||
--value ">=0.15" \
|
||||
--explanation "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in 0.15+" \
|
||||
--authority "XPixelGroup/BasicSR@8d56e3a" \
|
||||
--category "compatibility" \
|
||||
--tier 2
|
||||
```
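
When the command is driven from a program rather than typed by hand, the same invocation can be built as an argument list, which avoids shell quoting problems with values such as `>=0.15`. The sketch below uses only the flags documented in this guide; the `Claim` struct and `corpus_create_args` helper are hypothetical, and the command is printed rather than executed.

```rust
use std::process::Command;

/// Extracted claim fields, mirroring the JSON shown earlier (hypothetical type).
struct Claim<'a> {
    subject: &'a str,
    predicate: &'a str,
    value: &'a str,
    explanation: &'a str,
    authority: &'a str,
    category: &'a str,
    tier: u8,
}

/// Build the argument list for `aphoria corpus create`.
fn corpus_create_args(claim: &Claim) -> Vec<String> {
    let tier = claim.tier.to_string();
    let pairs = [
        ("--subject", claim.subject),
        ("--predicate", claim.predicate),
        ("--value", claim.value),
        ("--explanation", claim.explanation),
        ("--authority", claim.authority),
        ("--category", claim.category),
        ("--tier", tier.as_str()),
    ];
    let mut args = vec!["corpus".to_string(), "create".to_string()];
    for (flag, value) in pairs {
        args.push(flag.to_string());
        args.push(value.to_string());
    }
    args
}

fn main() {
    let claim = Claim {
        subject: "ml/dependencies/basicsr/torchvision",
        predicate: "incompatible_with",
        value: ">=0.15",
        explanation: "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in 0.15+",
        authority: "XPixelGroup/BasicSR@8d56e3a",
        category: "compatibility",
        tier: 2,
    };
    let mut cmd = Command::new("aphoria");
    cmd.args(corpus_create_args(&claim));
    // Print only; call `cmd.status()` to actually run the CLI.
    println!("{:?}", cmd);
}
```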
|
||||
|
||||
## CLI Reference: `aphoria corpus create`
|
||||
|
||||
Create a corpus assertion from structured claim data.
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
aphoria corpus create \
|
||||
--subject <hierarchical/path> \
|
||||
--predicate <relationship> \
|
||||
--value <value> \
|
||||
--explanation <full-context> \
|
||||
--authority <source> \
|
||||
--category <category> \
|
||||
--tier <0-3>
|
||||
```
|
||||
|
||||
**Arguments:**
|
||||
|
||||
| Flag | Required | Description | Example |
|------|----------|-------------|---------|
| `--subject` | Yes | Hierarchical path to concept | `ml/basicsr/torchvision` |
| `--predicate` | Yes | Relationship type | `incompatible_with` |
| `--value` | Yes | Value or constraint | `">=0.15"` |
| `--explanation` | Yes | Full context sentence | `"basicsr 1.4.2 imports from..."` |
| `--authority` | Yes | Source citation | `XPixelGroup/BasicSR@8d56e3a` |
| `--category` | Yes | Category tag | `compatibility` |
| `--tier` | Yes | Authority tier (0-3) | `2` |
|
||||
|
||||
**Categories:**
|
||||
- `compatibility` - Dependency constraints, version requirements
|
||||
- `performance` - Performance characteristics, benchmarks
|
||||
- `security` - Security properties, vulnerabilities
|
||||
- `architecture` - Design patterns, structure
|
||||
- `behavior` - Functional behavior, side effects
|
||||
|
||||
**Behavior:**
|
||||
|
||||
**Deduplication:** None. Every claim is stored, even when the same subject+predicate already exists. The store is append-only; keeping differing claims side by side, each with its source, is the whole point of Episteme.
|
||||
|
||||
**Error Handling:** Bundles all validation errors and presents them together:
|
||||
|
||||
```
|
||||
Error creating corpus assertion:
|
||||
|
||||
Validation errors:
|
||||
1. --subject: Must be non-empty hierarchical path (got: "")
|
||||
2. --tier: Must be 0-3 (got: 5)
|
||||
3. --category: Must be one of: compatibility, performance, security, architecture, behavior (got: "random")
|
||||
|
||||
Fix all errors and retry.
|
||||
```
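
The "collect everything, then report" behavior shown above can be sketched like this. The messages mirror the sample output; the `validate` function and its signature are assumptions for illustration, not the CLI's actual code.

```rust
/// Categories accepted by `--category`, as listed in this guide.
const CATEGORIES: [&str; 5] = [
    "compatibility",
    "performance",
    "security",
    "architecture",
    "behavior",
];

/// Check all fields and return every problem at once instead of stopping
/// at the first failure.
fn validate(subject: &str, tier: u8, category: &str) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    if subject.trim().is_empty() {
        errors.push(format!(
            "--subject: Must be non-empty hierarchical path (got: {subject:?})"
        ));
    }
    if tier > 3 {
        errors.push(format!("--tier: Must be 0-3 (got: {tier})"));
    }
    if !CATEGORIES.contains(&category) {
        errors.push(format!(
            "--category: Must be one of: {} (got: {category:?})",
            CATEGORIES.join(", ")
        ));
    }
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}

fn main() {
    if let Err(errors) = validate("", 5, "random") {
        println!("Validation errors:");
        for (i, error) in errors.iter().enumerate() {
            println!("  {}. {}", i + 1, error);
        }
    }
}
```

Reporting all errors together matters for LLM-driven extraction: a single retry can fix every field instead of discovering problems one at a time.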
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
$ aphoria corpus create \
|
||||
--subject "ml/pytorch/version" \
|
||||
--predicate "requires" \
|
||||
--value ">=2.0" \
|
||||
--explanation "Uses torch.compile which requires PyTorch 2.0+" \
|
||||
--authority "PyTorch 2.0 Release Notes" \
|
||||
--category "compatibility" \
|
||||
--tier 1
|
||||
|
||||
✓ Created corpus assertion: ml/pytorch/version
|
||||
Stored in: ~/.aphoria/corpus-db
|
||||
```
|
||||
|
||||
## Skill Output Format
|
||||
|
||||
The `extract-wiki-corpus` skill produces structured output:
|
||||
|
||||
```
|
||||
Reading article: REQUEST_FOR_RESEARCH_ANSWERS.md (12,450 tokens)
|
||||
Chunked into 3 segments (by ## headings)
|
||||
|
||||
Chunk 1/3: "Critical Compatibility Solutions"
|
||||
Extracted 8 claims
|
||||
|
||||
1. ml/dependencies/basicsr/torchvision
|
||||
incompatible_with = ">=0.15"
|
||||
Authority: XPixelGroup/BasicSR@8d56e3a
|
||||
✓ Stored
|
||||
|
||||
2. ml/enhancements/gpen/gfpgan
|
||||
outperforms = "eye_enhancement"
|
||||
Authority: Research comparison (2023)
|
||||
✓ Stored
|
||||
|
||||
[... 6 more claims ...]
|
||||
|
||||
Chunk 2/3: "CUDA 12.9 Compatibility"
|
||||
Extracted 5 claims
|
||||
|
||||
9. ml/face_detection/mediapipe/dlib
|
||||
preferred_over = "CUDA 12 support"
|
||||
Authority: Community consensus
|
||||
✓ Stored
|
||||
|
||||
[... 4 more claims ...]
|
||||
|
||||
Chunk 3/3: "Optimized Requirements"
|
||||
Extracted 10 claims
|
||||
|
||||
[... all claims ...]
|
||||
|
||||
Summary:
|
||||
Total claims: 23
|
||||
Successfully stored: 23
|
||||
Failed: 0
|
||||
|
||||
Corpus database: ~/.aphoria/corpus-db
|
||||
Query: curl 'http://localhost:18180/v1/aphoria/corpus?category=compatibility'
|
||||
```
|
||||
|
||||
**If errors occur:**
|
||||
```
|
||||
Summary:
|
||||
Total claims: 23
|
||||
Successfully stored: 18
|
||||
Failed: 5
|
||||
|
||||
Errors:
|
||||
1. Claim #7 (ml/torch/cuda/version)
|
||||
- --tier: Must be 0-3 (got: 5)
|
||||
- Fix: LLM assigned invalid tier
|
||||
|
||||
2. Claim #12 (ml/xformers/optional)
|
||||
- --subject: Empty subject path
|
||||
- Fix: LLM extraction failed
|
||||
|
||||
[... 3 more errors with details ...]
|
||||
|
||||
Fix these issues and re-run extraction.
|
||||
```
|
||||
|
||||
## Verification
|
||||
|
||||
After extraction, verify claims appear in the corpus:
|
||||
|
||||
```bash
|
||||
# Query all compatibility claims
|
||||
curl -s 'http://localhost:18180/v1/aphoria/corpus?category=compatibility' | jq '.total_matching'
|
||||
# Expected: 23 (or however many were extracted)
|
||||
|
||||
# Query specific subject
|
||||
curl -s 'http://localhost:18180/v1/aphoria/corpus' | \
|
||||
jq '.items[] | select(.subject | contains("basicsr"))'
|
||||
|
||||
# Expected output:
|
||||
{
|
||||
"subject": "ml/dependencies/basicsr/torchvision",
|
||||
"predicate": "incompatible_with",
|
||||
"value": ">=0.15",
|
||||
"source": "ml://",
|
||||
"tier": 2,
|
||||
"category": "compatibility",
|
||||
"explanation": "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in 0.15+",
|
||||
"authority_source": "XPixelGroup/BasicSR@8d56e3a"
|
||||
}
|
||||
```
|
||||
|
||||
## Dashboard View
|
||||
|
||||
Extracted claims appear in the Aphoria dashboard at `/corpus`:
|
||||
|
||||
**Filters:**
|
||||
- By category: compatibility, performance, security, architecture, behavior
|
||||
- By tier: 0 (Regulatory), 1 (Authoritative), 2 (Observational), 3 (Empirical)
|
||||
- By source: ml://, security://, etc.
|
||||
|
||||
**Display:**
|
||||
- Subject path as breadcrumbs: `ml > dependencies > basicsr > torchvision`
|
||||
- Tier badge with color coding
|
||||
- Full explanation text
|
||||
- Authority citation as link (if URL)
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Problem:** Skill chunks too aggressively, loses context
|
||||
|
||||
**Solution:** Adjust chunk size in skill configuration (target 4K tokens, can go up to 8K for complex articles)
|
||||
|
||||
---
|
||||
|
||||
**Problem:** LLM assigns wrong tiers
|
||||
|
||||
**Solution:** Refine tier guidance in skill prompt:
|
||||
- Official standards (RFC, IEEE) → Tier 0
|
||||
- Official docs, peer-reviewed papers → Tier 1
|
||||
- GitHub repos, maintainer statements → Tier 2
|
||||
- Blog posts, community forums → Tier 3
|
||||
|
||||
---
|
||||
|
||||
**Problem:** Too many failed claims (validation errors)
|
||||
|
||||
**Solution:** Check common error patterns:
|
||||
```bash
|
||||
# Review failed claims
|
||||
grep "Failed:" /tmp/extraction-output.log
|
||||
|
||||
# Common issues:
|
||||
# 1. Empty subjects - LLM extraction failed
|
||||
# 2. Invalid tiers - LLM assigned tier > 3
|
||||
# 3. Missing required fields - Incomplete extraction
|
||||
```
|
||||
|
||||
Fix by refining LLM extraction prompt.
|
||||
|
||||
---
|
||||
|
||||
**Problem:** Duplicate claims (same subject+predicate)
|
||||
|
||||
**This is expected behavior.** Episteme stores ALL claims, even duplicates from different sources. This enables:
|
||||
- Sourced differing opinions (PyTorch docs say X, community says Y)
|
||||
- Conflict detection (authority says A, codebase does B)
|
||||
- Historical tracking (claim evolved over time)
|
||||
|
||||
To query all claims for a subject:
|
||||
```bash
|
||||
curl -s 'http://localhost:18180/v1/aphoria/corpus' | \
|
||||
jq '.items[] | select(.subject == "ml/dependencies/basicsr/torchvision")'
|
||||
```
|
||||
|
||||
## Integration with Other Features
|
||||
|
||||
**With Scans:**
|
||||
- Corpus claims act as authority sources
|
||||
- Aphoria compares scanned observations against corpus
|
||||
- Conflicts trigger violations
|
||||
|
||||
**With Claims Management:**
|
||||
- Can supersede corpus claims: `aphoria claims supersede <id>`
|
||||
- Can deprecate outdated corpus: `aphoria claims deprecate <id>`
|
||||
- Corpus claims have same structure as project claims
|
||||
|
||||
**With Dashboard:**
|
||||
- All corpus claims visible at `/corpus`
|
||||
- Filterable by category, tier, source
|
||||
- Click through to see full explanation
|
||||
|
||||
## Best Practices
|
||||
|
||||
**DO:**
|
||||
- Extract from authoritative sources (official docs, research)
|
||||
- Verify claims appear in dashboard after extraction
|
||||
- Review tier assignments for accuracy
|
||||
- Include full context in explanations
|
||||
|
||||
**DON'T:**
|
||||
- Extract from opinion pieces or blogs (if you must, assign tier 3)
|
||||
- Skip authority citations (always provide source)
|
||||
- Use vague subjects (write "ml/pytorch/feature/specific", not "thing")
|
||||
- Ignore validation errors (fix all before considering extraction complete)
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: ML Dependencies
|
||||
|
||||
**Input:** `~/wiki/ml-stack.md`
|
||||
```markdown
|
||||
## PyTorch CUDA Compatibility
|
||||
|
||||
PyTorch 2.6.0 with CUDA 12.6 builds are forward compatible with CUDA 12.9.
|
||||
|
||||
Source: PyTorch 2.6 Release Notes
|
||||
```
|
||||
|
||||
**Extraction:**
|
||||
```bash
|
||||
claude -p ~/wiki/ml-stack.md --skill extract-wiki-corpus
|
||||
|
||||
# Output:
|
||||
Extracted 1 claim:
|
||||
✓ ml/pytorch/cuda/compatibility
|
||||
predicate: forward_compatible_with
|
||||
value: "CUDA 12.9"
|
||||
tier: 1 (PyTorch 2.6 Release Notes)
|
||||
```
|
||||
|
||||
### Example 2: Security Best Practices
|
||||
|
||||
**Input:** `~/wiki/security.md`
|
||||
```markdown
|
||||
## Password Hashing
|
||||
|
||||
Research shows Argon2 consistently outperforms bcrypt and scrypt for
|
||||
password hashing in modern environments.
|
||||
|
||||
Source: OWASP Password Storage Cheat Sheet (2023)
|
||||
```
|
||||
|
||||
**Extraction:**
|
||||
```bash
|
||||
claude -p ~/wiki/security.md --skill extract-wiki-corpus
|
||||
|
||||
# Output:
|
||||
Extracted 1 claim:
|
||||
✓ security/password/hashing/algorithm
|
||||
predicate: recommended
|
||||
value: "Argon2"
|
||||
tier: 1 (OWASP Password Storage Cheat Sheet)
|
||||
```
|
||||
|
||||
### Example 3: Large Article
|
||||
|
||||
**Input:** `~/wiki/complete-stack.md` (15,000 tokens)
|
||||
```markdown
|
||||
# Complete Python Stack for SDXL
|
||||
|
||||
## Critical Solutions
|
||||
[4,000 tokens]
|
||||
|
||||
## Enhancement Libraries
|
||||
[5,000 tokens]
|
||||
|
||||
## CUDA Compatibility
|
||||
[6,000 tokens]
|
||||
```
|
||||
|
||||
**Extraction:**
|
||||
```bash
|
||||
claude -p ~/wiki/complete-stack.md --skill extract-wiki-corpus
|
||||
|
||||
# Output:
|
||||
Reading article: complete-stack.md (15,234 tokens)
|
||||
Chunked into 3 segments (by ## headings)
|
||||
|
||||
Chunk 1/3: "Critical Solutions"
|
||||
Extracted 12 claims
|
||||
...
|
||||
|
||||
Chunk 2/3: "Enhancement Libraries"
|
||||
Extracted 8 claims
|
||||
...
|
||||
|
||||
Chunk 3/3: "CUDA Compatibility"
|
||||
Extracted 7 claims
|
||||
...
|
||||
|
||||
Summary: 27 claims extracted, 27 stored successfully
|
||||
```
|
||||
|
||||
## See Also
|
||||
|
||||
- [CLI Reference](../cli-reference.md) - All `aphoria corpus` commands
|
||||
- [Corpus API](../api-reference.md) - Query corpus programmatically
|
||||
- [Claims vs Observations](../../README.md#claims-vs-observations) - Key concepts
|
||||
@ -42,7 +42,9 @@ Ingested 1,240 authoritative assertions.
|
||||
Ready.
|
||||
```
|
||||
|
||||
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your local database (`~/.aphoria/db`).
|
||||
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your project database (`.aphoria/db`).
|
||||
|
||||
> **Note:** By default, each project has its own isolated database. To share a database across all projects on your machine, set `data_dir = "~/.aphoria/db"` in `aphoria.toml`.
|
||||
|
||||
## 3. The First Scan
|
||||
|
||||
|
||||
181
applications/aphoria/docs/scale-adaptive-thresholds.md
Normal file
@ -0,0 +1,181 @@
|
||||
# Scale-Adaptive Promotion Thresholds
|
||||
|
||||
## Overview
|
||||
|
||||
Scale-adaptive thresholds automatically adjust promotion criteria based on organization size, enabling small teams to see value immediately while maintaining quality gates for larger organizations.
|
||||
|
||||
## The Problem
|
||||
|
||||
**Before adaptive thresholds:**
|
||||
- Hardcoded minimums: 850/100/50 projects for regulatory/clinical/emerging
|
||||
- Small teams (2-5 projects) → **0 patterns promoted** → empty dashboard
|
||||
- No immediate value demonstration → adoption killed before flywheel starts
|
||||
|
||||
**Root cause:**
|
||||
- Thresholds designed for enterprise scale (850 projects for regulatory)
|
||||
- Small teams locked out: can't meet 50-project minimum for emerging tier
|
||||
- Dashboard queries promoted patterns only (no visibility into raw aggregates)
|
||||
|
||||
## The Solution
|
||||
|
||||
### Adaptive Formula
|
||||
|
||||
```rust
|
||||
effective_min_projects = max(
|
||||
absolute_floor, // Safety: prevent single-project noise
|
||||
(percentage * total_projects).ceil() // Scale: grow with team
|
||||
)
|
||||
```
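
As a runnable sketch, the formula can be written as a free function. The parameter names follow the formula above; the standalone function and the sample floors and percentages in `main` (taken from the emerging-tier examples in this document) are for illustration and are not the crate's `AdaptiveCriteria` code.

```rust
/// Effective minimum project count: the larger of the absolute floor and the
/// percentage of all projects, rounded up.
fn effective_min_projects(absolute_floor: u64, percentage: f64, total_projects: u64) -> u64 {
    let scaled = (percentage * total_projects as f64).ceil() as u64;
    absolute_floor.max(scaled)
}

fn main() {
    // (label, floor, percentage, total projects)
    let samples = [
        ("Micro", 2, 0.50, 3),
        ("Small", 2, 0.40, 10),
        ("Medium", 5, 0.40, 50),
        ("Enterprise", 25, 0.50, 1000),
    ];
    for (label, floor, pct, total) in samples {
        println!(
            "{label}: min {} of {total} projects",
            effective_min_projects(floor, pct, total)
        );
    }
}
```

For very small teams the floor dominates (2 of 3 projects); as the project count grows, the percentage term takes over.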
|
||||
|
||||
### Scale Tiers (Auto-Detected)
|
||||
|
||||
| Tier | Project Range | Behavior |
|------|--------------|----------|
| **Micro** | 1-5 | Only emerging tier, floor=2, rate=50% |
| **Small** | 6-25 | All tiers enabled, lower floors |
| **Medium** | 26-100 | Balanced thresholds |
| **Large** | 101-500 | Higher quality gates |
| **Enterprise** | 501+ | Current defaults (backward compatible) |
|
||||
|
||||
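Tier detection from the table above comes down to a range match on the total project count. The sketch below reuses the `ScaleTier` and `from_total_projects` names from this document, but the body is an illustration written from the table, not the crate's source; treating a count of 0 as Micro is an assumption.

```rust
/// Scale tiers from the table above (illustrative re-declaration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScaleTier {
    Micro,
    Small,
    Medium,
    Large,
    Enterprise,
}

impl ScaleTier {
    /// Pick the tier whose project range contains `total`.
    fn from_total_projects(total: u64) -> Self {
        match total {
            0..=5 => Self::Micro, // 0 is treated as Micro (assumption)
            6..=25 => Self::Small,
            26..=100 => Self::Medium,
            101..=500 => Self::Large,
            _ => Self::Enterprise,
        }
    }
}

fn main() {
    for total in [3, 10, 50, 200, 1000] {
        println!("{total} projects -> {:?}", ScaleTier::from_total_projects(total));
    }
}
```
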
### Example: Emerging Tier Scaling
|
||||
|
||||
| Team Size | Projects | Formula | Min Projects | Adoption Required |
|-----------|----------|---------|--------------|-------------------|
| Micro | 3 | `max(2, 0.50*3)` | **2** | 2/3 projects (67%) |
| Small | 10 | `max(2, 0.40*10)` | **4** | 4/10 projects (40%) |
| Medium | 50 | `max(5, 0.40*50)` | **20** | 20/50 projects (40%) |
| Enterprise | 1000 | `max(25, 0.50*1000)` | **500** | 500/1000 projects (50%) |
|
||||
|
||||
## Quality Maintained
|
||||
|
||||
✅ **Floor prevents noise:** Single-project patterns blocked
|
||||
✅ **Adoption rate required:** Community consensus still matters
|
||||
✅ **Authority matching enforced:** Regulatory/clinical tiers need RFC/OWASP match
|
||||
✅ **Manual review:** Emerging tier still requires review (auto_promote=false)
|
||||
✅ **Backward compatible:** Enterprise behavior unchanged
|
||||
|
||||
## Configuration
|
||||
|
||||
### Default (Adaptive)
|
||||
|
||||
```toml
|
||||
# .aphoria/config.toml
|
||||
[corpus]
|
||||
use_community = true
|
||||
aggregation_enabled = true
|
||||
# adaptive_thresholds = <optional custom thresholds>
|
||||
use_legacy_thresholds = false # Default: use adaptive
|
||||
```
|
||||
|
||||
### Legacy Mode (Static Thresholds)
|
||||
|
||||
```toml
|
||||
[corpus]
|
||||
use_legacy_thresholds = true # Use fixed 850/100/50
|
||||
```
|
||||
|
||||
### Custom Thresholds
|
||||
|
||||
```toml
|
||||
[corpus.adaptive_thresholds.micro.emerging]
|
||||
min_projects_floor = 1 # Override: allow 1 project (risky!)
|
||||
min_projects_percentage = 0.40
|
||||
min_adoption_rate = 0.40
|
||||
```
|
||||
|
||||
## Implementation
|
||||
|
||||
### Core Components
|
||||
|
||||
1. **`ScaleTier`** (`corpus/thresholds.rs`):
|
||||
- `from_total_projects(u64) -> ScaleTier`
|
||||
- Auto-detects tier from project count
|
||||
|
||||
2. **`AdaptiveCriteria`** (`corpus/thresholds.rs`):
|
||||
- `effective_min_projects(total_projects) -> u64`
|
||||
- Applies `max(floor, percentage * total)` formula
|
||||
|
||||
3. **`ScaleAdaptiveThresholds`** (`corpus/thresholds.rs`):
|
||||
- `evaluate(project_count, total_projects, ...) -> PromotionDecision`
|
||||
- Returns `AutoPromote(tier)`, `RequireReview`, or `Skip`
|
||||
|
||||
4. **`CommunityCorpusBuilder`** (`corpus/community.rs`):
|
||||
- Updated to use adaptive thresholds when `use_adaptive=true`
|
||||
- Falls back to legacy thresholds when `use_legacy_thresholds=true`
|
||||
- Logs scale tier and threshold mode on build
|
||||
|
||||
### Configuration Fields
|
||||
|
||||
**`CorpusConfig`** (`config/types/scan.rs`):
|
||||
- `adaptive_thresholds: Option<ScaleAdaptiveThresholds>` - Custom thresholds
|
||||
- `use_legacy_thresholds: bool` - Backward compatibility flag (default: false)
|
||||
|
||||
## Usage
|
||||
|
||||
### Micro Team Example (3 projects)
|
||||
|
||||
```bash
|
||||
# Scan 3 projects
|
||||
cd project1 && aphoria scan --persist --sync
|
||||
cd project2 && aphoria scan --persist --sync
|
||||
cd project3 && aphoria scan --persist --sync
|
||||
|
||||
# Check logs
|
||||
# Should see:
|
||||
# scale_tier=Micro, use_adaptive=true
|
||||
# Pattern promoted: 2/3 projects (67%) → RequireReview
|
||||
```
|
||||
|
||||
### Query Patterns
|
||||
|
||||
```bash
|
||||
# API: Patterns with min 1 project (shows all for micro teams)
|
||||
curl 'http://localhost:18180/api/patterns?min_projects=1&limit=10'
|
||||
|
||||
# Dashboard will show:
|
||||
# - Scale tier: "Micro (3 projects)"
|
||||
# - Promoted patterns visible
|
||||
# - Thresholds: "Emerging: 2/3 projects (67%)"
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
### Unit Tests
|
||||
|
||||
- `test_scale_tier_detection()` - Verify tier boundaries
|
||||
- `test_effective_min_projects()` - Floor vs percentage dominance
|
||||
- `test_micro_team_promotion()` - 2/3 projects promoted
|
||||
- `test_regulatory_disabled_for_micro()` - Tier disabling works
|
||||
- `test_enterprise_backward_compatible()` - Same as legacy
|
||||
|
||||
### Integration Tests
|
||||
|
||||
- `scale_adaptive_test.rs` - 7 tests covering all scenarios
|
||||
- All 1199 library tests pass
|
||||
|
||||
## Migration
|
||||
|
||||
**Existing deployments:** No action required
|
||||
- Adaptive thresholds default to enabled
|
||||
- Enterprise behavior unchanged (501+ projects)
|
||||
- Legacy mode available if needed
|
||||
|
||||
**New deployments:** Immediate value
|
||||
- Small teams see patterns after 2-3 scans
|
||||
- Quality maintained via floors and adoption rates
|
||||
- Natural growth path as team scales
|
||||
|
||||
## Philosophy
|
||||
|
||||
**Start simple, scale naturally:**
|
||||
- Small teams see value immediately (2-3 projects → patterns visible)
|
||||
- Quality maintained via floors (no single-project noise)
|
||||
- Adoption rate still matters (community consensus)
|
||||
- Enterprise behavior unchanged (backward compatible)
|
||||
- Configuration optional (defaults work for 95%)
|
||||
|
||||
**This unlocks the flywheel:**
|
||||
- Small teams adopt → see patterns → gain trust
|
||||
- Teams grow → thresholds tighten → quality improves
|
||||
- Cross-team patterns emerge → community corpus strengthens
|
||||
- No manual threshold tuning required
|
||||
88
applications/aphoria/examples/scale_adaptive_demo.rs
Normal file
@ -0,0 +1,88 @@
|
||||
//! Demonstrates scale-adaptive promotion thresholds.
|
||||
//!
|
||||
//! Run with: `cargo run --example scale_adaptive_demo`
|
||||
|
||||
use aphoria::corpus::thresholds::{ScaleAdaptiveThresholds, ScaleTier};
|
||||
|
||||
fn main() {
|
||||
println!("=== Scale-Adaptive Promotion Thresholds Demo ===\n");
|
||||
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// Scenario 1: Micro Team (3 projects)
|
||||
println!("📊 Scenario 1: Micro Team (3 projects)");
|
||||
println!("Pattern appears in 2 out of 3 projects (67% adoption)\n");
|
||||
|
||||
let tier = ScaleTier::from_total_projects(3);
|
||||
println!(" Scale Tier: {:?}", tier);
|
||||
|
||||
let decision = thresholds.evaluate(2, 3, false, None);
|
||||
println!(" Decision: {:?}", decision);
|
||||
println!(" ✅ Pattern VISIBLE to team (RequireReview)\n");
|
||||
|
||||
// Scenario 2: Small Team with RFC match
|
||||
println!("📊 Scenario 2: Small Team (10 projects)");
|
||||
println!("Pattern appears in 9 projects with RFC match (90% adoption)\n");
|
||||
|
||||
let tier = ScaleTier::from_total_projects(10);
|
||||
println!(" Scale Tier: {:?}", tier);
|
||||
|
||||
let decision = thresholds.evaluate(9, 10, true, Some("rfc://5246"));
|
||||
println!(" Decision: {:?}", decision);
|
||||
println!(" ✅ Auto-promoted to Regulatory tier\n");
|
||||
|
||||
// Scenario 3: Enterprise (1000 projects)
|
||||
println!("📊 Scenario 3: Enterprise (1000 projects)");
|
||||
println!("Pattern appears in 950 projects with RFC match (95% adoption)\n");
|
||||
|
||||
let tier = ScaleTier::from_total_projects(1000);
|
||||
println!(" Scale Tier: {:?}", tier);
|
||||
|
||||
let decision = thresholds.evaluate(950, 1000, true, Some("rfc://9110"));
|
||||
println!(" Decision: {:?}", decision);
|
||||
println!(" ✅ Auto-promoted to Regulatory tier (backward compatible)\n");
|
||||
|
||||
// Scenario 4: Noise prevention
|
||||
println!("📊 Scenario 4: Noise Prevention (3 projects)");
|
||||
println!("Pattern appears in only 1 project (33% adoption)\n");
|
||||
|
||||
let tier = ScaleTier::from_total_projects(3);
|
||||
println!(" Scale Tier: {:?}", tier);
|
||||
|
||||
let decision = thresholds.evaluate(1, 3, false, None);
|
||||
println!(" Decision: {:?}", decision);
|
||||
println!(" ✅ Skipped (floor prevents single-project noise)\n");
|
||||
|
||||
// Show threshold matrix
|
||||
println!("=== Threshold Matrix ===\n");
|
||||
println!("| Tier | Projects | Emerging Floor | Regulatory Floor |");
|
||||
println!("|------------|----------|----------------|------------------|");
|
||||
|
||||
for (name, total) in [
|
||||
("Micro", 3),
|
||||
("Small", 10),
|
||||
("Medium", 50),
|
||||
("Large", 200),
|
||||
("Enterprise", 1000),
|
||||
] {
|
||||
let tier = ScaleTier::from_total_projects(total);
|
||||
let tier_thresholds = thresholds.for_tier(tier);
|
||||
|
||||
let emerging_min = tier_thresholds.emerging.effective_min_projects(total);
|
||||
|
||||
let regulatory_min = if let Some(reg) = &tier_thresholds.regulatory {
|
||||
format!("{}", reg.effective_min_projects(total))
|
||||
} else {
|
||||
"N/A".to_string()
|
||||
};
|
||||
|
||||
println!(
|
||||
"| {:10} | {:8} | {:14} | {:16} |",
|
||||
name, total, emerging_min, regulatory_min
|
||||
);
|
||||
}
|
||||
|
||||
println!("\n✅ Small teams see value immediately!");
|
||||
println!("✅ Quality maintained via floors and adoption rates!");
|
||||
println!("✅ Enterprise behavior unchanged!");
|
||||
}
|
||||
@ -380,6 +380,37 @@ pub enum CorpusCommands {
|
||||
#[arg(long)]
|
||||
offline: bool,
|
||||
},
|
||||
|
||||
/// Create a new corpus item from structured data
|
||||
Create {
|
||||
/// Subject path (e.g., "ml/dependencies/basicsr/torchvision")
|
||||
#[arg(long)]
|
||||
subject: String,
|
||||
|
||||
/// Predicate (e.g., "incompatible_with", "requires", "recommends")
|
||||
#[arg(long)]
|
||||
predicate: String,
|
||||
|
||||
/// Value (string, number, or boolean)
|
||||
#[arg(long)]
|
||||
value: String,
|
||||
|
||||
/// Full explanation/context for this claim
|
||||
#[arg(long)]
|
||||
explanation: String,
|
||||
|
||||
/// Authority source (GitHub URL, paper citation, docs URL)
|
||||
#[arg(long)]
|
||||
authority: String,
|
||||
|
||||
/// Category (compatibility, performance, security, architecture)
|
||||
#[arg(long)]
|
||||
category: String,
|
||||
|
||||
/// Authority tier (0=regulatory, 1=clinical, 2=observational, 3=community)
|
||||
#[arg(long)]
|
||||
tier: u8,
|
||||
},
|
||||
}
|
||||
|
||||
#[derive(Subcommand)]
|
||||
|
||||
@ -11,7 +11,11 @@ use super::types::{
|
||||
|
||||
impl Default for EpistemeConfig {
|
||||
fn default() -> Self {
|
||||
Self { data_dir: dirs_default_data_dir(), url: None }
|
||||
Self {
|
||||
data_dir: dirs_default_data_dir(),
|
||||
corpus_data_dir: Some(dirs_default_corpus_dir()),
|
||||
url: None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -147,6 +151,8 @@ impl Default for CorpusConfig {
|
||||
use_community: true, // Enabled by default - async runtime issue resolved
|
||||
aggregation_enabled: true, // Enable observation aggregation
|
||||
rfc_list: None,
|
||||
adaptive_thresholds: None, // Use built-in defaults
|
||||
use_legacy_thresholds: false, // Use adaptive by default
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -239,11 +245,30 @@ impl Default for AutonomousConfig {
|
||||
}
|
||||
|
||||
/// Get the default Aphoria data directory.
|
||||
///
|
||||
/// **Changed in Phase 2:** Now defaults to project-local `.aphoria/db/` instead of
|
||||
/// home-based `~/.aphoria/db/`. This enables proper per-project database isolation.
|
||||
///
|
||||
/// To override for shared mode (all projects on machine), set:
|
||||
/// ```toml
|
||||
/// [episteme]
|
||||
/// data_dir = "~/.aphoria/db" # Or any absolute path
|
||||
/// ```
|
||||
fn dirs_default_data_dir() -> PathBuf {
|
||||
PathBuf::from(".aphoria/db")
|
||||
}
|
||||
|
||||
/// Get the default corpus database directory (shared across projects).
|
||||
///
|
||||
/// **New in Phase 3:** Corpus database stores aggregated pattern data from multiple
|
||||
/// projects for community corpus building. This is separate from per-project observations.
|
||||
///
|
||||
/// **Default:** `~/.aphoria/corpus-db` (home-based, shared across all projects)
|
||||
fn dirs_default_corpus_dir() -> PathBuf {
|
||||
if let Some(home) = dirs::home_dir() {
|
||||
home.join(".aphoria").join("db")
|
||||
home.join(".aphoria").join("corpus-db")
|
||||
} else {
|
||||
PathBuf::from(".aphoria/db")
|
||||
PathBuf::from(".aphoria/corpus-db")
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@ -112,9 +112,21 @@ pub struct ProjectConfig {
|
||||
#[derive(Debug, Clone, Deserialize)]
|
||||
#[serde(default)]
|
||||
pub struct EpistemeConfig {
|
||||
/// Path to local Episteme data directory.
|
||||
/// Path to local Episteme data directory (per-project observations).
|
||||
///
|
||||
/// **Default:** `.aphoria/db` (project-local)
|
||||
///
|
||||
/// For shared mode (all projects), override to `~/.aphoria/db`.
|
||||
pub data_dir: PathBuf,
|
||||
|
||||
/// Path to corpus database (shared across projects).
|
||||
///
|
||||
/// **Default:** `~/.aphoria/corpus-db` (home-based, shared)
|
||||
///
|
||||
/// This stores aggregated pattern data from multiple projects for
|
||||
/// community corpus building. Set to `None` to disable corpus aggregation.
|
||||
pub corpus_data_dir: Option<PathBuf>,
|
||||
|
||||
/// Remote Episteme URL (future feature).
|
||||
pub url: Option<String>,
|
||||
}
|
||||
|
||||
@ -4,6 +4,8 @@ use std::path::PathBuf;
|
||||
|
||||
use serde::Deserialize;
|
||||
|
||||
use crate::corpus::thresholds::ScaleAdaptiveThresholds;
|
||||
|
||||
/// Scan configuration.
|
||||
#[derive(Debug, Clone, Deserialize)]
|
||||
#[serde(default)]
|
||||
@ -68,4 +70,18 @@ pub struct CorpusConfig {
|
||||
|
||||
/// Override the default RFC list (if None, uses default list).
|
||||
pub rfc_list: Option<Vec<u32>>,
|
||||
|
||||
/// Scale-adaptive threshold configuration (if None, uses built-in defaults).
|
||||
///
|
||||
/// Allows overriding promotion thresholds per scale tier (micro/small/medium/large/enterprise).
|
||||
/// When not set, uses ScaleAdaptiveThresholds::default() which provides sensible defaults
|
||||
/// for teams of all sizes.
|
||||
pub adaptive_thresholds: Option<ScaleAdaptiveThresholds>,
|
||||
|
||||
/// Use legacy static thresholds instead of adaptive thresholds.
|
||||
///
|
||||
/// When true, ignores scale tier and uses fixed thresholds (min_projects = 850/100/50).
|
||||
/// Useful for backward compatibility or when explicit control is needed.
|
||||
/// Default: false (use adaptive thresholds).
|
||||
pub use_legacy_thresholds: bool,
|
||||
}
|
||||
|
||||
227
applications/aphoria/src/corpus/authority_parser.rs
Normal file
@ -0,0 +1,227 @@
|
||||
//! Authority source parsing for wiki patterns
|
||||
//!
|
||||
//! Parses authority strings from wiki markdown into structured Authority enums,
|
||||
//! enabling proper subject scheme generation (rfc://, owasp://, cwe://).
|
||||
|
||||
use regex::Regex;
|
||||
use std::sync::OnceLock;
|
||||
|
||||
/// Structured authority source
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub enum Authority {
|
||||
/// RFC with optional section
|
||||
RFC {
|
||||
/// RFC number
|
||||
num: u32,
|
||||
/// Optional section reference
|
||||
section: Option<String>,
|
||||
},
|
||||
/// OWASP with ID and optional year
|
||||
OWASP {
|
||||
/// OWASP identifier (e.g., "a03")
|
||||
id: String,
|
||||
/// Optional year (e.g., 2021)
|
||||
year: Option<u32>,
|
||||
},
|
||||
/// CWE (Common Weakness Enumeration)
|
||||
CWE {
|
||||
/// CWE identifier
|
||||
id: u32,
|
||||
},
|
||||
/// Unknown/unrecognized authority source
|
||||
Unknown(String),
|
||||
}
|
||||
|
||||
/// Lazy-initialized regex patterns
|
||||
static RFC_PATTERN: OnceLock<Regex> = OnceLock::new();
|
||||
static OWASP_PATTERN: OnceLock<Regex> = OnceLock::new();
|
||||
static CWE_PATTERN: OnceLock<Regex> = OnceLock::new();
|
||||
|
||||
fn rfc_pattern() -> &'static Regex {
|
||||
RFC_PATTERN.get_or_init(|| {
|
||||
// These regex patterns are simple and static - they will always compile
|
||||
Regex::new(r"(?i)rfc\s*(\d+)(?:\s+section\s+([0-9.]+))?")
|
||||
.unwrap_or_else(|_| unreachable!("RFC regex pattern is known to be valid"))
|
||||
})
|
||||
}
|
||||
|
||||
fn owasp_pattern() -> &'static Regex {
|
||||
OWASP_PATTERN.get_or_init(|| {
|
||||
// These regex patterns are simple and static - they will always compile
|
||||
Regex::new(r"(?i)owasp\s+([a-z]\d+)(?::(\d{4}))?")
|
||||
.unwrap_or_else(|_| unreachable!("OWASP regex pattern is known to be valid"))
|
||||
})
|
||||
}
|
||||
|
||||
fn cwe_pattern() -> &'static Regex {
|
||||
CWE_PATTERN.get_or_init(|| {
|
||||
// These regex patterns are simple and static - they will always compile
|
||||
Regex::new(r"(?i)cwe[-\s]*(\d+)")
|
||||
.unwrap_or_else(|_| unreachable!("CWE regex pattern is known to be valid"))
|
||||
})
|
||||
}
|
||||
|
||||
/// Parse authority string into structured Authority enum
|
||||
///
|
||||
/// # Examples
|
||||
///
|
||||
/// ```
|
||||
/// use aphoria::corpus::{parse_authority, Authority};
|
||||
///
|
||||
/// let auth = parse_authority("RFC 5246 Section 7.4.2");
|
||||
/// assert_eq!(auth, Authority::RFC { num: 5246, section: Some("7.4.2".to_string()) });
|
||||
///
|
||||
/// let auth = parse_authority("OWASP A03:2021");
|
||||
/// assert_eq!(auth, Authority::OWASP { id: "a03".to_string(), year: Some(2021) });
|
||||
///
|
||||
/// let auth = parse_authority("CWE-79");
|
||||
/// assert_eq!(auth, Authority::CWE { id: 79 });
|
||||
/// ```
|
||||
pub fn parse_authority(authority_str: &str) -> Authority {
|
||||
let trimmed = authority_str.trim();
|
||||
|
||||
// Try RFC pattern
|
||||
if let Some(caps) = rfc_pattern().captures(trimmed) {
|
||||
// Regex guarantees caps[1] is all digits, so parse will always succeed
|
||||
let num = caps[1].parse().unwrap_or_else(|_| unreachable!("regex matched \\d+"));
|
||||
let section = caps.get(2).map(|m| m.as_str().to_string());
|
||||
return Authority::RFC { num, section };
|
||||
}
|
||||
|
||||
// Try OWASP pattern
|
||||
if let Some(caps) = owasp_pattern().captures(trimmed) {
|
||||
let id = caps[1].to_lowercase();
|
||||
let year = caps.get(2).and_then(|m| m.as_str().parse().ok());
|
||||
return Authority::OWASP { id, year };
|
||||
}
|
||||
|
||||
// Try CWE pattern
|
||||
if let Some(caps) = cwe_pattern().captures(trimmed) {
|
||||
// Regex guarantees caps[1] is all digits, so parse will always succeed
|
||||
let id = caps[1].parse().unwrap_or_else(|_| unreachable!("regex matched \\d+"));
|
||||
return Authority::CWE { id };
|
||||
}
|
||||
|
||||
// Fallback to unknown
|
||||
Authority::Unknown(trimmed.to_string())
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_parse_rfc_basic() {
|
||||
let auth = parse_authority("RFC 5246");
|
||||
assert_eq!(
|
||||
auth,
|
||||
Authority::RFC {
|
||||
num: 5246,
|
||||
section: None
|
||||
}
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_rfc_with_section() {
|
||||
let auth = parse_authority("RFC 5246 Section 7.4.2");
|
||||
assert_eq!(
|
||||
auth,
|
||||
Authority::RFC {
|
||||
num: 5246,
|
||||
section: Some("7.4.2".to_string())
|
||||
}
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_rfc_lowercase() {
|
||||
let auth = parse_authority("rfc 7519");
|
||||
assert_eq!(
|
||||
auth,
|
||||
Authority::RFC {
|
||||
num: 7519,
|
||||
section: None
|
||||
}
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_rfc_no_space() {
|
||||
let auth = parse_authority("RFC7519");
|
||||
assert_eq!(
|
||||
auth,
|
||||
Authority::RFC {
|
||||
num: 7519,
|
||||
section: None
|
||||
}
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_owasp_with_year() {
|
||||
let auth = parse_authority("OWASP A03:2021");
|
||||
assert_eq!(
|
||||
auth,
|
||||
Authority::OWASP {
|
||||
id: "a03".to_string(),
|
||||
year: Some(2021)
|
||||
}
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_owasp_without_year() {
|
||||
let auth = parse_authority("OWASP A01");
|
||||
assert_eq!(
|
||||
auth,
|
||||
Authority::OWASP {
|
||||
id: "a01".to_string(),
|
||||
year: None
|
||||
}
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_owasp_lowercase() {
|
||||
let auth = parse_authority("owasp a03:2021");
|
||||
assert_eq!(
|
||||
auth,
|
||||
Authority::OWASP {
|
||||
id: "a03".to_string(),
|
||||
year: Some(2021)
|
||||
}
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_cwe_hyphen() {
|
||||
let auth = parse_authority("CWE-79");
|
||||
assert_eq!(auth, Authority::CWE { id: 79 });
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_cwe_space() {
|
||||
let auth = parse_authority("CWE 89");
|
||||
assert_eq!(auth, Authority::CWE { id: 89 });
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_cwe_lowercase() {
|
||||
let auth = parse_authority("cwe-79");
|
||||
assert_eq!(auth, Authority::CWE { id: 79 });
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_unknown() {
|
||||
let auth = parse_authority("Some Random Source");
|
||||
assert_eq!(auth, Authority::Unknown("Some Random Source".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_owasp_cheat_sheet() {
|
||||
let auth = parse_authority("OWASP Password Storage Cheat Sheet");
|
||||
// Doesn't match pattern, falls back to Unknown
|
||||
assert!(matches!(auth, Authority::Unknown(_)));
|
||||
}
|
||||
}
|
||||
130
applications/aphoria/src/corpus/cli_created.rs
Normal file
@ -0,0 +1,130 @@
|
||||
//! Corpus builder for items created via `aphoria corpus create` CLI.
|
||||
//!
|
||||
//! These are user-authored corpus items stored in the shared corpus database
|
||||
//! with metadata flag "source": "cli_create". This builder makes CLI-created
|
||||
//! items visible in `aphoria corpus build` and `aphoria corpus list`.
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use ed25519_dalek::SigningKey;
|
||||
use stemedb_core::types::Assertion;
|
||||
use stemedb_storage::{HybridStore, KVStore};
|
||||
use tracing::{info, instrument};
|
||||
|
||||
use crate::config::CorpusConfig;
|
||||
use crate::AphoriaError;
|
||||
|
||||
/// Corpus builder for CLI-created items.
|
||||
///
|
||||
/// Items created with `aphoria corpus create` are stored in the corpus database
|
||||
/// with metadata `"source": "cli_create"`. This builder:
|
||||
/// 1. Queries the corpus store (passed in from registry)
|
||||
/// 2. Scans all items with "subject:" prefix
|
||||
/// 3. Filters for items with `source == "cli_create"` in metadata
|
||||
/// 4. Returns them as corpus assertions
|
||||
///
|
||||
/// This makes CLI-created items visible in:
|
||||
/// - `aphoria corpus build` (they get included in the build)
|
||||
/// - Dashboard corpus queries (they appear in the corpus list)
|
||||
pub struct CliCreatedBuilder {
|
||||
/// Reference to the corpus store for querying CLI-created items.
|
||||
corpus_store: Arc<HybridStore>,
|
||||
}
|
||||
|
||||
impl CliCreatedBuilder {
|
||||
/// Create a new CLI-created corpus builder.
|
||||
///
|
||||
/// # Arguments
|
||||
///
|
||||
/// * `corpus_store` - The corpus database store (from LocalEpisteme::open_corpus_db)
|
||||
pub fn new(corpus_store: Arc<HybridStore>) -> Self {
|
||||
Self { corpus_store }
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait::async_trait]
|
||||
impl super::AsyncCorpusBuilder for CliCreatedBuilder {
|
||||
fn name(&self) -> &str {
|
||||
"CLI-Created Items"
|
||||
}
|
||||
|
||||
fn scheme(&self) -> &str {
|
||||
"cli"
|
||||
}
|
||||
|
||||
fn default_tier(&self) -> u8 {
|
||||
3 // Community tier by default (individual items may override)
|
||||
}
|
||||
|
||||
#[instrument(skip(self, _signing_key, _config), fields(builder = "CLI-Created"))]
|
||||
async fn build(
|
||||
&self,
|
||||
_signing_key: &SigningKey,
|
||||
_timestamp: u64,
|
||||
_config: &CorpusConfig,
|
||||
) -> Result<Vec<Assertion>, AphoriaError> {
|
||||
info!("Building corpus from CLI-created items");
|
||||
|
||||
// Scan all items with "subject:" prefix
|
||||
let all_items = self
|
||||
.corpus_store
|
||||
.scan_prefix(b"subject:")
|
||||
.await
|
||||
.map_err(|e| AphoriaError::Storage(format!("Failed to scan corpus database: {e}")))?;
|
||||
|
||||
info!(total_items = all_items.len(), "Scanned corpus database for CLI-created items");
|
||||
|
||||
// Filter for CLI-created items by checking metadata
|
||||
let mut assertions = Vec::new();
|
||||
for (_key, value) in all_items {
|
||||
let assertion: Assertion = stemedb_core::serde::deserialize(&value)
|
||||
.map_err(|e| AphoriaError::Storage(format!("Failed to deserialize assertion: {e}")))?;
|
||||
|
||||
// Check metadata for "source": "cli_create"
|
||||
if let Some(ref meta_bytes) = assertion.source_metadata {
|
||||
if let Ok(meta_json) = serde_json::from_slice::<serde_json::Value>(meta_bytes) {
|
||||
if meta_json.get("source").and_then(|v| v.as_str()) == Some("cli_create") {
|
||||
assertions.push(assertion);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
info!(
|
||||
cli_created_count = assertions.len(),
|
||||
"Found {} CLI-created corpus items",
|
||||
assertions.len()
|
||||
);
|
||||
|
||||
Ok(assertions)
|
||||
}
|
||||
|
||||
fn requires_network(&self) -> bool {
|
||||
false // CLI items are local only
|
||||
}
|
||||
|
||||
fn source_ids(&self) -> Vec<String> {
|
||||
vec![] // No specific source IDs for CLI-created items
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::corpus::AsyncCorpusBuilder;
|
||||
use stemedb_storage::HybridStore;
|
||||
use tempfile::TempDir;
|
||||
|
||||
#[test]
|
||||
fn test_builder_metadata() {
|
||||
let temp_dir = TempDir::new().unwrap();
|
||||
let store = Arc::new(HybridStore::open(temp_dir.path()).unwrap());
|
||||
let builder = CliCreatedBuilder::new(store);
|
||||
|
||||
assert_eq!(builder.name(), "CLI-Created Items");
|
||||
assert_eq!(builder.scheme(), "cli");
|
||||
assert_eq!(builder.default_tier(), 3);
|
||||
assert!(!builder.requires_network());
|
||||
assert!(builder.source_ids().is_empty());
|
||||
}
|
||||
}
|
||||
@ -13,7 +13,9 @@ use ed25519_dalek::SigningKey;
|
||||
use stemedb_core::types::{Assertion, ObjectValue, SourceClass};
|
||||
use tracing::{info, instrument};
|
||||
|
||||
use super::thresholds::{CorpusPromotionThresholds, PromotionDecision};
|
||||
use super::thresholds::{
|
||||
CorpusPromotionThresholds, PromotionDecision, ScaleAdaptiveThresholds, ScaleTier,
|
||||
};
|
||||
use crate::community::PatternAggregate;
|
||||
use crate::config::CorpusConfig;
|
||||
use crate::episteme::create_authoritative_assertion;
|
||||
@ -72,9 +74,15 @@ pub struct CommunityCorpusBuilder {
|
||||
/// Pattern aggregate store for querying community data.
|
||||
pattern_store: Box<dyn PatternAggregateStore>,
|
||||
|
||||
/// Promotion thresholds for multi-tier decision making.
|
||||
/// Legacy promotion thresholds (used when use_adaptive=false).
|
||||
thresholds: CorpusPromotionThresholds,
|
||||
|
||||
/// Scale-adaptive thresholds (used when use_adaptive=true).
|
||||
adaptive_thresholds: ScaleAdaptiveThresholds,
|
||||
|
||||
/// Whether to use adaptive thresholds (default: true).
|
||||
use_adaptive: bool,
|
||||
|
||||
/// Path to manually promoted patterns file.
|
||||
///
|
||||
/// Format: `.aphoria/corpus/community.toml`
|
||||
@ -92,7 +100,13 @@ impl CommunityCorpusBuilder {
|
||||
pattern_store: Box<dyn PatternAggregateStore>,
|
||||
thresholds: CorpusPromotionThresholds,
|
||||
) -> Self {
|
||||
Self { pattern_store, thresholds, manual_promotions_path: None }
|
||||
Self {
|
||||
pattern_store,
|
||||
thresholds,
|
||||
adaptive_thresholds: ScaleAdaptiveThresholds::default(),
|
||||
use_adaptive: false, // Legacy constructor defaults to legacy behavior
|
||||
manual_promotions_path: None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Create a builder with stub storage (for testing/shadow mode).
|
||||
@ -100,9 +114,9 @@ impl CommunityCorpusBuilder {
|
||||
Self::new(Box::new(StubPatternStore), thresholds)
|
||||
}
|
||||
|
||||
/// Create a builder from StemeDB stores.
|
||||
/// Create a builder from StemeDB stores with configuration.
|
||||
///
|
||||
/// This is the production constructor that uses real storage.
|
||||
/// This is the production constructor that uses real storage and respects config.
|
||||
pub fn from_stores(
|
||||
kv_store: std::sync::Arc<stemedb_storage::HybridStore>,
|
||||
predicate_index: std::sync::Arc<
|
||||
@ -110,11 +124,20 @@ impl CommunityCorpusBuilder {
|
||||
std::sync::Arc<stemedb_storage::HybridStore>,
|
||||
>,
|
||||
>,
|
||||
thresholds: CorpusPromotionThresholds,
|
||||
config: &CorpusConfig,
|
||||
) -> Self {
|
||||
use crate::community::StemeDBPatternStore;
|
||||
let pattern_store = Box::new(StemeDBPatternStore::new(kv_store, predicate_index));
|
||||
Self::new(pattern_store, thresholds)
|
||||
|
||||
let adaptive_thresholds = config.adaptive_thresholds.clone().unwrap_or_default();
|
||||
|
||||
Self {
|
||||
pattern_store,
|
||||
thresholds: CorpusPromotionThresholds::default(), // Keep for legacy path
|
||||
adaptive_thresholds,
|
||||
use_adaptive: !config.use_legacy_thresholds,
|
||||
manual_promotions_path: None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Set path to manual promotions file.
|
||||
@ -152,17 +175,25 @@ impl CommunityCorpusBuilder {
|
||||
fn should_promote(
|
||||
&self,
|
||||
pattern: &PatternAggregate,
|
||||
_adoption_rate: f64,
|
||||
total_projects: u64,
|
||||
authority_match: (bool, Option<String>),
|
||||
) -> PromotionDecision {
|
||||
let total_projects = pattern.project_count; // Approximation for shadow mode
|
||||
|
||||
self.thresholds.evaluate(
|
||||
pattern.project_count,
|
||||
total_projects,
|
||||
authority_match.0,
|
||||
authority_match.1.as_deref(),
|
||||
)
|
||||
if self.use_adaptive {
|
||||
self.adaptive_thresholds.evaluate(
|
||||
pattern.project_count,
|
||||
total_projects,
|
||||
authority_match.0,
|
||||
authority_match.1.as_deref(),
|
||||
)
|
||||
} else {
|
||||
// Legacy path for backward compatibility
|
||||
self.thresholds.evaluate(
|
||||
pattern.project_count,
|
||||
total_projects,
|
||||
authority_match.0,
|
||||
authority_match.1.as_deref(),
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
/// Create assertion from promoted pattern.
|
||||
@ -236,6 +267,8 @@ impl CommunityCorpusBuilder {
|
||||
) -> Result<Vec<PromotionCandidate>, AphoriaError> {
|
||||
info!("Shadow mode: Evaluating patterns for promotion");
|
||||
|
||||
let total_projects = self.pattern_store.get_total_projects().await?;
|
||||
|
||||
let patterns = self
|
||||
.pattern_store
|
||||
.get_popular_patterns(self.thresholds.emerging.min_projects, 1000)
|
||||
@ -251,7 +284,7 @@ impl CommunityCorpusBuilder {
|
||||
for pattern in patterns {
|
||||
let adoption_rate = self.calculate_adoption_rate(&pattern).await?;
|
||||
let authority_match = self.check_authority_match(&pattern);
|
||||
let decision = self.should_promote(&pattern, adoption_rate, authority_match.clone());
|
||||
let decision = self.should_promote(&pattern, total_projects, authority_match.clone());
|
||||
|
||||
match decision {
|
||||
PromotionDecision::AutoPromote(source_class) => {
|
||||
@ -331,20 +364,32 @@ impl super::AsyncCorpusBuilder for CommunityCorpusBuilder {
|
||||
timestamp: u64,
|
||||
_config: &CorpusConfig,
|
||||
) -> Result<Vec<Assertion>, AphoriaError> {
|
||||
info!("Building community corpus from pattern aggregates");
|
||||
let total_projects = self.pattern_store.get_total_projects().await?;
|
||||
let scale_tier = ScaleTier::from_total_projects(total_projects);
|
||||
|
||||
info!(
|
||||
total_projects,
|
||||
?scale_tier,
|
||||
use_adaptive = self.use_adaptive,
|
||||
"Building community corpus with scale-adaptive thresholds"
|
||||
);
|
||||
|
||||
// Determine minimum project threshold for initial query
|
||||
let min_projects_for_query = if self.use_adaptive {
|
||||
// Use micro tier's emerging floor as minimum (most permissive)
|
||||
2
|
||||
} else {
|
||||
self.thresholds.emerging.min_projects
|
||||
};
|
||||
|
||||
// Fetch popular patterns (now properly async without block_on!)
|
||||
let patterns = self
|
||||
.pattern_store
|
||||
.get_popular_patterns(self.thresholds.emerging.min_projects, 1000)
|
||||
.await?;
|
||||
let patterns = self.pattern_store.get_popular_patterns(min_projects_for_query, 1000).await?;
|
||||
|
||||
if patterns.is_empty() {
|
||||
info!("No patterns found for community corpus (empty store or below threshold)");
|
||||
return Ok(vec![]);
|
||||
}
|
||||
|
||||
let total_projects = self.pattern_store.get_total_projects().await?;
|
||||
info!(
|
||||
pattern_count = patterns.len(),
|
||||
total_projects, "Evaluating patterns for promotion"
|
||||
@ -360,7 +405,7 @@ impl super::AsyncCorpusBuilder for CommunityCorpusBuilder {
|
||||
};
|
||||
|
||||
let authority_match = self.check_authority_match(&pattern);
|
||||
let decision = self.should_promote(&pattern, adoption_rate, authority_match.clone());
|
||||
let decision = self.should_promote(&pattern, total_projects, authority_match.clone());
|
||||
|
||||
match decision {
|
||||
super::thresholds::PromotionDecision::AutoPromote(source_class) => {
|
||||
|
||||
@ -33,22 +33,33 @@
|
||||
//! └─────────────────────────────────────────────────────────────────┘
|
||||
//! ```
|
||||
|
||||
mod authority_parser;
|
||||
mod cli_created;
|
||||
mod community;
|
||||
mod enricher;
|
||||
mod owasp;
|
||||
mod resolver;
|
||||
mod rfc;
|
||||
mod thresholds;
|
||||
mod subject_builder;
|
||||
pub mod thresholds; // Public to allow config types to use ScaleAdaptiveThresholds
|
||||
mod vendor;
|
||||
mod wiki_corpus_builder;
|
||||
mod wiki_importer;
|
||||
|
||||
pub use authority_parser::{parse_authority, Authority};
|
||||
pub use cli_created::CliCreatedBuilder;
|
||||
pub use community::{CommunityCorpusBuilder, PatternAggregateStore, StubPatternStore};
|
||||
pub use enricher::{Enrichment, PatternEnricher};
|
||||
pub use owasp::OwaspCorpusBuilder;
|
||||
pub use resolver::CorpusResolver;
|
||||
pub use rfc::RfcCorpusBuilder;
|
||||
pub use thresholds::{CorpusPromotionThresholds, PromotionCriteria, PromotionDecision};
|
||||
pub use subject_builder::build_corpus_subject;
|
||||
pub use thresholds::{
|
||||
CorpusPromotionThresholds, PromotionCriteria, PromotionDecision, ScaleAdaptiveThresholds,
|
||||
ScaleTier,
|
||||
};
|
||||
pub use vendor::VendorCorpusBuilder;
|
||||
pub use wiki_corpus_builder::promote_wiki_patterns_to_corpus;
|
||||
pub use wiki_importer::{import_from_wiki, WikiParser, WikiPattern};
|
||||
|
||||
use ed25519_dalek::SigningKey;
|
||||
@ -190,6 +201,13 @@ impl CorpusRegistry {
|
||||
///
|
||||
/// Use this constructor when you have access to StemeDB stores (LocalEpisteme).
|
||||
/// The community corpus builder queries pattern aggregates from storage.
|
||||
///
|
||||
/// # Arguments
|
||||
///
|
||||
/// * `config` - Corpus configuration
|
||||
/// * `kv_store` - Project KV store for community patterns
|
||||
/// * `predicate_index` - Predicate index for community patterns
|
||||
/// * `corpus_store` - Optional corpus database store for CLI-created items
|
||||
pub fn with_stores(
|
||||
config: &CorpusConfig,
|
||||
kv_store: std::sync::Arc<stemedb_storage::HybridStore>,
|
||||
@ -198,19 +216,23 @@ impl CorpusRegistry {
|
||||
std::sync::Arc<stemedb_storage::HybridStore>,
|
||||
>,
|
||||
>,
|
||||
corpus_store: Option<std::sync::Arc<stemedb_storage::HybridStore>>,
|
||||
) -> Self {
|
||||
let mut registry = Self::with_defaults(config);
|
||||
|
||||
// Add community corpus builder if enabled
|
||||
if config.use_community {
|
||||
use crate::corpus::thresholds::CorpusPromotionThresholds;
|
||||
let thresholds = CorpusPromotionThresholds::default();
|
||||
let community_builder =
|
||||
CommunityCorpusBuilder::from_stores(kv_store, predicate_index, thresholds);
|
||||
let community_builder = CommunityCorpusBuilder::from_stores(kv_store, predicate_index, config);
|
||||
registry.register_async(Box::new(community_builder));
|
||||
info!("Registered community corpus builder (async)");
|
||||
}
|
||||
|
||||
// Add CLI-created items builder if corpus store is available
|
||||
if let Some(corpus_store) = corpus_store {
|
||||
registry.register_async(Box::new(CliCreatedBuilder::new(corpus_store)));
|
||||
info!("Registered CLI-created items corpus builder (async)");
|
||||
}
|
||||
|
||||
registry
|
||||
}
|
||||
|
||||
|
||||
145
applications/aphoria/src/corpus/subject_builder.rs
Normal file
@ -0,0 +1,145 @@
|
||||
//! Subject URI builder for corpus patterns
|
||||
//!
|
||||
//! Converts WikiPattern + Authority into proper corpus subject URIs
|
||||
//! (rfc://, owasp://, cwe://, community://wiki/).
|
||||
|
||||
use crate::corpus::authority_parser::Authority;
|
||||
use crate::corpus::wiki_importer::WikiPattern;
|
||||
|
||||
/// Build corpus subject URI from WikiPattern and Authority
|
||||
///
|
||||
/// # Examples
|
||||
///
|
||||
/// ```
|
||||
/// use aphoria::corpus::{build_corpus_subject, Authority, WikiPattern};
|
||||
///
|
||||
/// let pattern = WikiPattern {
|
||||
/// subject: "tls/cert_verification".to_string(),
|
||||
/// predicate: "enabled".to_string(),
|
||||
/// value: "true".to_string(),
|
||||
/// statement: "TLS cert verification MUST be enabled".to_string(),
|
||||
/// authority: Some("RFC 5246 Section 7.4.2".to_string()),
|
||||
/// };
|
||||
///
|
||||
/// let authority = Authority::RFC { num: 5246, section: Some("7.4.2".to_string()) };
|
||||
/// let subject = build_corpus_subject(&pattern, &authority);
|
||||
/// assert_eq!(subject, "rfc://5246/tls/cert_verification");
|
||||
/// ```
|
||||
pub fn build_corpus_subject(pattern: &WikiPattern, authority: &Authority) -> String {
|
||||
let normalized = normalize_subject(&pattern.subject);
|
||||
|
||||
match authority {
|
||||
Authority::RFC { num, .. } => {
|
||||
format!("rfc://{}/{}", num, normalized)
|
||||
}
|
||||
Authority::OWASP { id, .. } => {
|
||||
format!("owasp://{}/{}", id.to_lowercase(), normalized)
|
||||
}
|
||||
Authority::CWE { id } => {
|
||||
format!("cwe://{}/{}", id, normalized)
|
||||
}
|
||||
Authority::Unknown(_) => {
|
||||
format!("community://wiki/{}", normalized)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Normalize subject path for URI
|
||||
///
|
||||
/// Converts to lowercase, replaces spaces with underscores, trims slashes.
|
||||
fn normalize_subject(subject: &str) -> String {
|
||||
subject
|
||||
.trim()
|
||||
.trim_matches('/')
|
||||
.to_lowercase()
|
||||
.replace(' ', "_")
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::community::CommunityObjectValue;
|
||||
|
||||
fn make_pattern(subject: &str) -> WikiPattern {
|
||||
WikiPattern {
|
||||
subject: subject.to_string(),
|
||||
predicate: "test".to_string(),
|
||||
value: CommunityObjectValue::Boolean(true),
|
||||
statement: "test statement".to_string(),
|
||||
authority: None,
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_rfc_subject() {
|
||||
let pattern = make_pattern("tls/cert_verification");
|
||||
let authority = Authority::RFC {
|
||||
num: 5246,
|
||||
section: Some("7.4.2".to_string()),
|
||||
};
|
||||
let subject = build_corpus_subject(&pattern, &authority);
|
||||
assert_eq!(subject, "rfc://5246/tls/cert_verification");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_rfc_subject_with_spaces() {
|
||||
let pattern = make_pattern("TLS Cert Verification");
|
||||
let authority = Authority::RFC {
|
||||
num: 5246,
|
||||
section: None,
|
||||
};
|
||||
let subject = build_corpus_subject(&pattern, &authority);
|
||||
assert_eq!(subject, "rfc://5246/tls_cert_verification");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_owasp_subject() {
|
||||
let pattern = make_pattern("password/storage");
|
||||
let authority = Authority::OWASP {
|
||||
id: "A03".to_string(),
|
||||
year: Some(2021),
|
||||
};
|
||||
let subject = build_corpus_subject(&pattern, &authority);
|
||||
assert_eq!(subject, "owasp://a03/password/storage");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_cwe_subject() {
|
||||
let pattern = make_pattern("xss/prevention");
|
||||
let authority = Authority::CWE { id: 79 };
|
||||
let subject = build_corpus_subject(&pattern, &authority);
|
||||
assert_eq!(subject, "cwe://79/xss/prevention");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_unknown_authority() {
|
||||
let pattern = make_pattern("custom/pattern");
|
||||
let authority = Authority::Unknown("Some Source".to_string());
|
||||
let subject = build_corpus_subject(&pattern, &authority);
|
||||
assert_eq!(subject, "community://wiki/custom/pattern");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_normalize_leading_trailing_slashes() {
|
||||
let pattern = make_pattern("/api/security/");
|
||||
let authority = Authority::RFC {
|
||||
num: 7519,
|
||||
section: None,
|
||||
};
|
||||
let subject = build_corpus_subject(&pattern, &authority);
|
||||
assert_eq!(subject, "rfc://7519/api/security");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_normalize_uppercase() {
|
||||
let pattern = make_pattern("JWT/Validation");
|
||||
let authority = Authority::RFC {
|
||||
num: 7519,
|
||||
section: None,
|
||||
};
|
||||
let subject = build_corpus_subject(&pattern, &authority);
|
||||
assert_eq!(subject, "rfc://7519/jwt/validation");
|
||||
}
|
||||
}
|
||||
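The scheme-per-authority mapping above can be tried in isolation. The following is a minimal sketch, assuming a simplified `Authority` enum and a bare `&str` subject in place of the crate's `WikiPattern`; `subject_uri` and `normalize` are hypothetical stand-ins for illustration, not the crate's API.

```rust
/// Simplified stand-in for the crate's `Authority` enum (assumption:
/// only the fields needed to build a URI are kept).
#[derive(Debug)]
pub enum Authority {
    Rfc { num: u32 },
    Owasp { id: String },
    Cwe { id: u32 },
    Unknown(String),
}

/// Trim whitespace and surrounding slashes, lowercase, and replace
/// spaces with underscores.
pub fn normalize(subject: &str) -> String {
    subject
        .trim()
        .trim_matches('/')
        .to_lowercase()
        .replace(' ', "_")
}

/// Hypothetical helper: pick the URI scheme from the authority and
/// append the normalized subject path.
pub fn subject_uri(subject: &str, authority: &Authority) -> String {
    let normalized = normalize(subject);
    match authority {
        Authority::Rfc { num } => format!("rfc://{num}/{normalized}"),
        Authority::Owasp { id } => format!("owasp://{}/{normalized}", id.to_lowercase()),
        Authority::Cwe { id } => format!("cwe://{id}/{normalized}"),
        // Anything without a recognized authority lands under community://wiki/
        Authority::Unknown(_) => format!("community://wiki/{normalized}"),
    }
}

fn main() {
    println!("{}", subject_uri("/JWT/Validation/", &Authority::Rfc { num: 7519 }));
    println!("{}", subject_uri("custom pattern", &Authority::Unknown("wiki".into())));
}
```

Because unknown authorities map to `community://wiki/...`, these items are only reachable through the API when the corpus handler filters on the broad `community://` prefix rather than `community://pattern/`.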
@ -197,6 +197,334 @@ impl CorpusPromotionThresholds {
|
||||
}
|
||||
}
|
||||
|
||||
/// Scale tier based on total projects in organization
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub enum ScaleTier {
|
||||
/// 1-5 projects: Very small teams
|
||||
Micro,
|
||||
/// 6-25 projects: Small teams
|
||||
Small,
|
||||
/// 26-100 projects: Medium organizations
|
||||
Medium,
|
||||
/// 101-500 projects: Large organizations
|
||||
Large,
|
||||
/// 501+ projects: Enterprise scale
|
||||
Enterprise,
|
||||
}
|
||||
|
||||
impl ScaleTier {
|
||||
/// Detect scale tier from total project count
|
||||
pub fn from_total_projects(total: u64) -> Self {
|
||||
match total {
|
||||
0..=5 => Self::Micro,
|
||||
6..=25 => Self::Small,
|
||||
26..=100 => Self::Medium,
|
||||
101..=500 => Self::Large,
|
||||
_ => Self::Enterprise,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Adaptive promotion criteria that scales with team size
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct AdaptiveCriteria {
|
||||
/// Absolute minimum projects (safety floor)
|
||||
pub min_projects_floor: u64,
|
||||
/// Percentage of total projects required (scale factor)
|
||||
pub min_projects_percentage: f64,
|
||||
/// Minimum adoption rate (0.0-1.0)
|
||||
pub min_adoption_rate: f64,
|
||||
/// Whether authority source match is required
|
||||
pub require_authority: bool,
|
||||
/// List of authority source prefixes (e.g., ["rfc://", "nist://"])
|
||||
pub authority_sources: Vec<String>,
|
||||
/// Whether to auto-promote or require manual review
|
||||
pub auto_promote: bool,
|
||||
}
|
||||
|
||||
impl AdaptiveCriteria {
|
||||
/// Calculate effective minimum projects for current total
|
||||
///
|
||||
/// Returns max(floor, ceil(percentage * total)), so that:
/// - Small totals: the floor dominates (guards against promoting on one or two projects)
/// - Large totals: the percentage dominates (the bar scales with growth)
|
||||
pub fn effective_min_projects(&self, total_projects: u64) -> u64 {
|
||||
let from_percentage = (self.min_projects_percentage * total_projects as f64).ceil() as u64;
|
||||
self.min_projects_floor.max(from_percentage)
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for AdaptiveCriteria {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
min_projects_floor: 2,
|
||||
min_projects_percentage: 0.50,
|
||||
min_adoption_rate: 0.50,
|
||||
require_authority: false,
|
||||
authority_sources: vec![],
|
||||
auto_promote: false,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Thresholds for a specific scale tier
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct TierThresholds {
|
||||
/// Regulatory tier (RFC, NIST, etc.) - may be disabled (None)
|
||||
pub regulatory: Option<AdaptiveCriteria>,
|
||||
/// Clinical tier (OWASP, CWE, etc.) - may be disabled (None)
|
||||
pub clinical: Option<AdaptiveCriteria>,
|
||||
/// Emerging tier (community patterns) - always enabled
|
||||
pub emerging: AdaptiveCriteria,
|
||||
}
|
||||
|
||||
/// Scale-adaptive threshold system
|
||||
///
|
||||
/// Automatically adjusts promotion criteria based on organization size:
|
||||
/// - Micro teams (1-5 projects): Emerging tier only, patterns surface immediately
|
||||
/// - Small teams: Lower thresholds, all tiers enabled
|
||||
/// - Medium/Large: Balanced quality gates
|
||||
/// - Enterprise: Strict thresholds (backward compatible)
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct ScaleAdaptiveThresholds {
|
||||
/// Thresholds for micro teams (1-5 projects).
|
||||
pub micro: TierThresholds,
|
||||
/// Thresholds for small teams (6-25 projects).
|
||||
pub small: TierThresholds,
|
||||
/// Thresholds for medium organizations (26-100 projects).
|
||||
pub medium: TierThresholds,
|
||||
/// Thresholds for large organizations (101-500 projects).
|
||||
pub large: TierThresholds,
|
||||
/// Thresholds for enterprise scale (501+ projects).
|
||||
pub enterprise: TierThresholds,
|
||||
}
|
||||
|
||||
impl ScaleAdaptiveThresholds {
|
||||
/// Get thresholds for a specific scale tier
|
||||
pub fn for_tier(&self, tier: ScaleTier) -> &TierThresholds {
|
||||
match tier {
|
||||
ScaleTier::Micro => &self.micro,
|
||||
ScaleTier::Small => &self.small,
|
||||
ScaleTier::Medium => &self.medium,
|
||||
ScaleTier::Large => &self.large,
|
||||
ScaleTier::Enterprise => &self.enterprise,
|
||||
}
|
||||
}
|
||||
|
||||
/// Evaluate promotion decision for a pattern
|
||||
///
|
||||
/// # Arguments
|
||||
/// - `project_count`: Number of projects pattern appears in
|
||||
/// - `total_projects`: Total projects in organization
|
||||
/// - `has_authority_match`: Whether pattern matches authority source
|
||||
/// - `authority_scheme`: Authority scheme if matched (e.g., "rfc://")
|
||||
pub fn evaluate(
|
||||
&self,
|
||||
project_count: u64,
|
||||
total_projects: u64,
|
||||
has_authority_match: bool,
|
||||
authority_scheme: Option<&str>,
|
||||
) -> PromotionDecision {
|
||||
if total_projects == 0 {
|
||||
return PromotionDecision::Skip;
|
||||
}
|
||||
|
||||
let tier = ScaleTier::from_total_projects(total_projects);
|
||||
let thresholds = self.for_tier(tier);
|
||||
|
||||
let adoption_rate = project_count as f64 / total_projects as f64;
|
||||
|
||||
// Try regulatory (if enabled for this tier)
|
||||
if let Some(reg) = &thresholds.regulatory {
|
||||
let min_projects = reg.effective_min_projects(total_projects);
|
||||
if adoption_rate >= reg.min_adoption_rate
|
||||
&& project_count >= min_projects
|
||||
&& (!reg.require_authority
|
||||
|| matches_authority(has_authority_match, authority_scheme, &reg.authority_sources))
|
||||
{
|
||||
return PromotionDecision::AutoPromote(SourceClass::Regulatory);
|
||||
}
|
||||
}
|
||||
|
||||
// Try clinical (if enabled)
|
||||
if let Some(clin) = &thresholds.clinical {
|
||||
let min_projects = clin.effective_min_projects(total_projects);
|
||||
if adoption_rate >= clin.min_adoption_rate
|
||||
&& project_count >= min_projects
|
||||
&& (!clin.require_authority
|
||||
|| matches_authority(has_authority_match, authority_scheme, &clin.authority_sources))
|
||||
{
|
||||
return PromotionDecision::AutoPromote(SourceClass::Clinical);
|
||||
}
|
||||
}
|
||||
|
||||
// Try emerging (always enabled)
|
||||
let min_projects = thresholds.emerging.effective_min_projects(total_projects);
|
||||
if adoption_rate >= thresholds.emerging.min_adoption_rate && project_count >= min_projects {
|
||||
if thresholds.emerging.auto_promote {
|
||||
return PromotionDecision::AutoPromote(SourceClass::Community);
|
||||
} else {
|
||||
return PromotionDecision::RequireReview;
|
||||
}
|
||||
}
|
||||
|
||||
PromotionDecision::Skip
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for ScaleAdaptiveThresholds {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
// Micro: 1-5 projects - Only emerging tier, very permissive
|
||||
micro: TierThresholds {
|
||||
regulatory: None, // Disabled
|
||||
clinical: None, // Disabled
|
||||
emerging: AdaptiveCriteria {
|
||||
min_projects_floor: 2,
|
||||
min_projects_percentage: 0.50, // Pattern in 50% of projects
|
||||
min_adoption_rate: 0.50,
|
||||
require_authority: false,
|
||||
authority_sources: vec![],
|
||||
auto_promote: true, // Auto-promote for immediate visibility
|
||||
},
|
||||
},
|
||||
|
||||
// Small: 6-25 projects - All tiers enabled, lower floors
|
||||
small: TierThresholds {
|
||||
regulatory: Some(AdaptiveCriteria {
|
||||
min_projects_floor: 5,
|
||||
min_projects_percentage: 0.90,
|
||||
min_adoption_rate: 0.90,
|
||||
require_authority: true,
|
||||
authority_sources: vec!["rfc://".into(), "nist://".into()],
|
||||
auto_promote: true,
|
||||
}),
|
||||
clinical: Some(AdaptiveCriteria {
|
||||
min_projects_floor: 4,
|
||||
min_projects_percentage: 0.75,
|
||||
min_adoption_rate: 0.75,
|
||||
require_authority: true,
|
||||
authority_sources: vec!["owasp://".into(), "cwe://".into()],
|
||||
auto_promote: true,
|
||||
}),
|
||||
emerging: AdaptiveCriteria {
|
||||
min_projects_floor: 2,
|
||||
min_projects_percentage: 0.40,
|
||||
min_adoption_rate: 0.40,
|
||||
require_authority: false,
|
||||
authority_sources: vec![],
|
||||
auto_promote: true, // Auto-promote for small teams too
|
||||
},
|
||||
},
|
||||
|
||||
// Medium: 26-100 projects - Balanced thresholds
|
||||
medium: TierThresholds {
|
||||
regulatory: Some(AdaptiveCriteria {
|
||||
min_projects_floor: 20,
|
||||
min_projects_percentage: 0.90,
|
||||
min_adoption_rate: 0.90,
|
||||
require_authority: true,
|
||||
authority_sources: vec!["rfc://".into(), "nist://".into()],
|
||||
auto_promote: true,
|
||||
}),
|
||||
clinical: Some(AdaptiveCriteria {
|
||||
min_projects_floor: 10,
|
||||
min_projects_percentage: 0.75,
|
||||
min_adoption_rate: 0.75,
|
||||
require_authority: true,
|
||||
authority_sources: vec!["owasp://".into(), "cwe://".into()],
|
||||
auto_promote: true,
|
||||
}),
|
||||
emerging: AdaptiveCriteria {
|
||||
min_projects_floor: 5,
|
||||
min_projects_percentage: 0.40,
|
||||
min_adoption_rate: 0.40,
|
||||
require_authority: false,
|
||||
authority_sources: vec![],
|
||||
auto_promote: false,
|
||||
},
|
||||
},
|
||||
|
||||
// Large: 101-500 projects - Higher quality gates
|
||||
large: TierThresholds {
|
||||
regulatory: Some(AdaptiveCriteria {
|
||||
min_projects_floor: 50,
|
||||
min_projects_percentage: 0.90,
|
||||
min_adoption_rate: 0.90,
|
||||
require_authority: true,
|
||||
authority_sources: vec!["rfc://".into(), "nist://".into()],
|
||||
auto_promote: true,
|
||||
}),
|
||||
clinical: Some(AdaptiveCriteria {
|
||||
min_projects_floor: 30,
|
||||
min_projects_percentage: 0.75,
|
||||
min_adoption_rate: 0.75,
|
||||
require_authority: true,
|
||||
authority_sources: vec!["owasp://".into(), "cwe://".into()],
|
||||
auto_promote: true,
|
||||
}),
|
||||
emerging: AdaptiveCriteria {
|
||||
min_projects_floor: 15,
|
||||
min_projects_percentage: 0.40,
|
||||
min_adoption_rate: 0.40,
|
||||
require_authority: false,
|
||||
authority_sources: vec![],
|
||||
auto_promote: false,
|
||||
},
|
||||
},
|
||||
|
||||
// Enterprise: 501+ projects - Current defaults (backward compatible)
|
||||
enterprise: TierThresholds {
|
||||
regulatory: Some(AdaptiveCriteria {
|
||||
min_projects_floor: 100,
|
||||
min_projects_percentage: 0.95,
|
||||
min_adoption_rate: 0.95,
|
||||
require_authority: true,
|
||||
authority_sources: vec!["rfc://".into(), "nist://".into()],
|
||||
auto_promote: true,
|
||||
}),
|
||||
clinical: Some(AdaptiveCriteria {
|
||||
min_projects_floor: 50,
|
||||
min_projects_percentage: 0.80,
|
||||
min_adoption_rate: 0.80,
|
||||
require_authority: true,
|
||||
authority_sources: vec!["owasp://".into(), "cwe://".into()],
|
||||
auto_promote: true,
|
||||
}),
|
||||
emerging: AdaptiveCriteria {
|
||||
min_projects_floor: 25,
|
||||
min_projects_percentage: 0.50,
|
||||
min_adoption_rate: 0.50,
|
||||
require_authority: false,
|
||||
authority_sources: vec![],
|
||||
auto_promote: false,
|
||||
},
|
||||
},
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Helper: Check if authority sources match
|
||||
fn matches_authority(
|
||||
has_authority_match: bool,
|
||||
authority_scheme: Option<&str>,
|
||||
required_sources: &[String],
|
||||
) -> bool {
|
||||
if !has_authority_match {
|
||||
return false;
|
||||
}
|
||||
|
||||
if required_sources.is_empty() {
|
||||
return true; // Any authority source acceptable
|
||||
}
|
||||
|
||||
if let Some(scheme) = authority_scheme {
|
||||
required_sources.iter().any(|src| scheme.starts_with(src))
|
||||
} else {
|
||||
false
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
@ -322,4 +650,138 @@ mod tests {
|
||||
// Should not promote to Regulatory due to min_projects
|
||||
assert_ne!(decision, PromotionDecision::AutoPromote(SourceClass::Regulatory));
|
||||
}
|
||||
|
||||
// ===== Scale-Adaptive Tests =====
|
||||
|
||||
#[test]
|
||||
fn test_scale_tier_detection() {
|
||||
assert_eq!(ScaleTier::from_total_projects(1), ScaleTier::Micro);
|
||||
assert_eq!(ScaleTier::from_total_projects(3), ScaleTier::Micro);
|
||||
assert_eq!(ScaleTier::from_total_projects(5), ScaleTier::Micro);
|
||||
assert_eq!(ScaleTier::from_total_projects(6), ScaleTier::Small);
|
||||
assert_eq!(ScaleTier::from_total_projects(25), ScaleTier::Small);
|
||||
assert_eq!(ScaleTier::from_total_projects(26), ScaleTier::Medium);
|
||||
assert_eq!(ScaleTier::from_total_projects(100), ScaleTier::Medium);
|
||||
assert_eq!(ScaleTier::from_total_projects(101), ScaleTier::Large);
|
||||
assert_eq!(ScaleTier::from_total_projects(500), ScaleTier::Large);
|
||||
assert_eq!(ScaleTier::from_total_projects(501), ScaleTier::Enterprise);
|
||||
assert_eq!(ScaleTier::from_total_projects(10000), ScaleTier::Enterprise);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_effective_min_projects() {
|
||||
let criteria = AdaptiveCriteria {
|
||||
min_projects_floor: 5,
|
||||
min_projects_percentage: 0.50,
|
||||
..Default::default()
|
||||
};
|
||||
|
||||
// Floor dominates for small counts
|
||||
assert_eq!(criteria.effective_min_projects(3), 5); // 50% * 3 = 1.5 → 2 < 5
|
||||
assert_eq!(criteria.effective_min_projects(8), 5); // 50% * 8 = 4 < 5
|
||||
|
||||
// Percentage dominates for larger counts
|
||||
assert_eq!(criteria.effective_min_projects(12), 6); // 50% * 12 = 6 > 5
|
||||
assert_eq!(criteria.effective_min_projects(20), 10); // 50% * 20 = 10 > 5
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_micro_team_promotion() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// 3 projects total, pattern in 2 projects (67% adoption)
|
||||
let decision = thresholds.evaluate(2, 3, false, None);
|
||||
|
||||
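// Clears the emerging bar: max(2, ceil(0.50*3)) = 2, adoption = 67% >= 50%.
// The default micro-tier emerging criteria set auto_promote = true, so this auto-promotes.
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Community));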
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_micro_team_below_threshold() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// 3 projects total, pattern in 1 project (33% adoption)
|
||||
let decision = thresholds.evaluate(1, 3, false, None);
|
||||
|
||||
// Should NOT promote: 33% < 50% adoption rate
|
||||
assert_eq!(decision, PromotionDecision::Skip);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regulatory_disabled_for_micro() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// 3 projects total, pattern in 3 projects (100% adoption, RFC match)
|
||||
let decision = thresholds.evaluate(3, 3, true, Some("rfc://1234"));
|
||||
|
||||
// Should NOT promote to regulatory (disabled for micro tier).
// Falls through to emerging, which auto-promotes under the micro-tier defaults.
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Community));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_small_team_with_authority() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// 10 projects total, pattern in 9 (90% adoption, RFC match)
|
||||
let decision = thresholds.evaluate(9, 10, true, Some("rfc://1234"));
|
||||
|
||||
// Small tier regulatory: max(5, 0.90*10) = 9, rate = 90%
|
||||
// Should auto-promote to regulatory
|
||||
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Regulatory));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_small_team_emerging() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// 10 projects total, pattern in 4 (40% adoption, no authority)
|
||||
let decision = thresholds.evaluate(4, 10, false, None);
|
||||
|
||||
// Small tier emerging: max(2, ceil(0.40*10)) = 4, rate = 40%.
// The default small-tier emerging criteria set auto_promote = true, so this auto-promotes.
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Community));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_medium_team_clinical() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// 50 projects total, pattern in 38 (76% adoption, OWASP match)
|
||||
let decision = thresholds.evaluate(38, 50, true, Some("owasp://top-10/a01"));
|
||||
|
||||
// Medium tier clinical: max(10, 0.75*50) = 37.5 → 38, rate = 76%
|
||||
// Should auto-promote to clinical
|
||||
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Clinical));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_enterprise_backward_compatible() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// 1000 projects total, pattern in 950 (95% adoption, RFC match)
|
||||
let decision = thresholds.evaluate(950, 1000, true, Some("rfc://9110"));
|
||||
|
||||
// Enterprise tier: max(100, 0.95*1000) = 950, rate = 95%
|
||||
// Should auto-promote to regulatory (same as legacy behavior)
|
||||
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Regulatory));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_authority_matching() {
|
||||
// RFC source matches regulatory
|
||||
assert!(matches_authority(true, Some("rfc://9110"), &["rfc://".into(), "nist://".into()]));
|
||||
|
||||
// NIST source matches regulatory
|
||||
assert!(matches_authority(true, Some("nist://sp800-53"), &["rfc://".into(), "nist://".into()]));
|
||||
|
||||
// OWASP doesn't match regulatory
|
||||
assert!(!matches_authority(true, Some("owasp://top-10/a01"), &["rfc://".into(), "nist://".into()]));
|
||||
|
||||
// No authority doesn't match when required
|
||||
assert!(!matches_authority(false, None, &["rfc://".into()]));
|
||||
|
||||
// Empty sources accepts any authority
|
||||
assert!(matches_authority(true, Some("anything://"), &[]));
|
||||
}
|
||||
}
|
||||
|
||||
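The effective-minimum rule used by the thresholds above is easy to check by hand. Below is a minimal sketch, assuming free functions instead of the `AdaptiveCriteria` methods; `effective_min_projects` here takes its floor and percentage as plain arguments, and `clears_threshold` is a hypothetical helper that combines the two conditions `evaluate` applies to each tier.

```rust
/// Larger of an absolute floor and a rounded-up share of all projects.
pub fn effective_min_projects(floor: u64, percentage: f64, total_projects: u64) -> u64 {
    let from_percentage = (percentage * total_projects as f64).ceil() as u64;
    floor.max(from_percentage)
}

/// Hypothetical helper: a pattern clears a tier only if both its adoption
/// rate and its absolute project count meet the tier's requirements.
pub fn clears_threshold(
    project_count: u64,
    total_projects: u64,
    floor: u64,
    percentage: f64,
    min_adoption_rate: f64,
) -> bool {
    if total_projects == 0 {
        return false; // nothing to measure adoption against
    }
    let adoption_rate = project_count as f64 / total_projects as f64;
    adoption_rate >= min_adoption_rate
        && project_count >= effective_min_projects(floor, percentage, total_projects)
}

fn main() {
    // With floor = 5 and percentage = 50%, the floor wins until half of
    // the total exceeds 5; after that the percentage sets the bar.
    for total in [3u64, 8, 12, 20] {
        println!("total={total} -> min={}", effective_min_projects(5, 0.50, total));
    }
}
```

For a micro team with 3 projects and the default emerging criteria (floor 2, 50%), a pattern seen in 2 projects clears the bar while a pattern seen in 1 does not.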
185
applications/aphoria/src/corpus/wiki_corpus_builder.rs
Normal file
@ -0,0 +1,185 @@
|
||||
//! Wiki corpus builder
|
||||
//!
|
||||
//! Converts WikiPatterns into signed authoritative assertions for the corpus database.
|
||||
//! Reuses existing helpers from episteme/corpus.rs to handle signing and metadata.
|
||||
|
||||
use crate::corpus::authority_parser::{parse_authority, Authority};
|
||||
use crate::corpus::subject_builder::build_corpus_subject;
|
||||
use crate::corpus::wiki_importer::WikiPattern;
|
||||
use crate::episteme::create_authoritative_assertion_with_metadata;
|
||||
use crate::error::AphoriaError;
|
||||
use ed25519_dalek::SigningKey;
|
||||
use serde_json::json;
|
||||
use stemedb_core::types::SourceClass;
|
||||
use stemedb_storage::{HybridStore, KVStore};
|
||||
use std::sync::Arc;
|
||||
use std::time::{SystemTime, UNIX_EPOCH};
|
||||
use tracing::{info, warn};
|
||||
|
||||
/// Promote wiki patterns to corpus database as signed assertions
|
||||
///
|
||||
/// This function:
|
||||
/// 1. Parses authority strings into structured Authority enums
|
||||
/// 2. Builds proper subject URIs (rfc://, owasp://, cwe://, community://wiki/)
|
||||
/// 3. Creates signed assertions with rich metadata
|
||||
/// 4. Stores in corpus database with subject and predicate indexes
|
||||
///
|
||||
/// # Arguments
|
||||
///
|
||||
/// * `patterns` - WikiPatterns parsed from markdown files
|
||||
/// * `signing_key` - Ed25519 key for signing assertions
|
||||
/// * `corpus_store` - Corpus database KV store (NOT project database)
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// Number of patterns successfully promoted to corpus
|
||||
pub async fn promote_wiki_patterns_to_corpus(
|
||||
patterns: Vec<WikiPattern>,
|
||||
signing_key: &SigningKey,
|
||||
corpus_store: Arc<HybridStore>,
|
||||
) -> Result<usize, AphoriaError> {
|
||||
let mut promoted = 0;
|
||||
|
||||
for pattern in patterns {
|
||||
// Parse authority (or Unknown if missing)
|
||||
let authority = pattern
|
||||
.authority
|
||||
.as_ref()
|
||||
.map(|s| parse_authority(s))
|
||||
.unwrap_or_else(|| Authority::Unknown("wiki import".to_string()));
|
||||
|
||||
// Build proper subject URI
|
||||
let subject = build_corpus_subject(&pattern, &authority);
|
||||
|
||||
// Determine tier based on authority
|
||||
let source_class = match &authority {
|
||||
Authority::RFC { .. } | Authority::OWASP { .. } => SourceClass::Regulatory,
|
||||
Authority::CWE { .. } => SourceClass::Clinical,
|
||||
Authority::Unknown(_) => SourceClass::Community,
|
||||
};
|
||||
|
||||
// Get authority source string for metadata
|
||||
let authority_source = pattern
|
||||
.authority
|
||||
.clone()
|
||||
.unwrap_or_else(|| "wiki import".to_string());
|
||||
|
||||
// Build rich metadata
|
||||
let metadata = json!({
|
||||
"description": pattern.statement,
|
||||
"authority_source": authority_source,
|
||||
"category": infer_category(&pattern.subject),
|
||||
"source": "wiki_import"
|
||||
});
|
||||
|
||||
// Get current timestamp
|
||||
let timestamp = SystemTime::now()
|
||||
.duration_since(UNIX_EPOCH)
|
||||
.map_err(|e| AphoriaError::Io(std::io::Error::other(e)))?
|
||||
.as_secs();
|
||||
|
||||
// Create signed assertion (REUSE EXISTING HELPER)
|
||||
let assertion = create_authoritative_assertion_with_metadata(
|
||||
signing_key,
|
||||
&subject,
|
||||
&pattern.predicate,
|
||||
pattern.value.clone().into(),
|
||||
source_class,
|
||||
&pattern.statement,
|
||||
timestamp,
|
||||
metadata,
|
||||
);
|
||||
|
||||
// Serialize assertion
|
||||
let serialized = stemedb_core::serde::serialize(&assertion)
|
||||
.map_err(|e| AphoriaError::Storage(format!("Failed to serialize assertion: {}", e)))?;
|
||||
|
||||
// Store with subject prefix for API querying
|
||||
let subject_key = format!("subject:{}", subject);
|
||||
corpus_store
|
||||
.put(subject_key.as_bytes(), &serialized)
|
||||
.await
|
||||
.map_err(|e| AphoriaError::Storage(format!("Failed to store assertion: {}", e)))?;
|
||||
|
||||
// Also store in predicate index
|
||||
let pred_key = format!("predicate:corpus:{}", assertion.predicate);
|
||||
corpus_store
|
||||
.put(pred_key.as_bytes(), &serialized)
|
||||
.await
|
||||
.map_err(|e| {
|
||||
AphoriaError::Storage(format!("Failed to store predicate index: {}", e))
|
||||
})?;
|
||||
|
||||
info!(
|
||||
"Promoted wiki pattern to corpus: {} -> {}",
|
||||
pattern.subject, subject
|
||||
);
|
||||
promoted += 1;
|
||||
}
|
||||
|
||||
if promoted > 0 {
|
||||
info!("Successfully promoted {} wiki patterns to corpus", promoted);
|
||||
} else {
|
||||
warn!("No wiki patterns were promoted to corpus");
|
||||
}
|
||||
|
||||
Ok(promoted)
|
||||
}
|
||||
|
||||
/// Infer category from subject path
|
||||
///
|
||||
/// Uses simple keyword matching to categorize patterns into:
|
||||
/// - security: TLS, JWT, password, auth, crypto
|
||||
/// - architecture: HTTP, API, REST
|
||||
/// - quality: test, CI
|
||||
/// - general: everything else
|
||||
fn infer_category(subject: &str) -> &'static str {
|
||||
let lower = subject.to_lowercase();
|
||||
if lower.contains("tls")
|
||||
|| lower.contains("jwt")
|
||||
|| lower.contains("password")
|
||||
|| lower.contains("auth")
|
||||
|| lower.contains("crypto")
|
||||
{
|
||||
"security"
|
||||
} else if lower.contains("http") || lower.contains("api") || lower.contains("rest") {
|
||||
"architecture"
|
||||
} else if lower.contains("test") || lower.contains("ci") {
|
||||
"quality"
|
||||
} else {
|
||||
"general"
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_infer_category_security() {
|
||||
assert_eq!(infer_category("tls/cert_verification"), "security");
|
||||
assert_eq!(infer_category("JWT/validation"), "security");
|
||||
assert_eq!(infer_category("password/storage"), "security");
|
||||
assert_eq!(infer_category("authentication/oauth"), "security");
|
||||
assert_eq!(infer_category("crypto/hashing"), "security");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_category_architecture() {
|
||||
assert_eq!(infer_category("http/headers"), "architecture");
|
||||
assert_eq!(infer_category("API/versioning"), "architecture");
|
||||
assert_eq!(infer_category("rest/endpoints"), "architecture");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_category_quality() {
|
||||
assert_eq!(infer_category("test/coverage"), "quality");
|
||||
assert_eq!(infer_category("CI/pipeline"), "quality");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_category_general() {
|
||||
assert_eq!(infer_category("logging/format"), "general");
|
||||
assert_eq!(infer_category("config/defaults"), "general");
|
||||
}
|
||||
}
|
||||
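The keyword matching in `infer_category` can also be written as a lookup table, which makes the precedence order explicit. This is a sketch under assumptions: the `CATEGORIES` table is a hypothetical restructuring of the same keywords, not code from the crate, and it keeps the same substring semantics.

```rust
/// Keyword table: categories are checked in order and the first match wins.
const CATEGORIES: &[(&str, &[&str])] = &[
    ("security", &["tls", "jwt", "password", "auth", "crypto"]),
    ("architecture", &["http", "api", "rest"]),
    ("quality", &["test", "ci"]),
];

/// Classify a subject path by case-insensitive substring match.
pub fn infer_category(subject: &str) -> &'static str {
    let lower = subject.to_lowercase();
    CATEGORIES
        .iter()
        .find(|(_, keywords)| keywords.iter().any(|k| lower.contains(*k)))
        .map(|(name, _)| *name)
        .unwrap_or("general")
}

fn main() {
    for subject in ["tls/cert_verification", "API/versioning", "CI/pipeline", "logging/format"] {
        println!("{subject} -> {}", infer_category(subject));
    }
}
```

Because matching is by substring, a short keyword such as `ci` also matches inside unrelated words: `pricing/rules` is classified as `quality`. Matching on path segments instead would avoid this, at the cost of listing longer forms such as `authentication` explicitly.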
@ -3,9 +3,9 @@
|
||||
use std::path::{Path, PathBuf};
|
||||
|
||||
use crate::bridge;
|
||||
use crate::community::PatternAggregator;
|
||||
use stemedb_storage::KVStore;
|
||||
use crate::config::AphoriaConfig;
|
||||
use crate::corpus::{import_from_wiki, CorpusBuildResult, CorpusBuilderInfo, CorpusRegistry};
|
||||
use crate::corpus::{CorpusBuildResult, CorpusBuilderInfo, CorpusRegistry};
|
||||
use crate::current_timestamp;
|
||||
use crate::episteme;
|
||||
use crate::error::AphoriaError;
|
||||
@ -53,10 +53,25 @@ pub async fn build_corpus(
|
||||
corpus_config.include_rfc = only.iter().any(|s| s == "rfc");
|
||||
corpus_config.include_owasp = only.iter().any(|s| s == "owasp");
|
||||
corpus_config.include_vendor = only.iter().any(|s| s == "vendor");
|
||||
corpus_config.use_community = only.iter().any(|s| s == "community");
|
||||
}
|
||||
|
||||
// Create registry with configured builders
|
||||
let registry = CorpusRegistry::with_defaults(&corpus_config);
|
||||
// Open Episteme to get access to stores for community corpus
|
||||
let mut episteme = episteme::LocalEpisteme::open(config, &project_root).await?;
|
||||
|
||||
// Open corpus database for CLI-created items (if configured)
|
||||
let corpus_store = if let Some(ref corpus_data_dir) = config.episteme.corpus_data_dir {
|
||||
let corpus_episteme = episteme::LocalEpisteme::open_corpus_db(corpus_data_dir, &project_root).await?;
|
||||
Some(corpus_episteme.store().clone())
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
// Create registry with stores (enables community corpus builder and CLI-created items)
|
||||
let kv_store = episteme.store().clone();
|
||||
let predicate_index =
|
||||
std::sync::Arc::new(stemedb_storage::GenericPredicateIndexStore::new(kv_store.clone()));
|
||||
let registry = CorpusRegistry::with_stores(&corpus_config, kv_store, predicate_index, corpus_store);
|
||||
|
||||
// Load signing key
|
||||
let signing_key = bridge::load_or_generate_key(&project_root)?;
|
||||
@ -68,12 +83,13 @@ pub async fn build_corpus(
|
||||
|
||||
// Ingest into Episteme
|
||||
if !result.assertions.is_empty() {
|
||||
let mut episteme = episteme::LocalEpisteme::open(config, &project_root).await?;
|
||||
let ingested = episteme.ingest_authoritative(&result.assertions).await?;
|
||||
episteme.shutdown().await;
|
||||
info!(ingested, "Corpus ingested into Episteme");
|
||||
}
|
||||
|
||||
// Shutdown episteme
|
||||
episteme.shutdown().await;
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
@ -149,11 +165,14 @@ pub async fn export_corpus_as_pack(
|
||||
Ok(assertion_count)
|
||||
}
|
||||
|
||||
/// Import patterns from wiki documentation and store as pattern aggregates.
|
||||
/// Import patterns from wiki documentation and store in corpus database.
|
||||
///
|
||||
/// This is a bootstrap operation for seeding the community corpus when
|
||||
/// starting fresh. Patterns extracted from wiki docs are stored as
|
||||
/// pattern aggregates in StemeDB with initial project_count = 1.
|
||||
/// This function:
|
||||
/// 1. Parses wiki markdown to extract WikiPatterns
|
||||
/// 2. Parses authority strings (RFC, OWASP, CWE) into structured Authority enums
|
||||
/// 3. Builds proper subject URIs (rfc://, owasp://, cwe://, community://wiki/)
|
||||
/// 4. Creates signed assertions with rich metadata
|
||||
/// 5. Stores in corpus database (~/.aphoria/corpus-db/) NOT project database
|
||||
///
|
||||
/// # Arguments
|
||||
///
|
||||
@ -162,19 +181,50 @@ pub async fn export_corpus_as_pack(
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// Number of patterns imported and stored.
|
||||
/// Number of patterns promoted to corpus database.
|
||||
#[instrument(skip(config), fields(wiki_path = %wiki_path.as_ref().display()))]
|
||||
pub async fn import_corpus_from_wiki<P: AsRef<Path>>(
|
||||
wiki_path: P,
|
||||
config: &AphoriaConfig,
|
||||
) -> Result<usize, AphoriaError> {
|
||||
info!("Importing corpus from wiki");
|
||||
use crate::corpus::promote_wiki_patterns_to_corpus;
|
||||
use crate::corpus::WikiParser;
|
||||
|
||||
info!("Importing wiki from: {}", wiki_path.as_ref().display());
|
||||
|
||||
let project_root = std::env::current_dir()?;
|
||||
let timestamp = current_timestamp();
|
||||
|
||||
// Parse wiki files and extract patterns
|
||||
let patterns = import_from_wiki(wiki_path, timestamp).await?;
|
||||
// Parse wiki files and extract WikiPatterns
|
||||
let parser = WikiParser::new()?;
|
||||
let mut patterns = Vec::new();
|
||||
|
||||
let wiki_path = wiki_path.as_ref();
|
||||
if !wiki_path.exists() {
|
||||
return Err(AphoriaError::Config(format!(
|
||||
"Wiki path does not exist: {}",
|
||||
wiki_path.display()
|
||||
)));
|
||||
}
|
||||
|
||||
// Walk directory for markdown files
|
||||
let walker = ignore::WalkBuilder::new(wiki_path)
|
||||
.follow_links(true)
|
||||
.build();
|
||||
|
||||
for entry in walker.flatten() {
|
||||
if entry.file_type().is_some_and(|ft| ft.is_file()) {
|
||||
let path = entry.path();
|
||||
if let Some(ext) = path.extension() {
|
||||
if ext == "md" {
|
||||
info!("Parsing wiki file: {}", path.display());
|
||||
let content = tokio::fs::read_to_string(path).await?;
|
||||
let file_patterns = parser.parse(&content)?;
|
||||
patterns.extend(file_patterns);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let pattern_count = patterns.len();
|
||||
|
||||
if patterns.is_empty() {
|
||||
@ -182,21 +232,378 @@ pub async fn import_corpus_from_wiki<P: AsRef<Path>>(
|
||||
return Ok(0);
|
||||
}
|
||||
|
||||
info!(pattern_count, "Extracted patterns from wiki");
|
||||
info!(pattern_count, "Parsed {} patterns from wiki", pattern_count);
|
||||
|
||||
// Open local Episteme to get storage handles
|
||||
let mut episteme = episteme::LocalEpisteme::open(config, &project_root).await?;
|
||||
// Get corpus_data_dir from config (required)
|
||||
let corpus_data_dir = config
|
||||
.episteme
|
||||
.corpus_data_dir
|
||||
.as_ref()
|
||||
.ok_or_else(|| AphoriaError::Config("corpus_data_dir not configured".into()))?;
|
||||
|
||||
// Get stores for pattern aggregator
|
||||
let kv_store = episteme.get_kv_store();
|
||||
let predicate_index = episteme.get_predicate_index();
|
||||
// Open corpus database (NOT project database)
|
||||
let mut corpus_episteme =
|
||||
episteme::LocalEpisteme::open_corpus_db(corpus_data_dir, &project_root).await?;
|
||||
|
||||
// Create pattern aggregator and store patterns
|
||||
let aggregator = PatternAggregator::new(kv_store, predicate_index);
|
||||
aggregator.add_patterns(&patterns).await?;
|
||||
// Get signing key from corpus episteme
|
||||
let signing_key = corpus_episteme.signing_key().clone();
|
||||
|
||||
episteme.shutdown().await;
|
||||
// Promote wiki patterns to corpus database
|
||||
let promoted = promote_wiki_patterns_to_corpus(
|
||||
patterns,
|
||||
&signing_key,
|
||||
corpus_episteme.get_kv_store(),
|
||||
)
|
||||
.await?;
|
||||
|
||||
info!(imported = pattern_count, "Wiki patterns imported into corpus");
|
||||
Ok(pattern_count)
|
||||
corpus_episteme.shutdown().await;
|
||||
|
||||
info!(promoted, "Promoted {} wiki patterns to corpus database", promoted);
|
||||
Ok(promoted)
|
||||
}
|
||||
|
||||
/// Create a single corpus item from structured fields.
|
||||
///
|
||||
/// This function is used by the `aphoria corpus create` CLI command and by
|
||||
/// LLM-based extraction skills to programmatically add corpus items.
|
||||
///
|
||||
/// # Arguments
|
||||
///
|
||||
/// * `subject` - Hierarchical subject path (e.g., "ml/dependencies/basicsr/torchvision")
|
||||
/// * `predicate` - Predicate name (e.g., "incompatible_with", "requires")
|
||||
/// * `value` - Value as string (auto-detected as boolean, number, or text)
|
||||
/// * `explanation` - Full context and explanation for this claim
|
||||
/// * `authority` - Authority source (GitHub URL, paper citation, docs URL)
|
||||
/// * `category` - Category (compatibility, performance, security, architecture)
|
||||
/// * `tier` - Authority tier (0=regulatory, 1=clinical, 2=observational, 3=community)
|
||||
/// * `config` - Aphoria configuration
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// Corpus item ID in the format "corpus://{subject_uri}/{predicate}", where `subject_uri` is the scheme-qualified subject.
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
#[instrument(skip(config), fields(subject = %subject, tier = tier))]
|
||||
pub async fn create_corpus_item(
|
||||
subject: String,
|
||||
predicate: String,
|
||||
value: String,
|
||||
explanation: String,
|
||||
authority: String,
|
||||
category: String,
|
||||
tier: u8,
|
||||
config: &AphoriaConfig,
|
||||
) -> Result<String, AphoriaError> {
|
||||
use crate::episteme::create_authoritative_assertion_with_metadata;
|
||||
use stemedb_core::types::SourceClass;
|
||||
|
||||
// 1. Validate tier (0-3)
|
||||
let source_class = match tier {
|
||||
0 => SourceClass::Regulatory,
|
||||
1 => SourceClass::Clinical,
|
||||
2 => SourceClass::Observational,
|
||||
3 => SourceClass::Community,
|
||||
_ => {
|
||||
return Err(AphoriaError::Config(format!(
|
||||
"Invalid tier: {tier}. Must be 0-3"
|
||||
)))
|
||||
}
|
||||
};
|
||||
|
||||
// 2. Parse value into ObjectValue
|
||||
let object_value = parse_value_string(&value)?;
|
||||
|
||||
// 3. Infer URI scheme if not present
|
||||
let subject_uri = infer_subject_uri(&subject, tier, &authority)?;
|
||||
|
||||
// 4. Get project root and signing key
|
||||
let project_root = std::env::current_dir()?;
|
||||
let signing_key = bridge::load_or_generate_key(&project_root)?;
|
||||
|
||||
// 5. Get corpus database path from config
|
||||
let corpus_data_dir = config
|
||||
.episteme
|
||||
.corpus_data_dir
|
||||
.as_ref()
|
||||
.ok_or_else(|| AphoriaError::Config("corpus_data_dir not configured".into()))?;
|
||||
|
||||
// 6. Open corpus database
|
||||
let mut corpus_episteme =
|
||||
episteme::LocalEpisteme::open_corpus_db(corpus_data_dir, &project_root).await?;
|
||||
|
||||
// 7. Build metadata
|
||||
let metadata = serde_json::json!({
|
||||
"description": explanation,
|
||||
"authority_source": authority,
|
||||
"category": category,
|
||||
"source": "cli_create"
|
||||
});
|
||||
|
||||
// 8. Create signed assertion with URI-schemed subject
|
||||
let timestamp = current_timestamp();
|
||||
let assertion = create_authoritative_assertion_with_metadata(
|
||||
&signing_key,
|
||||
&subject_uri,
|
||||
&predicate,
|
||||
object_value,
|
||||
source_class,
|
||||
&explanation,
|
||||
timestamp,
|
||||
metadata,
|
||||
);
|
||||
|
||||
// 9. Serialize and store
|
||||
let serialized = stemedb_core::serde::serialize(&assertion)
|
||||
.map_err(|e| AphoriaError::Storage(format!("Failed to serialize assertion: {e}")))?;
|
||||
|
||||
// Store with subject index (use URI-schemed subject)
|
||||
let subject_key = format!("subject:{}", subject_uri);
|
||||
corpus_episteme
|
||||
.store()
|
||||
.put(subject_key.as_bytes(), &serialized)
|
||||
.await
|
||||
.map_err(|e| AphoriaError::Storage(format!("Failed to store: {e}")))?;
|
||||
|
||||
// Store with predicate index
|
||||
let pred_key = format!("predicate:corpus:{}", predicate);
|
||||
corpus_episteme
|
||||
.store()
|
||||
.put(pred_key.as_bytes(), &serialized)
|
||||
.await
|
||||
.map_err(|e| AphoriaError::Storage(format!("Failed to store predicate index: {e}")))?;
|
||||
|
||||
// 10. Shutdown and return
|
||||
corpus_episteme.shutdown().await;
|
||||
|
||||
info!(subject = %subject_uri, predicate = %predicate, tier = tier, "Created corpus item");
|
||||
Ok(format!("corpus://{}/{}", subject_uri, predicate))
|
||||
}
|
||||
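The key layout and return value of `create_corpus_item` are easier to see in a worked example. The sketch below is a simplified, storage-free stand-in, not the code in this commit: `infer_scheme`, `CorpusKeys`, and `corpus_keys` are hypothetical names, and the subject `rust/http/timeout` is an invented example. It reuses the same format strings as steps 9 and 10 above and the scheme precedence of `infer_subject_uri`, which is defined later in this file.

```rust
/// Simplified stand-in for `infer_subject_uri`: same precedence rules,
/// but infallible because this sketch has no error path.
fn infer_scheme(subject: &str, tier: u8, authority: &str) -> String {
    // A subject that already carries a scheme is returned unchanged.
    if subject.contains("://") {
        return subject.to_string();
    }
    let authority = authority.to_lowercase();
    let scheme = if authority.contains("rfc") {
        "rfc"
    } else if authority.contains("owasp") {
        "owasp"
    } else if authority.contains("cwe") {
        "cwe"
    } else if tier == 2 {
        "vendor"
    } else if tier == 3 {
        "community"
    } else {
        "corpus"
    };
    format!("{scheme}://{subject}")
}

/// The two index keys and the returned ID (hypothetical grouping for illustration).
struct CorpusKeys {
    subject_key: String,
    predicate_key: String,
    item_id: String,
}

/// Builds the keys with the same format strings used in `create_corpus_item`.
fn corpus_keys(subject: &str, predicate: &str, tier: u8, authority: &str) -> CorpusKeys {
    let uri = infer_scheme(subject, tier, authority);
    CorpusKeys {
        subject_key: format!("subject:{uri}"),
        predicate_key: format!("predicate:corpus:{predicate}"),
        item_id: format!("corpus://{uri}/{predicate}"),
    }
}
```

For a tier 3 item with subject `rust/http/timeout` and predicate `requires`, the returned ID is `corpus://community://rust/http/timeout/requires`: the inferred scheme is nested inside the `corpus://` prefix because the ID is formatted from the scheme-qualified subject.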
|
||||
/// Infer URI scheme from authority and tier.
|
||||
///
|
||||
/// If the subject already has a scheme (contains "://"), return as-is.
|
||||
/// Otherwise, infer scheme based on authority string and tier:
|
||||
/// - RFC authority → rfc://
|
||||
/// - OWASP authority → owasp://
|
||||
/// - CWE authority → cwe://
|
||||
/// - Tier 2 (observational) → vendor://
|
||||
/// - Tier 3 (community) → community://
|
||||
///
|
||||
/// # Examples
|
||||
///
|
||||
/// ```ignore
|
||||
/// assert_eq!(infer_subject_uri("tls/validation", 0, "RFC 5280").unwrap(), "rfc://tls/validation");
|
||||
/// assert_eq!(infer_subject_uri("xss/prevention", 1, "OWASP Top 10").unwrap(), "owasp://xss/prevention");
|
||||
/// assert_eq!(infer_subject_uri("rfc://already/schemed", 0, "RFC 9999").unwrap(), "rfc://already/schemed");
|
||||
/// ```
|
||||
fn infer_subject_uri(subject: &str, tier: u8, authority: &str) -> Result<String, AphoriaError> {
|
||||
// If already has scheme, return as-is
|
||||
if subject.contains("://") {
|
||||
return Ok(subject.to_string());
|
||||
}
|
||||
|
||||
// Infer scheme from authority and tier (case-insensitive matching)
|
||||
let authority_lower = authority.to_lowercase();
|
||||
let scheme = if authority_lower.contains("rfc") {
|
||||
"rfc"
|
||||
} else if authority_lower.contains("owasp") {
|
||||
"owasp"
|
||||
} else if authority_lower.contains("cwe") {
|
||||
"cwe"
|
||||
} else if tier == 2 {
|
||||
"vendor"
|
||||
} else if tier == 3 {
|
||||
"community"
|
||||
} else {
|
||||
// For tier 0 or 1 without recognized authority, use "corpus" as fallback
|
||||
"corpus"
|
||||
};
|
||||
|
||||
Ok(format!("{}://{}", scheme, subject))
|
||||
}
|
||||
|
||||
/// Parse value string into ObjectValue.
|
||||
///
|
||||
/// Attempts to parse as boolean, then number, then defaults to text.
|
||||
fn parse_value_string(value: &str) -> Result<stemedb_core::types::ObjectValue, AphoriaError> {
|
||||
use stemedb_core::types::ObjectValue;
|
||||
// Try boolean
|
||||
if value.eq_ignore_ascii_case("true") {
|
||||
return Ok(ObjectValue::Boolean(true));
|
||||
}
|
||||
if value.eq_ignore_ascii_case("false") {
|
||||
return Ok(ObjectValue::Boolean(false));
|
||||
}
|
||||
|
||||
// Try number
|
||||
if let Ok(n) = value.parse::<f64>() {
|
||||
return Ok(ObjectValue::Number(n));
|
||||
}
|
||||
|
||||
// Default to text
|
||||
Ok(ObjectValue::Text(value.to_string()))
|
||||
}
|
||||
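The boolean, number, text fallback order has a few edge cases that follow from how Rust's `f64` parser behaves. The sketch below mirrors `parse_value_string` against a local stand-in enum (the real `stemedb_core::types::ObjectValue` is assumed to have this shape) and drops the `Result` wrapper for brevity. `parse_value_finite` is a hypothetical stricter variant, not part of this commit, shown only to illustrate one way to keep `inf` and `NaN` as text.

```rust
/// Local stand-in for `stemedb_core::types::ObjectValue` (assumed shape).
#[derive(Debug, Clone, PartialEq)]
enum ObjectValue {
    Boolean(bool),
    Number(f64),
    Text(String),
}

/// Same detection order as `parse_value_string`: boolean, then number, then text.
fn parse_value(value: &str) -> ObjectValue {
    if value.eq_ignore_ascii_case("true") {
        return ObjectValue::Boolean(true);
    }
    if value.eq_ignore_ascii_case("false") {
        return ObjectValue::Boolean(false);
    }
    // `f64::from_str` accepts exponents ("1e3") and the special values
    // "inf" / "NaN", but rejects surrounding whitespace.
    if let Ok(n) = value.parse::<f64>() {
        return ObjectValue::Number(n);
    }
    ObjectValue::Text(value.to_string())
}

/// Hypothetical stricter variant: only finite numbers are treated as numbers.
fn parse_value_finite(value: &str) -> ObjectValue {
    match parse_value(value) {
        ObjectValue::Number(n) if !n.is_finite() => ObjectValue::Text(value.to_string()),
        other => other,
    }
}
```

With the detection order used in the diff, a value such as `inf` is stored as a number and a value with a leading space such as ` 42` is stored as text. Whether that matters depends on what callers pass for `value`.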
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_infer_subject_uri_rfc_authority() {
|
||||
// RFC authority should infer rfc:// scheme (case-insensitive)
|
||||
let result = infer_subject_uri("tls/validation", 0, "RFC 5280").unwrap();
|
||||
assert_eq!(result, "rfc://tls/validation");
|
||||
|
||||
let result = infer_subject_uri("tls/cipher_suites", 1, "rfc 8446").unwrap();
|
||||
assert_eq!(result, "rfc://tls/cipher_suites");
|
||||
|
||||
let result = infer_subject_uri("http/headers", 2, "Rfc 7231").unwrap();
|
||||
assert_eq!(result, "rfc://http/headers");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_subject_uri_owasp_authority() {
|
||||
// OWASP authority should infer owasp:// scheme (case-insensitive)
|
||||
let result = infer_subject_uri("xss/prevention", 0, "OWASP Top 10").unwrap();
|
||||
assert_eq!(result, "owasp://xss/prevention");
|
||||
|
||||
let result = infer_subject_uri("csrf/token", 1, "owasp cheat sheet").unwrap();
|
||||
assert_eq!(result, "owasp://csrf/token");
|
||||
|
||||
let result = infer_subject_uri("injection/sql", 2, "Owasp Guide").unwrap();
|
||||
assert_eq!(result, "owasp://injection/sql");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_subject_uri_cwe_authority() {
|
||||
// CWE authority should infer cwe:// scheme (case-insensitive)
|
||||
let result = infer_subject_uri("buffer/overflow", 0, "CWE-120").unwrap();
|
||||
assert_eq!(result, "cwe://buffer/overflow");
|
||||
|
||||
let result = infer_subject_uri("path/traversal", 1, "cwe-22").unwrap();
|
||||
assert_eq!(result, "cwe://path/traversal");
|
||||
|
||||
let result = infer_subject_uri("injection/command", 2, "Cwe-78").unwrap();
|
||||
assert_eq!(result, "cwe://injection/command");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_subject_uri_vendor_tier() {
|
||||
// Tier 2 (observational) should infer vendor:// scheme
|
||||
let result = infer_subject_uri("ml/dependencies", 2, "GitHub Issue #123").unwrap();
|
||||
assert_eq!(result, "vendor://ml/dependencies");
|
||||
|
||||
let result = infer_subject_uri("api/rate_limit", 2, "Vendor Documentation").unwrap();
|
||||
assert_eq!(result, "vendor://api/rate_limit");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_subject_uri_community_tier() {
|
||||
// Tier 3 (community) should infer community:// scheme
|
||||
let result = infer_subject_uri("best_practices/logging", 3, "Team Wiki").unwrap();
|
||||
assert_eq!(result, "community://best_practices/logging");
|
||||
|
||||
let result = infer_subject_uri("patterns/error_handling", 3, "Internal Docs").unwrap();
|
||||
assert_eq!(result, "community://patterns/error_handling");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_subject_uri_corpus_fallback() {
|
||||
// Tier 0 or 1 without recognized authority should use corpus:// fallback
|
||||
let result = infer_subject_uri("custom/subject", 0, "Unknown Authority").unwrap();
|
||||
assert_eq!(result, "corpus://custom/subject");
|
||||
|
||||
let result = infer_subject_uri("another/subject", 1, "Some Other Source").unwrap();
|
||||
assert_eq!(result, "corpus://another/subject");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_subject_uri_already_schemed() {
|
||||
// Subjects with existing schemes should be returned as-is
|
||||
let result = infer_subject_uri("rfc://already/schemed", 0, "RFC 9999").unwrap();
|
||||
assert_eq!(result, "rfc://already/schemed");
|
||||
|
||||
let result = infer_subject_uri("owasp://already/schemed", 1, "OWASP").unwrap();
|
||||
assert_eq!(result, "owasp://already/schemed");
|
||||
|
||||
let result = infer_subject_uri("custom://some/path", 2, "Vendor").unwrap();
|
||||
assert_eq!(result, "custom://some/path");
|
||||
|
||||
let result = infer_subject_uri("http://example.com/path", 3, "Community").unwrap();
|
||||
assert_eq!(result, "http://example.com/path");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_infer_subject_uri_authority_priority() {
|
||||
// Authority string takes priority over tier for scheme inference
|
||||
let result = infer_subject_uri("test/subject", 3, "RFC 1234").unwrap();
|
||||
assert_eq!(result, "rfc://test/subject"); // RFC wins over tier 3
|
||||
|
||||
let result = infer_subject_uri("test/subject", 2, "OWASP Guide").unwrap();
|
||||
assert_eq!(result, "owasp://test/subject"); // OWASP wins over tier 2
|
||||
|
||||
let result = infer_subject_uri("test/subject", 3, "CWE-999").unwrap();
|
||||
assert_eq!(result, "cwe://test/subject"); // CWE wins over tier 3
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_value_string_boolean() {
|
||||
use stemedb_core::types::ObjectValue;
|
||||
|
||||
// Test boolean parsing (case-insensitive)
|
||||
assert_eq!(
|
||||
parse_value_string("true").unwrap(),
|
||||
ObjectValue::Boolean(true)
|
||||
);
|
||||
assert_eq!(
|
||||
parse_value_string("TRUE").unwrap(),
|
||||
ObjectValue::Boolean(true)
|
||||
);
|
||||
assert_eq!(
|
||||
parse_value_string("false").unwrap(),
|
||||
ObjectValue::Boolean(false)
|
||||
);
|
||||
assert_eq!(
|
||||
parse_value_string("False").unwrap(),
|
||||
ObjectValue::Boolean(false)
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_value_string_number() {
|
||||
use stemedb_core::types::ObjectValue;
|
||||
|
||||
// Test number parsing
|
||||
assert_eq!(parse_value_string("42").unwrap(), ObjectValue::Number(42.0));
|
||||
assert_eq!(
|
||||
parse_value_string("3.14").unwrap(),
|
||||
ObjectValue::Number(3.14)
|
||||
);
|
||||
assert_eq!(
|
||||
parse_value_string("-100").unwrap(),
|
||||
ObjectValue::Number(-100.0)
|
||||
);
|
||||
assert_eq!(
|
||||
parse_value_string("0.0").unwrap(),
|
||||
ObjectValue::Number(0.0)
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_value_string_text() {
|
||||
use stemedb_core::types::ObjectValue;
|
||||
|
||||
// Test text parsing (fallback for non-boolean, non-number)
|
||||
assert_eq!(
|
||||
parse_value_string("hello world").unwrap(),
|
||||
ObjectValue::Text("hello world".to_string())
|
||||
);
|
||||
assert_eq!(
|
||||
parse_value_string("not_a_bool").unwrap(),
|
||||
ObjectValue::Text("not_a_bool".to_string())
|
||||
);
|
||||
assert_eq!(
|
||||
parse_value_string("1.2.3").unwrap(),
|
||||
ObjectValue::Text("1.2.3".to_string())
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
@ -42,6 +42,96 @@ pub struct LocalEpisteme {
|
||||
}
|
||||
|
||||
impl LocalEpisteme {
|
||||
/// Open corpus database (shared across projects).
|
||||
///
|
||||
/// This opens a separate database for corpus assertions (RFC, OWASP, etc.)
|
||||
/// stored in `~/.aphoria/corpus-db/` instead of the project-local database.
|
||||
#[instrument(fields(corpus_data_dir = ?corpus_data_dir))]
|
||||
pub async fn open_corpus_db(corpus_data_dir: &Path, project_root: &Path) -> Result<Self, AphoriaError> {
|
||||
// Expand tilde if present
|
||||
let corpus_path = if let Some(path_str) = corpus_data_dir.to_str() {
|
||||
if path_str.starts_with('~') {
|
||||
let expanded = shellexpand::tilde(path_str).into_owned();
|
||||
PathBuf::from(expanded)
|
||||
} else {
|
||||
corpus_data_dir.to_path_buf()
|
||||
}
|
||||
} else {
|
||||
corpus_data_dir.to_path_buf()
|
||||
};
|
||||
|
||||
// Create directory if it doesn't exist
|
||||
tokio::fs::create_dir_all(&corpus_path).await
|
||||
.map_err(AphoriaError::Io)?;
|
||||
|
||||
// Canonicalize (required by fjall/lsm-tree)
|
||||
let corpus_path = corpus_path.canonicalize().map_err(|e| {
|
||||
AphoriaError::Storage(format!("Failed to canonicalize corpus_data_dir: {}", e))
|
||||
})?;
|
||||
|
||||
let wal_dir = corpus_path.join("wal");
|
||||
std::fs::create_dir_all(&wal_dir)?;
|
||||
|
||||
info!("Opening corpus database at {}", corpus_path.display());
|
||||
|
||||
// Open WAL
|
||||
let journal = Arc::new(Mutex::new(Journal::open(&wal_dir).map_err(|e| {
|
||||
AphoriaError::Storage(format!("Failed to open corpus WAL at {}: {e}", wal_dir.display()))
|
||||
})?));
|
||||
|
||||
// Open store (directly at corpus_path, matching API behavior)
|
||||
let store = Arc::new(HybridStore::open(&corpus_path).map_err(|e| {
|
||||
AphoriaError::Storage(format!("Failed to open corpus store at {}: {e}", corpus_path.display()))
|
||||
})?);
|
||||
|
||||
// Create ingestor
|
||||
let mut ingestor = Ingestor::new(journal.clone(), store.clone())
|
||||
.await
|
||||
.map_err(|e| AphoriaError::Storage(format!("Failed to create corpus ingestor: {e}")))?;
|
||||
ingestor.start();
|
||||
|
||||
// Load or generate signing key (from project root)
|
||||
let signing_key = load_or_generate_key(project_root).map_err(|e| {
|
||||
AphoriaError::Storage(format!(
|
||||
"Failed to load/generate signing key at {}: {e}",
|
||||
project_root.display()
|
||||
))
|
||||
})?;
|
||||
|
||||
// Create stores
|
||||
let alias_store = GenericAliasStore::new(store.clone());
|
||||
let predicate_index_store = GenericPredicateIndexStore::new(store.clone());
|
||||
let pack_source_store = GenericPackSourceStore::new(store.clone());
|
||||
let predicate_alias_store = GenericPredicateAliasStore::new(store.clone());
|
||||
|
||||
// Load predicate aliases
|
||||
let stored_aliases = predicate_alias_store
|
||||
.list_all_predicate_aliases()
|
||||
.await
|
||||
.map_err(|e| AphoriaError::Storage(format!("Failed to load corpus predicate aliases: {e}")))?;
|
||||
let predicate_aliases: Vec<PredicateAliasSet> = stored_aliases
|
||||
.into_iter()
|
||||
.map(|s| PredicateAliasSet::new(s.canonical, s.aliases))
|
||||
.collect();
|
||||
|
||||
if !predicate_aliases.is_empty() {
|
||||
info!(count = predicate_aliases.len(), "Loaded predicate aliases from corpus storage");
|
||||
}
|
||||
|
||||
Ok(Self {
|
||||
journal,
|
||||
store,
|
||||
ingestor,
|
||||
signing_key,
|
||||
alias_store,
|
||||
predicate_index_store,
|
||||
pack_source_store,
|
||||
predicate_alias_store,
|
||||
predicate_aliases,
|
||||
project_root: project_root.to_path_buf(),
|
||||
})
|
||||
}
|
||||
|
||||
/// Open or create a local Episteme instance.
|
||||
#[instrument(skip(config), fields(data_dir = %config.episteme.data_dir.display()))]
|
||||
pub async fn open(config: &AphoriaConfig, project_root: &Path) -> Result<Self, AphoriaError> {
|
||||
@ -143,6 +233,11 @@ impl LocalEpisteme {
|
||||
self.signing_key.verifying_key().to_bytes()
|
||||
}
|
||||
|
||||
/// Get a reference to the signing key for creating assertions.
|
||||
pub fn signing_key(&self) -> &SigningKey {
|
||||
&self.signing_key
|
||||
}
|
||||
|
||||
/// Get a reference to the alias store for querying created aliases.
|
||||
#[allow(dead_code)]
|
||||
pub fn alias_store(&self) -> &GenericAliasStore<Arc<HybridStore>> {
|
||||
@ -169,7 +264,10 @@ impl LocalEpisteme {
|
||||
// Create registry with all builders including community (if enabled)
|
||||
// Note: GenericPredicateIndexStore doesn't implement Clone, so we create a new one
|
||||
let predicate_index = Arc::new(GenericPredicateIndexStore::new(self.store.clone()));
|
||||
let registry = CorpusRegistry::with_stores(config, self.store.clone(), predicate_index);
|
||||
|
||||
// No corpus_store here - CLI-created items are only needed in explicit corpus builds,
|
||||
// not during scans (which use project-local episteme)
|
||||
let registry = CorpusRegistry::with_stores(config, self.store.clone(), predicate_index, None);
|
||||
|
||||
let timestamp = current_timestamp();
|
||||
|
||||
|
||||
@ -88,5 +88,37 @@ pub async fn handle_corpus_command(command: CorpusCommands, config: &AphoriaConf
|
||||
}
|
||||
ExitCode::SUCCESS
|
||||
}
|
||||
|
||||
CorpusCommands::Create {
|
||||
subject,
|
||||
predicate,
|
||||
value,
|
||||
explanation,
|
||||
authority,
|
||||
category,
|
||||
tier,
|
||||
} => {
|
||||
match aphoria::create_corpus_item(
|
||||
subject,
|
||||
predicate,
|
||||
value,
|
||||
explanation,
|
||||
authority,
|
||||
category,
|
||||
tier,
|
||||
config,
|
||||
)
|
||||
.await
|
||||
{
|
||||
Ok(corpus_id) => {
|
||||
println!("Created corpus item: {}", corpus_id);
|
||||
ExitCode::SUCCESS
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Error creating corpus item: {e}");
|
||||
ExitCode::from(3)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -107,8 +107,8 @@ pub use config::{
|
||||
};
|
||||
pub use corpus::{CorpusBuildResult, CorpusBuilderInfo, CorpusRegistry};
|
||||
pub use corpus_build::{
|
||||
build_corpus, export_corpus_as_pack, import_corpus_from_wiki, list_corpus_sources,
|
||||
CorpusBuildArgs,
|
||||
build_corpus, create_corpus_item, export_corpus_as_pack, import_corpus_from_wiki,
|
||||
list_corpus_sources, CorpusBuildArgs,
|
||||
};
|
||||
pub use coverage::{
|
||||
compute_coverage, compute_coverage_from_report, format_coverage_json, format_coverage_markdown,
|
||||
|
||||
140
applications/aphoria/tests/scale_adaptive_test.rs
Normal file
@ -0,0 +1,140 @@
|
||||
//! Integration tests for scale-adaptive promotion thresholds.
|
||||
//!
|
||||
//! Verifies that promotion criteria automatically adjust based on organization size,
|
||||
//! enabling small teams to see value immediately while maintaining quality gates
|
||||
//! for larger organizations.
|
||||
|
||||
use aphoria::corpus::thresholds::{PromotionDecision, ScaleAdaptiveThresholds, ScaleTier};
|
||||
use stemedb_core::types::SourceClass;
|
||||
|
||||
#[test]
|
||||
fn test_micro_team_sees_patterns() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// Micro team with 3 projects, pattern appears in 2
|
||||
let decision = thresholds.evaluate(
|
||||
2, // project_count
|
||||
3, // total_projects
|
||||
false, // no authority
|
||||
None,
|
||||
);
|
||||
|
||||
// With adaptive thresholds:
|
||||
// - Scale tier: Micro (1-5 projects)
|
||||
// - Emerging min_projects: max(2, 0.50*3) = max(2, 1.5) = 2
|
||||
// - Adoption rate: 2/3 = 67% >= 50%
|
||||
// Should require review (emerging tier)
|
||||
assert_eq!(decision, PromotionDecision::RequireReview);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_micro_team_regulatory_disabled() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// Micro team with 5 projects, pattern appears in all 5 with RFC match
|
||||
let decision = thresholds.evaluate(
|
||||
5, // project_count
|
||||
5, // total_projects
|
||||
true, // has authority
|
||||
Some("rfc://1234"), // RFC scheme
|
||||
);
|
||||
|
||||
// Regulatory tier is disabled for micro teams
|
||||
// Should fall through to emerging tier
|
||||
assert_eq!(decision, PromotionDecision::RequireReview);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_small_team_enables_all_tiers() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// Small team with 10 projects, pattern in 9 with RFC match
|
||||
let decision = thresholds.evaluate(
|
||||
9, // project_count
|
||||
10, // total_projects
|
||||
true, // has authority
|
||||
Some("rfc://5246"), // RFC scheme
|
||||
);
|
||||
|
||||
// Small tier regulatory: max(5, 0.90*10) = max(5, 9) = 9
|
||||
// Adoption rate: 9/10 = 90% >= 90%
|
||||
// Should auto-promote to regulatory
|
||||
assert_eq!(
|
||||
decision,
|
||||
PromotionDecision::AutoPromote(SourceClass::Regulatory)
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_enterprise_maintains_strict_thresholds() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// Enterprise with 1000 projects, pattern in 950 with RFC match
|
||||
let decision = thresholds.evaluate(
|
||||
950, // project_count
|
||||
1000, // total_projects
|
||||
true, // has authority
|
||||
Some("rfc://9110"), // RFC scheme
|
||||
);
|
||||
|
||||
// Enterprise tier: max(100, 0.95*1000) = max(100, 950) = 950
|
||||
// Adoption rate: 950/1000 = 95% >= 95%
|
||||
// Should auto-promote to regulatory (backward compatible behavior)
|
||||
assert_eq!(
|
||||
decision,
|
||||
PromotionDecision::AutoPromote(SourceClass::Regulatory)
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_scale_tier_progression() {
|
||||
// Verify scale tier boundaries
|
||||
assert_eq!(ScaleTier::from_total_projects(1), ScaleTier::Micro);
|
||||
assert_eq!(ScaleTier::from_total_projects(5), ScaleTier::Micro);
|
||||
assert_eq!(ScaleTier::from_total_projects(6), ScaleTier::Small);
|
||||
assert_eq!(ScaleTier::from_total_projects(25), ScaleTier::Small);
|
||||
assert_eq!(ScaleTier::from_total_projects(26), ScaleTier::Medium);
|
||||
assert_eq!(ScaleTier::from_total_projects(100), ScaleTier::Medium);
|
||||
assert_eq!(ScaleTier::from_total_projects(101), ScaleTier::Large);
|
||||
assert_eq!(ScaleTier::from_total_projects(500), ScaleTier::Large);
|
||||
assert_eq!(ScaleTier::from_total_projects(501), ScaleTier::Enterprise);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_adaptive_floor_prevents_noise() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// Micro team with 3 projects, pattern appears in only 1
|
||||
let decision = thresholds.evaluate(
|
||||
1, // project_count
|
||||
3, // total_projects
|
||||
false, // no authority
|
||||
None,
|
||||
);
|
||||
|
||||
// 1/3 = 33% is below the 50% adoption rate, and 1 project is also below
|
||||
// the floor of max(2, 0.50*3) = 2, which prevents single-project noise
|
||||
// Should be skipped
|
||||
assert_eq!(decision, PromotionDecision::Skip);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_medium_team_clinical_tier() {
|
||||
let thresholds = ScaleAdaptiveThresholds::default();
|
||||
|
||||
// Medium team with 50 projects, pattern in 38 with OWASP match
|
||||
let decision = thresholds.evaluate(
|
||||
38, // project_count
|
||||
50, // total_projects
|
||||
true, // has authority
|
||||
Some("owasp://top-10/a01"), // OWASP scheme
|
||||
);
|
||||
|
||||
// Medium tier clinical: max(10, 0.75*50) = max(10, 37.5) = 38
|
||||
// Adoption rate: 38/50 = 76% >= 75%
|
||||
// Should auto-promote to clinical
|
||||
assert_eq!(
|
||||
decision,
|
||||
PromotionDecision::AutoPromote(SourceClass::Clinical)
|
||||
);
|
||||
}
|
||||
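The arithmetic in the test comments above follows one pattern: the required project count is the larger of a fixed floor and a rounded-up percentage of the total. The sketch below reproduces that pattern and the tier boundaries asserted in `test_scale_tier_progression`. It is not the implementation in `aphoria::corpus::thresholds`, which this section does not show. `tier_for` and `min_projects` are hypothetical names, the rate is passed as an integer percent to avoid floating-point rounding, and treating zero projects as `Micro` is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScaleTier {
    Micro,
    Small,
    Medium,
    Large,
    Enterprise,
}

/// Tier boundaries as asserted by the tests (0 mapped to Micro by assumption).
fn tier_for(total_projects: usize) -> ScaleTier {
    match total_projects {
        0..=5 => ScaleTier::Micro,
        6..=25 => ScaleTier::Small,
        26..=100 => ScaleTier::Medium,
        101..=500 => ScaleTier::Large,
        _ => ScaleTier::Enterprise,
    }
}

/// max(floor, ceil(rate_percent% of total_projects)), using integer ceiling division.
fn min_projects(floor: usize, rate_percent: usize, total_projects: usize) -> usize {
    let by_rate = (rate_percent * total_projects + 99) / 100;
    floor.max(by_rate)
}
```

For example, the medium-team clinical case is `min_projects(10, 75, 50)`, which rounds 37.5 up to 38, matching the `max(10, 0.75*50) = 38` comment in `test_medium_team_clinical_tier`.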
@ -26,6 +26,7 @@ axum = { version = "0.7", features = ["json"] }
|
||||
tokio = { version = "1", features = ["full"] }
|
||||
serde = { version = "1", features = ["derive"] }
|
||||
serde_json = "1"
|
||||
serde_qs = "0.13"
|
||||
utoipa = { version = "5", features = ["axum_extras"] }
|
||||
utoipa-axum = "0.1"
|
||||
utoipa-swagger-ui = { version = "8", features = ["axum"] }
|
||||
|
||||
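The `serde_qs` dependency added above exists so the API can read array parameters sent in bracket notation (`?sources[]=a&sources[]=b`). As a rough illustration of what that notation means on the wire, the standard-library-only sketch below collects the values for one bracketed key. It is not how `serde_qs` works internally. `bracket_values` is a hypothetical helper, and it deliberately skips percent-decoding of values.

```rust
/// Collects every value sent for `key` in bracket notation (`key[]=a&key[]=b`).
/// Accepts raw brackets and their percent-encoded forms (`%5B%5D` / `%5b%5d`).
/// Values are returned as-is, without percent-decoding.
fn bracket_values(query: &str, key: &str) -> Vec<String> {
    let raw = format!("{key}[]");
    let encoded_upper = format!("{key}%5B%5D");
    let encoded_lower = format!("{key}%5b%5d");
    // Tolerate a leading '?' so full query strings can be passed in.
    let query = query.strip_prefix('?').unwrap_or(query);
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .filter(|(k, _)| *k == raw || *k == encoded_upper || *k == encoded_lower)
        .map(|(_, v)| v.to_string())
        .collect()
}
```

A plain `sources=rfc` pair is intentionally not matched here, which shows why a parser that only knows plain keys sees no `sources` field at all when a client sends `sources[]`.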
@ -303,3 +303,31 @@ pub struct AcknowledgeViolationRequest {
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub expires_at: Option<String>,
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Corpus Endpoint DTOs
|
||||
// ============================================================================
|
||||
|
||||
/// Request to get corpus items from authoritative sources.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
|
||||
pub struct GetCorpusRequest {
|
||||
/// Filter by source schemes (e.g., ["rfc", "owasp", "community"]).
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub sources: Option<Vec<String>>,
|
||||
|
||||
/// Filter by category (e.g., "security", "architecture").
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub category: Option<String>,
|
||||
|
||||
/// Maximum number of items to return (default: 100).
|
||||
#[serde(default = "default_corpus_limit")]
|
||||
pub limit: usize,
|
||||
|
||||
/// Pagination offset (default: 0).
|
||||
#[serde(default)]
|
||||
pub offset: usize,
|
||||
}
|
||||
|
||||
fn default_corpus_limit() -> usize {
|
||||
100
|
||||
}
|
||||
|
||||
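The `sources`, `limit`, and `offset` fields of `GetCorpusRequest` suggest a filter-then-paginate flow. The corpus handler itself is not shown in this section, so the sketch below is an assumption about how such a query could behave, not the handler's code. `matches_source` and `page` are hypothetical helpers, the sample subjects are invented, and treating an empty source list as "match everything" is also an assumption.

```rust
/// True when `subject` starts with one of the requested schemes, e.g. "community://".
/// An empty `sources` list matches every subject (assumed behavior).
fn matches_source(subject: &str, sources: &[String]) -> bool {
    sources.is_empty()
        || sources
            .iter()
            .any(|s| subject.starts_with(format!("{s}://").as_str()))
}

/// Filters by source scheme, then applies `offset` and `limit`.
/// Returns the page of subjects plus the total number that matched before paging.
fn page<'a>(
    subjects: &[&'a str],
    sources: &[String],
    limit: usize,
    offset: usize,
) -> (Vec<&'a str>, usize) {
    let matching: Vec<&'a str> = subjects
        .iter()
        .copied()
        .filter(|s| matches_source(s, sources))
        .collect();
    let total_matching = matching.len();
    let items = matching.into_iter().skip(offset).take(limit).collect();
    (items, total_matching)
}
```

Matching on the whole scheme (`community://`) rather than a narrower path such as `community://pattern/` is what lets a subject like `community://rust/http/timeout` show up in results, which is the prefix change described in the commit message.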
@ -270,3 +270,22 @@ pub struct AcknowledgeViolationResponse {
|
||||
/// Status message.
|
||||
pub message: String,
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Corpus Endpoint DTOs
|
||||
// ============================================================================
|
||||
|
||||
use super::types::CorpusItemDto;
|
||||
|
||||
/// Response containing corpus items from authoritative sources.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
|
||||
pub struct GetCorpusResponse {
|
||||
/// The corpus items matching the query.
|
||||
pub items: Vec<CorpusItemDto>,
|
||||
|
||||
/// Total number of items matching (before limit applied).
|
||||
pub total_matching: usize,
|
||||
|
||||
/// Sources included in this response.
|
||||
pub sources_included: Vec<String>,
|
||||
}
|
||||
|
||||
@ -490,3 +490,39 @@ pub struct CoverageSummaryDto {
|
||||
/// Number of modules with zero claims.
|
||||
pub modules_without_claims: usize,
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Corpus Types
|
||||
// ============================================================================
|
||||
|
||||
/// A single corpus item (authoritative assertion from RFC/OWASP/Community).
|
||||
///
|
||||
/// Unlike PatternDto (which shows statistical aggregates), CorpusItemDto
|
||||
/// represents valuable best practices from trusted sources.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
|
||||
pub struct CorpusItemDto {
|
||||
/// The subject path (e.g., "rfc://9110/methods/GET", "owasp://a03/tls/version").
|
||||
pub subject: String,
|
||||
|
||||
/// The predicate (e.g., "case_sensitive", "min_version").
|
||||
pub predicate: String,
|
||||
|
||||
/// Display value (e.g., "true", "TLS 1.2").
|
||||
pub value: String,
|
||||
|
||||
/// Source identifier (e.g., "rfc://9110", "owasp://a03", "community://pattern/xyz").
|
||||
pub source: String,
|
||||
|
||||
/// Authority tier (0-3: Regulatory=0, Clinical=1, Observational=2, Community=3).
|
||||
pub tier: u8,
|
||||
|
||||
/// Optional category (e.g., "security", "architecture", "performance").
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub category: Option<String>,
|
||||
|
||||
/// Human-readable explanation of the best practice.
|
||||
pub explanation: String,
|
||||
|
||||
/// Authority source citation (e.g., "RFC 9110 Section 9.1", "OWASP A03:2021").
|
||||
pub authority_source: String,
|
||||
}
|
||||
|
||||
187
crates/stemedb-api/src/extractors.rs
Normal file
@ -0,0 +1,187 @@
//! Custom axum extractors for the StemeDB API.

use axum::{
    async_trait,
    extract::FromRequestParts,
    http::{request::Parts, StatusCode},
    response::{IntoResponse, Response},
};
use serde::de::DeserializeOwned;
use std::fmt;

/// Rejection type for QsQuery extraction failures.
#[derive(Debug)]
pub struct QsQueryRejection {
    message: String,
}

impl fmt::Display for QsQueryRejection {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Failed to deserialize query string: {}", self.message)
    }
}

impl std::error::Error for QsQueryRejection {}

impl IntoResponse for QsQueryRejection {
    fn into_response(self) -> Response {
        (StatusCode::BAD_REQUEST, self.message).into_response()
    }
}

/// Query string extractor that supports bracket notation (e.g., `?sources[]=value1&sources[]=value2`).
///
/// This extractor uses `serde_qs` instead of `serde_urlencoded` to properly handle
/// array parameters with bracket notation, the convention the StemeDB Dashboard
/// follows when it builds query strings with JavaScript's `URLSearchParams`.
///
/// # When to Use QsQuery vs Query
///
/// **Use `QsQuery` when:**
/// - Your request DTO contains `Vec<T>` or `Option<Vec<T>>` fields
/// - The endpoint is called by the dashboard or JavaScript clients
/// - You need bracket notation support: `?filters[]=a&filters[]=b`
///
/// **Use standard `axum::extract::Query` when:**
/// - All query parameters are scalars (String, usize, bool, Option<String>, etc.)
/// - No array/vector parameters needed
/// - Simpler and lighter weight for non-array cases
///
/// # Example
///
/// ```rust,ignore
/// use stemedb_api::extractors::QsQuery;
/// use serde::Deserialize;
///
/// #[derive(Deserialize)]
/// struct MyRequest {
///     sources: Option<Vec<String>>, // Array parameter
///     limit: usize,                 // Scalar parameter
/// }
///
/// // ✅ Correct - QsQuery handles both array and scalar params
/// async fn handler(QsQuery(params): QsQuery<MyRequest>) {
///     // Dashboard sends: ?sources[]=rfc&sources[]=community&limit=10
///     // params.sources = Some(vec!["rfc", "community"])
///     // params.limit = 10
/// }
///
/// // ❌ Wrong - standard Query can't parse bracket notation
/// async fn wrong_handler(Query(params): Query<MyRequest>) {
///     // Dashboard sends: ?sources[]=rfc&sources[]=community
///     // Result: params.sources = None (silently fails!)
/// }
/// ```
///
/// # Dashboard Compatibility
///
/// The StemeDB Dashboard calls JavaScript's `URLSearchParams.append()` with
/// `[]`-suffixed keys, which produces bracket notation for arrays:
///
/// ```javascript
/// // Dashboard code
/// params.sources.forEach(s => searchParams.append("sources[]", s));
/// // Generates: ?sources[]=rfc&sources[]=owasp&sources[]=community
/// ```
///
/// If you use standard `Query` for array parameters, the dashboard filters will appear
/// to work but silently fail (returning all results instead of filtered results).
#[derive(Debug, Clone, Copy, Default)]
pub struct QsQuery<T>(pub T);

#[async_trait]
impl<T, S> FromRequestParts<S> for QsQuery<T>
where
    T: DeserializeOwned,
    S: Send + Sync,
{
    type Rejection = QsQueryRejection;

    async fn from_request_parts(parts: &mut Parts, _state: &S) -> Result<Self, Self::Rejection> {
        let query = parts.uri.query().unwrap_or_default();
        let value = serde_qs::from_str(query).map_err(|err| QsQueryRejection {
            message: err.to_string(),
        })?;
        Ok(QsQuery(value))
    }
}

impl<T> std::ops::Deref for QsQuery<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl<T> std::ops::DerefMut for QsQuery<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use axum::http::{Request, Uri};
    use serde::Deserialize;

    #[derive(Debug, Deserialize, PartialEq)]
    struct TestParams {
        sources: Option<Vec<String>>,
        limit: Option<usize>,
    }

    #[tokio::test]
    async fn test_bracket_notation() {
        let uri: Uri = "http://example.com?sources[]=rfc&sources[]=community&limit=10"
            .parse()
            .unwrap();
        let mut parts = Request::builder().uri(uri).body(()).unwrap().into_parts().0;

        let QsQuery(params): QsQuery<TestParams> =
            QsQuery::from_request_parts(&mut parts, &()).await.unwrap();

        assert_eq!(
            params,
            TestParams {
                sources: Some(vec!["rfc".to_string(), "community".to_string()]),
                limit: Some(10),
            }
        );
    }

    #[tokio::test]
    async fn test_no_brackets() {
        let uri: Uri = "http://example.com?limit=5".parse().unwrap();
        let mut parts = Request::builder().uri(uri).body(()).unwrap().into_parts().0;

        let QsQuery(params): QsQuery<TestParams> =
            QsQuery::from_request_parts(&mut parts, &()).await.unwrap();

        assert_eq!(
            params,
            TestParams {
                sources: None,
                limit: Some(5),
            }
        );
    }

    #[tokio::test]
    async fn test_empty_query() {
        let uri: Uri = "http://example.com".parse().unwrap();
        let mut parts = Request::builder().uri(uri).body(()).unwrap().into_parts().0;

        let QsQuery(params): QsQuery<TestParams> =
            QsQuery::from_request_parts(&mut parts, &()).await.unwrap();

        assert_eq!(
            params,
            TestParams {
                sources: None,
                limit: None,
            }
        );
    }
}
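To make the bracket-notation behaviour concrete, here is a minimal, dependency-free sketch of the grouping idea. It is not how `serde_qs` is implemented: `group_query_pairs` is a hypothetical helper that splits a raw query string, strips a trailing `[]` from each key, and collects repeated keys into a vector. It assumes the input is not percent-encoded.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of `serde_qs` or this crate): groups raw
/// query pairs by key, treating `key[]` and `key` as the same array key.
/// Assumes the query string is not percent-encoded.
pub fn group_query_pairs(query: &str) -> BTreeMap<String, Vec<String>> {
    let mut grouped: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        // A pair without `=` is treated as a key with an empty value.
        let (raw_key, value) = pair.split_once('=').unwrap_or((pair, ""));
        // Strip the bracket suffix so `sources[]` is grouped under `sources`.
        let key = raw_key.strip_suffix("[]").unwrap_or(raw_key);
        grouped
            .entry(key.to_string())
            .or_default()
            .push(value.to_string());
    }
    grouped
}
```

A parser that does not strip the suffix would file the values under the literal key `sources[]`, which is why a struct field named `sources` stays empty with the default extractor.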
182
crates/stemedb-api/src/handlers/aphoria/corpus.rs
Normal file
@ -0,0 +1,182 @@
//! Corpus query handler for Aphoria.
//!
//! This endpoint returns authoritative assertions from RFC, OWASP, and Community
//! corpus sources - valuable best practices rather than statistical aggregates.

use axum::{extract::State, Json};
use stemedb_core::types::{ObjectValue, SourceClass};
use stemedb_storage::KVStore;
use tracing::instrument;

use crate::{
    dto::aphoria::{CorpusItemDto, GetCorpusRequest, GetCorpusResponse},
    error::{ApiError, Result},
    extractors::QsQuery,
    state::AppState,
};

/// Get corpus items from authoritative sources (RFC, OWASP, vendor, community patterns, and CLI-created items).
///
/// Unlike the `/patterns` endpoint (which returns statistical aggregates),
/// this endpoint returns valuable, curated best practices from trusted sources.
#[utoipa::path(
    get,
    path = "/v1/aphoria/corpus",
    params(
        ("sources" = Option<Vec<String>>, Query, description = "Filter by source schemes (rfc, owasp, community, vendor)"),
        ("category" = Option<String>, Query, description = "Filter by category (security, architecture, etc.)"),
        ("limit" = usize, Query, description = "Maximum items to return (default: 100)"),
        ("offset" = usize, Query, description = "Pagination offset (default: 0)"),
    ),
    responses(
        (status = 200, description = "Corpus items retrieved successfully", body = GetCorpusResponse),
        (status = 400, description = "Invalid request", body = crate::dto::ErrorResponse),
        (status = 500, description = "Internal server error", body = crate::dto::ErrorResponse),
    ),
    tag = "aphoria"
)]
#[instrument(skip_all, fields(sources = ?params.sources, limit = params.limit, offset = params.offset))]
pub async fn get_corpus(
    State(state): State<AppState>,
    QsQuery(params): QsQuery<GetCorpusRequest>,
) -> Result<Json<GetCorpusResponse>> {
    // Determine which source prefixes to query
    let source_prefixes = if let Some(sources) = &params.sources {
        sources
            .iter()
            .map(|s| match s.as_str() {
                "rfc" => "rfc://",
                "owasp" => "owasp://",
                "community" => "community://",
                "vendor" => "vendor://",
                _ => s.as_str(),
            })
            .collect::<Vec<_>>()
    } else {
        // Default: query all authoritative sources
        vec!["rfc://", "owasp://", "community://", "vendor://"]
    };

    let mut all_items = Vec::new();
    let mut sources_included = std::collections::HashSet::new();

    // Query each source prefix
    for prefix in source_prefixes {
        let prefix_key = format!("subject:{}", prefix);
        let pairs = state
            .corpus_store
            .scan_prefix(prefix_key.as_bytes())
            .await
            .map_err(|e| ApiError::Internal(format!("Failed to scan corpus: {}", e)))?;

        for (_key, value) in pairs {
            // Deserialize assertion
            let assertion: stemedb_core::types::Assertion =
                stemedb_core::serde::deserialize(&value)
                    .map_err(|e| ApiError::Internal(format!("Failed to deserialize assertion: {}", e)))?;

            // Extract metadata
            let metadata: Option<serde_json::Value> = assertion
                .source_metadata
                .as_ref()
                .and_then(|bytes| serde_json::from_slice(bytes).ok());

            let explanation = metadata
                .as_ref()
                .and_then(|m| m.get("description"))
                .and_then(|v| v.as_str())
                .unwrap_or("No description")
                .to_string();

            let category = metadata
                .as_ref()
                .and_then(|m| m.get("category"))
                .and_then(|v| v.as_str())
                .map(|s| s.to_string());

            let authority_source = metadata
                .as_ref()
                .and_then(|m| m.get("authority_source"))
                .and_then(|v| v.as_str())
                .or_else(|| {
                    // Fallback: extract from subject
                    if assertion.subject.starts_with("rfc://") {
                        Some("RFC")
                    } else if assertion.subject.starts_with("owasp://") {
                        Some("OWASP")
                    } else if assertion.subject.starts_with("community://") {
                        Some("Community")
                    } else if assertion.subject.starts_with("vendor://") {
                        Some("Vendor")
                    } else {
                        Some("Unknown")
                    }
                })
                .unwrap_or("Unknown")
                .to_string();

            // Filter by category if requested
            if let Some(ref filter_category) = params.category {
                if category.as_deref() != Some(filter_category.as_str()) {
                    continue;
                }
            }

            // Extract source scheme
            let source = if let Some(pos) = assertion.subject.find("://") {
                let scheme_end = assertion.subject[..pos].to_string();
                format!("{}://", scheme_end)
            } else {
                assertion.subject.clone()
            };

            sources_included.insert(source.clone());

            // Convert object to display value
            let value = match &assertion.object {
                ObjectValue::Boolean(b) => b.to_string(),
                ObjectValue::Number(n) => n.to_string(),
                ObjectValue::Text(s) => s.clone(),
                ObjectValue::Reference(r) => r.clone(),
            };

            // Map SourceClass to tier number
            let tier = match assertion.source_class {
                SourceClass::Regulatory => 0,
                SourceClass::Clinical => 1,
                SourceClass::Observational => 2,
                SourceClass::Expert => 3,
                SourceClass::Community => 4,
                SourceClass::Anecdotal => 5,
                SourceClass::TeamPolicy => 1, // Treat team policy similar to clinical
            };

            all_items.push(CorpusItemDto {
                subject: assertion.subject,
                predicate: assertion.predicate,
                value,
                source,
                tier,
                category,
                explanation,
                authority_source,
            });
        }
    }

    // Apply pagination
    let total_matching = all_items.len();
    let items: Vec<CorpusItemDto> =
        all_items.into_iter().skip(params.offset).take(params.limit).collect();

    let sources_included: Vec<String> = sources_included.into_iter().collect();

    tracing::info!(
        total_matching,
        returned = items.len(),
        sources = sources_included.len(),
        "Corpus query complete"
    );

    Ok(Json(GetCorpusResponse { items, total_matching, sources_included }))
}
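The three small transformations in the handler (short name to scan prefix, subject to scheme, and offset/limit pagination) can be exercised in isolation. The sketch below mirrors that logic as free functions; the names `source_prefix`, `source_scheme`, and `paginate` are hypothetical and do not exist in the crate, and the example subject `community://rust/http/timeouts` is made up for illustration.

```rust
/// Maps a short source name to its subject prefix, mirroring the handler's
/// `match`; unknown names pass through unchanged.
pub fn source_prefix(name: &str) -> &str {
    match name {
        "rfc" => "rfc://",
        "owasp" => "owasp://",
        "community" => "community://",
        "vendor" => "vendor://",
        other => other,
    }
}

/// Returns the scheme portion of a subject including `://`, or the whole
/// subject when no scheme separator is present.
pub fn source_scheme(subject: &str) -> String {
    match subject.find("://") {
        Some(pos) => format!("{}://", &subject[..pos]),
        None => subject.to_string(),
    }
}

/// Applies offset/limit pagination and reports the pre-pagination total.
pub fn paginate<T>(items: Vec<T>, offset: usize, limit: usize) -> (Vec<T>, usize) {
    let total = items.len();
    let page = items.into_iter().skip(offset).take(limit).collect();
    (page, total)
}
```

Because the scan is a prefix match, the broader `community://` prefix also covers every subject that the narrower `community://pattern/` prefix matched, which is why the change described in the commit message is backward compatible.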
@ -5,9 +5,11 @@
//! - `policy` - Trust pack import/export and blessing handlers
//! - `scan` - Project scanning handlers
//! - `report` - Observation reporting and pattern query handlers
//! - `corpus` - Authoritative corpus query handlers

// Make submodules crate-visible so utoipa path structs can be accessed
pub(crate) mod claims;
pub(crate) mod corpus;
pub(crate) mod policy;
pub(crate) mod report;
pub(crate) mod scan;
@ -17,6 +19,7 @@ pub use claims::{
    acknowledge_violation, coverage, create_claim, deprecate_claim, list_claims, update_claim,
    verify_claims_handler,
};
pub use corpus::get_corpus;
pub use policy::{bless, export_policy, import_policy};
pub use report::{get_patterns, push_community_observations, push_observations};
pub use scan::{list_scans, scan};
@ -78,6 +78,6 @@ pub use metrics::metrics_handler;
#[cfg(feature = "aphoria")]
pub use aphoria::{
    acknowledge_violation, bless, coverage, create_claim, deprecate_claim, export_policy,
    get_patterns, import_policy, list_claims, list_scans, push_community_observations,
    get_corpus, get_patterns, import_policy, list_claims, list_scans, push_community_observations,
    push_observations, scan, update_claim, verify_claims_handler,
};
@ -204,7 +204,7 @@ mod tests {
        let store =
            std::sync::Arc::new(HybridStore::open(&store_path).expect("failed to open store"));

        let state = AppState::new(write_journal, read_journal, store);
        let state = AppState::new(write_journal, read_journal, store, None);

        let app = axum::Router::new()
            .route("/v1/source", axum::routing::post(store_source))
@ -41,7 +41,7 @@ async fn test_app() -> TestContext {
    let read_journal = Journal::open(&wal_path).expect("failed to open read journal");
    let store = std::sync::Arc::new(HybridStore::open(&store_path).expect("failed to open store"));

    let state = AppState::new(write_journal, read_journal, store);
    let state = AppState::new(write_journal, read_journal, store, None);

    let app = Router::new()
        .route("/v1/sources", post(register_source))
@ -23,7 +23,7 @@
//! ```ignore
//! use stemedb_api::{create_router, AppState};
//!
//! let state = AppState::new(write_journal, read_journal, store);
//! let state = AppState::new(write_journal, read_journal, store, None);
//! let app = create_router(state);
//!
//! axum::Server::bind(&addr).serve(app.into_make_service()).await?;
@ -32,6 +32,7 @@
pub mod bootstrap;
pub mod dto;
pub mod error;
pub mod extractors;
pub mod handlers;
pub mod hex;
pub mod middleware;
@ -312,6 +313,7 @@ mod aphoria_openapi {
    use super::*;

    // Re-export the path items for OpenAPI from the submodules
    use handlers::aphoria::corpus::__path_get_corpus;
    use handlers::aphoria::policy::{__path_bless, __path_export_policy, __path_import_policy};
    use handlers::aphoria::report::__path_push_observations;
    use handlers::aphoria::scan::__path_scan;
@ -324,6 +326,7 @@ mod aphoria_openapi {
            import_policy,
            scan,
            push_observations,
            get_corpus,
        ),
        components(
            schemas(
@ -346,6 +349,9 @@ mod aphoria_openapi {
                dto::aphoria::ObservationDto,
                dto::aphoria::ObservationValueDto,
                dto::aphoria::ObservationSignatureDto,
                dto::aphoria::GetCorpusRequest,
                dto::aphoria::GetCorpusResponse,
                dto::aphoria::CorpusItemDto,
            )
        ),
        tags(
@ -15,6 +15,7 @@
//! | `STEMEDB_DB_DIR` | `data/db` | Directory for KV store |
//! | `STEMEDB_BIND_ADDR` | `127.0.0.1:18180` | HTTP server bind address |
//! | `STEMEDB_METER_ENABLED` | `true` | Enable economic throttling |
//! | `STEMEDB_CORPUS_DB_DIR` | (none) | Optional: Directory for Aphoria corpus DB |

use std::path::PathBuf;
use std::sync::Arc;
@ -42,6 +43,9 @@ struct Config {
    /// Enable economic throttling (The Meter)
    meter_enabled: bool,

    /// Optional corpus database directory (for Aphoria corpus)
    corpus_db_dir: Option<PathBuf>,
}

impl Default for Config {
@ -51,6 +55,7 @@ impl Default for Config {
            db_dir: PathBuf::from("data/db"),
            bind_addr: "127.0.0.1:18180".to_string(),
            meter_enabled: true,
            corpus_db_dir: None,
        }
    }
}
@ -76,6 +81,10 @@ impl Config {
            config.meter_enabled = meter_enabled.to_lowercase() != "false" && meter_enabled != "0";
        }

        if let Ok(corpus_db_dir) = std::env::var("STEMEDB_CORPUS_DB_DIR") {
            config.corpus_db_dir = Some(PathBuf::from(corpus_db_dir));
        }

        config
    }
}
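The environment parsing above is easier to test when it is separated from `std::env`. The following sketch restates the two rules as pure functions; `parse_meter_enabled` and `parse_corpus_dir` are hypothetical names introduced for illustration, and the path `data/corpus` in the usage note is only an example value.

```rust
use std::path::PathBuf;

/// Mirrors the `STEMEDB_METER_ENABLED` rule: any value other than "false"
/// (case-insensitive) or "0" leaves the meter enabled.
pub fn parse_meter_enabled(raw: &str) -> bool {
    raw.to_lowercase() != "false" && raw != "0"
}

/// Converts an optional raw `STEMEDB_CORPUS_DB_DIR` value into an optional
/// directory path; an unset variable maps to `None`.
pub fn parse_corpus_dir(raw: Option<&str>) -> Option<PathBuf> {
    raw.map(PathBuf::from)
}
```

For example, `parse_corpus_dir(std::env::var("STEMEDB_CORPUS_DB_DIR").ok().as_deref())` would reproduce the lookup. One caveat that applies to both the sketch and the original: a variable that is set but empty still produces `Some` with an empty path, so callers may want to filter that case out.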
@ -117,8 +126,19 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
    info!("Opening HybridStore at {:?}", config.db_dir);
    let store = Arc::new(HybridStore::open(&config.db_dir)?);

    // Open optional corpus store (for Aphoria corpus)
    let corpus_store = if let Some(ref corpus_dir) = config.corpus_db_dir {
        // Ensure corpus directory exists
        std::fs::create_dir_all(corpus_dir)?;
        info!("Opening corpus HybridStore at {:?}", corpus_dir);
        Some(Arc::new(HybridStore::open(corpus_dir)?))
    } else {
        info!("No separate corpus DB configured, using main store for corpus queries");
        None
    };

    // Create application state (initializes GroupCommitBuffer)
    let state = AppState::new(write_journal, read_journal, Arc::clone(&store));
    let state = AppState::new(write_journal, read_journal, Arc::clone(&store), corpus_store);

    // Spawn IngestWorker background task (uses read journal)
    info!("Spawning IngestWorker background task");
@ -387,6 +387,7 @@ fn build_api_routes() -> Router<AppState> {
            post(handlers::push_community_observations),
        )
        .route("/v1/aphoria/patterns", get(handlers::get_patterns))
        .route("/v1/aphoria/corpus", get(handlers::get_corpus))
        // Claims management endpoints
        .route("/v1/aphoria/claims/list", post(handlers::list_claims))
        .route("/v1/aphoria/claims/create", post(handlers::create_claim))
@ -53,6 +53,10 @@ pub struct AppState {
    /// Key-value store for reading assertions
    pub store: Arc<HybridStore>,

    /// Corpus store for Aphoria authoritative sources (RFC, OWASP, Community).
    /// Falls back to main store if not configured separately.
    pub corpus_store: Arc<HybridStore>,

    /// Quota store for economic throttling (The Meter)
    pub quota_store: Arc<QuotaStoreImpl>,
@ -97,7 +101,14 @@ impl AppState {
    ///
    /// Creates a shared notification channel that GroupCommitBuffer uses
    /// to signal IngestWorker when new data is flushed.
    pub fn new(write_journal: Journal, read_journal: Journal, store: Arc<HybridStore>) -> Self {
    ///
    /// If `corpus_store` is None, the main `store` will be used for corpus queries.
    pub fn new(
        write_journal: Journal,
        read_journal: Journal,
        store: Arc<HybridStore>,
        corpus_store: Option<Arc<HybridStore>>,
    ) -> Self {
        // Create shared notification channel for WAL flush -> IngestWorker signaling
        let flush_notify = Arc::new(Notify::new());
@ -108,6 +119,9 @@ impl AppState {
        let journal = Arc::new(Mutex::new(read_journal));

        // Use provided corpus_store or fall back to main store
        let corpus_store = corpus_store.unwrap_or_else(|| Arc::clone(&store));

        // Create quota store backed by the same KV store
        let quota_store = Arc::new(GenericQuotaStore::new(Arc::clone(&store)));
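The fallback above relies on `Arc::clone` producing a second handle to the same store rather than a copy. A small generic sketch shows that behaviour; `resolve_corpus_store` is a hypothetical helper name, and `String` stands in for `HybridStore` in the usage note purely for illustration.

```rust
use std::sync::Arc;

/// Returns the dedicated corpus store when one is provided, otherwise a new
/// handle to the main store (a reference-count bump, not a copy of the data).
pub fn resolve_corpus_store<T>(main: &Arc<T>, corpus: Option<Arc<T>>) -> Arc<T> {
    corpus.unwrap_or_else(|| Arc::clone(main))
}
```

With `let main = Arc::new(String::from("main"));`, calling `resolve_corpus_store(&main, None)` yields a handle for which `Arc::ptr_eq` against `main` holds, so corpus queries and regular reads would share one underlying store when no separate directory is configured.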
@ -139,6 +153,7 @@ impl AppState {
            commit_buffer,
            journal,
            store,
            corpus_store,
            quota_store,
            escalation_store,
            alias_store,
@ -39,7 +39,7 @@ pub async fn create_test_env() -> TestEnvironment {
    let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
    let store = Arc::new(HybridStore::open(&db_dir).expect("failed to open store"));

    let state = AppState::new(write_journal, read_journal, store);
    let state = AppState::new(write_journal, read_journal, store, None);

    TestEnvironment { _temp_dir: temp_dir, state }
}
@ -70,7 +70,7 @@ pub async fn create_test_env_with_ingestor() -> TestEnvironmentWithIngestor {
    // Create AppState with write and read journals
    let write_journal = Journal::open(&wal_dir).expect("failed to open write journal");
    let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
    let state = AppState::new(write_journal, read_journal, store);
    let state = AppState::new(write_journal, read_journal, store, None);

    TestEnvironmentWithIngestor { _temp_dir: temp_dir, state, ingestor }
}
@ -65,7 +65,7 @@ async fn create_test_environment() -> TestEnvironment {
        Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
    let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
    let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
    let state = stemedb_api::AppState::new(write_journal, read_journal, Arc::clone(&store_arc));
    let state = stemedb_api::AppState::new(write_journal, read_journal, Arc::clone(&store_arc), None);

    TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
}
@ -53,7 +53,7 @@ async fn create_test_environment() -> TestEnvironment {
        Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
    let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
    let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
    let state = AppState::new(write_journal, read_journal, Arc::clone(&store_arc));
    let state = AppState::new(write_journal, read_journal, Arc::clone(&store_arc), None);

    TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
}
@ -202,7 +202,7 @@ async fn test_quota_consumption_with_meter() {
    let read_journal = Journal::open(&wal_dir).expect("read journal");
    let store = Arc::new(HybridStore::open(&db_dir).expect("store"));

    let state = AppState::new(write_journal, read_journal, store.clone());
    let state = AppState::new(write_journal, read_journal, store.clone(), None);
    let quota_store = state.quota_store.clone();

    let app = create_router_with_meter(state);
@ -258,7 +258,7 @@ async fn test_quota_exceeded_response() {
    let read_journal = Journal::open(&wal_dir).expect("read journal");
    let store = Arc::new(HybridStore::open(&db_dir).expect("store"));

    let state = AppState::new(write_journal, read_journal, store.clone());
    let state = AppState::new(write_journal, read_journal, store.clone(), None);
    let quota_store = state.quota_store.clone();

    let app = create_router_with_meter(state);
@ -304,7 +304,7 @@ async fn test_quota_headers_format() {
    let read_journal = Journal::open(&wal_dir).expect("read journal");
    let store = Arc::new(HybridStore::open(&db_dir).expect("store"));

    let state = AppState::new(write_journal, read_journal, store.clone());
    let state = AppState::new(write_journal, read_journal, store.clone(), None);
    let quota_store = state.quota_store.clone();

    let app = create_router_with_meter(state);