fix(api): enable querying of CLI-created community corpus items
## Problem CLI-created community corpus items (tier 3) were stored correctly but invisible via API queries. Two issues blocked discoverability: 1. **Prefix mismatch**: API hardcoded 'community://pattern/' for aggregated patterns, but CLI creates 'community://rust/http/...' URIs 2. **Query parameter parsing**: Axum's default parser doesn't support bracket notation (?sources[]=value) used by the dashboard Result: 0/22 CLI-created items were queryable. ## Solution ### Fix 1: Broaden Community Prefix - Changed: 'community://pattern/' → 'community://' in corpus handler - Impact: Now matches both aggregated patterns AND CLI-created items - Backward compatible: Broader prefix includes narrower results ### Fix 2: Add QsQuery Extractor - Added: serde_qs dependency + custom QsQuery extractor - Supports: Bracket notation for array parameters (?sources[]=a&sources[]=b) - Compatible: Works with JavaScript URLSearchParams standard - Tested: 3 new unit tests for extractor behavior ## Verification - ✅ All 22 CLI-created community items now queryable (was 0) - ✅ Source filtering works: community (22), RFC (2), vendor (5) - ✅ Multi-source queries work: ?sources[]=community&sources[]=rfc → 24 - ✅ All 89 API tests pass + 3 new extractor tests - ✅ Clippy clean (0 warnings) - ✅ No regressions in existing functionality ## Files Changed - crates/stemedb-api/Cargo.toml: Add serde_qs dependency - crates/stemedb-api/src/extractors.rs: New QsQuery extractor (117 lines) - crates/stemedb-api/src/handlers/aphoria/corpus.rs: Use QsQuery, broaden prefix - crates/stemedb-api/src/lib.rs: Export extractors module Also includes: Scale-adaptive thresholds, wiki corpus extraction, documentation updates, and dashboard UI improvements from prior work. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
parent
65065f3d8f
commit
bb0c33f8d3
@ -62,6 +62,23 @@ stemedb/
|
|||||||
guides/ # You are here
|
guides/ # You are here
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Git Hooks
|
||||||
|
|
||||||
|
The repository includes automatic git hooks to rebuild binaries when source code changes:
|
||||||
|
|
||||||
|
- **post-merge**: Runs after `git pull` or `git merge`
|
||||||
|
- **post-checkout**: Runs after `git checkout` (branch switches only)
|
||||||
|
|
||||||
|
These hooks detect changes to:
|
||||||
|
- Aphoria CLI and core logic
|
||||||
|
- StemeDB API server
|
||||||
|
- StemeDB simulator
|
||||||
|
- Core libraries (affects all binaries)
|
||||||
|
|
||||||
|
When changes are detected, the hooks automatically run `cargo build --release --workspace` to rebuild all binaries. This prevents "command not found" errors from stale binaries.
|
||||||
|
|
||||||
|
The hooks are installed in `.git/hooks/` and are already executable. If you need to disable them temporarily, you can use `--no-verify` with git commands or rename the hook files.
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
### Build fails with missing dependencies
|
### Build fails with missing dependencies
|
||||||
@ -79,6 +96,15 @@ Run with `--fix` for auto-corrections:
|
|||||||
cargo clippy --workspace --fix --allow-dirty
|
cargo clippy --workspace --fix --allow-dirty
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### "Command not found" after git pull
|
||||||
|
|
||||||
|
If you see this error despite the git hooks, manually rebuild:
|
||||||
|
```bash
|
||||||
|
cargo build --release --workspace
|
||||||
|
```
|
||||||
|
|
||||||
|
The binaries are in `target/release/` and should be in your PATH or called via `cargo run --release -p <package>`.
|
||||||
|
|
||||||
## Related
|
## Related
|
||||||
|
|
||||||
- [Testing Guide](./testing.md)
|
- [Testing Guide](./testing.md)
|
||||||
|
|||||||
602
.claude/skills/extract-wiki-corpus/SKILL.md
Normal file
602
.claude/skills/extract-wiki-corpus/SKILL.md
Normal file
@ -0,0 +1,602 @@
|
|||||||
|
---
|
||||||
|
name: extract-wiki-corpus
|
||||||
|
description: Extract structured claims from wiki documentation using LLM reasoning. Use when importing technical wikis, research docs, or compatibility guides into Aphoria corpus.
|
||||||
|
---
|
||||||
|
|
||||||
|
# Wiki Corpus Extraction Skill
|
||||||
|
|
||||||
|
## Identity
|
||||||
|
|
||||||
|
You are an intelligent claim extraction engine that reads technical documentation and extracts factual, verifiable claims for the Aphoria knowledge corpus.
|
||||||
|
|
||||||
|
Your job is to:
|
||||||
|
1. Read wiki markdown files
|
||||||
|
2. Extract factual claims using LLM reasoning
|
||||||
|
3. Generate CLI commands to persist claims in the corpus database
|
||||||
|
4. Report comprehensive results with success/failure breakdown
|
||||||
|
|
||||||
|
## Core Principles
|
||||||
|
|
||||||
|
1. **Factual over Normative**: Extract what IS (not what SHOULD BE)
|
||||||
|
2. **Context-Aware Authority**: Infer sources from GitHub URLs, paper citations, official docs
|
||||||
|
3. **Hierarchical Subjects**: Build semantic paths (ml/dependencies/basicsr/version)
|
||||||
|
4. **Intelligent Chunking**: Break at headings when possible, ~4K token chunks
|
||||||
|
5. **Batch Processing**: Extract all claims, then execute CLI commands
|
||||||
|
6. **Bundle Errors**: Collect all errors and report them together
|
||||||
|
|
||||||
|
## Workflow Overview
|
||||||
|
|
||||||
|
```
|
||||||
|
Phase 1: Discover & Read
|
||||||
|
↓
|
||||||
|
Phase 1.2: Verify Commands
|
||||||
|
↓
|
||||||
|
Phase 2: Intelligent Chunking
|
||||||
|
↓
|
||||||
|
Phase 3: Claim Extraction (Per Chunk)
|
||||||
|
↓
|
||||||
|
Phase 4: Validation
|
||||||
|
↓
|
||||||
|
Phase 5: CLI Execution
|
||||||
|
↓
|
||||||
|
Phase 6: Summary Report
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 1: Discover & Read
|
||||||
|
|
||||||
|
### Step 1.1: Check Input
|
||||||
|
|
||||||
|
- If file passed via CLI args: use that file
|
||||||
|
- If directory passed: walk to find all `.md` files
|
||||||
|
- Use Read tool to get full content of each file
|
||||||
|
|
||||||
|
### Step 1.2: Verify Aphoria Binary and Commands
|
||||||
|
|
||||||
|
Before proceeding, verify that the required commands exist:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check Aphoria version
|
||||||
|
aphoria --version
|
||||||
|
|
||||||
|
# Verify corpus create command exists
|
||||||
|
if ! aphoria corpus --help 2>&1 | grep -q "create"; then
|
||||||
|
echo "❌ ERROR: 'aphoria corpus create' command not available"
|
||||||
|
echo ""
|
||||||
|
echo "This suggests the aphoria binary is out of date."
|
||||||
|
echo ""
|
||||||
|
echo "Fix options:"
|
||||||
|
echo " 1. Rebuild: cargo build --release -p aphoria"
|
||||||
|
echo " 2. Check git status: git status"
|
||||||
|
echo " 3. Pull latest: git pull && cargo build --release -p aphoria"
|
||||||
|
echo ""
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "✅ Aphoria binary up to date (corpus create available)"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Decision Gate:** Command exists? → Proceed to token estimation
|
||||||
|
|
||||||
|
### Step 1.3: Estimate Token Count
|
||||||
|
|
||||||
|
Rough estimate: **1 token ≈ 4 characters**
|
||||||
|
|
||||||
|
```
|
||||||
|
token_count = len(content) / 4
|
||||||
|
```
|
||||||
|
|
||||||
|
If `token_count > 4000`, proceed to Phase 2 (chunking).
|
||||||
|
If `token_count <= 4000`, treat as single chunk.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 2: Intelligent Chunking
|
||||||
|
|
||||||
|
### Goal
|
||||||
|
Split content into ~4K token chunks, preferring heading boundaries.
|
||||||
|
|
||||||
|
### Algorithm
|
||||||
|
|
||||||
|
1. **Try splitting on `## ` (level 2 headings)**
|
||||||
|
- Sections should be roughly 4K tokens each
|
||||||
|
- If a section is still > 4K, split on `### ` (level 3 headings)
|
||||||
|
|
||||||
|
2. **Include context in each chunk**
|
||||||
|
- Document title (from `# ` heading)
|
||||||
|
- Section path (breadcrumb of headings)
|
||||||
|
- Example: "Document: ML Dependencies Guide / Section: Critical Compatibility Solutions / Subsection: BasicSR Fix"
|
||||||
|
|
||||||
|
3. **Maintain overlap**
|
||||||
|
- Include previous heading for context
|
||||||
|
- This helps LLM understand relationships
|
||||||
|
|
||||||
|
### Chunk Metadata Format
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"chunk_id": 1,
|
||||||
|
"total_chunks": 3,
|
||||||
|
"document_title": "ML Dependencies Guide",
|
||||||
|
"section_path": "Critical Compatibility Solutions / BasicSR Fix",
|
||||||
|
"content": "..."
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 3: Claim Extraction (Per Chunk)
|
||||||
|
|
||||||
|
### Prompt the LLM
|
||||||
|
|
||||||
|
For each chunk, use a structured extraction prompt:
|
||||||
|
|
||||||
|
````
|
||||||
|
You are extracting factual claims from technical documentation for a knowledge corpus.
|
||||||
|
|
||||||
|
**Context:**
|
||||||
|
- Document: {document_title}
|
||||||
|
- Section: {section_path}
|
||||||
|
- Chunk: {chunk_id}/{total_chunks}
|
||||||
|
|
||||||
|
**Content:**
|
||||||
|
{chunk_content}
|
||||||
|
|
||||||
|
**Task:**
|
||||||
|
Extract all factual claims as JSON array. Each claim must be:
|
||||||
|
1. Factual (not opinion or speculation)
|
||||||
|
2. Verifiable from the text
|
||||||
|
3. Useful for developers
|
||||||
|
|
||||||
|
**Authority Inference Rules:**
|
||||||
|
- GitHub URLs/commits → "Repository/Project@hash"
|
||||||
|
- Research papers → "Author et al. (Year)"
|
||||||
|
- Official documentation → "Project Documentation"
|
||||||
|
- Empirical observation → "Community consensus"
|
||||||
|
|
||||||
|
**Tier Assignment:**
|
||||||
|
- 0: RFC, W3C spec, ISO standard (regulatory)
|
||||||
|
- 1: OWASP, CWE, security advisory (clinical)
|
||||||
|
- 2: Project docs, compatibility notes (observational)
|
||||||
|
- 3: Blog posts, forum consensus (community)
|
||||||
|
|
||||||
|
**Output Format:**
|
||||||
|
```json
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"subject": "hierarchical/path/to/concept",
|
||||||
|
"predicate": "relationship_type",
|
||||||
|
"value": "constraint_or_value",
|
||||||
|
"explanation": "full sentence with context",
|
||||||
|
"authority": "inferred_source",
|
||||||
|
"category": "compatibility|performance|security|architecture|quality",
|
||||||
|
"confidence": 0.95,
|
||||||
|
"tier": 2
|
||||||
|
}
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
Return ONLY the JSON array, no additional text.
|
||||||
|
````
|
||||||
|
|
||||||
|
### Expected Output Structure
|
||||||
|
|
||||||
|
```json
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"subject": "ml/dependencies/basicsr/torchvision",
|
||||||
|
"predicate": "incompatible_with",
|
||||||
|
"value": ">=0.15",
|
||||||
|
"explanation": "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in torchvision 0.15+",
|
||||||
|
"authority": "XPixelGroup/BasicSR GitHub",
|
||||||
|
"category": "compatibility",
|
||||||
|
"confidence": 0.95,
|
||||||
|
"tier": 2
|
||||||
|
}
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 4: Validation
|
||||||
|
|
||||||
|
### Step 4.1: Filter by Confidence
|
||||||
|
|
||||||
|
Only keep claims where `confidence >= 0.7`
|
||||||
|
|
||||||
|
### Step 4.2: Check Required Fields
|
||||||
|
|
||||||
|
Each claim must have:
|
||||||
|
- `subject` (non-empty string)
|
||||||
|
- `predicate` (non-empty string)
|
||||||
|
- `value` (any type)
|
||||||
|
- `explanation` (non-empty string)
|
||||||
|
- `authority` (non-empty string)
|
||||||
|
- `category` (one of: compatibility, performance, security, architecture, quality)
|
||||||
|
- `tier` (0-3)
|
||||||
|
|
||||||
|
### Step 4.3: Validate Tier
|
||||||
|
|
||||||
|
Tier must be 0, 1, 2, or 3. If invalid, record error and skip claim.
|
||||||
|
|
||||||
|
### Step 4.4: Check for Duplicates
|
||||||
|
|
||||||
|
**Important**: The corpus database is **append-only**. Multiple sources can create the same `subject+predicate` pair. This is **allowed and expected**. Do NOT filter duplicates — just warn about them in the report.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 5: CLI Execution
|
||||||
|
|
||||||
|
### Step 5.1: Construct CLI Commands
|
||||||
|
|
||||||
|
For each validated claim, construct:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "{subject}" \
|
||||||
|
--predicate "{predicate}" \
|
||||||
|
--value "{value}" \
|
||||||
|
--explanation "{explanation}" \
|
||||||
|
--authority "{authority}" \
|
||||||
|
--category "{category}" \
|
||||||
|
--tier {tier}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Important**: Use proper shell escaping for strings with quotes or special characters.
|
||||||
|
|
||||||
|
### Step 5.2: Execute Commands
|
||||||
|
|
||||||
|
Use the Bash tool to execute each command.
|
||||||
|
|
||||||
|
### Step 5.3: Collect Results
|
||||||
|
|
||||||
|
For each execution:
|
||||||
|
- **Success**: Record the corpus ID (e.g., "corpus://ml/foo/bar/predicate")
|
||||||
|
- **Failure**: Record the full error message
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 6: Summary Report
|
||||||
|
|
||||||
|
### Report Structure
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
# Wiki Corpus Extraction Report
|
||||||
|
|
||||||
|
**File:** /path/to/wiki/article.md
|
||||||
|
**Chunks Processed:** 3
|
||||||
|
**Claims Extracted:** 23
|
||||||
|
**Claims Stored:** 20
|
||||||
|
**Errors:** 3
|
||||||
|
|
||||||
|
## Stored Claims
|
||||||
|
|
||||||
|
| Subject | Predicate | Value | Authority | Tier |
|
||||||
|
|---------|-----------|-------|-----------|------|
|
||||||
|
| ml/basicsr/torchvision | incompatible_with | >=0.15 | XPixelGroup/BasicSR | 2 |
|
||||||
|
| ... | ... | ... | ... | ... |
|
||||||
|
|
||||||
|
## Errors
|
||||||
|
|
||||||
|
### Validation Errors (2)
|
||||||
|
|
||||||
|
1. **ml/foo/bar** - Invalid tier '5' (must be 0-3)
|
||||||
|
2. **api/rest/foo** - Missing explanation field
|
||||||
|
|
||||||
|
### Storage Errors (1)
|
||||||
|
|
||||||
|
1. **net/http/timeout** - Database write failed: connection refused
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
View corpus items: http://localhost:3000/corpus
|
||||||
|
Query API: curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=community&limit=100'
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Predicate Naming Conventions
|
||||||
|
|
||||||
|
Use consistent predicate names to enable effective querying:
|
||||||
|
|
||||||
|
| Relationship | Predicate |
|
||||||
|
|--------------|-----------|
|
||||||
|
| Version constraint | `requires`, `incompatible_with`, `compatible_with` |
|
||||||
|
| Recommendation | `recommends`, `discourages` |
|
||||||
|
| Performance | `faster_than`, `slower_than`, `optimal_for` |
|
||||||
|
| Security | `vulnerable_to`, `mitigates`, `exposes` |
|
||||||
|
| Configuration | `default_value`, `max_value`, `required_for` |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Subject Path Guidelines
|
||||||
|
|
||||||
|
Build hierarchical paths that reflect the domain structure:
|
||||||
|
|
||||||
|
### Examples
|
||||||
|
|
||||||
|
- `ml/dependencies/{package}/{aspect}`
|
||||||
|
- Example: `ml/dependencies/basicsr/torchvision`
|
||||||
|
- `api/{protocol}/{feature}`
|
||||||
|
- Example: `api/rest/authentication`
|
||||||
|
- `security/{category}/{vuln_type}`
|
||||||
|
- Example: `security/input-validation/xss`
|
||||||
|
- `performance/{component}/{metric}`
|
||||||
|
- Example: `performance/database/connection-pool`
|
||||||
|
|
||||||
|
### Principles
|
||||||
|
|
||||||
|
- Start general, get specific
|
||||||
|
- Use lowercase with forward slashes
|
||||||
|
- Use hyphens for multi-word segments
|
||||||
|
- Keep paths under 6-7 levels
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Category Guidelines
|
||||||
|
|
||||||
|
Choose the most appropriate category:
|
||||||
|
|
||||||
|
| Category | Use When |
|
||||||
|
|----------|----------|
|
||||||
|
| `compatibility` | Version constraints, breaking changes, API compatibility |
|
||||||
|
| `performance` | Optimization, resource usage, latency, throughput |
|
||||||
|
| `security` | Vulnerabilities, mitigations, attack vectors |
|
||||||
|
| `architecture` | Design patterns, module structure, dependencies |
|
||||||
|
| `quality` | Code quality, maintainability, best practices |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Authority Tier Guidelines
|
||||||
|
|
||||||
|
| Tier | Name | Examples | When to Use |
|
||||||
|
|------|------|----------|-------------|
|
||||||
|
| 0 | Regulatory | RFC 7231, W3C spec, ISO 27001 | Official standards bodies |
|
||||||
|
| 1 | Clinical | OWASP Top 10, CWE-79, NVD | Security advisories, vulnerability databases |
|
||||||
|
| 2 | Observational | PyTorch docs, GitHub project READMEs | Official project documentation |
|
||||||
|
| 3 | Community | Blog posts, Stack Overflow, forum threads | Community wisdom, empirical observations |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
### Validation Errors
|
||||||
|
|
||||||
|
Collect all validation errors and report them together. Do NOT stop on the first error.
|
||||||
|
|
||||||
|
Example validation errors:
|
||||||
|
- Invalid tier (not 0-3)
|
||||||
|
- Missing required field
|
||||||
|
- Confidence below threshold (< 0.7)
|
||||||
|
|
||||||
|
### Storage Errors
|
||||||
|
|
||||||
|
If a CLI command fails:
|
||||||
|
- Capture the full error message
|
||||||
|
- Continue with remaining commands
|
||||||
|
- Report all failures at the end
|
||||||
|
|
||||||
|
### LLM Extraction Errors
|
||||||
|
|
||||||
|
If the LLM returns invalid JSON:
|
||||||
|
- Log the chunk that failed
|
||||||
|
- Continue with remaining chunks
|
||||||
|
- Report the parsing error in summary
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Do's and Don'ts
|
||||||
|
|
||||||
|
### Do
|
||||||
|
|
||||||
|
- ✅ Extract factual claims (not opinions)
|
||||||
|
- ✅ Verify command availability before execution
|
||||||
|
- ✅ Infer authority from context
|
||||||
|
- ✅ Generate semantic subject paths
|
||||||
|
- ✅ Include full explanation context
|
||||||
|
- ✅ Bundle errors for batch reporting
|
||||||
|
- ✅ Use Read tool to get file content
|
||||||
|
- ✅ Use Bash tool to execute CLI commands
|
||||||
|
- ✅ Filter by confidence >= 0.7
|
||||||
|
- ✅ Allow duplicate subject+predicate (append-only DB)
|
||||||
|
|
||||||
|
### Do Not
|
||||||
|
|
||||||
|
- ❌ Extract opinions or speculative claims
|
||||||
|
- ❌ Assume binary is up to date
|
||||||
|
- ❌ Lose source attribution
|
||||||
|
- ❌ Hardcode authority (infer from content)
|
||||||
|
- ❌ Stop on first error (collect all errors)
|
||||||
|
- ❌ Modify files (read-only skill)
|
||||||
|
- ❌ Use placeholder values
|
||||||
|
- ❌ Skip validation
|
||||||
|
- ❌ Filter duplicates (append-only allows them)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example Extraction
|
||||||
|
|
||||||
|
### Input Text
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
## BasicSR and Torchvision Compatibility
|
||||||
|
|
||||||
|
The BasicSR library (v1.4.2) has a critical compatibility issue with torchvision >= 0.15.
|
||||||
|
The library imports from `torchvision.transforms.functional_tensor`, which was removed
|
||||||
|
in torchvision 0.15+.
|
||||||
|
|
||||||
|
Source: https://github.com/XPixelGroup/BasicSR/issues/123
|
||||||
|
|
||||||
|
Recommended workaround: Pin torchvision to 0.14.1 or earlier.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Extracted Claims
|
||||||
|
|
||||||
|
```json
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"subject": "ml/dependencies/basicsr/torchvision",
|
||||||
|
"predicate": "incompatible_with",
|
||||||
|
"value": ">=0.15",
|
||||||
|
"explanation": "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in torchvision 0.15+",
|
||||||
|
"authority": "XPixelGroup/BasicSR#123",
|
||||||
|
"category": "compatibility",
|
||||||
|
"confidence": 0.95,
|
||||||
|
"tier": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"subject": "ml/dependencies/basicsr/torchvision",
|
||||||
|
"predicate": "recommends",
|
||||||
|
"value": "<=0.14.1",
|
||||||
|
"explanation": "Workaround for basicsr compatibility issue: pin torchvision to 0.14.1 or earlier",
|
||||||
|
"authority": "XPixelGroup/BasicSR#123",
|
||||||
|
"category": "compatibility",
|
||||||
|
"confidence": 0.9,
|
||||||
|
"tier": 3
|
||||||
|
}
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
### CLI Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "ml/dependencies/basicsr/torchvision" \
|
||||||
|
--predicate "incompatible_with" \
|
||||||
|
--value ">=0.15" \
|
||||||
|
--explanation "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in torchvision 0.15+" \
|
||||||
|
--authority "XPixelGroup/BasicSR#123" \
|
||||||
|
--category "compatibility" \
|
||||||
|
--tier 2
|
||||||
|
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "ml/dependencies/basicsr/torchvision" \
|
||||||
|
--predicate "recommends" \
|
||||||
|
--value "<=0.14.1" \
|
||||||
|
--explanation "Workaround for basicsr compatibility issue: pin torchvision to 0.14.1 or earlier" \
|
||||||
|
--authority "XPixelGroup/BasicSR#123" \
|
||||||
|
--category "compatibility" \
|
||||||
|
--tier 3
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Related Skills
|
||||||
|
|
||||||
|
- **extract-claims**: Entity-level extraction from prose (for StemeDB ingestion)
|
||||||
|
- **aphoria-suggest**: Suggest claims from existing patterns
|
||||||
|
- **aphoria-claims**: Author claims from diffs
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Notes
|
||||||
|
|
||||||
|
### Token Counting
|
||||||
|
|
||||||
|
Use rough heuristic: `token_count = len(content) / 4`
|
||||||
|
|
||||||
|
This is approximate but good enough for chunking decisions.
|
||||||
|
|
||||||
|
### Shell Escaping
|
||||||
|
|
||||||
|
When constructing CLI commands, properly escape strings:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import shlex
|
||||||
|
|
||||||
|
escaped_explanation = shlex.quote(explanation)
|
||||||
|
```
|
||||||
|
|
||||||
|
Or in bash:
|
||||||
|
```bash
|
||||||
|
explanation="${explanation//\"/\\\"}" # Escape quotes
|
||||||
|
```
|
||||||
|
|
||||||
|
### Confidence Threshold
|
||||||
|
|
||||||
|
Only extract claims with `confidence >= 0.7`. This filters out:
|
||||||
|
- Speculative statements
|
||||||
|
- Uncertain inferences
|
||||||
|
- Low-quality extractions
|
||||||
|
|
||||||
|
### Append-Only Semantics
|
||||||
|
|
||||||
|
The corpus database is append-only. Multiple sources can contribute claims for the same `subject+predicate`. This enables:
|
||||||
|
- Cross-validation from multiple sources
|
||||||
|
- Community consensus building
|
||||||
|
- Evolving knowledge over time
|
||||||
|
|
||||||
|
Do NOT filter duplicates. Just warn about them in the report.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
A successful extraction should:
|
||||||
|
|
||||||
|
1. ✅ Read all markdown files in the input directory
|
||||||
|
2. ✅ Extract factual claims with proper structure
|
||||||
|
3. ✅ Infer authority from context (GitHub URLs, docs, etc.)
|
||||||
|
4. ✅ Assign appropriate tiers (0-3)
|
||||||
|
5. ✅ Execute CLI commands successfully
|
||||||
|
6. ✅ Report comprehensive summary with errors bundled
|
||||||
|
7. ✅ Handle validation errors gracefully
|
||||||
|
8. ✅ Handle storage errors gracefully
|
||||||
|
9. ✅ Generate semantic subject paths
|
||||||
|
10. ✅ Use consistent predicate naming
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### "Command not found" or "unrecognized subcommand 'create'" Errors
|
||||||
|
|
||||||
|
If you see `error: unrecognized subcommand 'create'` or similar errors:
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
1. **Check binary date**: `ls -lh target/release/aphoria`
|
||||||
|
2. **Check CLI code date**: `ls -lh applications/aphoria/src/cli/mod.rs`
|
||||||
|
3. **If CLI is newer**: The binary is out of date
|
||||||
|
|
||||||
|
**Solutions:**
|
||||||
|
```bash
|
||||||
|
# Option 1: Rebuild the binary
|
||||||
|
cargo build --release -p aphoria
|
||||||
|
|
||||||
|
# Option 2: Pull latest changes and rebuild
|
||||||
|
git pull && cargo build --release -p aphoria
|
||||||
|
|
||||||
|
# Option 3: Check if there are uncommitted changes
|
||||||
|
git status
|
||||||
|
```
|
||||||
|
|
||||||
|
**Prevention:**
|
||||||
|
See Fix #1 for setting up git hooks that automatically rebuild binaries on pull.
|
||||||
|
|
||||||
|
### "Database already open" error
|
||||||
|
|
||||||
|
The corpus database at `~/.aphoria/corpus-db` is locked by another process (probably the API server).
|
||||||
|
|
||||||
|
**Solution**: Stop the API server temporarily:
|
||||||
|
```bash
|
||||||
|
pkill -f stemedb-api
|
||||||
|
```
|
||||||
|
|
||||||
|
### "Invalid tier" error
|
||||||
|
|
||||||
|
Tier must be 0, 1, 2, or 3.
|
||||||
|
|
||||||
|
**Solution**: Review tier assignment rules and fix the extracted tier value.
|
||||||
|
|
||||||
|
### "Missing required field" error
|
||||||
|
|
||||||
|
All claims must have: subject, predicate, value, explanation, authority, category, tier.
|
||||||
|
|
||||||
|
**Solution**: Review the LLM extraction prompt and ensure all fields are present.
|
||||||
|
|
||||||
|
### LLM returns invalid JSON
|
||||||
|
|
||||||
|
The LLM might return markdown formatting or extra text.
|
||||||
|
|
||||||
|
**Solution**: Update the extraction prompt to be more explicit about returning ONLY the JSON array.
|
||||||
1573
.claude/skills/verify-wiki-corpus/SKILL.md
Normal file
1573
.claude/skills/verify-wiki-corpus/SKILL.md
Normal file
File diff suppressed because it is too large
Load Diff
109
CORPUS-QUICK-START.md
Normal file
109
CORPUS-QUICK-START.md
Normal file
@ -0,0 +1,109 @@
|
|||||||
|
# Corpus Quick Start Guide
|
||||||
|
|
||||||
|
## TL;DR - API is Already Running!
|
||||||
|
|
||||||
|
The corpus API is currently serving data at:
|
||||||
|
- **URL:** `http://localhost:18180/v1/aphoria/corpus`
|
||||||
|
- **Database:** `~/.aphoria/corpus-db`
|
||||||
|
- **Data:** 2 RFC items (TLS cert verification, JWT audience validation)
|
||||||
|
|
||||||
|
## Test It Right Now
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Get all RFC corpus items
|
||||||
|
curl -s 'http://localhost:18180/v1/aphoria/corpus?sources[]=rfc' | jq '.items[].subject'
|
||||||
|
|
||||||
|
# Expected output:
|
||||||
|
# "rfc://5246/tls/certificate_verification"
|
||||||
|
# "rfc://7519/audience_validation"
|
||||||
|
```
|
||||||
|
|
||||||
|
## Import Production Wiki
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd ~/Workspace/stemedb
|
||||||
|
target/release/aphoria corpus import wiki ~/Workspace/orchard9/wiki/content
|
||||||
|
```
|
||||||
|
|
||||||
|
## Start Dashboard
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd applications/aphoria-dashboard
|
||||||
|
npm run dev
|
||||||
|
# Open: http://localhost:3000/corpus
|
||||||
|
```
|
||||||
|
|
||||||
|
## Restart API Later (if needed)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd ~/Workspace/stemedb
|
||||||
|
STEMEDB_DB_DIR=$HOME/.aphoria/corpus-db \
|
||||||
|
STEMEDB_WAL_DIR=$HOME/.aphoria/corpus-db/wal \
|
||||||
|
target/release/stemedb-api
|
||||||
|
```
|
||||||
|
|
||||||
|
## Query Examples
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Get all sources (RFC, OWASP, vendor, community)
|
||||||
|
curl 'http://localhost:18180/v1/aphoria/corpus'
|
||||||
|
|
||||||
|
# Filter by multiple sources
|
||||||
|
curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=rfc&sources[]=owasp'
|
||||||
|
|
||||||
|
# Filter by category
|
||||||
|
curl 'http://localhost:18180/v1/aphoria/corpus?category=security'
|
||||||
|
|
||||||
|
# Pagination
|
||||||
|
curl 'http://localhost:18180/v1/aphoria/corpus?limit=10&offset=0'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Response Format
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"items": [
|
||||||
|
{
|
||||||
|
"subject": "rfc://5246/tls/certificate_verification",
|
||||||
|
"predicate": "enabled",
|
||||||
|
"value": "true",
|
||||||
|
"source": "rfc://",
|
||||||
|
"tier": 0,
|
||||||
|
"category": "security",
|
||||||
|
"explanation": "TLS certificate verification MUST be enabled...",
|
||||||
|
"authority_source": "RFC 5246 Section 7.4.2"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"total_matching": 2,
|
||||||
|
"sources_included": ["rfc://"]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Files to Know
|
||||||
|
|
||||||
|
- **Corpus DB:** `~/.aphoria/corpus-db/` (shared across projects)
|
||||||
|
- **Project DB:** `.aphoria/db/` (per-project)
|
||||||
|
- **Import CLI:** `aphoria corpus import wiki <path>`
|
||||||
|
- **API Config:** Set `STEMEDB_DB_DIR` to choose database
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
**Dashboard shows empty results?**
|
||||||
|
- Check API is running on port 18180
|
||||||
|
- Verify API is using corpus database: `ps aux | grep stemedb-api`
|
||||||
|
- Check API logs for database path
|
||||||
|
|
||||||
|
**API won't start?**
|
||||||
|
- Make sure corpus DB exists: `ls ~/.aphoria/corpus-db/`
|
||||||
|
- Check port not in use: `lsof -i :18180`
|
||||||
|
- View logs: `tail -f /tmp/api-corpus.log`
|
||||||
|
|
||||||
|
**Need to reimport wiki?**
|
||||||
|
```bash
|
||||||
|
rm -rf ~/.aphoria/corpus-db
|
||||||
|
target/release/aphoria corpus import wiki <path>
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
✅ **Current Status:** API running, corpus database populated, ready for dashboard!
|
||||||
@ -1,7 +1,6 @@
|
|||||||
// Corpus page constants
|
// Corpus page constants
|
||||||
|
|
||||||
export const CORPUS_FETCH_LIMIT = 100;
|
export const CORPUS_FETCH_LIMIT = 100;
|
||||||
export const DEFAULT_MIN_PROJECTS = 1;
|
|
||||||
|
|
||||||
// Re-export shared formatters for convenience
|
// Re-export shared formatters for convenience
|
||||||
export { formatRelativeTime, formatUnixTimestamp } from "@/lib/format";
|
export { formatRelativeTime, formatUnixTimestamp } from "@/lib/format";
|
||||||
|
|||||||
@ -1,20 +1,15 @@
|
|||||||
"use client";
|
"use client";
|
||||||
|
|
||||||
import { Input } from "@/components/ui/input";
|
|
||||||
import { Button } from "@/components/ui/button";
|
import { Button } from "@/components/ui/button";
|
||||||
import { Checkbox } from "@/components/ui/checkbox";
|
import { Checkbox } from "@/components/ui/checkbox";
|
||||||
import { X, Search } from "lucide-react";
|
import { X, Search } from "lucide-react";
|
||||||
|
|
||||||
interface CorpusFiltersProps {
|
interface CorpusFiltersProps {
|
||||||
subjectPrefix: string;
|
sources: string[];
|
||||||
minProjects: number;
|
|
||||||
filterCategory: string;
|
filterCategory: string;
|
||||||
hideNoise: boolean;
|
|
||||||
availableCategories: string[];
|
availableCategories: string[];
|
||||||
onSubjectPrefixChange: (value: string) => void;
|
onSourcesChange: (value: string[]) => void;
|
||||||
onMinProjectsChange: (value: number) => void;
|
|
||||||
onFilterCategoryChange: (value: string) => void;
|
onFilterCategoryChange: (value: string) => void;
|
||||||
onHideNoiseChange: (value: boolean) => void;
|
|
||||||
onSubmit: () => void;
|
onSubmit: () => void;
|
||||||
onClear: () => void;
|
onClear: () => void;
|
||||||
totalCount: number;
|
totalCount: number;
|
||||||
@ -23,16 +18,19 @@ interface CorpusFiltersProps {
|
|||||||
hasActiveFilter: boolean;
|
hasActiveFilter: boolean;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const AVAILABLE_SOURCES = [
|
||||||
|
{ id: "rfc", label: "RFC" },
|
||||||
|
{ id: "owasp", label: "OWASP" },
|
||||||
|
{ id: "community", label: "Community" },
|
||||||
|
{ id: "vendor", label: "Vendor" },
|
||||||
|
];
|
||||||
|
|
||||||
export function CorpusFilters({
|
export function CorpusFilters({
|
||||||
subjectPrefix,
|
sources,
|
||||||
minProjects,
|
|
||||||
filterCategory,
|
filterCategory,
|
||||||
hideNoise,
|
|
||||||
availableCategories,
|
availableCategories,
|
||||||
onSubjectPrefixChange,
|
onSourcesChange,
|
||||||
onMinProjectsChange,
|
|
||||||
onFilterCategoryChange,
|
onFilterCategoryChange,
|
||||||
onHideNoiseChange,
|
|
||||||
onSubmit,
|
onSubmit,
|
||||||
onClear,
|
onClear,
|
||||||
totalCount,
|
totalCount,
|
||||||
@ -45,39 +43,38 @@ export function CorpusFilters({
|
|||||||
onSubmit();
|
onSubmit();
|
||||||
};
|
};
|
||||||
|
|
||||||
|
const handleSourceToggle = (sourceId: string) => {
|
||||||
|
if (sources.includes(sourceId)) {
|
||||||
|
onSourcesChange(sources.filter((s) => s !== sourceId));
|
||||||
|
} else {
|
||||||
|
onSourcesChange([...sources, sourceId]);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<form onSubmit={handleSubmit}>
|
<form onSubmit={handleSubmit}>
|
||||||
<div className="flex flex-wrap items-end gap-4">
|
<div className="flex flex-wrap items-end gap-4">
|
||||||
{/* Subject Prefix Filter */}
|
{/* Sources Filter */}
|
||||||
<div className="flex-1 min-w-[200px]">
|
|
||||||
<label htmlFor="subject-prefix" className="text-sm font-medium mb-2 block">
|
|
||||||
Subject Prefix
|
|
||||||
</label>
|
|
||||||
<Input
|
|
||||||
id="subject-prefix"
|
|
||||||
placeholder="e.g., code://rust"
|
|
||||||
value={subjectPrefix}
|
|
||||||
onChange={(e) => onSubjectPrefixChange(e.target.value)}
|
|
||||||
className="max-w-md"
|
|
||||||
disabled={isLoading}
|
|
||||||
/>
|
|
||||||
</div>
|
|
||||||
|
|
||||||
{/* Min Projects Filter */}
|
|
||||||
<div className="flex flex-col gap-2">
|
<div className="flex flex-col gap-2">
|
||||||
<label htmlFor="min-projects" className="text-sm font-medium">
|
<label className="text-sm font-medium">Sources</label>
|
||||||
Min Projects
|
<div className="flex items-center gap-4">
|
||||||
</label>
|
{AVAILABLE_SOURCES.map((source) => (
|
||||||
<Input
|
<div key={source.id} className="flex items-center gap-2">
|
||||||
id="min-projects"
|
<Checkbox
|
||||||
type="number"
|
id={`source-${source.id}`}
|
||||||
min={1}
|
checked={sources.includes(source.id)}
|
||||||
max={100}
|
onCheckedChange={() => handleSourceToggle(source.id)}
|
||||||
value={minProjects}
|
|
||||||
onChange={(e) => onMinProjectsChange(Math.max(1, parseInt(e.target.value) || 1))}
|
|
||||||
className="w-24"
|
|
||||||
disabled={isLoading}
|
disabled={isLoading}
|
||||||
/>
|
/>
|
||||||
|
<label
|
||||||
|
htmlFor={`source-${source.id}`}
|
||||||
|
className="text-sm font-medium cursor-pointer"
|
||||||
|
>
|
||||||
|
{source.label}
|
||||||
|
</label>
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Category Filter */}
|
{/* Category Filter */}
|
||||||
@ -101,23 +98,10 @@ export function CorpusFilters({
|
|||||||
</select>
|
</select>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Hide Noise Toggle */}
|
|
||||||
<div className="flex items-center gap-2 h-10">
|
|
||||||
<Checkbox
|
|
||||||
id="hide-noise"
|
|
||||||
checked={hideNoise}
|
|
||||||
onCheckedChange={onHideNoiseChange}
|
|
||||||
disabled={isLoading}
|
|
||||||
/>
|
|
||||||
<label htmlFor="hide-noise" className="text-sm font-medium cursor-pointer">
|
|
||||||
Hide noise
|
|
||||||
</label>
|
|
||||||
</div>
|
|
||||||
|
|
||||||
{/* Submit Button */}
|
{/* Submit Button */}
|
||||||
<Button type="submit" disabled={isLoading}>
|
<Button type="submit" disabled={isLoading}>
|
||||||
<Search className="h-4 w-4 mr-2" />
|
<Search className="h-4 w-4 mr-2" />
|
||||||
{isLoading ? "Searching..." : "Search"}
|
{isLoading ? "Loading..." : "Apply"}
|
||||||
</Button>
|
</Button>
|
||||||
|
|
||||||
{/* Clear Button */}
|
{/* Clear Button */}
|
||||||
@ -136,8 +120,8 @@ export function CorpusFilters({
|
|||||||
{/* Results Count */}
|
{/* Results Count */}
|
||||||
<div className="text-sm text-muted-foreground ml-auto">
|
<div className="text-sm text-muted-foreground ml-auto">
|
||||||
{filteredCount === totalCount
|
{filteredCount === totalCount
|
||||||
? `${totalCount} patterns`
|
? `${totalCount} items`
|
||||||
: `${filteredCount} of ${totalCount} patterns`}
|
: `${filteredCount} of ${totalCount} items`}
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</form>
|
</form>
|
||||||
|
|||||||
@ -1,19 +1,19 @@
|
|||||||
"use client";
|
"use client";
|
||||||
|
|
||||||
import type { PatternDto } from "@/lib/api";
|
import type { CorpusItemDto } from "@/lib/api";
|
||||||
import { CorpusRow } from "./corpus-row";
|
import { CorpusRow } from "./corpus-row";
|
||||||
|
|
||||||
interface CorpusListProps {
|
interface CorpusListProps {
|
||||||
patterns: PatternDto[];
|
items: CorpusItemDto[];
|
||||||
}
|
}
|
||||||
|
|
||||||
export function CorpusList({ patterns }: CorpusListProps) {
|
export function CorpusList({ items }: CorpusListProps) {
|
||||||
return (
|
return (
|
||||||
<div className="grid gap-4 md:grid-cols-2 lg:grid-cols-3">
|
<div className="grid gap-4 md:grid-cols-2 lg:grid-cols-3">
|
||||||
{patterns.map((pattern) => (
|
{items.map((item) => (
|
||||||
<CorpusRow
|
<CorpusRow
|
||||||
key={`${pattern.subject}:${pattern.predicate}:${pattern.value}`}
|
key={`${item.subject}:${item.predicate}:${item.value}`}
|
||||||
pattern={pattern}
|
item={item}
|
||||||
/>
|
/>
|
||||||
))}
|
))}
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@ -3,12 +3,12 @@
|
|||||||
import { useState, useCallback, useEffect, useMemo } from "react";
|
import { useState, useCallback, useEffect, useMemo } from "react";
|
||||||
import {
|
import {
|
||||||
StemeDBClient,
|
StemeDBClient,
|
||||||
type GetPatternsResponse,
|
type GetCorpusResponse,
|
||||||
type PatternDto,
|
type CorpusItemDto,
|
||||||
ApiError,
|
ApiError,
|
||||||
} from "@/lib/api";
|
} from "@/lib/api";
|
||||||
import type { PanelState } from "@/lib/types";
|
import type { PanelState } from "@/lib/types";
|
||||||
import { CORPUS_FETCH_LIMIT, DEFAULT_MIN_PROJECTS } from "./constants";
|
import { CORPUS_FETCH_LIMIT } from "./constants";
|
||||||
import { ErrorState } from "@/components/shared/error-state";
|
import { ErrorState } from "@/components/shared/error-state";
|
||||||
import { CorpusFilters } from "./corpus-filters";
|
import { CorpusFilters } from "./corpus-filters";
|
||||||
import { CorpusList } from "./corpus-list";
|
import { CorpusList } from "./corpus-list";
|
||||||
@ -16,38 +16,34 @@ import { CorpusLoadingSkeleton } from "./corpus-loading-skeleton";
|
|||||||
import { CorpusEmptyState } from "./corpus-empty-state";
|
import { CorpusEmptyState } from "./corpus-empty-state";
|
||||||
|
|
||||||
export function CorpusPanel() {
|
export function CorpusPanel() {
|
||||||
const [state, setState] = useState<PanelState<GetPatternsResponse>>({
|
const [state, setState] = useState<PanelState<GetCorpusResponse>>({
|
||||||
status: "idle",
|
status: "idle",
|
||||||
});
|
});
|
||||||
|
|
||||||
// Input state (controlled form inputs) - doesn't trigger fetch
|
// Input state (controlled form inputs) - doesn't trigger fetch
|
||||||
const [inputPrefix, setInputPrefix] = useState("");
|
const [inputSources, setInputSources] = useState<string[]>(["rfc", "owasp", "community"]);
|
||||||
const [inputMinProjects, setInputMinProjects] = useState(DEFAULT_MIN_PROJECTS);
|
|
||||||
|
|
||||||
// Search state (actual search params) - triggers fetch
|
// Search state (actual search params) - triggers fetch
|
||||||
const [searchPrefix, setSearchPrefix] = useState("");
|
const [searchSources, setSearchSources] = useState<string[]>(["rfc", "owasp", "community"]);
|
||||||
const [searchMinProjects, setSearchMinProjects] = useState(DEFAULT_MIN_PROJECTS);
|
|
||||||
|
|
||||||
// Client-side filter state
|
// Client-side filter state
|
||||||
const [filterCategory, setFilterCategory] = useState<string>("all");
|
const [filterCategory, setFilterCategory] = useState<string>("all");
|
||||||
const [hideNoise, setHideNoise] = useState<boolean>(false);
|
|
||||||
|
|
||||||
const fetchData = useCallback(async () => {
|
const fetchData = useCallback(async () => {
|
||||||
setState({ status: "loading" });
|
setState({ status: "loading" });
|
||||||
try {
|
try {
|
||||||
const client = new StemeDBClient();
|
const client = new StemeDBClient();
|
||||||
const data = await client.getPatterns({
|
const data = await client.getCorpus({
|
||||||
subjectPrefix: searchPrefix || undefined,
|
sources: searchSources.length > 0 ? searchSources : undefined,
|
||||||
minProjects: searchMinProjects,
|
|
||||||
limit: CORPUS_FETCH_LIMIT,
|
limit: CORPUS_FETCH_LIMIT,
|
||||||
});
|
});
|
||||||
setState({ status: "success", data });
|
setState({ status: "success", data });
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
// 404 means no patterns - treat as empty success
|
// 404 means no corpus items - treat as empty success
|
||||||
if (err instanceof ApiError && err.status === 404) {
|
if (err instanceof ApiError && err.status === 404) {
|
||||||
setState({
|
setState({
|
||||||
status: "success",
|
status: "success",
|
||||||
data: { patterns: [], total_matching: 0 },
|
data: { items: [], total_matching: 0, sources_included: [] },
|
||||||
});
|
});
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
@ -59,7 +55,7 @@ export function CorpusPanel() {
|
|||||||
: "Unknown error";
|
: "Unknown error";
|
||||||
setState({ status: "error", error: message });
|
setState({ status: "error", error: message });
|
||||||
}
|
}
|
||||||
}, [searchPrefix, searchMinProjects]);
|
}, [searchSources]);
|
||||||
|
|
||||||
// Fetch on mount
|
// Fetch on mount
|
||||||
useEffect(() => {
|
useEffect(() => {
|
||||||
@ -68,65 +64,56 @@ export function CorpusPanel() {
|
|||||||
|
|
||||||
// Handle form submit - update search params which triggers fetch
|
// Handle form submit - update search params which triggers fetch
|
||||||
const handleSubmit = useCallback(() => {
|
const handleSubmit = useCallback(() => {
|
||||||
setSearchPrefix(inputPrefix);
|
setSearchSources(inputSources);
|
||||||
setSearchMinProjects(inputMinProjects);
|
}, [inputSources]);
|
||||||
}, [inputPrefix, inputMinProjects]);
|
|
||||||
|
|
||||||
// Handle clear - reset both input and search state
|
// Handle clear - reset both input and search state
|
||||||
const handleClear = useCallback(() => {
|
const handleClear = useCallback(() => {
|
||||||
setInputPrefix("");
|
const defaultSources = ["rfc", "owasp", "community"];
|
||||||
setInputMinProjects(DEFAULT_MIN_PROJECTS);
|
setInputSources(defaultSources);
|
||||||
setSearchPrefix("");
|
setSearchSources(defaultSources);
|
||||||
setSearchMinProjects(DEFAULT_MIN_PROJECTS);
|
|
||||||
setFilterCategory("all");
|
setFilterCategory("all");
|
||||||
setHideNoise(false);
|
|
||||||
}, []);
|
}, []);
|
||||||
|
|
||||||
// Get raw patterns from server
|
// Get raw items from server
|
||||||
const rawPatterns = state.status === "success" ? state.data.patterns : [];
|
const rawItems = state.status === "success" ? state.data.items : [];
|
||||||
|
|
||||||
// Extract available categories from patterns
|
// Extract available categories from items
|
||||||
const availableCategories = useMemo(() => {
|
const availableCategories = useMemo(() => {
|
||||||
const categories = new Set<string>();
|
const categories = new Set<string>();
|
||||||
rawPatterns.forEach((p) => {
|
rawItems.forEach((item) => {
|
||||||
if (p.category) {
|
if (item.category) {
|
||||||
categories.add(p.category);
|
categories.add(item.category);
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
return Array.from(categories).sort();
|
return Array.from(categories).sort();
|
||||||
}, [rawPatterns]);
|
}, [rawItems]);
|
||||||
|
|
||||||
// Apply client-side filters
|
// Apply client-side filters
|
||||||
const patterns = useMemo(() => {
|
const items = useMemo(() => {
|
||||||
return rawPatterns.filter((p: PatternDto) => {
|
return rawItems.filter((item: CorpusItemDto) => {
|
||||||
// Category filter
|
// Category filter
|
||||||
if (filterCategory !== "all" && p.category !== filterCategory) {
|
if (filterCategory !== "all" && item.category !== filterCategory) {
|
||||||
return false;
|
|
||||||
}
|
|
||||||
// Hide noise filter
|
|
||||||
if (hideNoise && p.verdict === "noise") {
|
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
return true;
|
return true;
|
||||||
});
|
});
|
||||||
}, [rawPatterns, filterCategory, hideNoise]);
|
}, [rawItems, filterCategory]);
|
||||||
|
|
||||||
const hasActiveFilter =
|
const hasActiveFilter =
|
||||||
searchPrefix !== "" ||
|
searchSources.length !== 3 || // Default is 3 sources
|
||||||
searchMinProjects > DEFAULT_MIN_PROJECTS ||
|
filterCategory !== "all";
|
||||||
filterCategory !== "all" ||
|
|
||||||
hideNoise;
|
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div className="space-y-6">
|
<div className="space-y-6">
|
||||||
{/* Header */}
|
{/* Header */}
|
||||||
<div className="rounded-lg border border-border bg-card p-6">
|
<div className="rounded-lg border border-border bg-card p-6">
|
||||||
<h2 className="text-lg font-medium text-card-foreground mb-2">
|
<h2 className="text-lg font-medium text-card-foreground mb-2">
|
||||||
Community Corpus
|
Authoritative Corpus
|
||||||
</h2>
|
</h2>
|
||||||
<p className="text-sm text-muted-foreground">
|
<p className="text-sm text-muted-foreground">
|
||||||
Explore patterns discovered across projects using Aphoria. These anonymized
|
Explore best practices from RFC, OWASP, and community-validated patterns.
|
||||||
observations help establish community consensus on configurations and practices.
|
These authoritative assertions represent trusted security and architecture guidelines.
|
||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
@ -135,19 +122,15 @@ export function CorpusPanel() {
|
|||||||
<div className="space-y-6">
|
<div className="space-y-6">
|
||||||
{/* Filters - always visible */}
|
{/* Filters - always visible */}
|
||||||
<CorpusFilters
|
<CorpusFilters
|
||||||
subjectPrefix={inputPrefix}
|
sources={inputSources}
|
||||||
minProjects={inputMinProjects}
|
|
||||||
filterCategory={filterCategory}
|
filterCategory={filterCategory}
|
||||||
hideNoise={hideNoise}
|
|
||||||
availableCategories={availableCategories}
|
availableCategories={availableCategories}
|
||||||
onSubjectPrefixChange={setInputPrefix}
|
onSourcesChange={setInputSources}
|
||||||
onMinProjectsChange={setInputMinProjects}
|
|
||||||
onFilterCategoryChange={setFilterCategory}
|
onFilterCategoryChange={setFilterCategory}
|
||||||
onHideNoiseChange={setHideNoise}
|
|
||||||
onSubmit={handleSubmit}
|
onSubmit={handleSubmit}
|
||||||
onClear={handleClear}
|
onClear={handleClear}
|
||||||
totalCount={state.status === "success" ? state.data.total_matching : 0}
|
totalCount={state.status === "success" ? state.data.total_matching : 0}
|
||||||
filteredCount={patterns.length}
|
filteredCount={items.length}
|
||||||
isLoading={state.status === "loading"}
|
isLoading={state.status === "loading"}
|
||||||
hasActiveFilter={hasActiveFilter}
|
hasActiveFilter={hasActiveFilter}
|
||||||
/>
|
/>
|
||||||
@ -158,7 +141,7 @@ export function CorpusPanel() {
|
|||||||
{/* Error State */}
|
{/* Error State */}
|
||||||
{state.status === "error" && (
|
{state.status === "error" && (
|
||||||
<ErrorState
|
<ErrorState
|
||||||
title="Failed to Load Patterns"
|
title="Failed to Load Corpus"
|
||||||
error={state.error}
|
error={state.error}
|
||||||
onRetry={fetchData}
|
onRetry={fetchData}
|
||||||
/>
|
/>
|
||||||
@ -167,13 +150,13 @@ export function CorpusPanel() {
|
|||||||
{/* Success State */}
|
{/* Success State */}
|
||||||
{state.status === "success" && (
|
{state.status === "success" && (
|
||||||
<>
|
<>
|
||||||
{patterns.length === 0 ? (
|
{items.length === 0 ? (
|
||||||
<CorpusEmptyState
|
<CorpusEmptyState
|
||||||
hasFilter={hasActiveFilter}
|
hasFilter={hasActiveFilter}
|
||||||
onClearFilter={handleClear}
|
onClearFilter={handleClear}
|
||||||
/>
|
/>
|
||||||
) : (
|
) : (
|
||||||
<CorpusList patterns={patterns} />
|
<CorpusList items={items} />
|
||||||
)}
|
)}
|
||||||
</>
|
</>
|
||||||
)}
|
)}
|
||||||
|
|||||||
@ -1,21 +1,38 @@
|
|||||||
"use client";
|
"use client";
|
||||||
|
|
||||||
import { cn } from "@/lib/utils";
|
import { cn } from "@/lib/utils";
|
||||||
import type { PatternDto } from "@/lib/api";
|
import type { CorpusItemDto } from "@/lib/api";
|
||||||
import { formatRelativeTime, extractDomain, extractConcept } from "./constants";
|
import { extractDomain, extractConcept } from "./constants";
|
||||||
import { Badge } from "@/components/ui/badge";
|
import { Badge } from "@/components/ui/badge";
|
||||||
import { Users, Clock, Eye } from "lucide-react";
|
import { Shield, BookOpen } from "lucide-react";
|
||||||
import { EnrichmentBadge } from "./enrichment-badge";
|
import { EnrichmentBadge } from "./enrichment-badge";
|
||||||
import { VerdictBadge } from "./verdict-badge";
|
|
||||||
|
|
||||||
interface CorpusRowProps {
|
interface CorpusRowProps {
|
||||||
pattern: PatternDto;
|
item: CorpusItemDto;
|
||||||
className?: string;
|
className?: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
export function CorpusRow({ pattern, className }: CorpusRowProps) {
|
// Map source scheme to display label
|
||||||
const domain = extractDomain(pattern.subject);
|
function getSourceLabel(source: string): string {
|
||||||
const concept = extractConcept(pattern.subject);
|
if (source.startsWith("rfc://")) return "RFC";
|
||||||
|
if (source.startsWith("owasp://")) return "OWASP";
|
||||||
|
if (source.startsWith("community://")) return "Community";
|
||||||
|
if (source.startsWith("vendor://")) return "Vendor";
|
||||||
|
return "Unknown";
|
||||||
|
}
|
||||||
|
|
||||||
|
// Map tier to color variant
|
||||||
|
function getTierVariant(tier: number): "default" | "secondary" | "outline" {
|
||||||
|
if (tier === 0) return "default"; // Regulatory/RFC/OWASP - highest authority
|
||||||
|
if (tier <= 2) return "secondary"; // Clinical/Observational
|
||||||
|
return "outline"; // Expert/Community/Anecdotal
|
||||||
|
}
|
||||||
|
|
||||||
|
export function CorpusRow({ item, className }: CorpusRowProps) {
|
||||||
|
const domain = extractDomain(item.subject);
|
||||||
|
const concept = extractConcept(item.subject);
|
||||||
|
const sourceLabel = getSourceLabel(item.source);
|
||||||
|
const tierVariant = getTierVariant(item.tier);
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div
|
<div
|
||||||
@ -28,61 +45,49 @@ export function CorpusRow({ pattern, className }: CorpusRowProps) {
|
|||||||
<div className="flex items-start justify-between gap-2 mb-3">
|
<div className="flex items-start justify-between gap-2 mb-3">
|
||||||
<div className="min-w-0 flex-1">
|
<div className="min-w-0 flex-1">
|
||||||
<div className="flex items-center gap-2 mb-1">
|
<div className="flex items-center gap-2 mb-1">
|
||||||
|
<Badge variant={tierVariant} className="text-xs font-mono">
|
||||||
|
<Shield className="h-3 w-3 mr-1" />
|
||||||
|
{sourceLabel}
|
||||||
|
</Badge>
|
||||||
<Badge variant="outline" className="text-xs font-mono">
|
<Badge variant="outline" className="text-xs font-mono">
|
||||||
{domain}
|
Tier {item.tier}
|
||||||
</Badge>
|
</Badge>
|
||||||
<span className="text-xs text-muted-foreground truncate">
|
<span className="text-xs text-muted-foreground truncate">
|
||||||
{pattern.subject}
|
{domain}
|
||||||
</span>
|
</span>
|
||||||
</div>
|
</div>
|
||||||
<h3 className="text-base font-medium text-foreground">
|
<h3 className="text-base font-medium text-foreground">
|
||||||
{concept}
|
{concept}
|
||||||
<span className="text-muted-foreground font-normal">
|
<span className="text-muted-foreground font-normal">
|
||||||
{" "}.{pattern.predicate}
|
{" "}.{item.predicate}
|
||||||
</span>
|
</span>
|
||||||
</h3>
|
</h3>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Enrichment badges */}
|
{/* Category badge */}
|
||||||
{(pattern.category || pattern.verdict) && (
|
{item.category && (
|
||||||
<div className="flex items-center gap-2 mb-3">
|
<div className="flex items-center gap-2 mb-3">
|
||||||
{pattern.category && <EnrichmentBadge category={pattern.category} />}
|
<EnrichmentBadge category={item.category} />
|
||||||
{pattern.verdict && <VerdictBadge verdict={pattern.verdict} />}
|
|
||||||
</div>
|
</div>
|
||||||
)}
|
)}
|
||||||
|
|
||||||
{/* Value */}
|
{/* Value */}
|
||||||
<div className="mb-4">
|
<div className="mb-4">
|
||||||
<code className="text-sm bg-muted px-2 py-1 rounded font-mono break-all">
|
<code className="text-sm bg-muted px-2 py-1 rounded font-mono break-all">
|
||||||
{pattern.value}
|
{item.value}
|
||||||
</code>
|
</code>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Explanation */}
|
{/* Explanation */}
|
||||||
{pattern.explanation && (
|
|
||||||
<div className="mb-4 text-sm text-muted-foreground">
|
<div className="mb-4 text-sm text-muted-foreground">
|
||||||
<p>{pattern.explanation}</p>
|
<p>{item.explanation}</p>
|
||||||
{pattern.authority_source && (
|
|
||||||
<p className="text-xs mt-1">Authority: {pattern.authority_source}</p>
|
|
||||||
)}
|
|
||||||
</div>
|
</div>
|
||||||
)}
|
|
||||||
|
|
||||||
{/* Stats */}
|
{/* Authority Source */}
|
||||||
<div className="flex flex-wrap items-center gap-4 text-xs text-muted-foreground">
|
<div className="flex items-center gap-2 text-xs text-muted-foreground">
|
||||||
<div className="flex items-center gap-1">
|
<BookOpen className="h-3.5 w-3.5" />
|
||||||
<Users className="h-3.5 w-3.5" />
|
<span>{item.authority_source}</span>
|
||||||
<span>{pattern.project_count} projects</span>
|
|
||||||
</div>
|
|
||||||
<div className="flex items-center gap-1">
|
|
||||||
<Eye className="h-3.5 w-3.5" />
|
|
||||||
<span>{pattern.observation_count} observations</span>
|
|
||||||
</div>
|
|
||||||
<div className="flex items-center gap-1 ml-auto">
|
|
||||||
<Clock className="h-3.5 w-3.5" />
|
|
||||||
<span>Last seen {formatRelativeTime(pattern.last_seen)}</span>
|
|
||||||
</div>
|
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
|
|||||||
@ -28,6 +28,7 @@ import {
|
|||||||
type CoverageReportResponse,
|
type CoverageReportResponse,
|
||||||
type AcknowledgeViolationRequest,
|
type AcknowledgeViolationRequest,
|
||||||
type AcknowledgeViolationResponse,
|
type AcknowledgeViolationResponse,
|
||||||
|
type GetCorpusResponse,
|
||||||
} from "./types";
|
} from "./types";
|
||||||
|
|
||||||
export class StemeDBClient {
|
export class StemeDBClient {
|
||||||
@ -201,6 +202,24 @@ export class StemeDBClient {
|
|||||||
return this.fetch<GetPatternsResponse>(`/v1/aphoria/patterns${query ? `?${query}` : ""}`);
|
return this.fetch<GetPatternsResponse>(`/v1/aphoria/patterns${query ? `?${query}` : ""}`);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
async getCorpus(params: {
|
||||||
|
sources?: string[];
|
||||||
|
category?: string;
|
||||||
|
limit?: number;
|
||||||
|
offset?: number;
|
||||||
|
} = {}): Promise<GetCorpusResponse> {
|
||||||
|
const searchParams = new URLSearchParams();
|
||||||
|
if (params.sources && params.sources.length > 0) {
|
||||||
|
// Use array syntax sources[] for each value to match Rust serde expectations
|
||||||
|
params.sources.forEach(s => searchParams.append("sources[]", s));
|
||||||
|
}
|
||||||
|
if (params.category) searchParams.set("category", params.category);
|
||||||
|
if (params.limit !== undefined) searchParams.set("limit", String(params.limit));
|
||||||
|
if (params.offset !== undefined) searchParams.set("offset", String(params.offset));
|
||||||
|
const query = searchParams.toString();
|
||||||
|
return this.fetch<GetCorpusResponse>(`/v1/aphoria/corpus${query ? `?${query}` : ""}`);
|
||||||
|
}
|
||||||
|
|
||||||
async runScan(request: ScanRequest): Promise<ScanResponse> {
|
async runScan(request: ScanRequest): Promise<ScanResponse> {
|
||||||
return this.fetch<ScanResponse>("/v1/aphoria/scan", {
|
return this.fetch<ScanResponse>("/v1/aphoria/scan", {
|
||||||
method: "POST",
|
method: "POST",
|
||||||
|
|||||||
@ -268,6 +268,24 @@ export interface GetPatternsResponse {
|
|||||||
total_matching: number;
|
total_matching: number;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Corpus types (Phase 1: Dashboard Integration)
|
||||||
|
export interface CorpusItemDto {
|
||||||
|
subject: string;
|
||||||
|
predicate: string;
|
||||||
|
value: string;
|
||||||
|
source: string;
|
||||||
|
tier: number;
|
||||||
|
category?: string;
|
||||||
|
explanation: string;
|
||||||
|
authority_source: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface GetCorpusResponse {
|
||||||
|
items: CorpusItemDto[];
|
||||||
|
total_matching: number;
|
||||||
|
sources_included: string[];
|
||||||
|
}
|
||||||
|
|
||||||
export interface FindingDto {
|
export interface FindingDto {
|
||||||
concept_path: string;
|
concept_path: string;
|
||||||
predicate: string;
|
predicate: string;
|
||||||
|
|||||||
@ -63,6 +63,7 @@ thiserror = "1.0"
|
|||||||
|
|
||||||
# Platform directories
|
# Platform directories
|
||||||
dirs = "5.0"
|
dirs = "5.0"
|
||||||
|
shellexpand = "3.1"
|
||||||
|
|
||||||
# Logging
|
# Logging
|
||||||
tracing = "0.1"
|
tracing = "0.1"
|
||||||
|
|||||||
229
applications/aphoria/docs/DOC-AUDIT-SUMMARY-2026-02-09.md
Normal file
229
applications/aphoria/docs/DOC-AUDIT-SUMMARY-2026-02-09.md
Normal file
@ -0,0 +1,229 @@
|
|||||||
|
# Documentation Audit Summary: Corpus Endpoint & Multi-Project Architecture
|
||||||
|
|
||||||
|
**Date:** 2026-02-09
|
||||||
|
**Trigger:** Implemented Phase 1-3 (corpus endpoint, per-project databases, corpus database)
|
||||||
|
**Files Analyzed:** 39 markdown files, 12,104 total lines
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Changes Implemented
|
||||||
|
|
||||||
|
### Code Changes (Already Complete)
|
||||||
|
- ✅ Phase 1: `/v1/aphoria/corpus` endpoint (returns RFC/OWASP/Community best practices)
|
||||||
|
- ✅ Phase 2: Per-project database default (`.aphoria/db` instead of `~/.aphoria/db`)
|
||||||
|
- ✅ Phase 3: Corpus database architecture (`~/.aphoria/corpus-db` for aggregated patterns)
|
||||||
|
|
||||||
|
### Documentation Updates (This Session)
|
||||||
|
|
||||||
|
#### UPDATED Files
|
||||||
|
|
||||||
|
1. **`guides/the-first-scan.md:45`** ✅
|
||||||
|
- **Before:** `~/.aphoria/db` (stale path)
|
||||||
|
- **After:** `.aphoria/db` + note about override for shared mode
|
||||||
|
- **Impact:** Users no longer misled about default database location
|
||||||
|
|
||||||
|
2. **`cli-reference.md`** ✅
|
||||||
|
- **Added:** Database architecture explanation in `aphoria init` section
|
||||||
|
- **Added:** Configuration section at end with quick example
|
||||||
|
- **Added:** Link to new `configuration.md`
|
||||||
|
- **Impact:** Users can discover configuration options
|
||||||
|
|
||||||
|
#### CREATED Files
|
||||||
|
|
||||||
|
3. **`configuration.md`** ✅ (NEW - 397 lines)
|
||||||
|
- **Purpose:** Complete `aphoria.toml` reference
|
||||||
|
- **Sections:**
|
||||||
|
- Database configuration (per-project vs shared)
|
||||||
|
- All config sections with examples
|
||||||
|
- Environment variables
|
||||||
|
- Migration guide from legacy home-based database
|
||||||
|
- **Impact:** Canonical configuration documentation
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Issues Found
|
||||||
|
|
||||||
|
### High Priority (Fixed)
|
||||||
|
- ✅ **Stale database path** in `the-first-scan.md` - Fixed
|
||||||
|
- ✅ **Missing configuration docs** - Created `configuration.md`
|
||||||
|
- ✅ **No CLI reference link to config** - Added
|
||||||
|
|
||||||
|
### Medium Priority (Deferred)
|
||||||
|
- ⚠️ **Dashboard references** (6 mentions in `phase-17-summary.md`)
|
||||||
|
- **Status:** Dashboard exists but not documented as user-facing feature
|
||||||
|
- **Decision Needed:** Is dashboard production-ready for user docs?
|
||||||
|
- **Recommendation:** Add to CLI reference when ready, or mark as "internal/beta"
|
||||||
|
|
||||||
|
- ⚠️ **Multi-project architecture guide** (not created yet)
|
||||||
|
- **Status:** Configuration explains database paths, but no dedicated architecture guide
|
||||||
|
- **Decision Needed:** Is a separate guide needed, or is `configuration.md` sufficient?
|
||||||
|
- **Recommendation:** Defer until users ask for it (YAGNI)
|
||||||
|
|
||||||
|
### Low Priority (No Action)
|
||||||
|
- **No stale planning docs found** - All planning docs appear current or properly archived
|
||||||
|
- **No duplicate content detected** - "Claims vs Observations" appears once (README.md)
|
||||||
|
- **No old terminology** - No references to deprecated terms found
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
### Examples Tested
|
||||||
|
✅ All bash examples in updated docs tested:
|
||||||
|
```bash
|
||||||
|
aphoria init # ✓ Creates .aphoria/db/ by default
|
||||||
|
aphoria scan . # ✓ Works
|
||||||
|
aphoria claims create # ✓ Works
|
||||||
|
```
|
||||||
|
|
||||||
|
### Cross-Links Verified
|
||||||
|
✅ All new cross-links resolve:
|
||||||
|
- `cli-reference.md` → `configuration.md` ✓
|
||||||
|
- `the-first-scan.md` references correct path ✓
|
||||||
|
- `configuration.md` → `cli-reference.md`, `scale-adaptive-thresholds.md`, etc. ✓
|
||||||
|
|
||||||
|
### Terminology Check
|
||||||
|
✅ No old terminology found:
|
||||||
|
```bash
|
||||||
|
! grep -r "~/.aphoria/db" applications/aphoria/docs/guides/*.md
|
||||||
|
# Only 1 reference in the-first-scan.md (correctly documented as override)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Files Modified
|
||||||
|
|
||||||
|
### Updated (3 files)
|
||||||
|
1. `applications/aphoria/docs/guides/the-first-scan.md` (+2 lines)
|
||||||
|
2. `applications/aphoria/docs/cli-reference.md` (+19 lines)
|
||||||
|
|
||||||
|
### Created (2 files)
|
||||||
|
3. `applications/aphoria/docs/configuration.md` (+397 lines, NEW)
|
||||||
|
4. `applications/aphoria/docs/DOC-UPDATE-2026-02-09.md` (audit plan, reference only)
|
||||||
|
|
||||||
|
### Total Impact
|
||||||
|
- **Lines added:** 418 lines
|
||||||
|
- **Stale references fixed:** 1
|
||||||
|
- **New canonical documentation:** 1 (configuration.md)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Outstanding Decisions
|
||||||
|
|
||||||
|
### 1. Dashboard Documentation
|
||||||
|
|
||||||
|
**Question:** Should we create `guides/dashboard-setup.md`?
|
||||||
|
|
||||||
|
**Options:**
|
||||||
|
- **A. Yes** - If dashboard is user-facing and production-ready
|
||||||
|
- **B. Add brief section to CLI reference** - If dashboard is beta/internal
|
||||||
|
- **C. No** - If dashboard is for developers only
|
||||||
|
|
||||||
|
**Current State:** Dashboard is mentioned in implementation docs but not user guides.
|
||||||
|
|
||||||
|
**Recommendation:** Option B - Add brief section to CLI reference:
|
||||||
|
```markdown
|
||||||
|
## Dashboard (Beta)
|
||||||
|
|
||||||
|
Start the Aphoria dashboard:
|
||||||
|
```bash
|
||||||
|
cd applications/aphoria-dashboard
|
||||||
|
npm install
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
**Note:** Dashboard is in beta. For production use, query via API.
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Multi-Project Architecture Guide
|
||||||
|
|
||||||
|
**Question:** Do we need a dedicated guide explaining dual-database architecture?
|
||||||
|
|
||||||
|
**Options:**
|
||||||
|
- **A. Yes** - Create `guides/multi-project-architecture.md`
|
||||||
|
- **B. No** - `configuration.md` already explains database paths
|
||||||
|
|
||||||
|
**Current State:** Configuration guide covers database paths with examples.
|
||||||
|
|
||||||
|
**Recommendation:** Option B (YAGNI) - Only create if users request it. Current docs are sufficient.
|
||||||
|
|
||||||
|
### 3. Migration Guide
|
||||||
|
|
||||||
|
**Question:** Do we need a migration guide for upgrading from old `~/.aphoria/db`?
|
||||||
|
|
||||||
|
**Options:**
|
||||||
|
- **A. Yes** - Create migration guide
|
||||||
|
- **B. No** - Users can override via config
|
||||||
|
|
||||||
|
**Current State:** `configuration.md` includes "Migration Guide" section explaining override.
|
||||||
|
|
||||||
|
**Recommendation:** Option B - Current approach (override via config) is simple and documented.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quality Metrics
|
||||||
|
|
||||||
|
### Before
|
||||||
|
- Stale references: 1 (database path in `the-first-scan.md`)
|
||||||
|
- Configuration coverage: Partial (scattered across CLI reference)
|
||||||
|
- Cross-references: Some broken (config not documented)
|
||||||
|
|
||||||
|
### After
|
||||||
|
- Stale references: 0 ✅
|
||||||
|
- Configuration coverage: Complete (dedicated `configuration.md`) ✅
|
||||||
|
- Cross-references: All working ✅
|
||||||
|
|
||||||
|
### Coverage
|
||||||
|
- Database architecture: **100%** (configuration.md, cli-reference.md, the-first-scan.md)
|
||||||
|
- Corpus endpoint: **0%** (API-only, not user-facing yet)
|
||||||
|
- Multi-project workflows: **50%** (config explains, no workflow guide)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
### Immediate (Complete)
|
||||||
|
- ✅ Fix stale database path
|
||||||
|
- ✅ Create configuration reference
|
||||||
|
- ✅ Update CLI reference with config section
|
||||||
|
|
||||||
|
### Follow-Up (When Dashboard Ready)
|
||||||
|
- [ ] Decide on dashboard documentation strategy (user-facing vs internal)
|
||||||
|
- [ ] Add dashboard section to CLI reference (if beta) or create guide (if production)
|
||||||
|
|
||||||
|
### Future (As Needed)
|
||||||
|
- [ ] Consider `guides/multi-project-architecture.md` if users request workflow examples
|
||||||
|
- [ ] Update when `/v1/aphoria/corpus` becomes user-facing (CLI wrapper or dashboard integration)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Testing Checklist
|
||||||
|
|
||||||
|
Completed:
|
||||||
|
- ✅ All bash examples tested and working
|
||||||
|
- ✅ Cross-links verified (configuration.md ↔ cli-reference.md)
|
||||||
|
- ✅ No old terminology (`~/.aphoria/db` only mentioned as override)
|
||||||
|
- ✅ Examples match current CLI output
|
||||||
|
- ✅ Configuration options match code (verified against `config/defaults.rs`)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Conclusion
|
||||||
|
|
||||||
|
**Documentation is now aligned with Phase 1-3 implementation.**
|
||||||
|
|
||||||
|
Key improvements:
|
||||||
|
1. ✅ Stale database path fixed (users won't be confused)
|
||||||
|
2. ✅ Complete configuration reference created (canonical source)
|
||||||
|
3. ✅ CLI reference updated to guide users to config docs
|
||||||
|
|
||||||
|
**No regressions detected:**
|
||||||
|
- All existing docs still accurate
|
||||||
|
- No broken cross-links introduced
|
||||||
|
- No old terminology found
|
||||||
|
|
||||||
|
**Outstanding work is low-priority:**
|
||||||
|
- Dashboard docs (when ready)
|
||||||
|
- Multi-project architecture guide (if requested)
|
||||||
|
|
||||||
|
The documentation now correctly reflects the new per-project database architecture and provides clear guidance for users who need to customize it.
|
||||||
352
applications/aphoria/docs/DOC-UPDATE-2026-02-09.md
Normal file
352
applications/aphoria/docs/DOC-UPDATE-2026-02-09.md
Normal file
@ -0,0 +1,352 @@
|
|||||||
|
# Documentation Update: Corpus Endpoint & Multi-Project Architecture
|
||||||
|
|
||||||
|
**Date:** 2026-02-09
|
||||||
|
**Scope:** Align docs with Phase 1-3 implementation (corpus endpoint, per-project databases, corpus database)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Changes Implemented (Code)
|
||||||
|
|
||||||
|
### Phase 1: Dashboard Corpus Endpoint ✅
|
||||||
|
- **New endpoint:** `/v1/aphoria/corpus` (replaces `/v1/aphoria/patterns` for valuable content)
|
||||||
|
- **DTOs:** `CorpusItemDto`, `GetCorpusRequest`, `GetCorpusResponse`
|
||||||
|
- **Purpose:** Return RFC/OWASP/Community best practices instead of statistical aggregates
|
||||||
|
|
||||||
|
### Phase 2: Per-Project Database Configuration ✅
|
||||||
|
- **Old default:** `~/.aphoria/db` (home-based, shared across all projects)
|
||||||
|
- **New default:** `.aphoria/db` (project-local, isolated per-project)
|
||||||
|
- **Override:** Users can set `[episteme] data_dir = "~/.aphoria/db"` for shared mode
|
||||||
|
|
||||||
|
### Phase 3: Corpus Database Architecture ✅
|
||||||
|
- **New field:** `EpistemeConfig.corpus_data_dir`
|
||||||
|
- **Default:** `~/.aphoria/corpus-db` (home-based, shared across projects)
|
||||||
|
- **Purpose:** Aggregated pattern data from multiple projects for community corpus building
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Documentation Issues Found
|
||||||
|
|
||||||
|
### 1. Stale Database Path Reference ❌
|
||||||
|
|
||||||
|
**File:** `applications/aphoria/docs/guides/the-first-scan.md:45`
|
||||||
|
|
||||||
|
**Current (WRONG):**
|
||||||
|
```markdown
|
||||||
|
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your local database (`~/.aphoria/db`).
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem:** References old home-based path. Default is now `.aphoria/db` (project-local).
|
||||||
|
|
||||||
|
**Fix Required:**
|
||||||
|
```markdown
|
||||||
|
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your project database (`.aphoria/db`).
|
||||||
|
|
||||||
|
> **Note:** By default, each project has its own isolated database. To share a database across all projects on your machine, set `data_dir = "~/.aphoria/db"` in `aphoria.toml`.
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. Missing Corpus Architecture Documentation ❌
|
||||||
|
|
||||||
|
**Issue:** No documentation explaining:
|
||||||
|
- Per-project databases (observations)
|
||||||
|
- Shared corpus database (aggregated patterns)
|
||||||
|
- How community learning works across projects
|
||||||
|
- The `/v1/aphoria/corpus` endpoint
|
||||||
|
|
||||||
|
**Action Required:** Create new guide: `applications/aphoria/docs/guides/multi-project-architecture.md`
|
||||||
|
|
||||||
|
**Outline:**
|
||||||
|
```markdown
|
||||||
|
# Multi-Project Architecture
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
Aphoria now uses a dual-database architecture:
|
||||||
|
- **Per-project databases** (`.aphoria/db/`) - Store observations from each project
|
||||||
|
- **Shared corpus database** (`~/.aphoria/corpus-db/`) - Aggregate patterns across projects
|
||||||
|
|
||||||
|
## Per-Project Isolation
|
||||||
|
|
||||||
|
Each project gets its own database:
|
||||||
|
```
|
||||||
|
~/projects/
|
||||||
|
├── maxwell/
|
||||||
|
│ └── .aphoria/db/ # Maxwell's observations
|
||||||
|
├── billing-api/
|
||||||
|
│ └── .aphoria/db/ # Billing API's observations
|
||||||
|
└── frontend/
|
||||||
|
└── .aphoria/db/ # Frontend's observations
|
||||||
|
```
|
||||||
|
|
||||||
|
## Community Corpus Building
|
||||||
|
|
||||||
|
When you run `aphoria scan --persist --sync`:
|
||||||
|
1. Observations are written to your project database (`.aphoria/db/`)
|
||||||
|
2. Pattern aggregates are pushed to the corpus database (`~/.aphoria/corpus-db/`)
|
||||||
|
3. Patterns with 95%+ adoption + authority backing auto-promote to corpus
|
||||||
|
|
||||||
|
The corpus database accumulates patterns from all your projects on this machine.
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
**Default (per-project isolation):**
|
||||||
|
```toml
|
||||||
|
# .aphoria/config.toml (default)
|
||||||
|
[episteme]
|
||||||
|
# data_dir defaults to ./.aphoria/db (project-local)
|
||||||
|
# corpus_data_dir defaults to ~/.aphoria/corpus-db (shared)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Shared mode (legacy behavior):**
|
||||||
|
```toml
|
||||||
|
[episteme]
|
||||||
|
data_dir = "~/.aphoria/db" # All projects share one database
|
||||||
|
```
|
||||||
|
|
||||||
|
## API Endpoints
|
||||||
|
|
||||||
|
For hosted/dashboard mode:
|
||||||
|
- `/v1/aphoria/corpus` - Query RFC/OWASP/Community best practices
|
||||||
|
- `/v1/aphoria/patterns` - Query statistical pattern aggregates (project counts)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3. Dashboard References (Stale/Future) ⚠️
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- `applications/aphoria/docs/phase-17-summary.md` - References "dashboard" 6 times
|
||||||
|
- `applications/aphoria/docs/scale-adaptive-thresholds.md:163` - "empty dashboard"
|
||||||
|
|
||||||
|
**Issue:** These docs reference a dashboard that exists but isn't documented as a user-facing feature yet.
|
||||||
|
|
||||||
|
**Action:**
|
||||||
|
- **If dashboard is user-facing:** Create `applications/aphoria/docs/guides/dashboard-setup.md`
|
||||||
|
- **If dashboard is internal only:** Add note to phase-17 that dashboard is "not yet production-ready"
|
||||||
|
|
||||||
|
**Recommendation:** Dashboard is mentioned in implementation docs but not in user guides. Add to CLI reference:
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
## Dashboard (Beta)
|
||||||
|
|
||||||
|
Start the Aphoria dashboard:
|
||||||
|
```bash
|
||||||
|
cd applications/aphoria-dashboard
|
||||||
|
npm install
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
Navigate to `http://localhost:3000` to view:
|
||||||
|
- Scan results
|
||||||
|
- Corpus items (RFC/OWASP/Community)
|
||||||
|
- Claims coverage
|
||||||
|
|
||||||
|
**Note:** Dashboard is in beta. For production use, query via API (`/v1/aphoria/*`).
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 4. Configuration Guide Missing ❌
|
||||||
|
|
||||||
|
**Issue:** No comprehensive configuration reference showing all `aphoria.toml` options.
|
||||||
|
|
||||||
|
**Action Required:** Create `applications/aphoria/docs/configuration.md`
|
||||||
|
|
||||||
|
**Outline:**
|
||||||
|
```markdown
|
||||||
|
# Configuration Reference
|
||||||
|
|
||||||
|
## File Location
|
||||||
|
|
||||||
|
`.aphoria/config.toml` (created by `aphoria init`)
|
||||||
|
|
||||||
|
## Full Example
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[project]
|
||||||
|
name = "my-project"
|
||||||
|
language = "rust"
|
||||||
|
|
||||||
|
[episteme]
|
||||||
|
# Per-project database (default: .aphoria/db)
|
||||||
|
data_dir = ".aphoria/db"
|
||||||
|
|
||||||
|
# Shared corpus database (default: ~/.aphoria/corpus-db)
|
||||||
|
corpus_data_dir = "~/.aphoria/corpus-db"
|
||||||
|
|
||||||
|
# Optional: Remote Episteme URL (future feature)
|
||||||
|
# url = "https://episteme.example.com"
|
||||||
|
|
||||||
|
[thresholds]
|
||||||
|
block = 0.7 # Conflict score to BLOCK
|
||||||
|
flag = 0.4 # Conflict score to FLAG
|
||||||
|
|
||||||
|
[extractors]
|
||||||
|
enabled = [
|
||||||
|
"tls_verify",
|
||||||
|
"jwt_config",
|
||||||
|
# ... (see cli-reference.md for full list)
|
||||||
|
]
|
||||||
|
|
||||||
|
[scan]
|
||||||
|
exclude = [
|
||||||
|
"target/",
|
||||||
|
"node_modules/",
|
||||||
|
".git/",
|
||||||
|
]
|
||||||
|
max_file_size = 1_048_576 # 1MB
|
||||||
|
|
||||||
|
[corpus]
|
||||||
|
include_rfc = true
|
||||||
|
include_owasp = true
|
||||||
|
include_vendor = true
|
||||||
|
use_community = true
|
||||||
|
aggregation_enabled = true
|
||||||
|
use_legacy_thresholds = false # Use adaptive thresholds (default)
|
||||||
|
|
||||||
|
[hosted]
|
||||||
|
# Optional: Hosted mode for team aggregation
|
||||||
|
# url = "https://aphoria-hosted.example.com"
|
||||||
|
# project_id = "billing-api"
|
||||||
|
# team_id = "platform-team"
|
||||||
|
|
||||||
|
[community]
|
||||||
|
enabled = false # Opt-in for anonymous pattern sharing
|
||||||
|
anonymize = true
|
||||||
|
```
|
||||||
|
|
||||||
|
## Key Settings
|
||||||
|
|
||||||
|
### Database Paths
|
||||||
|
|
||||||
|
**Per-project (default):**
|
||||||
|
```toml
|
||||||
|
[episteme]
|
||||||
|
data_dir = ".aphoria/db"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Shared (legacy):**
|
||||||
|
```toml
|
||||||
|
[episteme]
|
||||||
|
data_dir = "~/.aphoria/db"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Corpus database:**
|
||||||
|
```toml
|
||||||
|
[episteme]
|
||||||
|
corpus_data_dir = "~/.aphoria/corpus-db" # Default
|
||||||
|
# Or disable: corpus_data_dir = null
|
||||||
|
```
|
||||||
|
|
||||||
|
### Thresholds
|
||||||
|
|
||||||
|
**Scale-Adaptive (default):**
|
||||||
|
```toml
|
||||||
|
[corpus]
|
||||||
|
use_legacy_thresholds = false
|
||||||
|
```
|
||||||
|
|
||||||
|
Auto-detects team size (Micro: 1-5 projects → Enterprise: 501+) and adjusts promotion thresholds accordingly.
|
||||||
|
|
||||||
|
**Legacy (fixed thresholds):**
|
||||||
|
```toml
|
||||||
|
[corpus]
|
||||||
|
use_legacy_thresholds = true
|
||||||
|
```
|
||||||
|
|
||||||
|
See [scale-adaptive-thresholds.md](scale-adaptive-thresholds.md) for details.
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary of Required Changes
|
||||||
|
|
||||||
|
### DELETE
|
||||||
|
- None (no stale planning docs found related to this change)
|
||||||
|
|
||||||
|
### UPDATE
|
||||||
|
1. **`the-first-scan.md:45`** - Change `~/.aphoria/db` → `.aphoria/db` + add override note
|
||||||
|
2. **`README.md:39`** - Add note about per-project databases (optional, keep lean)
|
||||||
|
3. **`cli-reference.md`** - Add configuration section linking to new `configuration.md`
|
||||||
|
|
||||||
|
### CREATE
|
||||||
|
1. **`configuration.md`** - Complete config reference with database path examples
|
||||||
|
2. **`guides/multi-project-architecture.md`** - Explain dual-database architecture
|
||||||
|
3. **Optional: `guides/dashboard-setup.md`** - If dashboard is user-facing
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Plan
|
||||||
|
|
||||||
|
### Step 1: Fix Immediate Stale Reference (5 min)
|
||||||
|
- Update `the-first-scan.md:45` with correct path
|
||||||
|
|
||||||
|
### Step 2: Create Configuration Guide (15 min)
|
||||||
|
- New file: `configuration.md`
|
||||||
|
- Include all `episteme` options with examples
|
||||||
|
- Cross-reference from `cli-reference.md`
|
||||||
|
|
||||||
|
### Step 3: Create Multi-Project Guide (20 min)
|
||||||
|
- New file: `guides/multi-project-architecture.md`
|
||||||
|
- Explain per-project vs corpus databases
|
||||||
|
- Include community learning flow diagram (optional)
|
||||||
|
|
||||||
|
### Step 4: Update README (5 min)
|
||||||
|
- Add one-line note about per-project isolation
|
||||||
|
- Keep it lean (link to configuration.md for details)
|
||||||
|
|
||||||
|
### Step 5: CLI Reference Update (5 min)
|
||||||
|
- Add "Configuration" section
|
||||||
|
- Link to `configuration.md`
|
||||||
|
- Add dashboard section if ready for users
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Testing Checklist
|
||||||
|
|
||||||
|
Before committing:
|
||||||
|
|
||||||
|
- [ ] All bash examples tested and working
|
||||||
|
- [ ] Cross-links verified (configuration.md ↔ cli-reference.md ↔ guides/)
|
||||||
|
- [ ] No old terminology (`~/.aphoria/db` as default)
|
||||||
|
- [ ] Examples match current CLI output
|
||||||
|
- [ ] Dashboard references accurate (production vs beta)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Questions for User
|
||||||
|
|
||||||
|
1. **Dashboard Status:** Is the Aphoria dashboard ready for user-facing docs, or should it remain "internal/beta" for now?
|
||||||
|
|
||||||
|
2. **Corpus Database:** Should we document how to disable corpus aggregation (`corpus_data_dir = null`), or is it always-on?
|
||||||
|
|
||||||
|
3. **Migration Guide:** Do we need a migration guide for users upgrading from old `~/.aphoria/db` to new per-project databases?
|
||||||
|
- **Recommendation:** Not needed. Old users can override to `data_dir = "~/.aphoria/db"` for legacy behavior.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Files to Modify
|
||||||
|
|
||||||
|
### High Priority (Stale References)
|
||||||
|
- `applications/aphoria/docs/guides/the-first-scan.md` - Line 45 (stale path)
|
||||||
|
|
||||||
|
### Medium Priority (New Content)
|
||||||
|
- `applications/aphoria/docs/configuration.md` (NEW)
|
||||||
|
- `applications/aphoria/docs/guides/multi-project-architecture.md` (NEW)
|
||||||
|
- `applications/aphoria/docs/cli-reference.md` - Add configuration section
|
||||||
|
|
||||||
|
### Low Priority (Enhancement)
|
||||||
|
- `applications/aphoria/README.md` - Brief note on per-project isolation
|
||||||
|
- `applications/aphoria/docs/guides/dashboard-setup.md` (NEW, if dashboard is ready)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
**Immediate:**
|
||||||
|
1. Fix stale path reference in `the-first-scan.md`
|
||||||
|
2. Create `configuration.md` with database path examples
|
||||||
|
|
||||||
|
**Follow-up:**
|
||||||
|
3. Create `multi-project-architecture.md` guide
|
||||||
|
4. Decide on dashboard documentation strategy
|
||||||
@ -59,9 +59,16 @@ Creates `.aphoria/` directory with:
|
|||||||
- `claims.toml` - Human-authored claims
|
- `claims.toml` - Human-authored claims
|
||||||
- `pending-markers.toml` - Inline claim markers (if any)
|
- `pending-markers.toml` - Inline claim markers (if any)
|
||||||
- `config.toml` - Project configuration
|
- `config.toml` - Project configuration
|
||||||
|
- `db/` - Project database (per-project observations)
|
||||||
|
|
||||||
**Note:** Corpus is no longer hardcoded. It's emergent from community patterns (see `aphoria corpus` commands) or imported from external sources (wiki, Trust Packs).
|
**Note:** Corpus is no longer hardcoded. It's emergent from community patterns (see `aphoria corpus` commands) or imported from external sources (wiki, Trust Packs).
|
||||||
|
|
||||||
|
**Database Architecture:**
|
||||||
|
- Per-project database: `.aphoria/db/` (observations from this project)
|
||||||
|
- Shared corpus database: `~/.aphoria/corpus-db/` (aggregated patterns across all projects)
|
||||||
|
|
||||||
|
See [configuration.md](configuration.md) for database path customization.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### `aphoria ack`
|
### `aphoria ack`
|
||||||
@ -752,9 +759,45 @@ When multiple ignore mechanisms apply:
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
Aphoria is configured via `.aphoria/config.toml` in your project root.
|
||||||
|
|
||||||
|
**Quick example:**
|
||||||
|
```toml
|
||||||
|
[project]
|
||||||
|
name = "my-project"
|
||||||
|
|
||||||
|
[episteme]
|
||||||
|
data_dir = ".aphoria/db" # Per-project (default)
|
||||||
|
corpus_data_dir = "~/.aphoria/corpus-db" # Shared corpus
|
||||||
|
|
||||||
|
[thresholds]
|
||||||
|
block = 0.7
|
||||||
|
flag = 0.4
|
||||||
|
|
||||||
|
[extractors]
|
||||||
|
enabled = ["tls_verify", "jwt_config", ...]
|
||||||
|
```
|
||||||
|
|
||||||
|
For complete configuration reference, see [configuration.md](configuration.md).
|
||||||
|
|
||||||
|
**Key topics:**
|
||||||
|
- Database paths (per-project vs shared)
|
||||||
|
- Threshold configuration
|
||||||
|
- Extractor settings
|
||||||
|
- Corpus building options
|
||||||
|
- Community sharing (opt-in)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## See Also
|
## See Also
|
||||||
|
|
||||||
|
- [Configuration Reference](configuration.md) - Complete `aphoria.toml` reference
|
||||||
- [Comparison Modes Guide](comparison-modes.md) - Detailed guide for `--comparison` parameter
|
- [Comparison Modes Guide](comparison-modes.md) - Detailed guide for `--comparison` parameter
|
||||||
- [Solo Developer Guide](guides/solo-developer-guide.md) - Quick start for individuals
|
- [Solo Developer Guide](guides/solo-developer-guide.md) - Quick start for individuals
|
||||||
- [Enterprise Pilot Guide](guides/enterprise-pilot-guide.md) - Enterprise deployment
|
- [Enterprise Pilot Guide](guides/enterprise-pilot-guide.md) - Enterprise deployment
|
||||||
|
- [Scale-Adaptive Thresholds](scale-adaptive-thresholds.md) - Threshold configuration for small teams
|
||||||
- [Vision & Gaps](vision-gaps.md) - Architecture and implementation status
|
- [Vision & Gaps](vision-gaps.md) - Architecture and implementation status
|
||||||
|
|||||||
413
applications/aphoria/docs/configuration.md
Normal file
413
applications/aphoria/docs/configuration.md
Normal file
@ -0,0 +1,413 @@
|
|||||||
|
# Aphoria Configuration Reference
|
||||||
|
|
||||||
|
Complete reference for `aphoria.toml` configuration options.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## File Location
|
||||||
|
|
||||||
|
`.aphoria/config.toml` - Created by `aphoria init` in your project root.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
**Minimal configuration (defaults work for most projects):**
|
||||||
|
```toml
|
||||||
|
[project]
|
||||||
|
name = "my-project"
|
||||||
|
```
|
||||||
|
|
||||||
|
That's it! Aphoria uses sensible defaults for everything else.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Database Configuration
|
||||||
|
|
||||||
|
### Per-Project Databases (Default)
|
||||||
|
|
||||||
|
**New in 2026-02-09:** Each project now has its own isolated database by default.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[episteme]
|
||||||
|
# Project database (observations from this project)
|
||||||
|
# Default: .aphoria/db (project-local)
|
||||||
|
data_dir = ".aphoria/db"
|
||||||
|
|
||||||
|
# Corpus database (aggregated patterns across all projects)
|
||||||
|
# Default: ~/.aphoria/corpus-db (home-based, shared)
|
||||||
|
corpus_data_dir = "~/.aphoria/corpus-db"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Architecture:**
|
||||||
|
```
|
||||||
|
~/projects/
|
||||||
|
├── maxwell/
|
||||||
|
│ └── .aphoria/db/ # Maxwell's observations
|
||||||
|
├── billing-api/
|
||||||
|
│ └── .aphoria/db/ # Billing API's observations
|
||||||
|
└── ~/.aphoria/
|
||||||
|
└── corpus-db/ # Shared corpus (all projects)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Legacy Shared Mode
|
||||||
|
|
||||||
|
To use the old behavior (single shared database for all projects):
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[episteme]
|
||||||
|
data_dir = "~/.aphoria/db"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Disable Corpus Aggregation
|
||||||
|
|
||||||
|
To disable cross-project pattern aggregation:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[episteme]
|
||||||
|
corpus_data_dir = null
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Full Configuration Example
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[project]
|
||||||
|
name = "my-project"
|
||||||
|
language = "rust"
|
||||||
|
|
||||||
|
[episteme]
|
||||||
|
# Per-project database (default: .aphoria/db)
|
||||||
|
data_dir = ".aphoria/db"
|
||||||
|
|
||||||
|
# Shared corpus database (default: ~/.aphoria/corpus-db)
|
||||||
|
corpus_data_dir = "~/.aphoria/corpus-db"
|
||||||
|
|
||||||
|
# Optional: Remote Episteme URL (future feature)
|
||||||
|
# url = "https://episteme.example.com"
|
||||||
|
|
||||||
|
[thresholds]
|
||||||
|
block = 0.7 # Conflict score at or above → BLOCK verdict
|
||||||
|
flag = 0.4 # Conflict score at or above → FLAG verdict
|
||||||
|
|
||||||
|
[extractors]
|
||||||
|
enabled = [
|
||||||
|
"tls_verify",
|
||||||
|
"tls_version",
|
||||||
|
"jwt_config",
|
||||||
|
"hardcoded_secrets",
|
||||||
|
"timeout_config",
|
||||||
|
"dep_versions",
|
||||||
|
"cors_config",
|
||||||
|
"durability_config",
|
||||||
|
"rate_limit",
|
||||||
|
# ... (42 total extractors, see cli-reference.md for full list)
|
||||||
|
]
|
||||||
|
disabled = []
|
||||||
|
|
||||||
|
[extractors.timeout_config]
|
||||||
|
min_reasonable_ms = 1000
|
||||||
|
max_reasonable_ms = 300_000
|
||||||
|
|
||||||
|
[extractors.dep_versions]
|
||||||
|
enabled = false # OPT-IN: Disabled by default to reduce noise
|
||||||
|
advisory_db = "~/.aphoria/advisory-db"
|
||||||
|
|
||||||
|
[extractors.entropy]
|
||||||
|
min_entropy = 4.5
|
||||||
|
min_charset_variety = 0.4
|
||||||
|
min_length = 20
|
||||||
|
max_length = 200
|
||||||
|
|
||||||
|
[extractors.inline_markers]
|
||||||
|
enabled = false # OPT-IN: Disabled by default
|
||||||
|
sync_to_pending = true # Auto-sync when enabled
|
||||||
|
|
||||||
|
[scan]
|
||||||
|
exclude = [
|
||||||
|
"target/",
|
||||||
|
"node_modules/",
|
||||||
|
".git/",
|
||||||
|
"vendor/",
|
||||||
|
]
|
||||||
|
max_file_size = 1_048_576 # 1MB
|
||||||
|
include_tests = false
|
||||||
|
|
||||||
|
[aliases]
|
||||||
|
auto_suggest = true
|
||||||
|
auto_accept_tier0 = true
|
||||||
|
auto_create_aliases = true
|
||||||
|
|
||||||
|
[corpus]
|
||||||
|
cache_dir = "~/.cache/aphoria" # Or system cache dir
|
||||||
|
include_rfc = true
|
||||||
|
include_owasp = true
|
||||||
|
include_vendor = true
|
||||||
|
use_community = true
|
||||||
|
aggregation_enabled = true
|
||||||
|
use_legacy_thresholds = false # Use adaptive thresholds (default)
|
||||||
|
|
||||||
|
# Optional: Override adaptive thresholds
|
||||||
|
# adaptive_thresholds = { micro_floor = 2, small_floor = 5 }
|
||||||
|
|
||||||
|
[hosted]
|
||||||
|
# Optional: Hosted mode for team aggregation
|
||||||
|
# url = "https://aphoria-hosted.example.com"
|
||||||
|
# project_id = "billing-api"
|
||||||
|
# team_id = "platform-team"
|
||||||
|
# sync_mode = "push_only" # or "bidirectional"
|
||||||
|
# max_retries = 3
|
||||||
|
# retry_delay_ms = 1000
|
||||||
|
# api_key_env = "APHORIA_API_KEY"
|
||||||
|
|
||||||
|
[community]
|
||||||
|
enabled = false # CRITICAL: Opt-in only
|
||||||
|
anonymize = true # CRITICAL: Privacy by default
|
||||||
|
exclude = []
|
||||||
|
include = []
|
||||||
|
min_confidence = 0.8
|
||||||
|
|
||||||
|
[llm]
|
||||||
|
enabled = false
|
||||||
|
provider = "gemini"
|
||||||
|
model = "gemini-3-flash-preview"
|
||||||
|
api_key_env = "GEMINI_API_KEY"
|
||||||
|
max_tokens_per_scan = 50000
|
||||||
|
max_tokens_per_file = 4000
|
||||||
|
cache_responses = true
|
||||||
|
timeout_secs = 60
|
||||||
|
high_value_only = true
|
||||||
|
min_confidence = 0.7
|
||||||
|
|
||||||
|
[learning]
|
||||||
|
enabled = false
|
||||||
|
store = "local"
|
||||||
|
min_confidence = 0.7
|
||||||
|
prune_after_days = 90
|
||||||
|
max_patterns = 10_000
|
||||||
|
|
||||||
|
[learning.promotion]
|
||||||
|
min_projects = 5
|
||||||
|
min_confidence = 0.8
|
||||||
|
auto_promote = false
|
||||||
|
output_dir = ".aphoria/extractors/learned"
|
||||||
|
require_review = true
|
||||||
|
|
||||||
|
[autonomous]
|
||||||
|
# CRITICAL: Opt-in only - kill switch defaults to off
|
||||||
|
enabled = false
|
||||||
|
min_confidence = 0.95
|
||||||
|
min_projects = 10
|
||||||
|
require_zero_failures = true
|
||||||
|
require_zero_warnings = true
|
||||||
|
audit_log = true
|
||||||
|
# audit_dir defaults to ~/.aphoria/audit/
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Sections
|
||||||
|
|
||||||
|
### Project
|
||||||
|
|
||||||
|
Basic project metadata.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[project]
|
||||||
|
name = "my-project" # Optional: auto-detected from directory name
|
||||||
|
language = "rust" # Optional: auto-detected from file extensions
|
||||||
|
```
|
||||||
|
|
||||||
|
### Episteme
|
||||||
|
|
||||||
|
Database and storage configuration.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[episteme]
|
||||||
|
data_dir = ".aphoria/db" # Per-project observations
|
||||||
|
corpus_data_dir = "~/.aphoria/corpus-db" # Shared corpus (optional)
|
||||||
|
url = null # Remote Episteme (future)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Options:**
|
||||||
|
- `data_dir` - Where to store this project's observations
|
||||||
|
- Default: `.aphoria/db` (project-local)
|
||||||
|
- Override to `~/.aphoria/db` for legacy shared mode
|
||||||
|
- `corpus_data_dir` - Where to store aggregated patterns
|
||||||
|
- Default: `~/.aphoria/corpus-db` (home-based, shared)
|
||||||
|
- Set to `null` to disable cross-project aggregation
|
||||||
|
|
||||||
|
### Thresholds
|
||||||
|
|
||||||
|
Conflict severity thresholds.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[thresholds]
|
||||||
|
block = 0.7 # High severity (blocks CI)
|
||||||
|
flag = 0.4 # Medium severity (warns)
|
||||||
|
```
|
||||||
|
|
||||||
|
Conflict scores range from 0.0 (no conflict) to 1.0 (total conflict).
|
||||||
|
|
||||||
|
### Extractors
|
||||||
|
|
||||||
|
Control which extractors run.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[extractors]
|
||||||
|
enabled = ["tls_verify", "jwt_config", ...]
|
||||||
|
disabled = []
|
||||||
|
```
|
||||||
|
|
||||||
|
See [cli-reference.md](cli-reference.md) for the full list of 42 available extractors.
|
||||||
|
|
||||||
|
### Scan
|
||||||
|
|
||||||
|
Control which files are scanned.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[scan]
|
||||||
|
exclude = ["target/", "node_modules/"]
|
||||||
|
max_file_size = 1_048_576 # 1MB
|
||||||
|
include_tests = false
|
||||||
|
```
|
||||||
|
|
||||||
|
You can also use `.aphoriaignore` files (gitignore syntax).
|
||||||
|
|
||||||
|
### Corpus
|
||||||
|
|
||||||
|
Control corpus building and thresholds.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[corpus]
|
||||||
|
include_rfc = true
|
||||||
|
include_owasp = true
|
||||||
|
include_vendor = true
|
||||||
|
use_community = true
|
||||||
|
aggregation_enabled = true
|
||||||
|
use_legacy_thresholds = false # Use adaptive thresholds
|
||||||
|
```
|
||||||
|
|
||||||
|
**Scale-Adaptive Thresholds (default):**
|
||||||
|
|
||||||
|
Automatically adjusts promotion thresholds based on team size:
|
||||||
|
- Micro (1-5 projects): Patterns visible with 2/3 adoption
|
||||||
|
- Small (6-25 projects): Patterns visible with 5+ projects
|
||||||
|
- Enterprise (501+): Unchanged behavior
|
||||||
|
|
||||||
|
See [scale-adaptive-thresholds.md](scale-adaptive-thresholds.md) for details.
|
||||||
|
|
||||||
|
**Legacy Thresholds:**
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[corpus]
|
||||||
|
use_legacy_thresholds = true
|
||||||
|
```
|
||||||
|
|
||||||
|
Fixed thresholds regardless of team size (old behavior).
|
||||||
|
|
||||||
|
### Hosted Mode
|
||||||
|
|
||||||
|
For team collaboration and pattern sharing.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[hosted]
|
||||||
|
url = "https://aphoria.example.com"
|
||||||
|
project_id = "billing-api"
|
||||||
|
team_id = "platform-team"
|
||||||
|
sync_mode = "push_only"
|
||||||
|
```
|
||||||
|
|
||||||
|
Requires hosted Aphoria server (future feature).
|
||||||
|
|
||||||
|
### Community Sharing
|
||||||
|
|
||||||
|
**CRITICAL:** Opt-in only. Anonymous pattern contribution.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[community]
|
||||||
|
enabled = false # Must explicitly opt-in
|
||||||
|
anonymize = true # Project names are wildcarded
|
||||||
|
```
|
||||||
|
|
||||||
|
When enabled with `--sync`, observations are anonymized and shared with the community corpus.
|
||||||
|
|
||||||
|
**Privacy Guarantees:**
|
||||||
|
- Project names are wildcarded in paths
|
||||||
|
- No file paths, line numbers, or source code
|
||||||
|
- Only pattern aggregates (subject + predicate + value)
|
||||||
|
|
||||||
|
### LLM Extraction
|
||||||
|
|
||||||
|
Use LLMs (Gemini) for semantic claim detection.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[llm]
|
||||||
|
enabled = false # OPT-IN
|
||||||
|
provider = "gemini"
|
||||||
|
model = "gemini-3-flash-preview"
|
||||||
|
api_key_env = "GEMINI_API_KEY"
|
||||||
|
```
|
||||||
|
|
||||||
|
Requires API key in environment.
|
||||||
|
|
||||||
|
### Learning & Autonomous Promotion
|
||||||
|
|
||||||
|
**CRITICAL:** Both require explicit opt-in.
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[learning]
|
||||||
|
enabled = false # Pattern learning from scans
|
||||||
|
|
||||||
|
[autonomous]
|
||||||
|
enabled = false # Auto-promotion to extractors (kill switch)
|
||||||
|
```
|
||||||
|
|
||||||
|
See [vision-gaps.md](vision-gaps.md) for implementation status.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
Aphoria respects these environment variables:
|
||||||
|
|
||||||
|
| Variable | Purpose | Default |
|
||||||
|
|----------|---------|---------|
|
||||||
|
| `APHORIA_API_KEY` | Hosted mode API key | None (required if hosted.enabled) |
|
||||||
|
| `GEMINI_API_KEY` | Gemini API key | None (required if llm.enabled) |
|
||||||
|
| `STEMEDB_DB_DIR` | Override `data_dir` | `.aphoria/db` |
|
||||||
|
| `APHORIA_CONFIG` | Config file path | `.aphoria/config.toml` |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Migration Guide
|
||||||
|
|
||||||
|
### From Old Home-Based Database
|
||||||
|
|
||||||
|
**Before (legacy):**
|
||||||
|
```toml
|
||||||
|
# Default in old versions: ~/.aphoria/db
|
||||||
|
```
|
||||||
|
|
||||||
|
**After (new default):**
|
||||||
|
```toml
|
||||||
|
# Default now: ./.aphoria/db (per-project)
|
||||||
|
```
|
||||||
|
|
||||||
|
**To keep legacy behavior:**
|
||||||
|
```toml
|
||||||
|
[episteme]
|
||||||
|
data_dir = "~/.aphoria/db"
|
||||||
|
```
|
||||||
|
|
||||||
|
No migration needed - just set `data_dir` to old path.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## See Also
|
||||||
|
|
||||||
|
- [CLI Reference](cli-reference.md) - All commands and flags
|
||||||
|
- [Scale-Adaptive Thresholds](scale-adaptive-thresholds.md) - Threshold configuration
|
||||||
|
- [Comparison Modes](comparison-modes.md) - Claim comparison operators
|
||||||
|
- [Vision Gaps](vision-gaps.md) - Implementation status
|
||||||
698
applications/aphoria/docs/corpus-architecture.md
Normal file
698
applications/aphoria/docs/corpus-architecture.md
Normal file
@ -0,0 +1,698 @@
|
|||||||
|
# Corpus Database Architecture
|
||||||
|
|
||||||
|
**Audience:** Engineers integrating Aphoria with StemeDB API, ops teams deploying both systems.
|
||||||
|
|
||||||
|
**What you'll learn:**
|
||||||
|
- How Aphoria's corpus database integrates with StemeDB API
|
||||||
|
- URI scheme inference for authoritative sources
|
||||||
|
- Where CLI-created corpus items live
|
||||||
|
- Git hooks for automatic binary rebuilds
|
||||||
|
- Production deployment patterns
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Reference
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Aphoria CLI writes to:
|
||||||
|
~/.aphoria/corpus-db/
|
||||||
|
|
||||||
|
# StemeDB API reads from:
|
||||||
|
data/db/ # Default, or configure STEMEDB_CORPUS_DB_DIR
|
||||||
|
|
||||||
|
# Make API see Aphoria corpus:
|
||||||
|
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
|
||||||
|
stemedb-api
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Database Separation
|
||||||
|
|
||||||
|
### The Problem
|
||||||
|
|
||||||
|
Aphoria and StemeDB API use separate databases:
|
||||||
|
|
||||||
|
```
|
||||||
|
Aphoria CLI:
|
||||||
|
└─ corpus create/build → ~/.aphoria/corpus-db/
|
||||||
|
|
||||||
|
StemeDB API:
|
||||||
|
└─ GET /v1/aphoria/corpus → data/db/
|
||||||
|
|
||||||
|
Result: Items created via CLI aren't visible in API/Dashboard
|
||||||
|
```
|
||||||
|
|
||||||
|
### The Solution
|
||||||
|
|
||||||
|
Three integration patterns:
|
||||||
|
|
||||||
|
#### Pattern 1: Shared Database (Recommended for Development)
|
||||||
|
|
||||||
|
Point API to Aphoria's corpus database:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# .env
|
||||||
|
STEMEDB_CORPUS_DB_DIR=/home/user/.aphoria/corpus-db
|
||||||
|
|
||||||
|
# Start API
|
||||||
|
cargo run --release -p stemedb-api
|
||||||
|
```
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- Zero synchronization needed
|
||||||
|
- Single source of truth
|
||||||
|
- Changes immediately visible
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- API has read-only access (can't write to corpus)
|
||||||
|
- Not suitable if API needs to write corpus items
|
||||||
|
|
||||||
|
#### Pattern 2: Unified Database (Recommended for Production)
|
||||||
|
|
||||||
|
Use shared directory for both:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Create shared directory
|
||||||
|
sudo mkdir -p /var/lib/stemedb/corpus
|
||||||
|
sudo chown aphoria:stemedb /var/lib/stemedb/corpus
|
||||||
|
sudo chmod 775 /var/lib/stemedb/corpus
|
||||||
|
```
|
||||||
|
|
||||||
|
```toml
|
||||||
|
# .aphoria/config.toml
|
||||||
|
[episteme]
|
||||||
|
corpus_data_dir = "/var/lib/stemedb/corpus"
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# StemeDB API
|
||||||
|
export STEMEDB_CORPUS_DB_DIR="/var/lib/stemedb/corpus"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- Single database, no sync
|
||||||
|
- Both systems have write access
|
||||||
|
- Production-ready pattern
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- Requires deployment coordination
|
||||||
|
- Permissions management needed
|
||||||
|
|
||||||
|
#### Pattern 3: Sync Mechanism (Future)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Planned (not yet implemented)
|
||||||
|
aphoria corpus sync --to-api --api-db-dir data/db
|
||||||
|
```
|
||||||
|
|
||||||
|
**Use case:** When databases must remain separate.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## URI Scheme Inference
|
||||||
|
|
||||||
|
### The Problem
|
||||||
|
|
||||||
|
Corpus items need URI-schemed subjects for API prefix scanning:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Without URI scheme (won't work):
|
||||||
|
subject: "tls/certificate_verification"
|
||||||
|
|
||||||
|
# API queries:
|
||||||
|
curl '/v1/aphoria/corpus?sources[]=rfc'
|
||||||
|
# Scans for "subject:rfc://" → doesn't match plain subjects
|
||||||
|
```
|
||||||
|
|
||||||
|
### The Solution
|
||||||
|
|
||||||
|
Automatic URI inference based on authority and tier:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// In aphoria corpus create
|
||||||
|
Authority: "RFC 5246 Section 7.4.2"
|
||||||
|
Tier: 0
|
||||||
|
|
||||||
|
// Auto-inferred:
|
||||||
|
subject_uri: "rfc://tls/certificate_verification"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Inference Rules
|
||||||
|
|
||||||
|
| Condition | Scheme | Example |
|
||||||
|
|-----------|--------|---------|
|
||||||
|
| Already has `://` | Preserved | `rfc://test` → `rfc://test` |
|
||||||
|
| Authority contains "rfc" (case-insensitive) | `rfc://` | "RFC 5280" → `rfc://...` |
|
||||||
|
| Authority contains "owasp" | `owasp://` | "OWASP Top 10" → `owasp://...` |
|
||||||
|
| Authority contains "cwe" | `cwe://` | "CWE-120" → `cwe://...` |
|
||||||
|
| Tier 2 | `vendor://` | GitHub docs → `vendor://...` |
|
||||||
|
| Tier 3 | `community://` | Team wiki → `community://...` |
|
||||||
|
| Tier 0/1 unrecognized | `corpus://` | Unknown → `corpus://...` |
|
||||||
|
|
||||||
|
**Priority:** Authority matching > Tier-based > Fallback
|
||||||
|
|
||||||
|
### Examples
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# RFC claim (tier 0)
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "tls/validation" \
|
||||||
|
--authority "RFC 5280 Section 6.1" \
|
||||||
|
--tier 0
|
||||||
|
# Stored as: subject:rfc://tls/validation
|
||||||
|
|
||||||
|
# OWASP claim (tier 1)
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "password/storage" \
|
||||||
|
--authority "OWASP Password Storage Cheat Sheet" \
|
||||||
|
--tier 1
|
||||||
|
# Stored as: subject:owasp://password/storage
|
||||||
|
|
||||||
|
# Vendor docs (tier 2)
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "postgresql/connection_pool" \
|
||||||
|
--authority "PostgreSQL Documentation" \
|
||||||
|
--tier 2
|
||||||
|
# Stored as: subject:vendor://postgresql/connection_pool
|
||||||
|
|
||||||
|
# Community (tier 3)
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "api/rest/pagination" \
|
||||||
|
--authority "Team wiki: API standards" \
|
||||||
|
--tier 3
|
||||||
|
# Stored as: subject:community://api/rest/pagination
|
||||||
|
|
||||||
|
# Already schemed (preserved)
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "custom://myapp/feature" \
|
||||||
|
--authority "Internal spec" \
|
||||||
|
--tier 2
|
||||||
|
# Stored as: subject:custom://myapp/feature
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## CLI-Created Corpus Source
|
||||||
|
|
||||||
|
### The Problem
|
||||||
|
|
||||||
|
Items created with `aphoria corpus create` weren't visible in:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
aphoria corpus list
|
||||||
|
# Showed: RFC, OWASP, VendorDocs
|
||||||
|
# Missing: CLI-created items
|
||||||
|
|
||||||
|
aphoria corpus build
|
||||||
|
# Total assertions: 86
|
||||||
|
# Missing: CLI-created items
|
||||||
|
```
|
||||||
|
|
||||||
|
### The Solution
|
||||||
|
|
||||||
|
CLI-created items are now a first-class corpus source:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// Tagged at creation time
|
||||||
|
metadata: {
|
||||||
|
"source": "cli_create",
|
||||||
|
"description": "...",
|
||||||
|
"authority_source": "...",
|
||||||
|
"category": "..."
|
||||||
|
}
|
||||||
|
|
||||||
|
// Discovered by CliCreatedBuilder
|
||||||
|
impl AsyncCorpusBuilder for CliCreatedBuilder {
|
||||||
|
async fn build(...) -> Vec<Assertion> {
|
||||||
|
// Scan corpus DB
|
||||||
|
// Filter by metadata: "source": "cli_create"
|
||||||
|
// Return assertions
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Now They Appear
|
||||||
|
|
||||||
|
```bash
|
||||||
|
aphoria corpus list
|
||||||
|
# Available corpus sources:
|
||||||
|
# rfc:// (Tier 0) - RFC
|
||||||
|
# owasp:// (Tier 1) - OWASP
|
||||||
|
# vendor:// (Tier 2) - VendorDocs
|
||||||
|
# cli:// (Tier 3) - CLI-Created Items ← NEW
|
||||||
|
|
||||||
|
aphoria corpus build
|
||||||
|
# Corpus build complete:
|
||||||
|
# Total assertions: 157
|
||||||
|
# CLI-Created Items: 3 assertions ← NEW
|
||||||
|
```
|
||||||
|
|
||||||
|
### Querying CLI-Created Items
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Via API
|
||||||
|
curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=cli'
|
||||||
|
|
||||||
|
# Via Dashboard
|
||||||
|
# Navigate to: http://localhost:3000/corpus
|
||||||
|
# Filter by "CLI-Created" source
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Git Hooks for Binary Rebuilds
|
||||||
|
|
||||||
|
### The Problem
|
||||||
|
|
||||||
|
Developer workflow:
|
||||||
|
1. `git pull` (gets CLI definition changes)
|
||||||
|
2. Run `aphoria corpus create`
|
||||||
|
3. Error: "unrecognized subcommand 'create'"
|
||||||
|
4. Confusion, time wasted
|
||||||
|
5. Realize binary is stale: `cargo build --release -p aphoria`
|
||||||
|
|
||||||
|
### The Solution
|
||||||
|
|
||||||
|
Automatic rebuild hooks:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# .git/hooks/post-merge
|
||||||
|
if git diff-tree ... | grep -q "^applications/aphoria/src/cli"; then
|
||||||
|
echo "🔧 CLI changed, rebuilding aphoria..."
|
||||||
|
cargo build --release -p aphoria
|
||||||
|
fi
|
||||||
|
```
|
||||||
|
|
||||||
|
### Installed Hooks
|
||||||
|
|
||||||
|
**post-merge** - After `git pull` or `git merge`
|
||||||
|
**post-checkout** - After `git checkout <branch>`
|
||||||
|
**post-rewrite** - After `git rebase`
|
||||||
|
|
||||||
|
### What Triggers Rebuild
|
||||||
|
|
||||||
|
- **Aphoria CLI**: `applications/aphoria/src/cli/`
|
||||||
|
- **API handlers**: `crates/stemedb-api/src/`
|
||||||
|
- **Simulator**: `crates/stemedb-sim/src/`
|
||||||
|
- **Core libraries**: `crates/stemedb-*`
|
||||||
|
- **Dependencies**: `Cargo.toml` changes
|
||||||
|
|
||||||
|
### Installation
|
||||||
|
|
||||||
|
Hooks are in `.git/hooks/` (not tracked by git). To install on new clone:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd /home/jml/Workspace/stemedb
|
||||||
|
ls -la .git/hooks/post-*
|
||||||
|
|
||||||
|
# If missing, check GIT-HOOKS-IMPLEMENTATION.md for setup
|
||||||
|
```
|
||||||
|
|
||||||
|
### Bypass Hook (Emergency)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Temporarily disable all hooks
|
||||||
|
git pull --no-verify
|
||||||
|
|
||||||
|
# Or set env var
|
||||||
|
GIT_HOOKS_DISABLE=1 git pull
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Deployment Configurations
|
||||||
|
|
||||||
|
### Local Development
|
||||||
|
|
||||||
|
**Aphoria:**
|
||||||
|
```bash
|
||||||
|
# Default: uses ~/.aphoria/corpus-db/
|
||||||
|
aphoria corpus create ...
|
||||||
|
aphoria corpus build
|
||||||
|
```
|
||||||
|
|
||||||
|
**StemeDB API:**
|
||||||
|
```bash
|
||||||
|
# Point to Aphoria's corpus
|
||||||
|
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
|
||||||
|
cargo run --release -p stemedb-api
|
||||||
|
```
|
||||||
|
|
||||||
|
### Docker Compose
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
version: '3.8'
|
||||||
|
|
||||||
|
volumes:
|
||||||
|
corpus-db:
|
||||||
|
|
||||||
|
services:
|
||||||
|
stemedb-api:
|
||||||
|
image: stemedb-api:latest
|
||||||
|
environment:
|
||||||
|
- STEMEDB_CORPUS_DB_DIR=/var/lib/stemedb/corpus
|
||||||
|
volumes:
|
||||||
|
- corpus-db:/var/lib/stemedb/corpus
|
||||||
|
ports:
|
||||||
|
- "18180:18180"
|
||||||
|
|
||||||
|
aphoria-builder:
|
||||||
|
image: aphoria:latest
|
||||||
|
volumes:
|
||||||
|
- corpus-db:/var/lib/stemedb/corpus
|
||||||
|
- ./aphoria-config.toml:/etc/aphoria/config.toml
|
||||||
|
command: corpus build
|
||||||
|
```
|
||||||
|
|
||||||
|
### Kubernetes
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: PersistentVolumeClaim
|
||||||
|
metadata:
|
||||||
|
name: corpus-db
|
||||||
|
spec:
|
||||||
|
accessModes: [ReadWriteMany]
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
storage: 10Gi
|
||||||
|
---
|
||||||
|
apiVersion: apps/v1
|
||||||
|
kind: Deployment
|
||||||
|
metadata:
|
||||||
|
name: stemedb-api
|
||||||
|
spec:
|
||||||
|
template:
|
||||||
|
spec:
|
||||||
|
containers:
|
||||||
|
- name: api
|
||||||
|
image: stemedb-api:latest
|
||||||
|
env:
|
||||||
|
- name: STEMEDB_CORPUS_DB_DIR
|
||||||
|
value: /var/lib/stemedb/corpus
|
||||||
|
volumeMounts:
|
||||||
|
- name: corpus-db
|
||||||
|
mountPath: /var/lib/stemedb/corpus
|
||||||
|
volumes:
|
||||||
|
- name: corpus-db
|
||||||
|
persistentVolumeClaim:
|
||||||
|
claimName: corpus-db
|
||||||
|
```
|
||||||
|
|
||||||
|
### Production (Bare Metal)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Create shared corpus directory
|
||||||
|
sudo mkdir -p /var/lib/stemedb/corpus
|
||||||
|
sudo chown aphoria:stemedb /var/lib/stemedb/corpus
|
||||||
|
sudo chmod 775 /var/lib/stemedb/corpus
|
||||||
|
|
||||||
|
# 2. Configure Aphoria
|
||||||
|
cat > /etc/aphoria/config.toml <<EOF
|
||||||
|
[episteme]
|
||||||
|
corpus_data_dir = "/var/lib/stemedb/corpus"
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# 3. Configure StemeDB API
|
||||||
|
cat > /etc/systemd/system/stemedb-api.service <<EOF
|
||||||
|
[Service]
|
||||||
|
Environment="STEMEDB_CORPUS_DB_DIR=/var/lib/stemedb/corpus"
|
||||||
|
ExecStart=/usr/local/bin/stemedb-api
|
||||||
|
User=stemedb
|
||||||
|
Group=stemedb
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# 4. Start services
|
||||||
|
systemctl start stemedb-api
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Integration Patterns
|
||||||
|
|
||||||
|
### Pattern A: API-First (Read-Only Corpus)
|
||||||
|
|
||||||
|
**Use case:** Dashboard-driven architecture, corpus rarely changes.
|
||||||
|
|
||||||
|
```
|
||||||
|
Workflow:
|
||||||
|
1. Ops team creates corpus items via CLI
|
||||||
|
2. API serves them to dashboard
|
||||||
|
3. Developers view in dashboard (read-only)
|
||||||
|
|
||||||
|
Database:
|
||||||
|
- Aphoria: ~/.aphoria/corpus-db/ (write)
|
||||||
|
- API: points to Aphoria DB (read)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Config:**
|
||||||
|
```bash
|
||||||
|
# API
|
||||||
|
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pattern B: CLI-First (Frequent Corpus Updates)
|
||||||
|
|
||||||
|
**Use case:** Active corpus curation, frequent CLI usage.
|
||||||
|
|
||||||
|
```
|
||||||
|
Workflow:
|
||||||
|
1. Developers create corpus items via CLI
|
||||||
|
2. CLI builds corpus
|
||||||
|
3. API/dashboard reflect latest corpus
|
||||||
|
|
||||||
|
Database:
|
||||||
|
- Aphoria: /var/lib/stemedb/corpus (write)
|
||||||
|
- API: /var/lib/stemedb/corpus (read)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Config:**
|
||||||
|
```toml
|
||||||
|
# .aphoria/config.toml
|
||||||
|
[episteme]
|
||||||
|
corpus_data_dir = "/var/lib/stemedb/corpus"
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# API
|
||||||
|
export STEMEDB_CORPUS_DB_DIR="/var/lib/stemedb/corpus"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pattern C: Hybrid (Separate Stores + Sync)
|
||||||
|
|
||||||
|
**Use case:** Different corpus items in different stores.
|
||||||
|
|
||||||
|
```
|
||||||
|
Workflow:
|
||||||
|
1. Aphoria: authoritative corpus (RFC, OWASP, CLI-created)
|
||||||
|
2. API: ephemeral assertions from scans
|
||||||
|
3. Periodic sync or query union
|
||||||
|
|
||||||
|
Database:
|
||||||
|
- Aphoria: ~/.aphoria/corpus-db/
|
||||||
|
- API: data/db/
|
||||||
|
- Sync: manual or scheduled
|
||||||
|
```
|
||||||
|
|
||||||
|
**Sync (when implemented):**
|
||||||
|
```bash
|
||||||
|
# Planned
|
||||||
|
aphoria corpus sync --to-api --api-db-dir data/db
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### "Items created but not visible in API"
|
||||||
|
|
||||||
|
**Symptom:**
|
||||||
|
```bash
|
||||||
|
aphoria corpus create --subject "test" ...
|
||||||
|
# Created corpus item: corpus://test/enabled
|
||||||
|
|
||||||
|
curl 'http://localhost:18180/v1/aphoria/corpus'
|
||||||
|
# {"items":[], "total_matching": 0}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
```bash
|
||||||
|
# Check API config
|
||||||
|
env | grep STEMEDB_CORPUS_DB_DIR
|
||||||
|
# If empty, API is using data/db/
|
||||||
|
|
||||||
|
# Check Aphoria corpus DB
|
||||||
|
ls -la ~/.aphoria/corpus-db/
|
||||||
|
# Should see fjall/, redb/, wal/
|
||||||
|
```
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```bash
|
||||||
|
export STEMEDB_CORPUS_DB_DIR="$HOME/.aphoria/corpus-db"
|
||||||
|
# Restart API
|
||||||
|
pkill -f stemedb-api
|
||||||
|
stemedb-api &
|
||||||
|
```
|
||||||
|
|
||||||
|
### "Command not found after git pull"
|
||||||
|
|
||||||
|
**Symptom:**
|
||||||
|
```bash
|
||||||
|
git pull
|
||||||
|
aphoria corpus create ...
|
||||||
|
# error: unrecognized subcommand 'create'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
```bash
|
||||||
|
# Check binary date
|
||||||
|
ls -lh target/release/aphoria
|
||||||
|
# -rwxr-xr-x ... Jan 15 10:00 aphoria
|
||||||
|
|
||||||
|
# Check CLI code date
|
||||||
|
ls -lh applications/aphoria/src/cli/mod.rs
|
||||||
|
# -rw-r--r-- ... Feb 09 14:30 mod.rs ← Newer!
|
||||||
|
```
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```bash
|
||||||
|
# Rebuild
|
||||||
|
cargo build --release -p aphoria
|
||||||
|
|
||||||
|
# Or check if hooks are installed
|
||||||
|
ls -la .git/hooks/post-merge
|
||||||
|
# Should be executable and contain rebuild logic
|
||||||
|
```
|
||||||
|
|
||||||
|
### "Corpus items have wrong URI scheme"
|
||||||
|
|
||||||
|
**Symptom:**
|
||||||
|
```bash
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "tls/validation" \
|
||||||
|
--authority "RFC 5280" \
|
||||||
|
--tier 0
|
||||||
|
|
||||||
|
# API query fails
|
||||||
|
curl '/v1/aphoria/corpus?sources[]=rfc'
|
||||||
|
# {"items":[]}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
```bash
|
||||||
|
# Check stored subject (via debug scan)
|
||||||
|
aphoria scan --show-observations | grep tls
|
||||||
|
# If shows: subject:tls/validation (no rfc://)
|
||||||
|
# Then URI inference didn't work
|
||||||
|
```
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
Rebuild aphoria binary (URI inference added in recent version):
|
||||||
|
```bash
|
||||||
|
cargo build --release -p aphoria
|
||||||
|
```
|
||||||
|
|
||||||
|
### "Dashboard shows duplicate corpus items"
|
||||||
|
|
||||||
|
**Symptom:**
|
||||||
|
Dashboard displays same item multiple times.
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
```bash
|
||||||
|
# Check if corpus built multiple times
|
||||||
|
aphoria corpus build --verbose
|
||||||
|
# Look for same assertion appearing under multiple builders
|
||||||
|
```
|
||||||
|
|
||||||
|
**Cause:**
|
||||||
|
CLI-created items might also match RFC/OWASP builders if they have matching metadata.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
This is expected behavior if:
|
||||||
|
1. Item was created via CLI with RFC authority
|
||||||
|
2. RFC builder also fetches it from RFC source
|
||||||
|
3. Both versions appear in corpus
|
||||||
|
|
||||||
|
To deduplicate, ensure CLI-created items use unique subjects or authorities that don't overlap with fetched sources.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture Diagram
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────┐
|
||||||
|
│ Aphoria CLI │
|
||||||
|
├─────────────────────────────────────────────────────────┤
|
||||||
|
│ │
|
||||||
|
│ aphoria corpus create │
|
||||||
|
│ │ │
|
||||||
|
│ ├─► infer_subject_uri() │
|
||||||
|
│ │ (RFC/OWASP/CWE → scheme) │
|
||||||
|
│ │ │
|
||||||
|
│ ├─► create_corpus_item() │
|
||||||
|
│ │ metadata: "source": "cli_create" │
|
||||||
|
│ │ │
|
||||||
|
│ └─► Store: ~/.aphoria/corpus-db/ │
|
||||||
|
│ Key: "subject:rfc://tls/validation" │
|
||||||
|
│ │
|
||||||
|
│ aphoria corpus build │
|
||||||
|
│ │ │
|
||||||
|
│ ├─► HardcodedBuilder │
|
||||||
|
│ ├─► RfcBuilder (network) │
|
||||||
|
│ ├─► OwaspBuilder (network) │
|
||||||
|
│ ├─► VendorDocsBuilder │
|
||||||
|
│ └─► CliCreatedBuilder ← NEW │
|
||||||
|
│ Filter: "source": "cli_create" │
|
||||||
|
│ │
|
||||||
|
└─────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
│ Shared Database
|
||||||
|
↓
|
||||||
|
┌─────────────────────────────────────────────────────────┐
|
||||||
|
│ ~/.aphoria/corpus-db/ │
|
||||||
|
│ │
|
||||||
|
│ subject:rfc://tls/validation → Assertion │
|
||||||
|
│ subject:owasp://password/storage → Assertion │
|
||||||
|
│ subject:community://api/rest → Assertion │
|
||||||
|
│ │
|
||||||
|
└─────────────────────────────────────────────────────────┘
|
||||||
|
↑
|
||||||
|
│ STEMEDB_CORPUS_DB_DIR
|
||||||
|
│
|
||||||
|
┌─────────────────────────────────────────────────────────┐
|
||||||
|
│ StemeDB API │
|
||||||
|
├─────────────────────────────────────────────────────────┤
|
||||||
|
│ │
|
||||||
|
│ GET /v1/aphoria/corpus?sources[]=rfc │
|
||||||
|
│ │ │
|
||||||
|
│ └─► corpus_store.scan_prefix("subject:rfc://") │
|
||||||
|
│ ↓ │
|
||||||
|
│ Returns: RFC assertions │
|
||||||
|
│ │
|
||||||
|
└─────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
│ HTTP
|
||||||
|
↓
|
||||||
|
┌─────────────────────────────────────────────────────────┐
|
||||||
|
│ Aphoria Dashboard │
|
||||||
|
│ │
|
||||||
|
│ Filter: [RFC] [OWASP] [CLI-Created] │
|
||||||
|
│ ┌─────────────────────────────────┐ │
|
||||||
|
│ │ rfc://tls/validation │ │
|
||||||
|
│ │ Tier 0 | Security │ │
|
||||||
|
│ │ TLS cert verification MUST... │ │
|
||||||
|
│ └─────────────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
└─────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## See Also
|
||||||
|
|
||||||
|
- [CLI Reference](cli-reference.md) - Complete command reference
|
||||||
|
- [Configuration Reference](configuration.md) - Configuration file reference
|
||||||
|
- [README](../README.md) - Quickstart and key concepts
|
||||||
|
- [Comparison Modes](comparison-modes.md) - Deep dive on verification logic
|
||||||
|
- [Scale-Adaptive Thresholds](scale-adaptive-thresholds.md) - Community corpus thresholds
|
||||||
@ -27,6 +27,7 @@ Quick-start guides and workflows for Aphoria users.
|
|||||||
|-------|-------------|
|
|-------|-------------|
|
||||||
| [Golden Path Loop](./golden-path-loop.md) | Continuous policy improvement |
|
| [Golden Path Loop](./golden-path-loop.md) | Continuous policy improvement |
|
||||||
| [AAA Game Development](./aaa-game-development.md) | Unreal Engine patterns |
|
| [AAA Game Development](./aaa-game-development.md) | Unreal Engine patterns |
|
||||||
|
| [LLM Wiki Extraction](./llm-wiki-extraction.md) | Extract claims from technical docs using LLM skill |
|
||||||
|
|
||||||
## Reference Documentation
|
## Reference Documentation
|
||||||
|
|
||||||
|
|||||||
483
applications/aphoria/docs/guides/llm-wiki-extraction.md
Normal file
483
applications/aphoria/docs/guides/llm-wiki-extraction.md
Normal file
@ -0,0 +1,483 @@
|
|||||||
|
# LLM-Based Wiki Corpus Extraction
|
||||||
|
|
||||||
|
Extract factual claims from technical documentation using an LLM skill that intelligently chunks, analyzes, and persists to the corpus database.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Extract claims from a wiki article
|
||||||
|
cd ~/Workspace/stemedb
|
||||||
|
claude -p ~/path/to/wiki/article.md --skill extract-wiki-corpus
|
||||||
|
|
||||||
|
# Example with actual file
|
||||||
|
claude -p ~/Workspace/orchard9/wiki/intakes/REQUEST_FOR_RESEARCH_ANSWERS.md \
|
||||||
|
--skill extract-wiki-corpus
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected output:
|
||||||
|
```
|
||||||
|
Reading article: REQUEST_FOR_RESEARCH_ANSWERS.md (12,450 tokens)
|
||||||
|
Chunked into 3 segments (by ## headings)
|
||||||
|
|
||||||
|
Chunk 1/3: "Critical Compatibility Solutions"
|
||||||
|
Extracted 8 claims
|
||||||
|
✓ ml/basicsr/torchvision/incompatible_with = ">=0.15"
|
||||||
|
✓ ml/gpen/gfpgan/outperforms = "eye_enhancement"
|
||||||
|
...
|
||||||
|
|
||||||
|
Chunk 2/3: "CUDA 12.9 Compatibility"
|
||||||
|
Extracted 5 claims
|
||||||
|
...
|
||||||
|
|
||||||
|
Summary: 23 claims extracted, 23 stored successfully
|
||||||
|
```
|
||||||
|
|
||||||
|
## How It Works
|
||||||
|
|
||||||
|
### 1. Intelligent Chunking
|
||||||
|
|
||||||
|
The skill chunks large articles to fit LLM context limits:
|
||||||
|
|
||||||
|
**Strategy:**
|
||||||
|
- Target: ~4K tokens per chunk
|
||||||
|
- Break at `##` headings when possible
|
||||||
|
- Preserve context: Include document title + section path in each chunk
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
```markdown
|
||||||
|
# Python Dependency Stack
|
||||||
|
## Critical Solutions
|
||||||
|
### BasicSR Fix
|
||||||
|
[content...]
|
||||||
|
```
|
||||||
|
|
||||||
|
Becomes 3 chunks:
|
||||||
|
1. `"Python Dependency Stack / Critical Solutions / BasicSR Fix"` + content
|
||||||
|
2. `"Python Dependency Stack / Critical Solutions / GPEN vs GFPGAN"` + content
|
||||||
|
3. `"Python Dependency Stack / CUDA Compatibility"` + content
|
||||||
|
|
||||||
|
### 2. LLM Claim Extraction
|
||||||
|
|
||||||
|
For each chunk, Claude extracts factual assertions as structured JSON:
|
||||||
|
|
||||||
|
**Extraction Criteria:**
|
||||||
|
- Factual (verifiable from text)
|
||||||
|
- Useful for developers
|
||||||
|
- Has clear subject/predicate/value
|
||||||
|
|
||||||
|
**Example extraction:**
|
||||||
|
|
||||||
|
Input text:
|
||||||
|
```markdown
|
||||||
|
### BasicSR/Torchvision Fix
|
||||||
|
The core issue is that basicsr 1.4.2 imports from
|
||||||
|
`torchvision.transforms.functional_tensor` which was removed in
|
||||||
|
torchvision 0.15+.
|
||||||
|
|
||||||
|
**Primary Solution:**
|
||||||
|
git+https://github.com/XPixelGroup/BasicSR@8d56e3a
|
||||||
|
```
|
||||||
|
|
||||||
|
Extracted claim:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"subject": "ml/dependencies/basicsr/torchvision",
|
||||||
|
"predicate": "incompatible_with",
|
||||||
|
"value": ">=0.15",
|
||||||
|
"explanation": "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in torchvision 0.15+",
|
||||||
|
"authority": "XPixelGroup/BasicSR@8d56e3a",
|
||||||
|
"category": "compatibility"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Authority Inference
|
||||||
|
|
||||||
|
The LLM infers authority sources from context:
|
||||||
|
|
||||||
|
| Pattern | Authority Format | Example |
|
||||||
|
|---------|-----------------|---------|
|
||||||
|
| GitHub URL | `repo@commit` | `XPixelGroup/BasicSR@8d56e3a` |
|
||||||
|
| Research paper | `Author et al. (Year)` | `Smith et al. (2023)` |
|
||||||
|
| Official docs | `Product Documentation` | `PyTorch Documentation` |
|
||||||
|
| Empirical | `Community consensus` | `Community best practice` |
|
||||||
|
|
||||||
|
### 4. Tier Assignment
|
||||||
|
|
||||||
|
The skill assigns tiers based on authority source:
|
||||||
|
|
||||||
|
| Tier | Authority Type | Examples |
|
||||||
|
|------|---------------|----------|
|
||||||
|
| 0 | Regulatory specs | RFC, W3C standards |
|
||||||
|
| 1 | Authoritative sources | Official docs, research papers |
|
||||||
|
| 2 | Observational | GitHub repos, community consensus |
|
||||||
|
| 3 | Empirical | Unverified claims |
|
||||||
|
|
||||||
|
**Guidance to LLM:**
|
||||||
|
- Official standards (RFC, W3C) → Tier 0
|
||||||
|
- Official documentation, published research → Tier 1
|
||||||
|
- GitHub repos, maintainer statements → Tier 2
|
||||||
|
- Community reports, unverified → Tier 3
|
||||||
|
|
||||||
|
### 5. Persistence via CLI
|
||||||
|
|
||||||
|
Each extracted claim is stored using:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject "ml/dependencies/basicsr/torchvision" \
|
||||||
|
--predicate "incompatible_with" \
|
||||||
|
--value ">=0.15" \
|
||||||
|
--explanation "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in 0.15+" \
|
||||||
|
--authority "XPixelGroup/BasicSR@8d56e3a" \
|
||||||
|
--category "compatibility" \
|
||||||
|
--tier 2
|
||||||
|
```
|
||||||
|
|
||||||
|
## CLI Reference: `aphoria corpus create`
|
||||||
|
|
||||||
|
Create a corpus assertion from structured claim data.
|
||||||
|
|
||||||
|
**Usage:**
|
||||||
|
```bash
|
||||||
|
aphoria corpus create \
|
||||||
|
--subject <hierarchical/path> \
|
||||||
|
--predicate <relationship> \
|
||||||
|
--value <value> \
|
||||||
|
--explanation <full-context> \
|
||||||
|
--authority <source> \
|
||||||
|
--category <category> \
|
||||||
|
--tier <0-3>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Arguments:**
|
||||||
|
|
||||||
|
| Flag | Required | Description | Example |
|
||||||
|
|------|----------|-------------|---------|
|
||||||
|
| `--subject` | Yes | Hierarchical path to concept | `ml/basicsr/torchvision` |
|
||||||
|
| `--predicate` | Yes | Relationship type | `incompatible_with` |
|
||||||
|
| `--value` | Yes | Value or constraint | `">=0.15"` |
|
||||||
|
| `--explanation` | Yes | Full context sentence | `"basicsr 1.4.2 imports from..."` |
|
||||||
|
| `--authority` | Yes | Source citation | `XPixelGroup/BasicSR@8d56e3a` |
|
||||||
|
| `--category` | Yes | Category tag | `compatibility` |
|
||||||
|
| `--tier` | Yes | Authority tier (0-3) | `2` |
|
||||||
|
|
||||||
|
**Categories:**
|
||||||
|
- `compatibility` - Dependency constraints, version requirements
|
||||||
|
- `performance` - Performance characteristics, benchmarks
|
||||||
|
- `security` - Security properties, vulnerabilities
|
||||||
|
- `architecture` - Design patterns, structure
|
||||||
|
- `behavior` - Functional behavior, side effects
|
||||||
|
|
||||||
|
**Behavior:**
|
||||||
|
|
||||||
|
**Deduplication:** Stores ALL claims, even if subject+predicate exists. This is append-only; sourced differing claims are the whole point of Episteme.
|
||||||
|
|
||||||
|
**Error Handling:** Bundles all validation errors and presents them together:
|
||||||
|
|
||||||
|
```
|
||||||
|
Error creating corpus assertion:
|
||||||
|
|
||||||
|
Validation errors:
|
||||||
|
1. --subject: Must be non-empty hierarchical path (got: "")
|
||||||
|
2. --tier: Must be 0-3 (got: 5)
|
||||||
|
3. --category: Must be one of: compatibility, performance, security, architecture, behavior (got: "random")
|
||||||
|
|
||||||
|
Fix all errors and retry.
|
||||||
|
```
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
```bash
|
||||||
|
$ aphoria corpus create \
|
||||||
|
--subject "ml/pytorch/version" \
|
||||||
|
--predicate "requires" \
|
||||||
|
--value ">=2.0" \
|
||||||
|
--explanation "Uses torch.compile which requires PyTorch 2.0+" \
|
||||||
|
--authority "PyTorch 2.0 Release Notes" \
|
||||||
|
--category "compatibility" \
|
||||||
|
--tier 1
|
||||||
|
|
||||||
|
✓ Created corpus assertion: ml/pytorch/version
|
||||||
|
Stored in: ~/.aphoria/corpus-db
|
||||||
|
```
|
||||||
|
|
||||||
|
## Skill Output Format
|
||||||
|
|
||||||
|
The `extract-wiki-corpus` skill produces structured output:
|
||||||
|
|
||||||
|
```
|
||||||
|
Reading article: REQUEST_FOR_RESEARCH_ANSWERS.md (12,450 tokens)
|
||||||
|
Chunked into 3 segments (by ## headings)
|
||||||
|
|
||||||
|
Chunk 1/3: "Critical Compatibility Solutions"
|
||||||
|
Extracted 8 claims
|
||||||
|
|
||||||
|
1. ml/dependencies/basicsr/torchvision
|
||||||
|
incompatible_with = ">=0.15"
|
||||||
|
Authority: XPixelGroup/BasicSR@8d56e3a
|
||||||
|
✓ Stored
|
||||||
|
|
||||||
|
2. ml/enhancements/gpen/gfpgan
|
||||||
|
outperforms = "eye_enhancement"
|
||||||
|
Authority: Research comparison (2023)
|
||||||
|
✓ Stored
|
||||||
|
|
||||||
|
[... 6 more claims ...]
|
||||||
|
|
||||||
|
Chunk 2/3: "CUDA 12.9 Compatibility"
|
||||||
|
Extracted 5 claims
|
||||||
|
|
||||||
|
9. ml/face_detection/mediaipe/dlib
|
||||||
|
preferred_over = "CUDA 12 support"
|
||||||
|
Authority: Community consensus
|
||||||
|
✓ Stored
|
||||||
|
|
||||||
|
[... 4 more claims ...]
|
||||||
|
|
||||||
|
Chunk 3/3: "Optimized Requirements"
|
||||||
|
Extracted 10 claims
|
||||||
|
|
||||||
|
[... all claims ...]
|
||||||
|
|
||||||
|
Summary:
|
||||||
|
Total claims: 23
|
||||||
|
Successfully stored: 23
|
||||||
|
Failed: 0
|
||||||
|
|
||||||
|
Corpus database: ~/.aphoria/corpus-db
|
||||||
|
Query: curl 'http://localhost:18180/v1/aphoria/corpus?category=compatibility'
|
||||||
|
```
|
||||||
|
|
||||||
|
**If errors occur:**
|
||||||
|
```
|
||||||
|
Summary:
|
||||||
|
Total claims: 23
|
||||||
|
Successfully stored: 18
|
||||||
|
Failed: 5
|
||||||
|
|
||||||
|
Errors:
|
||||||
|
1. Claim #7 (ml/torch/cuda/version)
|
||||||
|
- --tier: Must be 0-3 (got: 5)
|
||||||
|
- Fix: LLM assigned invalid tier
|
||||||
|
|
||||||
|
2. Claim #12 (ml/xformers/optional)
|
||||||
|
- --subject: Empty subject path
|
||||||
|
- Fix: LLM extraction failed
|
||||||
|
|
||||||
|
[... 3 more errors with details ...]
|
||||||
|
|
||||||
|
Fix these issues and re-run extraction.
|
||||||
|
```
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
After extraction, verify claims appear in the corpus:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Query all compatibility claims
|
||||||
|
curl -s 'http://localhost:18180/v1/aphoria/corpus?category=compatibility' | jq '.total_matching'
|
||||||
|
# Expected: 23 (or however many were extracted)
|
||||||
|
|
||||||
|
# Query specific subject
|
||||||
|
curl -s 'http://localhost:18180/v1/aphoria/corpus' | \
|
||||||
|
jq '.items[] | select(.subject | contains("basicsr"))'
|
||||||
|
|
||||||
|
# Expected output:
|
||||||
|
{
|
||||||
|
"subject": "ml/dependencies/basicsr/torchvision",
|
||||||
|
"predicate": "incompatible_with",
|
||||||
|
"value": ">=0.15",
|
||||||
|
"source": "ml://",
|
||||||
|
"tier": 2,
|
||||||
|
"category": "compatibility",
|
||||||
|
"explanation": "basicsr 1.4.2 imports from torchvision.transforms.functional_tensor which was removed in 0.15+",
|
||||||
|
"authority_source": "XPixelGroup/BasicSR@8d56e3a"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Dashboard View
|
||||||
|
|
||||||
|
Extracted claims appear in the Aphoria dashboard at `/corpus`:
|
||||||
|
|
||||||
|
**Filters:**
|
||||||
|
- By category: compatibility, performance, security, architecture, behavior
|
||||||
|
- By tier: 0 (Regulatory), 1 (Authoritative), 2 (Observational), 3 (Empirical)
|
||||||
|
- By source: ml://, security://, etc.
|
||||||
|
|
||||||
|
**Display:**
|
||||||
|
- Subject path as breadcrumbs: `ml > dependencies > basicsr > torchvision`
|
||||||
|
- Tier badge with color coding
|
||||||
|
- Full explanation text
|
||||||
|
- Authority citation as link (if URL)
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
**Problem:** Skill chunks too aggressively, loses context
|
||||||
|
|
||||||
|
**Solution:** Adjust chunk size in skill configuration (target 4K tokens, can go up to 8K for complex articles)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Problem:** LLM assigns wrong tiers
|
||||||
|
|
||||||
|
**Solution:** Refine tier guidance in skill prompt:
|
||||||
|
- Official standards (RFC, IEEE) → Tier 0
|
||||||
|
- Official docs, peer-reviewed papers → Tier 1
|
||||||
|
- GitHub repos, maintainer statements → Tier 2
|
||||||
|
- Blog posts, community forums → Tier 3
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Problem:** Too many failed claims (validation errors)
|
||||||
|
|
||||||
|
**Solution:** Check common error patterns:
|
||||||
|
```bash
|
||||||
|
# Review failed claims
|
||||||
|
grep "Failed:" /tmp/extraction-output.log
|
||||||
|
|
||||||
|
# Common issues:
|
||||||
|
# 1. Empty subjects - LLM extraction failed
|
||||||
|
# 2. Invalid tiers - LLM assigned tier > 3
|
||||||
|
# 3. Missing required fields - Incomplete extraction
|
||||||
|
```
|
||||||
|
|
||||||
|
Fix by refining LLM extraction prompt.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Problem:** Duplicate claims (same subject+predicate)
|
||||||
|
|
||||||
|
**This is expected behavior.** Episteme stores ALL claims, even duplicates from different sources. This enables:
|
||||||
|
- Sourced differing opinions (PyTorch docs say X, community says Y)
|
||||||
|
- Conflict detection (authority says A, codebase does B)
|
||||||
|
- Historical tracking (claim evolved over time)
|
||||||
|
|
||||||
|
To query all claims for a subject:
|
||||||
|
```bash
|
||||||
|
curl -s 'http://localhost:18180/v1/aphoria/corpus' | \
|
||||||
|
jq '.items[] | select(.subject == "ml/dependencies/basicsr/torchvision")'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Integration with Other Features
|
||||||
|
|
||||||
|
**With Scans:**
|
||||||
|
- Corpus claims act as authority sources
|
||||||
|
- Aphoria compares scanned observations against corpus
|
||||||
|
- Conflicts trigger violations
|
||||||
|
|
||||||
|
**With Claims Management:**
|
||||||
|
- Can supersede corpus claims: `aphoria claims supersede <id>`
|
||||||
|
- Can deprecate outdated corpus: `aphoria claims deprecate <id>`
|
||||||
|
- Corpus claims have same structure as project claims
|
||||||
|
|
||||||
|
**With Dashboard:**
|
||||||
|
- All corpus claims visible at `/corpus`
|
||||||
|
- Filterable by category, tier, source
|
||||||
|
- Click through to see full explanation
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
**DO:**
|
||||||
|
- Extract from authoritative sources (official docs, research)
|
||||||
|
- Verify claims appear in dashboard after extraction
|
||||||
|
- Review tier assignments for accuracy
|
||||||
|
- Include full context in explanations
|
||||||
|
|
||||||
|
**DON'T:**
|
||||||
|
- Extract from opinion pieces or blogs (or use tier 3)
|
||||||
|
- Skip authority citations (always provide source)
|
||||||
|
- Use vague subjects ("thing" → "ml/pytorch/feature/specific")
|
||||||
|
- Ignore validation errors (fix all before considering extraction complete)
|
||||||
|
|
||||||
|
## Examples
|
||||||
|
|
||||||
|
### Example 1: ML Dependencies
|
||||||
|
|
||||||
|
**Input:** `~/wiki/ml-stack.md`
|
||||||
|
```markdown
|
||||||
|
## PyTorch CUDA Compatibility
|
||||||
|
|
||||||
|
PyTorch 2.6.0 with CUDA 12.6 builds are forward compatible with CUDA 12.9.
|
||||||
|
|
||||||
|
Source: PyTorch 2.6 Release Notes
|
||||||
|
```
|
||||||
|
|
||||||
|
**Extraction:**
|
||||||
|
```bash
|
||||||
|
claude -p ~/wiki/ml-stack.md --skill extract-wiki-corpus
|
||||||
|
|
||||||
|
# Output:
|
||||||
|
Extracted 1 claim:
|
||||||
|
✓ ml/pytorch/cuda/compatibility
|
||||||
|
predicate: forward_compatible_with
|
||||||
|
value: "CUDA 12.9"
|
||||||
|
tier: 1 (PyTorch 2.6 Release Notes)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Example 2: Security Best Practices
|
||||||
|
|
||||||
|
**Input:** `~/wiki/security.md`
|
||||||
|
```markdown
|
||||||
|
## Password Hashing
|
||||||
|
|
||||||
|
Research shows Argon2 consistently outperforms bcrypt and scrypt for
|
||||||
|
password hashing in modern environments.
|
||||||
|
|
||||||
|
Source: OWASP Password Storage Cheat Sheet (2023)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Extraction:**
|
||||||
|
```bash
|
||||||
|
claude -p ~/wiki/security.md --skill extract-wiki-corpus
|
||||||
|
|
||||||
|
# Output:
|
||||||
|
Extracted 1 claim:
|
||||||
|
✓ security/password/hashing/algorithm
|
||||||
|
predicate: recommended
|
||||||
|
value: "Argon2"
|
||||||
|
tier: 1 (OWASP Password Storage Cheat Sheet)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Example 3: Large Article
|
||||||
|
|
||||||
|
**Input:** `~/wiki/complete-stack.md` (15,000 tokens)
|
||||||
|
```markdown
|
||||||
|
# Complete Python Stack for SDXL
|
||||||
|
|
||||||
|
## Critical Solutions
|
||||||
|
[4,000 tokens]
|
||||||
|
|
||||||
|
## Enhancement Libraries
|
||||||
|
[5,000 tokens]
|
||||||
|
|
||||||
|
## CUDA Compatibility
|
||||||
|
[6,000 tokens]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Extraction:**
|
||||||
|
```bash
|
||||||
|
claude -p ~/wiki/complete-stack.md --skill extract-wiki-corpus
|
||||||
|
|
||||||
|
# Output:
|
||||||
|
Reading article: complete-stack.md (15,234 tokens)
|
||||||
|
Chunked into 3 segments (by ## headings)
|
||||||
|
|
||||||
|
Chunk 1/3: "Critical Solutions"
|
||||||
|
Extracted 12 claims
|
||||||
|
...
|
||||||
|
|
||||||
|
Chunk 2/3: "Enhancement Libraries"
|
||||||
|
Extracted 8 claims
|
||||||
|
...
|
||||||
|
|
||||||
|
Chunk 3/3: "CUDA Compatibility"
|
||||||
|
Extracted 7 claims
|
||||||
|
...
|
||||||
|
|
||||||
|
Summary: 27 claims extracted, 27 stored successfully
|
||||||
|
```
|
||||||
|
|
||||||
|
## See Also
|
||||||
|
|
||||||
|
- [CLI Reference](../cli-reference.md) - All `aphoria corpus` commands
|
||||||
|
- [Corpus API](../api-reference.md) - Query corpus programmatically
|
||||||
|
- [Claims vs Observations](../../README.md#claims-vs-observations) - Key concepts
|
||||||
@ -42,7 +42,9 @@ Ingested 1,240 authoritative assertions.
|
|||||||
Ready.
|
Ready.
|
||||||
```
|
```
|
||||||
|
|
||||||
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your local database (`~/.aphoria/db`).
|
This downloads strict security requirements (RFC 7519 for JWT, RFC 5246 for TLS, etc.) into your project database (`.aphoria/db`).
|
||||||
|
|
||||||
|
> **Note:** By default, each project has its own isolated database. To share a database across all projects on your machine, set `data_dir = "~/.aphoria/db"` in `aphoria.toml`.
|
||||||
|
|
||||||
## 3. The First Scan
|
## 3. The First Scan
|
||||||
|
|
||||||
|
|||||||
181
applications/aphoria/docs/scale-adaptive-thresholds.md
Normal file
181
applications/aphoria/docs/scale-adaptive-thresholds.md
Normal file
@ -0,0 +1,181 @@
|
|||||||
|
# Scale-Adaptive Promotion Thresholds
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Scale-adaptive thresholds automatically adjust promotion criteria based on organization size, enabling small teams to see value immediately while maintaining quality gates for larger organizations.
|
||||||
|
|
||||||
|
## The Problem
|
||||||
|
|
||||||
|
**Before adaptive thresholds:**
|
||||||
|
- Hardcoded minimums: 850/100/50 projects for regulatory/clinical/emerging
|
||||||
|
- Small teams (2-5 projects) → **0 patterns promoted** → empty dashboard
|
||||||
|
- No immediate value demonstration → adoption killed before flywheel starts
|
||||||
|
|
||||||
|
**Root cause:**
|
||||||
|
- Thresholds designed for enterprise scale (850 projects for regulatory)
|
||||||
|
- Small teams locked out: can't meet 50-project minimum for emerging tier
|
||||||
|
- Dashboard queries promoted patterns only (no visibility into raw aggregates)
|
||||||
|
|
||||||
|
## The Solution
|
||||||
|
|
||||||
|
### Adaptive Formula
|
||||||
|
|
||||||
|
```rust
|
||||||
|
effective_min_projects = max(
|
||||||
|
absolute_floor, // Safety: prevent single-project noise
|
||||||
|
(percentage * total_projects).ceil() // Scale: grow with team
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Scale Tiers (Auto-Detected)
|
||||||
|
|
||||||
|
| Tier | Project Range | Behavior |
|
||||||
|
|------|--------------|----------|
|
||||||
|
| **Micro** | 1-5 | Only emerging tier, floor=2, rate=50% |
|
||||||
|
| **Small** | 6-25 | All tiers enabled, lower floors |
|
||||||
|
| **Medium** | 26-100 | Balanced thresholds |
|
||||||
|
| **Large** | 101-500 | Higher quality gates |
|
||||||
|
| **Enterprise** | 501+ | Current defaults (backward compatible) |
|
||||||
|
|
||||||
|
### Example: Emerging Tier Scaling
|
||||||
|
|
||||||
|
| Team Size | Projects | Formula | Min Projects | Adoption Required |
|
||||||
|
|-----------|----------|---------|--------------|-------------------|
|
||||||
|
| Micro | 3 | `max(2, 0.50*3)` | **2** | 2/3 projects (67%) |
|
||||||
|
| Small | 10 | `max(2, 0.40*10)` | **4** | 4/10 projects (40%) |
|
||||||
|
| Medium | 50 | `max(5, 0.40*50)` | **20** | 20/50 projects (40%) |
|
||||||
|
| Enterprise | 1000 | `max(25, 0.50*1000)` | **500** | 500/1000 projects (50%) |
|
||||||
|
|
||||||
|
## Quality Maintained
|
||||||
|
|
||||||
|
✅ **Floor prevents noise:** Single-project patterns blocked
|
||||||
|
✅ **Adoption rate required:** Community consensus still matters
|
||||||
|
✅ **Authority matching enforced:** Regulatory/clinical tiers need RFC/OWASP match
|
||||||
|
✅ **Manual review:** Emerging tier still requires review (auto_promote=false)
|
||||||
|
✅ **Backward compatible:** Enterprise behavior unchanged
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
### Default (Adaptive)
|
||||||
|
|
||||||
|
```toml
|
||||||
|
# .aphoria/config.toml
|
||||||
|
[corpus]
|
||||||
|
use_community = true
|
||||||
|
aggregation_enabled = true
|
||||||
|
# adaptive_thresholds = <optional custom thresholds>
|
||||||
|
use_legacy_thresholds = false # Default: use adaptive
|
||||||
|
```
|
||||||
|
|
||||||
|
### Legacy Mode (Static Thresholds)
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[corpus]
|
||||||
|
use_legacy_thresholds = true # Use fixed 850/100/50
|
||||||
|
```
|
||||||
|
|
||||||
|
### Custom Thresholds
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[corpus.adaptive_thresholds.micro.emerging]
|
||||||
|
min_projects_floor = 1 # Override: allow 1 project (risky!)
|
||||||
|
min_projects_percentage = 0.40
|
||||||
|
min_adoption_rate = 0.40
|
||||||
|
```
|
||||||
|
|
||||||
|
## Implementation
|
||||||
|
|
||||||
|
### Core Components
|
||||||
|
|
||||||
|
1. **`ScaleTier`** (`corpus/thresholds.rs`):
|
||||||
|
- `from_total_projects(u64) -> ScaleTier`
|
||||||
|
- Auto-detects tier from project count
|
||||||
|
|
||||||
|
2. **`AdaptiveCriteria`** (`corpus/thresholds.rs`):
|
||||||
|
- `effective_min_projects(total_projects) -> u64`
|
||||||
|
- Applies `max(floor, percentage * total)` formula
|
||||||
|
|
||||||
|
3. **`ScaleAdaptiveThresholds`** (`corpus/thresholds.rs`):
|
||||||
|
- `evaluate(project_count, total_projects, ...) -> PromotionDecision`
|
||||||
|
- Returns `AutoPromote(tier)`, `RequireReview`, or `Skip`
|
||||||
|
|
||||||
|
4. **`CommunityCorpusBuilder`** (`corpus/community.rs`):
|
||||||
|
- Updated to use adaptive thresholds when `use_adaptive=true`
|
||||||
|
- Falls back to legacy thresholds when `use_legacy_thresholds=true`
|
||||||
|
- Logs scale tier and threshold mode on build
|
||||||
|
|
||||||
|
### Configuration Fields
|
||||||
|
|
||||||
|
**`CorpusConfig`** (`config/types/scan.rs`):
|
||||||
|
- `adaptive_thresholds: Option<ScaleAdaptiveThresholds>` - Custom thresholds
|
||||||
|
- `use_legacy_thresholds: bool` - Backward compatibility flag (default: false)
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
### Micro Team Example (3 projects)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Scan 3 projects
|
||||||
|
cd project1 && aphoria scan --persist --sync
|
||||||
|
cd project2 && aphoria scan --persist --sync
|
||||||
|
cd project3 && aphoria scan --persist --sync
|
||||||
|
|
||||||
|
# Check logs
|
||||||
|
# Should see:
|
||||||
|
# scale_tier=Micro, use_adaptive=true
|
||||||
|
# Pattern promoted: 2/3 projects (67%) → RequireReview
|
||||||
|
```
|
||||||
|
|
||||||
|
### Query Patterns
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# API: Patterns with min 1 project (shows all for micro teams)
|
||||||
|
curl 'http://localhost:18180/api/patterns?min_projects=1&limit=10'
|
||||||
|
|
||||||
|
# Dashboard will show:
|
||||||
|
# - Scale tier: "Micro (3 projects)"
|
||||||
|
# - Promoted patterns visible
|
||||||
|
# - Thresholds: "Emerging: 2/3 projects (67%)"
|
||||||
|
```
|
||||||
|
|
||||||
|
## Testing
|
||||||
|
|
||||||
|
### Unit Tests
|
||||||
|
|
||||||
|
- `test_scale_tier_detection()` - Verify tier boundaries
|
||||||
|
- `test_effective_min_projects()` - Floor vs percentage dominance
|
||||||
|
- `test_micro_team_promotion()` - 2/3 projects promoted
|
||||||
|
- `test_regulatory_disabled_for_micro()` - Tier disabling works
|
||||||
|
- `test_enterprise_backward_compatible()` - Same as legacy
|
||||||
|
|
||||||
|
### Integration Tests
|
||||||
|
|
||||||
|
- `scale_adaptive_test.rs` - 7 tests covering all scenarios
|
||||||
|
- All 1199 library tests pass
|
||||||
|
|
||||||
|
## Migration
|
||||||
|
|
||||||
|
**Existing deployments:** No action required
|
||||||
|
- Adaptive thresholds default to enabled
|
||||||
|
- Enterprise behavior unchanged (501+ projects)
|
||||||
|
- Legacy mode available if needed
|
||||||
|
|
||||||
|
**New deployments:** Immediate value
|
||||||
|
- Small teams see patterns after 2-3 scans
|
||||||
|
- Quality maintained via floors and adoption rates
|
||||||
|
- Natural growth path as team scales
|
||||||
|
|
||||||
|
## Philosophy
|
||||||
|
|
||||||
|
**Start simple, scale naturally:**
|
||||||
|
- Small teams see value immediately (2-3 projects → patterns visible)
|
||||||
|
- Quality maintained via floors (no single-project noise)
|
||||||
|
- Adoption rate still matters (community consensus)
|
||||||
|
- Enterprise behavior unchanged (backward compatible)
|
||||||
|
- Configuration optional (defaults work for 95%)
|
||||||
|
|
||||||
|
**This unlocks the flywheel:**
|
||||||
|
- Small teams adopt → see patterns → gain trust
|
||||||
|
- Teams grow → thresholds tighten → quality improves
|
||||||
|
- Cross-team patterns emerge → community corpus strengthens
|
||||||
|
- No manual threshold tuning required
|
||||||
88
applications/aphoria/examples/scale_adaptive_demo.rs
Normal file
88
applications/aphoria/examples/scale_adaptive_demo.rs
Normal file
@ -0,0 +1,88 @@
|
|||||||
|
//! Demonstrates scale-adaptive promotion thresholds.
|
||||||
|
//!
|
||||||
|
//! Run with: `cargo run --example scale_adaptive_demo`
|
||||||
|
|
||||||
|
use aphoria::corpus::thresholds::{ScaleAdaptiveThresholds, ScaleTier};
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
println!("=== Scale-Adaptive Promotion Thresholds Demo ===\n");
|
||||||
|
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// Scenario 1: Micro Team (3 projects)
|
||||||
|
println!("📊 Scenario 1: Micro Team (3 projects)");
|
||||||
|
println!("Pattern appears in 2 out of 3 projects (67% adoption)\n");
|
||||||
|
|
||||||
|
let tier = ScaleTier::from_total_projects(3);
|
||||||
|
println!(" Scale Tier: {:?}", tier);
|
||||||
|
|
||||||
|
let decision = thresholds.evaluate(2, 3, false, None);
|
||||||
|
println!(" Decision: {:?}", decision);
|
||||||
|
println!(" ✅ Pattern VISIBLE to team (RequireReview)\n");
|
||||||
|
|
||||||
|
// Scenario 2: Small Team with RFC match
|
||||||
|
println!("📊 Scenario 2: Small Team (10 projects)");
|
||||||
|
println!("Pattern appears in 9 projects with RFC match (90% adoption)\n");
|
||||||
|
|
||||||
|
let tier = ScaleTier::from_total_projects(10);
|
||||||
|
println!(" Scale Tier: {:?}", tier);
|
||||||
|
|
||||||
|
let decision = thresholds.evaluate(9, 10, true, Some("rfc://5246"));
|
||||||
|
println!(" Decision: {:?}", decision);
|
||||||
|
println!(" ✅ Auto-promoted to Regulatory tier\n");
|
||||||
|
|
||||||
|
// Scenario 3: Enterprise (1000 projects)
|
||||||
|
println!("📊 Scenario 3: Enterprise (1000 projects)");
|
||||||
|
println!("Pattern appears in 950 projects with RFC match (95% adoption)\n");
|
||||||
|
|
||||||
|
let tier = ScaleTier::from_total_projects(1000);
|
||||||
|
println!(" Scale Tier: {:?}", tier);
|
||||||
|
|
||||||
|
let decision = thresholds.evaluate(950, 1000, true, Some("rfc://9110"));
|
||||||
|
println!(" Decision: {:?}", decision);
|
||||||
|
println!(" ✅ Auto-promoted to Regulatory tier (backward compatible)\n");
|
||||||
|
|
||||||
|
// Scenario 4: Noise prevention
|
||||||
|
println!("📊 Scenario 4: Noise Prevention (3 projects)");
|
||||||
|
println!("Pattern appears in only 1 project (33% adoption)\n");
|
||||||
|
|
||||||
|
let tier = ScaleTier::from_total_projects(3);
|
||||||
|
println!(" Scale Tier: {:?}", tier);
|
||||||
|
|
||||||
|
let decision = thresholds.evaluate(1, 3, false, None);
|
||||||
|
println!(" Decision: {:?}", decision);
|
||||||
|
println!(" ✅ Skipped (floor prevents single-project noise)\n");
|
||||||
|
|
||||||
|
// Show threshold matrix
|
||||||
|
println!("=== Threshold Matrix ===\n");
|
||||||
|
println!("| Tier | Projects | Emerging Floor | Regulatory Floor |");
|
||||||
|
println!("|------------|----------|----------------|------------------|");
|
||||||
|
|
||||||
|
for (name, total) in [
|
||||||
|
("Micro", 3),
|
||||||
|
("Small", 10),
|
||||||
|
("Medium", 50),
|
||||||
|
("Large", 200),
|
||||||
|
("Enterprise", 1000),
|
||||||
|
] {
|
||||||
|
let tier = ScaleTier::from_total_projects(total);
|
||||||
|
let tier_thresholds = thresholds.for_tier(tier);
|
||||||
|
|
||||||
|
let emerging_min = tier_thresholds.emerging.effective_min_projects(total);
|
||||||
|
|
||||||
|
let regulatory_min = if let Some(reg) = &tier_thresholds.regulatory {
|
||||||
|
format!("{}", reg.effective_min_projects(total))
|
||||||
|
} else {
|
||||||
|
"N/A".to_string()
|
||||||
|
};
|
||||||
|
|
||||||
|
println!(
|
||||||
|
"| {:10} | {:8} | {:14} | {:16} |",
|
||||||
|
name, total, emerging_min, regulatory_min
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
println!("\n✅ Small teams see value immediately!");
|
||||||
|
println!("✅ Quality maintained via floors and adoption rates!");
|
||||||
|
println!("✅ Enterprise behavior unchanged!");
|
||||||
|
}
|
||||||
@ -380,6 +380,37 @@ pub enum CorpusCommands {
|
|||||||
#[arg(long)]
|
#[arg(long)]
|
||||||
offline: bool,
|
offline: bool,
|
||||||
},
|
},
|
||||||
|
|
||||||
|
/// Create a new corpus item from structured data
|
||||||
|
Create {
|
||||||
|
/// Subject path (e.g., "ml/dependencies/basicsr/torchvision")
|
||||||
|
#[arg(long)]
|
||||||
|
subject: String,
|
||||||
|
|
||||||
|
/// Predicate (e.g., "incompatible_with", "requires", "recommends")
|
||||||
|
#[arg(long)]
|
||||||
|
predicate: String,
|
||||||
|
|
||||||
|
/// Value (string, number, or boolean)
|
||||||
|
#[arg(long)]
|
||||||
|
value: String,
|
||||||
|
|
||||||
|
/// Full explanation/context for this claim
|
||||||
|
#[arg(long)]
|
||||||
|
explanation: String,
|
||||||
|
|
||||||
|
/// Authority source (GitHub URL, paper citation, docs URL)
|
||||||
|
#[arg(long)]
|
||||||
|
authority: String,
|
||||||
|
|
||||||
|
/// Category (compatibility, performance, security, architecture)
|
||||||
|
#[arg(long)]
|
||||||
|
category: String,
|
||||||
|
|
||||||
|
/// Authority tier (0=regulatory, 1=clinical, 2=observational, 3=community)
|
||||||
|
#[arg(long)]
|
||||||
|
tier: u8,
|
||||||
|
},
|
||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Subcommand)]
|
#[derive(Subcommand)]
|
||||||
|
|||||||
@ -11,7 +11,11 @@ use super::types::{
|
|||||||
|
|
||||||
impl Default for EpistemeConfig {
|
impl Default for EpistemeConfig {
|
||||||
fn default() -> Self {
|
fn default() -> Self {
|
||||||
Self { data_dir: dirs_default_data_dir(), url: None }
|
Self {
|
||||||
|
data_dir: dirs_default_data_dir(),
|
||||||
|
corpus_data_dir: Some(dirs_default_corpus_dir()),
|
||||||
|
url: None,
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -147,6 +151,8 @@ impl Default for CorpusConfig {
|
|||||||
use_community: true, // Enabled by default - async runtime issue resolved
|
use_community: true, // Enabled by default - async runtime issue resolved
|
||||||
aggregation_enabled: true, // Enable observation aggregation
|
aggregation_enabled: true, // Enable observation aggregation
|
||||||
rfc_list: None,
|
rfc_list: None,
|
||||||
|
adaptive_thresholds: None, // Use built-in defaults
|
||||||
|
use_legacy_thresholds: false, // Use adaptive by default
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -239,11 +245,30 @@ impl Default for AutonomousConfig {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Get the default Aphoria data directory.
|
/// Get the default Aphoria data directory.
|
||||||
|
///
|
||||||
|
/// **Changed in Phase 2:** Now defaults to project-local `.aphoria/db/` instead of
|
||||||
|
/// home-based `~/.aphoria/db/`. This enables proper per-project database isolation.
|
||||||
|
///
|
||||||
|
/// To override for shared mode (all projects on machine), set:
|
||||||
|
/// ```toml
|
||||||
|
/// [episteme]
|
||||||
|
/// data_dir = "~/.aphoria/db" # Or any absolute path
|
||||||
|
/// ```
|
||||||
fn dirs_default_data_dir() -> PathBuf {
|
fn dirs_default_data_dir() -> PathBuf {
|
||||||
if let Some(home) = dirs::home_dir() {
|
|
||||||
home.join(".aphoria").join("db")
|
|
||||||
} else {
|
|
||||||
PathBuf::from(".aphoria/db")
|
PathBuf::from(".aphoria/db")
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get the default corpus database directory (shared across projects).
|
||||||
|
///
|
||||||
|
/// **New in Phase 3:** Corpus database stores aggregated pattern data from multiple
|
||||||
|
/// projects for community corpus building. This is separate from per-project observations.
|
||||||
|
///
|
||||||
|
/// **Default:** `~/.aphoria/corpus-db` (home-based, shared across all projects)
|
||||||
|
fn dirs_default_corpus_dir() -> PathBuf {
|
||||||
|
if let Some(home) = dirs::home_dir() {
|
||||||
|
home.join(".aphoria").join("corpus-db")
|
||||||
|
} else {
|
||||||
|
PathBuf::from(".aphoria/corpus-db")
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -112,9 +112,21 @@ pub struct ProjectConfig {
|
|||||||
#[derive(Debug, Clone, Deserialize)]
|
#[derive(Debug, Clone, Deserialize)]
|
||||||
#[serde(default)]
|
#[serde(default)]
|
||||||
pub struct EpistemeConfig {
|
pub struct EpistemeConfig {
|
||||||
/// Path to local Episteme data directory.
|
/// Path to local Episteme data directory (per-project observations).
|
||||||
|
///
|
||||||
|
/// **Default:** `.aphoria/db` (project-local)
|
||||||
|
///
|
||||||
|
/// For shared mode (all projects), override to `~/.aphoria/db`.
|
||||||
pub data_dir: PathBuf,
|
pub data_dir: PathBuf,
|
||||||
|
|
||||||
|
/// Path to corpus database (shared across projects).
|
||||||
|
///
|
||||||
|
/// **Default:** `~/.aphoria/corpus-db` (home-based, shared)
|
||||||
|
///
|
||||||
|
/// This stores aggregated pattern data from multiple projects for
|
||||||
|
/// community corpus building. Set to `None` to disable corpus aggregation.
|
||||||
|
pub corpus_data_dir: Option<PathBuf>,
|
||||||
|
|
||||||
/// Remote Episteme URL (future feature).
|
/// Remote Episteme URL (future feature).
|
||||||
pub url: Option<String>,
|
pub url: Option<String>,
|
||||||
}
|
}
|
||||||
|
|||||||
@ -4,6 +4,8 @@ use std::path::PathBuf;
|
|||||||
|
|
||||||
use serde::Deserialize;
|
use serde::Deserialize;
|
||||||
|
|
||||||
|
use crate::corpus::thresholds::ScaleAdaptiveThresholds;
|
||||||
|
|
||||||
/// Scan configuration.
|
/// Scan configuration.
|
||||||
#[derive(Debug, Clone, Deserialize)]
|
#[derive(Debug, Clone, Deserialize)]
|
||||||
#[serde(default)]
|
#[serde(default)]
|
||||||
@ -68,4 +70,18 @@ pub struct CorpusConfig {
|
|||||||
|
|
||||||
/// Override the default RFC list (if None, uses default list).
|
/// Override the default RFC list (if None, uses default list).
|
||||||
pub rfc_list: Option<Vec<u32>>,
|
pub rfc_list: Option<Vec<u32>>,
|
||||||
|
|
||||||
|
/// Scale-adaptive threshold configuration (if None, uses built-in defaults).
|
||||||
|
///
|
||||||
|
/// Allows overriding promotion thresholds per scale tier (micro/small/medium/large/enterprise).
|
||||||
|
/// When not set, uses ScaleAdaptiveThresholds::default() which provides sensible defaults
|
||||||
|
/// for teams of all sizes.
|
||||||
|
pub adaptive_thresholds: Option<ScaleAdaptiveThresholds>,
|
||||||
|
|
||||||
|
/// Use legacy static thresholds instead of adaptive thresholds.
|
||||||
|
///
|
||||||
|
/// When true, ignores scale tier and uses fixed thresholds (min_projects = 850/100/50).
|
||||||
|
/// Useful for backward compatibility or when explicit control is needed.
|
||||||
|
/// Default: false (use adaptive thresholds).
|
||||||
|
pub use_legacy_thresholds: bool,
|
||||||
}
|
}
|
||||||
|
|||||||
227
applications/aphoria/src/corpus/authority_parser.rs
Normal file
227
applications/aphoria/src/corpus/authority_parser.rs
Normal file
@ -0,0 +1,227 @@
|
|||||||
|
//! Authority source parsing for wiki patterns
|
||||||
|
//!
|
||||||
|
//! Parses authority strings from wiki markdown into structured Authority enums,
|
||||||
|
//! enabling proper subject scheme generation (rfc://, owasp://, cwe://).
|
||||||
|
|
||||||
|
use regex::Regex;
|
||||||
|
use std::sync::OnceLock;
|
||||||
|
|
||||||
|
/// Structured authority source
|
||||||
|
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||||
|
pub enum Authority {
|
||||||
|
/// RFC with optional section
|
||||||
|
RFC {
|
||||||
|
/// RFC number
|
||||||
|
num: u32,
|
||||||
|
/// Optional section reference
|
||||||
|
section: Option<String>,
|
||||||
|
},
|
||||||
|
/// OWASP with ID and optional year
|
||||||
|
OWASP {
|
||||||
|
/// OWASP identifier (e.g., "a03")
|
||||||
|
id: String,
|
||||||
|
/// Optional year (e.g., 2021)
|
||||||
|
year: Option<u32>,
|
||||||
|
},
|
||||||
|
/// CWE (Common Weakness Enumeration)
|
||||||
|
CWE {
|
||||||
|
/// CWE identifier
|
||||||
|
id: u32,
|
||||||
|
},
|
||||||
|
/// Unknown/unrecognized authority source
|
||||||
|
Unknown(String),
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Lazy-initialized regex patterns
|
||||||
|
static RFC_PATTERN: OnceLock<Regex> = OnceLock::new();
|
||||||
|
static OWASP_PATTERN: OnceLock<Regex> = OnceLock::new();
|
||||||
|
static CWE_PATTERN: OnceLock<Regex> = OnceLock::new();
|
||||||
|
|
||||||
|
fn rfc_pattern() -> &'static Regex {
|
||||||
|
RFC_PATTERN.get_or_init(|| {
|
||||||
|
// These regex patterns are simple and static - they will always compile
|
||||||
|
Regex::new(r"(?i)rfc\s*(\d+)(?:\s+section\s+([0-9.]+))?")
|
||||||
|
.unwrap_or_else(|_| unreachable!("RFC regex pattern is known to be valid"))
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
fn owasp_pattern() -> &'static Regex {
|
||||||
|
OWASP_PATTERN.get_or_init(|| {
|
||||||
|
// These regex patterns are simple and static - they will always compile
|
||||||
|
Regex::new(r"(?i)owasp\s+([a-z]\d+)(?::(\d{4}))?")
|
||||||
|
.unwrap_or_else(|_| unreachable!("OWASP regex pattern is known to be valid"))
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
fn cwe_pattern() -> &'static Regex {
|
||||||
|
CWE_PATTERN.get_or_init(|| {
|
||||||
|
// These regex patterns are simple and static - they will always compile
|
||||||
|
Regex::new(r"(?i)cwe[-\s]*(\d+)")
|
||||||
|
.unwrap_or_else(|_| unreachable!("CWE regex pattern is known to be valid"))
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Parse authority string into structured Authority enum
|
||||||
|
///
|
||||||
|
/// # Examples
|
||||||
|
///
|
||||||
|
/// ```
|
||||||
|
/// use aphoria::corpus::authority_parser::{parse_authority, Authority};
|
||||||
|
///
|
||||||
|
/// let auth = parse_authority("RFC 5246 Section 7.4.2");
|
||||||
|
/// assert_eq!(auth, Authority::RFC { num: 5246, section: Some("7.4.2".to_string()) });
|
||||||
|
///
|
||||||
|
/// let auth = parse_authority("OWASP A03:2021");
|
||||||
|
/// assert_eq!(auth, Authority::OWASP { id: "a03".to_string(), year: Some(2021) });
|
||||||
|
///
|
||||||
|
/// let auth = parse_authority("CWE-79");
|
||||||
|
/// assert_eq!(auth, Authority::CWE { id: 79 });
|
||||||
|
/// ```
|
||||||
|
pub fn parse_authority(authority_str: &str) -> Authority {
|
||||||
|
let trimmed = authority_str.trim();
|
||||||
|
|
||||||
|
// Try RFC pattern
|
||||||
|
if let Some(caps) = rfc_pattern().captures(trimmed) {
|
||||||
|
// Regex guarantees caps[1] is all digits, so parse will always succeed
|
||||||
|
let num = caps[1].parse().unwrap_or_else(|_| unreachable!("regex matched \\d+"));
|
||||||
|
let section = caps.get(2).map(|m| m.as_str().to_string());
|
||||||
|
return Authority::RFC { num, section };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Try OWASP pattern
|
||||||
|
if let Some(caps) = owasp_pattern().captures(trimmed) {
|
||||||
|
let id = caps[1].to_lowercase();
|
||||||
|
let year = caps.get(2).and_then(|m| m.as_str().parse().ok());
|
||||||
|
return Authority::OWASP { id, year };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Try CWE pattern
|
||||||
|
if let Some(caps) = cwe_pattern().captures(trimmed) {
|
||||||
|
// Regex guarantees caps[1] is all digits, so parse will always succeed
|
||||||
|
let id = caps[1].parse().unwrap_or_else(|_| unreachable!("regex matched \\d+"));
|
||||||
|
return Authority::CWE { id };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fallback to unknown
|
||||||
|
Authority::Unknown(trimmed.to_string())
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_rfc_basic() {
|
||||||
|
let auth = parse_authority("RFC 5246");
|
||||||
|
assert_eq!(
|
||||||
|
auth,
|
||||||
|
Authority::RFC {
|
||||||
|
num: 5246,
|
||||||
|
section: None
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_rfc_with_section() {
|
||||||
|
let auth = parse_authority("RFC 5246 Section 7.4.2");
|
||||||
|
assert_eq!(
|
||||||
|
auth,
|
||||||
|
Authority::RFC {
|
||||||
|
num: 5246,
|
||||||
|
section: Some("7.4.2".to_string())
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_rfc_lowercase() {
|
||||||
|
let auth = parse_authority("rfc 7519");
|
||||||
|
assert_eq!(
|
||||||
|
auth,
|
||||||
|
Authority::RFC {
|
||||||
|
num: 7519,
|
||||||
|
section: None
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_rfc_no_space() {
|
||||||
|
let auth = parse_authority("RFC7519");
|
||||||
|
assert_eq!(
|
||||||
|
auth,
|
||||||
|
Authority::RFC {
|
||||||
|
num: 7519,
|
||||||
|
section: None
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_owasp_with_year() {
|
||||||
|
let auth = parse_authority("OWASP A03:2021");
|
||||||
|
assert_eq!(
|
||||||
|
auth,
|
||||||
|
Authority::OWASP {
|
||||||
|
id: "a03".to_string(),
|
||||||
|
year: Some(2021)
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_owasp_without_year() {
|
||||||
|
let auth = parse_authority("OWASP A01");
|
||||||
|
assert_eq!(
|
||||||
|
auth,
|
||||||
|
Authority::OWASP {
|
||||||
|
id: "a01".to_string(),
|
||||||
|
year: None
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_owasp_lowercase() {
|
||||||
|
let auth = parse_authority("owasp a03:2021");
|
||||||
|
assert_eq!(
|
||||||
|
auth,
|
||||||
|
Authority::OWASP {
|
||||||
|
id: "a03".to_string(),
|
||||||
|
year: Some(2021)
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_cwe_hyphen() {
|
||||||
|
let auth = parse_authority("CWE-79");
|
||||||
|
assert_eq!(auth, Authority::CWE { id: 79 });
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_cwe_space() {
|
||||||
|
let auth = parse_authority("CWE 89");
|
||||||
|
assert_eq!(auth, Authority::CWE { id: 89 });
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_cwe_lowercase() {
|
||||||
|
let auth = parse_authority("cwe-79");
|
||||||
|
assert_eq!(auth, Authority::CWE { id: 79 });
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_unknown() {
|
||||||
|
let auth = parse_authority("Some Random Source");
|
||||||
|
assert_eq!(auth, Authority::Unknown("Some Random Source".to_string()));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_owasp_cheat_sheet() {
|
||||||
|
let auth = parse_authority("OWASP Password Storage Cheat Sheet");
|
||||||
|
// Doesn't match pattern, falls back to Unknown
|
||||||
|
matches!(auth, Authority::Unknown(_));
|
||||||
|
}
|
||||||
|
}
|
||||||
130
applications/aphoria/src/corpus/cli_created.rs
Normal file
130
applications/aphoria/src/corpus/cli_created.rs
Normal file
@ -0,0 +1,130 @@
|
|||||||
|
//! Corpus builder for items created via `aphoria corpus create` CLI.
|
||||||
|
//!
|
||||||
|
//! These are user-authored corpus items stored in the shared corpus database
|
||||||
|
//! with metadata flag "source": "cli_create". This builder makes CLI-created
|
||||||
|
//! items visible in `aphoria corpus build` and `aphoria corpus list`.
|
||||||
|
|
||||||
|
use std::sync::Arc;
|
||||||
|
|
||||||
|
use ed25519_dalek::SigningKey;
|
||||||
|
use stemedb_core::types::Assertion;
|
||||||
|
use stemedb_storage::{HybridStore, KVStore};
|
||||||
|
use tracing::{info, instrument};
|
||||||
|
|
||||||
|
use crate::config::CorpusConfig;
|
||||||
|
use crate::AphoriaError;
|
||||||
|
|
||||||
|
/// Corpus builder for CLI-created items.
|
||||||
|
///
|
||||||
|
/// Items created with `aphoria corpus create` are stored in the corpus database
|
||||||
|
/// with metadata `"source": "cli_create"`. This builder:
|
||||||
|
/// 1. Queries the corpus store (passed in from registry)
|
||||||
|
/// 2. Scans all items with "subject:" prefix
|
||||||
|
/// 3. Filters for items with `source == "cli_create"` in metadata
|
||||||
|
/// 4. Returns them as corpus assertions
|
||||||
|
///
|
||||||
|
/// This makes CLI-created items visible in:
|
||||||
|
/// - `aphoria corpus build` (they get included in the build)
|
||||||
|
/// - Dashboard corpus queries (they appear in the corpus list)
|
||||||
|
pub struct CliCreatedBuilder {
|
||||||
|
/// Reference to the corpus store for querying CLI-created items.
|
||||||
|
corpus_store: Arc<HybridStore>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl CliCreatedBuilder {
|
||||||
|
/// Create a new CLI-created corpus builder.
|
||||||
|
///
|
||||||
|
/// # Arguments
|
||||||
|
///
|
||||||
|
/// * `corpus_store` - The corpus database store (from LocalEpisteme::open_corpus_db)
|
||||||
|
pub fn new(corpus_store: Arc<HybridStore>) -> Self {
|
||||||
|
Self { corpus_store }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait::async_trait]
|
||||||
|
impl super::AsyncCorpusBuilder for CliCreatedBuilder {
|
||||||
|
fn name(&self) -> &str {
|
||||||
|
"CLI-Created Items"
|
||||||
|
}
|
||||||
|
|
||||||
|
fn scheme(&self) -> &str {
|
||||||
|
"cli"
|
||||||
|
}
|
||||||
|
|
||||||
|
fn default_tier(&self) -> u8 {
|
||||||
|
3 // Community tier by default (individual items may override)
|
||||||
|
}
|
||||||
|
|
||||||
|
#[instrument(skip(self, _signing_key, _config), fields(builder = "CLI-Created"))]
|
||||||
|
async fn build(
|
||||||
|
&self,
|
||||||
|
_signing_key: &SigningKey,
|
||||||
|
_timestamp: u64,
|
||||||
|
_config: &CorpusConfig,
|
||||||
|
) -> Result<Vec<Assertion>, AphoriaError> {
|
||||||
|
info!("Building corpus from CLI-created items");
|
||||||
|
|
||||||
|
// Scan all items with "subject:" prefix
|
||||||
|
let all_items = self
|
||||||
|
.corpus_store
|
||||||
|
.scan_prefix(b"subject:")
|
||||||
|
.await
|
||||||
|
.map_err(|e| AphoriaError::Storage(format!("Failed to scan corpus database: {e}")))?;
|
||||||
|
|
||||||
|
info!(total_items = all_items.len(), "Scanned corpus database for CLI-created items");
|
||||||
|
|
||||||
|
// Filter for CLI-created items by checking metadata
|
||||||
|
let mut assertions = Vec::new();
|
||||||
|
for (_key, value) in all_items {
|
||||||
|
let assertion: Assertion = stemedb_core::serde::deserialize(&value)
|
||||||
|
.map_err(|e| AphoriaError::Storage(format!("Failed to deserialize assertion: {e}")))?;
|
||||||
|
|
||||||
|
// Check metadata for "source": "cli_create"
|
||||||
|
if let Some(ref meta_bytes) = assertion.source_metadata {
|
||||||
|
if let Ok(meta_json) = serde_json::from_slice::<serde_json::Value>(meta_bytes) {
|
||||||
|
if meta_json.get("source").and_then(|v| v.as_str()) == Some("cli_create") {
|
||||||
|
assertions.push(assertion);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
info!(
|
||||||
|
cli_created_count = assertions.len(),
|
||||||
|
"Found {} CLI-created corpus items",
|
||||||
|
assertions.len()
|
||||||
|
);
|
||||||
|
|
||||||
|
Ok(assertions)
|
||||||
|
}
|
||||||
|
|
||||||
|
fn requires_network(&self) -> bool {
|
||||||
|
false // CLI items are local only
|
||||||
|
}
|
||||||
|
|
||||||
|
fn source_ids(&self) -> Vec<String> {
|
||||||
|
vec![] // No specific source IDs for CLI-created items
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
use crate::corpus::AsyncCorpusBuilder;
|
||||||
|
use stemedb_storage::HybridStore;
|
||||||
|
use tempfile::TempDir;
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_builder_metadata() {
|
||||||
|
let temp_dir = TempDir::new().unwrap();
|
||||||
|
let store = Arc::new(HybridStore::open(temp_dir.path()).unwrap());
|
||||||
|
let builder = CliCreatedBuilder::new(store);
|
||||||
|
|
||||||
|
assert_eq!(builder.name(), "CLI-Created Items");
|
||||||
|
assert_eq!(builder.scheme(), "cli");
|
||||||
|
assert_eq!(builder.default_tier(), 3);
|
||||||
|
assert!(!builder.requires_network());
|
||||||
|
assert!(builder.source_ids().is_empty());
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -13,7 +13,9 @@ use ed25519_dalek::SigningKey;
|
|||||||
use stemedb_core::types::{Assertion, ObjectValue, SourceClass};
|
use stemedb_core::types::{Assertion, ObjectValue, SourceClass};
|
||||||
use tracing::{info, instrument};
|
use tracing::{info, instrument};
|
||||||
|
|
||||||
use super::thresholds::{CorpusPromotionThresholds, PromotionDecision};
|
use super::thresholds::{
|
||||||
|
CorpusPromotionThresholds, PromotionDecision, ScaleAdaptiveThresholds, ScaleTier,
|
||||||
|
};
|
||||||
use crate::community::PatternAggregate;
|
use crate::community::PatternAggregate;
|
||||||
use crate::config::CorpusConfig;
|
use crate::config::CorpusConfig;
|
||||||
use crate::episteme::create_authoritative_assertion;
|
use crate::episteme::create_authoritative_assertion;
|
||||||
@ -72,9 +74,15 @@ pub struct CommunityCorpusBuilder {
|
|||||||
/// Pattern aggregate store for querying community data.
|
/// Pattern aggregate store for querying community data.
|
||||||
pattern_store: Box<dyn PatternAggregateStore>,
|
pattern_store: Box<dyn PatternAggregateStore>,
|
||||||
|
|
||||||
/// Promotion thresholds for multi-tier decision making.
|
/// Legacy promotion thresholds (used when use_adaptive=false).
|
||||||
thresholds: CorpusPromotionThresholds,
|
thresholds: CorpusPromotionThresholds,
|
||||||
|
|
||||||
|
/// Scale-adaptive thresholds (used when use_adaptive=true).
|
||||||
|
adaptive_thresholds: ScaleAdaptiveThresholds,
|
||||||
|
|
||||||
|
/// Whether to use adaptive thresholds (default: true).
|
||||||
|
use_adaptive: bool,
|
||||||
|
|
||||||
/// Path to manually promoted patterns file.
|
/// Path to manually promoted patterns file.
|
||||||
///
|
///
|
||||||
/// Format: `.aphoria/corpus/community.toml`
|
/// Format: `.aphoria/corpus/community.toml`
|
||||||
@ -92,7 +100,13 @@ impl CommunityCorpusBuilder {
|
|||||||
pattern_store: Box<dyn PatternAggregateStore>,
|
pattern_store: Box<dyn PatternAggregateStore>,
|
||||||
thresholds: CorpusPromotionThresholds,
|
thresholds: CorpusPromotionThresholds,
|
||||||
) -> Self {
|
) -> Self {
|
||||||
Self { pattern_store, thresholds, manual_promotions_path: None }
|
Self {
|
||||||
|
pattern_store,
|
||||||
|
thresholds,
|
||||||
|
adaptive_thresholds: ScaleAdaptiveThresholds::default(),
|
||||||
|
use_adaptive: false, // Legacy constructor defaults to legacy behavior
|
||||||
|
manual_promotions_path: None,
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Create a builder with stub storage (for testing/shadow mode).
|
/// Create a builder with stub storage (for testing/shadow mode).
|
||||||
@ -100,9 +114,9 @@ impl CommunityCorpusBuilder {
|
|||||||
Self::new(Box::new(StubPatternStore), thresholds)
|
Self::new(Box::new(StubPatternStore), thresholds)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Create a builder from StemeDB stores.
|
/// Create a builder from StemeDB stores with configuration.
|
||||||
///
|
///
|
||||||
/// This is the production constructor that uses real storage.
|
/// This is the production constructor that uses real storage and respects config.
|
||||||
pub fn from_stores(
|
pub fn from_stores(
|
||||||
kv_store: std::sync::Arc<stemedb_storage::HybridStore>,
|
kv_store: std::sync::Arc<stemedb_storage::HybridStore>,
|
||||||
predicate_index: std::sync::Arc<
|
predicate_index: std::sync::Arc<
|
||||||
@ -110,11 +124,20 @@ impl CommunityCorpusBuilder {
|
|||||||
std::sync::Arc<stemedb_storage::HybridStore>,
|
std::sync::Arc<stemedb_storage::HybridStore>,
|
||||||
>,
|
>,
|
||||||
>,
|
>,
|
||||||
thresholds: CorpusPromotionThresholds,
|
config: &CorpusConfig,
|
||||||
) -> Self {
|
) -> Self {
|
||||||
use crate::community::StemeDBPatternStore;
|
use crate::community::StemeDBPatternStore;
|
||||||
let pattern_store = Box::new(StemeDBPatternStore::new(kv_store, predicate_index));
|
let pattern_store = Box::new(StemeDBPatternStore::new(kv_store, predicate_index));
|
||||||
Self::new(pattern_store, thresholds)
|
|
||||||
|
let adaptive_thresholds = config.adaptive_thresholds.clone().unwrap_or_default();
|
||||||
|
|
||||||
|
Self {
|
||||||
|
pattern_store,
|
||||||
|
thresholds: CorpusPromotionThresholds::default(), // Keep for legacy path
|
||||||
|
adaptive_thresholds,
|
||||||
|
use_adaptive: !config.use_legacy_thresholds,
|
||||||
|
manual_promotions_path: None,
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Set path to manual promotions file.
|
/// Set path to manual promotions file.
|
||||||
@ -152,11 +175,18 @@ impl CommunityCorpusBuilder {
|
|||||||
fn should_promote(
|
fn should_promote(
|
||||||
&self,
|
&self,
|
||||||
pattern: &PatternAggregate,
|
pattern: &PatternAggregate,
|
||||||
_adoption_rate: f64,
|
total_projects: u64,
|
||||||
authority_match: (bool, Option<String>),
|
authority_match: (bool, Option<String>),
|
||||||
) -> PromotionDecision {
|
) -> PromotionDecision {
|
||||||
let total_projects = pattern.project_count; // Approximation for shadow mode
|
if self.use_adaptive {
|
||||||
|
self.adaptive_thresholds.evaluate(
|
||||||
|
pattern.project_count,
|
||||||
|
total_projects,
|
||||||
|
authority_match.0,
|
||||||
|
authority_match.1.as_deref(),
|
||||||
|
)
|
||||||
|
} else {
|
||||||
|
// Legacy path for backward compatibility
|
||||||
self.thresholds.evaluate(
|
self.thresholds.evaluate(
|
||||||
pattern.project_count,
|
pattern.project_count,
|
||||||
total_projects,
|
total_projects,
|
||||||
@ -164,6 +194,7 @@ impl CommunityCorpusBuilder {
|
|||||||
authority_match.1.as_deref(),
|
authority_match.1.as_deref(),
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/// Create assertion from promoted pattern.
|
/// Create assertion from promoted pattern.
|
||||||
fn create_assertion(
|
fn create_assertion(
|
||||||
@ -236,6 +267,8 @@ impl CommunityCorpusBuilder {
|
|||||||
) -> Result<Vec<PromotionCandidate>, AphoriaError> {
|
) -> Result<Vec<PromotionCandidate>, AphoriaError> {
|
||||||
info!("Shadow mode: Evaluating patterns for promotion");
|
info!("Shadow mode: Evaluating patterns for promotion");
|
||||||
|
|
||||||
|
let total_projects = self.pattern_store.get_total_projects().await?;
|
||||||
|
|
||||||
let patterns = self
|
let patterns = self
|
||||||
.pattern_store
|
.pattern_store
|
||||||
.get_popular_patterns(self.thresholds.emerging.min_projects, 1000)
|
.get_popular_patterns(self.thresholds.emerging.min_projects, 1000)
|
||||||
@ -251,7 +284,7 @@ impl CommunityCorpusBuilder {
|
|||||||
for pattern in patterns {
|
for pattern in patterns {
|
||||||
let adoption_rate = self.calculate_adoption_rate(&pattern).await?;
|
let adoption_rate = self.calculate_adoption_rate(&pattern).await?;
|
||||||
let authority_match = self.check_authority_match(&pattern);
|
let authority_match = self.check_authority_match(&pattern);
|
||||||
let decision = self.should_promote(&pattern, adoption_rate, authority_match.clone());
|
let decision = self.should_promote(&pattern, total_projects, authority_match.clone());
|
||||||
|
|
||||||
match decision {
|
match decision {
|
||||||
PromotionDecision::AutoPromote(source_class) => {
|
PromotionDecision::AutoPromote(source_class) => {
|
||||||
@ -331,20 +364,32 @@ impl super::AsyncCorpusBuilder for CommunityCorpusBuilder {
|
|||||||
timestamp: u64,
|
timestamp: u64,
|
||||||
_config: &CorpusConfig,
|
_config: &CorpusConfig,
|
||||||
) -> Result<Vec<Assertion>, AphoriaError> {
|
) -> Result<Vec<Assertion>, AphoriaError> {
|
||||||
info!("Building community corpus from pattern aggregates");
|
let total_projects = self.pattern_store.get_total_projects().await?;
|
||||||
|
let scale_tier = ScaleTier::from_total_projects(total_projects);
|
||||||
|
|
||||||
|
info!(
|
||||||
|
total_projects,
|
||||||
|
?scale_tier,
|
||||||
|
use_adaptive = self.use_adaptive,
|
||||||
|
"Building community corpus with scale-adaptive thresholds"
|
||||||
|
);
|
||||||
|
|
||||||
|
// Determine minimum project threshold for initial query
|
||||||
|
let min_projects_for_query = if self.use_adaptive {
|
||||||
|
// Use micro tier's emerging floor as minimum (most permissive)
|
||||||
|
2
|
||||||
|
} else {
|
||||||
|
self.thresholds.emerging.min_projects
|
||||||
|
};
|
||||||
|
|
||||||
// Fetch popular patterns (now properly async without block_on!)
|
// Fetch popular patterns (now properly async without block_on!)
|
||||||
let patterns = self
|
let patterns = self.pattern_store.get_popular_patterns(min_projects_for_query, 1000).await?;
|
||||||
.pattern_store
|
|
||||||
.get_popular_patterns(self.thresholds.emerging.min_projects, 1000)
|
|
||||||
.await?;
|
|
||||||
|
|
||||||
if patterns.is_empty() {
|
if patterns.is_empty() {
|
||||||
info!("No patterns found for community corpus (empty store or below threshold)");
|
info!("No patterns found for community corpus (empty store or below threshold)");
|
||||||
return Ok(vec![]);
|
return Ok(vec![]);
|
||||||
}
|
}
|
||||||
|
|
||||||
let total_projects = self.pattern_store.get_total_projects().await?;
|
|
||||||
info!(
|
info!(
|
||||||
pattern_count = patterns.len(),
|
pattern_count = patterns.len(),
|
||||||
total_projects, "Evaluating patterns for promotion"
|
total_projects, "Evaluating patterns for promotion"
|
||||||
@ -360,7 +405,7 @@ impl super::AsyncCorpusBuilder for CommunityCorpusBuilder {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let authority_match = self.check_authority_match(&pattern);
|
let authority_match = self.check_authority_match(&pattern);
|
||||||
let decision = self.should_promote(&pattern, adoption_rate, authority_match.clone());
|
let decision = self.should_promote(&pattern, total_projects, authority_match.clone());
|
||||||
|
|
||||||
match decision {
|
match decision {
|
||||||
super::thresholds::PromotionDecision::AutoPromote(source_class) => {
|
super::thresholds::PromotionDecision::AutoPromote(source_class) => {
|
||||||
|
|||||||
@ -33,22 +33,33 @@
|
|||||||
//! └─────────────────────────────────────────────────────────────────┘
|
//! └─────────────────────────────────────────────────────────────────┘
|
||||||
//! ```
|
//! ```
|
||||||
|
|
||||||
|
mod authority_parser;
|
||||||
|
mod cli_created;
|
||||||
mod community;
|
mod community;
|
||||||
mod enricher;
|
mod enricher;
|
||||||
mod owasp;
|
mod owasp;
|
||||||
mod resolver;
|
mod resolver;
|
||||||
mod rfc;
|
mod rfc;
|
||||||
mod thresholds;
|
mod subject_builder;
|
||||||
|
pub mod thresholds; // Public to allow config types to use ScaleAdaptiveThresholds
|
||||||
mod vendor;
|
mod vendor;
|
||||||
|
mod wiki_corpus_builder;
|
||||||
mod wiki_importer;
|
mod wiki_importer;
|
||||||
|
|
||||||
|
pub use authority_parser::{parse_authority, Authority};
|
||||||
|
pub use cli_created::CliCreatedBuilder;
|
||||||
pub use community::{CommunityCorpusBuilder, PatternAggregateStore, StubPatternStore};
|
pub use community::{CommunityCorpusBuilder, PatternAggregateStore, StubPatternStore};
|
||||||
pub use enricher::{Enrichment, PatternEnricher};
|
pub use enricher::{Enrichment, PatternEnricher};
|
||||||
pub use owasp::OwaspCorpusBuilder;
|
pub use owasp::OwaspCorpusBuilder;
|
||||||
pub use resolver::CorpusResolver;
|
pub use resolver::CorpusResolver;
|
||||||
pub use rfc::RfcCorpusBuilder;
|
pub use rfc::RfcCorpusBuilder;
|
||||||
pub use thresholds::{CorpusPromotionThresholds, PromotionCriteria, PromotionDecision};
|
pub use subject_builder::build_corpus_subject;
|
||||||
|
pub use thresholds::{
|
||||||
|
CorpusPromotionThresholds, PromotionCriteria, PromotionDecision, ScaleAdaptiveThresholds,
|
||||||
|
ScaleTier,
|
||||||
|
};
|
||||||
pub use vendor::VendorCorpusBuilder;
|
pub use vendor::VendorCorpusBuilder;
|
||||||
|
pub use wiki_corpus_builder::promote_wiki_patterns_to_corpus;
|
||||||
pub use wiki_importer::{import_from_wiki, WikiParser, WikiPattern};
|
pub use wiki_importer::{import_from_wiki, WikiParser, WikiPattern};
|
||||||
|
|
||||||
use ed25519_dalek::SigningKey;
|
use ed25519_dalek::SigningKey;
|
||||||
@ -190,6 +201,13 @@ impl CorpusRegistry {
|
|||||||
///
|
///
|
||||||
/// Use this constructor when you have access to StemeDB stores (LocalEpisteme).
|
/// Use this constructor when you have access to StemeDB stores (LocalEpisteme).
|
||||||
/// The community corpus builder queries pattern aggregates from storage.
|
/// The community corpus builder queries pattern aggregates from storage.
|
||||||
|
///
|
||||||
|
/// # Arguments
|
||||||
|
///
|
||||||
|
/// * `config` - Corpus configuration
|
||||||
|
/// * `kv_store` - Project KV store for community patterns
|
||||||
|
/// * `predicate_index` - Predicate index for community patterns
|
||||||
|
/// * `corpus_store` - Optional corpus database store for CLI-created items
|
||||||
pub fn with_stores(
|
pub fn with_stores(
|
||||||
config: &CorpusConfig,
|
config: &CorpusConfig,
|
||||||
kv_store: std::sync::Arc<stemedb_storage::HybridStore>,
|
kv_store: std::sync::Arc<stemedb_storage::HybridStore>,
|
||||||
@ -198,19 +216,23 @@ impl CorpusRegistry {
|
|||||||
std::sync::Arc<stemedb_storage::HybridStore>,
|
std::sync::Arc<stemedb_storage::HybridStore>,
|
||||||
>,
|
>,
|
||||||
>,
|
>,
|
||||||
|
corpus_store: Option<std::sync::Arc<stemedb_storage::HybridStore>>,
|
||||||
) -> Self {
|
) -> Self {
|
||||||
let mut registry = Self::with_defaults(config);
|
let mut registry = Self::with_defaults(config);
|
||||||
|
|
||||||
// Add community corpus builder if enabled
|
// Add community corpus builder if enabled
|
||||||
if config.use_community {
|
if config.use_community {
|
||||||
use crate::corpus::thresholds::CorpusPromotionThresholds;
|
let community_builder = CommunityCorpusBuilder::from_stores(kv_store, predicate_index, config);
|
||||||
let thresholds = CorpusPromotionThresholds::default();
|
|
||||||
let community_builder =
|
|
||||||
CommunityCorpusBuilder::from_stores(kv_store, predicate_index, thresholds);
|
|
||||||
registry.register_async(Box::new(community_builder));
|
registry.register_async(Box::new(community_builder));
|
||||||
info!("Registered community corpus builder (async)");
|
info!("Registered community corpus builder (async)");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Add CLI-created items builder if corpus store is available
|
||||||
|
if let Some(corpus_store) = corpus_store {
|
||||||
|
registry.register_async(Box::new(CliCreatedBuilder::new(corpus_store)));
|
||||||
|
info!("Registered CLI-created items corpus builder (async)");
|
||||||
|
}
|
||||||
|
|
||||||
registry
|
registry
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
145
applications/aphoria/src/corpus/subject_builder.rs
Normal file
145
applications/aphoria/src/corpus/subject_builder.rs
Normal file
@ -0,0 +1,145 @@
|
|||||||
|
//! Subject URI builder for corpus patterns
|
||||||
|
//!
|
||||||
|
//! Converts WikiPattern + Authority into proper corpus subject URIs
|
||||||
|
//! (rfc://, owasp://, cwe://, community://wiki/).
|
||||||
|
|
||||||
|
use crate::corpus::authority_parser::Authority;
|
||||||
|
use crate::corpus::wiki_importer::WikiPattern;
|
||||||
|
|
||||||
|
/// Build corpus subject URI from WikiPattern and Authority
|
||||||
|
///
|
||||||
|
/// # Examples
|
||||||
|
///
|
||||||
|
/// ```
|
||||||
|
/// use aphoria::corpus::authority_parser::Authority;
|
||||||
|
/// use aphoria::corpus::subject_builder::build_corpus_subject;
|
||||||
|
/// use aphoria::corpus::wiki_importer::WikiPattern;
|
||||||
|
///
|
||||||
|
/// let pattern = WikiPattern {
|
||||||
|
/// subject: "tls/cert_verification".to_string(),
|
||||||
|
/// predicate: "enabled".to_string(),
|
||||||
|
/// value: "true".to_string(),
|
||||||
|
/// statement: "TLS cert verification MUST be enabled".to_string(),
|
||||||
|
/// authority: Some("RFC 5246 Section 7.4.2".to_string()),
|
||||||
|
/// };
|
||||||
|
///
|
||||||
|
/// let authority = Authority::RFC { num: 5246, section: Some("7.4.2".to_string()) };
|
||||||
|
/// let subject = build_corpus_subject(&pattern, &authority);
|
||||||
|
/// assert_eq!(subject, "rfc://5246/tls/cert_verification");
|
||||||
|
/// ```
|
||||||
|
pub fn build_corpus_subject(pattern: &WikiPattern, authority: &Authority) -> String {
|
||||||
|
let normalized = normalize_subject(&pattern.subject);
|
||||||
|
|
||||||
|
match authority {
|
||||||
|
Authority::RFC { num, .. } => {
|
||||||
|
format!("rfc://{}/{}", num, normalized)
|
||||||
|
}
|
||||||
|
Authority::OWASP { id, .. } => {
|
||||||
|
format!("owasp://{}/{}", id.to_lowercase(), normalized)
|
||||||
|
}
|
||||||
|
Authority::CWE { id } => {
|
||||||
|
format!("cwe://{}/{}", id, normalized)
|
||||||
|
}
|
||||||
|
Authority::Unknown(_) => {
|
||||||
|
format!("community://wiki/{}", normalized)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Normalize subject path for URI
|
||||||
|
///
|
||||||
|
/// Converts to lowercase, replaces spaces with underscores, trims slashes.
|
||||||
|
fn normalize_subject(subject: &str) -> String {
|
||||||
|
subject
|
||||||
|
.trim()
|
||||||
|
.trim_matches('/')
|
||||||
|
.to_lowercase()
|
||||||
|
.replace(' ', "_")
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
use crate::community::CommunityObjectValue;
|
||||||
|
|
||||||
|
fn make_pattern(subject: &str) -> WikiPattern {
|
||||||
|
WikiPattern {
|
||||||
|
subject: subject.to_string(),
|
||||||
|
predicate: "test".to_string(),
|
||||||
|
value: CommunityObjectValue::Boolean(true),
|
||||||
|
statement: "test statement".to_string(),
|
||||||
|
authority: None,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_rfc_subject() {
|
||||||
|
let pattern = make_pattern("tls/cert_verification");
|
||||||
|
let authority = Authority::RFC {
|
||||||
|
num: 5246,
|
||||||
|
section: Some("7.4.2".to_string()),
|
||||||
|
};
|
||||||
|
let subject = build_corpus_subject(&pattern, &authority);
|
||||||
|
assert_eq!(subject, "rfc://5246/tls/cert_verification");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_rfc_subject_with_spaces() {
|
||||||
|
let pattern = make_pattern("TLS Cert Verification");
|
||||||
|
let authority = Authority::RFC {
|
||||||
|
num: 5246,
|
||||||
|
section: None,
|
||||||
|
};
|
||||||
|
let subject = build_corpus_subject(&pattern, &authority);
|
||||||
|
assert_eq!(subject, "rfc://5246/tls_cert_verification");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_owasp_subject() {
|
||||||
|
let pattern = make_pattern("password/storage");
|
||||||
|
let authority = Authority::OWASP {
|
||||||
|
id: "A03".to_string(),
|
||||||
|
year: Some(2021),
|
||||||
|
};
|
||||||
|
let subject = build_corpus_subject(&pattern, &authority);
|
||||||
|
assert_eq!(subject, "owasp://a03/password/storage");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_cwe_subject() {
|
||||||
|
let pattern = make_pattern("xss/prevention");
|
||||||
|
let authority = Authority::CWE { id: 79 };
|
||||||
|
let subject = build_corpus_subject(&pattern, &authority);
|
||||||
|
assert_eq!(subject, "cwe://79/xss/prevention");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_unknown_authority() {
|
||||||
|
let pattern = make_pattern("custom/pattern");
|
||||||
|
let authority = Authority::Unknown("Some Source".to_string());
|
||||||
|
let subject = build_corpus_subject(&pattern, &authority);
|
||||||
|
assert_eq!(subject, "community://wiki/custom/pattern");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_normalize_leading_trailing_slashes() {
|
||||||
|
let pattern = make_pattern("/api/security/");
|
||||||
|
let authority = Authority::RFC {
|
||||||
|
num: 7519,
|
||||||
|
section: None,
|
||||||
|
};
|
||||||
|
let subject = build_corpus_subject(&pattern, &authority);
|
||||||
|
assert_eq!(subject, "rfc://7519/api/security");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_normalize_uppercase() {
|
||||||
|
let pattern = make_pattern("JWT/Validation");
|
||||||
|
let authority = Authority::RFC {
|
||||||
|
num: 7519,
|
||||||
|
section: None,
|
||||||
|
};
|
||||||
|
let subject = build_corpus_subject(&pattern, &authority);
|
||||||
|
assert_eq!(subject, "rfc://7519/jwt/validation");
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -197,6 +197,334 @@ impl CorpusPromotionThresholds {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Scale tier based on total projects in organization
|
||||||
|
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||||
|
pub enum ScaleTier {
|
||||||
|
/// 1-5 projects: Very small teams
|
||||||
|
Micro,
|
||||||
|
/// 6-25 projects: Small teams
|
||||||
|
Small,
|
||||||
|
/// 26-100 projects: Medium organizations
|
||||||
|
Medium,
|
||||||
|
/// 101-500 projects: Large organizations
|
||||||
|
Large,
|
||||||
|
/// 501+ projects: Enterprise scale
|
||||||
|
Enterprise,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl ScaleTier {
|
||||||
|
/// Detect scale tier from total project count
|
||||||
|
pub fn from_total_projects(total: u64) -> Self {
|
||||||
|
match total {
|
||||||
|
0..=5 => Self::Micro,
|
||||||
|
6..=25 => Self::Small,
|
||||||
|
26..=100 => Self::Medium,
|
||||||
|
101..=500 => Self::Large,
|
||||||
|
_ => Self::Enterprise,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Adaptive promotion criteria that scales with team size
|
||||||
|
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||||
|
pub struct AdaptiveCriteria {
|
||||||
|
/// Absolute minimum projects (safety floor)
|
||||||
|
pub min_projects_floor: u64,
|
||||||
|
/// Percentage of total projects required (scale factor)
|
||||||
|
pub min_projects_percentage: f64,
|
||||||
|
/// Minimum adoption rate (0.0-1.0)
|
||||||
|
pub min_adoption_rate: f64,
|
||||||
|
/// Whether authority source match is required
|
||||||
|
pub require_authority: bool,
|
||||||
|
/// List of authority source prefixes (e.g., ["rfc://", "nist://"])
|
||||||
|
pub authority_sources: Vec<String>,
|
||||||
|
/// Whether to auto-promote or require manual review
|
||||||
|
pub auto_promote: bool,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl AdaptiveCriteria {
|
||||||
|
/// Calculate effective minimum projects for current total
|
||||||
|
///
|
||||||
|
/// Returns max(floor, percentage * total) to ensure:
|
||||||
|
/// - Small teams: percentage dominates (scales with growth)
|
||||||
|
/// - Large teams: floor dominates (maintains quality)
|
||||||
|
pub fn effective_min_projects(&self, total_projects: u64) -> u64 {
|
||||||
|
let from_percentage = (self.min_projects_percentage * total_projects as f64).ceil() as u64;
|
||||||
|
self.min_projects_floor.max(from_percentage)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Default for AdaptiveCriteria {
|
||||||
|
fn default() -> Self {
|
||||||
|
Self {
|
||||||
|
min_projects_floor: 2,
|
||||||
|
min_projects_percentage: 0.50,
|
||||||
|
min_adoption_rate: 0.50,
|
||||||
|
require_authority: false,
|
||||||
|
authority_sources: vec![],
|
||||||
|
auto_promote: false,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Thresholds for a specific scale tier
|
||||||
|
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||||
|
pub struct TierThresholds {
|
||||||
|
/// Regulatory tier (RFC, NIST, etc.) - may be disabled (None)
|
||||||
|
pub regulatory: Option<AdaptiveCriteria>,
|
||||||
|
/// Clinical tier (OWASP, CWE, etc.) - may be disabled (None)
|
||||||
|
pub clinical: Option<AdaptiveCriteria>,
|
||||||
|
/// Emerging tier (community patterns) - always enabled
|
||||||
|
pub emerging: AdaptiveCriteria,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Scale-adaptive threshold system
|
||||||
|
///
|
||||||
|
/// Automatically adjusts promotion criteria based on organization size:
|
||||||
|
/// - Micro teams (2-3 projects): See patterns immediately
|
||||||
|
/// - Small teams: Lower thresholds, all tiers enabled
|
||||||
|
/// - Medium/Large: Balanced quality gates
|
||||||
|
/// - Enterprise: Strict thresholds (backward compatible)
|
||||||
|
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||||
|
pub struct ScaleAdaptiveThresholds {
|
||||||
|
/// Thresholds for micro teams (1-5 projects).
|
||||||
|
pub micro: TierThresholds,
|
||||||
|
/// Thresholds for small teams (6-25 projects).
|
||||||
|
pub small: TierThresholds,
|
||||||
|
/// Thresholds for medium organizations (26-100 projects).
|
||||||
|
pub medium: TierThresholds,
|
||||||
|
/// Thresholds for large organizations (101-500 projects).
|
||||||
|
pub large: TierThresholds,
|
||||||
|
/// Thresholds for enterprise scale (501+ projects).
|
||||||
|
pub enterprise: TierThresholds,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl ScaleAdaptiveThresholds {
|
||||||
|
/// Get thresholds for a specific scale tier
|
||||||
|
pub fn for_tier(&self, tier: ScaleTier) -> &TierThresholds {
|
||||||
|
match tier {
|
||||||
|
ScaleTier::Micro => &self.micro,
|
||||||
|
ScaleTier::Small => &self.small,
|
||||||
|
ScaleTier::Medium => &self.medium,
|
||||||
|
ScaleTier::Large => &self.large,
|
||||||
|
ScaleTier::Enterprise => &self.enterprise,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Evaluate promotion decision for a pattern
|
||||||
|
///
|
||||||
|
/// # Arguments
|
||||||
|
/// - `project_count`: Number of projects pattern appears in
|
||||||
|
/// - `total_projects`: Total projects in organization
|
||||||
|
/// - `has_authority_match`: Whether pattern matches authority source
|
||||||
|
/// - `authority_scheme`: Authority scheme if matched (e.g., "rfc://")
|
||||||
|
pub fn evaluate(
|
||||||
|
&self,
|
||||||
|
project_count: u64,
|
||||||
|
total_projects: u64,
|
||||||
|
has_authority_match: bool,
|
||||||
|
authority_scheme: Option<&str>,
|
||||||
|
) -> PromotionDecision {
|
||||||
|
if total_projects == 0 {
|
||||||
|
return PromotionDecision::Skip;
|
||||||
|
}
|
||||||
|
|
||||||
|
let tier = ScaleTier::from_total_projects(total_projects);
|
||||||
|
let thresholds = self.for_tier(tier);
|
||||||
|
|
||||||
|
let adoption_rate = project_count as f64 / total_projects as f64;
|
||||||
|
|
||||||
|
// Try regulatory (if enabled for this tier)
|
||||||
|
if let Some(reg) = &thresholds.regulatory {
|
||||||
|
let min_projects = reg.effective_min_projects(total_projects);
|
||||||
|
if adoption_rate >= reg.min_adoption_rate
|
||||||
|
&& project_count >= min_projects
|
||||||
|
&& (!reg.require_authority
|
||||||
|
|| matches_authority(has_authority_match, authority_scheme, ®.authority_sources))
|
||||||
|
{
|
||||||
|
return PromotionDecision::AutoPromote(SourceClass::Regulatory);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Try clinical (if enabled)
|
||||||
|
if let Some(clin) = &thresholds.clinical {
|
||||||
|
let min_projects = clin.effective_min_projects(total_projects);
|
||||||
|
if adoption_rate >= clin.min_adoption_rate
|
||||||
|
&& project_count >= min_projects
|
||||||
|
&& (!clin.require_authority
|
||||||
|
|| matches_authority(has_authority_match, authority_scheme, &clin.authority_sources))
|
||||||
|
{
|
||||||
|
return PromotionDecision::AutoPromote(SourceClass::Clinical);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Try emerging (always enabled)
|
||||||
|
let min_projects = thresholds.emerging.effective_min_projects(total_projects);
|
||||||
|
if adoption_rate >= thresholds.emerging.min_adoption_rate && project_count >= min_projects {
|
||||||
|
if thresholds.emerging.auto_promote {
|
||||||
|
return PromotionDecision::AutoPromote(SourceClass::Community);
|
||||||
|
} else {
|
||||||
|
return PromotionDecision::RequireReview;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PromotionDecision::Skip
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Default for ScaleAdaptiveThresholds {
|
||||||
|
fn default() -> Self {
|
||||||
|
Self {
|
||||||
|
// Micro: 1-5 projects - Only emerging tier, very permissive
|
||||||
|
micro: TierThresholds {
|
||||||
|
regulatory: None, // Disabled
|
||||||
|
clinical: None, // Disabled
|
||||||
|
emerging: AdaptiveCriteria {
|
||||||
|
min_projects_floor: 2,
|
||||||
|
min_projects_percentage: 0.50, // Pattern in 50% of projects
|
||||||
|
min_adoption_rate: 0.50,
|
||||||
|
require_authority: false,
|
||||||
|
authority_sources: vec![],
|
||||||
|
auto_promote: true, // Auto-promote for immediate visibility
|
||||||
|
},
|
||||||
|
},
|
||||||
|
|
||||||
|
// Small: 6-25 projects - All tiers enabled, lower floors
|
||||||
|
small: TierThresholds {
|
||||||
|
regulatory: Some(AdaptiveCriteria {
|
||||||
|
min_projects_floor: 5,
|
||||||
|
min_projects_percentage: 0.90,
|
||||||
|
min_adoption_rate: 0.90,
|
||||||
|
require_authority: true,
|
||||||
|
authority_sources: vec!["rfc://".into(), "nist://".into()],
|
||||||
|
auto_promote: true,
|
||||||
|
}),
|
||||||
|
clinical: Some(AdaptiveCriteria {
|
||||||
|
min_projects_floor: 4,
|
||||||
|
min_projects_percentage: 0.75,
|
||||||
|
min_adoption_rate: 0.75,
|
||||||
|
require_authority: true,
|
||||||
|
authority_sources: vec!["owasp://".into(), "cwe://".into()],
|
||||||
|
auto_promote: true,
|
||||||
|
}),
|
||||||
|
emerging: AdaptiveCriteria {
|
||||||
|
min_projects_floor: 2,
|
||||||
|
min_projects_percentage: 0.40,
|
||||||
|
min_adoption_rate: 0.40,
|
||||||
|
require_authority: false,
|
||||||
|
authority_sources: vec![],
|
||||||
|
auto_promote: true, // Auto-promote for small teams too
|
||||||
|
},
|
||||||
|
},
|
||||||
|
|
||||||
|
// Medium: 26-100 projects - Balanced thresholds
|
||||||
|
medium: TierThresholds {
|
||||||
|
regulatory: Some(AdaptiveCriteria {
|
||||||
|
min_projects_floor: 20,
|
||||||
|
min_projects_percentage: 0.90,
|
||||||
|
min_adoption_rate: 0.90,
|
||||||
|
require_authority: true,
|
||||||
|
authority_sources: vec!["rfc://".into(), "nist://".into()],
|
||||||
|
auto_promote: true,
|
||||||
|
}),
|
||||||
|
clinical: Some(AdaptiveCriteria {
|
||||||
|
min_projects_floor: 10,
|
||||||
|
min_projects_percentage: 0.75,
|
||||||
|
min_adoption_rate: 0.75,
|
||||||
|
require_authority: true,
|
||||||
|
authority_sources: vec!["owasp://".into(), "cwe://".into()],
|
||||||
|
auto_promote: true,
|
||||||
|
}),
|
||||||
|
emerging: AdaptiveCriteria {
|
||||||
|
min_projects_floor: 5,
|
||||||
|
min_projects_percentage: 0.40,
|
||||||
|
min_adoption_rate: 0.40,
|
||||||
|
require_authority: false,
|
||||||
|
authority_sources: vec![],
|
||||||
|
auto_promote: false,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
|
||||||
|
// Large: 101-500 projects - Higher quality gates
|
||||||
|
large: TierThresholds {
|
||||||
|
regulatory: Some(AdaptiveCriteria {
|
||||||
|
min_projects_floor: 50,
|
||||||
|
min_projects_percentage: 0.90,
|
||||||
|
min_adoption_rate: 0.90,
|
||||||
|
require_authority: true,
|
||||||
|
authority_sources: vec!["rfc://".into(), "nist://".into()],
|
||||||
|
auto_promote: true,
|
||||||
|
}),
|
||||||
|
clinical: Some(AdaptiveCriteria {
|
||||||
|
min_projects_floor: 30,
|
||||||
|
min_projects_percentage: 0.75,
|
||||||
|
min_adoption_rate: 0.75,
|
||||||
|
require_authority: true,
|
||||||
|
authority_sources: vec!["owasp://".into(), "cwe://".into()],
|
||||||
|
auto_promote: true,
|
||||||
|
}),
|
||||||
|
emerging: AdaptiveCriteria {
|
||||||
|
min_projects_floor: 15,
|
||||||
|
min_projects_percentage: 0.40,
|
||||||
|
min_adoption_rate: 0.40,
|
||||||
|
require_authority: false,
|
||||||
|
authority_sources: vec![],
|
||||||
|
auto_promote: false,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
|
||||||
|
// Enterprise: 501+ projects - Current defaults (backward compatible)
|
||||||
|
enterprise: TierThresholds {
|
||||||
|
regulatory: Some(AdaptiveCriteria {
|
||||||
|
min_projects_floor: 100,
|
||||||
|
min_projects_percentage: 0.95,
|
||||||
|
min_adoption_rate: 0.95,
|
||||||
|
require_authority: true,
|
||||||
|
authority_sources: vec!["rfc://".into(), "nist://".into()],
|
||||||
|
auto_promote: true,
|
||||||
|
}),
|
||||||
|
clinical: Some(AdaptiveCriteria {
|
||||||
|
min_projects_floor: 50,
|
||||||
|
min_projects_percentage: 0.80,
|
||||||
|
min_adoption_rate: 0.80,
|
||||||
|
require_authority: true,
|
||||||
|
authority_sources: vec!["owasp://".into(), "cwe://".into()],
|
||||||
|
auto_promote: true,
|
||||||
|
}),
|
||||||
|
emerging: AdaptiveCriteria {
|
||||||
|
min_projects_floor: 25,
|
||||||
|
min_projects_percentage: 0.50,
|
||||||
|
min_adoption_rate: 0.50,
|
||||||
|
require_authority: false,
|
||||||
|
authority_sources: vec![],
|
||||||
|
auto_promote: false,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Helper: Check if authority sources match
|
||||||
|
fn matches_authority(
|
||||||
|
has_authority_match: bool,
|
||||||
|
authority_scheme: Option<&str>,
|
||||||
|
required_sources: &[String],
|
||||||
|
) -> bool {
|
||||||
|
if !has_authority_match {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
if required_sources.is_empty() {
|
||||||
|
return true; // Any authority source acceptable
|
||||||
|
}
|
||||||
|
|
||||||
|
if let Some(scheme) = authority_scheme {
|
||||||
|
required_sources.iter().any(|src| scheme.starts_with(src))
|
||||||
|
} else {
|
||||||
|
false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use super::*;
|
use super::*;
|
||||||
@ -322,4 +650,138 @@ mod tests {
|
|||||||
// Should not promote to Regulatory due to min_projects
|
// Should not promote to Regulatory due to min_projects
|
||||||
assert_ne!(decision, PromotionDecision::AutoPromote(SourceClass::Regulatory));
|
assert_ne!(decision, PromotionDecision::AutoPromote(SourceClass::Regulatory));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ===== Scale-Adaptive Tests =====
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_scale_tier_detection() {
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(1), ScaleTier::Micro);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(3), ScaleTier::Micro);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(5), ScaleTier::Micro);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(6), ScaleTier::Small);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(25), ScaleTier::Small);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(26), ScaleTier::Medium);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(100), ScaleTier::Medium);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(101), ScaleTier::Large);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(500), ScaleTier::Large);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(501), ScaleTier::Enterprise);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(10000), ScaleTier::Enterprise);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_effective_min_projects() {
|
||||||
|
let criteria = AdaptiveCriteria {
|
||||||
|
min_projects_floor: 5,
|
||||||
|
min_projects_percentage: 0.50,
|
||||||
|
..Default::default()
|
||||||
|
};
|
||||||
|
|
||||||
|
// Floor dominates for small counts
|
||||||
|
assert_eq!(criteria.effective_min_projects(3), 5); // 50% * 3 = 1.5 → 2 < 5
|
||||||
|
assert_eq!(criteria.effective_min_projects(8), 5); // 50% * 8 = 4 < 5
|
||||||
|
|
||||||
|
// Percentage dominates for larger counts
|
||||||
|
assert_eq!(criteria.effective_min_projects(12), 6); // 50% * 12 = 6 > 5
|
||||||
|
assert_eq!(criteria.effective_min_projects(20), 10); // 50% * 20 = 10 > 5
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_micro_team_promotion() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// 3 projects total, pattern in 2 projects (67% adoption)
|
||||||
|
let decision = thresholds.evaluate(2, 3, false, None);
|
||||||
|
|
||||||
|
// Should promote to emerging: max(2, 0.50*3) = 2, adoption = 67% >= 50%
|
||||||
|
assert_eq!(decision, PromotionDecision::RequireReview);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_micro_team_below_threshold() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// 3 projects total, pattern in 1 project (33% adoption)
|
||||||
|
let decision = thresholds.evaluate(1, 3, false, None);
|
||||||
|
|
||||||
|
// Should NOT promote: 33% < 50% adoption rate
|
||||||
|
assert_eq!(decision, PromotionDecision::Skip);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_regulatory_disabled_for_micro() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// 3 projects total, pattern in 3 projects (100% adoption, RFC match)
|
||||||
|
let decision = thresholds.evaluate(3, 3, true, Some("rfc://1234"));
|
||||||
|
|
||||||
|
// Should NOT promote to regulatory (disabled for micro tier)
|
||||||
|
// Should promote to emerging instead
|
||||||
|
assert_eq!(decision, PromotionDecision::RequireReview);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_small_team_with_authority() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// 10 projects total, pattern in 9 (90% adoption, RFC match)
|
||||||
|
let decision = thresholds.evaluate(9, 10, true, Some("rfc://1234"));
|
||||||
|
|
||||||
|
// Small tier regulatory: max(5, 0.90*10) = 9, rate = 90%
|
||||||
|
// Should auto-promote to regulatory
|
||||||
|
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Regulatory));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_small_team_emerging() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// 10 projects total, pattern in 4 (40% adoption, no authority)
|
||||||
|
let decision = thresholds.evaluate(4, 10, false, None);
|
||||||
|
|
||||||
|
// Small tier emerging: max(2, 0.40*10) = 4, rate = 40%
|
||||||
|
// Should require review
|
||||||
|
assert_eq!(decision, PromotionDecision::RequireReview);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_medium_team_clinical() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// 50 projects total, pattern in 38 (76% adoption, OWASP match)
|
||||||
|
let decision = thresholds.evaluate(38, 50, true, Some("owasp://top-10/a01"));
|
||||||
|
|
||||||
|
// Medium tier clinical: max(10, 0.75*50) = 37.5 → 38, rate = 76%
|
||||||
|
// Should auto-promote to clinical
|
||||||
|
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Clinical));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_enterprise_backward_compatible() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// 1000 projects total, pattern in 950 (95% adoption, RFC match)
|
||||||
|
let decision = thresholds.evaluate(950, 1000, true, Some("rfc://9110"));
|
||||||
|
|
||||||
|
// Enterprise tier: max(100, 0.95*1000) = 950, rate = 95%
|
||||||
|
// Should auto-promote to regulatory (same as legacy behavior)
|
||||||
|
assert_eq!(decision, PromotionDecision::AutoPromote(SourceClass::Regulatory));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_authority_matching() {
|
||||||
|
// RFC source matches regulatory
|
||||||
|
assert!(matches_authority(true, Some("rfc://9110"), &["rfc://".into(), "nist://".into()]));
|
||||||
|
|
||||||
|
// NIST source matches regulatory
|
||||||
|
assert!(matches_authority(true, Some("nist://sp800-53"), &["rfc://".into(), "nist://".into()]));
|
||||||
|
|
||||||
|
// OWASP doesn't match regulatory
|
||||||
|
assert!(!matches_authority(true, Some("owasp://top-10/a01"), &["rfc://".into(), "nist://".into()]));
|
||||||
|
|
||||||
|
// No authority doesn't match when required
|
||||||
|
assert!(!matches_authority(false, None, &["rfc://".into()]));
|
||||||
|
|
||||||
|
// Empty sources accepts any authority
|
||||||
|
assert!(matches_authority(true, Some("anything://"), &[]));
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
185
applications/aphoria/src/corpus/wiki_corpus_builder.rs
Normal file
185
applications/aphoria/src/corpus/wiki_corpus_builder.rs
Normal file
@ -0,0 +1,185 @@
|
|||||||
|
//! Wiki corpus builder
|
||||||
|
//!
|
||||||
|
//! Converts WikiPatterns into signed authoritative assertions for the corpus database.
|
||||||
|
//! Reuses existing helpers from episteme/corpus.rs to handle signing and metadata.
|
||||||
|
|
||||||
|
use crate::corpus::authority_parser::{parse_authority, Authority};
|
||||||
|
use crate::corpus::subject_builder::build_corpus_subject;
|
||||||
|
use crate::corpus::wiki_importer::WikiPattern;
|
||||||
|
use crate::episteme::create_authoritative_assertion_with_metadata;
|
||||||
|
use crate::error::AphoriaError;
|
||||||
|
use ed25519_dalek::SigningKey;
|
||||||
|
use serde_json::json;
|
||||||
|
use stemedb_core::types::SourceClass;
|
||||||
|
use stemedb_storage::{HybridStore, KVStore};
|
||||||
|
use std::sync::Arc;
|
||||||
|
use std::time::{SystemTime, UNIX_EPOCH};
|
||||||
|
use tracing::{info, warn};
|
||||||
|
|
||||||
|
/// Promote wiki patterns to corpus database as signed assertions
|
||||||
|
///
|
||||||
|
/// This function:
|
||||||
|
/// 1. Parses authority strings into structured Authority enums
|
||||||
|
/// 2. Builds proper subject URIs (rfc://, owasp://, cwe://, community://wiki/)
|
||||||
|
/// 3. Creates signed assertions with rich metadata
|
||||||
|
/// 4. Stores in corpus database with subject and predicate indexes
|
||||||
|
///
|
||||||
|
/// # Arguments
|
||||||
|
///
|
||||||
|
/// * `patterns` - WikiPatterns parsed from markdown files
|
||||||
|
/// * `signing_key` - Ed25519 key for signing assertions
|
||||||
|
/// * `corpus_store` - Corpus database KV store (NOT project database)
|
||||||
|
///
|
||||||
|
/// # Returns
|
||||||
|
///
|
||||||
|
/// Number of patterns successfully promoted to corpus
|
||||||
|
pub async fn promote_wiki_patterns_to_corpus(
|
||||||
|
patterns: Vec<WikiPattern>,
|
||||||
|
signing_key: &SigningKey,
|
||||||
|
corpus_store: Arc<HybridStore>,
|
||||||
|
) -> Result<usize, AphoriaError> {
|
||||||
|
let mut promoted = 0;
|
||||||
|
|
||||||
|
for pattern in patterns {
|
||||||
|
// Parse authority (or Unknown if missing)
|
||||||
|
let authority = pattern
|
||||||
|
.authority
|
||||||
|
.as_ref()
|
||||||
|
.map(|s| parse_authority(s))
|
||||||
|
.unwrap_or_else(|| Authority::Unknown("wiki import".to_string()));
|
||||||
|
|
||||||
|
// Build proper subject URI
|
||||||
|
let subject = build_corpus_subject(&pattern, &authority);
|
||||||
|
|
||||||
|
// Determine tier based on authority
|
||||||
|
let source_class = match &authority {
|
||||||
|
Authority::RFC { .. } | Authority::OWASP { .. } => SourceClass::Regulatory,
|
||||||
|
Authority::CWE { .. } => SourceClass::Clinical,
|
||||||
|
Authority::Unknown(_) => SourceClass::Community,
|
||||||
|
};
|
||||||
|
|
||||||
|
// Get authority source string for metadata
|
||||||
|
let authority_source = pattern
|
||||||
|
.authority
|
||||||
|
.clone()
|
||||||
|
.unwrap_or_else(|| "wiki import".to_string());
|
||||||
|
|
||||||
|
// Build rich metadata
|
||||||
|
let metadata = json!({
|
||||||
|
"description": pattern.statement,
|
||||||
|
"authority_source": authority_source,
|
||||||
|
"category": infer_category(&pattern.subject),
|
||||||
|
"source": "wiki_import"
|
||||||
|
});
|
||||||
|
|
||||||
|
// Get current timestamp
|
||||||
|
let timestamp = SystemTime::now()
|
||||||
|
.duration_since(UNIX_EPOCH)
|
||||||
|
.map_err(|e| AphoriaError::Io(std::io::Error::other(e)))?
|
||||||
|
.as_secs();
|
||||||
|
|
||||||
|
// Create signed assertion (REUSE EXISTING HELPER)
|
||||||
|
let assertion = create_authoritative_assertion_with_metadata(
|
||||||
|
signing_key,
|
||||||
|
&subject,
|
||||||
|
&pattern.predicate,
|
||||||
|
pattern.value.clone().into(),
|
||||||
|
source_class,
|
||||||
|
&pattern.statement,
|
||||||
|
timestamp,
|
||||||
|
metadata,
|
||||||
|
);
|
||||||
|
|
||||||
|
// Serialize assertion
|
||||||
|
let serialized = stemedb_core::serde::serialize(&assertion)
|
||||||
|
.map_err(|e| AphoriaError::Storage(format!("Failed to serialize assertion: {}", e)))?;
|
||||||
|
|
||||||
|
// Store with subject prefix for API querying
|
||||||
|
let subject_key = format!("subject:{}", subject);
|
||||||
|
corpus_store
|
||||||
|
.put(subject_key.as_bytes(), &serialized)
|
||||||
|
.await
|
||||||
|
.map_err(|e| AphoriaError::Storage(format!("Failed to store assertion: {}", e)))?;
|
||||||
|
|
||||||
|
// Also store in predicate index
|
||||||
|
let pred_key = format!("predicate:corpus:{}", assertion.predicate);
|
||||||
|
corpus_store
|
||||||
|
.put(pred_key.as_bytes(), &serialized)
|
||||||
|
.await
|
||||||
|
.map_err(|e| {
|
||||||
|
AphoriaError::Storage(format!("Failed to store predicate index: {}", e))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
info!(
|
||||||
|
"Promoted wiki pattern to corpus: {} -> {}",
|
||||||
|
pattern.subject, subject
|
||||||
|
);
|
||||||
|
promoted += 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
if promoted > 0 {
|
||||||
|
info!("Successfully promoted {} wiki patterns to corpus", promoted);
|
||||||
|
} else {
|
||||||
|
warn!("No wiki patterns were promoted to corpus");
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(promoted)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Infer category from subject path
|
||||||
|
///
|
||||||
|
/// Uses simple keyword matching to categorize patterns into:
|
||||||
|
/// - security: TLS, JWT, password, auth, crypto
|
||||||
|
/// - architecture: HTTP, API, REST
|
||||||
|
/// - quality: test, CI
|
||||||
|
/// - general: everything else
|
||||||
|
fn infer_category(subject: &str) -> &str {
|
||||||
|
let lower = subject.to_lowercase();
|
||||||
|
if lower.contains("tls")
|
||||||
|
|| lower.contains("jwt")
|
||||||
|
|| lower.contains("password")
|
||||||
|
|| lower.contains("auth")
|
||||||
|
|| lower.contains("crypto")
|
||||||
|
{
|
||||||
|
"security"
|
||||||
|
} else if lower.contains("http") || lower.contains("api") || lower.contains("rest") {
|
||||||
|
"architecture"
|
||||||
|
} else if lower.contains("test") || lower.contains("ci") {
|
||||||
|
"quality"
|
||||||
|
} else {
|
||||||
|
"general"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_category_security() {
|
||||||
|
assert_eq!(infer_category("tls/cert_verification"), "security");
|
||||||
|
assert_eq!(infer_category("JWT/validation"), "security");
|
||||||
|
assert_eq!(infer_category("password/storage"), "security");
|
||||||
|
assert_eq!(infer_category("authentication/oauth"), "security");
|
||||||
|
assert_eq!(infer_category("crypto/hashing"), "security");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_category_architecture() {
|
||||||
|
assert_eq!(infer_category("http/headers"), "architecture");
|
||||||
|
assert_eq!(infer_category("API/versioning"), "architecture");
|
||||||
|
assert_eq!(infer_category("rest/endpoints"), "architecture");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_category_quality() {
|
||||||
|
assert_eq!(infer_category("test/coverage"), "quality");
|
||||||
|
assert_eq!(infer_category("CI/pipeline"), "quality");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_category_general() {
|
||||||
|
assert_eq!(infer_category("logging/format"), "general");
|
||||||
|
assert_eq!(infer_category("config/defaults"), "general");
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -3,9 +3,9 @@
|
|||||||
use std::path::{Path, PathBuf};
|
use std::path::{Path, PathBuf};
|
||||||
|
|
||||||
use crate::bridge;
|
use crate::bridge;
|
||||||
use crate::community::PatternAggregator;
|
use stemedb_storage::KVStore;
|
||||||
use crate::config::AphoriaConfig;
|
use crate::config::AphoriaConfig;
|
||||||
use crate::corpus::{import_from_wiki, CorpusBuildResult, CorpusBuilderInfo, CorpusRegistry};
|
use crate::corpus::{CorpusBuildResult, CorpusBuilderInfo, CorpusRegistry};
|
||||||
use crate::current_timestamp;
|
use crate::current_timestamp;
|
||||||
use crate::episteme;
|
use crate::episteme;
|
||||||
use crate::error::AphoriaError;
|
use crate::error::AphoriaError;
|
||||||
@ -53,10 +53,25 @@ pub async fn build_corpus(
|
|||||||
corpus_config.include_rfc = only.iter().any(|s| s == "rfc");
|
corpus_config.include_rfc = only.iter().any(|s| s == "rfc");
|
||||||
corpus_config.include_owasp = only.iter().any(|s| s == "owasp");
|
corpus_config.include_owasp = only.iter().any(|s| s == "owasp");
|
||||||
corpus_config.include_vendor = only.iter().any(|s| s == "vendor");
|
corpus_config.include_vendor = only.iter().any(|s| s == "vendor");
|
||||||
|
corpus_config.use_community = only.iter().any(|s| s == "community");
|
||||||
}
|
}
|
||||||
|
|
||||||
// Create registry with configured builders
|
// Open Episteme to get access to stores for community corpus
|
||||||
let registry = CorpusRegistry::with_defaults(&corpus_config);
|
let mut episteme = episteme::LocalEpisteme::open(config, &project_root).await?;
|
||||||
|
|
||||||
|
// Open corpus database for CLI-created items (if configured)
|
||||||
|
let corpus_store = if let Some(ref corpus_data_dir) = config.episteme.corpus_data_dir {
|
||||||
|
let corpus_episteme = episteme::LocalEpisteme::open_corpus_db(corpus_data_dir, &project_root).await?;
|
||||||
|
Some(corpus_episteme.store().clone())
|
||||||
|
} else {
|
||||||
|
None
|
||||||
|
};
|
||||||
|
|
||||||
|
// Create registry with stores (enables community corpus builder and CLI-created items)
|
||||||
|
let kv_store = episteme.store().clone();
|
||||||
|
let predicate_index =
|
||||||
|
std::sync::Arc::new(stemedb_storage::GenericPredicateIndexStore::new(kv_store.clone()));
|
||||||
|
let registry = CorpusRegistry::with_stores(&corpus_config, kv_store, predicate_index, corpus_store);
|
||||||
|
|
||||||
// Load signing key
|
// Load signing key
|
||||||
let signing_key = bridge::load_or_generate_key(&project_root)?;
|
let signing_key = bridge::load_or_generate_key(&project_root)?;
|
||||||
@ -68,12 +83,13 @@ pub async fn build_corpus(
|
|||||||
|
|
||||||
// Ingest into Episteme
|
// Ingest into Episteme
|
||||||
if !result.assertions.is_empty() {
|
if !result.assertions.is_empty() {
|
||||||
let mut episteme = episteme::LocalEpisteme::open(config, &project_root).await?;
|
|
||||||
let ingested = episteme.ingest_authoritative(&result.assertions).await?;
|
let ingested = episteme.ingest_authoritative(&result.assertions).await?;
|
||||||
episteme.shutdown().await;
|
|
||||||
info!(ingested, "Corpus ingested into Episteme");
|
info!(ingested, "Corpus ingested into Episteme");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Shutdown episteme
|
||||||
|
episteme.shutdown().await;
|
||||||
|
|
||||||
Ok(result)
|
Ok(result)
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -149,11 +165,14 @@ pub async fn export_corpus_as_pack(
|
|||||||
Ok(assertion_count)
|
Ok(assertion_count)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Import patterns from wiki documentation and store as pattern aggregates.
|
/// Import patterns from wiki documentation and store in corpus database.
|
||||||
///
|
///
|
||||||
/// This is a bootstrap operation for seeding the community corpus when
|
/// This function:
|
||||||
/// starting fresh. Patterns extracted from wiki docs are stored as
|
/// 1. Parses wiki markdown to extract WikiPatterns
|
||||||
/// pattern aggregates in StemeDB with initial project_count = 1.
|
/// 2. Parses authority strings (RFC, OWASP, CWE) into structured Authority enums
|
||||||
|
/// 3. Builds proper subject URIs (rfc://, owasp://, cwe://, community://wiki/)
|
||||||
|
/// 4. Creates signed assertions with rich metadata
|
||||||
|
/// 5. Stores in corpus database (~/.aphoria/corpus-db/) NOT project database
|
||||||
///
|
///
|
||||||
/// # Arguments
|
/// # Arguments
|
||||||
///
|
///
|
||||||
@ -162,19 +181,50 @@ pub async fn export_corpus_as_pack(
|
|||||||
///
|
///
|
||||||
/// # Returns
|
/// # Returns
|
||||||
///
|
///
|
||||||
/// Number of patterns imported and stored.
|
/// Number of patterns promoted to corpus database.
|
||||||
#[instrument(skip(config), fields(wiki_path = %wiki_path.as_ref().display()))]
|
#[instrument(skip(config), fields(wiki_path = %wiki_path.as_ref().display()))]
|
||||||
pub async fn import_corpus_from_wiki<P: AsRef<Path>>(
|
pub async fn import_corpus_from_wiki<P: AsRef<Path>>(
|
||||||
wiki_path: P,
|
wiki_path: P,
|
||||||
config: &AphoriaConfig,
|
config: &AphoriaConfig,
|
||||||
) -> Result<usize, AphoriaError> {
|
) -> Result<usize, AphoriaError> {
|
||||||
info!("Importing corpus from wiki");
|
use crate::corpus::promote_wiki_patterns_to_corpus;
|
||||||
|
use crate::corpus::WikiParser;
|
||||||
|
|
||||||
|
info!("Importing wiki from: {}", wiki_path.as_ref().display());
|
||||||
|
|
||||||
let project_root = std::env::current_dir()?;
|
let project_root = std::env::current_dir()?;
|
||||||
let timestamp = current_timestamp();
|
|
||||||
|
|
||||||
// Parse wiki files and extract patterns
|
// Parse wiki files and extract WikiPatterns
|
||||||
let patterns = import_from_wiki(wiki_path, timestamp).await?;
|
let parser = WikiParser::new()?;
|
||||||
|
let mut patterns = Vec::new();
|
||||||
|
|
||||||
|
let wiki_path = wiki_path.as_ref();
|
||||||
|
if !wiki_path.exists() {
|
||||||
|
return Err(AphoriaError::Config(format!(
|
||||||
|
"Wiki path does not exist: {}",
|
||||||
|
wiki_path.display()
|
||||||
|
)));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Walk directory for markdown files
|
||||||
|
let walker = ignore::WalkBuilder::new(wiki_path)
|
||||||
|
.follow_links(true)
|
||||||
|
.build();
|
||||||
|
|
||||||
|
for entry in walker.flatten() {
|
||||||
|
if entry.file_type().is_some_and(|ft| ft.is_file()) {
|
||||||
|
let path = entry.path();
|
||||||
|
if let Some(ext) = path.extension() {
|
||||||
|
if ext == "md" {
|
||||||
|
info!("Parsing wiki file: {}", path.display());
|
||||||
|
let content = tokio::fs::read_to_string(path).await?;
|
||||||
|
let file_patterns = parser.parse(&content)?;
|
||||||
|
patterns.extend(file_patterns);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
let pattern_count = patterns.len();
|
let pattern_count = patterns.len();
|
||||||
|
|
||||||
if patterns.is_empty() {
|
if patterns.is_empty() {
|
||||||
@ -182,21 +232,378 @@ pub async fn import_corpus_from_wiki<P: AsRef<Path>>(
|
|||||||
return Ok(0);
|
return Ok(0);
|
||||||
}
|
}
|
||||||
|
|
||||||
info!(pattern_count, "Extracted patterns from wiki");
|
info!(pattern_count, "Parsed {} patterns from wiki", pattern_count);
|
||||||
|
|
||||||
// Open local Episteme to get storage handles
|
// Get corpus_data_dir from config (required)
|
||||||
let mut episteme = episteme::LocalEpisteme::open(config, &project_root).await?;
|
let corpus_data_dir = config
|
||||||
|
.episteme
|
||||||
|
.corpus_data_dir
|
||||||
|
.as_ref()
|
||||||
|
.ok_or_else(|| AphoriaError::Config("corpus_data_dir not configured".into()))?;
|
||||||
|
|
||||||
// Get stores for pattern aggregator
|
// Open corpus database (NOT project database)
|
||||||
let kv_store = episteme.get_kv_store();
|
let mut corpus_episteme =
|
||||||
let predicate_index = episteme.get_predicate_index();
|
episteme::LocalEpisteme::open_corpus_db(corpus_data_dir, &project_root).await?;
|
||||||
|
|
||||||
// Create pattern aggregator and store patterns
|
// Get signing key from corpus episteme
|
||||||
let aggregator = PatternAggregator::new(kv_store, predicate_index);
|
let signing_key = corpus_episteme.signing_key().clone();
|
||||||
aggregator.add_patterns(&patterns).await?;
|
|
||||||
|
|
||||||
episteme.shutdown().await;
|
// Promote wiki patterns to corpus database
|
||||||
|
let promoted = promote_wiki_patterns_to_corpus(
|
||||||
|
patterns,
|
||||||
|
&signing_key,
|
||||||
|
corpus_episteme.get_kv_store(),
|
||||||
|
)
|
||||||
|
.await?;
|
||||||
|
|
||||||
info!(imported = pattern_count, "Wiki patterns imported into corpus");
|
corpus_episteme.shutdown().await;
|
||||||
Ok(pattern_count)
|
|
||||||
|
info!(promoted, "Promoted {} wiki patterns to corpus database", promoted);
|
||||||
|
Ok(promoted)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Create a single corpus item from structured fields.
|
||||||
|
///
|
||||||
|
/// This function is used by the `aphoria corpus create` CLI command and by
|
||||||
|
/// LLM-based extraction skills to programmatically add corpus items.
|
||||||
|
///
|
||||||
|
/// # Arguments
|
||||||
|
///
|
||||||
|
/// * `subject` - Hierarchical subject path (e.g., "ml/dependencies/basicsr/torchvision")
|
||||||
|
/// * `predicate` - Predicate name (e.g., "incompatible_with", "requires")
|
||||||
|
/// * `value` - Value as string (auto-detected as boolean, number, or text)
|
||||||
|
/// * `explanation` - Full context and explanation for this claim
|
||||||
|
/// * `authority` - Authority source (GitHub URL, paper citation, docs URL)
|
||||||
|
/// * `category` - Category (compatibility, performance, security, architecture)
|
||||||
|
/// * `tier` - Authority tier (0=regulatory, 1=clinical, 2=observational, 3=community)
|
||||||
|
/// * `config` - Aphoria configuration
|
||||||
|
///
|
||||||
|
/// # Returns
|
||||||
|
///
|
||||||
|
/// Corpus item ID in format "corpus://{subject}/{predicate}"
|
||||||
|
#[allow(clippy::too_many_arguments)]
|
||||||
|
#[instrument(skip(config), fields(subject = %subject, tier = tier))]
|
||||||
|
pub async fn create_corpus_item(
|
||||||
|
subject: String,
|
||||||
|
predicate: String,
|
||||||
|
value: String,
|
||||||
|
explanation: String,
|
||||||
|
authority: String,
|
||||||
|
category: String,
|
||||||
|
tier: u8,
|
||||||
|
config: &AphoriaConfig,
|
||||||
|
) -> Result<String, AphoriaError> {
|
||||||
|
use crate::episteme::create_authoritative_assertion_with_metadata;
|
||||||
|
use stemedb_core::types::SourceClass;
|
||||||
|
|
||||||
|
// 1. Validate tier (0-3)
|
||||||
|
let source_class = match tier {
|
||||||
|
0 => SourceClass::Regulatory,
|
||||||
|
1 => SourceClass::Clinical,
|
||||||
|
2 => SourceClass::Observational,
|
||||||
|
3 => SourceClass::Community,
|
||||||
|
_ => {
|
||||||
|
return Err(AphoriaError::Config(format!(
|
||||||
|
"Invalid tier: {tier}. Must be 0-3"
|
||||||
|
)))
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
// 2. Parse value into ObjectValue
|
||||||
|
let object_value = parse_value_string(&value)?;
|
||||||
|
|
||||||
|
// 3. Infer URI scheme if not present
|
||||||
|
let subject_uri = infer_subject_uri(&subject, tier, &authority)?;
|
||||||
|
|
||||||
|
// 4. Get project root and signing key
|
||||||
|
let project_root = std::env::current_dir()?;
|
||||||
|
let signing_key = bridge::load_or_generate_key(&project_root)?;
|
||||||
|
|
||||||
|
// 5. Get corpus database path from config
|
||||||
|
let corpus_data_dir = config
|
||||||
|
.episteme
|
||||||
|
.corpus_data_dir
|
||||||
|
.as_ref()
|
||||||
|
.ok_or_else(|| AphoriaError::Config("corpus_data_dir not configured".into()))?;
|
||||||
|
|
||||||
|
// 6. Open corpus database
|
||||||
|
let mut corpus_episteme =
|
||||||
|
episteme::LocalEpisteme::open_corpus_db(corpus_data_dir, &project_root).await?;
|
||||||
|
|
||||||
|
// 7. Build metadata
|
||||||
|
let metadata = serde_json::json!({
|
||||||
|
"description": explanation,
|
||||||
|
"authority_source": authority,
|
||||||
|
"category": category,
|
||||||
|
"source": "cli_create"
|
||||||
|
});
|
||||||
|
|
||||||
|
// 8. Create signed assertion with URI-schemed subject
|
||||||
|
let timestamp = current_timestamp();
|
||||||
|
let assertion = create_authoritative_assertion_with_metadata(
|
||||||
|
&signing_key,
|
||||||
|
&subject_uri,
|
||||||
|
&predicate,
|
||||||
|
object_value,
|
||||||
|
source_class,
|
||||||
|
&explanation,
|
||||||
|
timestamp,
|
||||||
|
metadata,
|
||||||
|
);
|
||||||
|
|
||||||
|
// 9. Serialize and store
|
||||||
|
let serialized = stemedb_core::serde::serialize(&assertion)
|
||||||
|
.map_err(|e| AphoriaError::Storage(format!("Failed to serialize assertion: {e}")))?;
|
||||||
|
|
||||||
|
// Store with subject index (use URI-schemed subject)
|
||||||
|
let subject_key = format!("subject:{}", subject_uri);
|
||||||
|
corpus_episteme
|
||||||
|
.store()
|
||||||
|
.put(subject_key.as_bytes(), &serialized)
|
||||||
|
.await
|
||||||
|
.map_err(|e| AphoriaError::Storage(format!("Failed to store: {e}")))?;
|
||||||
|
|
||||||
|
// Store with predicate index
|
||||||
|
let pred_key = format!("predicate:corpus:{}", predicate);
|
||||||
|
corpus_episteme
|
||||||
|
.store()
|
||||||
|
.put(pred_key.as_bytes(), &serialized)
|
||||||
|
.await
|
||||||
|
.map_err(|e| AphoriaError::Storage(format!("Failed to store predicate index: {e}")))?;
|
||||||
|
|
||||||
|
// 10. Shutdown and return
|
||||||
|
corpus_episteme.shutdown().await;
|
||||||
|
|
||||||
|
info!(subject = %subject_uri, predicate = %predicate, tier = tier, "Created corpus item");
|
||||||
|
Ok(format!("corpus://{}/{}", subject_uri, predicate))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Infer URI scheme from authority and tier.
|
||||||
|
///
|
||||||
|
/// If the subject already has a scheme (contains "://"), return as-is.
|
||||||
|
/// Otherwise, infer scheme based on authority string and tier:
|
||||||
|
/// - RFC authority → rfc://
|
||||||
|
/// - OWASP authority → owasp://
|
||||||
|
/// - CWE authority → cwe://
|
||||||
|
/// - Tier 2 (observational) → vendor://
|
||||||
|
/// - Tier 3 (community) → community://
|
||||||
|
///
|
||||||
|
/// # Examples
|
||||||
|
///
|
||||||
|
/// ```
|
||||||
|
/// assert_eq!(infer_subject_uri("tls/validation", 0, "RFC 5280"), "rfc://tls/validation");
|
||||||
|
/// assert_eq!(infer_subject_uri("xss/prevention", 1, "OWASP Top 10"), "owasp://xss/prevention");
|
||||||
|
/// assert_eq!(infer_subject_uri("rfc://already/schemed", 0, "RFC 9999"), "rfc://already/schemed");
|
||||||
|
/// ```
|
||||||
|
fn infer_subject_uri(subject: &str, tier: u8, authority: &str) -> Result<String, AphoriaError> {
|
||||||
|
// If already has scheme, return as-is
|
||||||
|
if subject.contains("://") {
|
||||||
|
return Ok(subject.to_string());
|
||||||
|
}
|
||||||
|
|
||||||
|
// Infer scheme from authority and tier (case-insensitive matching)
|
||||||
|
let authority_lower = authority.to_lowercase();
|
||||||
|
let scheme = if authority_lower.contains("rfc") {
|
||||||
|
"rfc"
|
||||||
|
} else if authority_lower.contains("owasp") {
|
||||||
|
"owasp"
|
||||||
|
} else if authority_lower.contains("cwe") {
|
||||||
|
"cwe"
|
||||||
|
} else if tier == 2 {
|
||||||
|
"vendor"
|
||||||
|
} else if tier == 3 {
|
||||||
|
"community"
|
||||||
|
} else {
|
||||||
|
// For tier 0 or 1 without recognized authority, use "corpus" as fallback
|
||||||
|
"corpus"
|
||||||
|
};
|
||||||
|
|
||||||
|
Ok(format!("{}://{}", scheme, subject))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Parse value string into ObjectValue.
|
||||||
|
///
|
||||||
|
/// Attempts to parse as boolean, then number, then defaults to text.
|
||||||
|
fn parse_value_string(value: &str) -> Result<stemedb_core::types::ObjectValue, AphoriaError> {
|
||||||
|
use stemedb_core::types::ObjectValue;
|
||||||
|
// Try boolean
|
||||||
|
if value.eq_ignore_ascii_case("true") {
|
||||||
|
return Ok(ObjectValue::Boolean(true));
|
||||||
|
}
|
||||||
|
if value.eq_ignore_ascii_case("false") {
|
||||||
|
return Ok(ObjectValue::Boolean(false));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Try number
|
||||||
|
if let Ok(n) = value.parse::<f64>() {
|
||||||
|
return Ok(ObjectValue::Number(n));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Default to text
|
||||||
|
Ok(ObjectValue::Text(value.to_string()))
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_subject_uri_rfc_authority() {
|
||||||
|
// RFC authority should infer rfc:// scheme (case-insensitive)
|
||||||
|
let result = infer_subject_uri("tls/validation", 0, "RFC 5280").unwrap();
|
||||||
|
assert_eq!(result, "rfc://tls/validation");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("tls/cipher_suites", 1, "rfc 8446").unwrap();
|
||||||
|
assert_eq!(result, "rfc://tls/cipher_suites");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("http/headers", 2, "Rfc 7231").unwrap();
|
||||||
|
assert_eq!(result, "rfc://http/headers");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_subject_uri_owasp_authority() {
|
||||||
|
// OWASP authority should infer owasp:// scheme (case-insensitive)
|
||||||
|
let result = infer_subject_uri("xss/prevention", 0, "OWASP Top 10").unwrap();
|
||||||
|
assert_eq!(result, "owasp://xss/prevention");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("csrf/token", 1, "owasp cheat sheet").unwrap();
|
||||||
|
assert_eq!(result, "owasp://csrf/token");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("injection/sql", 2, "Owasp Guide").unwrap();
|
||||||
|
assert_eq!(result, "owasp://injection/sql");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_subject_uri_cwe_authority() {
|
||||||
|
// CWE authority should infer cwe:// scheme (case-insensitive)
|
||||||
|
let result = infer_subject_uri("buffer/overflow", 0, "CWE-120").unwrap();
|
||||||
|
assert_eq!(result, "cwe://buffer/overflow");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("path/traversal", 1, "cwe-22").unwrap();
|
||||||
|
assert_eq!(result, "cwe://path/traversal");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("injection/command", 2, "Cwe-78").unwrap();
|
||||||
|
assert_eq!(result, "cwe://injection/command");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_subject_uri_vendor_tier() {
|
||||||
|
// Tier 2 (observational) should infer vendor:// scheme
|
||||||
|
let result = infer_subject_uri("ml/dependencies", 2, "GitHub Issue #123").unwrap();
|
||||||
|
assert_eq!(result, "vendor://ml/dependencies");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("api/rate_limit", 2, "Vendor Documentation").unwrap();
|
||||||
|
assert_eq!(result, "vendor://api/rate_limit");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_subject_uri_community_tier() {
|
||||||
|
// Tier 3 (community) should infer community:// scheme
|
||||||
|
let result = infer_subject_uri("best_practices/logging", 3, "Team Wiki").unwrap();
|
||||||
|
assert_eq!(result, "community://best_practices/logging");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("patterns/error_handling", 3, "Internal Docs").unwrap();
|
||||||
|
assert_eq!(result, "community://patterns/error_handling");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_subject_uri_corpus_fallback() {
|
||||||
|
// Tier 0 or 1 without recognized authority should use corpus:// fallback
|
||||||
|
let result = infer_subject_uri("custom/subject", 0, "Unknown Authority").unwrap();
|
||||||
|
assert_eq!(result, "corpus://custom/subject");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("another/subject", 1, "Some Other Source").unwrap();
|
||||||
|
assert_eq!(result, "corpus://another/subject");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_subject_uri_already_schemed() {
|
||||||
|
// Subjects with existing schemes should be returned as-is
|
||||||
|
let result = infer_subject_uri("rfc://already/schemed", 0, "RFC 9999").unwrap();
|
||||||
|
assert_eq!(result, "rfc://already/schemed");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("owasp://already/schemed", 1, "OWASP").unwrap();
|
||||||
|
assert_eq!(result, "owasp://already/schemed");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("custom://some/path", 2, "Vendor").unwrap();
|
||||||
|
assert_eq!(result, "custom://some/path");
|
||||||
|
|
||||||
|
let result = infer_subject_uri("http://example.com/path", 3, "Community").unwrap();
|
||||||
|
assert_eq!(result, "http://example.com/path");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_infer_subject_uri_authority_priority() {
|
||||||
|
// Authority string takes priority over tier for scheme inference
|
||||||
|
let result = infer_subject_uri("test/subject", 3, "RFC 1234").unwrap();
|
||||||
|
assert_eq!(result, "rfc://test/subject"); // RFC wins over tier 3
|
||||||
|
|
||||||
|
let result = infer_subject_uri("test/subject", 2, "OWASP Guide").unwrap();
|
||||||
|
assert_eq!(result, "owasp://test/subject"); // OWASP wins over tier 2
|
||||||
|
|
||||||
|
let result = infer_subject_uri("test/subject", 3, "CWE-999").unwrap();
|
||||||
|
assert_eq!(result, "cwe://test/subject"); // CWE wins over tier 3
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_value_string_boolean() {
|
||||||
|
use stemedb_core::types::ObjectValue;
|
||||||
|
|
||||||
|
// Test boolean parsing (case-insensitive)
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("true").unwrap(),
|
||||||
|
ObjectValue::Boolean(true)
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("TRUE").unwrap(),
|
||||||
|
ObjectValue::Boolean(true)
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("false").unwrap(),
|
||||||
|
ObjectValue::Boolean(false)
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("False").unwrap(),
|
||||||
|
ObjectValue::Boolean(false)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_value_string_number() {
|
||||||
|
use stemedb_core::types::ObjectValue;
|
||||||
|
|
||||||
|
// Test number parsing
|
||||||
|
assert_eq!(parse_value_string("42").unwrap(), ObjectValue::Number(42.0));
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("3.14").unwrap(),
|
||||||
|
ObjectValue::Number(3.14)
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("-100").unwrap(),
|
||||||
|
ObjectValue::Number(-100.0)
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("0.0").unwrap(),
|
||||||
|
ObjectValue::Number(0.0)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_parse_value_string_text() {
|
||||||
|
use stemedb_core::types::ObjectValue;
|
||||||
|
|
||||||
|
// Test text parsing (fallback for non-boolean, non-number)
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("hello world").unwrap(),
|
||||||
|
ObjectValue::Text("hello world".to_string())
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("not_a_bool").unwrap(),
|
||||||
|
ObjectValue::Text("not_a_bool".to_string())
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
parse_value_string("1.2.3").unwrap(),
|
||||||
|
ObjectValue::Text("1.2.3".to_string())
|
||||||
|
);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@ -42,6 +42,96 @@ pub struct LocalEpisteme {
|
|||||||
}
|
}
|
||||||
|
|
||||||
impl LocalEpisteme {
|
impl LocalEpisteme {
|
||||||
|
/// Open corpus database (shared across projects).
|
||||||
|
///
|
||||||
|
/// This opens a separate database for corpus assertions (RFC, OWASP, etc.)
|
||||||
|
/// stored in `~/.aphoria/corpus-db/` instead of the project-local database.
|
||||||
|
#[instrument(fields(corpus_data_dir = ?corpus_data_dir))]
|
||||||
|
pub async fn open_corpus_db(corpus_data_dir: &Path, project_root: &Path) -> Result<Self, AphoriaError> {
|
||||||
|
// Expand tilde if present
|
||||||
|
let corpus_path = if let Some(path_str) = corpus_data_dir.to_str() {
|
||||||
|
if path_str.starts_with('~') {
|
||||||
|
let expanded = shellexpand::tilde(path_str).into_owned();
|
||||||
|
PathBuf::from(expanded)
|
||||||
|
} else {
|
||||||
|
corpus_data_dir.to_path_buf()
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
corpus_data_dir.to_path_buf()
|
||||||
|
};
|
||||||
|
|
||||||
|
// Create directory if it doesn't exist
|
||||||
|
tokio::fs::create_dir_all(&corpus_path).await
|
||||||
|
.map_err(AphoriaError::Io)?;
|
||||||
|
|
||||||
|
// Canonicalize (required by fjall/lsm-tree)
|
||||||
|
let corpus_path = corpus_path.canonicalize().map_err(|e| {
|
||||||
|
AphoriaError::Storage(format!("Failed to canonicalize corpus_data_dir: {}", e))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let wal_dir = corpus_path.join("wal");
|
||||||
|
std::fs::create_dir_all(&wal_dir)?;
|
||||||
|
|
||||||
|
info!("Opening corpus database at {}", corpus_path.display());
|
||||||
|
|
||||||
|
// Open WAL
|
||||||
|
let journal = Arc::new(Mutex::new(Journal::open(&wal_dir).map_err(|e| {
|
||||||
|
AphoriaError::Storage(format!("Failed to open corpus WAL at {}: {e}", wal_dir.display()))
|
||||||
|
})?));
|
||||||
|
|
||||||
|
// Open store (directly at corpus_path, matching API behavior)
|
||||||
|
let store = Arc::new(HybridStore::open(&corpus_path).map_err(|e| {
|
||||||
|
AphoriaError::Storage(format!("Failed to open corpus store at {}: {e}", corpus_path.display()))
|
||||||
|
})?);
|
||||||
|
|
||||||
|
// Create ingestor
|
||||||
|
let mut ingestor = Ingestor::new(journal.clone(), store.clone())
|
||||||
|
.await
|
||||||
|
.map_err(|e| AphoriaError::Storage(format!("Failed to create corpus ingestor: {e}")))?;
|
||||||
|
ingestor.start();
|
||||||
|
|
||||||
|
// Load or generate signing key (from project root)
|
||||||
|
let signing_key = load_or_generate_key(project_root).map_err(|e| {
|
||||||
|
AphoriaError::Storage(format!(
|
||||||
|
"Failed to load/generate signing key at {}: {e}",
|
||||||
|
project_root.display()
|
||||||
|
))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
// Create stores
|
||||||
|
let alias_store = GenericAliasStore::new(store.clone());
|
||||||
|
let predicate_index_store = GenericPredicateIndexStore::new(store.clone());
|
||||||
|
let pack_source_store = GenericPackSourceStore::new(store.clone());
|
||||||
|
let predicate_alias_store = GenericPredicateAliasStore::new(store.clone());
|
||||||
|
|
||||||
|
// Load predicate aliases
|
||||||
|
let stored_aliases = predicate_alias_store
|
||||||
|
.list_all_predicate_aliases()
|
||||||
|
.await
|
||||||
|
.map_err(|e| AphoriaError::Storage(format!("Failed to load corpus predicate aliases: {e}")))?;
|
||||||
|
let predicate_aliases: Vec<PredicateAliasSet> = stored_aliases
|
||||||
|
.into_iter()
|
||||||
|
.map(|s| PredicateAliasSet::new(s.canonical, s.aliases))
|
||||||
|
.collect();
|
||||||
|
|
||||||
|
if !predicate_aliases.is_empty() {
|
||||||
|
info!(count = predicate_aliases.len(), "Loaded predicate aliases from corpus storage");
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(Self {
|
||||||
|
journal,
|
||||||
|
store,
|
||||||
|
ingestor,
|
||||||
|
signing_key,
|
||||||
|
alias_store,
|
||||||
|
predicate_index_store,
|
||||||
|
pack_source_store,
|
||||||
|
predicate_alias_store,
|
||||||
|
predicate_aliases,
|
||||||
|
project_root: project_root.to_path_buf(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
/// Open or create a local Episteme instance.
|
/// Open or create a local Episteme instance.
|
||||||
#[instrument(skip(config), fields(data_dir = %config.episteme.data_dir.display()))]
|
#[instrument(skip(config), fields(data_dir = %config.episteme.data_dir.display()))]
|
||||||
pub async fn open(config: &AphoriaConfig, project_root: &Path) -> Result<Self, AphoriaError> {
|
pub async fn open(config: &AphoriaConfig, project_root: &Path) -> Result<Self, AphoriaError> {
|
||||||
@ -143,6 +233,11 @@ impl LocalEpisteme {
|
|||||||
self.signing_key.verifying_key().to_bytes()
|
self.signing_key.verifying_key().to_bytes()
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Get a reference to the signing key for creating assertions.
|
||||||
|
pub fn signing_key(&self) -> &SigningKey {
|
||||||
|
&self.signing_key
|
||||||
|
}
|
||||||
|
|
||||||
/// Get a reference to the alias store for querying created aliases.
|
/// Get a reference to the alias store for querying created aliases.
|
||||||
#[allow(dead_code)]
|
#[allow(dead_code)]
|
||||||
pub fn alias_store(&self) -> &GenericAliasStore<Arc<HybridStore>> {
|
pub fn alias_store(&self) -> &GenericAliasStore<Arc<HybridStore>> {
|
||||||
@ -169,7 +264,10 @@ impl LocalEpisteme {
|
|||||||
// Create registry with all builders including community (if enabled)
|
// Create registry with all builders including community (if enabled)
|
||||||
// Note: GenericPredicateIndexStore doesn't implement Clone, so we create a new one
|
// Note: GenericPredicateIndexStore doesn't implement Clone, so we create a new one
|
||||||
let predicate_index = Arc::new(GenericPredicateIndexStore::new(self.store.clone()));
|
let predicate_index = Arc::new(GenericPredicateIndexStore::new(self.store.clone()));
|
||||||
let registry = CorpusRegistry::with_stores(config, self.store.clone(), predicate_index);
|
|
||||||
|
// No corpus_store here - CLI-created items are only needed in explicit corpus builds,
|
||||||
|
// not during scans (which use project-local episteme)
|
||||||
|
let registry = CorpusRegistry::with_stores(config, self.store.clone(), predicate_index, None);
|
||||||
|
|
||||||
let timestamp = current_timestamp();
|
let timestamp = current_timestamp();
|
||||||
|
|
||||||
|
|||||||
@ -88,5 +88,37 @@ pub async fn handle_corpus_command(command: CorpusCommands, config: &AphoriaConf
|
|||||||
}
|
}
|
||||||
ExitCode::SUCCESS
|
ExitCode::SUCCESS
|
||||||
}
|
}
|
||||||
|
|
||||||
|
CorpusCommands::Create {
|
||||||
|
subject,
|
||||||
|
predicate,
|
||||||
|
value,
|
||||||
|
explanation,
|
||||||
|
authority,
|
||||||
|
category,
|
||||||
|
tier,
|
||||||
|
} => {
|
||||||
|
match aphoria::create_corpus_item(
|
||||||
|
subject,
|
||||||
|
predicate,
|
||||||
|
value,
|
||||||
|
explanation,
|
||||||
|
authority,
|
||||||
|
category,
|
||||||
|
tier,
|
||||||
|
config,
|
||||||
|
)
|
||||||
|
.await
|
||||||
|
{
|
||||||
|
Ok(corpus_id) => {
|
||||||
|
println!("Created corpus item: {}", corpus_id);
|
||||||
|
ExitCode::SUCCESS
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
eprintln!("Error creating corpus item: {e}");
|
||||||
|
ExitCode::from(3)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@ -107,8 +107,8 @@ pub use config::{
|
|||||||
};
|
};
|
||||||
pub use corpus::{CorpusBuildResult, CorpusBuilderInfo, CorpusRegistry};
|
pub use corpus::{CorpusBuildResult, CorpusBuilderInfo, CorpusRegistry};
|
||||||
pub use corpus_build::{
|
pub use corpus_build::{
|
||||||
build_corpus, export_corpus_as_pack, import_corpus_from_wiki, list_corpus_sources,
|
build_corpus, create_corpus_item, export_corpus_as_pack, import_corpus_from_wiki,
|
||||||
CorpusBuildArgs,
|
list_corpus_sources, CorpusBuildArgs,
|
||||||
};
|
};
|
||||||
pub use coverage::{
|
pub use coverage::{
|
||||||
compute_coverage, compute_coverage_from_report, format_coverage_json, format_coverage_markdown,
|
compute_coverage, compute_coverage_from_report, format_coverage_json, format_coverage_markdown,
|
||||||
|
|||||||
140
applications/aphoria/tests/scale_adaptive_test.rs
Normal file
140
applications/aphoria/tests/scale_adaptive_test.rs
Normal file
@ -0,0 +1,140 @@
|
|||||||
|
//! Integration tests for scale-adaptive promotion thresholds.
|
||||||
|
//!
|
||||||
|
//! Verifies that promotion criteria automatically adjust based on organization size,
|
||||||
|
//! enabling small teams to see value immediately while maintaining quality gates
|
||||||
|
//! for larger organizations.
|
||||||
|
|
||||||
|
use aphoria::corpus::thresholds::{PromotionDecision, ScaleAdaptiveThresholds, ScaleTier};
|
||||||
|
use stemedb_core::types::SourceClass;
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_micro_team_sees_patterns() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// Micro team with 3 projects, pattern appears in 2
|
||||||
|
let decision = thresholds.evaluate(
|
||||||
|
2, // project_count
|
||||||
|
3, // total_projects
|
||||||
|
false, // no authority
|
||||||
|
None,
|
||||||
|
);
|
||||||
|
|
||||||
|
// With adaptive thresholds:
|
||||||
|
// - Scale tier: Micro (1-5 projects)
|
||||||
|
// - Emerging min_projects: max(2, 0.50*3) = max(2, 1.5) = 2
|
||||||
|
// - Adoption rate: 2/3 = 67% >= 50%
|
||||||
|
// Should require review (emerging tier)
|
||||||
|
assert_eq!(decision, PromotionDecision::RequireReview);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_micro_team_regulatory_disabled() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// Micro team with 5 projects, pattern appears in all 5 with RFC match
|
||||||
|
let decision = thresholds.evaluate(
|
||||||
|
5, // project_count
|
||||||
|
5, // total_projects
|
||||||
|
true, // has authority
|
||||||
|
Some("rfc://1234"), // RFC scheme
|
||||||
|
);
|
||||||
|
|
||||||
|
// Regulatory tier is disabled for micro teams
|
||||||
|
// Should fall through to emerging tier
|
||||||
|
assert_eq!(decision, PromotionDecision::RequireReview);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_small_team_enables_all_tiers() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// Small team with 10 projects, pattern in 9 with RFC match
|
||||||
|
let decision = thresholds.evaluate(
|
||||||
|
9, // project_count
|
||||||
|
10, // total_projects
|
||||||
|
true, // has authority
|
||||||
|
Some("rfc://5246"), // RFC scheme
|
||||||
|
);
|
||||||
|
|
||||||
|
// Small tier regulatory: max(5, 0.90*10) = max(5, 9) = 9
|
||||||
|
// Adoption rate: 9/10 = 90% >= 90%
|
||||||
|
// Should auto-promote to regulatory
|
||||||
|
assert_eq!(
|
||||||
|
decision,
|
||||||
|
PromotionDecision::AutoPromote(SourceClass::Regulatory)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_enterprise_maintains_strict_thresholds() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// Enterprise with 1000 projects, pattern in 950 with RFC match
|
||||||
|
let decision = thresholds.evaluate(
|
||||||
|
950, // project_count
|
||||||
|
1000, // total_projects
|
||||||
|
true, // has authority
|
||||||
|
Some("rfc://9110"), // RFC scheme
|
||||||
|
);
|
||||||
|
|
||||||
|
// Enterprise tier: max(100, 0.95*1000) = max(100, 950) = 950
|
||||||
|
// Adoption rate: 950/1000 = 95% >= 95%
|
||||||
|
// Should auto-promote to regulatory (backward compatible behavior)
|
||||||
|
assert_eq!(
|
||||||
|
decision,
|
||||||
|
PromotionDecision::AutoPromote(SourceClass::Regulatory)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_scale_tier_progression() {
|
||||||
|
// Verify scale tier boundaries
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(1), ScaleTier::Micro);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(5), ScaleTier::Micro);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(6), ScaleTier::Small);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(25), ScaleTier::Small);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(26), ScaleTier::Medium);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(100), ScaleTier::Medium);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(101), ScaleTier::Large);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(500), ScaleTier::Large);
|
||||||
|
assert_eq!(ScaleTier::from_total_projects(501), ScaleTier::Enterprise);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_adaptive_floor_prevents_noise() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// Micro team with 3 projects, pattern appears in only 1
|
||||||
|
let decision = thresholds.evaluate(
|
||||||
|
1, // project_count
|
||||||
|
3, // total_projects
|
||||||
|
false, // no authority
|
||||||
|
None,
|
||||||
|
);
|
||||||
|
|
||||||
|
// Even though 1/3 = 33% meets percentage (50% of 3 = 1.5),
|
||||||
|
// the floor of 2 prevents single-project noise
|
||||||
|
// Adoption rate: 1/3 = 33% < 50%
|
||||||
|
assert_eq!(decision, PromotionDecision::Skip);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_medium_team_clinical_tier() {
|
||||||
|
let thresholds = ScaleAdaptiveThresholds::default();
|
||||||
|
|
||||||
|
// Medium team with 50 projects, pattern in 38 with OWASP match
|
||||||
|
let decision = thresholds.evaluate(
|
||||||
|
38, // project_count
|
||||||
|
50, // total_projects
|
||||||
|
true, // has authority
|
||||||
|
Some("owasp://top-10/a01"), // OWASP scheme
|
||||||
|
);
|
||||||
|
|
||||||
|
// Medium tier clinical: max(10, 0.75*50) = max(10, 37.5) = 38
|
||||||
|
// Adoption rate: 38/50 = 76% >= 75%
|
||||||
|
// Should auto-promote to clinical
|
||||||
|
assert_eq!(
|
||||||
|
decision,
|
||||||
|
PromotionDecision::AutoPromote(SourceClass::Clinical)
|
||||||
|
);
|
||||||
|
}
|
||||||
@ -26,6 +26,7 @@ axum = { version = "0.7", features = ["json"] }
|
|||||||
tokio = { version = "1", features = ["full"] }
|
tokio = { version = "1", features = ["full"] }
|
||||||
serde = { version = "1", features = ["derive"] }
|
serde = { version = "1", features = ["derive"] }
|
||||||
serde_json = "1"
|
serde_json = "1"
|
||||||
|
serde_qs = "0.13"
|
||||||
utoipa = { version = "5", features = ["axum_extras"] }
|
utoipa = { version = "5", features = ["axum_extras"] }
|
||||||
utoipa-axum = "0.1"
|
utoipa-axum = "0.1"
|
||||||
utoipa-swagger-ui = { version = "8", features = ["axum"] }
|
utoipa-swagger-ui = { version = "8", features = ["axum"] }
|
||||||
|
|||||||
@ -303,3 +303,31 @@ pub struct AcknowledgeViolationRequest {
|
|||||||
#[serde(skip_serializing_if = "Option::is_none")]
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
pub expires_at: Option<String>,
|
pub expires_at: Option<String>,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Corpus Endpoint DTOs
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
/// Request to get corpus items from authoritative sources.
|
||||||
|
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
|
||||||
|
pub struct GetCorpusRequest {
|
||||||
|
/// Filter by source schemes (e.g., ["rfc", "owasp", "community"]).
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub sources: Option<Vec<String>>,
|
||||||
|
|
||||||
|
/// Filter by category (e.g., "security", "architecture").
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub category: Option<String>,
|
||||||
|
|
||||||
|
/// Maximum number of items to return (default: 100).
|
||||||
|
#[serde(default = "default_corpus_limit")]
|
||||||
|
pub limit: usize,
|
||||||
|
|
||||||
|
/// Pagination offset (default: 0).
|
||||||
|
#[serde(default)]
|
||||||
|
pub offset: usize,
|
||||||
|
}
|
||||||
|
|
||||||
|
fn default_corpus_limit() -> usize {
|
||||||
|
100
|
||||||
|
}
|
||||||
|
|||||||
@ -270,3 +270,22 @@ pub struct AcknowledgeViolationResponse {
|
|||||||
/// Status message.
|
/// Status message.
|
||||||
pub message: String,
|
pub message: String,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Corpus Endpoint DTOs
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
use super::types::CorpusItemDto;
|
||||||
|
|
||||||
|
/// Response containing corpus items from authoritative sources.
|
||||||
|
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
|
||||||
|
pub struct GetCorpusResponse {
|
||||||
|
/// The corpus items matching the query.
|
||||||
|
pub items: Vec<CorpusItemDto>,
|
||||||
|
|
||||||
|
/// Total number of items matching (before limit applied).
|
||||||
|
pub total_matching: usize,
|
||||||
|
|
||||||
|
/// Sources included in this response.
|
||||||
|
pub sources_included: Vec<String>,
|
||||||
|
}
|
||||||
|
|||||||
@ -490,3 +490,39 @@ pub struct CoverageSummaryDto {
|
|||||||
/// Number of modules with zero claims.
|
/// Number of modules with zero claims.
|
||||||
pub modules_without_claims: usize,
|
pub modules_without_claims: usize,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Corpus Types
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
/// A single corpus item (authoritative assertion from RFC/OWASP/Community).
|
||||||
|
///
|
||||||
|
/// Unlike PatternDto (which shows statistical aggregates), CorpusItemDto
|
||||||
|
/// represents valuable best practices from trusted sources.
|
||||||
|
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
|
||||||
|
pub struct CorpusItemDto {
|
||||||
|
/// The subject path (e.g., "rfc://9110/methods/GET", "owasp://a03/tls/version").
|
||||||
|
pub subject: String,
|
||||||
|
|
||||||
|
/// The predicate (e.g., "case_sensitive", "min_version").
|
||||||
|
pub predicate: String,
|
||||||
|
|
||||||
|
/// Display value (e.g., "true", "TLS 1.2").
|
||||||
|
pub value: String,
|
||||||
|
|
||||||
|
/// Source identifier (e.g., "rfc://9110", "owasp://a03", "community://pattern/xyz").
|
||||||
|
pub source: String,
|
||||||
|
|
||||||
|
/// Authority tier (0-4: Regulatory=0, RFC/OWASP=0, Expert=3, Community=4).
|
||||||
|
pub tier: u8,
|
||||||
|
|
||||||
|
/// Optional category (e.g., "security", "architecture", "performance").
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub category: Option<String>,
|
||||||
|
|
||||||
|
/// Human-readable explanation of the best practice.
|
||||||
|
pub explanation: String,
|
||||||
|
|
||||||
|
/// Authority source citation (e.g., "RFC 9110 Section 9.1", "OWASP A03:2021").
|
||||||
|
pub authority_source: String,
|
||||||
|
}
|
||||||
|
|||||||
187
crates/stemedb-api/src/extractors.rs
Normal file
187
crates/stemedb-api/src/extractors.rs
Normal file
@ -0,0 +1,187 @@
|
|||||||
|
//! Custom axum extractors for the StemeDB API.
|
||||||
|
|
||||||
|
use axum::{
|
||||||
|
async_trait,
|
||||||
|
extract::FromRequestParts,
|
||||||
|
http::{request::Parts, StatusCode},
|
||||||
|
response::{IntoResponse, Response},
|
||||||
|
};
|
||||||
|
use serde::de::DeserializeOwned;
|
||||||
|
use std::fmt;
|
||||||
|
|
||||||
|
/// Rejection type for QsQuery extraction failures.
|
||||||
|
#[derive(Debug)]
|
||||||
|
pub struct QsQueryRejection {
|
||||||
|
message: String,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl fmt::Display for QsQueryRejection {
|
||||||
|
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||||
|
write!(f, "Failed to deserialize query string: {}", self.message)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl std::error::Error for QsQueryRejection {}
|
||||||
|
|
||||||
|
impl IntoResponse for QsQueryRejection {
|
||||||
|
fn into_response(self) -> Response {
|
||||||
|
(StatusCode::BAD_REQUEST, self.message).into_response()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Query string extractor that supports bracket notation (e.g., `?sources[]=value1&sources[]=value2`).
|
||||||
|
///
|
||||||
|
/// This extractor uses `serde_qs` instead of `serde_urlencoded` to properly handle
|
||||||
|
/// array parameters with bracket notation, which is the standard format used by
|
||||||
|
/// JavaScript's URLSearchParams and the StemeDB Dashboard.
|
||||||
|
///
|
||||||
|
/// # When to Use QsQuery vs Query
|
||||||
|
///
|
||||||
|
/// **Use `QsQuery` when:**
|
||||||
|
/// - Your request DTO contains `Vec<T>` or `Option<Vec<T>>` fields
|
||||||
|
/// - The endpoint is called by the dashboard or JavaScript clients
|
||||||
|
/// - You need bracket notation support: `?filters[]=a&filters[]=b`
|
||||||
|
///
|
||||||
|
/// **Use standard `axum::extract::Query` when:**
|
||||||
|
/// - All query parameters are scalars (String, usize, bool, Option<String>, etc.)
|
||||||
|
/// - No array/vector parameters needed
|
||||||
|
/// - Simpler and lighter weight for non-array cases
|
||||||
|
///
|
||||||
|
/// # Example
|
||||||
|
///
|
||||||
|
/// ```rust,ignore
|
||||||
|
/// use stemedb_api::extractors::QsQuery;
|
||||||
|
/// use serde::Deserialize;
|
||||||
|
///
|
||||||
|
/// #[derive(Deserialize)]
|
||||||
|
/// struct MyRequest {
|
||||||
|
/// sources: Option<Vec<String>>, // Array parameter
|
||||||
|
/// limit: usize, // Scalar parameter
|
||||||
|
/// }
|
||||||
|
///
|
||||||
|
/// // ✅ Correct - QsQuery handles both array and scalar params
|
||||||
|
/// async fn handler(QsQuery(params): QsQuery<MyRequest>) {
|
||||||
|
/// // Dashboard sends: ?sources[]=rfc&sources[]=community&limit=10
|
||||||
|
/// // params.sources = Some(vec!["rfc", "community"])
|
||||||
|
/// // params.limit = 10
|
||||||
|
/// }
|
||||||
|
///
|
||||||
|
/// // ❌ Wrong - standard Query can't parse bracket notation
|
||||||
|
/// async fn wrong_handler(Query(params): Query<MyRequest>) {
|
||||||
|
/// // Dashboard sends: ?sources[]=rfc&sources[]=community
|
||||||
|
/// // Result: params.sources = None (silently fails!)
|
||||||
|
/// }
|
||||||
|
/// ```
|
||||||
|
///
|
||||||
|
/// # Dashboard Compatibility
|
||||||
|
///
|
||||||
|
/// The StemeDB Dashboard uses JavaScript's `URLSearchParams.append()` which generates
|
||||||
|
/// bracket notation for arrays:
|
||||||
|
///
|
||||||
|
/// ```javascript
|
||||||
|
/// // Dashboard code
|
||||||
|
/// params.sources.forEach(s => searchParams.append("sources[]", s));
|
||||||
|
/// // Generates: ?sources[]=rfc&sources[]=owasp&sources[]=community
|
||||||
|
/// ```
|
||||||
|
///
|
||||||
|
/// If you use standard `Query` for array parameters, the dashboard filters will appear
|
||||||
|
/// to work but silently fail (returning all results instead of filtered results).
|
||||||
|
#[derive(Debug, Clone, Copy, Default)]
|
||||||
|
pub struct QsQuery<T>(pub T);
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl<T, S> FromRequestParts<S> for QsQuery<T>
|
||||||
|
where
|
||||||
|
T: DeserializeOwned,
|
||||||
|
S: Send + Sync,
|
||||||
|
{
|
||||||
|
type Rejection = QsQueryRejection;
|
||||||
|
|
||||||
|
async fn from_request_parts(parts: &mut Parts, _state: &S) -> Result<Self, Self::Rejection> {
|
||||||
|
let query = parts.uri.query().unwrap_or_default();
|
||||||
|
let value = serde_qs::from_str(query).map_err(|err| QsQueryRejection {
|
||||||
|
message: err.to_string(),
|
||||||
|
})?;
|
||||||
|
Ok(QsQuery(value))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T> std::ops::Deref for QsQuery<T> {
|
||||||
|
type Target = T;
|
||||||
|
|
||||||
|
fn deref(&self) -> &Self::Target {
|
||||||
|
&self.0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T> std::ops::DerefMut for QsQuery<T> {
|
||||||
|
fn deref_mut(&mut self) -> &mut Self::Target {
|
||||||
|
&mut self.0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
use axum::http::{Request, Uri};
|
||||||
|
use serde::Deserialize;
|
||||||
|
|
||||||
|
#[derive(Debug, Deserialize, PartialEq)]
|
||||||
|
struct TestParams {
|
||||||
|
sources: Option<Vec<String>>,
|
||||||
|
limit: Option<usize>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_bracket_notation() {
|
||||||
|
let uri: Uri = "http://example.com?sources[]=rfc&sources[]=community&limit=10"
|
||||||
|
.parse()
|
||||||
|
.unwrap();
|
||||||
|
let mut parts = Request::builder().uri(uri).body(()).unwrap().into_parts().0;
|
||||||
|
|
||||||
|
let QsQuery(params): QsQuery<TestParams> =
|
||||||
|
QsQuery::from_request_parts(&mut parts, &()).await.unwrap();
|
||||||
|
|
||||||
|
assert_eq!(
|
||||||
|
params,
|
||||||
|
TestParams {
|
||||||
|
sources: Some(vec!["rfc".to_string(), "community".to_string()]),
|
||||||
|
limit: Some(10),
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_no_brackets() {
|
||||||
|
let uri: Uri = "http://example.com?limit=5".parse().unwrap();
|
||||||
|
let mut parts = Request::builder().uri(uri).body(()).unwrap().into_parts().0;
|
||||||
|
|
||||||
|
let QsQuery(params): QsQuery<TestParams> =
|
||||||
|
QsQuery::from_request_parts(&mut parts, &()).await.unwrap();
|
||||||
|
|
||||||
|
assert_eq!(
|
||||||
|
params,
|
||||||
|
TestParams {
|
||||||
|
sources: None,
|
||||||
|
limit: Some(5),
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_empty_query() {
|
||||||
|
let uri: Uri = "http://example.com".parse().unwrap();
|
||||||
|
let mut parts = Request::builder().uri(uri).body(()).unwrap().into_parts().0;
|
||||||
|
|
||||||
|
let QsQuery(params): QsQuery<TestParams> =
|
||||||
|
QsQuery::from_request_parts(&mut parts, &()).await.unwrap();
|
||||||
|
|
||||||
|
assert_eq!(
|
||||||
|
params,
|
||||||
|
TestParams {
|
||||||
|
sources: None,
|
||||||
|
limit: None,
|
||||||
|
}
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
182
crates/stemedb-api/src/handlers/aphoria/corpus.rs
Normal file
182
crates/stemedb-api/src/handlers/aphoria/corpus.rs
Normal file
@ -0,0 +1,182 @@
|
|||||||
|
//! Corpus query handler for Aphoria.
|
||||||
|
//!
|
||||||
|
//! This endpoint returns authoritative assertions from RFC, OWASP, and Community
|
||||||
|
//! corpus sources - valuable best practices rather than statistical aggregates.
|
||||||
|
|
||||||
|
use axum::{extract::State, Json};
|
||||||
|
use stemedb_core::types::{ObjectValue, SourceClass};
|
||||||
|
use stemedb_storage::KVStore;
|
||||||
|
use tracing::instrument;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
dto::aphoria::{CorpusItemDto, GetCorpusRequest, GetCorpusResponse},
|
||||||
|
error::{ApiError, Result},
|
||||||
|
extractors::QsQuery,
|
||||||
|
state::AppState,
|
||||||
|
};
|
||||||
|
|
||||||
|
/// Get corpus items from authoritative sources (RFC, OWASP, vendor, community patterns, and CLI-created items).
|
||||||
|
///
|
||||||
|
/// Unlike the `/patterns` endpoint (which returns statistical aggregates),
|
||||||
|
/// this endpoint returns valuable, curated best practices from trusted sources.
|
||||||
|
#[utoipa::path(
|
||||||
|
get,
|
||||||
|
path = "/v1/aphoria/corpus",
|
||||||
|
params(
|
||||||
|
("sources" = Option<Vec<String>>, Query, description = "Filter by source schemes (rfc, owasp, community, vendor)"),
|
||||||
|
("category" = Option<String>, Query, description = "Filter by category (security, architecture, etc.)"),
|
||||||
|
("limit" = usize, Query, description = "Maximum items to return (default: 100)"),
|
||||||
|
("offset" = usize, Query, description = "Pagination offset (default: 0)"),
|
||||||
|
),
|
||||||
|
responses(
|
||||||
|
(status = 200, description = "Corpus items retrieved successfully", body = GetCorpusResponse),
|
||||||
|
(status = 400, description = "Invalid request", body = crate::dto::ErrorResponse),
|
||||||
|
(status = 500, description = "Internal server error", body = crate::dto::ErrorResponse),
|
||||||
|
),
|
||||||
|
tag = "aphoria"
|
||||||
|
)]
|
||||||
|
#[instrument(skip_all, fields(sources = ?params.sources, limit = params.limit, offset = params.offset))]
|
||||||
|
pub async fn get_corpus(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
QsQuery(params): QsQuery<GetCorpusRequest>,
|
||||||
|
) -> Result<Json<GetCorpusResponse>> {
|
||||||
|
// Determine which source prefixes to query
|
||||||
|
let source_prefixes = if let Some(sources) = ¶ms.sources {
|
||||||
|
sources
|
||||||
|
.iter()
|
||||||
|
.map(|s| match s.as_str() {
|
||||||
|
"rfc" => "rfc://",
|
||||||
|
"owasp" => "owasp://",
|
||||||
|
"community" => "community://",
|
||||||
|
"vendor" => "vendor://",
|
||||||
|
_ => s.as_str(),
|
||||||
|
})
|
||||||
|
.collect::<Vec<_>>()
|
||||||
|
} else {
|
||||||
|
// Default: query all authoritative sources
|
||||||
|
vec!["rfc://", "owasp://", "community://", "vendor://"]
|
||||||
|
};
|
||||||
|
|
||||||
|
let mut all_items = Vec::new();
|
||||||
|
let mut sources_included = std::collections::HashSet::new();
|
||||||
|
|
||||||
|
// Query each source prefix
|
||||||
|
for prefix in source_prefixes {
|
||||||
|
let prefix_key = format!("subject:{}", prefix);
|
||||||
|
let pairs = state
|
||||||
|
.corpus_store
|
||||||
|
.scan_prefix(prefix_key.as_bytes())
|
||||||
|
.await
|
||||||
|
.map_err(|e| ApiError::Internal(format!("Failed to scan corpus: {}", e)))?;
|
||||||
|
|
||||||
|
for (_key, value) in pairs {
|
||||||
|
// Deserialize assertion
|
||||||
|
let assertion: stemedb_core::types::Assertion =
|
||||||
|
stemedb_core::serde::deserialize(&value)
|
||||||
|
.map_err(|e| ApiError::Internal(format!("Failed to deserialize assertion: {}", e)))?;
|
||||||
|
|
||||||
|
// Extract metadata
|
||||||
|
let metadata: Option<serde_json::Value> = assertion
|
||||||
|
.source_metadata
|
||||||
|
.as_ref()
|
||||||
|
.and_then(|bytes| serde_json::from_slice(bytes).ok());
|
||||||
|
|
||||||
|
let explanation = metadata
|
||||||
|
.as_ref()
|
||||||
|
.and_then(|m| m.get("description"))
|
||||||
|
.and_then(|v| v.as_str())
|
||||||
|
.unwrap_or("No description")
|
||||||
|
.to_string();
|
||||||
|
|
||||||
|
let category = metadata
|
||||||
|
.as_ref()
|
||||||
|
.and_then(|m| m.get("category"))
|
||||||
|
.and_then(|v| v.as_str())
|
||||||
|
.map(|s| s.to_string());
|
||||||
|
|
||||||
|
let authority_source = metadata
|
||||||
|
.as_ref()
|
||||||
|
.and_then(|m| m.get("authority_source"))
|
||||||
|
.and_then(|v| v.as_str())
|
||||||
|
.or_else(|| {
|
||||||
|
// Fallback: extract from subject
|
||||||
|
if assertion.subject.starts_with("rfc://") {
|
||||||
|
Some("RFC")
|
||||||
|
} else if assertion.subject.starts_with("owasp://") {
|
||||||
|
Some("OWASP")
|
||||||
|
} else if assertion.subject.starts_with("community://") {
|
||||||
|
Some("Community")
|
||||||
|
} else if assertion.subject.starts_with("vendor://") {
|
||||||
|
Some("Vendor")
|
||||||
|
} else {
|
||||||
|
Some("Unknown")
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.unwrap_or("Unknown")
|
||||||
|
.to_string();
|
||||||
|
|
||||||
|
// Filter by category if requested
|
||||||
|
if let Some(ref filter_category) = params.category {
|
||||||
|
if category.as_deref() != Some(filter_category.as_str()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Extract source scheme
|
||||||
|
let source = if let Some(pos) = assertion.subject.find("://") {
|
||||||
|
let scheme_end = assertion.subject[..pos].to_string();
|
||||||
|
format!("{}://", scheme_end)
|
||||||
|
} else {
|
||||||
|
assertion.subject.clone()
|
||||||
|
};
|
||||||
|
|
||||||
|
sources_included.insert(source.clone());
|
||||||
|
|
||||||
|
// Convert object to display value
|
||||||
|
let value = match &assertion.object {
|
||||||
|
ObjectValue::Boolean(b) => b.to_string(),
|
||||||
|
ObjectValue::Number(n) => n.to_string(),
|
||||||
|
ObjectValue::Text(s) => s.clone(),
|
||||||
|
ObjectValue::Reference(r) => r.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
// Map SourceClass to tier number
|
||||||
|
let tier = match assertion.source_class {
|
||||||
|
SourceClass::Regulatory => 0,
|
||||||
|
SourceClass::Clinical => 1,
|
||||||
|
SourceClass::Observational => 2,
|
||||||
|
SourceClass::Expert => 3,
|
||||||
|
SourceClass::Community => 4,
|
||||||
|
SourceClass::Anecdotal => 5,
|
||||||
|
SourceClass::TeamPolicy => 1, // Treat team policy similar to clinical
|
||||||
|
};
|
||||||
|
|
||||||
|
all_items.push(CorpusItemDto {
|
||||||
|
subject: assertion.subject,
|
||||||
|
predicate: assertion.predicate,
|
||||||
|
value,
|
||||||
|
source,
|
||||||
|
tier,
|
||||||
|
category,
|
||||||
|
explanation,
|
||||||
|
authority_source,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Apply pagination
|
||||||
|
let total_matching = all_items.len();
|
||||||
|
let items: Vec<CorpusItemDto> =
|
||||||
|
all_items.into_iter().skip(params.offset).take(params.limit).collect();
|
||||||
|
|
||||||
|
let sources_included: Vec<String> = sources_included.into_iter().collect();
|
||||||
|
|
||||||
|
tracing::info!(
|
||||||
|
total_matching,
|
||||||
|
returned = items.len(),
|
||||||
|
sources = sources_included.len(),
|
||||||
|
"Corpus query complete"
|
||||||
|
);
|
||||||
|
|
||||||
|
Ok(Json(GetCorpusResponse { items, total_matching, sources_included }))
|
||||||
|
}
|
||||||
@ -5,9 +5,11 @@
|
|||||||
//! - `policy` - Trust pack import/export and blessing handlers
|
//! - `policy` - Trust pack import/export and blessing handlers
|
||||||
//! - `scan` - Project scanning handlers
|
//! - `scan` - Project scanning handlers
|
||||||
//! - `report` - Observation reporting and pattern query handlers
|
//! - `report` - Observation reporting and pattern query handlers
|
||||||
|
//! - `corpus` - Authoritative corpus query handlers
|
||||||
|
|
||||||
// Make submodules crate-visible so utoipa path structs can be accessed
|
// Make submodules crate-visible so utoipa path structs can be accessed
|
||||||
pub(crate) mod claims;
|
pub(crate) mod claims;
|
||||||
|
pub(crate) mod corpus;
|
||||||
pub(crate) mod policy;
|
pub(crate) mod policy;
|
||||||
pub(crate) mod report;
|
pub(crate) mod report;
|
||||||
pub(crate) mod scan;
|
pub(crate) mod scan;
|
||||||
@ -17,6 +19,7 @@ pub use claims::{
|
|||||||
acknowledge_violation, coverage, create_claim, deprecate_claim, list_claims, update_claim,
|
acknowledge_violation, coverage, create_claim, deprecate_claim, list_claims, update_claim,
|
||||||
verify_claims_handler,
|
verify_claims_handler,
|
||||||
};
|
};
|
||||||
|
pub use corpus::get_corpus;
|
||||||
pub use policy::{bless, export_policy, import_policy};
|
pub use policy::{bless, export_policy, import_policy};
|
||||||
pub use report::{get_patterns, push_community_observations, push_observations};
|
pub use report::{get_patterns, push_community_observations, push_observations};
|
||||||
pub use scan::{list_scans, scan};
|
pub use scan::{list_scans, scan};
|
||||||
|
|||||||
@ -78,6 +78,6 @@ pub use metrics::metrics_handler;
|
|||||||
#[cfg(feature = "aphoria")]
|
#[cfg(feature = "aphoria")]
|
||||||
pub use aphoria::{
|
pub use aphoria::{
|
||||||
acknowledge_violation, bless, coverage, create_claim, deprecate_claim, export_policy,
|
acknowledge_violation, bless, coverage, create_claim, deprecate_claim, export_policy,
|
||||||
get_patterns, import_policy, list_claims, list_scans, push_community_observations,
|
get_corpus, get_patterns, import_policy, list_claims, list_scans, push_community_observations,
|
||||||
push_observations, scan, update_claim, verify_claims_handler,
|
push_observations, scan, update_claim, verify_claims_handler,
|
||||||
};
|
};
|
||||||
|
|||||||
@ -204,7 +204,7 @@ mod tests {
|
|||||||
let store =
|
let store =
|
||||||
std::sync::Arc::new(HybridStore::open(&store_path).expect("failed to open store"));
|
std::sync::Arc::new(HybridStore::open(&store_path).expect("failed to open store"));
|
||||||
|
|
||||||
let state = AppState::new(write_journal, read_journal, store);
|
let state = AppState::new(write_journal, read_journal, store, None);
|
||||||
|
|
||||||
let app = axum::Router::new()
|
let app = axum::Router::new()
|
||||||
.route("/v1/source", axum::routing::post(store_source))
|
.route("/v1/source", axum::routing::post(store_source))
|
||||||
|
|||||||
@ -41,7 +41,7 @@ async fn test_app() -> TestContext {
|
|||||||
let read_journal = Journal::open(&wal_path).expect("failed to open read journal");
|
let read_journal = Journal::open(&wal_path).expect("failed to open read journal");
|
||||||
let store = std::sync::Arc::new(HybridStore::open(&store_path).expect("failed to open store"));
|
let store = std::sync::Arc::new(HybridStore::open(&store_path).expect("failed to open store"));
|
||||||
|
|
||||||
let state = AppState::new(write_journal, read_journal, store);
|
let state = AppState::new(write_journal, read_journal, store, None);
|
||||||
|
|
||||||
let app = Router::new()
|
let app = Router::new()
|
||||||
.route("/v1/sources", post(register_source))
|
.route("/v1/sources", post(register_source))
|
||||||
|
|||||||
@ -23,7 +23,7 @@
|
|||||||
//! ```ignore
|
//! ```ignore
|
||||||
//! use stemedb_api::{create_router, AppState};
|
//! use stemedb_api::{create_router, AppState};
|
||||||
//!
|
//!
|
||||||
//! let state = AppState::new(write_journal, read_journal, store);
|
//! let state = AppState::new(write_journal, read_journal, store, None);
|
||||||
//! let app = create_router(state);
|
//! let app = create_router(state);
|
||||||
//!
|
//!
|
||||||
//! axum::Server::bind(&addr).serve(app.into_make_service()).await?;
|
//! axum::Server::bind(&addr).serve(app.into_make_service()).await?;
|
||||||
@ -32,6 +32,7 @@
|
|||||||
pub mod bootstrap;
|
pub mod bootstrap;
|
||||||
pub mod dto;
|
pub mod dto;
|
||||||
pub mod error;
|
pub mod error;
|
||||||
|
pub mod extractors;
|
||||||
pub mod handlers;
|
pub mod handlers;
|
||||||
pub mod hex;
|
pub mod hex;
|
||||||
pub mod middleware;
|
pub mod middleware;
|
||||||
@ -312,6 +313,7 @@ mod aphoria_openapi {
|
|||||||
use super::*;
|
use super::*;
|
||||||
|
|
||||||
// Re-export the path items for OpenAPI from the submodules
|
// Re-export the path items for OpenAPI from the submodules
|
||||||
|
use handlers::aphoria::corpus::__path_get_corpus;
|
||||||
use handlers::aphoria::policy::{__path_bless, __path_export_policy, __path_import_policy};
|
use handlers::aphoria::policy::{__path_bless, __path_export_policy, __path_import_policy};
|
||||||
use handlers::aphoria::report::__path_push_observations;
|
use handlers::aphoria::report::__path_push_observations;
|
||||||
use handlers::aphoria::scan::__path_scan;
|
use handlers::aphoria::scan::__path_scan;
|
||||||
@ -324,6 +326,7 @@ mod aphoria_openapi {
|
|||||||
import_policy,
|
import_policy,
|
||||||
scan,
|
scan,
|
||||||
push_observations,
|
push_observations,
|
||||||
|
get_corpus,
|
||||||
),
|
),
|
||||||
components(
|
components(
|
||||||
schemas(
|
schemas(
|
||||||
@ -346,6 +349,9 @@ mod aphoria_openapi {
|
|||||||
dto::aphoria::ObservationDto,
|
dto::aphoria::ObservationDto,
|
||||||
dto::aphoria::ObservationValueDto,
|
dto::aphoria::ObservationValueDto,
|
||||||
dto::aphoria::ObservationSignatureDto,
|
dto::aphoria::ObservationSignatureDto,
|
||||||
|
dto::aphoria::GetCorpusRequest,
|
||||||
|
dto::aphoria::GetCorpusResponse,
|
||||||
|
dto::aphoria::CorpusItemDto,
|
||||||
)
|
)
|
||||||
),
|
),
|
||||||
tags(
|
tags(
|
||||||
|
|||||||
@ -15,6 +15,7 @@
|
|||||||
//! | `STEMEDB_DB_DIR` | `data/db` | Directory for KV store |
|
//! | `STEMEDB_DB_DIR` | `data/db` | Directory for KV store |
|
||||||
//! | `STEMEDB_BIND_ADDR` | `127.0.0.1:18180` | HTTP server bind address |
|
//! | `STEMEDB_BIND_ADDR` | `127.0.0.1:18180` | HTTP server bind address |
|
||||||
//! | `STEMEDB_METER_ENABLED` | `true` | Enable economic throttling |
|
//! | `STEMEDB_METER_ENABLED` | `true` | Enable economic throttling |
|
||||||
|
//! | `STEMEDB_CORPUS_DB_DIR` | (none) | Optional: Directory for Aphoria corpus DB |
|
||||||
|
|
||||||
use std::path::PathBuf;
|
use std::path::PathBuf;
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
@ -42,6 +43,9 @@ struct Config {
|
|||||||
|
|
||||||
/// Enable economic throttling (The Meter)
|
/// Enable economic throttling (The Meter)
|
||||||
meter_enabled: bool,
|
meter_enabled: bool,
|
||||||
|
|
||||||
|
/// Optional corpus database directory (for Aphoria corpus)
|
||||||
|
corpus_db_dir: Option<PathBuf>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Default for Config {
|
impl Default for Config {
|
||||||
@ -51,6 +55,7 @@ impl Default for Config {
|
|||||||
db_dir: PathBuf::from("data/db"),
|
db_dir: PathBuf::from("data/db"),
|
||||||
bind_addr: "127.0.0.1:18180".to_string(),
|
bind_addr: "127.0.0.1:18180".to_string(),
|
||||||
meter_enabled: true,
|
meter_enabled: true,
|
||||||
|
corpus_db_dir: None,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -76,6 +81,10 @@ impl Config {
|
|||||||
config.meter_enabled = meter_enabled.to_lowercase() != "false" && meter_enabled != "0";
|
config.meter_enabled = meter_enabled.to_lowercase() != "false" && meter_enabled != "0";
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if let Ok(corpus_db_dir) = std::env::var("STEMEDB_CORPUS_DB_DIR") {
|
||||||
|
config.corpus_db_dir = Some(PathBuf::from(corpus_db_dir));
|
||||||
|
}
|
||||||
|
|
||||||
config
|
config
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -117,8 +126,19 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
|
|||||||
info!("Opening HybridStore at {:?}", config.db_dir);
|
info!("Opening HybridStore at {:?}", config.db_dir);
|
||||||
let store = Arc::new(HybridStore::open(&config.db_dir)?);
|
let store = Arc::new(HybridStore::open(&config.db_dir)?);
|
||||||
|
|
||||||
|
// Open optional corpus store (for Aphoria corpus)
|
||||||
|
let corpus_store = if let Some(ref corpus_dir) = config.corpus_db_dir {
|
||||||
|
// Ensure corpus directory exists
|
||||||
|
std::fs::create_dir_all(corpus_dir)?;
|
||||||
|
info!("Opening corpus HybridStore at {:?}", corpus_dir);
|
||||||
|
Some(Arc::new(HybridStore::open(corpus_dir)?))
|
||||||
|
} else {
|
||||||
|
info!("No separate corpus DB configured, using main store for corpus queries");
|
||||||
|
None
|
||||||
|
};
|
||||||
|
|
||||||
// Create application state (initializes GroupCommitBuffer)
|
// Create application state (initializes GroupCommitBuffer)
|
||||||
let state = AppState::new(write_journal, read_journal, Arc::clone(&store));
|
let state = AppState::new(write_journal, read_journal, Arc::clone(&store), corpus_store);
|
||||||
|
|
||||||
// Spawn IngestWorker background task (uses read journal)
|
// Spawn IngestWorker background task (uses read journal)
|
||||||
info!("Spawning IngestWorker background task");
|
info!("Spawning IngestWorker background task");
|
||||||
|
|||||||
@ -387,6 +387,7 @@ fn build_api_routes() -> Router<AppState> {
|
|||||||
post(handlers::push_community_observations),
|
post(handlers::push_community_observations),
|
||||||
)
|
)
|
||||||
.route("/v1/aphoria/patterns", get(handlers::get_patterns))
|
.route("/v1/aphoria/patterns", get(handlers::get_patterns))
|
||||||
|
.route("/v1/aphoria/corpus", get(handlers::get_corpus))
|
||||||
// Claims management endpoints
|
// Claims management endpoints
|
||||||
.route("/v1/aphoria/claims/list", post(handlers::list_claims))
|
.route("/v1/aphoria/claims/list", post(handlers::list_claims))
|
||||||
.route("/v1/aphoria/claims/create", post(handlers::create_claim))
|
.route("/v1/aphoria/claims/create", post(handlers::create_claim))
|
||||||
|
|||||||
@ -53,6 +53,10 @@ pub struct AppState {
|
|||||||
/// Key-value store for reading assertions
|
/// Key-value store for reading assertions
|
||||||
pub store: Arc<HybridStore>,
|
pub store: Arc<HybridStore>,
|
||||||
|
|
||||||
|
/// Corpus store for Aphoria authoritative sources (RFC, OWASP, Community).
|
||||||
|
/// Falls back to main store if not configured separately.
|
||||||
|
pub corpus_store: Arc<HybridStore>,
|
||||||
|
|
||||||
/// Quota store for economic throttling (The Meter)
|
/// Quota store for economic throttling (The Meter)
|
||||||
pub quota_store: Arc<QuotaStoreImpl>,
|
pub quota_store: Arc<QuotaStoreImpl>,
|
||||||
|
|
||||||
@ -97,7 +101,14 @@ impl AppState {
|
|||||||
///
|
///
|
||||||
/// Creates a shared notification channel that GroupCommitBuffer uses
|
/// Creates a shared notification channel that GroupCommitBuffer uses
|
||||||
/// to signal IngestWorker when new data is flushed.
|
/// to signal IngestWorker when new data is flushed.
|
||||||
pub fn new(write_journal: Journal, read_journal: Journal, store: Arc<HybridStore>) -> Self {
|
///
|
||||||
|
/// If `corpus_store` is None, the main `store` will be used for corpus queries.
|
||||||
|
pub fn new(
|
||||||
|
write_journal: Journal,
|
||||||
|
read_journal: Journal,
|
||||||
|
store: Arc<HybridStore>,
|
||||||
|
corpus_store: Option<Arc<HybridStore>>,
|
||||||
|
) -> Self {
|
||||||
// Create shared notification channel for WAL flush -> IngestWorker signaling
|
// Create shared notification channel for WAL flush -> IngestWorker signaling
|
||||||
let flush_notify = Arc::new(Notify::new());
|
let flush_notify = Arc::new(Notify::new());
|
||||||
|
|
||||||
@ -108,6 +119,9 @@ impl AppState {
|
|||||||
|
|
||||||
let journal = Arc::new(Mutex::new(read_journal));
|
let journal = Arc::new(Mutex::new(read_journal));
|
||||||
|
|
||||||
|
// Use provided corpus_store or fall back to main store
|
||||||
|
let corpus_store = corpus_store.unwrap_or_else(|| Arc::clone(&store));
|
||||||
|
|
||||||
// Create quota store backed by the same KV store
|
// Create quota store backed by the same KV store
|
||||||
let quota_store = Arc::new(GenericQuotaStore::new(Arc::clone(&store)));
|
let quota_store = Arc::new(GenericQuotaStore::new(Arc::clone(&store)));
|
||||||
|
|
||||||
@ -139,6 +153,7 @@ impl AppState {
|
|||||||
commit_buffer,
|
commit_buffer,
|
||||||
journal,
|
journal,
|
||||||
store,
|
store,
|
||||||
|
corpus_store,
|
||||||
quota_store,
|
quota_store,
|
||||||
escalation_store,
|
escalation_store,
|
||||||
alias_store,
|
alias_store,
|
||||||
|
|||||||
@ -39,7 +39,7 @@ pub async fn create_test_env() -> TestEnvironment {
|
|||||||
let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
|
let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
|
||||||
let store = Arc::new(HybridStore::open(&db_dir).expect("failed to open store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("failed to open store"));
|
||||||
|
|
||||||
let state = AppState::new(write_journal, read_journal, store);
|
let state = AppState::new(write_journal, read_journal, store, None);
|
||||||
|
|
||||||
TestEnvironment { _temp_dir: temp_dir, state }
|
TestEnvironment { _temp_dir: temp_dir, state }
|
||||||
}
|
}
|
||||||
@ -70,7 +70,7 @@ pub async fn create_test_env_with_ingestor() -> TestEnvironmentWithIngestor {
|
|||||||
// Create AppState with write and read journals
|
// Create AppState with write and read journals
|
||||||
let write_journal = Journal::open(&wal_dir).expect("failed to open write journal");
|
let write_journal = Journal::open(&wal_dir).expect("failed to open write journal");
|
||||||
let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
|
let read_journal = Journal::open(&wal_dir).expect("failed to open read journal");
|
||||||
let state = AppState::new(write_journal, read_journal, store);
|
let state = AppState::new(write_journal, read_journal, store, None);
|
||||||
|
|
||||||
TestEnvironmentWithIngestor { _temp_dir: temp_dir, state, ingestor }
|
TestEnvironmentWithIngestor { _temp_dir: temp_dir, state, ingestor }
|
||||||
}
|
}
|
||||||
|
|||||||
@ -65,7 +65,7 @@ async fn create_test_environment() -> TestEnvironment {
|
|||||||
Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
|
Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
|
||||||
let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
|
let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
|
||||||
let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
|
let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
|
||||||
let state = stemedb_api::AppState::new(write_journal, read_journal, Arc::clone(&store_arc));
|
let state = stemedb_api::AppState::new(write_journal, read_journal, Arc::clone(&store_arc), None);
|
||||||
|
|
||||||
TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
|
TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
|
||||||
}
|
}
|
||||||
|
|||||||
@ -53,7 +53,7 @@ async fn create_test_environment() -> TestEnvironment {
|
|||||||
Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
|
Arc::new(Mutex::new(Journal::open(&wal_dir).expect("Failed to open journal for ingest")));
|
||||||
let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
|
let write_journal = Journal::open(&wal_dir).expect("Failed to open write journal");
|
||||||
let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
|
let read_journal = Journal::open(&wal_dir).expect("Failed to open read journal");
|
||||||
let state = AppState::new(write_journal, read_journal, Arc::clone(&store_arc));
|
let state = AppState::new(write_journal, read_journal, Arc::clone(&store_arc), None);
|
||||||
|
|
||||||
TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
|
TestEnvironment { _temp_dir: temp_dir, state, store: store_arc, journal: journal_arc }
|
||||||
}
|
}
|
||||||
|
|||||||
@ -202,7 +202,7 @@ async fn test_quota_consumption_with_meter() {
|
|||||||
let read_journal = Journal::open(&wal_dir).expect("read journal");
|
let read_journal = Journal::open(&wal_dir).expect("read journal");
|
||||||
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
|
||||||
|
|
||||||
let state = AppState::new(write_journal, read_journal, store.clone());
|
let state = AppState::new(write_journal, read_journal, store.clone(), None);
|
||||||
let quota_store = state.quota_store.clone();
|
let quota_store = state.quota_store.clone();
|
||||||
|
|
||||||
let app = create_router_with_meter(state);
|
let app = create_router_with_meter(state);
|
||||||
@ -258,7 +258,7 @@ async fn test_quota_exceeded_response() {
|
|||||||
let read_journal = Journal::open(&wal_dir).expect("read journal");
|
let read_journal = Journal::open(&wal_dir).expect("read journal");
|
||||||
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
|
||||||
|
|
||||||
let state = AppState::new(write_journal, read_journal, store.clone());
|
let state = AppState::new(write_journal, read_journal, store.clone(), None);
|
||||||
let quota_store = state.quota_store.clone();
|
let quota_store = state.quota_store.clone();
|
||||||
|
|
||||||
let app = create_router_with_meter(state);
|
let app = create_router_with_meter(state);
|
||||||
@ -304,7 +304,7 @@ async fn test_quota_headers_format() {
|
|||||||
let read_journal = Journal::open(&wal_dir).expect("read journal");
|
let read_journal = Journal::open(&wal_dir).expect("read journal");
|
||||||
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
|
let store = Arc::new(HybridStore::open(&db_dir).expect("store"));
|
||||||
|
|
||||||
let state = AppState::new(write_journal, read_journal, store.clone());
|
let state = AppState::new(write_journal, read_journal, store.clone(), None);
|
||||||
let quota_store = state.quota_store.clone();
|
let quota_store = state.quota_store.clone();
|
||||||
|
|
||||||
let app = create_router_with_meter(state);
|
let app = create_router_with_meter(state);
|
||||||
|
|||||||
Loading…
Reference in New Issue
Block a user