---
created: 2026-02-08
last_updated: 2026-02-08
status: Planning Document
feature: Phase 2-3 - LLM-Assisted Document Ingestion
timeline: 4 weeks estimated
---

# Ingest Best Practices Documentation - Executable Policy

## Problem Statement

**Current Reality:**
1. Teams write extensive architecture/security/style guides (50+ pages)
2. Developers are expected to read and remember all guidelines
3. Compliance is checked manually in code review
4. Guidelines drift out of sync with code over time
5. New team members miss context from old documents

**What Users Want:**
- Write documentation once (Markdown, PDF, Confluence)
- Have Aphoria automatically enforce the guidelines
- Get real-time feedback during development
- Maintain compliance without manual review

## Vision: Documentation That Enforces Itself

### Example: Hexagonal Architecture Guide

**Traditional Flow:**
```
1. Architect writes: "HTTP handlers MUST be in adapters/http/"
2. Developer reads guide (hopefully)
3. Developer writes code in wrong location
4. Code reviewer catches it (maybe)
5. Fix during review (wasted time)
```

**With Aphoria Ingestion:**
```
1. Architect writes: "HTTP handlers MUST be in adapters/http/"
2. Run: aphoria ingest-guide architecture.md
3. Developer writes code in wrong location
4. aphoria scan immediately shows:
   ❌ File location violation
      Expected: adapters/http/*_handler.go
      Found: adapters/handlers/user.go
      Fix: Move to adapters/http/user_handler.go
5. Developer fixes before commit (no review cycles wasted)
```

## User Experience

### 1. Ingest Phase

```bash
$ aphoria ingest-guide docs/architecture/hexagonal.md \
    --authority-tier team_policy \
    --category architecture \
    --dry-run

Analyzing: docs/architecture/hexagonal.md (15 KB, 342 lines)

📊 Extraction Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Section                  Claims   Severity
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Directory Structure      8        MUST
Dependency Rules         6        MUST_NOT
Naming Conventions       5        MUST
Interface Definitions    4        SHOULD
Testing Strategy         3        SHOULD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total                    26 claims extracted

🔍 Preview of Extracted Claims:

1. [MUST] HTTP handlers in adapters/http/ directory
   Subject: code://go/*/adapters/http/**
   Predicate: directory_pattern
   Value: *_handler.go
   Source: hexagonal.md:45-47

2. [MUST_NOT] Core domain imports infrastructure
   Subject: code://go/*/core/domain/**
   Predicate: imports_forbidden
   Value: infrastructure/*
   Source: hexagonal.md:62-64

3. [MUST] Handler files end with _handler.go
   Subject: code://go/*/adapters/http/*.go
   Predicate: filename_pattern
   Value: *_handler.go
   Source: hexagonal.md:89-91

... (23 more)

Would add 26 claims to authoritative corpus.
Estimated scan coverage: ~65% of codebase

Proceed with ingestion? [y/N]
```

### 2. Compliance Checking

```bash
$ aphoria scan --check-policy hexagonal-arch

📋 Policy Compliance Report: Hexagonal Architecture
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Directory Structure (94% compliant)
   ✓ 45 files in correct locations
   ❌ 3 violations:
      • adapters/handlers/user.go → should be adapters/http/user_handler.go
      • adapters/db/user_repo.go → should be adapters/persistence/user_repo.go
      • domain/user_service.go → should be core/domain/user_service.go

✅ Dependency Rules (100% compliant)
   ✓ No forbidden imports detected
   ✓ Core domain is clean of infrastructure dependencies

⚠️ Naming Conventions (80% compliant)
   ✓ 35 files follow naming conventions
   ❌ 9 violations:
      • adapters/http/user.go → should be user_handler.go
      • adapters/http/order.go → should be order_handler.go
      ... (7 more)

✅ Interface Definitions (90% compliant)
   ✓ 18 interfaces properly named
   ⚠️ 2 warnings:
      • PostgresUserRepository → consider UserStore (behavior-based naming)
      • MySQLOrderRepository → consider OrderStore (behavior-based naming)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Overall Compliance: 91% (237 of 260 checks passed)

📝 Recommendations:
1. Run: aphoria fix --policy hexagonal-arch --auto-safe
   This will automatically fix 8 file location issues

2. Manually review 2 interface naming suggestions

3. Update hexagonal.md if any rules need revision

Last policy update: hexagonal.md (modified 3 days ago)
```

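The percentages in a report like this are pass ratios. As a minimal sketch of how the overall figure could be derived (the function name `compliance_percent` and the round-to-nearest rule are assumptions for illustration, not Aphoria's actual implementation):

```rust
/// Share of passed checks as a whole percentage, rounded to the nearest integer.
/// Returns `None` when no checks ran or the inputs are inconsistent, so callers
/// can tell "no data" apart from "0% compliant".
fn compliance_percent(passed: u64, total: u64) -> Option<u64> {
    if total == 0 || passed > total {
        return None;
    }
    // Integer round-to-nearest: add half the divisor before dividing.
    Some((passed * 100 + total / 2) / total)
}
```

With the sample report's totals, `compliance_percent(237, 260)` yields `Some(91)`, which matches the "Overall Compliance" line.
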
### 3. Real-Time Feedback

```bash
$ git add adapters/handlers/user.go
$ git commit

⚠️ Pre-commit hook: Aphoria policy check

❌ Policy violations detected (hexagonal-arch):

adapters/handlers/user.go:
  ❌ File location violation
     Expected: adapters/http/*_handler.go
     Found: adapters/handlers/user.go
     Rule: HTTP handlers must be in adapters/http/
     Source: hexagonal.md:45 (team policy)

  Suggested fix:
    git mv adapters/handlers/user.go adapters/http/user_handler.go

Commit blocked. Fix violations or use --no-verify to skip.
```

## Data Model

### Ingested Claim Structure

```rust
pub struct IngestedClaim {
    /// Unique claim ID
    pub id: String,

    /// Subject pattern (supports wildcards)
    pub subject_pattern: String,

    /// Predicate
    pub predicate: String,

    /// Expected value
    pub value: ClaimValue,

    /// Comparison mode
    pub comparison: ComparisonMode, // MUST, MUST_NOT, SHOULD, MAY

    /// Category
    pub category: String, // "architecture" | "security" | "style"

    /// Explanation (from doc)
    pub explanation: String,

    /// Authority tier
    pub authority_tier: AuthorityTier, // TeamPolicy (Tier 2.5)

    /// Source document tracking
    pub source: DocumentSource,

    /// When ingested
    pub ingested_at: u64,
}

pub struct DocumentSource {
    /// Path to source document
    pub file_path: String,

    /// Line numbers in source
    pub line_start: u32,
    pub line_end: u32,

    /// Section heading
    pub section: String, // "Directory Structure Rules"

    /// Document version (hash)
    pub document_hash: String,
}

pub enum ComparisonMode {
    Must,    // Value MUST match
    MustNot, // Value MUST NOT match
    Should,  // Warning if doesn't match
    May,     // Informational only
}
```

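The `subject_pattern` field is described as supporting wildcards, but the matching rules are not spelled out. The sketch below shows one plausible reading, assuming `*` matches within a single `/`-separated segment and `**` matches zero or more whole segments. Both the semantics and the function names (`subject_matches`, `segments_match`, `segment_matches`) are assumptions for illustration, not the planned implementation.

```rust
/// Matches a single path segment against a pattern segment in which `*`
/// stands for any run of characters (possibly empty).
fn segment_matches(pattern: &str, segment: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = segment.chars().collect();
    let (mut pi, mut si) = (0usize, 0usize);
    // Position of the most recent `*` and the segment index it was tried at.
    let mut star: Option<(usize, usize)> = None;

    while si < s.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, si));
            pi += 1;
        } else if pi < p.len() && p[pi] == s[si] {
            pi += 1;
            si += 1;
        } else if let Some((star_pi, star_si)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = star_pi + 1;
            si = star_si + 1;
            star = Some((star_pi, star_si + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*` in the pattern can match the empty string.
    p[pi..].iter().all(|&c| c == '*')
}

/// Matches segment lists; `**` may consume zero or more whole segments.
fn segments_match(pattern: &[&str], path: &[&str]) -> bool {
    match pattern.split_first() {
        None => path.is_empty(),
        Some((head, rest)) if *head == "**" => {
            (0..=path.len()).any(|skip| segments_match(rest, &path[skip..]))
        }
        Some((head, rest)) => match path.split_first() {
            Some((seg, tail)) => segment_matches(head, seg) && segments_match(rest, tail),
            None => false,
        },
    }
}

/// Returns true when `subject` matches `pattern` under the assumed semantics.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let p: Vec<&str> = pattern.split('/').collect();
    let s: Vec<&str> = subject.split('/').collect();
    segments_match(&p, &s)
}
```

Under these assumptions, the pattern `code://go/*/adapters/http/**` matches a subject such as `code://go/myapp/adapters/http/user_handler.go` (the project name `myapp` is hypothetical), while `code://go/*/adapters/http/*.go` does not match files in nested subdirectories.
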
### Authority Tier Hierarchy

```
Tier 0: System (StemeDB internals, not user-facing)
Tier 1: Regulatory (RFCs, legal requirements)
Tier 2: Clinical (OWASP, NIST, industry standards)
Tier 2.5: TeamPolicy ← NEW (team-specific guidelines)
Tier 3: Expert (recognized authorities, vetted claims)
Tier 4: Community (project-specific observations)
```

TeamPolicy tier:
- Higher authority than community observations
- Lower authority than industry standards (OWASP)
- Can override community patterns
- Cannot override RFCs or security standards
- Scoped to project/team

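One way to encode this ordering is to declare the tiers from highest to lowest authority and derive `Ord`, which avoids representing the fractional "2.5" as a number. This is a sketch only: the variant names follow the hierarchy above, while the `outranks` helper is a hypothetical name introduced for illustration.

```rust
/// Authority tiers, declared from highest to lowest authority.
/// The derived `Ord` follows declaration order, so an "earlier" variant
/// compares as smaller and therefore carries more authority.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum AuthorityTier {
    System,     // Tier 0
    Regulatory, // Tier 1
    Clinical,   // Tier 2
    TeamPolicy, // Tier 2.5
    Expert,     // Tier 3
    Community,  // Tier 4
}

impl AuthorityTier {
    /// True when `self` has strictly more authority than `other`.
    fn outranks(self, other: AuthorityTier) -> bool {
        self < other
    }
}
```

With this encoding, `AuthorityTier::TeamPolicy.outranks(AuthorityTier::Community)` is true and `AuthorityTier::TeamPolicy.outranks(AuthorityTier::Clinical)` is false, matching the bullets above.
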
## Implementation

### Phase 1: Manual Extraction (MVP - 2 days)

User manually creates a claims TOML file from the guidelines, then imports it:

```bash
# User writes team-guidelines.toml manually from reading architecture.md
aphoria claims import team-guidelines.toml --authority-tier team_policy
```

**Pros:** Works immediately, no LLM needed
**Cons:** Manual work, doesn't scale

### Phase 2: LLM-Assisted Extraction (Week 1 - 3 days)

```bash
$ aphoria ingest-guide docs/architecture/hexagonal.md --edit

Processing: hexagonal.md
Using LLM to extract claims...

Found 26 potential claims. Review and edit before importing.
Opening editor...

# Generated claims (edit before importing)
[[claim]]
id = "hex-arch-http-handlers-001"
subject = "code://go/*/adapters/http/**"
predicate = "directory_pattern"
value = "*_handler.go"
comparison = "must"
category = "architecture"
explanation = "HTTP handlers must be in adapters/http/ directory"
source = "hexagonal.md:45-47"

# Edit to refine, then save and close to import
```

**LLM Prompt:**
```
Extract architectural claims from this document.

For each MUST/SHOULD/MUST NOT statement:
1. Identify the subject (what code element)
2. Identify the predicate (what property)
3. Identify the value (expected value)
4. Determine comparison mode (must/should/must_not)
5. Extract explanation

Format as TOML claims.

Example input:
"HTTP handlers MUST be in adapters/http/ directory and end with _handler.go"

Example output:
[[claim]]
subject = "code://go/*/adapters/http/**"
predicate = "directory_pattern"
value = "*_handler.go"
comparison = "must"
explanation = "HTTP handlers must be in adapters/http/ directory"
```

**Implementation:**
- File: `applications/aphoria/src/llm/document_ingestion.rs`
- Uses existing LLM infrastructure
- Outputs TOML for review before import
- User can edit/refine before committing

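Because the prompt keys on MUST/SHOULD/MUST NOT statements, a cheap deterministic check can cross-validate the comparison mode the LLM reports. The sketch below is an assumption about how such a check might look, not part of the planned module: `classify_statement` and `strip_punctuation` are hypothetical names, only uppercase keywords are recognized, and "SHOULD NOT" is returned as `None` because `ComparisonMode` has no matching variant.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ComparisonMode {
    Must,
    MustNot,
    Should,
    May,
}

/// Trims leading and trailing punctuation such as brackets, commas, and periods.
fn strip_punctuation(word: &str) -> &str {
    word.trim_matches(|c: char| !c.is_alphanumeric())
}

/// Returns the comparison mode implied by the first uppercase requirement
/// keyword in `sentence`, or `None` when no unambiguous keyword is present.
fn classify_statement(sentence: &str) -> Option<ComparisonMode> {
    let words: Vec<&str> = sentence
        .split_whitespace()
        .map(strip_punctuation)
        .collect();

    for (i, word) in words.iter().enumerate() {
        let followed_by_not = words.get(i + 1) == Some(&"NOT");
        match *word {
            "MUST_NOT" => return Some(ComparisonMode::MustNot),
            "MUST" if followed_by_not => return Some(ComparisonMode::MustNot),
            "MUST" => return Some(ComparisonMode::Must),
            // No variant represents a negative recommendation; leave it for review.
            "SHOULD" if followed_by_not => return None,
            "SHOULD" => return Some(ComparisonMode::Should),
            "MAY" => return Some(ComparisonMode::May),
            _ => {}
        }
    }
    None
}
```

A lowercase sentence such as "Handlers should generally be in adapters/http/" yields `None` here, which is one way to flag hedged wording for manual review instead of importing it automatically.
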
### Phase 3: Automated Extraction with Validation (Week 2 - 4 days)

Fully automated pipeline with confidence scoring:

```rust
use std::path::Path;

pub struct DocumentIngester {
    llm: LlmClient,
    validator: ClaimValidator,
}

impl DocumentIngester {
    /// Ingest a document and extract claims.
    pub async fn ingest(
        &self,
        doc_path: &Path,
        options: IngestionOptions,
    ) -> Result<Vec<IngestedClaim>, IngestionError> {
        // 1. Parse document (markdown/PDF/text)
        let sections = self.parse_document(doc_path)?;

        // 2. Extract claims using LLM, one section at a time
        let mut raw_claims = Vec::new();
        for section in &sections {
            raw_claims.extend(self.extract_claims_from_section(section).await?);
        }

        // 3. Validate and score confidence
        let validated = self.validate_claims(raw_claims)?;

        // 4. Filter by confidence threshold
        let high_confidence: Vec<_> = validated
            .into_iter()
            .filter(|c| c.confidence >= options.min_confidence)
            .collect();

        // 5. Preview or auto-import
        if options.dry_run {
            self.preview_claims(&high_confidence)?;
            Ok(vec![])
        } else {
            self.import_claims(high_confidence).await
        }
    }

    /// Extract claims from a section using LLM.
    async fn extract_claims_from_section(
        &self,
        section: &DocumentSection,
    ) -> Result<Vec<Observation>, LlmError> {
        let prompt = format!(
            r#"Extract architectural claims from this section.

Section: {}

Content:
{}

For each claim:
1. Identify subject pattern (supports wildcards)
2. Identify predicate
3. Identify expected value
4. Determine severity (MUST/MUST_NOT/SHOULD/MAY)
5. Extract explanation

Return as JSON array."#,
            section.heading,
            section.content,
        );

        self.llm.extract_structured(prompt).await
    }

    /// Validate extracted claims for quality.
    fn validate_claims(
        &self,
        claims: Vec<Observation>,
    ) -> Result<Vec<ValidatedClaim>, ValidationError> {
        claims
            .into_iter()
            .map(|claim| {
                // Check if subject pattern is valid
                let subject_valid = self.validator.validate_subject(&claim.subject);

                // Check if predicate is recognized
                let predicate_valid = self.validator.validate_predicate(&claim.predicate);

                // Compute confidence score
                let confidence = self.compute_confidence(&claim, subject_valid, predicate_valid);

                // Wrap in Ok so the iterator collects into Result<Vec<_>, _>
                Ok(ValidatedClaim {
                    claim,
                    confidence,
                    validation_issues: vec![],
                })
            })
            .collect()
    }
}
```

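The pipeline above calls `compute_confidence` without defining it. As a minimal sketch of what such a scorer might look like, the version below weights the two validator results and the presence of an explanation. The weights (0.4, 0.4, 0.2), the single-field `Observation` stand-in, and the `keep_claim` helper are all assumptions chosen for illustration.

```rust
/// Minimal stand-in for an extracted claim; the real field set is not
/// specified in this plan, so only `explanation` is modeled here.
struct Observation {
    explanation: String,
}

/// Hypothetical scoring rule: each validator check contributes 0.4 and a
/// non-empty explanation contributes 0.2, for a maximum of about 1.0.
fn compute_confidence(claim: &Observation, subject_valid: bool, predicate_valid: bool) -> f64 {
    let mut score = 0.0;
    if subject_valid {
        score += 0.4;
    }
    if predicate_valid {
        score += 0.4;
    }
    if !claim.explanation.trim().is_empty() {
        score += 0.2;
    }
    score
}

/// Mirrors step 4 of the pipeline: keep claims at or above the threshold.
fn keep_claim(confidence: f64, min_confidence: f64) -> bool {
    confidence >= min_confidence
}
```

With a threshold of 0.7, a claim needs both a valid subject and a recognized predicate to be kept under this rule; failing either check drops it below the cutoff regardless of the explanation.
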
## CLI Design

### Commands

```bash
# Ingest a document
aphoria ingest-guide <path> [options]

Options:
  --authority-tier <tier>    Authority tier (default: team_policy)
  --category <category>      Category (architecture|security|style)
  --min-confidence <float>   Min confidence to include (0.0-1.0, default: 0.7)
  --dry-run                  Preview without importing
  --edit                     Open editor to review/refine before importing
  --project <name>           Project scope (default: current project)

# List ingested guidelines
aphoria list-guides

# Check compliance against a guideline
aphoria check-compliance <guide-name>

# Update from changed document
aphoria update-guide <guide-name>

# Remove a guideline
aphoria remove-guide <guide-name>
```

### Examples

```bash
# Ingest with preview
aphoria ingest-guide docs/architecture/hexagonal.md --dry-run

# Ingest with manual review
aphoria ingest-guide docs/security/owasp-top-10.md --edit

# Check compliance
aphoria check-compliance hexagonal-arch

# Update when doc changes
aphoria update-guide hexagonal-arch --from docs/architecture/hexagonal.md

# List all active guidelines
aphoria list-guides
```

## Integration with Existing Features

### 1. Conflict Detection
Ingested claims are stored as authoritative assertions, so the existing conflict engine detects violations without changes.

### 2. Scan Reports
Compliance is shown in standard scan reports:
```
Conflicts: 3
❌ hexagonal.md:45 - File in wrong directory
❌ hexagonal.md:62 - Forbidden import detected
❌ hexagonal.md:89 - Invalid filename pattern
```

### 3. Authority Lens
TeamPolicy tier (2.5) ranks between Clinical (2) and Expert (3):
- Overrides community observations
- Can be overridden by explicit team-authored claims
- Respects RFCs and security standards

### 4. Pre-commit Hooks
Compliance checking in pre-commit:
```bash
#!/bin/bash
# .git/hooks/pre-commit

aphoria scan --check-policy hexagonal-arch --exit-code
```

## Storage

### Claims Storage
Ingested claims are stored as regular `AuthoredClaim` instances:
- File: `.aphoria/claims.toml`
- Tagged with `ingested_from: "hexagonal.md"`
- Authority tier: `team_policy`

### Document Metadata
Track source documents:
```toml
# .aphoria/ingested_guides.toml

[[guide]]
id = "hexagonal-arch"
name = "Hexagonal Architecture Guidelines"
source_path = "docs/architecture/hexagonal.md"
document_hash = "blake3:abc123..."
ingested_at = 1234567890
claims_count = 26
authority_tier = "team_policy"
category = "architecture"

[[guide]]
id = "security-owasp"
name = "OWASP Top 10 Compliance"
source_path = "docs/security/owasp.md"
document_hash = "blake3:def456..."
ingested_at = 1234567900
claims_count = 15
authority_tier = "team_policy"
category = "security"
```

### Update Detection
```bash
$ aphoria scan

⚠️ Warning: Source document has changed
   Guide: hexagonal-arch
   Source: docs/architecture/hexagonal.md
   Last ingested: 3 days ago (hash: abc123...)
   Current hash: xyz789...

Run: aphoria update-guide hexagonal-arch
```

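Update detection reduces to comparing the hash recorded at ingestion time with a hash of the current file contents. The sketch below illustrates that comparison under explicit assumptions: the `blake3:` prefix in the metadata suggests BLAKE3, but to stay dependency-free this example substitutes the standard library's `DefaultHasher`, whose algorithm is unspecified and not suitable for hashes persisted to disk. `GuideRecord`, `content_hash`, `source_changed`, and `update_hint` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Subset of an `ingested_guides.toml` entry needed for change detection.
struct GuideRecord {
    id: String,
    document_hash: String,
}

/// Stand-in content hash (NOT BLAKE3). `DefaultHasher` is used only so the
/// example runs without external crates.
fn content_hash(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("std:{:016x}", hasher.finish())
}

/// True when the current document contents no longer match the recorded hash.
fn source_changed(record: &GuideRecord, current_content: &str) -> bool {
    record.document_hash != content_hash(current_content)
}

/// Builds the follow-up hint shown in the scan warning.
fn update_hint(record: &GuideRecord) -> String {
    format!("Run: aphoria update-guide {}", record.id)
}
```

A scan would call `source_changed` once per registered guide and print the warning together with `update_hint` for any guide whose source has drifted.
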
## Success Metrics

### Phase 1 (Manual Import)
- Users can manually create claims from guidelines
- Claims are enforced during scans
- Pre-commit hooks work

### Phase 2 (LLM-Assisted)
- LLM extracts 80%+ of claims correctly
- Users can review/edit before importing
- Saves >90% of manual effort

### Phase 3 (Automated)
- Confidence scoring filters noise
- Automatic updates when docs change
- Compliance reports show trends

## Open Questions

1. **How to handle ambiguous statements?**
   - "Handlers should generally be in adapters/http/"
   - Extract as SHOULD with low confidence?

2. **How to handle conflicting guidelines?**
   - Doc A says X, Doc B says Y
   - Use authority tier + recency?

3. **Should we support non-Markdown formats?**
   - PDF extraction (common for external standards)
   - Confluence/Google Docs (via export)
   - Word documents

4. **How to version guidelines?**
   - When doc changes, update claims or create new versions?
   - Show history of guideline changes?

5. **Should compliance be project-scoped or team-scoped?**
   - Team-level guidelines apply to all team projects?
   - Project-specific guidelines override team guidelines?

## Future Enhancements

### 1. Guided Onboarding
```bash
$ aphoria init --with-guides

No guidelines found. Would you like to:
1. Import existing documentation
2. Start from template (hexagonal/clean/ddd)
3. Skip for now

Choice: 1

Enter path to architecture guide: docs/architecture.md
Enter path to security guide: docs/security.md
Enter path to style guide: (skip)

Extracting claims...
Found 42 claims across 2 documents.
Review before importing? [Y/n]
```

### 2. Compliance Dashboard
Visual compliance tracking over time:
- Trend graphs (improving/declining)
- Per-guideline compliance scores
- Team comparison (if multiple teams)

### 3. AI-Generated Fix Suggestions
```bash
❌ File location violation
   Expected: adapters/http/*_handler.go
   Found: adapters/handlers/user.go

Suggested fix:
  git mv adapters/handlers/user.go adapters/http/user_handler.go

Apply fix? [y/N]
```

### 4. Guideline Templates
Pre-built guidelines for common architectures:
- Hexagonal Architecture
- Clean Architecture
- Domain-Driven Design
- Microservices patterns
- Security baselines (OWASP, NIST)

```bash
$ aphoria init-guide --template hexagonal

Imported 35 hexagonal architecture claims.
Customize at: .aphoria/claims.toml
```

## Timeline

- **Week 1:** Manual import (MVP) + LLM extraction prototype
- **Week 2:** Automated pipeline + confidence scoring
- **Week 3:** CLI polish + documentation + examples
- **Week 4:** Dashboard integration + compliance reports

**Total:** 4 weeks for complete feature