stemedb/applications/aphoria/src/community/mod.rs
jml 65065f3d8f feat(aphoria): implement community corpus with wiki import and pattern aggregation
Implements Phase 4 (A4) - Community corpus as first-class citizens:

- **Community Corpus Builder** - Queries StemeDB pattern aggregates
- **Wiki Import** - Bootstrap corpus from markdown docs (aphoria corpus import wiki)
- **Pattern Aggregation** - Automatic learning from local scans (--sync flag)
- **Storage Layer** - StemeDBPatternStore with content-addressed deduplication
- **Promotion Logic** - Multi-tier thresholds (95%/80%/50% adoption rates)
- **Corpus Build** - Unified registry for RFC/OWASP/Vendor/Community sources
- **Trust Packs** - Export corpus as signed, distributable artifacts
- **Documentation** - bootstrap-corpus.md guide + CLI reference updates
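
The multi-tier promotion rule above can be sketched as a small classifier. This is only an illustration: the 95%/80%/50% cut-offs come from this commit message, while the tier names, the `promotion_tier` function, and the choice to treat each threshold as inclusive are assumptions, not the shipped logic.

```rust
/// Hypothetical tier names; only the 95/80/50 percent cut-offs are from the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PromotionTier {
    High,        // adoption rate >= 95%
    Medium,      // adoption rate >= 80%
    Low,         // adoption rate >= 50%
    NotPromoted, // below every threshold, or no data
}

/// Classify a pattern by the share of projects that adopt it.
/// Assumption: thresholds are inclusive.
pub fn promotion_tier(adopters: u64, total: u64) -> PromotionTier {
    if total == 0 {
        return PromotionTier::NotPromoted;
    }
    // Compare adopters/total >= p/100 as adopters*100 >= p*total,
    // which keeps the check in integers and avoids float rounding.
    let scaled = adopters as u128 * 100;
    let total = total as u128;
    if scaled >= 95 * total {
        PromotionTier::High
    } else if scaled >= 80 * total {
        PromotionTier::Medium
    } else if scaled >= 50 * total {
        PromotionTier::Low
    } else {
        PromotionTier::NotPromoted
    }
}
```

Under these assumptions, a pattern seen in 94 of 100 projects falls just short of the top tier and is classified as `Medium`.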

Technical details:
- Pattern aggregates stored as assertions with predicate "pattern_aggregate"
- Content-addressed subjects via BLAKE3(subject:predicate:value)
- PatternAggregator handles write path (observations → patterns)
- StemeDBPatternStore handles read path (pattern queries)
- Integration tests + fixtures in tests/wiki_import_test.rs
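
As a rough sketch of the write path (observations → patterns), the aggregation step can be pictured as counting observations per (subject, predicate, value) key. The `PatternKey` type and `aggregate` function below are hypothetical and are not the real `PatternAggregator` API; the sketch also keys on the raw tuple rather than a BLAKE3 digest, so it shows the deduplication idea only.

```rust
use std::collections::HashMap;

/// Hypothetical aggregation key: one entry per distinct (subject, predicate, value).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PatternKey {
    pub subject: String,
    pub predicate: String,
    pub value: String,
}

/// Count how many observations share each key.
/// Identical tuples collapse into a single entry, which is the deduplication effect.
pub fn aggregate<'a, I>(observations: I) -> HashMap<PatternKey, u64>
where
    I: IntoIterator<Item = (&'a str, &'a str, &'a str)>,
{
    let mut counts: HashMap<PatternKey, u64> = HashMap::new();
    for (subject, predicate, value) in observations {
        let key = PatternKey {
            subject: subject.to_string(),
            predicate: predicate.to_string(),
            value: value.to_string(),
        };
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}
```

Feeding in two identical observations and one with a different value would yield two entries, with counts 2 and 1.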

Deleted hardcoded.rs (368 lines) - corpus now fully emergent from StemeDB.
Deleted enriched-corpus-patterns.md (677 lines) - feature shipped.

Closes VG-026 (community corpus), part of A4 milestone.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-09 00:12:31 +00:00


//! Community corpus contribution module for Aphoria.
//!
//! Enables opt-in anonymous contribution of scan patterns to a central corpus,
//! allowing community consensus to adjust default thresholds.
//!
//! # Privacy Model
//!
//! The anonymization pipeline strips all identifying information:
//! - Project names are wildcarded: `code://rust/myapp/tls` → `code://rust/*/tls`
//! - File paths, line numbers, and matched text are stripped and are NOT inputs to `anon_hash`
//! - Timestamps are rounded to the nearest hour to support k-anonymity
//! - The server receives `project_hash` (not `project_id`) to prevent name leakage
//!
//! # User Journey
//!
//! ```text
//! [opt-in: [community] enabled=true]
//!   → [scan extracts claims]
//!   → [filter by community.exclude]
//!   → [anonymize: wildcard project path, strip file/line/text, rehash]
//!   → [push to POST /v1/aphoria/community/observations]
//!   → [server aggregates by (subject, predicate, value)]
//!   → [GET /v1/aphoria/patterns returns high-confidence patterns]
//! ```

mod anonymizer;
mod extractor_loader;
mod pattern_store;
mod pattern_syncer;
mod types;

pub use anonymizer::{anonymize_claim, compute_anon_hash, wildcard_project_path};
pub use extractor_loader::CommunityExtractorLoader;
pub use pattern_store::{PatternAggregator, StemeDBPatternStore};
pub use pattern_syncer::{compute_pattern_hash, PatternSyncer};
pub use types::{
    AnonymizedObservation, CommunityClaimDef, CommunityExtractor, CommunityExtractorProvenance,
    CommunityObjectValue, PatternAggregate, SharedClaimTemplate, SharedPattern,
};
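
Two steps of the privacy model in the module docs (wildcarding the project segment and rounding timestamps to the nearest hour) can be illustrated with a minimal sketch. The function names below are hypothetical stand-ins, not the actual `wildcard_project_path` implementation; the sketch assumes subjects have the shape `code://<language>/<project>/<rest>` and that timestamps are Unix seconds.

```rust
/// Replace the project segment of a `code://<language>/<project>/<rest>` subject with `*`.
/// Returns `None` when the subject does not match the assumed shape.
pub fn wildcard_project_sketch(subject: &str) -> Option<String> {
    let rest = subject.strip_prefix("code://")?;
    // Split into at most three parts: language, project, and everything after.
    let mut parts = rest.splitn(3, '/');
    let language = parts.next()?;
    let _project = parts.next()?; // dropped on purpose: this is the identifying part
    match parts.next() {
        Some(tail) => Some(format!("code://{}/*/{}", language, tail)),
        None => Some(format!("code://{}/*", language)),
    }
}

/// Round a Unix timestamp (seconds) to the nearest hour; exact half hours round up.
pub fn round_to_nearest_hour(unix_secs: u64) -> u64 {
    const HOUR: u64 = 3600;
    // saturating_add keeps the addition from overflowing near u64::MAX.
    unix_secs.saturating_add(HOUR / 2) / HOUR * HOUR
}
```

With these assumptions, `code://rust/myapp/tls` maps to `code://rust/*/tls`, matching the example in the module docs, and a timestamp of 3599 rounds to 3600.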