Implements Phase 4 (A4): community corpus as first-class citizens.

- **Community Corpus Builder** - Queries StemeDB pattern aggregates
- **Wiki Import** - Bootstrap corpus from markdown docs (`aphoria corpus import wiki`)
- **Pattern Aggregation** - Automatic learning from local scans (`--sync` flag)
- **Storage Layer** - StemeDBPatternStore with content-addressed deduplication
- **Promotion Logic** - Multi-tier thresholds (95%/80%/50% adoption rates)
- **Corpus Build** - Unified registry for RFC/OWASP/Vendor/Community sources
- **Trust Packs** - Export corpus as signed, distributable artifacts
- **Documentation** - bootstrap-corpus.md guide + CLI reference updates

Technical details:

- Pattern aggregates stored as assertions with predicate "pattern_aggregate"
- Content-addressed subjects via BLAKE3(subject:predicate:value)
- PatternAggregator handles the write path (observations → patterns)
- StemeDBPatternStore handles the read path (pattern queries)
- Integration tests + fixtures in tests/wiki_import_test.rs

Deleted hardcoded.rs (368 lines): the corpus is now fully emergent from StemeDB.
Deleted enriched-corpus-patterns.md (677 lines): the feature has shipped.

Closes VG-026 (community corpus), part of the A4 milestone.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
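The content-addressed keying and multi-tier promotion described above can be sketched in std-only Rust. This is a minimal sketch, not the crate's `PatternAggregator`. The tier names, the 4-tuple observation layout, the inclusive `>=` comparisons, and the choice of denominator are all assumptions. Here, adoption is measured as the share of distinct projects reporting a subject and predicate that also report the same value. The canonical `subject:predicate:value` string stands in for its BLAKE3 digest, so the example needs no external crates.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical promotion tiers; the commit gives thresholds but not names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Tier1,      // adoption >= 95%
    Tier2,      // adoption >= 80%
    Tier3,      // adoption >= 50%
    Unpromoted, // below 50%
}

/// Assumes the 95%/80%/50% thresholds are inclusive lower bounds.
pub fn classify(adoption: f64) -> Tier {
    if adoption >= 0.95 {
        Tier::Tier1
    } else if adoption >= 0.80 {
        Tier::Tier2
    } else if adoption >= 0.50 {
        Tier::Tier3
    } else {
        Tier::Unpromoted
    }
}

/// Canonical "subject:predicate:value" key. The real store hashes this with
/// BLAKE3; the sketch keys on the string itself to stay std-only.
pub fn content_key(subject: &str, predicate: &str, value: &str) -> String {
    format!("{subject}:{predicate}:{value}")
}

/// Hypothetical write path: (project_hash, subject, predicate, value)
/// observations become one tier per distinct pattern. Repeated observations
/// from the same project are counted once.
pub fn aggregate(observations: &[(&str, &str, &str, &str)]) -> HashMap<String, Tier> {
    // Distinct projects reporting each (subject, predicate) pair.
    let mut per_claim: HashMap<(&str, &str), HashSet<&str>> = HashMap::new();
    // Distinct projects reporting each full (subject, predicate, value) triple.
    let mut per_value: HashMap<(&str, &str, &str), HashSet<&str>> = HashMap::new();
    for &(project, subject, predicate, value) in observations {
        per_claim.entry((subject, predicate)).or_default().insert(project);
        per_value.entry((subject, predicate, value)).or_default().insert(project);
    }
    per_value
        .into_iter()
        .map(|((s, p, v), projects)| {
            let total = per_claim[&(s, p)].len() as f64;
            let adoption = projects.len() as f64 / total;
            (content_key(s, p, v), classify(adoption))
        })
        .collect()
}
```

Deduplicating per project before computing rates keeps one noisy repository from inflating a pattern's adoption. A different denominator, such as all projects that have synced, would be an equally plausible reading of "adoption rate".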
40 lines
1.5 KiB
Rust
//! Community corpus contribution module for Aphoria.
//!
//! Enables opt-in anonymous contribution of scan patterns to a central corpus,
//! allowing community consensus to adjust default thresholds.
//!
//! # Privacy Model
//!
//! The anonymization pipeline strips all identifying information:
//! - Project names are wildcarded: `code://rust/myapp/tls` → `code://rust/*/tls`
//! - File paths, line numbers, and matched text are NOT included in the `anon_hash`
//! - Timestamps are rounded to the nearest hour for k-anonymity
//! - Server receives `project_hash` (not `project_id`) to prevent name leakage
//!
//! # User Journey
//!
//! ```text
//! [opt-in: [community] enabled=true]
//! → [scan extracts claims]
//! → [filter by community.exclude]
//! → [anonymize: wildcard project path, strip file/line/text, rehash]
//! → [push to POST /v1/aphoria/community/observations]
//! → [server aggregates by (subject, predicate, value)]
//! → [GET /v1/aphoria/patterns returns high-confidence patterns]
//! ```

mod anonymizer;
mod extractor_loader;
mod pattern_store;
mod pattern_syncer;
mod types;

pub use anonymizer::{anonymize_claim, compute_anon_hash, wildcard_project_path};
pub use extractor_loader::CommunityExtractorLoader;
pub use pattern_store::{PatternAggregator, StemeDBPatternStore};
pub use pattern_syncer::{compute_pattern_hash, PatternSyncer};
pub use types::{
    AnonymizedObservation, CommunityClaimDef, CommunityExtractor, CommunityExtractorProvenance,
    CommunityObjectValue, PatternAggregate, SharedClaimTemplate, SharedPattern,
};
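The privacy steps in the module docs (wildcard the project segment, drop file, line, and matched text, round timestamps to the hour) can be illustrated with a small std-only sketch. It is not the crate's `anonymize_claim` or `wildcard_project_path`, whose signatures are not shown here. `RawClaim`, `Anonymized`, `wildcard_project`, and `round_to_nearest_hour` are hypothetical. The sketch assumes subjects have the shape `scheme://language/project/...` and that timestamps are Unix seconds.

```rust
/// Hypothetical raw claim as produced by a scan. `file`, `line` and
/// `matched_text` exist only to show that anonymization drops them.
#[allow(dead_code)]
pub struct RawClaim<'a> {
    pub subject: &'a str, // e.g. "code://rust/myapp/tls"
    pub predicate: &'a str,
    pub value: &'a str,
    pub file: &'a str,
    pub line: u32,
    pub matched_text: &'a str,
    pub observed_at: u64, // Unix seconds
}

/// Hypothetical anonymized form: no file, line, or matched text survives.
pub struct Anonymized {
    pub subject: String,
    pub predicate: String,
    pub value: String,
    pub observed_hour: u64,
}

const HOUR_SECS: u64 = 3_600;

/// Replaces the project segment, assuming `scheme://language/project/...`.
/// Strings without `://` are returned unchanged.
pub fn wildcard_project(uri: &str) -> String {
    let Some((scheme, path)) = uri.split_once("://") else {
        return uri.to_string();
    };
    let mut segments: Vec<&str> = path.split('/').collect();
    if segments.len() >= 2 {
        segments[1] = "*"; // segment 0 is the language, segment 1 the project
    }
    format!("{scheme}://{}", segments.join("/"))
}

/// Rounds a Unix timestamp to the nearest hour (exact half-hours round up).
pub fn round_to_nearest_hour(unix_secs: u64) -> u64 {
    unix_secs.saturating_add(HOUR_SECS / 2) / HOUR_SECS * HOUR_SECS
}

/// Builds the anonymized observation; file, line and matched text are not copied.
pub fn anonymize(claim: &RawClaim<'_>) -> Anonymized {
    Anonymized {
        subject: wildcard_project(claim.subject),
        predicate: claim.predicate.to_string(),
        value: claim.value.to_string(),
        observed_hour: round_to_nearest_hour(claim.observed_at),
    }
}
```

Rounding to the nearest hour, rather than truncating, follows the doc comment's wording. Because `Anonymized` has no fields for location data, any hash computed over it cannot depend on file paths, line numbers, or matched text.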