Content Defense (Phase 7):
- Add SimilarityIndex with MinHash/LSH for near-duplicate detection
- Add QuarantineStore for flagged assertions awaiting admin review
- Add CircuitBreakerStore for per-agent circuit breaker state
- Add ContentDefenseLayer for ingestion pipeline integration
- Add API endpoints for quarantine and circuit breaker management
- Add research module with gap detection and documentation fetching

Code Structure Improvements:
- Extract research CLI commands to research_commands.rs
- Extract API routers to routers.rs module
- Extract key_codec extraction functions to separate module
- Extract test modules to separate files across multiple crates
- All files now under 500 line limit per pre-commit hook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
30 lines
1.0 KiB
Rust
//! Ingestion pipeline for Episteme.
//!
//! This crate handles the reading of the Write-Ahead Log (WAL) and
//! the background indexing of assertions into the storage engine.
//!
//! # Storage Layout
//!
//! Records are stored with content-addressed keys:
//! - `H:{hash}` - Assertions
//! - `V:{assertion_hash}:{vote_hash}` - Votes
//! - `E:{hash}` - Epochs
//! - `S:{subject}` - Subject index

/// Content defense layer for spam detection and quality control.
pub mod content_defense;
/// Error types and Result wrapper for ingestion.
pub mod error;
/// Gossip broadcast trait for distributed replication.
pub mod gossip;
/// High-level ingestor manager.
pub mod ingestor;
/// Background worker logic for processing the WAL.
pub mod worker;

pub use content_defense::{ContentDefenseConfig, ContentDefenseLayer};
pub use error::{IngestError, Result};
pub use gossip::{GossipBroadcast, GossipError, NoOpGossipBroadcast};
pub use ingestor::Ingestor;
pub use worker::{serialize_assertion, serialize_epoch, serialize_vote, IngestWorker, RecordType};
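
The storage layout described in the crate documentation above can be illustrated with a small key-builder sketch. This is not the crate's own code: `StorageKey`, `encode`, and `vote_prefix` are hypothetical names introduced here, and hashes are treated as opaque strings. The real crate may well use binary hashes and a different encoding (the commit message mentions a separate `key_codec` module), so treat this only as a reading aid for the four documented prefixes.

```rust
/// Hypothetical key type mirroring the four documented key prefixes.
/// Not part of the crate; names and string-typed hashes are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StorageKey {
    /// `H:{hash}` - an assertion, addressed by its content hash.
    Assertion { hash: String },
    /// `V:{assertion_hash}:{vote_hash}` - a vote on an assertion.
    Vote {
        assertion_hash: String,
        vote_hash: String,
    },
    /// `E:{hash}` - an epoch.
    Epoch { hash: String },
    /// `S:{subject}` - a subject index entry.
    Subject { subject: String },
}

impl StorageKey {
    /// Render the key in the textual layout from the crate docs.
    pub fn encode(&self) -> String {
        match self {
            StorageKey::Assertion { hash } => format!("H:{}", hash),
            StorageKey::Vote {
                assertion_hash,
                vote_hash,
            } => format!("V:{}:{}", assertion_hash, vote_hash),
            StorageKey::Epoch { hash } => format!("E:{}", hash),
            StorageKey::Subject { subject } => format!("S:{}", subject),
        }
    }

    /// Prefix shared by every vote key for one assertion. In an ordered
    /// key-value store, a prefix scan over this string would visit all
    /// votes attached to that assertion (assuming such a store is used).
    pub fn vote_prefix(assertion_hash: &str) -> String {
        format!("V:{}:", assertion_hash)
    }
}
```

Under these assumptions, `StorageKey::Vote { .. }.encode()` always starts with `StorageKey::vote_prefix(..)` for the same assertion hash, which is one plausible reason to put the assertion hash first in the vote key: votes for an assertion sort next to each other.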