stemedb/crates/stemedb-ingest/src/lib.rs
jordan a734be3a0d feat: Phase 7 Content Defense + code structure refactoring
Content Defense (Phase 7):
- Add SimilarityIndex with MinHash/LSH for near-duplicate detection
- Add QuarantineStore for flagged assertions awaiting admin review
- Add CircuitBreakerStore for per-agent circuit breaker state
- Add ContentDefenseLayer for ingestion pipeline integration
- Add API endpoints for quarantine and circuit breaker management
- Add research module with gap detection and documentation fetching

Code Structure Improvements:
- Extract research CLI commands to research_commands.rs
- Extract API routers to routers.rs module
- Extract key_codec extraction functions to separate module
- Extract test modules to separate files across multiple crates
- All files now under 500 line limit per pre-commit hook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:44:05 -07:00

30 lines
1.0 KiB
Rust

//! Ingestion pipeline for Episteme.
//!
//! This crate reads the Write-Ahead Log (WAL) and, in the background,
//! indexes assertions into the storage engine.
//!
//! # Storage Layout
//!
//! Records are stored under prefixed keys. Assertions, votes, and epochs
//! are content-addressed by hash; the subject index is keyed by subject:
//! - `H:{hash}` - Assertions
//! - `V:{assertion_hash}:{vote_hash}` - Votes
//! - `E:{hash}` - Epochs
//! - `S:{subject}` - Subject index
/// Content defense layer for spam detection and quality control.
pub mod content_defense;
/// Error types and Result wrapper for ingestion.
pub mod error;
/// Gossip broadcast trait for distributed replication.
pub mod gossip;
/// High-level ingestor manager.
pub mod ingestor;
/// Background worker logic for processing the WAL.
pub mod worker;
pub use content_defense::{ContentDefenseConfig, ContentDefenseLayer};
pub use error::{IngestError, Result};
pub use gossip::{GossipBroadcast, GossipError, NoOpGossipBroadcast};
pub use ingestor::Ingestor;
pub use worker::{serialize_assertion, serialize_epoch, serialize_vote, IngestWorker, RecordType};
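
The storage layout described in the crate docs can be illustrated with a small, standalone sketch of how such keys might be built and classified. This is not code from the crate: the helper names (`assertion_key`, `vote_key`, `epoch_key`, `subject_key`, `votes_prefix`, `key_kind`) and the `KeyKind` enum are hypothetical, and the sketch assumes keys are plain UTF-8 strings with hashes already encoded as text, which the real storage engine may not do.

```rust
// Hypothetical helpers illustrating the documented key layout.
// None of these names are part of stemedb-ingest's API.

/// The four record families named in the crate docs.
#[derive(Debug, PartialEq, Eq)]
enum KeyKind {
    Assertion,
    Vote,
    Epoch,
    Subject,
}

/// `H:{hash}`: an assertion, addressed by its content hash.
fn assertion_key(hash: &str) -> String {
    format!("H:{}", hash)
}

/// `V:{assertion_hash}:{vote_hash}`: a vote, grouped under its assertion.
fn vote_key(assertion_hash: &str, vote_hash: &str) -> String {
    format!("V:{}:{}", assertion_hash, vote_hash)
}

/// `E:{hash}`: an epoch, addressed by its content hash.
fn epoch_key(hash: &str) -> String {
    format!("E:{}", hash)
}

/// `S:{subject}`: a subject index entry, keyed by subject rather than hash.
fn subject_key(subject: &str) -> String {
    format!("S:{}", subject)
}

/// Prefix shared by every vote key for one assertion. In an ordered
/// key-value store, a prefix scan over this would list that assertion's
/// votes (an assumption about how the layout is meant to be used).
fn votes_prefix(assertion_hash: &str) -> String {
    format!("V:{}:", assertion_hash)
}

/// Classify a key by its one-letter prefix. Returns `None` for keys with
/// no `:` separator, an empty body, or an unknown prefix.
fn key_kind(key: &str) -> Option<KeyKind> {
    let (prefix, rest) = key.split_once(':')?;
    if rest.is_empty() {
        return None;
    }
    match prefix {
        "H" => Some(KeyKind::Assertion),
        "V" => Some(KeyKind::Vote),
        "E" => Some(KeyKind::Epoch),
        "S" => Some(KeyKind::Subject),
        _ => None,
    }
}
```

Under these assumptions, `vote_key("a1", "b2")` yields `V:a1:b2`, which starts with `votes_prefix("a1")`, so all votes for one assertion sort together. Placing the assertion hash before the vote hash is what makes that grouping possible.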