- M5p1: BM25 text indexing via Tantivy with background syncer (0.26ms @ 10K docs)
- M5p2: RRF fusion layer combining BM25 + ANN scores (46µs @ 1K candidates)
- M5p3: unified Search query API (8-stage pipeline, BM25 + vector + ranking)
- M5p4: creator text + vector indexing and creator search executor (< 20ms @ 200 creators)
- Refactor db/mod.rs into focused sub-modules (creators, items, sessions, signals, etc.)
- Decompose monolithic files into directory modules (query/executor, ranking/diversity, etc.)
- Split brute.rs → brute/mod.rs + brute/tests.rs; extract search executor helpers
- Add benches: fusion, search, session, text_index
- Add M5 UAT test suites (m5_uat, m5_search, m5p4_creator_search, text_index)
- Update blog posts, roadmap, content strategy, and M5 planning docs
- Add tmp/ and .claude/worktrees/ to .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
40 lines
1.4 KiB
Rust
use crate::schema::EntityKind;

/// Declaration of a text field for full-text search indexing.
///
/// Each `TextFieldDef` maps a metadata key (e.g., "title", "description") to
/// a Tantivy indexing mode. Declared in the schema via `SchemaBuilder::text_field`.
#[derive(Debug, Clone)]
pub struct TextFieldDef {
    /// The metadata key to index (e.g., "title", "description", "tags").
    pub key: String,
    /// Whether this field is tokenized (full-text) or raw (keyword/exact-match).
    pub field_type: TextFieldType,
}

/// The Tantivy indexing mode for a text field.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TextFieldType {
    /// Full tokenization with Tantivy's default tokenizer.
    /// Good for: title, description, body text.
    Text,
    /// Raw storage, no tokenization. Only exact-match queries work.
    /// Good for: category, format, `creator_id`, language tags.
    Keyword,
}
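
// Usage sketch (not part of this module): one way a caller might group declared
// text fields by indexing mode before building an index. The two types are
// re-declared locally only so the sketch compiles on its own; `split_by_mode`
// and `example_fields` are hypothetical helpers, not tidalDB APIs, and the
// real entry point `SchemaBuilder::text_field` is deliberately not shown
// because its signature does not appear in this file.

```rust
/// Local stand-in mirroring `TextFieldType` above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TextFieldType {
    Text,
    Keyword,
}

/// Local stand-in mirroring `TextFieldDef` above.
#[derive(Debug, Clone)]
pub struct TextFieldDef {
    pub key: String,
    pub field_type: TextFieldType,
}

/// Hypothetical helper: returns (tokenized keys, exact-match keys),
/// preserving declaration order within each group.
pub fn split_by_mode(defs: &[TextFieldDef]) -> (Vec<&str>, Vec<&str>) {
    let mut tokenized = Vec::new();
    let mut exact = Vec::new();
    for def in defs {
        match def.field_type {
            TextFieldType::Text => tokenized.push(def.key.as_str()),
            TextFieldType::Keyword => exact.push(def.key.as_str()),
        }
    }
    (tokenized, exact)
}

/// Hypothetical sample declarations using the keys named in the doc comments.
pub fn example_fields() -> Vec<TextFieldDef> {
    vec![
        TextFieldDef { key: "title".to_string(), field_type: TextFieldType::Text },
        TextFieldDef { key: "category".to_string(), field_type: TextFieldType::Keyword },
        TextFieldDef { key: "description".to_string(), field_type: TextFieldType::Text },
    ]
}
```

// Free-text keys such as "title" land in the tokenized group, while
// identifier-like keys such as "category" land in the exact-match group.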

/// Definition of an embedding vector slot for ANN search.
///
/// Declared in the schema to tell tidalDB which embedding dimensions
/// to expect for a given entity kind. The database retrieves and ranks
/// over vectors -- it does not generate them.
#[derive(Debug, Clone)]
pub struct EmbeddingSlotDef {
    /// Slot name (e.g., "default", "thumbnail").
    pub name: String,
    /// Which entity kind this slot is attached to.
    pub entity_kind: EntityKind,
    /// Dimensionality of the embedding vector.
    pub dimensions: usize,
}
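
// Usage sketch (not part of this module): because the database stores and ranks
// caller-supplied vectors, a write path would plausibly reject vectors whose
// length or entity kind does not match the declared slot. Everything below is
// an assumption for illustration: `EntityKind` is a local stand-in with
// hypothetical variants (the real enum lives in `crate::schema`), and
// `check_vector` is a hypothetical helper, not a tidalDB API.

```rust
/// Local stand-in for `crate::schema::EntityKind`; variants are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityKind {
    Item,
    Creator,
}

/// Local stand-in mirroring `EmbeddingSlotDef` above.
#[derive(Debug, Clone)]
pub struct EmbeddingSlotDef {
    pub name: String,
    pub entity_kind: EntityKind,
    pub dimensions: usize,
}

/// Hypothetical check: accepts a vector only if it targets the slot's
/// entity kind and has exactly the declared number of dimensions.
pub fn check_vector(
    slot: &EmbeddingSlotDef,
    kind: EntityKind,
    vector: &[f32],
) -> Result<(), String> {
    if kind != slot.entity_kind {
        return Err(format!(
            "slot '{}' is declared for {:?}, got {:?}",
            slot.name, slot.entity_kind, kind
        ));
    }
    if vector.len() != slot.dimensions {
        return Err(format!(
            "slot '{}' expects {} dimensions, got {}",
            slot.name,
            slot.dimensions,
            vector.len()
        ));
    }
    Ok(())
}
```

// Checking at write time keeps malformed vectors out of the ANN index, so
// query-time distance computations can assume a fixed length per slot.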