tidaldb/tidal/src/schema/validation/text.rs
jordan 192c473f55 feat: complete Milestone 5 — full-text search, RRF fusion, and creator search
- M5p1: BM25 text indexing via Tantivy with background syncer (0.26ms @ 10K docs)
- M5p2: RRF fusion layer combining BM25 + ANN scores (46µs @ 1K candidates)
- M5p3: unified Search query API (8-stage pipeline, BM25 + vector + ranking)
- M5p4: creator text + vector indexing and creator search executor (< 20ms @ 200 creators)
- Refactor db/mod.rs into focused sub-modules (creators, items, sessions, signals, etc.)
- Decompose monolithic files into directory modules (query/executor, ranking/diversity, etc.)
- Split brute.rs → brute/mod.rs + brute/tests.rs; extract search executor helpers
- Add benches: fusion, search, session, text_index
- Add M5 UAT test suites (m5_uat, m5_search, m5p4_creator_search, text_index)
- Update blog posts, roadmap, content strategy, and M5 planning docs
- Add tmp/ and .claude/worktrees/ to .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 23:53:16 -07:00


use crate::schema::EntityKind;

/// Declaration of a text field for full-text search indexing.
///
/// Each `TextFieldDef` maps a metadata key (e.g., "title", "description") to
/// a Tantivy indexing mode. Declared in the schema via `SchemaBuilder::text_field`.
#[derive(Debug, Clone)]
pub struct TextFieldDef {
    /// The metadata key to index (e.g., "title", "description", "tags").
    pub key: String,
    /// Whether this field is tokenized (full-text) or raw (keyword/exact-match).
    pub field_type: TextFieldType,
}

/// The Tantivy indexing mode for a text field.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TextFieldType {
    /// Full tokenization with Tantivy's default tokenizer.
    /// Good for: title, description, body text.
    Text,
    /// Raw storage, no tokenization. Only exact-match queries work.
    /// Good for: category, format, `creator_id`, language tags.
    Keyword,
}

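// Illustrative sketch (not part of this module): how the two modes differ at
// match time. The helpers `index_terms` and `matches_term` are hypothetical,
// and the tokenizer below (split on non-alphanumeric characters, lowercase)
// is only a rough stand-in for Tantivy's default tokenizer. `TextFieldType`
// is repeated so the block compiles by itself.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TextFieldType {
    Text,
    Keyword,
}

/// Hypothetical helper: approximates the terms an index would store for a value.
pub fn index_terms(field_type: &TextFieldType, value: &str) -> Vec<String> {
    match field_type {
        // Tokenized: split on non-alphanumeric characters and lowercase each token.
        TextFieldType::Text => value
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(|t| t.to_lowercase())
            .collect(),
        // Raw: the whole value is stored as one term, unchanged.
        TextFieldType::Keyword => vec![value.to_string()],
    }
}

/// Hypothetical helper: true if a single query term equals one stored term.
pub fn matches_term(field_type: &TextFieldType, value: &str, term: &str) -> bool {
    index_terms(field_type, value).iter().any(|t| t == term)
}
```

// Under these assumptions, the term "rust" matches a `Text` field holding
// "Rust in Action", while a `Keyword` field with the same value matches only
// the exact string "Rust in Action".
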
/// Definition of an embedding vector slot for ANN search.
///
/// Declared in the schema to tell tidalDB which embedding dimensions
/// to expect for a given entity kind. The database retrieves and ranks
/// over vectors -- it does not generate them.
#[derive(Debug, Clone)]
pub struct EmbeddingSlotDef {
    /// Slot name (e.g., "default", "thumbnail").
    pub name: String,
    /// Which entity kind this slot is attached to.
    pub entity_kind: EntityKind,
    /// Dimensionality of the embedding vector.
    pub dimensions: usize,
}
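
// Illustrative sketch (not part of this module): one way a caller-supplied
// vector could be checked against a declared slot. `check_embedding` is a
// hypothetical helper, and the `EntityKind` variants below are assumed
// stand-ins for the real enum in `crate::schema`. Whether tidalDB validates
// embeddings this way is not shown in this file.

```rust
// Assumed stand-in for `crate::schema::EntityKind`; the real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityKind {
    Item,
    Creator,
}

// Repeated from the module above so the block compiles by itself.
#[derive(Debug, Clone)]
pub struct EmbeddingSlotDef {
    pub name: String,
    pub entity_kind: EntityKind,
    pub dimensions: usize,
}

/// Hypothetical helper: rejects vectors whose entity kind or length
/// does not match the declared slot.
pub fn check_embedding(
    slot: &EmbeddingSlotDef,
    kind: EntityKind,
    vector: &[f32],
) -> Result<(), String> {
    if kind != slot.entity_kind {
        return Err(format!(
            "slot '{}' is declared for {:?}, got {:?}",
            slot.name, slot.entity_kind, kind
        ));
    }
    if vector.len() != slot.dimensions {
        return Err(format!(
            "slot '{}' expects {} dimensions, got {}",
            slot.name,
            slot.dimensions,
            vector.len()
        ));
    }
    Ok(())
}
```

// With a 4-dimensional "default" slot declared for `EntityKind::Item`, a
// 4-element item vector passes, while a 3-element vector or a creator vector
// returns an error.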