# Task 02: Document Write/Delete

## Delivers

`TextIndexWriter` with `index_item()`, `delete_item()`, field mapping (text → tokenized, keyword → raw), metadata-to-document conversion, and commit with sequence number payload.

## Complexity: M

## Dependencies

- Task 01 complete: `TextIndex`, `TantivyFields`, `TextFieldDef`, `TextFieldType` all exist

## Technical Design

### TextIndexWriter

```rust
// tidal/src/text/writer.rs

use std::collections::HashMap;
use std::sync::MutexGuard;

use tantivy::{Document, Term};

use crate::schema::EntityId;
use crate::text::index::{TantivyFields, TextIndex};
use crate::TidalError;

/// Write operations on the Tantivy text index.
///
/// This is a thin wrapper over the locked IndexWriter that converts tidalDB
/// metadata maps into Tantivy documents and handles entity_id-based deletes.
///
/// Thread safety: `TextIndexWriter` holds a `MutexGuard` on the IndexWriter.
/// Operations are batched in memory and only become visible after `commit()`.
pub struct TextIndexWriter<'a> {
    writer: MutexGuard<'a, tantivy::IndexWriter>,
    fields: &'a TantivyFields,
}

impl TextIndex {
    /// Lock the writer and return a `TextIndexWriter` for batch operations.
    ///
    /// # Errors
    /// Returns `TidalError::Internal` if the writer mutex is poisoned.
    pub fn writer_guard(&self) -> crate::Result<TextIndexWriter<'_>> {
        let writer = self
            .writer
            .lock()
            .map_err(|e| TidalError::Internal(format!("writer lock poisoned: {e}")))?;
        Ok(TextIndexWriter {
            writer,
            fields: &self.fields,
        })
    }
}

impl<'a> TextIndexWriter<'a> {
    /// Index or re-index an item.
    ///
    /// Tantivy has no atomic update, so this deletes any existing document for
    /// `entity_id` and adds a fresh document. Both operations are in the same
    /// batch and become visible atomically on the next `commit()`.
    ///
    /// Only metadata keys that match a declared text field are indexed.
    /// Unknown keys are silently ignored.
    pub fn index_item(
        &mut self,
        entity_id: EntityId,
        metadata: &HashMap<String, String>,
    ) -> crate::Result<()> {
        // Delete any existing document for this entity_id
        let id_term = Term::from_field_u64(self.fields.entity_id, entity_id.get());
        self.writer.delete_term(id_term);

        // Build the document. Note: in tantivy 0.22+, `Document` is a trait and
        // the concrete type is `TantivyDocument`; adjust to the pinned version.
        let mut doc = Document::new();
        doc.add_u64(self.fields.entity_id, entity_id.get());

        // Copy over only the metadata keys that are declared as text fields
        for (key, tv_field, _field_type) in &self.fields.text_fields {
            if let Some(value) = metadata.get(key) {
                doc.add_text(*tv_field, value);
            }
        }

        self.writer
            .add_document(doc)
            .map_err(|e| TidalError::Internal(format!("tantivy add_document: {e}")))?;

        Ok(())
    }

    /// Remove an item from the index.
    ///
    /// The delete takes effect on the next `commit()`.
    pub fn delete_item(&mut self, entity_id: EntityId) {
        let id_term = Term::from_field_u64(self.fields.entity_id, entity_id.get());
        self.writer.delete_term(id_term);
    }

    /// Commit all pending writes and store `last_seq` in the commit payload.
    ///
    /// This is the durability boundary: after `commit()` returns, all indexed
    /// documents are persisted and become visible to searchers obtained after
    /// the `IndexReader` reloads.
    ///
    /// The `last_seq` is stored in the Tantivy commit payload via
    /// `PreparedCommit::set_payload()`. On crash recovery, read the last commit
    /// payload to find the resume point.
    ///
    /// # Errors
    /// Returns `TidalError::Internal` if preparing or finalizing the commit fails.
    pub fn commit(&mut self, last_seq: u64) -> crate::Result<()> {
        // The payload is set on the prepared commit, not on the IndexWriter itself
        let mut prepared = self
            .writer
            .prepare_commit()
            .map_err(|e| TidalError::Internal(format!("tantivy prepare_commit: {e}")))?;
        prepared.set_payload(&last_seq.to_string());
        prepared
            .commit()
            .map_err(|e| TidalError::Internal(format!("tantivy commit: {e}")))?;
        Ok(())
    }

    /// Read the last committed sequence number from the Tantivy index payload.
    ///
    /// Returns 0 if no commit payload exists (fresh index or first run).
    pub fn last_committed_seq(index: &tantivy::Index) -> u64 {
        index
            .load_metas()
            .ok()
            .and_then(|meta| meta.payload)
            .and_then(|p| p.parse::<u64>().ok())
            .unwrap_or(0)
    }
}
```

### Integration with TidalDb

Wire `index_item` calls into `TidalDb::write_item_with_metadata()` and `write_item()`.
The text index should be updated **after** the entity store write succeeds (DB-primary consistency: entity store wins, Tantivy is derived).

For m5p1, keep the index update synchronous: call the text index directly after each entity store write. Task-03 (Background Syncer) replaces this synchronous call with an async outbox pattern.

### EntityId fast field access

`EntityId` must expose its inner `u64` value. Check whether `EntityId::get()` exists; if not, add it:

```rust
impl EntityId {
    pub fn get(&self) -> u64 {
        self.0 // or whatever the inner field is
    }
}
```

## Acceptance Criteria

- [ ] `TextIndexWriter::index_item(entity_id, metadata)` builds a Tantivy document with `entity_id` fast field + all matching text fields
- [ ] Unknown metadata keys (not declared as text fields) are silently ignored
- [ ] `delete_item(entity_id)` issues a `delete_term` on the `entity_id` fast field
- [ ] `index_item` does delete-then-add (same batch): updating an item does not leave orphan documents
- [ ] `commit(last_seq)` calls `set_payload(&last_seq.to_string())` before `commit()`
- [ ] `TextIndexWriter::last_committed_seq(index)` reads payload from last commit; returns 0 on fresh index
- [ ] `TextIndex::writer_guard()` acquires the mutex and returns `TextIndexWriter`
- [ ] Unit tests: `index_and_search`, `delete_removes_document`, `update_replaces_document`, `commit_stores_sequence`, `last_committed_seq_returns_zero_fresh`, `last_committed_seq_returns_stored_value`
- [ ] `cargo check`, `cargo fmt`, `cargo clippy -- -D warnings` all pass

## Test Strategy

```rust
#[test]
fn index_and_search() {
    let fields = vec![
        TextFieldDef { key: "title".into(), field_type: TextFieldType::Text },
    ];
    let idx = TextIndex::ephemeral(&fields).unwrap();
    let mut w = idx.writer_guard().unwrap();

    let mut meta = HashMap::new();
    meta.insert("title".into(), "Rust programming language".into());
    w.index_item(EntityId::new(42), &meta).unwrap();
    w.commit(1).unwrap();

    // Searcher should find item 42 for query "Rust"
    idx.reader.reload().unwrap(); // force reader refresh in test
    let searcher = idx.reader.searcher();
    // ... assert item found
}

#[test]
fn delete_removes_document() {
    // Write, commit, delete, commit, verify not found
}

#[test]
fn commit_stores_sequence() {
    let idx = TextIndex::ephemeral(&[]).unwrap(); // no text fields, just entity_id
    {
        let mut w = idx.writer_guard().unwrap();
        // Index an item that carries only the entity_id field, then commit with seq = 42
        w.index_item(EntityId::new(1), &HashMap::new()).unwrap();
        w.commit(42).unwrap();
    } // writer guard released here

    let seq = TextIndexWriter::last_committed_seq(&idx.index);
    assert_eq!(seq, 42);
}
```
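
The batch semantics that `update_replaces_document` and the sequence tests rely on can be modeled without Tantivy. The following is a minimal, stdlib-only sketch, not the real implementation: `ToyIndex`, `Op`, `count_for`, and `field` are hypothetical names introduced for illustration, `entity_id` is a plain `u64`, and the model assumes that buffered operations are applied in order at commit time, so a delete only affects documents added before it.

```rust
use std::collections::HashMap;

/// One buffered operation, applied in order at commit time.
enum Op {
    Delete(u64),
    Add(u64, HashMap<String, String>),
}

/// Toy stand-in for the text index (hypothetical, not a Tantivy type).
#[derive(Default)]
pub struct ToyIndex {
    pending: Vec<Op>,
    /// Committed documents as (entity_id, fields). Nothing prevents duplicates,
    /// which is why `index_item` must delete before it adds.
    docs: Vec<(u64, HashMap<String, String>)>,
    /// Commit payload: the last sequence number as a decimal string.
    payload: Option<String>,
}

impl ToyIndex {
    /// Delete-then-add in the same batch, mirroring `index_item`.
    pub fn index_item(&mut self, entity_id: u64, metadata: &HashMap<String, String>) {
        self.pending.push(Op::Delete(entity_id));
        self.pending.push(Op::Add(entity_id, metadata.clone()));
    }

    /// Buffer a delete; it takes effect on the next commit.
    pub fn delete_item(&mut self, entity_id: u64) {
        self.pending.push(Op::Delete(entity_id));
    }

    /// Apply all buffered operations in order, then record the payload.
    pub fn commit(&mut self, last_seq: u64) {
        for op in self.pending.drain(..) {
            match op {
                Op::Delete(id) => self.docs.retain(|(doc_id, _)| *doc_id != id),
                Op::Add(id, fields) => self.docs.push((id, fields)),
            }
        }
        self.payload = Some(last_seq.to_string());
    }

    /// Parse the payload back; 0 when missing or unparsable.
    pub fn last_committed_seq(&self) -> u64 {
        self.payload
            .as_deref()
            .and_then(|p| p.parse::<u64>().ok())
            .unwrap_or(0)
    }

    /// Number of committed documents for an entity (should never exceed 1).
    pub fn count_for(&self, entity_id: u64) -> usize {
        self.docs.iter().filter(|(id, _)| *id == entity_id).count()
    }

    /// Look up one field of the first committed document for an entity.
    pub fn field(&self, entity_id: u64, key: &str) -> Option<&str> {
        self.docs
            .iter()
            .find(|(id, _)| *id == entity_id)
            .and_then(|(_, fields)| fields.get(key))
            .map(String::as_str)
    }
}
```

In this model, indexing the same `entity_id` twice across two commits leaves exactly one document holding the newer field values, which is the property `update_replaces_document` should assert against the real index. The payload round-trip also shows why a fresh index reports sequence 0.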