---
title: "Negative signals are equal citizens"
date: "2026-02-21"
author: "Jordan Washburn"
description: "A skip is not the absence of a like. It is data. tidalDB treats negative signals with the same precision and immediacy as positive ones: typed, timestamped, decay-aware, and wired into the ranking pipeline."
tags: ["signals", "ranking", "rust"]
---

Open your recommendation system's codebase. Search for "not interested." You will find one of two things.

The first: a `not_interested` table in PostgreSQL. A user ID, an item ID, a timestamp. When the ranking service builds a candidate list, it left-joins against this table and filters out matches. The join adds 12ms to every query. Nobody has indexed it properly because the table was added during an incident in 2023 and the person who wrote it left the company. The table has 800 million rows and nobody knows the retention policy.

The second: nothing. The "not interested" button writes to an analytics pipeline. A Kafka topic. A data lake. Somewhere that a machine learning team will eventually use to retrain a model. The model retrains weekly. For the next seven days, the user keeps seeing the content they explicitly rejected.

Both approaches share the same root assumption: negative feedback is a second-class citizen. Something to be handled separately, in a different system, with different latency characteristics, by a different team. Positive signals flow through the hot path. Negative signals flow through the cold path. The architecture encodes a hierarchy that does not exist in the data.

A skip is not the absence of a like. It is data. It tells you something specific, with a timestamp, about a user's relationship to a piece of content. It deserves the same infrastructure.

## The taxonomy of no

Not all negative signals mean the same thing. The conventional approach collapses them into a single bucket -- "negative feedback" -- and applies a uniform penalty or exclusion. This is wrong. The signals have different semantics, different half-lives, and different effects on ranking.

tidalDB distinguishes four hard-negative signal types:

```rust
// From tidal/src/entities/hard_neg.rs

pub const HARD_NEG_SIGNALS: &[&str] = &["skip", "hide", "dislike", "block"];
```

Each one means something different:

**Skip** -- the user saw the content and moved past it quickly. This is the weakest negative. It might mean "not now" rather than "not ever." It decays fast -- a 1-day half-life in the schema. A skip from last week should carry less weight than a skip from an hour ago.

**Dislike** -- the user explicitly pressed a button to register disapproval. Stronger than a skip. Slower decay. The content should rank lower for this user, but it is not permanently excluded.

**Hide** -- "don't show me this again." This is a permanent exclusion for this specific item. No decay. The item is removed from all future query results for this user.

**Block** -- "don't show me anything from this creator." The strongest negative. It removes every item from the blocked creator from all future queries. It is a relationship-level exclusion, not an item-level one.

These are declared in the schema alongside positive signals. Same infrastructure. Same declaration syntax. Different decay rates:

```rust
// Schema declaration for negative signals

let mut builder = SchemaBuilder::new();

// Positive signals.
let _ = builder.signal("view", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(7 * 24 * 3600), // 7-day half-life
}).windows(&[Window::OneHour, Window::TwentyFourHours])
  .velocity(true)
  .add();

let _ = builder.signal("like", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(14 * 24 * 3600), // 14-day half-life
}).windows(&[Window::AllTime])
  .velocity(false)
  .add();

// Negative signals: same declaration, different decay.
let _ = builder.signal("skip", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(24 * 3600), // 1-day half-life (fast decay)
}).windows(&[Window::OneHour, Window::TwentyFourHours])
  .velocity(true)
  .add();

let _ = builder.signal("dislike", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(7 * 24 * 3600), // 7-day half-life
}).windows(&[Window::AllTime])
  .velocity(false)
  .add();

let _ = builder.signal("hide", EntityKind::Item, DecaySpec::Permanent)
    .add();

let _ = builder.signal("block", EntityKind::Item, DecaySpec::Permanent)
    .add();

let schema = builder.build().expect("valid schema");
```

The `Permanent` decay model means the signal never decays. Its weight is fixed. A hide from six months ago carries the same weight as a hide from today. That is the correct behavior -- the user said "never show me this" and meant it.

The `Exponential` decay on `skip` means something different. A skip decays with a 1-day half-life. By tomorrow, its contribution to the score is halved. By next week, it is negligible. This matches the semantics: "not now" is not "not ever."

The decay infrastructure is identical to what positive signals use. Same `HotSignalState` struct. Same O(1) forward-decay formula:

```
S(t) = S(t_prev) * exp(-lambda * dt) + weight
```

Same cache-line-aligned atomic CAS updates. Same windowed counters in the warm tier. The signal ledger does not distinguish between positive and negative signals. It records events. The semantics live in the schema declaration and the ranking profile.

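To make the formula concrete, here is a minimal, standalone sketch of forward decay. It is not tidalDB's `HotSignalState`: the `DecayedScore` struct, its fields, and the seconds-as-`f64` clock are assumptions made for illustration, and it uses `&mut self` where the real structure uses atomics. The only facts taken from the text above are the update formula and the half-lives; `lambda = ln(2) / half_life` is the standard relation for exponential decay.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a decayed signal accumulator.
#[derive(Debug, Clone, Copy)]
struct DecayedScore {
    value: f64,            // S(t_prev)
    last_update_secs: f64, // t_prev, in seconds
    lambda: f64,           // decay rate per second
}

impl DecayedScore {
    fn new(half_life: Duration) -> Self {
        Self {
            value: 0.0,
            last_update_secs: 0.0,
            lambda: std::f64::consts::LN_2 / half_life.as_secs_f64(),
        }
    }

    /// S(t) = S(t_prev) * exp(-lambda * dt) + weight
    fn record(&mut self, now_secs: f64, weight: f64) {
        let dt = (now_secs - self.last_update_secs).max(0.0);
        self.value = self.value * (-self.lambda * dt).exp() + weight;
        self.last_update_secs = now_secs;
    }

    /// Read the decayed score at `now_secs` without recording an event.
    fn read(&self, now_secs: f64) -> f64 {
        let dt = (now_secs - self.last_update_secs).max(0.0);
        self.value * (-self.lambda * dt).exp()
    }
}

fn main() {
    const DAY: f64 = 24.0 * 3600.0;
    let mut skip = DecayedScore::new(Duration::from_secs(24 * 3600));
    let mut dislike = DecayedScore::new(Duration::from_secs(7 * 24 * 3600));
    skip.record(0.0, 1.0);
    dislike.record(0.0, 1.0);

    println!("skip after 1 day:     {:.4}", skip.read(DAY));
    println!("skip after 7 days:    {:.4}", skip.read(7.0 * DAY));
    println!("dislike after 7 days: {:.4}", dislike.read(7.0 * DAY));
}
```

With equal weights recorded at time zero, the skip reads about 0.5 after one day and under 0.01 after a week, while the dislike still reads about 0.5 after a week. Same formula, different `lambda`.
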
## The write path

When a signal arrives with user context, the database classifies it and dispatches side effects:

```rust
// Record the base signal (item ledger, WAL, windowed counters).
self.signal(signal_type, entity_id, weight, timestamp)?;

if let Some(user_id) = for_user {
    // 1. Hard negatives: skip/hide/dislike/block -> exclusion bitmap.
    if HardNegIndex::is_hard_neg_signal(signal_type) {
        self.hard_negatives.add(user_id, item_u32);
    }

    // 2. Seen tracking.
    self.user_state.mark_seen(user_id, item_u32);

    // 3. Interaction weight.
    if let Some(cid) = creator_id {
        self.interaction_ledger
            .record(user_id, cid, weight, timestamp.as_nanos());
    }

    // 4. Preference vector (positive signals only).
    if is_positive_engagement_signal(signal_type) {
        self.try_update_preference_vector(user_id, entity_id);
    }
}
```

The dispatch branches on signal type at step 1. When the signal is a hard negative (`skip`, `hide`, `dislike`, `block`), the item is added to the user's hard-negative bitmap. This is a `RoaringBitmap` keyed by user ID, stored in the `HardNegIndex`:

```rust
// From tidal/src/entities/hard_neg.rs

pub struct HardNegIndex {
    inner: DashMap<u64, RoaringBitmap>,
}

impl HardNegIndex {
    pub fn add(&self, user_id: u64, item_id: u32) {
        self.inner.entry(user_id).or_default().insert(item_id);
    }

    pub fn is_negative(&self, user_id: u64, item_id: u32) -> bool {
        self.inner
            .get(&user_id)
            .is_some_and(|bm| bm.contains(item_id))
    }
}
```

Effectively constant-time insertion. Effectively constant-time lookup. The bitmap is per-user, so checking whether a user has rejected a specific item does not scan other users' data.

Notice what step 4 does not do. Negative signals do not update the preference vector. The `is_positive_engagement_signal` function returns `true` only for `like`, `share`, and `completion`:

```rust
// From tidal/src/db/mod.rs

fn is_positive_engagement_signal(signal_type: &str) -> bool {
    matches!(signal_type, "like" | "share" | "completion")
}
```

This is a deliberate asymmetry. Positive signals shift the user's taste toward the content. Negative signals do not shift taste away. The rationale: a skip on a jazz video does not mean the user dislikes jazz. It might mean the thumbnail was bad, or the title was misleading, or they already watched it on another device. Encoding "not this item" is precise. Encoding "not this entire region of embedding space" from a single skip is overcorrection.

The interaction weight, however, updates for both positive and negative signals (step 3). If a user blocks creator A, the interaction weight for that user-creator pair decays. If a user skips three items from creator B in a row, creator B's interaction weight drops. This is the correct granularity -- the user's relationship with the creator weakens, but their general taste profile does not shift.

## The filter path

During a `RETRIEVE` query, the hard-negative bitmap is subtracted from the candidate set in the filter stage. This happens before scoring -- items the user has rejected are never scored, never ranked, never returned.

The query executor wires the hard-negative index into the filter pipeline alongside other user-state exclusions:

```rust
// Remove seen items.
let seen = user_state.seen_bitmap(user_id);
candidates.retain(|id| !seen.contains(id.as_u64() as u32));

// Remove hidden items (explicit "never show me this again").
let hidden = user_state.hidden_items(user_id);
candidates.retain(|id| !hidden.contains(id.as_u64() as u32));

// Remove items from blocked creators.
let blocked_creators = user_state.blocked_creators(user_id);
if !blocked_creators.is_empty() {
    let mut blocked_items = RoaringBitmap::new();
    for &cid in &blocked_creators {
        if let Some(bm) = creator_items.get(cid) {
            blocked_items |= &bm;
        }
    }
    candidates.retain(|id| !blocked_items.contains(id.as_u64() as u32));
}

// Remove hard negatives (skip/hide/dislike/block signals).
let neg_bitmap = hard_neg.bitmap(user_id);
if !neg_bitmap.is_empty() {
    candidates.retain(|id| !neg_bitmap.contains(id.as_u64() as u32));
}
```

Four `retain` passes. Each one is O(n) where n is the candidate count, with an effectively constant-time RoaringBitmap lookup per element. The blocked-creator pass builds a merged bitmap first, then retains once -- so a user with ten blocked creators does not do ten passes. The blocked-creator exclusion is the most powerful: a single block removes every item from that creator in the entire candidate set. Not just the items the user has seen. All of them. Past, present, and future.

This is the difference between "exclude this item" and "exclude this creator." A hide is item-scoped. A block is creator-scoped. The query pipeline handles both, at different granularities, with the same bitmap algebra.

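The merged-bitmap pass can be sketched with the standard library alone. This is an illustration under assumptions, not the executor's code: `HashSet<u32>` stands in for `RoaringBitmap`, and `remove_blocked` and its parameters are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Drops every candidate that belongs to any blocked creator.
/// `HashSet<u32>` stands in for a RoaringBitmap in this sketch.
fn remove_blocked(
    candidates: &mut Vec<u32>,
    blocked_creators: &[u64],
    creator_items: &HashMap<u64, HashSet<u32>>,
) {
    if blocked_creators.is_empty() {
        return;
    }
    // Union the item sets of all blocked creators once...
    let mut blocked_items: HashSet<u32> = HashSet::new();
    for cid in blocked_creators {
        if let Some(items) = creator_items.get(cid) {
            blocked_items.extend(items.iter().copied());
        }
    }
    // ...then make a single pass over the candidates.
    candidates.retain(|id| !blocked_items.contains(id));
}

fn main() {
    let mut creator_items = HashMap::new();
    creator_items.insert(300_u64, HashSet::from([1_u32, 2, 3]));
    creator_items.insert(301_u64, HashSet::from([4_u32]));
    creator_items.insert(302_u64, HashSet::from([5_u32, 6]));

    let mut candidates = vec![1, 4, 5, 2, 6, 7];
    remove_blocked(&mut candidates, &[300, 301], &creator_items);
    println!("{:?}", candidates);
}
```

With creators 300 and 301 blocked, the candidates `[1, 4, 5, 2, 6, 7]` shrink to `[5, 6, 7]` in one `retain` pass, however many creators are on the block list.
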
## Inside the ranking pipeline

Hard negatives are the binary case: the item is excluded entirely. But negative signals also participate in scoring through the ranking profile system. This is where the nuance lives.

The `controversial` built-in profile explicitly reads both positive and negative signal counts:

```rust
// From tidal/src/ranking/executor.rs

fn score_controversial(&self, entity_id: EntityId) -> f64 {
    let pos = read_agg(
        entity_id, "like", &SignalAgg::Value, Window::AllTime, self.ledger,
    );
    let neg = read_agg(
        entity_id, "dislike", &SignalAgg::Value, Window::AllTime, self.ledger,
    );
    controversial_score(pos, neg)
}

/// Controversial: `(pos * neg) / (pos + neg)^2`
fn controversial_score(pos: f64, neg: f64) -> f64 {
    let denom = (pos + neg).powi(2);
    if denom < f64::EPSILON {
        0.0
    } else {
        (pos * neg) / denom
    }
}
```

The formula maximizes the product of positive and negative signals, normalized by the square of total engagement. A post with 1,000 likes and 1,000 dislikes scores higher than one with 1,800 likes and 200 dislikes. The negative signal is not a penalty here. It is half the signal.

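Plugging numbers into the formula shows the shape of the curve. The function below repeats the formula from the excerpt so the sketch compiles on its own; the sample counts are illustrative.

```rust
/// Same formula as the excerpt: (pos * neg) / (pos + neg)^2, with a zero guard.
fn controversial_score(pos: f64, neg: f64) -> f64 {
    let denom = (pos + neg).powi(2);
    if denom < f64::EPSILON {
        0.0
    } else {
        (pos * neg) / denom
    }
}

fn main() {
    // (likes, dislikes) pairs, all but the last with 2,000 total reactions.
    let cases = [(1000.0, 1000.0), (1800.0, 200.0), (2000.0, 0.0), (0.0, 0.0)];
    for (pos, neg) in cases {
        println!(
            "likes={:>6} dislikes={:>6} score={:.3}",
            pos,
            neg,
            controversial_score(pos, neg)
        );
    }
}
```

The 1,000/1,000 split scores 0.25, the maximum the formula can produce. The 1,800/200 split scores 0.09. An item with no dislikes, or no engagement at all, scores 0.
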
This only works because dislikes are first-class signal types with their own decay curves, windowed counts, and velocity. If dislikes were stored in a separate table, or processed through a different pipeline, or only available after a weekly model retrain, this formula would read stale data. The controversial sort would be wrong -- not dramatically wrong, but subtly wrong in the way that makes users distrust the ranking without being able to articulate why.

The `hidden_gems` profile takes the opposite approach. It reads completion rate (a quality signal) and view count (a reach signal), and scores items that have high quality but low reach:

```rust
// From tidal/src/ranking/executor.rs

fn score_hidden_gems(&self, entity_id: EntityId) -> f64 {
    let quality = read_agg(
        entity_id, "completion", &SignalAgg::DecayScore, Window::AllTime, self.ledger,
    );
    let view_count = read_agg(
        entity_id, "view", &SignalAgg::Value, Window::AllTime, self.ledger,
    );
    hidden_gems_score(quality, view_count)
}

/// Hidden gems: `quality / log10(view_count + 10)`
fn hidden_gems_score(quality: f64, view_count: f64) -> f64 {
    quality / (view_count + 10.0).log10()
}
```

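A few sample values show how the logarithm discounts reach. The function is copied from the excerpt so the sketch stands alone; the quality value of 0.8 and the view counts are made up for illustration.

```rust
/// Hidden gems: quality / log10(view_count + 10), as in the excerpt.
fn hidden_gems_score(quality: f64, view_count: f64) -> f64 {
    quality / (view_count + 10.0).log10()
}

fn main() {
    let quality = 0.8; // illustrative value, not a real measurement
    for views in [0.0, 90.0, 99_990.0] {
        println!(
            "views={:>6} score={:.2}",
            views,
            hidden_gems_score(quality, views)
        );
    }
}
```

At equal quality, the item with 0 views scores 0.8, the item with 90 views scores 0.4, and the item with 99,990 views scores 0.16. The `+ 10` keeps the denominator at 1 or above, so an unviewed item never divides by zero.
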
Where do negative signals fit here? Implicitly. An item with a high skip rate has a low completion rate. The skip signal feeds into the completion signal's denominator through user behavior -- people who skip do not complete. The hidden gems formula does not read skip counts directly, but skip velocity is an early warning that completion rate will drop.

A profile author who wants explicit skip penalization can use the `boosts` field with a negative weight on skip velocity. The `RankingProfile` struct also has a `penalties` field (the `Penalty` struct is designed to subtract `weight * agg(signal, window)` from the candidate's score), though applying penalties is on the roadmap -- the current scorer applies boosts only. For now, the pattern is a negative-weighted signal via boosts:

```rust
// Application-defined profile with explicit skip signal boost (negative weight).
let profile = RankingProfile {
    name: "for_you".into(),
    version: 1,
    boosts: vec![
        Boost {
            signal: "like".into(),
            agg: SignalAgg::Value,
            window: Window::AllTime,
            weight: 1.5,
        },
        // Skip velocity as a negative boost: penalizes items with recent skip momentum.
        Boost {
            signal: "skip".into(),
            agg: SignalAgg::Velocity,
            window: Window::OneHour,
            weight: -2.0,
        },
    ],
    // ... other fields
};
```

The negative-weight boost subtracts skip velocity from the raw score. A skip velocity of 0.5 events/second with weight -2.0 subtracts 1.0 from the raw score. This happens before normalization, so the penalty competes directly with boosts from positive signals. The ranking is a single linear combination. Positive and negative signals are terms in the same equation.

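The arithmetic can be sketched as a plain weighted sum. This is a simplification under assumptions: `WeightedTerm` and `raw_score` are hypothetical names, the aggregate values are supplied by hand rather than read from the ledger, and normalization is left out.

```rust
/// Hypothetical, simplified boost: a weight applied to an
/// already-aggregated signal value.
struct WeightedTerm {
    signal: &'static str,
    agg_value: f64,
    weight: f64,
}

/// Raw score as a single linear combination of weighted terms.
fn raw_score(terms: &[WeightedTerm]) -> f64 {
    terms.iter().map(|t| t.weight * t.agg_value).sum()
}

fn main() {
    let terms = [
        // Illustrative like value of 2.0 at weight 1.5.
        WeightedTerm { signal: "like", agg_value: 2.0, weight: 1.5 },
        // Skip velocity of 0.5 events/second at weight -2.0.
        WeightedTerm { signal: "skip", agg_value: 0.5, weight: -2.0 },
    ];
    for t in &terms {
        println!("{:>5}: {:+.2}", t.signal, t.weight * t.agg_value);
    }
    println!("raw score: {:.2}", raw_score(&terms));
}
```

The like term contributes +3.0, the skip term contributes -1.0, and the raw score is 2.0. Nothing in the sum knows which term is "negative"; the sign of the weight carries that.
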
## Three seconds or less

A skip that happens within 3 seconds of an item appearing is a strong quality signal. The user saw the content and immediately moved on. In a swipe-based feed (TikTok, Reels, Shorts), a sub-3-second dwell time is the strongest implicit negative signal available.

The application records this with a weight that reflects the confidence:

```rust
// Application-level signal dispatch for a swipe feed.
let dwell_time = now - impression_time;
if dwell_time < Duration::from_secs(3) {
    // Strong skip: fast rejection.
    db.signal_with_context(
        "skip", item_id, 3.0, Timestamp::now(),
        Some(user_id), Some(creator_id),
    ).expect("signal write");
} else if dwell_time < Duration::from_secs(10) {
    // Weak skip: mild disinterest.
    db.signal_with_context(
        "skip", item_id, 1.0, Timestamp::now(),
        Some(user_id), Some(creator_id),
    ).expect("signal write");
}
// Dwell of 10 seconds or more: not a skip. The user engaged.
```

The weight argument is caller-controlled. A 3.0-weight skip contributes three times as much to the decay score as a 1.0-weight skip. Because the signal ledger applies the same `S(t) = S(prev) * exp(-lambda * dt) + weight` formula regardless of sign or magnitude, this works without any special-casing. The skip signal type has a 1-day half-life, so even a strong skip decays to 1.5 by tomorrow and 0.75 by the day after. The math is the same. The semantics come from the application.

The database does not interpret dwell time. It does not know what "3 seconds" means. It records a signal with a type, a weight, and a timestamp. The application decides what constitutes a skip, what weight to assign, and when to send it. The database stores it with the same fidelity as a like.

## The exclusion guarantee

The strongest claim: hidden items and blocked creators never appear in query results. This is enforced at the bitmap level, before scoring, and verified by integration tests:

```rust
// User blocks creator 300.
db.write_relationship(
    EntityId::new(user_id),
    RelationshipType::Blocks,
    EntityId::new(300),
    1.0, ts,
)?;

// User hides item 5.
db.signal_with_context(
    "hide", EntityId::new(5), 1.0, ts,
    Some(user_id), None,
)?;

// Query with user context.
let query = RetrieveBuilder::new(EntityKind::Item, ProfileRef::new("new"))
    .for_user(user_id)
    .limit(20)
    .build()?;
let results = db.retrieve(&query)?;

// Items from blocked creators and hidden items are not present.
// The block removes all items from creator 300. The hide removes item 5.
// Both exclusions take effect before scoring; they are never ranked.
```

There is no delay. The block write updates the user-state bitmap. The next query reads the bitmap. The blocked creator's items are removed from the candidate set by a retain pass against the merged blocked-creator bitmap. They are not scored. They are not ranked. They do not exist in the query's universe.

Blocks are durable. They are written as relationship edges in the users keyspace and survive process restarts. On startup, `rebuild_entity_state` scans all relationship edges and reconstructs the in-memory bitmaps:

```rust
// From tidal/src/db/mod.rs, rebuild_entity_state()

match rel_type {
    RelationshipType::Blocks => {
        user_state.add_block_creator(from_id_u64, to_id.as_u64());
    }
    RelationshipType::Hide => {
        user_state.add_hide(from_id_u64, to_id.as_u64() as u32);
    }
    RelationshipType::Follows => {
        user_state.add_follow(from_id_u64, to_id.as_u64());
    }
    // ...
}
```

Blocks and hides survive crashes. Seen-from-views state does not (intentionally -- users should rediscover content after a restart). The durability boundary is explicit: permanent exclusions are durable. Temporary impressions are not. This is a design choice, not a limitation.

## Why it matters

The 6-system stack treats negative signals as an afterthought because the architecture makes them expensive. Adding a "not interested" button means:

1. A new event type in Kafka.
2. A new consumer that writes to a new PostgreSQL table.
3. A join in the ranking service that reads from that table.
4. A cache layer because the join is too slow at scale.
5. A cache invalidation bug that shows hidden content for 5 minutes after the user hides it.
6. A separate table for blocks, with a different consumer, a different join, and a different cache.

Each negative signal type requires its own pipeline. Each pipeline has its own latency characteristics. Each has its own failure modes. The architecture fights you because it was not designed for this.

tidalDB's architecture does not fight you because negative signals use the same infrastructure as positive signals. The `signal` method records an event. The `HardNegIndex` classifies it. The `RoaringBitmap` enforces it. The `RankingProfile` scores it. There is no separate pipeline. There is no cache to invalidate. There is no consumer to lag. The signal is data. The data flows through the same path.

When the next product manager asks "can we add a 'show less like this' button?", the answer is a schema declaration, a signal type, and a decay rate. Not a quarter of engineering work to build another pipeline.

---

*The hard-negative index is at [tidal/src/entities/hard_neg.rs](https://github.com/orchard9/tidalDB/blob/main/tidal/src/entities/hard_neg.rs). The signal dispatch is at [tidal/src/db/mod.rs](https://github.com/orchard9/tidalDB/blob/main/tidal/src/db/mod.rs). The ranking profiles are at [tidal/src/ranking/builtins.rs](https://github.com/orchard9/tidalDB/blob/main/tidal/src/ranking/builtins.rs) and the scoring formulas are at [tidal/src/ranking/executor.rs](https://github.com/orchard9/tidalDB/blob/main/tidal/src/ranking/executor.rs). Follow the build on [GitHub](https://github.com/orchard9/tidalDB).*