tidaldb/USE_CASES.md
jordan 413b712c0a chore: initialize tidalDB repository with schema foundation and standards
- Schema phase 1 (tasks 01-02): EntityId, EntityKind, Timestamp, Score, SignalTypeDef, DecayModel, Window, WindowSet — all with property tests and benchmarks scaffolding
- Stub modules for storage, signals, query, ranking
- Full documentation suite: VISION, USE_CASES, SEQUENCE, API, CODING_GUIDELINES, ai-lookup, research docs, specs, roadmap, planning docs
- Marketing site (Next.js) with blog infrastructure
- .claude/ agents and skills for the tidalDB development workflow
- Foundation standards enforced: thiserror + tracing declared as dependencies, clippy::unwrap_used = deny added to lint config
- .gitignore hardened: .next/, node_modules/, .env, secrets, logs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 12:52:20 -07:00

# Use Cases
Each use case describes a real surface, the query it requires, how signals flow in, and what "correct" looks like. These are the scenarios the database must handle natively and efficiently.
A note on scope: this document is exhaustive by design. Every filtering mode, sort order, and discovery pattern listed here is something a real user has wanted on YouTube, Twitter, Reddit, Pinterest, Netflix, Spotify, or a media library. tidalDB must support all of them without the application building custom ranking logic.
A critical addition to this document is UC-15: Cohort-Scoped Trending. This addresses the requirement that trending, rising, and quality signals must be sliceable by audience segment — not just globally or by category, but by demographic, behavioral, and interest-based cohorts. This capability underpins advertiser-facing trend reports, localized discovery, and the "trending for people like you" surface.
---
## Table of Contents
- [UC-01 · Personalized Feed — For You](#uc-01--personalized-feed--for-you)
- [UC-02 · Search](#uc-02--search)
- [UC-03 · Trending and Rising](#uc-03--trending-and-rising)
- [UC-04 · Following / Subscription Feed](#uc-04--following--subscription-feed)
- [UC-05 · Related Content / Up Next](#uc-05--related-content--up-next)
- [UC-06 · Browse and Category Discovery](#uc-06--browse-and-category-discovery)
- [UC-07 · Notification Prioritization](#uc-07--notification-prioritization)
- [UC-08 · Creator Profile Page](#uc-08--creator-profile-page)
- [UC-09 · User Library and Personal Collections](#uc-09--user-library-and-personal-collections)
- [UC-10 · People and Creator Search](#uc-10--people-and-creator-search)
- [UC-11 · Visual and Semantic Search](#uc-11--visual-and-semantic-search)
- [UC-12 · Live and Scheduled Content](#uc-12--live-and-scheduled-content)
- [UC-13 · Hidden Gems and Breakout Detection](#uc-13--hidden-gems-and-breakout-detection)
- [UC-14 · Controversial and Hot Surfaces](#uc-14--controversial-and-hot-surfaces)
- [UC-15 · Cohort-Scoped Trending](#uc-15--cohort-scoped-trending)
- [Appendix A · Filter Reference](#appendix-a--filter-reference)
- [Appendix B · Sort Mode Reference](#appendix-b--sort-mode-reference)
- [Appendix C · Signal Reference](#appendix-c--signal-reference)
---
## UC-01 · Personalized Feed — For You
**Surface:** The primary content feed. YouTube home, TikTok FYP, Instagram feed, Twitter For You, Reddit home.
**The Question:** Given user U right now, what content should they see that they haven't seen, from creators they'll enjoy, with healthy format and topic diversity?
**Signals Required:**
- User implicit preference vector (built continuously from history)
- Item engagement velocity — views, completions, shares in the last 24h
- User→creator interaction weight (have they engaged with this creator before? how much?)
- Social proof — did people the user follows engage with this?
- Negative signals — skips, hides, mutes, "not interested" taps
- Recency decay on item age
- Completion rate as a quality gate (do not surface content people abandon)
- Format affinity — does this user prefer short-form, long-form, articles, images?
**Ranking Profile:** `for_you` — preference match and social proof weighted heavily, moderate recency decay, completion rate as quality floor, skip signals as strong negative.
**Diversity Constraints:** max 2 items per creator per batch, format mix enforced, minimum 10% exploration budget (creators the user does not follow).
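The diversity constraints above can be applied as a greedy pass over an already-ranked candidate list. This is a minimal Python illustration, not tidalDB's API. `Candidate`, `diversify`, and the `explore_pct` parameter are hypothetical names, and format-mix enforcement is left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    creator_id: str
    score: float
    followed: bool  # True if the user follows this item's creator


def diversify(ranked, batch_size, max_per_creator=2, explore_pct=10):
    """Greedy diversity pass over candidates already sorted by score, best first.

    Enforces the per-creator cap and holds back enough slots so that at least
    `explore_pct` percent of the batch comes from creators the user does not
    follow. If too few eligible such candidates exist, the batch may come back short.
    """
    min_explore = (batch_size * explore_pct + 99) // 100  # ceiling division
    per_creator = {}
    batch = []
    explored = 0
    for c in ranked:
        if len(batch) == batch_size:
            break
        if per_creator.get(c.creator_id, 0) >= max_per_creator:
            continue  # creator already has its quota in this batch
        slots_left = batch_size - len(batch)
        explore_needed = max(0, min_explore - explored)
        if c.followed and slots_left <= explore_needed:
            continue  # remaining slots are reserved for exploration
        batch.append(c)
        per_creator[c.creator_id] = per_creator.get(c.creator_id, 0) + 1
        if not c.followed:
            explored += 1
    return batch
```

Followed items are only blocked once the remaining space is needed for exploration, so the highest-scoring items still lead the batch while the exploration floor is met whenever enough eligible unfollowed candidates exist.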
**Feedback Written Back:**
- Viewed → increment view signal, update user→item relationship
- Completed → strong positive on completion signal, boost creator weight
- Skipped in under 3 seconds → strong skip signal, decay creator weight
- Liked or shared → strong positive across all signals
- "Not interested" → permanent negative on this item, decay topic weight
- "Don't recommend this creator" → hard suppress, equivalent to soft block for ranking
**What Correct Looks Like:** Feels like it knows the user without being a hall of mirrors. Some expected, some surprising. No creator dominating. Nothing seen this week resurfaced.
---
## UC-02 · Search
Search is the most complex surface because it has the most dimensions. Every sub-feature listed here is something real users rely on daily.
### 2.1 · Full-Text Keyword Search
**The Question:** User typed a query — return the most relevant results, ranked for this user specifically.
**Query Capabilities:**
- Basic keyword match: `jazz piano tutorial`
- Exact phrase match: `"jazz piano"` returns only items containing that exact sequence
- Boolean operators: `jazz AND piano NOT beginner`, `(jazz OR blues) piano`
- Exclusion: `-beginner`, `NOT beginner` — never show items matching this term
- Wildcard: `jazz pian*` matches piano, pianist, pianos
- Field-scoped search: `title:jazz`, `tag:tutorial`, `creator:username`
- Hashtag search: `#jazz` matches tagged items directly
- Minimum engagement filter inline: `jazz piano min_views:10000`
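To make the syntax concrete, here is a minimal tokenizer sketch for a subset of it. `parse_query` and its output dictionary are hypothetical. Boolean operators and parenthesized grouping are deliberately not parsed, so `AND`, `OR`, and `NOT` come back as plain terms.

```python
import re

# A token is either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')
SCOPED_FIELDS = {"title", "tag", "creator"}


def parse_query(q):
    """Split a search string into the parts a query planner would need.

    Handles phrases, -exclusion, #hashtags, field:value scoping, trailing *
    wildcards, and inline min_*:N filters. Boolean operators are not handled.
    """
    parsed = {"terms": [], "phrases": [], "exclude": [], "hashtags": [],
              "prefixes": [], "fields": {}, "filters": {}}
    for phrase, word in TOKEN.findall(q):
        if phrase:
            parsed["phrases"].append(phrase)
        elif word.startswith("-") and len(word) > 1:
            parsed["exclude"].append(word[1:])
        elif word.startswith("#") and len(word) > 1:
            parsed["hashtags"].append(word[1:])
        elif ":" in word:
            key, _, value = word.partition(":")
            if key.startswith("min_") and value.isdigit():
                parsed["filters"][key] = int(value)
            elif key in SCOPED_FIELDS and value:
                parsed["fields"].setdefault(key, []).append(value)
            else:
                parsed["terms"].append(word)
        elif word.endswith("*") and len(word) > 1:
            parsed["prefixes"].append(word[:-1])
        else:
            parsed["terms"].append(word)
    return parsed
```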
**Signals Required:**
- BM25 text relevance score (inverted index)
- Semantic similarity — query embedding vs item embedding (catches "intro to jazz" matching "jazz for beginners")
- User topic engagement history — a beginner gets beginner content elevated
- Item quality signals — completion rate, like ratio as secondary ranking
- Recency curve — configurable per content type (news decays fast, tutorials decay slowly)
**Ranking Profile:** `search`, with text relevance as the floor and personalization adjusting rank only above it. An irrelevant result never surfaces just because the user likes the creator.
**Diversity Constraints:** max 2 results per creator in the first 10 results.
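The `search` profile's rule of a relevance floor first and personalization second can be sketched with standard Okapi BM25. The floor value, the multiplicative affinity blend, and the function names are assumptions for illustration. Semantic similarity and quality signals are omitted.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document against the query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def search_rank(query_terms, docs, affinity, floor=0.5, weight=0.3):
    """Relevance floor first, personalization second.

    Documents below `floor` are dropped no matter how much the user likes
    them; survivors are reordered by relevance scaled by user affinity (0..1).
    """
    rel = bm25_scores(query_terms, docs)
    kept = [i for i, r in enumerate(rel) if r >= floor]
    return sorted(kept, key=lambda i: rel[i] * (1 + weight * affinity[i]), reverse=True)
```

A document with zero keyword overlap scores 0 and is filtered out even when the user's affinity for it is maximal.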
**Feedback Written Back:**
- Click at rank N → positive relevance signal, trains query→item affinity
- Immediate back-navigation → negative signal (irrelevant or low quality)
- Long dwell after click → strong positive on item and creator for this topic
- Zero clicks in a session → weak signal that results were poor for this query
### 2.2 · Advanced Search Filters
These are the filters users expect to be able to combine freely. All must be composable — any filter can be combined with any other. See Appendix A for the complete filter reference.
**By Date / Recency:**
- Presets: last hour, today, this week, this month, this year
- Custom range: `uploaded_after:2024-01-01 uploaded_before:2024-06-01`
- Relative: `uploaded_within:30d`
**By Duration:**
- Presets: short (under 4 minutes), medium (4 to 20 minutes), long (over 20 minutes)
- Custom range: `duration_min:5m duration_max:15m`
**By Format / Content Type:**
- Video, short-form video, live stream, VOD of past live, podcast episode, article, image, image gallery, audio-only, interactive
**By Quality / Technical Specs:**
- Resolution: SD, HD, Full HD, 4K, 8K
- HDR, Dolby Vision, Dolby Atmos, spatial audio
- Subtitles/captions available, audio description available, sign language version available
- Offline/download available
**By Language:**
- Content language, subtitle language available, dubbed version available in language X, original language only
**By Content Rating / Maturity:**
- G, PG, PG-13, R, etc.
- Safe search toggle, age-gated content filter, sensitive topics toggle
**By Creator Attributes:**
- Verified only, minimum follower count, creators the user follows only, creators new to the user, exclude a specific creator, search within one creator's catalog
**By Engagement Thresholds:**
- Minimum views, likes, like ratio, comments — lets users filter to proven content
**By Location / Geography:**
- Content created in a region, content about a location, trending in a region
**By Status / Availability:**
- Live right now, premiering soon, subscriber-only, free only, leaving platform soon, downloadable
**By Community Signals (Reddit / forum-style):**
- Flair filter, awarded/gilded only, minimum score, specific community, post type (text/link/image/video/poll), original only (exclude crossposts)
**By Seen State:**
- Unseen only, already seen (user wants to find something they watched before), in progress, saved/bookmarked
### 2.3 · Search Suggestions and Autocomplete
- Autocomplete on partial query (prefix match on popular queries)
- Trending searches in empty search bar
- Personalized suggestions based on search and watch history
- Creator name autocomplete, hashtag autocomplete
- "Did you mean" typo correction on submitted query
- Related query suggestions below results ("People also search for")
### 2.4 · Saved Searches and Alerts
- User saves a search query — gets a feed of new results matching it over time
- Alert when new content matching a saved search is published
- Search history (personal, clearable)
- Quick access to recent searches
### 2.5 · Search Within Scoped Results (Query Composition)
Search can be composed with other retrieval modes. The application specifies a retrieval scope, and search operates within that candidate set:
- **Search within trending:** "jazz piano" within globally trending items
- **Search within cohort trending:** "jazz piano" within items trending for US users aged 18-24
- **Search within following:** "jazz piano" within items from followed creators
- **Search within category:** "jazz piano" within the Jazz category (this already works via filters, but the composition model generalizes it)
Query composition means SEARCH and RETRIEVE are not separate operations — they can be layered. The database handles the intersection efficiently using its query planner.
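One way a planner might execute a scoped search is to intersect the retrieval scope with the search hits, probing from whichever side is smaller. This common heuristic is shown under that assumption. `scoped_search` is a hypothetical name, and tidalDB's actual planner is not described here.

```python
def scoped_search(scope, hits):
    """Restrict search hits to a retrieval scope (e.g. trending, following, a cohort).

    scope: set of candidate item ids produced by the retrieval mode
    hits:  dict of item id -> text relevance for the query
    Probes from whichever side is smaller, then orders by relevance.
    """
    if len(scope) <= len(hits):
        matched = [i for i in scope if i in hits]
    else:
        matched = [i for i in hits if i in scope]
    return sorted(matched, key=lambda i: hits[i], reverse=True)
```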
---
## UC-03 · Trending and Rising
### 3.1 · Trending
**Surface:** Trending tab, Explore page, "What's happening" sidebar.
**The Question:** What content is gaining real traction right now — not what has the most views historically?
**Signals Required:**
- Share velocity — rate of new shares (strongest trending signal)
- View velocity — rate of new views, not total views
- New-user reach — percentage of viewers new to this creator (measures virality, not fanbase loyalty)
- Engagement ratio — (likes + comments + shares) / views — filters clickbait
- Comment velocity — discussions erupting quickly signal cultural relevance
**Ranking Profile:** `trending` — velocity signals only, windowed 1h/6h/24h. No personalization. Total view count is explicitly not a primary signal.
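Here is a minimal sketch of windowed velocity scoring, assuming raw events are available in memory. The 1h/6h/24h windows come from the profile above. The share weight, the averaging across windows, and multiplying by the engagement ratio are illustrative assumptions, and new-user reach and comment velocity are left out.

```python
from dataclasses import dataclass

HOUR = 3600
WINDOWS_H = (1, 6, 24)  # the 1h/6h/24h windows named in the trending profile


@dataclass
class Event:
    item_id: str
    kind: str   # "view", "share", "like", or "comment"
    ts: float   # seconds since epoch


def velocity(events, item_id, kind, now, hours):
    """Events of `kind` per hour for one item over the trailing window."""
    start = now - hours * HOUR
    n = sum(1 for e in events
            if e.item_id == item_id and e.kind == kind and start < e.ts <= now)
    return n / hours


def engagement_ratio(events, item_id, now, hours=24):
    """(likes + comments + shares) / views over the window; 0 with no views."""
    views = velocity(events, item_id, "view", now, hours)
    if views == 0:
        return 0.0
    engaged = sum(velocity(events, item_id, k, now, hours)
                  for k in ("like", "comment", "share"))
    return engaged / views


def trending_score(events, item_id, now, share_weight=5.0):
    """Average windowed velocity (shares weighted up), scaled by engagement ratio."""
    vel = sum(share_weight * velocity(events, item_id, "share", now, h)
              + velocity(events, item_id, "view", now, h)
              for h in WINDOWS_H) / len(WINDOWS_H)
    return vel * engagement_ratio(events, item_id, now)
```

Total view count never enters the score, so an item that piled up views a day ago but has stopped moving ranks below a smaller item gaining shares right now.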
**Scoping (same profile, different candidate sets):**
- Global trending
- Trending in category/genre
- Trending in my language/region
- Trending among people I follow
- Trending within a specific community
- **Trending within a cohort (demographic/behavioral segment)**
- **Search within cohort-scoped trending**
See UC-15 for full cohort-scoped trending specification. Cohort trending uses the same velocity-based ranking profile but scopes signal aggregation to users matching a cohort predicate.
**Diversity Constraints:** max 1 item per creator in top 10.
### 3.2 · Rising (Hot / Breakout)
**The Question:** What new content is overperforming for its age? Rising surfaces content that has not yet reached trending thresholds but is clearly on its way.
This is the Reddit "rising" concept applied broadly.
**Signals Required:**
- Age of content (hard weight — very new content gets a boost)
- Engagement velocity relative to creator's own baseline (a small creator getting 10× their normal engagement is "rising")
- Engagement velocity relative to category baseline
- Share-to-view ratio (high share rate relative to views signals genuine enthusiasm)
**Ranking Profile:** `rising` — age-weighted velocity. A 2-hour-old video with 5k views and a 15% share rate outranks a 2-day-old video with 100k views and a 0.3% share rate.
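The worked example above can be reproduced with a simple age-weighted velocity: shares earned per hour of the item's life. `rising_score` and the optional log-scaled lift against the creator's own baseline are assumptions for illustration, not the defined `rising` formula. The category baseline is not modeled.

```python
import math


def rising_score(views, share_rate, age_hours, baseline_views_per_hour=None):
    """Age-weighted velocity: shares earned per hour of the item's life.

    If the creator's usual views-per-hour is known, the score is scaled by
    log(1 + lift), where lift is how far this item runs above that baseline.
    """
    age = max(age_hours, 0.5)  # floor the age so brand-new items do not divide by ~0
    score = views * share_rate / age
    if baseline_views_per_hour:
        lift = (views / age) / baseline_views_per_hour  # 10.0 means 10x the creator's norm
        score *= math.log1p(lift)
    return score
```

With the inputs from the example, the 2-hour-old video earns about 375 shares per hour of age against about 6.25 for the 2-day-old one.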
---
## UC-04 · Following / Subscription Feed
**Surface:** YouTube Subscriptions tab, Twitter Following tab, Substack inbox, Twitch following list.
**The Question:** Show me everything from creators I follow, in the right order, with nothing missing.
**Signals Required:**
- Relationship: user follows creator (hard filter — only followed creators)
- Recency: primary sort signal
- Light quality gate: completion rate as tiebreaker within same-minute posts
- Seen flag: filter already-seen items (optional — some users want all posts regardless)
**Ranking Profile:** `following` — recency-dominant, minimal algorithmic intervention. This is the surface where users feel most strongly that the algorithm should stay out of the way.
**Diversity Constraints:** None by default. If a followed creator posts 10 times in one day, show all 10.
**Modes:**
- Chronological (pure reverse time order)
- Chronological with quality tiebreaker (same timestamp → prefer higher quality)
- Algorithmic following (light ranking — surfaces the most engaging posts from follows first, for users who follow too many to consume everything)
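The two chronological modes differ only in their sort key. Here is a minimal sketch, with `Post` and `following_feed` as hypothetical names. Python's sort is stable even with `reverse=True`, so same-minute posts keep their input order when no tiebreaker is requested.

```python
from dataclasses import dataclass


@dataclass
class Post:
    item_id: str
    published_minute: int    # publish time truncated to the minute
    completion_rate: float   # 0..1, used only as a tiebreaker


def following_feed(posts, quality_tiebreak=False):
    """Reverse-chronological order of every post; nothing is filtered out.

    With quality_tiebreak, posts from the same minute are ordered by
    completion rate; otherwise they keep their input order.
    """
    def key(p):
        if quality_tiebreak:
            return (p.published_minute, p.completion_rate)
        return (p.published_minute,)

    return sorted(posts, key=key, reverse=True)
```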
**What Correct Looks Like:** Nothing missing. Nothing reordered dramatically. The user trusts this surface because it reflects their explicit choices.
---
## UC-05 · Related Content / Up Next
**Surface:** YouTube right rail, Spotify Radio, Netflix "More Like This," Pinterest related pins, end-screen recommendations.
**The Question:** Given what the user just consumed, what is the natural next thing?
**Signals Required:**
- Semantic similarity — embedding distance between source item and candidates
- Collaborative filtering — users who engaged with item A also engaged with item B
- User preference match — semantically similar AND matches this user's taste
- Content journey awareness — a user who just watched beginner content should see intermediate next, not more beginner
- Quality gate — completion rate (do not autoplay bad content)
- Novelty — do not recommend items the user has already seen
**Ranking Profile:** `related` — semantic similarity as primary retrieval signal, collaborative filtering as secondary boost, user preference as personalization layer.
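Here is a minimal sketch of the `related` profile's layering, assuming precomputed source-to-candidate similarities and per-user engagement histories. Co-engagement counting stands in for collaborative filtering. The `cf_weight` blend and the function names are assumptions, and content-journey awareness is not modeled.

```python
from collections import Counter


def co_engagement(histories, source):
    """For each other item, count users who engaged with both it and `source`."""
    counts = Counter()
    for items in histories.values():
        if source in items:
            counts.update(i for i in items if i != source)
    return counts


def related(source, similarity, histories, seen, cf_weight=0.1, k=3):
    """Semantic similarity first, co-engagement as a boost, seen items removed.

    similarity: dict of candidate id -> embedding similarity to `source`
    histories:  dict of user id -> set of item ids that user engaged with
    """
    cf = co_engagement(histories, source)
    candidates = [i for i in similarity if i != source and i not in seen]
    ranked = sorted(candidates, key=lambda i: similarity[i] + cf_weight * cf[i], reverse=True)
    return ranked[:k]
```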
**Diversity Constraints:** avoid same creator as source item in first 3 results (unless on a creator profile page).
**Feedback Written Back:**
- Autoplay accepted → strong positive on source→target pairing
- Autoplay skipped under 10 seconds → negative on pairing
- Manual click on sidebar → weaker positive than autoplay accept
- Saved to watch later → strong positive
---
## UC-06 · Browse and Category Discovery
**Surface:** Genre pages, mood pages, topic pages, aesthetic boards (Pinterest).
### 6.1 · Standard Browse
**Sort modes users expect within a category:**
**Top / Best** — all-time quality rank. Completion rate + like ratio + total reach. Stable. Does not change hourly.
**New** — pure reverse chronological. No quality gate. Shows everything. Users use this to find content the algorithm hasn't surfaced yet.
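**Hot**: recency and engagement combined. Content decays as it ages regardless of engagement. The formula is the gravity model popularized by Hacker News, `score / (age_hours + 2)^gravity`; Reddit's Hot sort applies the same idea with a log-scaled score. Rankings refresh meaningfully every hour.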
**Rising** — overperforming new content (see UC-03.2).
**Trending** — velocity-based, short time window (see UC-03.1).
**Controversial** — high engagement with polarized sentiment. High comment count, moderate like ratio, high share count. Surfaces content generating strong reactions in both directions.
**Top: This Hour / Today / This Week / This Month / This Year / All Time** — windowed top sort. Lets users choose their recency preference explicitly. All-Time Top is useful for discovering classics. This Week is useful for staying current without hourly noise.
**Shuffle / Random** — random sample weighted by quality score. Useful for music, podcasts, and "surprise me" contexts.
**Alphabetical** — A-Z / Z-A. Useful for structured collections and course curricula.
**Shortest First / Longest First** — sort by duration. Users looking for something quick explicitly want this.
**Highest Rated** — explicit critic or audience score where available, distinct from like ratio.
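The Shuffle mode's "random sample weighted by quality score" can be done in one pass with the Efraimidis-Spirakis weighted sampling method. It draws `u` uniformly from [0, 1) per item, computes `u ** (1 / w)`, and keeps the k largest keys. This is a swapped-in technique, not a documented tidalDB sort implementation, and `weighted_shuffle` is a hypothetical name.

```python
import heapq
import random


def weighted_shuffle(items, weights, k, rng=None):
    """Sample k items without replacement, favoring higher quality weights.

    Efraimidis-Spirakis: each item gets key u ** (1 / w) with u uniform in
    [0, 1); the k largest keys win. Items with weight <= 0 are never chosen.
    """
    rng = rng or random.Random()
    keyed = [(rng.random() ** (1.0 / w), item)
             for item, w in zip(items, weights) if w > 0]
    return [item for _, item in heapq.nlargest(k, keyed)]
```

An item with twice the quality weight is twice as likely to be drawn first, but low-scoring items still appear, which is the "surprise me" behavior Shuffle is meant to provide.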
### 6.2 · Faceted Browse (Multiple Simultaneous Filters)
All filters must be composable simultaneously. Examples of real user behavior:
- Genre: Jazz AND Duration: Short AND New (last 7 days)
- Format: Podcast AND Language: Spanish AND Top: This Month
- Creator: Verified AND Category: Cooking AND Sort: Hot
- Mood: Focus AND Duration: Long AND Unseen only
- Quality: 4K AND Has Subtitles AND Top: All Time
The database must handle arbitrary filter combinations without the application implementing them. Faceted queries are a first-class operation.
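Composability follows naturally if each facet is a predicate and a query is their conjunction. Here is a minimal sketch with hypothetical helper names (`eq`, `lt`, `any_of`, `faceted`) over items represented as dictionaries. A real engine would push these predicates into indexes rather than scanning.

```python
def eq(field, value):
    """Facet: field equals value."""
    return lambda item: item.get(field) == value


def lt(field, bound):
    """Facet: field is present and strictly below bound."""
    return lambda item: field in item and item[field] < bound


def any_of(*preds):
    """Multi-select within one facet, e.g. Jazz OR Blues."""
    return lambda item: any(p(item) for p in preds)


def faceted(items, *preds):
    """Keep items that satisfy every predicate (facets combine with AND)."""
    return [it for it in items if all(p(it) for p in preds)]
```

For example, "Genre: Jazz AND Duration: Short AND New (last 7 days)" becomes `faceted(items, eq("category", "jazz"), lt("duration_s", 240), lt("age_days", 7))`.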
### 6.3 · Mood and Aesthetic Filters
Common on music, video, and Pinterest-style platforms. Moods are not categories — they are cross-cutting signals derived from engagement patterns and embeddings.
- Mood: Chill, Energetic, Focus, Sad, Happy, Hype, Romantic, Dark, Nostalgic
- Aesthetic: Minimalist, Maximalist, Vintage, Futuristic, Cottagecore, Brutalist
- Era/Decade: 70s, 80s, 90s, 2000s — useful for music, film, fashion content
These are best modeled as embedding-space regions rather than explicit tags. A "chill" query retrieves items whose embeddings cluster near what users seeking "chill" content engage with.
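Here is a minimal sketch of the embedding-region idea. It assumes the region is summarized by the centroid of embeddings that mood-seeking users engaged with, and ranks items by cosine similarity to that centroid. The centroid summary and function names are illustrative assumptions, and plain lists keep the example dependency-free.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mood_query(mood_engaged, catalog, k):
    """Top-k catalog items (id -> embedding) closest to the mood region's centroid."""
    center = centroid(mood_engaged)
    return sorted(catalog, key=lambda i: cosine(catalog[i], center), reverse=True)[:k]
```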
### 6.4 · Color and Visual Filtering (Pinterest Model)
When items are images or have dominant visual content:
- Filter by dominant color or color palette
- "More like this image" — visual similarity search
- Style similarity — find visually similar items even without shared tags
- Crop-and-search — user selects a region of an image and searches for items similar to that region
These require visual embeddings on items. The application provides the embedding. The database handles retrieval and ranking.
---
## UC-07 · Notification Prioritization
**Surface:** Push notifications, in-app notification center, email digest.
**The Question:** Of all events since the user was last active, which deserve a push? Which deserve inbox prominence?
**Signals Required:**
- Relationship strength — notification from a creator they interact with constantly vs. one they follow but never open
- Quality of the triggering item — a video already performing well is more worth notifying about
- User notification open rate — are they opening notifications lately? If not, reduce frequency (fatigue signal)
- Time since event — events older than 48h are suppressed
- Notification type priority — a reply to the user's comment > a new video from a creator they loosely follow
**Ranking Profile:** `notification` — relationship strength dominant, item quality secondary, strict recency filter.
**Diversity Constraints:** max 1 push per creator per day, max 3 total pushes per day per user, max 1 per topic cluster per hour.
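These caps, plus the 48-hour staleness cutoff, can be enforced greedily over candidates already scored by the `notification` profile. `PendingPush`, `Sent`, and `select_pushes` are hypothetical names. Treating everything selected in one pass as sent "now" is a simplifying assumption.

```python
from dataclasses import dataclass

HOUR = 3_600
DAY = 24 * HOUR


@dataclass
class PendingPush:
    creator_id: str
    topic: str        # topic cluster label
    score: float      # output of the `notification` ranking profile
    event_ts: float   # when the triggering event happened (epoch seconds)


@dataclass
class Sent:
    creator_id: str
    topic: str
    sent_ts: float


def select_pushes(candidates, sent, now, max_per_day=3, max_age=48 * HOUR):
    """Greedily pick the highest-scoring pushes that respect every cap.

    Caps: events older than max_age are dropped, 1 push per creator per day,
    `max_per_day` pushes per user per day, 1 push per topic cluster per hour.
    """
    history = [s for s in sent if now - s.sent_ts < DAY]
    picked = []
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if len(history) >= max_per_day:
            break
        if now - c.event_ts > max_age:
            continue
        if any(s.creator_id == c.creator_id for s in history):
            continue
        if any(s.topic == c.topic and now - s.sent_ts < HOUR for s in history):
            continue
        picked.append(c)
        history.append(Sent(c.creator_id, c.topic, now))  # counts toward later caps
    return picked
```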
**Feedback Written Back:**
- Opened → strong positive on creator relationship for notification context
- Dismissed → mild negative, reduce future frequency
- Notifications disabled for creator → permanent suppress
- App opened directly (not via notification) → weak positive on all pending notifications
---
## UC-08 · Creator Profile Page
**Surface:** A creator's profile — their complete catalog, browsable by the visitor.
**The Question:** Show this creator's content in the best order for this visitor.
**Modes:**
- **Top** — all-time quality. Best first impression for new visitors.
- **New** — reverse chronological. For fans keeping up.
- **Hot** — currently performing best within the creator's own catalog.
- **For You** — which of this creator's items best matches this visitor's preferences. A jazz fan visiting a multi-genre creator sees jazz content elevated even if the creator's pop content has more total views.
- **Series / Playlists** — items grouped into explicit collections, ordered within collection.
**What Correct Looks Like:** A first-time visitor and a longtime subscriber see different orderings on "For You." The new visitor sees the creator's best all-time content. The subscriber sees what they haven't yet watched from this creator.
---
## UC-09 · User Library and Personal Collections
**Surface:** YouTube Library, Spotify Liked Songs, Instagram Saved, Reddit Saved, Pinterest Boards, Watch History.
### 9.1 · Watch / View History
- Complete chronological history of items the user viewed
- Filterable by: date range, category, creator, format, duration
- Searchable by keyword within history
- Clearable (individual items or full history)
- "Continue Watching" — items viewed but not completed, sorted by most recently viewed
- Resume playback position stored per item
### 9.2 · Saved / Bookmarked Items
- Items explicitly saved (Watch Later, Saved Posts, Bookmarks)
- Sortable by: date saved (default), date published, creator, category, duration
- Filterable by: category, creator, format, unseen vs. seen
- Expiry detection — saved items that have since been deleted or become unavailable
- Bulk management — mark batch as watched, remove batch
### 9.3 · Liked Items
- All items the user has liked
- Sortable by: date liked, creator, category
- Used as a strong signal in preference vector construction
### 9.4 · User-Created Collections / Boards / Playlists
- Named collections the user creates and curates
- Items can belong to multiple collections
- Collections can be private, shared with specific users, or public
- Collections themselves are rankable — popular public playlists surface in browse/search
- Collaborative collections (multiple users contribute — shared boards, Pinterest-style)
### 9.5 · Downloads / Offline
- Items downloaded for offline viewing
- Filterable, sortable, manageable separately from online library
- Download state as a retrievable attribute in queries
---
## UC-10 · People and Creator Search
**Surface:** Search results "People" tab, "Accounts" tab, creator discovery.
**The Question:** User wants to find creators, not content.
**Capabilities:**
- Search by creator name, username, handle
- Search creators by topic: "find creators who post about jazz"
- Filter creators by: follower count range, posting frequency, category, language, location, verified status
- "Creators like [creator X]" — semantic similarity between creator embeddings (built from their catalog)
- "Creators followed by people I follow" — social graph traversal
- "Creators I used to follow" — historical relationship query
- Sort creators by: follower count, posting frequency, engagement rate, recent activity
**Signals Required:**
- Creator-level embedding (derived from their item embeddings, aggregated)
- Creator engagement rate (average engagement ratio across recent catalog)
- Creator posting frequency
- Social graph (who follows them, who follows the current user)
- User's creator affinity history
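The "creators followed by people I follow" query is a two-hop traversal over the social graph. Here is a minimal sketch, assuming the follow graph fits in a dictionary of sets. `creators_from_my_network` is a hypothetical name, and ranking by the number of follows in common is one reasonable choice among several.

```python
from collections import Counter


def creators_from_my_network(follows, user, creators, limit=10):
    """Two-hop traversal: creators followed by the people `user` follows.

    follows:  dict of account id -> set of account ids it follows
    creators: set of account ids that are creators
    Ranked by how many of the user's follows also follow each creator;
    creators the user already follows are excluded.
    """
    mine = follows.get(user, set())
    counts = Counter()
    for friend in mine:
        for target in follows.get(friend, set()):
            if target != user and target not in mine and target in creators:
                counts[target] += 1
    return [c for c, _ in counts.most_common(limit)]
```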
---
## UC-11 · Visual and Semantic Search
**Surface:** Pinterest visual search, Google Lens-style search, "find more like this image."
### 11.1 · Search by Image
- User uploads an image or selects one from the platform
- Find items whose visual embedding is nearest to the query image embedding
- Crop-and-search — user selects a region of the image to search against
- Combine with text: image embedding + text query vector, merged score
### 11.2 · Semantic / Intent Search
Beyond keyword matching — understanding what the user means, not just what they typed.
- Query: "something relaxing to watch on a rainy day" → system interprets mood/intent, retrieves by embedding similarity to that intent
- Query: "that video about the jazz pianist in new orleans I watched last year" → retrieves from user history using semantic match, not exact title
- Disambiguation: "jaguar" — is the user searching for the car or the animal? User history and query context disambiguate
### 11.3 · Multi-Modal Retrieval
- Text query against image items (find images matching a text description)
- Image query against video items (find videos containing visuals similar to a reference image)
- Audio fingerprint query against audio items — tidalDB handles the embedding retrieval, not the generation
---
## UC-12 · Live and Scheduled Content
**Surface:** Live tab, "Happening Now," event pages, premiere countdowns.
**The Question:** What is live right now that this user would care about?
**Signals Required:**
- Live status flag (boolean, real-time)
- Scheduled start time and end time
- Current viewer count (real-time signal)
- Creator relationship weight (live from a creator they care about > random live)
- Category match with user preferences
- Notification opt-in (did the user set a reminder?)
**Ranking Profile:** `live` — relationship weight dominant, current viewer count as social proof, category match secondary. Recency is not a concept here — everything is happening now.
**Discovery of upcoming:**
- "Premiering in X hours" — scheduled content with countdown
- "Set reminder" → creates a notification relationship between user and item
- Calendar-style browse of upcoming events
**Filtering:**
- Live only (exclude VOD)
- By category within live
- By minimum viewer count
- From followed creators only
---
## UC-13 · Hidden Gems and Breakout Detection
**Surface:** "Underrated," "Staff Picks," "You might have missed," editorial surfaces.
**The Question:** What high-quality content is being overlooked by the algorithm?
Hidden gems are items with high completion rate and like ratio but low total view count relative to those quality signals. Content that performs well with everyone who sees it but hasn't been seen by many people yet.
**Signals Required:**
- Quality signals: completion rate, like ratio — must be high
- Reach signals: total views — must be low relative to quality
- Age of content — recent enough to still be worth surfacing
- Creator follower count — small/new creators get priority
**Ranking Profile:** `hidden_gems` — quality signals as primary, inverse of reach as a boost, creator size as a discovery equity signal.
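Here is a minimal sketch of the `hidden_gems` profile. The quality gate threshold, averaging completion rate with like ratio, and the inverse-log boosts for low reach and small creators are all assumptions for illustration. The content-age signal is omitted.

```python
import math


def hidden_gem_score(completion_rate, like_ratio, views, creator_followers,
                     min_quality=0.6):
    """Quality first, boosted when reach and creator size are small.

    Items below the quality gate score 0, so content that is obscure because
    it is bad never qualifies.
    """
    quality = (completion_rate + like_ratio) / 2
    if quality < min_quality:
        return 0.0
    reach_boost = 1 / math.log10(views + 10)                # fewer views, bigger boost
    creator_boost = 1 / math.log10(creator_followers + 10)  # smaller creators, bigger boost
    return quality * reach_boost * (1 + creator_boost)
```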
**What Correct Looks Like:** Content that makes the user think "how have I never seen this before?" Not content that is obscure because it is bad.
---
## UC-14 · Controversial and Hot Surfaces
### 14.1 · Controversial Sort
**Surface:** Reddit "Controversial" sort, comment sections surfacing heated debates.
**The Question:** What content is generating strong reactions in both directions?
Controversial is defined as: high total engagement AND polarized sentiment. High comment count, high share count, but split positive/negative signal ratio. Content people feel strongly about in opposite directions.
**Signals Required:**
- Total engagement (must be high enough to be genuinely controversial, not just unpopular)
- Sentiment polarity — ratio of positive to negative signals
- Comment velocity — discussions growing quickly
- Share count — even content people dislike gets shared for debate
**Ranking Profile:** `controversial` — maximizes the product of positive and negative engagement signals. A post with 1000 upvotes and 1000 downvotes scores higher than one with 1800 upvotes and 200 downvotes.
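The `controversial` definition translates directly into code. The minimum-engagement gate is an assumed parameter that reflects the requirement that controversial content be genuinely high-engagement, not just unpopular.

```python
def controversial_score(positive, negative, min_total=100):
    """Product of opposing signals, gated on total engagement.

    Items with fewer than `min_total` combined signals score 0, so merely
    unpopular content is not mistaken for controversial content.
    """
    if positive + negative < min_total:
        return 0
    return positive * negative
```

For the example above, the evenly split post scores 1,000,000 and the 1800/200 post scores 360,000.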
### 14.2 · Hot Sort (Reddit Model)
**Surface:** Reddit "Hot," Hacker News front page, time-sensitive community surfaces.
**The Question:** What is the best content right now, with age decay applied?
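Hot rewards early engagement but penalizes age. Formula concept: `score / (age_hours + 2)^gravity`, the gravity formula popularized by Hacker News; Reddit's own Hot ranking reaches the same effect with a log-scaled score plus a time bonus. The database exposes this as a native sort mode, so the application does not implement the formula.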
**What makes Hot different from Trending:** Trending is pure velocity (rate of change). Hot is cumulative score with age decay. An hour-old post with 500 upvotes scores higher on Hot than a day-old post with 2000 upvotes.
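Here is the Hot formula as a function, using a gravity of 1.8, the value given in widely circulated descriptions of Hacker News ranking. tidalDB's default gravity is not specified, so treat it as an assumption. `hot_score` and `hot_order` are hypothetical names.

```python
def hot_score(score, age_hours, gravity=1.8):
    """Cumulative score decayed by age: score / (age_hours + 2) ** gravity."""
    return score / (age_hours + 2) ** gravity


def hot_order(posts, gravity=1.8):
    """posts: list of (post_id, score, age_hours) tuples; returns ids, hottest first."""
    ranked = sorted(posts, key=lambda p: hot_score(p[1], p[2], gravity), reverse=True)
    return [post_id for post_id, _, _ in ranked]
```

With gravity 1.8, the hour-old post with 500 upvotes scores about 69 and the day-old post with 2000 upvotes about 5.7, matching the comparison above.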
---
## UC-15 · Cohort-Scoped Trending
**Surface:** "Trending for You" (personalized trending), audience-segmented trending dashboards, advertiser-facing trend reports, regional/demographic trend pages.
**The Question:** What content is gaining traction right now among users who match this profile?
This is distinct from global trending (UC-03) and personalized feeds (UC-01). Global trending answers "what's popular everywhere." Personalized feeds answer "what should this specific user see." Cohort trending answers "what's resonating with this type of user" — a question that sits between the two.
**Cohort Definition:**
A cohort is a named predicate over user attributes:
- Demographic: locale, age range, gender
- Interest-based: users who engage with jazz, cooking, tech, etc.
- Behavioral: power users, casual browsers, binge watchers
- Geographic: users in a specific region or timezone
- Composite: US + age 18-24 + interest:jazz (AND logic across dimensions)
**Signals Required:**
- All the same velocity signals as UC-03 (share velocity, view velocity, engagement ratio)
- But scoped to signal events generated by users matching the cohort predicate
- Cohort-scoped view_velocity(24h) = rate of views from cohort members, not global views
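Here is a minimal sketch of a composite cohort predicate and a cohort-scoped `view_velocity(24h)`. It assumes view events carry the viewer's user id and that user attributes are available in memory. `User`, `Cohort`, and `cohort_view_velocity` are hypothetical, and an empty `Cohort()` matches everyone, reducing to global velocity.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

HOUR = 3600


@dataclass
class User:
    user_id: str
    locale: str
    age: int
    interests: Set[str] = field(default_factory=set)


@dataclass
class Cohort:
    """Composite cohort predicate: every dimension that is set must match (AND)."""
    locale: Optional[str] = None
    age_range: Optional[Tuple[int, int]] = None  # inclusive bounds
    interest: Optional[str] = None

    def matches(self, user):
        if self.locale is not None and user.locale != self.locale:
            return False
        if self.age_range is not None and not (self.age_range[0] <= user.age <= self.age_range[1]):
            return False
        if self.interest is not None and self.interest not in user.interests:
            return False
        return True


def cohort_view_velocity(views, users, cohort, item_id, now, hours=24):
    """Views per hour on `item_id` from cohort members only, not global views.

    views: iterable of (user_id, item_id, ts) view events
    users: dict of user_id -> User
    """
    start = now - hours * HOUR
    n = sum(1 for uid, iid, ts in views
            if iid == item_id and start < ts <= now and cohort.matches(users[uid]))
    return n / hours
```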
**Three-Layer Model:**
1. **Global trending** — same as UC-03, no cohort filter
2. **Cohort trending** — velocity signals filtered to cohort members
3. **Search within cohort trending** — text/semantic search composed with cohort trending
**Ranking Profile:** `cohort_trending` — same velocity-based ranking as `trending`, but candidate set and signal aggregation scoped to cohort.
**Scoping (composable):**
- Cohort: locale:US, age:18-24
- Cohort + category: above AND category:jazz
- Cohort + search: above AND QUERY "piano tutorial"
- Cohort + social: above AND social_graph:@u123
**Diversity Constraints:** max 1 item per creator in top 10.
**What Correct Looks Like:** A 22-year-old in Tokyo and a 45-year-old in Texas see different trending pages. Not because of personalization (individual preference), but because different content is genuinely trending within their respective audience segments. An advertiser can see what's trending among their target demographic. A creator can see what's trending in their niche audience.
---
## Appendix A · Filter Reference
All filters must be composable with each other and with any sort mode. A query combining any subset of these is a valid, first-class database operation.
### Content Attribute Filters
| Filter | Values | Notes |
|---|---|---|
| `category` | string or list | multi-select: Jazz OR Blues OR Soul |
| `tag` | string or list | multi-select |
| `hashtag` | string | exact match |
| `flair` | string | community-specific labels |
| `format` | video, short, live, vod, podcast, article, image, gallery, audio | |
| `duration` | range (min, max) or preset | short / medium / long presets |
| `language` | ISO code | content language |
| `subtitle_language` | ISO code | subtitles available in this language |
| `dubbed_language` | ISO code | dubbed version available |
| `resolution` | SD, HD, FHD, 4K, 8K | |
| `hdr` | bool | |
| `audio_quality` | standard, high, lossless, spatial | |
| `has_subtitles` | bool | |
| `has_audio_description` | bool | accessibility |
| `has_sign_language` | bool | accessibility |
| `content_rating` | G, PG, PG-13, R, etc. | |
| `safe_search` | bool | |
| `sensitive_content` | show, hide, only | |
| `status` | published, live, scheduled, archived | |
| `availability` | free, premium, subscriber_only | |
| `downloadable` | bool | |
| `leaving_soon` | bool or date threshold | availability window ending |
| `award_count` | minimum int | gilded/awarded |
| `has_award` | bool | |
| `post_type` | text, link, image, video, poll, crosspost | |
| `original_only` | bool | exclude crossposts/reposts |
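
Composability falls out naturally if filters form an expression tree. Leaf nodes test one attribute, and `And`/`Or`/`Not` combine arbitrary sub-filters. The sketch below models only a handful of the attributes above. `Item` and `Filter` are hypothetical types, not tidalDB's query API.

```rust
/// Hypothetical item attributes; only a few fields are modelled.
pub struct Item {
    pub category: String,
    pub language: String,
    pub duration_secs: u32,
    pub has_subtitles: bool,
}

/// A composable filter expression.
pub enum Filter {
    CategoryIn(Vec<String>),         // multi-select: any listed category matches
    Language(String),                // exact ISO code match
    Duration { min: u32, max: u32 }, // inclusive range in seconds
    HasSubtitles(bool),
    And(Vec<Filter>),
    Or(Vec<Filter>),
    Not(Box<Filter>),
}

impl Filter {
    /// Evaluates the expression tree against one item.
    pub fn matches(&self, item: &Item) -> bool {
        match self {
            Filter::CategoryIn(cats) => cats.iter().any(|c| c == &item.category),
            Filter::Language(lang) => &item.language == lang,
            Filter::Duration { min, max } => (*min..=*max).contains(&item.duration_secs),
            Filter::HasSubtitles(want) => item.has_subtitles == *want,
            Filter::And(fs) => fs.iter().all(|f| f.matches(item)),
            Filter::Or(fs) => fs.iter().any(|f| f.matches(item)),
            Filter::Not(f) => !f.matches(item),
        }
    }
}
```

Per-item evaluation is shown for clarity only. An engine would more likely compile the same tree into index intersections and unions.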
### Date and Time Filters
| Filter | Values |
|---|---|
| `created_after` | ISO date |
| `created_before` | ISO date |
| `created_within` | duration: 7d, 30d, 1y |
| `created_preset` | hour, today, week, month, year |
| `updated_after` | ISO date |
| `event_date` | date range for scheduled/live content |
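
`created_within` takes a relative duration rather than an absolute date. The sketch below makes several assumptions. Durations use the suffixes `h`, `d`, `w`, and `y`. One year is 365 days, matching the `top_year` window below. Timestamps are Unix seconds. `parse_within` and `created_within` are hypothetical helpers.

```rust
/// Parses a duration such as "7d", "30d", or "1y" into seconds.
/// Supported units (an assumption): h, d, w, y (y = 365 days).
/// Returns `None` for malformed input or overflow.
pub fn parse_within(s: &str) -> Option<u64> {
    let s = s.trim();
    let unit = s.chars().last()?;
    let n: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let unit_secs = match unit {
        'h' => 3_600,
        'd' => 86_400,
        'w' => 7 * 86_400,
        'y' => 365 * 86_400,
        _ => return None,
    };
    n.checked_mul(unit_secs)
}

/// True if `created_at` lies within `window_secs` before `now` (both Unix seconds).
/// Items timestamped in the future are excluded.
pub fn created_within(created_at: u64, now: u64, window_secs: u64) -> bool {
    created_at <= now && now - created_at <= window_secs
}
```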
### Creator Filters
| Filter | Values |
|---|---|
| `creator` | creator_id or handle |
| `exclude_creator` | creator_id or handle |
| `creator_min_followers` | integer |
| `creator_max_followers` | integer |
| `creator_verified` | bool |
| `creator_followed_by_user` | bool |
| `creator_new_to_user` | bool — never seen this creator before |
| `creator_language` | ISO code |
| `creator_location` | region or country |
### Engagement Threshold Filters
| Filter | Values |
|---|---|
| `min_views` | integer |
| `max_views` | integer — for hidden gems |
| `min_likes` | integer |
| `min_like_ratio` | float, 0–1 |
| `min_comments` | integer |
| `min_shares` | integer |
| `min_score` | integer — upvotes for forum-style |
| `min_completion_rate` | float, 0–1 |
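
Threshold filters reduce to optional bounds, where an unset bound imposes no constraint. The document does not define "like ratio". The sketch below assumes it is `likes / (likes + dislikes)`. `Engagement`, `Thresholds`, and `like_ratio` are hypothetical names.

```rust
/// Hypothetical per-item engagement counters.
pub struct Engagement {
    pub views: u64,
    pub likes: u64,
    pub dislikes: u64,
}

/// Optional thresholds; `None` means "no constraint".
#[derive(Default)]
pub struct Thresholds {
    pub min_views: Option<u64>,
    pub max_views: Option<u64>,
    pub min_like_ratio: Option<f64>,
}

/// Like ratio, assumed here to be likes / (likes + dislikes).
/// Items with no votes get 0.0 rather than dividing by zero.
pub fn like_ratio(e: &Engagement) -> f64 {
    let total = e.likes + e.dislikes;
    if total == 0 {
        0.0
    } else {
        e.likes as f64 / total as f64
    }
}

impl Thresholds {
    /// True if the item satisfies every threshold that is set.
    pub fn passes(&self, e: &Engagement) -> bool {
        self.min_views.map_or(true, |m| e.views >= m)
            && self.max_views.map_or(true, |m| e.views <= m)
            && self.min_like_ratio.map_or(true, |r| like_ratio(e) >= r)
    }
}
```

Combining `max_views` with `min_like_ratio`, as in the first test case, gives a simple approximation of a hidden-gems filter.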
### User State Filters
| Filter | Values |
|---|---|
| `seen` | bool — true = already seen, false = unseen only |
| `in_progress` | bool — partially watched |
| `saved` | bool — in user's saved/bookmarked |
| `liked` | bool — user has liked this |
| `downloaded` | bool — available offline |
| `in_collection` | collection_id |
### Geographic Filters
| Filter | Values |
|---|---|
| `content_region` | country or region code |
| `trending_in_region` | country or region code |
| `creator_region` | country or region code |
| `near_location` | lat/lng + radius |
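
`near_location` requires a great-circle distance between the item's coordinates and the query point. The sketch below uses the standard haversine formula with a spherical-Earth radius of 6,371 km. `haversine_km` and `near_location` are hypothetical helpers. A production engine would usually prune candidates with a spatial index before computing exact distances.

```rust
/// Mean Earth radius in kilometres (spherical approximation).
const EARTH_RADIUS_KM: f64 = 6_371.0;

/// Great-circle distance in km between two (lat, lng) points given in degrees.
pub fn haversine_km(lat1: f64, lng1: f64, lat2: f64, lng2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let dphi = (lat2 - lat1).to_radians();
    let dlambda = (lng2 - lng1).to_radians();
    let a = (dphi / 2.0).sin().powi(2) + phi1.cos() * phi2.cos() * (dlambda / 2.0).sin().powi(2);
    // Clamp guards against rounding pushing sqrt(a) slightly above 1 near antipodes.
    2.0 * EARTH_RADIUS_KM * a.sqrt().min(1.0).asin()
}

/// `near_location` predicate: is `item` (lat, lng) within `radius_km` of `center`?
pub fn near_location(item: (f64, f64), center: (f64, f64), radius_km: f64) -> bool {
    haversine_km(item.0, item.1, center.0, center.1) <= radius_km
}
```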
### Cohort Filters
| Filter | Values | Notes |
|---|---|---|
| `cohort` | cohort_name | Pre-defined named cohort |
| `cohort_locale` | locale code (en-US, ja-JP) | User locale match |
| `cohort_age` | range (18-24, 25-34) | User age range |
| `cohort_interest` | keyword or list | User interest match |
| `cohort_engagement_level` | power, regular, casual | Behavioral segment |
| `cohort_format_preference` | short, long, mixed | Content format preference |
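
Cohort filters are predicates over user attributes rather than item attributes. The sketch below checks whether a user belongs to an ad-hoc cohort. It assumes exact locale matching, inclusive age ranges written as `"18-24"`, and "any listed interest" semantics. `UserProfile`, `CohortFilter`, and `parse_age_range` are hypothetical names.

```rust
/// Hypothetical user attributes used for cohort membership.
pub struct UserProfile {
    pub locale: String,
    pub age: u8,
    pub interests: Vec<String>,
}

/// Parses an age range such as "18-24" into an inclusive (lo, hi) pair.
pub fn parse_age_range(s: &str) -> Option<(u8, u8)> {
    let (lo, hi) = s.split_once('-')?;
    let lo: u8 = lo.trim().parse().ok()?;
    let hi: u8 = hi.trim().parse().ok()?;
    (lo <= hi).then_some((lo, hi))
}

/// An ad-hoc cohort; `None` or an empty list means "no constraint".
pub struct CohortFilter {
    pub locale: Option<String>,
    pub age: Option<(u8, u8)>,
    pub interests: Vec<String>, // user matches if they share any listed interest
}

impl CohortFilter {
    pub fn includes(&self, u: &UserProfile) -> bool {
        self.locale.as_ref().map_or(true, |l| l == &u.locale)
            && self.age.map_or(true, |(lo, hi)| (lo..=hi).contains(&u.age))
            && (self.interests.is_empty() || self.interests.iter().any(|i| u.interests.contains(i)))
    }
}
```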
---
## Appendix B · Sort Mode Reference
All sort modes must be available on any surface. The application specifies the sort mode; tidalDB executes it natively without application-side sorting logic.
| Sort Mode | Description | Best For |
|---|---|---|
| `relevance` | Text + semantic match score | Search results |
| `personalized` | User preference match | For You surfaces |
| `new` | `created_at DESC` | Latest content |
| `old` | `created_at ASC` | Archives, chronological viewing |
| `top_all_time` | Cumulative quality score, no decay | Classic / best-of |
| `top_hour` | Quality score, last 1h | Real-time quality |
| `top_today` | Quality score, last 24h | Daily best |
| `top_week` | Quality score, last 7d | Weekly digest |
| `top_month` | Quality score, last 30d | Monthly recap |
| `top_year` | Quality score, last 365d | Annual best |
| `hot` | Score / (age + 2)^gravity — decays with time | Community frontpages |
| `trending` | Pure engagement velocity | Trending tabs |
| `rising` | Velocity relative to baseline, age-boosted | Breakout content |
| `controversial` | Maximizes `positive_signals × negative_signals`: high volume on both sides | Debate/discussion |
| `hidden_gems` | High quality with low reach; boost scales inversely with reach | Discovery |
| `most_viewed` | Raw view count DESC | All-time popularity |
| `most_liked` | Raw like count DESC | Positive sentiment |
| `most_commented` | Raw comment count DESC | Discussion |
| `most_shared` | Raw share count DESC | Virality |
| `shortest` | `duration ASC` | Quick content |
| `longest` | `duration DESC` | Deep dives |
| `alphabetical_asc` | Title A–Z | Structured catalogs |
| `alphabetical_desc` | Title Z–A | |
| `shuffle` | Random, weighted by quality | Music, "surprise me" |
| `live_viewer_count` | Current viewer count DESC | Live surfaces |
| `date_saved` | When user saved/bookmarked DESC | Personal library |
| `creator_engagement_rate` | Creator's avg engagement ratio | Creator discovery |
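
Two of the formula-based modes above can be sketched directly as score functions. The table does not give a unit for age, so this sketch assumes hours. It treats `gravity` as a tunable, with 1.8 being a commonly cited default for this style of formula. `controversial` is implemented as the plain product from the table. `hot_score`, `controversial_score`, and `sort_desc_by` are hypothetical helpers.

```rust
/// `hot`: score / (age_hours + 2)^gravity. Negative ages are clamped to zero.
pub fn hot_score(score: f64, age_hours: f64, gravity: f64) -> f64 {
    score / (age_hours.max(0.0) + 2.0).powf(gravity)
}

/// `controversial`: positive × negative, so an item ranks high only when both
/// sides are large. Widened to u128 so the product cannot overflow.
pub fn controversial_score(positive: u64, negative: u64) -> u128 {
    positive as u128 * negative as u128
}

/// Sorts items in place by a score function, highest score first.
/// `total_cmp` gives a total order even if a score is NaN.
pub fn sort_desc_by<T, F: Fn(&T) -> f64>(items: &mut [T], key: F) {
    items.sort_by(|a, b| key(b).total_cmp(&key(a)));
}
```

The `+ 2` keeps the denominator away from zero for brand-new items, and `gravity` controls how quickly age overtakes score.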
---
## Appendix C · Signal Reference
| Signal | Type | Decay | Primary Use |
|---|---|---|---|
| `view` | count | slow (7d half-life) | baseline engagement |
| `unique_view` | count | slow | deduped reach |
| `impression` | count | fast | exposure without engagement |
| `completion` | ratio, 0–1 | very slow | quality signal |
| `partial_completion` | float — last position | slow | continue watching |
| `like` | count | slow | positive sentiment |
| `dislike` | count | slow | negative sentiment |
| `share` | count | medium | virality |
| `repost` | count | medium | Twitter RT / reblog equivalent |
| `quote` | count | medium | engaged reshare with commentary |
| `comment` | count | medium | community engagement |
| `reply` | count | medium | discussion depth |
| `upvote` | count | medium | forum positive signal |
| `downvote` | count | medium | forum negative signal |
| `save` | count | slow | intent to return |
| `pin` | count | slow | Pinterest save-equivalent |
| `collection_add` | count | slow | curation signal |
| `download` | count | slow | high-intent engagement |
| `screenshot` | count | slow | save-intent (Pinterest model) |
| `outbound_click` | count | medium | link content engagement |
| `skip` | count | fast (1d half-life) | negative quality |
| `skip_intro` | bool | fast | format preference |
| `hide` | bool | permanent | hard negative |
| `not_interested` | bool | permanent | hard negative on topic |
| `block` | bool | permanent | hard filter |
| `mute` | bool | permanent | soft filter |
| `report` | count | permanent | quality / moderation flag |
| `follow` | bool | permanent | relationship |
| `unfollow` | event | decays follow signal | relationship decay |
| `interaction_weight` | float | slow | relationship strength |
| `dwell_time` | duration | medium | true engagement depth |
| `replay` | count | medium | exceptional content signal |
| `autoplay_accept` | bool | medium | recommendation quality |
| `autoplay_reject` | bool | fast | recommendation failure |
| `notification_open` | bool | slow | creator notification priority |
| `notification_dismiss` | bool | medium | reduce push frequency |
| `reminder_set` | bool | slow | intent signal for scheduled content |
| `search_click` | count + rank | medium | query relevance |
| `search_impression` | count | fast | query exposure |
| `award_given` | count | permanent | community quality endorsement |
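
Half-lives such as 7d for `view` and 1d for `skip` imply exponential decay: after time Δt, a signal's weight is multiplied by 0.5^(Δt / half_life). The sketch below assumes that model with times in seconds. It stores each counter lazily, decaying the value only when it is read or incremented, so idle items cost nothing. `decay_factor` and `DecayedCounter` are hypothetical names, not tidalDB's API. Permanent signals would simply bypass decay.

```rust
/// Multiplier applied to a signal after `elapsed_secs`: 0.5^(elapsed / half_life).
pub fn decay_factor(elapsed_secs: f64, half_life_secs: f64) -> f64 {
    0.5_f64.powf(elapsed_secs / half_life_secs)
}

/// A lazily decayed counter.
#[derive(Debug, Clone, Copy)]
pub struct DecayedCounter {
    value: f64,
    last_update_secs: f64,
    half_life_secs: f64,
}

impl DecayedCounter {
    pub fn new(half_life_secs: f64) -> Self {
        Self { value: 0.0, last_update_secs: 0.0, half_life_secs }
    }

    /// Decayed value as of `now_secs`, without mutating the counter.
    pub fn value_at(&self, now_secs: f64) -> f64 {
        let elapsed = (now_secs - self.last_update_secs).max(0.0);
        self.value * decay_factor(elapsed, self.half_life_secs)
    }

    /// Decays the stored value to `now_secs`, then adds `weight`.
    /// Out-of-order events are not back-dated in this sketch; they are counted
    /// at full weight as of the latest update time.
    pub fn add(&mut self, weight: f64, now_secs: f64) {
        self.value = self.value_at(now_secs) + weight;
        self.last_update_secs = self.last_update_secs.max(now_secs);
    }
}
```

Storing a single value and timestamp per counter keeps ingestion O(1). This works because exponential decay can be applied to the running total instead of to each event.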