iknowyou — Architecture
Core Thesis
Communication personalization is a signal processing problem. Every exchange between the system and a person produces observable signals — engagement, sentiment, timing, style — that decay over time and compound across conversations. tidalDB's signal ledger, preference vectors, windowed aggregation, and cohort system provide the learning substrate. iknowyou wraps these primitives with an observation pipeline (LM-as-classifier), a briefing engine (query-to-profile), and a generation interface (brief-to-prompt).
The system has no training loop, no batch pipeline, no feature store. Learning is continuous: signals are written on every exchange, preference vectors update via EMA, and the next query reflects the latest state. Apart from the external LM API calls, the entire closed loop executes within a single process.
Domain Model
Entities
| Entity | tidalDB Kind | What it represents |
|---|---|---|
| Person | User | An individual the system communicates with. Has metadata (timezone, role, context), a preference vector (learned from message engagement), a signal ledger, cohort memberships, and a user-state index (conversation history). |
| Message | Item | A message the system generated and sent. Has metadata (topic, tone, length, structure, time_sent, conversation_id), an embedding (from the message content), and signals written against it based on the person's response. |
| Observation | Item | A natural-language statement about a person's communication pattern. Has an embedding (for semantic retrieval), a confidence signal (decays over time), and metadata (person_id, category, source_conversation). |
Messages and observations are both Item entities but are distinguished by a kind metadata field: "message" or "observation". This reuses tidalDB's existing entity model without extension.
Schema Primitives
| Primitive | Configuration | Purpose |
|---|---|---|
| Signals | 10 signal types (see below) | Capture engagement, sentiment, topic, timing dimensions |
| Decay | Exponential, per-signal half-life | Recent interactions matter more; old patterns fade |
| Windows | 1h, 24h, 7d, 30d, AllTime | Temporal aggregation for time-of-day patterns |
| Velocity | On engagement signals | Distinguish "always liked X" from "suddenly interested in X" |
| Preference vectors | 384D, EMA with adaptive rate | Communication style convergence per-person |
| Cohorts | Predicate-based, per-cohort ledger | Cold-start priors, cross-pollination, drift detection |
Signal Schema
Engagement Signals (on Message items)
| Signal | Half-life | Windows | Velocity | Weight semantics |
|---|---|---|---|---|
| replied | 7d | 1h, 24h, 7d, AllTime | yes | 1.0 = responded at all |
| replied_fast | 3d | 1h, 24h, 7d | yes | 1.0 = latency < 120s |
| replied_substantively | 7d | 24h, 7d, AllTime | yes | 0.0–1.0 normalized by word count / depth |
| positive_sentiment | 14d | 24h, 7d, 30d, AllTime | no | 0.0–1.0 from observer sentiment score |
| negative_sentiment | 3d | 24h, 7d | no | 0.0–1.0 from observer sentiment score |
| went_silent | 1d | 24h, 7d | no | 1.0 = no response after timeout |
Topic Signals (on topic-cluster items or Message items)
| Signal | Half-life | Windows | Velocity | Weight semantics |
|---|---|---|---|---|
| topic_engaged | 14d | 7d, 30d, AllTime | yes | 1.0 = stayed on or deepened topic |
| topic_dropped | 3d | 7d | no | 1.0 = redirected or went brief |
| initiated | 30d | 30d, AllTime | no | 1.5 = they brought this up unprompted |
Meta Signals (on Observation items)
| Signal | Half-life | Windows | Velocity | Weight semantics |
|---|---|---|---|---|
| confidence | 30d | AllTime | no | 1.0 at creation; decays unless reinforced |
Design Rationale
- Asymmetric decay: Negative signals (3d) decay 2–5x faster than positive signals (7–14d). The system is forgiving by default. Bad days don't poison the model.
- initiated is the strongest signal: When someone raises a topic unprompted, that's stronger evidence of interest than responding to a topic you raised. Weight 1.5, half-life 30d.
- went_silent is gentle: 1-day half-life. Silence might mean they're busy, not that the message was wrong. But it's still a signal: if silence correlates with a pattern (late-night messages, formal tone), the preference vector will drift away from that pattern.
- Velocity on engagement signals: Velocity separates stable preferences from emerging ones. If topic_engaged velocity spikes on "replication" this week, the brief surfaces it as a rising interest, even if the AllTime count is low.
Module Structure
applications/iknowyou/
├── engine/ ← Core library (no network, no LM calls)
│ └── src/
│ ├── lib.rs ← IkyEngine: wraps TidalDb
│ ├── schema.rs ← Signal schema + cohort definitions
│ ├── observer.rs ← ObserverOutput: structured extraction type
│ ├── briefing.rs ← Brief: queries tidalDB, assembles profile
│ ├── signals.rs ← Signal writing: observation → tidalDB signals
│ ├── observations.rs ← Observation lifecycle: write, retrieve, decay
│ └── cohorts.rs ← Cohort definitions + cold-start logic
│
├── server/ ← HTTP API + LM integration
│ └── src/
│ ├── main.rs ← Axum server, startup, shutdown
│ ├── handlers.rs ← /message, /observe, /brief, /feedback
│ ├── llm.rs ← LM client: observer calls + generation calls
│ └── loop.rs ← Orchestrator: observe → learn → brief → generate
│
├── vision.md ← Product vision
└── architecture.md ← This document
Dependency Flow
server (Axum, LM client)
│
├──→ engine (pure Rust, no IO except tidalDB)
│ │
│ └──→ tidalDB (embedded, same process)
│
└──→ LM API (HTTP, external)
The engine crate has zero network dependencies. It takes structured ObserverOutput and returns structured Brief. The server crate handles LM API calls and HTTP. This separation means the engine is fully testable without mocking LM calls.
The Closed Loop — Detailed
Phase 1: Observe
When a person responds to a message (or doesn't respond within the timeout window), the server calls the observer LM with the conversation context and the person's message.
Observer input:
System message sent: "Have you looked at what happens when segment count exceeds L0?"
Person replied: "yeah good call - the compaction pass is actually the bottleneck,
not the segment count itself. been profiling it all morning"
Time since system message: 47 seconds
Conversation turn: 4
Observer output (structured JSON, single inference):
{
"engagement": {
"replied": true,
"latency_seconds": 47,
"substantive": true,
"word_count": 22,
"sentiment_score": 0.75,
"sentiment_direction": "positive"
},
"style": {
"formality": 0.2,
"uses_lowercase": true,
"uses_jargon": true,
"structure": "stream_of_thought",
"emoji": false
},
"topic": {
"primary": "compaction_profiling",
"domain": "database_internals",
"specificity": "high",
"continued_from_previous": true,
"deepened": true
},
"dynamics": {
"redirected": true,
"redirect_direction": "more_specific",
"who_is_leading": "person",
"built_on_previous": true,
"corrected_system": true
}
}
The observer is a small, fast model (Haiku-class). It doesn't need to be creative — it needs to reliably extract structure. Latency target: < 500ms. Cost per call: negligible.
Phase 2: Learn
The engine receives ObserverOutput and writes signals to tidalDB. The mapping is deterministic: the same structured input always yields the same signal writes. No LM call.
Signal writes for this exchange:
// Engagement signals on the sent message
db.signal("replied", msg_entity_id, 1.0, now)?;
db.signal("replied_fast", msg_entity_id, 1.0, now)?; // 47s < 120s
db.signal("replied_substantively", msg_entity_id, 0.85, now)?; // normalized
db.signal("positive_sentiment", msg_entity_id, 0.75, now)?;
// Topic signals
db.signal("topic_engaged", topic_entity_id("compaction_profiling"), 1.0, now)?;
db.signal("topic_engaged", topic_entity_id("database_internals"), 1.0, now)?;
// No negative signals this exchange
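The mapping from observer output to engagement signals can be sketched as a function that returns (signal name, weight) pairs without touching storage. This is a minimal sketch, not the engine's actual code: the `Engagement` struct, the `engagement_signals` function, the boolean `sentiment_positive` field, and the pre-normalized `substantive_weight` are all assumptions made for illustration. Only the signal names, the weights, and the 120-second threshold come from the schema above.

```rust
/// Hypothetical, trimmed-down view of the observer's engagement output.
/// `substantive_weight` is assumed to be already normalized to 0.0..=1.0;
/// the normalization itself is not specified in this document.
#[derive(Debug, Clone)]
pub struct Engagement {
    pub replied: bool,
    pub latency_seconds: u64,
    pub substantive_weight: f64,
    pub sentiment_score: f64,
    pub sentiment_positive: bool,
}

/// Threshold from the signal schema: replied_fast means latency < 120s.
const FAST_REPLY_SECS: u64 = 120;

/// Map one exchange to (signal name, weight) pairs.
pub fn engagement_signals(e: &Engagement) -> Vec<(&'static str, f64)> {
    let mut out = Vec::new();
    if !e.replied {
        // No response within the timeout window.
        out.push(("went_silent", 1.0));
        return out;
    }
    out.push(("replied", 1.0));
    if e.latency_seconds < FAST_REPLY_SECS {
        out.push(("replied_fast", 1.0));
    }
    if e.substantive_weight > 0.0 {
        out.push(("replied_substantively", e.substantive_weight.clamp(0.0, 1.0)));
    }
    // Exactly one sentiment signal per reply, chosen by direction.
    let sentiment = e.sentiment_score.clamp(0.0, 1.0);
    if e.sentiment_positive {
        out.push(("positive_sentiment", sentiment));
    } else {
        out.push(("negative_sentiment", sentiment));
    }
    out
}
```

Keeping this step free of database calls is one way to make it unit-testable: the caller would iterate over the returned pairs and issue one signal write per pair.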
Preference vector update: The sent message's embedding blends into the person's preference vector. The message was direct, technical, question-form — so the preference vector shifts toward that communication style. EMA adaptive rate: high early (person has few interactions), lower as history accumulates.
Observation generation (periodic, not every turn): Every 5 turns, on session close, or when a novel pattern is detected, the observer produces natural-language observations:
"Jordan corrects the system's framing and steers toward more specific
technical problems — prefers to lead the conversation direction"
"Jordan responds fastest to direct technical questions (median 45s)
vs. status-check questions (median 4m)"
These are stored as Item entities with embeddings, kind: "observation", and a confidence signal at weight 1.0. The confidence decays with a 30-day half-life. If the same pattern is observed again, confidence is reinforced.
Cohort propagation:
If the person matches the developers cohort (via role == "engineer" predicate), these signals also write to the cohort's signal ledger. Aggregate effect: the developers cohort accumulates evidence that direct technical questions produce fast, substantive, positive replies.
Phase 3: Brief
Before generating the next message, the engine queries tidalDB and assembles a communication brief. This is a read-only operation — no writes, no LM calls.
Brief structure:
{
"person": {
"id": "jordan",
"metadata": { "timezone": "America/Los_Angeles", "role": "engineer" },
"interaction_count": 47,
"first_interaction": "2026-01-15T09:00:00Z"
},
"topics": {
"hot": [
{ "topic": "compaction_profiling", "velocity": "rising", "alltime": 12 },
{ "topic": "wal_recovery", "velocity": "stable", "alltime": 28 },
{ "topic": "replication", "velocity": "rising", "alltime": 3 }
],
"cold": [
{ "topic": "documentation", "last_engaged": "2026-01-20", "sentiment": "negative" }
],
"initiated_by_person": ["compaction_profiling", "rust_performance"]
},
"style": {
"formality": { "current": 0.2, "trend": "stable" },
"preferred_length": "medium",
"preferred_structure": "conversational",
"responds_to_questions": true,
"prefers_to_lead": true,
"jargon_comfortable": true,
"emoji_usage": "none"
},
"timing": {
"most_active_hours": [9, 10, 11, 21, 22],
"fastest_reply_hours": [21, 22],
"goes_silent_after": 23,
"current_hour": 21,
"day_of_week": "tuesday",
"in_active_window": true
},
"what_works": {
"high_engagement_patterns": [
"direct technical questions about specific subsystems",
"building on their correction or redirection",
"short messages that open a thread, not close one"
],
"recent_positive_messages": [
{ "summary": "Asked about L0 threshold during compaction", "sentiment": 0.75 },
{ "summary": "Shared profiling approach for signal write path", "sentiment": 0.82 }
]
},
"what_doesnt_work": {
"low_engagement_patterns": [
"status-update style messages",
"long explanations without questions",
"messages after 11pm Pacific"
]
},
"observations": [
"Jordan corrects framing and steers toward specifics — prefers to lead",
"Jordan's replies get shorter after 10pm — engagement drops",
"Jordan uses 'yeah' as opener when genuinely engaged, 'sure' when not"
],
"cohort_priors": {
"developers": {
"preferred_tone": "direct",
"preferred_depth": "technical",
"avg_engagement_length": "medium"
}
}
}
How the brief is assembled:
| Brief section | tidalDB query | Primitive used |
|---|---|---|
| topics.hot | read_decay_score + read_velocity on topic items | Signal decay, velocity |
| topics.cold | Topic items with low AllTime count + negative sentiment | Windowed aggregation |
| topics.initiated_by_person | Items with initiated signal > threshold | Signal decay |
| style.* | Person metadata + observer-written style fields | Entity metadata |
| timing.* | read_windowed_count("replied", Window::OneHour) across 24 hour buckets | Windowed aggregation |
| what_works | retrieve() with person's preference vector, filtered to high-sentiment messages | ANN + preference vector |
| what_doesnt_work | Messages with went_silent or negative_sentiment signals | Signal decay |
| observations | search() with current conversation context as query, filtered to kind: "observation" | BM25 + ANN semantic retrieval |
| cohort_priors | Cohort ledger queries for person's matching cohorts | Cohort signal ledger |
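As an illustration of the timing section, the sketch below turns 24 hourly reply counts into a most_active_hours list. It assumes the counts have already been read and bucketed by the person's local hour; the function name `most_active_hours` and the top-k selection rule are hypothetical, and the engine may derive this field differently.

```rust
/// Return the `k` hours (0..=23) with the highest reply counts, in ascending
/// hour order. `counts[h]` is assumed to hold the number of replied signals
/// observed during local hour `h`. Hours with zero replies are never returned.
pub fn most_active_hours(counts: &[u32; 24], k: usize) -> Vec<u8> {
    let mut hours: Vec<u8> = (0u8..24).filter(|&h| counts[h as usize] > 0).collect();
    // Sort by count descending; ties go to the earlier hour.
    hours.sort_by(|&a, &b| {
        counts[b as usize]
            .cmp(&counts[a as usize])
            .then(a.cmp(&b))
    });
    hours.truncate(k);
    // Present the selected hours in clock order.
    hours.sort_unstable();
    hours
}
```

The same function applied to counts of replied_fast instead of replied would give a candidate for fastest_reply_hours.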
Phase 4: Generate
The brief is injected into the LM's system prompt. The LM generates the next message. The engine stores the generated message as a new Item entity with metadata and embedding.
[system]
You are communicating with Jordan. Here is what we know about how
Jordan communicates:
{brief as structured text}
Guidelines derived from this profile:
- Be direct and technical. Ask specific questions.
- Let Jordan lead the conversation direction — build on their framing.
- Keep messages medium length. Conversational, not structured.
- This is an active window (9pm Tuesday) — Jordan is typically responsive now.
- Current hot topic with rising velocity: compaction profiling.
- Avoid: status updates, long explanations, messages after 11pm.
The LM never touches tidalDB. It reads the brief, generates a message, and the loop continues.
Observation Lifecycle
Observations are the bridge between raw signals and human-legible learning. They capture patterns that numbers alone can't express: "uses 'yeah' when engaged, 'sure' when not."
Creation
Observations are generated by the observer LM periodically:
- Every 5 conversation turns
- On session close
- When the observer detects a novel pattern (contradiction with existing observations, or new behavioral signal)
Each observation is:
- Embedded (384D, same model as messages)
- Stored as an Item with kind: "observation", person_id, category (style, topic, timing, dynamics)
- Given a confidence signal at weight 1.0
Retrieval
Before briefing, the engine runs db.search() with the current conversation context as the query text, filtered to kind: "observation" and the target person. BM25 matches on keywords; ANN matches on semantic similarity. RRF fusion ranks by relevance.
Top-5 observations are included in the brief.
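For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal standalone sketch of the technique. It is not tidalDB's implementation: the function name `rrf_fuse` is hypothetical, and the constant k = 60 used in the note below is the commonly cited default rather than a value confirmed for tidalDB.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion over any number of ranked ID lists.
/// Each list is ordered best-first; an item at 0-based position `i`
/// contributes 1 / (k + i + 1) to its fused score.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64, top_n: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest fused score first; ties broken by ID for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(top_n);
    fused
}
```

Calling `rrf_fuse(&[bm25_ids, ann_ids], 60.0, 5)` with the two ranked lists would yield the top-5 observations for the brief. An observation that appears in both lists outranks one that appears high in only one, which is why RRF needs no score normalization between BM25 and ANN.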
Decay and Reinforcement
The confidence signal has a 30-day half-life. An observation created 60 days ago has ~25% of its original weight. If the same pattern is observed again, a new confidence signal is written — reinforcing the observation back toward full weight.
Observations that are never reinforced fade below a retrieval threshold and are effectively forgotten. No garbage collection needed — decay handles it.
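The decay arithmetic can be checked with a short sketch. It assumes standard exponential half-life decay for a single signal, w(t) = w0 * 0.5^(t / half_life). The function names are hypothetical, and the retrieval threshold is a free parameter because this document does not state its value.

```rust
/// Weight remaining after `age_days` under exponential decay with the
/// given half-life: w(t) = initial * 0.5^(age_days / half_life_days).
pub fn decayed_weight(initial: f64, age_days: f64, half_life_days: f64) -> f64 {
    initial * 0.5_f64.powf(age_days / half_life_days)
}

/// Days until a weight of `initial` decays to `threshold`.
/// `threshold` must be positive. Returns 0.0 if the weight is already
/// at or below the threshold.
pub fn days_until_below(initial: f64, threshold: f64, half_life_days: f64) -> f64 {
    if initial <= threshold {
        return 0.0;
    }
    half_life_days * (initial / threshold).log2()
}
```

With a 30-day half-life, `decayed_weight(1.0, 60.0, 30.0)` is 0.25, matching the figure above. The same function shows the asymmetry between signal types: after one week, a negative_sentiment signal (3-day half-life) retains about 20% of its weight, while a replied signal (7-day half-life) retains 50%.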
Contradiction Resolution
When the observer generates an observation that contradicts an existing one (e.g., "Jordan now prefers formal tone" vs. existing "Jordan prefers casual tone"), the new observation is stored alongside the old one. The old observation's confidence is decaying; the new one starts at 1.0. Within a few weeks, the old observation falls below retrieval threshold naturally.
No explicit deletion. No conflict resolution logic. Decay handles contradiction.
Cohort Architecture
Definition
Cohorts are defined at schema time in engine/src/cohorts.rs:
registry.define("developers", Predicate::Eq {
field: "role".into(),
value: "engineer".into(),
});
registry.define("us_pacific", Predicate::Eq {
field: "timezone".into(),
value: "America/Los_Angeles".into(),
});
registry.define("high_engagement", Predicate::Range {
field: "interaction_count".into(),
min: "20".into(),
max: None,
});
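To show how such predicates might be evaluated against a person's metadata, the sketch below defines a stand-in `Predicate` enum with the same two variants and a `matches` method. It is illustrative only: the real type lives in tidalDB and may differ, and the choices here (string-valued metadata, numeric and inclusive range bounds, non-numeric values never matching) are assumptions.

```rust
use std::collections::HashMap;

/// Illustrative stand-in for the predicate type used above.
pub enum Predicate {
    Eq { field: String, value: String },
    Range { field: String, min: String, max: Option<String> },
}

/// Parse a metadata string as a number; None if it is not numeric.
fn parse_num(s: &String) -> Option<f64> {
    s.trim().parse::<f64>().ok()
}

impl Predicate {
    /// True if the metadata satisfies the predicate. Range bounds are
    /// assumed inclusive; missing or unparsable values never match.
    pub fn matches(&self, meta: &HashMap<String, String>) -> bool {
        match self {
            Predicate::Eq { field, value } => meta.get(field) == Some(value),
            Predicate::Range { field, min, max } => {
                let value = match meta.get(field).and_then(parse_num) {
                    Some(v) => v,
                    None => return false,
                };
                let lower_ok = parse_num(min).map_or(false, |lo| value >= lo);
                let upper_ok = match max {
                    None => true, // open-ended range
                    Some(m) => parse_num(m).map_or(false, |hi| value <= hi),
                };
                lower_ok && upper_ok
            }
        }
    }
}

/// Names of all cohorts whose predicate matches the person's metadata.
pub fn matching_cohorts<'a>(
    defs: &'a [(&'a str, Predicate)],
    meta: &HashMap<String, String>,
) -> Vec<&'a str> {
    defs.iter()
        .filter(|(_, p)| p.matches(meta))
        .map(|(name, _)| *name)
        .collect()
}
```

Under these assumptions, a person with role "engineer" and interaction_count "47" would match both developers and high_engagement, and would drop out of high_engagement if the count were below 20.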
Cold-Start Flow
New person arrives
→ Match against cohort predicates (metadata-based)
→ For each matching cohort:
Query cohort signal ledger for aggregate patterns
→ Merge cohort priors into brief (weighted by cohort size / confidence)
→ LM generates first message using cohort-derived style
→ Person responds
→ Individual signals begin overriding cohort priors
The weight of cohort priors in the brief decreases as individual interaction count grows: weight = 1 / (1 + individual_count / 10). At 10 interactions the cohort weight is 0.5, and at 30 it is 0.25. Beyond that, individual signals dominate and cohort priors matter mainly where individual data is sparse on a specific dimension.
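The fade-out can be sketched using the weighting listed under Key Architectural Decisions, weight = 1 / (1 + individual_count / 10). The `blend` helper is an assumption added for illustration: it shows one plausible way to combine a cohort-level estimate with an individual one on a single numeric dimension, and is not taken from the engine.

```rust
/// Weight given to cohort priors after `individual_count` interactions:
/// 1 / (1 + individual_count / 10).
pub fn cohort_prior_weight(individual_count: u32) -> f64 {
    1.0 / (1.0 + f64::from(individual_count) / 10.0)
}

/// Blend a cohort estimate with an individual estimate for one numeric
/// dimension (for example, formality on a 0.0..=1.0 scale).
/// With no individual estimate yet, the cohort value is used as-is.
pub fn blend(cohort: f64, individual: Option<f64>, individual_count: u32) -> f64 {
    match individual {
        None => cohort,
        Some(ind) => {
            let w = cohort_prior_weight(individual_count);
            w * cohort + (1.0 - w) * ind
        }
    }
}
```

Passing `None` for a dimension the person has no data on keeps the cohort prior in play regardless of the overall interaction count, which covers the sparse-dimension case.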
Cohort Learning
Cohort signal ledgers learn from all members simultaneously. When Jordan (a developers cohort member) responds positively to a direct technical question, that signal writes to both Jordan's personal ledger and the developers cohort ledger.
This means: the more people the system talks to, the better its cold-start priors become — without any explicit aggregation step. tidalDB's cohort signal propagation handles it at write time.
Conversation (Session) Mechanics
Each conversation is a tidalDB session:
let handle = db.start_session(person_id, agent_id, "iknowyou_default", metadata)?;
// During conversation:
db.session_signal(&handle, "replied", msg_id, 1.0, now)?;
// ...more signals per exchange...
// On conversation end:
let summary = db.close_session(handle)?;
// → Triggers preference vector update (EMA blend of engaged message embeddings)
// → Triggers observation generation (periodic analysis)
// → Session signals aggregate into global ledger
Session-scoped vs. global signals: Within a session, signals are scoped — they don't affect the global ledger until session close. This prevents a single bad conversation from immediately poisoning the model. Session close triggers the EMA preference update and promotes signals to global state.
Long conversations: For ongoing conversations (e.g., a persistent chat channel), sessions can be rotated on a timer — close and immediately reopen every 30 minutes. This provides regular preference updates without waiting for an explicit "conversation end."
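The scoping behavior can be modeled with a small buffer that holds signals until close. This is a conceptual sketch only: `SessionBuffer`, `GlobalLedger`, `SignalWrite`, and `rotate` are hypothetical names, and tidalDB's session implementation is not shown in this document.

```rust
/// One buffered signal write. Names and fields are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct SignalWrite {
    pub signal: String,
    pub item_id: u64,
    pub weight: f64,
}

/// Hypothetical stand-in for the global ledger.
#[derive(Default)]
pub struct GlobalLedger {
    pub writes: Vec<SignalWrite>,
}

/// Signals recorded here stay invisible to the global ledger until `close`.
#[derive(Default)]
pub struct SessionBuffer {
    pending: Vec<SignalWrite>,
}

impl SessionBuffer {
    pub fn record(&mut self, signal: &str, item_id: u64, weight: f64) {
        self.pending.push(SignalWrite {
            signal: signal.to_string(),
            item_id,
            weight,
        });
    }

    /// Promote all pending signals and return how many were promoted.
    /// Consumes the buffer, so a closed session cannot be written to again.
    pub fn close(self, ledger: &mut GlobalLedger) -> usize {
        let promoted = self.pending.len();
        ledger.writes.extend(self.pending);
        promoted
    }
}

/// Rotation for long-running conversations: close the current buffer
/// and hand back a fresh one.
pub fn rotate(current: SessionBuffer, ledger: &mut GlobalLedger) -> SessionBuffer {
    current.close(ledger);
    SessionBuffer::default()
}
```

Taking `self` by value in `close` lets the compiler reject writes to a closed session, and a timer that calls `rotate` every 30 minutes would give the periodic promotion described above.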
Embedding Strategy
Message Embeddings (384D)
Generated from message text using a sentence-transformer model (external to iknowyou). The embedding captures semantic content + style in a single vector.
Messages with similar communication style (casual + technical + question) cluster in the embedding space. The person's preference vector — evolved through EMA blending of positively-received message embeddings — converges on the region of embedding space that represents "how this person likes to be communicated with."
Observation Embeddings (384D, same model)
Observations are embedded with the same model. This means semantic search over observations uses the same distance metric as message retrieval. "Jordan prefers direct questions" is retrievable both by keyword ("direct questions") and by semantic similarity to a conversation about asking direct questions.
Preference Vector Evolution
Initial: null (cold start, use cohort priors)
After 1 msg: preference = message_embedding (first positive response)
After N: preference = (1 - alpha) * preference + alpha * new_message_embedding
where alpha = base_alpha / (1 + ln(update_count + 1))
base_alpha = 0.15
The adaptive learning rate means:
- Interaction 1: alpha ≈ 0.15 (strong influence)
- Interaction 5: alpha ≈ 0.06 (moderate)
- Interaction 20: alpha ≈ 0.04 (refinement)
- Interaction 100: alpha ≈ 0.03 (stable, slow drift)
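The update rule can be written out directly. This is a minimal sketch assuming that update_count is the number of updates already applied (so the first update sees update_count = 0) and that embeddings are plain `f32` slices. The function names are hypothetical.

```rust
const BASE_ALPHA: f64 = 0.15;

/// alpha = base_alpha / (1 + ln(update_count + 1)).
/// `update_count` is assumed to count updates already applied, so the
/// first update uses update_count = 0 and alpha = base_alpha.
pub fn adaptive_alpha(update_count: u32) -> f64 {
    BASE_ALPHA / (1.0 + (f64::from(update_count) + 1.0).ln())
}

/// Apply one EMA step. A missing preference (cold start) is initialized
/// to the first positively received message embedding.
pub fn update_preference(
    preference: Option<Vec<f32>>,
    embedding: &[f32],
    update_count: u32,
) -> Vec<f32> {
    match preference {
        None => embedding.to_vec(),
        Some(mut pref) => {
            assert_eq!(pref.len(), embedding.len(), "dimension mismatch");
            let alpha = adaptive_alpha(update_count) as f32;
            // preference = (1 - alpha) * preference + alpha * new_embedding
            for (p, e) in pref.iter_mut().zip(embedding) {
                *p = (1.0 - alpha) * *p + alpha * *e;
            }
            pref
        }
    }
}
```

Because the denominator grows logarithmically, the rate never reaches zero: the first update uses alpha = 0.15, the rate falls to roughly 0.04 by the twentieth, and later messages still nudge the vector, which allows slow drift in long-standing preferences.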
Write Path — Full Trace
A person sends a reply. Here is everything that happens:
1. Server receives person's message
└─ HTTP handler in server/handlers.rs
2. Observer LM call (async, < 500ms)
├─ Input: conversation context + person's message
└─ Output: ObserverOutput (structured JSON)
3. Engine processes ObserverOutput
├─ 3a. Write engagement signals on sent message
│ ├─ db.signal("replied", msg_id, 1.0, now) → WAL + hot tier
│ ├─ db.signal("replied_fast", msg_id, 1.0, now) → WAL + hot tier
│ ├─ db.signal("replied_substantively", msg_id, 0.85, now)
│ └─ db.signal("positive_sentiment", msg_id, 0.75, now)
│
├─ 3b. Write topic signals
│ ├─ db.signal("topic_engaged", topic_id, 1.0, now)
│ └─ db.signal("initiated", topic_id, 1.5, now) [if person-initiated]
│
├─ 3c. Update person metadata
│ └─ db.write_user_metadata(person_id, updated_fields) [style cues, timing]
│
├─ 3d. Session signal (within active session)
│ └─ db.session_signal(&handle, ...) [scoped, not yet global]
│
└─ 3e. Cohort propagation (automatic at signal-write time)
└─ For each matching cohort: cohort_ledger.record(...)
4. [Every 5 turns] Observer generates observations
├─ Stored as Item entities with embeddings
└─ confidence signal at 1.0, 30d half-life
5. Briefing engine queries tidalDB (read-only, < 10ms)
├─ Signal reads: decay scores, windowed counts, velocity
├─ ANN retrieval: preference-aligned past messages
├─ Search: relevant observations for current context
├─ Cohort queries: priors for sparse dimensions
└─ Assembles Brief struct
6. Generator LM call
├─ Input: brief (as system prompt) + conversation history
└─ Output: next message
7. Store generated message as Item
├─ db.write_item_with_metadata(msg_id, metadata)
├─ db.write_item_embedding(msg_id, embedding)
└─ Message is now a target for future signals
8. Send message to person → loop continues
Latency budget:
| Step | Target | Notes |
|---|---|---|
| Observer LM call | < 500ms | Small model, structured output |
| Signal writes (6–8 signals) | < 1ms total | tidalDB hot path, < 100µs each |
| Metadata update | < 200µs | Single fjall write |
| Briefing query | < 10ms | Signal reads + ANN + search |
| Generator LM call | 500ms–2s | Full model, depends on length |
| Message storage | < 500µs | Metadata + embedding write |
| Total loop | < 3s | Dominated by LM calls |
The tidalDB operations are negligible. The latency floor is the LM inference time.
Performance Targets
| Operation | Target |
|---|---|
| Signal write (single, including WAL) | < 100µs |
| Brief assembly (all queries) | < 10ms |
| Observation retrieval (semantic search) | < 5ms |
| Preference vector ANN query (10K messages) | < 3ms |
| Full loop excluding LM calls | < 15ms |
| Observer LM call | < 500ms |
| Generator LM call | < 2s |
| End-to-end response latency | < 3s |
Key Architectural Decisions
| Decision | Choice | Why |
|---|---|---|
| Observer as separate LM call | Small/fast model, structured output | Decouples observation quality from generation quality. Testable independently. Cheap per-call. |
| Messages as tidalDB Items | Reuse entity model, no schema extension | Messages get embeddings, signals, metadata, ANN retrieval for free. |
| Observations as Items (not metadata) | Semantic retrieval via search pipeline | Observations are retrievable by relevance to current context, not just by person. Decay handles staleness. |
| Engine has no LM dependency | Pure Rust, structured IO | Fully testable without mocking LM. Server owns all external calls. |
| Session-scoped signals | Promote to global on close | Prevents single bad conversation from poisoning the model. Batched preference update. |
| Asymmetric decay (negative < positive) | 3d negative vs. 7–14d positive | Forgiving by default. Bad days fade fast. Good patterns persist. |
| Cohort priors fade with interaction count | Weight = 1 / (1 + individual_count / 10) | Bootstraps cold start, gets out of the way once individual data exists. |
| 384D embeddings | Sentence-transformer class | Good quality/cost ratio. Same model for messages and observations enables cross-type search. |
| Brief as JSON, not prompt text | Structured, inspectable, testable | Can validate brief contents without running the generator. Can swap LM providers without changing the brief format. |
| Periodic observation generation | Every 5 turns + session close | Not every turn (too noisy, too expensive). Not only session close (too infrequent for long conversations). |