iknowyou — Architecture

Core Thesis

Communication personalization is a signal processing problem. Every exchange between the system and a person produces observable signals — engagement, sentiment, timing, style — that decay over time and compound across conversations. tidalDB's signal ledger, preference vectors, windowed aggregation, and cohort system provide the learning substrate. iknowyou wraps these primitives with an observation pipeline (LM-as-classifier), a briefing engine (query-to-profile), and a generation interface (brief-to-prompt).

The system has no training loop, no batch pipeline, no feature store. Learning is continuous: signals are written on every exchange, preference vectors update via EMA, and the next query reflects the latest state. The entire closed loop executes within a single process.

Domain Model

Entities

| Entity | tidalDB Kind | What it represents |
|---|---|---|
| Person | User | An individual the system communicates with. Has metadata (timezone, role, context), a preference vector (learned from message engagement), a signal ledger, cohort memberships, and a user-state index (conversation history). |
| Message | Item | A message the system generated and sent. Has metadata (topic, tone, length, structure, time_sent, conversation_id), an embedding (from the message content), and signals written against it based on the person's response. |
| Observation | Item | A natural-language statement about a person's communication pattern. Has an embedding (for semantic retrieval), a confidence signal (decays over time), and metadata (person_id, category, source_conversation). |

Messages and observations are both Item entities but are distinguished by a kind metadata field: "message" or "observation". This reuses tidalDB's existing entity model without extension.

Schema Primitives

| Primitive | Configuration | Purpose |
|---|---|---|
| Signals | 10 signal types (see below) | Capture engagement, sentiment, topic, timing dimensions |
| Decay | Exponential, per-signal half-life | Recent interactions matter more; old patterns fade |
| Windows | 1h, 24h, 7d, 30d, AllTime | Temporal aggregation for time-of-day patterns |
| Velocity | On engagement signals | Distinguish "always liked X" from "suddenly interested in X" |
| Preference vectors | 384D, EMA with adaptive rate | Communication style convergence per-person |
| Cohorts | Predicate-based, per-cohort ledger | Cold-start priors, cross-pollination, drift detection |

Signal Schema

Engagement Signals (on Message items)

| Signal | Half-life | Windows | Velocity | Weight semantics |
|---|---|---|---|---|
| replied | 7d | 1h, 24h, 7d, AllTime | yes | 1.0 = responded at all |
| replied_fast | 3d | 1h, 24h, 7d | yes | 1.0 = latency < 120s |
| replied_substantively | 7d | 24h, 7d, AllTime | yes | 0.0–1.0 normalized by word count / depth |
| positive_sentiment | 14d | 24h, 7d, 30d, AllTime | no | 0.0–1.0 from observer sentiment score |
| negative_sentiment | 3d | 24h, 7d | no | 0.0–1.0 from observer sentiment score |
| went_silent | 1d | 24h, 7d | no | 1.0 = no response after timeout |

Topic Signals (on topic-cluster items or Message items)

| Signal | Half-life | Windows | Velocity | Weight semantics |
|---|---|---|---|---|
| topic_engaged | 14d | 7d, 30d, AllTime | yes | 1.0 = stayed on or deepened topic |
| topic_dropped | 3d | 7d | no | 1.0 = redirected or went brief |
| initiated | 30d | 30d, AllTime | no | 1.5 = they brought this up unprompted |

Meta Signals (on Observation items)

| Signal | Half-life | Windows | Velocity | Weight semantics |
|---|---|---|---|---|
| confidence | 30d | AllTime | no | 1.0 at creation; decays unless reinforced |

Design Rationale

  • Asymmetric decay: Negative signals (3d) decay 2–5x faster than positive signals (7–14d). The system is forgiving by default. Bad days don't poison the model.
  • initiated is the strongest signal: When someone raises a topic unprompted, that's stronger evidence of interest than responding to a topic you raised. Weight 1.5, half-life 30d.
  • went_silent is gentle: 1-day half-life. Silence might mean they're busy, not that the message was wrong. But it's still a signal — if silence correlates with a pattern (late-night messages, formal tone), the preference vector will drift away from that pattern.
  • Velocity on engagement signals: Velocity separates stable preferences from emerging ones. If topic_engaged velocity spikes on "replication" this week, the brief surfaces it as a rising interest — even if AllTime count is low.
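
The asymmetry can be made concrete with a minimal sketch, assuming the standard exponential half-life form `weight * 0.5^(age / half_life)`. The function names are illustrative and are not tidalDB's API; the half-lives are copied from the signal tables above.

```rust
/// Exponentially decayed weight of a signal after `age_days`,
/// given its half-life in days (assumed form: weight * 0.5^(age / half_life)).
fn decayed_weight(weight: f64, age_days: f64, half_life_days: f64) -> f64 {
    weight * 0.5_f64.powf(age_days / half_life_days)
}

/// Half-lives in days, copied from the signal tables above.
/// Returns None for a signal name that is not in the schema.
fn half_life_days(signal: &str) -> Option<f64> {
    match signal {
        "went_silent" => Some(1.0),
        "negative_sentiment" | "replied_fast" | "topic_dropped" => Some(3.0),
        "replied" | "replied_substantively" => Some(7.0),
        "positive_sentiment" | "topic_engaged" => Some(14.0),
        "initiated" | "confidence" => Some(30.0),
        _ => None,
    }
}
```

Under these assumptions, a week-old negative_sentiment signal retains about 20% of its weight, while a week-old positive_sentiment signal retains about 71%.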

Module Structure

applications/iknowyou/
├── engine/                          ← Core library (no network, no LM calls)
│   └── src/
│       ├── lib.rs                   ← IkyEngine: wraps TidalDb
│       ├── schema.rs                ← Signal schema + cohort definitions
│       ├── observer.rs              ← ObserverOutput: structured extraction type
│       ├── briefing.rs              ← Brief: queries tidalDB, assembles profile
│       ├── signals.rs               ← Signal writing: observation → tidalDB signals
│       ├── observations.rs          ← Observation lifecycle: write, retrieve, decay
│       └── cohorts.rs               ← Cohort definitions + cold-start logic
│
├── server/                          ← HTTP API + LM integration
│   └── src/
│       ├── main.rs                  ← Axum server, startup, shutdown
│       ├── handlers.rs              ← /message, /observe, /brief, /feedback
│       ├── llm.rs                   ← LM client: observer calls + generation calls
│       └── loop.rs                  ← Orchestrator: observe → learn → brief → generate
│
├── vision.md                        ← Product vision
└── architecture.md                  ← This document

Dependency Flow

server (Axum, LM client)
  │
  ├──→ engine (pure Rust, no IO except tidalDB)
  │      │
  │      └──→ tidalDB (embedded, same process)
  │
  └──→ LM API (HTTP, external)

The engine crate has zero network dependencies. It takes structured ObserverOutput and returns structured Brief. The server crate handles LM API calls and HTTP. This separation means the engine is fully testable without mocking LM calls.

The Closed Loop — Detailed

Phase 1: Observe

When a person responds to a message (or doesn't respond within the timeout window), the server calls the observer LM with the conversation context and the person's message.

Observer input:

System message sent: "Have you looked at what happens when segment count exceeds L0?"
Person replied: "yeah good call - the compaction pass is actually the bottleneck,
                 not the segment count itself. been profiling it all morning"
Time since system message: 47 seconds
Conversation turn: 4

Observer output (structured JSON, single inference):

{
  "engagement": {
    "replied": true,
    "latency_seconds": 47,
    "substantive": true,
    "word_count": 22,
    "sentiment_score": 0.75,
    "sentiment_direction": "positive"
  },
  "style": {
    "formality": 0.2,
    "uses_lowercase": true,
    "uses_jargon": true,
    "structure": "stream_of_thought",
    "emoji": false
  },
  "topic": {
    "primary": "compaction_profiling",
    "domain": "database_internals",
    "specificity": "high",
    "continued_from_previous": true,
    "deepened": true
  },
  "dynamics": {
    "redirected": true,
    "redirect_direction": "more_specific",
    "who_is_leading": "person",
    "built_on_previous": true,
    "corrected_system": true
  }
}

The observer is a small, fast model (Haiku-class). It doesn't need to be creative — it needs to reliably extract structure. Latency target: < 500ms. Cost per call: negligible.

Phase 2: Learn

The engine receives ObserverOutput and writes signals to tidalDB. This is a pure function: structured input → signal writes. No LM call.

Signal writes for this exchange:

// Engagement signals on the sent message
db.signal("replied",                 msg_entity_id, 1.0,  now)?;
db.signal("replied_fast",           msg_entity_id, 1.0,  now)?;  // 47s < 120s
db.signal("replied_substantively",  msg_entity_id, 0.85, now)?;  // normalized
db.signal("positive_sentiment",     msg_entity_id, 0.75, now)?;

// Topic signals
db.signal("topic_engaged", topic_entity_id("compaction_profiling"), 1.0, now)?;
db.signal("topic_engaged", topic_entity_id("database_internals"),   1.0, now)?;

// No negative signals this exchange
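
As a rough illustration of the "structured input → signal writes" mapping, the sketch below derives engagement signals from a reduced observer output. The `Engagement` struct, the pre-normalized `substantive_score` field, and the function name are assumptions made for this example; the real `ObserverOutput` type and the normalization behind 0.85 are not shown in this document.

```rust
/// Hypothetical subset of the observer's `engagement` block.
/// `substantive_score` is assumed to be already normalized to 0.0..=1.0.
#[derive(Debug, Clone)]
struct Engagement {
    replied: bool,
    latency_seconds: u32,
    substantive_score: f64,
    sentiment_score: f64,
    sentiment_positive: bool,
}

/// Latency cutoff for `replied_fast`, from the signal table (120s).
const FAST_REPLY_SECS: u32 = 120;

/// Pure mapping from observer output to (signal name, weight) pairs.
fn engagement_signals(e: &Engagement) -> Vec<(&'static str, f64)> {
    if !e.replied {
        // No response within the timeout window.
        return vec![("went_silent", 1.0)];
    }
    let mut out = vec![("replied", 1.0)];
    if e.latency_seconds < FAST_REPLY_SECS {
        out.push(("replied_fast", 1.0));
    }
    if e.substantive_score > 0.0 {
        out.push(("replied_substantively", e.substantive_score.clamp(0.0, 1.0)));
    }
    let sentiment = e.sentiment_score.clamp(0.0, 1.0);
    if sentiment > 0.0 {
        let name = if e.sentiment_positive {
            "positive_sentiment"
        } else {
            "negative_sentiment"
        };
        out.push((name, sentiment));
    }
    out
}
```

Because the function performs no IO, it can be unit-tested against fixed inputs, which is the property the engine/server split relies on.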

Preference vector update: The sent message's embedding blends into the person's preference vector. The message was direct, technical, question-form — so the preference vector shifts toward that communication style. EMA adaptive rate: high early (person has few interactions), lower as history accumulates.

Observation generation (periodic, not every turn): Every N turns or on session close, the observer produces natural-language observations:

"Jordan corrects the system's framing and steers toward more specific
 technical problems — prefers to lead the conversation direction"

"Jordan responds fastest to direct technical questions (median 45s)
 vs. status-check questions (median 4m)"

These are stored as Item entities with embeddings, kind: "observation", and a confidence signal at weight 1.0. The confidence decays with a 30-day half-life. If the same pattern is observed again, confidence is reinforced.

Cohort propagation: If the person matches the developers cohort (via role == "engineer" predicate), these signals also write to the cohort's signal ledger. Aggregate effect: the developers cohort accumulates evidence that direct technical questions produce fast, substantive, positive replies.

Phase 3: Brief

Before generating the next message, the engine queries tidalDB and assembles a communication brief. This is a read-only operation — no writes, no LM calls.

Brief structure:

{
  "person": {
    "id": "jordan",
    "metadata": { "timezone": "America/Los_Angeles", "role": "engineer" },
    "interaction_count": 47,
    "first_interaction": "2026-01-15T09:00:00Z"
  },

  "topics": {
    "hot": [
      { "topic": "compaction_profiling", "velocity": "rising",  "alltime": 12 },
      { "topic": "wal_recovery",         "velocity": "stable",  "alltime": 28 },
      { "topic": "replication",          "velocity": "rising",  "alltime": 3  }
    ],
    "cold": [
      { "topic": "documentation", "last_engaged": "2026-01-20", "sentiment": "negative" }
    ],
    "initiated_by_person": ["compaction_profiling", "rust_performance"]
  },

  "style": {
    "formality": { "current": 0.2, "trend": "stable" },
    "preferred_length": "medium",
    "preferred_structure": "conversational",
    "responds_to_questions": true,
    "prefers_to_lead": true,
    "jargon_comfortable": true,
    "emoji_usage": "none"
  },

  "timing": {
    "most_active_hours": [9, 10, 11, 21, 22],
    "fastest_reply_hours": [21, 22],
    "goes_silent_after": 23,
    "current_hour": 21,
    "day_of_week": "tuesday",
    "in_active_window": true
  },

  "what_works": {
    "high_engagement_patterns": [
      "direct technical questions about specific subsystems",
      "building on their correction or redirection",
      "short messages that open a thread, not close one"
    ],
    "recent_positive_messages": [
      { "summary": "Asked about L0 threshold during compaction", "sentiment": 0.75 },
      { "summary": "Shared profiling approach for signal write path", "sentiment": 0.82 }
    ]
  },

  "what_doesnt_work": {
    "low_engagement_patterns": [
      "status-update style messages",
      "long explanations without questions",
      "messages after 11pm Pacific"
    ]
  },

  "observations": [
    "Jordan corrects framing and steers toward specifics — prefers to lead",
    "Jordan's replies get shorter after 10pm — engagement drops",
    "Jordan uses 'yeah' as opener when genuinely engaged, 'sure' when not"
  ],

  "cohort_priors": {
    "developers": {
      "preferred_tone": "direct",
      "preferred_depth": "technical",
      "avg_engagement_length": "medium"
    }
  }
}

How the brief is assembled:

| Brief section | tidalDB query | Primitive used |
|---|---|---|
| topics.hot | read_decay_score + read_velocity on topic items | Signal decay, velocity |
| topics.cold | Topic items with low AllTime count + negative sentiment | Windowed aggregation |
| topics.initiated_by_person | Items with initiated signal > threshold | Signal decay |
| style.* | Person metadata + observer-written style fields | Entity metadata |
| timing.* | read_windowed_count("replied", Window::OneHour) across 24 hour buckets | Windowed aggregation |
| what_works | retrieve() with person's preference vector, filtered to high-sentiment messages | ANN + preference vector |
| what_doesnt_work | Messages with went_silent or negative_sentiment signals | Signal decay |
| observations | search() with current conversation context as query, filtered to kind: "observation" | BM25 + ANN semantic retrieval |
| cohort_priors | Cohort ledger queries for person's matching cohorts | Cohort signal ledger |
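
To illustrate the timing.* row, here is a minimal sketch of turning 24 hourly reply counts into `most_active_hours`. It assumes the counts have already been read from tidalDB into a plain array; the function name and the top-n selection rule are hypothetical.

```rust
/// Return the `n` hours (0..=23) with the highest reply counts, in ascending hour order.
/// Hours with zero replies are never returned.
fn most_active_hours(counts: &[u32; 24], n: usize) -> Vec<u8> {
    let mut hours: Vec<u8> = (0u8..24).filter(|&h| counts[h as usize] > 0).collect();
    // Sort by count descending; break ties by earlier hour.
    hours.sort_by(|&a, &b| {
        counts[b as usize]
            .cmp(&counts[a as usize])
            .then(a.cmp(&b))
    });
    hours.truncate(n);
    hours.sort_unstable();
    hours
}
```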

Phase 4: Generate

The brief is injected into the LM's system prompt. The LM generates the next message. The engine stores the generated message as a new Item entity with metadata and embedding.

[system]
You are communicating with Jordan. Here is what we know about how
Jordan communicates:

{brief as structured text}

Guidelines derived from this profile:
- Be direct and technical. Ask specific questions.
- Let Jordan lead the conversation direction — build on their framing.
- Keep messages medium length. Conversational, not structured.
- This is an active window (9pm Tuesday) — Jordan is typically responsive now.
- Current hot topic with rising velocity: compaction profiling.
- Avoid: status updates, long explanations, messages after 11pm.

The LM never touches tidalDB. It reads the brief, generates a message, and the loop continues.
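
One way the "Guidelines derived from this profile" lines could be produced from structured brief fields is sketched below. The `BriefSummary` struct and the exact wording of each line are assumptions for illustration, not the server's actual prompt template.

```rust
/// Hypothetical subset of the brief used to derive prompt guidelines.
struct BriefSummary<'a> {
    name: &'a str,
    prefers_to_lead: bool,
    preferred_length: &'a str,
    in_active_window: bool,
    rising_topics: Vec<&'a str>,
}

/// Turn structured brief fields into plain-text guideline lines for the system prompt.
fn guidelines(b: &BriefSummary) -> Vec<String> {
    let mut out = Vec::new();
    if b.prefers_to_lead {
        out.push(format!("Let {} lead the conversation direction.", b.name));
    }
    out.push(format!("Keep messages {} length.", b.preferred_length));
    if b.in_active_window {
        out.push(format!("This is an active window for {}.", b.name));
    }
    if !b.rising_topics.is_empty() {
        out.push(format!("Rising topics: {}.", b.rising_topics.join(", ")));
    }
    out
}
```

Keeping this step deterministic means the guideline text can be asserted on in tests without calling the generator LM.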

Observation Lifecycle

Observations are the bridge between raw signals and human-legible learning. They capture patterns that numbers alone can't express: "uses 'yeah' when engaged, 'sure' when not."

Creation

Observations are generated by the observer LM periodically:

  • Every 5 conversation turns
  • On session close
  • When the observer detects a novel pattern (contradiction with existing observations, or new behavioral signal)

Each observation is:

  1. Embedded (384D, same model as messages)
  2. Stored as an Item with kind: "observation", person_id, category (style, topic, timing, dynamics)
  3. Given a confidence signal at weight 1.0

Retrieval

Before briefing, the engine runs db.search() with the current conversation context as the query text, filtered to kind: "observation" and the target person. BM25 matches on keywords; ANN matches on semantic similarity. RRF fusion ranks by relevance.

Top-5 observations are included in the brief.
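
For readers unfamiliar with reciprocal rank fusion, the sketch below shows the general technique: each ranked list contributes `1 / (k + rank)` per item, and the sums decide the final order. The function name and the constant `k = 60` used in the example are assumptions; this document does not state how tidalDB configures its fusion.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
/// with rank starting at 1. Returns ids ordered by fused score, best first.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64, top_n: usize) -> Vec<String> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in rankings {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(&str, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(b.0)));
    fused
        .into_iter()
        .take(top_n)
        .map(|(id, _)| id.to_string())
        .collect()
}
```

An observation that ranks well in both the BM25 list and the ANN list outranks one that tops only a single list, which is why keyword and semantic matches reinforce each other.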

Decay and Reinforcement

The confidence signal has a 30-day half-life. An observation created 60 days ago has ~25% of its original weight. If the same pattern is observed again, a new confidence signal is written — reinforcing the observation back toward full weight.

Observations that are never reinforced fade below a retrieval threshold and are effectively forgotten. No garbage collection needed — decay handles it.

Contradiction Resolution

When the observer generates an observation that contradicts an existing one (e.g., "Jordan now prefers formal tone" vs. existing "Jordan prefers casual tone"), the new observation is stored alongside the old one. The old observation's confidence is decaying; the new one starts at 1.0. Within a few weeks, the old observation falls below retrieval threshold naturally.

No explicit deletion. No conflict resolution logic. Decay handles contradiction.
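
How long "a few weeks" is depends on the retrieval threshold, which this document does not specify. As a hedged sketch, assuming exponential decay and a hypothetical threshold, the time until an observation drops out can be computed directly:

```rust
/// Days until a signal with `current_weight` decays below `threshold`,
/// given its half-life. Returns 0.0 if it is already at or below the threshold.
fn days_until_below(current_weight: f64, threshold: f64, half_life_days: f64) -> f64 {
    if current_weight <= threshold {
        return 0.0;
    }
    // Solve current_weight * 0.5^(t / half_life) = threshold for t.
    half_life_days * (current_weight / threshold).log2()
}
```

With an assumed threshold of 0.25 and the 30-day half-life, a fresh observation stays retrievable for 60 days without reinforcement, while an older one that has already decayed to 0.4 drops out after roughly 20 more days.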

Cohort Architecture

Definition

Cohorts are defined at schema time in engine/src/cohorts.rs:

registry.define("developers", Predicate::Eq {
    field: "role".into(),
    value: "engineer".into(),
});

registry.define("us_pacific", Predicate::Eq {
    field: "timezone".into(),
    value: "America/Los_Angeles".into(),
});

registry.define("high_engagement", Predicate::Range {
    field: "interaction_count".into(),
    min: "20".into(),
    max: None,
});
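
A minimal sketch of how such predicates might be evaluated against a person's metadata follows. The `Predicate` enum here is a hypothetical mirror of the shapes used above, not tidalDB's actual type, and it assumes metadata is a string map with range bounds parsed as numbers.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the predicate shapes used above; not tidalDB's actual type.
enum Predicate {
    Eq { field: String, value: String },
    Range { field: String, min: String, max: Option<String> },
}

/// Parse a metadata value or bound as a number.
fn parse_num(s: &str) -> Option<f64> {
    s.parse::<f64>().ok()
}

/// True when the metadata satisfies the predicate.
/// A missing field or an unparsable number never matches.
fn predicate_matches(pred: &Predicate, metadata: &HashMap<String, String>) -> bool {
    match pred {
        Predicate::Eq { field, value } => metadata.get(field) == Some(value),
        Predicate::Range { field, min, max } => {
            let value = metadata.get(field).and_then(|s| parse_num(s));
            let lower = parse_num(min);
            let upper = max.as_deref().map(parse_num);
            match (value, lower, upper) {
                (Some(v), Some(lo), None) => v >= lo, // open-ended range
                (Some(v), Some(lo), Some(Some(hi))) => v >= lo && v <= hi,
                _ => false,
            }
        }
    }
}
```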

Cold-Start Flow

New person arrives
  → Match against cohort predicates (metadata-based)
    → For each matching cohort:
        Query cohort signal ledger for aggregate patterns
    → Merge cohort priors into brief (weighted by cohort size / confidence)
      → LM generates first message using cohort-derived style
        → Person responds
          → Individual signals begin overriding cohort priors

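The weight of cohort priors in the brief decreases as individual interaction count grows. With the weighting listed under Key Architectural Decisions, 1 / (1 + individual_count / 10), cohort priors carry half the weight at 10 interactions and a quarter at 30. Past that point they matter mainly where individual data is sparse on a specific dimension.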
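
The fade can be sketched using the weighting formula from the Key Architectural Decisions table, `1 / (1 + individual_count / 10)`. The `blend` helper is an assumption about how a single numeric dimension might be merged; the actual merge logic is not shown in this document.

```rust
/// Weight given to cohort priors for a person with `individual_count` interactions.
fn cohort_prior_weight(individual_count: u32) -> f64 {
    1.0 / (1.0 + f64::from(individual_count) / 10.0)
}

/// Blend a cohort prior with an individual estimate for one numeric dimension
/// (hypothetical helper: linear interpolation by the cohort weight).
fn blend(cohort_prior: f64, individual: f64, individual_count: u32) -> f64 {
    let w = cohort_prior_weight(individual_count);
    w * cohort_prior + (1.0 - w) * individual
}
```

With zero interactions the cohort prior is used as-is (weight 1.0); the weight falls to 0.5 at 10 interactions and 0.25 at 30.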

Cohort Learning

Cohort signal ledgers learn from all members simultaneously. When Jordan (a developers cohort member) responds positively to a direct technical question, that signal writes to both Jordan's personal ledger and the developers cohort ledger.

This means: the more people the system talks to, the better its cold-start priors become — without any explicit aggregation step. tidalDB's cohort signal propagation handles it at write time.
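
The dual write can be pictured with a toy model. The `Ledger` type and `write_signal` function below are hypothetical stand-ins that ignore decay and windows; they only show that one write reaches both the personal ledger and every matching cohort ledger.

```rust
use std::collections::HashMap;

/// Hypothetical ledger: running weight total per signal name (decay omitted).
#[derive(Default)]
struct Ledger {
    totals: HashMap<String, f64>,
}

impl Ledger {
    fn record(&mut self, signal: &str, weight: f64) {
        *self.totals.entry(signal.to_string()).or_insert(0.0) += weight;
    }

    fn total(&self, signal: &str) -> f64 {
        self.totals.get(signal).copied().unwrap_or(0.0)
    }
}

/// Write one signal to the person's ledger and to every cohort the person belongs to.
fn write_signal(
    personal: &mut Ledger,
    cohorts: &mut HashMap<String, Ledger>,
    member_of: &[&str],
    signal: &str,
    weight: f64,
) {
    personal.record(signal, weight);
    for cohort in member_of {
        cohorts
            .entry(cohort.to_string())
            .or_default()
            .record(signal, weight);
    }
}
```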

Conversation (Session) Mechanics

Each conversation is a tidalDB session:

let handle = db.start_session(person_id, agent_id, "iknowyou_default", metadata)?;

// During conversation:
db.session_signal(&handle, "replied", msg_id, 1.0, now)?;
// ...more signals per exchange...

// On conversation end:
let summary = db.close_session(handle)?;
// → Triggers preference vector update (EMA blend of engaged message embeddings)
// → Triggers observation generation (periodic analysis)
// → Session signals aggregate into global ledger

Session-scoped vs. global signals: Within a session, signals are scoped — they don't affect the global ledger until session close. This prevents a single bad conversation from immediately poisoning the model. Session close triggers the EMA preference update and promotes signals to global state.

Long conversations: For ongoing conversations (e.g., a persistent chat channel), sessions can be rotated on a timer — close and immediately reopen every 30 minutes. This provides regular preference updates without waiting for an explicit "conversation end."
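
The scoping and rotation behaviour can be sketched with a small in-memory model. `Session`, `GlobalLedger`, and `rotate` are illustrative names, not tidalDB types; the sketch assumes that promotion simply adds buffered weights to global totals.

```rust
use std::collections::HashMap;

/// One buffered signal: (signal name, target item id, weight).
type SignalWrite = (String, u64, f64);

/// Hypothetical stand-in for the global ledger: summed weight per (signal, item).
#[derive(Default)]
struct GlobalLedger {
    totals: HashMap<(String, u64), f64>,
}

/// Hypothetical session handle that buffers signals until close.
#[derive(Default)]
struct Session {
    pending: Vec<SignalWrite>,
}

impl Session {
    /// Record a signal in session scope only; the global ledger is untouched.
    fn signal(&mut self, name: &str, item_id: u64, weight: f64) {
        self.pending.push((name.to_string(), item_id, weight));
    }

    /// Close the session: promote every buffered signal to the global ledger
    /// and return how many signals were promoted.
    fn close(self, ledger: &mut GlobalLedger) -> usize {
        let promoted = self.pending.len();
        for (name, item_id, weight) in self.pending {
            *ledger.totals.entry((name, item_id)).or_insert(0.0) += weight;
        }
        promoted
    }
}

/// Rotate a long-running session: close the current one and open a fresh one.
fn rotate(session: Session, ledger: &mut GlobalLedger) -> Session {
    session.close(ledger);
    Session::default()
}
```

Calling `rotate` on a timer gives the periodic promotion described above without the caller having to detect a conversation end.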

Embedding Strategy

Message Embeddings (384D)

Generated from message text using a sentence-transformer model (external to iknowyou). The embedding captures semantic content + style in a single vector.

Messages with similar communication style (casual + technical + question) cluster in the embedding space. The person's preference vector — evolved through EMA blending of positively-received message embeddings — converges on the region of embedding space that represents "how this person likes to be communicated with."

Observation Embeddings (384D, same model)

Observations are embedded with the same model. This means semantic search over observations uses the same distance metric as message retrieval. "Jordan prefers direct questions" is retrievable both by keyword ("direct questions") and by semantic similarity to a conversation about asking direct questions.

Preference Vector Evolution

Initial:     null (cold start, use cohort priors)
After 1 msg: preference = message_embedding (first positive response)
After N:     preference = (1 - alpha) * preference + alpha * new_message_embedding
             where alpha = base_alpha / (1 + ln(update_count + 1))
             base_alpha = 0.15

The adaptive learning rate means:

  • Interaction 1: alpha ≈ 0.15 (strong influence)
  • Interaction 5: alpha ≈ 0.06 (moderate)
  • Interaction 20: alpha ≈ 0.04 (refinement)
  • Interaction 100: alpha ≈ 0.03 (stable, slow drift)
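
The update rule can be written out as a minimal sketch. It assumes `update_count` is the number of blends already applied (so the first blend uses `update_count = 0` and alpha = 0.15) and models the cold-start state as `None`; the function names are illustrative.

```rust
const BASE_ALPHA: f64 = 0.15;

/// Adaptive learning rate: base_alpha / (1 + ln(update_count + 1)).
fn alpha(update_count: u32) -> f64 {
    BASE_ALPHA / (1.0 + (f64::from(update_count) + 1.0).ln())
}

/// Blend a new message embedding into the preference vector and return the result.
/// `None` models the cold-start state: the first embedding is adopted as-is.
fn update_preference(
    pref: Option<Vec<f32>>,
    embedding: &[f32],
    update_count: u32,
) -> Vec<f32> {
    match pref {
        None => embedding.to_vec(),
        Some(mut p) => {
            assert_eq!(p.len(), embedding.len(), "dimension mismatch");
            let a = alpha(update_count) as f32;
            // EMA: preference = (1 - alpha) * preference + alpha * new embedding.
            for (x, e) in p.iter_mut().zip(embedding) {
                *x = (1.0 - a) * *x + a * *e;
            }
            p
        }
    }
}
```

Because alpha shrinks logarithmically rather than linearly, late interactions still move the vector, which allows slow drift when a person's style changes.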

Write Path — Full Trace

A person sends a reply. Here is everything that happens:

1. Server receives person's message
   └─ HTTP handler in server/handlers.rs

2. Observer LM call (async, < 500ms)
   ├─ Input: conversation context + person's message
   └─ Output: ObserverOutput (structured JSON)

3. Engine processes ObserverOutput
   ├─ 3a. Write engagement signals on sent message
   │   ├─ db.signal("replied", msg_id, 1.0, now)              → WAL + hot tier
   │   ├─ db.signal("replied_fast", msg_id, 1.0, now)         → WAL + hot tier
   │   ├─ db.signal("replied_substantively", msg_id, 0.85, now)
   │   └─ db.signal("positive_sentiment", msg_id, 0.75, now)
   │
   ├─ 3b. Write topic signals
   │   ├─ db.signal("topic_engaged", topic_id, 1.0, now)
   │   └─ db.signal("initiated", topic_id, 1.5, now)          [if person-initiated]
   │
   ├─ 3c. Update person metadata
   │   └─ db.write_user_metadata(person_id, updated_fields)    [style cues, timing]
   │
   ├─ 3d. Session signal (within active session)
   │   └─ db.session_signal(&handle, ...)                      [scoped, not yet global]
   │
   └─ 3e. Cohort propagation (automatic at signal-write time)
       └─ For each matching cohort: cohort_ledger.record(...)

4. [Every 5 turns] Observer generates observations
   ├─ Stored as Item entities with embeddings
   └─ confidence signal at 1.0, 30d half-life

5. Briefing engine queries tidalDB (read-only, < 10ms)
   ├─ Signal reads: decay scores, windowed counts, velocity
   ├─ ANN retrieval: preference-aligned past messages
   ├─ Search: relevant observations for current context
   ├─ Cohort queries: priors for sparse dimensions
   └─ Assembles Brief struct

6. Generator LM call
   ├─ Input: brief (as system prompt) + conversation history
   └─ Output: next message

7. Store generated message as Item
   ├─ db.write_item_with_metadata(msg_id, metadata)
   ├─ db.write_item_embedding(msg_id, embedding)
   └─ Message is now a target for future signals

8. Send message to person → loop continues

Latency budget:

| Step | Target | Notes |
|---|---|---|
| Observer LM call | < 500ms | Small model, structured output |
| Signal writes (6–8 signals) | < 1ms total | tidalDB hot path, < 100µs each |
| Metadata update | < 200µs | Single fjall write |
| Briefing query | < 10ms | Signal reads + ANN + search |
| Generator LM call | 500ms–2s | Full model, depends on length |
| Message storage | < 500µs | Metadata + embedding write |
| Total loop | < 3s | Dominated by LM calls |

The tidalDB operations are negligible. The latency floor is the LM inference time.

Performance Targets

| Operation | Target |
|---|---|
| Signal write (single, including WAL) | < 100µs |
| Brief assembly (all queries) | < 10ms |
| Observation retrieval (semantic search) | < 5ms |
| Preference vector ANN query (10K messages) | < 3ms |
| Full loop excluding LM calls | < 15ms |
| Observer LM call | < 500ms |
| Generator LM call | < 2s |
| End-to-end response latency | < 3s |

Key Architectural Decisions

| Decision | Choice | Why |
|---|---|---|
| Observer as separate LM call | Small/fast model, structured output | Decouples observation quality from generation quality. Testable independently. Cheap per-call. |
| Messages as tidalDB Items | Reuse entity model, no schema extension | Messages get embeddings, signals, metadata, ANN retrieval for free. |
| Observations as Items (not metadata) | Semantic retrieval via search pipeline | Observations are retrievable by relevance to current context, not just by person. Decay handles staleness. |
| Engine has no LM dependency | Pure Rust, structured IO | Fully testable without mocking LM. Server owns all external calls. |
| Session-scoped signals | Promote to global on close | Prevents single bad conversation from poisoning the model. Batched preference update. |
| Asymmetric decay (negative < positive) | 3d negative vs. 7–14d positive | Forgiving by default. Bad days fade fast. Good patterns persist. |
| Cohort priors fade with interaction count | Weight = 1 / (1 + individual_count / 10) | Bootstraps cold start, gets out of the way once individual data exists. |
| 384D embeddings | Sentence-transformer class | Good quality/cost ratio. Same model for messages and observations enables cross-type search. |
| Brief as JSON, not prompt text | Structured, inspectable, testable | Can validate brief contents without running the generator. Can swap LM providers without changing the brief format. |
| Periodic observation generation | Every 5 turns + session close | Not every turn (too noisy, too expensive). Not only session close (too infrequent for long conversations). |