jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs

Milestone 8 (phases 1-4):
- Shard-aware WAL segment naming, BatchHeader v2, ShardRouter
- Transport trait, InProcessTransport, WalShipper, FollowerDb
- HLC, PNCounter, LWWRegister, CrdtSignalState, ReconciliationEngine
- Session replication bridge with SeqNo/HWM, idempotency store

Forage application:
- Multi-source discovery engine with MAB exploration
- Embedding-based label system, server handlers, UI refresh

Other:
- QUICKSTART.md, README.md, milestone-8 planning docs
- Hard negative union semantics, RLHF export enhancements
- Recovery benchmark and visibility test expansions
- Split 8 oversized source files per CODING_GUIDELINES §9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-24 13:17:19 -07:00


iknowyou — Architecture

Core Thesis

Communication personalization is a signal processing problem. Every exchange between the system and a person produces observable signals — engagement, sentiment, timing, style — that decay over time and compound across conversations. tidalDB's signal ledger, preference vectors, windowed aggregation, and cohort system provide the learning substrate. iknowyou wraps these primitives with an observation pipeline (LM-as-classifier), a briefing engine (query-to-profile), and a generation interface (brief-to-prompt).

The system has no training loop, no batch pipeline, no feature store. Learning is continuous: signals are written on every exchange, preference vectors update via EMA, and the next query reflects the latest state. Apart from the external LM API calls, the entire closed loop executes within a single process.

Domain Model

Entities

Entity	tidalDB Kind	What it represents
Person	`User`	An individual the system communicates with. Has metadata (timezone, role, context), a preference vector (learned from message engagement), a signal ledger, cohort memberships, and a user-state index (conversation history).
Message	`Item`	A message the system generated and sent. Has metadata (topic, tone, length, structure, time_sent, conversation_id), an embedding (from the message content), and signals written against it based on the person's response.
Observation	`Item`	A natural-language statement about a person's communication pattern. Has an embedding (for semantic retrieval), a `confidence` signal (decays over time), and metadata (person_id, category, source_conversation).

Messages and observations are both Item entities but are distinguished by a kind metadata field: "message" or "observation". This reuses tidalDB's existing entity model without extension.

Schema Primitives

Primitive	Configuration	Purpose
Signals	10 signal types (see below)	Capture engagement, sentiment, topic, timing dimensions
Decay	Exponential, per-signal half-life	Recent interactions matter more; old patterns fade
Windows	1h, 24h, 7d, 30d, AllTime	Temporal aggregation for time-of-day patterns
Velocity	On engagement signals	Distinguish "always liked X" from "suddenly interested in X"
Preference vectors	384D, EMA with adaptive rate	Communication style convergence per-person
Cohorts	Predicate-based, per-cohort ledger	Cold-start priors, cross-pollination, drift detection

Signal Schema

Engagement Signals (on Message items)

Signal	Half-life	Windows	Velocity	Weight semantics
`replied`	7d	1h, 24h, 7d, AllTime	yes	1.0 = responded at all
`replied_fast`	3d	1h, 24h, 7d	yes	1.0 = latency < 120s
`replied_substantively`	7d	24h, 7d, AllTime	yes	0.0–1.0 normalized by word count / depth
`positive_sentiment`	14d	24h, 7d, 30d, AllTime	no	0.0–1.0 from observer sentiment score
`negative_sentiment`	3d	24h, 7d	no	0.0–1.0 from observer sentiment score
`went_silent`	1d	24h, 7d	no	1.0 = no response after timeout

Topic Signals (on topic-cluster items or Message items)

Signal	Half-life	Windows	Velocity	Weight semantics
`topic_engaged`	14d	7d, 30d, AllTime	yes	1.0 = stayed on or deepened topic
`topic_dropped`	3d	7d	no	1.0 = redirected or went brief
`initiated`	30d	30d, AllTime	no	1.5 = they brought this up unprompted

Meta Signals (on Observation items)

Signal	Half-life	Windows	Velocity	Weight semantics
`confidence`	30d	AllTime	no	1.0 at creation; decays unless reinforced

Design Rationale

Asymmetric decay: Negative signals (3d) decay 2–5x faster than positive signals (7–14d). The system is forgiving by default. Bad days don't poison the model.
initiated is the strongest signal: When someone raises a topic unprompted, that's stronger evidence of interest than responding to a topic you raised. Weight 1.5, half-life 30d.
went_silent is gentle: 1-day half-life. Silence might mean they're busy, not that the message was wrong. But it's still a signal — if silence correlates with a pattern (late-night messages, formal tone), the preference vector will drift away from that pattern.
Velocity on engagement signals: Velocity separates stable preferences from emerging ones. If topic_engaged velocity spikes on "replication" this week, the brief surfaces it as a rising interest — even if AllTime count is low.
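
The asymmetry is easiest to see numerically. The following is a minimal sketch of half-life decay, assuming plain exponential decay as listed under Schema Primitives; `decayed_weight` and `remaining_after` are hypothetical helpers for illustration, not tidalDB functions.

```rust
/// Exponentially decayed weight of a signal after `age_days`, given its
/// half-life in days. Hypothetical helper, not a tidalDB API.
fn decayed_weight(weight: f64, age_days: f64, half_life_days: f64) -> f64 {
    weight * 0.5_f64.powf(age_days / half_life_days)
}

/// Remaining weight of a negative signal (3d half-life) and a positive
/// signal (14d half-life), both written with weight 1.0 at the same time.
fn remaining_after(age_days: f64) -> (f64, f64) {
    let negative = decayed_weight(1.0, age_days, 3.0);
    let positive = decayed_weight(1.0, age_days, 14.0);
    (negative, positive)
}
```

Under these assumptions, `remaining_after(7.0)` leaves the negative signal at roughly a fifth of its weight while the positive signal still holds about 70%.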

Module Structure

applications/iknowyou/
├── engine/                          ← Core library (no network, no LM calls)
│   └── src/
│       ├── lib.rs                   ← IkyEngine: wraps TidalDb
│       ├── schema.rs                ← Signal schema + cohort definitions
│       ├── observer.rs              ← ObserverOutput: structured extraction type
│       ├── briefing.rs              ← Brief: queries tidalDB, assembles profile
│       ├── signals.rs               ← Signal writing: observation → tidalDB signals
│       ├── observations.rs          ← Observation lifecycle: write, retrieve, decay
│       └── cohorts.rs               ← Cohort definitions + cold-start logic
│
├── server/                          ← HTTP API + LM integration
│   └── src/
│       ├── main.rs                  ← Axum server, startup, shutdown
│       ├── handlers.rs              ← /message, /observe, /brief, /feedback
│       ├── llm.rs                   ← LM client: observer calls + generation calls
│       └── loop.rs                  ← Orchestrator: observe → learn → brief → generate
│
├── vision.md                        ← Product vision
└── architecture.md                  ← This document

Dependency Flow

server (Axum, LM client)
  │
  ├──→ engine (pure Rust, no IO except tidalDB)
  │      │
  │      └──→ tidalDB (embedded, same process)
  │
  └──→ LM API (HTTP, external)

The engine crate has zero network dependencies. It takes structured ObserverOutput and returns structured Brief. The server crate handles LM API calls and HTTP. This separation means the engine is fully testable without mocking LM calls.

The Closed Loop — Detailed

Phase 1: Observe

When a person responds to a message (or doesn't respond within the timeout window), the server calls the observer LM with the conversation context and the person's message.

Observer input:

System message sent: "Have you looked at what happens when segment count exceeds L0?"
Person replied: "yeah good call - the compaction pass is actually the bottleneck,
                 not the segment count itself. been profiling it all morning"
Time since system message: 47 seconds
Conversation turn: 4

Observer output (structured JSON, single inference):

{
  "engagement": {
    "replied": true,
    "latency_seconds": 47,
    "substantive": true,
    "word_count": 22,
    "sentiment_score": 0.75,
    "sentiment_direction": "positive"
  },
  "style": {
    "formality": 0.2,
    "uses_lowercase": true,
    "uses_jargon": true,
    "structure": "stream_of_thought",
    "emoji": false
  },
  "topic": {
    "primary": "compaction_profiling",
    "domain": "database_internals",
    "specificity": "high",
    "continued_from_previous": true,
    "deepened": true
  },
  "dynamics": {
    "redirected": true,
    "redirect_direction": "more_specific",
    "who_is_leading": "person",
    "built_on_previous": true,
    "corrected_system": true
  }
}

The observer is a small, fast model (Haiku-class). It doesn't need to be creative — it needs to reliably extract structure. Latency target: < 500ms. Cost per call: negligible.

Phase 2: Learn

The engine receives ObserverOutput and writes signals to tidalDB. This is a deterministic mapping: structured input → signal writes. No LM call.

Signal writes for this exchange:

// Engagement signals on the sent message
db.signal("replied",                 msg_entity_id, 1.0,  now)?;
db.signal("replied_fast",           msg_entity_id, 1.0,  now)?;  // 47s < 120s
db.signal("replied_substantively",  msg_entity_id, 0.85, now)?;  // normalized
db.signal("positive_sentiment",     msg_entity_id, 0.75, now)?;

// Topic signals
db.signal("topic_engaged", topic_entity_id("compaction_profiling"), 1.0, now)?;
db.signal("topic_engaged", topic_entity_id("database_internals"),   1.0, now)?;

// No negative signals this exchange
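
The mapping from observer output to engagement signals can be sketched as a small function that returns the writes to perform. This is an illustration under assumptions: the `Engagement` struct covers only part of the observer JSON, `substantive_score` is a hypothetical pre-normalized field (the normalization itself is not specified here), and the result is a list of (name, weight) pairs rather than actual `db.signal` calls.

```rust
/// Direction reported by the observer; `Neutral` is an assumed extra case.
enum SentimentDirection {
    Positive,
    Negative,
    Neutral,
}

/// Subset of the observer's engagement block.
struct Engagement {
    replied: bool,
    latency_seconds: u32,
    /// Hypothetical pre-normalized depth score in 0.0..=1.0.
    substantive_score: Option<f32>,
    sentiment_score: f32,
    sentiment_direction: SentimentDirection,
}

/// One pending signal write: (signal name, weight).
type SignalWrite = (&'static str, f32);

/// Threshold from the signal schema: replied_fast means latency < 120s.
const FAST_REPLY_SECS: u32 = 120;

fn engagement_signals(e: &Engagement) -> Vec<SignalWrite> {
    if !e.replied {
        // No response within the timeout window.
        return vec![("went_silent", 1.0)];
    }
    let mut out: Vec<SignalWrite> = vec![("replied", 1.0)];
    if e.latency_seconds < FAST_REPLY_SECS {
        out.push(("replied_fast", 1.0));
    }
    if let Some(score) = e.substantive_score {
        out.push(("replied_substantively", score.clamp(0.0, 1.0)));
    }
    let sentiment = e.sentiment_score.clamp(0.0, 1.0);
    match e.sentiment_direction {
        SentimentDirection::Positive => out.push(("positive_sentiment", sentiment)),
        SentimentDirection::Negative => out.push(("negative_sentiment", sentiment)),
        SentimentDirection::Neutral => {}
    }
    out
}
```

Because the function only returns data, it can be unit-tested without a database or an LM, which matches the engine/server split described under Dependency Flow.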

Preference vector update: The sent message's embedding blends into the person's preference vector. The message was direct, technical, question-form — so the preference vector shifts toward that communication style. EMA adaptive rate: high early (person has few interactions), lower as history accumulates.

Observation generation (periodic, not every turn): Every 5 turns or on session close, the observer produces natural-language observations:

"Jordan corrects the system's framing and steers toward more specific
 technical problems — prefers to lead the conversation direction"

"Jordan responds fastest to direct technical questions (median 45s)
 vs. status-check questions (median 4m)"

These are stored as Item entities with embeddings, kind: "observation", and a confidence signal at weight 1.0. The confidence decays with a 30-day half-life. If the same pattern is observed again, confidence is reinforced.

Cohort propagation: If the person matches the developers cohort (via role == "engineer" predicate), these signals also write to the cohort's signal ledger. Aggregate effect: the developers cohort accumulates evidence that direct technical questions produce fast, substantive, positive replies.

Phase 3: Brief

Before generating the next message, the engine queries tidalDB and assembles a communication brief. This is a read-only operation — no writes, no LM calls.

Brief structure:

{
  "person": {
    "id": "jordan",
    "metadata": { "timezone": "America/Los_Angeles", "role": "engineer" },
    "interaction_count": 47,
    "first_interaction": "2026-01-15T09:00:00Z"
  },

  "topics": {
    "hot": [
      { "topic": "compaction_profiling", "velocity": "rising",  "alltime": 12 },
      { "topic": "wal_recovery",         "velocity": "stable",  "alltime": 28 },
      { "topic": "replication",          "velocity": "rising",  "alltime": 3  }
    ],
    "cold": [
      { "topic": "documentation", "last_engaged": "2026-01-20", "sentiment": "negative" }
    ],
    "initiated_by_person": ["compaction_profiling", "rust_performance"]
  },

  "style": {
    "formality": { "current": 0.2, "trend": "stable" },
    "preferred_length": "medium",
    "preferred_structure": "conversational",
    "responds_to_questions": true,
    "prefers_to_lead": true,
    "jargon_comfortable": true,
    "emoji_usage": "none"
  },

  "timing": {
    "most_active_hours": [9, 10, 11, 21, 22],
    "fastest_reply_hours": [21, 22],
    "goes_silent_after": 23,
    "current_hour": 21,
    "day_of_week": "tuesday",
    "in_active_window": true
  },

  "what_works": {
    "high_engagement_patterns": [
      "direct technical questions about specific subsystems",
      "building on their correction or redirection",
      "short messages that open a thread, not close one"
    ],
    "recent_positive_messages": [
      { "summary": "Asked about L0 threshold during compaction", "sentiment": 0.75 },
      { "summary": "Shared profiling approach for signal write path", "sentiment": 0.82 }
    ]
  },

  "what_doesnt_work": {
    "low_engagement_patterns": [
      "status-update style messages",
      "long explanations without questions",
      "messages after 11pm Pacific"
    ]
  },

  "observations": [
    "Jordan corrects framing and steers toward specifics — prefers to lead",
    "Jordan's replies get shorter after 10pm — engagement drops",
    "Jordan uses 'yeah' as opener when genuinely engaged, 'sure' when not"
  ],

  "cohort_priors": {
    "developers": {
      "preferred_tone": "direct",
      "preferred_depth": "technical",
      "avg_engagement_length": "medium"
    }
  }
}

How the brief is assembled:

Brief section	tidalDB query	Primitive used
`topics.hot`	`read_decay_score` + `read_velocity` on topic items	Signal decay, velocity
`topics.cold`	Topic items with low AllTime count + negative sentiment	Windowed aggregation
`topics.initiated_by_person`	Items with `initiated` signal > threshold	Signal decay
`style.*`	Person metadata + observer-written style fields	Entity metadata
`timing.*`	`read_windowed_count("replied", Window::OneHour)` across 24 hour buckets	Windowed aggregation
`what_works`	`retrieve()` with person's preference vector, filtered to high-sentiment messages	ANN + preference vector
`what_doesnt_work`	Messages with `went_silent` or `negative_sentiment` signals	Signal decay
`observations`	`search()` with current conversation context as query, filtered to `kind: "observation"`	BM25 + ANN semantic retrieval
`cohort_priors`	Cohort ledger queries for person's matching cohorts	Cohort signal ledger
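
As one example of how a brief section could be derived from raw counts, the sketch below turns 24 hour-of-day reply counts into the `timing.most_active_hours` and `timing.in_active_window` fields. How the buckets are filled from tidalDB is outside this sketch; both functions and the `top_n` parameter are hypothetical.

```rust
/// Picks the hours of day (0..=23) with the most replies.
/// `counts[h]` is the number of replies observed in hour `h`.
fn most_active_hours(counts: &[u32; 24], top_n: usize) -> Vec<u8> {
    // Ignore hours with no replies at all.
    let mut hours: Vec<u8> = (0u8..24).filter(|&h| counts[h as usize] > 0).collect();
    // Highest count first; ties broken by the earlier hour.
    hours.sort_by(|&a, &b| counts[b as usize].cmp(&counts[a as usize]).then(a.cmp(&b)));
    hours.truncate(top_n);
    // Report in clock order, as in the brief.
    hours.sort_unstable();
    hours
}

/// True when the current hour is one of the person's active hours.
fn in_active_window(active_hours: &[u8], current_hour: u8) -> bool {
    active_hours.contains(&current_hour)
}
```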

Phase 4: Generate

The brief is injected into the LM's system prompt. The LM generates the next message. The engine stores the generated message as a new Item entity with metadata and embedding.

[system]
You are communicating with Jordan. Here is what we know about how
Jordan communicates:

{brief as structured text}

Guidelines derived from this profile:
- Be direct and technical. Ask specific questions.
- Let Jordan lead the conversation direction — build on their framing.
- Keep messages medium length. Conversational, not structured.
- This is an active window (9pm Tuesday) — Jordan is typically responsive now.
- Current hot topic with rising velocity: compaction profiling.
- Avoid: status updates, long explanations, messages after 11pm.

The LM never touches tidalDB. It reads the brief, generates a message, and the loop continues.
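
Turning the brief into the system prompt above is string assembly. A minimal sketch, assuming a reduced `BriefSummary` type with illustrative field names (the real Brief carries far more) and a pre-rendered `brief_text`:

```rust
/// Minimal slice of the brief needed for this sketch; field names are
/// illustrative and cover only part of the brief structure.
struct BriefSummary {
    person_name: String,
    hot_topic: Option<String>,
    prefers_to_lead: bool,
    in_active_window: bool,
    avoid: Vec<String>,
}

/// Renders the system prompt: fixed preamble, the brief as text, then
/// guidelines derived from the structured fields.
fn render_system_prompt(brief: &BriefSummary, brief_text: &str) -> String {
    let mut prompt = format!(
        "You are communicating with {0}. Here is what we know about how\n{0} communicates:\n\n{1}\n\nGuidelines derived from this profile:\n",
        brief.person_name, brief_text
    );
    if brief.prefers_to_lead {
        prompt.push_str(&format!(
            "- Let {} lead the conversation direction.\n",
            brief.person_name
        ));
    }
    if let Some(topic) = &brief.hot_topic {
        prompt.push_str(&format!(
            "- Current hot topic with rising velocity: {}.\n",
            topic
        ));
    }
    if brief.in_active_window {
        prompt.push_str("- This is an active window.\n");
    }
    if !brief.avoid.is_empty() {
        prompt.push_str(&format!("- Avoid: {}.\n", brief.avoid.join(", ")));
    }
    prompt
}
```

Keeping the guidelines as a function of structured fields means the prompt can be checked in tests without calling the generator.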

Observation Lifecycle

Observations are the bridge between raw signals and human-legible learning. They capture patterns that numbers alone can't express: "uses 'yeah' when engaged, 'sure' when not."

Creation

Observations are generated by the observer LM periodically:

- Every 5 conversation turns
- On session close
- When the observer detects a novel pattern (contradiction with existing observations, or new behavioral signal)

Each observation is:

- Embedded (384D, same model as messages)
- Stored as an Item with kind: "observation", person_id, category (style, topic, timing, dynamics)
- Given a confidence signal at weight 1.0

Retrieval

Before briefing, the engine runs db.search() with the current conversation context as the query text, filtered to kind: "observation" and the target person. BM25 matches on keywords; ANN matches on semantic similarity. RRF fusion ranks by relevance.

Top-5 observations are included in the brief.
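
For readers unfamiliar with RRF, here is a minimal sketch of reciprocal rank fusion over a BM25 ranking and an ANN ranking. It illustrates the general technique, not tidalDB's implementation; the function name and the constant `k` (60 is a commonly used default) are assumptions.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion over several ranked lists of ids.
/// Each list contributes 1 / (k + rank) per id, with 1-based ranks.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64, top_n: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (idx as f64 + 1.0));
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest fused score first; ties broken by id for determinism.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused.truncate(top_n);
    fused
}
```

An observation that ranks well in both lists beats one that tops a single list, which is the behavior wanted when keyword and semantic matches disagree.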

Decay and Reinforcement

The confidence signal has a 30-day half-life. An observation created 60 days ago has ~25% of its original weight. If the same pattern is observed again, a new confidence signal is written — reinforcing the observation back toward full weight.

Observations that are never reinforced fade below a retrieval threshold and are effectively forgotten. No garbage collection needed — decay handles it.
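
Decay and reinforcement can be sketched as a sum of independently decaying confidence writes. The cap at 1.0 and the retrieval threshold value are assumptions of this sketch; the document does not specify either.

```rust
/// Half-life of the confidence signal, from the signal schema.
const CONFIDENCE_HALF_LIFE_DAYS: f64 = 30.0;

/// Current confidence of an observation, given the age in days of every
/// confidence signal written against it (each written at weight 1.0).
/// The sum is capped at 1.0 so reinforcement restores full weight rather
/// than exceeding it (assumption).
fn confidence(signal_ages_days: &[f64]) -> f64 {
    let total: f64 = signal_ages_days
        .iter()
        .map(|age| 0.5_f64.powf(age / CONFIDENCE_HALF_LIFE_DAYS))
        .sum();
    total.min(1.0)
}

/// Whether the observation is still eligible for inclusion in the brief.
/// `threshold` is a hypothetical tuning parameter.
fn is_retrievable(signal_ages_days: &[f64], threshold: f64) -> bool {
    confidence(signal_ages_days) >= threshold
}
```

With a threshold of 0.1, an observation last confirmed 150 days ago (confidence about 0.03) drops out, while the same observation reinforced 10 days ago stays retrievable.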

Contradiction Resolution

When the observer generates an observation that contradicts an existing one (e.g., "Jordan now prefers formal tone" vs. existing "Jordan prefers casual tone"), the new observation is stored alongside the old one. The old observation's confidence is decaying; the new one starts at 1.0. Within a few weeks, the old observation falls below retrieval threshold naturally.

No explicit deletion. No conflict resolution logic. Decay handles contradiction.

Cohort Architecture

Definition

Cohorts are defined at schema time in engine/src/cohorts.rs:

registry.define("developers", Predicate::Eq {
    field: "role".into(),
    value: "engineer".into(),
});

registry.define("us_pacific", Predicate::Eq {
    field: "timezone".into(),
    value: "America/Los_Angeles".into(),
});

registry.define("high_engagement", Predicate::Range {
    field: "interaction_count".into(),
    min: "20".into(),
    max: None,
});

Cold-Start Flow

New person arrives
  → Match against cohort predicates (metadata-based)
    → For each matching cohort:
        Query cohort signal ledger for aggregate patterns
    → Merge cohort priors into brief (weighted by cohort size / confidence)
      → LM generates first message using cohort-derived style
        → Person responds
          → Individual signals begin overriding cohort priors

The weight of cohort priors in the brief decreases as individual interaction count grows. With weight = 1 / (1 + individual_count / 10), cohort priors carry half the weight at 10 interactions and a quarter at 30; beyond that they matter mainly where individual data is sparse on a specific dimension.
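
The fade-out can be written directly from the weight formula listed under Key Architectural Decisions, 1 / (1 + individual_count / 10). The `blend` helper is a hypothetical illustration for a single numeric dimension such as formality.

```rust
/// Weight given to cohort priors after `individual_count` interactions.
fn cohort_weight(individual_count: u32) -> f64 {
    1.0 / (1.0 + individual_count as f64 / 10.0)
}

/// Blends a cohort prior with an individual estimate for one numeric
/// dimension. Falls back to the cohort prior when the person has no data
/// on that dimension, regardless of interaction count.
fn blend(cohort_prior: f64, individual: Option<f64>, individual_count: u32) -> f64 {
    match individual {
        None => cohort_prior,
        Some(value) => {
            let w = cohort_weight(individual_count);
            w * cohort_prior + (1.0 - w) * value
        }
    }
}
```

For example, `cohort_weight(0)` is 1.0, `cohort_weight(10)` is 0.5, and `cohort_weight(30)` is 0.25.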

Cohort Learning

Cohort signal ledgers learn from all members simultaneously. When Jordan (a developers cohort member) responds positively to a direct technical question, that signal writes to both Jordan's personal ledger and the developers cohort ledger.

This means: the more people the system talks to, the better its cold-start priors become — without any explicit aggregation step. tidalDB's cohort signal propagation handles it at write time.
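
Write-time propagation can be sketched with toy types. `Predicate`, `Ledger`, `Cohort`, and `write_signal` below are simplified stand-ins written for this illustration; tidalDB's real types and method signatures may differ.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the cohort predicates shown under Definition.
enum Predicate {
    Eq { field: String, value: String },
}

impl Predicate {
    fn matches(&self, metadata: &HashMap<String, String>) -> bool {
        match self {
            Predicate::Eq { field, value } => metadata.get(field) == Some(value),
        }
    }
}

/// Toy ledger: accumulated weight per signal name.
#[derive(Default)]
struct Ledger {
    totals: HashMap<String, f64>,
}

impl Ledger {
    fn record(&mut self, signal: &str, weight: f64) {
        *self.totals.entry(signal.to_string()).or_insert(0.0) += weight;
    }
}

struct Cohort {
    name: String,
    predicate: Predicate,
    ledger: Ledger,
}

/// Writes one signal to the person's ledger and to every matching cohort
/// ledger. Returns the names of the cohorts that received the signal.
fn write_signal(
    personal: &mut Ledger,
    cohorts: &mut [Cohort],
    metadata: &HashMap<String, String>,
    signal: &str,
    weight: f64,
) -> Vec<String> {
    personal.record(signal, weight);
    let mut hit = Vec::new();
    for cohort in cohorts.iter_mut() {
        if cohort.predicate.matches(metadata) {
            cohort.ledger.record(signal, weight);
            hit.push(cohort.name.clone());
        }
    }
    hit
}
```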

Conversation (Session) Mechanics

Each conversation is a tidalDB session:

let handle = db.start_session(person_id, agent_id, "iknowyou_default", metadata)?;

// During conversation:
db.session_signal(&handle, "replied", msg_id, 1.0, now)?;
// ...more signals per exchange...

// On conversation end:
let summary = db.close_session(handle)?;
// → Triggers preference vector update (EMA blend of engaged message embeddings)
// → Triggers observation generation (periodic analysis)
// → Session signals aggregate into global ledger

Session-scoped vs. global signals: Within a session, signals are scoped — they don't affect the global ledger until session close. This prevents a single bad conversation from immediately poisoning the model. Session close triggers the EMA preference update and promotes signals to global state.

Long conversations: For ongoing conversations (e.g., a persistent chat channel), sessions can be rotated on a timer — close and immediately reopen every 30 minutes. This provides regular preference updates without waiting for an explicit "conversation end."
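
The scoping rule amounts to buffering signals per session and promoting them on close. A minimal sketch with hypothetical `Session` and `GlobalLedger` types (the real session API is `start_session` / `session_signal` / `close_session` as shown above):

```rust
#[derive(Clone, Debug, PartialEq)]
struct Signal {
    name: String,
    item_id: u64,
    weight: f64,
}

/// Signals visible to briefing queries.
#[derive(Default)]
struct GlobalLedger {
    signals: Vec<Signal>,
}

/// Buffers signals for one conversation until it is closed.
#[derive(Default)]
struct Session {
    pending: Vec<Signal>,
}

impl Session {
    fn signal(&mut self, name: &str, item_id: u64, weight: f64) {
        self.pending.push(Signal { name: name.to_string(), item_id, weight });
    }

    /// Consumes the session and promotes its signals to the global ledger.
    /// Returns the number of promoted signals.
    fn close(self, global: &mut GlobalLedger) -> usize {
        let promoted = self.pending.len();
        global.signals.extend(self.pending);
        promoted
    }
}

/// Rotates a long-running session: closes the current one (promoting its
/// signals) and opens a fresh one.
fn rotate(session: Session, global: &mut GlobalLedger) -> Session {
    session.close(global);
    Session::default()
}
```

Taking `self` by value in `close` means a closed session cannot receive further signals, so promotion happens exactly once per session.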

Embedding Strategy

Message Embeddings (384D)

Generated from message text using a sentence-transformer model (external to iknowyou). The embedding captures semantic content + style in a single vector.

Messages with similar communication style (casual + technical + question) cluster in the embedding space. The person's preference vector — evolved through EMA blending of positively-received message embeddings — converges on the region of embedding space that represents "how this person likes to be communicated with."

Observation Embeddings (384D, same model)

Observations are embedded with the same model. This means semantic search over observations uses the same distance metric as message retrieval. "Jordan prefers direct questions" is retrievable both by keyword ("direct questions") and by semantic similarity to a conversation about asking direct questions.

Preference Vector Evolution

Initial:     null (cold start, use cohort priors)
After 1 msg: preference = message_embedding (first positive response)
After N:     preference = (1 - alpha) * preference + alpha * new_message_embedding
             where alpha = base_alpha / (1 + ln(update_count + 1))
             base_alpha = 0.15

The adaptive learning rate means:

Interaction 1: alpha ≈ 0.15 (strong influence)
Interaction 5: alpha ≈ 0.06 (moderate)
Interaction 20: alpha ≈ 0.04 (refinement)
Interaction 100: alpha ≈ 0.03 (stable, slow drift)
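
The update rule translates directly into code. A minimal sketch, assuming `update_count` is the number of updates already applied (so the first blended update uses alpha = base_alpha) and that vectors are plain `Vec<f32>`; the function names are hypothetical.

```rust
const BASE_ALPHA: f32 = 0.15;

/// Learning rate after `update_count` previous updates:
/// base_alpha / (1 + ln(update_count + 1)).
fn adaptive_alpha(update_count: u32) -> f32 {
    BASE_ALPHA / (1.0 + ((update_count as f32) + 1.0).ln())
}

/// Applies one EMA step. `None` is the cold-start state: the first
/// positively received message becomes the preference vector as-is.
fn update_preference(
    preference: Option<Vec<f32>>,
    message_embedding: &[f32],
    update_count: u32,
) -> Vec<f32> {
    match preference {
        None => message_embedding.to_vec(),
        Some(current) => {
            // Both vectors must share the embedding dimension (384 here).
            assert_eq!(current.len(), message_embedding.len());
            let alpha = adaptive_alpha(update_count);
            current
                .iter()
                .zip(message_embedding)
                .map(|(p, m)| (1.0 - alpha) * p + alpha * m)
                .collect()
        }
    }
}
```

The logarithm makes alpha fall quickly over the first few updates and then flatten, so late interactions still move the vector slowly instead of being frozen out.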

Write Path — Full Trace

A person sends a reply. Here is everything that happens:

1. Server receives person's message
   └─ HTTP handler in server/src/handlers.rs

2. Observer LM call (async, < 500ms)
   ├─ Input: conversation context + person's message
   └─ Output: ObserverOutput (structured JSON)

3. Engine processes ObserverOutput
   ├─ 3a. Write engagement signals on sent message
   │   ├─ db.signal("replied", msg_id, 1.0, now)              → WAL + hot tier
   │   ├─ db.signal("replied_fast", msg_id, 1.0, now)         → WAL + hot tier
   │   ├─ db.signal("replied_substantively", msg_id, 0.85, now)
   │   └─ db.signal("positive_sentiment", msg_id, 0.75, now)
   │
   ├─ 3b. Write topic signals
   │   ├─ db.signal("topic_engaged", topic_id, 1.0, now)
   │   └─ db.signal("initiated", topic_id, 1.5, now)          [if person-initiated]
   │
   ├─ 3c. Update person metadata
   │   └─ db.write_user_metadata(person_id, updated_fields)    [style cues, timing]
   │
   ├─ 3d. Session signal (within active session)
   │   └─ db.session_signal(&handle, ...)                      [scoped, not yet global]
   │
   └─ 3e. Cohort propagation (automatic at signal-write time)
       └─ For each matching cohort: cohort_ledger.record(...)

4. [Every 5 turns] Observer generates observations
   ├─ Stored as Item entities with embeddings
   └─ confidence signal at 1.0, 30d half-life

5. Briefing engine queries tidalDB (read-only, < 10ms)
   ├─ Signal reads: decay scores, windowed counts, velocity
   ├─ ANN retrieval: preference-aligned past messages
   ├─ Search: relevant observations for current context
   ├─ Cohort queries: priors for sparse dimensions
   └─ Assembles Brief struct

6. Generator LM call
   ├─ Input: brief (as system prompt) + conversation history
   └─ Output: next message

7. Store generated message as Item
   ├─ db.write_item_with_metadata(msg_id, metadata)
   ├─ db.write_item_embedding(msg_id, embedding)
   └─ Message is now a target for future signals

8. Send message to person → loop continues

Latency budget:

Step	Target	Notes
Observer LM call	< 500ms	Small model, structured output
Signal writes (6–8 signals)	< 1ms total	tidalDB hot path, < 100µs each
Metadata update	< 200µs	Single fjall write
Briefing query	< 10ms	Signal reads + ANN + search
Generator LM call	500ms–2s	Full model, depends on length
Message storage	< 500µs	Metadata + embedding write
Total loop	< 3s	Dominated by LM calls

The tidalDB operations are negligible. The latency floor is the LM inference time.

Performance Targets

Operation	Target
Signal write (single, including WAL)	< 100µs
Brief assembly (all queries)	< 10ms
Observation retrieval (semantic search)	< 5ms
Preference vector ANN query (10K messages)	< 3ms
Full loop excluding LM calls	< 15ms
Observer LM call	< 500ms
Generator LM call	< 2s
End-to-end response latency	< 3s

Key Architectural Decisions

Decision	Choice	Why
Observer as separate LM call	Small/fast model, structured output	Decouples observation quality from generation quality. Testable independently. Cheap per-call.
Messages as tidalDB Items	Reuse entity model, no schema extension	Messages get embeddings, signals, metadata, ANN retrieval for free.
Observations as Items (not metadata)	Semantic retrieval via search pipeline	Observations are retrievable by relevance to current context, not just by person. Decay handles staleness.
Engine has no LM dependency	Pure Rust, structured IO	Fully testable without mocking LM. Server owns all external calls.
Session-scoped signals	Promote to global on close	Prevents single bad conversation from poisoning the model. Batched preference update.
Asymmetric decay (negative < positive)	3d negative vs. 7–14d positive	Forgiving by default. Bad days fade fast. Good patterns persist.
Cohort priors fade with interaction count	Weight = 1 / (1 + individual_count / 10)	Bootstraps cold start, gets out of the way once individual data exists.
384D embeddings	Sentence-transformer class	Good quality/cost ratio. Same model for messages and observations enables cross-type search.
Brief as JSON, not prompt text	Structured, inspectable, testable	Can validate brief contents without running the generator. Can swap LM providers without changing the brief format.
Periodic observation generation	Every 5 turns + session close	Not every turn (too noisy, too expensive). Not only session close (too infrequent for long conversations).
