jordan 98bdc18a49 feat: add iknowyou app + complete M8 replication extensions + Aeries agents/skills

- applications/iknowyou: new Next.js chat application with persona-aware conversations,
  briefing API, cohort logic, vLLM streaming, and sidebar navigation
- tidal M8: add replication control plane (control.rs), tenant migration state machine
  (migration.rs), tenant/upgrade coordinators, cluster/fault test harnesses
- tidal M8 tests: expand m8p2/m8p3/m8p4 test suites; add m8p5_multitenancy and m8_uat
- tidal db: split replication_ops out of db/mod.rs (was 647 lines, now 574)
- .claude: add kai-park, kaya-osei, mira-vasquez agents; add aeries-design-architect,
  aeries-fullstack-engineer, aeries-product-visionary skills
- docs: update ROADMAP.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-24 21:09:11 -07:00

4.8 KiB


name	description
aeries-fullstack-engineer	Build the Aeries chat application — frontend, API, vLLM streaming, observation pipeline, tidalDB integration

aeries-fullstack-engineer

When to Use

Building or modifying the Aeries Next.js application
Implementing chat streaming from vLLM
Wiring up the observation pipeline (observer LM call → signal writes)
Integrating tidalDB's iknowyou engine
Fixing bugs in the chat flow, API routes, or real-time UI

Invoked via: /aeries-fullstack-engineer

Delegation

This skill delegates to @kai-park — the Aeries full-stack engineer. All implementation, API design, streaming infrastructure, and tidalDB integration go through his lens.

For design decisions (colors, spacing, component visual specs), defer to @kaya-osei via /aeries-design-architect.

For product decisions (what to build, what to defer, personality), defer to @mira-vasquez via /aeries-product-visionary.

Step Back

Before implementing, ask:

Is the vLLM server healthy? Run `curl http://msd5685.mjhst.com:8000/health`; if it's down, nothing else matters.
Does this block the response stream? Anything that adds latency to the user seeing tokens is wrong. Observation, signal writes, preference updates — all async, all after the stream closes.
Am I over-engineering the MVP? The first version needs: send message → stream response → store conversation. Not: authentication, multi-user, observation pipeline, preference vectors.
Does the server own the truth? The client sends { message, conversationId }. Everything else — history, brief, observation — lives server-side.
What happens when vLLM is slow or down? Every external call needs a timeout and a graceful fallback. Never show a stack trace in the UI.
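The timeout-and-fallback question above can be sketched as a health probe. This is a minimal sketch that assumes Node 18+ globals (`fetch`, `AbortSignal.timeout`). `checkVllmHealth`, `HealthStatus`, and the injectable `fetchImpl` parameter are hypothetical names, not existing code in `applications/iknowyou/`.

```typescript
// Hypothetical health probe: hit vLLM's /health with a hard timeout and turn
// every failure into a UI-safe status instead of a thrown error.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

export interface HealthStatus {
  ok: boolean;
  message: string; // human-readable; never a stack trace
}

const VLLM_ORIGIN = "http://msd5685.mjhst.com:8000";
const HEALTH_TIMEOUT_MS = 10_000; // the 10s health timeout from Standards

export async function checkVllmHealth(
  fetchImpl: FetchLike = fetch,
  timeoutMs: number = HEALTH_TIMEOUT_MS,
): Promise<HealthStatus> {
  try {
    const res = await fetchImpl(`${VLLM_ORIGIN}/health`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (res.ok) return { ok: true, message: "Model server is up." };
    return { ok: false, message: `Model server answered with HTTP ${res.status}.` };
  } catch {
    // DNS failures, refused connections, and the timeout abort all land here.
    return {
      ok: false,
      message: "The model server is unreachable right now. Please try again shortly.",
    };
  }
}
```

Injecting `fetchImpl` only lets you exercise the fallback path without taking the shared server down.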

Workflow

Phase 1: Context

Read applications/iknowyou/devsetup.md for infrastructure details
Read applications/iknowyou/architecture.md for system design
Check that vLLM is up and serving the expected model: `curl http://msd5685.mjhst.com:8000/v1/models`
Review existing code in applications/iknowyou/
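The `/v1/models` check can also be scripted. vLLM's OpenAI-compatible server answers with a list shaped like `{ object: "list", data: [{ id: "..." }] }`, and the sketch below assumes only that shape. `servedModelIds` and `isExpectedModelServed` are hypothetical helpers.

```typescript
// Hypothetical helpers: confirm the expected model appears in an
// OpenAI-style /v1/models response without trusting the body's shape.
interface ModelList {
  object: string;
  data: Array<{ id: string }>;
}

const EXPECTED_MODEL = "Qwen/Qwen3-8B"; // from the infrastructure table

export function servedModelIds(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const data = (body as Partial<ModelList>).data;
  if (!Array.isArray(data)) return [];
  return data
    .map((m) => (typeof m?.id === "string" ? m.id : null))
    .filter((id): id is string => id !== null);
}

export function isExpectedModelServed(body: unknown, expected: string = EXPECTED_MODEL): boolean {
  return servedModelIds(body).includes(expected);
}
```

From a script: `isExpectedModelServed(await (await fetch("http://msd5685.mjhst.com:8000/v1/models")).json())`.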

Phase 2: Plan

Identify which layer the work touches (frontend, API, vLLM client, observer, tidalDB)
Check dependencies between layers
Determine if design input is needed (delegate to /aeries-design-architect)
Determine if product input is needed (delegate to /aeries-product-visionary)

Phase 3: Implement

Write types first (lib/types.ts)
Build from the API route outward (server → client)
Test streaming with `curl -N` (disables output buffering) before building the UI
Use console.log timestamps to verify streaming latency
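Writing types first might look like the following hypothetical `lib/types.ts`. The `{ message, conversationId }` request shape comes from this document. The stream event union and the `parseChatRequest` guard are illustrative assumptions about how the route could validate input without `any`.

```typescript
// Hypothetical lib/types.ts. The client sends only { message, conversationId };
// history, brief, and observation stay server-side.
export interface ChatRequest {
  message: string;
  conversationId: string;
}

export type Role = "system" | "user" | "assistant";

export interface ChatMessage {
  role: Role;
  content: string;
}

// Illustrative events the chat route could write onto the SSE stream.
export type ChatStreamEvent =
  | { type: "token"; text: string }
  | { type: "done" }
  | { type: "error"; message: string }; // human-readable, never a stack trace

// Runtime guard so the API route can reject malformed bodies without `any`.
export function parseChatRequest(body: unknown): ChatRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { message, conversationId } = body as Record<string, unknown>;
  if (typeof message !== "string" || message.trim() === "") return null;
  if (typeof conversationId !== "string" || conversationId === "") return null;
  return { message, conversationId };
}
```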

Phase 4: Verify

Test the full flow: type message → see streaming response → verify storage
Check browser DevTools Network tab for SSE stream behavior
Verify error handling (kill vLLM, send a message, confirm the UI shows a graceful error)
Run through Done Gate checklist
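For the streaming check, a small script can stand in for DevTools and the `console.log` timestamps. This sketch assumes the chat route emits standard `data: ...` frames separated by blank lines (LF line endings only). `timeSseStream` and `StreamTiming` are hypothetical names.

```typescript
// Hypothetical SSE timing helper: read a streamed Response, record when the
// first `data:` event arrives, and collect every event payload.
export interface StreamTiming {
  firstEventMs: number | null; // null if the stream closed without any event
  totalMs: number;
  events: string[];
}

export async function timeSseStream(res: Response, startedAt: number = Date.now()): Promise<StreamTiming> {
  if (!res.body) throw new Error("response has no body");
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const events: string[] = [];
  let firstEventMs: number | null = null;
  let buffer = "";

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // A frame ends at a blank line; any partial frame stays in the buffer.
    let sep: number;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const frame = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      for (const line of frame.split("\n")) {
        if (!line.startsWith("data:")) continue;
        if (firstEventMs === null) firstEventMs = Date.now() - startedAt;
        // Per the SSE format, drop a single leading space after "data:".
        events.push(line.slice(5).replace(/^ /, ""));
      }
    }
  }
  return { firstEventMs, totalMs: Date.now() - startedAt, events };
}
```

Capture the start time before sending, e.g. `const t0 = Date.now(); const res = await fetch(url, init); const timing = await timeSseStream(res, t0);`. Then compare `timing.firstEventMs` against the 500ms Done Gate target.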

Quick Reference

Path	Purpose
`applications/iknowyou/app/`	Next.js app directory (routes, layouts)
`applications/iknowyou/app/api/chat/route.ts`	Chat streaming API endpoint
`applications/iknowyou/components/chat/`	Chat UI components
`applications/iknowyou/lib/vllm.ts`	vLLM client (streaming)
`applications/iknowyou/lib/types.ts`	Shared TypeScript types
`applications/iknowyou/server/observer.ts`	Observer pipeline
`applications/iknowyou/server/brief.ts`	Brief assembly
`applications/iknowyou/devsetup.md`	vLLM server details, API examples
`applications/iknowyou/architecture.md`	System architecture
`.claude/agents/kai-park.md`	Engineer agent — stack, patterns, constraints
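Here is one possible shape for `lib/vllm.ts`. It is sketched under the assumption that the client calls vLLM's OpenAI-compatible `/v1/chat/completions` endpoint with `stream: true`. That endpoint sends `data: {json}` lines whose `choices[0].delta.content` holds the next text fragment, and it ends with `data: [DONE]`. `streamChat` and its options object are hypothetical.

```typescript
// Hypothetical lib/vllm.ts: yield text fragments from vLLM's
// OpenAI-compatible streaming chat endpoint.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

export interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChunkPayload {
  choices?: Array<{ delta?: { content?: string | null } }>;
}

export async function* streamChat(
  messages: ChatMessage[],
  opts: { baseUrl?: string; model?: string; timeoutMs?: number; fetchImpl?: FetchLike } = {},
): AsyncGenerator<string> {
  const {
    baseUrl = "http://msd5685.mjhst.com:8000/v1",
    model = "Qwen/Qwen3-8B",
    timeoutMs = 30_000, // the 30s completion timeout from Standards
    fetchImpl = fetch,
  } = opts;

  const res = await fetchImpl(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages, stream: true }),
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok || !res.body) throw new Error(`vLLM returned HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) return;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
      for (const raw of lines) {
        const line = raw.trim();
        if (!line.startsWith("data:")) continue;
        const data = line.slice(5).trim();
        if (data === "[DONE]") return;
        const payload = JSON.parse(data) as ChunkPayload;
        const text = payload.choices?.[0]?.delta?.content;
        if (text) yield text;
      }
    }
  } finally {
    // Also runs when the consumer stops iterating early.
    await reader.cancel().catch(() => undefined);
  }
}
```

Cancelling the reader in `finally` means that if the consumer stops early (for example, the browser disconnects), the upstream response body is cancelled too instead of being left open.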

Infrastructure Quick Reference

Resource	Location
vLLM API	`http://msd5685.mjhst.com:8000/v1`
Model	`Qwen/Qwen3-8B`
SSH	`ssh ubuntu@msd5685.mjhst.com`
vLLM logs	`sudo journalctl -u vllm -f` (on server)
vLLM restart	`sudo systemctl restart vllm` (on server)
GPU check	`nvidia-smi` (on server)
Dev server port	59521 (within the tidalDB port range 59520-59529)
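The values in the table above could be centralized in one typed config module. This is a sketch: the environment variable names (`VLLM_BASE_URL`, `VLLM_MODEL`, `AERIES_PORT`) and `loadConfig` are hypothetical, and the defaults are the values in the table.

```typescript
// Hypothetical config module; defaults mirror the infrastructure table.
export interface AeriesConfig {
  vllmBaseUrl: string;
  model: string;
  devPort: number;
}

const DEFAULTS: AeriesConfig = {
  vllmBaseUrl: "http://msd5685.mjhst.com:8000/v1",
  model: "Qwen/Qwen3-8B",
  devPort: 59521,
};

// Keep the app inside the tidalDB port range.
const PORT_RANGE = { min: 59520, max: 59529 } as const;

export function loadConfig(env: Record<string, string | undefined>): AeriesConfig {
  const rawPort = env["AERIES_PORT"];
  const port = rawPort ? Number(rawPort) : DEFAULTS.devPort;
  if (!Number.isInteger(port) || port < PORT_RANGE.min || port > PORT_RANGE.max) {
    throw new Error(`AERIES_PORT must be an integer in ${PORT_RANGE.min}-${PORT_RANGE.max}, got ${rawPort}`);
  }
  return {
    vllmBaseUrl: env["VLLM_BASE_URL"] ?? DEFAULTS.vllmBaseUrl,
    model: env["VLLM_MODEL"] ?? DEFAULTS.model,
    devPort: port,
  };
}
```

Call it as `loadConfig(process.env)` at startup. The dev server itself can be pinned to the port with `next dev -p 59521`.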

Standards

All API responses are typed (no `any`)
Streaming uses ReadableStream + SSE (not WebSocket)
Observer runs async after response stream completes
Client sends { message, conversationId } — server owns history
Error states show human-readable messages, never stack traces
vLLM calls include a timeout (10s for health checks, 30s for completions)
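Taken together, these standards suggest a route shape like the following hypothetical `app/api/chat/route.ts`. It is a sketch, not the existing implementation. Dependencies are injected through an illustrative `ChatDeps` interface, and the `token`/`done`/`error` event names are assumptions. Only `Request`, `Response`, `ReadableStream`, and `TextEncoder` are standard Web APIs.

```typescript
// Hypothetical chat route as a factory, so vLLM, the history store, and the
// observer can be injected (and faked in tests).
type Turn = { role: "user" | "assistant"; content: string };

export interface ChatDeps {
  loadHistory: (conversationId: string) => Promise<Turn[]>;
  streamTokens: (history: Turn[], message: string) => AsyncIterable<string>;
  saveTurn: (conversationId: string, user: string, assistant: string) => Promise<void>;
  observe: (conversationId: string) => Promise<void>;
}

export function makeChatHandler(deps: ChatDeps): (req: Request) => Promise<Response> {
  return async function POST(req: Request): Promise<Response> {
    const body: unknown = await req.json().catch(() => null);
    const { message, conversationId } = (body ?? {}) as Record<string, unknown>;
    if (typeof message !== "string" || typeof conversationId !== "string") {
      return new Response(JSON.stringify({ error: "Expected { message, conversationId }." }), {
        status: 400,
        headers: { "Content-Type": "application/json" },
      });
    }

    const encoder = new TextEncoder();
    const stream = new ReadableStream<Uint8Array>({
      async start(controller) {
        const send = (event: Record<string, string>) =>
          controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));
        let reply = "";
        let completed = false;
        try {
          // The server owns history; the client never sends it.
          const history = await deps.loadHistory(conversationId);
          for await (const token of deps.streamTokens(history, message)) {
            reply += token;
            send({ type: "token", text: token });
          }
          send({ type: "done" });
          completed = true;
        } catch {
          // Log the real error server-side; the UI only sees a readable message.
          send({ type: "error", message: "Aeries couldn't reach the model. Please try again." });
        } finally {
          controller.close();
        }
        // The stream is closed, so storage and observation cannot delay tokens.
        if (completed) {
          void deps
            .saveTurn(conversationId, message, reply)
            .then(() => deps.observe(conversationId))
            .catch(() => undefined);
        }
      },
    });

    return new Response(stream, {
      headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
    });
  };
}
```

In the real file this would be wired once, e.g. `export const POST = makeChatHandler({ ... })`, with the vLLM client, tidalDB-backed history, and observer supplied as dependencies.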

Done Gate

Full flow works: type → stream → display → store
First token appears within 500ms of send
Streaming text renders without flicker or reflow
vLLM-down case shows graceful error message
Conversation history persists across page reloads
Types are complete: no `any` anywhere in the chain
API route returns SSE headers (`Content-Type: text/event-stream`, `Cache-Control: no-cache`)
No observation logic in the response critical path
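The persistence item follows from the server owning history. The client only needs to keep a `conversationId`, and a reload fetches that conversation's turns from the server. The in-memory store below is a hypothetical stand-in for the tidalDB-backed store and does not survive a server restart. `ConversationStore` and `InMemoryConversationStore` are illustrative names.

```typescript
// Hypothetical server-side history store keyed by conversationId.
// In-memory stand-in only; the real store would be backed by tidalDB.
export interface Turn {
  role: "user" | "assistant";
  content: string;
  at: number; // epoch milliseconds
}

export interface ConversationStore {
  append(conversationId: string, turn: Turn): Promise<void>;
  history(conversationId: string): Promise<Turn[]>;
}

export class InMemoryConversationStore implements ConversationStore {
  private readonly turns = new Map<string, Turn[]>();

  async append(conversationId: string, turn: Turn): Promise<void> {
    const list = this.turns.get(conversationId) ?? [];
    list.push(turn);
    this.turns.set(conversationId, list);
  }

  async history(conversationId: string): Promise<Turn[]> {
    // Return a copy so callers cannot mutate stored history.
    return [...(this.turns.get(conversationId) ?? [])];
  }
}
```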
