- applications/iknowyou: new Next.js chat application with persona-aware conversations, briefing API, cohort logic, vLLM streaming, and sidebar navigation
- tidal M8: add replication control plane (control.rs), tenant migration state machine (migration.rs), tenant/upgrade coordinators, cluster/fault test harnesses
- tidal M8 tests: expand m8p2/m8p3/m8p4 test suites; add m8p5_multitenancy and m8_uat
- tidal db: split replication_ops out of db/mod.rs (was 647 lines, now 574)
- .claude: add kai-park, kaya-osei, mira-vasquez agents; add aeries-design-architect, aeries-fullstack-engineer, aeries-product-visionary skills
- docs: update ROADMAP.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| name | description |
|---|---|
| aeries-fullstack-engineer | Build the Aeries chat application — frontend, API, vLLM streaming, observation pipeline, tidalDB integration |
aeries-fullstack-engineer
When to Use
- Building or modifying the Aeries Next.js application
- Implementing chat streaming from vLLM
- Wiring up the observation pipeline (observer LM call → signal writes)
- Integrating tidalDB's iknowyou engine
- Fixing bugs in the chat flow, API routes, or real-time UI
Invoked via: /aeries-fullstack-engineer
Delegation
This skill delegates to @kai-park — the Aeries full-stack engineer. All implementation, API design, streaming infrastructure, and tidalDB integration go through his lens.
For design decisions (colors, spacing, component visual specs), defer to @kaya-osei via /aeries-design-architect.
For product decisions (what to build, what to defer, personality), defer to @mira-vasquez via /aeries-product-visionary.
Step Back
Before implementing, ask:
- Is the vLLM server healthy? Run `curl http://msd5685.mjhst.com:8000/health`. If it's down, nothing else matters.
- Does this block the response stream? Anything that adds latency to the user seeing tokens is wrong. Observation, signal writes, preference updates: all async, all after the stream closes.
- Am I over-engineering the MVP? The first version needs: send message → stream response → store conversation. Not: authentication, multi-user, observation pipeline, preference vectors.
- Does the server own the truth? The client sends `{ message, conversationId }`. Everything else (history, brief, observation) lives server-side.
- What happens when vLLM is slow or down? Every external call needs a timeout and a graceful fallback. Never show a stack trace in the UI.
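The last question above (timeout plus graceful fallback on every external call) can be sketched as a small wrapper. This is a minimal illustration, not code from the Aeries repository: `withTimeout` is a hypothetical helper name, and the fallback value is whatever the caller considers a safe default.

```typescript
// Hypothetical helper (not from the Aeries codebase): resolve to `fallback`
// if `work` rejects or does not settle within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    // Whichever settles first wins: the real call or the deadline.
    return await Promise.race([work, deadline]);
  } catch {
    // A rejected call degrades to the fallback instead of surfacing an error.
    return fallback;
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

A health probe might then read `await withTimeout(fetch(url).then((r) => r.ok), 10_000, false)`. Note that this sketch only stops waiting; it does not cancel the underlying request. Cancelling would need an `AbortSignal` passed to `fetch`.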
Workflow
Phase 1: Context
- Read `applications/iknowyou/devsetup.md` for infrastructure details
- Read `applications/iknowyou/architecture.md` for system design
- Check vLLM health: `curl http://msd5685.mjhst.com:8000/v1/models`
- Review existing code in `applications/iknowyou/`
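When the `/v1/models` check is done from code rather than `curl`, the response body can be inspected for the expected model. The sketch below assumes the OpenAI-compatible list shape (`{ data: [{ id: ... }] }`) that vLLM serves; `servesModel` is a hypothetical name, and the sample object is illustrative rather than captured output.

```typescript
// Hypothetical check: does a parsed /v1/models body list the given model id?
// Assumes the OpenAI-compatible shape { data: [{ id: string, ... }] }.
function servesModel(body: unknown, modelId: string): boolean {
  if (typeof body !== "object" || body === null) return false;
  const data = (body as { data?: unknown }).data;
  if (!Array.isArray(data)) return false;
  return data.some(
    (entry) =>
      typeof entry === "object" &&
      entry !== null &&
      (entry as { id?: unknown }).id === modelId,
  );
}

// Illustrative body only; real responses carry more fields.
const sampleModels = { object: "list", data: [{ id: "Qwen/Qwen3-8B", object: "model" }] };
console.log(servesModel(sampleModels, "Qwen/Qwen3-8B"));
```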
Phase 2: Plan
- Identify which layer the work touches (frontend, API, vLLM client, observer, tidalDB)
- Check dependencies between layers
- Determine if design input is needed (delegate to `/aeries-design-architect`)
- Determine if product input is needed (delegate to `/aeries-product-visionary`)
Phase 3: Implement
- Write types first (`lib/types.ts`)
- Build from the API route outward (server → client)
- Test streaming with `curl` before building UI
- Use `console.log` timestamps to verify streaming latency
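When streaming is tested with `curl`, the output arrives as SSE lines. The sketch below shows one way the vLLM client might extract tokens from those lines and log them with elapsed-time stamps. It assumes the OpenAI-compatible chunk format (`choices[0].delta.content`, terminated by `data: [DONE]`); `parseSseData` and `formatTokenLog` are hypothetical names, not functions from `lib/vllm.ts`.

```typescript
// Minimal shape of one streamed chunk (assumption: OpenAI-compatible format).
interface CompletionChunk {
  choices: Array<{ delta?: { content?: string } }>;
}

// Return the token carried by one SSE line, or null for blank lines,
// comments, and the [DONE] sentinel.
function parseSseData(line: string): string | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith("data:")) return null;
  const payload = trimmed.slice("data:".length).trim();
  if (payload === "" || payload === "[DONE]") return null;
  const chunk = JSON.parse(payload) as CompletionChunk;
  return chunk.choices[0]?.delta?.content ?? null;
}

// Log a token with milliseconds elapsed since the request started.
function formatTokenLog(startedAt: number, token: string, now: number = Date.now()): string {
  const line = `[+${now - startedAt}ms] ${JSON.stringify(token)}`;
  console.log(line);
  return line;
}
```

The gap between the request start and the first logged token is the number to compare against the latency target.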
Phase 4: Verify
- Test the full flow: type message → see streaming response → verify storage
- Check browser DevTools Network tab for SSE stream behavior
- Verify error handling (kill vLLM, send a message, see graceful error)
- Run through Done Gate checklist
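The graceful-error check in Phase 4 depends on the server mapping failures to fixed, human-readable text. A minimal sketch of that mapping follows. The error categories, message wording, and function names are all assumptions for illustration; the only behavior relied on is that `fetch` rejects with a `TypeError` on network failure and with an error named `AbortError` or `TimeoutError` when aborted.

```typescript
type ChatErrorKind = "timeout" | "unreachable" | "unknown";

// Fixed user-facing copy (hypothetical wording). Nothing from the raw error leaks.
const USER_MESSAGES: Record<ChatErrorKind, string> = {
  timeout: "The model is taking too long to respond. Please try again.",
  unreachable: "The model is unavailable right now. Please try again shortly.",
  unknown: "Something went wrong. Please try again.",
};

function classifyChatError(err: unknown): ChatErrorKind {
  if (!(err instanceof Error)) return "unknown";
  // Aborted or timed-out requests carry one of these names.
  if (err.name === "AbortError" || err.name === "TimeoutError") return "timeout";
  // fetch rejects with a TypeError when the network request itself fails.
  if (err instanceof TypeError) return "unreachable";
  return "unknown";
}

function toUserMessage(err: unknown): string {
  // Full detail goes to the server log only.
  console.error("[chat] upstream failure:", err);
  return USER_MESSAGES[classifyChatError(err)];
}
```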
Quick Reference
| Path | Purpose |
|---|---|
| applications/iknowyou/app/ | Next.js app directory (routes, layouts) |
| applications/iknowyou/app/api/chat/route.ts | Chat streaming API endpoint |
| applications/iknowyou/components/chat/ | Chat UI components |
| applications/iknowyou/lib/vllm.ts | vLLM client (streaming) |
| applications/iknowyou/lib/types.ts | Shared TypeScript types |
| applications/iknowyou/server/observer.ts | Observer pipeline |
| applications/iknowyou/server/brief.ts | Brief assembly |
| applications/iknowyou/devsetup.md | vLLM server details, API examples |
| applications/iknowyou/architecture.md | System architecture |
| .claude/agents/kai-park.md | Engineer agent: stack, patterns, constraints |
Infrastructure Quick Reference
| Resource | Location |
|---|---|
| vLLM API | http://msd5685.mjhst.com:8000/v1 |
| Model | Qwen/Qwen3-8B |
| SSH | ssh ubuntu@msd5685.mjhst.com |
| vLLM logs | sudo journalctl -u vllm -f (on server) |
| vLLM restart | sudo systemctl restart vllm (on server) |
| GPU check | nvidia-smi (on server) |
| Dev server port | 59521 (following tidalDB port range 59520-59529) |
Standards
- All API responses are typed (no `any`)
- Streaming uses `ReadableStream` + SSE (not WebSocket)
- Observer runs async after response stream completes
- Client sends `{ message, conversationId }`; server owns history
- Error states show human-readable messages, never stack traces
- vLLM calls include timeout (10s for health, 30s for completion)
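Several of these standards can be sketched together: a typed request body, SSE framing, and a `ReadableStream` that emits one frame per token. Everything here is illustrative. The `string` type for `conversationId`, the `{ token }` frame payload, the `[DONE]` sentinel, and all function names are assumptions, not the contents of `lib/types.ts` or the chat route.

```typescript
// Assumed request shape; the real type belongs in lib/types.ts.
interface ChatRequest {
  message: string;
  conversationId: string;
}

// Validate an untyped JSON body without resorting to `any`.
function parseChatRequest(body: unknown): ChatRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { message, conversationId } = body as Record<string, unknown>;
  if (typeof message !== "string" || message.trim() === "") return null;
  if (typeof conversationId !== "string") return null;
  return { message, conversationId };
}

// Standard headers for a server-sent event response.
const SSE_HEADERS = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  Connection: "keep-alive",
} as const;

// One SSE frame: a data line followed by a blank line.
function sseFrame(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Anything with a sync or async next() works as a token source.
type TokenSource = {
  next(): IteratorResult<string> | Promise<IteratorResult<string>>;
};

// Pull tokens on demand and emit one frame per token, then a sentinel.
function sseStream(tokens: TokenSource): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const next = await tokens.next();
      if (next.done) {
        controller.enqueue(encoder.encode("data: [DONE]\n\n"));
        controller.close();
      } else {
        controller.enqueue(encoder.encode(sseFrame({ token: next.value })));
      }
    },
  });
}
```

A route handler could then return `new Response(sseStream(source), { headers: SSE_HEADERS })`. Using `pull` rather than `start` means tokens are read only as fast as the client consumes them.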
Done Gate
- Full flow works: type → stream → display → store
- First token appears within 500ms of send
- Streaming text renders without flicker or reflow
- vLLM-down case shows graceful error message
- Conversation history persists across page reloads
- Types are complete: no `any` in the chain
- API route returns proper SSE headers
- No observation logic in the response critical path
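The last gate item, keeping observation out of the response critical path, can be sketched as follows: tokens are forwarded first, and the observer starts only after the token source is exhausted. `respondThenObserve`, `send`, and `observe` are hypothetical names standing in for the route's streaming code and the observer pipeline.

```typescript
type TokenSource = {
  next(): IteratorResult<string> | Promise<IteratorResult<string>>;
};

// Forward every token to the client, then start the observer without awaiting it.
async function respondThenObserve(
  tokens: TokenSource,
  send: (token: string) => void,
  observe: (transcript: string) => Promise<void>,
): Promise<string> {
  let transcript = "";
  for (;;) {
    const next = await tokens.next();
    if (next.done) break;
    send(next.value);
    transcript += next.value;
  }
  // Fire and forget: the stream is finished, so observer latency or failure
  // cannot delay or break what the user sees. Failures are only logged.
  void observe(transcript).catch((err: unknown) => {
    console.error("[observer] failed:", err);
  });
  return transcript;
}
```

One caveat for this design: on hosts that freeze or tear down the process once the response ends, un-awaited work may be cut short, so the observer call may need a platform-specific hook to run to completion.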