stemedb/use-cases/agile-agent-team.md
jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
2026-02-04 01:24:14 -07:00


Agile AI Agent Team: Knowledge Coordination

Tier: Production-Ready
Pillars Used: First-Class Contradiction, Invalidation Cascades, Multi-Signature Consensus, Semantic Decay
Postgres Test: FAILED - Lifecycle stages require application-level state machines; time-travel needs temporal tables with complex joins; query audit trails don't exist natively; epoch supersession requires recursive invalidation logic


The Catastrophe

I watched a production outage take down auth for 47 minutes because an AI agent deployed the wrong JWT configuration.

Here's what happened: Our team uses AI agents for development—a Lead Orchestrator coordinates specialists for research, implementation, and deployment. The deployment agent queried our knowledge base for "current JWT signing algorithm" and got "ES256."

It deployed with confidence. Tests passed. CI went green.

The auth service expected RS256. Every token validation failed. At 3am, the pager fired.

During the post-mortem, someone asked: "Why did the agent think ES256 was correct?"

Silence.

We dug through the knowledge base. Found an RFC from the security team proposing ES256 migration. Found Slack messages discussing it. Found a doc that said "we should use ES256" in future tense. The knowledge base had no distinction between "proposed" and "approved." The most recent entry was the RFC—a proposal, not a decision.

The agent queried, got the proposal, treated it as truth, deployed.

The failure mode: Traditional databases store information without lifecycle state. Proposals look like decisions. Discussions look like conclusions. When an AI agent queries "what is X?", it gets whatever is most recent—whether that's a decision, a debate, or a rejected idea.


The Team

An agile development team uses AI agents to coordinate work across auth migrations, feature flag rollouts, deployment configurations, and research:

| Role | Need |
| --- | --- |
| Lead Orchestrator | Routes work, needs definitive current-state answers |
| Implementation Agent | Writes code, needs approved patterns only |
| Research Agent | Ingests docs, papers, discussions (often conflicting) |
| Human Supervisor | Reviews agent decisions, needs to trace reasoning |
| On-Call SRE | Investigates incidents, needs time-travel debugging |

What They Need from Episteme

1. Lifecycle Stage (Proposed vs. Approved)

The Problem: Research Agent ingests an RFC proposing ES256. Implementation Agent queries "JWT signing algorithm" and gets ES256—even though it was never approved.

The Solution: Lifecycle is a first-class field with lens enforcement:

# Query with lifecycle filter
GET /query?subject=auth/jwt&predicate=signing_algorithm
    &lens=authority
    &lifecycle=approved

-> Returns RS256 (approved decision)
-> Proposal for ES256 is excluded by lifecycle filter

Proposals and approvals coexist in the DAG but are distinguished structurally—not by convention that agents might forget.
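To make the filtering behavior concrete, here is a minimal in-memory sketch. It is not Episteme's implementation: the `Assertion` class, the `query` function, and the lowercase lifecycle labels are assumptions for illustration, and "most recent wins" stands in for whatever the lens actually does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    value: str
    lifecycle: str    # "proposed" or "approved" (hypothetical labels)
    recorded_at: int  # increasing sequence number; higher means more recent


def query(store, subject, predicate, lifecycle: Optional[str] = None):
    """Return the most recent matching value, optionally restricted to one lifecycle stage."""
    matches = [
        a for a in store
        if a.subject == subject
        and a.predicate == predicate
        and (lifecycle is None or a.lifecycle == lifecycle)
    ]
    if not matches:
        return None
    return max(matches, key=lambda a: a.recorded_at).value


# The approved decision was recorded first; the RFC proposal arrived later.
store = [
    Assertion("auth/jwt", "signing_algorithm", "RS256", "approved", recorded_at=1),
    Assertion("auth/jwt", "signing_algorithm", "ES256", "proposed", recorded_at=2),
]

unfiltered = query(store, "auth/jwt", "signing_algorithm")
approved_only = query(store, "auth/jwt", "signing_algorithm", lifecycle="approved")
print(unfiltered, approved_only)
```

Both assertions stay in the store. Only the filter decides which one an agent is allowed to see.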


2. Query Audit Trail

The Problem: At 3am, auth is broken. What did the deployment agent query? What result did it get? What assertions contributed?

The Solution: Every query is automatically logged with full provenance:

GET /audit/queries?agent=deployment-agent&from=-6h

-> Returns:
{
  "query_id": "q_7f3a2b...",
  "timestamp": "2024-01-15T21:03:47Z",
  "subject": "auth/jwt",
  "predicate": "signing_algorithm",
  "lifecycle_filter": null,  // PROBLEM: agent didn't filter!
  "result": { "value": "ES256", "confidence": 0.87 },
  "contributing_assertions": [
    { "hash": "rfc_2024_001...", "lifecycle": "Proposed", "weight": 0.9 }
  ]
}

The SRE immediately sees two things: the agent didn't filter by lifecycle, and the proposal outweighed the approved config.
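The same triage can be scripted once audit records are in hand. The sketch below assumes a simplified record shape modeled on the JSON in this section. `QueryRecord` and `risky_queries` are hypothetical names, not part of the Episteme API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryRecord:
    agent: str
    subject: str
    predicate: str
    lifecycle_filter: Optional[str]  # None means the agent did not filter
    result: str
    # Lifecycle stages of the assertions that contributed to the result.
    contributing: List[str] = field(default_factory=list)


def risky_queries(log, agent):
    """Queries by `agent` that had no lifecycle filter and drew on a proposed assertion."""
    return [
        r for r in log
        if r.agent == agent
        and r.lifecycle_filter is None
        and "proposed" in r.contributing
    ]


audit_log = [
    QueryRecord("deployment-agent", "auth/jwt", "signing_algorithm", None, "ES256", ["proposed"]),
    QueryRecord("deployment-agent", "auth/jwt", "rotation_interval", "approved", "24h", ["approved"]),
]

flagged = risky_queries(audit_log, "deployment-agent")
for record in flagged:
    print(record.subject, record.predicate, record.result)
```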


3. Time-Travel Queries

The Problem: Production is stable after rollback. Now the SRE needs to understand: what was the state of knowledge at 9pm when the agent made its decision?

The Solution: The Merkle DAG is inherently temporal:

# What did we believe at 9pm?
GET /query?subject=auth/jwt&predicate=signing_algorithm
    &as_of=2024-01-15T21:00:00Z

-> Returns ES256 (the state at that moment)
-> Shows which assertions existed then

Time-travel is O(log n) via hash lookups, not complex temporal table joins.
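As a rough illustration of an `as_of` lookup in logarithmic time, the sketch below runs a binary search over an append-only, time-sorted history. This is a stand-in technique, not the Merkle DAG itself, and the timestamps and the `value_as_of` helper are invented for the example.

```python
import bisect
from datetime import datetime, timezone

UTC = timezone.utc

# Append-only history for one (subject, predicate), sorted by recorded time.
# All timestamps here are hypothetical.
history = [
    (datetime(2024, 1, 10, 9, 0, tzinfo=UTC), "RS256"),    # original config
    (datetime(2024, 1, 15, 18, 30, tzinfo=UTC), "ES256"),  # RFC ingested
    (datetime(2024, 1, 16, 4, 0, tzinfo=UTC), "RS256"),    # restored after rollback
]
timestamps = [recorded_at for recorded_at, _ in history]


def value_as_of(when):
    """Return the value that was current at `when`, or None if nothing existed yet."""
    idx = bisect.bisect_right(timestamps, when)  # O(log n) binary search
    return history[idx - 1][1] if idx else None


at_9pm = value_as_of(datetime(2024, 1, 15, 21, 0, tzinfo=UTC))
before_rfc = value_as_of(datetime(2024, 1, 12, 0, 0, tzinfo=UTC))
print(at_9pm, before_rfc)
```

Nothing is rewritten to answer the historical question. The lookup only chooses how much of the history to consider.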


4. Paradigm Shifts (Epochs)

The Problem: Security team migrates from RS256 to ES256. This affects 47 related assertions. In Postgres, you'd need 47 UPDATEs or INSERT/deprecate pairs.

The Solution: Epochs enable O(1) supersession:

# Create new epoch
POST /epoch
{
  "name": "auth-es256-migration",
  "supersedes": "auth-rs256-era",
  "supersession_type": "Temporal",
  "effective_date": "2024-02-01T00:00:00Z"
}

# Queries automatically respect epoch boundaries
GET /query?subject=auth/jwt&predicate=signing_algorithm
-> Returns ES256 (from new epoch)

GET /query?subject=auth/jwt&predicate=signing_algorithm&epoch=auth-rs256-era
-> Returns RS256 (historical)
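One way to picture constant-time supersession: assertions are keyed by epoch, and creating an epoch only moves a head pointer. The sketch below is a toy model under that assumption. `EpochStore` and its methods are hypothetical and say nothing about how Episteme stores epochs internally.

```python
class EpochStore:
    """Toy model: values are keyed by epoch, and a head pointer selects the default epoch."""

    def __init__(self, first_epoch):
        self.supersedes = {first_epoch: None}  # epoch name -> epoch it supersedes
        self.head = first_epoch
        self.values = {}  # (epoch, subject, predicate) -> value

    def assert_value(self, epoch, subject, predicate, value):
        self.values[(epoch, subject, predicate)] = value

    def create_epoch(self, name, supersedes):
        # A single write: no existing assertion is updated or deprecated.
        self.supersedes[name] = supersedes
        if supersedes == self.head:
            self.head = name

    def query(self, subject, predicate, epoch=None):
        return self.values.get((epoch or self.head, subject, predicate))


store = EpochStore("auth-rs256-era")
store.assert_value("auth-rs256-era", "auth/jwt", "signing_algorithm", "RS256")

store.create_epoch("auth-es256-migration", supersedes="auth-rs256-era")
store.assert_value("auth-es256-migration", "auth/jwt", "signing_algorithm", "ES256")

current = store.query("auth/jwt", "signing_algorithm")
historical = store.query("auth/jwt", "signing_algorithm", epoch="auth-rs256-era")
print(current, historical)
```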

5. Expert Weighting (Authority Lens)

The Problem: Junior dev discovers Stack Overflow answer suggesting JWT rotation every 15 minutes. Senior security engineer says "That's for high-security contexts; our standard is daily."

The Solution: Multi-signature with domain-weighted reputation:

GET /query?subject=auth/jwt&predicate=rotation_interval
    &lens=authority
    &domain=security

-> Returns: 24h
-> security_lead has reputation 0.95 in security domain
-> junior_dev has reputation 0.4 in security domain

Signatures are cryptographic, immutable, and automatically weighted.
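The weighting step can be sketched as a reputation-weighted vote. The reputation figures come from the example above; the `resolve` function, the tuple layout, and the junior dev's `15m` value are assumptions, and signature verification is omitted entirely.

```python
from collections import defaultdict

# (signer, domain) -> reputation, using the figures from the example.
reputation = {
    ("security_lead", "security"): 0.95,
    ("junior_dev", "security"): 0.4,
}

# (signer, value) pairs for auth/jwt rotation_interval.
signed_assertions = [
    ("junior_dev", "15m"),
    ("security_lead", "24h"),
]


def resolve(assertions, domain):
    """Pick the value whose signers carry the most total reputation in `domain`."""
    scores = defaultdict(float)
    for signer, value in assertions:
        # Signers with no reputation in this domain contribute nothing.
        scores[value] += reputation.get((signer, domain), 0.0)
    return max(scores, key=scores.get)


winner = resolve(signed_assertions, "security")
print(winner)
```

Because reputation is looked up per domain, the same signer can carry a different weight on a different topic.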


6. Persistent Learning (The Optimization Conflict)

The Problem: You correct an agent: "Don't use requests, use axios." Agent says "Got it!" Next week, new session—agent uses requests again. Repeat forever.

This is The Optimization Conflict: agents rely on context windows that drift. Your correction slides out of the window, and the agent reverts to the behavior baked into its base weights.

The Solution: Corrections become database writes that persist permanently:

# Day 1: Store correction with forbidden alternative
POST /assert
{
  "subject": "Project_X_Http_Client",
  "predicate": "must_use_library",
  "object": "axios",
  "meta": { "forbidden_alternative": "requests", "reason": "deprecated" },
  "confidence": 1.0
}

# Day 30: New session, agent checks constraints before coding
GET /query?context=python_http&lens=constraints
-> Returns: { must_use: "axios", forbidden: "requests" }

# Agent uses axios. Constraint honored across sessions.

The Gardener (background worker) also adjusts TrustRank—agents that make mistakes have reduced confidence on that topic.
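On the agent side, honoring a stored constraint amounts to a check before any code is written. The sketch below assumes the constraints have already been fetched and flattened into dictionaries. The field names and `check_dependencies` are illustrative, not the response format of the constraints lens.

```python
# Constraints as the agent might hold them after querying (hypothetical shape).
constraints = [
    {
        "subject": "Project_X_Http_Client",
        "must_use": "axios",
        "forbidden": "requests",
        "reason": "deprecated",
    },
]


def check_dependencies(planned, constraints):
    """Return one human-readable violation per forbidden library found in `planned`."""
    violations = []
    for c in constraints:
        if c["forbidden"] in planned:
            violations.append(
                f'{c["forbidden"]} is forbidden ({c["reason"]}); use {c["must_use"]}'
            )
    return violations


bad_plan = check_dependencies({"requests"}, constraints)
good_plan = check_dependencies({"axios"}, constraints)
print(bad_plan)
print(good_plan)
```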


The 5-Minute Demo

# Start server
cargo run --bin stemedb-server

# Insert PROPOSED pattern (RFC)
curl -X POST http://localhost:18180/assert \
  -H 'Content-Type: application/json' \
  -d '{
  "subject": "auth/jwt", "predicate": "signing_algorithm",
  "object": {"Text": "ES256"}, "lifecycle": "Proposed", "confidence": 0.75
}'

# Insert APPROVED pattern (production)
curl -X POST http://localhost:18180/assert \
  -H 'Content-Type: application/json' \
  -d '{
  "subject": "auth/jwt", "predicate": "signing_algorithm",
  "object": {"Text": "RS256"}, "lifecycle": "Approved", "confidence": 0.9
}'

# Query WITHOUT lifecycle filter (the bug!)
curl "http://localhost:18180/query?subject=auth/jwt&predicate=signing_algorithm&lens=recency"
# Returns ES256 (proposal, most recent)

# Query WITH lifecycle filter (the fix!)
curl "http://localhost:18180/query?subject=auth/jwt&predicate=signing_algorithm&lens=recency&lifecycle=approved"
# Returns RS256 (correct)
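Since the two queries differ only in the `lifecycle` parameter, one defensive option is a client helper that applies the filter by default. The sketch below only builds URLs and makes no network calls. The host, port, path, and parameter names are copied from the demo, while `build_query_url` and its defaults are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:18180"  # host and port taken from the demo


def build_query_url(subject, predicate, lens="recency", lifecycle="approved"):
    """Build a /query URL that filters by lifecycle unless the caller opts out with None."""
    params = {"subject": subject, "predicate": predicate, "lens": lens}
    if lifecycle is not None:
        params["lifecycle"] = lifecycle
    # urlencode percent-encodes the slash in "auth/jwt" as %2F.
    return f"{BASE_URL}/query?{urlencode(params)}"


safe_url = build_query_url("auth/jwt", "signing_algorithm")
unsafe_url = build_query_url("auth/jwt", "signing_algorithm", lifecycle=None)
print(safe_url)
print(unsafe_url)
```

With this default, an agent has to opt out of the filter explicitly instead of remembering to opt in.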

Summary: Why Episteme for Agent Teams?

| Problem | Traditional Approach | Episteme Approach |
| --- | --- | --- |
| Proposal vs. Approved | Status column (unenforced) | Lifecycle enum with lens enforcement |
| Query audit trail | Application-level logging | Built-in with provenance |
| Time-travel debugging | Temporal tables + complex joins | Native as_of parameter |
| Paradigm shift (RS256→ES256) | O(n) updates | O(1) epoch supersession |
| Expert vs. junior weighting | Join tables with reputation | Cryptographic signatures + Authority lens |
| Corrections forgotten | System prompt drift | Negative Constraints + Resurrection |
| Agents repeat mistakes | No learning (stateless) | TrustRank back-propagation |

The 47-minute outage happened because an AI agent couldn't distinguish a proposal from an approved decision. Episteme ensures that distinction is structural—not a convention that agents might forget.


Further Reading