stemedb/docs/demo/pilot/amazement-demo-2.md
jordan c02b0370d7 docs: align demo script with roadmap + add SOC 2 certification task
- Fix reference customer answer in amazement-demo-2 (remove placeholder)
- Add Pilot Delivery Milestones section linking demo capabilities to roadmap tasks
- Add SOC 2 Type II certification task (9C.4) with Q3 2026 target
- Add "real data not mockups" success criterion to P5.4 demo validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 19:00:43 -07:00


Episteme Executive Demo

The Knowledge Graph That Shows Its Work

Audience: CEO, CMO, CFO, Board Members
Duration: 20 minutes + Q&A
No technical setup required - Presenter runs everything pre-staged


The Opening Hook (60 seconds)

Screen: A news headline: "Pharma Company Faces $2.3B Lawsuit Over AI-Recommended Treatment"

Presenter:

"Last year, a major pharmaceutical company could not explain why their AI recommended a specific drug combination. When the FDA asked 'show us the evidence trail,' they had nothing. The AI was a black box.

Their data warehouse had the clinical trials. Their AI had the recommendation. But nobody could connect the two in a way that satisfied regulators.

Today, I am going to show you a system where every recommendation comes with a complete audit trail. Where conflicting evidence is visible, not hidden. Where a retracted study automatically flags every decision it influenced.

This is not about replacing your existing systems. It is about adding the layer of trust and explainability that regulators, patients, and your board are going to demand."


Aha Moment 1: "Your AI Made a Recommendation. Prove It."

The Story

Presenter: "Your AI agent just recommended semaglutide for a diabetic patient with cardiovascular history. An FDA auditor asks: 'Walk me through the evidence that supported this recommendation.' What do you show them?"

What Is On The Screen

A clean dashboard showing a query result with expandable sections.

The Query Box:

Subject: semaglutide
Question: Cardiovascular safety profile

The Result Panel:

| Source Type | Finding | Confidence | Source |
|---|---|---|---|
| FDA Label (Tier 0) | Cardiovascular risk reduction demonstrated | 95% | FDA 2023 |
| Phase 3 Trial (Tier 1) | 26% reduction in MACE events | 92% | NEJM 2024 |
| Post-market Study (Tier 2) | Consistent with trial data | 88% | Real-world evidence |
| Patient Reports (Tier 5) | Some palpitation concerns | 65% | Aggregated |

Below the table, an expandable "Audit Trail" section:

Query ID: abc-123-def
Queried by: Agent "CardioRecommender"
Timestamp: 2026-02-05 10:23:45 UTC
Assertions Considered: 47
Winner Confidence: 95%
Conflict Score: 0.18 (Low - sources mostly agree)

What The Presenter Points To

  1. The tiered sources: "Notice how regulatory evidence sits at the top. Clinical trials next. Patient anecdotes last. Your auditor sees the hierarchy."

  2. The conflict score: "0.18 means sources largely agree. When this number is high, you know there is controversy."

  3. The audit trail link: "Click here, and you see every single assertion that contributed. Not just the winner - everything that was considered."
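
Presenter backup note for technical follow-up: the "regulatory first" ordering shown in the result panel can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Episteme's implementation. The `Assertion` class and `resolve_regulatory_first` function are hypothetical names, and the rule "lowest tier wins, ties broken by confidence" is assumed from the panel layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    source_type: str
    tier: int          # 0 = regulatory ... 5 = anecdotal, as in the result panel
    finding: str
    confidence: float  # 0.0 to 1.0


def resolve_regulatory_first(assertions):
    """Return (winner, considered).

    Assumed rule: the lowest tier wins, ties are broken by higher confidence.
    Every assertion stays in `considered` so the audit trail shows all inputs.
    """
    if not assertions:
        raise ValueError("no assertions to resolve")
    considered = sorted(assertions, key=lambda a: (a.tier, -a.confidence))
    return considered[0], considered


# The four rows from the result panel, deliberately out of order.
PANEL = [
    Assertion("Patient Reports", 5, "Some palpitation concerns", 0.65),
    Assertion("Phase 3 Trial", 1, "26% reduction in MACE events", 0.92),
    Assertion("FDA Label", 0, "Cardiovascular risk reduction demonstrated", 0.95),
    Assertion("Post-market Study", 2, "Consistent with trial data", 0.88),
]

winner, considered = resolve_regulatory_first(PANEL)
print(winner.source_type, winner.confidence)
for a in considered:
    print(a.tier, a.source_type)
```

The point to make if asked: the winner is selected, but nothing is discarded. The full `considered` list is what the audit trail exposes.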

The Aha Moment

"This is not a recommendation from a black box. This is a recommendation with a complete evidence chain. Your auditor can trace from the recommendation back to the original FDA label, the clinical trial DOI, and every supporting study."

Business Outcome

| Metric | Before Episteme | After Episteme |
|---|---|---|
| Audit response time | 3-5 days | 15 minutes |
| Audit confidence | "We think..." | "Here is the evidence chain" |
| Regulatory risk | High (unexplainable AI) | Low (full provenance) |

Aha Moment 2: "Sources Disagree. Now What?"

The Story

Presenter: "Let us look at something harder. Gastroparesis risk with GLP-1 agonists. The FDA label says one thing. Reddit says another. Clinical trials are somewhere in between. Most systems either hide this or average it into mush."

What Is On The Screen

A "Conflict Analysis" view showing the same query, but with disagreement visible.

The Skeptic Panel:

Status: CONTESTED
Conflict Score: 0.72 (High)

| Claim | Support Weight | Source Tier | Details |
|---|---|---|---|
| "Low incidence (0.2%)" | 45% | Regulatory | FDA clinical trial data |
| "Moderate risk, monitor closely" | 30% | Clinical | Post-market surveillance |
| "High patient-reported incidence" | 25% | Anecdotal | Aggregated patient forums |

Visual: A bar chart showing the weight distribution, with regulatory in blue, clinical in green, anecdotal in orange.

What The Presenter Points To

  1. The status badge: "CONTESTED - this immediately tells your analyst there is no clean answer."

  2. The tier breakdown: "Regulatory sources say 0.2%. But patient reports say something different. Both are visible."

  3. The support weights: "We are not hiding the disagreement. We are quantifying it."
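
Presenter backup note for technical follow-up: if someone asks what "quantifying the disagreement" means, the sketch below shows one way support weights and a status badge could be derived. It is illustrative only. The raw support numbers, the `0.5` cut-off, and the `CONSENSUS` label are assumptions made for this example, and the script does not specify how Episteme computes the conflict score itself, so the score is taken here as an input.

```python
def support_weights(raw_support):
    """Normalize raw per-claim support into fractions that sum to 1."""
    total = sum(raw_support.values())
    if total <= 0:
        raise ValueError("no support to normalize")
    return {claim: value / total for claim, value in raw_support.items()}


def status_badge(conflict_score, contested_at=0.5):
    """Map a conflict score to a badge.

    The 0.5 threshold and the CONSENSUS label are assumptions for this sketch.
    """
    return "CONTESTED" if conflict_score >= contested_at else "CONSENSUS"


# Hypothetical raw support chosen so the shares match the Skeptic panel.
RAW = {
    "Low incidence (0.2%)": 9.0,
    "Moderate risk, monitor closely": 6.0,
    "High patient-reported incidence": 5.0,
}

weights = support_weights(RAW)
for claim, share in weights.items():
    print(f"{claim}: {share:.0%}")

print(status_badge(0.72))  # score from the gastroparesis panel
print(status_badge(0.18))  # score from the semaglutide panel
```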

The Aha Moment

"Most databases would give you the FDA number and call it done. We show you that real patients are reporting different experiences. Your medical affairs team can investigate. Your regulatory team knows there is nuance. Nobody is blindsided."

Business Outcome

| Metric | Traditional Approach | Episteme Approach |
|---|---|---|
| Signal detection | After adverse events | Proactive visibility |
| Analyst workflow | Manual cross-referencing | Automated conflict detection |
| Decision documentation | "We relied on FDA data" | "We saw the conflict and chose X because..." |

Aha Moment 3: "A Study Just Got Retracted. What Broke?"

The Story

Presenter: "Six months ago, you ingested a landmark cardiovascular study. Your AI has been using it for recommendations ever since. This morning, the journal retracted it. What do you do?"

What Is On The Screen

A "Source Management" dashboard.

Before Retraction:

Source: GLP-1 Cardiovascular Outcomes Trial
Status: ACTIVE
DOI: 10.1056/NEJMoa2024...
Tier: 1 (Clinical Trial)
Assertions Citing This Source: 234
Last Validated: 2025-08-15

After clicking "Mark as Retracted":

Source: GLP-1 Cardiovascular Outcomes Trial
Status: QUARANTINED
Reason: Journal retraction (2026-02-05)
Marked by: Dr. Sarah Chen (admin)
Assertions Citing This Source: 234 (FLAGGED)

Below, a list of impacted recommendations:

IMPACTED DECISIONS:
- CardioRecommender query on 2026-01-15 (Patient: anonymized)
- RiskAssessor query on 2026-01-22 (Batch: Q4 review)
- DrugInteraction query on 2026-02-01 (Study: XYZ-123)
...showing 47 of 234 impacted queries

What The Presenter Points To

  1. The status change: "One click. The source is quarantined. Every assertion citing it is flagged."

  2. The impact list: "Here are all 234 decisions that relied on this study. Your team can review them in priority order."

  3. The audit trail: "Who marked it? When? Why? All recorded."
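
Presenter backup note for technical follow-up: the cascade described in these three points can be sketched as a lookup from source to assertions to queries. This is a toy model under stated assumptions, not the product's data model. `KnowledgeGraph`, its fields, and the sample IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    source_status: dict = field(default_factory=dict)     # source_id -> status
    assertion_source: dict = field(default_factory=dict)  # assertion_id -> source_id
    query_assertions: dict = field(default_factory=dict)  # query_id -> set of assertion_ids
    flagged: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def quarantine(self, source_id, reason, marked_by):
        """Quarantine a source, flag its assertions, return impacted query IDs.

        Nothing is deleted: assertions are flagged and the action is logged.
        """
        self.source_status[source_id] = "QUARANTINED"
        hit = {a for a, s in self.assertion_source.items() if s == source_id}
        self.flagged |= hit
        impacted = sorted(
            q for q, used in self.query_assertions.items() if used & hit
        )
        self.audit_log.append(
            {"source": source_id, "reason": reason,
             "marked_by": marked_by, "flagged": len(hit)}
        )
        return impacted


graph = KnowledgeGraph(
    source_status={"trial-A": "ACTIVE", "label-B": "ACTIVE"},
    assertion_source={"a1": "trial-A", "a2": "trial-A", "a3": "label-B"},
    query_assertions={
        "q-2026-01-15": {"a1", "a3"},
        "q-2026-01-22": {"a2"},
        "q-2026-02-01": {"a3"},
    },
)

impacted = graph.quarantine("trial-A", "Journal retraction", "admin")
print(impacted)
print(graph.audit_log[-1])
```

The impact list falls out of data that was recorded at query time, which is why the answer takes seconds rather than days.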

The Aha Moment

"In a traditional system, you would be scrambling to figure out what used this data. Here, you know instantly. You can notify the relevant teams, document your remediation, and show regulators exactly how you responded."

Business Outcome

| Metric | Manual Tracking | Episteme |
|---|---|---|
| Time to identify impact | Days to weeks | Seconds |
| Remediation documentation | Scattered emails | Single audit trail |
| Regulatory response | Reactive | Proactive |

Aha Moment 4: "What Did We Know, When We Knew It?"

The Story

Presenter: "A patient had an adverse event 8 months ago. Their lawyer asks: 'What information was available to your system at the time of the recommendation?' Can you reconstruct that state?"

What Is On The Screen

A timeline slider with two panels side by side.

Panel 1: Query as of TODAY (2026-02-05)

Subject: Drug X contraindications
Current consensus: Contraindicated with Condition Y
Confidence: 94%
Sources: FDA update (2025-11), 3 clinical studies

Panel 2: Query as of 8 MONTHS AGO (2025-06-05)

Subject: Drug X contraindications
Consensus at that time: No known contraindication with Condition Y
Confidence: 88%
Sources: Original FDA label, 1 clinical study
Note: FDA update not yet published

Visual: A timeline showing when the FDA update was published, when the clinical studies were ingested, and when the patient's recommendation occurred.

What The Presenter Points To

  1. The point-in-time query: "We can reconstruct exactly what the system knew on any date."

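  2. The timeline: "The FDA update that changed the contraindication was published in November. The recommendation happened in June, five months earlier. The system acted on the best available evidence."

  3. The consensus change: "Notice that the consensus itself changed. Eight months ago the system reported no known contraindication at 88% confidence. Today it reports a contraindication at 94%. The new evidence reversed the conclusion, and you can show exactly when."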
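
Presenter backup note for technical follow-up: a point-in-time query amounts to filtering evidence by ingestion date before resolving it. The sketch below assumes an append-only list where each record carries the date it was ingested. The function name `visible_as_of` is hypothetical, and the exact days are invented for illustration (the script only gives November 2025 for the FDA update).

```python
from datetime import date

# (ingested_on, source, kind). Exact days are illustrative assumptions.
EVIDENCE = [
    (date(2024, 3, 1), "Original FDA label", "regulatory"),
    (date(2025, 2, 10), "Clinical study 1", "clinical"),
    (date(2025, 9, 20), "Clinical study 2", "clinical"),
    (date(2025, 11, 15), "FDA update", "regulatory"),
    (date(2025, 12, 5), "Clinical study 3", "clinical"),
]


def visible_as_of(evidence, as_of):
    """Return only the records that had been ingested on or before `as_of`."""
    return [row for row in evidence if row[0] <= as_of]


then = visible_as_of(EVIDENCE, date(2025, 6, 5))
now = visible_as_of(EVIDENCE, date(2026, 2, 5))

print([source for _, source, _ in then])
print([source for _, source, _ in now])
```

Because records are only ever appended, the earlier view can be rebuilt at any time without restoring backups or replaying logs.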

The Aha Moment

"For legal and regulatory defense, this is invaluable. You are not saying 'we think we knew X.' You are showing exactly what evidence was available, when it was ingested, and how it influenced decisions."

Business Outcome

| Metric | Without Time-Travel | With Episteme |
|---|---|---|
| Legal discovery | Reconstruct from logs | Native capability |
| Defense strength | "We believe..." | "Here is the exact state" |
| Regulatory confidence | Uncertain | Demonstrable |

Aha Moment 5: "Bad Actors Tried to Poison Our Data. What Happened?"

The Story

Presenter: "Let us talk about what happens when things go wrong. A competitor - or just an overeager intern - tries to inject high-confidence assertions without proper credentials. Show them the wall."

What Is On The Screen

A "Trust and Safety" dashboard.

Quarantine Queue:

| Hash | Reason | Claimed Confidence | Agent TrustRank | Status |
|---|---|---|---|---|
| abc... | Untrusted agent, high confidence | 95% | 0.12 (New) | Pending Review |
| def... | Near-duplicate detected | 92% | 0.45 (Medium) | Pending Review |
| ghi... | Signature verification failed | 88% | N/A | Auto-rejected |

Circuit Breaker Panel:

BLOCKED AGENTS: 2

Agent: intern-test-agent
Status: CIRCUIT OPEN
Reason: 7 failures in 60 seconds (signature errors)
Blocked since: 2026-02-05 09:45:12
Will retry: 2026-02-05 09:45:42

Agent: suspicious-bot-123
Status: CIRCUIT OPEN
Reason: Repeated spam attempts
Blocked since: 2026-02-04 23:12:00
Will retry: 2026-02-05 11:12:00 (extended)

What The Presenter Points To

  1. The quarantine logic: "A new agent claiming 95% confidence? That is suspicious. It goes to review queue, not into production."

  2. The circuit breaker: "After 5 failures in a minute, the agent is blocked. Automatic. No human intervention needed at 3am."

  3. The review workflow: "Nothing is deleted. Your team reviews and approves or rejects. Full audit trail."
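
Presenter backup note for technical follow-up: the circuit breaker behaviour can be sketched as a sliding-window failure counter. The threshold of 5 failures per 60 seconds comes from the talking point above, and the 30-second cooldown is inferred from the "Blocked since" and "Will retry" times on the panel. The `CircuitBreaker` class itself is a hypothetical illustration, not Episteme's code, and it does not model the extended cooldown shown for the second agent.

```python
from collections import deque


class CircuitBreaker:
    def __init__(self, max_failures=5, window_s=60.0, cooldown_s=30.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self._failures = deque()   # timestamps of recent failures
        self._open_until = None    # time at which the agent may retry

    def allow(self, now):
        """True if the agent may submit at time `now` (seconds)."""
        return self._open_until is None or now >= self._open_until

    def record_failure(self, now):
        """Record a failure and open the circuit if the window is full."""
        self._failures.append(now)
        # Drop failures that have aged out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        if len(self._failures) >= self.max_failures:
            self._open_until = now + self.cooldown_s
            self._failures.clear()


# Five signature errors in 12 seconds trip the breaker.
fast = CircuitBreaker()
for t in (0, 3, 6, 9, 12):
    fast.record_failure(t)
print(fast.allow(13), fast.allow(42))

# Five failures spread over more than a minute do not.
slow = CircuitBreaker()
for t in (0, 20, 40, 61, 81):
    slow.record_failure(t)
print(slow.allow(82))
```

Timestamps are passed in explicitly so the behaviour is easy to replay in a demo or a test.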

The Aha Moment

"This is not just spam filtering. This is graph integrity protection. Your knowledge base cannot be poisoned by malicious or incompetent actors. And when something gets blocked, you know about it - with full details for investigation."

Business Outcome

| Metric | Unprotected System | Episteme |
|---|---|---|
| Data poisoning risk | High | Mitigated |
| Spam impact | Manual cleanup | Auto-quarantine |
| 3am incident response | Call the engineer | Automatic circuit breaker |

The Postgres Comparison (2 minutes)

What Is On The Screen

A simple two-column comparison - no code, just capabilities.

| Capability | PostgreSQL | Episteme |
|---|---|---|
| Store conflicting data | Manual schema design | Built-in - conflicts are first-class |
| Show disagreement with scores | Custom application code | Native Skeptic endpoint |
| Tier-based consensus | Complex SQL joins | Native Layered query |
| Time-travel queries | Manual versioning tables | Native as_of parameter |
| Full query provenance | Build from scratch | Native audit trail |
| Content defense | Separate spam service | Built-in quarantine |
| Agent circuit breakers | Build from scratch | Built-in protection |

The Presenter:

"You could build all of this on Postgres. I know, because I have seen teams try. It takes 12-18 months and becomes a maintenance nightmare. We have done the hard work. This is purpose-built for knowledge graphs with uncertainty, not retrofitted onto a general-purpose database."


The Q&A Preparation

The 10 Questions They Will Ask

1. "How is this different from our existing data warehouse?"

Answer: "Your data warehouse stores facts. Episteme stores claims with provenance, confidence, and source tiers. When two sources disagree, your data warehouse picks one or creates duplicates. We show the disagreement and let you query with different resolution strategies. Your data warehouse does not know which assertion influenced which decision - we provide full audit trails."

2. "What is the cost if this fails?"

Answer: "All data lives on your infrastructure. You maintain full export capability via the API. The data format is documented and uses standard serialization. If you decide to leave, your data comes with you. We are also append-only - you cannot accidentally delete data."

3. "Who else in pharma uses this?"

Answer: "We are currently onboarding our first enterprise pilots. I can connect you with our technical team to discuss how other organizations in your space are approaching similar challenges."

4. "What is the total cost of ownership over 3 years?"

Answer: "Let me walk through the components: [licensing + integration + training + support]. The comparison is not against zero - it is against building these capabilities yourself, which we estimate at 12-18 months of engineering time plus ongoing maintenance."

5. "Can this touch PHI?"

Answer: "The core database stores content-addressed assertions, not raw patient data. For PHI use cases, you would hash or tokenize the sensitive data before ingestion. The provenance and audit capabilities still work. We can architect this during the pilot design phase."

6. "Where is the SOC 2 Type II report?"

Answer: "We are in the process of SOC 2 certification. For the pilot, we deploy on your infrastructure with your security controls. Your existing certifications cover the deployment. We can discuss the timeline for our independent certification."

7. "What happens if the system goes down?"

Answer: "Read queries can run against read replicas. Write operations queue in the WAL and replay on recovery. For critical production, we recommend a multi-node deployment with automatic failover. The pilot will help us size the appropriate redundancy for your SLAs."

8. "How long to ingest our 50,000 clinical trial summaries?"

Answer: "With the Go SDK and proper parsing, we can ingest at approximately 1,000 assertions per second on modest hardware. For 50K documents, initial ingestion is hours, not days. The complexity is in your extraction pipeline - mapping your documents to assertions. We provide pharma-specific extractors to accelerate this."
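
Presenter backup note: if the audience wants the arithmetic behind "hours, not days", the sketch below converts the quoted rate into wall-clock time. The number of assertions extracted per document is not stated anywhere in this script, so the values of 20 and 100 are assumptions to be replaced with figures from the customer's own extraction pipeline.

```python
def ingest_hours(documents, assertions_per_doc, assertions_per_second=1000):
    """Estimate raw ingestion time in hours at a steady write rate.

    Covers database writes only. Document parsing and extraction time,
    which the answer above calls the hard part, is not included.
    """
    total_assertions = documents * assertions_per_doc
    return total_assertions / assertions_per_second / 3600


for per_doc in (20, 100):  # assumed extraction yields per document
    hours = ingest_hours(50_000, per_doc)
    print(f"{per_doc} assertions/doc -> {hours:.2f} hours")
```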

9. "Can analysts override the AI confidence scores?"

Answer: "Yes. Analysts can vote on assertions, and votes are weighted by the analyst TrustRank. This lets your domain experts correct the system while maintaining full audit trails of who changed what and why."
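
Presenter backup note: one way to picture TrustRank-weighted voting is a weighted average of up and down votes. This is a sketch under stated assumptions. The script does not say how votes feed back into confidence, so the `weighted_vote` function, the +1/-1 vote encoding, and the analyst names are all hypothetical. The TrustRank values reuse numbers that appear on the Trust and Safety dashboard purely for familiarity.

```python
def weighted_vote(votes):
    """Combine analyst votes into a score in [-1, 1].

    `votes` is a list of (analyst, trust_rank, vote), where vote is +1
    (supports the assertion) or -1 (disputes it). Each vote counts in
    proportion to the analyst's TrustRank.
    """
    total_trust = sum(trust for _, trust, _ in votes)
    if total_trust == 0:
        return 0.0
    return sum(trust * vote for _, trust, vote in votes) / total_trust


VOTES = [
    ("senior-pharmacologist", 0.90, -1),
    ("new-analyst", 0.12, +1),
    ("mid-level-analyst", 0.45, +1),
]

score = weighted_vote(VOTES)
print(f"{score:.3f}")

# The vote list doubles as the audit record of who voted and how.
for analyst, trust, vote in VOTES:
    print(analyst, trust, vote)
```

In this example two analysts support the assertion and one disputes it, yet the score is negative because the dissenting analyst carries more trust than the other two combined.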

10. "What is the exit strategy if this does not work?"

Answer: "Full API export at any time. Standard JSON format. Documented schema. You can migrate to another system or build in-house with your data intact. We believe in earning your continued business, not locking you in."


One-Page Leave-Behind

Episteme: The Knowledge Graph That Shows Its Work

The Problem You Have Today

Your organization ingests data from FDA labels, clinical trials, real-world evidence, and emerging signals. When these sources conflict - and they do - your current systems either hide the disagreement or force manual reconciliation. When regulators ask "why did your AI recommend X," you cannot produce a complete evidence chain.

What Episteme Does

Episteme is a probabilistic knowledge graph that stores claims, not facts. Every assertion includes provenance, confidence, and source tier. Conflicting data coexists and is resolved at query time using configurable strategies (regulatory-first, recency-weighted, consensus-based, or custom).

The Five Capabilities That Matter

  1. Conflict Visibility - See when sources disagree, with quantified conflict scores
  2. Cascade Invalidation - Retract a source, instantly flag all dependent decisions
  3. Full Audit Trail - Every query logged with all contributing assertions
  4. Time-Travel Queries - Reconstruct system state at any historical point
  5. Trust and Safety - Automatic quarantine of suspicious data, circuit breakers for bad actors

Why Not Build It Ourselves?

You could build this on Postgres. Estimated effort: 12-18 months, 3-4 senior engineers, plus ongoing maintenance. Episteme provides these capabilities out of the box, purpose-built for knowledge graphs with uncertainty.

Pilot Proposal

Duration: 4 weeks
Scope: 10,000 clinical trial summaries from one therapeutic area
Success Criteria:

  • Sub-second query latency
  • Successful conflict detection on known contradictory studies
  • Complete audit trail export for mock regulatory review
  • Source retraction workflow tested

Investment: [To be discussed based on deployment scope]

Next Steps

  1. Technical architecture review with your infrastructure team
  2. Data sample for extraction pipeline design
  3. Pilot scope and success criteria agreement
  4. Kickoff

Contact: [Account team contact]
Technical Documentation: Available under NDA
Demo Environment: Can be provisioned on your infrastructure



Pilot Delivery Milestones

| Demo Capability | Delivery Target | Roadmap Reference |
|---|---|---|
| Conflict Visualization Dashboard | Week 1-2 | P1.2, P1.3 |
| Cascade Invalidation (one-click) | Week 3 | P3.1, P3.2, P3.3 |
| Full Audit Trail Browser | Week 2 | P1.6 |
| Trust & Safety Dashboard | Week 2 | P1.4, P1.5 |
| Load-tested Performance (10K assertions) | Week 4 | P4.1 |
| API Authentication | Week 4 | P4.2 |
| Prometheus Metrics | Week 4 | P4.4 |

See roadmap.md for full implementation details.


Document version: 2026-02-05