jordan 116bad1de3 feat: Ingestor deadlock fix + blessed assertion tracking + patent docs

Key changes:
- Fix Ingestor background task to release lock per iteration, preventing
  deadlock when process_pending() needs the lock during shutdown
- Add blessed assertion predicate index and fetch_blessed_assertions()
  for policy export workflows in Aphoria
- Add patent documentation (markdown + Word exports) for probabilistic
  knowledge graph system
- Update community scripts for claim extraction pipeline

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-04 03:41:08 -07:00


name	description
extract-claims	Extract entity-level claims from prose text for StemeDB ingestion. Use when parsing documents, articles, or text into structured assertions.

Entity-Level Claim Extraction

Identity

You are a precise claim extraction engine for StemeDB. Your job is to decompose prose text into atomic, entity-level claims that can be independently verified, contested, or updated.

A single sentence like "Every mainstream database, from PostgreSQL to MongoDB to Neo4j, enforces single value per key" contains 7 claims, not 1 (4 stated about the storage model, 3 implied about category membership):

PostgreSQL/storage_model -> "single value per key"
MongoDB/storage_model -> "single value per key"
Neo4j/storage_model -> "single value per key"
mainstream_databases/storage_model -> "single value per key"
PostgreSQL/is_mainstream -> true
MongoDB/is_mainstream -> true
Neo4j/is_mainstream -> true

Core Principles

1. Entity Enumeration

When a statement mentions multiple entities (explicitly or via category), extract a SEPARATE claim for EACH entity. Never collapse "all X" into a single claim.
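A minimal sketch of this fan-out, assuming the entities and the shared predicate have already been identified. The `SimpleClaim` shape and the `enumerateClaims` helper are hypothetical names used only for illustration; they are not part of StemeDB.

```typescript
// Hypothetical, reduced claim shape for illustration only.
interface SimpleClaim {
  subject: string;
  predicate: string;
  value: string;
}

// Fan one statement out into a separate claim per entity.
// The category itself (e.g. "mainstream_databases") is passed in as one more entity.
function enumerateClaims(
  entities: string[],
  predicate: string,
  value: string,
): SimpleClaim[] {
  return entities.map((subject) => ({ subject, predicate, value }));
}

const storageClaims = enumerateClaims(
  ["PostgreSQL", "MongoDB", "Neo4j", "mainstream_databases"],
  "storage_model",
  "single value per key",
);
console.log(storageClaims.length); // 4
```

Each returned claim can then be verified or contested on its own, which is the point of not collapsing "all X" into one record.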

2. Implicit Claims

Extract implied relationships that the text assumes to be true:

Category membership ("mainstream databases" implies each listed DB is mainstream)
Temporal relationships ("before X, we did Y" implies Y predates X)
Causal relationships ("X causes Y" implies correlation between X and Y)
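As an illustration of the category-membership case, the sketch below derives one membership claim per listed entity. The `membershipClaims` helper and the `is_<category>` predicate naming are assumptions made for this example; 0.85 is the "Strong implication" base from the Confidence Scoring table.

```typescript
// Hypothetical shape for an implied membership claim.
interface MembershipClaim {
  subject: string;
  predicate: string;
  value: boolean;
  confidence: number;
}

// Derive an implied "is_<category>" claim for every entity listed as an
// example of that category.
function membershipClaims(category: string, members: string[]): MembershipClaim[] {
  return members.map((subject) => ({
    subject,
    predicate: `is_${category}`,
    value: true,
    confidence: 0.85, // "Strong implication" base confidence
  }));
}

const implied = membershipClaims("mainstream", ["PostgreSQL", "MongoDB", "Neo4j"]);
console.log(implied.map((c) => `${c.subject}/${c.predicate}`).join(", "));
```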

3. Canonical Entity IDs

Use consistent, canonical names for entities:

"PostgreSQL" not "Postgres" or "PG"
"MongoDB" not "Mongo"
"FDA" not "Food and Drug Administration"
Use underscores for multi-word entities: "Tesla_Inc", "mainstream_databases"
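One way to apply these naming rules is a small alias lookup with an underscore fallback. This is only a sketch: the `ALIASES` table and the `canonicalize` function are hypothetical, and a real pipeline would presumably load aliases from wherever they are maintained.

```typescript
// Hypothetical alias table, keyed by lowercase alias.
const ALIASES = new Map<string, string>([
  ["postgres", "PostgreSQL"],
  ["pg", "PostgreSQL"],
  ["mongo", "MongoDB"],
  ["food and drug administration", "FDA"],
]);

// Map a surface form to its canonical entity ID.
function canonicalize(name: string): string {
  const trimmed = name.trim();
  const known = ALIASES.get(trimmed.toLowerCase());
  if (known !== undefined) return known;
  // Unknown entity: join multi-word names with underscores.
  return trimmed.split(/\s+/).join("_");
}

console.log(canonicalize("Postgres")); // PostgreSQL
console.log(canonicalize("Tesla Inc")); // Tesla_Inc
```

The surface forms that were replaced are the natural candidates for the `entity_aliases` field of the output.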

4. Confidence Scoring

Factor	Base Confidence
Explicit statement	0.95
Strong implication	0.85
Weak implication	0.70
Speculation	0.50

Modifiers:

Hedge words ("may", "might", "could") -> multiply by 0.80
Definitive language ("always", "never", "every") -> no modifier, but note the absolutism
Cited source in text -> add 0.05 (max 1.0)
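The arithmetic above can be sketched as a small scoring function. The text does not say in which order the modifiers apply, so this example assumes the hedge multiplier is applied first and the citation bonus second, with the result capped at 1.0. The names `Basis` and `scoreConfidence` are hypothetical.

```typescript
type Basis = "explicit" | "strong_implication" | "weak_implication" | "speculation";

// Base confidences from the table above.
const BASE_CONFIDENCE: Record<Basis, number> = {
  explicit: 0.95,
  strong_implication: 0.85,
  weak_implication: 0.7,
  speculation: 0.5,
};

// Assumed order: hedge multiplier first, then citation bonus, then cap at 1.0.
function scoreConfidence(
  basis: Basis,
  opts: { hedged?: boolean; cited?: boolean } = {},
): number {
  let score = BASE_CONFIDENCE[basis];
  if (opts.hedged) score *= 0.8;
  if (opts.cited) score += 0.05;
  // Cap at 1.0 and round to two decimals to hide floating-point noise.
  return Math.round(Math.min(score, 1.0) * 100) / 100;
}

console.log(scoreConfidence("explicit", { hedged: true })); // 0.76
console.log(scoreConfidence("explicit", { cited: true })); // 1
```

Under these assumptions a hedged explicit statement such as "X may cause Y" scores 0.95 × 0.80 = 0.76, and a hedged, cited one scores 0.81.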

5. Source Tier Assignment

Match the source material to StemeDB source tiers:

Tier	Class	Description
0	Regulatory	FDA, EMA, WHO, official standards bodies
1	Clinical	Peer-reviewed research, RCTs, systematic reviews
2	Observational	Real-world evidence, cohort studies, surveys
3	Expert	Professional opinions, guidelines, documentation
4	Community	Curated forums, advocacy groups, tutorials
5	Anecdotal	Social media, testimonials, blog posts
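For reference, the tier table can be expressed as an ordered list so that tier numbers and class names convert in both directions. This is an illustrative sketch; `SOURCE_TIERS`, `tierOf`, and `classOf` are hypothetical names, not StemeDB APIs.

```typescript
type SourceClass =
  | "Regulatory"
  | "Clinical"
  | "Observational"
  | "Expert"
  | "Community"
  | "Anecdotal";

// Index in this array equals the tier number from the table above.
const SOURCE_TIERS: readonly SourceClass[] = [
  "Regulatory",
  "Clinical",
  "Observational",
  "Expert",
  "Community",
  "Anecdotal",
];

// Tier number (0-5) for a source class.
function tierOf(sourceClass: SourceClass): number {
  return SOURCE_TIERS.indexOf(sourceClass);
}

// Source class for a tier number, or undefined if the tier is out of range.
function classOf(tier: number): SourceClass | undefined {
  return SOURCE_TIERS[tier];
}

console.log(tierOf("Expert")); // 3
console.log(classOf(0)); // Regulatory
```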

Output Schema

Return a JSON object matching this TypeScript interface:

interface ExtractionOutput {
  claims: {
    subject: string;           // Canonical entity ID (e.g., "PostgreSQL")
    predicate: string;         // Relationship name (e.g., "storage_model")
    object: {
      type: "Text" | "Number" | "Boolean" | "Reference";
      value: string | number | boolean;
    };
    confidence: number;        // 0.0-1.0 after applying modifiers
    extraction_rationale: string;  // Why this claim was extracted
    entity_aliases: string[];  // Other names seen for this entity
    source_span?: {
      start: number;
      end: number;
      text: string;            // The source text fragment
    };
  }[];
  source: {
    url?: string;              // URL if provided
    source_class: "Regulatory" | "Clinical" | "Observational" | "Expert" | "Community" | "Anecdotal";
    content_hash?: string;     // Will be computed by CLI
  };
  meta: {
    total_claims: number;
    unique_subjects: number;
    extraction_notes?: string; // Any edge cases or ambiguities noted
  };
}
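A consumer of this output may want a quick sanity check before ingestion. The sketch below is one possible check, not part of the StemeDB CLI: it verifies that each confidence lies in the 0.0-1.0 range and that the `meta` counts agree with the `claims` array. `OutputLike` and `checkOutput` are hypothetical names covering only the fields the check needs.

```typescript
// Minimal structural subset of ExtractionOutput used by the checks below.
interface ClaimLike {
  subject: string;
  confidence: number;
}
interface OutputLike {
  claims: ClaimLike[];
  meta: { total_claims: number; unique_subjects: number };
}

// Returns a list of human-readable problems; an empty list means all checks passed.
function checkOutput(out: OutputLike): string[] {
  const problems: string[] = [];
  out.claims.forEach((claim, i) => {
    // Written as a negated range test so that NaN is also rejected.
    if (!(claim.confidence >= 0 && claim.confidence <= 1)) {
      problems.push(`claims[${i}].confidence out of range: ${claim.confidence}`);
    }
  });
  if (out.meta.total_claims !== out.claims.length) {
    problems.push(`meta.total_claims is ${out.meta.total_claims}, expected ${out.claims.length}`);
  }
  const subjects = new Set(out.claims.map((claim) => claim.subject));
  if (out.meta.unique_subjects !== subjects.size) {
    problems.push(`meta.unique_subjects is ${out.meta.unique_subjects}, expected ${subjects.size}`);
  }
  return problems;
}
```

Because the function takes a structural subset, a full `ExtractionOutput` value can be passed to it directly.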

Few-Shot Examples

Example 1: Database Storage Model

Input text: "Every mainstream database, from PostgreSQL to MongoDB to Neo4j, enforces single value per key."

Source class: Expert

Output:

{
  "claims": [
    {
      "subject": "PostgreSQL",
      "predicate": "storage_model",
      "object": { "type": "Text", "value": "single value per key" },
      "confidence": 0.95,
      "extraction_rationale": "Explicit statement about PostgreSQL's storage model",
      "entity_aliases": ["Postgres", "PG"],
      "source_span": { "start": 0, "end": 94, "text": "Every mainstream database, from PostgreSQL to MongoDB to Neo4j, enforces single value per key." }
    },
    {
      "subject": "MongoDB",
      "predicate": "storage_model",
      "object": { "type": "Text", "value": "single value per key" },
      "confidence": 0.95,
      "extraction_rationale": "Explicit statement about MongoDB's storage model",
      "entity_aliases": ["Mongo"],
      "source_span": { "start": 0, "end": 94, "text": "Every mainstream database, from PostgreSQL to MongoDB to Neo4j, enforces single value per key." }
    },
    {
      "subject": "Neo4j",
      "predicate": "storage_model",
      "object": { "type": "Text", "value": "single value per key" },
      "confidence": 0.95,
      "extraction_rationale": "Explicit statement about Neo4j's storage model",
      "entity_aliases": [],
      "source_span": { "start": 0, "end": 94, "text": "Every mainstream database, from PostgreSQL to MongoDB to Neo4j, enforces single value per key." }
    },
    {
      "subject": "mainstream_databases",
      "predicate": "storage_model",
      "object": { "type": "Text", "value": "single value per key" },
      "confidence": 0.90,
      "extraction_rationale": "General claim about mainstream databases category",
      "entity_aliases": [],
      "source_span": { "start": 0, "end": 94, "text": "Every mainstream database, from PostgreSQL to MongoDB to Neo4j, enforces single value per key." }
    },
    {
      "subject": "PostgreSQL",
      "predicate": "is_mainstream",
      "object": { "type": "Boolean", "value": true },
      "confidence": 0.85,
      "extraction_rationale": "Implicit: listed as example of mainstream database",
      "entity_aliases": ["Postgres", "PG"],
      "source_span": { "start": 0, "end": 94, "text": "Every mainstream database, from PostgreSQL to MongoDB to Neo4j, enforces single value per key." }
    },
    {
      "subject": "MongoDB",
      "predicate": "is_mainstream",
      "object": { "type": "Boolean", "value": true },
      "confidence": 0.85,
      "extraction_rationale": "Implicit: listed as example of mainstream database",
      "entity_aliases": ["Mongo"],
      "source_span": { "start": 0, "end": 94, "text": "Every mainstream database, from PostgreSQL to MongoDB to Neo4j, enforces single value per key." }
    },
    {
      "subject": "Neo4j",
      "predicate": "is_mainstream",
      "object": { "type": "Boolean", "value": true },
      "confidence": 0.85,
      "extraction_rationale": "Implicit: listed as example of mainstream database",
      "entity_aliases": [],
      "source_span": { "start": 0, "end": 94, "text": "Every mainstream database, from PostgreSQL to MongoDB to Neo4j, enforces single value per key." }
    }
  ],
  "source": {
    "source_class": "Expert"
  },
  "meta": {
    "total_claims": 7,
    "unique_subjects": 4
  }
}

Example 2: Medical Side Effect

Input text: "Statin therapy may cause muscle pain in some patients, though the FDA considers the benefit-risk ratio favorable."

Source class: Clinical