Governance Models Spec

Status: Draft
Author: Jordan Washburn
Date: 2026-02-05

Problem

Episteme's append-only model means assertions are never deleted or mutated. Corrections happen via supersession — a signed record that marks an old assertion as replaced by a new one.

The current supersession model has no ownership enforcement: anyone with a valid signing key can supersede any assertion. The system records WHO did it (agent_id + signature) but doesn't restrict WHO is allowed.

This creates a tension between three legitimate use cases:

Enterprise oversight: IT must be able to correct a bot's mistake at 3am without waiting for the bot's owner.
Personal ownership: Researchers want their assertions to represent THEIR epistemic position — if you disagree, create a contradicting assertion, don't overwrite mine.
Collaborative commons: Wikipedia-style shared stewardship where anyone can edit, disputes are visible, and stewards handle escalation.

These are fundamentally different trust topologies, not just configuration options.

Design

Governance Model Enum

A cluster declares its governance model at configuration time. This is a cluster-level setting, not per-assertion or per-write.

use serde::{Deserialize, Serialize};

/// Cluster governance model for supersession enforcement.
///
/// Determines who can supersede assertions and how disputes are resolved.
/// Set at cluster configuration time; affects all supersession operations.
// `Eq` is not derived: ProtectionLevel::SemiProtected holds an f32, and f32 is not Eq.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub enum GovernanceModel {
    /// Anyone with a valid signature can supersede any assertion.
    ///
    /// Use case: Enterprise deployments where administrators need oversight
    /// capability. Incident response, compliance corrections, adversarial cleanup.
    ///
    /// Mental model: "IT can fix the bot's mistakes."
    Enterprise,

    /// Only the original assertion's signer can supersede it.
    ///
    /// Use case: Research, personal knowledge bases, contexts where authorship
    /// and epistemic ownership matter.
    ///
    /// Mental model: "My assertions are MY position. Disagree? Create your own."
    Sovereign,

    /// Anyone can supersede, but protection levels and stewards exist.
    ///
    /// Use case: Collaborative knowledge commons, Wikipedia-style editing where
    /// disputes are visible and stewards handle escalation.
    ///
    /// Mental model: "The commons owns everything. We track who did what."
    Commons {
        /// Public keys of steward agents who can override protection levels.
        stewards: Vec<[u8; 32]>,
        /// Default protection level for new assertions.
        default_protection: ProtectionLevel,
    },
}

Protection Levels (Commons Mode Only)

In Commons mode, individual assertions can have protection levels that restrict who can supersede them.

/// Protection level for assertions in Commons governance mode.
///
/// Determines who can supersede a protected assertion. Only meaningful
/// when the cluster is running in Commons governance mode.
// `Eq` is not derived because min_trust_rank is an f32.
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Default)]
pub enum ProtectionLevel {
    /// Anyone can supersede (Wikipedia default).
    #[default]
    Open,

    /// Only agents with TrustRank >= threshold can supersede.
    /// Analogous to Wikipedia's "semi-protected" (blocks anonymous/new users).
    SemiProtected {
        min_trust_rank: f32,
    },

    /// Only stewards can supersede.
    /// Analogous to Wikipedia's "fully protected."
    FullyProtected,

    /// Only the original author can supersede.
    /// Same behavior as Sovereign mode, but per-assertion.
    AuthorOnly,
}

Governance Model Comparison

Aspect	Enterprise	Sovereign	Commons
Who can supersede?	Anyone with valid signature	Only original signer	Depends on protection level
Dispute resolution	Top-down (admin decides)	No disputes (author owns)	Visible conflict + steward escalation
Mental model	Corporate wiki	Personal blog / academic paper	Wikipedia
Trust topology	Hierarchical	Individual	Federated
Key failure mode	Admin rewrites history	Orphaned assertions (lost keys)	Edit wars
Escape hatch	N/A (admins are the escape)	Quarantine (legal/safety)	Steward override

Real-World Analogues

Enterprise → Corporate Wiki / Internal Docs

Someone is accountable for the corpus
Supervisors can correct subordinates' mistakes
Incident response requires immediate correction capability
Failure mode it prevents: "bad data persists because original agent is offline"

Sovereign → Academic Papers / Personal Blog

Authorship matters for citations and credit
"I said X" is distinct from "we believe X"
Corrections should come from the author (retractions)
If you disagree, publish your own contradicting claim
Failure mode: "I can never correct my old assertions if I lose my key"

Commons → Wikipedia / OpenStreetMap

Shared resource, not individual expression
Anyone can edit, anyone can revert
Disputes are visible in the history
Stewards (admins) handle escalation and protection
Failure mode: "edit wars" where users keep reverting each other

Authorization Logic

impl GovernanceModel {
    /// Check if a supersession is authorized.
    ///
    /// # Arguments
    /// * `original_signer` - Public key of the assertion's primary signer
    /// * `superseding_agent` - Public key of the agent creating the supersession
    /// * `agent_trust_rank` - TrustRank of the superseding agent (0.0 to 1.0)
    /// * `assertion_protection` - Protection level of the target assertion
    ///
    /// # Returns
    /// `true` if the supersession is allowed, `false` otherwise.
    pub fn can_supersede(
        &self,
        original_signer: &[u8; 32],
        superseding_agent: &[u8; 32],
        agent_trust_rank: f32,
        assertion_protection: ProtectionLevel,
    ) -> bool {
        match self {
            // Enterprise: always allowed
            GovernanceModel::Enterprise => true,

            // Sovereign: owner only
            GovernanceModel::Sovereign => original_signer == superseding_agent,

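            // Commons: stewards override everything; otherwise the
            // effective protection level decides.
            GovernanceModel::Commons { stewards, default_protection } => {
                // Steward override applies at every protection level
                // (see "Commons Mode: Steward Override").
                if stewards.contains(superseding_agent) {
                    return true;
                }

                // Use assertion's protection or cluster default.
                // Caveat: an explicit `Open` equals ProtectionLevel::default(),
                // so it also falls back to the cluster default.
                let protection = if assertion_protection == ProtectionLevel::default() {
                    *default_protection
                } else {
                    assertion_protection
                };

                match protection {
                    ProtectionLevel::Open => true,

                    ProtectionLevel::SemiProtected { min_trust_rank } => {
                        agent_trust_rank >= min_trust_rank
                    }

                    // Stewards already returned `true` above.
                    ProtectionLevel::FullyProtected => false,

                    ProtectionLevel::AuthorOnly => {
                        original_signer == superseding_agent
                    }
                }
            }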
        }
    }
}
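The function above compares assertion_protection with ProtectionLevel::default(). As a result, an explicitly Open assertion cannot be told apart from one with no level set.

The sketch below is a minimal std-only alternative. It assumes the caller passes Option<ProtectionLevel>, with None meaning "use the cluster default". It also checks stewards first, per the steward-override rule. Serde derives are dropped so it compiles on its own, and the PublicKey alias is hypothetical.

```rust
/// Hypothetical alias for a 32-byte public key.
pub type PublicKey = [u8; 32];

#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum ProtectionLevel {
    #[default]
    Open,
    SemiProtected { min_trust_rank: f32 },
    FullyProtected,
    AuthorOnly,
}

#[derive(Debug, Clone, PartialEq)]
pub enum GovernanceModel {
    Enterprise,
    Sovereign,
    Commons {
        stewards: Vec<PublicKey>,
        default_protection: ProtectionLevel,
    },
}

impl GovernanceModel {
    /// `assertion_protection` is None when the assertion carries no explicit
    /// level; only then does the cluster default apply (assumed signature).
    pub fn can_supersede(
        &self,
        original_signer: &PublicKey,
        superseding_agent: &PublicKey,
        agent_trust_rank: f32,
        assertion_protection: Option<ProtectionLevel>,
    ) -> bool {
        match self {
            GovernanceModel::Enterprise => true,
            GovernanceModel::Sovereign => original_signer == superseding_agent,
            GovernanceModel::Commons { stewards, default_protection } => {
                // Steward override: stewards may supersede at any level.
                if stewards.contains(superseding_agent) {
                    return true;
                }
                match assertion_protection.unwrap_or(*default_protection) {
                    ProtectionLevel::Open => true,
                    ProtectionLevel::SemiProtected { min_trust_rank } => {
                        agent_trust_rank >= min_trust_rank
                    }
                    // Non-stewards never pass; stewards returned above.
                    ProtectionLevel::FullyProtected => false,
                    ProtectionLevel::AuthorOnly => original_signer == superseding_agent,
                }
            }
        }
    }
}
```

Taking an Option leaves the "unset" decision with the caller. The ingestion worker already has that information: original.protection is an Option, so it could be passed through without unwrap_or_default().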

Enforcement Point

Governance enforcement happens in the ingestion worker, not the API layer.

Why ingestion worker?

All writes flow through the WAL → Ingestion pipeline
Gossip-replicated supersessions bypass the API but hit the ingestor
Single enforcement point prevents bypass
Validation happens during batch processing, not on the query hot path

Why NOT API layer?

Only protects direct HTTP writes
Gossip-replicated supersessions would bypass enforcement

Why NOT storage layer?

Too late — data already committed to WAL
Rejection would require rollback logic

// In stemedb-ingest/src/worker/processing.rs

async fn ingest_supersession(&self, data: &[u8]) -> Result<()> {
    let supersession: Supersession = deserialize(data)?;

    // Fetch original assertion to get primary signer
    let original = self.fetch_assertion(&supersession.target_hash).await?;
    let original_signer = original.signatures.first()
        .ok_or(IngestError::InvalidSignature("No signatures"))?
        .agent_id;

    // Get superseding agent's trust rank (for Commons mode)
    let trust_rank = self.get_trust_rank(&supersession.agent_id).await?;

    // Get assertion's protection level (for Commons mode)
    let protection = original.protection.unwrap_or_default();

    // Check authorization
    if !self.governance.can_supersede(
        &original_signer,
        &supersession.agent_id,
        trust_rank,
        protection,
    ) {
        return Err(IngestError::Unauthorized(format!(
            "Agent {} cannot supersede assertion owned by {}",
            hex::encode(supersession.agent_id),
            hex::encode(original_signer)
        )));
    }

    // Proceed with storage...
    Ok(())
}

Multi-Signature Assertions

Assertions can have multiple signatures (author + endorsers). For ownership purposes:

Rule: The first signature is the "primary signer" (assertion creator). Only the primary signer is considered the "owner" for Sovereign mode and AuthorOnly protection.

/// Get the primary signer (owner) of an assertion.
///
/// The primary signer is the first signature in the signatures array.
/// This represents the assertion creator; additional signatures are endorsers.
fn get_primary_signer(assertion: &Assertion) -> Option<[u8; 32]> {
    assertion.signatures.first().map(|s| s.agent_id)
}

Rationale:

Deterministic: ordering defines ownership
Matches mental model of "author + reviewers"
Simple to implement

Future extension: if co-ownership is needed, add an explicit owners: Vec<[u8; 32]> field to Assertion.

Escape Hatches

Sovereign Mode: Quarantine (Not Supersession)

In Sovereign mode, the owner exclusively controls supersession. But legal/safety situations require intervention. The solution: quarantine is distinct from supersession.

/// A quarantine record for assertions that cannot be superseded but must be hidden.
///
/// Quarantine is NOT supersession. The semantic difference:
/// - Supersession says: "I was wrong, here's the correction"
/// - Quarantine says: "This is being suppressed for external reasons"
///
/// The original assertion remains in the DAG. Quarantine affects query-time
/// filtering, not the underlying data.
pub struct Quarantine {
    /// Hash of the assertion being quarantined.
    pub target_hash: Hash,
    /// Why this assertion is quarantined.
    pub reason: QuarantineReason,
    /// Who issued the quarantine.
    pub authority: QuarantineAuthority,
    /// When the quarantine was issued.
    pub timestamp: u64,
    /// Whether the quarantine can be appealed/reversed.
    pub reversible: bool,
    /// Signature of the quarantine authority.
    pub signature: [u8; 64],
}

/// Reasons for quarantine (not error correction).
pub enum QuarantineReason {
    /// Court order or legal demand.
    LegalTakedown { order_id: String, jurisdiction: String },
    /// Violates safety policy.
    SafetyViolation { policy_version: String },
    /// Copyright claim (DMCA or equivalent).
    CopyrightClaim { claimant: String, dmca_id: Option<String> },
    /// Suspected compromised key.
    CompromisedKey { detection_method: String },
}

/// Who can issue a quarantine.
pub enum QuarantineAuthority {
    /// Cluster operator (legal/safety).
    Operator { operator_id: [u8; 32] },
    /// Automated system (flood detection, anomaly detection).
    Automated { system_name: String },
}

Key distinction:

Supersession = "I was wrong" (epistemic correction)
Quarantine = "This is hidden for external reasons" (administrative action)

Lenses respect quarantine by default but can include quarantined assertions with include_quarantined: true for auditors/researchers.
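Quarantine only affects query-time filtering, so a lens can apply it as a final pass over its results. This is a sketch under assumptions: AssertionView and the 32-byte Hash alias are hypothetical stand-ins for the real types. The set of quarantined hashes is assumed to be loaded from the quarantine store.

```rust
use std::collections::HashSet;

/// Hypothetical 32-byte content hash (the spec's Hash type is not shown).
type Hash = [u8; 32];

/// Hypothetical minimal view of an assertion as a lens sees it.
#[derive(Debug, Clone, PartialEq)]
struct AssertionView {
    hash: Hash,
    object: String,
}

/// Query-time filter: the DAG is untouched; quarantined hashes are skipped
/// unless the caller opts in with include_quarantined = Some(true).
fn apply_quarantine<'a>(
    assertions: &'a [AssertionView],
    quarantined: &HashSet<Hash>,
    include_quarantined: Option<bool>,
) -> Vec<&'a AssertionView> {
    // Mirrors the QueryParams default: an absent flag means false.
    let include = include_quarantined.unwrap_or(false);
    assertions
        .iter()
        .filter(|a| include || !quarantined.contains(&a.hash))
        .collect()
}
```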

Commons Mode: Steward Override

Stewards can supersede any assertion regardless of protection level. This is the escalation path for edit wars and disputes.

Steward actions are logged and auditable:

tracing::info!(
    target_hash = %hex::encode(supersession.target_hash),
    steward = %hex::encode(supersession.agent_id),
    reason = %supersession.reason,
    "Steward override supersession"
);

Attack Vectors and Mitigations

Owner-Only Creates "Immutable Misinformation"

Attack: Flood the cluster with malicious assertions, then discard the signing key. In Sovereign mode only the original signer can supersede, so these assertions can no longer be corrected.

Mitigation: Quarantine mechanism allows hiding without superseding. Flood detection + auto-quarantine for anomalous behavior.

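use std::collections::HashMap;

/// Detect suspicious assertion patterns.
/// TokenBucket stands for a per-key rate tracker; it is not defined in this spec.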
struct FloodDetector {
    /// Track assertion rate per key
    rate_limiter: HashMap<[u8; 32], TokenBucket>,
    /// Threshold for auto-quarantine (assertions per minute)
    threshold: u32,
}

impl FloodDetector {
    fn is_flood(&self, agent_id: &[u8; 32]) -> bool {
        self.rate_limiter.get(agent_id)
            .map(|bucket| bucket.rate() > self.threshold)
            .unwrap_or(false)
    }
}
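TokenBucket and its rate() method are not defined in this spec, and comparing a rate against a u32 threshold leaves the units open. As one concrete option, the sketch below swaps in a sliding-window counter using only the standard library. The names window_secs and record, and the caller-supplied clock in seconds, are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical alias for an agent's 32-byte public key.
type AgentId = [u8; 32];

/// Sliding-window flood detector (swapped in for the TokenBucket above).
struct FloodDetector {
    /// Timestamps (seconds) of recent assertions per agent, oldest first.
    recent: HashMap<AgentId, VecDeque<u64>>,
    /// Maximum assertions tolerated inside one window.
    threshold: usize,
    /// Window length in seconds (60 for "assertions per minute").
    window_secs: u64,
}

impl FloodDetector {
    fn new(threshold: usize, window_secs: u64) -> Self {
        Self { recent: HashMap::new(), threshold, window_secs }
    }

    /// Records one assertion at `now` and returns true if the agent has
    /// exceeded the threshold within the current window.
    fn record(&mut self, agent_id: AgentId, now: u64) -> bool {
        let (threshold, window_secs) = (self.threshold, self.window_secs);
        let window = self.recent.entry(agent_id).or_default();
        // Evict timestamps that have fallen out of the window.
        while let Some(&oldest) = window.front() {
            if now.saturating_sub(oldest) >= window_secs {
                window.pop_front();
            } else {
                break;
            }
        }
        window.push_back(now);
        window.len() > threshold
    }
}
```

Passing now explicitly keeps the detector deterministic and easy to test. A real ingestor would read a clock and feed positive results into the auto-quarantine path.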

Key Loss = Lost Ownership

Attack: User loses private key, can never correct old assertions.

Mitigation: Accept this as a feature, not a bug; key security matters. Users can re-assert with a new key (this creates a duplicate in the DAG), and lenses resolve between the two based on recency/trust.

/// Re-assert old claims with a new key after key loss.
///
/// This creates a new assertion with the same content but different signer.
/// The old assertion remains (append-only). Lenses see both and resolve
/// based on recency, trust, or other criteria.
async fn re_assert_with_new_key(
    old_assertion_id: Hash,
    new_signer: &Keypair,
    db: &StemeDB,
) -> Result<Hash> {
    let old = db.get_assertion(&old_assertion_id).await?;

    let new_assertion = Assertion {
        subject: old.subject.clone(),
        predicate: old.predicate.clone(),
        object: old.object.clone(),
        // ... new signer, new timestamp
    };

    db.ingest(new_assertion, new_signer).await
}
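Once an old claim and its re-assertion coexist, a lens has to pick one. The spec leaves the policy open (recency, trust, or other criteria). The sketch below shows one possible choice: newest first, with TrustRank as the tie-breaker. Candidate and its fields are hypothetical.

```rust
/// Hypothetical candidate competing for the same (subject, predicate) slot,
/// e.g. an original assertion and its re-assertion under a new key.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    signer: [u8; 32],
    object: String,
    timestamp: u64,
    trust_rank: f32,
}

/// One possible lens policy: newest wins; higher TrustRank breaks ties.
fn resolve_by_recency(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by(|a, b| {
        a.timestamp
            .cmp(&b.timestamp)
            .then(a.trust_rank.total_cmp(&b.trust_rank))
    })
}
```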

Key Compromise

Attack: Attacker steals key, posts false assertions, user rotates key but old false assertions persist.

Mitigation: Quarantine all recent assertions from suspected compromised key.

/// Quarantine all assertions from a suspected compromised key.
async fn quarantine_compromised_key(
    key: &[u8; 32],
    since: DateTime<Utc>,
    db: &StemeDB,
) -> Result<u64> {
    let assertions = db.get_assertions_by_signer(key, since).await?;

    let mut count = 0;
    for assertion in assertions {
        db.quarantine(&assertion.hash, QuarantineReason::CompromisedKey {
            detection_method: "user_reported".to_string(),
        }).await?;
        count += 1;
    }

    tracing::warn!(
        key = %hex::encode(key),
        quarantined_count = count,
        "Quarantined assertions from suspected compromised key"
    );

    Ok(count)
}

Configuration

Cluster Config

pub struct ClusterConfig {
    // ... existing fields ...

    /// Governance model for supersession enforcement.
    /// Default: Enterprise (backward compatible with current behavior).
    pub governance: GovernanceModel,
}
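For the "Default: Enterprise" behavior to hold, GovernanceModel needs a Default implementation. The sketch below is trimmed (Commons omitted, serde dropped) and assumes ClusterConfig derives Default.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum GovernanceModel {
    Enterprise,
    Sovereign,
    // Commons omitted to keep the sketch short.
}

/// Enterprise is the default, preserving the current "anyone with a valid
/// signature can supersede" behavior.
impl Default for GovernanceModel {
    fn default() -> Self {
        GovernanceModel::Enterprise
    }
}

#[derive(Debug, Clone, Default)]
pub struct ClusterConfig {
    // ... existing fields would go here ...
    pub governance: GovernanceModel,
}
```

With serde, adding a field-level #[serde(default)] to governance would let existing config files that omit the setting keep loading as Enterprise.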

TOML Examples

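Enterprise deployment:

[cluster.governance]
type = "enterprise"

Personal research instance:

[cluster.governance]
type = "sovereign"

Public knowledge commons:

[cluster.governance]
type = "commons"
default_protection = "open"
stewards = [
    "abc123...", # hex-encoded public key
    "def456...",
]

All three examples use an internally tagged form (a type key). This corresponds to #[serde(tag = "type", rename_all = "lowercase")] on GovernanceModel. Steward keys are hex strings, so stewards needs a custom hex deserializer to produce [u8; 32] values.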

Distributed Replication Considerations

Out-of-Order Delivery

Problem: Node A receives supersession before the target assertion (gossip timing).

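Solution: Deferred validation. A supersession whose target has not arrived is parked in a pending buffer (separate from the Quarantine records above) and validated once the target is ingested.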

async fn ingest_supersession(&self, data: &[u8]) -> Result<()> {
    let supersession: Supersession = deserialize(data)?;

    // Check if target exists locally
    match self.fetch_assertion(&supersession.target_hash).await {
        Ok(original) => {
            // Target exists, validate ownership
            self.validate_and_store(supersession, original).await
        }
        Err(NotFound) => {
            // Target not here yet, quarantine for later
            tracing::warn!(
                target = %hex::encode(supersession.target_hash),
                "Supersession before target, quarantining"
            );
            self.quarantine_pending_supersession(supersession).await
        }
    }
}

Deferred supersessions are re-evaluated when new assertions arrive (anti-entropy sweep).
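The pending buffer can be as simple as a map from target hash to waiting supersessions, drained when that target is ingested. In the sketch below, the names PendingBuffer, defer, on_assertion_arrived, and missing_targets are hypothetical. Released supersessions would still go through the normal governance check.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical 32-byte content hash.
type Hash = [u8; 32];

/// Hypothetical supersession waiting for its target assertion.
#[derive(Debug, Clone, PartialEq)]
struct PendingSupersession {
    target_hash: Hash,
    agent_id: [u8; 32],
}

/// Supersessions whose target has not been ingested yet, keyed by target.
#[derive(Default)]
struct PendingBuffer {
    by_target: HashMap<Hash, Vec<PendingSupersession>>,
}

impl PendingBuffer {
    /// Park a supersession until its target arrives.
    fn defer(&mut self, s: PendingSupersession) {
        self.by_target.entry(s.target_hash).or_default().push(s);
    }

    /// Called when an assertion is ingested: releases every supersession
    /// waiting on it so the caller can run governance validation.
    fn on_assertion_arrived(&mut self, hash: &Hash) -> Vec<PendingSupersession> {
        self.by_target.remove(hash).unwrap_or_default()
    }

    /// Targets still missing; candidates for an anti-entropy fetch.
    fn missing_targets(&self) -> HashSet<Hash> {
        self.by_target.keys().copied().collect()
    }
}
```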

Cluster-Wide Consistency

All nodes in a cluster must use the same governance model. Enforcement happens at each node's ingestion worker, so:

If target arrives first → validation succeeds immediately
If supersession arrives first → held in the pending buffer until target arrives
Both cases converge to the same final state

API Changes

New Endpoints

POST   /v1/quarantine              Create a quarantine record (operator only)
GET    /v1/quarantine/{hash}       Get quarantine status for an assertion
DELETE /v1/quarantine/{hash}       Remove quarantine (if reversible)
GET    /v1/governance              Get cluster governance model
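The DELETE rule ("if reversible") can be enforced in the store itself rather than in each handler. The sketch below is a hypothetical in-memory QuarantineStore that keeps only the reversible flag per target hash.

```rust
use std::collections::HashMap;

/// Hypothetical 32-byte content hash.
type Hash = [u8; 32];

/// Hypothetical in-memory store behind the /v1/quarantine endpoints.
#[derive(Default)]
struct QuarantineStore {
    /// target hash -> `reversible` flag from the Quarantine record
    records: HashMap<Hash, bool>,
}

#[derive(Debug, PartialEq)]
enum RemoveError {
    NotQuarantined,
    Irreversible,
}

impl QuarantineStore {
    /// POST: record a quarantine.
    fn add(&mut self, target: Hash, reversible: bool) {
        self.records.insert(target, reversible);
    }

    /// GET: is this assertion quarantined?
    fn is_quarantined(&self, target: &Hash) -> bool {
        self.records.contains_key(target)
    }

    /// DELETE: lift a quarantine, but only if it was issued as reversible.
    fn remove(&mut self, target: &Hash) -> Result<(), RemoveError> {
        match self.records.get(target).copied() {
            None => Err(RemoveError::NotQuarantined),
            Some(false) => Err(RemoveError::Irreversible),
            Some(true) => {
                self.records.remove(target);
                Ok(())
            }
        }
    }
}
```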

Query Params

pub struct QueryParams {
    // ... existing fields ...

    /// Include quarantined assertions in results.
    /// Default: false (quarantined assertions are hidden).
    pub include_quarantined: Option<bool>,
}

Supersede Request Validation

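The existing /v1/supersede endpoint also runs the governance check so direct HTTP callers are rejected early. The ingestion worker remains the authoritative enforcement point, since gossip-replicated supersessions never pass through this handler: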

// In supersede handler
if !state.governance.can_supersede(...) {
    return Err(ApiError::Forbidden(
        "Supersession not authorized by governance model".to_string()
    ));
}

Migration Path

Phase 1: Add Types (No Enforcement)

Add GovernanceModel, ProtectionLevel, Quarantine types to stemedb-core
Add governance field to ClusterConfig
Default to Enterprise (backward compatible)
No enforcement yet

Phase 2: Sovereign Enforcement

Add ownership check in ingestion worker
When governance == Sovereign, reject non-owner supersessions
Add quarantine store and basic quarantine API

Phase 3: Commons Enforcement

Add protection field to Assertion
Add steward check logic
Add protection level management API
Add conflict detection lens (surfaces "edit wars")

Phase 4: Safety Features

Flood detection + auto-quarantine
Compromised key quarantine workflow
Quarantine appeal/reversal API

Design Decisions

Cluster-level, not per-assertion governance. Governance model is a fundamental trust topology, not a per-write decision. Mixing models would create confusion and split-brain scenarios in distributed replication.
First signature is owner. Simple, deterministic. Co-ownership can be added later if needed.
Quarantine ≠ Supersession. Semantically distinct operations. Supersession is epistemic ("I was wrong"). Quarantine is administrative ("hidden for external reasons"). Both needed.
Accept key loss = lost ownership. Feature, not bug. Encourages key security. Re-assertion with new key is the recovery path.
Stewards, not admins. In Commons mode, the term "steward" emphasizes service to the community, not authority over it. Matches Wikipedia terminology.
No per-assertion governance override. An assertion in a Sovereign cluster can't opt into Commons behavior. The governance model is a property of the cluster, not the data.

Relationship to Other Specs

Supersession: This spec defines WHO can supersede. The existing supersession spec defines WHAT supersession means (Invalidate, Temporal, Refinement, etc.).
Concept Hierarchy: Governance is orthogonal to concept paths. A Commons cluster could have hierarchical concept paths; a Sovereign cluster could have flat subjects.
Epochs: Epoch supersession follows the same governance rules. In Sovereign mode, only the epoch creator can supersede it.
TrustRank: Used by SemiProtected protection level. Low-trust agents can't supersede semi-protected assertions.

What We Do NOT Build

No voting/consensus mechanism for governance changes (cluster config is operator-controlled)
No delegation chains in v1 (can be added later)
No per-assertion governance override
No governance transitions while cluster is running (requires restart)
No cross-cluster governance federation (each cluster has its own model)
No UI for governance management (API + config only)
