Governance Models Spec
Status: Draft
Author: Jordan Washburn
Date: 2026-02-05
Problem
Episteme's append-only model means assertions are never deleted or mutated. Corrections happen via supersession — a signed record that marks an old assertion as replaced by a new one.
The current supersession model has no ownership enforcement: anyone with a valid signing key can supersede any assertion. The system records WHO did it (agent_id + signature) but doesn't restrict WHO is allowed.
This creates a tension between three legitimate use cases:
- Enterprise oversight: IT must be able to correct a bot's mistake at 3am without waiting for the bot's owner.
- Personal ownership: Researchers want their assertions to represent THEIR epistemic position — if you disagree, create a contradicting assertion, don't overwrite mine.
- Collaborative commons: Wikipedia-style shared stewardship where anyone can edit, disputes are visible, and stewards handle escalation.
These are fundamentally different trust topologies, not just configuration options.
Design
Governance Model Enum
A cluster declares its governance model at configuration time. This is a cluster-level setting, not per-assertion or per-write.
use serde::{Deserialize, Serialize};
/// Cluster governance model for supersession enforcement.
///
/// Determines who can supersede assertions and how disputes are resolved.
/// Set at cluster configuration time; affects all supersession operations.
// Eq is not derived: Commons embeds ProtectionLevel, which carries an f32.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub enum GovernanceModel {
    /// Anyone with a valid signature can supersede any assertion.
    ///
    /// Use case: Enterprise deployments where administrators need oversight
    /// capability. Incident response, compliance corrections, adversarial cleanup.
    ///
    /// Mental model: "IT can fix the bot's mistakes."
    Enterprise,
    /// Only the original assertion's signer can supersede it.
    ///
    /// Use case: Research, personal knowledge bases, contexts where authorship
    /// and epistemic ownership matter.
    ///
    /// Mental model: "My assertions are MY position. Disagree? Create your own."
    Sovereign,
    /// Anyone can supersede, but protection levels and stewards exist.
    ///
    /// Use case: Collaborative knowledge commons, Wikipedia-style editing where
    /// disputes are visible and stewards handle escalation.
    ///
    /// Mental model: "The commons owns everything. We track who did what."
    Commons {
        /// Public keys of steward agents who can override protection levels.
        stewards: Vec<[u8; 32]>,
        /// Default protection level for new assertions.
        default_protection: ProtectionLevel,
    },
}
Protection Levels (Commons Mode Only)
In Commons mode, individual assertions can have protection levels that restrict who can supersede them.
/// Protection level for assertions in Commons governance mode.
///
/// Determines who can supersede a protected assertion. Only meaningful
/// when the cluster is running in Commons governance mode.
// Eq cannot be derived here because min_trust_rank is an f32.
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Default)]
pub enum ProtectionLevel {
    /// Anyone can supersede (Wikipedia default).
    #[default]
    Open,
    /// Only agents with TrustRank >= threshold can supersede.
    /// Analogous to Wikipedia's "semi-protected" (blocks anonymous/new users).
    SemiProtected {
        min_trust_rank: f32,
    },
    /// Only stewards can supersede.
    /// Analogous to Wikipedia's "fully protected."
    FullyProtected,
    /// Only the original author can supersede.
    /// Same behavior as Sovereign mode, but per-assertion.
    AuthorOnly,
}
Governance Model Comparison
| Aspect | Enterprise | Sovereign | Commons |
|---|---|---|---|
| Who can supersede? | Anyone with valid signature | Only original signer | Depends on protection level |
| Dispute resolution | Top-down (admin decides) | No disputes (author owns) | Visible conflict + steward escalation |
| Mental model | Corporate wiki | Personal blog / academic paper | Wikipedia |
| Trust topology | Hierarchical | Individual | Federated |
| Key failure mode | Admin rewrites history | Orphaned assertions (lost keys) | Edit wars |
| Escape hatch | N/A (admins are the escape) | Quarantine (legal/safety) | Steward override |
Real-World Analogues
Enterprise → Corporate Wiki / Internal Docs
- Someone is accountable for the corpus
- Supervisors can correct subordinates' mistakes
- Incident response requires immediate correction capability
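- Failure mode this model avoids: "bad data persists because original agent is offline" (its own key failure mode, per the comparison table, is an admin rewriting history)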
Sovereign → Academic Papers / Personal Blog
- Authorship matters for citations and credit
- "I said X" is distinct from "we believe X"
- Corrections should come from the author (retractions)
- If you disagree, publish your own contradicting claim
- Failure mode: "I can never correct my old assertions if I lose my key"
Commons → Wikipedia / OpenStreetMap
- Shared resource, not individual expression
- Anyone can edit, anyone can revert
- Disputes are visible in the history
- Stewards (admins) handle escalation and protection
- Failure mode: "edit wars" where users keep reverting each other
Authorization Logic
impl GovernanceModel {
    /// Check if a supersession is authorized.
    ///
    /// # Arguments
    /// * `original_signer` - Public key of the assertion's primary signer
    /// * `superseding_agent` - Public key of the agent creating the supersession
    /// * `agent_trust_rank` - TrustRank of the superseding agent (0.0 to 1.0)
    /// * `assertion_protection` - Protection level of the target assertion
    ///
    /// # Returns
    /// `true` if the supersession is allowed, `false` otherwise.
    pub fn can_supersede(
        &self,
        original_signer: &[u8; 32],
        superseding_agent: &[u8; 32],
        agent_trust_rank: f32,
        assertion_protection: ProtectionLevel,
    ) -> bool {
        match self {
            // Enterprise: always allowed
            GovernanceModel::Enterprise => true,
            // Sovereign: owner only
            GovernanceModel::Sovereign => original_signer == superseding_agent,
            // Commons: depends on protection level
            GovernanceModel::Commons { stewards, default_protection } => {
                // Stewards override every protection level
                // (see "Commons Mode: Steward Override").
                if stewards.contains(superseding_agent) {
                    return true;
                }
                // Use assertion's protection or cluster default.
                // Caveat: an explicit Open is indistinguishable from "unset"
                // here, so it also falls back to the cluster default.
                let protection = if assertion_protection == ProtectionLevel::default() {
                    *default_protection
                } else {
                    assertion_protection
                };
                match protection {
                    ProtectionLevel::Open => true,
                    ProtectionLevel::SemiProtected { min_trust_rank } => {
                        agent_trust_rank >= min_trust_rank
                    }
                    // Stewards already returned above; nobody else qualifies.
                    ProtectionLevel::FullyProtected => false,
                    ProtectionLevel::AuthorOnly => {
                        original_signer == superseding_agent
                    }
                }
            }
        }
    }
}
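The decision rules above can be exercised in isolation. The following is a minimal, std-only sketch rather than the spec's implementation: it drops the serde derives, introduces a `Key` alias, uses a free function instead of a method, and takes `Option<ProtectionLevel>` so that an unset level can be told apart from an explicit `Open`. All of those choices are assumptions made for illustration. The steward check comes first, following the "Steward Override" section.

```rust
/// 32-byte public key; the alias is an assumption made for brevity.
pub type Key = [u8; 32];

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProtectionLevel {
    Open,
    SemiProtected { min_trust_rank: f32 },
    FullyProtected,
    AuthorOnly,
}

#[derive(Debug, Clone, PartialEq)]
pub enum GovernanceModel {
    Enterprise,
    Sovereign,
    Commons {
        stewards: Vec<Key>,
        default_protection: ProtectionLevel,
    },
}

/// Hypothetical variant of `can_supersede`: `None` means "no level set on
/// the assertion", which falls back to the cluster default.
pub fn can_supersede(
    model: &GovernanceModel,
    original_signer: &Key,
    superseding_agent: &Key,
    agent_trust_rank: f32,
    assertion_protection: Option<ProtectionLevel>,
) -> bool {
    match model {
        GovernanceModel::Enterprise => true,
        GovernanceModel::Sovereign => original_signer == superseding_agent,
        GovernanceModel::Commons { stewards, default_protection } => {
            // Stewards override every protection level.
            if stewards.contains(superseding_agent) {
                return true;
            }
            match assertion_protection.unwrap_or(*default_protection) {
                ProtectionLevel::Open => true,
                ProtectionLevel::SemiProtected { min_trust_rank } => {
                    agent_trust_rank >= min_trust_rank
                }
                // Non-stewards never pass a fully protected assertion.
                ProtectionLevel::FullyProtected => false,
                ProtectionLevel::AuthorOnly => original_signer == superseding_agent,
            }
        }
    }
}
```

With the `Option` parameter, a cluster whose default is `FullyProtected` can still carry individual assertions that are explicitly `Open`, which the equality-with-default check cannot express.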
Enforcement Point
Governance enforcement happens in the ingestion worker, not the API layer.
Why ingestion worker?
- All writes flow through the WAL → Ingestion pipeline
- Gossip-replicated supersessions bypass the API but hit the ingestor
- Single enforcement point prevents bypass
- Validation happens during batch processing, not query hot path
Why NOT API layer?
- Only protects direct HTTP writes
- Gossip-replicated supersessions would bypass enforcement
Why NOT storage layer?
- Too late — data already committed to WAL
- Rejection would require rollback logic
// In stemedb-ingest/src/worker/processing.rs
async fn ingest_supersession(&self, data: &[u8]) -> Result<()> {
    let supersession: Supersession = deserialize(data)?;
    // Fetch original assertion to get primary signer
    let original = self.fetch_assertion(&supersession.target_hash).await?;
    let original_signer = original.signatures.first()
        .ok_or(IngestError::InvalidSignature("No signatures"))?
        .agent_id;
    // Get superseding agent's trust rank (for Commons mode)
    let trust_rank = self.get_trust_rank(&supersession.agent_id).await?;
    // Get assertion's protection level (for Commons mode)
    let protection = original.protection.unwrap_or_default();
    // Check authorization
    if !self.governance.can_supersede(
        &original_signer,
        &supersession.agent_id,
        trust_rank,
        protection,
    ) {
        return Err(IngestError::Unauthorized(format!(
            "Agent {:?} cannot supersede assertion owned by {:?}",
            supersession.agent_id, original_signer
        )));
    }
    // Proceed with storage...
}
Multi-Signature Assertions
Assertions can have multiple signatures (author + endorsers). For ownership purposes:
Rule: The first signature is the "primary signer" (assertion creator). Only the primary signer is considered the "owner" for Sovereign mode and AuthorOnly protection.
/// Get the primary signer (owner) of an assertion.
///
/// The primary signer is the first signature in the signatures array.
/// This represents the assertion creator; additional signatures are endorsers.
fn get_primary_signer(assertion: &Assertion) -> Option<[u8; 32]> {
    assertion.signatures.first().map(|s| s.agent_id)
}
Rationale:
- Deterministic: ordering defines ownership
- Matches mental model of "author + reviewers"
- Simple to implement
Future extension: If co-ownership is needed, add an explicit owners: Vec<[u8; 32]> field to Assertion.
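As a rough illustration of how the primary-signer rule and the possible co-ownership extension could coexist, here is a small sketch. The `owners` field is the hypothetical one from the note above, the fallback behavior (empty `owners` means "use the primary signer") is an assumption, and the `Signature` and `Assertion` structs are trimmed stand-ins rather than the real types.

```rust
/// 32-byte public key; the alias is an assumption made for brevity.
pub type Key = [u8; 32];

/// Trimmed stand-in for the real signature type.
pub struct Signature {
    pub agent_id: Key,
}

/// Trimmed stand-in for the real assertion type.
pub struct Assertion {
    pub signatures: Vec<Signature>,
    /// Hypothetical co-ownership field; empty means "primary signer only".
    pub owners: Vec<Key>,
}

/// First signature is the creator; later signatures are endorsers.
pub fn primary_signer(assertion: &Assertion) -> Option<Key> {
    assertion.signatures.first().map(|s| s.agent_id)
}

/// Ownership check used by Sovereign mode and AuthorOnly protection.
pub fn is_owner(assertion: &Assertion, agent: &Key) -> bool {
    if assertion.owners.is_empty() {
        // Current rule: only the primary signer owns the assertion.
        primary_signer(assertion) == Some(*agent)
    } else {
        // Assumed extension: any explicitly listed owner qualifies.
        assertion.owners.contains(agent)
    }
}
```

An assertion with no signatures has no owner under this sketch, so nobody passes the ownership check for it.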
Escape Hatches
Sovereign Mode: Quarantine (Not Supersession)
In Sovereign mode, the owner exclusively controls supersession. But legal/safety situations require intervention. The solution: quarantine is distinct from supersession.
/// A quarantine record for assertions that cannot be superseded but must be hidden.
///
/// Quarantine is NOT supersession. The semantic difference:
/// - Supersession says: "I was wrong, here's the correction"
/// - Quarantine says: "This is being suppressed for external reasons"
///
/// The original assertion remains in the DAG. Quarantine affects query-time
/// filtering, not the underlying data.
pub struct Quarantine {
    /// Hash of the assertion being quarantined.
    pub target_hash: Hash,
    /// Why this assertion is quarantined.
    pub reason: QuarantineReason,
    /// Who issued the quarantine.
    pub authority: QuarantineAuthority,
    /// When the quarantine was issued.
    pub timestamp: u64,
    /// Whether the quarantine can be appealed/reversed.
    pub reversible: bool,
    /// Signature of the quarantine authority.
    pub signature: [u8; 64],
}
/// Reasons for quarantine (not error correction).
pub enum QuarantineReason {
    /// Court order or legal demand.
    LegalTakedown { order_id: String, jurisdiction: String },
    /// Violates safety policy.
    SafetyViolation { policy_version: String },
    /// Copyright claim (DMCA or equivalent).
    CopyrightClaim { claimant: String, dmca_id: Option<String> },
    /// Suspected compromised key.
    CompromisedKey { detection_method: String },
}
/// Who can issue a quarantine.
pub enum QuarantineAuthority {
    /// Cluster operator (legal/safety).
    Operator { operator_id: [u8; 32] },
    /// Automated system (flood detection, anomaly detection).
    Automated { system_name: String },
}
Key distinction:
- Supersession = "I was wrong" (epistemic correction)
- Quarantine = "This is hidden for external reasons" (administrative action)
Lenses respect quarantine by default but can include quarantined assertions with include_quarantined: true for auditors/researchers.
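A minimal sketch of that query-time behavior follows. It assumes `Hash` is a 32-byte array, that the set of quarantined hashes is already available in memory, and that the filter is a free function named `apply_quarantine_filter`; none of these are specified by the design. The point is only that quarantine removes rows from results without touching stored data.

```rust
use std::collections::HashSet;

/// Assumed representation of an assertion hash.
pub type Hash = [u8; 32];

/// Trimmed stand-in for the real assertion type.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub hash: Hash,
    pub object: String,
}

/// Mirrors the `include_quarantined` query parameter.
pub struct QueryParams {
    pub include_quarantined: Option<bool>,
}

/// Hide quarantined assertions unless the caller opts in.
pub fn apply_quarantine_filter(
    assertions: Vec<Assertion>,
    quarantined: &HashSet<Hash>,
    params: &QueryParams,
) -> Vec<Assertion> {
    // Default is false: quarantined assertions stay hidden.
    if params.include_quarantined.unwrap_or(false) {
        return assertions;
    }
    assertions
        .into_iter()
        .filter(|a| !quarantined.contains(&a.hash))
        .collect()
}
```

Because the filter runs over query results, lifting a reversible quarantine only requires removing the hash from the set; the assertion itself was never altered.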
Commons Mode: Steward Override
Stewards can supersede any assertion regardless of protection level. This is the escalation path for edit wars and disputes.
Steward actions are logged and auditable:
tracing::info!(
    target_hash = %hex::encode(supersession.target_hash),
    steward = %hex::encode(supersession.agent_id),
    reason = %supersession.reason,
    "Steward override supersession"
);
Attack Vectors and Mitigations
Owner-Only Creates "Immutable Misinformation"
Attack: Flood assertions with malicious content, then discard the signing key. Assertions are now uncorrectable.
Mitigation: Quarantine mechanism allows hiding without superseding. Flood detection + auto-quarantine for anomalous behavior.
/// Detect suspicious assertion patterns.
struct FloodDetector {
    /// Track assertion rate per key
    rate_limiter: HashMap<[u8; 32], TokenBucket>,
    /// Threshold for auto-quarantine (assertions per minute)
    threshold: u32,
}
impl FloodDetector {
    fn is_flood(&self, agent_id: &[u8; 32]) -> bool {
        self.rate_limiter.get(agent_id)
            .map(|bucket| bucket.rate() > self.threshold)
            .unwrap_or(false)
    }
}
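The detector above leaves `TokenBucket` undefined. As one possible way to get a per-key "assertions per minute" figure, the sketch below swaps in a fixed one-minute window counter, which is a simpler technique than a token bucket and is named here as a substitute, not as the intended design. The `record` method, the `Window` struct, and the caller-supplied `now_secs` clock are all assumptions chosen to keep the example deterministic.

```rust
use std::collections::HashMap;

/// 32-byte public key; the alias is an assumption made for brevity.
pub type Key = [u8; 32];

/// Fixed-window counter standing in for the undefined `TokenBucket`.
struct Window {
    start_secs: u64,
    count: u32,
}

pub struct FloodDetector {
    windows: HashMap<Key, Window>,
    /// Threshold for auto-quarantine (assertions per minute).
    threshold: u32,
}

impl FloodDetector {
    pub fn new(threshold: u32) -> Self {
        Self { windows: HashMap::new(), threshold }
    }

    /// Record one assertion from `agent` at `now_secs` and report whether
    /// the key has exceeded the threshold within the current window.
    pub fn record(&mut self, agent: Key, now_secs: u64) -> bool {
        let window = self
            .windows
            .entry(agent)
            .or_insert(Window { start_secs: now_secs, count: 0 });
        // Start a fresh window once a full minute has elapsed.
        if now_secs.saturating_sub(window.start_secs) >= 60 {
            window.start_secs = now_secs;
            window.count = 0;
        }
        window.count += 1;
        window.count > self.threshold
    }
}
```

A fixed window admits short bursts at window boundaries; a real token bucket or sliding window would smooth that out at the cost of more state per key.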
Key Loss = Lost Ownership
Attack: User loses private key, can never correct old assertions.
Mitigation: Accept this as a feature, not a bug. Key security matters. Users can re-assert with a new key (creates duplicate in DAG), and lenses will resolve based on recency/trust.
/// Re-assert old claims with a new key after key loss.
///
/// This creates a new assertion with the same content but different signer.
/// The old assertion remains (append-only). Lenses see both and resolve
/// based on recency, trust, or other criteria.
async fn re_assert_with_new_key(
    old_assertion_id: Hash,
    new_signer: &Keypair,
    db: &StemeDB,
) -> Result<Hash> {
    let old = db.get_assertion(&old_assertion_id).await?;
    let new_assertion = Assertion {
        subject: old.subject.clone(),
        predicate: old.predicate.clone(),
        object: old.object.clone(),
        // ... new signer, new timestamp
    };
    db.ingest(new_assertion, new_signer).await
}
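How a lens picks between the old assertion and its re-asserted duplicate is not defined here. The sketch below shows one plausible policy, assuming recency wins and TrustRank breaks ties; the `resolve` function, the trimmed `Assertion` struct, and the `trust_of` callback are illustrative names, not part of the spec.

```rust
/// 32-byte public key; the alias is an assumption made for brevity.
pub type Key = [u8; 32];

/// Trimmed stand-in for the real assertion type.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub object: String,
    pub signer: Key,
    pub timestamp: u64,
}

/// Pick a winner among assertions about the same subject and predicate:
/// the newest one, with ties broken by the signer's trust score.
pub fn resolve<'a>(
    candidates: &'a [Assertion],
    trust_of: impl Fn(&Key) -> f32,
) -> Option<&'a Assertion> {
    candidates.iter().max_by(|a, b| {
        a.timestamp
            .cmp(&b.timestamp)
            // total_cmp gives f32 a total order, so NaN cannot panic here.
            .then_with(|| trust_of(&a.signer).total_cmp(&trust_of(&b.signer)))
    })
}
```

Under this policy a re-assertion made after key loss naturally shadows the original, because it carries a later timestamp, while the original stays in the DAG for auditing.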
Key Compromise
Attack: Attacker steals key, posts false assertions, user rotates key but old false assertions persist.
Mitigation: Quarantine all recent assertions from suspected compromised key.
/// Quarantine all assertions from a suspected compromised key.
async fn quarantine_compromised_key(
    key: &[u8; 32],
    since: DateTime<Utc>,
    db: &StemeDB,
) -> Result<u64> {
    let assertions = db.get_assertions_by_signer(key, since).await?;
    let mut count = 0;
    for assertion in assertions {
        db.quarantine(&assertion.hash, QuarantineReason::CompromisedKey {
            detection_method: "user_reported".to_string(),
        }).await?;
        count += 1;
    }
    tracing::warn!(
        key = %hex::encode(key),
        quarantined_count = count,
        "Quarantined assertions from suspected compromised key"
    );
    Ok(count)
}
Configuration
Cluster Config
pub struct ClusterConfig {
    // ... existing fields ...
    /// Governance model for supersession enforcement.
    /// Default: Enterprise (backward compatible with current behavior).
    pub governance: GovernanceModel,
}
TOML Examples
Enterprise deployment:
[cluster]
governance = "enterprise"
Personal research instance:
[cluster]
governance = "sovereign"
Public knowledge commons:
[cluster.governance]
type = "commons"
default_protection = "open"
[[cluster.governance.stewards]]
key = "abc123..." # hex-encoded public key
[[cluster.governance.stewards]]
key = "def456..."
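The config carries steward keys as hex strings, while `GovernanceModel::Commons` stores them as `[u8; 32]`. A small decoding step is therefore needed somewhere in config loading. The helper below is a std-only sketch of that step; the name `parse_steward_key` and the `String` error type are assumptions, and the spec does not say where this conversion lives.

```rust
/// Decode a 64-character hex string into a 32-byte public key.
///
/// Hypothetical helper: rejects wrong lengths and non-hex characters
/// instead of silently truncating or padding.
pub fn parse_steward_key(hex: &str) -> Result<[u8; 32], String> {
    if hex.len() != 64 {
        return Err(format!("expected 64 hex characters, got {}", hex.len()));
    }
    // Checking up front also guarantees ASCII, so byte slicing is safe.
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("steward key contains non-hex characters".to_string());
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        let pair = &hex[2 * i..2 * i + 2];
        *byte = u8::from_str_radix(pair, 16)
            .map_err(|e| format!("invalid hex at byte {i}: {e}"))?;
    }
    Ok(key)
}
```

Failing fast on a malformed key at startup is preferable to discovering at 3am that a steward's override is being rejected because their key never parsed.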
Distributed Replication Considerations
Out-of-Order Delivery
Problem: Node A receives supersession before the target assertion (gossip timing).
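Solution: Deferred validation with a quarantine buffer. This buffer only holds supersessions whose target has not arrived yet; it is separate from the administrative Quarantine records described under Escape Hatches.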
async fn ingest_supersession(&self, data: &[u8]) -> Result<()> {
    let supersession: Supersession = deserialize(data)?;
    // Check if target exists locally
    match self.fetch_assertion(&supersession.target_hash).await {
        Ok(original) => {
            // Target exists, validate ownership
            self.validate_and_store(supersession, original).await
        }
        Err(NotFound) => {
            // Target not here yet, quarantine for later
            tracing::warn!(
                target = %hex::encode(supersession.target_hash),
                "Supersession before target, quarantining"
            );
            self.quarantine_pending_supersession(supersession).await
        }
    }
}
Quarantined supersessions are re-evaluated when new assertions arrive (anti-entropy sweep).
Cluster-Wide Consistency
All nodes in a cluster must use the same governance model. Enforcement happens at each node's ingestion worker, so:
- If target arrives first → validation succeeds immediately
- If supersession arrives first → quarantined until target arrives
- Both cases converge to the same final state
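The convergence claim can be checked with a toy in-memory node. The sketch below is an assumption-heavy model: `Node`, its fields, and the synchronous methods are invented for illustration, the governance check is reduced to a comment, and pending supersessions are released as soon as their target arrives rather than by a periodic sweep.

```rust
use std::collections::{HashMap, HashSet};

/// Assumed representation of an assertion hash.
pub type Hash = [u8; 32];

/// Trimmed stand-in for the real supersession record.
#[derive(Debug, Clone, PartialEq)]
pub struct Supersession {
    pub target_hash: Hash,
    pub new_hash: Hash,
}

/// Toy node state: known assertions, applied supersessions, and a buffer
/// of supersessions still waiting for their target.
#[derive(Default)]
pub struct Node {
    assertions: HashSet<Hash>,
    superseded: HashMap<Hash, Hash>,
    pending: HashMap<Hash, Vec<Supersession>>,
}

impl Node {
    /// Store an assertion, then release anything that was waiting on it.
    pub fn ingest_assertion(&mut self, hash: Hash) {
        self.assertions.insert(hash);
        if let Some(waiting) = self.pending.remove(&hash) {
            for supersession in waiting {
                self.apply(supersession);
            }
        }
    }

    /// Apply immediately if the target is known; otherwise buffer it.
    pub fn ingest_supersession(&mut self, supersession: Supersession) {
        if self.assertions.contains(&supersession.target_hash) {
            self.apply(supersession);
        } else {
            self.pending
                .entry(supersession.target_hash)
                .or_default()
                .push(supersession);
        }
    }

    fn apply(&mut self, supersession: Supersession) {
        // The governance check (can_supersede) would run here.
        self.superseded
            .insert(supersession.target_hash, supersession.new_hash);
    }

    pub fn replacement_for(&self, target: &Hash) -> Option<Hash> {
        self.superseded.get(target).copied()
    }

    pub fn pending_len(&self) -> usize {
        self.pending.values().map(Vec::len).sum()
    }
}
```

Feeding the same assertion and supersession to two nodes in opposite orders leaves both with the same `replacement_for` answer and an empty pending buffer, which is the convergence property described in the list above.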
API Changes
New Endpoints
POST /v1/quarantine Create a quarantine record (operator only)
GET /v1/quarantine/{hash} Get quarantine status for an assertion
DELETE /v1/quarantine/{hash} Remove quarantine (if reversible)
GET /v1/governance Get cluster governance model
Query Params
pub struct QueryParams {
    // ... existing fields ...
    /// Include quarantined assertions in results.
    /// Default: false (quarantined assertions are hidden).
    pub include_quarantined: Option<bool>,
}
Supersede Request Validation
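The existing /v1/supersede endpoint gains governance validation as a fast-fail check for direct HTTP writes. The ingestion worker remains the authoritative enforcement point, since gossip-replicated supersessions never pass through this handler: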
// In supersede handler
if !state.governance.can_supersede(...) {
    return Err(ApiError::Forbidden(
        "Supersession not authorized by governance model".to_string()
    ));
}
Migration Path
Phase 1: Add Types (No Enforcement)
- Add GovernanceModel, ProtectionLevel, Quarantine types to stemedb-core
- Add governance field to ClusterConfig
- Default to Enterprise (backward compatible)
- No enforcement yet
Phase 2: Sovereign Enforcement
- Add ownership check in ingestion worker
- When governance == Sovereign, reject non-owner supersessions
- Add quarantine store and basic quarantine API
Phase 3: Commons Enforcement
- Add protection field to Assertion
- Add steward check logic
- Add protection level management API
- Add conflict detection lens (surfaces "edit wars")
Phase 4: Safety Features
- Flood detection + auto-quarantine
- Compromised key quarantine workflow
- Quarantine appeal/reversal API
Design Decisions
- Cluster-level, not per-assertion governance. Governance model is a fundamental trust topology, not a per-write decision. Mixing models would create confusion and split-brain scenarios in distributed replication.
- First signature is owner. Simple, deterministic. Co-ownership can be added later if needed.
- Quarantine ≠ Supersession. Semantically distinct operations. Supersession is epistemic ("I was wrong"). Quarantine is administrative ("hidden for external reasons"). Both needed.
- Accept key loss = lost ownership. Feature, not bug. Encourages key security. Re-assertion with new key is the recovery path.
- Stewards, not admins. In Commons mode, the term "steward" emphasizes service to the community, not authority over it. Matches Wikipedia terminology.
- No per-assertion governance override. An assertion in a Sovereign cluster can't opt into Commons behavior. The governance model is a property of the cluster, not the data.
Relationship to Other Specs
- Supersession: This spec defines WHO can supersede. The existing supersession spec defines WHAT supersession means (Invalidate, Temporal, Refinement, etc.).
- Concept Hierarchy: Governance is orthogonal to concept paths. A Commons cluster could have hierarchical concept paths; a Sovereign cluster could have flat subjects.
- Epochs: Epoch supersession follows the same governance rules. In Sovereign mode, only the epoch creator can supersede it.
- TrustRank: Used by SemiProtected protection level. Low-trust agents can't supersede semi-protected assertions.
What We Do NOT Build
- No voting/consensus mechanism for governance changes (cluster config is operator-controlled)
- No delegation chains in v1 (can be added later)
- No per-assertion governance override
- No governance transitions while cluster is running (requires restart)
- No cross-cluster governance federation (each cluster has its own model)
- No UI for governance management (API + config only)