jordan 55349845d0 refactor: Split all files to enforce 500-line max

Break monolith source files into focused modules:
- stemedb-core/types.rs → types/ directory (assertion, source, gold_standard, etc.)
- stemedb-storage: audit_store, quota_store, trust_rank_store, vector_index, vote_store → module directories
- stemedb-ingest/worker.rs → worker/ with separate test modules
- stemedb-query: engine, materializer, query → module directories
- stemedb-lens: epoch_aware, skeptic → module directories
- stemedb-sim/lib.rs → agent, arenas/, helpers, runner, strategy, types
- stemedb-api/tests: integration_tests → http_basic, http_validation, http_epoch, http_pipeline
- stemedb-api/tests: e2e_flow_test → e2e_full_pipeline, e2e_lens_resolution
- stemedb-query/tests: e2e_pipeline → e2e_pipeline + e2e_decay

Also adds new features: gold standard verification, escalation handlers,
admin endpoints, concept hierarchy spec, arena roadmap, and Go SDK.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-02 01:13:45 -07:00

26 KiB


Sentinel Technical Spec

Status: Draft
Date: 2026-02-02

Overview

Sentinel is a CLI binary that scans codebases, extracts implicit claims from config and code, ingests them into a local Episteme instance, and reports conflicts against authoritative sources.

sentinel init
sentinel scan <project-root> [--config sentinel.toml] [--format table|json|sarif|markdown] [--exit-code]
sentinel ack <concept-path> --reason "..."
sentinel baseline
sentinel diff
sentinel status

Architecture

┌──────────────────────────────────────────────────────────────┐
│                        sentinel CLI                          │
│                                                              │
│  ┌──────────┐   ┌────────────┐   ┌──────────┐   ┌────────┐ │
│  │ Walker   │──▶│ Extractors │──▶│ Ingester │──▶│ Report │ │
│  │          │   │            │   │          │   │        │ │
│  │ fs walk  │   │ tls_verify │   │ bridge   │   │ table  │ │
│  │ lang det │   │ jwt_config │   │ to       │   │ json   │ │
│  │ path map │   │ secrets    │   │ episteme │   │ sarif  │ │
│  │          │   │ timeouts   │   │          │   │ md     │ │
│  │          │   │ deps       │   │          │   │        │ │
│  │          │   │ cors       │   │          │   │        │ │
│  │          │   │ rate_limit │   │          │   │        │ │
│  └──────────┘   └────────────┘   └──────────┘   └────────┘ │
│                                       │              ▲      │
│                                       ▼              │      │
│                              ┌──────────────┐        │      │
│                              │   Episteme   │────────┘      │
│                              │   (local)    │  query +      │
│                              │              │  conflict     │
│                              └──────────────┘  scores       │
└──────────────────────────────────────────────────────────────┘

Sentinel depends on:

stemedb-core (types: ConceptPath, Assertion, SourceClass)
stemedb-storage (KVStore, IndexStore, AliasStore)
stemedb-ingest (ingestion pipeline)
stemedb-query (query engine, lenses)

It does not depend on stemedb-api. Sentinel talks to Episteme directly through the Rust crate APIs, not over HTTP. This makes it fast (no network round-trip) and self-contained (no server process needed).

Crate Structure

crates/
  sentinel/
    Cargo.toml
    src/
      main.rs              CLI entrypoint (clap)
      config.rs            sentinel.toml parsing
      walker/
        mod.rs             Project walker orchestration
        language.rs        Language detection
        path_mapper.rs     Directory → ConceptPath mapping
        normalizer.rs      Path normalization rules per language
      extractors/
        mod.rs             Extractor trait + registry
        tls_verify.rs      TLS certificate verification
        jwt_config.rs      JWT validation settings
        hardcoded_secrets.rs  Credentials in source
        timeout_config.rs  HTTP/DB/Redis timeouts
        dep_versions.rs    Vulnerable dependency versions
        cors_config.rs     CORS allow-origin
        rate_limit.rs      Rate limiting config
      bridge.rs            ExtractedClaim → Assertion conversion
      conflict.rs          Conflict query + scoring
      report/
        mod.rs             Report generation orchestration
        table.rs           Terminal table output
        json.rs            JSON output
        sarif.rs           SARIF for CI integration
        markdown.rs        Markdown output
      ack.rs               Acknowledge command
      baseline.rs          Baseline snapshot
      diff.rs              Delta since last scan

Configuration

sentinel.toml at project root (optional, sensible defaults):

[project]
name = "citadeldb"
language = "rust"            # auto-detected if omitted

[episteme]
data_dir = "~/.sentinel/db"  # local Episteme instance
# url = "http://localhost:3000"  # future: remote instance

[thresholds]
block = 0.7                  # conflict score >= this → BLOCK
flag = 0.4                   # conflict score >= this → FLAG
# below flag threshold → PASS (not reported)

[extractors]
enabled = ["tls_verify", "jwt_config", "hardcoded_secrets", "timeout_config", "dep_versions", "cors_config", "rate_limit"]
# disabled = ["rate_limit"]  # alternative: disable specific ones

[extractors.timeout_config]
min_reasonable_ms = 1000     # flag timeouts below this
max_reasonable_ms = 300000   # flag timeouts above this

[extractors.dep_versions]
advisory_db = "~/.sentinel/advisory-db"  # rustsec/advisory-db clone

[scan]
exclude = ["target/", "node_modules/", ".git/", "vendor/"]
max_file_size = 1048576      # skip files > 1MB

[aliases]
auto_suggest = true          # suggest aliases when shared leaves detected
auto_accept_tier0 = true     # auto-accept alias suggestions to Tier 0 sources

Walker

Language Detection

Priority order:

Explicit language in sentinel.toml
Dominant language heuristic (count files by extension)
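Per-file extension mapping (exact file-name entries such as Cargo.toml, go.mod, package.json, and pyproject.toml take precedence over the generic .toml and .json extensions)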

Extension	Language
`.rs`	rust
`.go`	go
`.py`	python
`.ts`, `.tsx`	typescript
`.js`, `.jsx`	javascript
`.yaml`, `.yml`	yaml
`.toml`	toml
`.json`	json
`.env`, `.env.*`	dotenv
`Dockerfile`, `docker-compose.*`	docker
`Cargo.toml`	cargo-manifest
`go.mod`	go-mod
`package.json`	npm-manifest
`requirements.txt`, `pyproject.toml`	python-manifest
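A minimal sketch of the per-file lookup in the table above, plus the dominant-language count. It assumes exact file names (manifests, Dockerfile, .env) are checked before generic extensions, so Cargo.toml is classified as cargo-manifest rather than toml, and that only source languages vote in the dominant-language count. The function names and the string labels used in place of a Language enum are illustrative, not the crate's API.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Classify one file using the Language Detection table.
/// Returns None for files the table does not cover.
fn detect_language(path: &Path) -> Option<&'static str> {
    let name = path.file_name()?.to_str()?;

    // Exact file names first, so manifests win over their generic extension.
    match name {
        "Cargo.toml" => return Some("cargo-manifest"),
        "go.mod" => return Some("go-mod"),
        "package.json" => return Some("npm-manifest"),
        "requirements.txt" | "pyproject.toml" => return Some("python-manifest"),
        "Dockerfile" => return Some("docker"),
        ".env" => return Some("dotenv"),
        _ => {}
    }
    // Name prefixes: `.env.*` and `docker-compose.*`.
    if name.starts_with(".env.") {
        return Some("dotenv");
    }
    if name.starts_with("docker-compose.") {
        return Some("docker");
    }
    // Plain extension lookup.
    match path.extension()?.to_str()? {
        "rs" => Some("rust"),
        "go" => Some("go"),
        "py" => Some("python"),
        "ts" | "tsx" => Some("typescript"),
        "js" | "jsx" => Some("javascript"),
        "yaml" | "yml" => Some("yaml"),
        "toml" => Some("toml"),
        "json" => Some("json"),
        _ => None,
    }
}

/// Dominant-language heuristic: the most frequent source language.
fn dominant_language(paths: &[&Path]) -> Option<&'static str> {
    let mut counts: HashMap<&'static str, usize> = HashMap::new();
    for p in paths {
        if let Some(lang) = detect_language(p) {
            // Assumption: config files and manifests do not vote.
            if ["rust", "go", "python", "typescript", "javascript"].contains(&lang) {
                *counts.entry(lang).or_insert(0) += 1;
            }
        }
    }
    // Highest count wins; ties go to the alphabetically first name so the
    // result does not depend on HashMap iteration order.
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(a.0)))
        .map(|(lang, _)| lang)
}
```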

Path Mapping

Directory structure maps to ConceptPath segments. Language-specific stripping rules remove boilerplate directories:

Rust:

Strip: src/, crates/
Keep: everything else

crates/citadeldb/src/auth/jwt.rs
  → ["rust", "citadeldb", "auth", "jwt"]
  → code://rust/citadeldb/auth/jwt/{leaf from extractor}

Go:

Strip: cmd/, internal/, pkg/
Keep: everything else, except the file name (a Go package is its directory, so validator.go does not appear in the path)

internal/auth/jwt/validator.go
  → ["go", "{module_name}", "auth", "jwt"]
  → code://go/{module_name}/auth/jwt/{leaf}

Python:

Strip: src/, lib/
Keep: everything else

src/auth/jwt_handler.py
  → ["python", "{package_name}", "auth", "jwt_handler"]
  → code://python/{package_name}/auth/jwt_handler/{leaf}
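The Rust, Go, and Python rules above can be sketched as one function. This is a hypothetical helper that assumes forward-slash relative paths and an already-resolved Go module name or Python package name; special files such as mod.rs or __init__.py are not covered by the examples and are not handled here.

```rust
/// Directories stripped per language, from the Path Mapping rules.
fn stripped_dirs(lang: &str) -> &'static [&'static str] {
    match lang {
        "rust" => &["src", "crates"],
        "go" => &["cmd", "internal", "pkg"],
        "python" => &["src", "lib"],
        _ => &[],
    }
}

/// Map a project-relative file path to ConceptPath segments.
/// `name` is the Go module or Python package name; None for Rust, where the
/// crate directory already supplies the name in the spec's example.
fn map_path(lang: &str, name: Option<&str>, rel_path: &str) -> Vec<String> {
    let strip = stripped_dirs(lang);
    let parts: Vec<&str> = rel_path.split('/').collect();
    let (dirs, file) = parts.split_at(parts.len() - 1);

    let mut segs = vec![lang.to_string()];
    if let Some(n) = name {
        segs.push(n.to_string());
    }
    // Keep every directory that is not in the strip list.
    segs.extend(
        dirs.iter()
            .copied()
            .filter(|d| !strip.iter().any(|s| s == d))
            .map(|d| d.to_string()),
    );
    // Go drops the file name (validator.go in the example);
    // Rust and Python keep the file stem (jwt, jwt_handler).
    if lang != "go" {
        let stem = file[0].rsplit_once('.').map_or(file[0], |(stem, _)| stem);
        segs.push(stem.to_string());
    }
    segs
}
```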

Config files:

config/production.yaml
  → code://config/{project_name}/production/{leaf}

.env.production
  → code://config/{project_name}/env_production/{leaf}

docker-compose.yml
  → code://docker/{project_name}/{leaf}

The project name comes from:

sentinel.toml project.name
Cargo.toml [package] name
go.mod module name (last segment)
package.json name
Directory name of project root
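The project-name fallback order reads naturally as an Option chain. In this sketch (hypothetical function and parameters), each argument is the value already parsed from the corresponding file, or None when that file or field is missing.

```rust
/// Resolve the project name in the spec's priority order.
fn project_name(
    sentinel_toml: Option<&str>,
    cargo_package: Option<&str>,
    go_module: Option<&str>,
    package_json: Option<&str>,
    root_dir_name: &str,
) -> String {
    sentinel_toml
        .or(cargo_package)
        // go.mod module paths look like "github.com/org/repo": keep the last segment.
        .or_else(|| go_module.and_then(|m| m.rsplit('/').next()))
        .or(package_json)
        .unwrap_or(root_dir_name)
        .to_string()
}
```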

File Filtering

Skip:

Directories in scan.exclude list
Files larger than scan.max_file_size
Binary files (detected by null byte in first 8KB)
Generated files (*.generated.*, *.pb.go, *_generated.rs)
Test files (configurable: include or exclude)
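A std-only sketch of the skip rules. The walker itself is built on the ignore crate; the helpers below are hypothetical and only illustrate the individual checks: directory exclusion, the size limit, the null-byte test over the first 8 KB, and the generated-file name patterns.

```rust
use std::io::Read;

/// True if `rel_path` falls under an excluded directory such as "target/",
/// either at the project root or nested (e.g. "crates/x/target/").
fn is_excluded(rel_path: &str, exclude: &[&str]) -> bool {
    exclude
        .iter()
        .any(|dir| rel_path.starts_with(*dir) || rel_path.contains(&format!("/{dir}")))
}

/// Size rule: skip files larger than scan.max_file_size.
fn exceeds_max_size(len_bytes: u64, max_file_size: u64) -> bool {
    len_bytes > max_file_size
}

/// Binary rule: a null byte anywhere in the first 8 KB.
fn looks_binary(reader: impl Read) -> std::io::Result<bool> {
    let mut head = Vec::with_capacity(8192);
    // take() caps the read, so large files are never fully loaded here.
    reader.take(8192).read_to_end(&mut head)?;
    Ok(head.contains(&0))
}

/// Generated-file rule: *.generated.*, *.pb.go, *_generated.rs.
fn is_generated(file_name: &str) -> bool {
    file_name.contains(".generated.")
        || file_name.ends_with(".pb.go")
        || file_name.ends_with("_generated.rs")
}
```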

Extractors

Trait Definition

/// A claim extractor that finds implicit decisions in source code.
pub trait Extractor: Send + Sync {
    /// Unique identifier for this extractor.
    fn name(&self) -> &str;

    /// File types this extractor operates on.
    fn languages(&self) -> &[Language];

    /// Extract claims from a file's content.
    ///
    /// - `path_segments`: The ConceptPath segments derived from the file's location.
    /// - `content`: The file content as a string.
    /// - `language`: The detected language of the file.
    ///
    /// Returns zero or more extracted claims.
    fn extract(
        &self,
        path_segments: &[String],
        content: &str,
        language: Language,
    ) -> Vec<ExtractedClaim>;
}

ExtractedClaim

/// A claim extracted from source code by an Extractor.
#[derive(Clone)] // check_conflict stores claim.clone() in ConflictResult
pub struct ExtractedClaim {
    /// The full ConceptPath for this claim.
    /// Scheme is always "code" for code-extracted claims.
    pub concept_path: ConceptPath,

    /// The predicate describing what aspect of the concept this claims.
    /// Examples: "enabled", "config_value", "version", "allow_origin"
    pub predicate: String,

    /// The extracted value.
    pub value: ObjectValue,

    /// Source file path relative to project root.
    pub file: String,

    /// Line number in the source file (1-indexed).
    pub line: usize,

    /// The matched source text (the actual code/config that was matched).
    pub matched_text: String,

    /// Confidence of extraction.
    /// 1.0 for exact regex matches.
    /// Lower for heuristic matches.
    pub confidence: f32,

    /// Human-readable description of what was found.
    /// Example: "JWT audience validation is disabled"
    pub description: String,
}

Extractor: tls_verify

What it finds: TLS/SSL certificate verification disabled.

Patterns (every pattern yields the leaf, predicate, and value listed under the first one):

Rust (reqwest):

Pattern: danger_accept_invalid_certs\s*\(\s*true\s*\)
Leaf: cert_verification
Predicate: enabled
Value: Boolean(false)

Rust (native-tls):

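Pattern: accept_invalid_certs\s*\(\s*true\s*\)
(native-tls also names this builder method danger_accept_invalid_certs, so this unanchored pattern matches every reqwest hit as well; deduplicate findings by file and line.)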

Go (crypto/tls tls.Config, commonly set via net/http's Transport.TLSClientConfig):

Pattern: InsecureSkipVerify\s*:\s*true

Python (requests):

Pattern: verify\s*=\s*False

Node.js:

Pattern: rejectUnauthorized\s*:\s*false
Pattern: NODE_TLS_REJECT_UNAUTHORIZED.*['"]0['"]

YAML/TOML/JSON config:

Pattern: (tls_verify|ssl_verify|verify_ssl|verify_tls)\s*[:=]\s*(false|no|0|off)
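To show how a pattern hit becomes the 1-indexed line and matched text of an ExtractedClaim, here is a std-only stand-in for the Go pattern InsecureSkipVerify\s*:\s*true. The extractors are meant to use the regex crate; this sketch swaps the regex for a whitespace-tolerant string check and returns plain tuples instead of ExtractedClaim, both to keep it self-contained. The function name is hypothetical.

```rust
/// Std-only stand-in for `InsecureSkipVerify\s*:\s*true`.
/// Returns (1-indexed line number, trimmed source line) for each hit.
fn find_insecure_skip_verify(content: &str) -> Vec<(usize, String)> {
    const FIELD: &str = "InsecureSkipVerify";
    let mut hits = Vec::new();
    for (idx, line) in content.lines().enumerate() {
        if let Some(pos) = line.find(FIELD) {
            // Equivalent of `\s*:\s*true` after the field name.
            let rest = line[pos + FIELD.len()..].trim_start();
            if let Some(after_colon) = rest.strip_prefix(':') {
                if after_colon.trim_start().starts_with("true") {
                    hits.push((idx + 1, line.trim().to_string()));
                }
            }
        }
    }
    hits
}
```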

Extractor: jwt_config

What it finds: JWT validation settings.

Claims extracted per finding:

Leaf	Predicate	What it means
`audience_validation`	`enabled`	Whether `aud` claim is validated
`expiry_validation`	`enabled`	Whether `exp` claim is validated
`algorithm_restriction`	`config_value`	Allowed algorithms (or "none" if unrestricted)
`signature_verification`	`enabled`	Whether signatures are verified

Patterns (Rust, jsonwebtoken crate):

// aud validation
Pattern: set_audience.*\[\]|validate_aud.*false|aud.*None
Leaf: audience_validation
Value: Boolean(false)

// Dangerous: algorithm none
Pattern: Algorithm::None|alg.*none|allow_none.*true
Leaf: algorithm_restriction
Value: Text("none_allowed")

// Signature skip
Pattern: dangerous_insecure|skip_signature|verify.*false
Leaf: signature_verification
Value: Boolean(false)

Patterns (Go, golang-jwt):

Pattern: jwt\.Parse\(.*func\(.*\*jwt\.Token\).*\{[^}]*return.*signingKey
         (without any algorithm check in the callback)

This is a heuristic match (confidence < 1.0) — detecting missing validation is harder than detecting explicit disabling.

Extractor: hardcoded_secrets

What it finds: Credentials, API keys, tokens in source (not in .env or .gitignore'd files).

Patterns:

// API keys
Pattern: (api[_-]?key|apikey)\s*[:=]\s*["'][A-Za-z0-9_\-]{20,}["']
Leaf: api_key_storage
Predicate: storage_method
Value: Text("hardcoded")

// Passwords
Pattern: (password|passwd|pwd)\s*[:=]\s*["'][^"']+["']
  (excluding: "password", "changeme", "placeholder", "CHANGE_ME", "xxx", test patterns)
Leaf: password_storage
Predicate: storage_method
Value: Text("hardcoded")

// AWS keys
Pattern: (AKIA[0-9A-Z]{16})
Leaf: aws_credentials
Predicate: storage_method
Value: Text("hardcoded")

// Private keys
Pattern: -----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----
Leaf: private_key_storage
Predicate: storage_method
Value: Text("hardcoded_in_source")

Test and example files: files matching *test*, *example*, *fixture*, or *mock* are still scanned, but their findings are assigned a lower confidence (0.5).
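The AWS rule and the confidence downgrade are simple enough to show without the regex crate. In this sketch the helper names are hypothetical, the AWS scan emulates the regex's left-to-right, non-overlapping matching, and the file-name check is case-insensitive, which is an assumption.

```rust
/// Std-only stand-in for `AKIA[0-9A-Z]{16}`: every non-overlapping
/// 20-character match in `line`, scanning left to right like a regex.
fn find_aws_key_ids(line: &str) -> Vec<&str> {
    let bytes = line.as_bytes();
    let mut found = Vec::new();
    let mut i = 0;
    while let Some(off) = line[i..].find("AKIA") {
        let start = i + off;
        let end = start + 20;
        let tail_ok = end <= bytes.len()
            && bytes[start + 4..end]
                .iter()
                .all(|b| b.is_ascii_digit() || b.is_ascii_uppercase());
        if tail_ok {
            found.push(&line[start..end]);
            i = end; // matches do not overlap
        } else {
            i = start + 1; // "AKIA" is ASCII, so start + 1 is a char boundary
        }
    }
    found
}

/// Findings in test/example/fixture/mock files keep 0.5 confidence;
/// exact matches elsewhere keep 1.0.
fn finding_confidence(file: &str) -> f32 {
    let lower = file.to_ascii_lowercase();
    let is_test_like = ["test", "example", "fixture", "mock"]
        .iter()
        .any(|k| lower.contains(*k));
    if is_test_like { 0.5 } else { 1.0 }
}
```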

Extractor: timeout_config

What it finds: HTTP client, database, and cache timeout values.

Patterns:

// Zero/infinite timeout
Pattern: timeout\s*[:=]\s*(0|None|null|nil|infinity|Inf)
Leaf: {context}/timeout  (context from surrounding code: http, db, redis, etc.)
Predicate: config_value
Value: Number(0.0)
Description: "Timeout disabled (infinite wait)"

// Unreasonably low timeout
Pattern: timeout\s*[:=]\s*(\d+)
  where value_ms < config.min_reasonable_ms
Leaf: {context}/timeout
Value: Number(extracted_value)
Description: "Timeout {value}ms below minimum reasonable {min}ms"

// Unreasonably high timeout
Pattern: timeout\s*[:=]\s*(\d+)
  where value_ms > config.max_reasonable_ms

Unit detection: heuristic based on surrounding context and magnitude:

Presence of "ms", "sec", "Duration::from_secs" → explicit unit (takes precedence over magnitude)
Value > 1000000 → likely nanoseconds
Value > 1000 and < 1000000 → likely milliseconds
Value < 100 → likely seconds
Values from 100 to 1000 (and exactly 1000000) match no magnitude rule and are ambiguous
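A sketch of the unit heuristic under two assumptions: explicit unit markers override magnitude, and values from 100 to 1000, which no magnitude rule covers, return None so the caller can lower confidence or skip them. The marker checks are plain substring tests and will misfire on identifiers that merely contain "ms" or "sec". The enum and function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Nanos,
    Millis,
    Secs,
}

/// Guess the unit of a bare timeout number. `context` is the surrounding
/// source text, for example the matched line.
fn guess_unit(value: u64, context: &str) -> Option<TimeUnit> {
    // Assumption: explicit markers win. "ms"/"millis" is checked before
    // "sec" so that "msec" reads as milliseconds.
    if context.contains("ms") || context.contains("millis") {
        return Some(TimeUnit::Millis);
    }
    if context.contains("Duration::from_secs") || context.contains("sec") {
        return Some(TimeUnit::Secs);
    }
    match value {
        v if v > 1_000_000 => Some(TimeUnit::Nanos),
        // Assumption: exactly 1_000_000 is treated as milliseconds.
        v if v > 1_000 => Some(TimeUnit::Millis),
        v if v < 100 => Some(TimeUnit::Secs),
        _ => None, // 100..=1000: no magnitude rule applies
    }
}

/// Normalize to milliseconds for comparison with the configured bounds.
fn to_millis(value: u64, unit: TimeUnit) -> u64 {
    match unit {
        TimeUnit::Nanos => value / 1_000_000,
        TimeUnit::Millis => value,
        TimeUnit::Secs => value.saturating_mul(1_000),
    }
}

/// Flag values outside [min_reasonable_ms, max_reasonable_ms].
fn out_of_range(ms: u64, min_ms: u64, max_ms: u64) -> bool {
    ms < min_ms || ms > max_ms
}
```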

Extractor: dep_versions

What it finds: Dependencies with known vulnerabilities.

Sources:

Cargo.toml → check against RustSec Advisory DB
go.mod → check against Go Vulnerability Database
package.json → check against npm audit advisories
requirements.txt / pyproject.toml → check against PyPI advisory data

Output:

Leaf: dep/{package_name}/version
Predicate: installed_version
Value: Text("1.0.2")
Description: "openssl 1.0.2 has known vulnerability CVE-2024-XXXX"

The advisory databases are downloaded locally and refreshed periodically; Sentinel does not call external APIs during a scan.
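The core check compares the installed version against an advisory's fixed version. This is a simplified sketch: the Advisory struct and the "below patched means affected" rule are hypothetical, and real advisory databases express affected ranges with richer version requirements, including pre-release tags this comparison ignores.

```rust
use std::cmp::Ordering;

/// Compare dotted numeric versions ("1.0.2" vs "1.1.0").
/// Missing or non-numeric parts count as 0.
fn cmp_versions(a: &str, b: &str) -> Ordering {
    let parse = |s: &str| -> Vec<u64> { s.split('.').map(|p| p.parse().unwrap_or(0)).collect() };
    let (va, vb) = (parse(a), parse(b));
    for i in 0..va.len().max(vb.len()) {
        let x = va.get(i).copied().unwrap_or(0);
        let y = vb.get(i).copied().unwrap_or(0);
        if x != y {
            return x.cmp(&y);
        }
    }
    Ordering::Equal
}

/// Hypothetical advisory entry: versions below `patched` are affected.
struct Advisory<'a> {
    package: &'a str,
    id: &'a str,
    patched: &'a str,
}

/// IDs of advisories that apply to `package` at version `installed`.
fn vulnerable<'a>(package: &str, installed: &str, advisories: &'a [Advisory<'a>]) -> Vec<&'a str> {
    advisories
        .iter()
        .filter(|a| a.package == package && cmp_versions(installed, a.patched).is_lt())
        .map(|a| a.id)
        .collect()
}
```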

Extractor: cors_config

What it finds: Overly permissive CORS configuration.

Patterns:

// Allow all origins
Pattern: allow_origin\s*\(\s*["']\*["']\s*\)|Access-Control-Allow-Origin.*\*|AllowAllOrigins.*true
Leaf: cors/allow_origin
Predicate: config_value
Value: Text("*")
Description: "CORS allows all origins"

// Allow credentials with wildcard
Pattern: (allow_credentials|AllowCredentials).*true
  (in proximity to allow_origin *)
Leaf: cors/credentials_with_wildcard
Predicate: enabled
Value: Boolean(true)
Description: "CORS allows credentials with wildcard origin"
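The credentials rule depends on "proximity", which the spec leaves unquantified. The sketch below assumes a window of 5 lines (a hypothetical value) and uses simplified substring checks in place of the regexes above.

```rust
/// Hypothetical proximity window; the spec does not fix a value.
const PROXIMITY_LINES: usize = 5;

/// Simplified stand-ins for the allow-all-origins patterns.
fn is_wildcard_origin(line: &str) -> bool {
    (line.contains("allow_origin") && (line.contains("\"*\"") || line.contains("'*'")))
        || (line.contains("Access-Control-Allow-Origin") && line.contains('*'))
        || (line.contains("AllowAllOrigins") && line.contains("true"))
}

fn is_allow_credentials(line: &str) -> bool {
    (line.contains("allow_credentials") || line.contains("AllowCredentials"))
        && line.contains("true")
}

/// 1-indexed lines enabling credentials within PROXIMITY_LINES of a wildcard origin.
fn credentials_with_wildcard(content: &str) -> Vec<usize> {
    let lines: Vec<&str> = content.lines().collect();
    let wildcard: Vec<usize> = (0..lines.len())
        .filter(|&i| is_wildcard_origin(lines[i]))
        .collect();
    (0..lines.len())
        .filter(|&i| is_allow_credentials(lines[i]))
        .filter(|&i| wildcard.iter().any(|&w| w.abs_diff(i) <= PROXIMITY_LINES))
        .map(|i| i + 1)
        .collect()
}
```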

Extractor: rate_limit

What it finds: Rate limiting disabled or set unreasonably high.

Patterns:

// Rate limiting disabled
Pattern: (rate_limit|ratelimit).*disabled|rate_limit\s*[:=]\s*(0|false|off|none)
Leaf: rate_limit/enabled
Predicate: enabled
Value: Boolean(false)

// Unreasonably high limit
Pattern: (rate_limit|ratelimit|max_requests)\s*[:=]\s*(\d+)
  where value > 10000 per minute (configurable)
Leaf: rate_limit/max_requests
Predicate: config_value
Value: Number(extracted_value)

Ingestion Bridge

Claim → Assertion Mapping

fn to_assertion(
    claim: &ExtractedClaim,
    agent_keypair: &Ed25519Keypair,
    scan_timestamp: u64,
) -> Assertion {
    let source_metadata = serde_json::to_vec(&json!({
        "file": claim.file,
        "line": claim.line,
        "matched_text": claim.matched_text,
        "leaf": claim.concept_path.leaf(),  // the leaf, not the extractor name: ExtractedClaim carries no extractor field
        "scan_tool": "sentinel",
        "scan_version": env!("CARGO_PKG_VERSION"),
    }));

    let source_hash = blake3::hash(
        format!("{}:{}:{}", claim.file, claim.line, claim.matched_text).as_bytes()
    );

    Assertion {
        subject: claim.concept_path.to_string(),  // EntityId = String
        predicate: claim.predicate.clone(),
        object: claim.value.clone(),
        parent_hash: None,
        source_hash: *source_hash.as_bytes(),
        source_class: SourceClass::Expert,  // code:// scheme default
        visual_hash: None,
        epoch: None,
        source_metadata: source_metadata.ok(),
        lifecycle: LifecycleStage::Approved,
        signatures: vec![sign(agent_keypair, &claim)],
        confidence: claim.confidence,
        timestamp: scan_timestamp,
        vector: None,
    }
}

Idempotency

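Same code produces the same claims, and the same claims produce the same assertion hashes (content-addressed), so re-scanning a project that hasn't changed ingests nothing new. This relies on BLAKE3 content addressing in the existing Episteme pipeline, and it holds only if the hashed content excludes per-scan fields: to_assertion sets timestamp to scan_timestamp and records scan_version in source_metadata, so those fields must not feed into the assertion hash (or must stay constant between scans).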

When code changes between scans, new assertions are created. Old assertions remain (append-only). The diff command compares the current scan's assertions against the last scan's to show what changed.
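A std-only sketch of why an unchanged project ingests nothing new: each claim is keyed by a deterministic digest of its content fields, and a key already present is skipped. DefaultHasher stands in for BLAKE3 only to avoid dependencies (it is not a content-addressing hash, and its output is not guaranteed stable across Rust releases). The Claim struct is a hypothetical reduction of ExtractedClaim, and the key deliberately omits the scan timestamp.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical reduction of ExtractedClaim to its content fields.
#[derive(Clone, Copy)]
struct Claim<'a> {
    subject: &'a str,
    predicate: &'a str,
    value: &'a str,
    file: &'a str,
    line: usize,
    matched_text: &'a str,
}

/// Deterministic key over content only: no timestamp, no scan ID.
fn content_key(c: &Claim<'_>) -> u64 {
    let mut h = DefaultHasher::new();
    (c.subject, c.predicate, c.value, c.file, c.line, c.matched_text).hash(&mut h);
    h.finish()
}

/// Append-only ingest: returns how many claims were new.
fn ingest(store: &mut HashSet<u64>, claims: &[Claim<'_>]) -> usize {
    let mut new = 0;
    for c in claims {
        if store.insert(content_key(c)) {
            new += 1;
        }
    }
    new
}
```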

Scan Metadata

Each scan is recorded as an assertion about itself:

Subject: sentinel://scan/{project_name}/{scan_id}
Predicate: completed
Object: Text(json!({
    "project": "citadeldb",
    "files_scanned": 142,
    "claims_extracted": 23,
    "conflicts_found": 3,
    "blocks": 2,
    "flags": 1,
    "timestamp": 1706832000
}).to_string())

This enables sentinel diff — compare two scan records and their associated assertions.

Conflict Detection

Query Strategy

After ingestion, for each extracted claim:

async fn check_conflict(
    claim: &ExtractedClaim,
    query_engine: &QueryEngine,
    thresholds: &Thresholds, // [thresholds] from sentinel.toml (block, flag)
) -> Option<ConflictResult> {
    // 1. Query with Skeptic lens, resolving aliases
    let results = query_engine.query(Query {
        subject: Some(claim.concept_path.to_string()),
        predicate: Some(claim.predicate.clone()),
        lens: Some("skeptic".to_string()),
        resolve_aliases: true,
        source_class_decay: true,
        ..Default::default()
    }).await;

    // 2. Collect authoritative sources (Tier 0, 1, or 2) that disagree
    let code_value = &claim.value;
    let mut conflicts = Vec::new();

    for assertion in &results.assertions {
        if assertion.source_class.tier() < 3 && assertion.object != *code_value {
            conflicts.push(ConflictingSource {
                path: assertion.subject.clone(),
                source_class: assertion.source_class,
                value: assertion.object.clone(),
                confidence: assertion.confidence,
            });
        }
    }

    if conflicts.is_empty() {
        return None;
    }

    // 3. Compute conflict score
    let by_f32 = |a: &f32, b: &f32| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal);

    // Authority term: strongest conflicting source, discounted by the code's
    // own authority. Tier 0 (weight 1.0) vs code (Expert, Tier 3):
    // 1.0 * (1.0 - 0.5) = 0.50, the largest value this term can reach
    // (assuming 1.0 is the highest authority weight).
    let max_tier_weight = conflicts.iter()
        .map(|c| c.source_class.authority_weight())
        .max_by(by_f32)
        .unwrap_or(0.0);
    let code_weight = SourceClass::Expert.authority_weight(); // 0.5
    let authority_score = max_tier_weight * (1.0 - code_weight);

    // Scale by the highest confidence among conflicting sources.
    // Confidence is in [0, 1], so this can only lower the score.
    let max_confidence = conflicts.iter()
        .map(|c| c.confidence)
        .max_by(by_f32)
        .unwrap_or(1.0);
    let confidence_scaled = authority_score * max_confidence;

    // Tier floor from the most authoritative conflicting tier:
    // Tier 0 → 0.95, Tier 1 → ~0.77, Tier 2 → ~0.58 (Tier 3 would give 0.40).
    let best_tier = conflicts.iter()
        .map(|c| c.source_class.tier())
        .min()
        .unwrap_or(3) as f32;
    let tier_floor = 0.4 + (3.0 - best_tier) / 3.0 * 0.55;

    // Only Tier 0-2 sources reach this point, so tier_floor (>= ~0.58)
    // always exceeds confidence_scaled (<= 0.5): final_score == tier_floor.
    let final_score = tier_floor.max(confidence_scaled);

    Some(ConflictResult {
        claim: claim.clone(),
        conflicts,
        conflict_score: final_score,
        verdict: if final_score >= thresholds.block { Verdict::Block }
                 else if final_score >= thresholds.flag { Verdict::Flag }
                 else { Verdict::Pass },
    })
}
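Working through the scoring arithmetic shows which term decides the result. The sketch below is a pure function over the numbers check_conflict uses, assuming Tier 0 has authority weight 1.0 and Expert (the code's own class) 0.5, as the code comments state; other tiers' weights are not given in this spec, so the caller passes them in. Function names are hypothetical.

```rust
/// Pure version of the scoring steps in check_conflict.
/// - `best_tier`: lowest tier number among conflicting sources (0, 1, or 2)
/// - `max_weight`: highest authority_weight() among conflicting sources
/// - `max_confidence`: highest confidence among conflicting sources
fn conflict_score(best_tier: u8, max_weight: f32, max_confidence: f32) -> f32 {
    let code_weight = 0.5_f32; // SourceClass::Expert.authority_weight()
    let confidence_scaled = max_weight * (1.0 - code_weight) * max_confidence;
    let tier_floor = 0.4 + (3.0 - best_tier as f32) / 3.0 * 0.55;
    tier_floor.max(confidence_scaled)
}

/// Verdict from the [thresholds] block/flag values.
fn verdict(score: f32, block: f32, flag: f32) -> &'static str {
    if score >= block {
        "BLOCK"
    } else if score >= flag {
        "FLAG"
    } else {
        "PASS"
    }
}
```

With the defaults from the sample sentinel.toml (block 0.7, flag 0.4), any Tier 0 or Tier 1 disagreement is a BLOCK and any Tier 2 disagreement is a FLAG, whatever the authoritative source's confidence. The scores in the report samples (0.87 for a Tier 1 source, 0.92 for the RFC 7519 source, 0.48 for a Tier 2 source) do not follow from this formula as written, so they appear to be illustrative values.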

Verdict Levels

Verdict	Condition	Meaning
BLOCK	`conflict_score >= thresholds.block` (default 0.7)	Authoritative source strongly contradicts. Fix or explicitly acknowledge.
FLAG	`conflict_score >= thresholds.flag` (default 0.4)	Potential disagreement. Review recommended.
PASS	`conflict_score < thresholds.flag` (default 0.4)	No significant conflict (or no authoritative data). Not reported.
ACK	Any score, acknowledged	Conflict exists but has been explicitly accepted.

Acknowledged Conflicts

When a conflict has been acknowledged (via sentinel ack), the acknowledgment assertion exists in Episteme. The conflict still has a score, but the report marks it as ACK instead of BLOCK/FLAG:

  ACK    code://rust/citadeldb/auth/jwt/audience_validation
         Your code:  aud validation disabled        (src/auth/jwt.rs:47)
         RFC 7519:   aud validation MUST be enabled  (Tier 0)
         Acknowledged: 2026-01-15 by jordan
         Reason: "Internal service, no external JWT consumers. SEC-2024-003."

The acknowledgment doesn't suppress the conflict. It adds context. A future --strict mode can treat acknowledged conflicts as blocks again (for audits).

Report Formats

SARIF (for CI)

SARIF (Static Analysis Results Interchange Format) is the standard for CI security tools. GitHub, GitLab, and Azure DevOps all consume it.

{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/main/sarif-2.1/schema/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "sentinel",
        "version": "0.1.0",
        "informationUri": "https://github.com/orchard9/sentinel"
      }
    },
    "results": [{
      "ruleId": "epistemic-drift/tls-verify",
      "level": "error",
      "message": {
        "text": "TLS certificate verification disabled. OWASP requires verification (Tier 1, conflict score 0.87)."
      },
      "locations": [{
        "physicalLocation": {
          "artifactLocation": { "uri": "src/net/client.rs" },
          "region": { "startLine": 23 }
        }
      }]
    }]
  }]
}
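The sample shows only an error-level result. Here is a sketch of how verdicts might map to SARIF 2.1.0 `result.level`, which takes the values "error", "warning", "note", and "none". Only BLOCK → "error" comes from the sample; FLAG → "warning" and ACK → "note" are assumptions, and PASS yields no result because it is not reported. The ruleId helper assumes rule suffixes are extractor names with underscores replaced by hyphens, as tls_verify → tls-verify suggests.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
}

/// SARIF `result.level` for a verdict; None means "do not emit a result".
fn sarif_level(v: Verdict) -> Option<&'static str> {
    match v {
        Verdict::Block => Some("error"),
        Verdict::Flag => Some("warning"), // assumption
        Verdict::Ack => Some("note"),     // assumption
        Verdict::Pass => None,            // PASS is not reported
    }
}

/// ruleId in the "epistemic-drift/<rule>" form used by the sample.
fn sarif_rule_id(extractor: &str) -> String {
    format!("epistemic-drift/{}", extractor.replace('_', "-"))
}
```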

JSON (for programmatic consumption)

{
  "project": "citadeldb",
  "scan_id": "abc123",
  "timestamp": 1706832000,
  "summary": {
    "files_scanned": 142,
    "claims_extracted": 23,
    "conflicts": 3,
    "blocks": 2,
    "flags": 1
  },
  "conflicts": [
    {
      "concept_path": "code://rust/citadeldb/auth/jwt/audience_validation",
      "predicate": "enabled",
      "code_value": false,
      "file": "src/auth/jwt.rs",
      "line": 47,
      "conflict_score": 0.92,
      "verdict": "BLOCK",
      "conflicting_sources": [
        {
          "path": "rfc://7519/jwt/audience_validation",
          "source_class": "Regulatory",
          "value": true,
          "confidence": 1.0
        }
      ],
      "acknowledged": null
    }
  ]
}

Baseline and Diff

Baseline

sentinel baseline records the current scan as the baseline. Subsequent scans only report new conflicts.

Implementation: store the baseline scan ID in .sentinel/baseline in the project root. The diff logic compares the current scan's conflict set against the baseline's.

.sentinel/
  baseline        # scan ID of the baseline
  config.toml     # symlink or copy of sentinel.toml
  agent.key       # Ed25519 keypair for this project's Sentinel agent

Diff

sentinel diff shows:

New conflicts (in current scan but not baseline)
Resolved conflicts (in baseline but not current scan)
Changed conflicts (same concept, different score or verdict)

$ sentinel diff

  NEW    code://config/citadeldb/redis/max_connections
         Your code:  max_connections = 10000         (config/redis.yaml:5)
         Vendor:     recommended max 128 per instance (Tier 2)
         Conflict:   0.48 — FLAG

  RESOLVED  code://rust/citadeldb/net/tls/cert_verification
         Previously: verify = false → BLOCK
         Current:    verify = true  → PASS

1 new conflict, 1 resolved, 0 changed.
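The three diff categories amount to a keyed set difference. A minimal sketch, assuming each scan's reported conflicts are held in a map from concept path to verdict and score; the Conflict and Delta types are hypothetical.

```rust
use std::collections::HashMap;

/// Minimal conflict record for diffing.
#[derive(Debug, Clone, PartialEq)]
struct Conflict {
    verdict: &'static str,
    score: f32,
}

#[derive(Debug, Default, PartialEq)]
struct Delta {
    new: Vec<String>,
    resolved: Vec<String>,
    changed: Vec<String>,
}

fn diff(baseline: &HashMap<String, Conflict>, current: &HashMap<String, Conflict>) -> Delta {
    let mut d = Delta::default();
    for (path, cur) in current {
        match baseline.get(path) {
            None => d.new.push(path.clone()),
            // Same concept, different score or verdict.
            Some(old) if old != cur => d.changed.push(path.clone()),
            Some(_) => {}
        }
    }
    for path in baseline.keys() {
        if !current.contains_key(path) {
            d.resolved.push(path.clone());
        }
    }
    // HashMap order is unspecified; sort for stable output.
    d.new.sort();
    d.resolved.sort();
    d.changed.sort();
    d
}
```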

Agent Keypair

Sentinel signs assertions with a per-project Ed25519 keypair stored in .sentinel/agent.key. Generated on first sentinel scan if it doesn't exist.

The keypair identifies "Sentinel scanning project X" as a distinct agent in Episteme's trust system. Multiple projects have different keypairs. This enables:

Per-project audit trails ("which Sentinel agent found this?")
TrustRank per Sentinel instance (a well-calibrated Sentinel gains reputation)
Distinguishing human-authored assertions from Sentinel-extracted ones

Episteme Instance

Local Mode (Default)

Sentinel ships with an embedded Episteme instance. No server needed. The database lives at ~/.sentinel/db/ (configurable). Multiple projects share the same local instance — their assertions are namespaced by ConceptPath (code://rust/citadeldb/... vs code://go/other-project/...).

The authoritative corpus (RFCs, OWASP) is also in the local instance. sentinel init bootstraps it.

$ sentinel init
Downloading RFC corpus (auth, crypto, TLS) ... 127 assertions ingested.
Downloading OWASP cheat sheets ... 89 assertions ingested.
Ready. Run `sentinel scan <project>` to begin.

Remote Mode (Future)

[episteme]
url = "https://episteme.example.com"
api_key = "${SENTINEL_API_KEY}"

In remote mode, Sentinel ingests into and queries from a shared Episteme instance. This enables:

Cross-project conflict detection ("same JWT misconfiguration in 12 repos")
Shared authoritative corpus (ingested once, used by all Sentinel agents)
Centralized acknowledgment management

Exit Codes

Code	Meaning
0	No conflicts above threshold
1	FLAG-level conflicts found (with `--exit-code`)
2	BLOCK-level conflicts found (with `--exit-code`)
3	Scan error (file access, Episteme connection, etc.)

--exit-code enables non-zero exits. Without it, Sentinel always exits 0 (for interactive use where the report is the output, not the exit code).
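A sketch of the exit-status rules, with three assumptions the table leaves implicit: the most severe finding wins when both FLAG and BLOCK are present, acknowledged (ACK) conflicts do not cause a non-zero exit, and a scan error returns 3 even without --exit-code, since the table does not mark code 3 as requiring the flag. The function is hypothetical.

```rust
/// Exit status from the reported verdicts ("BLOCK", "FLAG", "ACK", ...).
fn exit_code(verdicts: &[&str], scan_failed: bool, exit_code_flag: bool) -> i32 {
    if scan_failed {
        return 3;
    }
    if !exit_code_flag {
        return 0; // interactive use: the report is the output
    }
    if verdicts.contains(&"BLOCK") {
        2
    } else if verdicts.contains(&"FLAG") {
        1
    } else {
        0
    }
}
```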

Performance Targets

Metric	Target
Scan time, 1000-file Rust project	< 5 seconds
Scan time, 10000-file monorepo	< 30 seconds
Per-file extraction (all extractors)	< 5ms
Conflict query per claim	< 10ms
Local Episteme startup	< 100ms
Memory usage during scan	< 256MB

The performance bottleneck is expected to be I/O (reading files), not extraction (regex matching). The conflict query is a local KV lookup, not a network call.

Dependencies

Dependency	Purpose
`clap`	CLI argument parsing
`ignore`	File walking (respects .gitignore, fast)
`regex`	Pattern matching in extractors
`serde` + `serde_json`	Config parsing, JSON output
`toml`	sentinel.toml parsing
`comfy-table`	Terminal table output
`stemedb-core`	Types
`stemedb-storage`	Local KV store
`stemedb-ingest`	Assertion ingestion
`stemedb-query`	Conflict queries
`ed25519-dalek`	Agent keypair + signing
`blake3`	Content hashing
`tokio`	Async runtime for KV store operations

No LLM dependency. No network dependency (in local mode). No runtime other than tokio (for async KV store operations).
