feat: Phase 6 UAT - Admission control, HLC recency, cluster coordination

This commit includes comprehensive work on Phase 6 features:

## Admission Control (Phase 7A admission middleware)
- AdmissionStore implementation backed by TrustRankStore
- PoW verification with tier-based difficulty computation
- Trust tier progression (Untrusted → Limited → Verified → Trusted → Authority)
- API integration with admission status endpoints

## HLC Recency Lens (Phase 6C)
- HlcRecencyLens for distributed system ordering
- Hybrid logical clock integration with causality preservation

## Cluster Coordination (Phase 6C)
- Multi-node cluster tests (availability, partition tolerance)
- CRDT convergence tests for anti-entropy sync
- Gateway handler improvements

## Aphoria Code Linter (Phase 2A)
- RFC/OWASP corpus builders with network fetching and caching
- Concept hierarchy with auto-alias creation on conflict detection
- Multiple security extractors (TLS, JWT, CORS, secrets, rate limiting)

## Code Organization
- Split large files into modules to comply with the 500-line limit
- Improved test organization with separate test modules
- Fixed rkyv serialization for EigenTrustState (AgentScore struct)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jordan 2026-02-03 00:43:37 -07:00
parent 7ae0adaba4
commit d3a88585fe
108 changed files with 16899 additions and 531 deletions

@@ -133,3 +133,4 @@ open target/llvm-cov/html/index.html
- [Setup Guide](./setup.md)
- [Rust Guidelines](../backend/rust-guidelines.md)
- [UAT Reports](./uat-reports.md)

@@ -0,0 +1,89 @@
# Writing UAT Reports
**When to use:** After completing a phase, feature, or release, when you need to document what was tested and what the outcomes were.
## Prerequisites
- Completed UAT testing session
- Access to test results and logs
- Understanding of what was in scope
## Quick Start
```bash
# Create a UAT report following the template
cp uat/how-to.md uat/{feature}-{date}.md
# Edit with your results
# File naming: uat/phase6-distributed-2026-02-02.md
```
## Report Structure
Every UAT report follows the template in `uat/how-to.md`:
1. **Header** — Date, phase/feature, tester, overall status
2. **Summary** — 1-2 sentences on what was tested
3. **Scope** — What was and wasn't tested
4. **Environment** — Rust version, OS, commit
5. **Test Results** — Tables with Expected/Actual/Status
6. **Issues Found** — Severity, status, description
7. **Fixes Applied** — Changes made during UAT
8. **Recommendations** — Future improvements
9. **Sign-Off** — Checklist for release readiness
## File Naming
```
uat/{feature-or-phase}-{YYYY-MM-DD}.md
```
Examples:
- `uat/phase6-distributed-2026-02-02.md`
- `uat/skeptic-endpoint-2025-12-15.md`
- `uat/go-sdk-v2-2026-01-20.md`
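To keep names consistent, you can generate the pattern instead of typing it by hand. Below is a minimal Rust sketch. `report_path` is a hypothetical helper, not something in the repo, and the slug normalization (lowercase, spaces to hyphens) is an assumption.

```rust
/// Hypothetical helper: builds `uat/{feature-or-phase}-{YYYY-MM-DD}.md`.
fn report_path(feature: &str, year: u16, month: u8, day: u8) -> String {
    // Normalize the slug: trim, lowercase, and turn spaces into hyphens.
    let slug = feature.trim().to_lowercase().replace(' ', "-");
    format!("uat/{slug}-{year:04}-{month:02}-{day:02}.md")
}

fn main() {
    println!("{}", report_path("phase6-distributed", 2026, 2, 2));
}
```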
## Test Result Tables
Use consistent formatting:
```markdown
| Test | Expected | Actual | Status |
|------|----------|--------|--------|
| Build | Compiles | Compiled in 36s | PASS |
| Health endpoint | 200 OK | 200 OK | PASS |
```
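If results are collected programmatically, a small renderer keeps the table format identical across reports. In the hypothetical Rust sketch below, `TestResult` and `render_results` are illustrative names, and it emits the same four columns. The `FAIL` label is assumed, since the example above only shows `PASS`.

```rust
/// One row of a UAT results table (hypothetical type for illustration).
struct TestResult<'a> {
    test: &'a str,
    expected: &'a str,
    actual: &'a str,
    passed: bool,
}

/// Renders rows in the Test | Expected | Actual | Status layout.
fn render_results(rows: &[TestResult<'_>]) -> String {
    let mut out = String::from("| Test | Expected | Actual | Status |\n");
    out.push_str("|------|----------|--------|--------|\n");
    for row in rows {
        // "FAIL" is an assumed label; the template only shows PASS.
        let status = if row.passed { "PASS" } else { "FAIL" };
        out.push_str(&format!(
            "| {} | {} | {} | {} |\n",
            row.test, row.expected, row.actual, status
        ));
    }
    out
}

fn main() {
    let rows = [
        TestResult { test: "Build", expected: "Compiles", actual: "Compiled in 36s", passed: true },
        TestResult { test: "Health endpoint", expected: "200 OK", actual: "200 OK", passed: true },
    ];
    print!("{}", render_results(&rows));
}
```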
## Issue Severity Levels
| Severity | Meaning |
|----------|---------|
| Critical | Blocks release, data loss risk |
| High | Major functionality broken |
| Medium | Functionality degraded but workaround exists |
| Low | Minor issue, cosmetic, or edge case |
## Sign-Off Checklist
Before marking a UAT complete:
- [ ] All critical tests pass
- [ ] No blocking issues remain
- [ ] Documentation updated
- [ ] Ready for release
## Troubleshooting
### Test results are inconsistent
Check the environment section — version mismatches cause false failures. Re-run in a clean environment.
### Can't reproduce an issue
Document what you tried and mark the issue as "intermittent", listing the reproduction steps you attempted.
## Related
- [Testing Guide](./testing.md)
- [Quality Checks](./quality-checks.md)
- [UAT Template](../../../uat/how-to.md)

@@ -32,6 +32,8 @@ A probabilistic knowledge graph database that stores Claims, not Facts. Append-o
| **Integrate with AI tools** | [.claude/guides/integrations/ai-coding-assistant-integration.md](.claude/guides/integrations/ai-coding-assistant-integration.md) |
| **ADK-Go + Episteme** | [.claude/guides/integrations/adk-go-episteme.md](.claude/guides/integrations/adk-go-episteme.md) |
| **Distributed architecture** | [docs/research/distributed-write-path.md](docs/research/distributed-write-path.md) |
| **Write UAT reports** | [.claude/guides/local/uat-reports.md](.claude/guides/local/uat-reports.md) |
| **Phase 6 UAT results** | [ai-lookup/features/phase6-uat.md](ai-lookup/features/phase6-uat.md) |
## Critical Rules

@@ -0,0 +1,171 @@
# Admission Control (The Shield)
Phase 7A introduces graduated proof-of-work admission control to defend against spam, Sybil attacks, and knowledge poisoning when Episteme is open to millions of agents.
## Overview
New or untrusted agents must solve BLAKE3-based proof-of-work puzzles before their assertions are accepted. As agents demonstrate good behavior, they graduate to higher trust tiers with reduced PoW requirements and increased quotas.
## Key Concepts
### Trust Tiers
| Trust Range | Tier | Quota Multiplier | PoW Required |
|-------------|------------|------------------|--------------|
| 0.0-0.3 | Untrusted | 0.1x (1,000/hr) | Yes |
| 0.3-0.5 | Limited | 0.5x (5,000/hr) | Yes |
| 0.5-0.7 | Verified | 1.0x (10,000/hr) | No |
| 0.7-0.9 | Trusted | 2.0x (20,000/hr) | No |
| 0.9-1.0 | Authority | 10.0x (100k/hr) | No |
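The tier lookup can be sketched as a plain match on the score. This illustrates the table and is not the `stemedb-core` implementation. Adjacent ranges share endpoints (0.3 appears in two rows), so the sketch assumes half-open ranges `[low, high)`, with 1.0 mapped to Authority. It derives the hourly quota from the 10,000/hr base implied by the 1.0x row.

```rust
/// Illustrative mirror of the tier table, not the actual `stemedb-core` code.
/// Assumes half-open ranges [low, high), with a score of 1.0 counted as Authority.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrustTier {
    Untrusted,
    Limited,
    Verified,
    Trusted,
    Authority,
}

/// Base hourly quota implied by the 1.0x row (10,000/hr).
const BASE_QUOTA_PER_HOUR: f64 = 10_000.0;

impl TrustTier {
    fn from_score(score: f64) -> Self {
        match score {
            // Treat NaN as untrusted rather than letting it fall through to Authority.
            s if s.is_nan() || s < 0.3 => TrustTier::Untrusted,
            s if s < 0.5 => TrustTier::Limited,
            s if s < 0.7 => TrustTier::Verified,
            s if s < 0.9 => TrustTier::Trusted,
            _ => TrustTier::Authority,
        }
    }

    fn quota_multiplier(self) -> f64 {
        match self {
            TrustTier::Untrusted => 0.1,
            TrustTier::Limited => 0.5,
            TrustTier::Verified => 1.0,
            TrustTier::Trusted => 2.0,
            TrustTier::Authority => 10.0,
        }
    }

    fn requires_pow(self) -> bool {
        matches!(self, TrustTier::Untrusted | TrustTier::Limited)
    }

    fn effective_quota_per_hour(self) -> u64 {
        (BASE_QUOTA_PER_HOUR * self.quota_multiplier()).round() as u64
    }
}

fn main() {
    let tier = TrustTier::from_score(0.55);
    println!(
        "{tier:?}: {}/hr, PoW required: {}",
        tier.effective_quota_per_hour(),
        tier.requires_pow()
    );
}
```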
### PoW Graduation
| Assertions | Trust Score | Difficulty | Approximate Effort |
|------------|-------------|------------|-------------------|
| 0-9 | < 0.6 | 16 bits | ~16 seconds |
| 10-49 | < 0.6 | 1 bit | Trivial |
| 50+ | any | 0 bits | Exempt |
| any | >= 0.6 | 0 bits | Exempt |
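The graduation table reads as a three-step schedule. The sketch below encodes it literally, using the documented defaults (16 bits, 1 bit, 10 and 50 assertions, trust 0.6), and its field names are hypothetical. This document doesn't spell out how the schedule combines with the tier-level `requires_pow()` check, so the sketch models only the table. Each extra bit doubles the expected work: a target of d leading zero bits takes about 2^d hash attempts on average. At 16 bits that is roughly 65,536 attempts; at 1 bit, about 2.

```rust
/// Illustrative schedule that encodes the graduation table literally.
/// Field names are hypothetical; values are the documented defaults.
struct Graduation {
    initial_bits: u32,    // 0-9 assertions
    reduced_bits: u32,    // 10-49 assertions
    reduced_after: u64,   // assertions needed for reduced difficulty
    exempt_after: u64,    // assertions needed for exemption
    trust_exemption: f64, // trust score that exempts an agent outright
}

impl Default for Graduation {
    fn default() -> Self {
        Graduation {
            initial_bits: 16,
            reduced_bits: 1,
            reduced_after: 10,
            exempt_after: 50,
            trust_exemption: 0.6,
        }
    }
}

impl Graduation {
    /// Required leading zero bits for an agent with this history.
    fn difficulty(&self, assertions: u64, trust: f64) -> u32 {
        if trust >= self.trust_exemption || assertions >= self.exempt_after {
            0
        } else if assertions >= self.reduced_after {
            self.reduced_bits
        } else {
            self.initial_bits
        }
    }
}

fn main() {
    let g = Graduation::default();
    for (assertions, trust) in [(3, 0.5), (42, 0.3), (60, 0.1), (5, 0.8)] {
        println!("{assertions} assertions, trust {trust}: {} bits", g.difficulty(assertions, trust));
    }
}
```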
## HTTP API
### GET /v1/admission/status
Get admission status for an agent.
**Query Parameters:**
- `agent_id` (required): Agent's Ed25519 public key (hex, 64 chars)
**Response:**
```json
{
"agent_id": "abc123...",
"tier": "Verified",
"trust_score": 0.55,
"assertions_count": 42,
"pow_difficulty": 0,
"pow_required": false,
"base_quota_limit": 10000,
"effective_quota_limit": 10000,
"quota_multiplier": 1.0,
"assertions_until_reduced_difficulty": null,
"assertions_until_exemption": null
}
```
### Request Flow
1. Agent submits assertion with `X-Agent-Id` header
2. AdmissionLayer checks trust tier and assertion count
3. If PoW required:
   - Server returns HTTP 428 (Precondition Required) with the required difficulty in the response body
   - Agent finds a nonce such that `BLAKE3(nonce || agent_id || timestamp)` has at least the required number of leading zero bits
   - Agent resubmits with `X-PoW-Nonce` and `X-PoW-Timestamp` headers
4. Server verifies the PoW and the request proceeds to MeterLayer
5. On success, the agent's assertion count is incremented
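You can sketch the puzzle in step 3 with the standard library alone. Two assumptions here are not confirmed by the source: the nonce and timestamp are encoded as little-endian `u64` values, and difficulty counts leading zero bits of the 32-byte digest. The hash is passed in as a function. Real code would plug in BLAKE3 (for example `|b| *blake3::hash(b).as_bytes()` with the `blake3` crate). The demo instead uses a non-cryptographic `DefaultHasher` stand-in so it runs without dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Counts leading zero bits across a 32-byte digest.
fn leading_zero_bits(digest: &[u8; 32]) -> u32 {
    let mut zeros = 0;
    for &byte in digest {
        if byte == 0 {
            zeros += 8;
        } else {
            zeros += byte.leading_zeros();
            break;
        }
    }
    zeros
}

/// Builds the preimage nonce || agent_id || timestamp.
/// Assumption: both u64 fields are encoded little-endian.
fn preimage(nonce: u64, agent_id: &[u8; 32], timestamp: u64) -> Vec<u8> {
    let mut buf = Vec::with_capacity(48);
    buf.extend_from_slice(&nonce.to_le_bytes());
    buf.extend_from_slice(agent_id);
    buf.extend_from_slice(&timestamp.to_le_bytes());
    buf
}

/// Verification costs a single hash.
fn verify(hash: &dyn Fn(&[u8]) -> [u8; 32], nonce: u64, agent_id: &[u8; 32], timestamp: u64, difficulty: u32) -> bool {
    let input = preimage(nonce, agent_id, timestamp);
    leading_zero_bits(&hash(&input[..])) >= difficulty
}

/// Solving is a linear nonce search: about 2^difficulty hashes on average.
fn solve(hash: &dyn Fn(&[u8]) -> [u8; 32], agent_id: &[u8; 32], timestamp: u64, difficulty: u32) -> u64 {
    (0u64..)
        .find(|&nonce| verify(hash, nonce, agent_id, timestamp, difficulty))
        .expect("nonce space exhausted")
}

/// NOT cryptographic: a std-only stand-in so the demo runs without the blake3 crate.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        h.write_usize(i);
        h.write(data);
        chunk.copy_from_slice(&h.finish().to_be_bytes());
    }
    out
}

fn main() {
    let agent_id = [7u8; 32]; // placeholder key bytes
    let timestamp = 1_770_000_000; // example Unix seconds
    let nonce = solve(&toy_hash, &agent_id, timestamp, 12);
    assert!(verify(&toy_hash, nonce, &agent_id, timestamp, 12));
    println!("found nonce {nonce}");
}
```

The preimage includes `agent_id`, so a nonce found for one key will generally not verify for another. This is the agent-binding property listed under Security Properties.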
### HTTP 428 Response (PoW Required)
```json
{
"error": "Proof-of-Work required",
"code": "POW_REQUIRED",
"required_difficulty": 16,
"pow_required": true,
"agent_assertions": 3,
"agent_trust_score": 0.5
}
```
## Headers
### Request Headers
| Header | Description |
|--------|-------------|
| `X-Agent-Id` | Agent's Ed25519 public key (hex, 64 chars) |
| `X-PoW-Nonce` | PoW solution nonce (decimal) |
| `X-PoW-Timestamp` | PoW solution timestamp (Unix seconds) |
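On the server side, the three request headers need strict parsing before any PoW check. The hypothetical sketch below rejects anything other than exactly 64 hex characters. `parse_agent_id` and `parse_u64_header` are illustrative names, and treating the nonce and timestamp as `u64` is an assumption.

```rust
/// Parses an `X-Agent-Id` value (64 hex chars) into the 32-byte Ed25519 public key.
/// Hypothetical helper; a real service might use the `hex` crate instead.
fn parse_agent_id(value: &str) -> Result<[u8; 32], String> {
    let s = value.trim();
    // Validate up front so the byte slicing below cannot split a UTF-8 char.
    if s.len() != 64 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("X-Agent-Id must be 64 hex chars, got {s:?}"));
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).map_err(|e| e.to_string())?;
    }
    Ok(key)
}

/// Parses the decimal `X-PoW-Nonce` / `X-PoW-Timestamp` values (assumed to fit in u64).
fn parse_u64_header(name: &str, value: &str) -> Result<u64, String> {
    value.trim().parse::<u64>().map_err(|e| format!("{name}: {e}"))
}

fn main() {
    let id = "ab".repeat(32);
    println!("{:?}", parse_agent_id(&id).map(|k| k[0]));
    println!("{:?}", parse_u64_header("X-PoW-Nonce", "48213"));
}
```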
### Response Headers
| Header | Description |
|--------|-------------|
| `X-Trust-Tier` | Agent's trust tier name |
| `X-PoW-Required` | "true" or "false" |
| `X-PoW-Difficulty` | Required difficulty in bits |
| `X-Quota-Multiplier` | Tier quota multiplier |
## Implementation Details
### Core Types
**TrustTier** (`stemedb-core/src/types/trust_tier.rs`):
- Enum with 5 tiers: Untrusted, Limited, Verified, Trusted, Authority
- Methods: `from_score()`, `quota_multiplier()`, `requires_pow()`
**PowProof** (`stemedb-core/src/types/pow.rs`):
- Struct with `nonce`, `agent_id`, `timestamp`
- Methods: `verify()`, `compute_hash()`, `leading_zeros()`, `solve()`
**AdmissionConfig** (`stemedb-core/src/types/pow.rs`):
- Configurable thresholds and difficulties
- Defaults: 16-bit initial difficulty, reduced difficulty after 10 assertions, exemption after 50 assertions
### Storage
**AdmissionStore** (`stemedb-storage/src/admission_store/`):
- Wraps TrustRankStore (reuses existing trust score + assertion count)
- No new storage keys needed
- Methods: `get_admission_status()`, `verify_pow()`, `record_assertion()`
### Middleware
**AdmissionLayer** (`stemedb-api/src/middleware/admission.rs`):
- Tower middleware applied before MeterLayer
- Extracts PoW headers, verifies proofs, returns 428 on failure
- Bypasses health checks and admission status endpoint
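The middleware's decision can be separated from the Tower plumbing and sketched as a pure function. The names are hypothetical. `/v1/admission/status` comes from this document, but `/health` is an assumed health-check path, and `proof_ok` stands for a verification result computed elsewhere. The error arm shows the fail-open behavior listed under Security Properties.

```rust
/// Outcome of the admission check (hypothetical names).
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    /// Would be rendered as HTTP 428 with `required_difficulty` in the body.
    PowRequired { difficulty: u32 },
}

/// `/v1/admission/status` is documented; `/health` is an assumed health-check path.
fn bypasses_admission(path: &str) -> bool {
    matches!(path, "/health" | "/v1/admission/status")
}

/// `required` is the store lookup of the agent's difficulty in bits;
/// `proof_ok` says whether the request carried a proof that verified at that difficulty.
fn decide(path: &str, required: Result<u32, String>, proof_ok: bool) -> Decision {
    if bypasses_admission(path) {
        return Decision::Allow;
    }
    match required {
        // Fail open: a store error lets the request through (availability > strictness).
        Err(err) => {
            eprintln!("admission store error, failing open: {err}");
            Decision::Allow
        }
        Ok(0) => Decision::Allow,
        Ok(_) if proof_ok => Decision::Allow,
        Ok(difficulty) => Decision::PowRequired { difficulty },
    }
}

fn main() {
    // "/v1/assertions" is a hypothetical write path used only for illustration.
    println!("{:?}", decide("/v1/assertions", Ok(16), false));
    println!("{:?}", decide("/v1/assertions", Err("store unavailable".into()), false));
}
```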
## Client Implementation Guide
To solve a PoW puzzle:
```rust
use std::time::{SystemTime, UNIX_EPOCH};
use stemedb_core::types::PowProof;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Get your agent's Ed25519 public key (32 bytes)
    let agent_id: [u8; 32] = todo!("load your agent's Ed25519 public key");
    let timestamp = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs();
    let difficulty = 16; // `required_difficulty` from the 428 response

    // Brute-force search for a valid nonce
    let proof = PowProof::solve(agent_id, timestamp, difficulty);

    // Resubmit the assertion with these headers:
    // X-Agent-Id: {hex(agent_id)}
    // X-PoW-Nonce: {proof.nonce}
    // X-PoW-Timestamp: {proof.timestamp}
    Ok(())
}
```
## Router Functions
Three router variants are available:
1. `create_router()` - No admission control or metering
2. `create_router_with_meter()` - Metering only (quotas)
3. `create_router_with_admission()` - Full protection (PoW + quotas)
## Security Properties
- **Replay Prevention**: Proofs expire after 5 minutes
- **Agent Binding**: Proof includes agent_id, cannot be reused by others
- **Asymmetric Cost**: O(2^difficulty) to solve, O(1) to verify
- **Fail Open**: On store errors, requests are allowed (availability > strictness)
- **Defense in Depth**: API layer primary, ingestion layer secondary
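The 5-minute expiry amounts to a timestamp window check, sketched below. The 30-second allowance for clocks running ahead is an assumption, not a documented value. A window alone narrows replay rather than preventing it. Within the 5 minutes, the same proof could be resubmitted unless the server also remembers used nonces, and this document does not say whether it does.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Documented: proofs expire after 5 minutes.
const MAX_PROOF_AGE_SECS: u64 = 5 * 60;
/// Assumption: small tolerance for clients whose clocks run slightly ahead.
const MAX_CLOCK_SKEW_SECS: u64 = 30;

/// True if a proof timestamp (Unix seconds) is still inside the acceptance window.
fn is_fresh(proof_ts: u64, now: u64) -> bool {
    if proof_ts > now {
        proof_ts - now <= MAX_CLOCK_SKEW_SECS
    } else {
        now - proof_ts <= MAX_PROOF_AGE_SECS
    }
}

fn main() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs();
    println!("1 minute old: {}", is_fresh(now - 60, now));
    println!("6 minutes old: {}", is_fresh(now - 360, now));
}
```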
## Future: Phase 7B (EigenTrust)
Phase 7B will build on this foundation:
- Peer-to-peer trust propagation (trust agents you trust)
- Network-wide reputation scores
- Dynamic tier adjustments based on global consensus

@@ -0,0 +1,44 @@
# Phase 6 UAT Results
**Last Updated:** 2026-02-02
**Confidence:** High
## Summary
Full user acceptance testing of Phase 6 (The Mesh — Distributed Writes). All 67 Phase 6 tests pass, existing functionality is verified, and the new cluster node binary works correctly. Two issues (one Low, one Medium) were found and fixed during testing.
**Key Facts:**
- All Phase 6 test suites run: merkle (16), rpc (5), sync (10), cluster (28), battery11 (8)
- New `stemedb-node` binary demonstrates cluster routing
- Go SDK examples (basic, conflict, skeptic) all pass
- Single-node API and Swagger UI verified working
**File Pointer:** `uat/phase6-distributed-2026-02-02.md`
## Test Coverage
| Area | Tests | Status |
|------|-------|--------|
| stemedb-merkle | 16 | PASS |
| stemedb-rpc | 5 | PASS |
| stemedb-sync | 10 | PASS |
| stemedb-cluster | 28 | PASS |
| battery11 replication | 8 | PASS |
| Validation script | 5 checks | PASS |
| Go SDK examples | 3 | PASS |
## Issues Found & Fixed
1. **swim.rs doc comment** (Low) — Missing blank line, fixed
2. **Health endpoint** (Medium) — Bootstrap node reported unhealthy, fixed to check `joined` only
## New Artifacts
- Binary: `crates/stemedb-cluster/src/bin/node.rs`
- Updated: `quickstart.md` Section 8 (Distributed Mode)
- Report: `uat/phase6-distributed-2026-02-02.md`
## Related Topics
- [Simulation](./simulation.md)
- [Cluster Node](../services/cluster.md) (if it exists)

@@ -34,6 +34,7 @@ Token-efficient fact storage for StemeDB. Query these for quick context without
| Query Audit | `features/query-audit.md` | High | 2026-01-31 | Trace agent decisions for debugging |
| TrustRank | `features/trust-rank.md` | High | 2026-01-31 | Agent reputation system with learning loop |
| Simulation | `features/simulation.md` | High | 2026-01-31 | Agent-based modeling for validation |
| Phase 6 UAT | `features/phase6-uat.md` | High | 2026-02-02 | Distributed writes UAT results and fixes |
## Use Cases

@@ -35,6 +35,7 @@ stemedb-core = { path = "../../crates/stemedb-core" }
stemedb-storage = { path = "../../crates/stemedb-storage" }
stemedb-ingest = { path = "../../crates/stemedb-ingest" }
stemedb-query = { path = "../../crates/stemedb-query" }
stemedb-wal = { path = "../../crates/stemedb-wal" }
# CLI
clap = { version = "4.5", features = ["derive"] }
@@ -75,5 +76,8 @@ tracing-subscriber = "0.3"
rkyv = { version = "0.7", features = ["validation"] }
bytecheck = "0.6"
# HTTP client for RFC/OWASP fetching
ureq = { version = "2.9", features = ["tls"] }
[dev-dependencies]
tempfile = "3.10"

@@ -2,308 +2,283 @@
---
## Phase 0: StemeDB Foundation
> **Tracked in:** [roadmap.md § 5D. Concept Hierarchy](../../roadmap.md)
Changes to the core database that Aphoria depends on. These ship before the CLI and are tracked in the main StemeDB roadmap as **Phase 5D**. Changes to the core database that Aphoria depends on. Shipped as **Phase 5D** of the main StemeDB roadmap.
| Aphoria Phase 0 | StemeDB Phase 5D | Status |
|-----------------|------------------|--------|
| 0.1 ConceptPath Type | 5D.1 ConceptPath Type | |
| 0.2 ConceptPath in Assertion | (implicit in 5D.1) | |
| 0.3 Hierarchical Index | 5D.4 Hierarchical Query | |
| 0.4 Alias Store | 5D.3 Alias Store + 5D.5 Alias Resolution | |
| 0.5 Source Class Inference | 5D.6 Source Class Inference | |
| 0.6 Concept API Endpoints | 5D.7 Concept API Endpoints | |
**Spec:** [docs/specs/concept-hierarchy.md](../../docs/specs/concept-hierarchy.md)
---
## Phase 1: Authoritative Corpus ## Phase 2: CLI Core ✅
Before Aphoria can find conflicts, Episteme needs the authoritative sources to conflict against. > Phase 2 was built before Phase 1 (authoritative corpus expansion). The CLI pipeline works end-to-end with a bootstrapped corpus of 11 hardcoded assertions covering TLS, JWT, CORS, secrets, and rate limiting.
### 1.1 RFC Ingester | Task | Status |
|------|--------|
| 2.1 Project Walker | ✅ `walker/mod.rs`, `walker/path_mapper.rs`, `walker/language.rs` |
| 2.2 Extractors (7) | ✅ `tls_verify`, `jwt_config`, `hardcoded_secrets`, `timeout_config`, `dep_versions`, `cors_config`, `rate_limit` |
| 2.3 Ingestion Bridge | ✅ `bridge.rs` — BLAKE3 hashing, Ed25519 signing, claim→assertion conversion |
| 2.4 Conflict Query | ✅ `episteme.rs` — LocalEpisteme with check_conflicts() |
| 2.5 Report Output | ✅ `report/` — table (comfy-table), JSON, SARIF 2.1.0, markdown |
| 2.6 Acknowledge Command | ✅ `lib.rs` acknowledge() |
| Baseline & Diff | ✅ `lib.rs` set_baseline(), show_diff() |
| Status Command | ✅ `lib.rs` show_status() |
A CLI tool (or ingestion module) that: 118 tests pass. Clippy and fmt clean.
- Fetches RFC text from `rfc-editor.org` (text format, no PDF parsing needed)
- Extracts normative statements (MUST, MUST NOT, SHOULD, SHALL per RFC 2119)
- Maps each statement to a ConceptPath: `rfc://{number}/{topic}/{claim}`
- Ingests as Tier 0 assertions
Start with a curated list of security-relevant RFCs:
| RFC | Topic |
|-----|-------|
| 7519 | JWT |
| 6749 | OAuth 2.0 |
| 6750 | Bearer tokens |
| 8446 | TLS 1.3 |
| 7525 | TLS best practices |
| 6238 | TOTP |
| 7617 | HTTP Basic Auth |
| 9110 | HTTP Semantics |
### 1.2 OWASP Ingester
Parse OWASP Cheat Sheets (markdown source on GitHub):
- Extract each recommendation as a claim
- Map to `owasp://cheatsheet/{topic}/{claim}`
- Ingest as Tier 1 assertions
Priority cheat sheets: Authentication, JWT, TLS, Secrets Management, Input Validation, Session Management.
### 1.3 Vendor Docs (Manual Bootstrap)
For v1, manually curate a small set of vendor doc claims:
- Postgres connection pool recommendations
- Redis timeout defaults
- Common HTTP client library defaults (reqwest, hyper, net/http)
These are `vendor://{product}/{topic}/{claim}` at Tier 2.
This doesn't need to be exhaustive. It needs to cover the claims that Aphoria's extractors will actually find in code.
---
## Phase 2: CLI Core ## Phase 2A: Concept Matching ✅
The Aphoria binary itself. > **Status:** Complete. Tail-path matching (2A.1), alias-aware queries (2A.2), and auto-alias creation (2A.3) all implemented.
### 2.1 Project Walker ### 2A.1 Leaf-Based Concept Matching (Aphoria-side fix) ✅
Input: a project root path. Implemented in `episteme.rs` via `ConceptIndex`:
Output: a list of files to scan, each tagged with: - `make_key(subject, predicate)` extracts tail 2 path segments + predicate
- Language (rust, go, python, typescript, yaml, toml, json) - `build(assertions)` creates in-memory index keyed by tail path
- ConceptPath prefix derived from directory structure - `lookup(subject, predicate)` finds matching authoritative assertions
- `check_conflicts()` uses `ConceptIndex` instead of `QueryEngine` for cross-scheme matching
``` Integration tests prove TLS and JWT conflicts are detected correctly.
crates/citadeldb/src/auth/jwt.rs
→ language: rust
→ prefix: code://rust/citadeldb/auth/jwt
```
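Tail-path matching can be illustrated with a small in-memory index. This sketch mirrors the described `make_key`/`build`/`lookup` trio. Its signatures, the `#` separator, and the choice to strip the scheme before taking the last two segments are all assumptions, not the `episteme.rs` code.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of `ConceptIndex::make_key`: last two path segments + predicate.
/// The `#` separator is an assumption.
fn make_key(subject: &str, predicate: &str) -> String {
    // Drop the scheme (`code://`, `rfc://`, ...) so different schemes can match.
    let path = subject.split_once("://").map_or(subject, |(_, rest)| rest);
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let tail = &segments[segments.len().saturating_sub(2)..];
    format!("{}#{}", tail.join("/"), predicate)
}

/// In-memory index of authoritative subjects keyed by tail path.
struct ConceptIndex {
    by_key: HashMap<String, Vec<String>>,
}

impl ConceptIndex {
    fn build<'a>(authoritative: impl IntoIterator<Item = (&'a str, &'a str)>) -> Self {
        let mut by_key: HashMap<String, Vec<String>> = HashMap::new();
        for (subject, predicate) in authoritative {
            by_key
                .entry(make_key(subject, predicate))
                .or_default()
                .push(subject.to_string());
        }
        ConceptIndex { by_key }
    }

    fn lookup(&self, subject: &str, predicate: &str) -> &[String] {
        self.by_key
            .get(&make_key(subject, predicate))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}

fn main() {
    let index = ConceptIndex::build([
        ("rfc://5246/tls/cert_verification", "enabled"),
        ("rfc://7519/jwt/audience_validation", "enabled"),
    ]);
    println!("{:?}", index.lookup("code://rust/myapp/tls/cert_verification", "enabled"));
}
```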
Normalization rules: ### 2A.2 Alias Resolution in QueryEngine (StemeDB-side fix) ✅
- Strip `src/`, `lib/`, `pkg/`, `internal/` (language boilerplate)
- Strip `crates/`, `packages/`, `apps/` (monorepo wrappers)
- Map `config/`, `deploy/`, `infra/` to `code://config/{project}/...`
- File extension determines language, not directory
### 2.2 Extractors Wired `AliasStore` into `QueryEngine.execute()`:
- Added `resolve_aliases: bool` field to `Query` (defaults to `false`)
- Added `alias_store: Option<Arc<dyn AliasStore>>` to `QueryEngine`
- Added `.with_alias_store()` builder method
- When `resolve_aliases: true`, expands subject via `AliasStore.resolve_all()` before index lookup
- Added `fetch_by_subjects()` and `fetch_by_subjects_predicate()` for multi-subject deduplication
- Modified `Query.matches()` to skip subject filtering when aliases are resolved
- Skips fast path (MV lookup) when `resolve_aliases: true`
- Gracefully degrades when no alias store is configured
Each extractor is a module that: 7 unit tests in `engine/tests/alias_resolution.rs`. This is the architecturally correct long-term fix that complements leaf matching.
- Takes a file path + content + language
- Returns a `Vec<ExtractedClaim>`
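Alias-aware lookup comes down to expanding one subject into several and deduplicating the merged results. The sketch below uses plain `HashMap`s as hypothetical stand-ins for `AliasStore` and the subject index. It assumes one-hop alias expansion and integer assertion ids; this document specifies neither.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for `AliasStore::resolve_all()`: the subject plus its
/// registered aliases (one hop only; whether the real store is transitive is not stated).
fn resolve_all(aliases: &HashMap<&str, Vec<&str>>, subject: &str) -> Vec<String> {
    let mut subjects = vec![subject.to_string()];
    if let Some(targets) = aliases.get(subject) {
        subjects.extend(targets.iter().map(|t| t.to_string()));
    }
    subjects
}

/// Stand-in for `fetch_by_subjects()`: gathers assertion ids for every subject and
/// deduplicates ids reachable through more than one alias (returned in sorted order).
fn fetch_by_subjects(index: &HashMap<&str, Vec<u32>>, subjects: &[String]) -> Vec<u32> {
    let mut seen = BTreeSet::new();
    for subject in subjects {
        if let Some(ids) = index.get(subject.as_str()) {
            seen.extend(ids.iter().copied());
        }
    }
    seen.into_iter().collect()
}

/// `resolve_aliases` mirrors the `Query` flag; `None` models "no alias store configured".
fn query(
    index: &HashMap<&str, Vec<u32>>,
    alias_store: Option<&HashMap<&str, Vec<&str>>>,
    subject: &str,
    resolve_aliases: bool,
) -> Vec<u32> {
    let subjects = match (resolve_aliases, alias_store) {
        (true, Some(store)) => resolve_all(store, subject),
        // Graceful degradation: fall back to the exact subject.
        _ => vec![subject.to_string()],
    };
    fetch_by_subjects(index, &subjects)
}

fn main() {
    let code = "code://rust/myapp/tls/cert_verification";
    let rfc = "rfc://5246/tls/cert_verification";
    let index = HashMap::from([(code, vec![1]), (rfc, vec![2, 1])]);
    let aliases = HashMap::from([(code, vec![rfc])]);
    println!("{:?}", query(&index, Some(&aliases), code, true)); // prints [1, 2]
}
```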
Ship these extractors in v1: ### 2A.3 Auto-Alias Creation ✅
| Extractor | What it finds | Languages | When Aphoria ingests authoritative assertions and code claims that share leaf names, automatically create aliases:
|-----------|--------------|-----------| - `code://rust/myapp/tls/cert_verification``rfc://5246/tls/cert_verification`
| `tls_verify` | TLS certificate verification disabled | rust, go, python, js/ts | - `code://rust/myapp/auth/jwt/audience_validation``rfc://7519/jwt/audience_validation`
| `jwt_config` | JWT validation settings (aud, exp, alg) | rust, go, python, js/ts |
| `hardcoded_secrets` | Credentials in source (not .env) | all |
| `timeout_config` | HTTP/DB/Redis timeout values | all (config files) |
| `dep_versions` | Known-vulnerable dependency versions | Cargo.toml, go.mod, package.json, requirements.txt |
| `cors_config` | CORS allow-origin settings | rust, go, js/ts |
| `rate_limit` | Rate limiting disabled or unreasonable | rust, go, js/ts |
Extractors use regex + AST patterns, not LLMs. Each extractor declares: This bridges 2A.1 (leaf matching) with 2A.2 (alias resolution) — leaf matching identifies candidates, aliases persist the relationship.
- The patterns it searches for
- The ConceptPath leaf it maps to
- The predicate (e.g., `config_value`, `enabled`, `version`)
- How to extract the ObjectValue from the match
### 2.3 Ingestion Bridge **Implementation:**
- Added `auto_create_aliases: bool` config option to `AliasConfig` (defaults to `true`)
Connect extractor output to the Episteme ingestion pipeline: - Added `AliasOrigin::AutoDetected` variant to `stemedb-core` for tracking auto-created aliases
- Wired `GenericAliasStore` into `LocalEpisteme` for alias persistence
``` - In `check_conflicts()`, when a code claim matches an authoritative claim by leaf, calls `AliasStore.set_alias()` to persist the relationship with `AliasOrigin::AutoDetected`
ExtractedClaim { - Alias creation is idempotent (skips if alias already exists)
path: code://rust/citadeldb/auth/jwt/audience_validation - 4 unit tests verify: alias creation on conflict, no creation when disabled, correct origin, idempotency
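Idempotent alias creation maps naturally onto an entry-style insert, sketched below with a hypothetical in-memory store. Only `AliasOrigin::AutoDetected` and the `auto_create_aliases` flag come from this document. Keying aliases by `(from, to)` is an assumption.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// `AutoDetected` is the documented variant; `Manual` is a hypothetical second origin.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AliasOrigin {
    Manual,
    AutoDetected,
}

/// Hypothetical in-memory alias store keyed by (from, to).
#[derive(Default)]
struct MemAliasStore {
    aliases: HashMap<(String, String), AliasOrigin>,
}

impl MemAliasStore {
    /// Idempotent: returns true only when a new alias was recorded.
    fn set_alias(&mut self, from: &str, to: &str, origin: AliasOrigin) -> bool {
        match self.aliases.entry((from.to_string(), to.to_string())) {
            Entry::Occupied(_) => false,
            Entry::Vacant(slot) => {
                slot.insert(origin);
                true
            }
        }
    }
}

/// Called when a code claim leaf-matches an authoritative claim.
fn on_leaf_match(store: &mut MemAliasStore, auto_create_aliases: bool, code_path: &str, auth_path: &str) -> bool {
    auto_create_aliases && store.set_alias(code_path, auth_path, AliasOrigin::AutoDetected)
}

fn main() {
    let mut store = MemAliasStore::default();
    let code = "code://rust/myapp/tls/cert_verification";
    let rfc = "rfc://5246/tls/cert_verification";
    println!("first: {}", on_leaf_match(&mut store, true, code, rfc)); // true
    println!("again: {}", on_leaf_match(&mut store, true, code, rfc)); // false
    store.set_alias("code://a", "rfc://b", AliasOrigin::Manual);
}
```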
predicate: "enabled"
value: Boolean(false)
source_location: "src/auth/jwt.rs:47"
confidence: 1.0 // regex match, not heuristic
}
Assertion {
subject: ConceptPath::parse("code://rust/citadeldb/auth/jwt/audience_validation")
predicate: "enabled"
object: ObjectValue::Boolean(false)
source_class: SourceClass::Expert // inferred from code:// scheme
source_hash: blake3(file_content)
source_metadata: { "file": "src/auth/jwt.rs", "line": 47 }
confidence: 1.0
lifecycle: LifecycleStage::Approved // code is deployed, it's a fact about the code
}
```
The bridge handles:
- ConceptPath construction from extractor output
- Source hash computation (BLAKE3 of the file at scan time)
- Source metadata encoding (file path, line number, extraction method)
- Signing with the Aphoria agent's keypair
### 2.4 Conflict Query
After ingestion, query Episteme for each extracted concept:
```rust
for claim in extracted_claims {
let results = query_engine.query(Query {
subject: Some(claim.path.to_string()),
resolve_aliases: true,
hierarchical: false,
lens: Some("skeptic"),
..Default::default()
});
if results.conflict_score > threshold {
report.add_conflict(claim, results);
}
}
```
The Skeptic lens returns all claims for the concept across all aliased paths, with a conflict score. If the code claim (Tier 3) contradicts an RFC claim (Tier 0), the conflict score will be high because of the tier spread.
### 2.5 Report Output
```
$ aphoria scan ./citadeldb --format table
┌──────────────────────────────────────────────────────────────────────┐
│ Aphoria Report: citadeldb │
│ Scanned: 142 files │ Claims: 23 │ Conflicts: 3 │
├──────────┬───────────────────────────────────────┬──────────┬───────┤
│ Verdict │ Concept │ Score │ Tier │
├──────────┼───────────────────────────────────────┼──────────┼───────┤
│ BLOCK │ auth/jwt/audience_validation │ 0.92 │ 0↔3 │
│ BLOCK │ net/tls/cert_verification │ 0.87 │ 1↔3 │
│ FLAG │ http/timeout │ 0.54 │ 2↔3 │
└──────────┴───────────────────────────────────────┴──────────┴───────┘
Details:
BLOCK code://rust/citadeldb/auth/jwt/audience_validation
Your code: aud validation disabled (src/auth/jwt.rs:47)
RFC 7519: aud validation MUST be enabled (Tier 0)
Action: Fix or acknowledge with: aphoria ack <path> --reason "..."
BLOCK code://rust/citadeldb/net/tls/cert_verification
Your code: verify = false (src/net/client.rs:23)
OWASP: verification required (Tier 1)
Action: Fix or acknowledge with: aphoria ack <path> --reason "..."
FLAG code://rust/citadeldb/http/timeout
Your code: timeout = 0 (infinite) (config/production.yaml:8)
reqwest: default timeout 30s (Tier 2)
Action: Review recommended
```
Output formats: `table` (default), `json`, `sarif` (for CI integration), `markdown`.
### 2.6 Acknowledge Command
```
$ aphoria ack code://rust/citadeldb/auth/jwt/audience_validation \
--reason "Internal service, no external JWT consumers. Accepted risk per SEC-2024-003."
```
This creates a new Assertion:
- Subject: `internal://decision/citadeldb/auth/jwt/audience_validation`
- Predicate: `deviation_accepted`
- Object: Text with the reason
- SourceClass: Expert (Tier 3)
- Aliased to: `code://rust/citadeldb/auth/jwt/audience_validation`
The conflict still exists in Episteme, but the acknowledgment is recorded. Next scan, the conflict still shows but with context: "Acknowledged by [agent] on [date]: [reason]." The Skeptic lens sees the acknowledgment as an additional claim in the space.
---
## Phase 3: Skill Integration ## Phase 1: Authoritative Corpus Expansion ✅
### 3.1 Claude Code Skill > Expanded from 11 hardcoded assertions to a pluggable corpus system with RFC, OWASP, and Vendor sources.
A `/aphoria` skill that wraps the CLI: ### Architecture
``` ```
/aphoria scan Scan current project, report conflicts ┌─────────────────────────────────────────────────────────────────┐
/aphoria scan --fix Scan and offer to fix each conflict │ aphoria corpus build │
/aphoria ack <path> Acknowledge a conflict with a reason │ │
/aphoria status Show current conflict summary │ ┌──────────────┐ ┌──────────────┐ ┌───────────────────────┐ │
/aphoria diff Show new conflicts since last scan │ │ RFC Ingester │ │ OWASP │ │ Vendor Bootstrapper │ │
│ │ (Tier 0) │ │ Ingester │ │ (Tier 2) │ │
│ │ │ │ (Tier 1) │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └───────────┬───────────┘ │
│ │ │ │ │
│ └─────────────────┼──────────────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ CorpusRegistry │ │
│ └────────┬────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ LocalEpisteme │ │
│ │ ingest_ │ │
│ │ authoritative() │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
``` ```
The skill runs the CLI binary, parses the JSON output, and presents results inline in the Claude Code session. ### 1.1 CorpusBuilder Trait ✅
### 3.2 Agent Pre-Flight Hook | Task | Status |
|------|--------|
| `CorpusBuilder` trait | ✅ `corpus/mod.rs` — name, scheme, default_tier, build, requires_network |
| `CorpusRegistry` | ✅ Manages multiple builders, build_all(), list_builders() |
| `CorpusBuildResult` | ✅ Stats per builder, total assertions, success/fail/skip counts |
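The builder/registry split can be sketched as a collection of trait objects. Method names follow the table, but every signature is an assumption (assertions are reduced to strings), and `Fixed` is a test double, not a real source. The loop shows the offline skip and the continue-on-failure behavior described for `--offline` and graceful degradation.

```rust
/// Hypothetical shape of `CorpusBuilder`: method names follow the table,
/// signatures are assumptions (assertions are simplified to strings).
trait CorpusBuilder {
    fn name(&self) -> &str;
    fn scheme(&self) -> &str;
    fn default_tier(&self) -> u8;
    fn requires_network(&self) -> bool;
    fn build(&self) -> Result<Vec<String>, String>;
}

#[derive(Debug, Default, PartialEq)]
struct CorpusBuildResult {
    total_assertions: usize,
    succeeded: usize,
    failed: usize,
    skipped: usize,
}

struct CorpusRegistry {
    builders: Vec<Box<dyn CorpusBuilder>>,
}

impl CorpusRegistry {
    fn list_builders(&self) -> Vec<&str> {
        self.builders.iter().map(|b| b.name()).collect()
    }

    /// Runs every builder; a failing source is counted and the rest still run.
    fn build_all(&self, offline: bool) -> (Vec<String>, CorpusBuildResult) {
        let mut assertions = Vec::new();
        let mut stats = CorpusBuildResult::default();
        for builder in &self.builders {
            if offline && builder.requires_network() {
                stats.skipped += 1;
                continue;
            }
            match builder.build() {
                Ok(batch) => {
                    stats.succeeded += 1;
                    stats.total_assertions += batch.len();
                    assertions.extend(batch);
                }
                Err(err) => {
                    stats.failed += 1;
                    eprintln!("corpus source {} failed: {err}", builder.name());
                }
            }
        }
        (assertions, stats)
    }
}

/// Test double with canned output.
struct Fixed {
    name: &'static str,
    scheme: &'static str,
    tier: u8,
    network: bool,
    output: Result<Vec<String>, String>,
}

impl CorpusBuilder for Fixed {
    fn name(&self) -> &str { self.name }
    fn scheme(&self) -> &str { self.scheme }
    fn default_tier(&self) -> u8 { self.tier }
    fn requires_network(&self) -> bool { self.network }
    fn build(&self) -> Result<Vec<String>, String> { self.output.clone() }
}

fn main() {
    let registry = CorpusRegistry {
        builders: vec![
            Box::new(Fixed { name: "rfc", scheme: "rfc", tier: 0, network: true, output: Ok(vec!["rfc claim".into()]) }),
            Box::new(Fixed { name: "vendor", scheme: "vendor", tier: 2, network: false, output: Ok(vec!["vendor claim".into()]) }),
        ],
    };
    for b in &registry.builders {
        println!("{} -> {}:// (tier {})", b.name(), b.scheme(), b.default_tier());
    }
    println!("{:?}", registry.build_all(true).1);
}
```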
A Claude Code hook that runs Aphoria before certain operations: ### 1.2 RFC Ingester ✅
| Task | Status |
|------|--------|
| `RfcCorpusBuilder` | ✅ `corpus/rfc.rs` |
| HTTP fetching | ✅ Via `ureq`, cached to `~/.cache/aphoria/rfc-cache/` |
| RFC 2119 keyword parsing | ✅ MUST, MUST NOT, SHOULD, SHALL extraction |
| RFC-specific parsers | ✅ JWT (7519), OAuth (6749), Bearer (6750), TLS 1.3 (8446), TLS BCP (7525), TOTP (6238), Basic Auth (7617), HTTP (9110) |
| Concept mapping | ✅ `rfc://{number}/{topic}` at Tier 0 (Regulatory) |
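Keyword extraction can be approximated by whole-word matching on the uppercase forms. RFC 8174 clarifies that only these carry RFC 2119 meaning. This sketch handles just the four keywords named above; SHALL NOT, SHOULD NOT, MAY, and the rest are not distinguished. It also splits sentences naively on `. `, and the actual parser may work quite differently.

```rust
/// RFC 2119 keywords handled by this sketch (a subset of the full list).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    MustNot,
    Must,
    Shall,
    Should,
}

/// Checked in this order, so "MUST NOT" wins over the "MUST" it contains.
const KEYWORDS: [(&str, Level); 4] = [
    ("MUST NOT", Level::MustNot),
    ("MUST", Level::Must),
    ("SHALL", Level::Shall),
    ("SHOULD", Level::Should),
];

/// True if `kw` appears in `text` as a whole word (case-sensitive).
fn contains_word(text: &str, kw: &str) -> bool {
    text.match_indices(kw).any(|(i, _)| {
        let before = text[..i].chars().next_back();
        let after = text[i + kw.len()..].chars().next();
        !before.map_or(false, char::is_alphanumeric) && !after.map_or(false, char::is_alphanumeric)
    })
}

/// First keyword (in KEYWORDS order) found in the sentence, if any.
fn classify(sentence: &str) -> Option<Level> {
    KEYWORDS
        .iter()
        .find(|(kw, _)| contains_word(sentence, kw))
        .map(|&(_, level)| level)
}

/// Splits on ". " (a rough sentence boundary) and keeps normative sentences.
fn normative_statements(text: &str) -> Vec<(Level, String)> {
    text.split(". ")
        .filter_map(|s| classify(s).map(|level| (level, s.trim().to_string())))
        .collect()
}

fn main() {
    let text = "Intro text. The \"aud\" claim MUST be validated. Clients SHOULD reject unsigned tokens";
    for (level, sentence) in normative_statements(text) {
        println!("{level:?}: {sentence}");
    }
}
```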
### 1.3 OWASP Ingester ✅
| Task | Status |
|------|--------|
| `OwaspCorpusBuilder` | ✅ `corpus/owasp.rs` |
| HTTP fetching | ✅ From GitHub raw content, cached to `~/.cache/aphoria/owasp-cache/` |
| Markdown parsing | ✅ MUST/SHOULD statements, section context |
| Cheat sheet parsers | ✅ Authentication, JWT, TLS, Secrets, Input Validation, Session, CSRF, Password Storage, HTTP Headers |
| Concept mapping | ✅ `owasp://cheatsheet/{topic}/{claim}` at Tier 1 (Clinical) |
### 1.4 Vendor Docs ✅
| Task | Status |
|------|--------|
| `VendorCorpusBuilder` | ✅ `corpus/vendor.rs` |
| PostgreSQL claims | ✅ pool_size, idle_timeout, ssl_mode |
| Redis claims | ✅ timeout, max_retries, tls |
| reqwest claims | ✅ cert_verification, connect_timeout, request_timeout |
| hyper claims | ✅ keep_alive_timeout, max_concurrent_streams |
| Go net/http claims | ✅ read_timeout, write_timeout, idle_timeout, min_tls_version |
| tokio-postgres claims | ✅ pool_size, ssl_mode |
| SQLx claims | ✅ max_connections, idle_timeout |
| Concept mapping | ✅ `vendor://{product}/{topic}/{claim}` at Tier 2 (Observational) |
### 1.5 Hardcoded Refactor ✅
| Task | Status |
|------|--------|
| `HardcodedCorpusBuilder` | ✅ `corpus/hardcoded.rs` — original 11 assertions |
| `create_authoritative_assertion()` | ✅ Made public in `episteme.rs` for corpus builders |
### 1.6 CLI Integration ✅
| Task | Status |
|------|--------|
| `aphoria corpus build` | ✅ Fetches and ingests from all sources |
| `--only rfc,owasp,vendor` | ✅ Filter to specific sources |
| `--offline` | ✅ Skip network-requiring sources |
| `--clear-cache` | ✅ Clear cache before building |
| `aphoria corpus list` | ✅ List available corpus sources |
| `CorpusConfig` | ✅ cache_dir, include_*, rfc_list options |
### 1.7 Error Handling ✅
| Task | Status |
|------|--------|
| `RfcFetch` error | ✅ Per-RFC fetch failures with context |
| `OwaspFetch` error | ✅ Per-cheat-sheet fetch failures with context |
| `CorpusBuild` error | ✅ General corpus build failures |
| Graceful degradation | ✅ Continue with other sources if one fails |
**Files:** `corpus/mod.rs`, `corpus/hardcoded.rs`, `corpus/rfc.rs`, `corpus/owasp.rs`, `corpus/vendor.rs`
**Tests:** 118 tests pass. Clippy and fmt clean.
---
## Phase 3: Skill Integration ✅
> Complete. Aphoria is now usable in Claude Code agent workflows.
### 3.1 Claude Code Skill ✅
| Task | Status |
|------|--------|
| `skill/SKILL.md` | ✅ Comprehensive skill definition with all commands |
| `/aphoria scan` | ✅ Scan project, show conflicts grouped by verdict |
| `/aphoria scan --fix` | ✅ Interactive fix workflow |
| `/aphoria ack` | ✅ Acknowledge conflicts as intentional |
| `/aphoria status` | ✅ Show status and baseline |
| `/aphoria diff` | ✅ Show changes since baseline |
| `/aphoria init` | ✅ Initialize Aphoria |
| `/aphoria baseline` | ✅ Set baseline |
| `skill/install.sh` | ✅ Install script for `~/.claude/skills/aphoria/` |
**Files:** `skill/SKILL.md`, `skill/install.sh`, `skill/hooks.json`
### 3.2 Agent Pre-Flight Hook ✅
| Task | Status |
|------|--------|
| `--exit-code` flag | ✅ Returns 2 for BLOCK, 1 for FLAG only, 0 for clean |
| `--strict` flag | ✅ Lower thresholds (FLAG at 0.3, BLOCK at 0.5) |
| Hook template | ✅ `skill/hooks.json` with PreCommit and PrePush examples |
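The exit-code contract can be sketched as taking the most severe verdict. Only the `--strict` thresholds are documented, so `Thresholds::strict()` is the sole preset here. Two further assumptions: a score equal to a threshold counts as crossing it, and `Pass` is the name for a conflict-free concept.

```rust
/// Verdict levels from the report; `Pass` (no conflict) is a hypothetical name.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Verdict {
    Pass,
    Flag,
    Block,
}

/// Score thresholds; only the `--strict` values (FLAG at 0.3, BLOCK at 0.5) are documented.
struct Thresholds {
    flag: f64,
    block: f64,
}

impl Thresholds {
    fn strict() -> Self {
        Thresholds { flag: 0.3, block: 0.5 }
    }

    /// Assumes a score at or above a threshold triggers that verdict.
    fn verdict(&self, score: f64) -> Verdict {
        if score >= self.block {
            Verdict::Block
        } else if score >= self.flag {
            Verdict::Flag
        } else {
            Verdict::Pass
        }
    }
}

/// The documented `--exit-code` contract: 2 for any BLOCK, 1 for FLAG only, 0 when clean.
fn exit_code(verdicts: &[Verdict]) -> i32 {
    match verdicts.iter().max() {
        Some(Verdict::Block) => 2,
        Some(Verdict::Flag) => 1,
        _ => 0,
    }
}

fn main() {
    let t = Thresholds::strict();
    let verdicts: Vec<Verdict> = [0.92, 0.54, 0.1].iter().map(|&s| t.verdict(s)).collect();
    // A real CLI would call std::process::exit with this value.
    println!("exit code: {}", exit_code(&verdicts)); // prints "exit code: 2"
}
```

FLAG-only runs exit with 1, so a hook that should block only on BLOCK must test for status 2 explicitly, as the shell hook in 4.1 does.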
**Usage:**
```json ```json
{ {
"hooks": { "hooks": {
"pre-commit": "aphoria scan --format sarif --exit-code", "PreCommit": [{"command": "aphoria scan --format sarif --exit-code"}],
"pre-deploy": "aphoria scan --strict --exit-code" "PrePush": [{"command": "aphoria scan --strict --exit-code"}]
} }
} }
``` ```
`--exit-code` returns non-zero if any BLOCK verdicts exist, preventing the commit or deploy. ### 3.3 Alias Suggestion Workflow ✅
### 3.3 Alias Suggestion Workflow Auto-alias creation is now automatic (Phase 2A.3). When Aphoria scans:
1. Tail-path matching finds authoritative assertions
2. Aliases are auto-created with `AliasOrigin::AutoDetected`
3. Future queries use the alias automatically
When Aphoria scans a new project and finds concepts that share leaf names with existing authoritative paths, it prompts: The skill documents the suggestion flow for manual alias management:
- **y (Accept)**: Creates alias
``` - **n (Reject)**: Records intentional difference
New concept detected: code://rust/newproject/auth/jwt/audience_validation - **defer**: Flags for later review
Suggested alias:
→ rfc://7519/jwt/audience_validation (Tier 0, RFC 7519 Section 4.1.3)
Accept? [y/n/defer]
```
Accepting creates the alias. Deferring flags it for later review. Rejecting records that these are intentionally different concepts.
--- ---
## Phase 4: CI Integration ## Phase 4: Pre-Commit Integration ⬜
### 4.1 GitHub Action > Depends on Phase 3 (skill validates the UX before hook automation).
### 4.1 Git Pre-Commit Hook ⬜
A git pre-commit hook that runs Aphoria before every commit:
```bash
#!/bin/sh
# .git/hooks/pre-commit
aphoria scan --exit-code --format table
if [ $? -eq 2 ]; then
echo "BLOCKED: Fix conflicts before committing"
exit 1
fi
```
Or using pre-commit framework (`.pre-commit-config.yaml`):
```yaml ```yaml
- name: Aphoria Scan repos:
uses: orchard9/aphoria-action@v1 - repo: local
with: hooks:
episteme-url: ${{ secrets.EPISTEME_URL }} - id: aphoria
fail-on: block name: Aphoria Security Lint
format: sarif entry: aphoria scan --exit-code
language: system
pass_filenames: false
``` ```
Publishes SARIF results to GitHub Security tab. BLOCK verdicts fail the check. FLAG verdicts appear as warnings. ### 4.2 Baseline Mode ✅
### 4.2 PR Comment Bot Already implemented in Phase 2. For existing projects with many conflicts:
On pull request, Aphoria scans the diff (not the whole project) and comments:
```
## Aphoria Report
This PR introduces 1 new conflict:
| File | Conflict | Score |
|------|----------|-------|
| src/auth/jwt.rs:47 | Disables aud validation (RFC 7519 requires it) | 0.92 |
Run `aphoria ack` to acknowledge, or fix before merge.
```
### 4.3 Baseline Mode
For existing projects with many conflicts, `aphoria baseline` records the current state. Subsequent scans only report *new* conflicts. This prevents the "500 warnings so we ignore all of them" problem.
``` ```
$ aphoria baseline $ aphoria baseline
@ -311,9 +286,25 @@ Baseline recorded: 12 existing conflicts frozen.
Future scans will only report new conflicts. Future scans will only report new conflicts.
``` ```
### 4.3 Diff-Only Scanning ⬜
Scan only changed files instead of the whole project:
```bash
# Scan only staged files
aphoria scan --staged
# Scan only files changed since baseline
aphoria scan --since-baseline
```
This makes pre-commit hooks fast even in large projects.
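`--staged` is still planned (⬜). One way it could collect its file list is to ask git for the staged paths. The sketch below is an assumption, not Aphoria's implementation. `staged_files` and `parse_name_only` are hypothetical names. `--diff-filter=ACM` keeps added, copied, and modified files, so staged deletions (which no longer exist on disk) are skipped.

```rust
use std::io;
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical helper: parse `git diff --name-only` output into paths.
/// Each non-empty line is one repository-relative path.
fn parse_name_only(output: &str) -> Vec<PathBuf> {
    output
        .lines()
        .filter(|line| !line.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Hypothetical helper: list staged files that were added, copied, or modified.
fn staged_files() -> io::Result<Vec<PathBuf>> {
    let out = Command::new("git")
        .args(["diff", "--cached", "--name-only", "--diff-filter=ACM"])
        .output()?;
    if !out.status.success() {
        // Surface git's own error message (e.g. "not a git repository")
        return Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&out.stderr).into_owned(),
        ));
    }
    Ok(parse_name_only(&String::from_utf8_lossy(&out.stdout)))
}
```

The scanner would then extract claims only from these paths instead of walking the whole project tree.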
--- ---
## Phase 5: Research Agent Loop ## Phase 5: Research Agent Loop ⬜
> Depends on gap data accumulating from project scans.
### 5.1 Gap Detection ### 5.1 Gap Detection
@ -347,13 +338,21 @@ Users who run Aphoria can opt in to contribute their alias mappings and acknowle
## Milestone Summary ## Milestone Summary
| Phase | Deliverable | Depends On | | Phase | Deliverable | Depends On | Status |
|-------|-------------|------------| |-------|-------------|------------|--------|
| 0 | ConceptPath in StemeDB | concept-hierarchy spec | | 0 | ConceptPath in StemeDB | concept-hierarchy spec | ✅ |
| 1 | Authoritative corpus (RFCs, OWASP) | Phase 0 | | 2 | Aphoria CLI (scan, report, ack) | Phase 0 | ✅ |
| 2 | Aphoria CLI (scan, report, ack) | Phase 0, Phase 1 | | 2A.1 | Leaf-based concept matching | Phase 2 | ✅ |
| 3 | Claude Code skill + hooks | Phase 2 | | 2A.2 | Alias resolution in QueryEngine | Phase 2 | ✅ |
| 4 | CI integration (GitHub Action, PR bot) | Phase 2 | | 2A.3 | Auto-alias creation | Phase 2A.2 | ✅ |
| 5 | Research agent loop | Phase 2, Phase 4 (gap data) | | 1 | Authoritative corpus expansion | Phase 0 | ✅ |
| 3 | Claude Code skill + hooks | Phase 2A | ✅ |
| **4** | **Pre-commit integration (git hooks, diff scanning)** | **Phase 3** | **⬜ NEXT** |
| 5 | Research agent loop | Phase 4 (gap data) | ⬜ |
Phase 0 and Phase 1 can run in parallel — the corpus ingestion uses the ConceptPath types as they're built. Phase 2 is the critical path. Everything after Phase 2 is distribution and flywheel. **Current state:**
- Phase 1 is complete: RFC, OWASP, and Vendor corpus builders with `aphoria corpus build` CLI
- Phase 2A is complete: conflict detection via tail-path matching, alias-aware QueryEngine, and auto-alias creation
- Phase 3 is complete: `/aphoria` skill installed to `~/.claude/skills/aphoria/`, hook templates ready
**Next:** Phase 4 — Pre-commit integration (git hooks, diff-only scanning).


@ -0,0 +1,302 @@
---
name: aphoria
description: Code-level truth linter powered by Episteme. Scans codebase for conflicts with authoritative sources (RFCs, OWASP). Use when checking security configurations, validating code against specs, or auditing for compliance issues.
---
# Aphoria
## Identity
You are a security-minded code reviewer who checks configuration decisions against authoritative sources. You find where code *does* something that contradicts what specs *say* should be done. You present conflicts clearly with actionable guidance.
## Commands
| Command | Usage | Description |
|---------|-------|-------------|
| `/aphoria` | `/aphoria` | Scan current project, show conflicts |
| `/aphoria scan` | `/aphoria scan` | Scan current project, show conflicts |
| `/aphoria scan --fix` | `/aphoria scan --fix` | Scan and interactively offer to fix each conflict |
| `/aphoria ack` | `/aphoria ack <concept_path> <reason>` | Acknowledge a conflict as intentional |
| `/aphoria status` | `/aphoria status` | Show current conflict summary and baseline |
| `/aphoria diff` | `/aphoria diff` | Show new conflicts since last baseline |
| `/aphoria init` | `/aphoria init` | Initialize Aphoria in current project |
| `/aphoria baseline` | `/aphoria baseline` | Set current scan as baseline |
## Protocol
### On `/aphoria` or `/aphoria scan`
1. **Run the scan:**
```bash
aphoria scan --format json 2>/dev/null
```
2. **Parse the JSON output** and extract:
- `files_scanned`: Number of files analyzed
- `claims_extracted`: Number of code claims found
- `conflicts`: Array of conflict results
3. **Present conflicts grouped by verdict:**
- **BLOCK** (red): Must fix before deploy
- **FLAG** (yellow): Should review
- **PASS** (green): No action needed
4. **For each conflict, show:**
- File and line number
- What the code does (extracted claim)
- What the spec says (authoritative source)
- Conflict score and verdict
- Suggested fix or acknowledgment path
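The per-conflict JSON schema is not shown here, so the `Conflict` struct below is a hypothetical stand-in that carries only the fields the presentation step needs. The sketch derives `Ord` on the verdict enum so a `BTreeMap` yields groups in triage order (BLOCK, FLAG, PASS). Within each group, conflicts are sorted by descending conflict score.

```rust
use std::collections::BTreeMap;

/// Verdicts in triage order: BLOCK sorts first, PASS last.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Verdict {
    Block,
    Flag,
    Pass,
}

/// Hypothetical shape of one conflict parsed from `aphoria scan --format json`.
#[derive(Debug, Clone)]
struct Conflict {
    file: String,
    line: u32,
    score: f64,
    verdict: Verdict,
}

/// Group conflicts by verdict, with the highest score first in each group.
fn group_by_verdict(conflicts: &[Conflict]) -> BTreeMap<Verdict, Vec<Conflict>> {
    let mut groups: BTreeMap<Verdict, Vec<Conflict>> = BTreeMap::new();
    for c in conflicts {
        groups.entry(c.verdict).or_default().push(c.clone());
    }
    for group in groups.values_mut() {
        // total_cmp gives a total order even if a score is NaN
        group.sort_by(|a, b| b.score.total_cmp(&a.score));
    }
    groups
}
```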
### On `/aphoria scan --fix`
1. Run scan as above
2. For each BLOCK conflict:
- Show the conflict details
- Ask: "Fix this? [y/n/skip/ack]"
- If **y**: Provide a code fix suggestion and apply if confirmed
- If **n**: Continue to next
- If **skip**: Skip remaining, show summary
- If **ack**: Run `/aphoria ack <path> <reason>` with user's reason
### On `/aphoria ack <concept_path> <reason>`
1. **Run acknowledgment:**
```bash
aphoria ack "<concept_path>" "<reason>"
```
2. **Confirm success** and note the conflict will now appear as ACK in future scans
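How an acknowledgment changes later reports can be sketched as a verdict override keyed on the exact concept path passed to `aphoria ack`. This is an assumption about the behavior, not Aphoria's code. `apply_ack` and the `Ack` variant are hypothetical names.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
}

/// Hypothetical: an acknowledged concept path reports ACK instead of BLOCK or FLAG.
/// PASS results are left alone because there is nothing to acknowledge.
fn apply_ack(concept_path: &str, verdict: Verdict, acked: &HashSet<String>) -> Verdict {
    match verdict {
        Verdict::Block | Verdict::Flag if acked.contains(concept_path) => Verdict::Ack,
        other => other,
    }
}
```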
### On `/aphoria status`
1. **Run status check:**
```bash
aphoria status
```
2. **Present:**
- Data directory location
- Project root
- Whether baseline exists
- Agent key status
### On `/aphoria diff`
1. **Run diff:**
```bash
aphoria diff
```
2. **Present:**
- New conflicts since baseline
- Resolved conflicts since baseline
- Net change summary
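The three numbers `aphoria diff` presents reduce to set differences between the baseline and the current scan. In the sketch below, each conflict is identified by a string key (for example, its concept path); that key choice is an assumption. `BaselineDiff` and `diff_against_baseline` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Hypothetical result of comparing the current scan to the recorded baseline.
#[derive(Debug, PartialEq)]
struct BaselineDiff {
    new: Vec<String>,
    resolved: Vec<String>,
    net_change: i64,
}

fn diff_against_baseline(baseline: &BTreeSet<String>, current: &BTreeSet<String>) -> BaselineDiff {
    BaselineDiff {
        // Reported now but not frozen in the baseline
        new: current.difference(baseline).cloned().collect(),
        // Frozen in the baseline but no longer reported
        resolved: baseline.difference(current).cloned().collect(),
        net_change: current.len() as i64 - baseline.len() as i64,
    }
}
```

A `BTreeSet` keeps both lists in a stable, sorted order, so repeated runs print identical output.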
## Output Format
### Scan Results
````markdown
## Aphoria Scan Results
**Project:** {project_name}
**Files scanned:** {files_scanned}
**Claims extracted:** {claims_extracted}
---
### BLOCK ({count})
These conflicts prevent deployment and require immediate attention.
#### 1. TLS Certificate Verification Disabled
**File:** `src/client.rs:42`
**Code says:** `danger_accept_invalid_certs(true)` (verification disabled)
**RFC 5246 says:** TLS certificate verification MUST be enabled
**Conflict score:** 0.95 (high confidence)
**Options:**
1. **Fix:** Remove or set to `false`:
```rust
.danger_accept_invalid_certs(false)
```
2. **Acknowledge:** If this is intentional (e.g., internal testing):
```
/aphoria ack "code://rust/myapp/tls/cert_verification" "Internal test environment only"
```
---
### FLAG ({count})
These should be reviewed but don't block deployment.
#### 1. Rate Limiting Not Detected
**File:** `src/api/mod.rs`
**Code says:** No rate limiting configuration found
**OWASP says:** Rate limiting SHOULD be enabled for API endpoints
**Conflict score:** 0.55 (medium confidence)
---
### Summary
| Verdict | Count |
|---------|-------|
| BLOCK | {block_count} |
| FLAG | {flag_count} |
| PASS | {pass_count} |
**Exit code:** {0 if clean, 1 if FLAG only, 2 if any BLOCK}
````
## Conflict Categories
Aphoria detects conflicts in these domains:
| Domain | What It Checks | Authoritative Sources |
|--------|---------------|----------------------|
| **TLS** | Certificate verification, protocol versions | RFC 5246, RFC 8446, OWASP |
| **JWT** | Audience validation, signature verification, algorithm restrictions | RFC 7519, OWASP |
| **Secrets** | Hardcoded API keys, passwords, tokens | OWASP Secrets Management |
| **CORS** | Wildcard origins, credentials with wildcard | OWASP, Security Best Practices |
| **Timeouts** | Unreasonably high/low connection timeouts | Vendor recommendations |
| **Rate Limiting** | Missing or unreasonable rate limits | OWASP API Security |
## Fix Suggestions
When offering fixes, prioritize:
1. **Direct fix**: Change the code to comply with the spec
2. **Configuration**: Move the decision to environment/config
3. **Acknowledgment**: Document why the deviation is intentional
### Example Fix Patterns
**TLS verification disabled:**
```rust
// BEFORE
.danger_accept_invalid_certs(true)
// AFTER (if testing)
.danger_accept_invalid_certs(cfg!(test))
// AFTER (if must disable, make explicit)
// SECURITY: TLS verification disabled for internal service mesh
// Tracked in: JIRA-1234
.danger_accept_invalid_certs(std::env::var("DISABLE_TLS_VERIFY").is_ok())
```
**JWT audience not validated:**
```rust
// BEFORE
validation.validate_aud = false;
// AFTER
validation.validate_aud = true;
validation.set_audience(&["https://api.myservice.com"]);
```
**Hardcoded secrets:**
```rust
// BEFORE
let api_key = "sk-1234567890abcdef";
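// AFTER (inside a function returning Result, e.g. Result<(), Box<dyn std::error::Error>>)
// Propagate a missing variable as an error instead of panicking
let api_key = std::env::var("API_KEY")?;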
```
## Integration with Hooks
Aphoria can run as a pre-commit hook. Add the following to `.claude/settings.json`:
```json
{
"hooks": {
"PreCommit": [
{
"command": "aphoria scan --format sarif --exit-code",
"when": "always"
}
]
}
}
```
With `--exit-code`, the scan exits with 2 if any BLOCK verdict exists, 1 if only FLAG verdicts exist, and 0 when clean.
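The mapping from conflict scores to verdicts to exit status can be sketched as follows. Only the `--strict` thresholds (FLAG at 0.3, BLOCK at 0.5) come from this document; the default thresholds are not stated here. Treating "at" as an inclusive `>=` comparison is also an assumption. `Thresholds`, `verdict`, and `exit_code` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Pass,
    Flag,
    Block,
}

/// Score thresholds; `--strict` uses flag_at = 0.3 and block_at = 0.5.
struct Thresholds {
    flag_at: f64,
    block_at: f64,
}

/// Classify one conflict score (inclusive thresholds assumed).
fn verdict(score: f64, t: &Thresholds) -> Verdict {
    if score >= t.block_at {
        Verdict::Block
    } else if score >= t.flag_at {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}

/// Exit status for `--exit-code`: 2 if any BLOCK, 1 if FLAG only, 0 if clean.
fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Block) {
        2
    } else if verdicts.contains(&Verdict::Flag) {
        1
    } else {
        0
    }
}
```

A CLI would end with `std::process::exit(exit_code(&verdicts))`. Keeping BLOCK and FLAG on distinct codes lets a shell hook block only on 2 while still reporting FLAG-only scans.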
## Do
1. Run the actual `aphoria` CLI - don't simulate results
2. Present conflicts with clear file:line references
3. Explain why each conflict matters (cite the spec)
4. Offer concrete fixes, not vague suggestions
5. Support acknowledgment for intentional deviations
6. Group results by severity for quick triage
## Do Not
1. Guess at scan results - always run the CLI
2. Show all details for every conflict (summarize, expand on request)
3. Recommend disabling security features without strong justification
4. Skip the authoritative source citation
5. Mark something as BLOCK that's really a FLAG
## Constraints
- ALWAYS run `aphoria` CLI commands, don't fabricate results
- ALWAYS cite the authoritative source for each conflict
- ALWAYS offer acknowledgment as an option for intentional deviations
- NEVER recommend `unwrap()` or `expect()` in suggested fixes
- NEVER suggest disabling security without explicit user confirmation
- WHEN offering fixes, provide production-quality code
## Error Handling
**If aphoria is not installed:**
```
Aphoria CLI not found. Install with:
cargo install --path /path/to/stemedb/applications/aphoria
Or build from source:
cd /path/to/stemedb/applications/aphoria
cargo build --release
```
**If scan fails:**
```
Scan failed: {error_message}
Common issues:
- Not in a project directory (no Cargo.toml, package.json, etc.)
- Insufficient permissions to read files
- Episteme data directory not writable
```
## Alias Suggestions (Phase 3.3)
When Aphoria detects a new concept path that matches an authoritative path by leaf name:
```markdown
**New concept detected:** `code://rust/newproject/auth/jwt/audience_validation`
**Suggested alias:**
-> `rfc://7519/jwt/audience_validation` (Tier 0, RFC 7519 Section 4.1.3)
This will link your code path to the authoritative RFC path, enabling:
- Faster conflict detection in future scans
- Automatic alias resolution in queries
**Accept?** [y/n/defer]
```
- **y (Accept)**: Creates the alias, bridges code to spec
- **n (Reject)**: Records that these are intentionally different concepts
- **defer**: Flags for later review (appears in `/aphoria status`)
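Matching "by leaf name" can be sketched as counting how many trailing path segments two concept URIs share, ignoring the scheme and the differing prefixes. This is an illustration, not Aphoria's matcher. `suggest_alias` and its `min_tail` parameter are hypothetical. Requiring at least two shared segments is an assumed guard against aliasing unrelated concepts that happen to share a generic leaf.

```rust
/// Path segments after the scheme:
/// "rfc://7519/jwt/audience_validation" -> ["7519", "jwt", "audience_validation"].
fn segments(uri: &str) -> Vec<&str> {
    let rest = uri.split_once("://").map_or(uri, |(_, r)| r);
    rest.split('/').filter(|s| !s.is_empty()).collect()
}

/// Number of trailing segments two concept paths have in common.
fn shared_tail(a: &str, b: &str) -> usize {
    segments(a)
        .iter()
        .rev()
        .zip(segments(b).iter().rev())
        .take_while(|(x, y)| x == y)
        .count()
}

/// Hypothetical: suggest the authoritative path sharing the longest tail
/// (at least `min_tail` segments) with the code path, if any.
fn suggest_alias<'a>(code_path: &str, authoritative: &[&'a str], min_tail: usize) -> Option<&'a str> {
    authoritative
        .iter()
        .map(|&p| (shared_tail(code_path, p), p))
        .filter(|&(n, _)| n >= min_tail)
        .max_by_key(|&(n, _)| n)
        .map(|(_, p)| p)
}
```

For the example above, `code://rust/newproject/auth/jwt/audience_validation` shares `jwt/audience_validation` with `rfc://7519/jwt/audience_validation`, so that RFC path is suggested.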


@ -0,0 +1,25 @@
{
"$schema": "https://raw.githubusercontent.com/anthropics/claude-code/main/schemas/hooks.json",
"description": "Aphoria hook configurations for Claude Code",
"hooks": {
"PreCommit": [
{
"command": "aphoria scan --format sarif --exit-code",
"description": "Check for conflicts with authoritative sources before commit",
"when": "always"
}
],
"PrePush": [
{
"command": "aphoria scan --strict --exit-code",
"description": "Strict conflict check before pushing to remote",
"when": "always"
}
]
},
"notes": {
"PreCommit": "Runs on every commit. Exit code 2 = BLOCK conflicts found, 1 = FLAG only",
"PrePush": "Stricter thresholds (FLAG at 0.3, BLOCK at 0.5) for remote pushes",
"installation": "Copy this to .claude/settings.json in your project or ~/.claude/settings.json for global"
}
}


@ -0,0 +1,77 @@
#!/bin/bash
# Aphoria Claude Code Skill Installer
#
# This script installs the Aphoria skill to ~/.claude/skills/aphoria/
# making /aphoria commands available in Claude Code sessions.
#
# Usage:
# ./install.sh # Install skill only
# ./install.sh --build # Build aphoria binary first, then install skill
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APHORIA_DIR="$(dirname "$SCRIPT_DIR")"
SKILL_DEST="$HOME/.claude/skills/aphoria"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo "Aphoria Skill Installer"
echo "======================="
echo ""
# Build aphoria if requested
if [[ "$1" == "--build" ]]; then
echo -e "${YELLOW}Building aphoria binary...${NC}"
cd "$APHORIA_DIR"
cargo build --release
# Copy binary to cargo bin (optional, makes `aphoria` available globally)
if [[ -d "$HOME/.cargo/bin" ]]; then
cp "$APHORIA_DIR/target/release/aphoria" "$HOME/.cargo/bin/"
echo -e "${GREEN}Installed aphoria binary to ~/.cargo/bin/aphoria${NC}"
fi
echo ""
fi
# Check if aphoria binary exists
if ! command -v aphoria &> /dev/null; then
if [[ -f "$APHORIA_DIR/target/release/aphoria" ]]; then
echo -e "${YELLOW}Note: aphoria binary found at $APHORIA_DIR/target/release/aphoria${NC}"
echo "Consider adding to PATH or running with --build flag."
else
echo -e "${YELLOW}Warning: aphoria binary not found.${NC}"
echo "The skill will be installed, but you'll need to build aphoria first:"
echo " cd $APHORIA_DIR && cargo build --release"
echo ""
fi
fi
# Create skill directory
echo "Installing skill to $SKILL_DEST..."
mkdir -p "$SKILL_DEST"
# Copy skill files
cp "$SCRIPT_DIR/SKILL.md" "$SKILL_DEST/SKILL.md"
# Verify installation
if [[ -f "$SKILL_DEST/SKILL.md" ]]; then
echo -e "${GREEN}Skill installed successfully!${NC}"
echo ""
echo "Available commands:"
echo " /aphoria - Scan current project"
echo " /aphoria scan - Scan current project"
echo " /aphoria scan --fix - Scan and offer fixes"
echo " /aphoria ack - Acknowledge a conflict"
echo " /aphoria status - Show status"
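echo " /aphoria diff - Show changes since baseline"
echo " /aphoria init - Initialize Aphoria in current project"
echo " /aphoria baseline - Set current scan as baseline"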
echo ""
echo "To use in Claude Code, just type /aphoria in a project directory."
else
echo -e "${RED}Installation failed!${NC}"
exit 1
fi


@ -0,0 +1,187 @@
//! Bridge between ExtractedClaim and Episteme Assertion.
//!
//! Converts claims extracted from source code into Episteme assertions
//! that can be ingested into the knowledge graph.
use blake3::Hasher;
use ed25519_dalek::{Signer, SigningKey};
use stemedb_core::types::{
Assertion, Hash, HlcTimestamp, LifecycleStage, SignatureEntry, SourceClass,
};
use tracing::instrument;
use crate::types::ExtractedClaim;
/// Convert an ExtractedClaim to an Episteme Assertion.
///
/// The assertion is signed with the provided keypair and timestamped.
#[instrument(skip(signing_key), fields(concept_path = %claim.concept_path, predicate = %claim.predicate))]
pub fn claim_to_assertion(
claim: &ExtractedClaim,
signing_key: &SigningKey,
timestamp: u64,
) -> Assertion {
// Build source metadata
let source_metadata = serde_json::json!({
"file": claim.file,
"line": claim.line,
"matched_text": claim.matched_text,
"scan_tool": "aphoria",
"scan_version": env!("CARGO_PKG_VERSION"),
});
// Compute source hash from file:line:matched_text
let source_hash = compute_source_hash(&claim.file, claim.line, &claim.matched_text);
// Create signature (version 1: signs subject:predicate)
let message = format!("{}:{}", claim.concept_path, claim.predicate);
let signature = signing_key.sign(message.as_bytes());
let verifying_key = signing_key.verifying_key();
let signature_entry = SignatureEntry {
agent_id: verifying_key.to_bytes(),
signature: signature.to_bytes(),
timestamp,
version: 1,
};
Assertion {
subject: claim.concept_path.clone(),
predicate: claim.predicate.clone(),
object: claim.value.clone(),
parent_hash: None,
source_hash,
source_class: SourceClass::Expert, // code:// scheme = Expert (Tier 3)
visual_hash: None,
epoch: None,
source_metadata: serde_json::to_vec(&source_metadata).ok(),
lifecycle: LifecycleStage::Approved,
signatures: vec![signature_entry],
confidence: claim.confidence,
timestamp,
hlc_timestamp: HlcTimestamp::default(),
vector: None,
}
}
/// Compute the content hash of an assertion for deduplication.
#[allow(dead_code)]
pub fn compute_assertion_hash(assertion: &Assertion) -> Hash {
let mut hasher = Hasher::new();
hasher.update(assertion.subject.as_bytes());
hasher.update(assertion.predicate.as_bytes());
hasher.update(format!("{:?}", assertion.object).as_bytes());
hasher.update(&assertion.source_hash);
hasher.update(&[assertion.source_class.tier()]);
*hasher.finalize().as_bytes()
}
/// Compute the source hash from file location and matched text.
fn compute_source_hash(file: &str, line: usize, matched_text: &str) -> Hash {
let mut hasher = Hasher::new();
hasher.update(file.as_bytes());
// Encode the line as u64 so the hash is identical on 32- and 64-bit targets
hasher.update(&(line as u64).to_le_bytes());
hasher.update(matched_text.as_bytes());
*hasher.finalize().as_bytes()
}
/// Generate a new Ed25519 signing key for an Aphoria agent.
pub fn generate_signing_key() -> SigningKey {
use rand::rngs::OsRng;
SigningKey::generate(&mut OsRng)
}
/// Load or generate the project's signing key.
///
/// The key is stored at `.aphoria/agent.key` in the project root.
pub fn load_or_generate_key(project_root: &std::path::Path) -> std::io::Result<SigningKey> {
let aphoria_dir = project_root.join(".aphoria");
let key_path = aphoria_dir.join("agent.key");
if key_path.exists() {
let key_bytes = std::fs::read(&key_path)?;
if key_bytes.len() == 32 {
let mut arr = [0u8; 32];
arr.copy_from_slice(&key_bytes);
return Ok(SigningKey::from_bytes(&arr));
}
// Invalid key file (wrong length): fall through and regenerate
}
// Generate a new key and persist it
let key = generate_signing_key();
std::fs::create_dir_all(&aphoria_dir)?;
std::fs::write(&key_path, key.to_bytes())?;
Ok(key)
}
#[cfg(test)]
mod tests {
use super::*;
use stemedb_core::types::ObjectValue;
#[test]
fn test_claim_to_assertion() {
let claim = ExtractedClaim {
concept_path: "code://rust/myapp/tls/cert_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: "src/client.rs".to_string(),
line: 42,
matched_text: "danger_accept_invalid_certs(true)".to_string(),
confidence: 1.0,
description: "TLS verification disabled".to_string(),
};
let key = generate_signing_key();
let timestamp = 1706832000;
let assertion = claim_to_assertion(&claim, &key, timestamp);
assert_eq!(assertion.subject, claim.concept_path);
assert_eq!(assertion.predicate, "enabled");
assert_eq!(assertion.object, ObjectValue::Boolean(false));
assert_eq!(assertion.source_class, SourceClass::Expert);
assert_eq!(assertion.confidence, 1.0);
assert_eq!(assertion.timestamp, timestamp);
assert!(!assertion.signatures.is_empty());
}
#[test]
fn test_assertion_hash_deterministic() {
let claim = ExtractedClaim {
concept_path: "code://rust/myapp/tls/cert_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: "src/client.rs".to_string(),
line: 42,
matched_text: "danger_accept_invalid_certs(true)".to_string(),
confidence: 1.0,
description: "TLS verification disabled".to_string(),
};
let key = generate_signing_key();
let assertion1 = claim_to_assertion(&claim, &key, 1000);
let assertion2 = claim_to_assertion(&claim, &key, 1000);
let hash1 = compute_assertion_hash(&assertion1);
let hash2 = compute_assertion_hash(&assertion2);
assert_eq!(hash1, hash2);
}
#[test]
fn test_load_or_generate_key() {
let temp_dir = tempfile::tempdir().expect("create temp dir");
let key1 = load_or_generate_key(temp_dir.path()).expect("generate key");
let key2 = load_or_generate_key(temp_dir.path()).expect("load key");
// Same key should be loaded
assert_eq!(key1.to_bytes(), key2.to_bytes());
}
}
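The load-or-generate flow in `load_or_generate_key` can be exercised without `ed25519_dalek` by working on raw 32-byte keys, the same length `SigningKey::to_bytes()` returns. In this sketch, `load_or_create_key` is a hypothetical name, and the `generate` closure stands in for `generate_signing_key().to_bytes()`. Two details are suggested hardening, not existing behavior. First, read errors other than "not found" are propagated instead of triggering regeneration. Second, on Unix the private key file is restricted to owner read/write (0o600).

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical std-only version of the flow: reuse a valid 32-byte key file,
/// otherwise write a fresh key produced by `generate`.
fn load_or_create_key(path: &Path, generate: impl FnOnce() -> [u8; 32]) -> io::Result<[u8; 32]> {
    match fs::read(path) {
        Ok(bytes) if bytes.len() == 32 => {
            let mut key = [0u8; 32];
            key.copy_from_slice(&bytes);
            return Ok(key);
        }
        // Wrong length: fall through and regenerate, as the original does
        Ok(_) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    if let Some(dir) = path.parent() {
        fs::create_dir_all(dir)?;
    }
    let key = generate();
    let mut opts = OpenOptions::new();
    opts.write(true).create(true).truncate(true);
    #[cfg(unix)]
    {
        use std::os::unix::fs::OpenOptionsExt;
        // Only applies when the file is newly created
        opts.mode(0o600);
    }
    let mut file = opts.open(path)?;
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        // Also tighten a pre-existing file, e.g. an invalid key being replaced
        file.set_permissions(fs::Permissions::from_mode(0o600))?;
    }
    file.write_all(&key)?;
    Ok(key)
}
```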


@ -29,6 +29,9 @@ pub struct AphoriaConfig {
/// Alias suggestion settings. /// Alias suggestion settings.
pub aliases: AliasConfig, pub aliases: AliasConfig,
/// Corpus builder settings.
pub corpus: CorpusConfig,
} }
impl AphoriaConfig { impl AphoriaConfig {
@ -194,11 +197,54 @@ pub struct AliasConfig {
/// Whether to auto-accept aliases to Tier 0 sources. /// Whether to auto-accept aliases to Tier 0 sources.
pub auto_accept_tier0: bool, pub auto_accept_tier0: bool,
/// Whether to automatically create aliases when conflicts are detected.
///
/// When enabled, tail-path matching during conflict detection will
/// persist aliases (e.g., `code://rust/tls/cert_verification` →
/// `rfc://5246/tls/cert_verification`) for faster future queries.
pub auto_create_aliases: bool,
} }
impl Default for AliasConfig { impl Default for AliasConfig {
fn default() -> Self { fn default() -> Self {
Self { auto_suggest: true, auto_accept_tier0: true } Self { auto_suggest: true, auto_accept_tier0: true, auto_create_aliases: true }
}
}
/// Corpus builder configuration.
#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct CorpusConfig {
/// Directory for caching downloaded RFCs and OWASP cheat sheets.
pub cache_dir: PathBuf,
/// Whether to include the hardcoded corpus (built-in assertions).
pub include_hardcoded: bool,
/// Whether to include RFC normative statements.
pub include_rfc: bool,
/// Whether to include OWASP cheat sheet recommendations.
pub include_owasp: bool,
/// Whether to include vendor documentation claims.
pub include_vendor: bool,
/// Override the default RFC list (if None, uses default list).
pub rfc_list: Option<Vec<u32>>,
}
impl Default for CorpusConfig {
fn default() -> Self {
Self {
cache_dir: dirs_default_cache_dir(),
include_hardcoded: true,
include_rfc: true,
include_owasp: true,
include_vendor: true,
rfc_list: None,
}
} }
} }
@ -220,6 +266,17 @@ fn dirs_default_advisory_db() -> PathBuf {
} }
} }
/// Get the default cache directory for corpus downloads.
fn dirs_default_cache_dir() -> PathBuf {
if let Some(cache) = dirs::cache_dir() {
cache.join("aphoria")
} else if let Some(home) = dirs::home_dir() {
home.join(".cache").join("aphoria")
} else {
PathBuf::from(".aphoria/cache")
}
}
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;


@ -0,0 +1,260 @@
//! Hardcoded authoritative corpus for common security patterns.
//!
//! This builder provides the built-in assertions that Aphoria ships with,
//! covering essential security requirements from RFCs and OWASP guidance.
//! These assertions are always available and don't require network access.
use ed25519_dalek::SigningKey;
use stemedb_core::types::{Assertion, ObjectValue, SourceClass};
use tracing::instrument;
use super::CorpusBuilder;
use crate::config::CorpusConfig;
use crate::episteme::create_authoritative_assertion;
use crate::AphoriaError;
/// Builder for the hardcoded authoritative corpus.
///
/// Contains 11 built-in assertions covering:
/// - TLS certificate verification (RFC 5246, OWASP)
/// - JWT validation (RFC 7519)
/// - Secrets management (OWASP)
/// - CORS security (OWASP)
/// - Rate limiting (OWASP)
pub struct HardcodedCorpusBuilder;
impl HardcodedCorpusBuilder {
/// Create a new hardcoded corpus builder.
pub fn new() -> Self {
Self
}
}
impl Default for HardcodedCorpusBuilder {
fn default() -> Self {
Self::new()
}
}
impl CorpusBuilder for HardcodedCorpusBuilder {
fn name(&self) -> &str {
"Hardcoded"
}
fn scheme(&self) -> &str {
"rfc,owasp"
}
fn default_tier(&self) -> u8 {
0 // Mix of Tier 0 (Regulatory) and Tier 1 (Clinical)
}
fn requires_network(&self) -> bool {
false
}
fn source_ids(&self) -> Vec<String> {
vec![
"rfc://5246".to_string(),
"rfc://7519".to_string(),
"owasp://transport_layer".to_string(),
"owasp://secrets".to_string(),
"owasp://cors".to_string(),
"owasp://rate_limit".to_string(),
]
}
#[instrument(skip(self, signing_key, _config), fields(builder = "Hardcoded"))]
fn build(
&self,
signing_key: &SigningKey,
timestamp: u64,
_config: &CorpusConfig,
) -> Result<Vec<Assertion>, AphoriaError> {
Ok(build_hardcoded_corpus(signing_key, timestamp))
}
}
/// Build the hardcoded authoritative corpus.
///
/// This is the same corpus that was previously in `create_authoritative_corpus()`,
/// now encapsulated in a CorpusBuilder for consistency.
#[allow(clippy::vec_init_then_push)]
fn build_hardcoded_corpus(signing_key: &SigningKey, timestamp: u64) -> Vec<Assertion> {
let mut assertions = Vec::new();
// TLS verification requirements
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://5246/tls/cert_verification",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Regulatory,
"TLS certificate verification MUST be enabled (RFC 5246)",
timestamp,
));
// OWASP TLS guidance
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://transport_layer/tls/cert_verification",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Clinical, // Tier 1
"OWASP: Always verify TLS certificates",
timestamp,
));
// JWT audience validation (RFC 7519)
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://7519/jwt/audience_validation",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Regulatory,
"JWT audience claim MUST be validated (RFC 7519 Section 4.1.3)",
timestamp,
));
// JWT expiry validation
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://7519/jwt/expiry_validation",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Regulatory,
"JWT expiry claim MUST be validated (RFC 7519 Section 4.1.4)",
timestamp,
));
// JWT signature verification
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://7519/jwt/signature_verification",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Regulatory,
"JWT signatures MUST be verified (RFC 7519)",
timestamp,
));
// JWT algorithm restriction
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://7519/jwt/algorithm_restriction",
"config_value",
ObjectValue::Text("explicit_list".to_string()),
SourceClass::Regulatory,
"JWT algorithm MUST be explicitly specified, 'none' algorithm forbidden",
timestamp,
));
// OWASP secrets management
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://secrets/api_key",
"storage_method",
ObjectValue::Text("environment_or_vault".to_string()),
SourceClass::Clinical,
"OWASP: Never hardcode API keys in source code",
timestamp,
));
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://secrets/password",
"storage_method",
ObjectValue::Text("environment_or_vault".to_string()),
SourceClass::Clinical,
"OWASP: Never hardcode passwords in source code",
timestamp,
));
// CORS security
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://cors/allow_origin",
"config_value",
ObjectValue::Text("explicit_list".to_string()),
SourceClass::Clinical,
"OWASP: Never use wildcard (*) for CORS Allow-Origin in production",
timestamp,
));
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://cors/credentials_with_wildcard",
"enabled",
ObjectValue::Boolean(false),
SourceClass::Regulatory,
"CORS credentials MUST NOT be allowed with wildcard origin (security vulnerability)",
timestamp,
));
// Rate limiting
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://rate_limit/enabled",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Clinical,
"OWASP: Rate limiting SHOULD be enabled for API endpoints",
timestamp,
));
assertions
}
#[cfg(test)]
mod tests {
use super::*;
use crate::bridge::generate_signing_key;
#[test]
fn test_hardcoded_builder_builds() {
let builder = HardcodedCorpusBuilder::new();
let key = generate_signing_key();
let config = CorpusConfig::default();
let assertions = builder.build(&key, 1706832000, &config).expect("build");
assert_eq!(assertions.len(), 11);
}
#[test]
fn test_hardcoded_builder_no_network() {
let builder = HardcodedCorpusBuilder::new();
assert!(!builder.requires_network());
}
#[test]
fn test_hardcoded_assertions_content() {
let key = generate_signing_key();
let assertions = build_hardcoded_corpus(&key, 1706832000);
// Check TLS assertion
let tls_assertion = assertions.iter().find(|a| a.subject.contains("tls/cert_verification"));
assert!(tls_assertion.is_some());
let tls = tls_assertion.expect("tls assertion");
assert_eq!(tls.predicate, "enabled");
assert_eq!(tls.object, ObjectValue::Boolean(true));
// Check JWT assertion
let jwt_assertion =
assertions.iter().find(|a| a.subject.contains("jwt/audience_validation"));
assert!(jwt_assertion.is_some());
let jwt = jwt_assertion.expect("jwt assertion");
assert_eq!(jwt.predicate, "enabled");
assert_eq!(jwt.source_class, SourceClass::Regulatory);
}
#[test]
fn test_hardcoded_source_ids() {
let builder = HardcodedCorpusBuilder::new();
let ids = builder.source_ids();
assert!(ids.iter().any(|id| id.contains("rfc://5246")));
assert!(ids.iter().any(|id| id.contains("rfc://7519")));
assert!(ids.iter().any(|id| id.contains("owasp://")));
}
}


@ -0,0 +1,370 @@
//! Authoritative corpus management for Aphoria.
//!
//! This module provides a unified interface for building and managing the authoritative
//! corpus that Aphoria uses to detect conflicts. The corpus consists of assertions from
//! multiple sources:
//!
//! - **Hardcoded** (Tier 0-1): Built-in RFC/OWASP assertions for common security patterns
//! - **RFC** (Tier 0): Normative statements from IETF RFCs
//! - **OWASP** (Tier 1): Recommendations from OWASP Cheat Sheets
//! - **Vendor** (Tier 2): Best practices from vendor documentation
//!
//! # Architecture
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────────────┐
//! │ aphoria corpus build │
//! │ │
//! │ ┌──────────────┐ ┌──────────────┐ ┌───────────────────────┐ │
//! │ │ RFC Ingester │ │ OWASP │ │ Vendor Bootstrapper │ │
//! │ │ (Tier 0) │ │ Ingester │ │ (Tier 2) │ │
//! │ │ │ │ (Tier 1) │ │ │ │
//! │ └──────┬───────┘ └──────┬───────┘ └───────────┬───────────┘ │
//! │ │ │ │ │
//! │ └─────────────────┼──────────────────────┘ │
//! │ ▼ │
//! │ ┌─────────────────┐ │
//! │ │ CorpusRegistry │ │
//! │ └────────┬────────┘ │
//! │ ▼ │
//! │ ┌─────────────────┐ │
//! │ │ Vec<Assertion> │ │
//! │ └─────────────────┘ │
//! └─────────────────────────────────────────────────────────────────┘
//! ```
mod hardcoded;
mod owasp;
mod rfc;
mod vendor;
pub use hardcoded::HardcodedCorpusBuilder;
pub use owasp::OwaspCorpusBuilder;
pub use rfc::RfcCorpusBuilder;
pub use vendor::VendorCorpusBuilder;
use ed25519_dalek::SigningKey;
use stemedb_core::types::Assertion;
use tracing::{info, instrument};
use crate::config::CorpusConfig;
use crate::AphoriaError;
/// A builder that produces authoritative assertions from a specific source.
///
/// Each corpus builder is responsible for:
/// 1. Fetching or loading source material (RFCs, OWASP docs, vendor docs)
/// 2. Parsing relevant claims from that material
/// 3. Converting claims to signed Episteme assertions
pub trait CorpusBuilder: Send + Sync {
/// Human-readable name for this corpus source.
fn name(&self) -> &str;
/// URI scheme used by this corpus (e.g., "rfc", "owasp", "vendor").
fn scheme(&self) -> &str;
/// Default source tier for assertions from this corpus.
///
/// - Tier 0: Regulatory (RFCs with MUST/SHALL)
/// - Tier 1: Clinical (OWASP, security best practices)
/// - Tier 2: Observational (Vendor documentation)
fn default_tier(&self) -> u8;
/// Build assertions from this corpus source.
///
/// # Arguments
///
/// * `signing_key` - Ed25519 key for signing assertions
/// * `timestamp` - Unix timestamp for assertion creation
/// * `config` - Corpus configuration (cache paths, options)
///
/// # Returns
///
/// A vector of signed assertions, or an error if building fails.
fn build(
&self,
signing_key: &SigningKey,
timestamp: u64,
config: &CorpusConfig,
) -> Result<Vec<Assertion>, AphoriaError>;
/// Whether this builder requires network access.
fn requires_network(&self) -> bool {
false
}
/// List of source identifiers this builder will fetch.
///
/// For the RFC builder, this might be `["7519", "6749", "8446"]`.
/// For the OWASP builder, this is `["OWASP authentication", "OWASP jwt", ...]`.
fn source_ids(&self) -> Vec<String> {
vec![]
}
}
/// Registry for managing multiple corpus builders.
///
/// The registry handles:
/// - Builder registration and lookup
/// - Coordinated corpus building across all sources
/// - Source selection (the `--only` flag), applied through the `include_*` flags on `CorpusConfig` in [`CorpusRegistry::with_defaults`]
pub struct CorpusRegistry {
builders: Vec<Box<dyn CorpusBuilder>>,
}
impl CorpusRegistry {
/// Create a new empty registry.
pub fn new() -> Self {
Self { builders: Vec::new() }
}
/// Create a registry with the built-in builders enabled by `config`'s `include_*` flags.
pub fn with_defaults(config: &CorpusConfig) -> Self {
let mut registry = Self::new();
if config.include_hardcoded {
registry.register(Box::new(HardcodedCorpusBuilder::new()));
}
if config.include_rfc {
registry.register(Box::new(RfcCorpusBuilder::new(&config.rfc_list)));
}
if config.include_owasp {
registry.register(Box::new(OwaspCorpusBuilder::new()));
}
if config.include_vendor {
registry.register(Box::new(VendorCorpusBuilder::new()));
}
registry
}
/// Register a corpus builder.
pub fn register(&mut self, builder: Box<dyn CorpusBuilder>) {
self.builders.push(builder);
}
/// Get registered builder names.
pub fn builder_names(&self) -> Vec<&str> {
self.builders.iter().map(|b| b.name()).collect()
}
/// Get builder info for listing.
pub fn list_builders(&self) -> Vec<CorpusBuilderInfo> {
self.builders
.iter()
.map(|b| CorpusBuilderInfo {
name: b.name().to_string(),
scheme: b.scheme().to_string(),
tier: b.default_tier(),
requires_network: b.requires_network(),
source_ids: b.source_ids(),
})
.collect()
}
/// Build assertions from all registered corpus sources.
///
/// # Arguments
///
/// * `signing_key` - Ed25519 key for signing assertions
/// * `timestamp` - Unix timestamp for assertion creation
/// * `config` - Corpus configuration
/// * `offline` - If true, skip builders that require network access
///
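/// # Returns
///
/// A combined vector of assertions from all sources, along with per-builder
/// statistics. A builder that fails is logged and recorded in `stats` with its
/// error message. It does not abort the build, so this currently never returns `Err`.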
#[instrument(skip(self, signing_key, config), fields(builders = self.builders.len()))]
pub fn build_all(
&self,
signing_key: &SigningKey,
timestamp: u64,
config: &CorpusConfig,
offline: bool,
) -> Result<CorpusBuildResult, AphoriaError> {
let mut all_assertions = Vec::new();
let mut stats = Vec::new();
for builder in &self.builders {
// Skip network-requiring builders in offline mode
if offline && builder.requires_network() {
info!(builder = builder.name(), "Skipping (offline mode)");
stats.push(CorpusBuilderStats {
name: builder.name().to_string(),
scheme: builder.scheme().to_string(),
assertions_built: 0,
skipped: true,
error: None,
});
continue;
}
info!(builder = builder.name(), scheme = builder.scheme(), "Building corpus");
match builder.build(signing_key, timestamp, config) {
Ok(assertions) => {
let count = assertions.len();
info!(builder = builder.name(), assertions = count, "Corpus built");
stats.push(CorpusBuilderStats {
name: builder.name().to_string(),
scheme: builder.scheme().to_string(),
assertions_built: count,
skipped: false,
error: None,
});
all_assertions.extend(assertions);
}
Err(e) => {
tracing::warn!(builder = builder.name(), error = %e, "Corpus build failed");
stats.push(CorpusBuilderStats {
name: builder.name().to_string(),
scheme: builder.scheme().to_string(),
assertions_built: 0,
skipped: false,
error: Some(e.to_string()),
});
// Continue with other builders even if one fails
}
}
}
Ok(CorpusBuildResult { assertions: all_assertions, stats })
}
}
impl Default for CorpusRegistry {
fn default() -> Self {
Self::new()
}
}
/// Information about a corpus builder.
#[derive(Debug, Clone)]
pub struct CorpusBuilderInfo {
/// Human-readable name.
pub name: String,
/// URI scheme.
pub scheme: String,
/// Default source tier.
pub tier: u8,
/// Whether network access is required.
pub requires_network: bool,
/// Source identifiers (RFC numbers, cheat sheet names, etc.).
pub source_ids: Vec<String>,
}
/// Statistics for a single corpus builder.
#[derive(Debug, Clone)]
pub struct CorpusBuilderStats {
/// Builder name.
pub name: String,
/// URI scheme.
pub scheme: String,
/// Number of assertions built.
pub assertions_built: usize,
/// Whether the builder was skipped (e.g., offline mode).
pub skipped: bool,
/// Error message if build failed.
pub error: Option<String>,
}
/// Result of building the full corpus.
#[derive(Debug)]
pub struct CorpusBuildResult {
/// All assertions from all builders.
pub assertions: Vec<Assertion>,
/// Per-builder statistics.
pub stats: Vec<CorpusBuilderStats>,
}
impl CorpusBuildResult {
/// Get total assertion count.
pub fn total_assertions(&self) -> usize {
self.assertions.len()
}
/// Get count of successful builders.
pub fn successful_builders(&self) -> usize {
self.stats.iter().filter(|s| s.error.is_none() && !s.skipped).count()
}
/// Get count of failed builders.
pub fn failed_builders(&self) -> usize {
self.stats.iter().filter(|s| s.error.is_some()).count()
}
/// Get count of skipped builders.
pub fn skipped_builders(&self) -> usize {
self.stats.iter().filter(|s| s.skipped).count()
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::bridge::generate_signing_key;
#[test]
fn test_registry_default_empty() {
let registry = CorpusRegistry::new();
assert!(registry.builder_names().is_empty());
}
#[test]
fn test_registry_with_defaults() {
let config = CorpusConfig::default();
let registry = CorpusRegistry::with_defaults(&config);
// Should have all four default builders
let names = registry.builder_names();
assert!(names.contains(&"Hardcoded"));
assert!(names.contains(&"RFC"));
assert!(names.contains(&"OWASP"));
assert!(names.contains(&"Vendor"));
}
#[test]
fn test_registry_selective_builders() {
let config =
CorpusConfig { include_rfc: false, include_owasp: false, ..Default::default() };
let registry = CorpusRegistry::with_defaults(&config);
let names = registry.builder_names();
assert!(names.contains(&"Hardcoded"));
assert!(names.contains(&"Vendor"));
assert!(!names.contains(&"RFC"));
assert!(!names.contains(&"OWASP"));
}
#[test]
fn test_build_all_offline() {
let config = CorpusConfig::default();
let registry = CorpusRegistry::with_defaults(&config);
let key = generate_signing_key();
let timestamp = 1706832000;
let result = registry.build_all(&key, timestamp, &config, true).expect("build_all");
// Offline mode skips network-requiring builders (at least OWASP), while
// offline-capable builders such as Hardcoded still produce assertions.
assert!(result.total_assertions() > 0);
assert!(result.skipped_builders() >= 1);
}
#[test]
fn test_corpus_builder_info() {
let config = CorpusConfig::default();
let registry = CorpusRegistry::with_defaults(&config);
let infos = registry.list_builders();
for info in &infos {
assert!(!info.name.is_empty());
assert!(!info.scheme.is_empty());
assert!(info.tier <= 3);
}
}
}
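The fault-tolerant loop in `CorpusRegistry::build_all` above can be illustrated with a self-contained sketch. The `Source` trait, `Stats` struct, `Fake` source, and `build_all` function here are hypothetical simplifications: they use plain `String` items instead of signed `Assertion`s and `String` errors instead of `AphoriaError`. They mirror only the control flow, which skips network-bound sources when offline, records failures, and keeps going.

```rust
/// Hypothetical, simplified stand-in for `CorpusBuilder`.
trait Source {
    fn name(&self) -> &str;
    fn requires_network(&self) -> bool;
    fn build(&self) -> Result<Vec<String>, String>;
}

/// Per-source outcome, mirroring `CorpusBuilderStats`.
#[derive(Debug, PartialEq)]
struct Stats {
    name: String,
    built: usize,
    skipped: bool,
    error: Option<String>,
}

/// Build from every source. Offline mode skips network sources, and a failing
/// source is recorded in the stats instead of aborting the whole run.
fn build_all(sources: &[Box<dyn Source>], offline: bool) -> (Vec<String>, Vec<Stats>) {
    let mut items = Vec::new();
    let mut stats = Vec::new();
    for src in sources {
        let name = src.name().to_string();
        if offline && src.requires_network() {
            stats.push(Stats { name, built: 0, skipped: true, error: None });
            continue;
        }
        match src.build() {
            Ok(built) => {
                stats.push(Stats { name, built: built.len(), skipped: false, error: None });
                items.extend(built);
            }
            Err(e) => stats.push(Stats { name, built: 0, skipped: false, error: Some(e) }),
        }
    }
    (items, stats)
}

/// A configurable fake source used to exercise the loop (hypothetical).
struct Fake {
    name: &'static str,
    network: bool,
    result: Result<Vec<String>, String>,
}

impl Source for Fake {
    fn name(&self) -> &str {
        self.name
    }
    fn requires_network(&self) -> bool {
        self.network
    }
    fn build(&self) -> Result<Vec<String>, String> {
        self.result.clone()
    }
}

fn main() {
    let sources: Vec<Box<dyn Source>> = vec![
        Box::new(Fake { name: "Hardcoded", network: false, result: Ok(vec!["a".into(), "b".into()]) }),
        Box::new(Fake { name: "OWASP", network: true, result: Ok(vec!["c".into()]) }),
        Box::new(Fake { name: "Vendor", network: false, result: Err("parse error".into()) }),
    ];
    let (items, stats) = build_all(&sources, true);
    for s in &stats {
        println!("{s:?}");
    }
    println!("{} items", items.len());
}
```

Returning per-source stats alongside the items lets a CLI report partial success. It does not have to choose between "all sources succeeded" and "abort".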

@ -0,0 +1,231 @@
//! OWASP Cheat Sheet corpus builder.
//!
//! This builder fetches OWASP Cheat Sheets from GitHub and extracts security
//! recommendations to create authoritative assertions.
//!
//! # Caching
//!
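//! Cheat sheets are cached under `{cache_dir}/owasp-cache/{filename}` (for
//! example `~/.cache/aphoria/owasp-cache/{filename}`), where `cache_dir` comes
//! from `CorpusConfig`. A cached copy is reused indefinitely; delete it to force
//! a fresh download.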
//!
//! # Target Cheat Sheets
//!
//! | Filename                                             | Topic            |
//! |------------------------------------------------------|------------------|
//! | Authentication_Cheat_Sheet.md                        | authentication   |
//! | JSON_Web_Token_for_Java_Cheat_Sheet.md               | jwt              |
//! | Transport_Layer_Security_Cheat_Sheet.md              | tls              |
//! | Secrets_Management_Cheat_Sheet.md                    | secrets          |
//! | Input_Validation_Cheat_Sheet.md                      | input_validation |
//! | Session_Management_Cheat_Sheet.md                    | session          |
//! | Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.md | csrf             |
//! | Password_Storage_Cheat_Sheet.md                      | password_storage |
//! | HTTP_Headers_Cheat_Sheet.md                          | http_headers     |
mod parsers;
#[cfg(test)]
mod tests;
use std::fs;
use std::thread;
use std::time::Duration;
use ed25519_dalek::SigningKey;
use stemedb_core::types::{Assertion, SourceClass};
use tracing::{debug, info, instrument, warn};
use super::CorpusBuilder;
use crate::config::CorpusConfig;
use crate::episteme::create_authoritative_assertion;
use crate::AphoriaError;
use parsers::parse_cheatsheet;
/// Target OWASP cheat sheets to fetch.
const TARGET_CHEAT_SHEETS: &[(&str, &str)] = &[
("Authentication_Cheat_Sheet.md", "authentication"),
("JSON_Web_Token_for_Java_Cheat_Sheet.md", "jwt"),
("Transport_Layer_Security_Cheat_Sheet.md", "tls"),
("Secrets_Management_Cheat_Sheet.md", "secrets"),
("Input_Validation_Cheat_Sheet.md", "input_validation"),
("Session_Management_Cheat_Sheet.md", "session"),
("Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.md", "csrf"),
("Password_Storage_Cheat_Sheet.md", "password_storage"),
("HTTP_Headers_Cheat_Sheet.md", "http_headers"),
];
/// Base URL for OWASP CheatSheetSeries raw content.
const OWASP_BASE_URL: &str =
"https://raw.githubusercontent.com/OWASP/CheatSheetSeries/master/cheatsheets/";
/// HTTP timeout for fetching a single cheat sheet (seconds).
const FETCH_TIMEOUT_SECS: u64 = 30;
/// Rate limit delay between requests (milliseconds).
const RATE_LIMIT_MS: u64 = 500;
/// Builder for OWASP Cheat Sheet corpus.
pub struct OwaspCorpusBuilder {
/// Cheat sheets to fetch.
sheets: Vec<(String, String)>,
}
impl OwaspCorpusBuilder {
/// Create a new OWASP corpus builder with default cheat sheets.
pub fn new() -> Self {
let sheets =
TARGET_CHEAT_SHEETS.iter().map(|(f, t)| (f.to_string(), t.to_string())).collect();
Self { sheets }
}
/// Create a builder with custom cheat sheets.
#[allow(dead_code)]
pub fn with_sheets(sheets: Vec<(String, String)>) -> Self {
Self { sheets }
}
}
impl Default for OwaspCorpusBuilder {
fn default() -> Self {
Self::new()
}
}
impl CorpusBuilder for OwaspCorpusBuilder {
fn name(&self) -> &str {
"OWASP"
}
fn scheme(&self) -> &str {
"owasp"
}
fn default_tier(&self) -> u8 {
1 // Clinical
}
fn requires_network(&self) -> bool {
true // Network is needed unless every sheet is cached; offline mode skips this builder regardless
}
fn source_ids(&self) -> Vec<String> {
self.sheets.iter().map(|(_, topic)| format!("OWASP {}", topic)).collect()
}
#[instrument(skip(self, signing_key, config), fields(builder = "OWASP", sheets = self.sheets.len()))]
fn build(
&self,
signing_key: &SigningKey,
timestamp: u64,
config: &CorpusConfig,
) -> Result<Vec<Assertion>, AphoriaError> {
let cache_dir = config.cache_dir.join("owasp-cache");
fs::create_dir_all(&cache_dir)?;
let mut all_assertions = Vec::new();
for (i, (filename, topic)) in self.sheets.iter().enumerate() {
// Rate limit between sheets (this also delays sheets served from cache)
if i > 0 {
thread::sleep(Duration::from_millis(RATE_LIMIT_MS));
}
match fetch_and_parse_cheatsheet(filename, topic, &cache_dir, signing_key, timestamp) {
Ok(assertions) => {
info!(filename, topic, assertions = assertions.len(), "Parsed cheat sheet");
all_assertions.extend(assertions);
}
Err(e) => {
warn!(filename, topic, error = %e, "Failed to process cheat sheet");
// Continue with other sheets
}
}
}
Ok(all_assertions)
}
}
/// Fetch a cheat sheet and parse security recommendations.
fn fetch_and_parse_cheatsheet(
filename: &str,
topic: &str,
cache_dir: &std::path::Path,
signing_key: &SigningKey,
timestamp: u64,
) -> Result<Vec<Assertion>, AphoriaError> {
let content = fetch_cheatsheet_content(filename, cache_dir)?;
let recommendations = parse_cheatsheet(&content, topic);
let assertions = recommendations
.into_iter()
.map(|rec| {
create_authoritative_assertion(
signing_key,
&rec.subject,
&rec.predicate,
rec.value,
SourceClass::Clinical, // Tier 1
&rec.description,
timestamp,
)
})
.collect();
Ok(assertions)
}
/// Fetch cheat sheet content, using cache if available.
fn fetch_cheatsheet_content(
filename: &str,
cache_dir: &std::path::Path,
) -> Result<String, AphoriaError> {
let cache_path = cache_dir.join(filename);
// Check cache first
if cache_path.exists() {
debug!(filename, "Loading from cache");
return fs::read_to_string(&cache_path).map_err(|e| AphoriaError::OwaspFetch {
sheet: filename.to_string(),
message: e.to_string(),
});
}
// Fetch from network
let url = format!("{}{}", OWASP_BASE_URL, filename);
info!(filename, url = %url, "Fetching cheat sheet");
let response =
ureq::get(&url).timeout(Duration::from_secs(FETCH_TIMEOUT_SECS)).call().map_err(|e| {
AphoriaError::OwaspFetch { sheet: filename.to_string(), message: e.to_string() }
})?;
let content = response.into_string().map_err(|e| AphoriaError::OwaspFetch {
sheet: filename.to_string(),
message: e.to_string(),
})?;
// Cache the result
if let Err(e) = fs::write(&cache_path, &content) {
warn!(filename, error = %e, "Failed to cache cheat sheet");
}
Ok(content)
}
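// Named `builder_tests` because `mod tests;` above already declares the parser
// test module; two modules named `tests` would not compile.
#[cfg(test)]
mod builder_tests {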
use super::*;
#[test]
fn test_owasp_builder_source_ids() {
let builder = OwaspCorpusBuilder::new();
let ids = builder.source_ids();
assert!(ids.iter().any(|id| id.contains("authentication")));
assert!(ids.iter().any(|id| id.contains("jwt")));
assert!(ids.iter().any(|id| id.contains("tls")));
}
#[test]
fn test_owasp_builder_requires_network() {
let builder = OwaspCorpusBuilder::new();
assert!(builder.requires_network());
}
}
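`fetch_cheatsheet_content` above is a read-through cache. A cache hit returns the file. A miss fetches the sheet and stores a copy. A failed cache write only logs a warning. The sketch below isolates that pattern and replaces the network call with a closure. `read_through` and the fake fetchers are hypothetical names, and the temp-directory layout exists only for the demo. Like the original, entries never expire.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return the cached file if present; otherwise call `fetch`, try to cache the
/// result, and return it. A failed cache write does not fail the lookup.
fn read_through<F>(cache_dir: &Path, key: &str, fetch: F) -> io::Result<String>
where
    F: FnOnce() -> io::Result<String>,
{
    let path = cache_dir.join(key);
    if path.exists() {
        return fs::read_to_string(&path);
    }
    let content = fetch()?;
    if let Err(e) = fs::write(&path, &content) {
        eprintln!("warning: failed to cache {key}: {e}");
    }
    Ok(content)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("read-through-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;

    let mut network_calls = 0;
    // First lookup misses the cache and "fetches".
    let first = read_through(&dir, "Sheet.md", || {
        network_calls += 1;
        Ok("# Sheet".to_string())
    })?;
    // Second lookup is served from disk; this closure is never called.
    let second = read_through(&dir, "Sheet.md", || {
        network_calls += 1;
        Ok("unused".to_string())
    })?;
    println!("{first} / {second} / fetches = {network_calls}");

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Passing the fetcher as a closure keeps the cache logic testable without network access.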

@ -0,0 +1,494 @@
//! OWASP cheat sheet parsers.
//!
//! Contains topic-specific parsers for extracting security recommendations
//! from OWASP Cheat Sheets.
use regex::Regex;
use stemedb_core::types::ObjectValue;
/// A parsed security recommendation from a cheat sheet.
pub(super) struct Recommendation {
/// Subject path (`owasp://cheatsheet/{topic}/{claim}`).
pub subject: String,
/// Predicate for the recommendation.
pub predicate: String,
/// Value extracted from the recommendation.
pub value: ObjectValue,
/// Human-readable description.
pub description: String,
}
/// Parse security recommendations from cheat sheet markdown.
pub(super) fn parse_cheatsheet(content: &str, topic: &str) -> Vec<Recommendation> {
// Dispatch to a topic-specific parser; unknown topics fall back to generic MUST/SHOULD scanning.
match topic {
"authentication" => parse_authentication_sheet(content),
"jwt" => parse_jwt_sheet(content),
"tls" => parse_tls_sheet(content),
"secrets" => parse_secrets_sheet(content),
"input_validation" => parse_input_validation_sheet(content),
"session" => parse_session_sheet(content),
"csrf" => parse_csrf_sheet(content),
"password_storage" => parse_password_storage_sheet(content),
"http_headers" => parse_http_headers_sheet(content),
_ => parse_generic_sheet(content, topic),
}
}
/// Parse authentication cheat sheet.
fn parse_authentication_sheet(content: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
// Multi-factor authentication
if content.contains("multi-factor") || content.contains("MFA") || content.contains("2FA") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/authentication/mfa".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Multi-factor authentication SHOULD be implemented".to_string(),
});
}
// Password requirements
if content.contains("password") && content.contains("minimum") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/authentication/password_length".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Number(8.0),
description: "OWASP: Minimum password length of 8 characters".to_string(),
});
}
// Account lockout
if content.contains("lockout") || content.contains("brute") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/authentication/account_lockout".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Account lockout SHOULD be enabled for brute force protection"
.to_string(),
});
}
// Secure password storage
if content.contains("bcrypt") || content.contains("Argon2") || content.contains("hash") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/authentication/password_hashing".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("bcrypt_or_argon2".to_string()),
description: "OWASP: Use bcrypt or Argon2 for password hashing".to_string(),
});
}
recs
}
/// Parse JWT cheat sheet.
fn parse_jwt_sheet(content: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
// Algorithm validation
if content.contains("algorithm") || content.contains("alg") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/jwt/algorithm_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: JWT algorithm MUST be validated server-side".to_string(),
});
}
// None algorithm rejection
if content.contains("\"none\"") || content.contains("none algorithm") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/jwt/none_algorithm".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
description: "OWASP: JWT 'none' algorithm MUST be rejected".to_string(),
});
}
// Expiration validation
if content.contains("expiration") || content.contains("exp") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/jwt/expiration".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: JWT expiration MUST be validated".to_string(),
});
}
// Signature verification
if content.contains("signature") && content.contains("verify") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/jwt/signature_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: JWT signatures MUST be verified".to_string(),
});
}
recs
}
/// Parse TLS cheat sheet.
fn parse_tls_sheet(content: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
// TLS version
if content.contains("TLS 1.2") || content.contains("TLS 1.3") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/tls/min_version".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("TLS1.2".to_string()),
description: "OWASP: Minimum TLS version SHOULD be 1.2".to_string(),
});
}
// Certificate verification
if content.contains("certificate") && content.contains("verify") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/tls/cert_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: TLS certificates MUST be verified".to_string(),
});
}
// Cipher suites
if content.contains("cipher") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/tls/cipher_suites".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("strong_ciphers_only".to_string()),
description: "OWASP: Only strong cipher suites SHOULD be enabled".to_string(),
});
}
// HSTS
if content.contains("HSTS") || content.contains("Strict-Transport-Security") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/tls/hsts".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: HSTS header SHOULD be enabled".to_string(),
});
}
recs
}
/// Parse secrets management cheat sheet.
fn parse_secrets_sheet(content: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
// No hardcoded secrets
if content.contains("hardcoded") || content.contains("hardcode") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/secrets/hardcoded".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
description: "OWASP: Secrets MUST NOT be hardcoded".to_string(),
});
}
// Environment variables or vault
if content.contains("environment") || content.contains("vault") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/secrets/storage_method".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("environment_or_vault".to_string()),
description: "OWASP: Secrets SHOULD be stored in environment variables or vault"
.to_string(),
});
}
// API key rotation
if content.contains("rotation") || content.contains("rotate") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/secrets/rotation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Secrets SHOULD be rotated regularly".to_string(),
});
}
// Encryption at rest
if content.contains("encrypt") && content.contains("rest") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/secrets/encryption_at_rest".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Secrets SHOULD be encrypted at rest".to_string(),
});
}
recs
}
/// Parse input validation cheat sheet.
fn parse_input_validation_sheet(content: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
// Server-side validation
if content.contains("server-side") || content.contains("server side") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/input_validation/server_side".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Input validation MUST be performed server-side".to_string(),
});
}
// Allow list over deny list
if content.contains("allowlist") || content.contains("whitelist") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/input_validation/allowlist".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Prefer allowlist over denylist for input validation".to_string(),
});
}
// SQL injection prevention
if content.contains("SQL") && content.contains("parameter") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/input_validation/parameterized_queries".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Use parameterized queries to prevent SQL injection".to_string(),
});
}
// XSS prevention
if content.contains("XSS") || content.contains("cross-site scripting") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/input_validation/output_encoding".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Output encoding MUST be used to prevent XSS".to_string(),
});
}
recs
}
/// Parse session management cheat sheet.
fn parse_session_sheet(content: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
// Secure cookie flag
if content.contains("Secure") && content.contains("cookie") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/session/secure_cookie".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Session cookies MUST have Secure flag".to_string(),
});
}
// HttpOnly cookie flag
if content.contains("HttpOnly") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/session/httponly_cookie".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Session cookies MUST have HttpOnly flag".to_string(),
});
}
// Session timeout
if content.contains("timeout") || content.contains("expiration") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/session/timeout".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Session timeout SHOULD be configured".to_string(),
});
}
// Session regeneration
if content.contains("regenerate") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/session/regeneration".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Session ID SHOULD be regenerated after authentication".to_string(),
});
}
recs
}
/// Parse CSRF prevention cheat sheet.
fn parse_csrf_sheet(content: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
// CSRF tokens
if content.contains("token") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/csrf/token".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: CSRF tokens SHOULD be used".to_string(),
});
}
// SameSite cookies
if content.contains("SameSite") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/csrf/samesite_cookie".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("Strict".to_string()),
description: "OWASP: SameSite cookie attribute SHOULD be Strict or Lax".to_string(),
});
}
// Origin header validation
if content.contains("Origin") && content.contains("header") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/csrf/origin_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Origin header SHOULD be validated".to_string(),
});
}
recs
}
/// Parse password storage cheat sheet.
fn parse_password_storage_sheet(content: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
// Argon2
if content.contains("Argon2") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/password_storage/algorithm".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("Argon2id".to_string()),
description: "OWASP: Argon2id is the recommended password hashing algorithm"
.to_string(),
});
}
// Salt
if content.contains("salt") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/password_storage/salt".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Passwords MUST be salted before hashing".to_string(),
});
}
// Work factor
if content.contains("work factor") || content.contains("iterations") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/password_storage/work_factor".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Password hashing work factor SHOULD be configured".to_string(),
});
}
recs
}
/// Parse HTTP headers cheat sheet.
fn parse_http_headers_sheet(content: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
// Content-Security-Policy
if content.contains("Content-Security-Policy") || content.contains("CSP") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/http_headers/csp".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Content-Security-Policy header SHOULD be set".to_string(),
});
}
// X-Content-Type-Options
if content.contains("X-Content-Type-Options") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/http_headers/content_type_options".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("nosniff".to_string()),
description: "OWASP: X-Content-Type-Options SHOULD be 'nosniff'".to_string(),
});
}
// X-Frame-Options
if content.contains("X-Frame-Options") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/http_headers/frame_options".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("DENY".to_string()),
description: "OWASP: X-Frame-Options SHOULD be 'DENY' or 'SAMEORIGIN'".to_string(),
});
}
// Referrer-Policy
if content.contains("Referrer-Policy") {
recs.push(Recommendation {
subject: "owasp://cheatsheet/http_headers/referrer_policy".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OWASP: Referrer-Policy header SHOULD be set".to_string(),
});
}
recs
}
/// Parse a generic cheat sheet using keyword matching.
fn parse_generic_sheet(content: &str, topic: &str) -> Vec<Recommendation> {
let mut recs = Vec::new();
let Ok(must_pattern) = Regex::new(r"(?i)\bMUST\b[^.]+\.") else { return recs };
let Ok(should_pattern) = Regex::new(r"(?i)\bSHOULD\b[^.]+\.") else { return recs };
// The patterns have no capture groups, so `find_iter` yields each whole match directly.
for (i, m) in must_pattern.find_iter(content).take(5).enumerate() {
let slug = create_slug(m.as_str());
recs.push(Recommendation {
subject: format!("owasp://cheatsheet/{}/must_{}", topic, i),
predicate: "required".to_string(),
value: ObjectValue::Boolean(true),
description: format!("OWASP {}: {}", topic, truncate_description(&slug, 100)),
});
}
for (i, m) in should_pattern.find_iter(content).take(5).enumerate() {
let slug = create_slug(m.as_str());
recs.push(Recommendation {
subject: format!("owasp://cheatsheet/{}/should_{}", topic, i),
predicate: "recommended".to_string(),
value: ObjectValue::Boolean(true),
description: format!("OWASP {}: {}", topic, truncate_description(&slug, 100)),
});
}
recs
}
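/// Create an underscore-separated, lowercase slug from the first ten words of `text`.
///
/// Non-alphanumeric characters act as separators. Unicode letters are kept, so the
/// result is not guaranteed to be ASCII.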
pub(super) fn create_slug(text: &str) -> String {
text.to_lowercase()
.chars()
.map(|c| if c.is_alphanumeric() || c == ' ' { c } else { ' ' })
.collect::<String>()
.split_whitespace()
.take(10)
.collect::<Vec<_>>()
.join("_")
}
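/// Truncate `text` to at most `max_len` characters (for `max_len >= 3`), ending in "..." when cut.
///
/// Counts characters rather than bytes, so multi-byte UTF-8 input cannot cause a
/// panic by slicing inside a character.
pub(super) fn truncate_description(text: &str, max_len: usize) -> String {
if text.chars().count() <= max_len {
text.to_string()
} else {
let kept: String = text.chars().take(max_len.saturating_sub(3)).collect();
format!("{kept}...")
}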
}
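Each topic parser above repeats the same shape: if the sheet mentions certain keywords, it emits a fixed recommendation. The sketch below shows one possible table-driven alternative. `KeywordRule`, `TLS_RULES`, and `apply_rules` are hypothetical names. The sketch returns bare subject strings rather than the module's `Recommendation`/`ObjectValue` types. It reproduces the "any of these keywords" and "all of these keywords" checks that the hand-written parsers use.

```rust
/// One keyword-triggered recommendation. A rule fires when the content contains
/// at least one entry of `any` (or `any` is empty) and every entry of `all`.
struct KeywordRule {
    any: &'static [&'static str],
    all: &'static [&'static str],
    subject: &'static str,
}

/// Example rules mirroring three checks from `parse_tls_sheet`.
const TLS_RULES: &[KeywordRule] = &[
    KeywordRule { any: &["TLS 1.2", "TLS 1.3"], all: &[], subject: "owasp://cheatsheet/tls/min_version" },
    KeywordRule { any: &[], all: &["certificate", "verify"], subject: "owasp://cheatsheet/tls/cert_verification" },
    KeywordRule { any: &["HSTS", "Strict-Transport-Security"], all: &[], subject: "owasp://cheatsheet/tls/hsts" },
];

/// Return the subjects of all rules whose keyword conditions match `content`.
fn apply_rules(content: &str, rules: &[KeywordRule]) -> Vec<&'static str> {
    rules
        .iter()
        .filter(|r| r.any.is_empty() || r.any.iter().any(|k| content.contains(*k)))
        .filter(|r| r.all.iter().all(|k| content.contains(*k)))
        .map(|r| r.subject)
        .collect()
}

fn main() {
    let content = "Use TLS 1.3 and enable HSTS.";
    for subject in apply_rules(content, TLS_RULES) {
        println!("{subject}");
    }
}
```

With this approach, adding a check becomes a one-line data change, at the cost of moving per-rule values and descriptions into the table as well.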

@ -0,0 +1,114 @@
//! Tests for OWASP cheat sheet parsers.
use super::parsers::{create_slug, parse_cheatsheet, truncate_description};
#[test]
fn test_parse_authentication_sheet() {
let content = r#"
# Authentication Best Practices
## Multi-Factor Authentication
Multi-factor authentication (MFA) or 2FA should be implemented.
## Password Requirements
The minimum password length should be at least 8 characters.
## Account Lockout
Account lockout should be enabled to prevent brute force attacks.
Use bcrypt or Argon2 for password hashing.
"#;
let recs = parse_cheatsheet(content, "authentication");
assert!(recs.iter().any(|r| r.subject.contains("mfa")), "Should find MFA recommendation");
assert!(
recs.iter().any(|r| r.subject.contains("password_length")),
"Should find password length"
);
assert!(
recs.iter().any(|r| r.subject.contains("account_lockout")),
"Should find account lockout"
);
}
#[test]
fn test_parse_jwt_sheet() {
let content = r#"
# JWT Security
The algorithm header must be validated.
The "none" algorithm must not be accepted.
Verify the signature before trusting the claims.
Check the expiration claim.
"#;
let recs = parse_cheatsheet(content, "jwt");
assert!(
recs.iter().any(|r| r.subject.contains("algorithm")),
"Should find algorithm validation"
);
assert!(
recs.iter().any(|r| r.subject.contains("none_algorithm")),
"Should find none algorithm rejection"
);
}
#[test]
fn test_parse_tls_sheet() {
let content = r#"
# TLS Configuration
Use TLS 1.2 or TLS 1.3.
Always verify the certificate chain.
Configure strong cipher suites.
Enable HSTS.
"#;
let recs = parse_cheatsheet(content, "tls");
assert!(recs.iter().any(|r| r.subject.contains("min_version")), "Should find min version");
assert!(
recs.iter().any(|r| r.subject.contains("cert_verification")),
"Should find cert verification"
);
}
#[test]
fn test_parse_secrets_sheet() {
let content = r#"
# Secrets Management
Never hardcode secrets in your code.
Store secrets in environment variables or a vault.
Rotate secrets regularly.
Encrypt secrets at rest.
"#;
let recs = parse_cheatsheet(content, "secrets");
assert!(recs.iter().any(|r| r.subject.contains("hardcoded")), "Should find hardcoded warning");
assert!(
recs.iter().any(|r| r.subject.contains("storage_method")),
"Should find storage method"
);
}
#[test]
fn test_create_slug() {
assert_eq!(create_slug("Hello World!"), "hello_world");
assert_eq!(create_slug("Use TLS 1.2"), "use_tls_1_2");
}
#[test]
fn test_truncate_description() {
assert_eq!(truncate_description("short", 100), "short");
assert_eq!(
truncate_description("a".repeat(150).as_str(), 100),
format!("{}...", "a".repeat(97))
);
}
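
The `test_create_slug` and `test_truncate_description` cases above define how the helpers should behave, but the helpers themselves are not shown. Below is a minimal sketch that satisfies those cases. `slugify` and `truncate_with_ellipsis` are hypothetical stand-ins for `create_slug` and `truncate_description`. The tests do not cover two behaviors here, so both are assumptions: collapsing repeated separators, and counting length in `char`s rather than bytes.

```rust
/// Hypothetical slug helper: lowercase, map every non-alphanumeric char to
/// `_`, collapse runs of `_`, and drop leading/trailing `_`.
fn slugify(input: &str) -> String {
    let mut slug = String::with_capacity(input.len());
    let mut last_was_sep = true; // suppresses a leading separator
    for c in input.chars().flat_map(char::to_lowercase) {
        if c.is_alphanumeric() {
            slug.push(c);
            last_was_sep = false;
        } else if !last_was_sep {
            slug.push('_');
            last_was_sep = true;
        }
    }
    // Drop the single trailing separator left by input such as "World!".
    if slug.ends_with('_') {
        slug.pop();
    }
    slug
}

/// Hypothetical truncation helper: the result is at most `max` chars long,
/// with three of them reserved for a trailing "...".
fn truncate_with_ellipsis(input: &str, max: usize) -> String {
    if input.chars().count() <= max {
        return input.to_string();
    }
    let keep = max.saturating_sub(3);
    let mut out: String = input.chars().take(keep).collect();
    out.push_str("...");
    out
}
```

Counting `char`s keeps truncation from splitting a multi-byte UTF-8 sequence, which byte slicing with `&s[..n]` could do.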

@@ -0,0 +1,231 @@
//! RFC normative statement corpus builder.
//!
//! This builder fetches RFCs from the IETF RFC Editor and extracts normative
//! statements (MUST, SHALL, SHOULD per RFC 2119) to create authoritative
//! assertions.
//!
//! # Caching
//!
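//! RFC text is cached to `{cache_dir}/rfc-cache/rfc{number}.txt`, where
//! `cache_dir` comes from `CorpusConfig` (for example
//! `~/.cache/aphoria/rfc-cache/rfc7519.txt` when `cache_dir` is
//! `~/.cache/aphoria`), to minimize network requests. Cached copies are never
//! refreshed; delete a file to force a re-fetch.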
//!
//! # Target RFCs
//!
//! | RFC  | Topic                  | Priority Claims                                                |
//! |------|------------------------|----------------------------------------------------------------|
//! | 7519 | JWT                    | audience_validation, expiry_validation, signature_verification |
//! | 6749 | OAuth 2.0              | redirect_uri_validation, state_parameter                       |
//! | 6750 | Bearer tokens          | transport_security                                             |
//! | 8446 | TLS 1.3                | cert_verification, cipher_selection                            |
//! | 7525 | TLS best practices     | hostname_verification                                          |
//! | 6238 | TOTP                   | time_step, validation_window                                   |
//! | 7617 | HTTP Basic Auth        | transport_security                                             |
//! | 9110 | HTTP Semantics         | timeout_handling                                               |
mod parsers;
#[cfg(test)]
mod tests;
use std::fs;
use std::time::Duration;
use ed25519_dalek::SigningKey;
use stemedb_core::types::{Assertion, SourceClass};
use tracing::{debug, info, instrument, warn};
use super::CorpusBuilder;
use crate::config::CorpusConfig;
use crate::episteme::create_authoritative_assertion;
use crate::AphoriaError;
use parsers::parse_normative_statements;
/// Default RFCs to fetch when none are specified.
const DEFAULT_RFCS: &[u32] = &[
7519, // JWT
6749, // OAuth 2.0
6750, // Bearer tokens
8446, // TLS 1.3
7525, // TLS best practices
6238, // TOTP
7617, // HTTP Basic Auth
9110, // HTTP Semantics
];
/// HTTP timeout for fetching RFCs.
const FETCH_TIMEOUT_SECS: u64 = 30;
/// Builder for RFC normative statement corpus.
pub struct RfcCorpusBuilder {
/// List of RFC numbers to fetch.
rfc_list: Vec<u32>,
}
impl RfcCorpusBuilder {
/// Create a new RFC corpus builder with specified RFCs.
pub fn new(rfc_list: &Option<Vec<u32>>) -> Self {
let list = rfc_list.clone().unwrap_or_else(|| DEFAULT_RFCS.to_vec());
Self { rfc_list: list }
}
/// Create a builder with default RFC list.
pub fn with_defaults() -> Self {
Self { rfc_list: DEFAULT_RFCS.to_vec() }
}
}
impl Default for RfcCorpusBuilder {
fn default() -> Self {
Self::with_defaults()
}
}
impl CorpusBuilder for RfcCorpusBuilder {
fn name(&self) -> &str {
"RFC"
}
fn scheme(&self) -> &str {
"rfc"
}
fn default_tier(&self) -> u8 {
0 // Regulatory
}
fn requires_network(&self) -> bool {
true // Needs to fetch RFCs (unless cached)
}
fn source_ids(&self) -> Vec<String> {
self.rfc_list.iter().map(|n| format!("RFC {}", n)).collect()
}
#[instrument(skip(self, signing_key, config), fields(builder = "RFC", rfcs = ?self.rfc_list))]
fn build(
&self,
signing_key: &SigningKey,
timestamp: u64,
config: &CorpusConfig,
) -> Result<Vec<Assertion>, AphoriaError> {
let cache_dir = config.cache_dir.join("rfc-cache");
fs::create_dir_all(&cache_dir)?;
let mut all_assertions = Vec::new();
for &rfc_num in &self.rfc_list {
match fetch_and_parse_rfc(rfc_num, &cache_dir, signing_key, timestamp) {
Ok(assertions) => {
info!(rfc = rfc_num, assertions = assertions.len(), "Parsed RFC");
all_assertions.extend(assertions);
}
Err(e) => {
warn!(rfc = rfc_num, error = %e, "Failed to process RFC");
// Continue with other RFCs
}
}
}
Ok(all_assertions)
}
}
/// Fetch an RFC and parse its normative statements.
fn fetch_and_parse_rfc(
rfc_num: u32,
cache_dir: &std::path::Path,
signing_key: &SigningKey,
timestamp: u64,
) -> Result<Vec<Assertion>, AphoriaError> {
let text = fetch_rfc_text(rfc_num, cache_dir)?;
let statements = parse_normative_statements(&text, rfc_num);
let assertions = statements
.into_iter()
.map(|stmt| {
create_authoritative_assertion(
signing_key,
&stmt.subject,
&stmt.predicate,
stmt.value,
SourceClass::Regulatory, // Tier 0
&stmt.description,
timestamp,
)
})
.collect();
Ok(assertions)
}
/// Fetch RFC text, using cache if available.
fn fetch_rfc_text(rfc_num: u32, cache_dir: &std::path::Path) -> Result<String, AphoriaError> {
let cache_path = cache_dir.join(format!("rfc{}.txt", rfc_num));
// Check cache first
if cache_path.exists() {
debug!(rfc = rfc_num, "Loading from cache");
return fs::read_to_string(&cache_path).map_err(|e| AphoriaError::RfcFetch {
rfc: rfc_num,
message: e.to_string(),
});
}
// Fetch from network
let url = format!("https://www.rfc-editor.org/rfc/rfc{}.txt", rfc_num);
info!(rfc = rfc_num, url = %url, "Fetching RFC");
let response =
ureq::get(&url).timeout(Duration::from_secs(FETCH_TIMEOUT_SECS)).call().map_err(|e| {
AphoriaError::RfcFetch { rfc: rfc_num, message: e.to_string() }
})?;
let text = response.into_string().map_err(|e| AphoriaError::RfcFetch {
rfc: rfc_num,
message: e.to_string(),
})?;
// Cache the result
if let Err(e) = fs::write(&cache_path, &text) {
warn!(rfc = rfc_num, error = %e, "Failed to cache RFC");
}
Ok(text)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_rfc_builder_source_ids() {
let builder = RfcCorpusBuilder::with_defaults();
let ids = builder.source_ids();
assert!(ids.iter().any(|id| id.contains("7519")));
assert!(ids.iter().any(|id| id.contains("8446")));
}
#[test]
fn test_rfc_builder_requires_network() {
let builder = RfcCorpusBuilder::with_defaults();
assert!(builder.requires_network());
}
#[test]
fn test_custom_rfc_list() {
let custom_list = Some(vec![7519, 8446]);
let builder = RfcCorpusBuilder::new(&custom_list);
assert_eq!(builder.rfc_list.len(), 2);
assert!(builder.rfc_list.contains(&7519));
assert!(builder.rfc_list.contains(&8446));
}
#[test]
fn test_rfc_builder_offline_skipped() {
// Test that the builder correctly reports it requires network
// (actual network testing would need integration tests)
let builder = RfcCorpusBuilder::with_defaults();
assert!(builder.requires_network());
}
}
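
`fetch_rfc_text` is a read-through cache. On a hit it returns the cached file. On a miss it fetches the RFC, then tries to write the file, and a failed write does not fail the fetch. The sketch below isolates that logic and replaces the HTTP call with an injected closure so it runs offline. `cached_fetch` is a hypothetical name, and it returns `std::io::Result` rather than `AphoriaError`. Like the original, it never expires cached entries.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical read-through cache: return `cache_dir/rfc{num}.txt` if it
/// exists, otherwise call `fetch` and try to cache the result.
fn cached_fetch<F>(num: u32, cache_dir: &Path, fetch: F) -> io::Result<String>
where
    F: FnOnce(u32) -> io::Result<String>,
{
    let cache_path = cache_dir.join(format!("rfc{}.txt", num));
    if cache_path.exists() {
        return fs::read_to_string(&cache_path);
    }
    // A fetch error propagates and nothing is written to the cache.
    let text = fetch(num)?;
    // A failed cache write is not fatal; the fetched text is still returned.
    let _ = fs::write(&cache_path, &text);
    Ok(text)
}
```

Injecting the fetcher makes the hit/miss path unit-testable without network access. The existing tests only check `requires_network`, so they do not cover this path.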

@@ -0,0 +1,453 @@
//! RFC normative statement parsers.
//!
//! Contains RFC-specific parsers for extracting normative statements
//! (MUST, SHALL, SHOULD per RFC 2119) from RFC documents.
use std::collections::HashMap;
use regex::Regex;
use stemedb_core::types::ObjectValue;
/// A parsed normative statement from an RFC.
pub(super) struct NormativeStatement {
/// Subject path (rfc://{number}/{topic}).
pub subject: String,
/// Predicate for the statement.
pub predicate: String,
/// Value extracted from the statement.
pub value: ObjectValue,
/// Human-readable description.
pub description: String,
}
/// Parse normative statements from RFC text.
pub(super) fn parse_normative_statements(text: &str, rfc_num: u32) -> Vec<NormativeStatement> {
// Dispatch to a hand-written parser for each known RFC; any other RFC
// falls back to generic section-heading parsing.
match rfc_num {
7519 => parse_rfc7519_jwt(text),
6749 => parse_rfc6749_oauth(text),
6750 => parse_rfc6750_bearer(text),
8446 => parse_rfc8446_tls13(text),
7525 => parse_rfc7525_tls_practices(text),
6238 => parse_rfc6238_totp(text),
7617 => parse_rfc7617_basic_auth(text),
9110 => parse_rfc9110_http(text),
_ => parse_generic_rfc(text, rfc_num),
}
}
/// Parse RFC 7519 (JWT) normative statements.
fn parse_rfc7519_jwt(text: &str) -> Vec<NormativeStatement> {
let mut statements = Vec::new();
// Audience validation (Section 4.1.3)
if contains_normative(text, "aud", "MUST") {
statements.push(NormativeStatement {
subject: "rfc://7519/jwt/audience_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "JWT audience claim MUST be validated (RFC 7519 Section 4.1.3)".to_string(),
});
}
// Expiry validation (Section 4.1.4)
if contains_normative(text, "exp", "MUST") {
statements.push(NormativeStatement {
subject: "rfc://7519/jwt/expiry_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "JWT expiry claim MUST be validated (RFC 7519 Section 4.1.4)".to_string(),
});
}
// Signature verification
if text.contains("signature") && text.contains("MUST") {
statements.push(NormativeStatement {
subject: "rfc://7519/jwt/signature_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "JWT signatures MUST be verified (RFC 7519)".to_string(),
});
}
// Algorithm restriction
if text.contains("alg") && (text.contains("\"none\"") || text.contains("none algorithm")) {
statements.push(NormativeStatement {
subject: "rfc://7519/jwt/algorithm_restriction".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("explicit_list".to_string()),
description: "JWT algorithm MUST be explicitly specified, 'none' algorithm forbidden"
.to_string(),
});
}
// Not Before validation (Section 4.1.5)
if contains_normative(text, "nbf", "MUST") {
statements.push(NormativeStatement {
subject: "rfc://7519/jwt/nbf_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "JWT not-before claim MUST be validated (RFC 7519 Section 4.1.5)"
.to_string(),
});
}
// Issuer validation (Section 4.1.1)
if contains_normative(text, "iss", "application-specific") {
statements.push(NormativeStatement {
subject: "rfc://7519/jwt/issuer_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "JWT issuer claim SHOULD be validated for application-specific purposes"
.to_string(),
});
}
statements
}
/// Parse RFC 6749 (OAuth 2.0) normative statements.
fn parse_rfc6749_oauth(text: &str) -> Vec<NormativeStatement> {
let mut statements = Vec::new();
// Redirect URI validation
if text.contains("redirect_uri") && text.contains("MUST") {
statements.push(NormativeStatement {
subject: "rfc://6749/oauth/redirect_uri_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OAuth redirect_uri MUST be validated exactly (RFC 6749)".to_string(),
});
}
// State parameter
if text.contains("state") && text.contains("SHOULD") {
statements.push(NormativeStatement {
subject: "rfc://6749/oauth/state_parameter".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OAuth state parameter SHOULD be used for CSRF protection (RFC 6749)"
.to_string(),
});
}
// Scope validation
if text.contains("scope") && text.contains("MUST") {
statements.push(NormativeStatement {
subject: "rfc://6749/oauth/scope_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OAuth scope MUST be validated (RFC 6749)".to_string(),
});
}
// HTTPS requirement
if text.contains("TLS") || text.contains("HTTPS") {
statements.push(NormativeStatement {
subject: "rfc://6749/oauth/transport_security".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "OAuth endpoints MUST use TLS (RFC 6749)".to_string(),
});
}
statements
}
/// Parse RFC 6750 (Bearer tokens) normative statements.
fn parse_rfc6750_bearer(text: &str) -> Vec<NormativeStatement> {
let mut statements = Vec::new();
// Transport security
if text.contains("TLS") && text.contains("MUST") {
statements.push(NormativeStatement {
subject: "rfc://6750/bearer/transport_security".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "Bearer tokens MUST be transmitted over TLS (RFC 6750)".to_string(),
});
}
// Token storage
if text.contains("confidential") || text.contains("secure") {
statements.push(NormativeStatement {
subject: "rfc://6750/bearer/secure_storage".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "Bearer tokens MUST be stored securely (RFC 6750)".to_string(),
});
}
statements
}
/// Parse RFC 8446 (TLS 1.3) normative statements.
fn parse_rfc8446_tls13(text: &str) -> Vec<NormativeStatement> {
let mut statements = Vec::new();
// Certificate verification
if text.contains("certificate") && text.contains("MUST") {
statements.push(NormativeStatement {
subject: "rfc://8446/tls/cert_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "TLS certificate chains MUST be verified (RFC 8446)".to_string(),
});
}
// Cipher selection
if text.contains("cipher") {
statements.push(NormativeStatement {
subject: "rfc://8446/tls/cipher_selection".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("TLS_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384".to_string()),
description: "TLS 1.3 cipher suites (RFC 8446)".to_string(),
});
}
// Protocol version
statements.push(NormativeStatement {
subject: "rfc://8446/tls/min_version".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("TLS1.3".to_string()),
description: "TLS 1.3 is the minimum recommended version (RFC 8446)".to_string(),
});
statements
}
/// Parse RFC 7525 (TLS best practices) normative statements.
fn parse_rfc7525_tls_practices(text: &str) -> Vec<NormativeStatement> {
let mut statements = Vec::new();
// Hostname verification
if text.contains("hostname") && text.contains("MUST") {
statements.push(NormativeStatement {
subject: "rfc://7525/tls/hostname_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "TLS hostname MUST be verified (RFC 7525)".to_string(),
});
}
// Certificate revocation
if text.contains("revocation") {
statements.push(NormativeStatement {
subject: "rfc://7525/tls/revocation_checking".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "TLS certificate revocation SHOULD be checked (RFC 7525)".to_string(),
});
}
// Deprecated versions
if text.contains("SSL") && text.contains("MUST NOT") {
statements.push(NormativeStatement {
subject: "rfc://7525/tls/deprecated_versions".to_string(),
predicate: "disabled".to_string(),
value: ObjectValue::Boolean(true),
description: "SSLv2 and SSLv3 MUST NOT be used (RFC 7525)".to_string(),
});
}
statements
}
/// Parse RFC 6238 (TOTP) normative statements.
fn parse_rfc6238_totp(text: &str) -> Vec<NormativeStatement> {
let mut statements = Vec::new();
// Time step
if text.contains("30") && text.contains("time") {
statements.push(NormativeStatement {
subject: "rfc://6238/totp/time_step".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Number(30.0),
description: "TOTP time step SHOULD be 30 seconds (RFC 6238)".to_string(),
});
}
// Validation window
if text.contains("window") || text.contains("tolerance") {
statements.push(NormativeStatement {
subject: "rfc://6238/totp/validation_window".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Number(1.0),
description: "TOTP validation window SHOULD allow 1 step tolerance (RFC 6238)"
.to_string(),
});
}
// Key length
if text.contains("key") && text.contains("160") {
statements.push(NormativeStatement {
subject: "rfc://6238/totp/key_length".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Number(160.0),
description: "TOTP secret key SHOULD be at least 160 bits (RFC 6238)".to_string(),
});
}
statements
}
/// Parse RFC 7617 (HTTP Basic Auth) normative statements.
fn parse_rfc7617_basic_auth(text: &str) -> Vec<NormativeStatement> {
let mut statements = Vec::new();
// Transport security
if text.contains("TLS") || text.contains("confidential") {
statements.push(NormativeStatement {
subject: "rfc://7617/basic_auth/transport_security".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "HTTP Basic Auth MUST use TLS (RFC 7617)".to_string(),
});
}
// UTF-8 encoding
if text.contains("UTF-8") {
statements.push(NormativeStatement {
subject: "rfc://7617/basic_auth/encoding".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("UTF-8".to_string()),
description: "HTTP Basic Auth credentials SHOULD use UTF-8 (RFC 7617)".to_string(),
});
}
statements
}
/// Parse RFC 9110 (HTTP Semantics) normative statements.
fn parse_rfc9110_http(text: &str) -> Vec<NormativeStatement> {
let mut statements = Vec::new();
// Timeout handling
if text.contains("timeout") {
statements.push(NormativeStatement {
subject: "rfc://9110/http/timeout_handling".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "HTTP timeouts SHOULD be configured (RFC 9110)".to_string(),
});
}
// Host header
if text.contains("Host") && text.contains("MUST") {
statements.push(NormativeStatement {
subject: "rfc://9110/http/host_header".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "HTTP/1.1 Host header MUST be present (RFC 9110)".to_string(),
});
}
// Content-Length handling
if text.contains("Content-Length") {
statements.push(NormativeStatement {
subject: "rfc://9110/http/content_length_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
description: "HTTP Content-Length SHOULD be validated (RFC 9110)".to_string(),
});
}
statements
}
/// Generic RFC parsing for unknown RFCs.
fn parse_generic_rfc(text: &str, rfc_num: u32) -> Vec<NormativeStatement> {
let mut statements = Vec::new();
let keyword_pattern =
match Regex::new(r"\b(MUST\s+NOT|MUST|SHALL\s+NOT|SHALL|SHOULD\s+NOT|SHOULD)\b") {
Ok(re) => re,
Err(_) => return statements,
};
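// Scan section headings for normative keywords. `extract_section_topics`
// currently yields only the heading title, not the section body.
let section_topics = extract_section_topics(text);
for (title, topic) in section_topics {
if keyword_pattern.is_match(&title) {
let keyword = extract_strongest_keyword(&title);
// MUST/SHALL and their negations (MUST NOT, SHALL NOT) are absolute
// requirements per RFC 2119; only the SHOULD forms are recommendations.
let is_mandatory = keyword.starts_with("MUST") || keyword.starts_with("SHALL");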
statements.push(NormativeStatement {
subject: format!("rfc://{}/{}", rfc_num, topic),
predicate: if is_mandatory { "required" } else { "recommended" }.to_string(),
value: ObjectValue::Boolean(true),
description: format!("RFC {} {} requirement: {}", rfc_num, keyword, topic),
});
}
}
statements
}
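/// Map each numbered section heading's title to a `{section_number}_{slug}` topic.
///
/// Only unindented heading lines are matched, so table-of-contents entries are
/// skipped. Section bodies are not extracted yet.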
fn extract_section_topics(text: &str) -> HashMap<String, String> {
let section_pattern = match Regex::new(r"(?m)^(\d+(?:\.\d+)*)\.\s+(.+)$") {
Ok(re) => re,
Err(_) => return HashMap::new(),
};
let mut sections = HashMap::new();
for cap in section_pattern.captures_iter(text) {
let section_num = cap.get(1).map(|m| m.as_str()).unwrap_or("");
let title = cap.get(2).map(|m| m.as_str()).unwrap_or("");
// Create a slug from the title
let slug = title
.to_lowercase()
.chars()
.map(|c| if c.is_alphanumeric() { c } else { '_' })
.collect::<String>()
.trim_matches('_')
.to_string();
if !slug.is_empty() {
// Extract section content (simplified - just the title for now)
sections.insert(title.to_string(), format!("{}_{}", section_num, slug));
}
}
sections
}
/// Extract the strongest normative keyword from text.
pub(super) fn extract_strongest_keyword(text: &str) -> String {
let keywords = [
("MUST NOT", 5),
("MUST", 4),
("SHALL NOT", 3),
("SHALL", 2),
("SHOULD NOT", 1),
("SHOULD", 0),
];
keywords
.iter()
.filter(|(kw, _)| text.contains(kw))
.max_by_key(|(_, priority)| priority)
.map(|(kw, _)| kw.to_string())
.unwrap_or_else(|| "SHOULD".to_string())
}
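/// Check whether a single sentence of `text` mentions `topic` together with a
/// normative `keyword`, in either order.
///
/// `topic` is matched case-insensitively. `keyword` is matched case-sensitively,
/// because only uppercase RFC 2119 keywords are normative (RFC 8174). Both are
/// escaped, so they are matched literally rather than as regex syntax.
pub(super) fn contains_normative(text: &str, topic: &str, keyword: &str) -> bool {
let topic = regex::escape(topic);
let keyword = regex::escape(keyword);
// `[^.]*` keeps both terms within the same sentence.
let forward = format!(r"(?i:{topic})[^.]*{keyword}");
let reverse = format!(r"{keyword}[^.]*(?i:{topic})");
[forward, reverse]
.iter()
.any(|pattern| Regex::new(pattern).map(|re| re.is_match(text)).unwrap_or(false))
}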
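
`contains_normative`, `extract_strongest_keyword`, and the mandatory/recommended split in `parse_generic_rfc` together classify RFC 2119 language. The sketch below approximates that logic using only the standard library, without the `regex` crate. Splitting on `.` mirrors the `[^.]*` sentence window. Keywords are matched case-sensitively here because RFC 8174 makes only uppercase keywords normative, and `MUST NOT`/`SHALL NOT` count as mandatory. All names are hypothetical. `extract_strongest_keyword` falls back to `"SHOULD"`, but `strongest_keyword` returns `None` when no keyword is present.

```rust
/// RFC 2119 keywords, strongest first. Two-word forms precede their one-word
/// prefixes so "MUST NOT" is not reported as plain "MUST".
const KEYWORDS: [&str; 6] = ["MUST NOT", "MUST", "SHALL NOT", "SHALL", "SHOULD NOT", "SHOULD"];

/// True if some sentence mentions `topic` (case-insensitive) and contains
/// `keyword` (case-sensitive).
fn sentence_has_normative(text: &str, topic: &str, keyword: &str) -> bool {
    let topic = topic.to_lowercase();
    text.split('.')
        .any(|sentence| sentence.to_lowercase().contains(&topic) && sentence.contains(keyword))
}

/// Strongest keyword present in `text`, ranked by position in `KEYWORDS`.
fn strongest_keyword(text: &str) -> Option<&'static str> {
    KEYWORDS.iter().copied().find(|kw| text.contains(*kw))
}

/// MUST/SHALL and their negations are absolute requirements; SHOULD forms
/// are recommendations.
fn is_mandatory(keyword: &str) -> bool {
    keyword.starts_with("MUST") || keyword.starts_with("SHALL")
}
```

Like the original, the `.` split also breaks sentences at section numbers such as `4.1.3.`, which is harmless because headings rarely carry normative keywords.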

@@ -0,0 +1,68 @@
//! Tests for RFC normative statement parsers.
use super::parsers::{contains_normative, extract_strongest_keyword, parse_normative_statements};
#[test]
fn test_parse_jwt_statements() {
// Sample JWT RFC text (simplified)
let text = r#"
4.1.3. "aud" (Audience) Claim
The "aud" (audience) claim identifies the recipients that the JWT is
intended for. Each principal intended to process the JWT MUST
identify itself with a value in the audience claim.
4.1.4. "exp" (Expiration Time) Claim
The "exp" (expiration time) claim identifies the expiration time on
or after which the JWT MUST NOT be accepted for processing.
The signature MUST be verified.
The "alg" header parameter. Using "none" algorithm is forbidden.
"#;
let statements = parse_normative_statements(text, 7519);
assert!(
statements.iter().any(|s| s.subject.contains("audience_validation")),
"Should find audience validation"
);
assert!(
statements.iter().any(|s| s.subject.contains("expiry_validation")),
"Should find expiry validation"
);
assert!(
statements.iter().any(|s| s.subject.contains("signature_verification")),
"Should find signature verification"
);
}
#[test]
fn test_parse_tls_statements() {
let text = r#"
The certificate chain MUST be verified.
cipher_suite selection is important.
"#;
let statements = parse_normative_statements(text, 8446);
assert!(
statements.iter().any(|s| s.subject.contains("cert_verification")),
"Should find cert verification"
);
}
#[test]
fn test_extract_strongest_keyword() {
assert_eq!(extract_strongest_keyword("MUST NOT do this"), "MUST NOT");
assert_eq!(extract_strongest_keyword("MUST do this"), "MUST");
assert_eq!(extract_strongest_keyword("SHOULD do this"), "SHOULD");
assert_eq!(extract_strongest_keyword("MUST do this and SHOULD do that"), "MUST");
}
#[test]
fn test_contains_normative() {
let text = "The aud claim MUST be validated";
assert!(contains_normative(text, "aud", "MUST"));
assert!(!contains_normative(text, "aud", "SHOULD"));
}
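
`extract_section_topics` in the parsers module derives topics from unindented numbered headings such as `4.1.3.  "aud" (Audience) Claim`, and the tests above do not exercise it directly. The following is a std-only sketch of that heading recognition. `parse_heading` is hypothetical. It differs from the original by collapsing runs of `_` in the slug: the original maps each non-alphanumeric character to its own `_`, giving `aud___audience__claim`.

```rust
/// Hypothetical heading parser: "4.1.3.  Title" -> ("4.1.3", "title_slug").
/// Returns None for indented lines (e.g. table-of-contents entries) and for
/// lines that do not start with a dotted section number.
fn parse_heading(line: &str) -> Option<(String, String)> {
    // The section number is a leading run of digits and dots, e.g. "4.1.3."
    let num_end = line.find(|c: char| !(c.is_ascii_digit() || c == '.'))?;
    let number = &line[..num_end];
    if !number.starts_with(|c: char| c.is_ascii_digit()) || !number.ends_with('.') {
        return None;
    }
    // The number must be followed by whitespace and then a title.
    let rest = &line[num_end..];
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let mut slug = String::new();
    for c in rest.trim().chars().flat_map(char::to_lowercase) {
        if c.is_alphanumeric() {
            slug.push(c);
        } else if !slug.is_empty() && !slug.ends_with('_') {
            slug.push('_');
        }
    }
    while slug.ends_with('_') {
        slug.pop();
    }
    if slug.is_empty() {
        return None;
    }
    Some((number.trim_end_matches('.').to_string(), slug))
}
```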

@@ -0,0 +1,328 @@
//! Vendor documentation corpus builder.
//!
//! This builder provides curated claims from vendor documentation for common
//! libraries and tools. These are Tier 2 (Observational) sources that represent
//! best practices documented by software vendors.
use ed25519_dalek::SigningKey;
use stemedb_core::types::{Assertion, ObjectValue, SourceClass};
use tracing::instrument;
use super::CorpusBuilder;
use crate::config::CorpusConfig;
use crate::episteme::create_authoritative_assertion;
use crate::AphoriaError;
/// Builder for vendor documentation corpus.
///
/// Contains curated claims from:
/// - PostgreSQL connection pooling recommendations
/// - Redis timeout defaults and best practices
/// - reqwest TLS verification defaults
/// - hyper timeout recommendations
/// - Go net/http timeout defaults
/// - tokio-postgres and SQLx connection pool settings
pub struct VendorCorpusBuilder {
claims: Vec<VendorClaim>,
}
/// A curated vendor claim.
struct VendorClaim {
/// Subject path (vendor://{product}/{topic}/{claim}).
subject: &'static str,
/// Predicate for the claim.
predicate: &'static str,
/// Value of the claim.
value: ObjectValue,
/// Human-readable description.
description: &'static str,
/// Source URL for reference.
#[allow(dead_code)]
source_url: Option<&'static str>,
}
impl VendorCorpusBuilder {
/// Create a new vendor corpus builder with default claims.
pub fn new() -> Self {
Self { claims: build_vendor_claims() }
}
}
impl Default for VendorCorpusBuilder {
fn default() -> Self {
Self::new()
}
}
impl CorpusBuilder for VendorCorpusBuilder {
fn name(&self) -> &str {
"Vendor"
}
fn scheme(&self) -> &str {
"vendor"
}
fn default_tier(&self) -> u8 {
2 // Observational
}
fn requires_network(&self) -> bool {
false // All claims are hardcoded
}
fn source_ids(&self) -> Vec<String> {
vec![
"postgres".to_string(),
"redis".to_string(),
"reqwest".to_string(),
"hyper".to_string(),
"go-net-http".to_string(),
"tokio-postgres".to_string(),
"sqlx".to_string(),
]
}
#[instrument(skip(self, signing_key, _config), fields(builder = "Vendor"))]
fn build(
&self,
signing_key: &SigningKey,
timestamp: u64,
_config: &CorpusConfig,
) -> Result<Vec<Assertion>, AphoriaError> {
let assertions = self
.claims
.iter()
.map(|claim| {
create_authoritative_assertion(
signing_key,
claim.subject,
claim.predicate,
claim.value.clone(),
SourceClass::Observational, // Tier 2
claim.description,
timestamp,
)
})
.collect();
Ok(assertions)
}
}
/// Build the list of curated vendor claims.
fn build_vendor_claims() -> Vec<VendorClaim> {
vec![
// PostgreSQL connection pooling
VendorClaim {
subject: "vendor://postgres/connection/pool_size",
predicate: "config_value",
value: ObjectValue::Text("20-100".to_string()),
description: "PostgreSQL recommends connection pool sizes between 20-100 for most applications",
source_url: Some("https://www.postgresql.org/docs/current/runtime-config-connection.html"),
},
VendorClaim {
subject: "vendor://postgres/connection/idle_timeout",
predicate: "config_value",
value: ObjectValue::Number(300.0), // seconds (5 minutes)
description: "PostgreSQL recommends idle connection timeout around 5 minutes (300s)",
source_url: Some("https://www.postgresql.org/docs/current/runtime-config-connection.html"),
},
VendorClaim {
subject: "vendor://postgres/ssl/mode",
predicate: "config_value",
value: ObjectValue::Text("require".to_string()),
description: "PostgreSQL SSL mode should be 'require' or stricter for production",
source_url: Some("https://www.postgresql.org/docs/current/libpq-ssl.html"),
},
// Redis timeouts
VendorClaim {
subject: "vendor://redis/connection/timeout",
predicate: "config_value",
value: ObjectValue::Number(5000.0), // 5 seconds in ms
description: "Redis recommends connection timeout of 5 seconds",
source_url: Some("https://redis.io/docs/clients/"),
},
VendorClaim {
subject: "vendor://redis/connection/max_retries",
predicate: "config_value",
value: ObjectValue::Number(3.0),
description: "Redis recommends 3 retries for connection failures",
source_url: Some("https://redis.io/docs/clients/"),
},
VendorClaim {
subject: "vendor://redis/tls/enabled",
predicate: "enabled",
value: ObjectValue::Boolean(true),
description: "Redis TLS should be enabled for production deployments",
source_url: Some("https://redis.io/docs/management/security/encryption/"),
},
// reqwest (Rust HTTP client)
VendorClaim {
subject: "vendor://reqwest/tls/cert_verification",
predicate: "enabled",
value: ObjectValue::Boolean(true),
description: "reqwest: TLS certificate verification is enabled by default and should not be disabled",
source_url: Some("https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html"),
},
VendorClaim {
subject: "vendor://reqwest/timeout/connect",
predicate: "config_value",
value: ObjectValue::Number(30000.0), // milliseconds (30 seconds)
description: "reqwest: Recommended connection timeout is 30 seconds",
source_url: Some("https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html"),
},
VendorClaim {
subject: "vendor://reqwest/timeout/request",
predicate: "config_value",
value: ObjectValue::Number(30000.0), // milliseconds (30 seconds)
description: "reqwest: Recommended total request timeout is 30 seconds",
source_url: Some("https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html"),
},
// hyper (Rust HTTP library)
VendorClaim {
subject: "vendor://hyper/timeout/keep_alive",
predicate: "config_value",
value: ObjectValue::Number(90000.0), // milliseconds (90 seconds)
description: "hyper: Default HTTP/1.1 keep-alive timeout is 90 seconds",
source_url: Some("https://docs.rs/hyper/latest/hyper/"),
},
VendorClaim {
subject: "vendor://hyper/http2/max_concurrent_streams",
predicate: "config_value",
value: ObjectValue::Number(100.0),
description: "hyper: Recommended max concurrent HTTP/2 streams per connection",
source_url: Some("https://docs.rs/hyper/latest/hyper/"),
},
// Go net/http
VendorClaim {
subject: "vendor://go-net-http/timeout/read",
predicate: "config_value",
value: ObjectValue::Number(10000.0), // milliseconds (10 seconds)
description: "Go net/http: ReadTimeout should be set to prevent slowloris attacks",
source_url: Some("https://pkg.go.dev/net/http#Server"),
},
VendorClaim {
subject: "vendor://go-net-http/timeout/write",
predicate: "config_value",
value: ObjectValue::Number(10000.0), // milliseconds (10 seconds)
description: "Go net/http: WriteTimeout should be set for request handling",
source_url: Some("https://pkg.go.dev/net/http#Server"),
},
VendorClaim {
subject: "vendor://go-net-http/timeout/idle",
predicate: "config_value",
value: ObjectValue::Number(120000.0), // milliseconds (120 seconds)
description: "Go net/http: IdleTimeout for keep-alive connections",
source_url: Some("https://pkg.go.dev/net/http#Server"),
},
VendorClaim {
subject: "vendor://go-net-http/tls/min_version",
predicate: "config_value",
value: ObjectValue::Text("TLS1.2".to_string()),
description: "Go net/http: Minimum TLS version should be 1.2 or higher",
source_url: Some("https://pkg.go.dev/crypto/tls#Config"),
},
// tokio-postgres (Rust async postgres)
VendorClaim {
subject: "vendor://tokio-postgres/connection/pool_size",
predicate: "config_value",
value: ObjectValue::Number(10.0),
description: "tokio-postgres: Default pool size recommendation for async workloads",
source_url: Some("https://docs.rs/deadpool-postgres/"),
},
VendorClaim {
subject: "vendor://tokio-postgres/ssl/mode",
predicate: "config_value",
value: ObjectValue::Text("require".to_string()),
description: "tokio-postgres: SSL mode should be 'require' for production",
source_url: Some("https://docs.rs/tokio-postgres/"),
},
// SQLx (Rust SQL toolkit)
VendorClaim {
subject: "vendor://sqlx/connection/max_connections",
predicate: "config_value",
value: ObjectValue::Number(10.0),
description: "SQLx: Default max connections for connection pool",
source_url: Some("https://docs.rs/sqlx/"),
},
VendorClaim {
subject: "vendor://sqlx/connection/idle_timeout",
predicate: "config_value",
value: ObjectValue::Number(600.0), // seconds (10 minutes)
description: "SQLx: Recommended idle connection timeout",
source_url: Some("https://docs.rs/sqlx/"),
},
]
}
#[cfg(test)]
mod tests {
use super::*;
use crate::bridge::generate_signing_key;
#[test]
fn test_vendor_builder_builds() {
let builder = VendorCorpusBuilder::new();
let key = generate_signing_key();
let config = CorpusConfig::default();
let assertions = builder.build(&key, 1706832000, &config).expect("build");
assert!(assertions.len() >= 15, "Expected at least 15 vendor claims");
}
#[test]
fn test_vendor_builder_no_network() {
let builder = VendorCorpusBuilder::new();
assert!(!builder.requires_network());
}
#[test]
fn test_vendor_assertions_tier() {
let builder = VendorCorpusBuilder::new();
let key = generate_signing_key();
let config = CorpusConfig::default();
let assertions = builder.build(&key, 1706832000, &config).expect("build");
// All vendor assertions should be Observational (Tier 2)
for assertion in &assertions {
assert_eq!(
assertion.source_class,
SourceClass::Observational,
"Vendor assertion {} should be Tier 2",
assertion.subject
);
}
}
#[test]
fn test_vendor_postgres_assertions() {
let builder = VendorCorpusBuilder::new();
let key = generate_signing_key();
let config = CorpusConfig::default();
let assertions = builder.build(&key, 1706832000, &config).expect("build");
// Check for PostgreSQL assertions
let pg_assertions: Vec<_> =
assertions.iter().filter(|a| a.subject.contains("postgres")).collect();
assert!(pg_assertions.len() >= 2, "Expected at least 2 PostgreSQL assertions");
}
#[test]
fn test_vendor_source_ids() {
let builder = VendorCorpusBuilder::new();
let ids = builder.source_ids();
assert!(ids.contains(&"postgres".to_string()));
assert!(ids.contains(&"redis".to_string()));
assert!(ids.contains(&"reqwest".to_string()));
}
}
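
The vendor claims store bare `ObjectValue::Number` values, and the inline comments show mixed units. The PostgreSQL and SQLx idle timeouts are in seconds, while the Redis, reqwest, hyper, and Go timeouts are in milliseconds. Anything that compares these claims against extracted config needs to normalize first. The sketch below assumes each claim's unit is known. `DurationUnit`, `to_millis`, and `within_tolerance` are hypothetical and are not part of the crate.

```rust
/// Hypothetical unit tag for a numeric duration claim.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DurationUnit {
    Seconds,
    Millis,
}

/// Convert a duration value to milliseconds so claims can be compared.
fn to_millis(value: f64, unit: DurationUnit) -> f64 {
    match unit {
        DurationUnit::Seconds => value * 1000.0,
        DurationUnit::Millis => value,
    }
}

/// True if `observed` is within `tolerance` (a fraction, e.g. 0.5 = +/-50%)
/// of `recommended`, after converting both to milliseconds.
fn within_tolerance(
    observed: f64,
    observed_unit: DurationUnit,
    recommended: f64,
    recommended_unit: DurationUnit,
    tolerance: f64,
) -> bool {
    let observed_ms = to_millis(observed, observed_unit);
    let recommended_ms = to_millis(recommended, recommended_unit);
    (observed_ms - recommended_ms).abs() <= recommended_ms.abs() * tolerance
}
```

Without normalization, a 30-second timeout configured as `30` would be compared against reqwest's `30000.0` and flagged as a thousandfold deviation.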

@@ -0,0 +1,201 @@
//! Authoritative corpus creation for Aphoria.
//!
//! Provides functions to create pre-built authoritative assertions
//! for common security patterns (TLS, JWT, CORS, etc.).
use std::time::{SystemTime, UNIX_EPOCH};
use blake3::Hasher;
use ed25519_dalek::{Signer, SigningKey};
use stemedb_core::types::{
Assertion, HlcTimestamp, LifecycleStage, ObjectValue, SignatureEntry, SourceClass,
};
/// Get the current Unix timestamp in seconds, or 0 if the system clock reads
/// earlier than the Unix epoch.
pub(crate) fn current_timestamp() -> u64 {
SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0)
}
/// Create authoritative assertions for the RFC/OWASP corpus.
#[allow(clippy::vec_init_then_push)]
pub fn create_authoritative_corpus(signing_key: &SigningKey) -> Vec<Assertion> {
let timestamp = current_timestamp();
let mut assertions = Vec::new();
// TLS verification requirements
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://5246/tls/cert_verification",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Regulatory,
"TLS certificate verification MUST be enabled (RFC 5246)",
timestamp,
));
// OWASP TLS guidance
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://transport_layer/tls/cert_verification",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Clinical, // Tier 1
"OWASP: Always verify TLS certificates",
timestamp,
));
// JWT audience validation (RFC 7519)
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://7519/jwt/audience_validation",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Regulatory,
"JWT audience claim MUST be validated (RFC 7519 Section 4.1.3)",
timestamp,
));
// JWT expiry validation
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://7519/jwt/expiry_validation",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Regulatory,
"JWT expiry claim MUST be validated (RFC 7519 Section 4.1.4)",
timestamp,
));
// JWT signature verification
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://7519/jwt/signature_verification",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Regulatory,
"JWT signatures MUST be verified (RFC 7519)",
timestamp,
));
// JWT algorithm restriction
assertions.push(create_authoritative_assertion(
signing_key,
"rfc://7519/jwt/algorithm_restriction",
"config_value",
ObjectValue::Text("explicit_list".to_string()),
SourceClass::Regulatory,
"JWT algorithm MUST be explicitly specified, 'none' algorithm forbidden",
timestamp,
));
// OWASP secrets management
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://secrets/api_key",
"storage_method",
ObjectValue::Text("environment_or_vault".to_string()),
SourceClass::Clinical,
"OWASP: Never hardcode API keys in source code",
timestamp,
));
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://secrets/password",
"storage_method",
ObjectValue::Text("environment_or_vault".to_string()),
SourceClass::Clinical,
"OWASP: Never hardcode passwords in source code",
timestamp,
));
// CORS security
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://cors/allow_origin",
"config_value",
ObjectValue::Text("explicit_list".to_string()),
SourceClass::Clinical,
"OWASP: Never use wildcard (*) for CORS Allow-Origin in production",
timestamp,
));
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://cors/credentials_with_wildcard",
"enabled",
ObjectValue::Boolean(false),
SourceClass::Regulatory,
"CORS credentials MUST NOT be allowed with wildcard origin (security vulnerability)",
timestamp,
));
// Rate limiting
assertions.push(create_authoritative_assertion(
signing_key,
"owasp://rate_limit/enabled",
"enabled",
ObjectValue::Boolean(true),
SourceClass::Clinical,
"OWASP: Rate limiting SHOULD be enabled for API endpoints",
timestamp,
));
assertions
}
/// Create a signed authoritative assertion.
///
/// This helper is used by corpus builders to create signed assertions with
/// consistent structure and metadata.
pub fn create_authoritative_assertion(
signing_key: &SigningKey,
subject: &str,
predicate: &str,
object: ObjectValue,
source_class: SourceClass,
description: &str,
timestamp: u64,
) -> Assertion {
// Compute source hash
let mut hasher = Hasher::new();
hasher.update(subject.as_bytes());
hasher.update(predicate.as_bytes());
hasher.update(description.as_bytes());
let source_hash = *hasher.finalize().as_bytes();
// Sign `subject:predicate`; the object value and description are not covered by the signature
let message = format!("{}:{}", subject, predicate);
let signature = signing_key.sign(message.as_bytes());
let verifying_key = signing_key.verifying_key();
let signature_entry = SignatureEntry {
agent_id: verifying_key.to_bytes(),
signature: signature.to_bytes(),
timestamp,
version: 1,
};
let source_metadata = serde_json::json!({
"description": description,
"source": "authoritative_corpus",
});
Assertion {
subject: subject.to_string(),
predicate: predicate.to_string(),
object,
parent_hash: None,
source_hash,
source_class,
visual_hash: None,
epoch: None,
source_metadata: serde_json::to_vec(&source_metadata).ok(),
lifecycle: LifecycleStage::Approved,
signatures: vec![signature_entry],
confidence: 1.0,
timestamp,
hlc_timestamp: HlcTimestamp::default(),
vector: None,
}
}
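The eleven `push` calls in `create_authoritative_corpus` (and the `clippy::vec_init_then_push` allow they require) could instead be driven by a static table. The sketch below is hypothetical: `Value`, `Tier`, `Rule`, `RULES`, and `build_corpus` are simplified stand-ins rather than the crate's `ObjectValue`, `SourceClass`, or `Assertion` types, signing and hashing are omitted, and only four of the corpus rows are shown.

```rust
// Hypothetical, simplified stand-ins; the real corpus uses ObjectValue,
// SourceClass, and signed Assertion values.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Text(&'static str),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Regulatory, // Tier 0
    Clinical,   // Tier 1
}

#[derive(Debug)]
struct Rule {
    subject: String,
    predicate: String,
    value: Value,
    tier: Tier,
    description: String,
    timestamp: u64,
}

/// One row per authoritative rule: (subject, predicate, value, tier, description).
const RULES: &[(&str, &str, Value, Tier, &str)] = &[
    (
        "rfc://5246/tls/cert_verification",
        "enabled",
        Value::Bool(true),
        Tier::Regulatory,
        "TLS certificate verification MUST be enabled (RFC 5246)",
    ),
    (
        "owasp://transport_layer/tls/cert_verification",
        "enabled",
        Value::Bool(true),
        Tier::Clinical,
        "OWASP: Always verify TLS certificates",
    ),
    (
        "rfc://7519/jwt/algorithm_restriction",
        "config_value",
        Value::Text("explicit_list"),
        Tier::Regulatory,
        "JWT algorithm MUST be explicitly specified, 'none' algorithm forbidden",
    ),
    (
        "owasp://rate_limit/enabled",
        "enabled",
        Value::Bool(true),
        Tier::Clinical,
        "OWASP: Rate limiting SHOULD be enabled for API endpoints",
    ),
];

/// Expand the table into rules that all share one timestamp.
fn build_corpus(timestamp: u64) -> Vec<Rule> {
    RULES
        .iter()
        .map(|(subject, predicate, value, tier, description)| Rule {
            subject: subject.to_string(),
            predicate: predicate.to_string(),
            value: value.clone(),
            tier: *tier,
            description: description.to_string(),
            timestamp,
        })
        .collect()
}

fn main() {
    for rule in build_corpus(1_706_832_000) {
        println!(
            "{} {} = {:?} ({:?}, t={}): {}",
            rule.subject, rule.predicate, rule.value, rule.tier, rule.timestamp, rule.description
        );
    }
}
```

With a table, adding a rule becomes a one-row change that can be reviewed next to the RFC or OWASP text it cites; the per-row signing step would still run inside the `map`.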

@@ -0,0 +1,438 @@
//! Local Episteme integration for Aphoria.
//!
//! Provides a simplified interface to the local Episteme instance for:
//! - Ingesting assertions from extracted claims
//! - Querying for conflicts with authoritative sources
//! - Managing the authoritative corpus
//! - Auto-creating aliases when conflicts are detected (Phase 2A.3)
mod corpus;
#[cfg(test)]
mod tests;
use std::collections::HashMap;
use std::path::Path;
use std::sync::Arc;
use ed25519_dalek::SigningKey;
use stemedb_core::types::{
AliasOrigin, Assertion, ConceptAlias, ConceptPath, SourceClass,
};
use stemedb_ingest::{serialize_assertion, Ingestor};
use stemedb_storage::{AliasStore, GenericAliasStore, HybridStore};
use stemedb_wal::Journal;
use tokio::sync::Mutex;
use tracing::{debug, info, instrument, warn};
use crate::bridge::{claim_to_assertion, load_or_generate_key};
use crate::config::AphoriaConfig;
use crate::types::{ConflictResult, ConflictingSource, ExtractedClaim, Verdict};
use crate::AphoriaError;
pub use corpus::{create_authoritative_assertion, create_authoritative_corpus};
use corpus::current_timestamp;
/// In-memory index for concept matching by tail path segments.
///
/// Maps `{tail_seg1}/{tail_seg2}::{predicate}` → `Vec<Assertion>`.
/// This enables matching claims across different URI schemes by their
/// trailing path components.
///
/// # Example
///
/// Both of these subjects produce the same key `"tls/cert_verification::enabled"`:
/// - `rfc://5246/tls/cert_verification`
/// - `code://rust/myapp/client/tls/cert_verification`
pub struct ConceptIndex {
entries: HashMap<String, Vec<Assertion>>,
}
impl ConceptIndex {
/// Build a ConceptIndex from a slice of assertions.
pub fn build(assertions: &[Assertion]) -> Self {
// Pre-allocate for the worst case, where every assertion yields a distinct key
let mut entries: HashMap<String, Vec<Assertion>> = HashMap::with_capacity(assertions.len());
for assertion in assertions {
if let Some(key) = Self::make_key(&assertion.subject, &assertion.predicate) {
entries.entry(key).or_default().push(assertion.clone());
}
}
Self { entries }
}
/// Look up assertions matching the tail segments of a subject and predicate.
pub fn lookup(&self, subject: &str, predicate: &str) -> Option<&Vec<Assertion>> {
let key = Self::make_key(subject, predicate)?;
self.entries.get(&key)
}
/// Create a lookup key from subject and predicate.
///
/// Algorithm:
/// 1. Strip the scheme prefix up to and including `"://"`, if present
/// 2. Walk the `/`-separated path segments from the end, skipping empty ones
/// 3. If fewer than two non-empty segments exist, return `None`
/// 4. Return `"{seg[-2]}/{seg[-1]}::{predicate}"`
pub fn make_key(subject: &str, predicate: &str) -> Option<String> {
// Split on "://" to separate scheme from path
let path = subject.find("://").map(|i| &subject[i + 3..]).unwrap_or(subject);
// Get last two non-empty segments using rsplit (avoids Vec allocation)
let mut segments = path.rsplit('/').filter(|s| !s.is_empty());
let tail2 = segments.next()?;
let tail1 = segments.next()?;
Some(format!("{}/{}::{}", tail1, tail2, predicate))
}
}
/// Local Episteme instance for Aphoria.
pub struct LocalEpisteme {
journal: Arc<Mutex<Journal>>,
/// Owning handle to the store. The Ingestor and AliasStore are constructed from
/// their own `Arc` clones, which already share ownership; this field is retained but not read directly.
#[allow(dead_code)]
store: Arc<HybridStore>,
ingestor: Ingestor<HybridStore>,
signing_key: SigningKey,
/// AliasStore for persisting cross-scheme aliases discovered during conflict detection.
alias_store: GenericAliasStore<Arc<HybridStore>>,
}
impl LocalEpisteme {
/// Open or create a local Episteme instance.
#[instrument(skip(config), fields(data_dir = %config.episteme.data_dir.display()))]
pub async fn open(config: &AphoriaConfig, project_root: &Path) -> Result<Self, AphoriaError> {
let data_dir = &config.episteme.data_dir;
// Create directories if needed
std::fs::create_dir_all(data_dir)?;
// Canonicalize paths (required by fjall/lsm-tree)
let data_dir = data_dir.canonicalize().map_err(|e| {
AphoriaError::Storage(format!("Failed to canonicalize data_dir: {}", e))
})?;
let wal_dir = data_dir.join("wal");
let store_dir = data_dir.join("store");
std::fs::create_dir_all(&wal_dir)?;
std::fs::create_dir_all(&store_dir)?;
info!("Opening local Episteme at {}", data_dir.display());
// Open WAL
let journal = Arc::new(Mutex::new(
Journal::open(&wal_dir).map_err(|e| AphoriaError::Storage(e.to_string()))?,
));
// Open store
let store = Arc::new(
HybridStore::open(&store_dir).map_err(|e| AphoriaError::Storage(e.to_string()))?,
);
// Create ingestor
let mut ingestor = Ingestor::new(journal.clone(), store.clone())
.await
.map_err(|e| AphoriaError::Storage(e.to_string()))?;
ingestor.start();
// Load or generate signing key
let signing_key =
load_or_generate_key(project_root).map_err(|e| AphoriaError::Storage(e.to_string()))?;
// Create alias store for auto-alias persistence
let alias_store = GenericAliasStore::new(store.clone());
Ok(Self { journal, store, ingestor, signing_key, alias_store })
}
/// Ingest a batch of extracted claims into Episteme.
#[instrument(skip(self, claims), fields(claim_count = claims.len()))]
pub async fn ingest_claims(&self, claims: &[ExtractedClaim]) -> Result<usize, AphoriaError> {
let timestamp = current_timestamp();
let mut ingested = 0;
for claim in claims {
let assertion = claim_to_assertion(claim, &self.signing_key, timestamp);
// Serialize and write to WAL
let record_bytes = serialize_assertion(&assertion)
.map_err(|e| AphoriaError::Storage(e.to_string()))?;
let mut journal = self.journal.lock().await;
journal.append(record_bytes).map_err(|e| AphoriaError::Storage(e.to_string()))?;
debug!(
concept_path = %claim.concept_path,
predicate = %claim.predicate,
"Ingested claim"
);
ingested += 1;
}
// Sync WAL
{
let mut journal = self.journal.lock().await;
journal.force_sync().map_err(|e| AphoriaError::Storage(e.to_string()))?;
}
// Wait for ingestion to process
self.ingestor.process_pending().await.map_err(|e| AphoriaError::Storage(e.to_string()))?;
info!(ingested, "Ingested claims into Episteme");
Ok(ingested)
}
/// Check for conflicts between extracted claims and authoritative sources.
///
/// Uses tail-path matching via `ConceptIndex` to find conflicts across different
/// URI schemes. For example, a code claim at `code://rust/myapp/tls/cert_verification`
/// will match authoritative assertions at `rfc://5246/tls/cert_verification`.
///
/// When `config.aliases.auto_create_aliases` is enabled, this method will
/// automatically persist aliases for matched concepts, enabling faster future
/// queries via `QueryEngine` with `resolve_aliases: true`.
#[instrument(skip(self, claims, config, index), fields(claim_count = claims.len()))]
pub async fn check_conflicts(
&self,
claims: &[ExtractedClaim],
config: &AphoriaConfig,
index: &ConceptIndex,
) -> Result<Vec<ConflictResult>, AphoriaError> {
let mut results = Vec::new();
let mut aliases_created = 0usize;
let timestamp = current_timestamp();
let agent_id = self.agent_id();
for claim in claims {
// Look up authoritative assertions matching this claim's tail path
let auth_assertions = match index.lookup(&claim.concept_path, &claim.predicate) {
Some(assertions) => assertions,
None => continue, // No authoritative coverage for this concept
};
// Find conflicting authoritative sources
let mut conflicts = Vec::new();
for assertion in auth_assertions {
// Skip Expert (Tier 3) assertions: code-derived claims sit at this tier, so they are not authoritative
if assertion.source_class == SourceClass::Expert {
continue;
}
// Auto-create alias if enabled (regardless of value conflict)
// This bridges the code path to the authoritative path for future queries
if config.aliases.auto_create_aliases {
if let Err(e) = self
.create_alias_if_new(
&claim.concept_path,
&assertion.subject,
agent_id,
timestamp,
)
.await
{
warn!(
code_path = %claim.concept_path,
auth_path = %assertion.subject,
error = %e,
"Failed to create alias"
);
} else {
aliases_created += 1; // also counts aliases that already existed (create_alias_if_new returns Ok in both cases)
}
}
// Check if value differs (for conflict reporting)
if assertion.object != claim.value {
// Only consider Tier 0-2 as authoritative
if assertion.source_class.tier() <= 2 {
conflicts.push(ConflictingSource {
path: assertion.subject.clone(),
source_class: assertion.source_class,
value: assertion.object.clone(),
confidence: assertion.confidence,
});
}
}
}
if conflicts.is_empty() {
continue;
}
// Compute conflict score
let conflict_score = compute_conflict_score(&conflicts, claim.confidence);
// Determine verdict
let verdict = if conflict_score >= config.thresholds.block {
Verdict::Block
} else if conflict_score >= config.thresholds.flag {
Verdict::Flag
} else {
Verdict::Pass
};
results.push(ConflictResult {
claim: claim.clone(),
conflicts,
conflict_score,
verdict,
acknowledged: None,
});
}
info!(
conflicts = results.len(),
blocks = results.iter().filter(|r| r.verdict == Verdict::Block).count(),
flags = results.iter().filter(|r| r.verdict == Verdict::Flag).count(),
aliases_created,
"Conflict check complete"
);
Ok(results)
}
/// Ingest authoritative assertions (RFC, OWASP, etc.).
#[instrument(skip(self, assertions), fields(count = assertions.len()))]
pub async fn ingest_authoritative(
&self,
assertions: &[Assertion],
) -> Result<usize, AphoriaError> {
let mut ingested = 0;
for assertion in assertions {
let record_bytes =
serialize_assertion(assertion).map_err(|e| AphoriaError::Storage(e.to_string()))?;
let mut journal = self.journal.lock().await;
journal.append(record_bytes).map_err(|e| AphoriaError::Storage(e.to_string()))?;
ingested += 1;
}
// Sync and process
{
let mut journal = self.journal.lock().await;
journal.force_sync().map_err(|e| AphoriaError::Storage(e.to_string()))?;
}
self.ingestor.process_pending().await.map_err(|e| AphoriaError::Storage(e.to_string()))?;
info!(ingested, "Ingested authoritative assertions");
Ok(ingested)
}
/// Shut down the Episteme instance gracefully.
pub async fn shutdown(&mut self) {
info!("Shutting down local Episteme");
self.ingestor.shutdown(std::time::Duration::from_secs(2)).await;
}
/// Get the signing key's public key bytes for alias creation.
pub fn agent_id(&self) -> [u8; 32] {
self.signing_key.verifying_key().to_bytes()
}
/// Create an alias from a code path to an authoritative path, if it doesn't already exist.
///
/// This is used during conflict detection to persist the relationship between
/// code concepts and their authoritative counterparts.
#[instrument(skip(self), fields(code_path = %code_path, auth_path = %auth_path))]
async fn create_alias_if_new(
&self,
code_path: &str,
auth_path: &str,
agent_id: [u8; 32],
timestamp: u64,
) -> Result<(), AphoriaError> {
// Check if alias already exists
let existing = self
.alias_store
.get_canonical(code_path)
.await
.map_err(|e| AphoriaError::Storage(e.to_string()))?;
if existing.is_some() {
debug!("Alias already exists, skipping");
return Ok(());
}
// Parse paths
let alias_path = ConceptPath::parse(code_path)
.map_err(|e| AphoriaError::Storage(format!("Invalid code path: {}", e)))?;
let canonical_path = ConceptPath::parse(auth_path)
.map_err(|e| AphoriaError::Storage(format!("Invalid auth path: {}", e)))?;
// Create and persist alias
let alias = ConceptAlias::new(
alias_path,
canonical_path,
agent_id,
timestamp,
AliasOrigin::AutoDetected,
);
self.alias_store
.set_alias(&alias)
.await
.map_err(|e| AphoriaError::Storage(e.to_string()))?;
debug!("Created auto-detected alias");
Ok(())
}
/// Get a reference to the alias store for querying created aliases.
#[allow(dead_code)]
pub fn alias_store(&self) -> &GenericAliasStore<Arc<HybridStore>> {
&self.alias_store
}
}
/// Compute a conflict score from the conflicting authoritative sources.
///
/// The score uses two approaches and takes the maximum:
///
/// 1. **Boosted score**: `max_tier_weight * (1.0 - code_weight) * max_confidence`,
/// where `code_weight` is the Expert (Tier 3) authority weight, 0.5. This stays
/// low and only wins when the authoritative source's weight is very high.
///
/// 2. **Normalized score**: a linear map from the lowest conflicting tier to a score:
/// - Tier 0 (Regulatory) vs code → 0.95 (above BLOCK threshold 0.7)
/// - Tier 1 (Clinical) vs code → 0.77 (above BLOCK threshold 0.7)
/// - Tier 2 (Observational) vs code → 0.58 (above FLAG threshold 0.4)
/// - Tier 3 (same tier) vs code → 0.40 (at FLAG threshold)
///
/// The final score is capped at 1.0. `check_conflicts` only passes Tier 0-2
/// sources, so the Tier 3 row marks the bottom of the scale. The claim's own
/// confidence (`_claim_confidence`) is currently ignored.
fn compute_conflict_score(conflicts: &[ConflictingSource], _claim_confidence: f32) -> f32 {
if conflicts.is_empty() {
return 0.0;
}
// Get max tier weight from conflicting sources
let max_tier_weight = conflicts
.iter()
.map(|c| c.source_class.authority_weight())
.max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
.unwrap_or(0.0);
// Code claims are Expert (Tier 3) = 0.5 weight
let code_weight = SourceClass::Expert.authority_weight();
// Base conflict score from tier spread
let base_score = max_tier_weight * (1.0 - code_weight);
// Boost by authoritative source confidence
let max_confidence = conflicts
.iter()
.map(|c| c.confidence)
.max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
.unwrap_or(1.0);
let boosted_score = base_score * max_confidence;
// Normalize: the lowest conflicting tier (0..=3) maps linearly to 0.95..=0.40
let min_tier = conflicts.iter().map(|c| c.source_class.tier()).min().unwrap_or(3) as f32;
let normalized = 0.4 + (3.0 - min_tier) / 3.0 * 0.55;
normalized.max(boosted_score).min(1.0)
}
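The normalized branch of `compute_conflict_score` and the verdict cut-offs in `check_conflicts` can be checked in isolation. This std-only sketch assumes the 0.4 FLAG and 0.7 BLOCK thresholds quoted in the doc comment and an authority weight of 1.0 for Regulatory sources (the real weights come from `SourceClass::authority_weight`); `normalized_score`, `conflict_score`, and `verdict` are illustrative names, not crate functions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Pass,
    Flag,
    Block,
}

/// Linear map from the lowest conflicting tier to a score: 0 -> 0.95, 3 -> 0.40.
fn normalized_score(min_tier: u8) -> f32 {
    let tier = f32::from(min_tier.min(3));
    0.4 + (3.0 - tier) / 3.0 * 0.55
}

/// Mirrors the max(normalized, boosted) rule, capped at 1.0.
fn conflict_score(min_tier: u8, max_tier_weight: f32, code_weight: f32, max_confidence: f32) -> f32 {
    let boosted = max_tier_weight * (1.0 - code_weight) * max_confidence;
    normalized_score(min_tier).max(boosted).min(1.0)
}

/// Same threshold ordering as check_conflicts: BLOCK is tested before FLAG.
fn verdict(score: f32, flag: f32, block: f32) -> Verdict {
    if score >= block {
        Verdict::Block
    } else if score >= flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}

fn main() {
    const FLAG: f32 = 0.4; // assumed default thresholds
    const BLOCK: f32 = 0.7;
    for tier in 0..=3u8 {
        let score = normalized_score(tier);
        println!("tier {tier}: score {score:.2} -> {:?}", verdict(score, FLAG, BLOCK));
    }
    // Assumed Regulatory weight 1.0 against the 0.5 Expert weight.
    println!("tier 0 with boost: {:.2}", conflict_score(0, 1.0, 0.5, 1.0));
}
```

Under these assumptions tiers 0 and 1 land in BLOCK and tier 2 in FLAG, while tier 3 sits exactly on the FLAG threshold; the boosted term (at most 0.5 here) never overtakes the normalized one.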

@@ -0,0 +1,383 @@
//! Tests for the Episteme integration module.
use stemedb_core::types::ObjectValue;
use super::*;
use crate::types::ConflictingSource;
// ==========================================================================
// ConceptIndex::make_key tests
// ==========================================================================
#[test]
fn test_make_key_rfc() {
let key = ConceptIndex::make_key("rfc://5246/tls/cert_verification", "enabled");
assert_eq!(key, Some("tls/cert_verification::enabled".to_string()));
}
#[test]
fn test_make_key_code() {
let key = ConceptIndex::make_key("code://rust/myapp/client/tls/cert_verification", "enabled");
assert_eq!(key, Some("tls/cert_verification::enabled".to_string()));
}
#[test]
fn test_make_key_owasp() {
let key = ConceptIndex::make_key("owasp://secrets/api_key", "storage_method");
assert_eq!(key, Some("secrets/api_key::storage_method".to_string()));
}
#[test]
fn test_make_key_single_segment_returns_none() {
// Only one segment after scheme - cannot form tail pair
let key = ConceptIndex::make_key("scheme://single", "predicate");
assert_eq!(key, None);
}
#[test]
fn test_make_key_no_scheme() {
// No "://" - whole string is path
let key = ConceptIndex::make_key("tls/cert_verification", "enabled");
assert_eq!(key, Some("tls/cert_verification::enabled".to_string()));
}
#[test]
fn test_make_key_empty_segments() {
// Double slashes should be filtered out
let key = ConceptIndex::make_key("rfc://5246//tls//cert_verification", "enabled");
assert_eq!(key, Some("tls/cert_verification::enabled".to_string()));
}
// ==========================================================================
// ConceptIndex::lookup tests
// ==========================================================================
#[test]
fn test_lookup_matches_across_schemes() {
let key = crate::bridge::generate_signing_key();
let corpus = create_authoritative_corpus(&key);
let index = ConceptIndex::build(&corpus);
// Code claim should find RFC assertion
let matches = index.lookup("code://rust/myapp/tls/cert_verification", "enabled");
assert!(matches.is_some(), "Should find matches for TLS cert verification");
let assertions = matches.expect("matches should exist");
assert!(!assertions.is_empty(), "Should have at least one matching assertion");
assert!(
assertions.iter().any(|a| a.subject.contains("rfc://") || a.subject.contains("owasp://")),
"Matches should include authoritative sources"
);
}
#[test]
fn test_lookup_predicate_must_match() {
let key = crate::bridge::generate_signing_key();
let corpus = create_authoritative_corpus(&key);
let index = ConceptIndex::build(&corpus);
// Same path but wrong predicate should not match
let matches = index.lookup("code://rust/myapp/tls/cert_verification", "wrong_predicate");
assert!(matches.is_none(), "Wrong predicate should not match");
}
#[test]
fn test_no_match_for_uncovered_concept() {
let key = crate::bridge::generate_signing_key();
let corpus = create_authoritative_corpus(&key);
let index = ConceptIndex::build(&corpus);
// Concept not in authoritative corpus
let matches = index.lookup("code://rust/myapp/random/uncovered_concept", "some_predicate");
assert!(matches.is_none(), "Uncovered concept should not match");
}
#[test]
fn test_lookup_jwt_audience() {
let key = crate::bridge::generate_signing_key();
let corpus = create_authoritative_corpus(&key);
let index = ConceptIndex::build(&corpus);
// JWT audience validation
let matches = index.lookup("code://rust/myapp/jwt/audience_validation", "enabled");
assert!(matches.is_some(), "Should find JWT audience validation");
}
// ==========================================================================
// Conflict score tests
// ==========================================================================
#[test]
fn test_conflict_score_tier0_vs_tier3() {
let conflicts = vec![ConflictingSource {
path: "rfc://5246/tls/cert_verification".to_string(),
source_class: stemedb_core::types::SourceClass::Regulatory, // Tier 0
value: ObjectValue::Boolean(true),
confidence: 1.0,
}];
let score = compute_conflict_score(&conflicts, 1.0);
// Tier 0 (1.0 weight) vs Tier 3 (0.5 weight) should produce high score
assert!(score >= 0.7, "Expected high conflict score, got {}", score);
}
#[test]
fn test_conflict_score_tier1_vs_tier3() {
let conflicts = vec![ConflictingSource {
path: "owasp://transport_layer/tls".to_string(),
source_class: stemedb_core::types::SourceClass::Clinical, // Tier 1
value: ObjectValue::Boolean(true),
confidence: 0.95,
}];
let score = compute_conflict_score(&conflicts, 1.0);
// Tier 1 normalizes to about 0.77, so the score clears the FLAG threshold
assert!(score >= 0.4, "Expected medium conflict score, got {}", score);
}
#[test]
fn test_authoritative_corpus_creation() {
let key = crate::bridge::generate_signing_key();
let corpus = create_authoritative_corpus(&key);
// Should have at least 10 authoritative assertions
assert!(corpus.len() >= 10, "Expected at least 10 assertions, got {}", corpus.len());
// Check that TLS and JWT assertions exist
assert!(corpus.iter().any(|a| a.subject.contains("tls")));
assert!(corpus.iter().any(|a| a.subject.contains("jwt")));
}
// ==========================================================================
// Auto-alias creation tests (Phase 2A.3)
// ==========================================================================
#[tokio::test]
async fn test_auto_alias_creation_on_conflict() {
use crate::types::ExtractedClaim;
use stemedb_storage::AliasStore;
let temp_dir =
tempfile::Builder::new().prefix("aphoria_alias_test").tempdir().expect("create temp dir");
let mut config = crate::config::AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
config.aliases.auto_create_aliases = true; // Explicitly enable
// Create .aphoria directory for the agent key
let aphoria_dir = temp_dir.path().join(".aphoria");
std::fs::create_dir_all(&aphoria_dir).expect("create .aphoria dir");
// Open LocalEpisteme
let mut episteme = LocalEpisteme::open(&config, temp_dir.path()).await.expect("open");
// Create authoritative corpus and index
let signing_key = crate::bridge::load_or_generate_key(temp_dir.path()).expect("load key");
let corpus = create_authoritative_corpus(&signing_key);
let index = ConceptIndex::build(&corpus);
// Create a claim that will conflict with the authoritative corpus
let claim = ExtractedClaim {
concept_path: "code://rust/myapp/tls/cert_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false), // Conflicts with RFC (true)
file: "src/client.rs".to_string(),
line: 42,
matched_text: "danger_accept_invalid_certs(true)".to_string(),
confidence: 1.0,
description: "TLS verification disabled".to_string(),
};
// Run check_conflicts
let conflicts =
episteme.check_conflicts(&[claim], &config, &index).await.expect("check conflicts");
// Assert: conflict was detected
assert!(!conflicts.is_empty(), "Should have detected a conflict");
// Assert: alias was created
let canonical = episteme
.alias_store()
.get_canonical("code://rust/myapp/tls/cert_verification")
.await
.expect("get canonical");
assert!(canonical.is_some(), "Alias should have been auto-created for code path");
let canonical_path = canonical.expect("canonical exists");
assert!(
canonical_path.scheme == "rfc" || canonical_path.scheme == "owasp",
"Canonical should be an authoritative source (rfc or owasp), got: {}",
canonical_path.scheme
);
episteme.shutdown().await;
}
#[tokio::test]
async fn test_auto_alias_not_created_when_disabled() {
use crate::types::ExtractedClaim;
use stemedb_storage::AliasStore;
let temp_dir = tempfile::Builder::new()
.prefix("aphoria_alias_disabled")
.tempdir()
.expect("create temp dir");
let mut config = crate::config::AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
config.aliases.auto_create_aliases = false; // Explicitly disable
let aphoria_dir = temp_dir.path().join(".aphoria");
std::fs::create_dir_all(&aphoria_dir).expect("create .aphoria dir");
let mut episteme = LocalEpisteme::open(&config, temp_dir.path()).await.expect("open");
let signing_key = crate::bridge::load_or_generate_key(temp_dir.path()).expect("load key");
let corpus = create_authoritative_corpus(&signing_key);
let index = ConceptIndex::build(&corpus);
let claim = ExtractedClaim {
concept_path: "code://rust/myapp/tls/cert_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: "src/client.rs".to_string(),
line: 42,
matched_text: "danger_accept_invalid_certs(true)".to_string(),
confidence: 1.0,
description: "TLS verification disabled".to_string(),
};
let conflicts =
episteme.check_conflicts(&[claim], &config, &index).await.expect("check conflicts");
// Conflict should still be detected
assert!(!conflicts.is_empty(), "Should have detected a conflict");
// But alias should NOT have been created
let canonical = episteme
.alias_store()
.get_canonical("code://rust/myapp/tls/cert_verification")
.await
.expect("get canonical");
assert!(
canonical.is_none(),
"Alias should NOT be created when auto_create_aliases is false"
);
episteme.shutdown().await;
}
#[tokio::test]
async fn test_auto_alias_uses_auto_detected_origin() {
use crate::types::ExtractedClaim;
use stemedb_storage::AliasStore;
let temp_dir = tempfile::Builder::new()
.prefix("aphoria_alias_origin")
.tempdir()
.expect("create temp dir");
let mut config = crate::config::AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
config.aliases.auto_create_aliases = true;
let aphoria_dir = temp_dir.path().join(".aphoria");
std::fs::create_dir_all(&aphoria_dir).expect("create .aphoria dir");
let mut episteme = LocalEpisteme::open(&config, temp_dir.path()).await.expect("open");
let signing_key = crate::bridge::load_or_generate_key(temp_dir.path()).expect("load key");
let corpus = create_authoritative_corpus(&signing_key);
let index = ConceptIndex::build(&corpus);
let claim = ExtractedClaim {
concept_path: "code://rust/myapp/jwt/audience_validation".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: "src/auth.rs".to_string(),
line: 100,
matched_text: "validate_aud = false".to_string(),
confidence: 1.0,
description: "JWT audience validation disabled".to_string(),
};
let _conflicts =
episteme.check_conflicts(&[claim], &config, &index).await.expect("check conflicts");
// Verify that the alias exists
let canonical = episteme
.alias_store()
.get_canonical("code://rust/myapp/jwt/audience_validation")
.await
.expect("get canonical");
assert!(canonical.is_some(), "Alias should have been created for JWT path");
// `get_canonical` only exposes the canonical path, so this test cannot inspect
// the stored AliasOrigin directly. It confirms the auto-detection path ran;
// the AutoDetected origin itself is set in `create_alias_if_new`.
episteme.shutdown().await;
}
#[tokio::test]
async fn test_auto_alias_idempotent() {
use crate::types::ExtractedClaim;
use stemedb_storage::AliasStore;
let temp_dir = tempfile::Builder::new()
.prefix("aphoria_alias_idempotent")
.tempdir()
.expect("create temp dir");
let mut config = crate::config::AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
config.aliases.auto_create_aliases = true;
let aphoria_dir = temp_dir.path().join(".aphoria");
std::fs::create_dir_all(&aphoria_dir).expect("create .aphoria dir");
let mut episteme = LocalEpisteme::open(&config, temp_dir.path()).await.expect("open");
let signing_key = crate::bridge::load_or_generate_key(temp_dir.path()).expect("load key");
let corpus = create_authoritative_corpus(&signing_key);
let index = ConceptIndex::build(&corpus);
let claim = ExtractedClaim {
concept_path: "code://rust/myapp/tls/cert_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: "src/client.rs".to_string(),
line: 42,
matched_text: "danger_accept_invalid_certs(true)".to_string(),
confidence: 1.0,
description: "TLS verification disabled".to_string(),
};
// Run check_conflicts twice
let _conflicts1 = episteme
.check_conflicts(std::slice::from_ref(&claim), &config, &index)
.await
.expect("check conflicts 1");
let _conflicts2 =
episteme.check_conflicts(&[claim], &config, &index).await.expect("check conflicts 2");
// List all aliases; repeated calls must not add entries for this code path
let all_aliases = episteme.alias_store().list_all_aliases().await.expect("list aliases");
let tls_aliases: Vec<_> =
all_aliases.iter().filter(|(alias, _)| alias.contains("tls/cert_verification")).collect();
// Expect no duplicates; the bound of 2 tolerates both an RFC and an OWASP entry
assert!(
tls_aliases.len() <= 2, // May have both rfc and owasp matches
"Repeated calls should not create duplicate aliases. Found: {:?}",
tls_aliases
);
episteme.shutdown().await;
}
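The `ConceptIndex::make_key` cases tested above reduce to a small pure function. The std-only sketch below restates that algorithm outside the crate, together with a hypothetical `build_index` helper that groups subjects by key, to show why a `code://` claim finds both the RFC and the OWASP TLS assertions. It mirrors the source shown earlier but is an illustration, not the crate's code.

```rust
use std::collections::HashMap;

/// Tail-path key: last two non-empty path segments plus the predicate.
fn make_key(subject: &str, predicate: &str) -> Option<String> {
    // Drop the scheme prefix ("rfc://", "code://", ...) when present.
    let path = subject.find("://").map_or(subject, |i| &subject[i + 3..]);
    // Walk segments from the end, skipping empties produced by "//" or a trailing "/".
    let mut segments = path.rsplit('/').filter(|s| !s.is_empty());
    let last = segments.next()?;
    let second_last = segments.next()?;
    Some(format!("{second_last}/{last}::{predicate}"))
}

/// Hypothetical helper: group subjects by their tail-path key.
fn build_index<'a>(entries: &[(&'a str, &str)]) -> HashMap<String, Vec<&'a str>> {
    let mut index: HashMap<String, Vec<&'a str>> = HashMap::new();
    for &(subject, predicate) in entries {
        if let Some(key) = make_key(subject, predicate) {
            index.entry(key).or_default().push(subject);
        }
    }
    index
}

fn main() {
    let index = build_index(&[
        ("rfc://5246/tls/cert_verification", "enabled"),
        ("owasp://transport_layer/tls/cert_verification", "enabled"),
        ("rfc://7519/jwt/audience_validation", "enabled"),
    ]);
    let key = make_key("code://rust/myapp/client/tls/cert_verification", "enabled");
    if let Some(hits) = key.as_ref().and_then(|k| index.get(k)) {
        println!("{key:?} matches {hits:?}");
    }
}
```

Because only the last two segments take part, any two paths ending in the same `.../tls/cert_verification` share a key by design; the predicate is the only other discriminator.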

@@ -62,4 +62,26 @@ pub enum AphoriaError {
/// Acknowledgment error.
#[error("Acknowledgment error: {0}")]
Acknowledge(String),
/// RFC fetch error.
#[error("Failed to fetch RFC {rfc}: {message}")]
RfcFetch {
/// The RFC number that failed to fetch.
rfc: u32,
/// The error message.
message: String,
},
/// OWASP cheat sheet fetch error.
#[error("Failed to fetch OWASP cheat sheet {sheet}: {message}")]
OwaspFetch {
/// The cheat sheet that failed to fetch.
sheet: String,
/// The error message.
message: String,
},
/// Corpus build error.
#[error("Corpus build error: {0}")]
CorpusBuild(String),
}
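The three new variants rely on thiserror's `#[error(...)]` format strings, which interpolate the named fields into `Display`. As a std-only illustration of the messages a caller would see, the sketch below hand-writes an equivalent `Display` for a hypothetical `FetchError` enum mirroring those variants; it is not the crate's `AphoriaError`, and the sheet name and messages are made-up example values.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical mirror of the RfcFetch, OwaspFetch, and CorpusBuild variants.
#[derive(Debug)]
enum FetchError {
    RfcFetch { rfc: u32, message: String },
    OwaspFetch { sheet: String, message: String },
    CorpusBuild(String),
}

// Hand-written equivalent of what the #[error(...)] attributes render.
impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::RfcFetch { rfc, message } => {
                write!(f, "Failed to fetch RFC {rfc}: {message}")
            }
            FetchError::OwaspFetch { sheet, message } => {
                write!(f, "Failed to fetch OWASP cheat sheet {sheet}: {message}")
            }
            FetchError::CorpusBuild(msg) => write!(f, "Corpus build error: {msg}"),
        }
    }
}

impl Error for FetchError {}

fn main() {
    let errors = [
        FetchError::RfcFetch { rfc: 7519, message: "connection timed out".to_string() },
        FetchError::OwaspFetch { sheet: "example_sheet".to_string(), message: "HTTP 404".to_string() },
        FetchError::CorpusBuild("no assertions produced".to_string()),
    ];
    for err in &errors {
        // Used as a trait object, the error prints the same Display message.
        let as_dyn: &dyn Error = err;
        println!("{as_dyn}");
    }
}
```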

@@ -0,0 +1,187 @@
//! CORS configuration extractor.
//!
//! Detects overly permissive CORS settings that could expose
//! the application to cross-origin attacks.
use regex::Regex;
use stemedb_core::types::ObjectValue;
use super::Extractor;
use crate::types::{ExtractedClaim, Language};
/// Extractor for CORS configuration issues.
pub struct CorsConfigExtractor {
/// Wildcard allow-origin patterns
allow_all_origins: Regex,
/// Credentials enabled pattern
allow_credentials: Regex,
}
impl Default for CorsConfigExtractor {
fn default() -> Self {
Self::new()
}
}
impl CorsConfigExtractor {
/// Create a new CORS config extractor.
///
/// # Panics
/// Panics if any regex pattern is invalid (programmer error).
#[allow(clippy::expect_used)]
pub fn new() -> Self {
Self {
allow_all_origins: Regex::new(
r#"(?i)(allow_origin\s*[:=\(]\s*["']\*["']|Access-Control-Allow-Origin.*\*|AllowAllOrigins.*true|cors.*origin.*\*)"#,
)
.expect("valid regex"),
allow_credentials: Regex::new(
r"(?i)(allow_credentials|AllowCredentials|credentials)\s*[:=]\s*true",
)
.expect("valid regex"),
}
}
}
impl Extractor for CorsConfigExtractor {
fn name(&self) -> &str {
"cors_config"
}
fn languages(&self) -> &[Language] {
&[
Language::Rust,
Language::Go,
Language::Python,
Language::TypeScript,
Language::JavaScript,
Language::Yaml,
Language::Toml,
Language::Json,
]
}
fn extract(
&self,
path_segments: &[String],
content: &str,
_language: Language,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
let mut found_wildcard_origin = false;
let mut wildcard_line = 0;
let mut wildcard_text = String::new();
for (line_idx, line) in content.lines().enumerate() {
let line_num = line_idx + 1;
// Wildcard allow-origin detection
if let Some(matched) = self.allow_all_origins.find(line) {
found_wildcard_origin = true;
wildcard_line = line_num;
wildcard_text = matched.as_str().to_string();
let mut concept_path = path_segments.to_vec();
concept_path.push("cors".to_string());
concept_path.push("allow_origin".to_string());
claims.push(ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "config_value".to_string(),
value: ObjectValue::Text("*".to_string()),
file: file.to_string(),
line: line_num,
matched_text: matched.as_str().to_string(),
confidence: 1.0,
description: "CORS allows all origins".to_string(),
});
}
}
// Check for credentials with wildcard (dangerous combination)
// This is a file-wide check, not a proximity check: a credentials flag
// anywhere in the same file is assumed to belong to the same CORS config.
if found_wildcard_origin && self.allow_credentials.is_match(content) {
let mut concept_path = path_segments.to_vec();
concept_path.push("cors".to_string());
concept_path.push("credentials_with_wildcard".to_string());
claims.push(ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(true),
file: file.to_string(),
line: wildcard_line,
matched_text: wildcard_text,
confidence: 0.9, // Slightly lower - we're inferring the combination
description: "CORS allows credentials with wildcard origin (security risk)"
.to_string(),
});
}
claims
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_wildcard_origin() {
let extractor = CorsConfigExtractor::new();
let content = r#"
cors = tower_http::cors::CorsLayer::permissive()
.allow_origin("*")
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/app.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("allow_origin"));
}
#[test]
fn test_access_control_header() {
let extractor = CorsConfigExtractor::new();
let content = r#"
res.setHeader("Access-Control-Allow-Origin", "*");
"#;
let claims =
extractor.extract(&["js".to_string()], content, Language::JavaScript, "server.js");
assert_eq!(claims.len(), 1);
}
#[test]
fn test_credentials_with_wildcard() {
let extractor = CorsConfigExtractor::new();
let content = r#"
cors:
allow_origin: "*"
allow_credentials: true
"#;
let claims =
extractor.extract(&["config".to_string()], content, Language::Yaml, "config/cors.yaml");
assert_eq!(claims.len(), 2);
assert!(claims.iter().any(|c| c.concept_path.contains("credentials_with_wildcard")));
}
#[test]
fn test_go_allow_all_origins() {
let extractor = CorsConfigExtractor::new();
let content = r#"
c := cors.New(cors.Config{
AllowAllOrigins: true,
})
"#;
let claims = extractor.extract(&["go".to_string()], content, Language::Go, "main.go");
assert_eq!(claims.len(), 1);
}
}
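`CorsConfigExtractor::extract` pairs a wildcard origin with a credentials flag found anywhere in the file. If a tighter pairing is wanted, a minimal sketch could restrict the search to a window of lines around the wildcard match. `lines_near` and the window size are hypothetical, not part of the extractor; the predicate stands in for `allow_credentials.is_match`.

```rust
/// Hypothetical helper: returns true if any line within `window` lines of
/// `anchor_line` (1-based, inclusive on both sides) satisfies `pred`.
fn lines_near(content: &str, anchor_line: usize, window: usize, pred: impl Fn(&str) -> bool) -> bool {
    if anchor_line == 0 {
        return false; // line numbers are 1-based
    }
    // Clamp the window so it never starts before line 1.
    let first = anchor_line.saturating_sub(window).max(1);
    let last = anchor_line.saturating_add(window);
    content
        .lines()
        .skip(first - 1)
        .take(last - first + 1)
        .any(|line| pred(line))
}
```

A window trades recall for precision: configs that split CORS settings across a large file would no longer be paired.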

@@ -0,0 +1,350 @@
//! Dependency version extractor.
//!
//! Records declared dependency versions from manifests so they can be
//! compared against advisory databases; no advisory lookup happens here.
use regex::Regex;
use stemedb_core::types::ObjectValue;
use super::Extractor;
use crate::types::{ExtractedClaim, Language};
/// Extractor for declared dependency versions.
///
/// Note: This is a simplified version that detects common patterns.
/// A full implementation would integrate with RustSec, npm audit, etc.
pub struct DepVersionsExtractor {
/// Cargo.toml dependency patterns
cargo_dep: Regex,
/// package.json dependency patterns
npm_dep: Regex,
/// go.mod dependency patterns
go_dep: Regex,
/// requirements.txt patterns
pip_dep: Regex,
}
impl Default for DepVersionsExtractor {
fn default() -> Self {
Self::new()
}
}
impl DepVersionsExtractor {
/// Create a new dependency version extractor.
///
/// # Panics
/// Panics if any regex pattern is invalid (programmer error).
#[allow(clippy::expect_used)]
pub fn new() -> Self {
Self {
// Matches: package = "1.0.0" or package = { version = "1.0.0" }
cargo_dep: Regex::new(
r#"^([a-zA-Z0-9_-]+)\s*=\s*(?:"([^"]+)"|.*version\s*=\s*"([^"]+)")"#,
)
.expect("valid regex"),
// Matches: "package": "^1.0.0"
npm_dep: Regex::new(r#""([^"]+)":\s*"([~^]?[\d.]+[^"]*)""#).expect("valid regex"),
// Matches: module/path v1.0.0
go_dep: Regex::new(r"^\s*([a-zA-Z0-9./_-]+)\s+(v[\d.]+(?:-[a-zA-Z0-9.]+)?)")
.expect("valid regex"),
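// Matches: package==1.0.0 or package>=1.0.0. The operator is required so that
// an unpinned name containing digits (e.g. `h5py`) is not split into a bogus
// package/version pair such as `h` / `5`.
pip_dep: Regex::new(r"^([a-zA-Z0-9_-]+)(?:==|>=|<=|~=|!=)([\d.]+(?:\.[a-zA-Z0-9]+)?)")
.expect("valid regex"),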
}
}
fn extract_cargo(
&self,
path_segments: &[String],
content: &str,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
for (line_idx, line) in content.lines().enumerate() {
if let Some(captures) = self.cargo_dep.captures(line) {
let package = captures.get(1).map(|m| m.as_str()).unwrap_or("");
let version = captures.get(2).or(captures.get(3)).map(|m| m.as_str()).unwrap_or("");
if !package.is_empty() && !version.is_empty() && version != "*" {
// Record the dependency for potential advisory lookup
let mut concept_path = path_segments.to_vec();
concept_path.push("dep".to_string());
concept_path.push(package.to_string());
concept_path.push("version".to_string());
claims.push(ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "installed_version".to_string(),
value: ObjectValue::Text(version.to_string()),
file: file.to_string(),
line: line_idx + 1,
matched_text: line.trim().to_string(),
confidence: 1.0,
description: format!("Dependency {} at version {}", package, version),
});
}
}
}
claims
}
fn extract_npm(
&self,
path_segments: &[String],
content: &str,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
for (line_idx, line) in content.lines().enumerate() {
if let Some(captures) = self.npm_dep.captures(line) {
let package = captures.get(1).map(|m| m.as_str()).unwrap_or("");
let version = captures.get(2).map(|m| m.as_str()).unwrap_or("");
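// Skip npm metadata fields. Note: the `@` check also skips scoped
// packages (e.g. `@types/node`), so they are never reported.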
if package.starts_with('@')
|| [
"name",
"version",
"description",
"main",
"scripts",
"devDependencies",
"dependencies",
"peerDependencies",
]
.contains(&package)
{
continue;
}
if !package.is_empty() && !version.is_empty() {
let mut concept_path = path_segments.to_vec();
concept_path.push("dep".to_string());
concept_path.push(package.to_string());
concept_path.push("version".to_string());
claims.push(ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "installed_version".to_string(),
value: ObjectValue::Text(version.to_string()),
file: file.to_string(),
line: line_idx + 1,
matched_text: line.trim().to_string(),
confidence: 1.0,
description: format!("Dependency {} at version {}", package, version),
});
}
}
}
claims
}
fn extract_go(
&self,
path_segments: &[String],
content: &str,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
let mut in_require = false;
for (line_idx, line) in content.lines().enumerate() {
// Track require block
if line.contains("require (") || line.contains("require(") {
in_require = true;
continue;
}
if in_require && line.contains(')') {
in_require = false;
continue;
}
if in_require {
if let Some(captures) = self.go_dep.captures(line) {
let package = captures.get(1).map(|m| m.as_str()).unwrap_or("");
let version = captures.get(2).map(|m| m.as_str()).unwrap_or("");
if !package.is_empty() && !version.is_empty() {
// Use last segment of path as package name
let short_name = package.rsplit('/').next().unwrap_or(package);
let mut concept_path = path_segments.to_vec();
concept_path.push("dep".to_string());
concept_path.push(short_name.to_string());
concept_path.push("version".to_string());
claims.push(ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "installed_version".to_string(),
value: ObjectValue::Text(version.to_string()),
file: file.to_string(),
line: line_idx + 1,
matched_text: line.trim().to_string(),
confidence: 1.0,
description: format!("Dependency {} at version {}", package, version),
});
}
}
}
}
claims
}
fn extract_pip(
&self,
path_segments: &[String],
content: &str,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
for (line_idx, line) in content.lines().enumerate() {
let line = line.trim();
// Skip comments and empty lines
if line.starts_with('#') || line.is_empty() {
continue;
}
if let Some(captures) = self.pip_dep.captures(line) {
let package = captures.get(1).map(|m| m.as_str()).unwrap_or("");
let version = captures.get(2).map(|m| m.as_str()).unwrap_or("");
if !package.is_empty() && !version.is_empty() {
let mut concept_path = path_segments.to_vec();
concept_path.push("dep".to_string());
concept_path.push(package.to_string());
concept_path.push("version".to_string());
claims.push(ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "installed_version".to_string(),
value: ObjectValue::Text(version.to_string()),
file: file.to_string(),
line: line_idx + 1,
matched_text: line.to_string(),
confidence: 1.0,
description: format!("Dependency {} at version {}", package, version),
});
}
}
}
claims
}
}
impl Extractor for DepVersionsExtractor {
fn name(&self) -> &str {
"dep_versions"
}
fn languages(&self) -> &[Language] {
&[Language::CargoManifest, Language::NpmManifest, Language::GoMod, Language::PythonManifest]
}
fn extract(
&self,
path_segments: &[String],
content: &str,
language: Language,
file: &str,
) -> Vec<ExtractedClaim> {
match language {
Language::CargoManifest => self.extract_cargo(path_segments, content, file),
Language::NpmManifest => self.extract_npm(path_segments, content, file),
Language::GoMod => self.extract_go(path_segments, content, file),
Language::PythonManifest => self.extract_pip(path_segments, content, file),
_ => Vec::new(),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_cargo_dependency_extraction() {
let extractor = DepVersionsExtractor::new();
let content = r#"
[dependencies]
tokio = "1.28"
serde = { version = "1.0", features = ["derive"] }
"#;
let claims = extractor.extract(
&["rust".to_string()],
content,
Language::CargoManifest,
"Cargo.toml",
);
assert_eq!(claims.len(), 2);
assert!(claims.iter().any(|c| c.concept_path.contains("tokio")));
assert!(claims.iter().any(|c| c.concept_path.contains("serde")));
}
#[test]
fn test_npm_dependency_extraction() {
let extractor = DepVersionsExtractor::new();
let content = r#"
{
"dependencies": {
"express": "^4.18.0",
"lodash": "4.17.21"
}
}
"#;
let claims =
extractor.extract(&["js".to_string()], content, Language::NpmManifest, "package.json");
assert_eq!(claims.len(), 2);
}
#[test]
fn test_go_mod_extraction() {
let extractor = DepVersionsExtractor::new();
let content = r#"
module myapp
go 1.21
require (
github.com/gin-gonic/gin v1.9.0
golang.org/x/crypto v0.14.0
)
"#;
let claims = extractor.extract(&["go".to_string()], content, Language::GoMod, "go.mod");
assert_eq!(claims.len(), 2);
assert!(claims.iter().any(|c| c.concept_path.contains("gin")));
assert!(claims.iter().any(|c| c.concept_path.contains("crypto")));
}
#[test]
fn test_pip_requirements_extraction() {
let extractor = DepVersionsExtractor::new();
let content = r#"
# Python requirements
requests==2.28.0
flask>=2.0.0
"#;
let claims = extractor.extract(
&["python".to_string()],
content,
Language::PythonManifest,
"requirements.txt",
);
assert_eq!(claims.len(), 2);
}
}
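`extract_cargo` applies `cargo_dep` to every line, so `[package]` keys such as `name = "app"` and `version = "0.1.0"` are also reported as dependencies. A minimal sketch of restricting the scan to dependency tables, assuming only `[dependencies]`, `[dev-dependencies]`, `[build-dependencies]` and their dotted variants (`[workspace.dependencies]`, `[target.'cfg(..)'.dependencies]`) matter; both helper names are hypothetical.

```rust
/// Hypothetical helper: true if a TOML table header names a dependency table.
/// Per-dependency tables such as `[dependencies.serde]` are not handled here.
fn is_dependency_table(header: &str) -> bool {
    let name = header.trim().trim_start_matches('[').trim_end_matches(']').trim();
    ["dependencies", "dev-dependencies", "build-dependencies"]
        .iter()
        .any(|t| name == *t || name.ends_with(&format!(".{t}")))
}

/// Hypothetical helper: (1-based line number, trimmed line) for each entry inside
/// a dependency table, skipping blank lines and comments.
fn dependency_lines(content: &str) -> Vec<(usize, &str)> {
    let mut in_deps = false;
    let mut out = Vec::new();
    for (idx, line) in content.lines().enumerate() {
        let trimmed = line.trim();
        if trimmed.starts_with('[') {
            // Every table header switches context.
            in_deps = is_dependency_table(trimmed);
        } else if in_deps && !trimmed.is_empty() && !trimmed.starts_with('#') {
            out.push((idx + 1, trimmed));
        }
    }
    out
}
```

Feeding only these lines to `cargo_dep` would keep the existing regex while dropping `[package]` and `[features]` keys.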

@@ -0,0 +1,291 @@
//! Hardcoded secrets extractor.
//!
//! Detects credentials, API keys, and tokens embedded in source code,
//! which violates security best practices.
use regex::Regex;
use stemedb_core::types::ObjectValue;
use super::Extractor;
use crate::types::{ExtractedClaim, Language};
/// Extractor for hardcoded secrets in source code.
pub struct HardcodedSecretsExtractor {
/// API keys (generic)
api_key: Regex,
/// Passwords
password: Regex,
/// AWS access key IDs (AKIA prefix)
aws_key: Regex,
/// Private keys (PEM format)
private_key: Regex,
/// Generic secrets/tokens
secret_token: Regex,
/// Placeholder values to exclude
placeholder: Regex,
}
impl Default for HardcodedSecretsExtractor {
fn default() -> Self {
Self::new()
}
}
impl HardcodedSecretsExtractor {
/// Create a new secrets extractor with compiled regexes.
///
/// # Panics
/// Panics if any regex pattern is invalid (programmer error).
#[allow(clippy::expect_used)]
pub fn new() -> Self {
Self {
api_key: Regex::new(r#"(?i)(api[_-]?key|apikey)\s*[:=]\s*["'][A-Za-z0-9_\-]{20,}["']"#)
.expect("valid regex"),
password: Regex::new(r#"(?i)(password|passwd|pwd)\s*[:=]\s*["'][^"']{4,}["']"#)
.expect("valid regex"),
aws_key: Regex::new(r"AKIA[0-9A-Z]{16}").expect("valid regex"),
private_key: Regex::new(r"-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----")
.expect("valid regex"),
secret_token: Regex::new(
r#"(?i)(secret|token|auth[_-]?key)\s*[:=]\s*["'][A-Za-z0-9_\-/.+=]{16,}["']"#,
)
.expect("valid regex"),
placeholder: Regex::new(
r#"(?i)(password|changeme|placeholder|CHANGE_ME|xxx|your[_-]?|example|test|dummy|fake|sample)"#,
)
.expect("valid regex"),
}
}
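fn is_placeholder(&self, matched: &str) -> bool {
// Inspect only the quoted value: the placeholder pattern contains `password`,
// so checking the whole match would treat every `password = "..."` as a placeholder.
let value = matched.split(|c: char| c == '"' || c == '\'').nth(1).unwrap_or(matched);
self.placeholder.is_match(value)
}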
fn is_test_file(&self, file: &str) -> bool {
let lower = file.to_lowercase();
lower.contains("test")
|| lower.contains("spec")
|| lower.contains("example")
|| lower.contains("fixture")
|| lower.contains("mock")
}
fn extract_secret(
&self,
path_segments: &[String],
file: &str,
line: usize,
matched_text: &str,
leaf: &str,
description: &str,
) -> ExtractedClaim {
let mut concept_path = path_segments.to_vec();
concept_path.push("secrets".to_string());
concept_path.push(leaf.to_string());
// Lower confidence for test files
let confidence = if self.is_test_file(file) { 0.5 } else { 1.0 };
ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "storage_method".to_string(),
value: ObjectValue::Text("hardcoded".to_string()),
file: file.to_string(),
line,
matched_text: matched_text.to_string(),
confidence,
description: description.to_string(),
}
}
}
impl Extractor for HardcodedSecretsExtractor {
fn name(&self) -> &str {
"hardcoded_secrets"
}
fn languages(&self) -> &[Language] {
&[
Language::Rust,
Language::Go,
Language::Python,
Language::TypeScript,
Language::JavaScript,
Language::Yaml,
Language::Toml,
Language::Json,
Language::Dotenv,
]
}
fn extract(
&self,
path_segments: &[String],
content: &str,
_language: Language,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
for (line_idx, line) in content.lines().enumerate() {
let line_num = line_idx + 1;
// API key detection
if let Some(matched) = self.api_key.find(line) {
let matched_str = matched.as_str();
if !self.is_placeholder(matched_str) {
claims.push(self.extract_secret(
path_segments,
file,
line_num,
matched_str,
"api_key",
"API key is hardcoded in source",
));
}
}
// Password detection
if let Some(matched) = self.password.find(line) {
let matched_str = matched.as_str();
if !self.is_placeholder(matched_str) {
claims.push(self.extract_secret(
path_segments,
file,
line_num,
matched_str,
"password",
"Password is hardcoded in source",
));
}
}
// AWS key detection
if let Some(matched) = self.aws_key.find(line) {
claims.push(self.extract_secret(
path_segments,
file,
line_num,
matched.as_str(),
"aws_credentials",
"AWS access key ID is hardcoded in source",
));
}
// Private key detection
if let Some(matched) = self.private_key.find(line) {
claims.push(self.extract_secret(
path_segments,
file,
line_num,
matched.as_str(),
"private_key",
"Private key is embedded in source",
));
}
// Generic secret/token detection
if let Some(matched) = self.secret_token.find(line) {
let matched_str = matched.as_str();
if !self.is_placeholder(matched_str) {
claims.push(self.extract_secret(
path_segments,
file,
line_num,
matched_str,
"secret_token",
"Secret or token is hardcoded in source",
));
}
}
}
claims
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_api_key_detection() {
let extractor = HardcodedSecretsExtractor::new();
let content = r#"
const API_KEY = "sk_live_1234567890abcdefghij";
"#;
let claims =
extractor.extract(&["js".to_string()], content, Language::JavaScript, "src/config.js");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("api_key"));
}
#[test]
fn test_aws_key_detection() {
let extractor = HardcodedSecretsExtractor::new();
let content = r#"
aws_access_key = "AKIAIOSFODNN7EXAMPLE"
"#;
let claims =
extractor.extract(&["python".to_string()], content, Language::Python, "config.py");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("aws_credentials"));
}
#[test]
fn test_private_key_detection() {
let extractor = HardcodedSecretsExtractor::new();
let content = r#"
-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEA...
-----END RSA PRIVATE KEY-----
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/cert.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("private_key"));
}
#[test]
fn test_excludes_placeholders() {
let extractor = HardcodedSecretsExtractor::new();
let content = r#"
password = "changeme"
api_key = "your_api_key_here"
secret = "CHANGE_ME"
"#;
let claims = extractor.extract(
&["config".to_string()],
content,
Language::Yaml,
"config/example.yaml",
);
assert!(claims.is_empty());
}
#[test]
fn test_lower_confidence_for_test_files() {
let extractor = HardcodedSecretsExtractor::new();
let content = r#"
const API_KEY = "sk_live_1234567890abcdefghij";
"#;
let claims = extractor.extract(
&["js".to_string()],
content,
Language::JavaScript,
"src/__tests__/api.spec.js",
);
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].confidence, 0.5);
}
}
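Placeholder filtering is only reliable if it inspects the quoted value rather than the whole match: the placeholder pattern contains `password`, so a check over `password = "..."` would match on the key name alone. A minimal sketch of value-only filtering; it uses a plain substring list (a subset of the extractor's pattern) instead of the `regex` crate to stay dependency-free, and `quoted_value` / `looks_like_placeholder` are hypothetical names.

```rust
/// Hypothetical subset of the extractor's placeholder markers (lowercase).
const PLACEHOLDER_MARKERS: &[&str] = &[
    "password", "changeme", "change_me", "placeholder", "xxx", "your",
    "example", "test", "dummy", "fake", "sample",
];

/// Text between the first two quote characters (`"` or `'`), e.g. `changeme`
/// for `password = "changeme"`; falls back to the whole input.
fn quoted_value(matched: &str) -> &str {
    matched.split(|c: char| c == '"' || c == '\'').nth(1).unwrap_or(matched)
}

/// Placeholder check that ignores the key name and looks only at the value.
fn looks_like_placeholder(matched: &str) -> bool {
    let value = quoted_value(matched).to_lowercase();
    PLACEHOLDER_MARKERS.iter().any(|m| value.contains(m))
}
```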

@@ -0,0 +1,267 @@
//! JWT configuration extractor.
//!
//! Detects patterns where JWT validation is misconfigured,
//! violating RFC 7519 security requirements.
use regex::Regex;
use stemedb_core::types::ObjectValue;
use super::Extractor;
use crate::types::{ExtractedClaim, Language};
/// Extractor for JWT validation configuration.
pub struct JwtConfigExtractor {
/// Audience validation disabled
aud_disabled: Regex,
/// Algorithm none allowed
alg_none: Regex,
/// Signature verification skipped
sig_skip: Regex,
/// Expiry validation disabled
exp_disabled: Regex,
/// Go jwt.Parse without algorithm check (heuristic)
go_parse_insecure: Regex,
}
impl Default for JwtConfigExtractor {
fn default() -> Self {
Self::new()
}
}
impl JwtConfigExtractor {
/// Create a new JWT config extractor with compiled regexes.
///
/// # Panics
/// Panics if any regex pattern is invalid (programmer error).
#[allow(clippy::expect_used)]
pub fn new() -> Self {
Self {
aud_disabled: Regex::new(
r"(?i)(set_audience.*\[\]|validate_aud.*false|aud.*None|ValidateAudience.*false)",
)
.expect("valid regex"),
alg_none: Regex::new(
r"(?i)(Algorithm::None|alg.*none|allow_none.*true|SigningMethodNone)",
)
.expect("valid regex"),
sig_skip: Regex::new(
r"(?i)(dangerous_insecure|skip_signature|verify.*false|RequireSignedTokens.*false)",
)
.expect("valid regex"),
exp_disabled: Regex::new(
r"(?i)(validate_exp.*false|RequireExpirationTime.*false|IgnoreExpiration)",
)
.expect("valid regex"),
go_parse_insecure: Regex::new(
r"jwt\.Parse\([^,]+,\s*func\s*\([^)]*\*jwt\.Token\)\s*\([^)]*,\s*error\)\s*\{[^}]*return\s+[^,]+,\s*nil",
)
.expect("valid regex"),
}
}
#[allow(clippy::too_many_arguments)]
fn extract_claim(
&self,
path_segments: &[String],
file: &str,
line: usize,
matched_text: &str,
leaf: &str,
predicate: &str,
value: ObjectValue,
description: &str,
confidence: f32,
) -> ExtractedClaim {
let mut concept_path = path_segments.to_vec();
concept_path.push("jwt".to_string());
concept_path.push(leaf.to_string());
ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: predicate.to_string(),
value,
file: file.to_string(),
line,
matched_text: matched_text.to_string(),
confidence,
description: description.to_string(),
}
}
}
impl Extractor for JwtConfigExtractor {
fn name(&self) -> &str {
"jwt_config"
}
fn languages(&self) -> &[Language] {
&[
Language::Rust,
Language::Go,
Language::Python,
Language::TypeScript,
Language::JavaScript,
Language::Yaml,
Language::Toml,
Language::Json,
]
}
fn extract(
&self,
path_segments: &[String],
content: &str,
_language: Language,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
for (line_idx, line) in content.lines().enumerate() {
let line_num = line_idx + 1;
// Audience validation disabled
if let Some(matched) = self.aud_disabled.find(line) {
claims.push(self.extract_claim(
path_segments,
file,
line_num,
matched.as_str(),
"audience_validation",
"enabled",
ObjectValue::Boolean(false),
"JWT audience validation is disabled",
1.0,
));
}
// Algorithm none allowed
if let Some(matched) = self.alg_none.find(line) {
claims.push(self.extract_claim(
path_segments,
file,
line_num,
matched.as_str(),
"algorithm_restriction",
"config_value",
ObjectValue::Text("none_allowed".to_string()),
"JWT allows 'none' algorithm (signature bypass)",
1.0,
));
}
// Signature verification skipped
if let Some(matched) = self.sig_skip.find(line) {
claims.push(self.extract_claim(
path_segments,
file,
line_num,
matched.as_str(),
"signature_verification",
"enabled",
ObjectValue::Boolean(false),
"JWT signature verification is disabled",
1.0,
));
}
// Expiry validation disabled
if let Some(matched) = self.exp_disabled.find(line) {
claims.push(self.extract_claim(
path_segments,
file,
line_num,
matched.as_str(),
"expiry_validation",
"enabled",
ObjectValue::Boolean(false),
"JWT expiry validation is disabled",
1.0,
));
}
}
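// Check for Go insecure parse pattern (multi-line, lower confidence)
if let Some(matched) = self.go_parse_insecure.find(content) {
// Line number of the match start: one plus the newlines before it
// (`lines().count() + 1` over-counts when the match starts mid-line).
let line_num = content[..matched.start()].matches('\n').count() + 1;
// Truncate the snippet to 50 chars on a char boundary; byte slicing could
// panic if the match contains multi-byte UTF-8.
let snippet = matched.as_str();
let snippet = snippet.char_indices().nth(50).map_or(snippet, |(idx, _)| &snippet[..idx]);
claims.push(self.extract_claim(
path_segments,
file,
line_num,
snippet,
"signature_verification",
"enabled",
ObjectValue::Boolean(false),
"JWT parsed without algorithm verification (heuristic)",
0.7, // Lower confidence - heuristic match
));
}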
claims
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_audience_disabled() {
let extractor = JwtConfigExtractor::new();
let content = r#"
let validation = Validation::new(Algorithm::HS256);
validation.validate_aud = false;
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/auth.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("audience_validation"));
assert_eq!(claims[0].value, ObjectValue::Boolean(false));
}
#[test]
fn test_algorithm_none() {
let extractor = JwtConfigExtractor::new();
let content = r#"
// Dangerous: allows unsigned tokens
Algorithm::None
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/auth.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("algorithm_restriction"));
}
#[test]
fn test_signature_skip() {
let extractor = JwtConfigExtractor::new();
let content = r#"
let claims = dangerous_insecure_decode(&token)?;
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/auth.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].concept_path.contains("signature_verification"));
}
#[test]
fn test_multiple_issues() {
let extractor = JwtConfigExtractor::new();
let content = r#"
validation.validate_aud = false;
validation.validate_exp = false;
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/auth.rs");
assert_eq!(claims.len(), 2);
}
}
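Converting a byte offset such as `matched.start()` into a line number is easy to get off by one: `content[..offset].lines().count() + 1` over-counts whenever the offset lands mid-line, because the partial last line is counted as a full one. Counting newline bytes is exact. Truncating a match for display has a similar trap, since `&s[..50]` panics if byte 50 is inside a multi-byte character. A sketch of both; the function names are hypothetical.

```rust
/// 1-based line number of byte offset `offset` in `content`:
/// one plus the number of newline bytes before it. Exact even mid-line.
fn line_of_offset(content: &str, offset: usize) -> usize {
    content.as_bytes()[..offset].iter().filter(|&&b| b == b'\n').count() + 1
}

/// The `lines()`-based variant, for comparison: one too high for mid-line offsets.
fn line_of_offset_via_lines(content: &str, offset: usize) -> usize {
    content[..offset].lines().count() + 1
}

/// Truncates `s` to at most `max_chars` characters without splitting a code point.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((idx, _)) => &s[..idx],
        None => s,
    }
}
```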

@@ -1,16 +1,33 @@
//! Claim extractors for finding implicit decisions in source code.
//!
//! Each extractor looks for specific patterns that represent implicit claims:
//! - `tls_verify`: TLS certificate verification settings
//! - `jwt_config`: JWT validation configuration
//! - `hardcoded_secrets`: Credentials in source code
//! - `timeout_config`: HTTP/DB/Redis timeout values
//! - `dep_versions`: Dependency versions for advisory lookup
//! - `cors_config`: CORS allow-origin settings
//! - `rate_limit`: Rate limiting configuration
mod cors_config;
mod dep_versions;
mod hardcoded_secrets;
mod jwt_config;
mod rate_limit;
mod timeout_config;
mod tls_verify;
pub use cors_config::CorsConfigExtractor;
pub use dep_versions::DepVersionsExtractor;
pub use hardcoded_secrets::HardcodedSecretsExtractor;
pub use jwt_config::JwtConfigExtractor;
pub use rate_limit::{RateLimitExtractor, RateLimitThresholds};
pub use timeout_config::{TimeoutConfigExtractor, TimeoutThresholds};
pub use tls_verify::TlsVerifyExtractor;
use tracing::instrument;
use crate::config::AphoriaConfig;
use crate::types::{ExtractedClaim, Language};
/// Trait for claim extractors.
@@ -30,6 +47,7 @@ pub trait Extractor: Send + Sync {
/// * `path_segments` - ConceptPath segments derived from the file's location
/// * `content` - The file content as a string
/// * `language` - The detected language of the file
/// * `file` - The relative file path
///
/// # Returns
///
@@ -39,6 +57,7 @@ pub trait Extractor: Send + Sync {
path_segments: &[String],
content: &str,
language: Language,
file: &str,
) -> Vec<ExtractedClaim>;
}
@@ -49,15 +68,59 @@ pub struct ExtractorRegistry {
impl Default for ExtractorRegistry {
fn default() -> Self {
Self::new(&AphoriaConfig::default())
}
}
impl ExtractorRegistry {
/// Create a new registry with all built-in extractors.
pub fn new(config: &AphoriaConfig) -> Self {
let mut extractors: Vec<Box<dyn Extractor>> = Vec::new();
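// Build the enabled/disabled name sets. A non-empty `disabled` list takes
// precedence and `enabled` is then ignored; otherwise a non-empty `enabled`
// list acts as an allow-list; with neither set, every extractor runs.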
let enabled: std::collections::HashSet<&str> =
config.extractors.enabled.iter().map(|s| s.as_str()).collect();
let disabled: std::collections::HashSet<&str> =
config.extractors.disabled.iter().map(|s| s.as_str()).collect();
let is_enabled = |name: &str| -> bool {
if !disabled.is_empty() {
!disabled.contains(name)
} else if !enabled.is_empty() {
enabled.contains(name)
} else {
true
}
};
// Register extractors based on configuration
if is_enabled("tls_verify") {
extractors.push(Box::new(TlsVerifyExtractor::new()));
}
if is_enabled("jwt_config") {
extractors.push(Box::new(JwtConfigExtractor::new()));
}
if is_enabled("hardcoded_secrets") {
extractors.push(Box::new(HardcodedSecretsExtractor::new()));
}
if is_enabled("timeout_config") {
let thresholds = TimeoutThresholds {
min_reasonable_ms: config.extractors.timeout_config.min_reasonable_ms,
max_reasonable_ms: config.extractors.timeout_config.max_reasonable_ms,
};
extractors.push(Box::new(TimeoutConfigExtractor::new(thresholds)));
}
if is_enabled("dep_versions") {
extractors.push(Box::new(DepVersionsExtractor::new()));
}
if is_enabled("cors_config") {
extractors.push(Box::new(CorsConfigExtractor::new()));
}
if is_enabled("rate_limit") {
extractors.push(Box::new(RateLimitExtractor::default()));
}
Self { extractors }
}
/// Get extractors applicable to a given language.
@@ -70,17 +133,24 @@ impl ExtractorRegistry {
}
/// Extract claims from content using all applicable extractors.
#[instrument(skip(self, path_segments, content), fields(file = %file, language = ?language))]
pub fn extract_all(
&self,
path_segments: &[String],
content: &str,
language: Language,
file: &str,
) -> Vec<ExtractedClaim> {
self.for_language(language)
.iter()
.flat_map(|e| e.extract(path_segments, content, language, file))
.collect()
}
/// Get the names of all registered extractors.
pub fn extractor_names(&self) -> Vec<&str> {
self.extractors.iter().map(|e| e.name()).collect()
}
}
#[cfg(test)]
@@ -89,15 +159,53 @@ mod tests {
#[test]
fn test_registry_creation() {
let config = AphoriaConfig::default();
let registry = ExtractorRegistry::new(&config);
// Should have all 7 extractors enabled by default
assert_eq!(registry.extractor_names().len(), 7);
}
#[test]
fn test_registry_disabled_extractor() {
let mut config = AphoriaConfig::default();
config.extractors.disabled = vec!["tls_verify".to_string()];
let registry = ExtractorRegistry::new(&config);
assert!(!registry.extractor_names().contains(&"tls_verify"));
assert_eq!(registry.extractor_names().len(), 6);
}
#[test]
fn test_registry_for_language() {
let config = AphoriaConfig::default();
let registry = ExtractorRegistry::new(&config);
let rust_extractors = registry.for_language(Language::Rust);
// TLS, JWT, secrets, timeout, CORS, rate_limit work on Rust
assert!(!rust_extractors.is_empty());
let cargo_extractors = registry.for_language(Language::CargoManifest);
// Only dep_versions works on Cargo.toml
assert!(cargo_extractors.iter().any(|e| e.name() == "dep_versions"));
}
#[test]
fn test_extract_all() {
let config = AphoriaConfig::default();
let registry = ExtractorRegistry::new(&config);
let content = r#"
let client = reqwest::Client::builder()
.danger_accept_invalid_certs(true)
.build()?;
"#;
let claims =
registry.extract_all(&["rust".to_string()], content, Language::Rust, "src/client.rs");
assert!(!claims.is_empty());
assert!(claims.iter().any(|c| c.concept_path.contains("tls")));
}
}
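The `is_enabled` closure in `ExtractorRegistry::new` gives a non-empty `disabled` list precedence: when it is set, `enabled` is ignored entirely. A standalone restatement of that rule, useful for unit-testing the precedence without building a registry or an `AphoriaConfig`; the free function is hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical restatement of the registry's precedence: a non-empty deny-list
/// wins outright; otherwise a non-empty allow-list applies; otherwise all run.
fn is_enabled(name: &str, enabled: &HashSet<&str>, disabled: &HashSet<&str>) -> bool {
    if !disabled.is_empty() {
        !disabled.contains(name)
    } else if !enabled.is_empty() {
        enabled.contains(name)
    } else {
        true
    }
}
```

Because the allow-list is silently dropped once a deny-list exists, a config that sets both behaves as deny-list-only.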

@@ -0,0 +1,229 @@
//! Rate limiting configuration extractor.
//!
//! Detects rate limiting that is disabled or set unreasonably high,
//! which can lead to availability and security issues.
use regex::Regex;
use stemedb_core::types::ObjectValue;
use super::Extractor;
use crate::types::{ExtractedClaim, Language};
/// Configuration for rate limit thresholds.
#[derive(Debug, Clone)]
pub struct RateLimitThresholds {
/// Maximum reasonable requests per minute.
pub max_requests_per_minute: u64,
}
impl Default for RateLimitThresholds {
fn default() -> Self {
Self { max_requests_per_minute: 10_000 }
}
}
/// Extractor for rate limiting configuration.
pub struct RateLimitExtractor {
/// Rate limiting disabled patterns
disabled: Regex,
/// Numeric rate limit patterns
numeric_limit: Regex,
/// Thresholds for flagging
thresholds: RateLimitThresholds,
}
impl Default for RateLimitExtractor {
fn default() -> Self {
Self::new(RateLimitThresholds::default())
}
}
impl RateLimitExtractor {
/// Create a new rate limit extractor with the given thresholds.
///
/// # Panics
/// Panics if any regex pattern is invalid (programmer error).
#[allow(clippy::expect_used)]
pub fn new(thresholds: RateLimitThresholds) -> Self {
Self {
disabled: Regex::new(
r"(?i)rate_?limit.*\b(?:disabled|off|false|0|none|skip)\b",
)
.expect("valid regex"),
numeric_limit: Regex::new(
r#"(?i)(rate_?limit|max_?requests|requests_?per_?(?:second|minute|hour))["']?\s*[:=]\s*(\d+)"#,
)
.expect("valid regex"),
thresholds,
}
}
fn normalize_to_per_minute(&self, value: u64, line: &str) -> u64 {
let lower = line.to_lowercase();
if lower.contains("per_second") || lower.contains("persecond") || lower.contains("/s") {
value.saturating_mul(60)
} else if lower.contains("per_hour") || lower.contains("perhour") || lower.contains("/h") {
value / 60
} else {
// Default: assume per minute
value
}
}
}
impl Extractor for RateLimitExtractor {
fn name(&self) -> &str {
"rate_limit"
}
fn languages(&self) -> &[Language] {
&[
Language::Rust,
Language::Go,
Language::Python,
Language::TypeScript,
Language::JavaScript,
Language::Yaml,
Language::Toml,
Language::Json,
]
}
fn extract(
&self,
path_segments: &[String],
content: &str,
_language: Language,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
for (line_idx, line) in content.lines().enumerate() {
let line_num = line_idx + 1;
// Rate limiting disabled
if let Some(matched) = self.disabled.find(line) {
let mut concept_path = path_segments.to_vec();
concept_path.push("rate_limit".to_string());
concept_path.push("enabled".to_string());
claims.push(ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: file.to_string(),
line: line_num,
matched_text: matched.as_str().to_string(),
confidence: 1.0,
description: "Rate limiting is disabled".to_string(),
});
continue;
}
// Numeric rate limit check
if let Some(captures) = self.numeric_limit.captures(line) {
if let Some(value_match) = captures.get(2) {
if let Ok(value) = value_match.as_str().parse::<u64>() {
let per_minute = self.normalize_to_per_minute(value, line);
if per_minute > self.thresholds.max_requests_per_minute {
let mut concept_path = path_segments.to_vec();
concept_path.push("rate_limit".to_string());
concept_path.push("max_requests".to_string());
claims.push(ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "config_value".to_string(),
value: ObjectValue::Number(per_minute as f64),
file: file.to_string(),
line: line_num,
matched_text: captures
.get(0)
.map(|m| m.as_str())
.unwrap_or("")
.to_string(),
confidence: 1.0,
description: format!(
"Rate limit {} req/min exceeds recommended maximum {} req/min",
per_minute, self.thresholds.max_requests_per_minute
),
});
}
}
}
}
}
claims
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_rate_limit_disabled() {
let extractor = RateLimitExtractor::default();
let content = r#"
rate_limit: disabled
"#;
let claims =
extractor.extract(&["config".to_string()], content, Language::Yaml, "config/api.yaml");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].value, ObjectValue::Boolean(false));
}
#[test]
fn test_rate_limit_false() {
let extractor = RateLimitExtractor::default();
let content = r#"
ratelimit_enabled = false
"#;
let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "config.rs");
assert_eq!(claims.len(), 1);
}
#[test]
fn test_unreasonably_high_limit() {
let extractor = RateLimitExtractor::default();
let content = r#"
max_requests = 100000 // 100k per minute
"#;
let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "config.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].description.contains("exceeds"));
}
#[test]
fn test_reasonable_limit_no_claims() {
let extractor = RateLimitExtractor::default();
let content = r#"
max_requests = 1000
"#;
let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "config.rs");
assert!(claims.is_empty());
}
#[test]
fn test_per_second_normalization() {
let extractor = RateLimitExtractor::default();
let content = r#"
requests_per_second = 500 // 30k per minute
"#;
let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "config.rs");
assert_eq!(claims.len(), 1);
// 500 * 60 = 30000 > 10000
}
}
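`normalize_to_per_minute` scales per-second values up and per-hour values down before comparing the result with `max_requests_per_minute`. Here is a standalone sketch of that conversion. It assumes the same substring-based unit detection and uses `saturating_mul`, so a huge per-second value clamps to `u64::MAX` instead of overflowing. `exceeds_limit` is a hypothetical convenience wrapper.

```rust
/// Convert a configured request rate to requests per minute, inferring the
/// unit from substrings on the same line (mirrors the extractor's heuristic).
fn normalize_to_per_minute(value: u64, line: &str) -> u64 {
    let lower = line.to_lowercase();
    if lower.contains("per_second") || lower.contains("persecond") || lower.contains("/s") {
        value.saturating_mul(60) // per second -> per minute, clamped on overflow
    } else if lower.contains("per_hour") || lower.contains("perhour") || lower.contains("/h") {
        value / 60 // per hour -> per minute (integer division truncates)
    } else {
        value // no unit marker: assume the value is already per minute
    }
}

/// Hypothetical helper: true when the normalized rate is above the threshold.
fn exceeds_limit(value: u64, line: &str, max_per_minute: u64) -> bool {
    normalize_to_per_minute(value, line) > max_per_minute
}
```

Because detection is substring-based, unrelated text on the same line, such as a `/status` path, would also be read as per-second. The sketch keeps that behavior so it stays faithful to the extractor.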

@ -0,0 +1,315 @@
//! Timeout configuration extractor.
//!
//! Detects timeout values that are misconfigured (zero/infinite,
//! too low, or too high) which can cause availability issues.
use regex::Regex;
use stemedb_core::types::ObjectValue;
use super::Extractor;
use crate::types::{ExtractedClaim, Language};
/// Configuration for timeout extraction thresholds.
#[derive(Debug, Clone)]
pub struct TimeoutThresholds {
/// Minimum reasonable timeout in milliseconds.
pub min_reasonable_ms: u64,
/// Maximum reasonable timeout in milliseconds.
pub max_reasonable_ms: u64,
}
impl Default for TimeoutThresholds {
fn default() -> Self {
Self { min_reasonable_ms: 1000, max_reasonable_ms: 300_000 }
}
}
/// Extractor for timeout configuration values.
pub struct TimeoutConfigExtractor {
/// Zero/infinite timeout patterns
zero_timeout: Regex,
/// Numeric timeout patterns (captures the value)
numeric_timeout: Regex,
/// Duration patterns (Rust/Go style, reserved for future use)
#[allow(dead_code)]
duration_timeout: Regex,
/// Configuration thresholds
thresholds: TimeoutThresholds,
}
impl Default for TimeoutConfigExtractor {
fn default() -> Self {
Self::new(TimeoutThresholds::default())
}
}
impl TimeoutConfigExtractor {
/// Create a new timeout extractor with the given thresholds.
///
/// # Panics
/// Panics if any regex pattern is invalid (programmer error).
#[allow(clippy::expect_used)]
pub fn new(thresholds: TimeoutThresholds) -> Self {
Self {
zero_timeout: Regex::new(
r#"(?i)timeout["']?\s*[:=]\s*["']?(0|None|null|nil|infinity|Inf|never|\-1)\b"#,
)
.expect("valid regex"),
numeric_timeout: Regex::new(r#"(?i)timeout["']?\s*[:=]\s*(\d+)"#).expect("valid regex"),
duration_timeout: Regex::new(
r"(?i)(?:Duration::from_(?:secs|millis|nanos)|time\.(?:Second|Millisecond)|timeout)\s*[:=\(]\s*(\d+)",
)
.expect("valid regex"),
thresholds,
}
}
#[allow(clippy::too_many_arguments)]
fn extract_claim(
&self,
path_segments: &[String],
file: &str,
line: usize,
matched_text: &str,
context: &str,
value: f64,
description: &str,
) -> ExtractedClaim {
let mut concept_path = path_segments.to_vec();
concept_path.push(context.to_string());
concept_path.push("timeout".to_string());
ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "config_value".to_string(),
value: ObjectValue::Number(value),
file: file.to_string(),
line,
matched_text: matched_text.to_string(),
confidence: 1.0,
description: description.to_string(),
}
}
fn detect_context(&self, line: &str) -> &str {
let lower = line.to_lowercase();
if lower.contains("http") || lower.contains("client") || lower.contains("request") {
"http"
} else if lower.contains("db") || lower.contains("database") || lower.contains("sql") {
"database"
} else if lower.contains("redis") || lower.contains("cache") || lower.contains("memcache") {
"cache"
} else if lower.contains("grpc") || lower.contains("rpc") {
"rpc"
} else {
"general"
}
}
fn estimate_milliseconds(&self, value: u64, line: &str) -> u64 {
// Strip comments before analyzing
let code_part = line.split("//").next().unwrap_or(line);
let code_part = code_part.split('#').next().unwrap_or(code_part);
let lower = code_part.to_lowercase();
// Explicit unit markers in code (not comments)
if lower.contains("from_secs") || lower.contains("_secs") {
return value.saturating_mul(1000);
}
if lower.contains("from_millis") || lower.contains("millisecond") || lower.contains("_ms") {
return value;
}
if lower.contains("from_nanos") || lower.contains("nanosecond") {
return value / 1_000_000;
}
// Heuristics based on magnitude
if value > 1_000_000 {
// Likely nanoseconds
value / 1_000_000
} else if value > 1000 && value < 1_000_000 {
// Likely milliseconds
value
} else if value < 100 {
// Likely seconds
value.saturating_mul(1000)
} else {
// Default: assume milliseconds
value
}
}
}
impl Extractor for TimeoutConfigExtractor {
fn name(&self) -> &str {
"timeout_config"
}
fn languages(&self) -> &[Language] {
&[
Language::Rust,
Language::Go,
Language::Python,
Language::TypeScript,
Language::JavaScript,
Language::Yaml,
Language::Toml,
Language::Json,
]
}
fn extract(
&self,
path_segments: &[String],
content: &str,
_language: Language,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
for (line_idx, line) in content.lines().enumerate() {
let line_num = line_idx + 1;
let context = self.detect_context(line);
// Zero/infinite timeout detection
if let Some(matched) = self.zero_timeout.find(line) {
claims.push(self.extract_claim(
path_segments,
file,
line_num,
matched.as_str(),
context,
0.0,
"Timeout is disabled (infinite wait)",
));
continue;
}
// Numeric timeout detection
if let Some(captures) = self.numeric_timeout.captures(line) {
if let Some(value_match) = captures.get(1) {
if let Ok(value) = value_match.as_str().parse::<u64>() {
let ms = self.estimate_milliseconds(value, line);
if ms > 0 && ms < self.thresholds.min_reasonable_ms {
claims.push(self.extract_claim(
path_segments,
file,
line_num,
captures.get(0).map(|m| m.as_str()).unwrap_or(""),
context,
ms as f64,
&format!(
"Timeout {}ms is below minimum reasonable {}ms",
ms, self.thresholds.min_reasonable_ms
),
));
} else if ms > self.thresholds.max_reasonable_ms {
claims.push(self.extract_claim(
path_segments,
file,
line_num,
captures.get(0).map(|m| m.as_str()).unwrap_or(""),
context,
ms as f64,
&format!(
"Timeout {}ms exceeds maximum reasonable {}ms",
ms, self.thresholds.max_reasonable_ms
),
));
}
}
}
}
}
claims
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_zero_timeout_detection() {
let extractor = TimeoutConfigExtractor::default();
let content = r#"
client.timeout = 0
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/http.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].description.contains("disabled"));
}
#[test]
fn test_nil_timeout_detection() {
let extractor = TimeoutConfigExtractor::default();
let content = r#"
timeout: nil
"#;
let claims = extractor.extract(&["go".to_string()], content, Language::Go, "config.go");
assert_eq!(claims.len(), 1);
}
#[test]
fn test_unreasonably_low_timeout() {
let extractor = TimeoutConfigExtractor::default();
let content = r#"
http_client.timeout = 100 // 100ms
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/http.rs");
assert_eq!(claims.len(), 1);
assert!(claims[0].description.contains("below minimum"));
}
#[test]
fn test_unreasonably_high_timeout() {
let extractor = TimeoutConfigExtractor::default();
let content = r#"
db_timeout = 600000 // 10 minutes
"#;
let claims =
extractor.extract(&["python".to_string()], content, Language::Python, "config.py");
assert_eq!(claims.len(), 1);
assert!(claims[0].description.contains("exceeds maximum"));
}
#[test]
fn test_reasonable_timeout_no_claims() {
let extractor = TimeoutConfigExtractor::default();
let content = r#"
timeout = 30000 // 30 seconds
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/http.rs");
assert!(claims.is_empty(), "Expected no claims for reasonable 30000ms timeout");
}
#[test]
fn test_context_detection() {
let extractor = TimeoutConfigExtractor::default();
let content_http = "http_client.timeout = 0";
let claims =
extractor.extract(&["rust".to_string()], content_http, Language::Rust, "src/http.rs");
assert!(claims[0].concept_path.contains("http"));
let content_db = "database_timeout = 0";
let claims =
extractor.extract(&["rust".to_string()], content_db, Language::Rust, "src/db.rs");
assert!(claims[0].concept_path.contains("database"));
}
}
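`estimate_milliseconds` guesses the unit of a bare number. Explicit markers (`_secs`, `_ms`, `from_nanos`) win. Otherwise, very large values are read as nanoseconds, values under 100 as seconds, and everything in between as milliseconds. Below is a std-only restatement of that heuristic. The original's two middle magnitude branches both return milliseconds, so they are merged here. Saturating multiplication is an addition in this sketch.

```rust
/// Guess the unit of a bare timeout number and convert it to milliseconds.
fn estimate_milliseconds(value: u64, line: &str) -> u64 {
    // Drop trailing `//` and `#` comments so only the code is inspected.
    let code = line.split("//").next().unwrap_or(line);
    let code = code.split('#').next().unwrap_or(code);
    let lower = code.to_lowercase();

    // Explicit unit markers take priority over magnitude.
    if lower.contains("_secs") {
        return value.saturating_mul(1000);
    }
    if lower.contains("from_millis") || lower.contains("millisecond") || lower.contains("_ms") {
        return value;
    }
    if lower.contains("from_nanos") || lower.contains("nanosecond") {
        return value / 1_000_000;
    }

    // Magnitude heuristics.
    if value > 1_000_000 {
        value / 1_000_000 // very large: assume nanoseconds
    } else if value < 100 {
        value.saturating_mul(1000) // small: assume seconds
    } else {
        value // 100..=1_000_000: assume milliseconds
    }
}
```

Splitting on `//` also truncates lines containing URLs such as `http://host`, so a unit marker after the scheme is never seen.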

@ -0,0 +1,259 @@
//! TLS certificate verification extractor.
//!
//! Detects patterns where TLS certificate verification is disabled,
//! which violates OWASP security guidelines.
use regex::Regex;
use stemedb_core::types::ObjectValue;
use super::Extractor;
use crate::types::{ExtractedClaim, Language};
/// Extractor for TLS certificate verification settings.
pub struct TlsVerifyExtractor {
/// Rust: reqwest danger_accept_invalid_certs
rust_reqwest: Regex,
/// Rust: native-tls accept_invalid_certs
rust_native_tls: Regex,
/// Go: InsecureSkipVerify
go_skip_verify: Regex,
/// Python: requests verify=False
python_verify: Regex,
/// Node.js: rejectUnauthorized: false
node_reject_unauthorized: Regex,
/// Node.js: NODE_TLS_REJECT_UNAUTHORIZED assigned the quoted string '0'
node_env_reject: Regex,
/// Generic YAML/TOML/JSON config
config_verify: Regex,
}
impl Default for TlsVerifyExtractor {
fn default() -> Self {
Self::new()
}
}
impl TlsVerifyExtractor {
/// Create a new TLS verify extractor with compiled regexes.
///
/// # Panics
/// Panics if any regex pattern is invalid (programmer error).
#[allow(clippy::expect_used)]
pub fn new() -> Self {
Self {
rust_reqwest: Regex::new(r"danger_accept_invalid_certs\s*\(\s*true\s*\)")
.expect("valid regex"),
// Require a leading `.` so this does not also match reqwest's `danger_accept_invalid_certs`
rust_native_tls: Regex::new(r"\.accept_invalid_certs\s*\(\s*true\s*\)")
.expect("valid regex"),
go_skip_verify: Regex::new(r"InsecureSkipVerify\s*:\s*true").expect("valid regex"),
python_verify: Regex::new(r"verify\s*=\s*False").expect("valid regex"),
node_reject_unauthorized: Regex::new(r"rejectUnauthorized\s*:\s*false")
.expect("valid regex"),
node_env_reject: Regex::new(r#"NODE_TLS_REJECT_UNAUTHORIZED.*['"]0['"]"#)
.expect("valid regex"),
config_verify: Regex::new(
r#"(?i)(tls_verify|ssl_verify|verify_ssl|verify_tls)["']?\s*[:=]\s*["']?(false|no|0|off)\b"#,
)
.expect("valid regex"),
}
}
fn check_pattern(
&self,
content: &str,
pattern: &Regex,
path_segments: &[String],
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
for (line_idx, line) in content.lines().enumerate() {
if let Some(matched) = pattern.find(line) {
let mut concept_path = path_segments.to_vec();
concept_path.push("tls".to_string());
concept_path.push("cert_verification".to_string());
claims.push(ExtractedClaim {
concept_path: format!("code://{}", concept_path.join("/")),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: file.to_string(),
line: line_idx + 1,
matched_text: matched.as_str().to_string(),
confidence: 1.0,
description: "TLS certificate verification is disabled".to_string(),
});
}
}
claims
}
}
impl Extractor for TlsVerifyExtractor {
fn name(&self) -> &str {
"tls_verify"
}
fn languages(&self) -> &[Language] {
&[
Language::Rust,
Language::Go,
Language::Python,
Language::TypeScript,
Language::JavaScript,
Language::Yaml,
Language::Toml,
Language::Json,
]
}
fn extract(
&self,
path_segments: &[String],
content: &str,
language: Language,
file: &str,
) -> Vec<ExtractedClaim> {
let mut claims = Vec::new();
match language {
Language::Rust => {
claims.extend(self.check_pattern(content, &self.rust_reqwest, path_segments, file));
claims.extend(self.check_pattern(
content,
&self.rust_native_tls,
path_segments,
file,
));
}
Language::Go => {
claims.extend(self.check_pattern(
content,
&self.go_skip_verify,
path_segments,
file,
));
}
Language::Python => {
claims.extend(self.check_pattern(
content,
&self.python_verify,
path_segments,
file,
));
}
Language::TypeScript | Language::JavaScript => {
claims.extend(self.check_pattern(
content,
&self.node_reject_unauthorized,
path_segments,
file,
));
claims.extend(self.check_pattern(
content,
&self.node_env_reject,
path_segments,
file,
));
}
Language::Yaml | Language::Toml | Language::Json => {
claims.extend(self.check_pattern(
content,
&self.config_verify,
path_segments,
file,
));
}
_ => {}
}
claims
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_rust_reqwest_detection() {
let extractor = TlsVerifyExtractor::new();
let content = r#"
let client = reqwest::Client::builder()
.danger_accept_invalid_certs(true)
.build()?;
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/client.rs");
assert_eq!(claims.len(), 1);
assert_eq!(claims[0].predicate, "enabled");
assert_eq!(claims[0].value, ObjectValue::Boolean(false));
assert_eq!(claims[0].line, 3);
}
#[test]
fn test_go_insecure_skip_verify() {
let extractor = TlsVerifyExtractor::new();
let content = r#"
tr := &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
"#;
let claims =
extractor.extract(&["go".to_string()], content, Language::Go, "internal/http.go");
assert_eq!(claims.len(), 1);
assert!(claims[0].matched_text.contains("InsecureSkipVerify"));
}
#[test]
fn test_python_verify_false() {
let extractor = TlsVerifyExtractor::new();
let content = r#"
response = requests.get(url, verify=False)
"#;
let claims =
extractor.extract(&["python".to_string()], content, Language::Python, "client.py");
assert_eq!(claims.len(), 1);
}
#[test]
fn test_yaml_config() {
let extractor = TlsVerifyExtractor::new();
let content = r#"
http:
tls_verify: false
"#;
let claims = extractor.extract(
&["config".to_string()],
content,
Language::Yaml,
"config/production.yaml",
);
assert_eq!(claims.len(), 1);
}
#[test]
fn test_no_false_positives() {
let extractor = TlsVerifyExtractor::new();
let content = r#"
let client = reqwest::Client::builder()
.danger_accept_invalid_certs(false)
.build()?;
"#;
let claims =
extractor.extract(&["rust".to_string()], content, Language::Rust, "src/client.rs");
assert!(claims.is_empty());
}
}
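Every extractor builds its claim path as `code://`, followed by the file's path segments and a concept suffix. Every extractor also reports 1-based line numbers. That is why `test_rust_reqwest_detection` expects line 3: the raw string starts with a newline. The sketch below shows both conventions, with a plain `str::contains` standing in for the compiled regexes. `concept_path` and `matching_lines` are illustrative helpers, not part of the crate.

```rust
/// Join path segments and a concept suffix into a `code://` concept path.
fn concept_path(path_segments: &[String], suffix: &[&str]) -> String {
    let mut parts: Vec<&str> = path_segments.iter().map(String::as_str).collect();
    parts.extend_from_slice(suffix);
    format!("code://{}", parts.join("/"))
}

/// Return the 1-based numbers of lines that contain `needle`.
fn matching_lines(content: &str, needle: &str) -> Vec<usize> {
    content
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(needle))
        .map(|(idx, _)| idx + 1) // `enumerate` is 0-based; reports are 1-based
        .collect()
}
```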

@ -1,8 +1,5 @@
//! Aphoria - A code-level truth linter powered by Episteme
//!
//! Aphoria scans a codebase, extracts the decisions embedded in config and code,
//! and checks them against authoritative sources. It finds the places where what
//! your code *does* contradicts what the specs *say*.
@ -42,18 +39,28 @@
//! ```
// Module declarations
mod bridge;
mod config;
pub mod corpus;
mod episteme;
mod error;
pub mod extractors;
pub mod report;
mod types;
mod walker;
// Public re-exports
pub use config::{AphoriaConfig, CorpusConfig};
pub use corpus::{CorpusBuildResult, CorpusBuilderInfo, CorpusRegistry};
pub use error::AphoriaError;
pub use types::{AcknowledgeArgs, ConflictResult, ExtractedClaim, ScanArgs, ScanResult, Verdict};
use extractors::ExtractorRegistry;
use tracing::{info, instrument};
use walker::walk_project;
use crate::episteme::{create_authoritative_corpus, ConceptIndex, LocalEpisteme};
/// Run a scan on the specified project.
///
/// This is the main entry point for scanning a codebase. It:
@ -62,56 +69,183 @@ pub use types::{AcknowledgeArgs, ConflictResult, ExtractedClaim, ScanArgs, ScanR
/// 3. Ingests claims into the local Episteme instance
/// 4. Queries for conflicts against authoritative sources
/// 5. Returns a formatted report
#[instrument(skip(config), fields(path = %args.path.display(), format = %args.format))]
pub async fn run_scan(args: ScanArgs, config: &AphoriaConfig) -> Result<ScanResult, AphoriaError> {
info!("Starting scan");
let project_root = args.path.canonicalize().unwrap_or_else(|_| args.path.clone());
// 1. Walk the project to find files
let files = walk_project(&project_root, config)?;
info!(files_found = files.len(), "Project walk complete");
// 2. Extract claims from files
let registry = ExtractorRegistry::new(config);
let mut all_claims = Vec::new();
for file in &files {
let content = match std::fs::read_to_string(&file.path) {
Ok(c) => c,
Err(e) => {
tracing::warn!(file = %file.relative_path, error = %e, "Failed to read file");
continue;
}
};
let claims =
registry.extract_all(&file.path_segments, &content, file.language, &file.relative_path);
all_claims.extend(claims);
}
info!(claims_extracted = all_claims.len(), "Extraction complete");
// 3. Open local Episteme and ingest claims
let mut episteme = LocalEpisteme::open(config, &project_root).await?;
if !all_claims.is_empty() {
episteme.ingest_claims(&all_claims).await?;
}
// 4. Build authoritative corpus and check for conflicts
// This uses in-memory concept matching, so scan works without `aphoria init`
let signing_key = bridge::load_or_generate_key(&project_root)?;
let corpus = create_authoritative_corpus(&signing_key);
let index = ConceptIndex::build(&corpus);
let conflicts = episteme.check_conflicts(&all_claims, config, &index).await?;
// 5. Shut down Episteme
episteme.shutdown().await;
// 6. Build result
let project_name =
project_root.file_name().and_then(|s| s.to_str()).unwrap_or("unknown").to_string();
Ok(ScanResult {
project: project_name,
scan_id: generate_scan_id(),
files_scanned: files.len(),
claims_extracted: all_claims.len(),
conflicts,
format: args.format,
})
}
/// Acknowledge a conflict as intentional.
///
/// Creates an assertion in Episteme recording that this conflict has been
/// reviewed and accepted. The conflict still appears in reports but is marked as ACK.
#[instrument(skip(config), fields(concept_path = %args.concept_path))]
pub async fn acknowledge(
args: AcknowledgeArgs,
config: &AphoriaConfig,
) -> Result<(), AphoriaError> {
info!("Acknowledging conflict");
let project_root = std::env::current_dir()?;
let mut episteme = LocalEpisteme::open(config, &project_root).await?;
// Create acknowledgment assertion
let claim = ExtractedClaim {
concept_path: args.concept_path.clone(),
predicate: "acknowledged".to_string(),
value: stemedb_core::types::ObjectValue::Text(args.reason.clone()),
file: "aphoria_ack".to_string(),
line: 0,
matched_text: format!("Acknowledged: {}", args.reason),
confidence: 1.0,
description: format!("Conflict acknowledged: {}", args.reason),
};
episteme.ingest_claims(&[claim]).await?;
episteme.shutdown().await;
Ok(())
}
/// Set the current scan as the baseline.
///
/// Future `aphoria diff` commands will compare against this baseline.
#[instrument(skip(_config))]
pub async fn set_baseline(_config: &AphoriaConfig) -> Result<(), AphoriaError> {
info!("Setting baseline");
let project_root = std::env::current_dir()?;
let aphoria_dir = project_root.join(".aphoria");
std::fs::create_dir_all(&aphoria_dir)?;
// Record a freshly generated, timestamp-based ID as the baseline marker.
// It is not yet linked to a stored scan result.
let scan_id = generate_scan_id();
std::fs::write(aphoria_dir.join("baseline"), &scan_id)?;
info!(scan_id, "Baseline set");
Ok(())
}
/// Show changes since the last baseline.
#[instrument(skip(config))]
pub async fn show_diff(config: &AphoriaConfig) -> Result<String, AphoriaError> {
info!("Showing diff");
let project_root = std::env::current_dir()?;
let baseline_path = project_root.join(".aphoria").join("baseline");
if !baseline_path.exists() {
return Err(AphoriaError::NoBaseline);
}
// For now, this re-runs a scan and reports the current conflict totals; it does
// not yet compare them against the baseline. A full diff would track assertion hashes.
let args =
ScanArgs { path: project_root, format: "table".to_string(), exit_code_enabled: false };
let result = run_scan(args, config).await?;
let mut output = String::new();
output.push_str("Changes since baseline:\n\n");
output.push_str(&format!(
" {} conflicts ({} BLOCK, {} FLAG)\n",
result.conflicts.len(),
result.count_by_verdict(Verdict::Block),
result.count_by_verdict(Verdict::Flag),
));
Ok(output)
}
/// Show current scan status.
#[instrument(skip(config))]
pub async fn show_status(config: &AphoriaConfig) -> Result<String, AphoriaError> {
info!("Showing status");
let project_root = std::env::current_dir()?;
let aphoria_dir = project_root.join(".aphoria");
let data_dir = &config.episteme.data_dir;
let mut output = String::new();
if !data_dir.exists() {
output.push_str("Aphoria status: Not initialized. Run `aphoria init` first.\n");
return Ok(output);
}
output.push_str("Aphoria status:\n");
output.push_str(&format!(" Data directory: {}\n", data_dir.display()));
output.push_str(&format!(" Project root: {}\n", project_root.display()));
if aphoria_dir.join("baseline").exists() {
let baseline = std::fs::read_to_string(aphoria_dir.join("baseline"))?;
output.push_str(&format!(" Baseline: {}\n", baseline.trim()));
} else {
output.push_str(" Baseline: none\n");
}
if aphoria_dir.join("agent.key").exists() {
output.push_str(" Agent key: present\n");
} else {
output.push_str(" Agent key: not generated\n");
}
Ok(output)
}
/// Initialize Aphoria with the authoritative corpus.
@ -119,52 +253,118 @@ pub async fn show_status(_config: &AphoriaConfig) -> Result<String, AphoriaError
/// Ingests the built-in authoritative corpus (nothing is downloaded here; use
/// `build_corpus` to fetch network-backed sources), covering:
/// - RFC corpus (auth, crypto, TLS)
/// - OWASP cheat sheets
#[instrument(skip(config))]
pub async fn initialize(config: &AphoriaConfig) -> Result<(), AphoriaError> {
info!("Initializing Aphoria");
let project_root = std::env::current_dir()?;
// Create .aphoria directory
let aphoria_dir = project_root.join(".aphoria");
std::fs::create_dir_all(&aphoria_dir)?;
// Open Episteme (this will create the data directory)
let mut episteme = LocalEpisteme::open(config, &project_root).await?;
// Generate signing key for authoritative corpus
let signing_key = bridge::load_or_generate_key(&project_root)?;
// Create and ingest authoritative corpus
let corpus = create_authoritative_corpus(&signing_key);
let ingested = episteme.ingest_authoritative(&corpus).await?;
episteme.shutdown().await;
info!(assertions = ingested, "Authoritative corpus ingested");
Ok(())
}
/// Generate a timestamp-based scan ID of the form `scan-<unix millis>`.
///
/// IDs are unique only at millisecond granularity.
fn generate_scan_id() -> String {
use std::time::{SystemTime, UNIX_EPOCH};
let timestamp =
SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_millis()).unwrap_or(0);
format!("scan-{}", timestamp)
}
/// Arguments for corpus build command.
#[derive(Debug, Clone, Default)]
pub struct CorpusBuildArgs {
/// Only include these corpus sources (any of: rfc, owasp, vendor, hardcoded).
pub only: Option<Vec<String>>,
/// Run in offline mode (skip sources requiring network).
pub offline: bool,
/// Clear cache before building.
pub clear_cache: bool,
}
/// Build the authoritative corpus from configured sources.
///
/// This command:
/// 1. Fetches RFCs, OWASP cheat sheets, and vendor documentation
/// 2. Parses normative statements and recommendations
/// 3. Ingests them as assertions into the local Episteme instance
#[instrument(skip(config), fields(offline = args.offline, clear_cache = args.clear_cache))]
pub async fn build_corpus(
args: CorpusBuildArgs,
config: &AphoriaConfig,
) -> Result<CorpusBuildResult, AphoriaError> {
use std::time::{SystemTime, UNIX_EPOCH};
info!("Building authoritative corpus");
let project_root = std::env::current_dir()?;
// Clear cache if requested
if args.clear_cache {
let cache_dir = &config.corpus.cache_dir;
if cache_dir.exists() {
info!(cache_dir = %cache_dir.display(), "Clearing corpus cache");
std::fs::remove_dir_all(cache_dir)?;
}
}
// Build corpus config based on --only flag
let mut corpus_config = config.corpus.clone();
if let Some(only) = &args.only {
corpus_config.include_hardcoded = only.iter().any(|s| s == "hardcoded");
corpus_config.include_rfc = only.iter().any(|s| s == "rfc");
corpus_config.include_owasp = only.iter().any(|s| s == "owasp");
corpus_config.include_vendor = only.iter().any(|s| s == "vendor");
}
// Create registry with configured builders
let registry = CorpusRegistry::with_defaults(&corpus_config);
// Load signing key
let signing_key = bridge::load_or_generate_key(&project_root)?;
// Build corpus
let timestamp = SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0);
let result = registry.build_all(&signing_key, timestamp, &corpus_config, args.offline)?;
// Ingest into Episteme
if !result.assertions.is_empty() {
let mut episteme = LocalEpisteme::open(config, &project_root).await?;
let ingested = episteme.ingest_authoritative(&result.assertions).await?;
episteme.shutdown().await;
info!(ingested, "Corpus ingested into Episteme");
}
Ok(result)
}
/// List available corpus sources.
#[instrument(skip(config))]
pub fn list_corpus_sources(config: &AphoriaConfig) -> Vec<CorpusBuilderInfo> {
info!("Listing corpus sources");
let registry = CorpusRegistry::with_defaults(&config.corpus);
registry.list_builders()
}
#[cfg(test)]
mod tests;
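
When `only` is given, `build_corpus` sets each of the four include flags to whether its name appears in the list, so every unlisted source is turned off and an unrecognized name (a typo such as `rfcs`) silently disables everything. The std-only sketch below reproduces that mapping; `SourceToggles` and `apply_only` are hypothetical stand-ins for the `include_*` fields on the corpus config, not types from the codebase.

```rust
/// Hypothetical stand-in for the `include_*` flags on the corpus config.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct SourceToggles {
    pub hardcoded: bool,
    pub rfc: bool,
    pub owasp: bool,
    pub vendor: bool,
}

/// Mirrors the `--only` logic in `build_corpus`.
/// - `None`: keep the configured toggles unchanged.
/// - `Some(names)`: each flag becomes "is this name in the list?", so
///   unlisted sources are disabled and unknown names have no effect.
pub fn apply_only(configured: SourceToggles, only: Option<&[String]>) -> SourceToggles {
    match only {
        None => configured,
        Some(names) => {
            let has = |name: &str| names.iter().any(|s| s == name);
            SourceToggles {
                hardcoded: has("hardcoded"),
                rfc: has("rfc"),
                owasp: has("owasp"),
                vendor: has("vendor"),
            }
        }
    }
}
```

If silent disabling on a typo is unwanted, one option is to reject names outside the four known sources before building the toggles.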

@@ -8,7 +8,7 @@ use std::process::ExitCode;
use clap::{Parser, Subcommand};
-use aphoria::{run_scan, AcknowledgeArgs, AphoriaConfig, ScanArgs};
+use aphoria::{report, run_scan, AcknowledgeArgs, AphoriaConfig, CorpusBuildArgs, ScanArgs};
/// A code-level truth linter powered by Episteme.
///
@@ -42,6 +42,10 @@ enum Commands {
/// Exit with non-zero code if conflicts found
#[arg(long)]
exit_code: bool,
/// Use stricter thresholds (FLAG at 0.3, BLOCK at 0.5)
#[arg(long)]
strict: bool,
},
/// Acknowledge a conflict (mark as intentional)
@@ -65,6 +69,33 @@ enum Commands {
/// Initialize Aphoria with authoritative corpus
Init,
/// Manage the authoritative corpus
Corpus {
#[command(subcommand)]
command: CorpusCommands,
},
}
#[derive(Subcommand)]
enum CorpusCommands {
/// Build the authoritative corpus from configured sources
Build {
/// Only include specific sources (comma-separated: rfc,owasp,vendor,hardcoded)
#[arg(long)]
only: Option<String>,
/// Run in offline mode (skip sources requiring network)
#[arg(long)]
offline: bool,
/// Clear cache before building
#[arg(long)]
clear_cache: bool,
},
/// List available corpus sources
List,
}
#[tokio::main]
@@ -84,12 +115,23 @@ async fn main() -> ExitCode {
};
match cli.command {
-Commands::Scan { path, format, exit_code } => {
+Commands::Scan { path, format, exit_code, strict } => {
let args = ScanArgs { path, format, exit_code_enabled: exit_code };
// Apply stricter thresholds if requested
let config = if strict {
let mut strict_config = config.clone();
strict_config.thresholds.block = 0.5;
strict_config.thresholds.flag = 0.3;
strict_config
} else {
config
};
match run_scan(args, &config).await {
Ok(result) => {
-println!("{}", result.display());
+let formatter = report::get_formatter(&result.format);
+println!("{}", formatter.format(&result));
if exit_code && result.has_blocks() {
ExitCode::from(2)
@@ -164,6 +206,64 @@ async fn main() -> ExitCode {
ExitCode::from(3)
}
},
Commands::Corpus { command } => match command {
CorpusCommands::Build { only, offline, clear_cache } => {
let only_parsed =
only.map(|s| s.split(',').map(|s| s.trim().to_string()).collect());
let args = CorpusBuildArgs { only: only_parsed, offline, clear_cache };
match aphoria::build_corpus(args, &config).await {
Ok(result) => {
println!("Corpus build complete:");
println!(" Total assertions: {}", result.total_assertions());
println!(" Successful sources: {}", result.successful_builders());
if result.failed_builders() > 0 {
println!(" Failed sources: {}", result.failed_builders());
}
if result.skipped_builders() > 0 {
println!(
" Skipped sources: {} (offline mode)",
result.skipped_builders()
);
}
println!();
for stat in &result.stats {
let status = if stat.skipped {
"SKIPPED".to_string()
} else if let Some(ref err) = stat.error {
format!("FAILED: {}", err)
} else {
format!("{} assertions", stat.assertions_built)
};
println!(" {}: {}", stat.name, status);
}
ExitCode::SUCCESS
}
Err(e) => {
eprintln!("Corpus build error: {e}");
ExitCode::from(3)
}
}
}
CorpusCommands::List => {
let sources = aphoria::list_corpus_sources(&config);
println!("Available corpus sources:");
println!();
for source in sources {
let network_status = if source.requires_network { " (network)" } else { "" };
println!(
" {}:// (Tier {}) - {}{}",
source.scheme, source.tier, source.name, network_status
);
if !source.source_ids.is_empty() {
println!(" Sources: {}", source.source_ids.join(", "));
}
}
ExitCode::SUCCESS
}
},
}
}
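
The `Corpus Build` arm splits the `--only` value on commas and trims each entry, and the `Scan` arm clones the config and overrides two thresholds when `--strict` is set. A std-only sketch of both steps follows. `Thresholds`, `parse_only`, and `effective_thresholds` are hypothetical names (the real threshold field types are not shown in this diff), and unlike the original, `parse_only` also drops empty entries such as the one a trailing comma produces.

```rust
/// Hypothetical stand-in for the `thresholds` section of `AphoriaConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct Thresholds {
    pub flag: f64,
    pub block: f64,
}

/// Parse a comma-separated `--only` value into trimmed source names.
/// Unlike the original code, empty entries (e.g. from "rfc,") are dropped.
pub fn parse_only(only: Option<String>) -> Option<Vec<String>> {
    only.map(|s| {
        s.split(',')
            .map(|part| part.trim().to_string())
            .filter(|part| !part.is_empty())
            .collect()
    })
}

/// Thresholds for a scan: `--strict` sets FLAG at 0.3 and BLOCK at 0.5,
/// otherwise the configured values are used unchanged.
pub fn effective_thresholds(configured: &Thresholds, strict: bool) -> Thresholds {
    if strict {
        Thresholds { flag: 0.3, block: 0.5 }
    } else {
        configured.clone()
    }
}
```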

@@ -1,14 +1,135 @@
//! JSON output format for programmatic consumption.
//!
//! Produces a complete JSON document with summary, conflicts,
//! and full detail for each conflict including claim and source info.
-use crate::types::ScanResult;
-use super::ReportFormatter;
+use super::{object_value_to_json, verdict_label, ReportFormatter};
+use crate::types::{ScanResult, Verdict};
/// JSON report formatter.
pub struct JsonReport;
impl ReportFormatter for JsonReport {
fn format(&self, result: &ScanResult) -> String {
-result.display()
+let conflicts_json: Vec<serde_json::Value> = result
.conflicts
.iter()
.map(|conflict| {
let sources: Vec<serde_json::Value> = conflict
.conflicts
.iter()
.map(|source| {
serde_json::json!({
"path": source.path,
"source_class": format!("{:?}", source.source_class),
"tier": source.source_class.tier(),
"value": object_value_to_json(&source.value),
"confidence": source.confidence,
})
})
.collect();
let mut conflict_json = serde_json::json!({
"concept_path": conflict.claim.concept_path,
"predicate": conflict.claim.predicate,
"value": object_value_to_json(&conflict.claim.value),
"file": conflict.claim.file,
"line": conflict.claim.line,
"matched_text": conflict.claim.matched_text,
"confidence": conflict.claim.confidence,
"description": conflict.claim.description,
"conflict_score": conflict.conflict_score,
"verdict": verdict_label(conflict.verdict),
"sources": sources,
});
if let Some(ack) = &conflict.acknowledged {
conflict_json["acknowledged"] = serde_json::json!({
"timestamp": ack.timestamp,
"by": ack.by,
"reason": ack.reason,
});
}
conflict_json
})
.collect();
let report = serde_json::json!({
"project": result.project,
"scan_id": result.scan_id,
"summary": {
"files_scanned": result.files_scanned,
"claims_extracted": result.claims_extracted,
"conflicts": result.conflicts.len(),
"blocks": result.count_by_verdict(Verdict::Block),
"flags": result.count_by_verdict(Verdict::Flag),
"passes": result.count_by_verdict(Verdict::Pass),
},
"conflicts": conflicts_json,
});
// Pretty-print for readability
serde_json::to_string_pretty(&report).unwrap_or_else(|_| report.to_string())
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::types::{ConflictResult, ConflictingSource, ExtractedClaim};
use stemedb_core::types::{ObjectValue, SourceClass};
#[test]
fn test_json_output_structure() {
let formatter = JsonReport;
let result = ScanResult {
project: "testproject".to_string(),
scan_id: "scan-456".to_string(),
files_scanned: 10,
claims_extracted: 3,
conflicts: vec![ConflictResult {
claim: ExtractedClaim {
concept_path: "code://rust/test/jwt/aud".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: "src/auth.rs".to_string(),
line: 15,
matched_text: "validate_aud = false".to_string(),
confidence: 1.0,
description: "JWT audience validation disabled".to_string(),
},
conflicts: vec![ConflictingSource {
path: "rfc://7519/jwt/audience_validation".to_string(),
source_class: SourceClass::Regulatory,
value: ObjectValue::Boolean(true),
confidence: 1.0,
}],
conflict_score: 0.92,
verdict: Verdict::Block,
acknowledged: None,
}],
format: "json".to_string(),
};
let output = formatter.format(&result);
let parsed: serde_json::Value = serde_json::from_str(&output).expect("valid json");
assert_eq!(parsed["project"], "testproject");
assert_eq!(parsed["summary"]["conflicts"], 1);
assert_eq!(parsed["summary"]["blocks"], 1);
assert_eq!(parsed["conflicts"][0]["verdict"], "BLOCK");
assert_eq!(parsed["conflicts"][0]["file"], "src/auth.rs");
assert_eq!(parsed["conflicts"][0]["sources"][0]["tier"], 0);
}
#[test]
fn test_json_empty_conflicts() {
let formatter = JsonReport;
let result = ScanResult::stub(&std::path::PathBuf::from("empty"), "json");
let output = formatter.format(&result);
let parsed: serde_json::Value = serde_json::from_str(&output).expect("valid json");
assert_eq!(parsed["conflicts"].as_array().map(|a| a.len()), Some(0));
}
}
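
The JSON `summary` object relies on `ScanResult::count_by_verdict`, whose body is not part of this diff. The sketch below assumes it simply counts conflicts whose verdict matches. `Verdict` mirrors the four variants matched by `verdict_label`, while `Conflict` and the free function `count_by_verdict` are hypothetical, trimmed-down stand-ins.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
}

/// Same mapping as `verdict_label` in the report module.
pub fn verdict_label(verdict: Verdict) -> &'static str {
    match verdict {
        Verdict::Block => "BLOCK",
        Verdict::Flag => "FLAG",
        Verdict::Pass => "PASS",
        Verdict::Ack => "ACK",
    }
}

/// Hypothetical stand-in for `ConflictResult`, keeping only the verdict.
pub struct Conflict {
    pub verdict: Verdict,
}

/// Assumed behaviour of `ScanResult::count_by_verdict`.
pub fn count_by_verdict(conflicts: &[Conflict], verdict: Verdict) -> usize {
    conflicts.iter().filter(|c| c.verdict == verdict).count()
}
```

Note that the summary reports `conflicts` as the total length while only BLOCK, FLAG, and PASS get their own counters, so acknowledged (ACK) conflicts show up only in the total.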

@@ -1,14 +1,176 @@
-//! Markdown output format for documentation.
+//! Markdown output format for documentation and PR comments.
//!
//! Produces a full markdown document with summary table,
//! detailed conflict sections, and action items.
-use crate::types::ScanResult;
-use super::ReportFormatter;
+use super::{object_value_display, verdict_label, ReportFormatter};
+use crate::types::{ScanResult, Verdict};
/// Markdown report formatter.
pub struct MarkdownReport;
impl ReportFormatter for MarkdownReport {
fn format(&self, result: &ScanResult) -> String {
-result.display()
+let mut out = String::new();
// Title
out.push_str(&format!("# Aphoria Scan: {}\n\n", result.project));
// Summary
out.push_str(&format!(
"**{}** files scanned | **{}** claims extracted | **{}** conflicts\n\n",
result.files_scanned,
result.claims_extracted,
result.conflicts.len()
));
if result.conflicts.is_empty() {
out.push_str("No conflicts found.\n");
return out;
}
// Verdict badges
let blocks = result.count_by_verdict(Verdict::Block);
let flags = result.count_by_verdict(Verdict::Flag);
if blocks > 0 {
out.push_str(&format!("**{blocks} BLOCK** "));
}
if flags > 0 {
out.push_str(&format!("**{flags} FLAG** "));
}
out.push('\n');
out.push('\n');
// Summary table
out.push_str("| Verdict | Concept | File | Score |\n");
out.push_str("|---------|---------|------|-------|\n");
for conflict in &result.conflicts {
let concept = conflict
.claim
.concept_path
.rsplit("//")
.next()
.unwrap_or(&conflict.claim.concept_path);
out.push_str(&format!(
"| {} | `{}` | `{}:{}` | {:.2} |\n",
verdict_label(conflict.verdict),
concept,
conflict.claim.file,
conflict.claim.line,
conflict.conflict_score,
));
}
out.push('\n');
// Detailed sections for BLOCK and FLAG
let actionable: Vec<_> = result
.conflicts
.iter()
.filter(|c| c.verdict == Verdict::Block || c.verdict == Verdict::Flag)
.collect();
if !actionable.is_empty() {
out.push_str("## Details\n\n");
for conflict in actionable {
out.push_str(&format!(
"### {} `{}`\n\n",
verdict_label(conflict.verdict),
conflict.claim.concept_path
));
out.push_str(&format!(
"- **Your code:** {} (`{}:{}`)\n",
conflict.claim.description, conflict.claim.file, conflict.claim.line
));
for source in &conflict.conflicts {
out.push_str(&format!(
"- **{:?}** (Tier {}): `{}`\n",
source.source_class,
source.source_class.tier(),
object_value_display(&source.value),
));
}
out.push_str(&format!("- **Score:** {:.2}\n", conflict.conflict_score));
if let Some(ack) = &conflict.acknowledged {
out.push_str(&format!(
"- **Acknowledged** by {} on {}: \"{}\"\n",
ack.by, ack.timestamp, ack.reason
));
} else if conflict.verdict == Verdict::Block {
out.push_str(
"- **Action:** Fix or run `aphoria ack <path> --reason \"...\"`\n",
);
} else {
out.push_str("- **Action:** Review recommended\n");
}
out.push('\n');
}
}
out
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::types::{ConflictResult, ConflictingSource, ExtractedClaim};
use stemedb_core::types::{ObjectValue, SourceClass};
#[test]
fn test_markdown_with_conflicts() {
let formatter = MarkdownReport;
let result = ScanResult {
project: "testproject".to_string(),
scan_id: "scan-md".to_string(),
files_scanned: 20,
claims_extracted: 4,
conflicts: vec![ConflictResult {
claim: ExtractedClaim {
concept_path: "code://rust/test/cors/allow_origin".to_string(),
predicate: "config_value".to_string(),
value: ObjectValue::Text("*".to_string()),
file: "src/server.rs".to_string(),
line: 55,
matched_text: "allow_origin(\"*\")".to_string(),
confidence: 1.0,
description: "CORS wildcard allow-origin".to_string(),
},
conflicts: vec![ConflictingSource {
path: "owasp://cors/allow_origin".to_string(),
source_class: SourceClass::Clinical,
value: ObjectValue::Text("explicit_list".to_string()),
confidence: 1.0,
}],
conflict_score: 0.77,
verdict: Verdict::Block,
acknowledged: None,
}],
format: "markdown".to_string(),
};
let output = formatter.format(&result);
assert!(output.contains("# Aphoria Scan: testproject"));
assert!(output.contains("| BLOCK |"));
assert!(output.contains("## Details"));
assert!(output.contains("CORS wildcard"));
assert!(output.contains("`aphoria ack"));
}
#[test]
fn test_markdown_empty() {
let formatter = MarkdownReport;
let result = ScanResult::stub(&std::path::PathBuf::from("empty"), "markdown");
let output = formatter.format(&result);
assert!(output.contains("No conflicts found"));
}
}
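
Both the markdown summary table and the terminal table shorten the concept path with `rsplit("//").next()`. That expression drops only the `scheme://` prefix; the language and project segments stay. The sketch below reproduces the expression and builds one row in the same `| Verdict | Concept | File | Score |` layout. `summary_row` is a hypothetical helper for illustration, not part of the codebase.

```rust
/// Same expression as the formatters: keep everything after the last "//".
/// `rsplit` always yields at least one item, so the fallback is never taken.
pub fn short_concept(concept_path: &str) -> &str {
    concept_path.rsplit("//").next().unwrap_or(concept_path)
}

/// Hypothetical helper producing one row of the markdown summary table.
pub fn summary_row(verdict: &str, concept_path: &str, file: &str, line: usize, score: f64) -> String {
    format!(
        "| {} | `{}` | `{}:{}` | {:.2} |",
        verdict,
        short_concept(concept_path),
        file,
        line,
        score
    )
}
```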

@@ -1,6 +1,4 @@
//! Report generation for scan results.
-// Skeleton phase: allow unused until report pipeline is wired up
-#![allow(dead_code)]
//!
//! Supports multiple output formats:
//! - `table`: Terminal table output (default)
@@ -18,7 +16,7 @@ pub use markdown::MarkdownReport;
pub use sarif::SarifReport;
pub use table::TableReport;
-use crate::types::ScanResult;
+use crate::types::{ScanResult, Verdict};
/// Trait for report formatters.
pub trait ReportFormatter {
@@ -36,6 +34,38 @@ pub fn get_formatter(name: &str) -> Box<dyn ReportFormatter> {
}
}
/// Convert a Verdict to its display string.
pub(crate) fn verdict_label(verdict: Verdict) -> &'static str {
match verdict {
Verdict::Block => "BLOCK",
Verdict::Flag => "FLAG",
Verdict::Pass => "PASS",
Verdict::Ack => "ACK",
}
}
/// Convert an ObjectValue to a JSON value.
pub(crate) fn object_value_to_json(value: &stemedb_core::types::ObjectValue) -> serde_json::Value {
use stemedb_core::types::ObjectValue;
match value {
ObjectValue::Text(s) => serde_json::Value::String(s.clone()),
ObjectValue::Number(n) => serde_json::json!(n),
ObjectValue::Boolean(b) => serde_json::Value::Bool(*b),
ObjectValue::Reference(id) => serde_json::Value::String(format!("ref:{}", id)),
}
}
/// Convert an ObjectValue to a human-readable display string.
pub(crate) fn object_value_display(value: &stemedb_core::types::ObjectValue) -> String {
use stemedb_core::types::ObjectValue;
match value {
ObjectValue::Text(s) => s.clone(),
ObjectValue::Number(n) => format!("{n}"),
ObjectValue::Boolean(b) => format!("{b}"),
ObjectValue::Reference(id) => format!("ref:{id}"),
}
}
#[cfg(test)]
mod tests {
use super::*;
@@ -46,7 +76,34 @@ mod tests {
let formatter = get_formatter("table");
let result = ScanResult::stub(&PathBuf::from("."), "table");
let output = formatter.format(&result);
-assert!(output.contains("Scanning"));
+assert!(output.contains("Aphoria"));
}
#[test]
fn test_get_formatter_json() {
let formatter = get_formatter("json");
let result = ScanResult::stub(&PathBuf::from("myproject"), "json");
let output = formatter.format(&result);
// Should be valid JSON
let parsed: serde_json::Value = serde_json::from_str(&output).expect("valid json");
assert_eq!(parsed["summary"]["files_scanned"], 0);
}
#[test]
fn test_get_formatter_sarif() {
let formatter = get_formatter("sarif");
let result = ScanResult::stub(&PathBuf::from("."), "sarif");
let output = formatter.format(&result);
let parsed: serde_json::Value = serde_json::from_str(&output).expect("valid json");
assert_eq!(parsed["version"], "2.1.0");
}
#[test]
fn test_get_formatter_markdown() {
let formatter = get_formatter("markdown");
let result = ScanResult::stub(&PathBuf::from("myproject"), "markdown");
let output = formatter.format(&result);
assert!(output.starts_with("# Aphoria Scan:"));
}
#[test]
@@ -54,6 +111,6 @@ mod tests {
let formatter = get_formatter("unknown");
let result = ScanResult::stub(&PathBuf::from("."), "table");
let output = formatter.format(&result);
-assert!(output.contains("Scanning"));
+assert!(output.contains("Aphoria"));
}
}
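
The body of `get_formatter` falls outside the hunks shown. The last test only checks that an unknown name still yields output containing "Aphoria", which is consistent with falling back to the default `table` format. The sketch below shows that dispatch pattern under that assumption. The trait is simplified to take a project name, and the three formatter structs are placeholders that return fixed strings modelled on each format's title line; SARIF is omitted.

```rust
/// Simplified version of the report trait: format a project name.
pub trait ReportFormatter {
    fn format(&self, project: &str) -> String;
}

pub struct TableReport;
pub struct JsonReport;
pub struct MarkdownReport;

impl ReportFormatter for TableReport {
    fn format(&self, project: &str) -> String {
        format!("Aphoria Report: {project}")
    }
}

impl ReportFormatter for JsonReport {
    fn format(&self, project: &str) -> String {
        format!("{{\"project\": \"{project}\"}}")
    }
}

impl ReportFormatter for MarkdownReport {
    fn format(&self, project: &str) -> String {
        format!("# Aphoria Scan: {project}")
    }
}

/// Pick a formatter by name; unrecognized names fall back to the table
/// formatter (assumed behaviour, matching the "unknown" test).
pub fn get_formatter(name: &str) -> Box<dyn ReportFormatter> {
    match name {
        "json" => Box::new(JsonReport),
        "markdown" => Box::new(MarkdownReport),
        _ => Box::new(TableReport),
    }
}
```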

@@ -1,19 +1,245 @@
//! SARIF output format for CI integration.
//!
-//! SARIF (Static Analysis Results Interchange Format) is supported by:
+//! SARIF (Static Analysis Results Interchange Format) v2.1.0 is supported by:
//! - GitHub Code Scanning
//! - GitLab SAST
//! - Azure DevOps
//!
//! Reference: <https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html>
-use crate::types::ScanResult;
-use super::ReportFormatter;
-/// SARIF report formatter.
+use super::{object_value_display, verdict_label, ReportFormatter};
+use crate::types::{ScanResult, Verdict};
+/// SARIF report formatter for CI integration.
pub struct SarifReport;
impl ReportFormatter for SarifReport {
fn format(&self, result: &ScanResult) -> String {
-result.display()
+// Build one SARIF rule per distinct rule ID (derived from the concept path)
let mut rules = Vec::new();
let mut rule_indices: std::collections::HashMap<String, usize> =
std::collections::HashMap::new();
for conflict in &result.conflicts {
let rule_id = format!("aphoria/{}", extract_rule_id(&conflict.claim.concept_path));
if !rule_indices.contains_key(&rule_id) {
let idx = rules.len();
rule_indices.insert(rule_id.clone(), idx);
let level = match conflict.verdict {
Verdict::Block => "error",
Verdict::Flag => "warning",
Verdict::Pass | Verdict::Ack => "note",
};
rules.push(serde_json::json!({
"id": rule_id,
"shortDescription": {
"text": conflict.claim.description,
},
"defaultConfiguration": {
"level": level,
},
"helpUri": format!(
"https://github.com/orchard9/aphoria/rules/{}",
extract_rule_id(&conflict.claim.concept_path)
),
}));
}
}
// Build SARIF results
let results: Vec<serde_json::Value> = result
.conflicts
.iter()
.map(|conflict| {
let rule_id = format!("aphoria/{}", extract_rule_id(&conflict.claim.concept_path));
let rule_index = rule_indices.get(&rule_id).copied().unwrap_or(0);
let level = match conflict.verdict {
Verdict::Block => "error",
Verdict::Flag => "warning",
Verdict::Pass | Verdict::Ack => "note",
};
// Build message with authoritative source details
let source_details: Vec<String> = conflict
.conflicts
.iter()
.map(|s| {
format!(
"{:?} (Tier {}): {}",
s.source_class,
s.source_class.tier(),
object_value_display(&s.value)
)
})
.collect();
let message = format!(
"{}\nYour code: {} = {}\nAuthoritative: {}",
conflict.claim.description,
conflict.claim.predicate,
object_value_display(&conflict.claim.value),
source_details.join("; ")
);
serde_json::json!({
"ruleId": rule_id,
"ruleIndex": rule_index,
"level": level,
"message": {
"text": message,
},
"locations": [{
"physicalLocation": {
"artifactLocation": {
"uri": conflict.claim.file,
"uriBaseId": "%SRCROOT%",
},
"region": {
"startLine": conflict.claim.line,
}
}
}],
"properties": {
"conflict_score": conflict.conflict_score,
"verdict": verdict_label(conflict.verdict),
}
})
})
.collect();
let sarif = serde_json::json!({
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/main/sarif-2.1/schema/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [{
"tool": {
"driver": {
"name": "aphoria",
"version": env!("CARGO_PKG_VERSION"),
"informationUri": "https://github.com/orchard9/aphoria",
"rules": rules,
}
},
"results": results,
"invocations": [{
"executionSuccessful": true,
"properties": {
"scan_id": result.scan_id,
"files_scanned": result.files_scanned,
"claims_extracted": result.claims_extracted,
}
}]
}]
});
serde_json::to_string_pretty(&sarif).unwrap_or_else(|_| sarif.to_string())
}
}
/// Extract a rule ID from a concept path.
///
/// e.g., `code://rust/myapp/tls/cert_verification` -> `tls/cert_verification`
fn extract_rule_id(concept_path: &str) -> String {
// Strip the scheme and project prefix, keep the meaningful tail
if let Some(after_scheme) = concept_path.split("://").nth(1) {
// Skip language and project segments (first two after scheme)
let segments: Vec<&str> = after_scheme.split('/').collect();
if segments.len() > 2 {
segments[2..].join("/")
} else {
after_scheme.to_string()
}
} else {
concept_path.to_string()
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::types::{ConflictResult, ConflictingSource, ExtractedClaim};
use stemedb_core::types::{ObjectValue, SourceClass};
#[test]
fn test_sarif_structure() {
let formatter = SarifReport;
let result = ScanResult {
project: "testproject".to_string(),
scan_id: "scan-789".to_string(),
files_scanned: 42,
claims_extracted: 5,
conflicts: vec![ConflictResult {
claim: ExtractedClaim {
concept_path: "code://rust/testproject/tls/cert_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: "src/client.rs".to_string(),
line: 23,
matched_text: "danger_accept_invalid_certs(true)".to_string(),
confidence: 1.0,
description: "TLS certificate verification disabled".to_string(),
},
conflicts: vec![ConflictingSource {
path: "rfc://5246/tls/cert_verification".to_string(),
source_class: SourceClass::Regulatory,
value: ObjectValue::Boolean(true),
confidence: 1.0,
}],
conflict_score: 0.92,
verdict: Verdict::Block,
acknowledged: None,
}],
format: "sarif".to_string(),
};
let output = formatter.format(&result);
let parsed: serde_json::Value = serde_json::from_str(&output).expect("valid json");
// SARIF version
assert_eq!(parsed["version"], "2.1.0");
// Tool info
assert_eq!(parsed["runs"][0]["tool"]["driver"]["name"], "aphoria");
// Rules
let rules = parsed["runs"][0]["tool"]["driver"]["rules"].as_array().expect("rules array");
assert_eq!(rules.len(), 1);
assert_eq!(rules[0]["id"], "aphoria/tls/cert_verification");
// Results
let results = parsed["runs"][0]["results"].as_array().expect("results array");
assert_eq!(results.len(), 1);
assert_eq!(results[0]["level"], "error");
assert_eq!(
results[0]["locations"][0]["physicalLocation"]["artifactLocation"]["uri"],
"src/client.rs"
);
assert_eq!(results[0]["locations"][0]["physicalLocation"]["region"]["startLine"], 23);
}
#[test]
fn test_sarif_empty() {
let formatter = SarifReport;
let result = ScanResult::stub(&std::path::PathBuf::from("."), "sarif");
let output = formatter.format(&result);
let parsed: serde_json::Value = serde_json::from_str(&output).expect("valid json");
assert_eq!(parsed["version"], "2.1.0");
assert_eq!(parsed["runs"][0]["results"].as_array().map(|a| a.len()), Some(0));
}
#[test]
fn test_extract_rule_id() {
assert_eq!(
extract_rule_id("code://rust/myapp/tls/cert_verification"),
"tls/cert_verification"
);
assert_eq!(
extract_rule_id("code://go/myapp/jwt/audience_validation"),
"jwt/audience_validation"
);
assert_eq!(extract_rule_id("simple"), "simple");
}
}
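
The SARIF formatter emits one rule per distinct rule ID and points each result at it through `ruleIndex`. The same bookkeeping can be written with the `HashMap` entry API, which replaces the separate `contains_key`/`insert` calls. The sketch below is an alternative formulation, not the code in the diff. It copies the `extract_rule_id` logic, and `index_rules` is a hypothetical helper that takes bare concept paths in place of full conflicts.

```rust
use std::collections::HashMap;

/// Same logic as `extract_rule_id`: drop the scheme plus the language and
/// project segments, keeping the meaningful tail.
pub fn extract_rule_id(concept_path: &str) -> String {
    match concept_path.split("://").nth(1) {
        Some(after_scheme) => {
            let segments: Vec<&str> = after_scheme.split('/').collect();
            if segments.len() > 2 {
                segments[2..].join("/")
            } else {
                after_scheme.to_string()
            }
        }
        None => concept_path.to_string(),
    }
}

/// Assign each distinct rule ID an index in first-seen order.
/// Returns (rule IDs, rule index for each input path).
pub fn index_rules(concept_paths: &[&str]) -> (Vec<String>, Vec<usize>) {
    let mut rules: Vec<String> = Vec::new();
    let mut indices: HashMap<String, usize> = HashMap::new();
    let mut per_result = Vec::with_capacity(concept_paths.len());
    for path in concept_paths {
        let rule_id = format!("aphoria/{}", extract_rule_id(path));
        // Only the first occurrence of a rule ID pushes a new rule.
        let idx = *indices.entry(rule_id.clone()).or_insert_with(|| {
            rules.push(rule_id.clone());
            rules.len() - 1
        });
        per_result.push(idx);
    }
    (rules, per_result)
}
```

One consequence shared by both versions: a rule's `defaultConfiguration.level` comes from whichever conflict introduced it first, while each result still carries its own `level`.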

@@ -1,14 +1,188 @@
//! Table output format for terminal display.
//!
//! Uses `comfy-table` for clean, aligned terminal output with
//! a summary header and detailed conflict sections.
-use crate::types::ScanResult;
-use super::ReportFormatter;
-/// Table report formatter.
+use comfy_table::{Cell, CellAlignment, Color, ContentArrangement, Table};
+use super::{verdict_label, ReportFormatter};
+use crate::types::{ScanResult, Verdict};
+/// Table report formatter for terminal output.
pub struct TableReport;
impl ReportFormatter for TableReport {
fn format(&self, result: &ScanResult) -> String {
-result.display()
+let mut output = String::new();
// Header
output.push_str(&format!("Aphoria Report: {}\n", result.project));
output.push_str(&format!(
"Scanned: {} files | Claims: {} | Conflicts: {}\n\n",
result.files_scanned,
result.claims_extracted,
result.conflicts.len()
));
if result.conflicts.is_empty() {
output.push_str("No conflicts found.\n");
return output;
}
// Summary table
let mut table = Table::new();
table.set_content_arrangement(ContentArrangement::Dynamic);
table.set_header(vec![
Cell::new("Verdict").set_alignment(CellAlignment::Center),
Cell::new("Concept"),
Cell::new("Score").set_alignment(CellAlignment::Right),
Cell::new("Tier"),
]);
for conflict in &result.conflicts {
let verdict = verdict_label(conflict.verdict);
let verdict_cell = match conflict.verdict {
Verdict::Block => Cell::new(verdict).fg(Color::Red),
Verdict::Flag => Cell::new(verdict).fg(Color::Yellow),
Verdict::Ack => Cell::new(verdict).fg(Color::Cyan),
Verdict::Pass => Cell::new(verdict).fg(Color::Green),
};
// Strip the `scheme://` prefix from the concept path for brevity
let concept = conflict
.claim
.concept_path
.rsplit("//")
.next()
.unwrap_or(&conflict.claim.concept_path);
let tier_spread = if let Some(source) = conflict.conflicts.first() {
format!("{}↔3", source.source_class.tier())
} else {
"?↔3".to_string()
};
table.add_row(vec![
verdict_cell,
Cell::new(concept),
Cell::new(format!("{:.2}", conflict.conflict_score))
.set_alignment(CellAlignment::Right),
Cell::new(tier_spread).set_alignment(CellAlignment::Center),
]);
}
output.push_str(&table.to_string());
output.push('\n');
// Detail sections for BLOCK and FLAG
let actionable: Vec<_> = result
.conflicts
.iter()
.filter(|c| c.verdict == Verdict::Block || c.verdict == Verdict::Flag)
.collect();
if !actionable.is_empty() {
output.push_str("\nDetails:\n\n");
for conflict in actionable {
let verdict = verdict_label(conflict.verdict);
output.push_str(&format!(" {:<6} {}\n", verdict, conflict.claim.concept_path));
output.push_str(&format!(
" Your code: {} ({}:{})\n",
conflict.claim.description, conflict.claim.file, conflict.claim.line
));
for source in &conflict.conflicts {
output.push_str(&format!(
" {:?}: {} (Tier {})\n",
source.source_class,
super::object_value_display(&source.value),
source.source_class.tier()
));
}
if let Some(ack) = &conflict.acknowledged {
output.push_str(&format!(
" Acknowledged: {} by {}: \"{}\"\n",
ack.timestamp, ack.by, ack.reason
));
} else if conflict.verdict == Verdict::Block {
output.push_str(
" Action: Fix or acknowledge with: aphoria ack <path> --reason \"...\"\n",
);
} else {
output.push_str(" Action: Review recommended\n");
}
output.push('\n');
}
}
// Footer summary
output.push_str(&format!(
"{} BLOCK, {} FLAG, {} PASS\n",
result.count_by_verdict(Verdict::Block),
result.count_by_verdict(Verdict::Flag),
result.count_by_verdict(Verdict::Pass),
));
output
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::types::{ConflictResult, ConflictingSource, ExtractedClaim};
use stemedb_core::types::{ObjectValue, SourceClass};
fn sample_result() -> ScanResult {
ScanResult {
project: "testproject".to_string(),
scan_id: "scan-123".to_string(),
files_scanned: 42,
claims_extracted: 5,
conflicts: vec![ConflictResult {
claim: ExtractedClaim {
concept_path: "code://rust/testproject/tls/cert_verification".to_string(),
predicate: "enabled".to_string(),
value: ObjectValue::Boolean(false),
file: "src/client.rs".to_string(),
line: 23,
matched_text: "danger_accept_invalid_certs(true)".to_string(),
confidence: 1.0,
description: "TLS verification disabled".to_string(),
},
conflicts: vec![ConflictingSource {
path: "rfc://5246/tls/cert_verification".to_string(),
source_class: SourceClass::Regulatory,
value: ObjectValue::Boolean(true),
confidence: 1.0,
}],
conflict_score: 0.92,
verdict: Verdict::Block,
acknowledged: None,
}],
format: "table".to_string(),
}
}
#[test]
fn test_table_with_conflicts() {
let formatter = TableReport;
let output = formatter.format(&sample_result());
assert!(output.contains("testproject"));
assert!(output.contains("BLOCK"));
assert!(output.contains("tls"));
assert!(output.contains("0.92"));
}
#[test]
fn test_table_empty_scan() {
let formatter = TableReport;
let result = ScanResult::stub(&std::path::PathBuf::from("empty"), "table");
let output = formatter.format(&result);
assert!(output.contains("No conflicts found"));
}
}

@@ -0,0 +1,192 @@
//! Gap detection for the Research Agent.
//!
//! Detects when extracted code claims have no matching authoritative coverage
//! in the corpus, indicating a potential gap in knowledge.
use std::collections::HashSet;
use tracing::{debug, instrument};
use crate::episteme::ConceptIndex;
use crate::types::ExtractedClaim;
/// A detected gap in authoritative coverage.
#[derive(Debug, Clone)]
pub struct Gap {
/// The concept path from the code claim (e.g., `code://rust/myapp/redis/max_memory_policy`).
pub concept_path: String,
/// The predicate from the claim.
pub predicate: String,
/// Normalized topic extracted from the concept path (e.g., `redis/max_memory_policy`).
pub topic: String,
/// The source file where the gap was detected.
pub source_file: String,
/// Line number in the source file.
pub source_line: usize,
/// The original claim description.
pub description: String,
/// Confidence of the extraction that led to this gap.
pub confidence: f32,
}
impl Gap {
/// Create a gap from an extracted claim.
pub fn from_claim(claim: &ExtractedClaim) -> Self {
let topic = extract_topic(&claim.concept_path);
Self {
concept_path: claim.concept_path.clone(),
predicate: claim.predicate.clone(),
topic,
source_file: claim.file.clone(),
source_line: claim.line,
description: claim.description.clone(),
confidence: claim.confidence,
}
}
/// Get a unique key for deduplication.
pub fn key(&self) -> String {
format!("{}::{}", self.topic, self.predicate)
}
}
/// Detect gaps in authoritative coverage.
///
/// Compares extracted claims against the authoritative corpus index and
/// identifies claims that have no matching authoritative source.
///
/// # Arguments
///
/// * `claims` - Extracted code claims to check
/// * `index` - Authoritative corpus concept index
///
/// # Returns
///
/// A vector of detected gaps, deduplicated by topic+predicate.
#[instrument(skip(claims, index), fields(claim_count = claims.len()))]
pub fn detect_gaps(claims: &[ExtractedClaim], index: &ConceptIndex) -> Vec<Gap> {
let mut gaps = Vec::new();
let mut seen_keys = HashSet::new();
for claim in claims {
// Check if there's any authoritative coverage for this claim
if index.lookup(&claim.concept_path, &claim.predicate).is_none() {
let gap = Gap::from_claim(claim);
let key = gap.key();
// Deduplicate by topic+predicate
if !seen_keys.contains(&key) {
debug!(
concept_path = %claim.concept_path,
predicate = %claim.predicate,
topic = %gap.topic,
"Gap detected: no authoritative coverage"
);
seen_keys.insert(key);
gaps.push(gap);
}
}
}
gaps
}
/// Extract a normalized topic from a concept path.
///
/// Takes the last 2 path segments after the scheme.
///
/// # Examples
///
/// - `code://rust/myapp/redis/max_memory_policy` → `redis/max_memory_policy`
/// - `code://go/server/http/read_timeout` → `http/read_timeout`
fn extract_topic(concept_path: &str) -> String {
// Split on "://" to separate scheme from path
let path = concept_path.find("://").map(|i| &concept_path[i + 3..]).unwrap_or(concept_path);
// Get last two non-empty segments
let segments: Vec<&str> = path.rsplit('/').filter(|s| !s.is_empty()).take(2).collect();
if segments.len() >= 2 {
format!("{}/{}", segments[1], segments[0])
} else if segments.len() == 1 {
segments[0].to_string()
} else {
path.to_string()
}
}
#[cfg(test)]
mod tests {
use super::*;
use stemedb_core::types::ObjectValue;
fn make_claim(concept_path: &str, predicate: &str) -> ExtractedClaim {
ExtractedClaim {
concept_path: concept_path.to_string(),
predicate: predicate.to_string(),
value: ObjectValue::Boolean(true),
file: "test.rs".to_string(),
line: 42,
matched_text: "test".to_string(),
confidence: 0.9,
description: "Test claim".to_string(),
}
}
#[test]
fn test_extract_topic() {
assert_eq!(
extract_topic("code://rust/myapp/redis/max_memory_policy"),
"redis/max_memory_policy"
);
assert_eq!(extract_topic("code://go/server/http/read_timeout"), "http/read_timeout");
assert_eq!(extract_topic("code://rust/tls/cert_verification"), "tls/cert_verification");
}
#[test]
fn test_gap_key() {
let claim = make_claim("code://rust/myapp/redis/max_memory_policy", "config_value");
let gap = Gap::from_claim(&claim);
assert_eq!(gap.key(), "redis/max_memory_policy::config_value");
}
#[test]
fn test_detect_gaps_empty_index() {
let claims = vec![
make_claim("code://rust/myapp/redis/max_memory_policy", "config_value"),
make_claim("code://rust/myapp/kafka/retention_ms", "config_value"),
];
// Empty index means no coverage
let index = ConceptIndex::build(&[]);
let gaps = detect_gaps(&claims, &index);
assert_eq!(gaps.len(), 2);
assert!(gaps.iter().any(|g| g.topic == "redis/max_memory_policy"));
assert!(gaps.iter().any(|g| g.topic == "kafka/retention_ms"));
}
#[test]
fn test_detect_gaps_deduplication() {
let claims = vec![
make_claim("code://rust/app1/redis/max_memory_policy", "config_value"),
make_claim("code://rust/app2/redis/max_memory_policy", "config_value"), // Same topic
make_claim("code://rust/app3/redis/max_memory_policy", "config_value"), // Same topic
];
let index = ConceptIndex::build(&[]);
let gaps = detect_gaps(&claims, &index);
// Should deduplicate to just one gap
assert_eq!(gaps.len(), 1);
assert_eq!(gaps[0].topic, "redis/max_memory_policy");
}
}
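`detect_gaps` deduplicates on the `topic::predicate` key built from `extract_topic`. Below is a minimal standalone sketch of the same idea. It makes two assumptions for illustration: plain `(concept_path, predicate)` tuples stand in for `ExtractedClaim`, and a `HashSet` of covered pairs stands in for `ConceptIndex::lookup`. `topic_of` and `gap_keys` are hypothetical names.

```rust
use std::collections::HashSet;

/// Same normalization as `extract_topic`: drop the scheme and keep the
/// last two non-empty path segments.
pub fn topic_of(concept_path: &str) -> String {
    let path = concept_path
        .split_once("://")
        .map_or(concept_path, |(_, rest)| rest);
    let segs: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    match segs.len() {
        0 => path.to_string(),
        1 => segs[0].to_string(),
        n => format!("{}/{}", segs[n - 2], segs[n - 1]),
    }
}

/// Returns one `topic::predicate` key per uncovered claim, in first-seen order.
/// `covered` is a hypothetical stand-in for the authoritative index lookup.
pub fn gap_keys(claims: &[(&str, &str)], covered: &HashSet<(String, String)>) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut gaps = Vec::new();
    for &(path, predicate) in claims {
        if covered.contains(&(path.to_string(), predicate.to_string())) {
            continue; // authoritative coverage exists, so this is not a gap
        }
        let key = format!("{}::{}", topic_of(path), predicate);
        // `insert` returns false when the key was already present.
        if seen.insert(key.clone()) {
            gaps.push(key);
        }
    }
    gaps
}
```

Because the key ignores the language and project segments, the same setting reported by several projects collapses into a single gap.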

@ -0,0 +1,111 @@
//! Research Agent for Aphoria.
//!
//! The Research Agent detects gaps in authoritative coverage and researches
//! official documentation to fill those gaps. This module provides:
//!
//! - **Gap Detection**: Identifies code claims with no authoritative coverage
//! - **Gap Storage**: Persists gaps with tracking metadata (project count, first seen)
//! - **Research Trigger**: Dispatches research when gaps reach threshold
//! - **Claim Extraction**: Parses official documentation for normative claims
//! - **Quality Validation**: Ensures extracted claims meet quality standards
//!
//! # Architecture
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────────────────┐
//! │                         Research Agent Flow                         │
//! │                                                                     │
//! │ ┌────────────┐   ┌──────────────┐   ┌────────────────────────────┐  │
//! │ │ Gap        │──▶│ Gap Store    │──▶│ Research Trigger           │  │
//! │ │ Detector   │   │ (SQLite)     │   │ (threshold: 3 projects)    │  │
//! │ └────────────┘   └──────────────┘   └────────────────────────────┘  │
//! │                                                    │                │
//! │                                                    ▼                │
//! │ ┌────────────────────────────────────────────────────────────────┐  │
//! │ │                       Research Pipeline                        │  │
//! │ │                                                                │  │
//! │ │  ┌───────────┐   ┌─────────────┐   ┌──────────────────────┐    │  │
//! │ │  │ Web       │──▶│ Content     │──▶│ Quality              │    │  │
//! │ │  │ Fetcher   │   │ Extractor   │   │ Validator            │    │  │
//! │ │  └───────────┘   └─────────────┘   └──────────────────────┘    │  │
//! │ │                                                │               │  │
//! │ │                                                ▼               │  │
//! │ │                                    ┌──────────────────────┐    │  │
//! │ │                                    │ Corpus Ingestion     │    │  │
//! │ │                                    │ (if quality passes)  │    │  │
//! │ │                                    └──────────────────────┘    │  │
//! │ └────────────────────────────────────────────────────────────────┘  │
//! └─────────────────────────────────────────────────────────────────────┘
//! ```
mod gap_detector;
mod gap_store;
mod quality;
mod researcher;
#[cfg(test)]
mod tests;
pub use gap_detector::{detect_gaps, Gap};
pub use gap_store::{GapRecord, GapStore};
pub use quality::{QualityReport, QualityValidator};
pub use researcher::{ResearchConfig, ResearchResult, Researcher};
use crate::AphoriaError;
/// Minimum number of projects that must report a gap before triggering research.
pub const DEFAULT_GAP_THRESHOLD: u32 = 3;
/// Maximum age of a gap (in days) before it's considered stale.
pub const DEFAULT_GAP_MAX_AGE_DAYS: u64 = 90;
/// Confidence threshold for accepting researched claims.
pub const DEFAULT_QUALITY_THRESHOLD: f32 = 0.7;
/// Result of a research operation.
#[derive(Debug)]
pub struct ResearchOutcome {
/// Number of gaps analyzed.
pub gaps_analyzed: usize,
/// Number of gaps successfully researched.
pub gaps_filled: usize,
/// Number of assertions created from research.
pub assertions_created: usize,
/// Gaps that could not be filled (insufficient quality).
pub gaps_failed: Vec<String>,
/// Detailed results per gap.
pub results: Vec<GapResearchResult>,
}
/// Result of researching a single gap.
#[derive(Debug)]
pub struct GapResearchResult {
/// The gap that was researched.
pub gap: String,
/// Whether research was successful.
pub success: bool,
/// Number of assertions created.
pub assertions_created: usize,
/// Quality report for the research.
pub quality_report: Option<QualityReport>,
/// Error message if research failed.
pub error: Option<String>,
}
impl ResearchOutcome {
/// Create an empty outcome.
pub fn empty() -> Self {
Self {
gaps_analyzed: 0,
gaps_filled: 0,
assertions_created: 0,
gaps_failed: vec![],
results: vec![],
}
}
/// Returns `true` if research produced at least one assertion.
pub fn has_results(&self) -> bool {
self.assertions_created > 0
}
}
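The module defines thresholds for triggering research but does not show the rule that applies them. A minimal sketch follows, under three assumptions: a gap is researched once at least `DEFAULT_GAP_THRESHOLD` projects report it, gaps older than `DEFAULT_GAP_MAX_AGE_DAYS` are treated as stale and skipped, and a researched claim is accepted when its confidence is at or above `DEFAULT_QUALITY_THRESHOLD`. `should_research` and `accept_claim` are hypothetical names.

```rust
// Values mirrored from the research module's defaults.
pub const DEFAULT_GAP_THRESHOLD: u32 = 3;
pub const DEFAULT_GAP_MAX_AGE_DAYS: u64 = 90;
pub const DEFAULT_QUALITY_THRESHOLD: f32 = 0.7;

/// Hypothetical trigger: enough distinct projects reported the gap,
/// and the gap is not stale.
pub fn should_research(project_count: u32, age_days: u64) -> bool {
    project_count >= DEFAULT_GAP_THRESHOLD && age_days <= DEFAULT_GAP_MAX_AGE_DAYS
}

/// Hypothetical acceptance test for a researched claim's confidence.
pub fn accept_claim(confidence: f32) -> bool {
    confidence >= DEFAULT_QUALITY_THRESHOLD
}
```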

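`ResearchOutcome` summarizes the per-gap `GapResearchResult`s, but the aggregation step is not part of this hunk. A hedged sketch with trimmed-down hypothetical types (`GapResult`, `Outcome`, `summarize`) is shown below. It assumes that only successful gaps contribute assertions and that every unsuccessful gap is listed in `gaps_failed`.

```rust
#[derive(Debug)]
pub struct GapResult {
    pub gap: String,
    pub success: bool,
    pub assertions_created: usize,
}

#[derive(Debug, Default, PartialEq)]
pub struct Outcome {
    pub gaps_analyzed: usize,
    pub gaps_filled: usize,
    pub assertions_created: usize,
    pub gaps_failed: Vec<String>,
}

/// Fold per-gap results into a single outcome.
pub fn summarize(results: &[GapResult]) -> Outcome {
    let mut out = Outcome { gaps_analyzed: results.len(), ..Outcome::default() };
    for r in results {
        if r.success {
            out.gaps_filled += 1;
            out.assertions_created += r.assertions_created;
        } else {
            out.gaps_failed.push(r.gap.clone());
        }
    }
    out
}
```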
@ -0,0 +1,314 @@
//! Integration tests for Aphoria scan functionality.
use super::*;
#[tokio::test]
async fn test_scan_returns_result() {
let temp_dir = tempfile::tempdir().expect("create temp dir");
// Create a test file with a TLS issue
let src_dir = temp_dir.path().join("src");
std::fs::create_dir_all(&src_dir).expect("create src dir");
std::fs::write(
src_dir.join("client.rs"),
r#"
let client = reqwest::Client::builder()
.danger_accept_invalid_certs(true)
.build()?;
"#,
)
.expect("write file");
// Create Cargo.toml so it's detected as a Rust project
std::fs::write(
temp_dir.path().join("Cargo.toml"),
r#"
[package]
name = "testproject"
version = "0.1.0"
"#,
)
.expect("write cargo.toml");
let args = ScanArgs {
path: temp_dir.path().to_path_buf(),
format: "table".to_string(),
exit_code_enabled: false,
};
let mut config = AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
let result = run_scan(args, &config).await.expect("scan should succeed");
assert!(result.files_scanned > 0);
assert!(result.claims_extracted > 0);
}
#[tokio::test]
async fn test_initialize_creates_corpus() {
// Use a unique temp dir to avoid conflicts with parallel tests
let temp_dir = tempfile::Builder::new()
.prefix("aphoria_test_init")
.tempdir()
.expect("create temp dir");
let mut config = AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
// Create .aphoria directory for the agent key
let aphoria_dir = temp_dir.path().join(".aphoria");
std::fs::create_dir_all(&aphoria_dir).expect("create .aphoria dir");
// Open LocalEpisteme directly instead of using initialize()
// which relies on current_dir()
let mut episteme =
crate::episteme::LocalEpisteme::open(&config, temp_dir.path()).await.expect("open");
let signing_key = crate::bridge::load_or_generate_key(temp_dir.path()).expect("load key");
let corpus = crate::episteme::create_authoritative_corpus(&signing_key);
let ingested = episteme.ingest_authoritative(&corpus).await.expect("ingest");
episteme.shutdown().await;
assert!(ingested > 0);
assert!(config.episteme.data_dir.exists());
assert!(temp_dir.path().join(".aphoria").join("agent.key").exists());
}
#[tokio::test]
async fn test_acknowledge_succeeds() {
let temp_dir =
tempfile::Builder::new().prefix("aphoria_test_ack").tempdir().expect("create temp dir");
let mut config = AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
// Create .aphoria directory for the agent key
let aphoria_dir = temp_dir.path().join(".aphoria");
std::fs::create_dir_all(&aphoria_dir).expect("create .aphoria dir");
// Open LocalEpisteme and ingest an acknowledgement claim
let mut episteme =
crate::episteme::LocalEpisteme::open(&config, temp_dir.path()).await.expect("open");
let claim = ExtractedClaim {
concept_path: "code://rust/test/jwt/audience_validation".to_string(),
predicate: "acknowledged".to_string(),
value: stemedb_core::types::ObjectValue::Text("Internal service".to_string()),
file: "aphoria_ack".to_string(),
line: 0,
matched_text: "Acknowledged: Internal service".to_string(),
confidence: 1.0,
description: "Conflict acknowledged: Internal service".to_string(),
};
let result = episteme.ingest_claims(&[claim]).await;
episteme.shutdown().await;
assert!(result.is_ok());
}
#[tokio::test]
async fn test_status_before_init() {
let temp_dir = tempfile::Builder::new()
.prefix("aphoria_test_status")
.tempdir()
.expect("create temp dir");
let mut config = AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join("nonexistent");
// Manually check status logic without relying on current_dir()
let data_dir = &config.episteme.data_dir;
let status = if !data_dir.exists() { "Not initialized" } else { "Initialized" };
assert!(status.contains("Not initialized"));
}
// ==========================================================================
// Integration tests for conflict detection (Phase 2A)
// ==========================================================================
#[tokio::test]
async fn test_conflict_detection_tls_disabled() {
// Create temp project with danger_accept_invalid_certs(true)
let temp_dir = tempfile::Builder::new()
.prefix("aphoria_tls_conflict")
.tempdir()
.expect("create temp dir");
let src_dir = temp_dir.path().join("src");
std::fs::create_dir_all(&src_dir).expect("create src dir");
// Write a Rust file with TLS verification disabled
std::fs::write(
src_dir.join("client.rs"),
r#"
fn create_client() -> Result<Client, Error> {
let client = reqwest::Client::builder()
.danger_accept_invalid_certs(true)
.build()?;
Ok(client)
}
"#,
)
.expect("write file");
// Create Cargo.toml so it's detected as a Rust project
std::fs::write(
temp_dir.path().join("Cargo.toml"),
r#"
[package]
name = "testproject"
version = "0.1.0"
"#,
)
.expect("write cargo.toml");
let args = ScanArgs {
path: temp_dir.path().to_path_buf(),
format: "table".to_string(),
exit_code_enabled: true,
};
let mut config = AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
let result = run_scan(args, &config).await.expect("scan should succeed");
// Assert: conflicts not empty, has_blocks() == true
assert!(
!result.conflicts.is_empty(),
"Should detect conflicts for TLS verification disabled. \
Claims extracted: {}, Files scanned: {}",
result.claims_extracted,
result.files_scanned
);
assert!(
result.has_blocks(),
"TLS verification disabled should be a BLOCK verdict. \
Conflicts: {:?}",
result.conflicts.iter().map(|c| (&c.claim.concept_path, &c.verdict)).collect::<Vec<_>>()
);
}
#[tokio::test]
async fn test_conflict_detection_jwt_audience_disabled() {
// Create temp project with JWT audience validation disabled
let temp_dir = tempfile::Builder::new()
.prefix("aphoria_jwt_conflict")
.tempdir()
.expect("create temp dir");
let src_dir = temp_dir.path().join("src");
std::fs::create_dir_all(&src_dir).expect("create src dir");
// Write a Rust file with JWT audience validation disabled
std::fs::write(
src_dir.join("auth.rs"),
r#"
fn validate_token(token: &str) -> Result<Claims, Error> {
let mut validation = Validation::default();
validation.validate_aud = false; // Disabled!
let token_data = decode::<Claims>(token, &key, &validation)?;
Ok(token_data.claims)
}
"#,
)
.expect("write file");
// Create Cargo.toml
std::fs::write(
temp_dir.path().join("Cargo.toml"),
r#"
[package]
name = "testproject"
version = "0.1.0"
"#,
)
.expect("write cargo.toml");
let args = ScanArgs {
path: temp_dir.path().to_path_buf(),
format: "table".to_string(),
exit_code_enabled: true,
};
let mut config = AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
let result = run_scan(args, &config).await.expect("scan should succeed");
// Assert: conflicts not empty for JWT audience validation
assert!(
!result.conflicts.is_empty(),
"Should detect conflicts for JWT audience validation disabled. \
Claims extracted: {}, Files scanned: {}",
result.claims_extracted,
result.files_scanned
);
// Check that at least one conflict is about JWT audience
let has_jwt_conflict = result.conflicts.iter().any(|c| {
c.claim.concept_path.contains("jwt") && c.claim.concept_path.contains("audience")
});
assert!(
has_jwt_conflict,
"Should have a conflict about JWT audience validation. \
Conflicts: {:?}",
result.conflicts.iter().map(|c| &c.claim.concept_path).collect::<Vec<_>>()
);
}
#[tokio::test]
async fn test_no_conflicts_when_compliant() {
// Create temp project with compliant code (no dangerous patterns)
let temp_dir = tempfile::Builder::new()
.prefix("aphoria_compliant")
.tempdir()
.expect("create temp dir");
let src_dir = temp_dir.path().join("src");
std::fs::create_dir_all(&src_dir).expect("create src dir");
// Write a Rust file with compliant code
std::fs::write(
src_dir.join("main.rs"),
r#"
fn main() {
println!("Hello, world!");
}
"#,
)
.expect("write file");
// Create Cargo.toml
std::fs::write(
temp_dir.path().join("Cargo.toml"),
r#"
[package]
name = "testproject"
version = "0.1.0"
"#,
)
.expect("write cargo.toml");
let args = ScanArgs {
path: temp_dir.path().to_path_buf(),
format: "table".to_string(),
exit_code_enabled: true,
};
let mut config = AphoriaConfig::default();
config.episteme.data_dir = temp_dir.path().join(".aphoria").join("db");
let result = run_scan(args, &config).await.expect("scan should succeed");
// No dangerous patterns = no claims = no conflicts
assert!(
result.conflicts.is_empty(),
"Compliant code should have no conflicts. Found: {:?}",
result.conflicts.iter().map(|c| &c.claim.concept_path).collect::<Vec<_>>()
);
}
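The two conflict tests above depend on source fixtures containing `danger_accept_invalid_certs(true)` and `validate_aud = false`. The matcher below is deliberately naive and hypothetical. It is not Aphoria's extractor, which this diff does not show; it only illustrates what the fixtures contain and why the compliant fixture yields nothing. The rule ids are made-up labels.

```rust
/// Hypothetical finding: (1-based line number, rule id).
pub fn naive_findings(source: &str) -> Vec<(usize, &'static str)> {
    let mut out = Vec::new();
    for (i, line) in source.lines().enumerate() {
        // Drop whitespace so `validate_aud=false` and `validate_aud = false` both match.
        let compact: String = line.chars().filter(|c| !c.is_whitespace()).collect();
        if compact.contains(".danger_accept_invalid_certs(true)") {
            out.push((i + 1, "tls/cert_verification"));
        }
        if compact.contains("validate_aud=false") {
            out.push((i + 1, "jwt/audience_validation"));
        }
    }
    out
}
```

A plain substring check like this would also fire on commented-out code, which is one reason a real extractor needs more context.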

@ -1,6 +1,4 @@
//! Core types for Aphoria. //! Core types for Aphoria.
// Skeleton phase: allow unused until scan pipeline is wired up
#![allow(dead_code)]
use std::fmt; use std::fmt;
use std::path::{Path, PathBuf}; use std::path::{Path, PathBuf};
@ -79,105 +77,6 @@ impl ScanResult {
pub fn count_by_verdict(&self, verdict: Verdict) -> usize { pub fn count_by_verdict(&self, verdict: Verdict) -> usize {
self.conflicts.iter().filter(|c| c.verdict == verdict).count() self.conflicts.iter().filter(|c| c.verdict == verdict).count()
} }
/// Format the result for display.
pub fn display(&self) -> String {
match self.format.as_str() {
"json" => self.display_json(),
"sarif" => self.display_sarif(),
"markdown" => self.display_markdown(),
_ => self.display_table(),
}
}
fn display_table(&self) -> String {
let mut output = String::new();
output.push_str(&format!("Scanning {} ...\n\n", self.project));
if self.conflicts.is_empty() {
output.push_str("No conflicts found.\n");
} else {
for conflict in &self.conflicts {
output.push_str(&format!("{}\n\n", conflict));
}
}
output.push_str(&format!(
"{} files scanned, {} claims extracted, {} conflicts ({} BLOCK, {} FLAG)\n",
self.files_scanned,
self.claims_extracted,
self.conflicts.len(),
self.count_by_verdict(Verdict::Block),
self.count_by_verdict(Verdict::Flag),
));
output
}
fn display_json(&self) -> String {
// TODO: Implement JSON output
serde_json::json!({
"project": self.project,
"scan_id": self.scan_id,
"summary": {
"files_scanned": self.files_scanned,
"claims_extracted": self.claims_extracted,
"conflicts": self.conflicts.len(),
"blocks": self.count_by_verdict(Verdict::Block),
"flags": self.count_by_verdict(Verdict::Flag),
},
"conflicts": []
})
.to_string()
}
fn display_sarif(&self) -> String {
// TODO: Implement SARIF output
serde_json::json!({
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/main/sarif-2.1/schema/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [{
"tool": {
"driver": {
"name": "aphoria",
"version": env!("CARGO_PKG_VERSION"),
}
},
"results": []
}]
})
.to_string()
}
fn display_markdown(&self) -> String {
let mut output = String::new();
output.push_str(&format!("# Aphoria Scan: {}\n\n", self.project));
output.push_str(&format!(
"**Summary:** {} files, {} claims, {} conflicts\n\n",
self.files_scanned,
self.claims_extracted,
self.conflicts.len()
));
if self.conflicts.is_empty() {
output.push_str("No conflicts found.\n");
} else {
output.push_str("## Conflicts\n\n");
for conflict in &self.conflicts {
output.push_str(&format!("### {}\n\n", conflict.claim.concept_path));
output.push_str(&format!("- **Verdict:** {:?}\n", conflict.verdict));
output.push_str(&format!("- **Score:** {:.2}\n", conflict.conflict_score));
output.push_str(&format!(
"- **File:** {}:{}\n\n",
conflict.claim.file, conflict.claim.line
));
}
}
output
}
} }
/// A claim extracted from source code. /// A claim extracted from source code.

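This hunk removes the `display_*` helpers from `ScanResult` but keeps `count_by_verdict`. The summary line that the removed `display_table` produced can still be derived from that count. The standalone sketch below makes these assumptions: only the `Block` and `Flag` verdicts referenced by the removed code exist (the real `Verdict` enum may have more variants), and `Conflict` and `summary_line` are hypothetical definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Block,
    Flag,
}

pub struct Conflict {
    pub verdict: Verdict,
}

/// Count conflicts with the given verdict.
pub fn count_by_verdict(conflicts: &[Conflict], verdict: Verdict) -> usize {
    conflicts.iter().filter(|c| c.verdict == verdict).count()
}

/// Summary line in the format the removed `display_table` used.
pub fn summary_line(files_scanned: usize, claims_extracted: usize, conflicts: &[Conflict]) -> String {
    format!(
        "{} files scanned, {} claims extracted, {} conflicts ({} BLOCK, {} FLAG)",
        files_scanned,
        claims_extracted,
        conflicts.len(),
        count_by_verdict(conflicts, Verdict::Block),
        count_by_verdict(conflicts, Verdict::Flag),
    )
}
```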
@ -1,5 +1,4 @@
//! Language detection for projects. //! Language detection for projects.
#![allow(dead_code)]
use std::path::Path; use std::path::Path;
@ -11,6 +10,10 @@ use crate::types::Language;
/// 1. Explicit language in config (handled by caller) /// 1. Explicit language in config (handled by caller)
/// 2. Presence of language-specific manifest files /// 2. Presence of language-specific manifest files
/// 3. File count heuristic (most common extension) /// 3. File count heuristic (most common extension)
///
/// Not yet wired into the scan pipeline; will be used when
/// auto-detection replaces the config-based language setting.
#[allow(dead_code)]
pub fn detect_project_language(root: &Path) -> Language { pub fn detect_project_language(root: &Path) -> Language {
// Check for manifest files // Check for manifest files
if root.join("Cargo.toml").exists() { if root.join("Cargo.toml").exists() {

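`detect_project_language` documents a three-step precedence: explicit config (handled by the caller), then manifest files, then the most common file extension. The sketch below covers steps 2 and 3 under these assumptions: the manifest list, the hypothetical `Lang` enum, and a non-recursive extension count over the top-level directory. It is not Aphoria's `Language` type or its actual heuristic.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical language enum for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Lang {
    Rust,
    Go,
    TypeScript,
    Python,
    Unknown,
}

pub fn detect_language_sketch(root: &Path) -> Lang {
    // Step 2: language-specific manifest files win outright.
    let manifests = [
        ("Cargo.toml", Lang::Rust),
        ("go.mod", Lang::Go),
        ("package.json", Lang::TypeScript),
        ("pyproject.toml", Lang::Python),
    ];
    for (file, lang) in manifests {
        if root.join(file).exists() {
            return lang;
        }
    }
    // Step 3: count extensions of files directly under `root` (not recursive).
    let mut counts: HashMap<Lang, usize> = HashMap::new();
    if let Ok(entries) = std::fs::read_dir(root) {
        for entry in entries.flatten() {
            let lang = match entry.path().extension().and_then(|e| e.to_str()) {
                Some("rs") => Lang::Rust,
                Some("go") => Lang::Go,
                Some("ts") => Lang::TypeScript,
                Some("py") => Lang::Python,
                _ => continue,
            };
            *counts.entry(lang).or_insert(0) += 1;
        }
    }
    // Ties are broken arbitrarily because HashMap iteration order is unspecified.
    counts
        .into_iter()
        .max_by_key(|&(_, n)| n)
        .map_or(Lang::Unknown, |(lang, _)| lang)
}
```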
@ -1,6 +1,4 @@
//! Project walker for traversing and analyzing codebases. //! Project walker for traversing and analyzing codebases.
// Skeleton phase: allow unused until scan pipeline is wired up
#![allow(dead_code)]
//! //!
//! The walker: //! The walker:
//! 1. Traverses the project directory (respecting .gitignore) //! 1. Traverses the project directory (respecting .gitignore)
@ -11,7 +9,6 @@
mod language; mod language;
mod path_mapper; mod path_mapper;
pub use language::detect_project_language;
pub use path_mapper::PathMapper; pub use path_mapper::PathMapper;
use std::path::Path; use std::path::Path;

@ -1,5 +1,4 @@
//! Path mapping from file paths to ConceptPath segments. //! Path mapping from file paths to ConceptPath segments.
#![allow(dead_code)]
use std::path::Path; use std::path::Path;

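`PathMapper` turns file paths into ConceptPath segments, but its rules are not part of this hunk. The tests use concept paths such as `code://rust/myapp/redis/max_memory_policy`. A plausible minimal sketch of the file-path half is shown below. It assumes the mapper drops a leading `src/`, the file extension, and Rust's `mod`/`lib`/`main` file names, which name their parent module. `path_segments` is a hypothetical helper.

```rust
use std::path::{Component, Path};

/// Map a project-relative source path to module-like segments.
pub fn path_segments(rel: &Path) -> Vec<String> {
    let mut segs: Vec<String> = rel
        .with_extension("")
        .components()
        .filter_map(|c| match c {
            Component::Normal(s) => s.to_str().map(str::to_string),
            _ => None, // skip `.`, `..`, root and prefix components
        })
        .collect();
    if segs.first().map(String::as_str) == Some("src") {
        segs.remove(0);
    }
    // `mod.rs`, `lib.rs` and `main.rs` stand for their parent module.
    if matches!(segs.last().map(String::as_str), Some("mod" | "lib" | "main")) {
        segs.pop();
    }
    segs
}
```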
@ -0,0 +1,157 @@
//! Data Transfer Objects for admission control endpoints.
use serde::{Deserialize, Serialize};
use utoipa::ToSchema;
/// Trust tier names for API responses.
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "PascalCase")]
pub enum TrustTierDto {
/// Untrusted tier: 0.0-0.3 trust score.
Untrusted,
/// Limited tier: 0.3-0.5 trust score.
Limited,
/// Verified tier: 0.5-0.7 trust score.
Verified,
/// Trusted tier: 0.7-0.9 trust score.
Trusted,
/// Authority tier: 0.9-1.0 trust score.
Authority,
}
impl From<stemedb_core::types::TrustTier> for TrustTierDto {
fn from(tier: stemedb_core::types::TrustTier) -> Self {
match tier {
stemedb_core::types::TrustTier::Untrusted => TrustTierDto::Untrusted,
stemedb_core::types::TrustTier::Limited => TrustTierDto::Limited,
stemedb_core::types::TrustTier::Verified => TrustTierDto::Verified,
stemedb_core::types::TrustTier::Trusted => TrustTierDto::Trusted,
stemedb_core::types::TrustTier::Authority => TrustTierDto::Authority,
}
}
}
/// Response for GET /v1/admission/status.
///
/// Contains the agent's current admission status including trust tier,
/// quota multipliers, and PoW requirements.
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct AdmissionStatusResponse {
/// Agent's Ed25519 public key (hex-encoded).
pub agent_id: String,
/// Agent's trust tier.
pub tier: TrustTierDto,
/// Agent's current trust score (0.0 to 1.0).
pub trust_score: f32,
/// Total number of assertions made by this agent.
pub assertions_count: u64,
/// Required PoW difficulty in bits (0 = exempt).
pub pow_difficulty: u8,
/// Whether PoW is required for this agent's next submission.
pub pow_required: bool,
/// Base quota limit per hour (before tier multiplier).
pub base_quota_limit: u64,
/// Effective quota limit per hour (after tier multiplier).
pub effective_quota_limit: u64,
/// Quota multiplier for this tier.
pub quota_multiplier: f32,
/// Number of assertions until reduced PoW difficulty (omitted from the JSON once reached).
#[serde(skip_serializing_if = "Option::is_none")]
pub assertions_until_reduced_difficulty: Option<u64>,
/// Number of assertions until PoW exemption (omitted from the JSON if already exempt).
#[serde(skip_serializing_if = "Option::is_none")]
pub assertions_until_exemption: Option<u64>,
}
impl AdmissionStatusResponse {
/// Create a response from admission status.
pub fn from_status(
agent_id: String,
status: &stemedb_storage::AdmissionStatus,
config: &stemedb_core::types::AdmissionConfig,
) -> Self {
// Calculate assertions until milestones
let assertions_until_reduced_difficulty =
if status.assertions_count < config.initial_threshold {
Some(config.initial_threshold - status.assertions_count)
} else {
None
};
let assertions_until_exemption = if status.assertions_count < config.graduated_threshold
&& status.trust_score < config.trust_exemption_score
{
Some(config.graduated_threshold - status.assertions_count)
} else {
None
};
Self {
agent_id,
tier: status.tier.into(),
trust_score: status.trust_score,
assertions_count: status.assertions_count,
pow_difficulty: status.pow_difficulty,
pow_required: status.pow_required,
base_quota_limit: status.base_quota_limit,
effective_quota_limit: status.effective_quota_limit,
quota_multiplier: status.quota_multiplier,
assertions_until_reduced_difficulty,
assertions_until_exemption,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use stemedb_core::types::{AdmissionConfig, TrustTier};
use stemedb_storage::AdmissionStatus;
#[test]
fn test_tier_dto_conversion() {
assert_eq!(TrustTierDto::from(TrustTier::Untrusted), TrustTierDto::Untrusted);
assert_eq!(TrustTierDto::from(TrustTier::Limited), TrustTierDto::Limited);
assert_eq!(TrustTierDto::from(TrustTier::Verified), TrustTierDto::Verified);
assert_eq!(TrustTierDto::from(TrustTier::Trusted), TrustTierDto::Trusted);
assert_eq!(TrustTierDto::from(TrustTier::Authority), TrustTierDto::Authority);
}
#[test]
fn test_response_from_status_new_agent() {
let status = AdmissionStatus::new(0.5, 0, 16);
let config = AdmissionConfig::default();
let response = AdmissionStatusResponse::from_status("abc123".to_string(), &status, &config);
assert_eq!(response.tier, TrustTierDto::Verified);
assert_eq!(response.pow_difficulty, 16);
assert!(response.pow_required);
assert_eq!(response.assertions_until_reduced_difficulty, Some(10));
assert_eq!(response.assertions_until_exemption, Some(50));
}
#[test]
fn test_response_from_status_graduated_agent() {
let status = AdmissionStatus::new(0.7, 100, 0);
let config = AdmissionConfig::default();
let response = AdmissionStatusResponse::from_status("abc123".to_string(), &status, &config);
assert_eq!(response.tier, TrustTierDto::Trusted);
assert_eq!(response.pow_difficulty, 0);
assert!(!response.pow_required);
assert_eq!(response.assertions_until_reduced_difficulty, None);
assert_eq!(response.assertions_until_exemption, None);
}
}
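The bands documented on `TrustTierDto` and the multipliers in the admission router's tier table can be expressed as a pure function. The following assumptions are inferred from the documented ranges rather than taken from `stemedb_core`: each range's lower bound is inclusive, a NaN score falls back to the most restrictive tier, and the 10,000/hr base is implied by the 1.0x row. `Tier` and `effective_quota` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Untrusted,
    Limited,
    Verified,
    Trusted,
    Authority,
}

impl Tier {
    /// Map a trust score to a tier; lower bounds are treated as inclusive.
    pub fn from_score(score: f32) -> Tier {
        if score.is_nan() || score < 0.3 {
            Tier::Untrusted // NaN falls back to the most restrictive tier
        } else if score < 0.5 {
            Tier::Limited
        } else if score < 0.7 {
            Tier::Verified
        } else if score < 0.9 {
            Tier::Trusted
        } else {
            Tier::Authority
        }
    }

    /// Quota multiplier from the documented tier table.
    pub fn quota_multiplier(self) -> f64 {
        match self {
            Tier::Untrusted => 0.1,
            Tier::Limited => 0.5,
            Tier::Verified => 1.0,
            Tier::Trusted => 2.0,
            Tier::Authority => 10.0,
        }
    }
}

/// Effective hourly quota after applying the tier multiplier.
pub fn effective_quota(base_per_hour: u64, tier: Tier) -> u64 {
    (base_per_hour as f64 * tier.quota_multiplier()).round() as u64
}
```

The `0.5 → Verified` and `0.7 → Trusted` cases agree with the expectations in `test_response_from_status_new_agent` and `test_response_from_status_graduated_agent`.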

@ -10,6 +10,7 @@
//! - Internal types → Response DTOs (encode bytes to hex) //! - Internal types → Response DTOs (encode bytes to hex)
// Module declarations // Module declarations
pub mod admission;
pub mod advanced; pub mod advanced;
pub mod audit; pub mod audit;
pub mod concepts; pub mod concepts;
@ -69,6 +70,9 @@ pub use gold_standard::{
GoldStandardListResponse, VerificationResult, VerifyAgentRequest, GoldStandardListResponse, VerificationResult, VerifyAgentRequest,
}; };
// From admission module
pub use admission::{AdmissionStatusResponse, TrustTierDto};
// From concepts module // From concepts module
pub use concepts::{ pub use concepts::{
AliasMapping, AliasOriginDto, AliasResponse, AliasSuggestion, ConceptPathInfo, AliasMapping, AliasOriginDto, AliasResponse, AliasSuggestion, ConceptPathInfo,

@ -0,0 +1,67 @@
//! Handler for admission control status endpoint.
use axum::{
extract::{Query, State},
Json,
};
use serde::Deserialize;
use stemedb_storage::AdmissionStore;
use tracing::instrument;
use utoipa::{IntoParams, ToSchema};
use crate::{
dto::admission::AdmissionStatusResponse, hex::decode_agent_id, state::AppState, Result,
};
/// Query parameters for admission status.
#[derive(Debug, Clone, Deserialize, IntoParams, ToSchema)]
pub struct AdmissionStatusParams {
/// Agent's Ed25519 public key (hex-encoded, 64 chars)
pub agent_id: String,
}
/// Get admission status for an agent.
///
/// Returns the agent's current admission status including trust tier,
/// PoW requirements, and quota multipliers based on their reputation
/// score and assertion count.
///
/// # Response Headers
///
/// When admission middleware is enabled, responses also include:
/// - `X-Trust-Tier`: Agent's trust tier name
/// - `X-PoW-Required`: "true" or "false"
/// - `X-PoW-Difficulty`: Required difficulty in bits
/// - `X-Quota-Multiplier`: Tier quota multiplier
///
/// # Graduation Milestones
///
/// The response includes how many more assertions the agent needs to reach
/// reduced difficulty or exemption (10 and 50 assertions respectively under
/// the default `AdmissionConfig`); each field is omitted once it no longer applies.
#[utoipa::path(
get,
path = "/v1/admission/status",
params(AdmissionStatusParams),
responses(
(status = 200, description = "Admission status retrieved", body = AdmissionStatusResponse),
(status = 400, description = "Invalid agent_id format"),
),
tag = "admission"
)]
#[instrument(skip(state), fields(agent_id = %params.agent_id))]
pub async fn get_admission_status(
State(state): State<AppState>,
Query(params): Query<AdmissionStatusParams>,
) -> Result<Json<AdmissionStatusResponse>> {
// Decode agent ID from hex
let agent_id = decode_agent_id(&params.agent_id)?;
// Get admission status
let status = state.admission_store.get_admission_status(&agent_id).await?;
let config = state.admission_store.config();
// Build response
let response = AdmissionStatusResponse::from_status(params.agent_id, &status, config);
Ok(Json(response))
}
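The handler rejects a malformed `agent_id` with a 400 after `decode_agent_id`, whose implementation is not shown. A hedged sketch of such a decoder is given below, assuming the documented format of exactly 64 hex characters encoding a 32-byte Ed25519 public key. `decode_agent_id_sketch` and its `String` error type are hypothetical. The sketch parses digits by hand because `u8::from_str_radix` would accept a leading `+`.

```rust
/// Convert one ASCII hex digit to its value; rejects signs, whitespace and non-ASCII.
fn nibble(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decode a 64-character hex string into a 32-byte public key.
pub fn decode_agent_id_sketch(hex: &str) -> Result<[u8; 32], String> {
    if hex.len() != 64 {
        return Err(format!("expected 64 hex characters, got {} bytes", hex.len()));
    }
    let mut key = [0u8; 32];
    for (i, pair) in hex.as_bytes().chunks_exact(2).enumerate() {
        match (nibble(pair[0]), nibble(pair[1])) {
            (Some(hi), Some(lo)) => key[i] = (hi << 4) | lo,
            _ => return Err(format!("invalid hex digit in byte {}", i)),
        }
    }
    Ok(key)
}
```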

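The status response reports `pow_difficulty` in bits. The admission router documents the schedule: 16-bit PoW for the first 10 assertions, 1-bit until 50, and exemption after 50 assertions or at a trust score of 0.6. The sketch below encodes that schedule and a difficulty check, under these assumptions:

- "Difficulty in bits" means leading zero bits of the PoW hash digest.
- The hash function and its input (agent id, nonce, timestamp) are not shown, so the check takes a digest directly.
- Exemption applies at trust `>=` 0.6, mirroring how `from_status` compares against `trust_exemption_score`.

All names are hypothetical.

```rust
/// Required PoW difficulty (bits) under the documented default schedule.
pub fn required_difficulty(assertions: u64, trust: f32) -> u8 {
    const INITIAL_THRESHOLD: u64 = 10; // first 10 assertions pay full difficulty
    const GRADUATED_THRESHOLD: u64 = 50; // exempt from 50 assertions on
    const TRUST_EXEMPTION: f32 = 0.6;
    if assertions >= GRADUATED_THRESHOLD || trust >= TRUST_EXEMPTION {
        0
    } else if assertions >= INITIAL_THRESHOLD {
        1
    } else {
        16
    }
}

/// Count leading zero bits across a digest, most significant byte first.
pub fn leading_zero_bits(hash: &[u8]) -> u32 {
    let mut bits = 0;
    for &b in hash {
        if b == 0 {
            bits += 8;
        } else {
            bits += b.leading_zeros();
            break;
        }
    }
    bits
}

/// A digest satisfies `difficulty` if it has at least that many leading zero bits.
pub fn meets_difficulty(hash: &[u8], difficulty: u8) -> bool {
    leading_zero_bits(hash) >= u32::from(difficulty)
}
```

Each extra bit of difficulty doubles the expected number of hash attempts, so a 16-bit target needs about 2^16 tries on average while a 1-bit target needs about two.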
@ -10,7 +10,9 @@ use crate::{
state::AppState, state::AppState,
}; };
use stemedb_core::types::{Assertion, LifecycleStage, ObjectValue, SignatureEntry, SourceClass}; use stemedb_core::types::{
Assertion, HlcTimestamp, LifecycleStage, ObjectValue, SignatureEntry, SourceClass,
};
use stemedb_ingest::worker::serialize_assertion; use stemedb_ingest::worker::serialize_assertion;
/// Create a new assertion in the knowledge graph. /// Create a new assertion in the knowledge graph.
@ -95,6 +97,10 @@ fn dto_to_assertion(req: CreateAssertionRequest) -> Result<Assertion> {
.map_err(|e| ApiError::Serialization(format!("Failed to get timestamp: {}", e)))? .map_err(|e| ApiError::Serialization(format!("Failed to get timestamp: {}", e)))?
.as_secs(); .as_secs();
// HLC timestamp for distributed causal ordering. For now this uses
// `HlcTimestamp::default()`; a full implementation would draw it from a
// shared HLC clock that is advanced on every local event and message receipt.
let hlc_timestamp = HlcTimestamp::default();
Ok(Assertion { Ok(Assertion {
subject: req.subject, subject: req.subject,
predicate: req.predicate, predicate: req.predicate,
@ -109,6 +115,7 @@ fn dto_to_assertion(req: CreateAssertionRequest) -> Result<Assertion> {
signatures, signatures,
confidence: req.confidence, confidence: req.confidence,
timestamp, timestamp,
hlc_timestamp,
vector: req.vector, vector: req.vector,
}) })
} }
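`dto_to_assertion` currently uses `HlcTimestamp::default()` in place of a shared clock. For reference, below is a minimal sketch of the standard hybrid logical clock rules such a clock would apply. `HlcTs` and `HlcClock` are hypothetical types, not StemeDB's `HlcTimestamp`. Physical time is passed in explicitly (any monotone unit, e.g. milliseconds) to keep the sketch deterministic, and counter overflow is not handled.

```rust
/// Ordered first by the wall-clock component, then by the logical counter.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)]
pub struct HlcTs {
    pub wall: u64,
    pub logical: u32,
}

#[derive(Debug, Default)]
pub struct HlcClock {
    last: HlcTs,
}

impl HlcClock {
    /// Local or send event at physical time `pt`.
    pub fn now(&mut self, pt: u64) -> HlcTs {
        let prev = self.last.wall;
        let wall = prev.max(pt);
        // Physical time did not advance past the last stamp: bump the counter.
        let logical = if wall == prev { self.last.logical + 1 } else { 0 };
        self.last = HlcTs { wall, logical };
        self.last
    }

    /// Receive event: merge a remote timestamp `msg` at physical time `pt`.
    pub fn update(&mut self, msg: HlcTs, pt: u64) -> HlcTs {
        let prev = self.last;
        let wall = prev.wall.max(msg.wall).max(pt);
        let logical = if wall == prev.wall && wall == msg.wall {
            prev.logical.max(msg.logical) + 1
        } else if wall == prev.wall {
            prev.logical + 1
        } else if wall == msg.wall {
            msg.logical + 1
        } else {
            0 // physical time dominates both inputs
        };
        self.last = HlcTs { wall, logical };
        self.last
    }
}
```

Every stamp returned by `update` is greater than both the receiver's previous stamp and the incoming one. This is the causality guarantee a shared clock would give assertions, even when a node's physical clock lags or steps backwards.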

@ -16,6 +16,7 @@
//! This pattern is enforced by OpenAPI annotations and integration tests. //! This pattern is enforced by OpenAPI annotations and integration tests.
pub mod admin; pub mod admin;
pub mod admission;
pub mod assert; pub mod assert;
pub mod audit; pub mod audit;
pub mod concepts; pub mod concepts;
@ -34,6 +35,7 @@ pub mod trace;
pub mod vote; pub mod vote;
pub use admin::decay_trust_ranks; pub use admin::decay_trust_ranks;
pub use admission::get_admission_status;
pub use assert::create_assertion; pub use assert::create_assertion;
pub use audit::{get_audit, list_audits}; pub use audit::{get_audit, list_audits};
pub use constraints::constraints_query; pub use constraints::constraints_query;

@ -45,12 +45,13 @@ use utoipa::OpenApi;
use utoipa_swagger_ui::SwaggerUi; use utoipa_swagger_ui::SwaggerUi;
pub use error::{ApiError, Result}; pub use error::{ApiError, Result};
pub use middleware::{MeterLayer, MeterService}; pub use middleware::{AdmissionLayer, AdmissionService, MeterLayer, MeterService};
pub use state::AppState; pub use state::AppState;
// Re-export the path items for OpenAPI // Re-export the path items for OpenAPI
use handlers::{ use handlers::{
admin::__path_decay_trust_ranks, admin::__path_decay_trust_ranks,
admission::__path_get_admission_status,
assert::__path_create_assertion, assert::__path_create_assertion,
audit::{__path_get_audit, __path_list_audits}, audit::{__path_get_audit, __path_list_audits},
concepts::{ concepts::{
@ -79,6 +80,7 @@ use handlers::{
#[derive(OpenApi)] #[derive(OpenApi)]
#[openapi( #[openapi(
paths( paths(
get_admission_status,
create_assertion, create_assertion,
create_epoch, create_epoch,
create_vote, create_vote,
@ -176,6 +178,10 @@ use handlers::{
dto::AliasSuggestion, dto::AliasSuggestion,
dto::SuggestAliasesResponse, dto::SuggestAliasesResponse,
dto::ConceptPathInfo, dto::ConceptPathInfo,
// Admission control
dto::AdmissionStatusResponse,
dto::TrustTierDto,
handlers::admission::AdmissionStatusParams,
) )
), ),
tags( tags(
@ -190,6 +196,7 @@ use handlers::{
(name = "provenance", description = "Source document storage and retrieval"), (name = "provenance", description = "Source document storage and retrieval"),
(name = "admin", description = "Administrative operations for system maintenance"),
(name = "concepts", description = "ConceptPath and alias management for cross-scheme resolution"),
(name = "admission", description = "Admission control and PoW requirements"),
),
info(
title = "Episteme (StemeDB) API",
@@ -242,6 +249,8 @@ pub fn create_router(state: AppState) -> Router {
.route("/v1/concepts/aliases", get(handlers::list_aliases))
.route("/v1/concepts/suggest", get(handlers::suggest_aliases))
.route("/v1/concepts/parse", get(handlers::parse_concept_path))
// Admission control endpoints
.route("/v1/admission/status", get(handlers::get_admission_status))
.with_state(state)
.layer(TraceLayer::new_for_http());
@@ -304,6 +313,8 @@ pub fn create_router_with_meter(state: AppState) -> Router {
.route("/v1/concepts/aliases", get(handlers::list_aliases))
.route("/v1/concepts/suggest", get(handlers::suggest_aliases))
.route("/v1/concepts/parse", get(handlers::parse_concept_path))
// Admission control endpoints
.route("/v1/admission/status", get(handlers::get_admission_status))
.with_state(state)
.layer(meter_layer)
.layer(TraceLayer::new_for_http());
@@ -313,3 +324,98 @@ pub fn create_router_with_meter(state: AppState) -> Router {
.merge(SwaggerUi::new("/swagger-ui").url("/api-docs/openapi.json", ApiDoc::openapi()))
.merge(api_router)
}
/// Create the axum router with full admission control enabled (The Shield + The Meter).
///
/// This router enforces both proof-of-work admission control AND economic throttling.
/// New or untrusted agents must solve PoW puzzles before their write requests
/// (`/v1/assert`, `/v1/vote`, `/v1/supersede`) are accepted, and all agents are
/// subject to quota limits based on their trust tier.
///
/// # Admission Control (The Shield)
///
/// - First 10 assertions: 16-bit PoW (~16 seconds to solve)
/// - Assertions 11-50: 1-bit PoW (trivial)
/// - 50+ assertions OR trust >= 0.6: PoW exempt
///
/// # Trust Tiers
///
/// | Trust Range | Tier | Quota Multiplier |
/// |-------------|------------|------------------|
/// | 0.0-0.3 | Untrusted | 0.1x (1,000/hr) |
/// | 0.3-0.5 | Limited | 0.5x (5,000/hr) |
/// | 0.5-0.7 | Verified | 1.0x (10,000/hr) |
/// | 0.7-0.9 | Trusted | 2.0x (20,000/hr) |
/// | 0.9-1.0 | Authority | 10.0x (100k/hr) |
///
/// # Headers
///
/// **Request headers:**
/// - `X-Agent-Id`: Agent's Ed25519 public key (hex, 64 chars)
/// - `X-PoW-Nonce`: PoW solution nonce (decimal, required if PoW needed)
/// - `X-PoW-Timestamp`: PoW timestamp (Unix seconds, required if PoW needed)
///
/// **Response headers:**
/// - `X-Trust-Tier`: Agent's trust tier name
/// - `X-PoW-Required`: "true" or "false"
/// - `X-PoW-Difficulty`: Required difficulty in bits
/// - `X-Quota-Multiplier`: Tier quota multiplier
/// - `X-Quota-Remaining`: Tokens left in current window
/// - `X-Quota-Limit`: Total tokens per hour
/// - `X-Quota-Reset`: Unix timestamp when window resets
pub fn create_router_with_admission(state: AppState) -> Router {
use std::sync::Arc;
// Create AdmissionLayer with the admission store from state
let admission_layer = AdmissionLayer::new(Arc::clone(&state.admission_store));
// Create MeterLayer with the quota store from state
let meter_layer = MeterLayer::new(Arc::clone(&state.quota_store));
// Build the API router with admission control and metering.
// Layer order: admission (outer) -> meter (inner). Each `.layer()` call wraps
// everything added before it, so the last layer applied runs first:
// PoW is checked first, then quota.
let api_router = Router::new()
.route("/v1/assert", post(handlers::create_assertion))
.route("/v1/epoch", post(handlers::create_epoch))
.route("/v1/vote", post(handlers::create_vote))
.route("/v1/query", get(handlers::query_assertions))
.route("/v1/skeptic", get(handlers::skeptic_query))
.route("/v1/layered", get(handlers::layered_query))
.route("/v1/constraints", get(handlers::constraints_query))
.route("/v1/health", get(handlers::health_check))
.route("/v1/audit/queries", get(handlers::list_audits))
.route("/v1/audit/query/{id}", get(handlers::get_audit))
.route("/v1/trace", get(handlers::trace))
.route("/v1/supersede", post(handlers::supersede))
.route("/v1/meter/quota", get(handlers::get_quota_status))
.route("/v1/meter/quota/limit", post(handlers::set_quota_limit))
.route("/v1/source", post(handlers::store_source))
.route("/v1/provenance/{hash}", get(handlers::get_provenance))
.route("/v1/admin/decay-trust-ranks", post(handlers::decay_trust_ranks))
.route("/v1/admin/escalations", get(handlers::list_escalations))
.route("/v1/admin/escalations/{id}/resolve", post(handlers::resolve_escalation))
.route("/v1/admin/gold-standards", post(handlers::create_gold_standard))
.route("/v1/admin/gold-standards", get(handlers::list_gold_standards))
.route(
"/v1/admin/gold-standards/{subject}/{predicate}",
axum::routing::delete(handlers::remove_gold_standard),
)
.route("/v1/admin/verify-agent", post(handlers::verify_agent))
// Concept hierarchy and alias endpoints
.route("/v1/concepts/alias", post(handlers::create_alias))
.route("/v1/concepts/alias", axum::routing::delete(handlers::delete_alias))
.route("/v1/concepts/resolve", get(handlers::resolve_alias))
.route("/v1/concepts/aliases", get(handlers::list_aliases))
.route("/v1/concepts/suggest", get(handlers::suggest_aliases))
.route("/v1/concepts/parse", get(handlers::parse_concept_path))
// Admission control endpoints
.route("/v1/admission/status", get(handlers::get_admission_status))
.with_state(state)
.layer(meter_layer) // Inner: runs second (check quota)
.layer(admission_layer) // Outer: runs first (check PoW)
.layer(TraceLayer::new_for_http());
// Mount Swagger UI
Router::new()
.merge(SwaggerUi::new("/swagger-ui").url("/api-docs/openapi.json", ApiDoc::openapi()))
.merge(api_router)
}
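The comments in `create_router_with_admission` rely on axum's ordering rule: each `.layer()` call wraps everything registered before it, so the last layer applied is the first to see a request. The std-only toy model below (no tower or axum types; `Svc`, `LayerFn`, `named_layer`, `build`, and `execution_order` are hypothetical names) reproduces the `.layer(meter_layer).layer(admission_layer).layer(TraceLayer)` chain and records the order in which the layers run. It is a sketch of the wrapping behavior, not tower's API.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy service: maps a request path to a response body.
type Svc = Box<dyn Fn(&str) -> String>;
/// Toy layer: takes an inner service and returns the wrapped service.
type LayerFn = Box<dyn Fn(Svc) -> Svc>;

/// Build a layer that records `name` in `log` before delegating inward.
fn named_layer(name: &'static str, log: Rc<RefCell<Vec<&'static str>>>) -> LayerFn {
    Box::new(move |inner: Svc| -> Svc {
        let log = Rc::clone(&log);
        Box::new(move |req: &str| {
            log.borrow_mut().push(name);
            inner(req)
        })
    })
}

/// Apply layers in call order, like `.layer(a).layer(b).layer(c)`:
/// each layer wraps everything built so far, so the last one is outermost.
fn build(handler: Svc, layers: Vec<LayerFn>) -> Svc {
    layers.into_iter().fold(handler, |svc, layer| layer(svc))
}

/// Reproduce the router's chain and return the order the layers ran in.
fn execution_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let handler: Svc = Box::new(|req: &str| format!("handled {req}"));
    let app = build(
        handler,
        vec![
            named_layer("meter", Rc::clone(&log)),
            named_layer("admission", Rc::clone(&log)),
            named_layer("trace", Rc::clone(&log)),
        ],
    );
    let _ = app("/v1/assert");
    let order: Vec<&'static str> = log.borrow().clone();
    order
}
```

Folding left to right is what makes the last layer outermost: the recorded order is trace, admission, meter, so a request rejected with 428 by the admission layer never reaches the meter.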

@@ -0,0 +1,443 @@
//! Admission control middleware (The Shield).
//!
//! This middleware enforces proof-of-work requirements for new/untrusted agents.
//! It extracts the agent ID from the `X-Agent-Id` header, checks admission status,
//! and verifies PoW proofs when required.
//!
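//! # Request Flow
//!
//! 1. Skip admission for bypass paths (health, status, docs) and for read-only
//!    paths; only `/v1/assert`, `/v1/vote`, and `/v1/supersede` are checked
//! 2. Extract `X-Agent-Id` header (hex-encoded 32-byte public key); requests
//!    without a valid agent ID pass through and are left to signature verification
//! 3. Get admission status (tier, PoW requirement)
//! 4. If PoW required:
//!    - Extract `X-PoW-Nonce` and `X-PoW-Timestamp` headers
//!    - Verify proof meets difficulty requirement
//!    - Return 428 if invalid/missing
//! 5. Store admission status in request extensions (for MeterLayer)
//! 6. Add response headers (`X-Trust-Tier`, `X-PoW-Required`, etc.)
//!
//! If the admission store itself returns an error, the request is allowed
//! through (fail open for availability).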
//!
//! # Headers
//!
//! | Header | Direction | Description |
//! |--------|-----------|-------------|
//! | `X-Agent-Id` | Request | Agent's Ed25519 public key (hex, 64 chars) |
//! | `X-PoW-Nonce` | Request | PoW solution nonce (decimal) |
//! | `X-PoW-Timestamp` | Request | PoW solution timestamp (Unix seconds) |
//! | `X-Trust-Tier` | Response | Agent's trust tier name |
//! | `X-PoW-Required` | Response | "true" or "false" |
//! | `X-PoW-Difficulty` | Response | Required difficulty in bits |
//! | `X-Quota-Multiplier` | Response | Tier quota multiplier |
use axum::{
body::Body,
http::{Request, Response, StatusCode},
response::IntoResponse,
Json,
};
use futures::future::BoxFuture;
use serde::Serialize;
use std::sync::Arc;
use std::task::{Context, Poll};
use stemedb_core::types::PowProof;
use stemedb_storage::{AdmissionCheck, AdmissionStatus, AdmissionStatusResult};
use tower::{Layer, Service};
use tracing::{debug, warn};
/// Header name for agent identification (shared with MeterLayer).
pub const AGENT_ID_HEADER: &str = "x-agent-id";
/// Header name for PoW nonce.
pub const POW_NONCE_HEADER: &str = "x-pow-nonce";
/// Header name for PoW timestamp.
pub const POW_TIMESTAMP_HEADER: &str = "x-pow-timestamp";
/// Response header for trust tier.
pub const TRUST_TIER_HEADER: &str = "x-trust-tier";
/// Response header indicating whether PoW is required.
pub const POW_REQUIRED_HEADER: &str = "x-pow-required";
/// Response header for PoW difficulty.
pub const POW_DIFFICULTY_HEADER: &str = "x-pow-difficulty";
/// Response header for quota multiplier.
pub const QUOTA_MULTIPLIER_HEADER: &str = "x-quota-multiplier";
/// HTTP 428 Precondition Required - PoW needed.
const HTTP_PRECONDITION_REQUIRED: u16 = 428;
/// Error response for PoW required.
#[derive(Debug, Serialize)]
struct PowRequiredError {
/// Human-readable error message.
error: String,
/// Error code for programmatic handling.
code: String,
/// Required PoW difficulty in bits.
required_difficulty: u8,
/// Whether PoW is required.
pow_required: bool,
/// Number of assertions the agent has made.
agent_assertions: u64,
/// Agent's current trust score.
agent_trust_score: f32,
/// Optional detailed error message (for failed verification).
#[serde(skip_serializing_if = "Option::is_none")]
verification_error: Option<String>,
}
/// Tower Layer for admission control.
///
/// Wrap your router with this layer to enable PoW-based admission control.
/// Apply it outside the MeterLayer (call `.layer(admission_layer)` after
/// `.layer(meter_layer)`) so that PoW is checked before quota is consumed.
///
/// # Example
///
/// ```ignore
/// let admission_layer = AdmissionLayer::new(admission_store);
/// let meter_layer = MeterLayer::new(quota_store);
///
/// let app = Router::new()
/// .route("/v1/assert", post(create_assertion))
/// .layer(meter_layer) // Inner: runs second
/// .layer(admission_layer) // Outer: runs first
/// ```
#[derive(Clone)]
pub struct AdmissionLayer<A> {
admission_store: Arc<A>,
/// Paths that bypass admission control (e.g., health checks, status endpoint).
bypass_paths: Vec<String>,
}
impl<A> AdmissionLayer<A> {
/// Create a new AdmissionLayer.
pub fn new(admission_store: Arc<A>) -> Self {
Self {
admission_store,
bypass_paths: vec![
"/v1/health".to_string(),
"/v1/admission/status".to_string(),
"/swagger-ui".to_string(),
"/api-docs".to_string(),
],
}
}
/// Add a path to bypass admission control.
pub fn bypass_path(mut self, path: impl Into<String>) -> Self {
self.bypass_paths.push(path.into());
self
}
}
impl<S, A> Layer<S> for AdmissionLayer<A>
where
A: Clone,
{
type Service = AdmissionService<S, A>;
fn layer(&self, inner: S) -> Self::Service {
AdmissionService {
inner,
admission_store: Arc::clone(&self.admission_store),
bypass_paths: self.bypass_paths.clone(),
}
}
}
/// Tower Service for admission control.
#[derive(Clone)]
pub struct AdmissionService<S, A> {
inner: S,
admission_store: Arc<A>,
bypass_paths: Vec<String>,
}
impl<S, A> AdmissionService<S, A> {
/// Check if path should bypass admission control.
#[allow(dead_code)] // Used in tests
fn should_bypass(&self, path: &str) -> bool {
self.bypass_paths.iter().any(|p| path.starts_with(p))
}
/// Extract agent ID from request headers.
fn extract_agent_id(req: &Request<Body>) -> Option<[u8; 32]> {
req.headers().get(AGENT_ID_HEADER).and_then(|v| v.to_str().ok()).and_then(|s| {
let bytes = hex::decode(s).ok()?;
if bytes.len() == 32 {
let mut arr = [0u8; 32];
arr.copy_from_slice(&bytes);
Some(arr)
} else {
None
}
})
}
/// Extract PoW proof from request headers.
fn extract_pow_proof(req: &Request<Body>, agent_id: [u8; 32]) -> Option<PowProof> {
let nonce_str = req.headers().get(POW_NONCE_HEADER)?.to_str().ok()?;
let timestamp_str = req.headers().get(POW_TIMESTAMP_HEADER)?.to_str().ok()?;
let nonce: u64 = nonce_str.parse().ok()?;
let timestamp: u64 = timestamp_str.parse().ok()?;
Some(PowProof::new(nonce, agent_id, timestamp))
}
/// Add admission headers to response.
fn add_response_headers(response: &mut Response<Body>, status: &AdmissionStatus) {
let headers = response.headers_mut();
if let Ok(v) = status.tier.name().parse() {
headers.insert(TRUST_TIER_HEADER, v);
}
if let Ok(v) = status.pow_required.to_string().parse() {
headers.insert(POW_REQUIRED_HEADER, v);
}
if let Ok(v) = status.pow_difficulty.to_string().parse() {
headers.insert(POW_DIFFICULTY_HEADER, v);
}
if let Ok(v) = format!("{:.1}", status.quota_multiplier).parse() {
headers.insert(QUOTA_MULTIPLIER_HEADER, v);
}
}
/// Build a 428 response for PoW required.
fn pow_required_response(
status: &AdmissionStatus,
verification_error: Option<String>,
) -> Response<Body> {
let error_message = if verification_error.is_some() {
"Proof-of-Work verification failed"
} else {
"Proof-of-Work required"
};
let error = PowRequiredError {
error: error_message.to_string(),
code: "POW_REQUIRED".to_string(),
required_difficulty: status.pow_difficulty,
pow_required: true,
agent_assertions: status.assertions_count,
agent_trust_score: status.trust_score,
verification_error,
};
let mut response = (
StatusCode::from_u16(HTTP_PRECONDITION_REQUIRED)
.unwrap_or(StatusCode::PRECONDITION_FAILED),
Json(error),
)
.into_response();
Self::add_response_headers(&mut response, status);
response
}
}
impl<S, A> Service<Request<Body>> for AdmissionService<S, A>
where
S: Service<Request<Body>, Response = Response<Body>> + Clone + Send + 'static,
S::Future: Send,
A: AdmissionCheck + 'static,
{
type Response = Response<Body>;
type Error = S::Error;
type Future = BoxFuture<'static, Result<Self::Response, Self::Error>>;
fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
self.inner.poll_ready(cx)
}
fn call(&mut self, req: Request<Body>) -> Self::Future {
let path = req.uri().path().to_string();
let admission_store = Arc::clone(&self.admission_store);
let bypass_paths = self.bypass_paths.clone();
// Clone the inner service for the async block
let mut inner = self.inner.clone();
Box::pin(async move {
// Check if this path should bypass admission control
if bypass_paths.iter().any(|p| path.starts_with(p)) {
debug!(path = %path, "Bypassing admission control for path");
return inner.call(req).await;
}
// Only check admission for write paths
let is_write_path = path.starts_with("/v1/assert")
|| path.starts_with("/v1/vote")
|| path.starts_with("/v1/supersede");
if !is_write_path {
// Read-only paths don't need admission control
debug!(path = %path, "Skipping admission for read-only path");
return inner.call(req).await;
}
// Extract agent ID
let agent_id = match Self::extract_agent_id(&req) {
Some(id) => id,
None => {
// Missing or malformed agent ID: pass through (will fail signature verification)
debug!(path = %path, "No agent ID, skipping admission");
return inner.call(req).await;
}
};
// Extract PoW proof (if provided)
let proof = Self::extract_pow_proof(&req, agent_id);
// Get current timestamp
let server_time = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.as_secs())
.unwrap_or(0);
// Check admission
let admission_result =
match admission_store.check_admission(&agent_id, proof.as_ref(), server_time).await
{
Ok(result) => result,
Err(e) => {
warn!(error = %e, "Admission check failed, allowing request");
// On error, allow the request (fail open for availability)
return inner.call(req).await;
}
};
match admission_result {
AdmissionStatusResult::Admitted(status) => {
debug!(
agent = %hex::encode(agent_id),
tier = %status.tier,
"Agent admitted"
);
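// Admission OK: share the status with inner layers (e.g. MeterLayer)
// via request extensions, then call the inner service
let mut req = req;
req.extensions_mut().insert(AdmissionExtension { status: status.clone() });
let mut response = inner.call(req).await?;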
// Add admission headers to response
Self::add_response_headers(&mut response, &status);
Ok(response)
}
AdmissionStatusResult::PowRequired(status) => {
debug!(
agent = %hex::encode(agent_id),
difficulty = status.pow_difficulty,
"PoW required"
);
Ok(Self::pow_required_response(&status, None))
}
AdmissionStatusResult::PowFailed { status, error } => {
debug!(
agent = %hex::encode(agent_id),
error = %error,
"PoW verification failed"
);
Ok(Self::pow_required_response(&status, Some(error.to_string())))
}
}
})
}
}
/// Request extension to share admission status with other middleware.
///
/// The MeterLayer can read this to apply tier-based quota multipliers.
#[derive(Debug, Clone)]
pub struct AdmissionExtension {
/// The agent's admission status.
pub status: AdmissionStatus,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_bypass_paths() {
let service = AdmissionService::<(), ()> {
inner: (),
admission_store: Arc::new(()),
bypass_paths: vec!["/v1/health".to_string(), "/swagger-ui".to_string()],
};
assert!(service.should_bypass("/v1/health"));
assert!(service.should_bypass("/swagger-ui/index.html"));
assert!(!service.should_bypass("/v1/assert"));
}
#[test]
fn test_extract_agent_id() {
let req = Request::builder()
.header(
AGENT_ID_HEADER,
"0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
)
.body(Body::empty())
.expect("build request");
let agent_id = AdmissionService::<(), ()>::extract_agent_id(&req);
assert!(agent_id.is_some());
let id = agent_id.expect("id");
assert_eq!(id[0], 0x01);
assert_eq!(id[1], 0x23);
}
#[test]
fn test_extract_agent_id_invalid_length() {
let req = Request::builder()
.header(AGENT_ID_HEADER, "0123456789abcdef") // Too short
.body(Body::empty())
.expect("build request");
let agent_id = AdmissionService::<(), ()>::extract_agent_id(&req);
assert!(agent_id.is_none());
}
#[test]
fn test_extract_pow_proof() {
let agent_id = [0xABu8; 32];
let req = Request::builder()
.header(POW_NONCE_HEADER, "12345")
.header(POW_TIMESTAMP_HEADER, "1700000000")
.body(Body::empty())
.expect("build request");
let proof = AdmissionService::<(), ()>::extract_pow_proof(&req, agent_id);
assert!(proof.is_some());
let p = proof.expect("proof");
assert_eq!(p.nonce, 12345);
assert_eq!(p.timestamp, 1700000000);
assert_eq!(p.agent_id, agent_id);
}
#[test]
fn test_extract_pow_proof_missing_headers() {
let agent_id = [0xABu8; 32];
// Missing nonce
let req = Request::builder()
.header(POW_TIMESTAMP_HEADER, "1700000000")
.body(Body::empty())
.expect("build request");
let proof = AdmissionService::<(), ()>::extract_pow_proof(&req, agent_id);
assert!(proof.is_none());
// Missing timestamp
let req = Request::builder()
.header(POW_NONCE_HEADER, "12345")
.body(Body::empty())
.expect("build request");
let proof = AdmissionService::<(), ()>::extract_pow_proof(&req, agent_id);
assert!(proof.is_none());
}
}
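The branches in `AdmissionService::call` reduce to a small pure function. The sketch below mirrors that ordering (bypass list, write-path filter, agent-ID check, fail-open on store errors, then the PoW verdict) in std-only Rust. It assumes that "difficulty in bits" means leading zero bits of a 32-byte proof digest; the real digest input, hash function, and timestamp-freshness rules live behind `AdmissionCheck::check_admission` and are not shown in this diff. `Decision`, `decide`, and `leading_zero_bits` are hypothetical names.

```rust
/// Outcomes mirroring the middleware's branches (illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    /// Forward to the inner service with no admission check.
    PassThrough,
    /// Forward and attach admission headers.
    Admitted,
    /// Reply 428; `verification_failed` is true when a proof was supplied but rejected.
    Reject428 { difficulty: u8, verification_failed: bool },
}

const BYPASS: [&str; 4] = ["/v1/health", "/v1/admission/status", "/swagger-ui", "/api-docs"];
const WRITE_PREFIXES: [&str; 3] = ["/v1/assert", "/v1/vote", "/v1/supersede"];

/// Leading zero bits of a digest, most significant bit first (assumed PoW metric).
fn leading_zero_bits(digest: &[u8]) -> u32 {
    let mut bits = 0;
    for &byte in digest {
        if byte != 0 {
            return bits + byte.leading_zeros();
        }
        bits += 8;
    }
    bits
}

/// `required`: the store's difficulty for this agent, or a store error.
/// `proof_digest`: hash of the submitted proof, if the PoW headers were present.
fn decide(
    path: &str,
    agent_id_hex: Option<&str>,
    required: Result<u8, String>,
    proof_digest: Option<[u8; 32]>,
) -> Decision {
    // Bypass and read-only paths skip admission entirely.
    if BYPASS.iter().any(|p| path.starts_with(p))
        || !WRITE_PREFIXES.iter().any(|p| path.starts_with(p))
    {
        return Decision::PassThrough;
    }
    // Missing or malformed agent ID (not 64 hex chars): pass through.
    let id_ok = agent_id_hex
        .map_or(false, |s| s.len() == 64 && s.bytes().all(|b| b.is_ascii_hexdigit()));
    if !id_ok {
        return Decision::PassThrough;
    }
    // Store error: fail open for availability.
    let difficulty = match required {
        Ok(d) => d,
        Err(_) => return Decision::PassThrough,
    };
    // Exempt agents are admitted; otherwise the proof must meet the difficulty.
    if difficulty == 0 {
        return Decision::Admitted;
    }
    match proof_digest {
        None => Decision::Reject428 { difficulty, verification_failed: false },
        Some(d) if leading_zero_bits(&d) >= u32::from(difficulty) => Decision::Admitted,
        Some(_) => Decision::Reject428 { difficulty, verification_failed: true },
    }
}
```

Under the leading-zero-bits reading, each extra bit doubles the expected work: a 16-bit difficulty takes about 2^16 = 65,536 hash attempts on average, while the 1-bit tier needs about 2. Wall-clock cost then depends on the hash function.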

@@ -1,5 +1,11 @@
//! Middleware layers for the API.
pub mod admission;
pub mod meter;
pub use admission::{
AdmissionExtension, AdmissionLayer, AdmissionService, AGENT_ID_HEADER, POW_DIFFICULTY_HEADER,
POW_NONCE_HEADER, POW_REQUIRED_HEADER, POW_TIMESTAMP_HEADER, QUOTA_MULTIPLIER_HEADER,
TRUST_TIER_HEADER,
};
pub use meter::{MeterLayer, MeterService};

@@ -4,7 +4,10 @@ use std::sync::Arc;
use tokio::sync::Mutex;
use stemedb_query::QueryEngine;
use stemedb_storage::{GenericAliasStore, GenericEscalationStore, GenericQuotaStore, HybridStore}; use stemedb_storage::{
GenericAdmissionStore, GenericAliasStore, GenericEscalationStore, GenericQuotaStore,
GenericTrustRankStore, HybridStore,
};
use stemedb_wal::group_commit::{GroupCommitBuffer, GroupCommitConfig};
use stemedb_wal::Journal;
@@ -17,6 +20,12 @@ pub type EscalationStoreImpl = GenericEscalationStore<HybridStore>;
/// Alias store type alias for convenience.
pub type AliasStoreImpl = GenericAliasStore<Arc<HybridStore>>;
/// Trust rank store type alias for convenience.
pub type TrustRankStoreImpl = GenericTrustRankStore<Arc<HybridStore>>;
/// Admission store type alias for convenience.
pub type AdmissionStoreImpl = GenericAdmissionStore<Arc<TrustRankStoreImpl>>;
/// Application state shared across all HTTP handlers.
///
/// This is passed to every request via axum's `State` extractor.
@@ -39,6 +48,12 @@ pub struct AppState {
/// Alias store for cross-scheme entity resolution
pub alias_store: Arc<AliasStoreImpl>,
/// Trust rank store for reputation tracking
pub trust_rank_store: Arc<TrustRankStoreImpl>,
/// Admission store for PoW-based admission control (The Shield)
pub admission_store: Arc<AdmissionStoreImpl>,
}
impl AppState {
@@ -60,7 +75,22 @@ impl AppState {
// Create alias store for cross-scheme concept resolution
let alias_store = Arc::new(GenericAliasStore::new(Arc::clone(&store)));
Self { commit_buffer, journal, store, quota_store, escalation_store, alias_store } // Create trust rank store for reputation tracking
let trust_rank_store = Arc::new(GenericTrustRankStore::new(Arc::clone(&store)));
// Create admission store for PoW-based admission control
let admission_store = Arc::new(GenericAdmissionStore::new(Arc::clone(&trust_rank_store)));
Self {
commit_buffer,
journal,
store,
quota_store,
escalation_store,
alias_store,
trust_rank_store,
admission_store,
}
}
/// Get a QueryEngine for this state.

@@ -0,0 +1,125 @@
//! Integration tests for admission control (The Shield).
//!
//! These tests verify the DTO conversion and response formatting.
//! The core admission logic is tested in stemedb-storage unit tests.
use stemedb_api::dto::{AdmissionStatusResponse, TrustTierDto};
use stemedb_core::types::{AdmissionConfig, TrustTier};
use stemedb_storage::AdmissionStatus;
#[test]
fn test_trust_tier_dto_conversion() {
// Test all tier conversions
assert_eq!(TrustTierDto::from(TrustTier::Untrusted), TrustTierDto::Untrusted);
assert_eq!(TrustTierDto::from(TrustTier::Limited), TrustTierDto::Limited);
assert_eq!(TrustTierDto::from(TrustTier::Verified), TrustTierDto::Verified);
assert_eq!(TrustTierDto::from(TrustTier::Trusted), TrustTierDto::Trusted);
assert_eq!(TrustTierDto::from(TrustTier::Authority), TrustTierDto::Authority);
}
#[test]
fn test_admission_status_response_new_agent() {
let status = AdmissionStatus::new(0.5, 0, 16);
let config = AdmissionConfig::default();
let response = AdmissionStatusResponse::from_status("abc123".to_string(), &status, &config);
assert_eq!(response.tier, TrustTierDto::Verified);
assert!((response.trust_score - 0.5).abs() < f32::EPSILON);
assert_eq!(response.assertions_count, 0);
assert_eq!(response.pow_difficulty, 16);
assert!(response.pow_required);
assert_eq!(response.base_quota_limit, 10_000);
assert_eq!(response.effective_quota_limit, 10_000);
assert!((response.quota_multiplier - 1.0).abs() < f32::EPSILON);
// New agent should see milestones
assert_eq!(response.assertions_until_reduced_difficulty, Some(10));
assert_eq!(response.assertions_until_exemption, Some(50));
}
#[test]
fn test_admission_status_response_graduated() {
let status = AdmissionStatus::new(0.7, 100, 0);
let config = AdmissionConfig::default();
let response = AdmissionStatusResponse::from_status("graduated".to_string(), &status, &config);
assert_eq!(response.tier, TrustTierDto::Trusted);
assert!(!response.pow_required);
assert_eq!(response.pow_difficulty, 0);
assert_eq!(response.effective_quota_limit, 20_000);
// Graduated agent shouldn't see milestones
assert_eq!(response.assertions_until_reduced_difficulty, None);
assert_eq!(response.assertions_until_exemption, None);
}
#[test]
fn test_admission_status_response_partially_graduated() {
// Agent with 25 assertions (past initial, not yet graduated)
let status = AdmissionStatus::new(0.4, 25, 1);
let config = AdmissionConfig::default();
let response = AdmissionStatusResponse::from_status("partial".to_string(), &status, &config);
assert_eq!(response.tier, TrustTierDto::Limited);
assert!(response.pow_required);
assert_eq!(response.pow_difficulty, 1);
// Past initial threshold, so no "until reduced" milestone
assert_eq!(response.assertions_until_reduced_difficulty, None);
// Still 25 assertions until exemption
assert_eq!(response.assertions_until_exemption, Some(25));
}
#[test]
fn test_all_tier_quotas() {
let config = AdmissionConfig::default();
// Test each tier
let test_cases = [
(0.1, TrustTierDto::Untrusted, 1_000),
(0.4, TrustTierDto::Limited, 5_000),
(0.5, TrustTierDto::Verified, 10_000),
(0.8, TrustTierDto::Trusted, 20_000),
(0.95, TrustTierDto::Authority, 100_000),
];
for (score, expected_tier, expected_quota) in test_cases {
let status = AdmissionStatus::new(score, 100, 0);
let response = AdmissionStatusResponse::from_status("test".to_string(), &status, &config);
assert_eq!(response.tier, expected_tier, "Wrong tier for score {}", score);
assert_eq!(
response.effective_quota_limit, expected_quota,
"Wrong quota for score {}",
score
);
}
}
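`test_all_tier_quotas` pins the score → tier → quota mapping for a 10,000/hr base limit. A minimal sketch of that mapping, assuming lower-inclusive tier bounds (consistent with 0.5 → Verified and 0.7 → Trusted in these tests) and assuming the effective limit is simply base × multiplier. `Tier`, `tier_for_score`, `quota_multiplier`, and `effective_quota` are hypothetical names, not the crate's `TrustTier` API.

```rust
/// Hypothetical mirror of the router's tier table; not `stemedb_core::types::TrustTier`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Untrusted,
    Limited,
    Verified,
    Trusted,
    Authority,
}

/// Lower-inclusive bounds: 0.5 maps to Verified, 0.7 to Trusted.
fn tier_for_score(score: f32) -> Tier {
    match score {
        s if s >= 0.9 => Tier::Authority,
        s if s >= 0.7 => Tier::Trusted,
        s if s >= 0.5 => Tier::Verified,
        s if s >= 0.3 => Tier::Limited,
        _ => Tier::Untrusted,
    }
}

/// Quota multiplier per tier, as listed in the router's doc table.
fn quota_multiplier(tier: Tier) -> f64 {
    match tier {
        Tier::Untrusted => 0.1,
        Tier::Limited => 0.5,
        Tier::Verified => 1.0,
        Tier::Trusted => 2.0,
        Tier::Authority => 10.0,
    }
}

/// Effective hourly limit: base quota scaled by the tier multiplier, rounded.
fn effective_quota(base_per_hour: u64, score: f32) -> u64 {
    (base_per_hour as f64 * quota_multiplier(tier_for_score(score))).round() as u64
}
```

Rounding guards against floating-point products such as 10,000 × 0.1 landing a hair off an integer.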
#[test]
fn test_pow_difficulty_graduation() {
let config = AdmissionConfig::default();
// First 10 assertions: 16 bits
for count in 0..10 {
let difficulty = config.compute_difficulty(count, 0.3);
assert_eq!(difficulty, 16, "Wrong difficulty for {} assertions", count);
}
// 10-49: 1 bit
for count in 10..50 {
let difficulty = config.compute_difficulty(count, 0.3);
assert_eq!(difficulty, 1, "Wrong difficulty for {} assertions", count);
}
// 50+: exempt
let difficulty = config.compute_difficulty(50, 0.3);
assert_eq!(difficulty, 0);
// Trust exemption
let difficulty = config.compute_difficulty(5, 0.6);
assert_eq!(difficulty, 0, "High trust should be exempt");
}
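`test_pow_difficulty_graduation` and the milestone assertions in the `from_status` tests imply a simple schedule under `AdmissionConfig::default()`. The sketch below reproduces those expectations; `Schedule` and its field names are hypothetical stand-ins for `AdmissionConfig`, and the milestone helpers only consider the assertion count (the real `from_status` may also account for trust-based exemption).

```rust
/// Hypothetical stand-in for `AdmissionConfig::default()`.
struct Schedule {
    initial_count: u64,     // assertions that pay the initial difficulty
    initial_difficulty: u8, // bits required during the initial phase
    reduced_difficulty: u8, // bits required until exemption
    exempt_count: u64,      // assertion count at which PoW stops
    exempt_trust: f32,      // trust score at which PoW stops
}

impl Default for Schedule {
    fn default() -> Self {
        Self {
            initial_count: 10,
            initial_difficulty: 16,
            reduced_difficulty: 1,
            exempt_count: 50,
            exempt_trust: 0.6,
        }
    }
}

impl Schedule {
    /// Required PoW bits for an agent with `count` prior assertions and a `trust` score.
    fn compute_difficulty(&self, count: u64, trust: f32) -> u8 {
        if count >= self.exempt_count || trust >= self.exempt_trust {
            0
        } else if count < self.initial_count {
            self.initial_difficulty
        } else {
            self.reduced_difficulty
        }
    }

    /// Assertions left before difficulty drops; `None` once past the initial phase.
    fn until_reduced(&self, count: u64) -> Option<u64> {
        (count < self.initial_count).then(|| self.initial_count - count)
    }

    /// Assertions left before exemption by count; `None` once exempt.
    fn until_exemption(&self, count: u64) -> Option<u64> {
        (count < self.exempt_count).then(|| self.exempt_count - count)
    }
}
```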

@@ -61,3 +61,4 @@ features = ["env-filter"]
[dev-dependencies]
tempfile = "3.10"
tokio-test = "0.4"
stemedb-merkle = { path = "../stemedb-merkle" }

@@ -271,11 +271,7 @@ pub async fn handle_health(State(state): State<Arc<GatewayState>>) -> Json<Healt
let members = state.membership.members();
let joined = state.membership.is_joined();
Json(HealthResponse { Json(HealthResponse { healthy: joined, reachable_nodes: members.len(), joined })
healthy: joined && !members.is_empty(),
reachable_nodes: members.len(),
joined,
})
}
/// GET /v1/cluster/status - Cluster status.

@@ -0,0 +1,464 @@
//! Availability tests for distributed consistency.
//!
//! These tests verify that StemeDB provides high availability:
//! - Reads succeed on any replica that has the shard
//! - Writes are accepted by any replica (not just leader)
//! - Node failures don't block operations on other nodes
#![allow(clippy::unwrap_used, clippy::expect_used)]
use std::collections::HashMap;
use std::net::{IpAddr, Ipv4Addr, SocketAddr};
use std::sync::Arc;
use stemedb_cluster::config::SwimConfig;
use stemedb_cluster::membership::{NodeId, NodeInfo, SwimMembership};
use stemedb_cluster::sharding::{MetaRange, RangeRouter};
use stemedb_core::serde::serialize;
use stemedb_core::testing::AssertionBuilder;
use stemedb_core::types::HlcTimestamp;
use stemedb_merkle::MerkleTree;
use stemedb_storage::crdt::{AssertionTransfer, CrdtAssertionStore};
use stemedb_storage::HybridStore;
use tempfile::tempdir;
// =============================================================================
// Test Helpers
// =============================================================================
fn test_addr(port: u16) -> SocketAddr {
SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), port)
}
fn test_node_id(n: u8) -> NodeId {
NodeId::from_bytes([n; 16])
}
fn test_node_info(n: u8) -> NodeInfo {
let id = test_node_id(n);
NodeInfo::new(id, test_addr(9090 + n as u16), test_addr(8080 + n as u16))
}
/// A simulated cluster node for availability testing.
struct AvailabilityNode {
id: NodeId,
#[allow(dead_code)]
membership: Arc<SwimMembership>,
router: Arc<RangeRouter>,
#[allow(dead_code)]
store: Arc<HybridStore>,
crdt_store: Arc<CrdtAssertionStore<HybridStore>>,
merkle_tree: MerkleTree,
hash_to_data: HashMap<[u8; 32], (String, Vec<u8>)>,
/// Simulated node failure state.
failed: bool,
#[allow(dead_code)]
temp_dir: tempfile::TempDir,
}
impl AvailabilityNode {
fn new(n: u8) -> Self {
let id = test_node_id(n);
let info = test_node_info(n);
let temp_dir = tempdir().expect("create temp dir");
let store = Arc::new(HybridStore::open(temp_dir.path()).expect("open store"));
let crdt_store = Arc::new(CrdtAssertionStore::new(store.clone(), *id.as_bytes()));
let membership = Arc::new(SwimMembership::new(info, SwimConfig::default()));
let router = Arc::new(RangeRouter::new(id));
Self {
id,
membership,
router,
store,
crdt_store,
merkle_tree: MerkleTree::new(),
hash_to_data: HashMap::new(),
failed: false,
temp_dir,
}
}
fn init_shards(&self, nodes: &[NodeId], num_shards: u32, replication_factor: u32) {
let meta = MetaRange::with_initial_shards(num_shards, nodes, replication_factor);
self.router.update_meta_range(meta);
}
/// Check if this node is a replica for the given subject's shard.
#[allow(dead_code)]
fn is_replica_for(&self, subject: &str) -> bool {
if self.failed {
return false;
}
let shard_id = match self.router.route_subject(subject) {
Ok(id) => id,
Err(_) => return false,
};
match self.router.get_replicas(shard_id) {
Ok(replicas) => replicas.contains(&self.id),
Err(_) => false,
}
}
/// Check if this node is the leader for the given subject's shard.
fn is_leader_for(&self, subject: &str) -> bool {
if self.failed {
return false;
}
let shard_id = match self.router.route_subject(subject) {
Ok(id) => id,
Err(_) => return false,
};
match self.router.get_leader(shard_id) {
Ok(leader) => leader == self.id,
Err(_) => false,
}
}
/// Write an assertion (succeeds if node is not failed).
async fn write(&mut self, subject: &str, predicate: &str, hlc_time: u64) -> Option<[u8; 32]> {
if self.failed {
return None;
}
let assertion = AssertionBuilder::new()
.subject(subject)
.predicate(predicate)
.hlc_timestamp(HlcTimestamp::new(hlc_time, *self.id.as_bytes()))
.source_hash(rand_hash())
.build();
let data = serialize(&assertion).expect("serialize");
let hash = self.crdt_store.put_assertion(subject, &data).await.expect("put");
self.merkle_tree.insert(hash).expect("insert");
self.hash_to_data.insert(hash, (subject.to_string(), data));
Some(hash)
}
/// Read assertion data (succeeds if node is not failed).
async fn read(&self, subject: &str, hash: &[u8; 32]) -> Option<Vec<u8>> {
if self.failed {
return None;
}
self.crdt_store.get_assertion(subject, hash).await.ok().flatten()
}
/// Simulate node failure.
fn fail(&mut self) {
self.failed = true;
}
/// Recover from failure.
#[allow(dead_code)]
fn recover(&mut self) {
self.failed = false;
}
/// Check if node is available.
fn is_available(&self) -> bool {
!self.failed
}
/// Get all leaves.
fn leaves(&self) -> Vec<[u8; 32]> {
self.merkle_tree.leaves().to_vec()
}
/// Sync from another node.
async fn sync_from(&mut self, other: &AvailabilityNode) {
if self.failed || other.failed {
return;
}
let my_leaves: std::collections::HashSet<_> = self.leaves().into_iter().collect();
for hash in other.leaves() {
if !my_leaves.contains(&hash) {
if let Some((subject, data)) = other.hash_to_data.get(&hash) {
let transfer = AssertionTransfer { hash, data: data.clone() };
if self
.crdt_store
.merge_with_data(subject, std::slice::from_ref(&transfer))
.await
.is_ok()
{
self.merkle_tree.insert(hash).expect("insert");
self.hash_to_data.insert(hash, (subject.clone(), data.clone()));
}
}
}
}
}
}
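`sync_from` is a one-way, set-difference pull: it copies every leaf the other node holds that this node lacks, together with its payload. Because items are content-addressed and merging is a set union (a grow-only set), pulling in both directions converges regardless of order. A std-only sketch of that idea, with a hypothetical `Replica` type standing in for the Merkle tree plus `CrdtAssertionStore`; a production anti-entropy round would typically compare Merkle roots or subtrees before shipping leaves, which this sketch omits.

```rust
use std::collections::BTreeMap;

type Hash = [u8; 32];

/// Hypothetical replica: content-addressed payloads keyed by their hash.
#[derive(Default)]
struct Replica {
    items: BTreeMap<Hash, Vec<u8>>,
}

impl Replica {
    fn put(&mut self, hash: Hash, data: Vec<u8>) {
        self.items.insert(hash, data);
    }

    /// One-way pull: copy every item `other` holds that `self` lacks.
    /// Returns how many items were transferred.
    fn pull_from(&mut self, other: &Replica) -> usize {
        let mut pulled = 0;
        for (hash, data) in &other.items {
            if !self.items.contains_key(hash) {
                self.items.insert(*hash, data.clone());
                pulled += 1;
            }
        }
        pulled
    }

    /// Sorted hash list; equal lists mean the replicas have converged.
    fn hashes(&self) -> Vec<Hash> {
        self.items.keys().copied().collect()
    }
}
```

A second pull after convergence transfers nothing, which is the idempotence the CRDT convergence tests rely on.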
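/// Generate a unique (not cryptographically random) 32-byte source hash for tests.
fn rand_hash() -> [u8; 32] {
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};
// Process-wide counter keeps hashes distinct even when two calls share a nanosecond timestamp
static COUNTER: AtomicU64 = AtomicU64::new(0);
let nanos = SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_nanos()).unwrap_or(0);
let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
let mut hash = [0u8; 32];
hash[..16].copy_from_slice(&nanos.to_le_bytes());
hash[16..24].copy_from_slice(&seq.to_le_bytes());
hash
}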
// =============================================================================
// Availability Tests
// =============================================================================
/// Test: Read succeeds on any replica that has the shard.
///
/// Write data to one node, sync to replicas, verify read works on any replica.
#[tokio::test]
async fn test_read_any_replica() {
let mut node_a = AvailabilityNode::new(1);
let mut node_b = AvailabilityNode::new(2);
let mut node_c = AvailabilityNode::new(3);
let nodes = vec![node_a.id, node_b.id, node_c.id];
// RF=3 means all nodes are replicas for all shards
node_a.init_shards(&nodes, 4, 3);
node_b.init_shards(&nodes, 4, 3);
node_c.init_shards(&nodes, 4, 3);
// Write on node A
let subject = "test:subject";
let hash = node_a.write(subject, "predicate", 1000).await.expect("write");
// Sync to all replicas
node_b.sync_from(&node_a).await;
node_c.sync_from(&node_a).await;
// Read should succeed on all replicas
let data_a = node_a.read(subject, &hash).await;
let data_b = node_b.read(subject, &hash).await;
let data_c = node_c.read(subject, &hash).await;
assert!(data_a.is_some(), "Read should succeed on node A (writer)");
assert!(data_b.is_some(), "Read should succeed on node B (replica)");
assert!(data_c.is_some(), "Read should succeed on node C (replica)");
// Data should be identical
assert_eq!(data_a, data_b, "Data should match across replicas A and B");
assert_eq!(data_b, data_c, "Data should match across replicas B and C");
}
/// Test: Write is accepted by any replica (not just leader).
///
/// StemeDB uses leaderless replication - any replica can accept writes.
#[tokio::test]
async fn test_write_any_replica() {
let mut node_a = AvailabilityNode::new(1);
let mut node_b = AvailabilityNode::new(2);
let mut node_c = AvailabilityNode::new(3);
let nodes = vec![node_a.id, node_b.id, node_c.id];
// RF=3 means all nodes are replicas
node_a.init_shards(&nodes, 4, 3);
node_b.init_shards(&nodes, 4, 3);
node_c.init_shards(&nodes, 4, 3);
let subject = "test:subject";
// Identify who is leader and who isn't
let a_is_leader = node_a.is_leader_for(subject);
let b_is_leader = node_b.is_leader_for(subject);
// Find a non-leader node
let (non_leader_writes, non_leader_id) = if !a_is_leader {
let hash = node_a.write(subject, "from-non-leader", 1000).await;
(hash.is_some(), "A")
} else if !b_is_leader {
let hash = node_b.write(subject, "from-non-leader", 1000).await;
(hash.is_some(), "B")
} else {
let hash = node_c.write(subject, "from-non-leader", 1000).await;
(hash.is_some(), "C")
};
assert!(
non_leader_writes,
"Non-leader node {} should accept writes (leaderless replication)",
non_leader_id
);
}
/// Test: Node failure doesn't block operations on other nodes.
///
/// When one node fails, other nodes should continue serving reads and writes.
#[tokio::test]
async fn test_node_failure_isolation() {
let mut node_a = AvailabilityNode::new(1);
let mut node_b = AvailabilityNode::new(2);
let mut node_c = AvailabilityNode::new(3);
let nodes = vec![node_a.id, node_b.id, node_c.id];
node_a.init_shards(&nodes, 4, 3);
node_b.init_shards(&nodes, 4, 3);
node_c.init_shards(&nodes, 4, 3);
// Initial write on A
let subject = "test:subject";
let hash1 = node_a.write(subject, "before-failure", 1000).await.expect("write");
// Sync before failure
node_b.sync_from(&node_a).await;
node_c.sync_from(&node_a).await;
// NODE A FAILS
node_a.fail();
assert!(!node_a.is_available(), "Node A should be unavailable");
// Verify node A operations fail
let a_read = node_a.read(subject, &hash1).await;
let a_write = node_a.write(subject, "during-failure", 2000).await;
assert!(a_read.is_none(), "Read on failed node should fail");
assert!(a_write.is_none(), "Write on failed node should fail");
// BUT: B and C should continue working normally
assert!(node_b.is_available(), "Node B should still be available");
assert!(node_c.is_available(), "Node C should still be available");
// Reads still work on B and C
let b_read = node_b.read(subject, &hash1).await;
let c_read = node_c.read(subject, &hash1).await;
assert!(b_read.is_some(), "Read on node B should succeed during A failure");
assert!(c_read.is_some(), "Read on node C should succeed during A failure");
// Writes still work on B and C
let hash2 = node_b.write(subject, "during-a-failure", 2000).await;
let hash3 = node_c.write(subject, "also-during-failure", 3000).await;
assert!(hash2.is_some(), "Write on node B should succeed during A failure");
assert!(hash3.is_some(), "Write on node C should succeed during A failure");
// Sync between surviving nodes
node_b.sync_from(&node_c).await;
node_c.sync_from(&node_b).await;
// Both B and C should have all data
assert_eq!(node_b.leaves().len(), 3, "Node B should have 3 assertions");
assert_eq!(node_c.leaves().len(), 3, "Node C should have 3 assertions");
}
/// Test: Read availability with quorum.
///
/// With RF=3 and 2 nodes available, reads should succeed.
#[tokio::test]
async fn test_read_quorum_availability() {
let mut node_a = AvailabilityNode::new(1);
let mut node_b = AvailabilityNode::new(2);
let mut node_c = AvailabilityNode::new(3);
let nodes = vec![node_a.id, node_b.id, node_c.id];
node_a.init_shards(&nodes, 4, 3);
node_b.init_shards(&nodes, 4, 3);
node_c.init_shards(&nodes, 4, 3);
// Write and sync to all
let subject = "test:subject";
let hash = node_a.write(subject, "predicate", 1000).await.expect("write");
node_b.sync_from(&node_a).await;
node_c.sync_from(&node_a).await;
// Fail one node - quorum (2/3) still available
node_c.fail();
// Read should succeed on remaining nodes
let read_a = node_a.read(subject, &hash).await;
let read_b = node_b.read(subject, &hash).await;
assert!(read_a.is_some(), "Read on A should succeed with quorum available");
assert!(read_b.is_some(), "Read on B should succeed with quorum available");
}
/// Test: Write availability with quorum.
///
/// With RF=3 and 2 nodes available, writes should succeed.
#[tokio::test]
async fn test_write_quorum_availability() {
let mut node_a = AvailabilityNode::new(1);
let mut node_b = AvailabilityNode::new(2);
let mut node_c = AvailabilityNode::new(3);
let nodes = vec![node_a.id, node_b.id, node_c.id];
node_a.init_shards(&nodes, 4, 3);
node_b.init_shards(&nodes, 4, 3);
node_c.init_shards(&nodes, 4, 3);
// Fail one node
node_c.fail();
// Writes should succeed on remaining nodes
let subject = "test:subject";
let write_a = node_a.write(subject, "pred1", 1000).await;
let write_b = node_b.write(subject, "pred2", 2000).await;
assert!(write_a.is_some(), "Write on A should succeed with quorum available");
assert!(write_b.is_some(), "Write on B should succeed with quorum available");
// Sync between surviving nodes
node_a.sync_from(&node_b).await;
node_b.sync_from(&node_a).await;
// Both should have both writes
assert_eq!(node_a.leaves().len(), 2);
assert_eq!(node_b.leaves().len(), 2);
}
/// Test: All replicas eventually have identical data.
///
/// This is the core eventual consistency guarantee.
#[tokio::test]
async fn test_eventual_consistency_across_replicas() {
let mut node_a = AvailabilityNode::new(1);
let mut node_b = AvailabilityNode::new(2);
let mut node_c = AvailabilityNode::new(3);
let nodes = vec![node_a.id, node_b.id, node_c.id];
node_a.init_shards(&nodes, 4, 3);
node_b.init_shards(&nodes, 4, 3);
node_c.init_shards(&nodes, 4, 3);
// Each node writes independently
let h1 = node_a.write("s1", "p1", 1000).await.expect("write");
let h2 = node_b.write("s2", "p2", 2000).await.expect("write");
let h3 = node_c.write("s3", "p3", 3000).await.expect("write");
// Before sync: each has only its own
assert_eq!(node_a.leaves().len(), 1);
assert_eq!(node_b.leaves().len(), 1);
assert_eq!(node_c.leaves().len(), 1);
// Full mesh sync (simulating anti-entropy)
node_a.sync_from(&node_b).await;
node_a.sync_from(&node_c).await;
node_b.sync_from(&node_a).await;
node_b.sync_from(&node_c).await;
node_c.sync_from(&node_a).await;
node_c.sync_from(&node_b).await;
// After sync: all have all data
assert_eq!(node_a.leaves().len(), 3, "Node A should have all 3 assertions");
assert_eq!(node_b.leaves().len(), 3, "Node B should have all 3 assertions");
assert_eq!(node_c.leaves().len(), 3, "Node C should have all 3 assertions");
// Verify specific hashes
let a_leaves: std::collections::HashSet<_> = node_a.leaves().into_iter().collect();
let b_leaves: std::collections::HashSet<_> = node_b.leaves().into_iter().collect();
let c_leaves: std::collections::HashSet<_> = node_c.leaves().into_iter().collect();
assert!(a_leaves.contains(&h1) && a_leaves.contains(&h2) && a_leaves.contains(&h3));
assert!(b_leaves.contains(&h1) && b_leaves.contains(&h2) && b_leaves.contains(&h3));
assert!(c_leaves.contains(&h1) && c_leaves.contains(&h2) && c_leaves.contains(&h3));
// All sets should be identical
assert_eq!(a_leaves, b_leaves, "A and B should have identical data");
assert_eq!(b_leaves, c_leaves, "B and C should have identical data");
}
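The availability tests rely on the comment that RF=3 on a three-node cluster makes every node a replica of every shard, so any node can serve reads and accept writes. A minimal stand-alone model of that placement rule is sketched below. It is hypothetical: the FNV-1a subject hash, the ring-style assignment (shard `s` on `rf` consecutive nodes starting at `s`), and all function names are assumptions for illustration, not how `MetaRange::with_initial_shards` or `RangeRouter` actually assign replicas.

```rust
/// 64-bit FNV-1a: a simple, deterministic, non-cryptographic hash.
/// Assumption: stands in for whatever hash the real router uses.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Map a subject to one of `num_shards` shards (hypothetical).
fn route_subject(subject: &str, num_shards: u32) -> u32 {
    // The remainder is < num_shards, so the cast back to u32 is lossless.
    (fnv1a(subject.as_bytes()) % u64::from(num_shards)) as u32
}

/// Ring placement: shard `s` lives on `rf` consecutive nodes starting at `s`.
fn replicas_for(shard: u32, nodes: &[u64], rf: usize) -> Vec<u64> {
    // Capping rf at the node count also makes an empty node list return [].
    let rf = rf.min(nodes.len());
    (0..rf).map(|i| nodes[(shard as usize + i) % nodes.len()]).collect()
}

/// Leaderless rule: any replica of the subject's shard may accept the write.
fn can_accept_write(node: u64, subject: &str, nodes: &[u64], num_shards: u32, rf: usize) -> bool {
    replicas_for(route_subject(subject, num_shards), nodes, rf).contains(&node)
}
```

With `rf = 2` on the same three nodes, each node replicates only some shards, which is why the partition tests search for a subject that `can_accept_write` on a given node before writing.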

@@ -0,0 +1,430 @@
//! Partition tolerance tests for distributed consistency.
//!
//! These tests verify that StemeDB continues to accept writes during network
//! partitions and converges correctly after partition heals.
//!
//! # Test Strategy
//!
//! We simulate partitions by:
//! 1. Creating multiple in-process "nodes", each with its own store and membership view
//! 2. "Partitioning" = not running anti-entropy sync between the groups
//! 3. Verifying writes succeed on both sides of the partition
//! 4. "Healing" = running `sync_from` between the groups
//! 5. Verifying convergence via CRDT merge (matching canonical Merkle roots)
#![allow(clippy::unwrap_used, clippy::expect_used)]
use std::collections::{HashMap, HashSet};
use std::net::{IpAddr, Ipv4Addr, SocketAddr};
use std::sync::Arc;
use stemedb_cluster::config::SwimConfig;
use stemedb_cluster::membership::{NodeId, NodeInfo, SwimMembership};
use stemedb_cluster::sharding::{MetaRange, RangeRouter};
use stemedb_core::serde::serialize;
use stemedb_core::testing::AssertionBuilder;
use stemedb_core::types::HlcTimestamp;
use stemedb_merkle::MerkleTree;
use stemedb_storage::crdt::{AssertionTransfer, CrdtAssertionStore};
use stemedb_storage::HybridStore;
use tempfile::tempdir;
// =============================================================================
// Test Helpers
// =============================================================================
fn test_addr(port: u16) -> SocketAddr {
SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), port)
}
fn test_node_id(n: u8) -> NodeId {
NodeId::from_bytes([n; 16])
}
fn test_node_info(n: u8) -> NodeInfo {
let id = test_node_id(n);
NodeInfo::new(id, test_addr(9090 + n as u16), test_addr(8080 + n as u16))
}
/// A simulated cluster node for partition tolerance testing.
struct SimNode {
id: NodeId,
#[allow(dead_code)]
membership: Arc<SwimMembership>,
router: Arc<RangeRouter>,
#[allow(dead_code)]
store: Arc<HybridStore>,
crdt_store: Arc<CrdtAssertionStore<HybridStore>>,
merkle_tree: MerkleTree,
/// Maps hash -> (subject, data) for sync operations.
hash_to_data: HashMap<[u8; 32], (String, Vec<u8>)>,
#[allow(dead_code)]
temp_dir: tempfile::TempDir,
}
impl SimNode {
/// Create a new simulated node.
fn new(n: u8) -> Self {
let id = test_node_id(n);
let info = test_node_info(n);
let temp_dir = tempdir().expect("create temp dir");
let store = Arc::new(HybridStore::open(temp_dir.path()).expect("open store"));
let crdt_store = Arc::new(CrdtAssertionStore::new(store.clone(), *id.as_bytes()));
let membership = Arc::new(SwimMembership::new(info, SwimConfig::default()));
let router = Arc::new(RangeRouter::new(id));
Self {
id,
membership,
router,
store,
crdt_store,
merkle_tree: MerkleTree::new(),
hash_to_data: HashMap::new(),
temp_dir,
}
}
/// Initialize sharding with the given nodes.
fn init_shards(&self, nodes: &[NodeId], num_shards: u32, replication_factor: u32) {
let meta = MetaRange::with_initial_shards(num_shards, nodes, replication_factor);
self.router.update_meta_range(meta);
}
/// Add an assertion to this node (simulating a local write).
async fn write_assertion(&mut self, subject: &str, predicate: &str, hlc_time: u64) -> [u8; 32] {
let assertion = AssertionBuilder::new()
.subject(subject)
.predicate(predicate)
.hlc_timestamp(HlcTimestamp::new(hlc_time, *self.id.as_bytes()))
.source_hash(rand_hash())
.build();
let data = serialize(&assertion).expect("serialize");
let hash = self.crdt_store.put_assertion(subject, &data).await.expect("put");
self.merkle_tree.insert(hash).expect("insert");
self.hash_to_data.insert(hash, (subject.to_string(), data));
hash
}
/// Check if this node can accept a write for the given subject.
fn can_accept_write(&self, subject: &str) -> bool {
// Route the subject to a shard
let shard_id = match self.router.route_subject(subject) {
Ok(id) => id,
Err(_) => return false,
};
// Check if local node is a replica for this shard
match self.router.get_replicas(shard_id) {
Ok(replicas) => replicas.contains(&self.id),
Err(_) => false,
}
}
/// Get all leaves (assertion hashes).
fn leaves(&self) -> Vec<[u8; 32]> {
self.merkle_tree.leaves().to_vec()
}
/// Canonical Merkle root for convergence verification.
fn canonical_merkle_root(&self) -> Option<[u8; 32]> {
let mut sorted_leaves = self.merkle_tree.leaves().to_vec();
if sorted_leaves.is_empty() {
return None;
}
sorted_leaves.sort();
let mut canonical = MerkleTree::new();
for leaf in sorted_leaves {
canonical.insert(leaf).ok()?;
}
canonical.root().ok()
}
/// Sync from another node (simulating anti-entropy after partition heals).
async fn sync_from(&mut self, other: &SimNode) {
let my_leaves: HashSet<_> = self.leaves().into_iter().collect();
for hash in other.leaves() {
if !my_leaves.contains(&hash) {
if let Some((subject, data)) = other.hash_to_data.get(&hash) {
let transfer = AssertionTransfer { hash, data: data.clone() };
if self
.crdt_store
.merge_with_data(subject, std::slice::from_ref(&transfer))
.await
.is_ok()
{
self.merkle_tree.insert(hash).expect("insert");
self.hash_to_data.insert(hash, (subject.clone(), data.clone()));
}
}
}
}
}
}
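/// Generate a unique pseudo-random hash for test assertions.
///
/// Combines wall-clock nanoseconds with a process-wide counter, so two calls
/// within the same clock tick (or on a coarse-resolution clock) still differ.
fn rand_hash() -> [u8; 32] {
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};
static COUNTER: AtomicU64 = AtomicU64::new(0);
let nanos = SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_nanos()).unwrap_or(0);
let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
let mut hash = [0u8; 32];
hash[..16].copy_from_slice(&nanos.to_le_bytes());
// Counter bytes guarantee uniqueness within this test process.
hash[16..24].copy_from_slice(&seq.to_le_bytes());
hash
}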
// =============================================================================
// Partition Tolerance Tests
// =============================================================================
/// Test: Writes succeed on both sides of a partition.
///
/// Simulates a 3-node cluster partitioned into [A] and [B, C].
/// Both sides should continue accepting writes for their shards.
#[tokio::test]
async fn test_write_succeeds_during_partition() {
// Create 3 nodes
let mut node_a = SimNode::new(1);
let mut node_b = SimNode::new(2);
let node_c = SimNode::new(3);
let nodes = vec![node_a.id, node_b.id, node_c.id];
// Initialize shards: 4 shards, RF=2
// Each node will be replica for some shards
node_a.init_shards(&nodes, 4, 2);
node_b.init_shards(&nodes, 4, 2);
node_c.init_shards(&nodes, 4, 2);
// PARTITION: A is isolated from B and C
// (In this simulation, we simply don't sync between partitions)
// Find subjects that route to shards replicated on node A
let mut subject_for_a = None;
for i in 0..100 {
let subject = format!("test:subject:{i}");
if node_a.can_accept_write(&subject) {
subject_for_a = Some(subject);
break;
}
}
// Find subjects that route to shards replicated on node B
let mut subject_for_b = None;
for i in 100..200 {
let subject = format!("test:subject:{i}");
if node_b.can_accept_write(&subject) {
subject_for_b = Some(subject);
break;
}
}
let subject_a = subject_for_a.expect("should find subject for node A");
let subject_b = subject_for_b.expect("should find subject for node B");
// Both sides of partition can write
let hash_a = node_a.write_assertion(&subject_a, "predicate", 1000).await;
let hash_b = node_b.write_assertion(&subject_b, "predicate", 2000).await;
// Verify writes succeeded
assert!(!hash_a.iter().all(|&b| b == 0), "Node A write should succeed");
assert!(!hash_b.iter().all(|&b| b == 0), "Node B write should succeed");
// Each node has its own assertion
assert_eq!(node_a.leaves().len(), 1);
assert_eq!(node_b.leaves().len(), 1);
}
/// Test: Post-partition convergence.
///
/// After a partition heals, both sides should have all writes
/// via anti-entropy sync.
#[tokio::test]
async fn test_post_partition_convergence() {
let mut node_a = SimNode::new(1);
let mut node_b = SimNode::new(2);
let nodes = vec![node_a.id, node_b.id];
node_a.init_shards(&nodes, 4, 2);
node_b.init_shards(&nodes, 4, 2);
// PARTITION: A and B are isolated
// Node A writes assertion A1
let _hash_a = node_a.write_assertion("subject:a", "pred", 1000).await;
// Node B writes assertion B1
let _hash_b = node_b.write_assertion("subject:b", "pred", 2000).await;
// Before heal: each has only its own
assert_eq!(node_a.leaves().len(), 1);
assert_eq!(node_b.leaves().len(), 1);
assert_ne!(node_a.canonical_merkle_root(), node_b.canonical_merkle_root());
// PARTITION HEALS: Sync both ways
node_a.sync_from(&node_b).await;
node_b.sync_from(&node_a).await;
// After heal: both have all assertions
assert_eq!(node_a.leaves().len(), 2, "Node A should have 2 assertions after sync");
assert_eq!(node_b.leaves().len(), 2, "Node B should have 2 assertions after sync");
// Canonical roots should match
assert_eq!(
node_a.canonical_merkle_root(),
node_b.canonical_merkle_root(),
"Nodes should converge after partition heals"
);
}
/// Test: Concurrent writes to same subject from both sides of partition.
///
/// Both partitions write to the same subject. After heal:
/// - Both assertions should exist (append-only)
/// - Lens should resolve deterministically
#[tokio::test]
async fn test_concurrent_writes_both_survive() {
let mut node_a = SimNode::new(1);
let mut node_b = SimNode::new(2);
let nodes = vec![node_a.id, node_b.id];
node_a.init_shards(&nodes, 4, 2);
node_b.init_shards(&nodes, 4, 2);
// Both write to same subject during partition
let subject = "claim:earth-shape";
let hash_a = node_a.write_assertion(subject, "is:round", 1000).await;
let hash_b = node_b.write_assertion(subject, "is:spheroid", 2000).await;
// Hashes are different (different predicates, different HLC times)
assert_ne!(hash_a, hash_b);
// PARTITION HEALS
node_a.sync_from(&node_b).await;
node_b.sync_from(&node_a).await;
// Both assertions survive - append-only means no data loss
let a_leaves: HashSet<_> = node_a.leaves().into_iter().collect();
let b_leaves: HashSet<_> = node_b.leaves().into_iter().collect();
assert!(a_leaves.contains(&hash_a), "Node A should have assertion A");
assert!(a_leaves.contains(&hash_b), "Node A should have assertion B after sync");
assert!(b_leaves.contains(&hash_a), "Node B should have assertion A after sync");
assert!(b_leaves.contains(&hash_b), "Node B should have assertion B");
// Same set on both nodes
assert_eq!(a_leaves, b_leaves, "Both nodes should have identical assertion sets");
}
/// Test: Multi-partition scenario with 4 nodes.
///
/// Partition into [A, B] and [C, D]. Each partition writes.
/// After heal, all 4 nodes should converge.
#[tokio::test]
async fn test_multi_partition_convergence() {
let mut node_a = SimNode::new(1);
let mut node_b = SimNode::new(2);
let mut node_c = SimNode::new(3);
let mut node_d = SimNode::new(4);
let nodes = vec![node_a.id, node_b.id, node_c.id, node_d.id];
for node in [&mut node_a, &mut node_b, &mut node_c, &mut node_d] {
node.init_shards(&nodes, 8, 2);
}
// PARTITION: [A, B] and [C, D]
// Partition 1 writes
let _h1 = node_a.write_assertion("partition1:data", "value1", 1000).await;
node_b.sync_from(&node_a).await; // Sync within partition
// Partition 2 writes
let _h2 = node_c.write_assertion("partition2:data", "value2", 2000).await;
node_d.sync_from(&node_c).await; // Sync within partition
// Before heal: partitions have different data
assert_ne!(node_a.canonical_merkle_root(), node_c.canonical_merkle_root());
// PARTITION HEALS: Full mesh sync
node_a.sync_from(&node_c).await;
node_b.sync_from(&node_d).await;
node_c.sync_from(&node_a).await;
node_d.sync_from(&node_b).await;
// All nodes should have same canonical root
let root_a = node_a.canonical_merkle_root();
let root_b = node_b.canonical_merkle_root();
let root_c = node_c.canonical_merkle_root();
let root_d = node_d.canonical_merkle_root();
assert_eq!(root_a, root_b, "A and B should match");
assert_eq!(root_b, root_c, "B and C should match");
assert_eq!(root_c, root_d, "C and D should match");
// All should have 2 assertions
assert_eq!(node_a.leaves().len(), 2);
assert_eq!(node_b.leaves().len(), 2);
assert_eq!(node_c.leaves().len(), 2);
assert_eq!(node_d.leaves().len(), 2);
}
/// Test: Rapid writes during partition don't cause data loss.
///
/// Simulate high-frequency writes on both sides of partition,
/// then verify all writes survive after heal.
#[tokio::test]
async fn test_rapid_writes_during_partition_no_data_loss() {
let mut node_a = SimNode::new(1);
let mut node_b = SimNode::new(2);
let nodes = vec![node_a.id, node_b.id];
node_a.init_shards(&nodes, 4, 2);
node_b.init_shards(&nodes, 4, 2);
// Rapid writes on both sides
let mut hashes_a = Vec::new();
let mut hashes_b = Vec::new();
for i in 0..10 {
let subject = format!("rapid:a:{i}");
hashes_a.push(node_a.write_assertion(&subject, "pred", 1000 + i).await);
}
for i in 0..10 {
let subject = format!("rapid:b:{i}");
hashes_b.push(node_b.write_assertion(&subject, "pred", 2000 + i).await);
}
// Before heal
assert_eq!(node_a.leaves().len(), 10);
assert_eq!(node_b.leaves().len(), 10);
// PARTITION HEALS
node_a.sync_from(&node_b).await;
node_b.sync_from(&node_a).await;
// All 20 assertions should exist on both nodes
assert_eq!(node_a.leaves().len(), 20, "Node A should have all 20 assertions");
assert_eq!(node_b.leaves().len(), 20, "Node B should have all 20 assertions");
// Verify specific hashes
let a_leaves: HashSet<_> = node_a.leaves().into_iter().collect();
let b_leaves: HashSet<_> = node_b.leaves().into_iter().collect();
for hash in &hashes_a {
assert!(a_leaves.contains(hash), "Node A should have its own assertion");
assert!(b_leaves.contains(hash), "Node B should have A's assertion after sync");
}
for hash in &hashes_b {
assert!(a_leaves.contains(hash), "Node A should have B's assertion after sync");
assert!(b_leaves.contains(hash), "Node B should have its own assertion");
}
}
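`canonical_merkle_root` sorts the leaves before building a tree, so two nodes holding the same set of hashes report the same root regardless of insertion order, and `sync_from` only ever adds missing hashes (a set union). The toy sketch below shows why that combination converges: union is commutative and idempotent, and a root built over sorted leaves depends only on the set. The `Replica` type is hypothetical, and it uses std's `DefaultHasher` as a non-cryptographic stand-in for the hashing a real Merkle tree would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Toy 64-bit interior-node hash (NOT cryptographic; illustration only).
fn h2(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

/// Toy leaf hash over a 32-byte assertion hash.
fn leaf_hash(leaf: &[u8; 32]) -> u64 {
    let mut h = DefaultHasher::new();
    leaf.hash(&mut h);
    h.finish()
}

/// Hypothetical replica: a grow-only set of assertion hashes.
/// `BTreeSet` iterates in sorted order, which is what makes the root canonical.
#[derive(Clone, Default)]
struct Replica {
    leaves: BTreeSet<[u8; 32]>,
}

impl Replica {
    fn write(&mut self, hash: [u8; 32]) {
        self.leaves.insert(hash);
    }

    /// Anti-entropy pull: copy every hash we are missing (set union).
    fn sync_from(&mut self, other: &Replica) {
        self.leaves.extend(other.leaves.iter().copied());
    }

    /// Root over sorted leaves; `None` for an empty set, like the tests.
    fn canonical_root(&self) -> Option<u64> {
        let mut level: Vec<u64> = self.leaves.iter().map(leaf_hash).collect();
        if level.is_empty() {
            return None;
        }
        while level.len() > 1 {
            // Pair up nodes; an odd last node is carried up unchanged.
            level = level
                .chunks(2)
                .map(|pair| if pair.len() == 2 { h2(pair[0], pair[1]) } else { pair[0] })
                .collect();
        }
        Some(level[0])
    }
}
```

Because the root is a function of the set alone, comparing roots is a cheap convergence check, which is the role `canonical_merkle_root` plays in the tests above.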

@@ -21,8 +21,8 @@ pub fn hello_world() -> String {
mod tests {
use super::*;
use crate::types::{
Assertion, Epoch, HlcTimestamp, LifecycleStage, ObjectValue, SignatureEntry, SourceClass,
Supersession, SupersessionType, Vote,
};
use rkyv::check_archived_root;
use rkyv::ser::serializers::AllocSerializer;
@@ -55,6 +55,7 @@ mod tests {
}],
confidence: 0.95,
timestamp: 123456789,
hlc_timestamp: HlcTimestamp::default(),
vector: Some(vec![0.1, 0.2, 0.3]),
};
@@ -104,6 +105,7 @@ mod tests {
signatures: vec![],
confidence: 1.0,
timestamp: 0,
hlc_timestamp: HlcTimestamp::default(),
vector: None,
};

@@ -92,6 +92,7 @@ pub enum SerdeError {
/// signatures: vec![],
/// confidence: 1.0,
/// timestamp: 0,
/// hlc_timestamp: stemedb_core::types::HlcTimestamp::default(),
/// vector: None,
/// };
///
@@ -159,7 +160,8 @@ where
mod tests {
use super::*;
use crate::types::{
Assertion, Epoch, HlcTimestamp, LifecycleStage, ObjectValue, SignatureEntry, SourceClass,
Vote,
};
#[test]
@@ -183,6 +185,7 @@ mod tests {
}],
confidence: 0.95,
timestamp: 123456789,
hlc_timestamp: HlcTimestamp::default(),
vector: Some(vec![0.1, 0.2, 0.3]),
};
@@ -304,6 +307,7 @@ mod tests {
signatures: vec![],
confidence: 0.0,
timestamp: 0,
hlc_timestamp: HlcTimestamp::default(),
vector: None,
};
@@ -330,6 +334,7 @@ mod tests {
signatures: vec![],
confidence: 0.85,
timestamp: 1700000000,
hlc_timestamp: HlcTimestamp::default(),
vector: None,
};
@@ -356,6 +361,7 @@ mod tests {
signatures: vec![],
confidence: 1.0,
timestamp: 0,
hlc_timestamp: HlcTimestamp::default(),
vector: None,
};

@@ -8,8 +8,8 @@
//! ```
use crate::types::{
Assertion, Epoch, HlcTimestamp, LifecycleStage, ObjectValue, SignatureEntry, SourceClass,
SupersessionType, Vote,
};
/// Builder for constructing test [`Assertion`] instances.
@@ -54,6 +54,7 @@ pub struct AssertionBuilder {
agent_id: [u8; 32],
confidence: f32,
timestamp: u64,
hlc_timestamp: HlcTimestamp,
vector: Option<Vec<f32>>,
}
@@ -81,6 +82,7 @@ impl AssertionBuilder {
agent_id: [1u8; 32],
confidence: 0.9,
timestamp: 1000,
hlc_timestamp: HlcTimestamp::default(),
vector: None,
}
}
@@ -127,6 +129,15 @@ impl AssertionBuilder {
self
}
/// Set the HLC timestamp for distributed causal ordering.
///
/// This provides total ordering even with clock skew between nodes.
/// Most tests can rely on the default (HlcTimestamp::default()).
pub fn hlc_timestamp(mut self, hlc_timestamp: HlcTimestamp) -> Self {
self.hlc_timestamp = hlc_timestamp;
self
}
/// Set the lifecycle stage.
pub fn lifecycle(mut self, lifecycle: LifecycleStage) -> Self {
self.lifecycle = lifecycle;
@@ -219,6 +230,7 @@ impl AssertionBuilder {
signatures,
confidence: self.confidence,
timestamp: self.timestamp,
hlc_timestamp: self.hlc_timestamp,
vector: self.vector,
}
}
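The builder's new setter says an HLC "provides total ordering even with clock skew between nodes." How a hybrid logical clock achieves that can be sketched with the standard HLC update rules (Kulkarni et al.). This is a hypothetical stand-alone model with separate physical and logical fields; the `Assertion` docs say StemeDB's `HlcTimestamp` packs both into an NTP64 value, so the `Hlc` and `HlcClock` types and their layout here are assumptions.

```rust
/// Hypothetical HLC timestamp. Field order matters: the derived `Ord`
/// compares physical time, then the logical counter, then the node ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Hlc {
    physical: u64,
    logical: u32,
    node: u8,
}

/// Hypothetical per-node clock that remembers the last timestamp it issued.
struct HlcClock {
    last: Hlc,
}

impl HlcClock {
    fn new(node: u8) -> Self {
        Self { last: Hlc { physical: 0, logical: 0, node } }
    }

    /// Local event or send. `wall` is this node's (possibly skewed) clock.
    fn now(&mut self, wall: u64) -> Hlc {
        if wall > self.last.physical {
            // Wall clock moved forward: adopt it and reset the counter.
            self.last.physical = wall;
            self.last.logical = 0;
        } else {
            // Wall clock stalled or went backwards: keep time, bump counter.
            self.last.logical += 1;
        }
        self.last
    }

    /// Receive a remote timestamp. The result is greater than the remote
    /// timestamp and than anything this clock issued before.
    fn update(&mut self, remote: Hlc, wall: u64) -> Hlc {
        let l = self.last.physical.max(remote.physical).max(wall);
        let c = if l == self.last.physical && l == remote.physical {
            self.last.logical.max(remote.logical) + 1
        } else if l == self.last.physical {
            self.last.logical + 1
        } else if l == remote.physical {
            remote.logical + 1
        } else {
            0
        };
        self.last.physical = l;
        self.last.logical = c;
        self.last
    }
}
```

For example, if node A's wall clock reads 1000 and node B's reads 900, a message from A still moves B to physical time 1000 with a bumped logical counter, so B's next assertion sorts after A's despite the skew.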

@@ -2,7 +2,7 @@
use rkyv::{Archive, Deserialize, Serialize};
use super::{EntityId, EpochId, Hash, HlcTimestamp, PHash, RelationId};
use crate::types::{LifecycleStage, SourceClass};
/// The atomic unit of knowledge in StemeDB.
@@ -43,6 +43,14 @@ pub struct Assertion {
pub confidence: f32,
/// The timestamp when the assertion was created (Unix epoch).
pub timestamp: u64,
/// Hybrid Logical Clock timestamp for distributed causal ordering.
///
/// Provides total ordering even with clock skew between nodes:
/// 1. NTP64 time (includes physical + logical counter)
/// 2. Node ID for deterministic tiebreaker
///
/// Used by `HlcRecencyLens` for consistent "most recent" resolution.
pub hlc_timestamp: HlcTimestamp,
/// The semantic embedding vector for fuzzy recall.
pub vector: Option<Vec<f32>>,
}
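The new `hlc_timestamp` field orders assertions by NTP64 time first and node ID second, and its doc comment says `HlcRecencyLens` relies on this for "most recent" resolution. A minimal sketch of that idea follows, assuming the timestamp compares lexicographically as `(ntp64 as u64, node_id)`; the `Hlc` struct and `most_recent` function are hypothetical, not the lens's actual API.

```rust
/// Hypothetical mirror of an HLC timestamp. Deriving `Ord` compares fields in
/// declaration order: NTP64 time first, node ID only as a tiebreaker.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Hlc {
    ntp64: u64,
    node_id: [u8; 16],
}

/// Pick the value with the greatest HLC. Because the order is total, the
/// result does not depend on the order in which candidates arrived.
fn most_recent<'a>(candidates: &[(Hlc, &'a str)]) -> Option<&'a str> {
    candidates.iter().max_by_key(|(hlc, _)| *hlc).map(|(_, value)| *value)
}
```

This is what lets two partitions that wrote to the same subject (as in `test_concurrent_writes_both_survive`) agree on a winner after healing without coordinating.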

@@ -227,6 +227,8 @@ pub enum AliasOrigin {
Suggested,
/// Created during an entity merge operation.
Merged,
/// Auto-detected by conflict detection (e.g., Aphoria tail-path matching).
AutoDetected,
}
impl fmt::Display for AliasOrigin {
@@ -235,6 +237,7 @@ impl fmt::Display for AliasOrigin {
AliasOrigin::Manual => write!(f, "manual"),
AliasOrigin::Suggested => write!(f, "suggested"),
AliasOrigin::Merged => write!(f, "merged"),
AliasOrigin::AutoDetected => write!(f, "auto_detected"),
}
}
}
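The new `AutoDetected` variant renders as `auto_detected`. If these strings are persisted or sent over the wire, a parser that exactly inverts `Display` keeps the mapping in one place. The sketch uses a hypothetical local copy of the enum limited to the four variants visible in this hunk (the real enum may have more), and the `FromStr` impl is an assumption, not part of this commit.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical local copy of `AliasOrigin` (variants visible in this diff).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AliasOrigin {
    Manual,
    Suggested,
    Merged,
    AutoDetected,
}

impl fmt::Display for AliasOrigin {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            AliasOrigin::Manual => "manual",
            AliasOrigin::Suggested => "suggested",
            AliasOrigin::Merged => "merged",
            AliasOrigin::AutoDetected => "auto_detected",
        };
        f.write_str(s)
    }
}

impl FromStr for AliasOrigin {
    type Err = String;

    /// Inverse of `Display`: accepts exactly the strings `fmt` produces.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "manual" => Ok(AliasOrigin::Manual),
            "suggested" => Ok(AliasOrigin::Suggested),
            "merged" => Ok(AliasOrigin::Merged),
            "auto_detected" => Ok(AliasOrigin::AutoDetected),
            other => Err(format!("unknown alias origin: {other}")),
        }
    }
}
```

A round-trip test over every variant catches the case where a new variant is added to `Display` but forgotten in the parser.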

@@ -106,9 +106,11 @@ mod gold_standard;
mod hlc;
mod lifecycle;
mod materialized;
mod pow;
mod query;
mod source;
mod supersession;
mod trust_tier;
mod voting;
// Re-exports - Maintain backward compatibility
@@ -127,3 +129,10 @@ pub use query::{ContributingAssertion, QueryAudit, QueryParams};
pub use source::SourceClass;
pub use supersession::{Supersession, SupersessionType};
pub use voting::{TrustPack, Vote};
// Admission control types
pub use pow::{
AdmissionConfig, PowError, PowProof, POW_GRADUATED_THRESHOLD, POW_INITIAL_DIFFICULTY,
POW_INITIAL_THRESHOLD, POW_MAX_AGE_SECONDS, POW_REDUCED_DIFFICULTY,
};
pub use trust_tier::{TrustTier, BASE_QUOTA_LIMIT, TRUST_POW_EXEMPTION_THRESHOLD};
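The re-exported constants encode the graduated admission policy documented in the new `pow` module: 16 leading zero bits for an agent's first 10 assertions, 1 bit up to 50, and exemption after that or once trust exceeds 0.6. The sketch below shows how such a tier function and the solve/verify asymmetry fit together. It is hypothetical in two ways: the tier boundaries (`accepted < 10`, `accepted < 50`) are one reading of the module's prose, and it hashes with std's `DefaultHasher` (SipHash, not cryptographic) as a stand-in for BLAKE3 so the block has no dependencies. The input order (nonce, agent ID, timestamp) follows `PowProof::compute_hash`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Values copied from the constants in this diff. The trust cutoff is the 0.6
// quoted in the module docs (assumed value of TRUST_POW_EXEMPTION_THRESHOLD).
const POW_INITIAL_DIFFICULTY: u8 = 16;
const POW_REDUCED_DIFFICULTY: u8 = 1;
const POW_INITIAL_THRESHOLD: u64 = 10;
const POW_GRADUATED_THRESHOLD: u64 = 50;
const TRUST_EXEMPTION: f64 = 0.6;

/// Hypothetical tier function: `accepted` is how many assertions the agent
/// already has, so its first 10 submissions see counts 0..=9.
fn required_difficulty(accepted: u64, trust: f64) -> u8 {
    if trust > TRUST_EXEMPTION || accepted >= POW_GRADUATED_THRESHOLD {
        0
    } else if accepted < POW_INITIAL_THRESHOLD {
        POW_INITIAL_DIFFICULTY
    } else {
        POW_REDUCED_DIFFICULTY
    }
}

/// Stand-in puzzle hash over nonce || agent_id || timestamp (NOT BLAKE3).
fn puzzle_hash(nonce: u64, agent_id: &[u8; 32], timestamp: u64) -> u64 {
    let mut h = DefaultHasher::new();
    nonce.hash(&mut h);
    agent_id.hash(&mut h);
    timestamp.hash(&mut h);
    h.finish()
}

/// Verification costs a single hash.
fn verify(nonce: u64, agent_id: &[u8; 32], timestamp: u64, difficulty: u8) -> bool {
    difficulty == 0
        || puzzle_hash(nonce, agent_id, timestamp).leading_zeros() >= u32::from(difficulty)
}

/// Solving is brute force: about 2^difficulty attempts on average.
fn solve(agent_id: &[u8; 32], timestamp: u64, difficulty: u8) -> Option<u64> {
    if difficulty > 64 {
        return None; // a 64-bit stand-in hash cannot have more than 64 zero bits
    }
    (0..=u64::MAX).find(|&nonce| verify(nonce, agent_id, timestamp, difficulty))
}
```

At 16 bits a client expects roughly 65K hash attempts, while the server checks the proof with one hash, which is the asymmetry the module's security notes rely on.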

@@ -0,0 +1,466 @@
//! Proof-of-Work (PoW) for admission control.
//!
//! New agents must solve BLAKE3-based puzzles before their assertions are accepted.
//! The difficulty is graduated based on assertion count:
//!
//! - First 10 assertions: 16 bits (~65K hash attempts on average)
//! - Assertions 11-50: 1 bit (trivial)
//! - 50+ assertions OR trust > 0.6: 0 bits (exempt)
//!
//! # Puzzle Format
//!
//! The agent must find a nonce such that:
//! `BLAKE3(nonce || agent_id || timestamp)` has at least `difficulty` leading zero bits.
//!
//! # Security Properties
//!
//! - Timestamp bounds the replay window (max age: 5 minutes)
//! - Agent ID binds the proof to a specific identity
//! - BLAKE3 is preimage-resistant, so a valid nonce can only be found by search
//! - Asymmetric cost: ~2^difficulty hashes expected to solve, one hash to verify
use thiserror::Error;
/// Maximum age of a PoW proof in seconds (5 minutes).
pub const POW_MAX_AGE_SECONDS: u64 = 300;
/// Default difficulty for first 10 assertions (16 bits).
pub const POW_INITIAL_DIFFICULTY: u8 = 16;
/// Reduced difficulty for assertions 11-50 (1 bit).
pub const POW_REDUCED_DIFFICULTY: u8 = 1;
/// Threshold for initial difficulty (first 10 assertions).
pub const POW_INITIAL_THRESHOLD: u64 = 10;
/// Threshold for graduation (50 assertions = exempt).
pub const POW_GRADUATED_THRESHOLD: u64 = 50;
/// Errors that can occur during PoW verification.
#[derive(Debug, Clone, Error, PartialEq, Eq)]
pub enum PowError {
/// The proof timestamp is too old.
#[error("Proof timestamp expired (max age: {max_age}s, actual age: {actual_age}s)")]
TimestampExpired {
/// Maximum allowed age in seconds.
max_age: u64,
/// Actual age of the proof in seconds.
actual_age: u64,
},
/// The proof timestamp is in the future.
#[error(
"Proof timestamp is in the future (server time: {server_time}, proof time: {proof_time})"
)]
TimestampInFuture {
/// Server's current time.
server_time: u64,
/// Proof's timestamp.
proof_time: u64,
},
/// The proof does not meet the required difficulty.
#[error("Insufficient difficulty (required: {required} leading zeros, found: {found})")]
InsufficientDifficulty {
/// Required number of leading zero bits.
required: u8,
/// Actual number of leading zero bits.
found: u8,
},
/// The agent ID in the proof does not match the request.
#[error("Agent ID mismatch")]
AgentIdMismatch,
}
/// A proof-of-work solution for admission control.
///
/// The proof demonstrates computational effort by finding a nonce that
/// produces a BLAKE3 hash with the required number of leading zero bits.
///
/// # Wire Format
///
/// When transmitted over HTTP, the proof is split into headers:
/// - `X-PoW-Nonce`: The nonce as a decimal string
/// - `X-PoW-Timestamp`: Unix timestamp as a decimal string
/// - `X-Agent-Id`: Agent's Ed25519 public key (hex, existing header)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PowProof {
/// The nonce value found by the agent.
pub nonce: u64,
/// The agent's Ed25519 public key.
pub agent_id: [u8; 32],
/// Unix timestamp when the proof was generated.
pub timestamp: u64,
}
impl PowProof {
/// Create a new PoW proof.
#[must_use]
pub fn new(nonce: u64, agent_id: [u8; 32], timestamp: u64) -> Self {
Self { nonce, agent_id, timestamp }
}
/// Compute the BLAKE3 hash of this proof.
///
/// Hash input: `nonce (8 bytes LE) || agent_id (32 bytes) || timestamp (8 bytes LE)`
#[must_use]
pub fn compute_hash(&self) -> blake3::Hash {
let mut hasher = blake3::Hasher::new();
hasher.update(&self.nonce.to_le_bytes());
hasher.update(&self.agent_id);
hasher.update(&self.timestamp.to_le_bytes());
hasher.finalize()
}
/// Count the number of leading zero bits in a hash.
///
/// # Example
/// ```
/// use stemedb_core::types::PowProof;
///
/// let hash = blake3::hash(b"test");
/// let zeros = PowProof::leading_zeros(&hash);
/// // Will vary based on hash value
/// ```
#[must_use]
pub fn leading_zeros(hash: &blake3::Hash) -> u8 {
let bytes = hash.as_bytes();
let mut count: u8 = 0;
for byte in bytes {
if *byte == 0 {
count = count.saturating_add(8);
} else {
count = count.saturating_add(byte.leading_zeros() as u8);
break;
}
}
count
}
/// Verify this proof against the required difficulty.
///
/// # Arguments
/// * `difficulty` - Required number of leading zero bits
/// * `max_age` - Maximum allowed age in seconds
/// * `server_time` - Server's current Unix timestamp
///
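/// # Returns
/// `Ok(())` if the proof is valid, or an error describing the failure.
/// Timestamps up to 30 seconds ahead of `server_time` are accepted to
/// tolerate clock skew. Difficulty 0 skips only the hash check, not the
/// timestamp checks.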
///
/// # Example
/// ```
/// use stemedb_core::types::{PowProof, PowError, POW_MAX_AGE_SECONDS};
///
/// let agent_id = [0u8; 32];
/// let now = 1700000000u64;
/// let proof = PowProof::new(12345, agent_id, now);
///
/// // Verification with difficulty 0 always passes (if timestamp is valid)
/// let result = proof.verify(0, POW_MAX_AGE_SECONDS, now);
/// assert!(result.is_ok());
/// ```
pub fn verify(&self, difficulty: u8, max_age: u64, server_time: u64) -> Result<(), PowError> {
// Check timestamp is not in the future (with small tolerance for clock skew)
const CLOCK_SKEW_TOLERANCE: u64 = 30; // 30 seconds
if self.timestamp > server_time.saturating_add(CLOCK_SKEW_TOLERANCE) {
return Err(PowError::TimestampInFuture { server_time, proof_time: self.timestamp });
}
// Check timestamp is not too old
let age = server_time.saturating_sub(self.timestamp);
if age > max_age {
return Err(PowError::TimestampExpired { max_age, actual_age: age });
}
// For difficulty 0, no hash check needed (exempt)
if difficulty == 0 {
return Ok(());
}
// Compute hash and check leading zeros
let hash = self.compute_hash();
let zeros = Self::leading_zeros(&hash);
if zeros < difficulty {
return Err(PowError::InsufficientDifficulty { required: difficulty, found: zeros });
}
Ok(())
}
/// Solve a PoW puzzle by brute-force search.
///
/// This is a convenience method for clients to find a valid nonce.
/// It iterates from 0 until finding a nonce that satisfies the difficulty.
///
/// # Arguments
/// * `agent_id` - The agent's Ed25519 public key
/// * `timestamp` - Unix timestamp for the proof
/// * `difficulty` - Required number of leading zero bits
///
/// # Returns
/// A valid `PowProof` with the found nonce.
///
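/// # Performance
/// Never panics. Expected work doubles with each bit of difficulty: about
/// `2^difficulty` hash attempts on average (65,536 for 16 bits), which in
/// practice completes in seconds or less. An unreachably high difficulty
/// makes the search effectively unbounded.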
#[must_use]
pub fn solve(agent_id: [u8; 32], timestamp: u64, difficulty: u8) -> Self {
// Difficulty 0 means exempt - any nonce works
if difficulty == 0 {
return Self::new(0, agent_id, timestamp);
}
for nonce in 0..=u64::MAX {
let proof = Self::new(nonce, agent_id, timestamp);
let hash = proof.compute_hash();
if Self::leading_zeros(&hash) >= difficulty {
return proof;
}
}
// Unreachable in practice: it would require exhausting all 2^64 nonces.
// If it were reached, this nonce-0 proof (already tried above) fails `verify`.
Self::new(0, agent_id, timestamp)
}
}
/// Configuration for admission control PoW requirements.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AdmissionConfig {
/// Difficulty for first `initial_threshold` assertions.
pub initial_difficulty: u8,
/// Number of assertions requiring initial difficulty.
pub initial_threshold: u64,
/// Difficulty for assertions between initial and graduated thresholds.
pub reduced_difficulty: u8,
/// Assertion count at or above which PoW is exempt.
pub graduated_threshold: u64,
/// Trust score at or above which PoW is exempt.
pub trust_exemption_score: f32,
/// Maximum age of PoW proofs in seconds.
pub pow_max_age: u64,
}
impl Default for AdmissionConfig {
fn default() -> Self {
Self {
initial_difficulty: POW_INITIAL_DIFFICULTY,
initial_threshold: POW_INITIAL_THRESHOLD,
reduced_difficulty: POW_REDUCED_DIFFICULTY,
graduated_threshold: POW_GRADUATED_THRESHOLD,
trust_exemption_score: super::trust_tier::TRUST_POW_EXEMPTION_THRESHOLD,
pow_max_age: POW_MAX_AGE_SECONDS,
}
}
}
impl AdmissionConfig {
/// Compute the required difficulty for an agent.
///
/// # Arguments
/// * `assertion_count` - Number of assertions the agent has made
/// * `trust_score` - Agent's current trust score
///
/// # Returns
/// Required difficulty in bits (0 = exempt).
#[must_use]
pub fn compute_difficulty(&self, assertion_count: u64, trust_score: f32) -> u8 {
// Trust-based exemption
if trust_score >= self.trust_exemption_score {
return 0;
}
// Assertion-count-based graduation
if assertion_count >= self.graduated_threshold {
return 0;
}
if assertion_count >= self.initial_threshold {
return self.reduced_difficulty;
}
self.initial_difficulty
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_leading_zeros_all_zero() {
let hash = blake3::Hash::from([0u8; 32]);
assert_eq!(PowProof::leading_zeros(&hash), 255); // Saturates at 255
}
#[test]
fn test_leading_zeros_first_byte_nonzero() {
let mut bytes = [0u8; 32];
bytes[0] = 0x80; // 10000000 binary = 0 leading zeros in first byte
let hash = blake3::Hash::from(bytes);
assert_eq!(PowProof::leading_zeros(&hash), 0);
bytes[0] = 0x40; // 01000000 binary = 1 leading zero
let hash = blake3::Hash::from(bytes);
assert_eq!(PowProof::leading_zeros(&hash), 1);
bytes[0] = 0x01; // 00000001 binary = 7 leading zeros
let hash = blake3::Hash::from(bytes);
assert_eq!(PowProof::leading_zeros(&hash), 7);
}
#[test]
fn test_leading_zeros_second_byte() {
let mut bytes = [0u8; 32];
bytes[0] = 0x00;
bytes[1] = 0x80;
let hash = blake3::Hash::from(bytes);
assert_eq!(PowProof::leading_zeros(&hash), 8); // 8 from first byte + 0 from second
}
#[test]
fn test_verify_expired_timestamp() {
let agent_id = [0u8; 32];
let proof = PowProof::new(0, agent_id, 1000);
let result = proof.verify(0, 300, 2000); // 1000 seconds old, max 300
assert!(matches!(
result,
Err(PowError::TimestampExpired { max_age: 300, actual_age: 1000 })
));
}
#[test]
fn test_verify_future_timestamp() {
let agent_id = [0u8; 32];
let proof = PowProof::new(0, agent_id, 2000);
let result = proof.verify(0, 300, 1000); // Proof claims timestamp 2000, server at 1000
assert!(matches!(
result,
Err(PowError::TimestampInFuture { server_time: 1000, proof_time: 2000 })
));
}
#[test]
fn test_verify_difficulty_zero_passes() {
let agent_id = [0u8; 32];
let now = 1700000000u64;
let proof = PowProof::new(12345, agent_id, now);
let result = proof.verify(0, POW_MAX_AGE_SECONDS, now);
assert!(result.is_ok());
}
#[test]
fn test_verify_insufficient_difficulty() {
let agent_id = [0u8; 32];
let now = 1700000000u64;
// Arbitrary fixed nonce; only about 1 in 65,536 hashes has 16+ leading zeros
let proof = PowProof::new(12345, agent_id, now);
let result = proof.verify(16, POW_MAX_AGE_SECONDS, now);
assert!(matches!(result, Err(PowError::InsufficientDifficulty { required: 16, .. })));
}
#[test]
fn test_solve_difficulty_zero() {
let agent_id = [1u8; 32];
let timestamp = 1700000000u64;
let proof = PowProof::solve(agent_id, timestamp, 0);
assert_eq!(proof.nonce, 0);
assert_eq!(proof.agent_id, agent_id);
assert_eq!(proof.timestamp, timestamp);
}
#[test]
fn test_solve_low_difficulty() {
let agent_id = [2u8; 32];
let timestamp = 1700000000u64;
let proof = PowProof::solve(agent_id, timestamp, 4);
// Verify the solution works
let result = proof.verify(4, POW_MAX_AGE_SECONDS, timestamp);
assert!(result.is_ok());
}
#[test]
fn test_admission_config_default() {
let config = AdmissionConfig::default();
assert_eq!(config.initial_difficulty, 16);
assert_eq!(config.initial_threshold, 10);
assert_eq!(config.reduced_difficulty, 1);
assert_eq!(config.graduated_threshold, 50);
assert!((config.trust_exemption_score - 0.6).abs() < f32::EPSILON);
assert_eq!(config.pow_max_age, 300);
}
#[test]
fn test_compute_difficulty_by_assertion_count() {
let config = AdmissionConfig::default();
// First 10: high difficulty
assert_eq!(config.compute_difficulty(0, 0.3), 16);
assert_eq!(config.compute_difficulty(5, 0.3), 16);
assert_eq!(config.compute_difficulty(9, 0.3), 16);
// 10-49: reduced difficulty
assert_eq!(config.compute_difficulty(10, 0.3), 1);
assert_eq!(config.compute_difficulty(25, 0.3), 1);
assert_eq!(config.compute_difficulty(49, 0.3), 1);
// 50+: exempt
assert_eq!(config.compute_difficulty(50, 0.3), 0);
assert_eq!(config.compute_difficulty(100, 0.3), 0);
}
#[test]
fn test_compute_difficulty_by_trust() {
let config = AdmissionConfig::default();
// Low trust, few assertions: high difficulty
assert_eq!(config.compute_difficulty(0, 0.3), 16);
// High trust: exempt regardless of assertion count
assert_eq!(config.compute_difficulty(0, 0.6), 0);
assert_eq!(config.compute_difficulty(5, 0.7), 0);
assert_eq!(config.compute_difficulty(100, 0.9), 0);
}
#[test]
fn test_hash_consistency() {
let proof1 = PowProof::new(123, [0xAB; 32], 1700000000);
let proof2 = PowProof::new(123, [0xAB; 32], 1700000000);
assert_eq!(proof1.compute_hash(), proof2.compute_hash());
}
#[test]
fn test_hash_changes_with_nonce() {
let proof1 = PowProof::new(123, [0xAB; 32], 1700000000);
let proof2 = PowProof::new(124, [0xAB; 32], 1700000000);
assert_ne!(proof1.compute_hash(), proof2.compute_hash());
}
#[test]
fn test_hash_changes_with_agent_id() {
let proof1 = PowProof::new(123, [0xAB; 32], 1700000000);
let proof2 = PowProof::new(123, [0xCD; 32], 1700000000);
assert_ne!(proof1.compute_hash(), proof2.compute_hash());
}
#[test]
fn test_hash_changes_with_timestamp() {
let proof1 = PowProof::new(123, [0xAB; 32], 1700000000);
let proof2 = PowProof::new(123, [0xAB; 32], 1700000001);
assert_ne!(proof1.compute_hash(), proof2.compute_hash());
}
}
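A dependency-free sketch of the solve/verify loop in the PoW file above, assuming only the Rust standard library. To stay runnable without the `blake3` crate, it swaps BLAKE3 for std's `DefaultHasher` (SipHash with 64-bit output, not cryptographic). Hash values and nonces will therefore not match the real `PowProof`. The preimage layout (nonce LE || agent_id || timestamp LE) and the saturating leading-zero count follow the code above. The helper names `leading_zero_bits`, `preimage`, `toy_hash`, `solve`, and `verify_bits` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Leading zero bits across a byte slice, saturating at 255
/// (same counting rule as `PowProof::leading_zeros`).
fn leading_zero_bits(bytes: &[u8]) -> u8 {
    let mut count: u8 = 0;
    for &byte in bytes {
        if byte == 0 {
            count = count.saturating_add(8);
        } else {
            // For a nonzero byte, leading_zeros() is in 0..=7, so the cast is lossless.
            count = count.saturating_add(byte.leading_zeros() as u8);
            break;
        }
    }
    count
}

/// 48-byte preimage: nonce (8 bytes LE) || agent_id (32 bytes) || timestamp (8 bytes LE).
fn preimage(nonce: u64, agent_id: &[u8; 32], timestamp: u64) -> [u8; 48] {
    let mut buf = [0u8; 48];
    buf[..8].copy_from_slice(&nonce.to_le_bytes());
    buf[8..40].copy_from_slice(agent_id);
    buf[40..].copy_from_slice(&timestamp.to_le_bytes());
    buf
}

/// Stand-in for BLAKE3 (assumption): SipHash via DefaultHasher, NOT cryptographic.
/// Big-endian bytes so the "leading" bits are the most significant ones.
fn toy_hash(data: &[u8]) -> [u8; 8] {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish().to_be_bytes()
}

/// Brute-force search over nonces in increasing order, like `PowProof::solve`.
fn solve(agent_id: &[u8; 32], timestamp: u64, difficulty: u8) -> Option<u64> {
    if difficulty == 0 {
        return Some(0); // exempt: any nonce is accepted
    }
    (0..=u64::MAX).find(|&nonce| {
        leading_zero_bits(&toy_hash(&preimage(nonce, agent_id, timestamp))) >= difficulty
    })
}

/// Hash check only; the real `verify` also enforces proof age and clock skew.
fn verify_bits(nonce: u64, agent_id: &[u8; 32], timestamp: u64, difficulty: u8) -> bool {
    difficulty == 0
        || leading_zero_bits(&toy_hash(&preimage(nonce, agent_id, timestamp))) >= difficulty
}

fn main() {
    let agent_id = [2u8; 32];
    let timestamp = 1_700_000_000;
    let nonce = solve(&agent_id, timestamp, 8).expect("8 bits is reachable");
    assert!(verify_bits(nonce, &agent_id, timestamp, 8));
    // Expected work grows as 2^difficulty: 256 attempts for 8 bits, 65,536 for 16.
    println!("nonce = {nonce}, expected attempts at 8 bits = {}", 1u64 << 8);
}
```

Because each extra bit halves the chance that a single hash qualifies, the gap between the initial (16-bit) and reduced (1-bit) difficulties is roughly a 32,768x difference in expected client work.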

@@ -0,0 +1,254 @@
//! Trust tiers for graduated admission control.
//!
//! Trust tiers map an agent's reputation score (0.0-1.0) to specific quotas
//! and proof-of-work requirements. This graduated system allows new agents
//! to earn trust over time while protecting the system from spam and Sybil attacks.
//!
//! # Tier Boundaries
//!
//! | Trust Range | Tier      | Quota Multiplier   | PoW Required |
//! |-------------|-----------|--------------------|--------------|
//! | 0.0-0.3     | Untrusted | 0.1x (1,000/hr)    | Yes          |
//! | 0.3-0.5     | Limited   | 0.5x (5,000/hr)    | Yes          |
//! | 0.5-0.7     | Verified  | 1.0x (10,000/hr)   | No           |
//! | 0.7-0.9     | Trusted   | 2.0x (20,000/hr)   | No           |
//! | 0.9-1.0     | Authority | 10.0x (100,000/hr) | No           |
//!
//! Each range includes its lower bound and excludes its upper bound, except
//! `Authority`, which includes 1.0. Out-of-range scores are clamped first.
/// Base quota limit per hour (10,000 tokens).
/// Tiers apply multipliers to this base.
pub const BASE_QUOTA_LIMIT: u64 = 10_000;
/// Trust score threshold above which PoW is exempt.
/// Agents with trust >= this value skip proof-of-work regardless of tier.
pub const TRUST_POW_EXEMPTION_THRESHOLD: f32 = 0.6;
/// Trust tier classification based on reputation score.
///
/// Each tier determines:
/// - Quota multiplier: How many tokens per hour the agent can use
/// - PoW requirement: Whether proof-of-work is needed for submissions
///
/// New agents start with a default score of 0.5, which maps to the `Verified`
/// tier rather than `Untrusted`. An agent's score improves as its assertions
/// are verified as accurate against gold standards.
#[derive(
Debug, Clone, Copy, PartialEq, Eq, Hash, rkyv::Archive, rkyv::Deserialize, rkyv::Serialize,
)]
#[archive(check_bytes)]
pub enum TrustTier {
/// Untrusted tier: 0.0-0.3 trust score.
/// 0.1x quota multiplier (1,000 tokens/hr), PoW required.
Untrusted,
/// Limited tier: 0.3-0.5 trust score.
/// 0.5x quota multiplier (5,000 tokens/hr), PoW required.
Limited,
/// Verified tier: 0.5-0.7 trust score.
/// 1.0x quota multiplier (10,000 tokens/hr), PoW exempt.
Verified,
/// Trusted tier: 0.7-0.9 trust score.
/// 2.0x quota multiplier (20,000 tokens/hr), PoW exempt.
Trusted,
/// Authority tier: 0.9-1.0 trust score.
/// 10.0x quota multiplier (100,000 tokens/hr), PoW exempt.
Authority,
}
impl TrustTier {
/// Determine trust tier from a reputation score.
///
/// # Arguments
/// * `score` - Trust score in range [0.0, 1.0]
///
/// # Returns
/// The appropriate tier for the given score.
///
/// # Example
/// ```
/// use stemedb_core::types::TrustTier;
///
/// assert_eq!(TrustTier::from_score(0.1), TrustTier::Untrusted);
/// assert_eq!(TrustTier::from_score(0.5), TrustTier::Verified);
/// assert_eq!(TrustTier::from_score(0.95), TrustTier::Authority);
/// ```
#[must_use]
pub fn from_score(score: f32) -> Self {
// Clamp to [0.0, 1.0]. A NaN score stays NaN, fails every comparison
// below, and therefore maps to Untrusted.
let score = score.clamp(0.0, 1.0);
if score >= 0.9 {
TrustTier::Authority
} else if score >= 0.7 {
TrustTier::Trusted
} else if score >= 0.5 {
TrustTier::Verified
} else if score >= 0.3 {
TrustTier::Limited
} else {
TrustTier::Untrusted
}
}
/// Get the quota multiplier for this tier.
///
/// The multiplier is applied to the base quota (10,000 tokens/hr):
/// - Untrusted: 0.1x = 1,000/hr
/// - Limited: 0.5x = 5,000/hr
/// - Verified: 1.0x = 10,000/hr
/// - Trusted: 2.0x = 20,000/hr
/// - Authority: 10.0x = 100,000/hr
#[must_use]
pub fn quota_multiplier(&self) -> f32 {
match self {
TrustTier::Untrusted => 0.1,
TrustTier::Limited => 0.5,
TrustTier::Verified => 1.0,
TrustTier::Trusted => 2.0,
TrustTier::Authority => 10.0,
}
}
/// Get the effective quota limit for this tier.
///
/// # Returns
/// The per-hour quota limit (base * multiplier).
#[must_use]
pub fn effective_quota_limit(&self) -> u64 {
(BASE_QUOTA_LIMIT as f32 * self.quota_multiplier()) as u64
}
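/// Check if this tier requires proof-of-work.
///
/// Only `Untrusted` and `Limited` tiers require PoW at the tier level.
/// Note: the admission check in `AdmissionConfig::compute_difficulty` is
/// score- and count-based. With the defaults, agents with trust at or above
/// `TRUST_POW_EXEMPTION_THRESHOLD` (0.6) or with 50+ assertions are exempt
/// regardless of tier.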
#[must_use]
pub fn requires_pow(&self) -> bool {
matches!(self, TrustTier::Untrusted | TrustTier::Limited)
}
/// Get human-readable name for this tier.
#[must_use]
pub fn name(&self) -> &'static str {
match self {
TrustTier::Untrusted => "Untrusted",
TrustTier::Limited => "Limited",
TrustTier::Verified => "Verified",
TrustTier::Trusted => "Trusted",
TrustTier::Authority => "Authority",
}
}
/// Get the trust score lower bound for this tier.
#[must_use]
pub fn min_score(&self) -> f32 {
match self {
TrustTier::Untrusted => 0.0,
TrustTier::Limited => 0.3,
TrustTier::Verified => 0.5,
TrustTier::Trusted => 0.7,
TrustTier::Authority => 0.9,
}
}
/// Get the trust score upper bound for this tier (exclusive, except for
/// `Authority`, whose range includes 1.0).
#[must_use]
pub fn max_score(&self) -> f32 {
match self {
TrustTier::Untrusted => 0.3,
TrustTier::Limited => 0.5,
TrustTier::Verified => 0.7,
TrustTier::Trusted => 0.9,
TrustTier::Authority => 1.0,
}
}
}
impl std::fmt::Display for TrustTier {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", self.name())
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_from_score_boundaries() {
// Untrusted: 0.0-0.3
assert_eq!(TrustTier::from_score(0.0), TrustTier::Untrusted);
assert_eq!(TrustTier::from_score(0.1), TrustTier::Untrusted);
assert_eq!(TrustTier::from_score(0.29), TrustTier::Untrusted);
// Limited: 0.3-0.5
assert_eq!(TrustTier::from_score(0.3), TrustTier::Limited);
assert_eq!(TrustTier::from_score(0.4), TrustTier::Limited);
assert_eq!(TrustTier::from_score(0.49), TrustTier::Limited);
// Verified: 0.5-0.7
assert_eq!(TrustTier::from_score(0.5), TrustTier::Verified);
assert_eq!(TrustTier::from_score(0.6), TrustTier::Verified);
assert_eq!(TrustTier::from_score(0.69), TrustTier::Verified);
// Trusted: 0.7-0.9
assert_eq!(TrustTier::from_score(0.7), TrustTier::Trusted);
assert_eq!(TrustTier::from_score(0.8), TrustTier::Trusted);
assert_eq!(TrustTier::from_score(0.89), TrustTier::Trusted);
// Authority: 0.9-1.0
assert_eq!(TrustTier::from_score(0.9), TrustTier::Authority);
assert_eq!(TrustTier::from_score(0.95), TrustTier::Authority);
assert_eq!(TrustTier::from_score(1.0), TrustTier::Authority);
}
#[test]
fn test_from_score_clamping() {
// Out of range values should be clamped
assert_eq!(TrustTier::from_score(-0.5), TrustTier::Untrusted);
assert_eq!(TrustTier::from_score(1.5), TrustTier::Authority);
}
#[test]
fn test_quota_multipliers() {
assert!((TrustTier::Untrusted.quota_multiplier() - 0.1).abs() < f32::EPSILON);
assert!((TrustTier::Limited.quota_multiplier() - 0.5).abs() < f32::EPSILON);
assert!((TrustTier::Verified.quota_multiplier() - 1.0).abs() < f32::EPSILON);
assert!((TrustTier::Trusted.quota_multiplier() - 2.0).abs() < f32::EPSILON);
assert!((TrustTier::Authority.quota_multiplier() - 10.0).abs() < f32::EPSILON);
}
#[test]
fn test_effective_quota_limits() {
assert_eq!(TrustTier::Untrusted.effective_quota_limit(), 1_000);
assert_eq!(TrustTier::Limited.effective_quota_limit(), 5_000);
assert_eq!(TrustTier::Verified.effective_quota_limit(), 10_000);
assert_eq!(TrustTier::Trusted.effective_quota_limit(), 20_000);
assert_eq!(TrustTier::Authority.effective_quota_limit(), 100_000);
}
#[test]
fn test_requires_pow() {
assert!(TrustTier::Untrusted.requires_pow());
assert!(TrustTier::Limited.requires_pow());
assert!(!TrustTier::Verified.requires_pow());
assert!(!TrustTier::Trusted.requires_pow());
assert!(!TrustTier::Authority.requires_pow());
}
#[test]
fn test_score_ranges() {
// Each tier's upper bound must not exceed the next tier's lower bound (no overlap)
assert!(TrustTier::Untrusted.max_score() <= TrustTier::Limited.min_score() + f32::EPSILON);
assert!(TrustTier::Limited.max_score() <= TrustTier::Verified.min_score() + f32::EPSILON);
assert!(TrustTier::Verified.max_score() <= TrustTier::Trusted.min_score() + f32::EPSILON);
assert!(TrustTier::Trusted.max_score() <= TrustTier::Authority.min_score() + f32::EPSILON);
}
#[test]
fn test_display() {
assert_eq!(format!("{}", TrustTier::Untrusted), "Untrusted");
assert_eq!(format!("{}", TrustTier::Authority), "Authority");
}
}
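A std-only sketch, with hypothetical names, that restates `TrustTier::from_score` and the default `AdmissionConfig::compute_difficulty`. The defaults are 16 / 10 / 1 / 50 / 0.6, the values asserted in `test_admission_config_default`.

The sketch shows that the tier table and the admission check can disagree. A score in [0.5, 0.6) falls in the `Verified` tier, whose `requires_pow()` is false. Yet a new agent with that score is still asked for 16 bits of PoW. The quota uses integer arithmetic instead of the `f32` multiplier, which is an illustrative choice rather than the module's code.

```rust
/// Local mirror of `TrustTier` (hypothetical copy for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Untrusted,
    Limited,
    Verified,
    Trusted,
    Authority,
}

/// Same boundaries as `TrustTier::from_score`; lower bounds are inclusive.
fn tier_from_score(score: f32) -> Tier {
    // f32::clamp keeps NaN as NaN; NaN fails every `>=` below and lands in Untrusted.
    let s = score.clamp(0.0, 1.0);
    if s >= 0.9 {
        Tier::Authority
    } else if s >= 0.7 {
        Tier::Trusted
    } else if s >= 0.5 {
        Tier::Verified
    } else if s >= 0.3 {
        Tier::Limited
    } else {
        Tier::Untrusted
    }
}

/// Tier-level PoW rule, as in `TrustTier::requires_pow`.
fn tier_requires_pow(tier: Tier) -> bool {
    matches!(tier, Tier::Untrusted | Tier::Limited)
}

/// `AdmissionConfig::compute_difficulty` with its default values hard-coded.
fn default_difficulty(assertion_count: u64, trust_score: f32) -> u8 {
    if trust_score >= 0.6 || assertion_count >= 50 {
        0 // exempt by trust score or by graduation
    } else if assertion_count >= 10 {
        1 // reduced difficulty
    } else {
        16 // initial difficulty
    }
}

/// Hourly quota with integer math (base 10,000) instead of an f32 multiplier.
fn quota_per_hour(tier: Tier) -> u64 {
    const BASE: u64 = 10_000;
    match tier {
        Tier::Untrusted => BASE / 10,
        Tier::Limited => BASE / 2,
        Tier::Verified => BASE,
        Tier::Trusted => BASE * 2,
        Tier::Authority => BASE * 10,
    }
}

fn main() {
    let score = 0.55;
    let tier = tier_from_score(score);
    // The Verified tier says "no PoW", but the admission check still demands 16 bits.
    println!(
        "{tier:?}: tier requires PoW = {}, admission difficulty = {}, quota = {}/hr",
        tier_requires_pow(tier),
        default_difficulty(0, score),
        quota_per_hour(tier)
    );
}
```

Integer division avoids relying on `10_000 as f32 * 0.1` happening to round back to exactly 1,000. That rounding does hold for the current constants, but it is easy to break by changing the base or a multiplier.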

@@ -15,7 +15,7 @@ use rand::rngs::OsRng;
 use std::sync::Arc;
 use stemedb_core::testing::{self, AssertionBuilder};
 use stemedb_core::types::{
-Assertion, Epoch, LifecycleStage, ObjectValue, SignatureEntry, SourceClass, Vote,
+Assertion, Epoch, HlcTimestamp, LifecycleStage, ObjectValue, SignatureEntry, SourceClass, Vote,
 };
 use stemedb_storage::HybridStore;
 use stemedb_wal::Journal;

@@ -34,6 +34,7 @@ async fn test_rejects_invalid_signature() {
 }],
 confidence: 0.95,
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -86,6 +87,7 @@ async fn test_rejects_unsigned_assertion() {
 signatures: vec![], // No signatures!
 confidence: 0.95,
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -153,6 +155,7 @@ async fn test_multisig_all_must_be_valid() {
 ],
 confidence: 0.95,
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };

@@ -38,6 +38,7 @@ async fn test_rejects_high_confidence() {
 }],
 confidence: 1.5, // Invalid: > 1.0
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -94,6 +95,7 @@ async fn test_rejects_negative_confidence() {
 }],
 confidence: -0.5, // Invalid: < 0.0
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -220,6 +222,7 @@ async fn test_rejects_oversized_subject() {
 }],
 confidence: 0.9,
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -279,6 +282,7 @@ async fn test_rejects_oversized_predicate() {
 }],
 confidence: 0.9,
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -340,6 +344,7 @@ async fn test_accepts_exact_max_subject_length() {
 }],
 confidence: 0.9,
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -397,6 +402,7 @@ async fn test_accepts_exact_max_predicate_length() {
 }],
 confidence: 0.9,
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -449,6 +455,7 @@ async fn test_rejects_nan_confidence() {
 }],
 confidence: f32::NAN,
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };

@@ -38,6 +38,7 @@ async fn test_rejects_infinite_confidence() {
 }],
 confidence: f32::INFINITY,
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -180,6 +181,7 @@ async fn test_rejects_future_timestamp() {
 }],
 confidence: 0.9,
 timestamp: future_timestamp,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -244,6 +246,7 @@ async fn test_accepts_near_future_timestamp() {
 }],
 confidence: 0.9,
 timestamp: near_future_timestamp,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -293,6 +296,7 @@ async fn test_accepts_zero_confidence() {
 }],
 confidence: 0.0, // Valid: boundary case
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
@@ -342,6 +346,7 @@ async fn test_accepts_one_confidence() {
 }],
 confidence: 1.0, // Valid: boundary case
 timestamp: 1000,
+hlc_timestamp: HlcTimestamp::default(),
 vector: None,
 };
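The EigenTrust authority lens that follows ranks candidates by `confidence × eigentrust_score × domain_factor`, where `domain_factor = 0.5 + 0.5 × domain_score`. It breaks ties in favour of the newer timestamp.

Below is a minimal std-only sketch of that ranking. It assumes the scores have already been looked up; the real lens fetches them from `TrustGraphStore` and `DomainTrustStore`. The `Candidate` struct, the helper names, and the numeric scores are hypothetical.

```rust
use std::cmp::Ordering;

/// Hypothetical pre-fetched inputs for one candidate assertion.
#[derive(Debug, Clone)]
struct Candidate {
    id: &'static str,
    confidence: f32,
    eigentrust: f32,   // global trust; ~0.0 for agents not reachable from seeds
    domain_score: f32, // assumed to lie in [0.0, 1.0]
    timestamp: u64,
}

/// domain_factor = 0.5 + domain_score * 0.5, i.e. 0.5..=1.0 for scores in [0, 1].
fn domain_factor(domain_score: f32) -> f32 {
    0.5 + domain_score * 0.5
}

/// weight = confidence × eigentrust × domain_factor
fn weight(c: &Candidate) -> f32 {
    c.confidence * c.eigentrust * domain_factor(c.domain_score)
}

/// Sort by weight descending, then newest timestamp first; take the head.
fn pick_winner(candidates: &[Candidate]) -> Option<&Candidate> {
    let mut ranked: Vec<&Candidate> = candidates.iter().collect();
    ranked.sort_by(|a, b| {
        weight(b)
            .partial_cmp(&weight(a))
            .unwrap_or(Ordering::Equal) // NaN weights compare as equal
            .then_with(|| b.timestamp.cmp(&a.timestamp))
    });
    ranked.first().copied()
}

fn main() {
    // Seed-connected agent vs. an isolated Sybil claiming higher confidence.
    let legit = Candidate {
        id: "legit",
        confidence: 0.8,
        eigentrust: 0.5,
        domain_score: 0.5,
        timestamp: 1000,
    };
    let sybil = Candidate {
        id: "sybil",
        confidence: 1.0,
        eigentrust: 0.0,
        domain_score: 0.5,
        timestamp: 1100,
    };
    let candidates = [legit, sybil];
    let winner = pick_winner(&candidates).expect("non-empty");
    println!("winner = {} (weight {:.3})", winner.id, weight(winner));
}
```

Because the weights multiply, a zero EigenTrust score zeroes the whole weight no matter how high the self-declared confidence is. This is the Sybil-resistance property that `test_sybil_agent_gets_low_score` exercises.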

@@ -0,0 +1,479 @@
//! EigenTrust Authority Lens: Resolves based on global + domain trust.
//!
//! This lens integrates with both EigenTrust (global trust) and DomainTrust
//! (domain-specific expertise) to weight assertions by the combined reputation
//! and expertise of the signing agent.
//!
//! # Design Philosophy
//!
//! Follows the "Deep Module" principle:
//! - Simple interface: `resolve_async(&[Assertion])` returns winner
//! - Complex implementation: Queries TrustGraphStore for EigenTrust, DomainTrustStore for expertise
//! - Sybil-resistant: Only seed-connected agents have meaningful global trust
//! - Domain-aware: Expertise in the relevant domain boosts effective weight
//!
//! # Resolution Formula
//!
//! ```text
//! weight = confidence × eigentrust_score × domain_factor
//! ```
//!
//! Where:
//! - confidence: The assertion's self-declared confidence (0.0 - 1.0)
//! - eigentrust_score: Global trust from power iteration (0.0 - 1.0)
//! - domain_factor: 0.5 + (domain_score × 0.5), ranges from 0.5 to 1.0
use crate::traits::{compute_conflict_score, Resolution};
use crate::vote_aware_consensus::AsyncLens;
use async_trait::async_trait;
use std::sync::Arc;
use stemedb_core::types::Assertion;
use stemedb_storage::domain_trust_store::DomainTrustStore;
use stemedb_storage::trust_graph_store::TrustGraphStore;
use tracing::{debug, instrument};
/// EigenTrust Authority Lens: Returns the assertion with the highest
/// global + domain trust-weighted score.
///
/// # Resolution Strategy
///
/// 1. For each candidate assertion, extract the primary signer's agent_id
/// 2. Lookup the agent's EigenTrust score (global trust)
/// 3. Lookup the agent's domain trust for this assertion's predicate
/// 4. Calculate: `weight = confidence × eigentrust × domain_factor`
/// 5. Return the assertion with highest weighted score
/// 6. Tiebreaker: If scores are equal, prefer most recent timestamp
/// 7. Agents with no EigenTrust score get 0.0 (Sybil protection)
/// 8. Agents with no domain trust get the neutral default domain score of 0.5
///    (a domain factor of 0.75)
///
/// # Sybil Resistance
///
/// The key insight is that isolated agents (not connected to seed trust)
/// have near-zero EigenTrust scores, effectively filtering out Sybil attacks.
///
/// # Example
///
/// ```ignore
/// use stemedb_lens::EigenTrustAuthorityLens;
/// use stemedb_storage::{HybridStore, GenericTrustGraphStore, GenericDomainTrustStore};
/// use std::sync::Arc;
///
/// let store = Arc::new(HybridStore::open("./data")?);
/// let trust_graph = Arc::new(GenericTrustGraphStore::new(store.clone()));
/// let domain_trust = Arc::new(GenericDomainTrustStore::new(store));
/// let lens = EigenTrustAuthorityLens::new(trust_graph, domain_trust);
///
/// let resolution = lens.resolve_async(&candidates).await;
/// ```
pub struct EigenTrustAuthorityLens<T, D> {
trust_graph_store: Arc<T>,
domain_trust_store: Arc<D>,
}
impl<T: TrustGraphStore, D: DomainTrustStore> EigenTrustAuthorityLens<T, D> {
/// Create a new EigenTrustAuthorityLens with the given stores.
///
/// Both stores are wrapped in Arc for shared ownership, allowing
/// the lens to be used in multiple contexts.
pub fn new(trust_graph_store: Arc<T>, domain_trust_store: Arc<D>) -> Self {
Self { trust_graph_store, domain_trust_store }
}
/// Extract the primary agent ID from an assertion.
///
/// Uses the first signature's agent_id. Returns None if no signatures exist.
fn get_primary_agent(assertion: &Assertion) -> Option<[u8; 32]> {
assertion.signatures.first().map(|sig| sig.agent_id)
}
}
/// Internal struct to track assertion ranking data.
#[derive(Debug)]
struct RankedAssertion<'a> {
assertion: &'a Assertion,
eigentrust_score: f32,
domain_factor: f32,
weighted_score: f32,
}
#[async_trait]
impl<T: TrustGraphStore + 'static, D: DomainTrustStore + 'static> AsyncLens
for EigenTrustAuthorityLens<T, D>
{
#[instrument(skip(self, candidates), fields(candidates_count = candidates.len()))]
async fn resolve_async(&self, candidates: &[Assertion]) -> Resolution {
if candidates.is_empty() {
return Resolution::empty();
}
// For single candidate, still calculate weighted score
if candidates.len() == 1 {
let assertion = &candidates[0];
let (eigentrust_score, domain_factor, weighted_score) =
self.calculate_weight(assertion).await;
debug!(
subject = %assertion.subject,
eigentrust_score,
domain_factor,
weighted_score,
"Single candidate resolution"
);
return Resolution::with_winner(assertion.clone(), 1, weighted_score, 0.0);
}
// Collect trust-weighted scores for all candidates
let mut ranked: Vec<RankedAssertion> = Vec::with_capacity(candidates.len());
for assertion in candidates {
let (eigentrust_score, domain_factor, weighted_score) =
self.calculate_weight(assertion).await;
debug!(
subject = %assertion.subject,
eigentrust_score,
domain_factor,
weighted_score,
"Calculated weighted score"
);
ranked.push(RankedAssertion {
assertion,
eigentrust_score,
domain_factor,
weighted_score,
});
}
// Sort by weighted score (descending), then by timestamp (descending) for ties
ranked.sort_by(|a, b| {
b.weighted_score
.partial_cmp(&a.weighted_score)
.unwrap_or(std::cmp::Ordering::Equal)
.then_with(|| b.assertion.timestamp.cmp(&a.assertion.timestamp))
});
// Select the winner (highest ranked)
if let Some(winner) = ranked.first() {
let conflict = compute_conflict_score(candidates);
debug!(
winner_subject = %winner.assertion.subject,
eigentrust = winner.eigentrust_score,
domain_factor = winner.domain_factor,
weighted_score = winner.weighted_score,
conflict,
"Resolved via EigenTrust + domain authority"
);
Resolution::with_winner(
winner.assertion.clone(),
candidates.len(),
winner.weighted_score,
conflict,
)
} else {
// Should never happen since we checked for empty candidates above
Resolution::empty()
}
}
fn name(&self) -> &'static str {
"EigenTrustAuthority"
}
}
impl<T: TrustGraphStore + 'static, D: DomainTrustStore + 'static> EigenTrustAuthorityLens<T, D> {
/// Calculate the weighted score for an assertion.
///
/// Returns (eigentrust_score, domain_factor, weighted_score).
async fn calculate_weight(&self, assertion: &Assertion) -> (f32, f32, f32) {
// Extract primary agent
let agent_id = match Self::get_primary_agent(assertion) {
Some(id) => id,
None => {
debug!(
subject = %assertion.subject,
"Assertion has no signatures, treating as untrusted"
);
return (0.0, 1.0, 0.0);
}
};
// Get EigenTrust score (global trust)
let eigentrust_score = match self.trust_graph_store.get_eigentrust_score(&agent_id).await {
Ok(score) => score,
Err(e) => {
debug!(
agent_id = %hex::encode(agent_id),
error = %e,
"Failed to get EigenTrust score, using 0.0"
);
0.0 // No EigenTrust score = untrusted (Sybil protection)
}
};
// Get domain factor (domain-specific expertise)
let domain_factor = match self
.domain_trust_store
.get_effective_trust(&agent_id, &assertion.predicate, 1.0)
.await
{
// get_effective_trust returns eigentrust × domain_factor, so passing 1.0
// as the EigenTrust argument yields the domain factor alone.
Ok(effective) => effective,
Err(e) => {
debug!(
agent_id = %hex::encode(agent_id),
predicate = %assertion.predicate,
error = %e,
"Failed to get domain trust, using default factor 0.75"
);
0.75 // Default domain factor (0.5 score → 0.75 factor)
}
};
// Calculate weighted score
// weight = confidence × eigentrust × domain_factor
let weighted_score = assertion.confidence * eigentrust_score * domain_factor;
(eigentrust_score, domain_factor, weighted_score)
}
}
#[cfg(test)]
mod tests {
use super::*;
use stemedb_core::testing::AssertionBuilder;
use stemedb_storage::domain_trust_store::{DomainTrust, GenericDomainTrustStore};
use stemedb_storage::trust_graph_store::{
EigenTrustConfig, GenericTrustGraphStore, TrustEdge, TrustGraphStore,
};
use stemedb_storage::HybridStore;
fn agent(id: u8) -> [u8; 32] {
let mut arr = [0u8; 32];
arr[0] = id;
arr
}
fn create_assertion(
subject: &str,
predicate: &str,
confidence: f32,
agent_id: [u8; 32],
timestamp: u64,
) -> Assertion {
AssertionBuilder::new()
.subject(subject)
.predicate(predicate)
.confidence(confidence)
.agent_id(agent_id)
.timestamp(timestamp)
.build()
}
#[tokio::test]
async fn test_empty_candidates() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_graph = Arc::new(GenericTrustGraphStore::new(store.clone()));
let domain_trust = Arc::new(GenericDomainTrustStore::new(store));
let lens = EigenTrustAuthorityLens::new(trust_graph, domain_trust);
let resolution = lens.resolve_async(&[]).await;
assert!(resolution.winner.is_none());
assert_eq!(resolution.candidates_count, 0);
}
#[tokio::test]
async fn test_single_candidate_no_eigentrust() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_graph = Arc::new(GenericTrustGraphStore::new(store.clone()));
let domain_trust = Arc::new(GenericDomainTrustStore::new(store));
let lens = EigenTrustAuthorityLens::new(trust_graph, domain_trust);
// Agent with no EigenTrust score
let assertion = create_assertion("Subject", "predicate", 0.8, agent(1), 1000);
let resolution = lens.resolve_async(&[assertion]).await;
assert!(resolution.winner.is_some());
// No EigenTrust = 0.0, so weighted score = 0.8 * 0.0 * factor = 0.0
assert!((resolution.resolution_confidence - 0.0).abs() < 0.01);
}
#[tokio::test]
async fn test_eigentrust_integrated() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_graph = Arc::new(GenericTrustGraphStore::new(store.clone()));
let domain_trust = Arc::new(GenericDomainTrustStore::new(store));
let lens =
EigenTrustAuthorityLens::new(Arc::clone(&trust_graph), Arc::clone(&domain_trust));
// Set up trust graph: seed → agent1
trust_graph.set_seed_trust(&agent(0), 1.0).await.expect("set seed");
trust_graph
.add_trust_edge(&TrustEdge::new(agent(0), agent(1), 1.0, 1000, None))
.await
.expect("add edge");
// Compute EigenTrust
trust_graph.compute_eigentrust(&EigenTrustConfig::default()).await.expect("compute");
// Create assertion from agent 1
let assertion = create_assertion("Subject", "predicate", 0.8, agent(1), 1000);
let resolution = lens.resolve_async(&[assertion]).await;
assert!(resolution.winner.is_some());
// Agent 1 should have non-zero EigenTrust score
assert!(resolution.resolution_confidence > 0.0);
}
#[tokio::test]
async fn test_sybil_agent_gets_low_score() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_graph = Arc::new(GenericTrustGraphStore::new(store.clone()));
let domain_trust = Arc::new(GenericDomainTrustStore::new(store));
let lens =
EigenTrustAuthorityLens::new(Arc::clone(&trust_graph), Arc::clone(&domain_trust));
// Set up: seed has trust, Sybil ring is isolated
trust_graph.set_seed_trust(&agent(0), 1.0).await.expect("set seed");
trust_graph
.add_trust_edge(&TrustEdge::new(agent(0), agent(1), 1.0, 1000, None))
.await
.expect("add edge");
// Sybil ring: 10 → 11 → 12 → 10 (not connected to seed)
trust_graph
.add_trust_edge(&TrustEdge::new(agent(10), agent(11), 1.0, 1000, None))
.await
.expect("add edge");
trust_graph
.add_trust_edge(&TrustEdge::new(agent(11), agent(12), 1.0, 1000, None))
.await
.expect("add edge");
trust_graph
.add_trust_edge(&TrustEdge::new(agent(12), agent(10), 1.0, 1000, None))
.await
.expect("add edge");
// Compute EigenTrust
trust_graph.compute_eigentrust(&EigenTrustConfig::default()).await.expect("compute");
// Legitimate agent vs Sybil agent
let legitimate = create_assertion("Subject", "predicate", 0.8, agent(1), 1000);
let sybil = create_assertion("Subject", "predicate", 1.0, agent(10), 1100); // Higher confidence!
let resolution = lens.resolve_async(&[legitimate.clone(), sybil]).await;
// Legitimate agent should win despite Sybil having higher confidence
assert!(resolution.winner.is_some());
assert_eq!(resolution.winner.as_ref().unwrap().signatures[0].agent_id, agent(1));
}
#[tokio::test]
async fn test_domain_expertise_matters() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_graph = Arc::new(GenericTrustGraphStore::new(store.clone()));
let domain_trust = Arc::new(GenericDomainTrustStore::new(store));
let lens =
EigenTrustAuthorityLens::new(Arc::clone(&trust_graph), Arc::clone(&domain_trust));
// Set up: both agents have same EigenTrust
trust_graph.set_seed_trust(&agent(0), 1.0).await.expect("set seed");
trust_graph
.add_trust_edge(&TrustEdge::new(agent(0), agent(1), 1.0, 1000, None))
.await
.expect("add edge");
trust_graph
.add_trust_edge(&TrustEdge::new(agent(0), agent(2), 1.0, 1000, None))
.await
.expect("add edge");
trust_graph.compute_eigentrust(&EigenTrustConfig::default()).await.expect("compute");
// Agent 1: Expert in medicine (score 0.95)
let mut dt1 = DomainTrust::new(agent(1), "medicine".to_string(), 1000);
dt1.score = 0.95;
domain_trust.put_domain_trust(&dt1).await.expect("put");
// Agent 2: Novice in medicine (score 0.3)
let mut dt2 = DomainTrust::new(agent(2), "medicine".to_string(), 1000);
dt2.score = 0.3;
domain_trust.put_domain_trust(&dt2).await.expect("put");
// Same confidence, same predicate (medicine domain)
let expert_assertion = create_assertion("Drug", "treats_condition", 0.8, agent(1), 1000);
let novice_assertion = create_assertion("Drug", "treats_condition", 0.8, agent(2), 1100);
let resolution = lens.resolve_async(&[expert_assertion.clone(), novice_assertion]).await;
// Expert should win due to higher domain trust
assert!(resolution.winner.is_some());
assert_eq!(resolution.winner.as_ref().unwrap().signatures[0].agent_id, agent(1));
}
#[tokio::test]
async fn test_no_signatures_treated_as_untrusted() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_graph = Arc::new(GenericTrustGraphStore::new(store.clone()));
let domain_trust = Arc::new(GenericDomainTrustStore::new(store));
let lens =
EigenTrustAuthorityLens::new(Arc::clone(&trust_graph), Arc::clone(&domain_trust));
// Set up trusted agent
trust_graph.set_seed_trust(&agent(0), 1.0).await.expect("set seed");
trust_graph
.add_trust_edge(&TrustEdge::new(agent(0), agent(1), 1.0, 1000, None))
.await
.expect("add edge");
trust_graph.compute_eigentrust(&EigenTrustConfig::default()).await.expect("compute");
let signed = create_assertion("Subject", "predicate", 0.7, agent(1), 1000);
let mut unsigned = create_assertion("Subject", "predicate", 1.0, agent(99), 1100);
unsigned.signatures.clear();
let resolution = lens.resolve_async(&[signed.clone(), unsigned]).await;
// Signed assertion should win even with lower confidence
assert!(resolution.winner.is_some());
assert_eq!(resolution.winner.as_ref().unwrap().signatures.len(), 1);
}
#[tokio::test]
async fn test_tie_breaking_by_timestamp() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_graph = Arc::new(GenericTrustGraphStore::new(store.clone()));
let domain_trust = Arc::new(GenericDomainTrustStore::new(store));
let lens =
EigenTrustAuthorityLens::new(Arc::clone(&trust_graph), Arc::clone(&domain_trust));
// Set up: same agent makes two assertions
trust_graph.set_seed_trust(&agent(0), 1.0).await.expect("set seed");
trust_graph
.add_trust_edge(&TrustEdge::new(agent(0), agent(1), 1.0, 1000, None))
.await
.expect("add edge");
trust_graph.compute_eigentrust(&EigenTrustConfig::default()).await.expect("compute");
// Same agent, same confidence, different timestamps
let older = create_assertion("Subject", "predicate", 0.8, agent(1), 1000);
let newer = create_assertion("Subject", "predicate", 0.8, agent(1), 2000);
let resolution = lens.resolve_async(&[older, newer.clone()]).await;
// Newer should win on tiebreak
assert!(resolution.winner.is_some());
assert_eq!(resolution.winner.as_ref().unwrap().timestamp, 2000);
}
#[tokio::test]
async fn test_lens_name() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_graph = Arc::new(GenericTrustGraphStore::new(store.clone()));
let domain_trust = Arc::new(GenericDomainTrustStore::new(store));
let lens = EigenTrustAuthorityLens::new(trust_graph, domain_trust);
assert_eq!(lens.name(), "EigenTrustAuthority");
}
}
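The Sybil test above depends on a property of EigenTrust-style power iteration. Trust flows only along edges reachable from the pre-trusted seeds, so a ring with no incoming edge from the seed's component never accumulates any score.

The following is a minimal sketch under stated assumptions. It uses the textbook update t <- (1 - alpha) * C^T t + alpha * p, with row-normalized local trust, and it redirects agents with no outgoing trust to p. `eigentrust` and its parameters are hypothetical. This is not `GenericTrustGraphStore::compute_eigentrust`, and it does not reproduce the `EigenTrustConfig` defaults.

```rust
/// Hypothetical EigenTrust power iteration over `n` agents (sketch only).
///
/// `edges` holds (truster, trustee, weight) local trust values, `seeds` are the
/// pre-trusted agents, and `alpha` is the weight of the pre-trust vector p.
/// Each step computes t <- (1 - alpha) * C^T t + alpha * p, where C is the
/// row-normalized local trust matrix. Agents with no outgoing trust hand their
/// mass to p, so the scores keep summing to 1.
pub fn eigentrust(
    n: usize,
    edges: &[(usize, usize, f64)],
    seeds: &[usize],
    alpha: f64,
    iterations: usize,
) -> Vec<f64> {
    // Total outgoing trust per truster, used to row-normalize C.
    let mut out_sum = vec![0.0_f64; n];
    for &(i, _, w) in edges {
        out_sum[i] += w;
    }
    // Pre-trust vector p: uniform over the seed agents.
    let mut p = vec![0.0_f64; n];
    for &s in seeds {
        p[s] = 1.0 / seeds.len() as f64;
    }
    let mut t = p.clone();
    for _ in 0..iterations {
        // Every step starts from the alpha * p pre-trust term.
        let mut next: Vec<f64> = p.iter().map(|&pi| alpha * pi).collect();
        // Propagate trust along row-normalized edges.
        for &(i, j, w) in edges {
            if out_sum[i] > 0.0 {
                next[j] += (1.0 - alpha) * (w / out_sum[i]) * t[i];
            }
        }
        // Dangling agents (no outgoing trust) redistribute their mass via p.
        for (i, &sum) in out_sum.iter().enumerate() {
            if sum == 0.0 {
                for (nj, &pj) in next.iter_mut().zip(&p) {
                    *nj += (1.0 - alpha) * pj * t[i];
                }
            }
        }
        t = next;
    }
    t
}
```

With seed 0 trusting agent 1, and an isolated ring 10 -> 11 -> 12 -> 10, agent 1 converges to a positive score while the ring stays at exactly zero. Weighting the Sybil assertion's higher confidence by that zero score therefore cannot beat the legitimate one.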

@ -0,0 +1,286 @@
//! HLC-based Recency Lens: Hybrid Logical Clock timestamp wins.
//!
//! This lens provides distributed-consistent recency ordering using HLC timestamps,
//! which handle clock skew between nodes better than Unix timestamps alone.
//!
//! # Why HLC over Unix timestamp?
//!
//! - **Clock skew tolerance**: Causally related events stay correctly ordered even
//! when node clocks drift, because an HLC never runs backwards and always advances
//! past any timestamp it receives (physical time plus a logical counter).
//! - **Total ordering**: HLC + node_id provides deterministic ordering even for
//! concurrent events on different nodes.
//! - **Causal consistency**: HLC preserves happens-before relationships across
//! distributed nodes.
//!
//! # Resolution Strategy
//!
//! 1. Compare by `hlc_timestamp` (includes NTP64 time + logical counter)
//! 2. If HLC times are equal (concurrent events), compare by `node_id`
//! 3. Final tiebreaker: `source_hash` for determinism
use crate::traits::{compute_conflict_score, Lens, Resolution};
use stemedb_core::types::Assertion;
use tracing::instrument;
/// HLC-based Recency Lens: Returns the assertion with the highest HLC timestamp.
///
/// # Resolution Strategy
///
/// 1. Find assertion with maximum `hlc_timestamp`
/// 2. If HLC tie: HLC's `node_id` provides tiebreaker
/// 3. Final tiebreaker: `source_hash` for determinism across identical HLCs
///
/// # Confidence Calculation
///
/// - Single candidate: 1.0 (trivial resolution)
/// - Multiple candidates: Based on HLC timestamp gap (in milliseconds) to next candidate
/// - > 1 day gap: 0.95
/// - > 1 hour gap: 0.8
/// - > 1 minute gap: 0.6
/// - Otherwise: 0.5
#[derive(Debug, Clone, Copy, Default)]
pub struct HlcRecencyLens;
impl Lens for HlcRecencyLens {
#[instrument(skip(self, candidates), fields(candidates_count = candidates.len(), lens = "HlcRecency"))]
fn resolve(&self, candidates: &[Assertion]) -> Resolution {
if candidates.is_empty() {
return Resolution::empty();
}
if candidates.len() == 1 {
return Resolution::with_winner(candidates[0].clone(), 1, 1.0, 0.0);
}
// Find the assertion with the highest HLC timestamp
// HLC's Ord implementation compares time_ntp64 first, then node_id
let winner = candidates
.iter()
.max_by(|a, b| {
// Primary: highest HLC timestamp (includes NTP64 time + node_id tiebreaker)
// Final tiebreaker: source_hash for determinism
a.hlc_timestamp
.cmp(&b.hlc_timestamp)
.then_with(|| a.source_hash.cmp(&b.source_hash))
})
.cloned();
match winner {
Some(w) => {
// Calculate confidence based on how much newer the winner is
let max_hlc = &w.hlc_timestamp;
let max_ms = max_hlc.millis();
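// Find the second-highest HLC time, counting ties with the winner.
// `candidates.len() >= 2` here, so index 1 always exists. Filtering to
// strictly-lower timestamps would compare identical HLCs against 0 and
// report high confidence for what is actually a tie.
let mut all_ms: Vec<_> = candidates.iter().map(|a| a.hlc_timestamp.millis()).collect();
all_ms.sort_unstable_by(|a, b| b.cmp(a));
let second_max_ms = all_ms[1];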
// Confidence is higher when the gap is larger
let gap_ms = max_ms.saturating_sub(second_max_ms);
let confidence = if gap_ms > 86_400_000 {
// More than a day: high confidence
0.95
} else if gap_ms > 3_600_000 {
// More than an hour: good confidence
0.8
} else if gap_ms > 60_000 {
// More than a minute: moderate confidence
0.6
} else {
// Very close: low confidence
0.5
};
let conflict = compute_conflict_score(candidates);
Resolution::with_winner(w, candidates.len(), confidence, conflict)
}
None => Resolution::empty(),
}
}
fn name(&self) -> &'static str {
"HlcRecency"
}
}
#[cfg(test)]
mod tests {
use super::*;
use stemedb_core::testing::AssertionBuilder;
use stemedb_core::types::HlcTimestamp;
fn create_assertion_with_hlc(subject: &str, time_ntp64: u64, node_id: [u8; 16]) -> Assertion {
AssertionBuilder::new()
.subject(subject)
.hlc_timestamp(HlcTimestamp::new(time_ntp64, node_id))
.build()
}
#[test]
fn test_empty_candidates() {
let lens = HlcRecencyLens;
let resolution = lens.resolve(&[]);
assert!(resolution.winner.is_none());
assert_eq!(resolution.candidates_count, 0);
}
#[test]
fn test_single_candidate() {
let lens = HlcRecencyLens;
let assertion = create_assertion_with_hlc("Tesla", 1000, [1u8; 16]);
let resolution = lens.resolve(std::slice::from_ref(&assertion));
assert!(resolution.winner.is_some());
assert_eq!(resolution.winner.as_ref().map(|a| &a.subject), Some(&"Tesla".to_string()));
assert_eq!(resolution.candidates_count, 1);
assert!((resolution.resolution_confidence - 1.0).abs() < f32::EPSILON);
}
#[test]
fn test_hlc_ordering_beats_unix_timestamp() {
// Test that HLC ordering is used, not Unix timestamp
let lens = HlcRecencyLens;
// Create two assertions with same Unix timestamp but different HLC
let mut older = AssertionBuilder::new()
.subject("Older")
.timestamp(1000) // Same Unix timestamp
.hlc_timestamp(HlcTimestamp::new(1000, [1u8; 16]))
.build();
older.source_hash = [1u8; 32];
let mut newer = AssertionBuilder::new()
.subject("Newer")
.timestamp(1000) // Same Unix timestamp
.hlc_timestamp(HlcTimestamp::new(2000, [1u8; 16])) // Higher HLC
.build();
newer.source_hash = [2u8; 32];
let resolution = lens.resolve(&[older, newer.clone()]);
assert!(resolution.winner.is_some());
assert_eq!(resolution.winner.as_ref().map(|a| &a.subject), Some(&"Newer".to_string()));
}
#[test]
fn test_deterministic_tiebreaker_same_hlc_time() {
// When HLC time is equal, node_id should break the tie
let lens = HlcRecencyLens;
let mut a1 = create_assertion_with_hlc("A", 1000, [1u8; 16]);
a1.source_hash = [1u8; 32];
let mut a2 = create_assertion_with_hlc("B", 1000, [2u8; 16]); // Higher node_id
a2.source_hash = [2u8; 32];
// Same HLC time, should use node_id as tiebreaker
let resolution1 = lens.resolve(&[a1.clone(), a2.clone()]);
let resolution2 = lens.resolve(&[a2.clone(), a1.clone()]);
// Should be deterministic regardless of input order
// Higher node_id wins
assert_eq!(
resolution1.winner.as_ref().map(|a| &a.subject),
resolution2.winner.as_ref().map(|a| &a.subject)
);
// Node B has higher node_id [2u8; 16] > [1u8; 16]
assert_eq!(resolution1.winner.as_ref().map(|a| &a.subject), Some(&"B".to_string()));
}
#[test]
fn test_clock_skew_scenario() {
// Scenario: Node A's wall clock is ahead, but Node B's assertion is causally later
// In HLC, the causally later assertion should have a higher HLC timestamp
let lens = HlcRecencyLens;
// Node A: wall clock ahead (higher NTP64 base), but logically older event
let node_a_ahead = create_assertion_with_hlc("NodeA_Ahead", 5000, [1u8; 16]);
// Node B: wall clock behind, but received Node A's timestamp and incremented
// In real HLC, this would be: max(local_time, received_time) + 1
let node_b_later = create_assertion_with_hlc("NodeB_CausallyLater", 5001, [2u8; 16]);
let resolution = lens.resolve(&[node_a_ahead, node_b_later.clone()]);
// Node B's assertion should win because it's causally later (higher HLC)
assert_eq!(
resolution.winner.as_ref().map(|a| &a.subject),
Some(&"NodeB_CausallyLater".to_string())
);
}
#[test]
fn test_source_hash_final_tiebreaker() {
// When HLC timestamps are completely identical, source_hash is final tiebreaker
let lens = HlcRecencyLens;
let mut a1 = AssertionBuilder::new()
.subject("A")
.hlc_timestamp(HlcTimestamp::new(1000, [1u8; 16]))
.build();
a1.source_hash = [1u8; 32];
let mut a2 = AssertionBuilder::new()
.subject("B")
.hlc_timestamp(HlcTimestamp::new(1000, [1u8; 16])) // Identical HLC!
.build();
a2.source_hash = [2u8; 32]; // Higher source_hash
let resolution = lens.resolve(&[a1.clone(), a2.clone()]);
// Higher source_hash should win
assert_eq!(resolution.winner.as_ref().map(|a| &a.subject), Some(&"B".to_string()));
}
#[test]
fn test_confidence_calculation() {
let lens = HlcRecencyLens;
// Create assertions with large time gap (> 1 day in milliseconds)
// NTP64 seconds are in upper 32 bits, so 1 second = 1 << 32
// For a 2-day gap: 2 * 86400 seconds = 172800 seconds
const NTP_UNIX_OFFSET: u64 = 2_208_988_800;
let base_seconds = NTP_UNIX_OFFSET + 1000;
let ntp64_base = base_seconds << 32;
let ntp64_later = (base_seconds + 172800) << 32; // 2 days later
let old = create_assertion_with_hlc("Old", ntp64_base, [1u8; 16]);
let new = create_assertion_with_hlc("New", ntp64_later, [1u8; 16]);
let resolution = lens.resolve(&[old, new]);
assert!(resolution.winner.is_some());
// With > 1 day gap, confidence should be 0.95
assert!(
resolution.resolution_confidence > 0.9,
"Expected high confidence for large gap, got {}",
resolution.resolution_confidence
);
}
#[test]
fn test_multiple_candidates_selects_newest() {
let lens = HlcRecencyLens;
let old = create_assertion_with_hlc("Old", 1000, [1u8; 16]);
let newer = create_assertion_with_hlc("Newer", 2000, [1u8; 16]);
let newest = create_assertion_with_hlc("Newest", 3000, [1u8; 16]);
let resolution = lens.resolve(&[old, newer, newest.clone()]);
assert!(resolution.winner.is_some());
assert_eq!(resolution.winner.as_ref().map(|a| &a.subject), Some(&"Newest".to_string()));
assert_eq!(resolution.candidates_count, 3);
}
#[test]
fn test_lens_name() {
let lens = HlcRecencyLens;
assert_eq!(lens.name(), "HlcRecency");
}
}
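The ordering and confidence rules of `HlcRecencyLens` can be illustrated with a self-contained hybrid logical clock that follows the standard HLC send/receive update rules. This is a sketch under stated assumptions:

- `Hlc`, `tick`, `recv`, and `resolution_confidence` are hypothetical names.
- The fields are kept separate (wall-clock milliseconds, logical counter, node id) rather than packed into an NTP64 value, as `HlcTimestamp` appears to be.
- The buckets simply mirror the thresholds documented on the lens.

```rust
use std::cmp::max;

/// Hypothetical HLC timestamp. Field order makes the derived `Ord`
/// lexicographic: wall time, then logical counter, then node id.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub wall_ms: u64,
    pub logical: u32,
    pub node_id: u8,
}

impl Hlc {
    /// Local or send event observed at physical time `pt_ms`.
    pub fn tick(self, pt_ms: u64) -> Hlc {
        let wall_ms = max(self.wall_ms, pt_ms);
        // Physical time did not move past our clock: bump the counter.
        let logical = if wall_ms == self.wall_ms { self.logical + 1 } else { 0 };
        Hlc { wall_ms, logical, node_id: self.node_id }
    }

    /// Receive event: merge a remote timestamp `msg` at physical time `pt_ms`.
    /// The result is always greater than both `self` and `msg`.
    pub fn recv(self, msg: Hlc, pt_ms: u64) -> Hlc {
        let wall_ms = max(max(self.wall_ms, msg.wall_ms), pt_ms);
        let logical = if wall_ms == self.wall_ms && wall_ms == msg.wall_ms {
            max(self.logical, msg.logical) + 1
        } else if wall_ms == self.wall_ms {
            self.logical + 1
        } else if wall_ms == msg.wall_ms {
            msg.logical + 1
        } else {
            0
        };
        Hlc { wall_ms, logical, node_id: self.node_id }
    }
}

/// Confidence from the gap between the newest and second-newest wall times.
/// Ties count as a zero gap, so identical timestamps land in the lowest bucket.
pub fn resolution_confidence(ts: &[Hlc]) -> f32 {
    if ts.len() < 2 {
        return 1.0; // single candidate: trivial resolution
    }
    let mut walls: Vec<u64> = ts.iter().map(|t| t.wall_ms).collect();
    walls.sort_unstable_by(|a, b| b.cmp(a)); // descending
    let gap_ms = walls[0] - walls[1];
    if gap_ms > 86_400_000 {
        0.95
    } else if gap_ms > 3_600_000 {
        0.8
    } else if gap_ms > 60_000 {
        0.6
    } else {
        0.5
    }
}
```

In the clock-skew scenario, node 1's clock reads 5000 ms and node 2's reads 1000 ms. After node 2 receives node 1's timestamp, the timestamp node 2 assigns still sorts after node 1's, because `recv` adopts the larger wall time and increments the logical counter.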

@ -46,7 +46,9 @@
mod confidence;
mod consensus;
mod constraints;
mod eigentrust_authority;
mod epoch_aware;
mod hlc_recency;
mod layered_consensus;
mod recency;
mod skeptic;
@ -57,7 +59,9 @@ mod vote_aware_consensus;
pub use confidence::ConfidenceLens;
pub use consensus::ConsensusLens;
pub use constraints::{ConstraintSet, ConstraintsLens};
pub use eigentrust_authority::EigenTrustAuthorityLens;
pub use epoch_aware::{EpochAwareLens, SyncLensWrapper};
pub use hlc_recency::HlcRecencyLens;
pub use layered_consensus::LayeredConsensusLens;
pub use recency::RecencyLens;
pub use skeptic::SkepticLens;

@ -63,6 +63,82 @@ impl<S: KVStore + 'static> QueryEngine<S> {
Ok(results)
}
/// Fetch assertions for multiple subjects, deduplicating by hash.
///
/// Used for alias resolution where a single query subject expands to
/// multiple aliased subjects (e.g., code:// and rfc:// paths).
pub(super) async fn fetch_by_subjects(&self, subjects: &[String]) -> Result<Vec<Assertion>> {
use std::collections::HashSet;
let mut seen_hashes: HashSet<[u8; 32]> = HashSet::new();
let mut results = Vec::new();
for subject in subjects {
let hash_list = self.index_store.get_by_subject(subject).await?;
for hash in hash_list {
if !seen_hashes.insert(hash) {
continue; // Already seen this hash from another subject
}
let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
if let Some(data) = self.store.get(&assertion_key).await? {
match self.deserialize_assertion(&data) {
Ok(assertion) => results.push(assertion),
Err(e) => {
debug!(hash = %hex::encode(hash), "Skipping malformed assertion: {:?}", e);
}
}
}
}
}
debug!(
subjects_count = subjects.len(),
total_assertions = results.len(),
"Fetched assertions for multiple subjects"
);
Ok(results)
}
/// Fetch assertions for multiple subjects with predicate filter, deduplicating by hash.
///
/// Used for alias resolution when both subject and predicate are specified.
pub(super) async fn fetch_by_subjects_predicate(
&self,
subjects: &[String],
predicate: &str,
) -> Result<Vec<Assertion>> {
use std::collections::HashSet;
let mut seen_hashes: HashSet<[u8; 32]> = HashSet::new();
let mut results = Vec::new();
for subject in subjects {
let hash_list = self.index_store.get_by_subject_predicate(subject, predicate).await?;
for hash in hash_list {
if !seen_hashes.insert(hash) {
continue; // Already seen this hash from another subject
}
let assertion_key = key_codec::assertion_key(subject, &hex::encode(hash));
if let Some(data) = self.store.get(&assertion_key).await? {
match self.deserialize_assertion(&data) {
Ok(assertion) => results.push(assertion),
Err(e) => {
debug!(hash = %hex::encode(hash), "Skipping malformed assertion: {:?}", e);
}
}
}
}
}
debug!(
subjects_count = subjects.len(),
predicate,
total_assertions = results.len(),
"Fetched assertions for multiple subjects with predicate"
);
Ok(results)
}
/// Fetch all assertions by scanning the subjects discovery index.
///
/// This scans `\x00SUBJECTS:` to discover all known subjects, then fetches

@ -6,7 +6,7 @@
use std::sync::Arc;
use stemedb_core::types::Assertion;
-use stemedb_storage::{GenericIndexStore, KVStore, VectorIndex, VisualIndex};
+use stemedb_storage::{AliasStore, GenericIndexStore, KVStore, VectorIndex, VisualIndex};
// Trait import required for IndexStore methods on GenericIndexStore
#[allow(unused_imports)]
use stemedb_storage::IndexStore;
@ -43,13 +43,15 @@ pub struct QueryEngine<S> {
pub(super) vector_index: Option<Arc<dyn VectorIndex>>,
/// Optional visual index for hamming distance search.
pub(super) visual_index: Option<Arc<dyn VisualIndex>>,
+/// Optional alias store for cross-scheme subject resolution.
+pub(super) alias_store: Option<Arc<dyn AliasStore>>,
}
impl<S: KVStore + 'static> QueryEngine<S> {
/// Create a new query engine backed by the given store.
pub fn new(store: Arc<S>) -> Self {
let index_store = GenericIndexStore::new(store.clone());
-Self { store, index_store, vector_index: None, visual_index: None }
+Self { store, index_store, vector_index: None, visual_index: None, alias_store: None }
}
/// Attach a vector index for k-NN similarity search.
@ -70,6 +72,17 @@ impl<S: KVStore + 'static> QueryEngine<S> {
self
}
/// Attach an alias store for cross-scheme subject resolution.
///
/// When set and a query has `resolve_aliases: true`, the engine will
/// expand the subject to all aliased paths before fetching assertions.
/// This enables queries like `code://rust/myapp/tls/cert_verification`
/// to also return assertions from `rfc://5246/tls/cert_verification`.
pub fn with_alias_store(mut self, alias_store: Arc<dyn AliasStore>) -> Self {
self.alias_store = Some(alias_store);
self
}
/// Execute a query and return matching assertions.
///
/// # Query Execution Strategy
@ -118,7 +131,8 @@ impl<S: KVStore + 'static> QueryEngine<S> {
// Fast path: check materialized view when both subject and predicate are specified
// Skip fast path if as_of is set (MVs reflect current state, time-travel needs full scan)
-if query.as_of.is_none() {
+// Skip fast path if resolve_aliases is true (aliases expand to multiple subjects)
+if query.as_of.is_none() && !query.resolve_aliases {
if let (Some(subject), Some(predicate)) = (&query.subject, &query.predicate) {
if let Some(result) = self.try_fast_path(subject, predicate, query).await? {
debug!(subject, predicate, "Fast path: used materialized view");
@ -128,21 +142,43 @@ impl<S: KVStore + 'static> QueryEngine<S> {
}
// Slow path: determine scan strategy based on query filters
-let candidates = match (&query.subject, &query.predicate) {
-// O(1) compound index lookup
-(Some(subject), Some(predicate)) => {
+// When resolve_aliases is true, expand subject to all aliased paths
+let candidates = match (&query.subject, &query.predicate, query.resolve_aliases) {
+// Alias-expanded compound index lookup
+(Some(subject), Some(predicate), true) => {
+let subjects = self.resolve_subject_aliases(subject).await?;
+debug!(
+original_subject = subject,
+resolved_count = subjects.len(),
+predicate,
+"Alias-expanded compound index lookup"
+);
+self.fetch_by_subjects_predicate(&subjects, predicate).await?
+}
+// Alias-expanded subject index lookup
+(Some(subject), None, true) => {
+let subjects = self.resolve_subject_aliases(subject).await?;
+debug!(
+original_subject = subject,
+resolved_count = subjects.len(),
+"Alias-expanded subject index lookup"
+);
+self.fetch_by_subjects(&subjects).await?
+}
+// O(1) compound index lookup (no alias expansion)
+(Some(subject), Some(predicate), false) => {
debug!(
subject,
predicate, "Slow path: using compound index SP:{subject}:{predicate}"
);
self.fetch_by_subject_predicate(subject, predicate).await?
}
-// O(1) subject index lookup
-(Some(subject), None) => {
+// O(1) subject index lookup (no alias expansion)
+(Some(subject), None, false) => {
debug!(subject, "Using subject index S:{subject}");
self.fetch_by_subject(subject).await?
}
-// O(n) full scan
+// O(n) full scan (resolve_aliases has no effect without subject)
_ => {
debug!("Using full scan (no subject filter)");
self.fetch_all_assertions().await?
@ -205,4 +241,31 @@ impl<S: KVStore + 'static> QueryEngine<S> {
stemedb_core::serde::deserialize(data)
.map_err(|e| QueryError::Deserialization(e.to_string()))
}
/// Resolve a subject to all aliased paths via the AliasStore.
///
/// If no alias store is configured, returns just the original subject.
/// This allows queries with `resolve_aliases: true` to gracefully degrade
/// when no alias store is available.
async fn resolve_subject_aliases(&self, subject: &str) -> Result<Vec<String>> {
match &self.alias_store {
Some(store) => {
let resolved = store.resolve_all(subject).await.map_err(QueryError::from)?;
debug!(
subject,
resolved_count = resolved.len(),
resolved_subjects = ?resolved,
"Resolved subject aliases"
);
Ok(resolved)
}
None => {
debug!(
subject,
"resolve_aliases: true but no alias_store configured, using exact subject"
);
Ok(vec![subject.to_string()])
}
}
}
}

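The alias-expanded path above reduces to three steps:

1. Expand the subject.
2. Look up each expanded subject.
3. Keep only the first occurrence of each assertion hash.

Below is a minimal in-memory sketch. It assumes `MemIndex`, its fields, and its methods as hypothetical stand-ins for `AliasStore` and `IndexStore`. The real `resolve_all` semantics are not shown in this diff, for example whether it includes the input subject or follows alias chains. The sketch assumes it returns the subject itself followed by its direct aliases.

```rust
use std::collections::{HashMap, HashSet};

/// 32-byte assertion hash, matching the `[u8; 32]` keys used above.
pub type Hash = [u8; 32];

/// Hypothetical in-memory stand-in for an alias store plus a subject index.
#[derive(Default)]
pub struct MemIndex {
    /// subject -> directly aliased subjects
    pub aliases: HashMap<String, Vec<String>>,
    /// subject -> hashes of assertions stored under that subject
    pub by_subject: HashMap<String, Vec<Hash>>,
}

impl MemIndex {
    /// Assumed semantics: the subject itself followed by its direct aliases.
    fn resolve_all(&self, subject: &str) -> Vec<String> {
        let mut out = vec![subject.to_string()];
        if let Some(extra) = self.aliases.get(subject) {
            out.extend(extra.iter().cloned());
        }
        out
    }

    /// Fetch hashes for `subject`, optionally expanding aliases first.
    /// Results keep first-seen order and contain each hash at most once.
    pub fn fetch(&self, subject: &str, resolve_aliases: bool) -> Vec<Hash> {
        let subjects = if resolve_aliases {
            self.resolve_all(subject)
        } else {
            vec![subject.to_string()]
        };
        let mut seen: HashSet<Hash> = HashSet::new();
        let mut results = Vec::new();
        for s in &subjects {
            for h in self.by_subject.get(s).into_iter().flatten() {
                // `insert` returns false when the hash was already seen.
                if seen.insert(*h) {
                    results.push(*h);
                }
            }
        }
        results
    }
}
```

Deduplicating by hash is what keeps an assertion indexed under two expanded subjects from appearing twice in the result.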
@ -0,0 +1,254 @@
//! Tests for alias resolution in QueryEngine.
//!
//! Tests the `resolve_aliases` query flag and `alias_store` integration.
use std::sync::Arc;
use stemedb_core::testing::AssertionBuilder;
use stemedb_core::types::{AliasOrigin, ConceptAlias, ConceptPath, LifecycleStage};
use stemedb_storage::{AliasStore, GenericAliasStore, HybridStore};
use super::{store_assertion, QueryEngine};
use crate::query::Query;
/// Helper to create a test ConceptAlias.
fn create_alias(alias: &str, canonical: &str) -> ConceptAlias {
ConceptAlias::new(
ConceptPath::parse(alias).expect("valid alias path"),
ConceptPath::parse(canonical).expect("valid canonical path"),
[1u8; 32], // agent_id
1000, // timestamp
AliasOrigin::Manual,
)
}
/// Test that resolve_aliases: true expands subject to aliased paths.
#[tokio::test]
async fn test_resolve_aliases_expands_subjects() {
let store = Arc::new(HybridStore::open_temp().expect("store"));
// Create assertions for two different subjects (aliased paths)
let code_assertion = AssertionBuilder::new()
.subject("code://rust/myapp/tls")
.predicate("cert_verification")
.object_text("enabled")
.confidence(0.8)
.lifecycle(LifecycleStage::Approved)
.build();
let rfc_assertion = AssertionBuilder::new()
.subject("rfc://5246/tls")
.predicate("cert_verification")
.object_text("required")
.confidence(0.95)
.lifecycle(LifecycleStage::Approved)
.build();
store_assertion(&store, &code_assertion).await;
store_assertion(&store, &rfc_assertion).await;
// Set up alias store with alias: code://rust/myapp/tls -> rfc://5246/tls
let alias_store = GenericAliasStore::new(store.clone());
let alias = create_alias("code://rust/myapp/tls", "rfc://5246/tls");
alias_store.set_alias(&alias).await.expect("set alias");
// Create query engine with alias store
let engine = QueryEngine::new(Arc::new(store.clone()))
.with_alias_store(Arc::new(alias_store) as Arc<dyn AliasStore>);
// Query with resolve_aliases: true should find BOTH assertions
let query = Query::builder()
.subject("code://rust/myapp/tls")
.predicate("cert_verification")
.resolve_aliases(true)
.build();
let result = engine.execute(&query).await.expect("execute");
assert_eq!(result.assertions.len(), 2, "Should find assertions from both aliased subjects");
let subjects: Vec<&str> = result.assertions.iter().map(|a| a.subject.as_str()).collect();
assert!(subjects.contains(&"code://rust/myapp/tls"), "Should include code assertion");
assert!(subjects.contains(&"rfc://5246/tls"), "Should include rfc assertion");
}
/// Test that resolve_aliases: false queries exact subject only.
#[tokio::test]
async fn test_resolve_aliases_false_is_exact() {
let store = Arc::new(HybridStore::open_temp().expect("store"));
// Create assertions for two different subjects
let code_assertion = AssertionBuilder::new()
.subject("code://rust/myapp/tls")
.predicate("cert_verification")
.object_text("enabled")
.confidence(0.8)
.lifecycle(LifecycleStage::Approved)
.build();
let rfc_assertion = AssertionBuilder::new()
.subject("rfc://5246/tls")
.predicate("cert_verification")
.object_text("required")
.confidence(0.95)
.lifecycle(LifecycleStage::Approved)
.build();
store_assertion(&store, &code_assertion).await;
store_assertion(&store, &rfc_assertion).await;
// Set up alias store with alias
let alias_store = GenericAliasStore::new(store.clone());
let alias = create_alias("code://rust/myapp/tls", "rfc://5246/tls");
alias_store.set_alias(&alias).await.expect("set alias");
let engine = QueryEngine::new(Arc::new(store.clone()))
.with_alias_store(Arc::new(alias_store) as Arc<dyn AliasStore>);
// Query with resolve_aliases: false (default) should find only the exact subject
let query = Query::builder()
.subject("code://rust/myapp/tls")
.predicate("cert_verification")
.resolve_aliases(false)
.build();
let result = engine.execute(&query).await.expect("execute");
assert_eq!(result.assertions.len(), 1, "Should find only exact subject assertion");
assert_eq!(result.assertions[0].subject, "code://rust/myapp/tls");
}
/// Test that resolve_aliases: true without alias_store gracefully returns exact subject.
#[tokio::test]
async fn test_no_alias_store_graceful() {
let store = Arc::new(HybridStore::open_temp().expect("store"));
let assertion = AssertionBuilder::new()
.subject("code://rust/myapp/tls")
.predicate("cert_verification")
.object_text("enabled")
.confidence(0.8)
.lifecycle(LifecycleStage::Approved)
.build();
store_assertion(&store, &assertion).await;
// Query engine WITHOUT alias store
let engine = QueryEngine::new(Arc::new(store));
// Query with resolve_aliases: true should still work (graceful degradation)
let query = Query::builder()
.subject("code://rust/myapp/tls")
.predicate("cert_verification")
.resolve_aliases(true)
.build();
let result = engine.execute(&query).await.expect("execute");
assert_eq!(result.assertions.len(), 1, "Should find exact subject assertion");
assert_eq!(result.assertions[0].subject, "code://rust/myapp/tls");
}
/// Test that alias resolution deduplicates by assertion hash.
#[tokio::test]
async fn test_resolve_aliases_deduplicates() {
let store = Arc::new(HybridStore::open_temp().expect("store"));
// Create a single assertion
let assertion = AssertionBuilder::new()
.subject("code://rust/myapp/tls")
.predicate("cert_verification")
.object_text("enabled")
.confidence(0.8)
.lifecycle(LifecycleStage::Approved)
.build();
store_assertion(&store, &assertion).await;
// Empty alias store: no alias is registered, so alias expansion should
// yield only the query subject. The result must still contain exactly
// one copy of the assertion.
let alias_store = GenericAliasStore::new(store.clone());
let engine = QueryEngine::new(Arc::new(store.clone()))
.with_alias_store(Arc::new(alias_store) as Arc<dyn AliasStore>);
// Query with resolve_aliases: true
let query = Query::builder().subject("code://rust/myapp/tls").resolve_aliases(true).build();
let result = engine.execute(&query).await.expect("execute");
// Should have exactly 1 result (no duplicates)
assert_eq!(result.assertions.len(), 1);
}
/// Test subject-only query with alias resolution.
#[tokio::test]
async fn test_resolve_aliases_subject_only() {
let store = Arc::new(HybridStore::open_temp().expect("store"));
// Create assertions for two aliased subjects with different predicates
let code_assertion1 = AssertionBuilder::new()
.subject("code://rust/myapp/tls")
.predicate("cert_verification")
.object_text("enabled")
.confidence(0.8)
.lifecycle(LifecycleStage::Approved)
.build();
let code_assertion2 = AssertionBuilder::new()
.subject("code://rust/myapp/tls")
.predicate("timeout")
.object_text("30s")
.confidence(0.9)
.lifecycle(LifecycleStage::Approved)
.build();
let rfc_assertion = AssertionBuilder::new()
.subject("rfc://5246/tls")
.predicate("cert_verification")
.object_text("required")
.confidence(0.95)
.lifecycle(LifecycleStage::Approved)
.build();
store_assertion(&store, &code_assertion1).await;
store_assertion(&store, &code_assertion2).await;
store_assertion(&store, &rfc_assertion).await;
// Set up alias store
let alias_store = GenericAliasStore::new(store.clone());
let alias = create_alias("code://rust/myapp/tls", "rfc://5246/tls");
alias_store.set_alias(&alias).await.expect("set alias");
let engine = QueryEngine::new(Arc::new(store.clone()))
.with_alias_store(Arc::new(alias_store) as Arc<dyn AliasStore>);
// Query by subject only (no predicate filter) with resolve_aliases: true
let query = Query::builder().subject("code://rust/myapp/tls").resolve_aliases(true).build();
let result = engine.execute(&query).await.expect("execute");
// Should find all 3 assertions (2 from code://, 1 from rfc://)
assert_eq!(result.assertions.len(), 3, "Should find all assertions from aliased subjects");
}
/// Test that `resolve_aliases` defaults to false when not set on the builder.
#[test]
fn test_resolve_aliases_default_is_false() {
let query = Query::builder().subject("test").predicate("pred").build();
assert!(!query.resolve_aliases, "Default should be false");
}
/// Test query builder sets resolve_aliases correctly.
#[test]
fn test_query_builder_resolve_aliases() {
let query_true =
Query::builder().subject("test").predicate("pred").resolve_aliases(true).build();
let query_false =
Query::builder().subject("test").predicate("pred").resolve_aliases(false).build();
assert!(query_true.resolve_aliases, "Should be true when set to true");
assert!(!query_false.resolve_aliases, "Should be false when set to false");
}

@ -8,6 +8,7 @@ use stemedb_storage::{key_codec, GenericIndexStore, HybridStore, IndexStore, KVS
use super::QueryEngine;
mod alias_resolution;
mod basic;
mod changelog;
mod conflict_score;

@ -248,6 +248,37 @@ impl QueryBuilder {
self
}
/// Enable alias resolution for subject queries.
///
/// When enabled, the `QueryEngine` resolves the subject to all aliased
/// paths (via `AliasStore::resolve_all()`), fetches assertions for each
/// of them, and deduplicates the results by assertion hash.
///
/// This enables cross-scheme concept resolution. For example, querying
/// `code://rust/myapp/tls/cert_verification` would also return assertions
/// from `rfc://5246/tls/cert_verification` if they are aliased.
///
/// Requires an `AliasStore` to be configured on the `QueryEngine`.
///
/// # Arguments
///
/// * `enabled` - `true` to expand subject aliases, `false` for exact match
///
/// # Example
/// ```rust
/// use stemedb_query::Query;
///
/// // Find assertions from both code and authoritative sources
/// let query = Query::builder()
/// .subject("code://rust/myapp/tls/cert_verification")
/// .resolve_aliases(true)
/// .build();
/// ```
pub fn resolve_aliases(mut self, enabled: bool) -> Self {
self.query.resolve_aliases = enabled;
self
}
/// Build the query.
pub fn build(self) -> Query {
self.query

@ -217,6 +217,35 @@ pub struct Query {
/// .build();
/// ```
pub max_conflict_score: Option<f32>,
/// Resolve aliases when querying by subject.
///
/// When `true` and `subject` is specified, the QueryEngine will:
/// 1. Call `alias_store.resolve_all(&subject)` to find all related subjects
/// 2. Fetch assertions for ALL resolved subjects
/// 3. Deduplicate results by assertion hash
///
/// This enables cross-scheme concept resolution. For example, querying
/// `code://rust/myapp/tls/cert_verification` with aliases enabled would also
/// return assertions from `rfc://5246/tls/cert_verification` if they are aliased.
///
/// - `false` (default): Query exact subject only (backward-compatible)
/// - `true`: Expand subject to all aliased paths before querying
///
/// **Note**: Requires an `AliasStore` to be configured on the `QueryEngine`.
/// If no alias store is configured, this flag has no effect.
///
/// # Example
/// ```rust
/// use stemedb_query::Query;
///
/// // Find assertions from both code and RFC sources
/// let query = Query::builder()
/// .subject("code://rust/myapp/tls/cert_verification")
/// .resolve_aliases(true)
/// .build();
/// ```
pub resolve_aliases: bool,
}
impl Query {
@ -233,9 +262,13 @@ impl Query {
/// Check if an assertion matches this query's filters.
pub fn matches(&self, assertion: &Assertion) -> bool {
// Check subject filter
// Skip subject check when resolve_aliases is true, since the expanded
// subjects (including aliases) were already used to fetch candidates.
if !self.resolve_aliases {
if let Some(ref subject) = self.subject {
if &assertion.subject != subject {
return false;
}
}
}
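The `resolve_aliases` flow described in the `Query` docs (resolve the subject, fetch assertions for every resolved subject, deduplicate by hash) can be sketched in a self-contained way. This is not the `stemedb_query` implementation: `AliasGraph`, the reduced `Assertion` struct and `query_subject` are hypothetical stand-ins, and the sketch assumes `resolve_all` follows alias edges transitively in both directions.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical, minimal assertion: just a content hash and its subject.
#[derive(Debug, Clone)]
struct Assertion {
    hash: [u8; 4], // stand-in for the real assertion hash
    subject: String,
}

/// Hypothetical alias graph; each alias is stored as an undirected edge.
#[derive(Default)]
struct AliasGraph {
    edges: HashMap<String, Vec<String>>,
}

impl AliasGraph {
    fn set_alias(&mut self, a: &str, b: &str) {
        self.edges.entry(a.to_string()).or_default().push(b.to_string());
        self.edges.entry(b.to_string()).or_default().push(a.to_string());
    }

    /// The subject plus every subject reachable through alias edges (breadth-first).
    fn resolve_all(&self, subject: &str) -> Vec<String> {
        let mut seen = HashSet::new();
        let mut resolved = Vec::new();
        let mut queue = VecDeque::from([subject.to_string()]);
        while let Some(s) = queue.pop_front() {
            if !seen.insert(s.clone()) {
                continue; // already visited via another alias path
            }
            if let Some(neighbors) = self.edges.get(&s) {
                queue.extend(neighbors.iter().cloned());
            }
            resolved.push(s);
        }
        resolved
    }
}

/// Fetch assertions for the subject (or all of its aliases) and deduplicate by hash.
fn query_subject(
    by_subject: &HashMap<String, Vec<Assertion>>,
    aliases: &AliasGraph,
    subject: &str,
    resolve_aliases: bool,
) -> Vec<Assertion> {
    let subjects = if resolve_aliases {
        aliases.resolve_all(subject)
    } else {
        vec![subject.to_string()]
    };
    let mut seen_hashes = HashSet::new();
    let mut results = Vec::new();
    for s in &subjects {
        for assertion in by_subject.get(s).into_iter().flatten() {
            // Candidates already come from the expanded subjects, so no
            // exact-subject filter is applied here (mirroring `matches`).
            if seen_hashes.insert(assertion.hash) {
                results.push(assertion.clone());
            }
        }
    }
    results
}
```

With the data from `test_resolve_aliases_subject_only`, an exact query returns the two `code://` assertions, while the expanded query also picks up the `rfc://` one; an assertion reachable under two aliased subjects is returned only once.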

@ -3,7 +3,7 @@
use ed25519_dalek::{Signature, Signer, SigningKey, VerifyingKey};
use rand::rngs::OsRng;
use stemedb_core::types::{
Assertion, Hash, HlcTimestamp, LifecycleStage, ObjectValue, SignatureEntry, SourceClass, Vote,
};
/// A simulated agent with a cryptographic identity.
@ -66,6 +66,7 @@ impl Agent {
}],
confidence: 1.0,
timestamp,
hlc_timestamp: HlcTimestamp::default(),
vector: None,
}
}

@ -32,6 +32,10 @@ memmap2 = "0.9"
crc32c = "0.6"
# Byte order encoding for checkpoint format
byteorder = "1.5"
# Graph data structures for EigenTrust trust graph
petgraph = "0.6"
# Linear algebra for EigenTrust power iteration
nalgebra = "0.33"
[dev-dependencies]
tokio = { version = "1", features = ["macros", "rt", "rt-multi-thread"] }
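The new `petgraph` and `nalgebra` dependencies are there for EigenTrust's power iteration. The standard EigenTrust update with pre-trusted peers, t ← (1 - a)·Cᵀ·t + a·p over a row-normalized local-trust matrix C, can be sketched with plain `Vec`s. This is an illustrative version, not the crate's implementation; the function name, the damping weight `alpha` and the convergence parameters are assumptions.

```rust
/// Illustrative EigenTrust power iteration over a dense local-trust matrix.
///
/// `local[i][j]` is peer i's non-negative local trust in peer j, `pretrusted` is a
/// distribution over peers (sums to 1), and `alpha` weights the pull toward it.
fn eigentrust(
    local: &[Vec<f64>],
    pretrusted: &[f64],
    alpha: f64,
    max_iters: usize,
    epsilon: f64,
) -> Vec<f64> {
    // Row-normalize local trust into C; a peer that trusts nobody defers to `pretrusted`.
    let c: Vec<Vec<f64>> = local
        .iter()
        .map(|row| {
            let sum: f64 = row.iter().sum();
            if sum > 0.0 {
                row.iter().map(|v| v / sum).collect()
            } else {
                pretrusted.to_vec()
            }
        })
        .collect();

    let mut t = pretrusted.to_vec();
    for _ in 0..max_iters {
        // next = (1 - alpha) * C^T * t + alpha * p
        let mut next: Vec<f64> = pretrusted.iter().map(|p| alpha * p).collect();
        for (i, row) in c.iter().enumerate() {
            for (j, c_ij) in row.iter().enumerate() {
                next[j] += (1.0 - alpha) * c_ij * t[i];
            }
        }
        // Stop once the L1 change between iterations drops below epsilon.
        let delta: f64 = next.iter().zip(&t).map(|(a, b)| (a - b).abs()).sum();
        t = next;
        if delta < epsilon {
            break;
        }
    }
    t
}
```

Because each row of C sums to 1 and `pretrusted` sums to 1, every iteration keeps the total trust at 1; the pre-trust term also guarantees that a peer nobody vouches for still receives `alpha * p[j]`.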

@ -0,0 +1,193 @@
//! Admission control storage for graduated PoW and trust tiers.
//!
//! The AdmissionStore provides spam protection for Episteme by requiring new or
//! untrusted agents to solve proof-of-work puzzles before their assertions are accepted.
//! As agents demonstrate good behavior (accurate assertions, verified against gold standards),
//! they graduate to higher trust tiers with reduced PoW requirements and increased quotas.
//!
//! # Key Design Decision: Reuse TrustRank Data
//!
//! The AdmissionStore wraps `TrustRankStore` rather than creating new storage keys.
//! The existing `TrustRank` struct already has:
//! - `score: f32` -> maps to TrustTier for quota multipliers
//! - `assertions_count: u64` -> used for PoW graduation thresholds
//!
//! This means no schema migration is needed and admission control is automatically
//! integrated with the existing reputation system.
//!
//! # Graduated PoW Difficulty
//!
//! | Assertions | Trust Score | Difficulty | Effort |
//! |------------|-------------|------------|--------|
//! | 0-9 | < 0.6 | 16 bits | ~16 sec |
//! | 10-49 | < 0.6 | 1 bit | trivial |
//! | 50+ | any | 0 bits | exempt |
//! | any | >= 0.6 | 0 bits | exempt |
mod model;
mod store_impl;
pub use model::{AdmissionStatus, AdmissionStatusResult};
pub use store_impl::GenericAdmissionStore;
use crate::error::Result;
use async_trait::async_trait;
use stemedb_core::types::{AdmissionConfig, PowError, PowProof};
/// Specialized storage trait for admission control operations.
///
/// This trait provides PoW-based spam protection layered on top of the existing
/// TrustRankStore. It computes admission status, verifies proofs, and records
/// assertion activity for graduation tracking.
///
/// # Example
///
/// ```ignore
/// let admission_store = GenericAdmissionStore::with_config(trust_rank_store, config);
///
/// // Get admission status for an agent
/// let status = admission_store.get_admission_status(&agent_id).await?;
///
/// if status.pow_required {
/// // Agent must submit a valid PoW proof; the inner Result reports PoW failures
/// if let Err(pow_error) =
/// admission_store.verify_pow(&proof, status.pow_difficulty, server_time).await?
/// {
/// // Reject the request (e.g. HTTP 428 carrying `pow_error`)
/// }
/// }
///
/// // After successful assertion, record it
/// admission_store.record_assertion(&agent_id, timestamp).await?;
/// ```
#[async_trait]
pub trait AdmissionStore: Send + Sync {
/// Get the current admission status for an agent.
///
/// This returns the agent's trust tier, PoW requirements, and quota multipliers
/// based on their trust score and assertion count.
///
/// # Arguments
/// * `agent_id` - The agent's Ed25519 public key
///
/// # Returns
/// The agent's current admission status.
async fn get_admission_status(&self, agent_id: &[u8; 32]) -> Result<AdmissionStatus>;
/// Verify a proof-of-work solution.
///
/// This validates that:
/// 1. The proof timestamp is within the allowed window
/// 2. The hash has sufficient leading zeros for the required difficulty
///
/// Note: The caller should verify the agent_id in the proof matches the request.
///
/// # Arguments
/// * `proof` - The PoW solution submitted by the agent
/// * `required_difficulty` - Number of leading zero bits required
/// * `server_time` - Current server timestamp for validation
///
/// # Returns
/// `Ok(Ok(()))` if the proof is valid, `Ok(Err(PowError))` describing why it
/// was rejected, or `Err` if the underlying store fails.
async fn verify_pow(
&self,
proof: &PowProof,
required_difficulty: u8,
server_time: u64,
) -> Result<std::result::Result<(), PowError>>;
/// Compute the required PoW difficulty for an agent.
///
/// This considers both assertion count and trust score (thresholds below are
/// the `AdmissionConfig` defaults):
/// - First 10 assertions with trust < 0.6: 16 bits
/// - Assertions 10-49 with trust < 0.6: 1 bit
/// - 50+ assertions OR trust >= 0.6: 0 bits (exempt)
///
/// # Arguments
/// * `agent_id` - The agent's Ed25519 public key
///
/// # Returns
/// Required difficulty in bits (0 = exempt).
async fn compute_difficulty(&self, agent_id: &[u8; 32]) -> Result<u8>;
/// Record a successful assertion for graduation tracking.
///
/// Increments the agent's assertion count in TrustRank. This is called
/// after an assertion passes validation and is stored.
///
/// # Arguments
/// * `agent_id` - The agent's Ed25519 public key
/// * `timestamp` - Unix timestamp of the assertion
///
/// # Returns
/// The new assertion count.
async fn record_assertion(&self, agent_id: &[u8; 32], timestamp: u64) -> Result<u64>;
/// Get the admission configuration.
fn config(&self) -> &AdmissionConfig;
}
/// Extension trait for checking admission in a single call.
///
/// This is a convenience trait that combines status check and PoW verification.
#[async_trait]
pub trait AdmissionCheck: AdmissionStore {
/// Check if a request should be admitted, optionally with a PoW proof.
///
/// This is the primary entry point for the admission middleware:
/// 1. Get the agent's admission status
/// 2. If PoW is not required, admit immediately
/// 3. If PoW is required and a proof is provided, check that its `agent_id` matches, then verify it
/// 4. If PoW is required and no proof is provided, return `PowRequired`
///
/// # Arguments
/// * `agent_id` - The agent's Ed25519 public key
/// * `proof` - Optional PoW proof (from X-PoW-Nonce/X-PoW-Timestamp headers)
/// * `server_time` - Current server timestamp
///
/// # Returns
/// The result of the admission check with full status details.
async fn check_admission(
&self,
agent_id: &[u8; 32],
proof: Option<&PowProof>,
server_time: u64,
) -> Result<AdmissionStatusResult>;
}
// Blanket implementation of AdmissionCheck for all AdmissionStore implementors
#[async_trait]
impl<T: AdmissionStore + Sync> AdmissionCheck for T {
async fn check_admission(
&self,
agent_id: &[u8; 32],
proof: Option<&PowProof>,
server_time: u64,
) -> Result<AdmissionStatusResult> {
let status = self.get_admission_status(agent_id).await?;
if !status.pow_required {
// No PoW needed, admit immediately
return Ok(AdmissionStatusResult::Admitted(status));
}
// PoW is required
match proof {
Some(p) => {
// Verify agent_id matches
if p.agent_id != *agent_id {
return Ok(AdmissionStatusResult::PowFailed {
status,
error: PowError::AgentIdMismatch,
});
}
// Verify the proof
match self.verify_pow(p, status.pow_difficulty, server_time).await? {
Ok(()) => Ok(AdmissionStatusResult::Admitted(status)),
Err(e) => Ok(AdmissionStatusResult::PowFailed { status, error: e }),
}
}
None => {
// No proof provided but required
Ok(AdmissionStatusResult::PowRequired(status))
}
}
}
}
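The graduated difficulty table at the top of this module reads as a small decision function. The sketch below reproduces it with the thresholds from the table (16 bits, 1 bit, 10 and 50 assertions, 0.6 trust); `DifficultyPolicy` and its field names are hypothetical and do not mirror `AdmissionConfig`'s actual fields. It also includes `expected_attempts`, since with a uniformly distributed hash a d-bit leading-zero target takes about 2^d attempts on average; wall-clock effort then depends on the hash function and hardware.

```rust
/// Hypothetical stand-in for the difficulty-related parts of an admission config.
#[derive(Debug, Clone, Copy)]
struct DifficultyPolicy {
    /// Difficulty (bits) for an agent's first assertions.
    initial_difficulty: u8,
    /// Difficulty (bits) once the agent reaches `reduced_after` assertions.
    reduced_difficulty: u8,
    /// Assertion count at which the reduced difficulty applies.
    reduced_after: u64,
    /// Assertion count at which PoW is no longer required.
    exempt_after: u64,
    /// Trust score at or above which PoW is no longer required.
    trust_exemption_score: f32,
}

impl Default for DifficultyPolicy {
    /// Thresholds taken from the table in the module docs.
    fn default() -> Self {
        Self {
            initial_difficulty: 16,
            reduced_difficulty: 1,
            reduced_after: 10,
            exempt_after: 50,
            trust_exemption_score: 0.6,
        }
    }
}

impl DifficultyPolicy {
    /// Returns the required leading-zero bits (0 = exempt).
    fn compute_difficulty(&self, assertions_count: u64, trust_score: f32) -> u8 {
        if trust_score >= self.trust_exemption_score || assertions_count >= self.exempt_after {
            0
        } else if assertions_count >= self.reduced_after {
            self.reduced_difficulty
        } else {
            self.initial_difficulty
        }
    }
}

/// Expected hash attempts for a `difficulty`-bit target (None if it overflows u64).
fn expected_attempts(difficulty: u8) -> Option<u64> {
    1u64.checked_shl(u32::from(difficulty))
}
```

Checking the exemptions first matches the table's last two rows, where either condition alone removes the PoW requirement regardless of the other column.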

@ -0,0 +1,229 @@
//! Admission control data models.
//!
//! These types represent the admission status of an agent and the result of
//! admission checks. They are designed to be easily serialized for API responses.
use stemedb_core::types::{PowError, TrustTier, BASE_QUOTA_LIMIT};
/// Current admission status for an agent.
///
/// This snapshot represents the agent's standing in the admission control system
/// at a specific point in time. It includes all information needed by clients
/// to understand their quotas and PoW requirements.
#[derive(Debug, Clone, PartialEq)]
pub struct AdmissionStatus {
/// The agent's trust tier based on their reputation score.
pub tier: TrustTier,
/// The agent's current trust score (0.0 to 1.0).
pub trust_score: f32,
/// Total number of assertions made by this agent.
pub assertions_count: u64,
/// Required PoW difficulty in bits (0 = exempt).
pub pow_difficulty: u8,
/// Whether PoW is required for this agent's next submission.
pub pow_required: bool,
/// Base quota limit (before tier multiplier).
pub base_quota_limit: u64,
/// Effective quota limit after tier multiplier.
pub effective_quota_limit: u64,
/// Quota multiplier for this tier.
pub quota_multiplier: f32,
}
impl AdmissionStatus {
/// Create a new admission status from trust rank data.
///
/// # Arguments
/// * `trust_score` - Agent's trust score (0.0-1.0)
/// * `assertions_count` - Number of assertions made
/// * `pow_difficulty` - Required PoW difficulty in bits
pub fn new(trust_score: f32, assertions_count: u64, pow_difficulty: u8) -> Self {
let tier = TrustTier::from_score(trust_score);
let pow_required = pow_difficulty > 0;
let quota_multiplier = tier.quota_multiplier();
let base_quota_limit = BASE_QUOTA_LIMIT;
let effective_quota_limit = tier.effective_quota_limit();
Self {
tier,
trust_score,
assertions_count,
pow_difficulty,
pow_required,
base_quota_limit,
effective_quota_limit,
quota_multiplier,
}
}
/// Create a status for a new/unknown agent with default values.
///
/// New agents start at:
/// - Trust score: 0.5 (Verified tier)
/// - Assertions: 0
/// - PoW difficulty: `initial_difficulty` (16 bits under the default configuration)
pub fn new_agent(initial_difficulty: u8) -> Self {
Self::new(0.5, 0, initial_difficulty)
}
}
/// Result of an admission check.
///
/// This enum represents the three possible outcomes when checking admission:
/// 1. Admitted - The agent can proceed (no PoW required, or valid PoW provided)
/// 2. PowRequired - The agent must provide a PoW proof (HTTP 428)
/// 3. PowFailed - The agent provided an invalid PoW proof (HTTP 428 with error)
#[derive(Debug, Clone)]
pub enum AdmissionStatusResult {
/// Agent is admitted, can proceed with the request.
Admitted(AdmissionStatus),
/// Agent must provide proof-of-work to proceed.
/// The status contains the required difficulty.
PowRequired(AdmissionStatus),
/// Agent provided an invalid proof-of-work.
/// The status contains the required difficulty for retry.
PowFailed {
/// The agent's current status (for building retry response).
status: AdmissionStatus,
/// The specific error that caused verification to fail.
error: PowError,
},
}
impl AdmissionStatusResult {
/// Check if the agent is admitted.
#[must_use]
pub fn is_admitted(&self) -> bool {
matches!(self, AdmissionStatusResult::Admitted(_))
}
/// Check if proof-of-work is required.
#[must_use]
pub fn requires_pow(&self) -> bool {
matches!(self, AdmissionStatusResult::PowRequired(_))
}
/// Check if the proof-of-work verification failed.
#[must_use]
pub fn pow_failed(&self) -> bool {
matches!(self, AdmissionStatusResult::PowFailed { .. })
}
/// Get the admission status regardless of outcome.
#[must_use]
pub fn status(&self) -> &AdmissionStatus {
match self {
AdmissionStatusResult::Admitted(s) => s,
AdmissionStatusResult::PowRequired(s) => s,
AdmissionStatusResult::PowFailed { status, .. } => status,
}
}
/// Get the PoW error if verification failed.
#[must_use]
pub fn pow_error(&self) -> Option<&PowError> {
match self {
AdmissionStatusResult::PowFailed { error, .. } => Some(error),
_ => None,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_admission_status_new() {
let status = AdmissionStatus::new(0.5, 10, 1);
assert_eq!(status.tier, TrustTier::Verified);
assert!((status.trust_score - 0.5).abs() < f32::EPSILON);
assert_eq!(status.assertions_count, 10);
assert_eq!(status.pow_difficulty, 1);
assert!(status.pow_required);
assert_eq!(status.base_quota_limit, 10_000);
assert_eq!(status.effective_quota_limit, 10_000);
assert!((status.quota_multiplier - 1.0).abs() < f32::EPSILON);
}
#[test]
fn test_admission_status_new_agent() {
let status = AdmissionStatus::new_agent(16);
assert_eq!(status.tier, TrustTier::Verified);
assert!((status.trust_score - 0.5).abs() < f32::EPSILON);
assert_eq!(status.assertions_count, 0);
assert_eq!(status.pow_difficulty, 16);
assert!(status.pow_required);
}
#[test]
fn test_admission_status_no_pow_required() {
let status = AdmissionStatus::new(0.7, 100, 0);
assert_eq!(status.tier, TrustTier::Trusted);
assert!(!status.pow_required);
assert_eq!(status.pow_difficulty, 0);
assert_eq!(status.effective_quota_limit, 20_000);
}
#[test]
fn test_admission_status_result_is_admitted() {
let status = AdmissionStatus::new(0.5, 50, 0);
let result = AdmissionStatusResult::Admitted(status);
assert!(result.is_admitted());
assert!(!result.requires_pow());
assert!(!result.pow_failed());
}
#[test]
fn test_admission_status_result_pow_required() {
let status = AdmissionStatus::new(0.3, 5, 16);
let result = AdmissionStatusResult::PowRequired(status);
assert!(!result.is_admitted());
assert!(result.requires_pow());
assert!(!result.pow_failed());
}
#[test]
fn test_admission_status_result_pow_failed() {
let status = AdmissionStatus::new(0.3, 5, 16);
let error = PowError::InsufficientDifficulty { required: 16, found: 8 };
let result = AdmissionStatusResult::PowFailed { status, error };
assert!(!result.is_admitted());
assert!(!result.requires_pow());
assert!(result.pow_failed());
assert!(matches!(
result.pow_error(),
Some(PowError::InsufficientDifficulty { required: 16, found: 8 })
));
}
#[test]
fn test_tier_quota_calculation() {
// Untrusted: 0.1x = 1,000
let status = AdmissionStatus::new(0.1, 0, 16);
assert_eq!(status.effective_quota_limit, 1_000);
// Limited: 0.5x = 5,000
let status = AdmissionStatus::new(0.4, 0, 16);
assert_eq!(status.effective_quota_limit, 5_000);
// Authority: 10.0x = 100,000
let status = AdmissionStatus::new(0.95, 0, 0);
assert_eq!(status.effective_quota_limit, 100_000);
}
}
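`AdmissionStatus::new` derives the effective quota from the tier's multiplier. The multipliers below are the ones the tests above assert (0.1x, 0.5x, 1.0x, 2.0x and 10.0x over a 10,000 base). The score-to-tier boundaries are not shown in this diff, so the sketch takes the tier as input; `TrustTier` here is a hypothetical local stand-in, not `stemedb_core::types::TrustTier`.

```rust
/// Base quota before tier scaling (the tests above assert 10,000).
const BASE_QUOTA_LIMIT: u64 = 10_000;

/// Hypothetical local stand-in for the trust tiers used in the tests above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrustTier {
    Untrusted,
    Limited,
    Verified,
    Trusted,
    Authority,
}

impl TrustTier {
    /// Multipliers as asserted by the tests: 0.1x, 0.5x, 1.0x, 2.0x, 10.0x.
    fn quota_multiplier(self) -> f32 {
        match self {
            TrustTier::Untrusted => 0.1,
            TrustTier::Limited => 0.5,
            TrustTier::Verified => 1.0,
            TrustTier::Trusted => 2.0,
            TrustTier::Authority => 10.0,
        }
    }

    /// Effective limit = base × multiplier, rounded to the nearest integer.
    fn effective_quota_limit(self) -> u64 {
        (BASE_QUOTA_LIMIT as f32 * self.quota_multiplier()).round() as u64
    }
}
```

Rounding before the cast guards against a floating-point product landing just below an integer and truncating by one; whether the real `effective_quota_limit` rounds or truncates is not visible here.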

@ -0,0 +1,381 @@
//! AdmissionStore implementation backed by TrustRankStore.
//!
//! This module provides the concrete implementation of AdmissionStore operations.
//! It wraps the TrustRankStore to leverage existing trust score and assertion count
//! data for admission control decisions.
use crate::error::Result;
use crate::trust_rank_store::TrustRankStore;
use async_trait::async_trait;
use stemedb_core::types::{AdmissionConfig, PowError, PowProof};
use tracing::{debug, instrument};
use super::model::AdmissionStatus;
use super::AdmissionStore;
/// AdmissionStore implementation backed by TrustRankStore.
///
/// This implementation wraps an existing TrustRankStore and computes admission
/// decisions based on the agent's trust score and assertion count.
///
/// # Design Decision
///
/// Rather than creating new storage keys, this implementation reuses the existing
/// TrustRank data structure which already tracks:
/// - `score: f32` - Maps to TrustTier
/// - `assertions_count: u64` - Used for PoW graduation
///
/// This means admission control is automatically integrated with the reputation
/// system and no schema migration is needed.
#[derive(Clone)]
pub struct GenericAdmissionStore<T> {
trust_store: T,
config: AdmissionConfig,
}
impl<T: TrustRankStore> GenericAdmissionStore<T> {
/// Create a new AdmissionStore backed by the given TrustRankStore.
///
/// Uses default admission configuration.
pub fn new(trust_store: T) -> Self {
Self { trust_store, config: AdmissionConfig::default() }
}
/// Create a new AdmissionStore with custom configuration.
pub fn with_config(trust_store: T, config: AdmissionConfig) -> Self {
Self { trust_store, config }
}
}
#[async_trait]
impl<T: TrustRankStore + 'static> AdmissionStore for GenericAdmissionStore<T> {
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))]
async fn get_admission_status(&self, agent_id: &[u8; 32]) -> Result<AdmissionStatus> {
let trust_rank = self.trust_store.get_trust_rank(agent_id).await?;
let difficulty =
self.config.compute_difficulty(trust_rank.assertions_count, trust_rank.score);
let status =
AdmissionStatus::new(trust_rank.score, trust_rank.assertions_count, difficulty);
debug!(
tier = %status.tier,
trust_score = status.trust_score,
assertions_count = status.assertions_count,
pow_difficulty = status.pow_difficulty,
pow_required = status.pow_required,
"Retrieved admission status"
);
Ok(status)
}
// `required_difficulty` and `server_time` are not skipped, so they are recorded automatically.
#[instrument(skip(self, proof), fields(nonce = proof.nonce, timestamp = proof.timestamp))]
async fn verify_pow(
&self,
proof: &PowProof,
required_difficulty: u8,
server_time: u64,
) -> Result<std::result::Result<(), PowError>> {
// If difficulty is 0, no verification needed
if required_difficulty == 0 {
debug!("PoW exempt (difficulty 0)");
return Ok(Ok(()));
}
// Verify the proof
let result = proof.verify(required_difficulty, self.config.pow_max_age, server_time);
match &result {
Ok(()) => {
debug!("PoW verified successfully");
}
Err(e) => {
debug!(error = %e, "PoW verification failed");
}
}
Ok(result)
}
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))]
async fn compute_difficulty(&self, agent_id: &[u8; 32]) -> Result<u8> {
let trust_rank = self.trust_store.get_trust_rank(agent_id).await?;
let difficulty =
self.config.compute_difficulty(trust_rank.assertions_count, trust_rank.score);
debug!(
assertions_count = trust_rank.assertions_count,
trust_score = trust_rank.score,
difficulty,
"Computed PoW difficulty"
);
Ok(difficulty)
}
#[instrument(skip(self), fields(agent_id = %hex::encode(agent_id)))]
async fn record_assertion(&self, agent_id: &[u8; 32], timestamp: u64) -> Result<u64> {
// Get current trust rank
let mut trust_rank = self.trust_store.get_trust_rank(agent_id).await?;
// Increment assertion count
let old_count = trust_rank.assertions_count;
trust_rank.assertions_count = trust_rank.assertions_count.saturating_add(1);
trust_rank.last_updated = timestamp;
// Store updated trust rank
self.trust_store.put_trust_rank(&trust_rank).await?;
debug!(
old_count,
new_count = trust_rank.assertions_count,
"Recorded assertion for admission tracking"
);
Ok(trust_rank.assertions_count)
}
fn config(&self) -> &AdmissionConfig {
&self.config
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::trust_rank_store::{TrustAdjustment, TrustRank};
use std::collections::HashMap;
use std::sync::Mutex;
/// Mock TrustRankStore for testing.
struct MockTrustRankStore {
ranks: Mutex<HashMap<[u8; 32], TrustRank>>,
}
impl MockTrustRankStore {
fn new() -> Self {
Self { ranks: Mutex::new(HashMap::new()) }
}
fn set_rank(&self, rank: TrustRank) {
self.ranks.lock().expect("lock").insert(rank.agent_id, rank);
}
}
#[async_trait]
impl TrustRankStore for MockTrustRankStore {
async fn get_trust_rank(&self, agent_id: &[u8; 32]) -> Result<TrustRank> {
let ranks = self.ranks.lock().expect("lock");
Ok(ranks.get(agent_id).cloned().unwrap_or_else(|| TrustRank::new(*agent_id, 0)))
}
async fn update_trust_rank(
&self,
_agent_id: &[u8; 32],
_delta: f32,
_timestamp: u64,
) -> Result<f32> {
unimplemented!("not needed for admission tests")
}
async fn decay_trust_ranks(
&self,
_current_timestamp: u64,
_half_life_seconds: Option<u64>,
) -> Result<usize> {
unimplemented!("not needed for admission tests")
}
async fn record_outcome(
&self,
_agent_id: &[u8; 32],
_was_accurate: bool,
_timestamp: u64,
) -> Result<f32> {
unimplemented!("not needed for admission tests")
}
async fn put_trust_rank(&self, trust_rank: &TrustRank) -> Result<()> {
let mut ranks = self.ranks.lock().expect("lock");
ranks.insert(trust_rank.agent_id, trust_rank.clone());
Ok(())
}
async fn verify_agent_against_gold_standard(
&self,
_agent_id: &[u8; 32],
_agent_object: &str,
_gold_standard: &stemedb_core::types::GoldStandard,
_timestamp: u64,
) -> Result<TrustAdjustment> {
unimplemented!("not needed for admission tests")
}
}
#[tokio::test]
async fn test_new_agent_requires_pow() {
let trust_store = MockTrustRankStore::new();
let admission_store = GenericAdmissionStore::new(trust_store);
let agent_id = [1u8; 32];
let status = admission_store.get_admission_status(&agent_id).await.expect("get status");
// New agent: default trust 0.5, 0 assertions
assert!((status.trust_score - 0.5).abs() < f32::EPSILON);
assert_eq!(status.assertions_count, 0);
// Should require PoW with initial difficulty
assert!(status.pow_required);
assert_eq!(status.pow_difficulty, 16);
}
#[tokio::test]
async fn test_graduated_agent_no_pow() {
let trust_store = MockTrustRankStore::new();
// Set up an agent with 50+ assertions
let mut rank = TrustRank::new([2u8; 32], 0);
rank.assertions_count = 50;
rank.score = 0.4; // Low trust but high assertion count
trust_store.set_rank(rank);
let admission_store = GenericAdmissionStore::new(trust_store);
let agent_id = [2u8; 32];
let status = admission_store.get_admission_status(&agent_id).await.expect("get status");
// 50+ assertions = exempt
assert!(!status.pow_required);
assert_eq!(status.pow_difficulty, 0);
}
#[tokio::test]
async fn test_trusted_agent_no_pow() {
let trust_store = MockTrustRankStore::new();
// Set up an agent with high trust
let mut rank = TrustRank::new([3u8; 32], 0);
rank.score = 0.7; // High trust
rank.assertions_count = 5; // Few assertions
trust_store.set_rank(rank);
let admission_store = GenericAdmissionStore::new(trust_store);
let agent_id = [3u8; 32];
let status = admission_store.get_admission_status(&agent_id).await.expect("get status");
// High trust = exempt (trust_exemption_score is 0.6 by default)
assert!(!status.pow_required);
assert_eq!(status.pow_difficulty, 0);
}
#[tokio::test]
async fn test_reduced_difficulty_after_10_assertions() {
let trust_store = MockTrustRankStore::new();
// Set up an agent with 10-49 assertions
let mut rank = TrustRank::new([4u8; 32], 0);
rank.assertions_count = 15;
rank.score = 0.4; // Low trust
trust_store.set_rank(rank);
let admission_store = GenericAdmissionStore::new(trust_store);
let agent_id = [4u8; 32];
let status = admission_store.get_admission_status(&agent_id).await.expect("get status");
// 10-49 assertions = reduced difficulty (1 bit)
assert!(status.pow_required);
assert_eq!(status.pow_difficulty, 1);
}
#[tokio::test]
async fn test_verify_pow_exempt() {
let trust_store = MockTrustRankStore::new();
let admission_store = GenericAdmissionStore::new(trust_store);
let agent_id = [5u8; 32];
let proof = PowProof::new(0, agent_id, 1700000000);
// Difficulty 0 = always passes - just verify it doesn't error
admission_store.verify_pow(&proof, 0, 1700000000).await.expect("verify").expect("ok");
}
#[tokio::test]
async fn test_verify_pow_valid() {
let trust_store = MockTrustRankStore::new();
let admission_store = GenericAdmissionStore::new(trust_store);
let agent_id = [6u8; 32];
let timestamp = 1700000000u64;
// Solve a real puzzle (low difficulty for fast test)
let proof = PowProof::solve(agent_id, timestamp, 4);
// Should verify
let result = admission_store.verify_pow(&proof, 4, timestamp).await.expect("verify");
assert!(result.is_ok());
}
#[tokio::test]
async fn test_verify_pow_expired() {
let trust_store = MockTrustRankStore::new();
let admission_store = GenericAdmissionStore::new(trust_store);
let agent_id = [7u8; 32];
let old_timestamp = 1000u64;
let server_time = 2000u64; // 1000 seconds later
let proof = PowProof::new(0, agent_id, old_timestamp);
// Should fail due to expired timestamp
let result = admission_store.verify_pow(&proof, 1, server_time).await.expect("verify");
assert!(matches!(result, Err(PowError::TimestampExpired { .. })));
}
#[tokio::test]
async fn test_record_assertion() {
let trust_store = MockTrustRankStore::new();
let admission_store = GenericAdmissionStore::new(trust_store);
let agent_id = [8u8; 32];
let timestamp = 1700000000u64;
// Record first assertion
let count = admission_store.record_assertion(&agent_id, timestamp).await.expect("record");
assert_eq!(count, 1);
// Record second assertion
let count =
admission_store.record_assertion(&agent_id, timestamp + 1).await.expect("record");
assert_eq!(count, 2);
// Check status
let status = admission_store.get_admission_status(&agent_id).await.expect("get status");
assert_eq!(status.assertions_count, 2);
}
#[tokio::test]
async fn test_tier_affects_quota() {
let trust_store = MockTrustRankStore::new();
// Authority tier agent
let mut rank = TrustRank::new([9u8; 32], 0);
rank.score = 0.95;
trust_store.set_rank(rank);
let admission_store = GenericAdmissionStore::new(trust_store);
let status = admission_store.get_admission_status(&[9u8; 32]).await.expect("get status");
assert_eq!(status.tier, stemedb_core::types::TrustTier::Authority);
assert_eq!(status.effective_quota_limit, 100_000);
assert!((status.quota_multiplier - 10.0).abs() < f32::EPSILON);
}
}
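The tests above exercise `PowProof::solve` and `verify_pow` without showing the proof format. The general technique (hash the agent id, timestamp and nonce, count leading zero bits, and bound the timestamp's age) can be sketched as below. It uses std's `DefaultHasher` (SipHash) purely as a non-cryptographic stand-in; a production PoW would use a cryptographic hash. The struct layout, the `TimestampInFuture` variant and the hash input order are assumptions, not the `stemedb_core` types.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical PoW failure reasons (the real `PowError` has its own variants).
#[derive(Debug, Clone, PartialEq, Eq)]
enum PowError {
    TimestampExpired { age: u64, max_age: u64 },
    TimestampInFuture,
    InsufficientDifficulty { required: u8, found: u8 },
}

/// Hypothetical proof: a nonce bound to an agent and a timestamp.
#[derive(Debug, Clone, Copy)]
struct PowProof {
    nonce: u64,
    agent_id: [u8; 32],
    timestamp: u64,
}

impl PowProof {
    /// Non-cryptographic stand-in hash (SipHash via DefaultHasher); not for real PoW.
    fn digest(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.agent_id.hash(&mut hasher);
        self.timestamp.hash(&mut hasher);
        self.nonce.hash(&mut hasher);
        hasher.finish()
    }

    /// Number of leading zero bits in the digest (0..=64).
    fn leading_zero_bits(&self) -> u8 {
        self.digest().leading_zeros() as u8
    }

    /// Brute-force a nonce; expect about 2^difficulty attempts, so keep difficulty small.
    fn solve(agent_id: [u8; 32], timestamp: u64, difficulty: u8) -> Self {
        let mut nonce = 0u64;
        loop {
            let proof = PowProof { nonce, agent_id, timestamp };
            if proof.leading_zero_bits() >= difficulty {
                return proof;
            }
            nonce += 1;
        }
    }

    /// Check the timestamp window first, then the difficulty target.
    fn verify(&self, required: u8, max_age: u64, server_time: u64) -> Result<(), PowError> {
        if self.timestamp > server_time {
            return Err(PowError::TimestampInFuture);
        }
        let age = server_time - self.timestamp;
        if age > max_age {
            return Err(PowError::TimestampExpired { age, max_age });
        }
        let found = self.leading_zero_bits();
        if found < required {
            return Err(PowError::InsufficientDifficulty { required, found });
        }
        Ok(())
    }
}
```

Because the agent id is part of the hashed input, a solved nonce is tied to one agent, which is why `check_admission` also compares `agent_id` before verifying. In the store above, `verify_pow` skips this call entirely when the required difficulty is 0.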

@ -0,0 +1,157 @@
//! Specialized storage for domain-specific trust tracking.
//!
//! The DomainTrustStore tracks per-agent expertise within specific domains.
//! This enables fine-grained trust: an agent can be highly trusted in medicine
//! but untrusted in finance.
//!
//! # Storage Layout
//!
//! | Key Pattern | Value | Purpose |
//! |-------------|-------|---------|
//! | `\x00DT:{agent}:{domain}` | Serialized DomainTrust | Domain-specific trust |
//!
//! # Use Case
//!
//! Combined with EigenTrust, domain trust enables weighted resolution:
//! ```text
//! effective_weight = confidence × eigentrust_score × domain_factor
//! ```
//!
//! This ensures that an agent's assertions are weighted by both their
//! global reputation AND their domain-specific expertise.
mod model;
mod store_impl;
pub use model::*;
pub use store_impl::*;
use crate::error::Result;
use async_trait::async_trait;
use std::sync::Arc;
/// Specialized storage trait for DomainTrust operations.
///
/// This trait provides domain-specific trust tracking for agents,
/// enabling expertise-weighted assertion resolution.
///
/// # Example
///
/// ```ignore
/// let domain_store = GenericDomainTrustStore::new(kv_store);
///
/// // Record domain-specific outcome
/// let score = domain_store.record_domain_outcome(
/// &agent_id,
/// "treats_condition", // Domain auto-extracted
/// true, // Was accurate
/// timestamp,
/// ).await?;
///
/// // Get effective trust (eigentrust × domain factor)
/// let effective = domain_store.get_effective_trust(
/// &agent_id,
/// "treats_condition",
/// eigentrust_score,
/// ).await?;
/// ```
#[async_trait]
pub trait DomainTrustStore: Send + Sync {
/// Get the domain trust for an agent in a specific domain.
///
/// Returns a default DomainTrust (score 0.5) if not found.
///
/// # Arguments
/// * `agent` - Agent's Ed25519 public key
/// * `domain` - Domain name (e.g., "medicine", "finance")
async fn get_domain_trust(&self, agent: &[u8; 32], domain: &str) -> Result<DomainTrust>;
/// Get all domain trust entries for an agent.
///
/// Returns all domains the agent has records in.
async fn get_all_domains_for_agent(&self, agent: &[u8; 32]) -> Result<Vec<DomainTrust>>;
/// Record a domain-specific outcome.
///
/// This method:
/// 1. Extracts the domain from the predicate
/// 2. Loads or creates the DomainTrust for (agent, domain)
/// 3. Updates accuracy tracking
/// 4. Adjusts the domain score
/// 5. Stores the updated DomainTrust
///
/// # Arguments
/// * `agent` - Agent's Ed25519 public key
/// * `predicate` - The assertion predicate (domain auto-extracted)
/// * `was_accurate` - Whether the assertion was correct
/// * `timestamp` - Unix timestamp
///
/// # Returns
/// The new domain score after update
async fn record_domain_outcome(
&self,
agent: &[u8; 32],
predicate: &str,
was_accurate: bool,
timestamp: u64,
) -> Result<f32>;
/// Store a DomainTrust directly (for testing or batch operations).
async fn put_domain_trust(&self, dt: &DomainTrust) -> Result<()>;
/// Calculate effective trust for an assertion.
///
/// Combines the global EigenTrust score with domain-specific expertise:
/// ```text
/// effective_trust = eigentrust_score × domain_factor(domain_score)
/// ```
///
/// # Arguments
/// * `agent` - Agent's Ed25519 public key
/// * `predicate` - The assertion predicate (domain auto-extracted)
/// * `eigentrust_score` - The agent's global EigenTrust score
///
/// # Returns
/// The effective trust score for this agent in this domain
async fn get_effective_trust(
&self,
agent: &[u8; 32],
predicate: &str,
eigentrust_score: f32,
) -> Result<f32>;
}
// Blanket implementation for Arc<T> where T: DomainTrustStore
#[async_trait]
impl<T: DomainTrustStore + ?Sized> DomainTrustStore for Arc<T> {
async fn get_domain_trust(&self, agent: &[u8; 32], domain: &str) -> Result<DomainTrust> {
(**self).get_domain_trust(agent, domain).await
}
async fn get_all_domains_for_agent(&self, agent: &[u8; 32]) -> Result<Vec<DomainTrust>> {
(**self).get_all_domains_for_agent(agent).await
}
async fn record_domain_outcome(
&self,
agent: &[u8; 32],
predicate: &str,
was_accurate: bool,
timestamp: u64,
) -> Result<f32> {
(**self).record_domain_outcome(agent, predicate, was_accurate, timestamp).await
}
async fn put_domain_trust(&self, dt: &DomainTrust) -> Result<()> {
(**self).put_domain_trust(dt).await
}
async fn get_effective_trust(
&self,
agent: &[u8; 32],
predicate: &str,
eigentrust_score: f32,
) -> Result<f32> {
(**self).get_effective_trust(agent, predicate, eigentrust_score).await
}
}
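The blanket impl is declared for `Arc<T>` with `T: ?Sized`. As a result, both a shared trait object such as `Arc<dyn DomainTrustStore>` and a concrete `Arc<GenericDomainTrustStore<_>>` satisfy the trait, so generic code can accept either. The sketch below shows the same forwarding pattern synchronously, using only the standard library. `TrustLookup`, `FixedStore`, and `medicine_score` are hypothetical names. The real trait is async via `async_trait`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `DomainTrustStore`, reduced to one synchronous method.
trait TrustLookup: Send + Sync {
    fn score(&self, domain: &str) -> f32;
}

/// Hypothetical concrete store holding a score for a single domain.
struct FixedStore {
    domain: String,
    score: f32,
}

impl TrustLookup for FixedStore {
    fn score(&self, domain: &str) -> f32 {
        if domain == self.domain {
            self.score
        } else {
            0.5 // mirrors DEFAULT_DOMAIN_SCORE for unknown agent-domain pairs
        }
    }
}

/// Forwarding impl: `?Sized` admits `T = dyn TrustLookup`, so
/// `Arc<dyn TrustLookup>` implements the trait as well.
impl<T: TrustLookup + ?Sized> TrustLookup for Arc<T> {
    fn score(&self, domain: &str) -> f32 {
        (**self).score(domain)
    }
}

/// Generic consumer that only requires some `TrustLookup`.
fn medicine_score<S: TrustLookup>(store: &S) -> f32 {
    store.score("medicine")
}

fn main() {
    let concrete = Arc::new(FixedStore { domain: "medicine".into(), score: 0.8 });
    let dynamic: Arc<dyn TrustLookup> = concrete.clone();
    println!("{}", medicine_score(&concrete)); // 0.8
    println!("{}", medicine_score(&dynamic)); // 0.8
    println!("{}", dynamic.score("finance")); // 0.5
}
```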

@@ -0,0 +1,286 @@
//! DomainTrustStore data models for domain-specific trust tracking.
//!
//! This module defines the core data structures for per-domain expertise:
//! - `DomainTrust`: An agent's trust score within a specific domain
//! - Domain extraction from predicates
/// Default domain trust score for new agent-domain pairs.
pub const DEFAULT_DOMAIN_SCORE: f32 = 0.5;
/// Domain trust for an agent within a specific domain.
///
/// Tracks an agent's expertise and accuracy within a domain (e.g., "medicine", "finance").
/// This allows fine-grained trust: an agent can be highly trusted in medicine
/// but untrusted in finance.
///
/// # Invariants
///
/// - `score` is in range [0.0, 1.0]
/// - `assertions_count >= accuracy_count`
#[derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize, Debug, Clone, PartialEq)]
#[archive(check_bytes)]
pub struct DomainTrust {
/// Agent's Ed25519 public key.
pub agent_id: [u8; 32],
/// Domain name (e.g., "medicine", "finance", "general").
pub domain: String,
/// Domain-specific trust score (0.0 to 1.0).
pub score: f32,
/// Total assertions made in this domain.
pub assertions_count: u64,
/// Assertions deemed accurate in this domain.
pub accuracy_count: u64,
/// Unix timestamp of last update.
pub last_updated: u64,
}
impl DomainTrust {
/// Create a new domain trust entry with default score.
pub fn new(agent_id: [u8; 32], domain: String, timestamp: u64) -> Self {
Self {
agent_id,
domain,
score: DEFAULT_DOMAIN_SCORE,
assertions_count: 0,
accuracy_count: 0,
last_updated: timestamp,
}
}
/// Record an outcome for this domain.
///
/// Updates the accuracy tracking and adjusts the score based on outcome.
///
/// # Arguments
/// * `was_accurate` - Whether the assertion was correct
/// * `timestamp` - Unix timestamp of this outcome
///
/// # Returns
/// The new score after recording the outcome
pub fn record_outcome(&mut self, was_accurate: bool, timestamp: u64) -> f32 {
self.assertions_count = self.assertions_count.saturating_add(1);
if was_accurate {
self.accuracy_count = self.accuracy_count.saturating_add(1);
}
self.last_updated = timestamp;
// Adjust score based on accuracy
// Accurate: +0.03, Inaccurate: -0.05 (penalty is higher)
let delta = if was_accurate { 0.03 } else { -0.05 };
self.score = (self.score + delta).clamp(0.0, 1.0);
self.score
}
/// Calculate the agent's accuracy rate in this domain.
///
/// Returns 0.0 if no assertions have been made.
pub fn accuracy_rate(&self) -> f32 {
if self.assertions_count == 0 {
return 0.0;
}
self.accuracy_count as f32 / self.assertions_count as f32
}
}
/// Predicate-to-domain mapping rules.
///
/// This is a curated list of predicate patterns and their domains.
/// Used by `extract_domain()` to categorize assertions.
static DOMAIN_MAPPINGS: &[(&str, &str)] = &[
// Medicine / Health
("treats", "medicine"),
("treats_condition", "medicine"),
("has_side_effect", "medicine"),
("contraindicated", "medicine"),
("dosage", "medicine"),
("symptoms", "medicine"),
("diagnoses", "medicine"),
("prescribed_for", "medicine"),
("drug_interaction", "medicine"),
("clinical_trial", "medicine"),
// Finance
("has_revenue", "finance"),
("market_cap", "finance"),
("stock_price", "finance"),
("earnings", "finance"),
("profit_margin", "finance"),
("debt_ratio", "finance"),
("dividend_yield", "finance"),
("pe_ratio", "finance"),
// Technology
("implements", "technology"),
("uses_framework", "technology"),
("depends_on", "technology"),
("version", "technology"),
("api_endpoint", "technology"),
("deprecates", "technology"),
// Science
("atomic_weight", "science"),
("chemical_formula", "science"),
("discovered_by", "science"),
("speed_of", "science"),
("temperature", "science"),
("pressure", "science"),
// Geography
("located_in", "geography"),
("capital_of", "geography"),
("population", "geography"),
("coordinates", "geography"),
("borders", "geography"),
("area", "geography"),
// Legal
("enacted_by", "legal"),
("effective_date", "legal"),
("jurisdiction", "legal"),
("supersedes", "legal"),
("penalty", "legal"),
// General (catch-all patterns)
("has_name", "general"),
("has_type", "general"),
("is_a", "general"),
("part_of", "general"),
];
/// Extract the domain from a predicate string.
///
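/// Uses the curated `DOMAIN_MAPPINGS` table. Matching is case-insensitive and
/// runs three passes in order: exact match, prefix match, then substring match.
/// Within each pass the first matching row wins. Falls back to "general" if no
/// pass matches.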
///
/// # Examples
///
/// ```ignore
/// assert_eq!(extract_domain("treats_condition"), "medicine");
/// assert_eq!(extract_domain("has_revenue"), "finance");
/// assert_eq!(extract_domain("unknown_predicate"), "general");
/// ```
pub fn extract_domain(predicate: &str) -> String {
let predicate_lower = predicate.to_lowercase();
// Check exact matches first
for (pattern, domain) in DOMAIN_MAPPINGS {
if predicate_lower == *pattern {
return (*domain).to_string();
}
}
// Check prefix matches (e.g., "treats_xyz" → "medicine")
for (pattern, domain) in DOMAIN_MAPPINGS {
if predicate_lower.starts_with(pattern) {
return (*domain).to_string();
}
}
// Check contains matches (e.g., "xyz_treats_abc" → "medicine")
for (pattern, domain) in DOMAIN_MAPPINGS {
if predicate_lower.contains(pattern) {
return (*domain).to_string();
}
}
// Default to general domain
"general".to_string()
}
/// Calculate the domain factor for weighting assertions.
///
/// Returns a multiplier based on the agent's domain trust score.
/// This is used to scale the global EigenTrust score by domain expertise.
///
/// # Formula
///
/// `factor = 0.5 + (domain_score * 0.5)`
///
/// - Score 0.0 → Factor 0.5 (halved weight)
/// - Score 0.5 → Factor 0.75 (default score; a 25% reduction)
/// - Score 1.0 → Factor 1.0 (full weight)
pub fn domain_factor(domain_score: f32) -> f32 {
0.5 + (domain_score.clamp(0.0, 1.0) * 0.5)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_domain_trust_new() {
let dt = DomainTrust::new([1u8; 32], "medicine".to_string(), 1000);
assert!((dt.score - 0.5).abs() < f32::EPSILON);
assert_eq!(dt.assertions_count, 0);
assert_eq!(dt.accuracy_count, 0);
}
#[test]
fn test_domain_trust_record_outcome_accurate() {
let mut dt = DomainTrust::new([1u8; 32], "medicine".to_string(), 1000);
let new_score = dt.record_outcome(true, 2000);
assert!((new_score - 0.53).abs() < 0.01);
assert_eq!(dt.assertions_count, 1);
assert_eq!(dt.accuracy_count, 1);
assert_eq!(dt.last_updated, 2000);
}
#[test]
fn test_domain_trust_record_outcome_inaccurate() {
let mut dt = DomainTrust::new([1u8; 32], "medicine".to_string(), 1000);
let new_score = dt.record_outcome(false, 2000);
assert!((new_score - 0.45).abs() < 0.01);
assert_eq!(dt.assertions_count, 1);
assert_eq!(dt.accuracy_count, 0);
}
#[test]
fn test_domain_trust_accuracy_rate() {
let mut dt = DomainTrust::new([1u8; 32], "medicine".to_string(), 1000);
// No assertions yet
assert!((dt.accuracy_rate() - 0.0).abs() < f32::EPSILON);
// 3 accurate, 1 inaccurate = 75% accuracy
dt.record_outcome(true, 1001);
dt.record_outcome(true, 1002);
dt.record_outcome(true, 1003);
dt.record_outcome(false, 1004);
assert!((dt.accuracy_rate() - 0.75).abs() < 0.01);
}
#[test]
fn test_extract_domain_exact_match() {
assert_eq!(extract_domain("treats_condition"), "medicine");
assert_eq!(extract_domain("has_revenue"), "finance");
assert_eq!(extract_domain("implements"), "technology");
assert_eq!(extract_domain("located_in"), "geography");
}
#[test]
fn test_extract_domain_prefix_match() {
assert_eq!(extract_domain("treats_xyz"), "medicine");
assert_eq!(extract_domain("stock_price_daily"), "finance");
}
#[test]
fn test_extract_domain_case_insensitive() {
assert_eq!(extract_domain("TREATS_CONDITION"), "medicine");
assert_eq!(extract_domain("Has_Revenue"), "finance");
}
#[test]
fn test_extract_domain_default_general() {
assert_eq!(extract_domain("unknown_predicate"), "general");
assert_eq!(extract_domain("foo_bar_baz"), "general");
}
#[test]
fn test_domain_factor() {
assert!((domain_factor(0.0) - 0.5).abs() < f32::EPSILON);
assert!((domain_factor(0.5) - 0.75).abs() < f32::EPSILON);
assert!((domain_factor(1.0) - 1.0).abs() < f32::EPSILON);
// Clamping
assert!((domain_factor(-1.0) - 0.5).abs() < f32::EPSILON);
assert!((domain_factor(2.0) - 1.0).abs() < f32::EPSILON);
}
}
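`extract_domain` lowercases the predicate and makes three passes over `DOMAIN_MAPPINGS`: exact match, prefix match, then substring match. It returns the first row that hits. The sketch below reproduces that logic over a trimmed, hypothetical subset of the table. It also reports which pass matched.

The substring pass has a side effect: short patterns can claim unrelated predicates. For example, `conversion_rate` contains `version` and is classified as `technology`. The full table behaves the same way, because no earlier row matches that predicate in any pass.

```rust
/// Trimmed, illustrative subset of `DOMAIN_MAPPINGS` (not the full table).
const MAPPINGS: &[(&str, &str)] = &[
    ("treats", "medicine"),
    ("treats_condition", "medicine"),
    ("stock_price", "finance"),
    ("version", "technology"),
    ("located_in", "geography"),
];

/// Same pass order as `extract_domain`; also returns which pass matched.
fn classify(predicate: &str) -> (&'static str, &'static str) {
    let p = predicate.to_lowercase();
    // Pass 1: exact match.
    for &(pat, domain) in MAPPINGS {
        if p == pat {
            return (domain, "exact");
        }
    }
    // Pass 2: prefix match, first row in table order wins.
    for &(pat, domain) in MAPPINGS {
        if p.starts_with(pat) {
            return (domain, "prefix");
        }
    }
    // Pass 3: substring match anywhere in the predicate.
    for &(pat, domain) in MAPPINGS {
        if p.contains(pat) {
            return (domain, "contains");
        }
    }
    ("general", "fallback")
}

fn main() {
    for predicate in [
        "Treats_Condition",  // exact (after lowercasing) -> medicine
        "stock_price_daily", // prefix -> finance
        "xyz_treats_abc",    // substring -> medicine
        "conversion_rate",   // substring "version" -> technology (false positive)
        "foo_bar_baz",       // no match -> general
    ] {
        let (domain, pass) = classify(predicate);
        println!("{predicate:<18} -> {domain} ({pass})");
    }
}
```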

@@ -0,0 +1,374 @@
//! DomainTrustStore implementation backed by a generic KVStore.
//!
//! This module provides the concrete implementation of DomainTrustStore operations,
//! including CRUD operations and effective trust calculation.
use crate::error::Result;
use crate::key_codec;
use crate::traits::KVStore;
use async_trait::async_trait;
use tracing::{debug, instrument};
use super::model::{domain_factor, extract_domain, DomainTrust};
use super::DomainTrustStore;
/// DomainTrustStore implementation backed by a generic KVStore.
///
/// This implementation stores DomainTrust data at `\x00DT:{agent_hex}:{domain}`
/// and provides all operations for domain-specific trust management.
pub struct GenericDomainTrustStore<S> {
store: S,
}
impl<S: KVStore> GenericDomainTrustStore<S> {
/// Create a new DomainTrustStore backed by the given KVStore.
pub fn new(store: S) -> Self {
Self { store }
}
/// Serialize a DomainTrust using the canonical serde helpers.
fn serialize_domain_trust(dt: &DomainTrust) -> Result<Vec<u8>> {
crate::serde_helpers::serialize(dt)
}
/// Deserialize a DomainTrust using the canonical serde helpers.
fn deserialize_domain_trust(data: &[u8]) -> Result<DomainTrust> {
crate::serde_helpers::deserialize(data)
}
}
#[async_trait]
impl<S: KVStore + 'static> DomainTrustStore for GenericDomainTrustStore<S> {
#[instrument(skip(self), fields(agent = %hex::encode(agent), domain = %domain))]
async fn get_domain_trust(&self, agent: &[u8; 32], domain: &str) -> Result<DomainTrust> {
let agent_hex = hex::encode(agent);
let key = key_codec::domain_trust_key(&agent_hex, domain);
match self.store.get(&key).await? {
Some(data) => {
let dt = Self::deserialize_domain_trust(&data)?;
debug!(score = dt.score, assertions = dt.assertions_count, "Retrieved DomainTrust");
Ok(dt)
}
None => {
// New agent-domain pair, return default
let now = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.as_secs())
.unwrap_or(0);
let dt = DomainTrust::new(*agent, domain.to_string(), now);
debug!(score = dt.score, "Created default DomainTrust for new agent-domain pair");
Ok(dt)
}
}
}
#[instrument(skip(self), fields(agent = %hex::encode(agent)))]
async fn get_all_domains_for_agent(&self, agent: &[u8; 32]) -> Result<Vec<DomainTrust>> {
let agent_hex = hex::encode(agent);
let prefix = key_codec::domain_trust_agent_prefix(&agent_hex);
let entries = self.store.scan_prefix(&prefix).await?;
let mut domains = Vec::with_capacity(entries.len());
for (_, data) in entries {
let dt = Self::deserialize_domain_trust(&data)?;
domains.push(dt);
}
debug!(count = domains.len(), "Retrieved all domains for agent");
Ok(domains)
}
#[instrument(skip(self), fields(
agent = %hex::encode(agent),
predicate = %predicate,
was_accurate = was_accurate
))]
async fn record_domain_outcome(
&self,
agent: &[u8; 32],
predicate: &str,
was_accurate: bool,
timestamp: u64,
) -> Result<f32> {
// Extract domain from predicate
let domain = extract_domain(predicate);
debug!(domain = %domain, "Extracted domain from predicate");
// Get or create domain trust
let mut dt = self.get_domain_trust(agent, &domain).await?;
// Record the outcome
let new_score = dt.record_outcome(was_accurate, timestamp);
// Store updated domain trust
let agent_hex = hex::encode(agent);
let key = key_codec::domain_trust_key(&agent_hex, &domain);
let serialized = Self::serialize_domain_trust(&dt)?;
self.store.put(&key, &serialized).await?;
debug!(
new_score,
accuracy_rate = dt.accuracy_rate(),
"Recorded domain outcome and updated DomainTrust"
);
Ok(new_score)
}
#[instrument(skip(self, dt), fields(
agent = %hex::encode(dt.agent_id),
domain = %dt.domain
))]
async fn put_domain_trust(&self, dt: &DomainTrust) -> Result<()> {
let agent_hex = hex::encode(dt.agent_id);
let key = key_codec::domain_trust_key(&agent_hex, &dt.domain);
let serialized = Self::serialize_domain_trust(dt)?;
self.store.put(&key, &serialized).await?;
debug!(score = dt.score, "Stored DomainTrust");
Ok(())
}
#[instrument(skip(self), fields(
agent = %hex::encode(agent),
predicate = %predicate,
eigentrust_score = eigentrust_score
))]
async fn get_effective_trust(
&self,
agent: &[u8; 32],
predicate: &str,
eigentrust_score: f32,
) -> Result<f32> {
// Extract domain from predicate
let domain = extract_domain(predicate);
// Get domain trust (returns default 0.5 if not found)
let dt = self.get_domain_trust(agent, &domain).await?;
// Calculate effective trust
let factor = domain_factor(dt.score);
let effective = eigentrust_score * factor;
debug!(
domain = %domain,
domain_score = dt.score,
factor,
effective,
"Calculated effective trust"
);
Ok(effective)
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::HybridStore;
use std::sync::Arc;
fn agent(id: u8) -> [u8; 32] {
let mut arr = [0u8; 32];
arr[0] = id;
arr
}
#[tokio::test]
async fn test_get_domain_trust_default() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let domain_store = GenericDomainTrustStore::new(store);
// Non-existent should return default
let dt = domain_store.get_domain_trust(&agent(1), "medicine").await.expect("get");
assert_eq!(dt.agent_id, agent(1));
assert_eq!(dt.domain, "medicine");
assert!((dt.score - 0.5).abs() < f32::EPSILON); // Default score
assert_eq!(dt.assertions_count, 0);
}
#[tokio::test]
async fn test_put_and_get_domain_trust() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let domain_store = GenericDomainTrustStore::new(store);
let mut dt = DomainTrust::new(agent(1), "medicine".to_string(), 1000);
dt.score = 0.8;
dt.assertions_count = 10;
dt.accuracy_count = 8;
domain_store.put_domain_trust(&dt).await.expect("put");
let retrieved = domain_store.get_domain_trust(&agent(1), "medicine").await.expect("get");
assert!((retrieved.score - 0.8).abs() < f32::EPSILON);
assert_eq!(retrieved.assertions_count, 10);
assert_eq!(retrieved.accuracy_count, 8);
}
#[tokio::test]
async fn test_record_domain_outcome_accurate() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let domain_store = GenericDomainTrustStore::new(store);
// Record accurate outcome for medicine domain
let new_score = domain_store
.record_domain_outcome(&agent(1), "treats_condition", true, 1000)
.await
.expect("record");
// Score should increase from 0.5 → 0.53
assert!((new_score - 0.53).abs() < 0.01);
// Check the domain was extracted correctly
let dt = domain_store.get_domain_trust(&agent(1), "medicine").await.expect("get");
assert_eq!(dt.assertions_count, 1);
assert_eq!(dt.accuracy_count, 1);
}
#[tokio::test]
async fn test_record_domain_outcome_inaccurate() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let domain_store = GenericDomainTrustStore::new(store);
// Record inaccurate outcome
let new_score = domain_store
.record_domain_outcome(&agent(1), "has_revenue", false, 1000)
.await
.expect("record");
// Score should decrease from 0.5 → 0.45
assert!((new_score - 0.45).abs() < 0.01);
// Check the domain was extracted correctly (finance)
let dt = domain_store.get_domain_trust(&agent(1), "finance").await.expect("get");
assert_eq!(dt.assertions_count, 1);
assert_eq!(dt.accuracy_count, 0);
}
#[tokio::test]
async fn test_get_all_domains_for_agent() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let domain_store = GenericDomainTrustStore::new(store);
// Agent 1 has activity in multiple domains
domain_store
.record_domain_outcome(&agent(1), "treats_condition", true, 1000)
.await
.expect("record");
domain_store
.record_domain_outcome(&agent(1), "has_revenue", true, 1001)
.await
.expect("record");
domain_store
.record_domain_outcome(&agent(1), "located_in", true, 1002)
.await
.expect("record");
let domains = domain_store.get_all_domains_for_agent(&agent(1)).await.expect("get");
assert_eq!(domains.len(), 3);
// Check domains are correct
let domain_names: Vec<&str> = domains.iter().map(|dt| dt.domain.as_str()).collect();
assert!(domain_names.contains(&"medicine"));
assert!(domain_names.contains(&"finance"));
assert!(domain_names.contains(&"geography"));
}
#[tokio::test]
async fn test_get_effective_trust() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let domain_store = GenericDomainTrustStore::new(store);
// Set up: agent has 0.8 domain score in medicine
let mut dt = DomainTrust::new(agent(1), "medicine".to_string(), 1000);
dt.score = 0.8;
domain_store.put_domain_trust(&dt).await.expect("put");
// Effective trust with 0.6 eigentrust
// factor = 0.5 + (0.8 * 0.5) = 0.9
// effective = 0.6 * 0.9 = 0.54
let effective = domain_store
.get_effective_trust(&agent(1), "treats_condition", 0.6)
.await
.expect("get");
assert!((effective - 0.54).abs() < 0.01);
}
#[tokio::test]
async fn test_get_effective_trust_default_domain() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let domain_store = GenericDomainTrustStore::new(store);
// No domain trust set (default 0.5)
// factor = 0.5 + (0.5 * 0.5) = 0.75
// effective = 0.6 * 0.75 = 0.45
let effective = domain_store
.get_effective_trust(&agent(1), "treats_condition", 0.6)
.await
.expect("get");
assert!((effective - 0.45).abs() < 0.01);
}
#[tokio::test]
async fn test_domain_expertise_affects_resolution() {
// Scenario: Two agents with same eigentrust, different domain expertise
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let domain_store = GenericDomainTrustStore::new(store);
// Agent 1: Expert in medicine (score 0.9)
let mut dt1 = DomainTrust::new(agent(1), "medicine".to_string(), 1000);
dt1.score = 0.9;
domain_store.put_domain_trust(&dt1).await.expect("put");
// Agent 2: Novice in medicine (score 0.3)
let mut dt2 = DomainTrust::new(agent(2), "medicine".to_string(), 1000);
dt2.score = 0.3;
domain_store.put_domain_trust(&dt2).await.expect("put");
// Both have same global eigentrust (0.7)
let effective1 = domain_store
.get_effective_trust(&agent(1), "treats_condition", 0.7)
.await
.expect("get");
let effective2 = domain_store
.get_effective_trust(&agent(2), "treats_condition", 0.7)
.await
.expect("get");
// Agent 1 (expert) should have significantly higher effective trust
assert!(effective1 > effective2 * 1.3, "Expert: {}, Novice: {}", effective1, effective2);
}
#[tokio::test]
async fn test_domain_isolation() {
// Scenario: Agent is expert in medicine but not in finance
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let domain_store = GenericDomainTrustStore::new(store);
// Agent is expert in medicine
let mut dt = DomainTrust::new(agent(1), "medicine".to_string(), 1000);
dt.score = 0.95;
domain_store.put_domain_trust(&dt).await.expect("put");
// Agent has poor track record in finance
let mut dt2 = DomainTrust::new(agent(1), "finance".to_string(), 1000);
dt2.score = 0.2;
domain_store.put_domain_trust(&dt2).await.expect("put");
// Effective trust in medicine is high
let effective_med = domain_store
.get_effective_trust(&agent(1), "treats_condition", 0.8)
.await
.expect("get");
// Effective trust in finance is low
let effective_fin =
domain_store.get_effective_trust(&agent(1), "has_revenue", 0.8).await.expect("get");
assert!(effective_med > 0.7, "Medicine effective trust: {}", effective_med);
assert!(effective_fin < 0.5, "Finance effective trust: {}", effective_fin);
}
}
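The store tests above fix the arithmetic:

- `record_outcome` adds 0.03 for an accurate assertion and subtracts 0.05 for an inaccurate one.
- `get_effective_trust` multiplies the EigenTrust score by `domain_factor(score) = 0.5 + 0.5 * score`.

Penalties are larger than rewards. For accuracy rate p, the expected change per assertion is 0.03p - 0.05(1 - p) = 0.08p - 0.05. A domain score therefore trends upward only when p exceeds 0.625, i.e. more than 5 of every 8 assertions are accurate.

The standalone sketch below re-implements these formulas as free functions for illustration; they are not the crate's API. It checks the break-even point and reproduces the test values.

```rust
/// Re-implementation of the `DomainTrust::record_outcome` score update:
/// +0.03 when accurate, -0.05 when not, clamped to [0.0, 1.0].
fn apply_outcome(score: f32, was_accurate: bool) -> f32 {
    let delta = if was_accurate { 0.03 } else { -0.05 };
    (score + delta).clamp(0.0, 1.0)
}

/// Re-implementation of `domain_factor`: maps a score in [0, 1] to [0.5, 1.0].
fn domain_factor(domain_score: f32) -> f32 {
    0.5 + domain_score.clamp(0.0, 1.0) * 0.5
}

/// Re-implementation of the `get_effective_trust` arithmetic.
fn effective_trust(eigentrust_score: f32, domain_score: f32) -> f32 {
    eigentrust_score * domain_factor(domain_score)
}

fn main() {
    // 5 accurate + 3 inaccurate (62.5% accuracy): +0.15 - 0.15, no net change.
    let mut score = 0.5_f32;
    for was_accurate in [true, true, true, true, true, false, false, false] {
        score = apply_outcome(score, was_accurate);
    }
    println!("score after 5/8 accurate: {score:.3}"); // 0.500

    // Same inputs as test_get_effective_trust_default_domain and test_get_effective_trust.
    println!("{:.2}", effective_trust(0.6, 0.5)); // 0.45
    println!("{:.2}", effective_trust(0.6, 0.8)); // 0.54
}
```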

@@ -271,11 +271,7 @@ pub fn hash_subject_key(hash_hex: &str) -> Vec<u8> {
global_key(b"HASH_SUBJECT:", hash_hex.as_bytes()) global_key(b"HASH_SUBJECT:", hash_hex.as_bytes())
} }
// ── Vector Index Persistence ───────────────────────────────────────── // ── Vector/Visual Index Persistence (future KV-backed cursor persistence) ────
//
// These keys are reserved for KV-backed cursor persistence (future phase).
// Currently, PersistentVectorIndex stores version in filename and cursors
// are rebuilt from WAL replay.
/// Vector index metadata key: `\x00VI:meta` /// Vector index metadata key: `\x00VI:meta`
#[allow(dead_code)] #[allow(dead_code)]
@@ -284,23 +280,17 @@ pub fn vi_meta_key() -> Vec<u8> {
} }
/// Vector index hot cursor key: `\x00VI:hot_cursor` /// Vector index hot cursor key: `\x00VI:hot_cursor`
///
/// Stores the WAL offset from which the hot index should replay on restart.
#[allow(dead_code)] #[allow(dead_code)]
pub fn vi_hot_cursor_key() -> Vec<u8> { pub fn vi_hot_cursor_key() -> Vec<u8> {
global_key(b"VI:hot_cursor", b"") global_key(b"VI:hot_cursor", b"")
} }
/// Vector index cold version key: `\x00VI:cold_version` /// Vector index cold version key: `\x00VI:cold_version`
///
/// Stores the version number of the current cold index snapshot.
#[allow(dead_code)] #[allow(dead_code)]
pub fn vi_cold_version_key() -> Vec<u8> { pub fn vi_cold_version_key() -> Vec<u8> {
global_key(b"VI:cold_version", b"") global_key(b"VI:cold_version", b"")
} }
// ── Visual Index Persistence ─────────────────────────────────────────
/// Visual index metadata key: `\x00VH:meta` /// Visual index metadata key: `\x00VH:meta`
#[allow(dead_code)] #[allow(dead_code)]
pub fn vh_meta_key() -> Vec<u8> { pub fn vh_meta_key() -> Vec<u8> {
@@ -330,6 +320,93 @@ pub fn alias_scan_prefix() -> Vec<u8> {
global_key(b"CA:", b"") global_key(b"CA:", b"")
} }
// ── Trust Graph Keys ─────────────────────────────────────────────────
/// Trust edge key: `\x00TG:{from_hex}:{to_hex}`
///
/// Stores a TrustEdge from one agent to another.
pub fn trust_edge_key(from_hex: &str, to_hex: &str) -> Vec<u8> {
let suffix = format!("{}:{}", from_hex, to_hex);
global_key(b"TG:", suffix.as_bytes())
}
/// Trust edge from-prefix: `\x00TG:{from_hex}:`
///
/// Scan all edges where `from_agent` is the source (outgoing edges).
pub fn trust_edge_from_prefix(from_hex: &str) -> Vec<u8> {
let suffix = format!("{}:", from_hex);
global_key(b"TG:", suffix.as_bytes())
}
/// Trust edge reverse key: `\x00TGR:{to_hex}:{from_hex}`
///
/// Reverse index for fast lookup of incoming edges (who trusts this agent).
pub fn trust_edge_reverse_key(to_hex: &str, from_hex: &str) -> Vec<u8> {
let suffix = format!("{}:{}", to_hex, from_hex);
global_key(b"TGR:", suffix.as_bytes())
}
/// Trust edge reverse prefix: `\x00TGR:{to_hex}:`
///
/// Scan all edges where `to_agent` is the target (incoming edges).
pub fn trust_edge_reverse_prefix(to_hex: &str) -> Vec<u8> {
let suffix = format!("{}:", to_hex);
global_key(b"TGR:", suffix.as_bytes())
}
/// Trust graph scan prefix: `\x00TG:`
///
/// Scan all trust edges in the graph.
pub fn trust_graph_scan_prefix() -> Vec<u8> {
global_key(b"TG:", b"")
}
/// EigenTrust state key: `\x00ET:state`
///
/// Stores the computed EigenTrust state (global trust scores).
pub fn eigentrust_state_key() -> Vec<u8> {
global_key(b"ET:state", b"")
}
/// Seed trust key: `\x00ET:seed:{agent_hex}`
///
/// Stores the seed trust value for a pre-trusted agent.
pub fn seed_trust_key(agent_hex: &str) -> Vec<u8> {
global_key(b"ET:seed:", agent_hex.as_bytes())
}
/// Seed trust scan prefix: `\x00ET:seed:`
///
/// Scan all seed trust entries.
pub fn seed_trust_scan_prefix() -> Vec<u8> {
global_key(b"ET:seed:", b"")
}
// ── Domain Trust Keys ────────────────────────────────────────────────
/// Domain trust key: `\x00DT:{agent_hex}:{domain}`
///
/// Stores domain-specific trust for an agent.
pub fn domain_trust_key(agent_hex: &str, domain: &str) -> Vec<u8> {
let suffix = format!("{}:{}", agent_hex, domain);
global_key(b"DT:", suffix.as_bytes())
}
/// Domain trust agent prefix: `\x00DT:{agent_hex}:`
///
/// Scan all domains for a specific agent.
pub fn domain_trust_agent_prefix(agent_hex: &str) -> Vec<u8> {
let suffix = format!("{}:", agent_hex);
global_key(b"DT:", suffix.as_bytes())
}
/// Domain trust scan prefix: `\x00DT:`
///
/// Scan all domain trust entries.
pub fn domain_trust_scan_prefix() -> Vec<u8> {
global_key(b"DT:", b"")
}
// ── Key extraction / parsing ──────────────────────────────────────── // ── Key extraction / parsing ────────────────────────────────────────
/// Extract subject from a `\x00SUBJECTS:{subject}` key. /// Extract subject from a `\x00SUBJECTS:{subject}` key.
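The forward (`TG:`) and reverse (`TGR:`) trust-edge keys above allow either outgoing or incoming edges to be read with one ordered prefix scan, assuming each edge is written under both keys. The sketch below makes two further assumptions:

- `global_key` prepends a `0x00` byte to the tag and suffix, as the `\x00TG:...` doc comments suggest.
- A `BTreeMap` stands in for `KVStore`.

`put_edge` and `scan_prefix` are hypothetical helpers. The trailing `:` in each prefix stops an id from matching a longer id that starts with it. A `TG:` scan never reaches `TGR:` keys, because the two tags differ at their third byte (`:` versus `R`).

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `key_codec::global_key`: 0x00, tag, suffix.
fn global_key(tag: &[u8], suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(1 + tag.len() + suffix.len());
    key.push(0x00);
    key.extend_from_slice(tag);
    key.extend_from_slice(suffix);
    key
}

fn trust_edge_key(from: &str, to: &str) -> Vec<u8> {
    global_key(b"TG:", format!("{}:{}", from, to).as_bytes())
}

fn trust_edge_reverse_key(to: &str, from: &str) -> Vec<u8> {
    global_key(b"TGR:", format!("{}:{}", to, from).as_bytes())
}

fn trust_edge_from_prefix(from: &str) -> Vec<u8> {
    global_key(b"TG:", format!("{}:", from).as_bytes())
}

fn trust_edge_reverse_prefix(to: &str) -> Vec<u8> {
    global_key(b"TGR:", format!("{}:", to).as_bytes())
}

/// Ordered prefix scan returning the key suffix after the prefix (the other agent).
fn scan_prefix(kv: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<String> {
    kv.range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| String::from_utf8_lossy(&k[prefix.len()..]).into_owned())
        .collect()
}

/// Write both index entries for one edge so each direction is a single scan.
fn put_edge(kv: &mut BTreeMap<Vec<u8>, Vec<u8>>, from: &str, to: &str, weight: f32) {
    let value = weight.to_le_bytes().to_vec();
    kv.insert(trust_edge_key(from, to), value.clone());
    kv.insert(trust_edge_reverse_key(to, from), value);
}

fn main() {
    let mut kv = BTreeMap::new();
    // Short ids stand in for 64-character hex-encoded Ed25519 keys.
    put_edge(&mut kv, "aa", "cc", 0.9);
    put_edge(&mut kv, "bb", "cc", 0.4);
    put_edge(&mut kv, "aa", "bb", 0.7);

    // Outgoing edges of "aa" via the forward index.
    println!("{:?}", scan_prefix(&kv, &trust_edge_from_prefix("aa"))); // ["bb", "cc"]
    // Incoming edges of "cc" via the reverse index, without walking the whole graph.
    println!("{:?}", scan_prefix(&kv, &trust_edge_reverse_prefix("cc"))); // ["aa", "bb"]
}
```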

@@ -141,10 +141,16 @@
//! } //! }
//! ``` //! ```
/// Admission control storage for graduated PoW and trust tiers.
pub mod admission_store;
/// CRDT (Conflict-free Replicated Data Type) implementations for distributed StemeDB. /// CRDT (Conflict-free Replicated Data Type) implementations for distributed StemeDB.
pub mod crdt; pub mod crdt;
/// Domain-specific trust tracking for per-domain expertise.
pub mod domain_trust_store;
/// Central key encoding/decoding for subject-prefix range sharding. /// Central key encoding/decoding for subject-prefix range sharding.
pub mod key_codec; pub mod key_codec;
/// EigenTrust trust graph for Sybil-resistant reputation.
pub mod trust_graph_store;
/// Shared checkpoint file format for index persistence. /// Shared checkpoint file format for index persistence.
pub mod checkpoint_format; pub mod checkpoint_format;
@@ -186,8 +192,14 @@ pub mod visual_index;
/// High-velocity vote storage (The Ballot Box). /// High-velocity vote storage (The Ballot Box).
pub mod vote_store; pub mod vote_store;
pub use admission_store::{
AdmissionCheck, AdmissionStatus, AdmissionStatusResult, AdmissionStore, GenericAdmissionStore,
};
pub use alias_store::{AliasStore, GenericAliasStore}; pub use alias_store::{AliasStore, GenericAliasStore};
pub use audit_store::{AuditStore, GenericAuditStore}; pub use audit_store::{AuditStore, GenericAuditStore};
pub use domain_trust_store::{
domain_factor, extract_domain, DomainTrust, DomainTrustStore, GenericDomainTrustStore,
};
pub use error::{Result, StorageError}; pub use error::{Result, StorageError};
pub use escalation_store::{EscalationStore, GenericEscalationStore}; pub use escalation_store::{EscalationStore, GenericEscalationStore};
pub use gold_standard_store::{GenericGoldStandardStore, GoldStandardStore}; pub use gold_standard_store::{GenericGoldStandardStore, GoldStandardStore};
@@ -199,6 +211,10 @@ pub use quota_store::{
}; };
pub use supersession_store::{GenericSupersessionStore, SupersessionStore}; pub use supersession_store::{GenericSupersessionStore, SupersessionStore};
pub use traits::KVStore; pub use traits::KVStore;
pub use trust_graph_store::{
compute_eigentrust_scores, EigenTrustConfig, EigenTrustResult, EigenTrustState,
GenericTrustGraphStore, TrustEdge, TrustGraphStore,
};
pub use trust_pack_store::{GenericTrustPackStore, TrustPackStore}; pub use trust_pack_store::{GenericTrustPackStore, TrustPackStore};
pub use trust_rank_store::{GenericTrustRankStore, TrustRank, TrustRankStore}; pub use trust_rank_store::{GenericTrustRankStore, TrustRank, TrustRankStore};
pub use vector_index::{ pub use vector_index::{
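`compute_eigentrust_scores` is re-exported above and defined in the EigenTrust module that follows. It iterates T = (1-α)C^T T + αP until the L1 change falls below `epsilon`, sending dangling-node mass to the seeds in proportion to P.

The sketch below is a simplified, dense-index re-implementation for illustration only. The `eigentrust` helper and its parameters are hypothetical. It assumes positive edge weights and at least one positive seed weight; the crate instead falls back to a uniform P when no seeds are given.

It demonstrates the Sybil-resistance property on a five-agent graph. A ring that only vouches for itself starts at zero and ends with zero trust.

```rust
/// Simplified EigenTrust power iteration over dense agent indices (illustrative).
/// Update rule: t' = (1 - alpha) * (C^T t + dangling * p) + alpha * p.
fn eigentrust(
    n: usize,
    edges: &[(usize, usize, f32)], // (from, to, weight); weights assumed > 0
    seeds: &[(usize, f32)],        // at least one positive seed weight assumed
    alpha: f32,
    epsilon: f32,
    max_iterations: u32,
) -> Vec<f32> {
    // Total outgoing weight per agent, used to row-normalize C.
    let mut out_sum = vec![0.0_f32; n];
    for &(from, _, w) in edges {
        out_sum[from] += w;
    }
    // Seed vector P, normalized to sum to 1.
    let mut p = vec![0.0_f32; n];
    for &(i, w) in seeds {
        p[i] += w;
    }
    let p_sum: f32 = p.iter().sum();
    for pi in &mut p {
        *pi /= p_sum;
    }
    // Start from the seed distribution.
    let mut t = p.clone();
    for _ in 0..max_iterations {
        // C^T t: each edge passes its normalized share of the source's trust.
        let mut next = vec![0.0_f32; n];
        for &(from, to, w) in edges {
            next[to] += (w / out_sum[from]) * t[from];
        }
        // Agents with no outgoing edges hand their trust to the seeds.
        let dangling: f32 = (0..n).filter(|&j| out_sum[j] == 0.0).map(|j| t[j]).sum();
        for (x, &pi) in next.iter_mut().zip(&p) {
            *x = (1.0 - alpha) * (*x + dangling * pi) + alpha * pi;
        }
        // L1 change between iterations decides convergence.
        let delta: f32 = next.iter().zip(&t).map(|(a, b)| (a - b).abs()).sum();
        t = next;
        if delta < epsilon {
            break;
        }
    }
    // Normalize final scores to sum to 1.
    let total: f32 = t.iter().sum();
    t.iter().map(|x| x / total).collect()
}

fn main() {
    // Agent 0 is the only seed; 0 -> 1 -> 2 -> 0 is an honest cycle.
    // Agents 3 and 4 form a Sybil ring that only vouches for itself.
    let edges: [(usize, usize, f32); 5] =
        [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (3, 4, 1.0), (4, 3, 1.0)];
    let scores = eigentrust(5, &edges, &[(0_usize, 1.0_f32)], 0.1, 1e-6, 200);
    for (agent, score) in scores.iter().enumerate() {
        println!("agent {agent}: {score:.3}");
    }
    // Roughly 0.369, 0.332, 0.299 for agents 0..2 and 0.000 for the ring.
}
```

With α = 0.1 and a single seed, the honest cycle settles where t0 = 0.9·t2 + 0.1, t1 = 0.9·t0, and t2 = 0.9·t1. The ring members receive trust only from each other, so their scores never leave zero.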

@@ -0,0 +1,487 @@
//! EigenTrust power iteration algorithm.
//!
//! Implements the EigenTrust algorithm for computing global trust scores
//! in a web of trust. The algorithm is Sybil-resistant because trust
//! only flows from pre-trusted seed agents.
//!
//! # Algorithm
//!
//! The EigenTrust algorithm computes global trust scores using power iteration:
//!
//! ```text
//! T = (1-α)C^T * T + α * P
//! ```
//!
//! Where:
//! - T: Trust vector (what we're computing)
//! - C: Row-normalized adjacency matrix (trust edges)
//! - P: Seed trust vector (pre-trusted agents)
//! - α: Damping factor (probability of jumping to a seed)
//!
//! # Sybil Resistance
//!
//! The key insight is that isolated rings of agents (not connected to seeds)
//! receive NO propagated trust. Only seed-connected agents accumulate meaningful trust.
use super::model::{AgentScore, EigenTrustConfig, EigenTrustResult, EigenTrustState, TrustEdge};
use std::collections::HashMap;
use tracing::{debug, instrument};
/// Compute EigenTrust scores using power iteration.
///
/// # Arguments
/// * `edges` - All trust edges in the graph
/// * `seed_trust` - Pre-trusted agents and their seed weights (will be normalized)
/// * `config` - Algorithm configuration
/// * `timestamp` - Unix timestamp for the result
///
/// # Returns
/// `EigenTrustResult` containing the computed state and convergence status.
///
/// # Sybil Resistance
///
/// Agents not connected to the seed trust network receive near-zero trust.
/// This is achieved through the α * P term in the power iteration:
/// - α = 0.1 means 10% of trust comes directly from seeds each iteration
/// - Isolated rings have no incoming path from the seeds, so they start at zero and never receive any trust
///
/// # Performance
///
/// - Time: O(iterations * edges)
/// - Space: O(agents)
/// - Typical convergence: 10-15 iterations
#[instrument(skip(edges, seed_trust), fields(edge_count = edges.len(), seed_count = seed_trust.len()))]
pub fn compute_eigentrust_scores(
edges: &[TrustEdge],
seed_trust: &[([u8; 32], f32)],
config: &EigenTrustConfig,
timestamp: u64,
) -> EigenTrustResult {
// Handle empty graph
if edges.is_empty() && seed_trust.is_empty() {
debug!("Empty graph, returning empty state");
return EigenTrustResult { state: EigenTrustState::empty(timestamp), converged: true };
}
// Build agent → index mapping
let mut agent_to_idx: HashMap<[u8; 32], usize> = HashMap::new();
let mut idx_to_agent: Vec<[u8; 32]> = Vec::new();
for edge in edges {
if !agent_to_idx.contains_key(&edge.from_agent) {
agent_to_idx.insert(edge.from_agent, idx_to_agent.len());
idx_to_agent.push(edge.from_agent);
}
if !agent_to_idx.contains_key(&edge.to_agent) {
agent_to_idx.insert(edge.to_agent, idx_to_agent.len());
idx_to_agent.push(edge.to_agent);
}
}
// Add seed agents that might not have edges
for (agent, _) in seed_trust {
if !agent_to_idx.contains_key(agent) {
agent_to_idx.insert(*agent, idx_to_agent.len());
idx_to_agent.push(*agent);
}
}
let n = idx_to_agent.len();
if n == 0 {
debug!("No agents in graph, returning empty state");
return EigenTrustResult { state: EigenTrustState::empty(timestamp), converged: true };
}
debug!(agent_count = n, "Building adjacency matrix");
// Build row-normalized adjacency matrix C
// C[i][j] = normalized weight from i to j
// We store as Vec<Vec<(target_idx, weight)>> for sparse representation
let mut outgoing: Vec<Vec<(usize, f32)>> = vec![Vec::new(); n];
let mut out_sum: Vec<f32> = vec![0.0; n];
for edge in edges {
if !edge.is_valid() {
continue;
}
if let (Some(&from_idx), Some(&to_idx)) =
(agent_to_idx.get(&edge.from_agent), agent_to_idx.get(&edge.to_agent))
{
outgoing[from_idx].push((to_idx, edge.weight));
out_sum[from_idx] += edge.weight;
}
}
// Normalize outgoing weights (row normalization)
for (i, edges_list) in outgoing.iter_mut().enumerate() {
let sum = out_sum[i];
if sum > 0.0 {
for (_, weight) in edges_list.iter_mut() {
*weight /= sum;
}
}
}
// Build seed vector P (normalized)
let mut p: Vec<f32> = vec![0.0; n];
let mut p_sum = 0.0_f32;
for (agent, weight) in seed_trust {
if let Some(&idx) = agent_to_idx.get(agent) {
p[idx] = *weight;
p_sum += *weight;
}
}
// Normalize P
if p_sum > 0.0 {
for pi in &mut p {
*pi /= p_sum;
}
} else {
// No seed trust: fall back to a uniform distribution (not recommended)
tracing::warn!("No seed trust provided, using uniform distribution");
for pi in &mut p {
*pi = 1.0 / n as f32;
}
}
// Initialize trust vector T = P
let mut t: Vec<f32> = p.clone();
// Power iteration: T = (1-α) * (C^T * T + d * P) + α * P, where d is the trust mass held by dangling nodes
let alpha = config.alpha;
let one_minus_alpha = 1.0 - alpha;
let mut iterations = 0_u32;
let mut convergence_delta = f32::MAX;
for iter in 0..config.max_iterations {
iterations = iter + 1;
// Compute new_t = (1-α) * (C^T * t + dangling_mass * p) + α * p
let mut new_t: Vec<f32> = vec![0.0; n];
// C^T * t: for each node i, collect trust from nodes that trust i
// This is equivalent to: for each node j with outgoing edge to i,
// add normalized_weight * t[j] to new_t[i]
for (j, edges_list) in outgoing.iter().enumerate() {
let t_j = t[j];
for &(i, weight) in edges_list {
new_t[i] += weight * t_j;
}
}
// Handle dangling nodes (nodes with no outgoing edges):
// redistribute their trust according to the seed vector P (not uniformly),
// which keeps the total trust mass at 1.0
let mut dangling_mass = 0.0_f32;
for (j, sum) in out_sum.iter().enumerate() {
if *sum == 0.0 {
dangling_mass += t[j];
}
}
if dangling_mass > 0.0 {
for (i, pi) in p.iter().enumerate() {
new_t[i] += dangling_mass * pi;
}
}
// Apply (1-α) factor and add α * p
for i in 0..n {
new_t[i] = one_minus_alpha * new_t[i] + alpha * p[i];
}
// Compute L1 norm of change
convergence_delta = 0.0;
for i in 0..n {
convergence_delta += (new_t[i] - t[i]).abs();
}
debug!(iteration = iterations, delta = convergence_delta, "Power iteration step");
// Update t
t = new_t;
// Check convergence
if convergence_delta < config.epsilon {
debug!(iterations, delta = convergence_delta, "Converged");
break;
}
}
// Normalize final scores to sum to 1.0
let t_sum: f32 = t.iter().sum();
if t_sum > 0.0 {
for ti in &mut t {
*ti /= t_sum;
}
}
// Build result
let scores: Vec<AgentScore> = idx_to_agent
.into_iter()
.zip(t)
.map(|(agent, score)| AgentScore::new(agent, score))
.collect();
let converged = convergence_delta < config.epsilon;
debug!(
iterations,
converged,
delta = convergence_delta,
agents = scores.len(),
"EigenTrust computation complete"
);
EigenTrustResult {
state: EigenTrustState { scores, computed_at: timestamp, iterations, convergence_delta },
converged,
}
}
#[cfg(test)]
mod tests {
use super::*;
fn agent(id: u8) -> [u8; 32] {
let mut arr = [0u8; 32];
arr[0] = id;
arr
}
#[test]
fn test_empty_graph() {
let result = compute_eigentrust_scores(&[], &[], &EigenTrustConfig::default(), 1000);
assert!(result.converged);
assert!(result.state.scores.is_empty());
}
#[test]
fn test_single_seed_no_edges() {
let seeds = vec![(agent(1), 1.0)];
let result = compute_eigentrust_scores(&[], &seeds, &EigenTrustConfig::default(), 1000);
assert!(result.converged);
assert_eq!(result.state.scores.len(), 1);
// Single seed gets all the trust
let score = result.state.get_score(&agent(1));
assert!((score - 1.0).abs() < 0.01);
}
#[test]
fn test_simple_chain() {
// Seed → A → B
// Seed trusts A, A trusts B
let seed = agent(0);
let a = agent(1);
let b = agent(2);
let edges =
vec![TrustEdge::new(seed, a, 1.0, 1000, None), TrustEdge::new(a, b, 1.0, 1000, None)];
let seeds = vec![(seed, 1.0)];
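// B's dangling mass returns to the seed, so the seed → A → B cycle shrinks the L1
// change by only ~0.9 per step; allow more than the default 20 iterations.
let config = EigenTrustConfig::new(0.1, 200, 1e-6);
let result = compute_eigentrust_scores(&edges, &seeds, &config, 1000);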
assert!(result.converged);
// Seed should have highest trust (directly seeded)
// A should have next highest (trusted by seed)
// B should have lowest (trusted by A)
let seed_score = result.state.get_score(&seed);
let a_score = result.state.get_score(&a);
let b_score = result.state.get_score(&b);
assert!(seed_score > a_score);
assert!(a_score > b_score);
assert!(b_score > 0.0);
}
#[test]
fn test_isolated_ring_gets_low_trust() {
// This is the key Sybil resistance test
//
// Network:
// - Seed agent S with seed trust
// - Isolated ring: A → B → C → A (no connection to S)
//
// Expected: Ring agents get near-zero trust
let s = agent(0);
let a = agent(1);
let b = agent(2);
let c = agent(3);
// S has no edges (just seed trust)
// A → B → C → A forms isolated ring
let edges = vec![
TrustEdge::new(a, b, 1.0, 1000, None),
TrustEdge::new(b, c, 1.0, 1000, None),
TrustEdge::new(c, a, 1.0, 1000, None),
];
let seeds = vec![(s, 1.0)];
let result = compute_eigentrust_scores(&edges, &seeds, &EigenTrustConfig::default(), 1000);
assert!(result.converged);
// Seed retains most trust (since it's the only pre-trusted agent)
let s_score = result.state.get_score(&s);
assert!(s_score > 0.9, "Seed score should be high: {}", s_score);
// Ring agents should have near-zero trust (not connected to seed)
let a_score = result.state.get_score(&a);
let b_score = result.state.get_score(&b);
let c_score = result.state.get_score(&c);
assert!(a_score < 0.05, "Isolated agent A should have low trust: {}", a_score);
assert!(b_score < 0.05, "Isolated agent B should have low trust: {}", b_score);
assert!(c_score < 0.05, "Isolated agent C should have low trust: {}", c_score);
}
#[test]
fn test_ring_connected_to_seed_gets_trust() {
// Network:
// - Seed S → A (S trusts A)
// - A → B → C → A (ring connected to seed via A)
//
// Expected: Ring agents get trust because connected to seed
let s = agent(0);
let a = agent(1);
let b = agent(2);
let c = agent(3);
let edges = vec![
TrustEdge::new(s, a, 1.0, 1000, None), // Seed trusts A
TrustEdge::new(a, b, 1.0, 1000, None),
TrustEdge::new(b, c, 1.0, 1000, None),
TrustEdge::new(c, a, 1.0, 1000, None),
];
let seeds = vec![(s, 1.0)];
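// The A → B → C ring shrinks the L1 change by only ~0.9 per step, so allow more
// than the default 20 iterations.
let config = EigenTrustConfig::new(0.1, 200, 1e-6);
let result = compute_eigentrust_scores(&edges, &seeds, &config, 1000);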
assert!(result.converged);
// All agents should have non-trivial trust
let s_score = result.state.get_score(&s);
let a_score = result.state.get_score(&a);
let b_score = result.state.get_score(&b);
let c_score = result.state.get_score(&c);
assert!(s_score > 0.0);
assert!(a_score > 0.1, "Agent A connected to seed should have trust: {}", a_score);
assert!(b_score > 0.05, "Agent B should have some trust: {}", b_score);
assert!(c_score > 0.05, "Agent C should have some trust: {}", c_score);
}
#[test]
fn test_multiple_seeds() {
// Two seeds, each trusts one agent
let s1 = agent(0);
let s2 = agent(1);
let a = agent(2);
let b = agent(3);
let edges =
vec![TrustEdge::new(s1, a, 1.0, 1000, None), TrustEdge::new(s2, b, 1.0, 1000, None)];
let seeds = vec![(s1, 1.0), (s2, 1.0)];
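// A and B are dangling, so trust oscillates between them and the seeds and the L1
// change shrinks by only ~0.9 per step; allow more than the default 20 iterations.
let config = EigenTrustConfig::new(0.1, 200, 1e-6);
let result = compute_eigentrust_scores(&edges, &seeds, &config, 1000);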
assert!(result.converged);
// Both seeds and their trusted agents should have trust
let s1_score = result.state.get_score(&s1);
let s2_score = result.state.get_score(&s2);
let a_score = result.state.get_score(&a);
let b_score = result.state.get_score(&b);
// Seeds should have roughly equal trust (equal seed weight)
assert!((s1_score - s2_score).abs() < 0.1);
// Trusted agents should have roughly equal trust
assert!((a_score - b_score).abs() < 0.1);
}
#[test]
fn test_weighted_edges() {
// Seed trusts A strongly, B weakly
let s = agent(0);
let a = agent(1);
let b = agent(2);
let edges = vec![
TrustEdge::new(s, a, 0.9, 1000, None), // Strong trust
TrustEdge::new(s, b, 0.1, 1000, None), // Weak trust
];
let seeds = vec![(s, 1.0)];
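// A and B are dangling, so trust oscillates between S and them and the L1 change
// shrinks by only ~0.9 per step; allow more than the default 20 iterations.
let config = EigenTrustConfig::new(0.1, 200, 1e-6);
let result = compute_eigentrust_scores(&edges, &seeds, &config, 1000);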
assert!(result.converged);
let a_score = result.state.get_score(&a);
let b_score = result.state.get_score(&b);
// A should have significantly more trust than B
assert!(a_score > b_score * 2.0, "A: {}, B: {}", a_score, b_score);
}
#[test]
fn test_invalid_edges_ignored() {
// Self-trust and zero-weight edges should be ignored
let s = agent(0);
let a = agent(1);
let edges = vec![
TrustEdge::new(s, a, 1.0, 1000, None), // Valid
TrustEdge::new(a, a, 1.0, 1000, None), // Invalid: self-trust
TrustEdge::new(s, a, 0.0, 1000, None), // Invalid: zero weight
];
let seeds = vec![(s, 1.0)];
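// A is dangling (its other edges are invalid), so trust oscillates between S and A;
// allow more than the default 20 iterations.
let config = EigenTrustConfig::new(0.1, 200, 1e-6);
let result = compute_eigentrust_scores(&edges, &seeds, &config, 1000);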
assert!(result.converged);
// Should not crash or produce weird results
assert!(result.state.scores.len() >= 2);
}
#[test]
fn test_convergence_within_max_iterations() {
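// The leaves are dangling, so trust oscillates between the seed and the leaves and
// the L1 change shrinks by only (1 - alpha) = 0.9 per step. That needs well over
// the default 20 iterations, so give the star a larger budget.
let seed = agent(0);
let mut edges = Vec::new();
// Create a star topology: seed trusts 10 agents
for i in 1..=10 {
edges.push(TrustEdge::new(seed, agent(i), 1.0, 1000, None));
}
let seeds = vec![(seed, 1.0)];
let config = EigenTrustConfig::new(0.1, 200, 1e-6);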
let result = compute_eigentrust_scores(&edges, &seeds, &config, 1000);
assert!(result.converged, "Should converge within {} iterations", config.max_iterations);
assert!(result.state.iterations < config.max_iterations);
}
#[test]
fn test_scores_sum_to_one() {
let s = agent(0);
let a = agent(1);
let b = agent(2);
let edges =
vec![TrustEdge::new(s, a, 1.0, 1000, None), TrustEdge::new(a, b, 1.0, 1000, None)];
let seeds = vec![(s, 1.0)];
let result = compute_eigentrust_scores(&edges, &seeds, &EigenTrustConfig::default(), 1000);
let sum: f32 = result.state.scores.iter().map(|entry| entry.score).sum();
assert!((sum - 1.0).abs() < 0.01, "Scores should sum to 1.0, got {}", sum);
}
}
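
The power-iteration loop in `compute_eigentrust_scores` can be reproduced in a few lines to see how fast it converges. The sketch below is a hypothetical, std-only re-implementation (`power_iterate` is not part of this crate, and it uses `f64` where the crate uses `f32`) of the same update: normalize each row, send dangling mass back along the seed vector P, then mix (1 - α) graph trust with α seed trust. It returns the final vector together with the L1 change recorded at each step.

```rust
/// Dense sketch of the damped power iteration:
///   t' = (1 - alpha) * (C^T t + dangling_mass * p) + alpha * p
/// `out[j]` lists (target, raw_weight) edges leaving node j; rows are
/// normalized on the fly, and nodes with no outgoing weight are "dangling".
/// Returns the final vector and the L1 change recorded at every step.
fn power_iterate(
    out: &[Vec<(usize, f64)>],
    p: &[f64],
    alpha: f64,
    max_iter: u32,
    eps: f64,
) -> (Vec<f64>, Vec<f64>) {
    let n = p.len();
    let mut t = p.to_vec(); // start from the seed vector, like T = P
    let mut deltas = Vec::new();
    for _ in 0..max_iter {
        let mut next = vec![0.0; n];
        let mut dangling = 0.0;
        for (j, edges) in out.iter().enumerate() {
            let sum: f64 = edges.iter().map(|&(_, w)| w).sum();
            if sum == 0.0 {
                dangling += t[j]; // mass with nowhere to go
                continue;
            }
            for &(i, w) in edges {
                next[i] += (w / sum) * t[j];
            }
        }
        // Dangling mass follows p; then damp and mix in the seeds.
        for i in 0..n {
            next[i] = (1.0 - alpha) * (next[i] + dangling * p[i]) + alpha * p[i];
        }
        let delta: f64 = next.iter().zip(&t).map(|(a, b)| (a - b).abs()).sum();
        t = next;
        deltas.push(delta);
        if delta < eps {
            break;
        }
    }
    (t, deltas)
}
```

On a two-node graph where the seed trusts one agent that trusts nobody, the L1 change shrinks by exactly 0.9 per step at α = 0.1. Under these assumptions, 20 iterations still leave a change of roughly 0.24, and reaching ε = 1e-6 takes 138 steps.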


@@ -0,0 +1,219 @@
//! Specialized storage for EigenTrust trust graph.
//!
//! The TrustGraphStore provides a web of trust where agents can express
//! trust in other agents. This trust graph is used to compute global
//! EigenTrust scores that are Sybil-resistant.
//!
//! # Storage Layout
//!
//! | Key Pattern | Value | Purpose |
//! |-------------|-------|---------|
//! | `\x00TG:{from}:{to}` | Serialized TrustEdge | Trust edge (forward) |
//! | `\x00TGR:{to}:{from}` | Serialized TrustEdge | Trust edge (reverse index) |
//! | `\x00ET:state` | Serialized EigenTrustState | Computed global scores |
//! | `\x00ET:seed:{agent}` | f32 bytes | Seed trust for pre-trusted agents |
//!
//! # Sybil Resistance
//!
//! The EigenTrust algorithm ensures that isolated rings of colluding agents
//! cannot accumulate trust. Only agents connected to pre-trusted seeds
//! can gain meaningful reputation.
mod eigentrust;
mod model;
mod store_impl;
#[cfg(test)]
mod store_tests;
pub use eigentrust::compute_eigentrust_scores;
pub use model::*;
pub use store_impl::*;
use crate::error::Result;
use async_trait::async_trait;
use std::sync::Arc;
/// Specialized storage trait for TrustGraph operations.
///
/// This trait provides trust graph management for the EigenTrust system,
/// enabling Sybil-resistant reputation across the network.
///
/// # Example
///
/// ```ignore
/// let trust_store = GenericTrustGraphStore::new(kv_store);
///
/// // Add trust relationship
/// let edge = TrustEdge::new(agent_a, agent_b, 0.8, timestamp, None);
/// trust_store.add_trust_edge(&edge).await?;
///
/// // Compute EigenTrust scores
/// let state = trust_store.compute_eigentrust(&EigenTrustConfig::default()).await?;
///
/// // Query score
/// let score = trust_store.get_eigentrust_score(&agent_b).await?;
/// ```
#[async_trait]
pub trait TrustGraphStore: Send + Sync {
// ── Edge CRUD ────────────────────────────────────────────────────────
/// Add or update a trust edge in the graph.
///
/// This creates both the forward index (`TG:{from}:{to}`) and
/// reverse index (`TGR:{to}:{from}`) for efficient bidirectional queries.
///
/// # Arguments
/// * `edge` - The trust edge to add
async fn add_trust_edge(&self, edge: &TrustEdge) -> Result<()>;
/// Remove a trust edge from the graph.
///
/// # Arguments
/// * `from` - Agent granting trust
/// * `to` - Agent receiving trust
///
/// # Returns
/// `true` if the edge existed and was removed, `false` if not found
async fn remove_trust_edge(&self, from: &[u8; 32], to: &[u8; 32]) -> Result<bool>;
/// Get a specific trust edge.
///
/// # Arguments
/// * `from` - Agent granting trust
/// * `to` - Agent receiving trust
///
/// # Returns
/// The trust edge if it exists
async fn get_trust_edge(&self, from: &[u8; 32], to: &[u8; 32]) -> Result<Option<TrustEdge>>;
// ── Graph traversal ──────────────────────────────────────────────────
/// Get all outgoing trust edges from an agent.
///
/// Returns (to_agent, weight) pairs for agents that this agent trusts.
///
/// # Arguments
/// * `from` - Agent granting trust
async fn get_trusts(&self, from: &[u8; 32]) -> Result<Vec<([u8; 32], f32)>>;
/// Get all incoming trust edges to an agent.
///
/// Returns (from_agent, weight) pairs for agents that trust this agent.
/// Uses the reverse index for efficient queries.
///
/// # Arguments
/// * `to` - Agent receiving trust
async fn get_trusted_by(&self, to: &[u8; 32]) -> Result<Vec<([u8; 32], f32)>>;
/// Get all edges in the trust graph.
///
/// Used by the EigenTrust computation. May be expensive for large graphs.
async fn get_all_edges(&self) -> Result<Vec<TrustEdge>>;
// ── Seed trust ───────────────────────────────────────────────────────
/// Set seed trust for a pre-trusted agent.
///
/// Seed trust defines the "P" vector in EigenTrust. These are agents
/// that are pre-trusted (e.g., verified organizations, system admins).
///
/// # Arguments
/// * `agent` - Agent to pre-trust
/// * `trust` - Seed trust weight (0.0 to 1.0)
async fn set_seed_trust(&self, agent: &[u8; 32], trust: f32) -> Result<()>;
/// Get seed trust for an agent.
///
/// Returns 0.0 if the agent has no seed trust.
async fn get_seed_trust(&self, agent: &[u8; 32]) -> Result<f32>;
/// Get all seed trust entries.
///
/// Used by the EigenTrust computation to build the P vector.
async fn get_all_seed_trust(&self) -> Result<Vec<([u8; 32], f32)>>;
/// Remove seed trust for an agent.
///
/// # Returns
/// `true` if the agent had seed trust and it was removed, `false` if not found
async fn remove_seed_trust(&self, agent: &[u8; 32]) -> Result<bool>;
// ── EigenTrust computation ───────────────────────────────────────────
/// Compute EigenTrust scores for all agents in the graph.
///
/// This runs the power iteration algorithm and stores the result.
/// Should be called periodically (e.g., daily) to update global scores.
///
/// # Arguments
/// * `config` - Algorithm configuration
///
/// # Returns
/// The computed state
async fn compute_eigentrust(&self, config: &EigenTrustConfig) -> Result<EigenTrustState>;
/// Get the current EigenTrust state (previously computed scores).
///
/// Returns `None` if EigenTrust has never been computed.
async fn get_eigentrust_state(&self) -> Result<Option<EigenTrustState>>;
/// Get the EigenTrust score for a specific agent.
///
/// Returns 0.0 if:
/// - The agent is not in the graph
/// - EigenTrust has never been computed
async fn get_eigentrust_score(&self, agent: &[u8; 32]) -> Result<f32>;
}
// Blanket implementation for Arc<T> where T: TrustGraphStore
#[async_trait]
impl<T: TrustGraphStore + ?Sized> TrustGraphStore for Arc<T> {
async fn add_trust_edge(&self, edge: &TrustEdge) -> Result<()> {
(**self).add_trust_edge(edge).await
}
async fn remove_trust_edge(&self, from: &[u8; 32], to: &[u8; 32]) -> Result<bool> {
(**self).remove_trust_edge(from, to).await
}
async fn get_trust_edge(&self, from: &[u8; 32], to: &[u8; 32]) -> Result<Option<TrustEdge>> {
(**self).get_trust_edge(from, to).await
}
async fn get_trusts(&self, from: &[u8; 32]) -> Result<Vec<([u8; 32], f32)>> {
(**self).get_trusts(from).await
}
async fn get_trusted_by(&self, to: &[u8; 32]) -> Result<Vec<([u8; 32], f32)>> {
(**self).get_trusted_by(to).await
}
async fn get_all_edges(&self) -> Result<Vec<TrustEdge>> {
(**self).get_all_edges().await
}
async fn set_seed_trust(&self, agent: &[u8; 32], trust: f32) -> Result<()> {
(**self).set_seed_trust(agent, trust).await
}
async fn get_seed_trust(&self, agent: &[u8; 32]) -> Result<f32> {
(**self).get_seed_trust(agent).await
}
async fn get_all_seed_trust(&self) -> Result<Vec<([u8; 32], f32)>> {
(**self).get_all_seed_trust().await
}
async fn remove_seed_trust(&self, agent: &[u8; 32]) -> Result<bool> {
(**self).remove_seed_trust(agent).await
}
async fn compute_eigentrust(&self, config: &EigenTrustConfig) -> Result<EigenTrustState> {
(**self).compute_eigentrust(config).await
}
async fn get_eigentrust_state(&self) -> Result<Option<EigenTrustState>> {
(**self).get_eigentrust_state().await
}
async fn get_eigentrust_score(&self, agent: &[u8; 32]) -> Result<f32> {
(**self).get_eigentrust_score(agent).await
}
}
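
The forward/reverse key layout in the module docs is what lets both `get_trusts` and `get_trusted_by` be answered by a single prefix scan. As a sketch under stated assumptions, here is a toy in-memory model: `ToyTrustGraph` and its methods are hypothetical, a `BTreeMap` stands in for the crate's `KVStore`, the stored value is just the `f32` weight instead of a serialized `TrustEdge`, and `to_hex` replaces the `hex` crate.

```rust
use std::collections::BTreeMap;

/// Lowercase hex encoding (stand-in for the `hex` crate).
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Last `:`-separated segment of a key, i.e. the other agent's hex ID.
fn last_segment(key: &str) -> String {
    key.rsplit(':').next().unwrap_or_default().to_string()
}

/// Hypothetical in-memory model of the forward/reverse trust-edge layout.
#[derive(Default)]
struct ToyTrustGraph {
    kv: BTreeMap<Vec<u8>, f32>,
}

impl ToyTrustGraph {
    /// Write the same weight under the forward and the reverse key.
    fn add_edge(&mut self, from: &[u8; 32], to: &[u8; 32], weight: f32) {
        let (f, t) = (to_hex(from), to_hex(to));
        self.kv.insert(format!("\x00TG:{f}:{t}").into_bytes(), weight);
        self.kv.insert(format!("\x00TGR:{t}:{f}").into_bytes(), weight);
    }

    /// Ordered prefix scan: keys sharing a prefix are contiguous in a BTreeMap.
    fn scan(&self, prefix: &str) -> Vec<(String, f32)> {
        let p = prefix.as_bytes().to_vec();
        self.kv
            .range(p.clone()..)
            .take_while(|(k, _)| k.starts_with(&p))
            .map(|(k, w)| (String::from_utf8_lossy(k).into_owned(), *w))
            .collect()
    }

    /// Outgoing edges: one scan over `\x00TG:{from}:`.
    fn trusts(&self, from: &[u8; 32]) -> Vec<(String, f32)> {
        let rows = self.scan(&format!("\x00TG:{}:", to_hex(from)));
        rows.into_iter().map(|(k, w)| (last_segment(&k), w)).collect()
    }

    /// Incoming edges: one scan over the reverse index `\x00TGR:{to}:`.
    fn trusted_by(&self, to: &[u8; 32]) -> Vec<(String, f32)> {
        let rows = self.scan(&format!("\x00TGR:{}:", to_hex(to)));
        rows.into_iter().map(|(k, w)| (last_segment(&k), w)).collect()
    }
}
```

Because `\x00TGR:` differs from `\x00TG:` at its fourth byte, a scan over `\x00TG:` sees only forward keys, while a scan over the bare `\x00TG` sees both indexes. That is consistent with the forward-key filter in `get_all_edges`.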


@@ -0,0 +1,244 @@
//! TrustGraphStore data models for EigenTrust computation.
//!
//! This module defines the core data structures for the trust graph:
//! - `TrustEdge`: A directed trust relationship between agents
//! - `EigenTrustState`: The computed global trust scores
//! - `EigenTrustConfig`: Configuration for power iteration
/// Default alpha (damping factor) for EigenTrust.
/// 0.1 means 90% of trust flows through the graph, 10% from seeds.
pub const DEFAULT_ALPHA: f32 = 0.1;
/// Default maximum iterations for power iteration convergence.
pub const DEFAULT_MAX_ITERATIONS: u32 = 20;
/// Default convergence threshold (epsilon).
pub const DEFAULT_EPSILON: f32 = 1e-6;
/// A directed trust edge from one agent to another.
///
/// # Invariants
///
/// - `weight` is in range [0.0, 1.0]
/// - `from_agent` and `to_agent` are Ed25519 public keys
/// - An agent cannot trust themselves (from_agent != to_agent)
#[derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize, Debug, Clone, PartialEq)]
#[archive(check_bytes)]
pub struct TrustEdge {
/// Agent granting trust (Ed25519 public key).
pub from_agent: [u8; 32],
/// Agent receiving trust (Ed25519 public key).
pub to_agent: [u8; 32],
/// Trust weight (0.0 = no trust, 1.0 = full trust).
pub weight: f32,
/// Unix timestamp when this edge was created.
pub created_at: u64,
/// Optional human-readable reason for the trust relationship.
pub reason: Option<String>,
}
impl TrustEdge {
/// Create a new trust edge.
///
/// # Arguments
/// * `from_agent` - Agent granting trust
/// * `to_agent` - Agent receiving trust
/// * `weight` - Trust weight (clamped to [0.0, 1.0])
/// * `created_at` - Unix timestamp
/// * `reason` - Optional reason for trust
pub fn new(
from_agent: [u8; 32],
to_agent: [u8; 32],
weight: f32,
created_at: u64,
reason: Option<String>,
) -> Self {
Self { from_agent, to_agent, weight: weight.clamp(0.0, 1.0), created_at, reason }
}
/// Check if this edge represents valid trust (non-zero weight, different agents).
pub fn is_valid(&self) -> bool {
self.weight > 0.0 && self.from_agent != self.to_agent
}
}
/// Configuration for EigenTrust power iteration.
///
/// # Parameters
///
/// - `alpha`: Damping factor. Controls how much trust flows from seeds vs. graph.
/// - α = 0.0: All trust from graph (vulnerable to Sybil attacks)
/// - α = 1.0: All trust from seeds (no graph propagation)
/// - α = 0.1 (default): 90% graph, 10% seeds (balanced)
///
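/// - `max_iterations`: Safety limit for convergence.
/// - Each iteration shrinks the L1 change by at least a factor of (1 - α);
/// graphs with cyclic or bipartite structure (e.g., a seed whose trustees
/// have no outgoing edges) converge at close to that rate, needing roughly
/// 140 iterations for α = 0.1 and ε = 1e-6
/// - The default of 20 caps cost; runs that stop there report `converged = false`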
///
/// - `epsilon`: Convergence threshold (L1 norm of change).
/// - 1e-6 is sufficient for most applications
#[derive(Debug, Clone, Copy)]
pub struct EigenTrustConfig {
/// Damping factor: probability of jumping to a seed (default 0.1).
pub alpha: f32,
/// Maximum iterations before stopping (default 20).
pub max_iterations: u32,
/// Convergence threshold (default 1e-6).
pub epsilon: f32,
}
impl Default for EigenTrustConfig {
fn default() -> Self {
Self {
alpha: DEFAULT_ALPHA,
max_iterations: DEFAULT_MAX_ITERATIONS,
epsilon: DEFAULT_EPSILON,
}
}
}
impl EigenTrustConfig {
/// Create a new config with custom parameters.
pub fn new(alpha: f32, max_iterations: u32, epsilon: f32) -> Self {
Self { alpha: alpha.clamp(0.0, 1.0), max_iterations, epsilon }
}
}
/// An agent-score pair for EigenTrust state serialization.
#[derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize, Debug, Clone, PartialEq)]
#[archive(check_bytes)]
pub struct AgentScore {
/// Agent's Ed25519 public key.
pub agent: [u8; 32],
/// Global trust score (normalized to sum to 1.0 across all agents).
pub score: f32,
}
impl AgentScore {
/// Create a new agent-score pair.
pub fn new(agent: [u8; 32], score: f32) -> Self {
Self { agent, score }
}
}
/// The computed EigenTrust state after power iteration.
///
/// This represents a snapshot of global trust scores for all agents
/// in the trust graph at a specific point in time.
#[derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize, Debug, Clone)]
#[archive(check_bytes)]
pub struct EigenTrustState {
/// Agent ID → global trust score pairs.
/// Scores are normalized to sum to 1.0.
pub scores: Vec<AgentScore>,
/// Unix timestamp when this state was computed.
pub computed_at: u64,
/// Number of power iterations performed (reaches `max_iterations` when the run did not converge).
pub iterations: u32,
/// Final L1 norm of change (convergence delta).
pub convergence_delta: f32,
}
impl EigenTrustState {
/// Create an empty state (no agents).
pub fn empty(timestamp: u64) -> Self {
Self { scores: Vec::new(), computed_at: timestamp, iterations: 0, convergence_delta: 0.0 }
}
/// Get the trust score for a specific agent.
///
/// Returns 0.0 if the agent is not in the graph.
pub fn get_score(&self, agent: &[u8; 32]) -> f32 {
self.scores.iter().find(|s| &s.agent == agent).map(|s| s.score).unwrap_or(0.0)
}
/// Check if the computation converged.
pub fn converged(&self, config: &EigenTrustConfig) -> bool {
self.convergence_delta < config.epsilon
}
}
/// Result of EigenTrust computation.
#[derive(Debug)]
pub struct EigenTrustResult {
/// The computed state.
pub state: EigenTrustState,
/// Whether the algorithm converged within max_iterations.
pub converged: bool,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_trust_edge_new_clamps_weight() {
let edge = TrustEdge::new([1u8; 32], [2u8; 32], 1.5, 1000, None);
assert!((edge.weight - 1.0).abs() < f32::EPSILON);
let edge = TrustEdge::new([1u8; 32], [2u8; 32], -0.5, 1000, None);
assert!((edge.weight - 0.0).abs() < f32::EPSILON);
}
#[test]
fn test_trust_edge_is_valid() {
// Valid edge
let edge = TrustEdge::new([1u8; 32], [2u8; 32], 0.8, 1000, None);
assert!(edge.is_valid());
// Zero weight = invalid
let edge = TrustEdge::new([1u8; 32], [2u8; 32], 0.0, 1000, None);
assert!(!edge.is_valid());
// Self-trust = invalid
let edge = TrustEdge::new([1u8; 32], [1u8; 32], 0.8, 1000, None);
assert!(!edge.is_valid());
}
#[test]
fn test_eigentrust_config_default() {
let config = EigenTrustConfig::default();
assert!((config.alpha - 0.1).abs() < f32::EPSILON);
assert_eq!(config.max_iterations, 20);
assert!((config.epsilon - 1e-6).abs() < f32::EPSILON);
}
#[test]
fn test_eigentrust_state_get_score() {
let state = EigenTrustState {
scores: vec![
AgentScore::new([1u8; 32], 0.5),
AgentScore::new([2u8; 32], 0.3),
AgentScore::new([3u8; 32], 0.2),
],
computed_at: 1000,
iterations: 10,
convergence_delta: 1e-8,
};
assert!((state.get_score(&[1u8; 32]) - 0.5).abs() < f32::EPSILON);
assert!((state.get_score(&[2u8; 32]) - 0.3).abs() < f32::EPSILON);
assert!((state.get_score(&[99u8; 32]) - 0.0).abs() < f32::EPSILON); // Missing agent
}
#[test]
fn test_eigentrust_state_converged() {
let config = EigenTrustConfig::default();
let converged_state = EigenTrustState {
scores: vec![],
computed_at: 1000,
iterations: 10,
convergence_delta: 1e-8,
};
assert!(converged_state.converged(&config));
let not_converged_state = EigenTrustState {
scores: vec![],
computed_at: 1000,
iterations: 20,
convergence_delta: 1e-4,
};
assert!(!not_converged_state.converged(&config));
}
}
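
The `max_iterations` and `epsilon` parameters interact through the damping factor. Each step maps probability vectors to probability vectors and scales differences by at most (1 - α) in L1, so the change after k steps is at most 2(1 - α)^(k-1). The hypothetical helper below (`iteration_upper_bound` is not part of this crate) turns that into a worst-case iteration count; it is only a bound, and graphs that mix well can converge much sooner.

```rust
/// Worst-case number of power iterations before the L1 change between
/// successive trust vectors is guaranteed to drop below `epsilon`.
///
/// Assumes each step is a (1 - alpha) contraction in L1 and that the first
/// change is at most 2.0 (the largest L1 distance between two probability
/// vectors), so delta_k <= 2 * (1 - alpha)^(k - 1).
fn iteration_upper_bound(alpha: f64, epsilon: f64) -> Option<u32> {
    // Outside 0 < alpha < 1 the bound is either vacuous or trivial.
    if !(alpha > 0.0 && alpha < 1.0) || epsilon <= 0.0 {
        return None;
    }
    // Solve 2 * (1 - alpha)^(k - 1) < epsilon for the smallest integer k >= 1.
    let l = (epsilon / 2.0).ln() / (1.0 - alpha).ln();
    if l < 0.0 {
        return Some(1); // epsilon > 2: the first step already qualifies
    }
    Some(l.floor() as u32 + 2)
}
```

For the defaults (α = 0.1, ε = 1e-6) the bound is 139 iterations, well above `DEFAULT_MAX_ITERATIONS = 20`; at α = 0.5 it drops to 22.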


@@ -0,0 +1,327 @@
//! TrustGraphStore implementation backed by a generic KVStore.
//!
//! This module provides the concrete implementation of TrustGraphStore operations,
//! including edge management, seed trust, and EigenTrust computation.
use crate::error::Result;
use crate::key_codec;
use crate::traits::KVStore;
use async_trait::async_trait;
use tracing::{debug, instrument};
use super::eigentrust::compute_eigentrust_scores;
use super::model::{EigenTrustConfig, EigenTrustState, TrustEdge};
use super::TrustGraphStore;
/// TrustGraphStore implementation backed by a generic KVStore.
///
/// This implementation stores trust edges and EigenTrust state using the
/// key patterns defined in `key_codec`.
pub struct GenericTrustGraphStore<S> {
store: S,
}
impl<S: KVStore> GenericTrustGraphStore<S> {
/// Create a new TrustGraphStore backed by the given KVStore.
pub fn new(store: S) -> Self {
Self { store }
}
/// Serialize a TrustEdge using the canonical serde helpers.
fn serialize_edge(edge: &TrustEdge) -> Result<Vec<u8>> {
crate::serde_helpers::serialize(edge)
}
/// Deserialize a TrustEdge using the canonical serde helpers.
fn deserialize_edge(data: &[u8]) -> Result<TrustEdge> {
crate::serde_helpers::deserialize(data)
}
/// Serialize an EigenTrustState using the canonical serde helpers.
fn serialize_state(state: &EigenTrustState) -> Result<Vec<u8>> {
crate::serde_helpers::serialize(state)
}
/// Deserialize an EigenTrustState using the canonical serde helpers.
fn deserialize_state(data: &[u8]) -> Result<EigenTrustState> {
crate::serde_helpers::deserialize(data)
}
/// Extract agent ID from a seed trust key.
///
/// Key format: `\x00ET:seed:{agent_hex}`
fn extract_agent_from_seed_key(key: &[u8]) -> Option<[u8; 32]> {
let prefix = b"\x00ET:seed:";
if !key.starts_with(prefix) {
return None;
}
let hex_str = std::str::from_utf8(&key[prefix.len()..]).ok()?;
let bytes = hex::decode(hex_str).ok()?;
if bytes.len() != 32 {
return None;
}
let mut arr = [0u8; 32];
arr.copy_from_slice(&bytes);
Some(arr)
}
/// Extract (from, to) agent IDs from a trust edge key.
///
/// Key format: `\x00TG:{from_hex}:{to_hex}`
fn extract_agents_from_edge_key(key: &[u8]) -> Option<([u8; 32], [u8; 32])> {
let prefix = b"\x00TG:";
if !key.starts_with(prefix) {
return None;
}
let rest = std::str::from_utf8(&key[prefix.len()..]).ok()?;
let parts: Vec<&str> = rest.split(':').collect();
if parts.len() != 2 {
return None;
}
let from_bytes = hex::decode(parts[0]).ok()?;
let to_bytes = hex::decode(parts[1]).ok()?;
if from_bytes.len() != 32 || to_bytes.len() != 32 {
return None;
}
let mut from = [0u8; 32];
let mut to = [0u8; 32];
from.copy_from_slice(&from_bytes);
to.copy_from_slice(&to_bytes);
Some((from, to))
}
}
#[async_trait]
impl<S: KVStore + 'static> TrustGraphStore for GenericTrustGraphStore<S> {
#[instrument(skip(self, edge), fields(
from = %hex::encode(edge.from_agent),
to = %hex::encode(edge.to_agent),
weight = edge.weight
))]
async fn add_trust_edge(&self, edge: &TrustEdge) -> Result<()> {
let from_hex = hex::encode(edge.from_agent);
let to_hex = hex::encode(edge.to_agent);
// Store forward index
let forward_key = key_codec::trust_edge_key(&from_hex, &to_hex);
let serialized = Self::serialize_edge(edge)?;
self.store.put(&forward_key, &serialized).await?;
// Store reverse index (same data, different key)
let reverse_key = key_codec::trust_edge_reverse_key(&to_hex, &from_hex);
self.store.put(&reverse_key, &serialized).await?;
debug!("Added trust edge");
Ok(())
}
#[instrument(skip(self), fields(from = %hex::encode(from), to = %hex::encode(to)))]
async fn remove_trust_edge(&self, from: &[u8; 32], to: &[u8; 32]) -> Result<bool> {
let from_hex = hex::encode(from);
let to_hex = hex::encode(to);
let forward_key = key_codec::trust_edge_key(&from_hex, &to_hex);
let exists = self.store.get(&forward_key).await?.is_some();
if exists {
// Delete forward index
self.store.delete(&forward_key).await?;
// Delete reverse index
let reverse_key = key_codec::trust_edge_reverse_key(&to_hex, &from_hex);
self.store.delete(&reverse_key).await?;
debug!("Removed trust edge");
}
Ok(exists)
}
#[instrument(skip(self), fields(from = %hex::encode(from), to = %hex::encode(to)))]
async fn get_trust_edge(&self, from: &[u8; 32], to: &[u8; 32]) -> Result<Option<TrustEdge>> {
let from_hex = hex::encode(from);
let to_hex = hex::encode(to);
let key = key_codec::trust_edge_key(&from_hex, &to_hex);
match self.store.get(&key).await? {
Some(data) => {
let edge = Self::deserialize_edge(&data)?;
Ok(Some(edge))
}
None => Ok(None),
}
}
#[instrument(skip(self), fields(from = %hex::encode(from)))]
async fn get_trusts(&self, from: &[u8; 32]) -> Result<Vec<([u8; 32], f32)>> {
let from_hex = hex::encode(from);
let prefix = key_codec::trust_edge_from_prefix(&from_hex);
let entries = self.store.scan_prefix(&prefix).await?;
let mut trusts = Vec::with_capacity(entries.len());
for (_, data) in entries {
let edge = Self::deserialize_edge(&data)?;
trusts.push((edge.to_agent, edge.weight));
}
debug!(count = trusts.len(), "Retrieved outgoing trust edges");
Ok(trusts)
}
#[instrument(skip(self), fields(to = %hex::encode(to)))]
async fn get_trusted_by(&self, to: &[u8; 32]) -> Result<Vec<([u8; 32], f32)>> {
let to_hex = hex::encode(to);
let prefix = key_codec::trust_edge_reverse_prefix(&to_hex);
let entries = self.store.scan_prefix(&prefix).await?;
let mut trusted_by = Vec::with_capacity(entries.len());
for (_, data) in entries {
let edge = Self::deserialize_edge(&data)?;
trusted_by.push((edge.from_agent, edge.weight));
}
debug!(count = trusted_by.len(), "Retrieved incoming trust edges");
Ok(trusted_by)
}
#[instrument(skip(self))]
async fn get_all_edges(&self) -> Result<Vec<TrustEdge>> {
let prefix = key_codec::trust_graph_scan_prefix();
let entries = self.store.scan_prefix(&prefix).await?;
let mut edges = Vec::with_capacity(entries.len());
for (key, data) in entries {
// Only process forward index keys (skip reverse index)
if Self::extract_agents_from_edge_key(&key).is_some() {
let edge = Self::deserialize_edge(&data)?;
edges.push(edge);
}
}
debug!(count = edges.len(), "Retrieved all trust edges");
Ok(edges)
}
#[instrument(skip(self), fields(agent = %hex::encode(agent), trust = trust))]
async fn set_seed_trust(&self, agent: &[u8; 32], trust: f32) -> Result<()> {
let agent_hex = hex::encode(agent);
let key = key_codec::seed_trust_key(&agent_hex);
// Store as f32 bytes
let clamped = trust.clamp(0.0, 1.0);
let bytes = clamped.to_le_bytes();
self.store.put(&key, &bytes).await?;
debug!("Set seed trust");
Ok(())
}
#[instrument(skip(self), fields(agent = %hex::encode(agent)))]
async fn get_seed_trust(&self, agent: &[u8; 32]) -> Result<f32> {
let agent_hex = hex::encode(agent);
let key = key_codec::seed_trust_key(&agent_hex);
match self.store.get(&key).await? {
Some(data) if data.len() == 4 => {
let bytes: [u8; 4] = data[..4].try_into().map_err(|_| {
crate::error::StorageError::Serialization("Invalid f32 bytes".to_string())
})?;
Ok(f32::from_le_bytes(bytes))
}
_ => Ok(0.0),
}
}
#[instrument(skip(self))]
async fn get_all_seed_trust(&self) -> Result<Vec<([u8; 32], f32)>> {
let prefix = key_codec::seed_trust_scan_prefix();
let entries = self.store.scan_prefix(&prefix).await?;
let mut seeds = Vec::with_capacity(entries.len());
for (key, data) in entries {
if let Some(agent) = Self::extract_agent_from_seed_key(&key) {
if data.len() == 4 {
let bytes: [u8; 4] = data[..4].try_into().map_err(|_| {
crate::error::StorageError::Serialization("Invalid f32 bytes".to_string())
})?;
let trust = f32::from_le_bytes(bytes);
seeds.push((agent, trust));
}
}
}
debug!(count = seeds.len(), "Retrieved all seed trust entries");
Ok(seeds)
}
#[instrument(skip(self), fields(agent = %hex::encode(agent)))]
async fn remove_seed_trust(&self, agent: &[u8; 32]) -> Result<bool> {
let agent_hex = hex::encode(agent);
let key = key_codec::seed_trust_key(&agent_hex);
let exists = self.store.get(&key).await?.is_some();
if exists {
self.store.delete(&key).await?;
debug!("Removed seed trust");
}
Ok(exists)
}
#[instrument(skip(self, config))]
async fn compute_eigentrust(&self, config: &EigenTrustConfig) -> Result<EigenTrustState> {
// Get all edges and seed trust
let edges = self.get_all_edges().await?;
let seeds = self.get_all_seed_trust().await?;
// Get current timestamp
let timestamp = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.as_secs())
.unwrap_or(0);
// Run EigenTrust algorithm
let result = compute_eigentrust_scores(&edges, &seeds, config, timestamp);
// Store the computed state
let state_key = key_codec::eigentrust_state_key();
let serialized = Self::serialize_state(&result.state)?;
self.store.put(&state_key, &serialized).await?;
debug!(
converged = result.converged,
iterations = result.state.iterations,
agents = result.state.scores.len(),
"Computed and stored EigenTrust state"
);
Ok(result.state)
}
#[instrument(skip(self))]
async fn get_eigentrust_state(&self) -> Result<Option<EigenTrustState>> {
let key = key_codec::eigentrust_state_key();
match self.store.get(&key).await? {
Some(data) => {
let state = Self::deserialize_state(&data)?;
Ok(Some(state))
}
None => Ok(None),
}
}
#[instrument(skip(self), fields(agent = %hex::encode(agent)))]
async fn get_eigentrust_score(&self, agent: &[u8; 32]) -> Result<f32> {
match self.get_eigentrust_state().await? {
Some(state) => Ok(state.get_score(agent)),
None => Ok(0.0),
}
}
}
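The seed-trust accessors above store each value as a 4-byte little-endian `f32`. `get_all_seed_trust` skips any entry whose key does not parse or whose value is not exactly 4 bytes. The std-only sketch below mirrors that round trip. It assumes a hypothetical in-memory `BTreeMap` in place of `HybridStore` and a hypothetical `seed_trust:<hex agent>` key layout. The real `key_codec` format and `extract_agent_from_seed_key` are not shown in this diff.

```rust
use std::collections::BTreeMap;

// Hypothetical key prefix; the real layout lives in `key_codec`, not shown here.
const SEED_PREFIX: &str = "seed_trust:";

/// Build a hypothetical seed-trust key: prefix + lowercase hex of the agent id.
fn seed_key(agent: &[u8; 32]) -> String {
    let hex: String = agent.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}{}", SEED_PREFIX, hex)
}

/// Encode a trust value as it is stored in the diff: 4 little-endian bytes.
fn encode_trust(trust: f32) -> [u8; 4] {
    trust.to_le_bytes()
}

/// Decode a stored value; anything that is not exactly 4 bytes is rejected.
fn decode_trust(data: &[u8]) -> Option<f32> {
    let bytes: [u8; 4] = data.try_into().ok()?;
    Some(f32::from_le_bytes(bytes))
}

/// Parse the 32-byte agent id back out of a key (hypothetical layout).
fn agent_from_key(key: &str) -> Option<[u8; 32]> {
    let hex = key.strip_prefix(SEED_PREFIX)?;
    // Require exactly 64 ASCII hex digits so byte slicing below cannot panic.
    if hex.len() != 64 || !hex.is_ascii() {
        return None;
    }
    let mut agent = [0u8; 32];
    for (i, byte) in agent.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(agent)
}

/// Mirror of `get_all_seed_trust`: scan the prefix and skip malformed entries.
fn all_seed_trust(store: &BTreeMap<String, Vec<u8>>) -> Vec<([u8; 32], f32)> {
    store
        .range(SEED_PREFIX.to_string()..)
        .take_while(|(k, _)| k.starts_with(SEED_PREFIX))
        .filter_map(|(k, v)| Some((agent_from_key(k)?, decode_trust(v)?)))
        .collect()
}
```

Skipping rather than failing on a malformed entry matches the diff's behavior: one corrupt record does not prevent the remaining seeds from being loaded.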

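`compute_eigentrust` delegates to `compute_eigentrust_scores`, whose body is not part of this diff. For orientation, here is a sketch of the textbook EigenTrust power iteration from the EigenTrust paper (Kamvar, Schlosser, and Garcia-Molina). It computes t ← (1 − α)·Cᵀ·t + α·p, where C is each agent's outgoing trust normalized to sum to 1 and p is the normalized seed vector. Agents with no outgoing edges redistribute along p. The parameters `alpha`, `epsilon` and `max_iters` and the `Scores` type are hypothetical stand-ins; they are not necessarily the fields of `EigenTrustConfig` or `EigenTrustState`.

```rust
use std::collections::{BTreeSet, HashMap};

type Agent = [u8; 32];

/// Hypothetical stand-in for `EigenTrustState` plus the `converged` flag.
struct Scores {
    scores: HashMap<Agent, f32>,
    iterations: u32,
    converged: bool,
}

impl Scores {
    /// Unknown agents score 0.0, like the `get_eigentrust_score` default.
    fn get_score(&self, agent: &Agent) -> f32 {
        self.scores.get(agent).copied().unwrap_or(0.0)
    }
}

fn eigentrust(
    edges: &[(Agent, Agent, f32)],
    seeds: &[(Agent, f32)],
    alpha: f32,
    epsilon: f32,
    max_iters: u32,
) -> Scores {
    // Index every agent that appears in an edge or in the seed set.
    let agents: BTreeSet<Agent> = edges
        .iter()
        .flat_map(|(f, t, _)| [*f, *t])
        .chain(seeds.iter().map(|(a, _)| *a))
        .collect();
    let idx: HashMap<Agent, usize> = agents.iter().enumerate().map(|(i, a)| (*a, i)).collect();
    let n = agents.len();

    // Pre-trusted distribution p, normalized to sum to 1.
    let mut p = vec![0.0f32; n];
    for (a, w) in seeds {
        p[idx[a]] += w.max(0.0);
    }
    let p_sum: f32 = p.iter().sum();
    if p_sum > 0.0 {
        p.iter_mut().for_each(|x| *x /= p_sum);
    }

    // Row-normalized local trust: out[i] holds (j, c_ij) with sum_j c_ij = 1.
    let mut out: Vec<Vec<(usize, f32)>> = vec![Vec::new(); n];
    for (f, t, w) in edges {
        if *w > 0.0 {
            out[idx[f]].push((idx[t], *w));
        }
    }
    for row in &mut out {
        let s: f32 = row.iter().map(|(_, w)| w).sum();
        row.iter_mut().for_each(|(_, w)| *w /= s);
    }

    let mut t = p.clone();
    let (mut iterations, mut converged) = (0, false);
    while iterations < max_iters {
        iterations += 1;
        // next = alpha * p + (1 - alpha) * C^T t
        let mut next: Vec<f32> = p.iter().map(|x| alpha * x).collect();
        for (i, row) in out.iter().enumerate() {
            if row.is_empty() {
                // Dangling agent: its trust flows along the seed vector p.
                for (j, pj) in p.iter().enumerate() {
                    next[j] += (1.0 - alpha) * t[i] * pj;
                }
            } else {
                for &(j, c) in row {
                    next[j] += (1.0 - alpha) * t[i] * c;
                }
            }
        }
        // Stop once the L1 change between iterations falls below epsilon.
        let delta: f32 = next.iter().zip(&t).map(|(a, b)| (a - b).abs()).sum();
        t = next;
        if delta < epsilon {
            converged = true;
            break;
        }
    }
    let scores = agents.iter().map(|a| (*a, t[idx[a]])).collect();
    Scores { scores, iterations, converged }
}
```

Run against the isolated ring from `test_sybil_resistance_isolated_ring`, this sketch gives agents 1 to 3 a score of exactly 0.0. No pre-trusted agent has an edge into the ring, and the seed, which has no outgoing edges, sends its own trust back to itself.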
@@ -0,0 +1,217 @@
//! Tests for TrustGraphStore implementation.
use super::model::{EigenTrustConfig, TrustEdge};
use super::store_impl::GenericTrustGraphStore;
use super::TrustGraphStore;
use crate::HybridStore;
use std::sync::Arc;
fn agent(id: u8) -> [u8; 32] {
let mut arr = [0u8; 32];
arr[0] = id;
arr
}
#[tokio::test]
async fn test_add_and_get_trust_edge() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustGraphStore::new(store);
let edge = TrustEdge::new(agent(1), agent(2), 0.8, 1000, Some("Test".to_string()));
trust_store.add_trust_edge(&edge).await.expect("add");
let retrieved = trust_store.get_trust_edge(&agent(1), &agent(2)).await.expect("get");
assert!(retrieved.is_some());
let retrieved_edge = retrieved.expect("edge");
assert_eq!(retrieved_edge.from_agent, agent(1));
assert_eq!(retrieved_edge.to_agent, agent(2));
assert!((retrieved_edge.weight - 0.8).abs() < f32::EPSILON);
}
#[tokio::test]
async fn test_remove_trust_edge() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustGraphStore::new(store);
let edge = TrustEdge::new(agent(1), agent(2), 0.8, 1000, None);
trust_store.add_trust_edge(&edge).await.expect("add");
// Remove should return true
let removed = trust_store.remove_trust_edge(&agent(1), &agent(2)).await.expect("remove");
assert!(removed);
// Edge should be gone
let retrieved = trust_store.get_trust_edge(&agent(1), &agent(2)).await.expect("get");
assert!(retrieved.is_none());
// Remove again should return false
let removed_again = trust_store.remove_trust_edge(&agent(1), &agent(2)).await.expect("remove");
assert!(!removed_again);
}
#[tokio::test]
async fn test_get_trusts_outgoing() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustGraphStore::new(store);
// Agent 1 trusts agents 2, 3, 4
trust_store
.add_trust_edge(&TrustEdge::new(agent(1), agent(2), 0.8, 1000, None))
.await
.expect("add");
trust_store
.add_trust_edge(&TrustEdge::new(agent(1), agent(3), 0.6, 1000, None))
.await
.expect("add");
trust_store
.add_trust_edge(&TrustEdge::new(agent(1), agent(4), 0.4, 1000, None))
.await
.expect("add");
let trusts = trust_store.get_trusts(&agent(1)).await.expect("get");
assert_eq!(trusts.len(), 3);
// Verify weights
let weight_2 = trusts.iter().find(|(a, _)| *a == agent(2)).map(|(_, w)| *w);
let weight_3 = trusts.iter().find(|(a, _)| *a == agent(3)).map(|(_, w)| *w);
let weight_4 = trusts.iter().find(|(a, _)| *a == agent(4)).map(|(_, w)| *w);
assert!((weight_2.expect("weight") - 0.8).abs() < f32::EPSILON);
assert!((weight_3.expect("weight") - 0.6).abs() < f32::EPSILON);
assert!((weight_4.expect("weight") - 0.4).abs() < f32::EPSILON);
}
#[tokio::test]
async fn test_get_trusted_by_incoming() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustGraphStore::new(store);
// Agents 1, 2, 3 all trust agent 4
trust_store
.add_trust_edge(&TrustEdge::new(agent(1), agent(4), 0.8, 1000, None))
.await
.expect("add");
trust_store
.add_trust_edge(&TrustEdge::new(agent(2), agent(4), 0.6, 1000, None))
.await
.expect("add");
trust_store
.add_trust_edge(&TrustEdge::new(agent(3), agent(4), 0.4, 1000, None))
.await
.expect("add");
let trusted_by = trust_store.get_trusted_by(&agent(4)).await.expect("get");
assert_eq!(trusted_by.len(), 3);
}
#[tokio::test]
async fn test_seed_trust_crud() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustGraphStore::new(store);
// Set seed trust
trust_store.set_seed_trust(&agent(1), 1.0).await.expect("set");
trust_store.set_seed_trust(&agent(2), 0.5).await.expect("set");
// Get individual
let trust1 = trust_store.get_seed_trust(&agent(1)).await.expect("get");
let trust2 = trust_store.get_seed_trust(&agent(2)).await.expect("get");
let trust3 = trust_store.get_seed_trust(&agent(3)).await.expect("get");
assert!((trust1 - 1.0).abs() < f32::EPSILON);
assert!((trust2 - 0.5).abs() < f32::EPSILON);
assert!((trust3 - 0.0).abs() < f32::EPSILON); // Not set
// Get all
let all_seeds = trust_store.get_all_seed_trust().await.expect("get all");
assert_eq!(all_seeds.len(), 2);
// Remove
let removed = trust_store.remove_seed_trust(&agent(1)).await.expect("remove");
assert!(removed);
let all_seeds = trust_store.get_all_seed_trust().await.expect("get all");
assert_eq!(all_seeds.len(), 1);
}
#[tokio::test]
async fn test_compute_and_get_eigentrust() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustGraphStore::new(store);
// Set up: agent 1 is the seed; agent 1 trusts agent 2, and agent 2 trusts agent 3
trust_store.set_seed_trust(&agent(1), 1.0).await.expect("set");
trust_store
.add_trust_edge(&TrustEdge::new(agent(1), agent(2), 1.0, 1000, None))
.await
.expect("add");
trust_store
.add_trust_edge(&TrustEdge::new(agent(2), agent(3), 1.0, 1000, None))
.await
.expect("add");
// Compute EigenTrust
let state =
trust_store.compute_eigentrust(&EigenTrustConfig::default()).await.expect("compute");
assert!(state.scores.len() >= 3);
assert!(state.iterations > 0);
// Get state
let retrieved = trust_store.get_eigentrust_state().await.expect("get");
assert!(retrieved.is_some());
// Get individual score
let score = trust_store.get_eigentrust_score(&agent(1)).await.expect("get");
assert!(score > 0.0);
}
#[tokio::test]
async fn test_eigentrust_score_without_computation() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustGraphStore::new(store);
// Without computing, should return 0.0
let score = trust_store.get_eigentrust_score(&agent(1)).await.expect("get");
assert!((score - 0.0).abs() < f32::EPSILON);
}
#[tokio::test]
async fn test_sybil_resistance_isolated_ring() {
let store = Arc::new(HybridStore::open_temp().expect("Failed to create store"));
let trust_store = GenericTrustGraphStore::new(store);
// Seed (agent 0) has no outgoing edges
trust_store.set_seed_trust(&agent(0), 1.0).await.expect("set");
// Isolated ring: 1 → 2 → 3 → 1 (not connected to seed)
trust_store
.add_trust_edge(&TrustEdge::new(agent(1), agent(2), 1.0, 1000, None))
.await
.expect("add");
trust_store
.add_trust_edge(&TrustEdge::new(agent(2), agent(3), 1.0, 1000, None))
.await
.expect("add");
trust_store
.add_trust_edge(&TrustEdge::new(agent(3), agent(1), 1.0, 1000, None))
.await
.expect("add");
let state =
trust_store.compute_eigentrust(&EigenTrustConfig::default()).await.expect("compute");
// Seed should have high trust
let seed_score = state.get_score(&agent(0));
assert!(seed_score > 0.9, "Seed should have high trust: {}", seed_score);
// Isolated ring should have near-zero trust
let ring1_score = state.get_score(&agent(1));
let ring2_score = state.get_score(&agent(2));
let ring3_score = state.get_score(&agent(3));
assert!(ring1_score < 0.05, "Isolated agent 1 should have low trust: {}", ring1_score);
assert!(ring2_score < 0.05, "Isolated agent 2 should have low trust: {}", ring2_score);
assert!(ring3_score < 0.05, "Isolated agent 3 should have low trust: {}", ring3_score);
}
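`test_get_trusts_outgoing`, `test_get_trusted_by_incoming` and `test_remove_trust_edge` require edges to be scannable in both directions, and they require removal to report whether the edge existed. The store's edge key layout is not part of this diff. One common way to support both directions on an ordered key-value store is to write each edge twice, once keyed `(from, to)` and once keyed `(to, from)`, so either direction becomes a single range scan. The sketch below shows that idea with a hypothetical `EdgeIndex` type backed by in-memory `BTreeMap`s. It is not the crate's implementation.

```rust
use std::collections::BTreeMap;

type Agent = [u8; 32];

/// Hypothetical in-memory stand-in for the edge keyspace: `forward` is keyed
/// (from, to) for outgoing scans, `reverse` is keyed (to, from) for incoming scans.
#[derive(Default)]
struct EdgeIndex {
    forward: BTreeMap<(Agent, Agent), f32>,
    reverse: BTreeMap<(Agent, Agent), f32>,
}

impl EdgeIndex {
    fn add(&mut self, from: Agent, to: Agent, weight: f32) {
        // Write both entries so either direction is one range scan.
        self.forward.insert((from, to), weight);
        self.reverse.insert((to, from), weight);
    }

    /// Returns true only if the edge existed, like `remove_trust_edge`.
    fn remove(&mut self, from: Agent, to: Agent) -> bool {
        let existed = self.forward.remove(&(from, to)).is_some();
        self.reverse.remove(&(to, from));
        existed
    }

    /// (to, weight) pairs for edges leaving `from` (compare `get_trusts`).
    fn outgoing(&self, from: Agent) -> Vec<(Agent, f32)> {
        scan(&self.forward, from)
    }

    /// (from, weight) pairs for edges entering `to` (compare `get_trusted_by`).
    fn incoming(&self, to: Agent) -> Vec<(Agent, f32)> {
        scan(&self.reverse, to)
    }
}

/// Range-scan every key whose first component equals `head`.
fn scan(map: &BTreeMap<(Agent, Agent), f32>, head: Agent) -> Vec<(Agent, f32)> {
    map.range((head, [0u8; 32])..=(head, [0xFFu8; 32]))
        .map(|(&(_, other), &w)| (other, w))
        .collect()
}
```

The cost of this layout is one extra write per edge. In exchange, "who trusts X" never requires a full scan of the edge set.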

@@ -32,6 +32,7 @@ pub use store_impl::*;
use crate::error::Result;
use async_trait::async_trait;
use std::sync::Arc;
/// Specialized storage trait for TrustRank operations.
///
@@ -148,3 +149,54 @@ pub trait TrustRankStore: Send + Sync {
timestamp: u64,
) -> Result<model::TrustAdjustment>;
}
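// Blanket implementation for Arc<T> where T: TrustRankStore (including dyn TrustRankStore).
// This lets an Arc-wrapped store be passed wherever a TrustRankStore is expected,
// so a single store can be shared across threads and components.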
#[async_trait]
impl<T: TrustRankStore + ?Sized> TrustRankStore for Arc<T> {
async fn get_trust_rank(&self, agent_id: &[u8; 32]) -> Result<TrustRank> {
(**self).get_trust_rank(agent_id).await
}
async fn update_trust_rank(
&self,
agent_id: &[u8; 32],
delta: f32,
timestamp: u64,
) -> Result<f32> {
(**self).update_trust_rank(agent_id, delta, timestamp).await
}
async fn decay_trust_ranks(
&self,
current_timestamp: u64,
half_life_seconds: Option<u64>,
) -> Result<usize> {
(**self).decay_trust_ranks(current_timestamp, half_life_seconds).await
}
async fn record_outcome(
&self,
agent_id: &[u8; 32],
was_accurate: bool,
timestamp: u64,
) -> Result<f32> {
(**self).record_outcome(agent_id, was_accurate, timestamp).await
}
async fn put_trust_rank(&self, trust_rank: &TrustRank) -> Result<()> {
(**self).put_trust_rank(trust_rank).await
}
async fn verify_agent_against_gold_standard(
&self,
agent_id: &[u8; 32],
agent_object: &str,
gold_standard: &stemedb_core::types::GoldStandard,
timestamp: u64,
) -> Result<model::TrustAdjustment> {
(**self)
.verify_agent_against_gold_standard(agent_id, agent_object, gold_standard, timestamp)
.await
}
}
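The blanket impl above forwards each `TrustRankStore` method through `Arc<T>`. Without it, the inner store's methods are still callable on an `Arc` through auto-deref, but `Arc<S>` would not satisfy a generic `S: TrustRankStore` bound. The following synchronous sketch shows the same pattern. It uses a hypothetical `RankStore` trait, `FixedStore` type and `admission_score` function in place of the real async trait, which additionally needs `async_trait`.

```rust
use std::sync::Arc;

/// Hypothetical synchronous stand-in for the async `TrustRankStore`.
trait RankStore: Send + Sync {
    fn get_rank(&self, agent: &[u8; 32]) -> f32;
}

/// Hypothetical store that returns the same rank for every agent.
struct FixedStore(f32);

impl RankStore for FixedStore {
    fn get_rank(&self, _agent: &[u8; 32]) -> f32 {
        self.0
    }
}

// Same shape as the diff's blanket impl: forward every call through the Arc.
// `?Sized` lets this cover `Arc<dyn RankStore>` as well as `Arc<FixedStore>`.
impl<T: RankStore + ?Sized> RankStore for Arc<T> {
    fn get_rank(&self, agent: &[u8; 32]) -> f32 {
        (**self).get_rank(agent)
    }
}

/// Hypothetical component that is generic over any store, including an Arc.
fn admission_score<S: RankStore>(store: &S, agent: &[u8; 32]) -> f32 {
    store.get_rank(agent)
}
```

Because `RankStore` requires `Send + Sync`, `Arc<dyn RankStore>` is itself `Send + Sync`, so clones of it can be handed to other threads or components that only know about the trait.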

@@ -37,6 +37,8 @@ async-trait = "0.1"
# Utilities
hex = "0.4"
blake3 = "1.5"
parking_lot = "0.12"
[dev-dependencies]
tempfile = "3.10"
stemedb-lens = { path = "../stemedb-lens" }

Some files were not shown because too many files have changed in this diff.