Merged 10 upstream commits (MemTable, read-your-writes tests, feed endpoint,
security hardening, signed assertions, source registry, dashboard enhancements)
and fixed all test failures across the full workspace (2656/2656 passing).
Key fixes:
- fix(cluster): DashMap deadlock in swim.rs suspect_node/fail_node/alive_node
- DashMap::get_mut RefMut + iter() on same map = non-reentrant write lock deadlock
- Fix: extract clone in scoped block to drop RefMut before calling update_node_gauges()
- 6 previously-hanging SWIM tests now pass in <2s
- fix(sim): replace background-task+polling ingestion with synchronous process_pending()
- smoke_high_volume_simulation was CPU-starved under 2656 parallel tests
- Removed ingestor.start() + wait_until_ingested() pattern throughout sim
- All arena functions now call ingestor.process_pending() directly (deterministic)
- fix(test): v2 signature helper used wrong hash (rkyv vs canonical compute_content_hash_v2)
- fix(test): quota test signed "test" but v1 requires "subject:predicate" format
- fix(test): http_validation now accepts 400 for valid-format-but-invalid-crypto hex
- fix(test): scale_adaptive micro tier assertions updated (auto_promote upstream change)
- config: add nextest.toml with slow-timeout for background-task-tests group
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `content: Option<String>` to SourceRecord with rkyv schema evolution
(LegacySourceRecord compat deserializer for backward compatibility)
- Add MAX_SOURCE_CONTENT_LEN (1MB) limit with API validation
- Strip content from list responses, include in single-source GET
- Update Go SDK RegisterSourceRequest with Content field
- FCM pipeline extracts PDF text via pdftotext and passes to registration
- Dashboard impact panel fetches and displays source content with expand/collapse
- Add feed endpoint, dashboard feed panel, and signed assertion support
- Update data-structures.md, API docs, and storage docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add /v1/feed API endpoint with handler and tests
- Remove health endpoint rate limiting (behind firewall, caused spurious 429s)
- Add dashboard feed panel with list, row, empty state, and loading skeleton
- Update home page to show feed instead of redirecting to skeptic
- Improve API key auth middleware and DTO create/query params
- Add OpenAPI conceptual guide (api-intro.md) with semaglutide examples
- Add FindMyHealth application scaffolding (vision, architecture, prototypes)
- Add FindMyHealth designer/writer and Aphoria founder-CEO agents
- Update roadmap with current progress
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claims now flow through StemeDB's append-only knowledge graph instead of
mutable TOML files. This resolves all 6 critical claim-bypass code paths:
- Bridge: lossless AuthoredClaim ↔ Assertion round-trip (comparison, status, lifecycle mapping)
- LocalEpisteme: ingest_authored_claim() and fetch_authored_claims() with AUTHORED_CLAIM predicate index
- EpistemeClaimStore: ClaimStore trait backed by StemeDB (append-only delete via deprecation)
- CLI handlers: all claim commands read/write through StemeDB
- Scanner: loads claims from StemeDB with auto-migration fallback to TOML
- Export: new `aphoria claims export` serializes StemeDB claims to TOML/JSON
Also cleans up dead code (EpistemeConfig.url), renames ingest_claims→ingest_observations,
fixes ClaimFilter.authority_tier type, adds Draft variant to ClaimStatus, and fixes
pre-existing clippy warnings (too_many_arguments, filter_next→rfind).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit implements comprehensive production hardening across multiple
layers to prepare StemeDB for enterprise pilot deployments:
## API Layer
- Add rate limiting middleware with configurable limits per endpoint
- Enhance error handling with detailed context and proper HTTP status codes
- Add security hardening tests for input validation and boundary conditions
- Create store_helpers module for defensive storage access patterns
## Storage & WAL
- Optimize group commit batching for higher throughput
- Add defensive error handling in hybrid backend with proper fallbacks
- Enhance WAL journal durability guarantees with fsync validation
- Improve index store query performance with better caching
## Operations & Deployment
- Add comprehensive operations documentation (deployment, monitoring, DR)
- Create systemd units for backup, WAL archival, and verification
- Add monitoring configs (Prometheus alerts, metrics exporters)
- Implement backup/restore scripts with verification and S3 archival
- Add DR drill automation and runbook procedures
- Create load balancer configs (nginx, envoy) with health checks
## Documentation
- Update CLAUDE.md with operations and troubleshooting guides
- Expand roadmap with production readiness milestones
- Add pilot success criteria and deployment reference architecture
- Document TLS setup, monitoring integration, and incident response
## Configuration
- Add .env.example with all required environment variables
- Document resource sizing for different deployment scales
- Add configuration examples for various deployment topologies
This positions StemeDB for successful enterprise pilots with proper
operational discipline, monitoring, backup/DR, and security hardening.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Problem
CLI-created community corpus items (tier 3) were stored correctly but
invisible via API queries. Two issues blocked discoverability:
1. **Prefix mismatch**: API hardcoded 'community://pattern/' for
aggregated patterns, but CLI creates 'community://rust/http/...' URIs
2. **Query parameter parsing**: Axum's default parser doesn't support
bracket notation (?sources[]=value) used by the dashboard
Result: 0/22 CLI-created items were queryable.
## Solution
### Fix 1: Broaden Community Prefix
- Changed: 'community://pattern/' → 'community://' in corpus handler
- Impact: Now matches both aggregated patterns AND CLI-created items
- Backward compatible: Broader prefix includes narrower results
### Fix 2: Add QsQuery Extractor
- Added: serde_qs dependency + custom QsQuery extractor
- Supports: Bracket notation for array parameters (?sources[]=a&sources[]=b)
- Compatible: Works with JavaScript URLSearchParams standard
- Tested: 3 new unit tests for extractor behavior
## Verification
- ✅ All 22 CLI-created community items now queryable (was 0)
- ✅ Source filtering works: community (22), RFC (2), vendor (5)
- ✅ Multi-source queries work: ?sources[]=community&sources[]=rfc → 24
- ✅ All 89 API tests pass + 3 new extractor tests
- ✅ Clippy clean (0 warnings)
- ✅ No regressions in existing functionality
## Files Changed
- crates/stemedb-api/Cargo.toml: Add serde_qs dependency
- crates/stemedb-api/src/extractors.rs: New QsQuery extractor (117 lines)
- crates/stemedb-api/src/handlers/aphoria/corpus.rs: Use QsQuery, broaden prefix
- crates/stemedb-api/src/lib.rs: Export extractors module
Also includes: Scale-adaptive thresholds, wiki corpus extraction,
documentation updates, and dashboard UI improvements from prior work.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit implements Phase 17 of the Aphoria roadmap, adding:
**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)
**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)
**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns
**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion
**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries
**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion
Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed 3 bugs in Aphoria's claim verification engine that were causing
false positives in Maxwell validation testing:
**Bug 1: Path matching + predicate filtering**
- Added predicate filtering to prevent cross-predicate matches
- Added path prefix matching to respect crate boundaries
- Prevents core/imports/serde from matching hypervisor/vsock/imports/serde
**Bug 2: Value-specific absent checks**
- Absent mode now checks for specific forbidden value, not any observation
- Example: "Clone absent" + "Debug present" = PASS (not CONFLICT)
- Only conflicts when the exact forbidden value is found
**Bug 3: Wildcard pattern support**
- Wildcard patterns like message/*/derives now match multiple paths
- Enhanced wildcard_matches() to support prefix/*/suffix patterns
- Correctly strips full scheme+language from observation paths
**Test coverage:**
- All 39 existing tests passing
- 3 new tests added for bug fixes
- 2 tests updated to use correct predicates
- Zero clippy warnings
**Maxwell validation:**
- maxwell-core-no-serde-001: CONFLICT → PASS (respects path boundaries)
- maxwell-singleton-no-clone-001: CONFLICT → PASS (value-specific absent)
- 5 claims now correctly show as MISSING (expose predicate mismatches)
The fixes successfully eliminate false positives while exposing pre-existing
issues where claims used incorrect predicates.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add scans panel with finding details, verdict badges, and filters
- Add corpus panel for managing knowledge sources
- Add scan cache for API state management
- Update sidebar navigation with new routes
- Extend API types for scans and corpus endpoints
- Add .aphoria/ to gitignore (contains project keys)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add benchmark: false to ScanArgs in stemedb-api handler
- Change test float from 3.14 to 7.25 to avoid clippy approx_constant
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enterprise Features:
- Hosted mode with remote sync for team pattern aggregation
- Community sharing with privacy-preserving anonymization
- LLM-based semantic claim extraction with Gemini integration
- Pattern learning with promotion to declarative extractors
- High-entropy secrets extractor with configurable thresholds
- Auth bypass and insecure cookies extractors
Module Refactoring:
- Split oversized files to comply with 500-line limit
- Config split: types/core.rs, types/extractors.rs, types/hosted.rs, etc.
- Handlers split: scan.rs, policy.rs, report.rs modules
- Extractors split: declarative/, high_entropy_secrets/, insecure_cookies/
- Learning split: store modules with metrics and persistence
SDK & Ontology:
- stemedb-ontology SDK with fluent builders and StemeDB client
- Pharma domain extractors for FDA Orange Book data
- Consumer health UAT test infrastructure
Code Quality:
- Fixed clippy warnings (needless_borrows_for_generic_args)
- Added KVStore trait imports where needed
- Fixed utoipa path re-exports for OpenAPI docs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major additions:
- Staged scanning modes (working tree, staged, committed) with git integration
- Drift detection for baseline vs current state comparisons
- Hosted API handlers for policy CRUD operations via StemeDB API
- stemedb-ontology crate with domain definitions and medical extractors
- Consumer health vertical UAT scenarios (GLP-1, gastroparesis, etc.)
- Aphoria development skill documentation
Code organization:
- Split large files into focused modules to stay under 500-line limit
- Extracted config tests, episteme helpers/drift/aliases, API helpers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PolicySourceStore for tracking where policies come from
- Implement claim extraction skill and API endpoints
- Add community UI text selection extractor component
- Create Go SDK aphoria client for policy operations
- Document patent specifications and legal disclosures
- Add guides: golden path loop, policy audit trails, pre-flight checks
- Expand Unreal Engine config extractor with source tracking
- Add UAT reports for policy source tracking validation
- Refactor tests.rs into modular test files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Content Defense (Phase 7):
- Add SimilarityIndex with MinHash/LSH for near-duplicate detection
- Add QuarantineStore for flagged assertions awaiting admin review
- Add CircuitBreakerStore for per-agent circuit breaker state
- Add ContentDefenseLayer for ingestion pipeline integration
- Add API endpoints for quarantine and circuit breaker management
- Add research module with gap detection and documentation fetching
Code Structure Improvements:
- Extract research CLI commands to research_commands.rs
- Extract API routers to routers.rs module
- Extract key_codec extraction functions to separate module
- Extract test modules to separate files across multiple crates
- All files now under 500 line limit per pre-commit hook
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Hybrid Logical Clock (HLC) for causality tracking across nodes
- Implement Merkle tree for efficient diff/sync with BLAKE3 hashing
- Add CRDT-aware stores for assertions and votes with vector clocks
- Create stemedb-sync crate with anti-entropy and gossip protocols
- Add stemedb-rpc crate with gRPC sync service (proto definitions)
- Implement SupersessionChain for tracking assertion lifecycles
- Add Aphoria application for code analysis/reporting
- Add battery11 replication test scaffolding
- Fix .gitignore to exclude nested target directories
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add CRC32C checksums to WAL record format (v2), implement crash recovery
with automatic truncation of corrupt records, add feature-gated group commit
buffer for batched fsync under concurrent load, and implement log rotation
via segment files with global offset addressing.
Key changes:
- Record format v2: [len:u32][crc32c:u32][blake3:32][payload:N]
- recover_file() scans and truncates corrupt tail records
- GroupCommitBuffer batches fsync via MPSC channel (tokio feature gate)
- SegmentManager with binary search resolution and cursor-based cleanup
- Journal::read() auto-refreshes segments on miss for writer/reader split
- Split recovery.rs and key_codec.rs into directory modules for 500-line max
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Layered() method to Go SDK for per-source-class consensus queries
- Add LayeredQueryParams, LayeredResult, TierResolution types to Go SDK
- Create conflict example demonstrating Skeptic and Layered endpoints
- Update quickstart.md with sections 6 (conflict detection) and 7 (authority tiers)
- Remove tracked Go binary and add data/ to .gitignore
The new quickstart sections demonstrate Episteme's differentiating features:
- Skeptic endpoint shows "Trust but Verify" conflict analysis
- Layered endpoint shows per-tier resolution (Clinical vs Anecdotal)
Note: Pre-existing large files flagged by pre-commit hook (technical debt from prior sessions)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>