Reformats import blocks, function signatures, and expression line wrapping
in stemedb-api handlers, stemedb-core serde/source_record, and serde_helpers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merged 10 upstream commits (MemTable, read-your-writes tests, feed endpoint,
security hardening, signed assertions, source registry, dashboard enhancements)
and fixed all test failures across the full workspace (2656/2656 passing).
Key fixes:
- fix(cluster): DashMap deadlock in swim.rs suspect_node/fail_node/alive_node
  - Cause: holding a DashMap::get_mut RefMut while calling iter() on the same map deadlocks (the shard write lock is non-reentrant)
  - Fix: extract clone in scoped block to drop RefMut before calling update_node_gauges()
  - 6 previously-hanging SWIM tests now pass in <2s
- fix(sim): replace background-task+polling ingestion with synchronous process_pending()
  - smoke_high_volume_simulation was CPU-starved under 2656 parallel tests
  - Removed ingestor.start() + wait_until_ingested() pattern throughout sim
  - All arena functions now call ingestor.process_pending() directly (deterministic)
- fix(test): v2 signature helper used wrong hash (rkyv vs canonical compute_content_hash_v2)
- fix(test): quota test signed "test" but v1 requires "subject:predicate" format
- fix(test): http_validation now accepts 400 for valid-format-but-invalid-crypto hex
- fix(test): scale_adaptive micro tier assertions updated (auto_promote upstream change)
- config: add nextest.toml with slow-timeout for background-task-tests group
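
The scoped-block fix for the SWIM deadlock can be sketched with the standard library alone. This is only an illustration, assuming a `std::sync::RwLock<HashMap>` as a stand-in for the DashMap; the `Membership` and `NodeState` types and the gauge tuple are hypothetical and not taken from swim.rs. The point is that the write guard is dropped before the function that reads the whole map is called.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Clone, Copy, Debug, PartialEq)]
enum NodeState {
    Alive,
    Suspect,
    Failed,
}

// Hypothetical stand-in for the membership table (the real code uses DashMap).
struct Membership {
    nodes: RwLock<HashMap<String, NodeState>>,
}

impl Membership {
    /// Counts (alive, suspect, failed) nodes. Takes a read lock over the
    /// whole map, which plays the role of DashMap::iter() here.
    fn update_node_gauges(&self) -> (usize, usize, usize) {
        let nodes = self.nodes.read().unwrap();
        let count = |s: NodeState| nodes.values().filter(|v| **v == s).count();
        (
            count(NodeState::Alive),
            count(NodeState::Suspect),
            count(NodeState::Failed),
        )
    }

    /// Marks a node as suspect, then refreshes the gauges.
    fn suspect_node(&self, id: &str) -> Option<(usize, usize, usize)> {
        {
            // The write guard lives only inside this block.
            let mut nodes = self.nodes.write().unwrap();
            *nodes.get_mut(id)? = NodeState::Suspect;
        } // guard dropped here, before any other lock on the map is taken

        // Calling this while still holding the write guard would block
        // (or panic) because the lock is not reentrant.
        Some(self.update_node_gauges())
    }
}
```

With two alive nodes `a` and `b`, `suspect_node("a")` yields one alive, one suspect, and zero failed; an unknown id returns `None` without touching the gauges.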
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `content: Option<String>` to SourceRecord with rkyv schema evolution
(LegacySourceRecord compat deserializer for backward compatibility)
- Add MAX_SOURCE_CONTENT_LEN (1MB) limit with API validation
- Strip content from list responses, include in single-source GET
- Update Go SDK RegisterSourceRequest with Content field
- FCM pipeline extracts PDF text via pdftotext and passes to registration
- Dashboard impact panel fetches and displays source content with expand/collapse
- Add feed endpoint, dashboard feed panel, and signed assertion support
- Update data-structures.md, API docs, and storage docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add /v1/feed API endpoint with handler and tests
- Remove health endpoint rate limiting (behind firewall, caused spurious 429s)
- Add dashboard feed panel with list, row, empty state, and loading skeleton
- Update home page to show feed instead of redirecting to skeptic
- Improve API key auth middleware and DTO create/query params
- Add OpenAPI conceptual guide (api-intro.md) with semaglutide examples
- Add FindMyHealth application scaffolding (vision, architecture, prototypes)
- Add FindMyHealth designer/writer and Aphoria founder-CEO agents
- Update roadmap with current progress
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claims now flow through StemeDB's append-only knowledge graph instead of
mutable TOML files. This resolves all 6 critical claim-bypass code paths:
- Bridge: lossless AuthoredClaim ↔ Assertion round-trip (comparison, status, lifecycle mapping)
- LocalEpisteme: ingest_authored_claim() and fetch_authored_claims() with AUTHORED_CLAIM predicate index
- EpistemeClaimStore: ClaimStore trait backed by StemeDB (append-only delete via deprecation)
- CLI handlers: all claim commands read/write through StemeDB
- Scanner: loads claims from StemeDB with auto-migration fallback to TOML
- Export: new `aphoria claims export` serializes StemeDB claims to TOML/JSON
Also cleans up dead code (EpistemeConfig.url), renames ingest_claims→ingest_observations,
fixes ClaimFilter.authority_tier type, adds Draft variant to ClaimStatus, and fixes
pre-existing clippy warnings (too_many_arguments, filter_next→rfind).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Declarative extractors in separate .toml files under .aphoria/extractors/ were
silently ignored because config loading only parsed the main config.toml. Now
from_file() scans the extractors directory after loading the main config and
merges any [[extractors.declarative]] definitions found in .toml files. Invalid
files produce warnings but don't fail the load. Also includes show_observations
field additions to scan args and removes unused import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit implements comprehensive production hardening across multiple
layers to prepare StemeDB for enterprise pilot deployments:
## API Layer
- Add rate limiting middleware with configurable limits per endpoint
- Enhance error handling with detailed context and proper HTTP status codes
- Add security hardening tests for input validation and boundary conditions
- Create store_helpers module for defensive storage access patterns
## Storage & WAL
- Optimize group commit batching for higher throughput
- Add defensive error handling in hybrid backend with proper fallbacks
- Enhance WAL journal durability guarantees with fsync validation
- Improve index store query performance with better caching
## Operations & Deployment
- Add comprehensive operations documentation (deployment, monitoring, DR)
- Create systemd units for backup, WAL archival, and verification
- Add monitoring configs (Prometheus alerts, metrics exporters)
- Implement backup/restore scripts with verification and S3 archival
- Add DR drill automation and runbook procedures
- Create load balancer configs (nginx, envoy) with health checks
## Documentation
- Update CLAUDE.md with operations and troubleshooting guides
- Expand roadmap with production readiness milestones
- Add pilot success criteria and deployment reference architecture
- Document TLS setup, monitoring integration, and incident response
## Configuration
- Add .env.example with all required environment variables
- Document resource sizing for different deployment scales
- Add configuration examples for various deployment topologies
This positions StemeDB for successful enterprise pilots with proper
operational discipline, monitoring, backup/DR, and security hardening.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Problem
CLI-created community corpus items (tier 3) were stored correctly but
invisible via API queries. Two issues blocked discoverability:
1. **Prefix mismatch**: API hardcoded 'community://pattern/' for
aggregated patterns, but CLI creates 'community://rust/http/...' URIs
2. **Query parameter parsing**: Axum's default parser doesn't support
bracket notation (?sources[]=value) used by the dashboard
Result: 0/22 CLI-created items were queryable.
## Solution
### Fix 1: Broaden Community Prefix
- Changed: 'community://pattern/' → 'community://' in corpus handler
- Impact: Now matches both aggregated patterns AND CLI-created items
- Backward compatible: Broader prefix includes narrower results
### Fix 2: Add QsQuery Extractor
- Added: serde_qs dependency + custom QsQuery extractor
- Supports: Bracket notation for array parameters (?sources[]=a&sources[]=b)
- Compatible: Works with JavaScript URLSearchParams standard
- Tested: 3 new unit tests for extractor behavior
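
For illustration, the bracket-notation grouping can be sketched without any dependencies. This is not the QsQuery extractor and does not use serde_qs; `parse_bracket_query` is a hypothetical helper that only shows how repeated `key[]=value` pairs collapse into one list. It deliberately skips percent-decoding of values.

```rust
use std::collections::HashMap;

/// Groups repeated `key[]=value` pairs under `key`.
/// Hypothetical helper: values are kept as-is (no percent-decoding).
fn parse_bracket_query(query: &str) -> HashMap<String, Vec<String>> {
    let mut params: HashMap<String, Vec<String>> = HashMap::new();
    let pairs = query
        .trim_start_matches('?')
        .split('&')
        .filter(|p| !p.is_empty());
    for pair in pairs {
        let (raw_key, value) = pair.split_once('=').unwrap_or((pair, ""));
        // Accept the literal suffix and the encoded form that
        // URLSearchParams emits for "[]".
        let key = raw_key
            .strip_suffix("[]")
            .or_else(|| raw_key.strip_suffix("%5B%5D"))
            .unwrap_or(raw_key);
        params
            .entry(key.to_string())
            .or_default()
            .push(value.to_string());
    }
    params
}
```

Parsing `?sources[]=community&sources[]=rfc` gives a single `sources` entry holding both values in request order.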
## Verification
- ✅ All 22 CLI-created community items now queryable (was 0)
- ✅ Source filtering works: community (22), RFC (2), vendor (5)
- ✅ Multi-source queries work: ?sources[]=community&sources[]=rfc → 24
- ✅ All 89 API tests pass + 3 new extractor tests
- ✅ Clippy clean (0 warnings)
- ✅ No regressions in existing functionality
## Files Changed
- crates/stemedb-api/Cargo.toml: Add serde_qs dependency
- crates/stemedb-api/src/extractors.rs: New QsQuery extractor (117 lines)
- crates/stemedb-api/src/handlers/aphoria/corpus.rs: Use QsQuery, broaden prefix
- crates/stemedb-api/src/lib.rs: Export extractors module
Also includes: Scale-adaptive thresholds, wiki corpus extraction,
documentation updates, and dashboard UI improvements from prior work.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit implements Phase 17 of the Aphoria roadmap, adding:
**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)
**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)
**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns
**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion
**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries
**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion
Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed 3 bugs in Aphoria's claim verification engine that were causing
false positives in Maxwell validation testing:
**Bug 1: Path matching + predicate filtering**
- Added predicate filtering to prevent cross-predicate matches
- Added path prefix matching to respect crate boundaries
- Prevents core/imports/serde from matching hypervisor/vsock/imports/serde
**Bug 2: Value-specific absent checks**
- Absent mode now checks for specific forbidden value, not any observation
- Example: "Clone absent" + "Debug present" = PASS (not CONFLICT)
- Only conflicts when the exact forbidden value is found
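
A minimal sketch of the value-specific check, assuming observations are plain strings; `Verdict` and `check_absent` are hypothetical names, not the engine's actual types:

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Conflict,
}

/// Absent mode: conflict only when the exact forbidden value was observed.
/// Other observations under the same predicate do not count against the claim.
fn check_absent(forbidden: &str, observed: &[&str]) -> Verdict {
    if observed.iter().any(|value| *value == forbidden) {
        Verdict::Conflict
    } else {
        Verdict::Pass
    }
}
```

Here `check_absent("Clone", &["Debug"])` passes, while adding `"Clone"` to the observed values produces a conflict.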
**Bug 3: Wildcard pattern support**
- Wildcard patterns like message/*/derives now match multiple paths
- Enhanced wildcard_matches() to support prefix/*/suffix patterns
- Correctly strips full scheme+language from observation paths
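
The prefix/*/suffix matching can be sketched as a segment-by-segment comparison. This assumes each `*` stands for exactly one path segment, which the commit does not state; `wildcard_matches_sketch` is a hypothetical name and not the real wildcard_matches(). Scheme and language stripping are left out.

```rust
/// Matches `path` against `pattern`, where `*` stands for one path segment.
/// Assumption: a wildcard never spans more than one segment.
fn wildcard_matches_sketch(pattern: &str, path: &str) -> bool {
    let pattern_segments: Vec<&str> = pattern.split('/').collect();
    let path_segments: Vec<&str> = path.split('/').collect();
    pattern_segments.len() == path_segments.len()
        && pattern_segments
            .iter()
            .zip(&path_segments)
            .all(|(p, s)| *p == "*" || p == s)
}
```

Under that assumption `message/*/derives` matches both `message/request/derives` and `message/response/derives`, but not `message/request/imports`.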
**Test coverage:**
- All 39 existing tests passing
- 3 new tests added for bug fixes
- 2 tests updated to use correct predicates
- Zero clippy warnings
**Maxwell validation:**
- maxwell-core-no-serde-001: CONFLICT → PASS (respects path boundaries)
- maxwell-singleton-no-clone-001: CONFLICT → PASS (value-specific absent)
- 5 claims now correctly show as MISSING (expose predicate mismatches)
The fixes successfully eliminate false positives while exposing pre-existing
issues where claims used incorrect predicates.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements the --show-claims feature requested by users who need to verify
extractors are working correctly and debug false negatives.
Changes:
- Add `claims: Option<Vec<ExtractedClaim>>` field to ScanResult
- Add `--show-claims` CLI flag to scan command
- Add `show_claims: bool` parameter to ScanArgs
- Populate claims in scanner when flag is set (sorted by file, then line)
- Display claims in all output formats:
* Table: New "Extracted Claims" section with concept/value/file/line/confidence
* JSON: Top-level `claims` array with full claim details
* Markdown: "## Extracted Claims" section with table
* SARIF: Informational-level results (level: "note") for IDE integration
User outcome:
- `aphoria scan . --show-claims` displays all claims (not just conflicts)
- Users can verify extractors detected their code patterns
- Users can debug false negatives by seeing what WAS extracted
- Builds trust through transparency
Quality:
- Zero breaking changes (opt-in flag, backward compatible)
- All tests passing (943 passed)
- Clippy clean (no warnings)
- Manual testing verified all 4 output formats
Addresses user feedback from /home/jml/Workspace/maxwell/.aphoria/.notes-for-aphoria-team
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add scans panel with finding details, verdict badges, and filters
- Add corpus panel for managing knowledge sources
- Add scan cache for API state management
- Update sidebar navigation with new routes
- Extend API types for scans and corpus endpoints
- Add .aphoria/ to gitignore (contains project keys)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add benchmark: false to ScanArgs in stemedb-api handler
- Change test float from 3.14 to 7.25 to avoid clippy approx_constant
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enterprise Features:
- Hosted mode with remote sync for team pattern aggregation
- Community sharing with privacy-preserving anonymization
- LLM-based semantic claim extraction with Gemini integration
- Pattern learning with promotion to declarative extractors
- High-entropy secrets extractor with configurable thresholds
- Auth bypass and insecure cookies extractors
Module Refactoring:
- Split oversized files to comply with 500-line limit
- Config split: types/core.rs, types/extractors.rs, types/hosted.rs, etc.
- Handlers split: scan.rs, policy.rs, report.rs modules
- Extractors split: declarative/, high_entropy_secrets/, insecure_cookies/
- Learning split: store modules with metrics and persistence
SDK & Ontology:
- stemedb-ontology SDK with fluent builders and StemeDB client
- Pharma domain extractors for FDA Orange Book data
- Consumer health UAT test infrastructure
Code Quality:
- Fixed clippy warnings (needless_borrows_for_generic_args)
- Added KVStore trait imports where needed
- Fixed utoipa path re-exports for OpenAPI docs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major additions:
- Staged scanning modes (working tree, staged, committed) with git integration
- Drift detection for baseline vs current state comparisons
- Hosted API handlers for policy CRUD operations via StemeDB API
- stemedb-ontology crate with domain definitions and medical extractors
- Consumer health vertical UAT scenarios (GLP-1, gastroparesis, etc.)
- Aphoria development skill documentation
Code organization:
- Split large files into focused modules to stay under 500-line limit
- Extracted config tests, episteme helpers/drift/aliases, API helpers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Key changes:
- Fix Ingestor background task to release lock per iteration, preventing
deadlock when process_pending() needs the lock during shutdown
- Add blessed assertion predicate index and fetch_blessed_assertions()
for policy export workflows in Aphoria
- Add patent documentation (markdown + Word exports) for probabilistic
knowledge graph system
- Update community scripts for claim extraction pipeline
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PolicySourceStore for tracking where policies come from
- Implement claim extraction skill and API endpoints
- Add community UI text selection extractor component
- Create Go SDK aphoria client for policy operations
- Document patent specifications and legal disclosures
- Add guides: golden path loop, policy audit trails, pre-flight checks
- Expand Unreal Engine config extractor with source tracking
- Add UAT reports for policy source tracking validation
- Refactor tests.rs into modular test files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix super:: imports in tests.rs which is included via #[path] directive.
When using #[path = "tests.rs"], super refers to the module containing
the directive (store_impl), not the parent module.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Content Defense (Phase 7):
- Add SimilarityIndex with MinHash/LSH for near-duplicate detection
- Add QuarantineStore for flagged assertions awaiting admin review
- Add CircuitBreakerStore for per-agent circuit breaker state
- Add ContentDefenseLayer for ingestion pipeline integration
- Add API endpoints for quarantine and circuit breaker management
- Add research module with gap detection and documentation fetching
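
To make the near-duplicate idea concrete, here is a small MinHash sketch. It is not the SimilarityIndex implementation: it assumes two-word shingles, 64 hash functions derived by seeding the standard library's `DefaultHasher`, and it omits the LSH banding step entirely. All names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const NUM_HASHES: u64 = 64; // assumed signature length

/// MinHash signature: for each seeded hash function, keep the minimum
/// hash value over all two-word shingles of the text.
fn minhash_signature(text: &str) -> Vec<u64> {
    let words: Vec<&str> = text.split_whitespace().collect();
    (0..NUM_HASHES)
        .map(|seed| {
            words
                .windows(2)
                .map(|shingle| {
                    let mut hasher = DefaultHasher::new();
                    seed.hash(&mut hasher);
                    shingle.hash(&mut hasher);
                    hasher.finish()
                })
                .min()
                .unwrap_or(u64::MAX) // texts with fewer than two words
        })
        .collect()
}

/// Fraction of signature positions that agree; this estimates the
/// Jaccard similarity of the two shingle sets.
fn estimated_similarity(a: &[u64], b: &[u64]) -> f64 {
    let matches = a.iter().zip(b).filter(|(x, y)| x == y).count();
    matches as f64 / a.len().max(1) as f64
}
```

Identical texts always score 1.0; a quarantine threshold on this score would be a tuning choice that the commit does not specify.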
Code Structure Improvements:
- Extract research CLI commands to research_commands.rs
- Extract API routers to routers.rs module
- Extract key_codec extraction functions to separate module
- Extract test modules to separate files across multiple crates
- All files now under 500 line limit per pre-commit hook
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add stemedb-cluster crate implementing horizontal scaling:
- SWIM-based membership protocol for node discovery and failure detection
- Consistent hashing (jump hash) for subject-to-shard routing
- Range management with dynamic split (>64MB) and merge (<20MB) operations
- Stateless HTTP gateway for client request routing via axum
- Meta-range gossip merge for cluster-wide metadata propagation
Includes restrictive CORS policy, proper error propagation from routing,
replica cache invalidation on node failure, and 84 tests (57 unit + 27
integration). Raft MV coordination deferred per design decision.
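
For reference, the jump consistent hash algorithm (Lamping and Veach) is short enough to sketch in full. How stemedb-cluster turns a subject into the 64-bit key is not stated here, so the sketch assumes the caller has already hashed it; `jump_hash` is a hypothetical name.

```rust
/// Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets).
/// Assumption: the subject has already been hashed to `key` by the caller.
fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "need at least one bucket");
    let mut bucket: i64 = -1;
    let mut next: i64 = 0;
    while next < num_buckets as i64 {
        bucket = next;
        // Linear congruential step that drives the pseudo-random jumps.
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        let scale = (1u64 << 31) as f64 / ((key >> 33) + 1) as f64;
        next = ((bucket + 1) as f64 * scale) as i64;
    }
    bucket as u32
}
```

The property that makes it useful for shard routing is that growing from n to n+1 buckets either leaves a key where it was or moves it to the new bucket n, so only about 1/(n+1) of keys relocate.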
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Hybrid Logical Clock (HLC) for causality tracking across nodes
- Implement Merkle tree for efficient diff/sync with BLAKE3 hashing
- Add CRDT-aware stores for assertions and votes with vector clocks
- Create stemedb-sync crate with anti-entropy and gossip protocols
- Add stemedb-rpc crate with gRPC sync service (proto definitions)
- Implement SupersessionChain for tracking assertion lifecycles
- Add Aphoria application for code analysis/reporting
- Add battery11 replication test scaffolding
- Fix .gitignore to exclude nested target directories
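
The HLC update rules can be sketched as follows. This follows the standard hybrid logical clock algorithm rather than the code in this commit; the `Timestamp` and `Hlc` types, field widths, and method names are assumptions, and physical time is passed in explicitly so the example stays deterministic.

```rust
/// (wall, logical) pair; the derived ordering compares wall time first.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Timestamp {
    wall: u64,
    logical: u32,
}

struct Hlc {
    last: Timestamp,
}

impl Hlc {
    fn new() -> Self {
        Hlc { last: Timestamp { wall: 0, logical: 0 } }
    }

    /// Local or send event at physical time `now`.
    fn tick(&mut self, now: u64) -> Timestamp {
        if now > self.last.wall {
            self.last = Timestamp { wall: now, logical: 0 };
        } else {
            // Physical clock did not advance: bump the logical counter.
            self.last.logical += 1;
        }
        self.last
    }

    /// Receive event: merge a remote timestamp at physical time `now`.
    fn observe(&mut self, remote: Timestamp, now: u64) -> Timestamp {
        let wall = now.max(self.last.wall).max(remote.wall);
        let logical = if wall == self.last.wall && wall == remote.wall {
            self.last.logical.max(remote.logical) + 1
        } else if wall == self.last.wall {
            self.last.logical + 1
        } else if wall == remote.wall {
            remote.logical + 1
        } else {
            0
        };
        self.last = Timestamp { wall, logical };
        self.last
    }
}
```

A node whose physical clock lags still produces timestamps ordered after any message it has received, which is what lets causality be tracked across nodes without synchronized clocks.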
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add CRC32C checksums to WAL record format (v2), implement crash recovery
with automatic truncation of corrupt records, add feature-gated group commit
buffer for batched fsync under concurrent load, and implement log rotation
via segment files with global offset addressing.
Key changes:
- Record format v2: [len:u32][crc32c:u32][blake3:32][payload:N]
- recover_file() scans and truncates corrupt tail records
- GroupCommitBuffer batches fsync via MPSC channel (tokio feature gate)
- SegmentManager with binary search resolution and cursor-based cleanup
- Journal::read() auto-refreshes segments on miss for writer/reader split
- Split recovery.rs and key_codec.rs into directory modules for 500-line max
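
A rough sketch of the v2 framing and tail-truncation scan, under several assumptions the commit does not spell out: fields are little-endian, `len` counts payload bytes only, and the CRC32C covers the payload. The BLAKE3 field is treated as 32 opaque bytes supplied by the caller because the standard library has no BLAKE3, and only the CRC is verified. `encode_record` and `valid_prefix_len` are hypothetical names; the latter stands in for the scan that recover_file() performs.

```rust
const HEADER_LEN: usize = 4 + 4 + 32; // len + crc32c + blake3

/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    !crc
}

fn read_u32_le(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Frames one record as [len:u32][crc32c:u32][blake3:32][payload:N].
/// `content_hash` is passed in as opaque bytes (assumption, see lead-in).
fn encode_record(payload: &[u8], content_hash: [u8; 32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    out.extend_from_slice(&content_hash);
    out.extend_from_slice(payload);
    out
}

/// Scans records from the start and returns the length of the longest
/// valid prefix; a recovery pass would truncate the file to this length.
fn valid_prefix_len(log: &[u8]) -> usize {
    let mut offset = 0;
    while log.len() - offset >= HEADER_LEN {
        let len = read_u32_le(log, offset) as usize;
        let crc = read_u32_le(log, offset + 4);
        let start = offset + HEADER_LEN;
        let end = match start.checked_add(len) {
            Some(end) if end <= log.len() => end,
            _ => break, // torn write: payload extends past end of file
        };
        if crc32c(&log[start..end]) != crc {
            break; // corrupt payload
        }
        offset = end;
    }
    offset
}
```

A log with two intact records followed by a partial header reports the end of the second record as the valid length, so the torn tail is dropped and earlier records survive.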
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Layered() method to Go SDK for per-source-class consensus queries
- Add LayeredQueryParams, LayeredResult, TierResolution types to Go SDK
- Create conflict example demonstrating Skeptic and Layered endpoints
- Update quickstart.md with sections 6 (conflict detection) and 7 (authority tiers)
- Remove tracked Go binary and add data/ to .gitignore
The new quickstart sections demonstrate Episteme's differentiating features:
- Skeptic endpoint shows "Trust but Verify" conflict analysis
- Layered endpoint shows per-tier resolution (Clinical vs Anecdotal)
Note: Pre-existing large files flagged by pre-commit hook (technical debt from prior sessions)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 1 delivers the complete durability and storage layer:
- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
and multi-cycle durability
New crates: stemedb-wal, stemedb-storage, stemedb-ingest
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>