Declarative extractors in separate .toml files under .aphoria/extractors/ were
silently ignored because config loading only parsed the main config.toml. Now
from_file() scans the extractors directory after loading the main config and
merges any [[extractors.declarative]] definitions found in .toml files. Invalid
files produce warnings but do not fail the load. Also adds the show_observations
field to scan args and removes an unused import.
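
The scan-and-merge behavior described above can be sketched as follows. This is a minimal illustration, not the actual from_file() code: `scan_extractor_dir`, `ExtractorScan`, and the `parse` callback (a stand-in for real TOML deserialization) are hypothetical names chosen for the example.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Result of scanning an extractors directory: merged definitions plus
/// one warning per file that could not be read or parsed.
#[derive(Debug)]
pub struct ExtractorScan<T> {
    pub definitions: Vec<T>,
    pub warnings: Vec<String>,
}

/// Scan `dir` for `*.toml` files and merge whatever `parse` returns.
/// `parse` is a hypothetical stand-in for real TOML deserialization.
pub fn scan_extractor_dir<T>(
    dir: &Path,
    parse: impl Fn(&str) -> Result<Vec<T>, String>,
) -> ExtractorScan<T> {
    let mut scan = ExtractorScan { definitions: Vec::new(), warnings: Vec::new() };
    // A missing directory is not an error: there is nothing to merge.
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => return scan,
    };
    // Sort the paths so the merge order is deterministic.
    let mut paths: Vec<PathBuf> = entries
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.extension().map_or(false, |ext| ext == "toml"))
        .collect();
    paths.sort();
    for path in paths {
        let parsed = fs::read_to_string(&path)
            .map_err(|err| err.to_string())
            .and_then(|text| parse(&text));
        match parsed {
            Ok(defs) => scan.definitions.extend(defs),
            // An invalid file becomes a warning; the load itself continues.
            Err(err) => scan.warnings.push(format!("{}: {}", path.display(), err)),
        }
    }
    scan
}
```

A caller would merge `scan.definitions` into the already loaded main config and log each entry in `scan.warnings`.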
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit implements comprehensive production hardening across multiple
layers to prepare StemeDB for enterprise pilot deployments:
## API Layer
- Add rate limiting middleware with configurable limits per endpoint
- Enhance error handling with detailed context and proper HTTP status codes
- Add security hardening tests for input validation and boundary conditions
- Create store_helpers module for defensive storage access patterns
## Storage & WAL
- Optimize group commit batching for higher throughput
- Add defensive error handling in hybrid backend with proper fallbacks
- Enhance WAL journal durability guarantees with fsync validation
- Improve index store query performance with better caching
## Operations & Deployment
- Add comprehensive operations documentation (deployment, monitoring, DR)
- Create systemd units for backup, WAL archival, and verification
- Add monitoring configs (Prometheus alerts, metrics exporters)
- Implement backup/restore scripts with verification and S3 archival
- Add DR drill automation and runbook procedures
- Create load balancer configs (nginx, envoy) with health checks
## Documentation
- Update CLAUDE.md with operations and troubleshooting guides
- Expand roadmap with production readiness milestones
- Add pilot success criteria and deployment reference architecture
- Document TLS setup, monitoring integration, and incident response
## Configuration
- Add .env.example with all required environment variables
- Document resource sizing for different deployment scales
- Add configuration examples for various deployment topologies
This positions StemeDB for successful enterprise pilots with proper
operational discipline, monitoring, backup/DR, and security hardening.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Priority 1 (Critical): Database files removed from git tracking
- Added **/.aphoria/db/ and **/.aphoria/wal/ to .gitignore
- Removed 7 database files from dogfood/dbpool/.aphoria/db/
- Database files are runtime state (like target/), not source code
- Prevents repository bloat and keeps binary runtime state out of version control
Priority 2 (Housekeeping): Dated documentation archived
- Created archive/ structure with fixes/ and deprecated/ subdirectories
- Moved SYSTEMATIC-FIXES-2026-02-10.md to archive/fixes/
- Moved SYSTEMATIC-FIXES-COMPLETE.md to archive/fixes/
- Moved PROJECT2-QUICKSTART-DEPRECATED.md to archive/deprecated/
- Moved PROJECT2-READY.md to archive/deprecated/
- Moved verify-project2-ready.sh to archive/deprecated/
- Created archive/README.md documenting archival policy
These files are preserved for historical reference but no longer clutter
the main dogfood directory. See archive/README.md for details.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add documentation for:
- --template flag (generate example TOML)
- --validate-only flag (check without importing)
- --format flag (table|json output)
- Validation details (what gets checked)
- Link to comprehensive bulk import guide
All examples tested and working.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Problem
CLI-created community corpus items (tier 3) were stored correctly but
invisible via API queries. Two issues blocked discoverability:
1. **Prefix mismatch**: API hardcoded 'community://pattern/' for
aggregated patterns, but CLI creates 'community://rust/http/...' URIs
2. **Query parameter parsing**: Axum's default parser doesn't support
bracket notation (?sources[]=value) used by the dashboard
Result: 0/22 CLI-created items were queryable.
## Solution
### Fix 1: Broaden Community Prefix
- Changed: 'community://pattern/' → 'community://' in corpus handler
- Impact: Now matches both aggregated patterns AND CLI-created items
- Backward compatible: Broader prefix includes narrower results
### Fix 2: Add QsQuery Extractor
- Added: serde_qs dependency + custom QsQuery extractor
- Supports: Bracket notation for array parameters (?sources[]=a&sources[]=b)
- Compatible: Works with JavaScript URLSearchParams standard
- Tested: 3 new unit tests for extractor behavior
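
To illustrate what bracket notation means for array parameters, here is a standard-library-only sketch. It is not the QsQuery extractor and does not use serde_qs; `bracket_values` is a hypothetical helper, and it skips percent-decoding (so an encoded `sources%5B%5D=a` is not handled).

```rust
/// Collect every value of an array-style query parameter written in
/// bracket notation, e.g. `sources[]=a&sources[]=b`.
/// Simplification for this sketch: no percent-decoding is performed.
pub fn bracket_values(query: &str, name: &str) -> Vec<String> {
    let key = format!("{}[]", name);
    query
        .trim_start_matches('?')
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .filter(|(k, _)| *k == key)
        .map(|(_, v)| v.to_string())
        .collect()
}
```

For `?sources[]=community&sources[]=rfc` and the name `sources`, the helper returns the two values in order; a parser that only understands repeated plain keys would see no `sources` parameter at all.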
## Verification
- ✅ All 22 CLI-created community items now queryable (was 0)
- ✅ Source filtering works: community (22), RFC (2), vendor (5)
- ✅ Multi-source queries work: ?sources[]=community&sources[]=rfc → 24
- ✅ All 89 API tests pass + 3 new extractor tests
- ✅ Clippy clean (0 warnings)
- ✅ No regressions in existing functionality
## Files Changed
- crates/stemedb-api/Cargo.toml: Add serde_qs dependency
- crates/stemedb-api/src/extractors.rs: New QsQuery extractor (117 lines)
- crates/stemedb-api/src/handlers/aphoria/corpus.rs: Use QsQuery, broaden prefix
- crates/stemedb-api/src/lib.rs: Export extractors module
Also includes: Scale-adaptive thresholds, wiki corpus extraction,
documentation updates, and dashboard UI improvements from prior work.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit implements Phase 17 of the Aphoria roadmap, adding:
**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)
**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)
**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns
**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion
**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries
**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion
Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Git Commit Tracking**
- Automatically capture git commit hash when claims/observations are ingested
- Store in assertion metadata for temporal context and audit trails
- Graceful degradation in non-git environments
- Solves double-commit problem by capturing hash at ingestion time
**Implementation**
- walker/git.rs: get_current_commit_hash() utility function
- bridge.rs: Accept optional git_commit parameter in all conversion functions
- episteme/local: Store project_root, capture git hash during ingestion
- 5 new tests for git hash tracking + metadata validation
- All 1162 aphoria tests passing
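
A minimal sketch of capturing the hash with graceful degradation, assuming the `git` binary is invoked as a subprocess. The real get_current_commit_hash() in walker/git.rs may work differently; `current_commit_hash` and its signature are assumptions made for this example.

```rust
use std::path::Path;
use std::process::Command;

/// Return the current commit hash for `project_root`, or `None` when git
/// is unavailable, the directory is not a repository, or the call fails.
pub fn current_commit_hash(project_root: &Path) -> Option<String> {
    let output = Command::new("git")
        .arg("rev-parse")
        .arg("HEAD")
        .current_dir(project_root)
        .output()
        .ok()?; // git missing or directory unusable: degrade to None
    if !output.status.success() {
        return None; // not a repository, or no commits yet
    }
    let hash = String::from_utf8(output.stdout).ok()?.trim().to_string();
    if hash.is_empty() {
        None
    } else {
        Some(hash)
    }
}
```

Returning `Option<String>` lets ingestion store the hash in assertion metadata when it exists and simply omit it otherwise.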
**Documentation Overhaul**
- README: Added Observations vs Claims distinction, git tracking, dashboard
- CLI Reference: New sections for git integration and ignore/exclusion system
- Comprehensive ignore documentation: .aphoriaignore, inline comments, 4 methods
- Enhanced verification engine docs with matching capabilities
- DOCUMENTATION_UPDATES.md: Complete audit summary
**Dashboard Separation**
- Moved Aphoria-specific UI from stemedb-dashboard to aphoria-dashboard
- Clean separation of concerns: StemeDB for core, Aphoria for security
- Added dashboard documentation and setup guides
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed 3 bugs in Aphoria's claim verification engine that were causing
false positives in Maxwell validation testing:
**Bug 1: Path matching + predicate filtering**
- Added predicate filtering to prevent cross-predicate matches
- Added path prefix matching to respect crate boundaries
- Prevents core/imports/serde from matching hypervisor/vsock/imports/serde
**Bug 2: Value-specific absent checks**
- Absent mode now checks for the specific forbidden value, not any observation
- Example: "Clone absent" + "Debug present" = PASS (not CONFLICT)
- Only conflicts when the exact forbidden value is found
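
The value-specific rule can be sketched like this. `Verdict` and `check_absent` are hypothetical names for illustration, not the engine's actual types.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Conflict,
}

/// An "absent" claim conflicts only when the exact forbidden value is
/// observed; other observations at the same path are irrelevant.
pub fn check_absent(forbidden: &str, observed: &[&str]) -> Verdict {
    if observed.iter().any(|value| *value == forbidden) {
        Verdict::Conflict
    } else {
        Verdict::Pass
    }
}
```

With this rule, a claim that `Clone` is absent passes when only `Debug` is observed, and conflicts once `Clone` appears among the observations.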
**Bug 3: Wildcard pattern support**
- Wildcard patterns like message/*/derives now match multiple paths
- Enhanced wildcard_matches() to support prefix/*/suffix patterns
- Correctly strips full scheme+language from observation paths
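
A minimal sketch of prefix/*/suffix matching, assuming `*` stands for exactly one path segment and that the scheme and language have already been stripped from the observation path. `segment_wildcard_matches` is a hypothetical name; the real wildcard_matches() may use different rules.

```rust
/// Match a pattern such as `message/*/derives` against a path,
/// treating `*` as exactly one path segment (an assumption of this sketch).
pub fn segment_wildcard_matches(pattern: &str, path: &str) -> bool {
    let pattern_segments: Vec<&str> = pattern.split('/').collect();
    let path_segments: Vec<&str> = path.split('/').collect();
    pattern_segments.len() == path_segments.len()
        && pattern_segments
            .iter()
            .zip(&path_segments)
            .all(|(p, s)| *p == "*" || p == s)
}
```

Under this assumption `message/*/derives` matches both `message/request/derives` and `message/response/derives`, but not `message/request/fields`.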
**Test coverage:**
- All 39 existing tests passing
- 3 new tests added for bug fixes
- 2 tests updated to use correct predicates
- Zero clippy warnings
**Maxwell validation:**
- maxwell-core-no-serde-001: CONFLICT → PASS (respects path boundaries)
- maxwell-singleton-no-clone-001: CONFLICT → PASS (value-specific absent)
- 5 claims now correctly show as MISSING (expose predicate mismatches)
The fixes successfully eliminate false positives while exposing pre-existing
issues where claims used incorrect predicates.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add [profile.test] with opt-level=1 and debug=0 for faster compile/link
- Add [profile.test.build-override] with opt-level=3 for proc-macros
- Add tiered test targets: test-fast (single crate), test-lib (unit tests)
- Add install-nextest target for parallel test runner
- Update CLAUDE.md with new test command options
- Add CRATE variable guard to test-fast for helpful error messages
Expected improvement: ~50% faster incremental test builds
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the --show-claims feature requested by users who need to verify
extractors are working correctly and debug false negatives.
Changes:
- Add `claims: Option<Vec<ExtractedClaim>>` field to ScanResult
- Add `--show-claims` CLI flag to scan command
- Add `show_claims: bool` parameter to ScanArgs
- Populate claims in scanner when flag is set (sorted by file, then line)
- Display claims in all output formats:
* Table: New "Extracted Claims" section with concept/value/file/line/confidence
* JSON: Top-level `claims` array with full claim details
* Markdown: "## Extracted Claims" section with table
* SARIF: Informational-level results (level: "note") for IDE integration
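
A minimal sketch of how the optional field might be populated. The field names follow the table columns listed above, but the field types and the `attach_claims` helper are assumptions for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ExtractedClaim {
    pub concept: String,
    pub value: String,
    pub file: String,
    pub line: u32,
    pub confidence: f64,
}

#[derive(Debug, Default)]
pub struct ScanResult {
    /// `None` unless the scan was run with the show-claims flag.
    pub claims: Option<Vec<ExtractedClaim>>,
}

/// Attach claims only when requested, sorted by file and then by line.
pub fn attach_claims(
    result: &mut ScanResult,
    mut extracted: Vec<ExtractedClaim>,
    show_claims: bool,
) {
    if !show_claims {
        result.claims = None;
        return;
    }
    extracted.sort_by(|a, b| a.file.cmp(&b.file).then(a.line.cmp(&b.line)));
    result.claims = Some(extracted);
}
```

Keeping the field as an `Option` means existing output stays unchanged when the flag is not set.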
User outcome:
- `aphoria scan . --show-claims` displays all claims (not just conflicts)
- Users can verify extractors detected their code patterns
- Users can debug false negatives by seeing what WAS extracted
- Builds trust through transparency
Quality:
- Zero breaking changes (opt-in flag, backward compatible)
- All tests passing (943 passed)
- Clippy clean (no warnings)
- Manual testing verified all 4 output formats
Addresses user feedback from /home/jml/Workspace/maxwell/.aphoria/.notes-for-aphoria-team
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add scans panel with finding details, verdict badges, and filters
- Add corpus panel for managing knowledge sources
- Add scan cache for API state management
- Update sidebar navigation with new routes
- Extend API types for scans and corpus endpoints
- Add .aphoria/ to gitignore (contains project keys)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When installation encounters bugs or unexpected behavior, the skill now:
- Creates notes in ~/.aphoria/notes/{date}-{issue}.md
- Documents environment, steps to reproduce, errors, workarounds
- Checks for existing notes before starting new installs
- Includes note format template with tags for categorization
This creates a feedback loop for improving installation experience
based on real-world issues.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Creates skill for installing and running StemeDB/Aphoria:
- Three installation tiers: Solo, Team, Enterprise
- Step-by-step installation protocol (prerequisites, build, init, verify)
- Optional StemeDB server setup for team observation aggregation
- Troubleshooting section for common issues
- Uninstall instructions
- Environment variable reference
Routing added to CLAUDE.md for discoverability.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Created solo-developer-guide.md for individual/side projects
- Created enterprise-pilot-guide.md with 7-phase pilot methodology
- Updated guides/README.md with new guide references
- Updated main README.md with guides table and time estimates
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add benchmark: false to ScanArgs in stemedb-api handler
- Change test float from 3.14 to 7.25 to avoid clippy approx_constant
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix reference customer answer in amazement-demo-2 (remove placeholder)
- Add Pilot Delivery Milestones section linking demo capabilities to roadmap tasks
- Add SOC 2 Type II certification task (9C.4) with Q3 2026 target
- Add "real data not mockups" success criterion to P5.4 demo validation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use `super::*` instead of `super::tls_version::TlsVersionExtractor` since
the test module is included via #[path] inside tls_version.rs.
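
The module relationship can be sketched in a single file. The test module is inlined here instead of being pulled in through #[path], and it is left public and un-gated (no #[cfg(test)]) so the example can be called directly; the `name` method and `extractor_name` helper are hypothetical.

```rust
mod tls_version {
    pub struct TlsVersionExtractor;

    impl TlsVersionExtractor {
        pub fn name(&self) -> &'static str {
            "tls_version"
        }
    }

    // In the real layout this module body lives in a separate file that is
    // included via a #[path] attribute from inside tls_version.rs.
    pub mod tests {
        // `super` is the module that declares `tests`, i.e. `tls_version`,
        // so `super::*` already brings TlsVersionExtractor into scope.
        use super::*;

        pub fn extractor_name() -> &'static str {
            TlsVersionExtractor.name()
        }
    }
}
```

Because the declaring module is `tls_version` itself, a path like `super::tls_version::TlsVersionExtractor` would look for a nested `tls_version` module that does not exist.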
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enterprise Features:
- Hosted mode with remote sync for team pattern aggregation
- Community sharing with privacy-preserving anonymization
- LLM-based semantic claim extraction with Gemini integration
- Pattern learning with promotion to declarative extractors
- High-entropy secrets extractor with configurable thresholds
- Auth bypass and insecure cookies extractors
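
High-entropy secret detection is commonly built on Shannon entropy with a threshold. The sketch below shows that general technique only; it is not taken from the extractor in this commit, and `shannon_entropy`, `looks_like_secret`, and their parameters are hypothetical.

```rust
use std::collections::HashMap;

/// Shannon entropy of a string in bits per byte.
pub fn shannon_entropy(s: &str) -> f64 {
    let bytes = s.as_bytes();
    if bytes.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for &b in bytes {
        *counts.entry(b).or_insert(0) += 1;
    }
    let len = bytes.len() as f64;
    counts
        .values()
        .map(|&count| {
            let p = count as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Flag a candidate when it is long enough and its entropy exceeds a
/// configurable threshold (both parameters are illustrative).
pub fn looks_like_secret(candidate: &str, min_len: usize, threshold: f64) -> bool {
    candidate.len() >= min_len && shannon_entropy(candidate) > threshold
}
```

A repeated character scores 0 bits per byte, while a string whose bytes are all distinct scores log2 of its length, which is why threshold and minimum length are worth making configurable.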
Module Refactoring:
- Split oversized files to comply with 500-line limit
- Config split: types/core.rs, types/extractors.rs, types/hosted.rs, etc.
- Handlers split: scan.rs, policy.rs, report.rs modules
- Extractors split: declarative/, high_entropy_secrets/, insecure_cookies/
- Learning split: store modules with metrics and persistence
SDK & Ontology:
- stemedb-ontology SDK with fluent builders and StemeDB client
- Pharma domain extractors for FDA Orange Book data
- Consumer health UAT test infrastructure
Code Quality:
- Fixed clippy warnings (needless_borrows_for_generic_args)
- Added KVStore trait imports where needed
- Fixed utoipa path re-exports for OpenAPI docs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major additions:
- Staged scanning modes (working tree, staged, committed) with git integration
- Drift detection for baseline vs current state comparisons
- Hosted API handlers for policy CRUD operations via StemeDB API
- stemedb-ontology crate with domain definitions and medical extractors
- Consumer health vertical UAT scenarios (GLP-1, gastroparesis, etc.)
- Aphoria development skill documentation
Code organization:
- Split large files into focused modules to stay under 500-line limit
- Extracted config tests, episteme helpers/drift/aliases, API helpers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Key changes:
- Fix Ingestor background task to release lock per iteration, preventing
deadlock when process_pending() needs the lock during shutdown
- Add blessed assertion predicate index and fetch_blessed_assertions()
for policy export workflows in Aphoria
- Add patent documentation (markdown + Word exports) for probabilistic
knowledge graph system
- Update community scripts for claim extraction pipeline
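
The lock-per-iteration pattern from the Ingestor fix can be sketched with standard threads. The actual Ingestor may be async and structured differently; every type and method below is a hypothetical stand-in.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

#[derive(Default)]
struct State {
    pending: Vec<u32>,
    processed: usize,
}

pub struct Ingestor {
    state: Mutex<State>,
    stop: AtomicBool,
}

impl Ingestor {
    pub fn new() -> Arc<Self> {
        Arc::new(Self { state: Mutex::new(State::default()), stop: AtomicBool::new(false) })
    }

    pub fn submit(&self, item: u32) {
        self.state.lock().unwrap().pending.push(item);
    }

    /// Drain the queue. The guard is dropped when this function returns.
    pub fn process_pending(&self) -> usize {
        let mut state = self.state.lock().unwrap();
        let drained = state.pending.len();
        state.pending.clear();
        state.processed += drained;
        drained
    }

    pub fn processed(&self) -> usize {
        self.state.lock().unwrap().processed
    }

    pub fn spawn_background(self: &Arc<Self>) -> thread::JoinHandle<()> {
        let this = Arc::clone(self);
        thread::spawn(move || {
            while !this.stop.load(Ordering::Acquire) {
                // The lock is taken and released inside each iteration,
                // never held across the sleep or the whole loop.
                this.process_pending();
                thread::sleep(Duration::from_millis(1));
            }
        })
    }

    /// Final drain while the worker is still alive, then stop and join.
    pub fn shutdown(&self, worker: thread::JoinHandle<()>) {
        self.process_pending();
        self.stop.store(true, Ordering::Release);
        worker.join().unwrap();
    }
}
```

If the background loop acquired the guard once outside the `while` and kept it, the `process_pending()` call in `shutdown` would block forever; scoping the guard to one iteration avoids that.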
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use std::slice::from_ref instead of &[x.clone()]
- Avoid approx_constant lint with explicit f64 suffix
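
The first change can be illustrated with a small sketch; `Record`, `total_name_len`, and `name_len_of_one` are hypothetical names for the example.

```rust
#[derive(Clone)]
pub struct Record {
    pub name: String,
}

/// Some API that takes a slice of records.
pub fn total_name_len(records: &[Record]) -> usize {
    records.iter().map(|record| record.name.len()).sum()
}

pub fn name_len_of_one(record: &Record) -> usize {
    // Before: total_name_len(&[record.clone()]) clones the value only to
    // build a one-element slice.
    // After: view the existing reference as a slice of length one.
    total_name_len(std::slice::from_ref(record))
}
```

`std::slice::from_ref` converts `&T` into `&[T]` without copying, so the clone and its allocation disappear.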
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PolicySourceStore for tracking where policies come from
- Implement claim extraction skill and API endpoints
- Add community UI text selection extractor component
- Create Go SDK aphoria client for policy operations
- Document patent specifications and legal disclosures
- Add guides: golden path loop, policy audit trails, pre-flight checks
- Expand Unreal Engine config extractor with source tracking
- Add UAT reports for policy source tracking validation
- Refactor tests.rs into modular test files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix super:: imports in tests.rs, which is included via a #[path] directive.
When using #[path = "tests.rs"], super refers to the module containing
the directive (store_impl), not the parent module.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Content Defense (Phase 7):
- Add SimilarityIndex with MinHash/LSH for near-duplicate detection
- Add QuarantineStore for flagged assertions awaiting admin review
- Add CircuitBreakerStore for per-agent circuit breaker state
- Add ContentDefenseLayer for ingestion pipeline integration
- Add API endpoints for quarantine and circuit breaker management
- Add research module with gap detection and documentation fetching
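
As background for the SimilarityIndex bullet, here is a minimal MinHash sketch using only the standard library. It shows the general technique, not the crate's implementation: the LSH banding step is omitted, word shingles and seeded `DefaultHasher` hashing are choices made for this example, and all function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Split text into overlapping k-word shingles.
pub fn shingles(text: &str, k: usize) -> Vec<String> {
    let k = k.max(1);
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.len() < k {
        return vec![words.join(" ")];
    }
    words.windows(k).map(|window| window.join(" ")).collect()
}

/// MinHash signature: for each seed, the minimum seeded hash over all shingles.
pub fn minhash_signature(text: &str, k: usize, num_hashes: usize) -> Vec<u64> {
    let set = shingles(text, k);
    (0..num_hashes as u64)
        .map(|seed| {
            set.iter()
                .map(|shingle| {
                    let mut hasher = DefaultHasher::new();
                    seed.hash(&mut hasher);
                    shingle.hash(&mut hasher);
                    hasher.finish()
                })
                .min()
                .unwrap_or(u64::MAX)
        })
        .collect()
}

/// Fraction of matching signature positions, an estimate of Jaccard similarity.
pub fn estimated_jaccard(a: &[u64], b: &[u64]) -> f64 {
    if a.is_empty() || a.len() != b.len() {
        return 0.0;
    }
    let matching = a.iter().zip(b).filter(|(x, y)| x == y).count();
    matching as f64 / a.len() as f64
}
```

Two texts whose estimate exceeds a chosen threshold would be treated as near-duplicates; an LSH index avoids comparing every pair by bucketing bands of the signature.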
Code Structure Improvements:
- Extract research CLI commands to research_commands.rs
- Extract API routers to routers.rs module
- Extract key_codec extraction functions to separate module
- Extract test modules to separate files across multiple crates
- All files now under 500 line limit per pre-commit hook
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add stemedb-cluster crate implementing horizontal scaling:
- SWIM-based membership protocol for node discovery and failure detection
- Consistent hashing (jump hash) for subject-to-shard routing
- Range management with dynamic split (>64MB) and merge (<20MB) operations
- Stateless HTTP gateway for client request routing via axum
- Meta-range gossip merge for cluster-wide metadata propagation
Includes restrictive CORS policy, proper error propagation from routing,
replica cache invalidation on node failure, and 84 tests (57 unit + 27
integration). Raft MV coordination deferred per design decision.
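
For reference, jump consistent hashing (Lamping and Veach) can be written in a few lines. This is a sketch of the published algorithm, not the stemedb-cluster code; hashing the subject with `DefaultHasher` and the name `shard_for_subject` are assumptions for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Jump consistent hash: map a 64-bit key to a bucket in `0..num_buckets`.
pub fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "at least one bucket is required");
    let mut bucket: i64 = -1;
    let mut next: i64 = 0;
    while next < num_buckets as i64 {
        bucket = next;
        // Linear congruential step from the original paper.
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        next = ((bucket + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    bucket as u32
}

/// Route a subject string to a shard (the string hash is illustrative).
pub fn shard_for_subject(subject: &str, num_shards: u32) -> u32 {
    let mut hasher = DefaultHasher::new();
    subject.hash(&mut hasher);
    jump_hash(hasher.finish(), num_shards)
}
```

The useful property for shard routing is that growing from n to n + 1 buckets either leaves a key where it was or moves it to the new bucket n, so only a small fraction of subjects relocate.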
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Hybrid Logical Clock (HLC) for causality tracking across nodes
- Implement Merkle tree for efficient diff/sync with BLAKE3 hashing
- Add CRDT-aware stores for assertions and votes with vector clocks
- Create stemedb-sync crate with anti-entropy and gossip protocols
- Add stemedb-rpc crate with gRPC sync service (proto definitions)
- Implement SupersessionChain for tracking assertion lifecycles
- Add Aphoria application for code analysis/reporting
- Add battery11 replication test scaffolding
- Fix .gitignore to exclude nested target directories
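
The Hybrid Logical Clock update rules can be sketched as below, following the standard HLC algorithm. The physical time is passed in explicitly to keep the example deterministic; `Hlc`, `Timestamp`, `tick`, and `observe` are hypothetical names and not necessarily those used in the crate.

```rust
/// HLC timestamp; the derived ordering compares `wall` first, then `logical`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp {
    pub wall: u64,
    pub logical: u32,
}

#[derive(Debug, Default)]
pub struct Hlc {
    wall: u64,
    logical: u32,
}

impl Hlc {
    pub fn new() -> Self {
        Self::default()
    }

    /// Local or send event.
    pub fn tick(&mut self, physical_now: u64) -> Timestamp {
        if physical_now > self.wall {
            self.wall = physical_now;
            self.logical = 0;
        } else {
            // Physical clock did not advance: bump the logical counter.
            self.logical += 1;
        }
        Timestamp { wall: self.wall, logical: self.logical }
    }

    /// Receive event: merge a remote timestamp with the local clock.
    pub fn observe(&mut self, remote: Timestamp, physical_now: u64) -> Timestamp {
        let prev = self.wall;
        let wall = prev.max(remote.wall).max(physical_now);
        self.logical = if wall == prev && wall == remote.wall {
            self.logical.max(remote.logical) + 1
        } else if wall == prev {
            self.logical + 1
        } else if wall == remote.wall {
            remote.logical + 1
        } else {
            0
        };
        self.wall = wall;
        Timestamp { wall: self.wall, logical: self.logical }
    }
}
```

Timestamps produced this way stay monotonic on each node even if the physical clock stalls or steps backwards, and a received message always orders before the receiver's next timestamp.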
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>